Prediction Dataset
- class datarobot.models.PredictionDataset(project_id, id, name, created, num_rows, num_columns, forecast_point=None, predictions_start_date=None, predictions_end_date=None, relax_known_in_advance_features_check=None, data_quality_warnings=None, forecast_point_range=None, data_start_date=None, data_end_date=None, max_forecast_date=None, actual_value_column=None, detected_actual_value_columns=None, contains_target_values=None, secondary_datasets_config_id=None)
A dataset uploaded to make predictions
Typically created via project.upload_dataset
- Attributes:
- id: str
the id of the dataset
- project_id: str
the id of the project the dataset belongs to
- created: str
the time the dataset was created
- name: str
the name of the dataset
- num_rows: int
the number of rows in the dataset
- num_columns: int
the number of columns in the dataset
- forecast_point: datetime.datetime or None
For time series projects only. This is the default point relative to which predictions will be generated, based on the forecast window of the project. See the time series predictions documentation for more information.
- predictions_start_date: datetime.datetime or None, optional
For time series projects only. The start date for bulk predictions. Note that this parameter is for generating historical predictions using the training data. This parameter should be provided in conjunction with predictions_end_date. Can’t be provided with the forecast_point parameter.
- predictions_end_date: datetime.datetime or None, optional
For time series projects only. The end date for bulk predictions, exclusive. Note that this parameter is for generating historical predictions using the training data. This parameter should be provided in conjunction with predictions_start_date. Can’t be provided with the forecast_point parameter.
- relax_known_in_advance_features_check: bool, optional
(New in version v2.15) For time series projects only. If True, missing values in the known in advance features are allowed in the forecast window at the prediction time. If omitted or False, missing values are not allowed.
- data_quality_warnings: dict, optional
(New in version v2.15) A dictionary that contains available warnings about potential problems in this prediction dataset. Available warnings include:
- has_kia_missing_values_in_forecast_window: bool
Applicable for time series projects. If True, known in advance features have missing values in the forecast window, which may decrease prediction accuracy.
- insufficient_rows_for_evaluating_models: bool
Applicable for datasets which are used as external test sets. If True, there are not enough rows in the dataset to calculate insights.
- single_class_actual_value_column: bool
Applicable for datasets which are used as external test sets. If True, the actual value column has only one class, so insights such as the ROC curve cannot be calculated. Only applies to binary classification projects or unsupervised projects.
- forecast_point_range: list[datetime.datetime] or None, optional
(New in version v2.20) For time series projects only. Specifies the range of dates available for use as a forecast point.
- data_start_date: datetime.datetime or None, optional
(New in version v2.20) For time series projects only. The minimum primary date of this prediction dataset.
- data_end_date: datetime.datetime or None, optional
(New in version v2.20) For time series projects only. The maximum primary date of this prediction dataset.
- max_forecast_date: datetime.datetime or None, optional
(New in version v2.20) For time series projects only. The maximum forecast date of this prediction dataset.
- actual_value_column: string, optional
(New in version v2.21) Only available for unsupervised projects, when the dataset was uploaded with an actual value column specified. The name of the column used to calculate the classification metrics and insights.
- detected_actual_value_columns: list of dict, optional
(New in version v2.21) For unsupervised projects only. A list of detected actual value columns, where each entry contains the name and missing count of one column.
- contains_target_values: bool, optional
(New in version v2.21) Only for supervised projects. If True, the dataset contains target values and can be used to calculate the classification metrics and insights.
- secondary_datasets_config_id: string or None, optional
(New in version v2.23) The id of the alternative secondary dataset config to use during prediction for Feature Discovery projects.
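The rules stated for the time series date attributes can be checked on the client side before an upload is attempted. The following is a minimal sketch and not part of the DataRobot client: the helper name `check_prediction_date_args` is hypothetical, and it encodes only the two rules given in the attribute descriptions (the start and end dates go together, and neither is combined with `forecast_point`).

```python
from datetime import datetime


def check_prediction_date_args(forecast_point=None,
                               predictions_start_date=None,
                               predictions_end_date=None):
    """Hypothetical helper: validate the documented date-argument rules.

    Raises ValueError when the combination contradicts the attribute
    descriptions; returns None otherwise.
    """
    has_start = predictions_start_date is not None
    has_end = predictions_end_date is not None

    # The start and end dates should be provided in conjunction.
    if has_start != has_end:
        raise ValueError(
            "predictions_start_date and predictions_end_date "
            "must be given together"
        )

    # Neither bulk-prediction date can be combined with forecast_point.
    if forecast_point is not None and has_start:
        raise ValueError(
            "forecast_point cannot be combined with "
            "predictions_start_date or predictions_end_date"
        )


# A bulk-prediction range on its own passes the check.
check_prediction_date_args(
    predictions_start_date=datetime(2020, 1, 1),
    predictions_end_date=datetime(2020, 2, 1),
)
```

Whether the server applies further checks (for example against `forecast_point_range`) is not covered by this sketch.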
- classmethod get(project_id, dataset_id)
Retrieve information about a dataset uploaded for predictions
- Parameters:
- project_id:
the id of the project to query
- dataset_id:
the id of the dataset to retrieve
- Returns:
- dataset: PredictionDataset
A dataset uploaded to make predictions
- Return type:
PredictionDataset
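As a rough illustration of how a retrieved dataset might be inspected, the sketch below turns its `data_quality_warnings` dictionary into readable messages. The helper `active_warnings` and the message texts are hypothetical; only the three warning keys come from the attribute list. A `SimpleNamespace` stands in for a real `PredictionDataset`, so the snippet runs without a DataRobot connection.

```python
from types import SimpleNamespace

# Warning keys as documented; the message wording is illustrative only.
WARNING_MESSAGES = {
    "has_kia_missing_values_in_forecast_window":
        "known in advance features have missing values in the forecast window",
    "insufficient_rows_for_evaluating_models":
        "not enough rows to calculate insights",
    "single_class_actual_value_column":
        "actual value column has only one class",
}


def active_warnings(dataset):
    """Hypothetical helper: list messages for warnings that are set to True."""
    # data_quality_warnings is optional, so treat None as "no warnings".
    warnings = dataset.data_quality_warnings or {}
    return [msg for key, msg in WARNING_MESSAGES.items() if warnings.get(key)]


# Stand-in for the object that get(project_id, dataset_id) would return.
stand_in = SimpleNamespace(
    data_quality_warnings={
        "has_kia_missing_values_in_forecast_window": True,
        "insufficient_rows_for_evaluating_models": False,
        "single_class_actual_value_column": True,
    }
)
messages = active_warnings(stand_in)
```

With the real client, the object passed to the helper would instead come from `PredictionDataset.get(project_id, dataset_id)`.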
- delete()
Delete a dataset uploaded for predictions
Will also delete predictions made using this dataset and cancel any predict jobs using this dataset.
- Return type:
None
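Because `delete` also removes predictions and cancels predict jobs, it may be worth guarding the call. The sketch below shows one possible guard, assuming the caller knows the expected dataset name: it retrieves the dataset with `get`, compares `name`, and only then calls `delete`. The function `delete_dataset_by_name` and the class `FakePredictionDataset` are hypothetical; the fake exposes only `get`, `delete` and `name` so the example runs offline.

```python
def delete_dataset_by_name(dataset_cls, project_id, dataset_id, expected_name):
    """Hypothetical guard: delete only when the retrieved name matches.

    dataset_cls is any class offering get(project_id, dataset_id) and an
    instance method delete(), as described for PredictionDataset.
    Returns True when the dataset was deleted, False otherwise.
    """
    dataset = dataset_cls.get(project_id, dataset_id)
    if dataset.name != expected_name:
        return False
    dataset.delete()
    return True


class FakePredictionDataset:
    """Offline stand-in; records deleted ids instead of calling a server."""

    deleted_ids = []

    def __init__(self, project_id, id, name):
        self.project_id = project_id
        self.id = id
        self.name = name

    @classmethod
    def get(cls, project_id, dataset_id):
        # Every fake dataset carries the same illustrative name.
        return cls(project_id, dataset_id, name="scoring_batch.csv")

    def delete(self):
        FakePredictionDataset.deleted_ids.append(self.id)


removed = delete_dataset_by_name(
    FakePredictionDataset, "p1", "d1", "scoring_batch.csv"
)
```

In real use, `datarobot.models.PredictionDataset` would be passed as `dataset_cls` in place of the fake.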