Prediction Dataset

class datarobot.models.PredictionDataset(project_id, id, name, created, num_rows, num_columns, forecast_point=None, predictions_start_date=None, predictions_end_date=None, relax_known_in_advance_features_check=None, data_quality_warnings=None, forecast_point_range=None, data_start_date=None, data_end_date=None, max_forecast_date=None, actual_value_column=None, detected_actual_value_columns=None, contains_target_values=None, secondary_datasets_config_id=None)

A dataset uploaded to make predictions

Typically created via project.upload_dataset

Attributes:
id: str

the id of the dataset

project_id: str

the id of the project the dataset belongs to

created: str

the time the dataset was created

name: str

the name of the dataset

num_rows: int

the number of rows in the dataset

num_columns: int

the number of columns in the dataset

forecast_point: datetime.datetime or None

For time series projects only. This is the default point relative to which predictions will be generated, based on the forecast window of the project. See the time series predictions documentation for more information.

predictions_start_date: datetime.datetime or None, optional

For time series projects only. The start date for bulk predictions. Note that this parameter is for generating historical predictions using the training data. This parameter should be provided in conjunction with predictions_end_date. Can’t be provided with the forecast_point parameter.

predictions_end_date: datetime.datetime or None, optional

For time series projects only. The end date for bulk predictions, exclusive. Note that this parameter is for generating historical predictions using the training data. This parameter should be provided in conjunction with predictions_start_date. Can’t be provided with the forecast_point parameter.

relax_known_in_advance_features_check: bool, optional

(New in version v2.15) For time series projects only. If True, missing values in the known in advance features are allowed in the forecast window at prediction time. If omitted or False, missing values are not allowed.

data_quality_warnings: dict, optional

(New in version v2.15) A dictionary that contains available warnings about potential problems in this prediction dataset. Available warnings include:

has_kia_missing_values_in_forecast_window: bool

Applicable for time series projects. If True, known in advance features have missing values in the forecast window, which may decrease prediction accuracy.

insufficient_rows_for_evaluating_models: bool

Applicable for datasets which are used as external test sets. If True, there are not enough rows in the dataset to calculate insights.

single_class_actual_value_column: bool

Applicable for datasets which are used as external test sets. If True, the actual value column contains only one class, so insights such as the ROC curve cannot be calculated. Only applies to binary classification projects or unsupervised projects.

forecast_point_range: list[datetime.datetime] or None, optional

(New in version v2.20) For time series projects only. Specifies the range of dates available for use as a forecast point.

data_start_date: datetime.datetime or None, optional

(New in version v2.20) For time series projects only. The minimum primary date of this prediction dataset.

data_end_date: datetime.datetime or None, optional

(New in version v2.20) For time series projects only. The maximum primary date of this prediction dataset.

max_forecast_date: datetime.datetime or None, optional

(New in version v2.20) For time series projects only. The maximum forecast date of this prediction dataset.

actual_value_column: string, optional

(New in version v2.21) Only available for unsupervised projects, and only if the dataset was uploaded with an actual value column specified. The name of the column used to calculate the classification metrics and insights.

detected_actual_value_columns: list of dict, optional

(New in version v2.21) For unsupervised projects only. A list of detected actual value columns, giving the name and missing count for each column.

contains_target_values: bool, optional

(New in version v2.21) Only for supervised projects. If True, the dataset contains target values and can be used to calculate the classification metrics and insights.

secondary_datasets_config_id: string or None, optional

(New in version v2.23) The ID of the alternative secondary dataset configuration to use during prediction for Feature Discovery projects.
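
The rules stated above for the time series date attributes (the start and end dates go together, and neither can be combined with a forecast point) can be checked on the client before a dataset is uploaded. The following is a minimal sketch and not part of the datarobot package: check_prediction_dates is a hypothetical helper, and rejecting a start date that is not earlier than the end date is an assumption that follows from the end date being exclusive.

```python
from datetime import datetime
from typing import Optional


def check_prediction_dates(
    forecast_point: Optional[datetime] = None,
    predictions_start_date: Optional[datetime] = None,
    predictions_end_date: Optional[datetime] = None,
) -> None:
    """Hypothetical helper: raise ValueError if the date arguments conflict."""
    has_start = predictions_start_date is not None
    has_end = predictions_end_date is not None

    # The start and end dates are documented as a pair.
    if has_start != has_end:
        raise ValueError(
            "predictions_start_date and predictions_end_date must be provided together"
        )

    # Bulk prediction dates cannot be combined with a forecast point.
    if forecast_point is not None and has_start:
        raise ValueError(
            "forecast_point cannot be combined with "
            "predictions_start_date/predictions_end_date"
        )

    # Assumption: the end date is exclusive, so an empty or reversed range is rejected.
    if has_start and predictions_start_date >= predictions_end_date:
        raise ValueError(
            "predictions_start_date must be earlier than predictions_end_date"
        )
```

The function returns None when the combination is acceptable, so it can be called just before project.upload_dataset without changing the surrounding code.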

classmethod get(project_id, dataset_id)

Retrieve information about a dataset uploaded for predictions

Parameters:
project_id:

the id of the project to query

dataset_id:

the id of the dataset to retrieve

Returns:
dataset: PredictionDataset

A dataset uploaded to make predictions

Return type:

PredictionDataset
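
As an illustration of reading the attributes of a retrieved dataset, the sketch below builds a one-line summary that includes any data quality warnings set to True. summarize_prediction_dataset is a hypothetical helper, not part of the datarobot package, and the stand-in object with made-up values exists only so the example runs without a DataRobot connection.

```python
from types import SimpleNamespace


def summarize_prediction_dataset(dataset) -> str:
    """Hypothetical helper: one-line summary of a PredictionDataset-like object."""
    # data_quality_warnings is optional, so treat None as "no warnings".
    warnings = dataset.data_quality_warnings or {}
    active = sorted(name for name, flagged in warnings.items() if flagged)
    summary = (
        f"{dataset.name} ({dataset.id}): "
        f"{dataset.num_rows} rows x {dataset.num_columns} columns"
    )
    if active:
        summary += "; warnings: " + ", ".join(active)
    return summary


# Stand-in object with made-up values. With the real client, the object would
# instead come from datarobot.models.PredictionDataset.get(project_id, dataset_id).
example = SimpleNamespace(
    id="example-dataset-id",
    name="scoring.csv",
    num_rows=100,
    num_columns=12,
    data_quality_warnings={
        "has_kia_missing_values_in_forecast_window": False,
        "insufficient_rows_for_evaluating_models": True,
    },
)
print(summarize_prediction_dataset(example))
```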

delete()

Delete a dataset uploaded for predictions

Will also delete predictions made using this dataset and cancel any predict jobs using this dataset.

Return type:

None
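
Because delete() is an instance method, a dataset is normally retrieved with get first and then deleted. The sketch below shows that sequence under stated assumptions: delete_prediction_dataset is a hypothetical helper, and the class is passed in as dataset_cls so the example can be exercised with a test double instead of a live connection.

```python
def delete_prediction_dataset(dataset_cls, project_id: str, dataset_id: str) -> None:
    """Hypothetical helper: look up a prediction dataset and delete it."""
    # dataset_cls is expected to expose the documented classmethod get(project_id, dataset_id).
    dataset = dataset_cls.get(project_id, dataset_id)
    # As documented above, deleting also removes predictions made with this
    # dataset and cancels any predict jobs that use it.
    dataset.delete()
```

With the real client, dataset_cls would be datarobot.models.PredictionDataset. Since deletion also removes dependent predictions, it is worth confirming the ids before calling the helper.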