Prediction dataset

class datarobot.models.PredictionDataset

Bases: APIObject

A dataset uploaded to make predictions

Typically created via project.upload_dataset

Variables:
  • id (str) – the id of the dataset

  • project_id (str) – the id of the project the dataset belongs to

  • created (str) – the time the dataset was created

  • name (str) – the name of the dataset

  • num_rows (int) – the number of rows in the dataset

  • num_columns (int) – the number of columns in the dataset

  • forecast_point (datetime.datetime or None) – For time series projects only. This is the default point relative to which predictions will be generated, based on the forecast window of the project. See the time series predictions documentation for more information.

  • predictions_start_date (datetime.datetime or None, optional) – For time series projects only. The start date for bulk predictions. Note that this parameter is for generating historical predictions using the training data. This parameter should be provided in conjunction with predictions_end_date. Can’t be provided with the forecast_point parameter.

  • predictions_end_date (datetime.datetime or None, optional) – For time series projects only. The end date for bulk predictions, exclusive. Note that this parameter is for generating historical predictions using the training data. This parameter should be provided in conjunction with predictions_start_date. Can’t be provided with the forecast_point parameter.

  • relax_known_in_advance_features_check (Optional[bool]) – (New in version v2.15) For time series projects only. If True, missing values in the known in advance features are allowed in the forecast window at prediction time. If omitted or False, missing values are not allowed.

  • data_quality_warnings (dict, optional) –

    (New in version v2.15) A dictionary that contains available warnings about potential problems in this prediction dataset. Available warnings include:

    • has_kia_missing_values_in_forecast_window (bool)

      Applicable for time series projects. If True, known in advance features have missing values in the forecast window, which may decrease prediction accuracy.

    • insufficient_rows_for_evaluating_models (bool)

      Applicable for datasets that are used as external test sets. If True, there are not enough rows in the dataset to calculate insights.

    • single_class_actual_value_column (bool)

      Applicable for datasets that are used as external test sets. If True, the actual value column has only one class, so insights such as the ROC curve cannot be calculated. Applies only to binary classification projects and unsupervised projects.

  • forecast_point_range (list[datetime.datetime] or None, optional) – (New in version v2.20) For time series projects only. Specifies the range of dates available for use as a forecast point.

  • data_start_date (datetime.datetime or None, optional) – (New in version v2.20) For time series projects only. The minimum primary date of this prediction dataset.

  • data_end_date (datetime.datetime or None, optional) – (New in version v2.20) For time series projects only. The maximum primary date of this prediction dataset.

  • max_forecast_date (datetime.datetime or None, optional) – (New in version v2.20) For time series projects only. The maximum forecast date of this prediction dataset.

  • actual_value_column (string, optional) – (New in version v2.21) For unsupervised projects only, and only available if the dataset was uploaded with an actual value column specified. The name of the column used to calculate the classification metrics and insights.

  • detected_actual_value_columns (list of dict, optional) – (New in version v2.21) For unsupervised projects only. A list of detected actual value columns, containing the name and missing count for each column.

  • contains_target_values (Optional[bool]) – (New in version v2.21) For supervised projects only. If True, the dataset contains target values and can be used to calculate the classification metrics and insights.

  • secondary_datasets_config_id (string or None, optional) – (New in version v2.23) The ID of the alternative secondary dataset configuration to use during prediction for Feature Discovery projects.
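
The data_quality_warnings dictionary described above can be checked before requesting predictions. The following is a minimal sketch, not part of the client: the helper name active_warnings and the constant KNOWN_WARNINGS are hypothetical, and it assumes only the three warning keys documented here, each mapped to a bool.

```python
from typing import Dict, List, Optional

# Warning keys documented for PredictionDataset.data_quality_warnings.
KNOWN_WARNINGS = (
    "has_kia_missing_values_in_forecast_window",
    "insufficient_rows_for_evaluating_models",
    "single_class_actual_value_column",
)


def active_warnings(data_quality_warnings: Optional[Dict[str, bool]]) -> List[str]:
    """Return the documented warning keys whose value is True.

    Hypothetical helper: accepts None or an empty dict, since the
    attribute is optional and may be absent.
    """
    if not data_quality_warnings:
        return []
    return [key for key in KNOWN_WARNINGS if data_quality_warnings.get(key) is True]
```

With a retrieved dataset, this could be called as active_warnings(dataset.data_quality_warnings) and the result logged or used to block a prediction request.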

classmethod get(project_id, dataset_id)

Retrieve information about a dataset uploaded for predictions

Parameters:
  • project_id (str) – the id of the project to query

  • dataset_id (str) – the id of the dataset to retrieve

Returns:

dataset – A dataset uploaded to make predictions

Return type:

PredictionDataset
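
As an illustration, get can be wrapped to retrieve a dataset and summarize the attributes listed under Variables. This is a sketch under stated assumptions: the helper names describe_dataset and fetch_and_describe are hypothetical, and the client is assumed to be already configured with valid credentials.

```python
def describe_dataset(dataset) -> str:
    """Format the documented summary attributes of a prediction dataset.

    Hypothetical helper: it reads only the id, name, num_rows and
    num_columns attributes.
    """
    return (
        f"{dataset.name} ({dataset.id}): "
        f"{dataset.num_rows} rows x {dataset.num_columns} columns"
    )


def fetch_and_describe(project_id: str, dataset_id: str) -> str:
    """Retrieve a dataset with PredictionDataset.get and describe it.

    Assumes the datarobot client is installed and configured.
    """
    # Imported lazily so this module loads where the client is not installed.
    from datarobot.models import PredictionDataset

    dataset = PredictionDataset.get(project_id, dataset_id)
    return describe_dataset(dataset)
```

Both IDs are the string identifiers documented above; fetch_and_describe makes a network call, while describe_dataset works on any object that exposes the same four attributes.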

delete()

Delete a dataset uploaded for predictions

Will also delete predictions made using this dataset and cancel any predict jobs using this dataset.

Return type:

None
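
Because delete also removes predictions and cancels predict jobs, a caller may want a guard before invoking it. The sketch below shows one possible pattern; the helper delete_if_named is hypothetical and relies only on the documented name attribute and delete() method.

```python
def delete_if_named(dataset, expected_name: str) -> bool:
    """Delete the dataset only if its name matches expected_name.

    Hypothetical guard: returns True when delete() was called and
    False when the name did not match and nothing was deleted.
    """
    if dataset.name != expected_name:
        return False
    # Also deletes predictions made with this dataset and cancels
    # any predict jobs that use it, as documented for delete().
    dataset.delete()
    return True
```

The dataset argument would normally come from PredictionDataset.get(project_id, dataset_id).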