Training Predictions
- class datarobot.models.training_predictions.TrainingPredictionsIterator(client, path, limit=None)
Lazily fetches training predictions from the DataRobot API in chunks of the specified size, then iterates over the rows of each response as named tuples. Each row represents a training prediction computed for one row of the dataset. Each named tuple has the structure described under Attributes below.
Notes
Each PredictionValue dict contains these keys:
- label
describes what this model output corresponds to. For regression projects, it is the name of the target feature. For classification and multiclass projects, it is a label from the target feature.
- value
the output of the prediction. For regression projects, it is the predicted value of the target. For classification and multiclass projects, it is the predicted probability that the row belongs to the class identified by the label.
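As an illustration of the schema above, here is a hypothetical prediction_values payload for one binary classification row; the labels and probabilities are made up, and the row-level prediction corresponds to the highest-probability label:

```python
# Hypothetical prediction_values for one classification row, following
# the PredictionValue schema (label + value).
prediction_values = [
    {"label": "yes", "value": 0.73},
    {"label": "no", "value": 0.27},
]

# The row-level `prediction` is the label with the highest probability.
predicted = max(prediction_values, key=lambda pv: pv["value"])["label"]
print(predicted)  # yes
```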
Each PredictionExplanations dictionary contains these keys:
- label : string
describes what output was driven by this prediction explanation. For regression projects, it is the name of the target feature. For classification projects, it is the class whose probability increasing would correspond to a positive strength of this prediction explanation.
- feature : string
the name of the feature contributing to the prediction
- feature_value : object
the value the feature took on for this row. The type corresponds to the feature (boolean, integer, number, string)
- strength : float
the algorithm-specific explanation value attributed to the feature in this row
Each ShapMetadata dictionary contains these keys:
- shap_remaining_total : float
the total of SHAP values for features beyond max_explanations. This can be identically 0 in all rows if max_explanations is greater than the number of features, in which case all features are returned.
- shap_base_value : float
the model's average prediction over the training data. SHAP values are deviations from the base value.
- warnings : dict or None
SHAP value calculation warnings (e.g., additivity check failures in XGBoost models). Schema described as ShapWarnings.
Each ShapWarnings dictionary contains these keys:
- mismatch_row_count : int
the count of rows for which the additivity check failed
- max_normalized_mismatch : float
the maximal relative normalized mismatch value
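To make the relationship between these fields concrete, here is a pure-Python sketch with made-up numbers: for SHAP-based explanations, the raw model output for a row decomposes into shap_base_value, the per-feature strength values, and shap_remaining_total.

```python
# Hypothetical SHAP metadata and explanations for one row.
shap_base_value = 0.3  # model's average prediction over the training data
prediction_explanations = [
    {"label": "target", "feature": "age", "feature_value": 42, "strength": 0.12},
    {"label": "target", "feature": "income", "feature_value": 55000, "strength": -0.05},
]
shap_remaining_total = 0.01  # SHAP mass of features beyond max_explanations

# SHAP values are additive deviations from the base value, so the raw
# model output for the row is reconstructed as:
reconstructed = (
    shap_base_value
    + sum(e["strength"] for e in prediction_explanations)
    + shap_remaining_total
)
print(round(reconstructed, 4))  # 0.38
```

When this additivity check fails beyond tolerance (as can happen with XGBoost models), the discrepancy is surfaced through the warnings described above.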
Examples
import datarobot as dr

# Fetch existing training predictions by their id
training_predictions = dr.TrainingPredictions.get(project_id, prediction_id)

# Iterate over predictions
for row in training_predictions.iterate_rows():
    print(row.row_id, row.prediction)
- Attributes:
- row_id : int
id of the record in the original dataset for which the training prediction is calculated
- partition_id : str or float
The ID of the data partition that the row belongs to. "0.0" corresponds to the validation partition or backtest 1.
- prediction : float or str or list of str
The model's prediction for this data row.
- prediction_values : list of dictionaries
An array of dictionaries with a schema described as PredictionValue.
- timestamp : str or None
(New in version v2.11) an ISO string representing the time of the prediction in a time series project; may be None for non-time series projects
- forecast_point : str or None
(New in version v2.11) an ISO string representing the point in time used as a basis to generate the predictions in a time series project; may be None for non-time series projects
- forecast_distance : str or None
(New in version v2.11) how many time steps are between the forecast point and the timestamp in a time series project; None for non-time series projects
- series_id : str or None
(New in version v2.11) the id of the series in a multiseries project; may be NaN for single series projects; None for non-time series projects
- prediction_explanations : list of dict or None
(New in version v2.21) The prediction explanations for each feature. The total number of elements in the array is bounded by max_explanations and the feature count. Only present if prediction explanations were requested. Schema described as PredictionExplanations.
- shap_metadata : dict or None
(New in version v2.21) The additional information necessary to understand SHAP-based prediction explanations. Only present if explanation_algorithm was set to datarobot.enums.EXPLANATIONS_ALGORITHM.SHAP in the compute request. Schema described as ShapMetadata.
- class datarobot.models.training_predictions.TrainingPredictions(project_id, prediction_id, model_id=None, data_subset=None, explanation_algorithm=None, max_explanations=None, shap_warnings=None)
Represents training predictions metadata and provides access to prediction results.
Notes
Each element in shap_warnings has the following schema:
- partition_name : str
the partition used for the prediction record.
- value : object
the warnings related to this partition.
The objects in value are:
- mismatch_row_count : int
the count of rows for which the additivity check failed.
- max_normalized_mismatch : float
the maximal relative normalized mismatch value.
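A small pure-Python sketch (hypothetical partition names and numbers) showing how a shap_warnings list with this schema might be aggregated:

```python
# Hypothetical shap_warnings list, one entry per partition, following
# the schema described above.
shap_warnings = [
    {"partition_name": "0.0",
     "value": {"mismatch_row_count": 12, "max_normalized_mismatch": 0.021}},
    {"partition_name": "Holdout",
     "value": {"mismatch_row_count": 3, "max_normalized_mismatch": 0.004}},
]

# Total rows that failed the additivity check across all partitions.
total_mismatches = sum(w["value"]["mismatch_row_count"] for w in shap_warnings)
# Worst relative normalized mismatch seen in any partition.
worst_mismatch = max(w["value"]["max_normalized_mismatch"] for w in shap_warnings)
print(total_mismatches, worst_mismatch)  # 15 0.021
```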
Examples
Compute training predictions for a model on the whole dataset

import datarobot as dr

# Request calculation of training predictions
training_predictions_job = model.request_training_predictions(dr.enums.DATA_SUBSET.ALL)
training_predictions = training_predictions_job.get_result_when_complete()
print('Training predictions {} are ready'.format(training_predictions.prediction_id))

# Iterate over actual predictions
for row in training_predictions.iterate_rows():
    print(row.row_id, row.partition_id, row.prediction)
List all training predictions for a project

import datarobot as dr

# Fetch all training predictions for a project
all_training_predictions = dr.TrainingPredictions.list(project_id)

# Inspect all calculated training predictions
for training_predictions in all_training_predictions:
    print(
        'Prediction {} is made for data subset "{}"'.format(
            training_predictions.prediction_id,
            training_predictions.data_subset,
        )
    )
Retrieve training predictions by id

import datarobot as dr

# Get training predictions by id
training_predictions = dr.TrainingPredictions.get(project_id, prediction_id)

# Iterate over actual predictions
for row in training_predictions.iterate_rows():
    print(row.row_id, row.partition_id, row.prediction)
- Attributes:
- project_id : str
id of the project the model belongs to
- model_id : str
id of the model
- prediction_id : str
id of the generated predictions
- data_subset : datarobot.enums.DATA_SUBSET
data set definition used to build the predictions. Choices are:
  - datarobot.enums.DATA_SUBSET.ALL
    for all data available. Not valid for models in datetime partitioned projects.
  - datarobot.enums.DATA_SUBSET.VALIDATION_AND_HOLDOUT
    for all data except the training set. Not valid for models in datetime partitioned projects.
  - datarobot.enums.DATA_SUBSET.HOLDOUT
    for the holdout data set only.
  - datarobot.enums.DATA_SUBSET.ALL_BACKTESTS
    for downloading the predictions for all backtest validation folds. Requires the model to have successfully scored all backtests. Datetime partitioned projects only.
- explanation_algorithm : datarobot.enums.EXPLANATIONS_ALGORITHM
(New in version v2.21) Optional. If set to shap, the response will include prediction explanations based on the SHAP explainer (SHapley Additive exPlanations). Defaults to null (no prediction explanations).
- max_explanations : int
(New in version v2.21) The number of top contributors included in prediction explanations. Max 100. Defaults to null for datasets narrower than 100 columns; defaults to 100 for datasets wider than 100 columns.
- shap_warnings : list
(New in version v2.21) Present if explanation_algorithm was set to datarobot.enums.EXPLANATIONS_ALGORITHM.SHAP and there were additivity failures during the SHAP values calculation.
- classmethod list(project_id)
Fetch all the computed training predictions for a project.
- Parameters:
- project_id : str
id of the project
- Returns:
A list of TrainingPredictions objects
- classmethod get(project_id, prediction_id)
Retrieve training predictions on a specified data set.
- Parameters:
- project_id : str
id of the project the model belongs to
- prediction_id : str
id of the prediction set
- Returns:
A TrainingPredictions object ready to operate on the specified predictions
- iterate_rows(batch_size=None)
Retrieve training prediction rows as an iterator.
- Parameters:
- batch_size : int, optional
maximum number of training prediction rows to fetch per request
- Returns:
- iterator : TrainingPredictionsIterator
an iterator which yields named tuples representing training prediction rows
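The lazy, batched behavior can be sketched in plain Python. This is a conceptual model, not the DataRobot client's actual implementation; fetch_page is a hypothetical stand-in for one API request returning up to batch_size rows:

```python
# Conceptual sketch of a batched row iterator: rows are yielded lazily,
# one page of `batch_size` rows fetched at a time.
def iterate_rows(fetch_page, batch_size=2):
    offset = 0
    while True:
        page = fetch_page(offset, batch_size)  # stands in for one API request
        if not page:
            return  # an empty page means the data is exhausted
        yield from page
        offset += len(page)

# Fake "server-side" data and a paging function over it.
data = [{"row_id": i, "prediction": i * 0.1} for i in range(5)]
fetch_page = lambda offset, limit: data[offset:offset + limit]

rows = list(iterate_rows(fetch_page, batch_size=2))
print(len(rows))  # 5  (fetched in 3 requests of up to 2 rows each)
```

A larger batch_size means fewer requests at the cost of bigger responses; the rows yielded are the same either way.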
- get_all_as_dataframe(class_prefix='class_', serializer='json')
Retrieve all training prediction rows and return them as a pandas.DataFrame.
- Returned dataframe has the following structure:
- row_id : row id from the original dataset
- prediction : the model's prediction for this row
- class_<label> : the probability that the target is this class (only appears for classification and multiclass projects)
- timestamp : the time of the prediction (only appears for out of time validation or time series projects)
- forecast_point : the point in time used as a basis to generate the predictions (only appears for time series projects)
- forecast_distance : how many time steps are between the timestamp and the forecast point (only appears for time series projects)
- series_id : the id of the series in a multiseries project, or None for a single series project (only appears for time series projects)
- Parameters:
- class_prefix : str, optional
The prefix to prepend to labels in the final dataframe. Default is class_ (e.g., apple -> class_apple)
- serializer : str, optional
Serializer to use for the download. Options: json (default) or csv.
- Returns:
- dataframe: pandas.DataFrame
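A minimal sketch of how class_prefix shapes the column names for a classification project (the labels here are hypothetical):

```python
# Each target label becomes a class_<label> probability column in the
# returned dataframe, alongside row_id and prediction.
class_prefix = "class_"  # the default
labels = ["apple", "banana", "cherry"]  # hypothetical target labels
columns = ["row_id", "prediction"] + [class_prefix + label for label in labels]
print(columns)
# ['row_id', 'prediction', 'class_apple', 'class_banana', 'class_cherry']
```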
- download_to_csv(filename, encoding='utf-8', serializer='json')
Save training prediction rows into a CSV file.
- Parameters:
- filename : str or file object
path or file object in which to save training prediction rows
- encoding : string, optional
A string representing the encoding to use in the output file; defaults to 'utf-8'
- serializer : str, optional
Serializer to use for the download. Options: json (default) or csv.