Training Predictions API

class datarobot.models.training_predictions.TrainingPredictionsIterator(client, path, limit=None)

Lazily fetches training predictions from the DataRobot API in chunks of a specified size, then iterates over the rows of each response as named tuples. Each row represents a training prediction computed for one row of a dataset. The fields of each named tuple are listed under Attributes below.

Notes

Each PredictionValue dict contains these keys:

label
describes what this model output corresponds to. For regression projects, it is the name of the target feature. For classification and multiclass projects, it is a label from the target feature.
value
the output of the prediction. For regression projects, it is the predicted value of the target. For classification and multiclass projects, it is the predicted probability that the row belongs to the class identified by the label.
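
As a rough illustration of the PredictionValue schema described above, the sketch below turns a list of PredictionValue dicts into a label-to-value mapping. The helper name probabilities_by_label and the sample values are hypothetical and built locally; in practice the list would come from the prediction_values field of a row.

```python
def probabilities_by_label(prediction_values):
    """Map each PredictionValue 'label' to its 'value'."""
    return {pv["label"]: pv["value"] for pv in prediction_values}

# Made-up sample for a binary classification project (not real API output).
sample_prediction_values = [
    {"label": "yes", "value": 0.75},
    {"label": "no", "value": 0.25},
]

probs = probabilities_by_label(sample_prediction_values)

# Pick the label with the highest predicted probability.
most_likely = max(probs, key=probs.get)
print(probs, most_likely)
```

For a regression project the same list would hold a single entry whose label is the target feature name.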

Examples

import datarobot as dr

# Fetch existing training predictions by their id
training_predictions = dr.TrainingPredictions.get(project_id, prediction_id)

# Iterate over predictions
for row in training_predictions.iterate_rows():
    print(row.row_id, row.prediction)

Attributes

row_id (int) id of the record in the original dataset for which the training prediction is calculated
partition_id (str or float) id of the data partition that the row belongs to
prediction (float) the model’s prediction for this data row
prediction_values (list of dictionaries) a list of dictionaries, each following the PredictionValue schema described in Notes

class datarobot.models.training_predictions.TrainingPredictions(project_id, prediction_id, model_id=None, data_subset=None)

Represents training predictions metadata and provides access to prediction results.

Examples

Compute training predictions for a model on the whole dataset

import datarobot as dr

# Request calculation of training predictions
# (`model` is assumed to be an existing dr.Model instance)
training_predictions_job = model.request_training_predictions(dr.enums.DATA_SUBSET.ALL)
training_predictions = training_predictions_job.get_result_when_complete()
print('Training predictions {} are ready'.format(training_predictions.prediction_id))

# Iterate over actual predictions
for row in training_predictions.iterate_rows():
    print(row.row_id, row.partition_id, row.prediction)

List all training predictions for a project

import datarobot as dr

# Fetch all training predictions for a project
all_training_predictions = dr.TrainingPredictions.list(project_id)

# Inspect all calculated training predictions
for training_predictions in all_training_predictions:
    print(
        'Prediction {} is made for data subset "{}"'.format(
            training_predictions.prediction_id,
            training_predictions.data_subset,
        )
    )

Retrieve training predictions by id

import datarobot as dr

# Getting training predictions by id
training_predictions = dr.TrainingPredictions.get(project_id, prediction_id)

# Iterate over actual predictions
for row in training_predictions.iterate_rows():
    print(row.row_id, row.partition_id, row.prediction)

Attributes

project_id (str) id of the project the model belongs to
model_id (str) id of the model
prediction_id (str) id of generated predictions

classmethod list(project_id)

Fetch all the computed training predictions for a project.

Parameters:

project_id : str

id of the project

Returns:

A list of TrainingPredictions objects

classmethod get(project_id, prediction_id)

Retrieve previously computed training predictions by their id.

Parameters:

project_id : str

id of the project the model belongs to

prediction_id : str

id of the prediction set

Returns:

A TrainingPredictions object that provides access to the specified predictions

iterate_rows(batch_size=None)

Retrieve training prediction rows as an iterator.

Parameters:

batch_size : int, optional

maximum number of training prediction rows to fetch per request

Returns:

iterator : TrainingPredictionsIterator

an iterator which yields named tuples representing training prediction rows
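
To make the role of batch_size more concrete, here is a minimal sketch of lazy, chunked iteration in general. It is not the library's implementation: fetch_page, iterate_lazily, and the in-memory rows are hypothetical stand-ins for real API requests, used only to show that a new request is made when the current chunk is exhausted.

```python
from collections import namedtuple

Row = namedtuple("Row", ["row_id", "partition_id", "prediction"])

# Made-up data standing in for rows stored on the server.
_FAKE_ROWS = [Row(i, "0.0", float(i)) for i in range(7)]
requests_made = []

def fetch_page(offset, limit):
    """Hypothetical stand-in for one API request returning up to `limit` rows."""
    requests_made.append((offset, limit))
    return _FAKE_ROWS[offset:offset + limit]

def iterate_lazily(batch_size):
    """Yield rows one at a time, requesting a new page only when needed."""
    offset = 0
    while True:
        page = fetch_page(offset, batch_size)
        if not page:
            return  # an empty page means there is nothing left to fetch
        for row in page:
            yield row
        offset += len(page)

rows = iterate_lazily(batch_size=3)
first = next(rows)                      # triggers only the first request
requests_after_first = len(requests_made)
remaining = list(rows)                  # drains the rest, page by page
```

A smaller batch size in this sketch means more requests with less data held in memory at once; a larger one means the opposite.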

get_all_as_dataframe(class_prefix='class_')

Retrieve all training prediction rows and return them as a pandas.DataFrame.

The returned dataframe has the following structure:
  • row_id : row id from the original dataset
  • prediction : the model’s prediction for this row
  • class_<label> : the probability that the target is this class (only appears for classification and multiclass projects)

Parameters:

class_prefix : str, optional

The prefix to prepend to labels in the final dataframe. Default is class_ (e.g., apple -> class_apple)

Returns:

dataframe: pandas.DataFrame
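
The column naming described above can be sketched in plain Python. This is an illustration of the layout only, not the library's code: flatten_row is a hypothetical helper, the sample values are made up, and a dict stands in for one dataframe row.

```python
def flatten_row(row_id, prediction, prediction_values, class_prefix="class_"):
    """Build one flat record with a prefixed column per class label."""
    record = {"row_id": row_id, "prediction": prediction}
    for pv in prediction_values:
        # The prefix is placed in front of the label, e.g. apple -> class_apple.
        record[class_prefix + str(pv["label"])] = pv["value"]
    return record

# Made-up sample for a classification project.
sample_values = [
    {"label": "apple", "value": 0.875},
    {"label": "pear", "value": 0.125},
]

record = flatten_row(3, "apple", sample_values)
custom = flatten_row(3, "apple", sample_values, class_prefix="p_")
print(record)
```

A list of such records could then be passed to pandas.DataFrame to obtain a table with one column per key.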

download_to_csv(filename, encoding='utf-8')

Save training prediction rows into a CSV file.

Parameters:

filename : str or file object

path or file object to save training prediction rows to

encoding : str, optional

A string representing the encoding to use in the output file. Defaults to 'utf-8'