Predictions
Making predictions is an asynchronous process. When you start predictions with Model.request_predictions(), you receive a PredictJob object for tracking the process responsible for fulfilling your request.
You can use this object to get information about the prediction generation process before it has finished, and to retrieve the predictions themselves once it has finished.
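The request-then-track pattern described above resembles Python's standard concurrent.futures module, where submitting work returns a handle immediately and the handle is queried later. The sketch below is only an analogy under that assumption: slow_predict is a hypothetical stand-in for the server-side process, and nothing here calls the DataRobot API.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def slow_predict(rows):
    """Hypothetical stand-in for a server-side prediction process."""
    time.sleep(0.2)  # simulate work that takes a while
    return [row * 2 for row in rows]


with ThreadPoolExecutor(max_workers=1) as executor:
    # Submitting returns a handle right away, much like receiving a PredictJob
    job = executor.submit(slow_predict, [1, 2, 3])
    # The handle can be inspected while the work is still running
    done_right_after_submit = job.done()
    # Blocks until the work has finished, then returns its result
    result = job.result(timeout=5)

print(done_right_after_submit, result)
```

The handle separates "the request was accepted" from "the result is ready", which is the same distinction a PredictJob makes.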
Start making predictions
Before requesting predictions, upload the dataset you wish to predict on via Project.upload_dataset. Previously uploaded datasets can be viewed using Project.get_datasets.
When uploading the dataset, you can provide the path to a local file, a file object, raw file content, a pandas.DataFrame object, or the URL to a publicly available dataset.
To start predicting on new data using a finished model, use Model.request_predictions(). It creates a new prediction generation process and returns a PredictJob object that tracks this process.
With it, you can monitor the job and retrieve the generated predictions when the job is finished.
import datarobot as dr

project_id = '5506fcd38bd88f5953219da0'
model_id = '5506fcd98bd88f1641a720a3'
project = dr.Project.get(project_id)
model = dr.Model.get(
    project=project_id,
    model_id=model_id,
)
# As of v3.0, in addition to passing a ``dataset_id``, you can pass in a ``dataset``, ``file``, ``file_path`` or
# ``dataframe`` to `Model.request_predictions`.
predict_job = model.request_predictions(file_path='./data_to_predict.csv')
# Alternative version: upload the dataset from a local path and pass it by its dataset id
dataset_from_path = project.upload_dataset('./data_to_predict.csv')
predict_job = model.request_predictions(dataset_id=dataset_from_path.id)
# Alternative version: upload the dataset as a file object and pass it by its dataset id
with open('./data_to_predict.csv') as data_to_predict:
    dataset_from_file = project.upload_dataset(data_to_predict)
predict_job = model.request_predictions(dataset_id=dataset_from_file.id)
Listing Predictions
Use Predictions.list() to return a list of predictions generated on a project:
import datarobot as dr

predictions = dr.Predictions.list('58591727100d2b57196701b3')
print(predictions)
# Example output:
# [Predictions(prediction_id='5b6b163eca36c0108fc5d411',
#              project_id='5b61bd68ca36c04aed8aab7f',
#              model_id='5b61bd7aca36c05744846630',
#              dataset_id='5b6b1632ca36c03b5875e6a0'),
#  Predictions(prediction_id='5b6b2315ca36c0108fc5d41b',
#              project_id='5b61bd68ca36c04aed8aab7f',
#              model_id='5b61bd7aca36c0574484662e',
#              dataset_id='5b6b1632ca36c03b5875e6a0'),
#  Predictions(prediction_id='5b6b23b7ca36c0108fc5d422',
#              project_id='5b61bd68ca36c04aed8aab7f',
#              model_id='5b61bd7aca36c0574484662e',
#              dataset_id='5b6b1632ca36c03b5875e6a0')
# ]
You can pass the following parameters to filter the result:
model_id: A string used to filter returned predictions by model_id.
dataset_id: A string used to filter returned predictions by dataset_id.
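To make the effect of these two filters concrete, here is a small local sketch that does not call the API. PredictionRecord and filter_records are hypothetical names introduced for illustration, the IDs are adapted from the sample output above, and the assumption that both filters combine with a logical AND is mine rather than something the text states.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PredictionRecord:
    """Hypothetical stand-in for the metadata of one Predictions entry."""
    prediction_id: str
    model_id: str
    dataset_id: str


def filter_records(records, model_id=None, dataset_id=None):
    """Keep the records that match every filter that was supplied (assumed AND semantics)."""
    return [
        record for record in records
        if (model_id is None or record.model_id == model_id)
        and (dataset_id is None or record.dataset_id == dataset_id)
    ]


records = [
    PredictionRecord('5b6b163eca36c0108fc5d411', '5b61bd7aca36c05744846630', '5b6b1632ca36c03b5875e6a0'),
    PredictionRecord('5b6b2315ca36c0108fc5d41b', '5b61bd7aca36c0574484662e', '5b6b1632ca36c03b5875e6a0'),
    PredictionRecord('5b6b23b7ca36c0108fc5d422', '5b61bd7aca36c0574484662e', '5b6b1632ca36c03b5875e6a0'),
]

# Only the two entries generated by the second model remain
matches = filter_records(records, model_id='5b61bd7aca36c0574484662e')
print([record.prediction_id for record in matches])
```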
Get an existing PredictJob
Use the PredictJob.get method to retrieve an existing job. If the job has not completed, this returns a PredictJob reflecting the latest status of the job.
If the predictions have finished building, PredictJob.get raises a PendingJobFinished exception.
import time
import datarobot as dr

# project_id and predict_job_id identify an existing prediction job
predict_job = dr.PredictJob.get(
    project_id=project_id,
    predict_job_id=predict_job_id,
)
print(predict_job.status)  # for example: 'queue'
# Wait for the predictions to be generated (in a very inefficient way)
time.sleep(10 * 60)
try:
    predict_job = dr.PredictJob.get(
        project_id=project_id,
        predict_job_id=predict_job_id,
    )
except dr.errors.PendingJobFinished:
    # The job has finished, so the predictions can now be retrieved
    predictions = dr.PredictJob.get_predictions(
        project_id=project_id,
        predict_job_id=predict_job_id,
    )
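Rather than sleeping for a fixed ten minutes, one option is to poll at short intervals and treat the "finished" exception as the completion signal. The following is a minimal sketch of that idea using local stand-ins only: the PendingJobFinished class defined here, make_fake_get, and wait_until_finished are all hypothetical and are not part of the datarobot package.

```python
import time


class PendingJobFinished(Exception):
    """Local stand-in for the exception raised once a job has finished."""


def make_fake_get(polls_until_done):
    """Build a hypothetical job getter that reports 'queue' a few times, then raises."""
    state = {'calls': 0}

    def fake_get():
        state['calls'] += 1
        if state['calls'] > polls_until_done:
            raise PendingJobFinished()
        return 'queue'

    return fake_get


def wait_until_finished(get_job, interval=0.01, max_wait=1.0):
    """Poll get_job until it raises PendingJobFinished; return the number of polls made."""
    deadline = time.monotonic() + max_wait
    polls = 0
    while time.monotonic() < deadline:
        polls += 1
        try:
            get_job()
        except PendingJobFinished:
            return polls
        time.sleep(interval)
    raise TimeoutError('job did not finish within max_wait seconds')


polls = wait_until_finished(make_fake_get(polls_until_done=3))
print(polls)
```

With a real client, get_job would wrap the PredictJob.get call, and the interval and timeout would be chosen to suit the expected job duration.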
Get generated predictions
After predictions are generated, use PredictJob.get_predictions to get the newly generated predictions.
If the predictions have not finished yet, it raises a JobNotFinished exception.
import datarobot as dr

predictions = dr.PredictJob.get_predictions(
    project_id=project.id,
    predict_job_id=predict_job_id,
)
Retrieve results
If you just want to get the generated predictions from a PredictJob, use PredictJob.get_result_when_complete.
This function polls the status of the prediction generation process until it has finished, and then returns the predictions.
dataset = project.get_datasets()[0]
predict_job = model.request_predictions(dataset.id)
predictions = predict_job.get_result_when_complete()
Get previously generated predictions
If you don’t have a PredictJob, there are two more ways to retrieve predictions from the Predictions interface:
Get all prediction rows as a pandas.DataFrame object:
import datarobot as dr
preds = dr.Predictions.get("5b61bd68ca36c04aed8aab7f", prediction_id="5b6b163eca36c0108fc5d411")
df = preds.get_all_as_dataframe()
df_with_serializer = preds.get_all_as_dataframe(serializer='csv')
Download all prediction rows to a file as a CSV:
import datarobot as dr
preds = dr.Predictions.get("5b61bd68ca36c04aed8aab7f", prediction_id="5b6b163eca36c0108fc5d411")
preds.download_to_csv('predictions.csv')
preds.download_to_csv('predictions_with_serializer.csv', serializer='csv')
Training predictions
The training predictions interface allows you to compute and retrieve out-of-sample predictions for a model using the original project dataset. The predictions can be computed for all rows, or restricted to the validation or holdout data. Because the predictions are out-of-sample, expect them to differ from the predictions you would get by re-uploading the project dataset as a prediction dataset.
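The difference between out-of-sample and in-sample predictions can be seen in a toy example. The sketch below uses a trivial "predict the training mean" model and a simple fold assignment, both of which are illustrative assumptions of mine; it does not reflect how DataRobot partitions data or builds models.

```python
def out_of_sample_predictions(targets, n_folds):
    """Predict each row with a mean 'model' fitted only on the other folds."""
    predictions = [None] * len(targets)
    for fold in range(n_folds):
        # Rows are assigned to folds by index modulo n_folds (an arbitrary choice)
        held_out = [i for i in range(len(targets)) if i % n_folds == fold]
        training = [targets[i] for i in range(len(targets)) if i % n_folds != fold]
        fold_mean = sum(training) / len(training)
        for i in held_out:
            predictions[i] = fold_mean
    return predictions


targets = [1.0, 2.0, 3.0, 4.0]
# In-sample: the model sees every row, including the one it is predicting
in_sample = [sum(targets) / len(targets)] * len(targets)
# Out-of-sample: each row is predicted by a model that never saw it
out_of_sample = out_of_sample_predictions(targets, n_folds=2)
print(in_sample)
print(out_of_sample)
```

The two lists differ because each out-of-sample value comes from a model fitted without the row being predicted.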
Quick reference
Training predictions generation is an asynchronous process. When you start predictions with datarobot.models.Model.request_training_predictions(), you receive a datarobot.models.TrainingPredictionsJob for tracking the process responsible for fulfilling your request.
The actual predictions can be obtained from the datarobot.models.training_predictions.TrainingPredictions object returned as the result of the training predictions job.
There are three ways to retrieve training predictions:
Iterate prediction rows one by one as named tuples:
import datarobot as dr

# Calculate new training predictions on the entire dataset
training_predictions_job = model.request_training_predictions(dr.enums.DATA_SUBSET.ALL)
training_predictions = training_predictions_job.get_result_when_complete()
# Fetch rows from the API and print them
for prediction in training_predictions.iterate_rows(batch_size=250):
    print(prediction.row_id, prediction.prediction)
Get all prediction rows as a pandas.DataFrame object:
import datarobot as dr

# Calculate new training predictions on the holdout partition of the dataset
training_predictions_job = model.request_training_predictions(dr.enums.DATA_SUBSET.HOLDOUT)
training_predictions = training_predictions_job.get_result_when_complete()
# Fetch training predictions as data frame
dataframe = training_predictions.get_all_as_dataframe()
Download all prediction rows to a file as a CSV document:
import datarobot as dr

# Calculate new training predictions on the entire dataset
training_predictions_job = model.request_training_predictions(dr.enums.DATA_SUBSET.ALL)
training_predictions = training_predictions_job.get_result_when_complete()
# Fetch training predictions and save them to file
training_predictions.download_to_csv('my-training-predictions.csv')
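The first retrieval option above yields rows one at a time while accepting a batch_size. One plausible reading is that rows are fetched from the server in pages and handed out individually. The sketch below illustrates that idea with local stand-ins: fetch_batch, the in-memory rows, and this iterate_rows generator are hypothetical, and how the client actually pages its requests is an assumption.

```python
from collections import namedtuple

# Field names mirror the attributes used in the row-iteration example above
PredictionRow = namedtuple('PredictionRow', ['row_id', 'prediction'])

# Hypothetical in-memory "server" data: 7 rows of (row_id, prediction)
_SERVER_ROWS = [(i, round(0.1 * i, 1)) for i in range(7)]
fetch_calls = []


def fetch_batch(offset, limit):
    """Hypothetical stand-in for one paged API request."""
    fetch_calls.append((offset, limit))
    return _SERVER_ROWS[offset:offset + limit]


def iterate_rows(batch_size):
    """Yield rows one by one while fetching them batch_size at a time."""
    offset = 0
    while True:
        batch = fetch_batch(offset, batch_size)
        if not batch:
            return  # an empty page means there is nothing left to fetch
        for row_id, prediction in batch:
            yield PredictionRow(row_id, prediction)
        offset += len(batch)


rows = list(iterate_rows(batch_size=3))
print(len(rows), len(fetch_calls))
```

Under this reading, a larger batch_size means fewer round trips at the cost of more memory per request, while the caller still handles one row at a time.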