Prediction Explanations¶
To compute prediction explanations you need to have feature impact computed for a model, and predictions for an uploaded dataset computed with a selected model.
Computing prediction explanations is a resource-intensive task, but you can configure it with maximum explanations per row and prediction value thresholds to speed up the process.
Quick Reference¶
import datarobot as dr
# Get project
my_projects = dr.Project.list()
project = my_projects[0]
# Get model
models = project.get_models()
model = models[0]
# Compute feature impact
feature_impacts = model.get_or_request_feature_impact()
# Upload dataset
dataset = project.upload_dataset('./data_to_predict.csv')
# Compute predictions
predict_job = model.request_predictions(dataset.id)
predict_job.wait_for_completion()
# Initialize prediction explanations
pei_job = dr.PredictionExplanationsInitialization.create(project.id, model.id)
pei_job.wait_for_completion()
# Compute prediction explanations with default parameters
pe_job = dr.PredictionExplanations.create(project.id, model.id, dataset.id)
pe = pe_job.get_result_when_complete()
# Iterate through predictions with prediction explanations
for row in pe.get_rows():
print(row.prediction)
print(row.prediction_explanations)
# download to a CSV file
pe.download_to_csv('prediction_explanations.csv')
List Prediction Explanations¶
You can use the PredictionExplanations.list()
method to return a list of prediction
explanations computed for a project’s models:
import datarobot as dr
prediction_explanations = dr.PredictionExplanations.list('58591727100d2b57196701b3')
print(prediction_explanations)
>>> [PredictionExplanations(id=585967e7100d2b6afc93b13b,
project_id=58591727100d2b57196701b3,
model_id=585932c5100d2b7c298b8acf),
PredictionExplanations(id=58596bc2100d2b639329eae4,
project_id=58591727100d2b57196701b3,
model_id=585932c5100d2b7c298b8ac5),
PredictionExplanations(id=58763db4100d2b66759cc187,
project_id=58591727100d2b57196701b3,
model_id=585932c5100d2b7c298b8ac5),
...]
pe = prediction_explanations[0]
pe.project_id
>>> u'58591727100d2b57196701b3'
pe.model_id
>>> u'585932c5100d2b7c298b8acf'
You can pass following parameters to filter the result:
model_id
– str, used to filter returned prediction explanations by model_id.limit
– int, limit for number of items returned, default: no limit.offset
– int, number of items to skip, default: 0.
List Prediction Explanations Example:
project_id = '58591727100d2b57196701b3'
model_id = '585932c5100d2b7c298b8acf'
dr.PredictionExplanations.list(project_id, model_id=model_id, limit=20, offset=100)
Initialize Prediction Explanations¶
In order to compute prediction explanations you have to initialize it for a particular model.
dr.PredictionExplanationsInitialization.create(project_id, model_id)
Compute Prediction Explanations¶
If all prerequisites are in place, you can compute prediction explanations in the following way:
import datarobot as dr
project_id = '5506fcd38bd88f5953219da0'
model_id = '5506fcd98bd88f1641a720a3'
dataset_id = '5506fcd98bd88a8142b725c8'
pe_job = dr.PredictionExplanations.create(project_id, model_id, dataset_id,
max_explanations=2, threshold_low=0.2, threshold_high=0.8)
pe = pe_job.get_result_when_complete()
Where:
max_explanations
are the maximum number of prediction explanations to compute for each row.threshold_low
andthreshold_high
are thresholds for the value of the prediction of the row. Prediction explanations will be computed for a row if the row’s prediction value is higher thanthreshold_high
or lower thanthreshold_low
. If no thresholds are specified, prediction explanations will be computed for all rows.
Retrieving Prediction Explanations¶
You have three options for retrieving prediction explanations.
Note
PredictionExplanations.get_all_as_dataframe()
and
PredictionExplanations.download_to_csv()
reformat
prediction explanations to match the schema of CSV file downloaded from UI (RowId,
Prediction, Explanation 1 Strength, Explanation 1 Feature, Explanation 1 Value, …,
Explanation N Strength, Explanation N Feature, Explanation N Value)
Get prediction explanations rows one by one as
PredictionExplanationsRow
objects:
import datarobot as dr
project_id = '5506fcd38bd88f5953219da0'
prediction_explanations_id = '5506fcd98bd88f1641a720a3'
pe = dr.PredictionExplanations.get(project_id, prediction_explanations_id)
for row in pe.get_rows():
print(row.prediction_explanations)
Get all rows as pandas.DataFrame
:
import datarobot as dr
project_id = '5506fcd38bd88f5953219da0'
prediction_explanations_id = '5506fcd98bd88f1641a720a3'
pe = dr.PredictionExplanations.get(project_id, prediction_explanations_id)
prediction_explanations_df = pe.get_all_as_dataframe()
Download all rows to a file as CSV document:
import datarobot as dr
project_id = '5506fcd38bd88f5953219da0'
prediction_explanations_id = '5506fcd98bd88f1641a720a3'
pe = dr.PredictionExplanations.get(project_id, prediction_explanations_id)
pe.download_to_csv('prediction_explanations.csv')
Adjusted Predictions In Prediction Explanations¶
In some projects such as insurance projects, the prediction adjusted by exposure is more useful compared with raw prediction. For example, the raw prediction (e.g. claim counts) is divided by exposure (e.g. time) in the project with exposure column. The adjusted prediction provides insights with regard to the predicted claim counts per unit of time. To include that information, set exclude_adjusted_predictions to False in correspondent method calls.
import datarobot as dr
project_id = '5506fcd38bd88f5953219da0'
prediction_explanations_id = '5506fcd98bd88f1641a720a3'
pe = dr.PredictionExplanations.get(project_id, prediction_explanations_id)
pe.download_to_csv('prediction_explanations.csv', exclude_adjusted_predictions=False)
prediction_explanations_df = pe.get_all_as_dataframe(exclude_adjusted_predictions=False)
Multiclass/Clustering Prediction Explanation Modes¶
When calculating prediction explanations for the multiclass or clustering model you need to specify
which classes should be explained in each row. By default we only explain the predicted class but
it can be set with the mode parameter of PredictionExplanations.create
import datarobot as dr
project_id = '5506fcd38bd88f5953219da0'
model_id = '5506fcd98bd88f1641a720a3'
dataset_id = '5506fcd98bd88a8142b725c8'
# Explain predicted and second-best class results in each row
pe_job = dr.PredictionExplanations.create(project_id, model_id, dataset_id,
mode=dr.models.TopPredictionsMode(2))
pe = pe_job.get_result_when_complete()
# Explain results for classes "setosa" and "versicolor" in each row
pe_job = dr.PredictionExplanations.create(project_id, model_id, dataset_id,
mode=dr.models.ClassListMode(["setosa", "versicolor"]))
pe = pe_job.get_result_when_complete()
SHAP based prediction explanations¶
You can request SHAP based prediction explanations using previously uploaded scoring dataset for models that support SHAP. Unlike for XEMP prediction explanations you do not need to have feature impact computed for a model, and predictions for an uploaded dataset.
See datarobot.models.ShapMatrix.create()
reference for a description of the types of
parameters that can be passed in.
import datarobot as dr
project_id = '5ea6d3354cfad121cf33a5e1'
model_id = '5ea6d38b4cfad121cf33a60d'
project = dr.Project.get(project_id)
model = dr.Model.get(project=project_id, model_id=model_id)
# check if model supports SHAP
model_capabilities = model.get_supported_capabilities()
print(model_capabilities.get('supportsShap'))
>>> True
# upload dataset to generate prediction explanations
dataset_from_path = project.upload_dataset('./data_to_predict.csv')
shap_matrix_job = ShapMatrix.create(project_id=project_id, model_id=model_id, dataset_id=dataset_from_path.id)
shap_matrix_job
>>> Job(shapMatrix, status=inprogress)
# wait for job to finish
shap_matrix = shap_matrix_job.get_result_when_complete()
shap_matrix
>>> ShapMatrix(id='5ea84b624cfad1361c53f65d', project_id='5ea6d3354cfad121cf33a5e1', model_id='5ea6d38b4cfad121cf33a60d', dataset_id='5ea84b464cfad1361c53f655')
# retrieve SHAP matrix as pandas.DataFrame
df = shap_matrix.get_as_dataframe()
# list as available SHAP matrices for a project
shap_matrices = dr.ShapMatrix.list(project_id)
shap_matrices
>>> [ShapMatrix(id='5ea84b624cfad1361c53f65d', project_id='5ea6d3354cfad121cf33a5e1', model_id='5ea6d38b4cfad121cf33a60d', dataset_id='5ea84b464cfad1361c53f655')]
shap_matrix = shap_matrices[0]
# retrieve SHAP matrix as pandas.DataFrame
df = shap_matrix.get_as_dataframe()