Prediction explanations

class datarobot.PredictionExplanationsInitialization

Represents a prediction explanations initialization of a model.

Variables:
  • project_id (str) – id of the project the model belongs to

  • model_id (str) – id of the model the prediction explanations initialization is for

  • prediction_explanations_sample (list of dict) – a small sample of prediction explanations that could be generated for the model

classmethod get(project_id, model_id)

Retrieve the prediction explanations initialization for a model.

Prediction explanations initializations are a prerequisite for computing prediction explanations, and include a sample of what the computed prediction explanations for a prediction dataset would look like.

Parameters:
  • project_id (str) – id of the project the model belongs to

  • model_id (str) – id of the model the prediction explanations initialization is for

Returns:

prediction_explanations_initialization – The queried instance.

Return type:

PredictionExplanationsInitialization

Raises:

ClientError – If the project or model does not exist or the initialization has not been computed.

classmethod create(project_id, model_id)

Create a prediction explanations initialization for the specified model.

Parameters:
  • project_id (str) – id of the project the model belongs to

  • model_id (str) – id of the model for which initialization is requested

Returns:

job – an instance of created async job

Return type:

Job
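
Because get raises ClientError when the initialization has not been computed, a get-or-create helper can be convenient. The following is a minimal sketch, not part of the client: the helper name ensure_initialization is hypothetical, and the dr argument is assumed to be the imported datarobot module (passed in rather than imported so the sketch can be tried with a stand-in object).

```python
def ensure_initialization(dr, project_id, model_id):
    """Return the initialization for a model, computing it first if needed.

    `dr` is assumed to be the imported `datarobot` module.
    """
    init_cls = dr.PredictionExplanationsInitialization
    try:
        # Succeeds only if the initialization was already computed.
        return init_cls.get(project_id, model_id)
    except dr.errors.ClientError:
        # Not computed yet (or the ids are wrong, in which case create()
        # is expected to fail as well): request it and wait for the job.
        job = init_cls.create(project_id, model_id)
        job.wait_for_completion()
        return init_cls.get(project_id, model_id)
```

Note that ClientError is also raised for a missing project or model, so the fallback branch only helps when the ids are valid.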

delete()

Delete this prediction explanations initialization.

class datarobot.PredictionExplanations

Represents prediction explanations metadata and provides access to computation results.

Examples

import datarobot as dr

prediction_explanations = dr.PredictionExplanations.get(project_id, explanations_id)
for row in prediction_explanations.get_rows():
    print(row)  # row is an instance of PredictionExplanationsRow

Variables:
  • id (str) – id of the record and prediction explanations computation result

  • project_id (str) – id of the project the model belongs to

  • model_id (str) – id of the model the prediction explanations are for

  • dataset_id (str) – id of the prediction dataset prediction explanations were computed for

  • max_explanations (int) – maximum number of prediction explanations to supply per row of the dataset

  • threshold_low (float) – the lower threshold, below which a prediction must score in order for prediction explanations to be computed for a row in the dataset

  • threshold_high (float) – the high threshold, above which a prediction must score in order for prediction explanations to be computed for a row in the dataset

  • num_columns (int) – the number of columns prediction explanations were computed for

  • finish_time (float) – timestamp referencing when computation for these prediction explanations finished

  • prediction_explanations_location (str) – where to retrieve the prediction explanations

  • source (str) – For OTV/TS in-training predictions. Holds the portion of the training dataset used to generate predictions.

classmethod get(project_id, prediction_explanations_id)

Retrieve the metadata for a specific prediction explanations record.

Parameters:
  • project_id (str) – id of the project the explanations belong to

  • prediction_explanations_id (str) – id of the prediction explanations

Returns:

prediction_explanations – The queried instance.

Return type:

PredictionExplanations

classmethod create(project_id, model_id, dataset_id, max_explanations=None, threshold_low=None, threshold_high=None, mode=None)

Create prediction explanations for the specified dataset.

In order to create PredictionExplanations for a particular model and dataset, you must first:

  • Compute feature impact for the model via datarobot.Model.get_feature_impact()

  • Compute a PredictionExplanationsInitialization for the model via datarobot.PredictionExplanationsInitialization.create(project_id, model_id)

  • Compute predictions for the model and dataset via datarobot.Model.request_predictions(dataset_id)

threshold_high and threshold_low are optional filters applied to speed up computation. When at least one is specified, only the selected outlier rows will have prediction explanations computed. Rows are considered to be outliers if their predicted value (in case of regression projects) or probability of being the positive class (in case of classification projects) is less than threshold_low or greater than threshold_high. If neither is specified, prediction explanations will be computed for all rows.
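
The row selection rule described above can be sketched in plain Python. This is an illustration of the documented behavior, not the server implementation; the function name is_explained is hypothetical, and the strict inequalities follow the wording "less than" and "greater than" rather than any confirmed boundary behavior.

```python
from typing import Optional


def is_explained(prediction: float,
                 threshold_low: Optional[float] = None,
                 threshold_high: Optional[float] = None) -> bool:
    """Return True if a row with this prediction would get explanations."""
    if threshold_low is None and threshold_high is None:
        return True  # no filter: every row is explained
    below = threshold_low is not None and prediction < threshold_low
    above = threshold_high is not None and prediction > threshold_high
    return below or above


# Hypothetical predicted probabilities for three rows.
scores = [0.05, 0.40, 0.95]
selected = [s for s in scores
            if is_explained(s, threshold_low=0.1, threshold_high=0.9)]
```

With both thresholds set, only the two extreme rows are selected; with a single threshold, only rows on that side of it are.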

Parameters:
  • project_id (str) – id of the project the model belongs to

  • model_id (str) – id of the model for which prediction explanations are requested

  • dataset_id (str) – id of the prediction dataset for which prediction explanations are requested

  • threshold_low (Optional[float]) – the lower threshold, below which a prediction must score in order for prediction explanations to be computed for a row in the dataset. If neither threshold_high nor threshold_low is specified, prediction explanations will be computed for all rows.

  • threshold_high (Optional[float]) – the high threshold, above which a prediction must score in order for prediction explanations to be computed. If neither threshold_high nor threshold_low is specified, prediction explanations will be computed for all rows.

  • max_explanations (Optional[int]) – the maximum number of prediction explanations to supply per row of the dataset, default: 3.

  • mode (PredictionExplanationsMode, optional) – mode of calculation for multiclass models. If not specified, the server default is to explain only the predicted class, identical to passing TopPredictionsMode(1).

Returns:

job – an instance of created async job

Return type:

Job
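
The prerequisites listed for create can be chained into one helper. The sketch below assumes that none of the steps has been run yet for this model and dataset, that dr is the imported datarobot module (passed in so the sketch can be tried with a stand-in), and that Model.get_or_request_feature_impact is an acceptable way to make sure feature impact exists. The helper name explain_dataset is hypothetical.

```python
def explain_dataset(dr, project_id, model_id, dataset_id, max_explanations=3):
    """Run the prerequisites, then compute prediction explanations.

    `dr` is assumed to be the imported `datarobot` module.
    """
    model = dr.Model.get(project=project_id, model_id=model_id)

    # 1. Feature impact (requested if it is not already available).
    model.get_or_request_feature_impact()

    # 2. Prediction explanations initialization.
    init_job = dr.PredictionExplanationsInitialization.create(project_id, model_id)
    init_job.wait_for_completion()

    # 3. Predictions for the dataset.
    predict_job = model.request_predictions(dataset_id)
    predict_job.wait_for_completion()

    # 4. The explanations themselves; the job result is expected to be
    #    a PredictionExplanations instance.
    job = dr.PredictionExplanations.create(
        project_id, model_id, dataset_id, max_explanations=max_explanations
    )
    return job.get_result_when_complete()
```

Each step blocks until its job finishes, which keeps the ordering explicit at the cost of running the steps sequentially.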

classmethod create_on_training_data(project_id, model_id, dataset_id, max_explanations=None, threshold_low=None, threshold_high=None, mode=None, datetime_prediction_partition=None)

Create prediction explanations for the dataset used to train the model. This can be retrieved by calling dr.Model.get().featurelist_id. For OTV and time series projects, datetime_prediction_partition is required and limited to the first backtest (‘0’) or holdout (‘holdout’).

In order to create PredictionExplanations for a particular model and dataset, you must first:

  • Compute Feature Impact for the model via datarobot.Model.get_feature_impact().

  • Compute a PredictionExplanationsInitialization for the model via datarobot.PredictionExplanationsInitialization.create(project_id, model_id).

  • Compute predictions for the model and dataset via datarobot.Model.request_predictions(dataset_id).

threshold_high and threshold_low are optional filters applied to speed up computation. When at least one is specified, only the selected outlier rows will have prediction explanations computed. Rows are considered to be outliers if their predicted value (in case of regression projects) or probability of being the positive class (in case of classification projects) is less than threshold_low or greater than threshold_high. If neither is specified, prediction explanations will be computed for all rows.

Parameters:
  • project_id (str) – The ID of the project the model belongs to.

  • model_id (str) – The ID of the model for which prediction explanations are requested.

  • dataset_id (str) – The ID of the prediction dataset for which prediction explanations are requested.

  • threshold_low (Optional[float]) – The lower threshold, below which a prediction must score in order for prediction explanations to be computed for a row in the dataset. If neither threshold_high nor threshold_low is specified, prediction explanations will be computed for all rows.

  • threshold_high (Optional[float]) – The high threshold, above which a prediction must score in order for prediction explanations to be computed. If neither threshold_high nor threshold_low is specified, prediction explanations will be computed for all rows.

  • max_explanations (Optional[int]) – The maximum number of prediction explanations to supply per row of the dataset (default: 3).

  • mode (PredictionExplanationsMode, optional) – The mode of calculation for multiclass models. If not specified, the server default is to explain only the predicted class, identical to passing TopPredictionsMode(1).

  • datetime_prediction_partition (str) – Options: ‘0’, ‘holdout’ or None. Used only by time series and OTV projects to indicate what part of the dataset will be used to generate predictions for computing prediction explanations. Current options are ‘0’ (first backtest) and ‘holdout’. Note that only the validation partition of the first backtest will be used to generate predictions.

Returns:

job – An instance of created async job.

Return type:

Job

classmethod list(project_id, model_id=None, limit=None, offset=None)

List prediction explanations metadata for a specified project.

Parameters:
  • project_id (str) – id of the project to list prediction explanations for

  • model_id (Optional[str]) – if specified, only prediction explanations computed for this model will be returned

  • limit (int or None) – at most this many results are returned, default: no limit

  • offset (int or None) – this many results will be skipped, default: 0

Returns:

prediction_explanations

Return type:

list[PredictionExplanations]

get_rows(batch_size=None, exclude_adjusted_predictions=True)

Retrieve prediction explanations rows.

Parameters:
  • batch_size (int or None, optional) – maximum number of prediction explanations rows to retrieve per request

  • exclude_adjusted_predictions (bool) – Optional, defaults to True. Set to False to include adjusted predictions, which will differ from the predictions on some projects, e.g. those with an exposure column specified.

Yields:

prediction_explanations_row (PredictionExplanationsRow) – Represents prediction explanations computed for a prediction row.

is_multiclass()

Whether these explanations are for a multiclass project or a non-multiclass project

is_unsupervised_clustering_or_multiclass()

Clustering and multiclass XEMP explanations always have one of the num_top_classes or class_names parameters set.

get_number_of_explained_classes()

How many classes we attempt to explain for each row

get_all_as_dataframe(exclude_adjusted_predictions=True)

Retrieve all prediction explanations rows and return them as a pandas.DataFrame.

Returned dataframe has the following structure:

  • row_id : row id from prediction dataset

  • prediction : the output of the model for this row

  • adjusted_prediction : adjusted prediction values (only appears for projects that utilize prediction adjustments, e.g. projects with an exposure column)

  • class_0_label : a class level from the target (only appears for classification projects)

  • class_0_probability : the probability that the target is this class (only appears for classification projects)

  • class_1_label : a class level from the target (only appears for classification projects)

  • class_1_probability : the probability that the target is this class (only appears for classification projects)

  • explanation_0_feature : the name of the feature contributing to the prediction for this explanation

  • explanation_0_feature_value : the value the feature took on

  • explanation_0_label : the output being driven by this explanation. For regression projects, this is the name of the target feature. For classification projects, this is the class label whose probability increasing would correspond to a positive strength.

  • explanation_0_qualitative_strength : a human-readable description of how strongly the feature affected the prediction (e.g. ‘+++’, ‘--’, ‘+’) for this explanation

  • explanation_0_per_ngram_text_explanations : text prediction explanations data as a JSON-formatted string

  • explanation_0_strength : the amount this feature’s value affected the prediction

  • explanation_N_feature : the name of the feature contributing to the prediction for this explanation

  • explanation_N_feature_value : the value the feature took on

  • explanation_N_label : the output being driven by this explanation. For regression projects, this is the name of the target feature. For classification projects, this is the class label whose probability increasing would correspond to a positive strength.

  • explanation_N_qualitative_strength : a human-readable description of how strongly the feature affected the prediction (e.g. ‘+++’, ‘--’, ‘+’) for this explanation

  • explanation_N_per_ngram_text_explanations : text prediction explanations data as a JSON-formatted string

  • explanation_N_strength : the amount this feature’s value affected the prediction

For classification projects, the server does not guarantee any ordering on the prediction values; however, this function sorts the values so that class_X corresponds to the same class from row to row.
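
To make the explanation_N_* naming concrete, the sketch below flattens one row's explanations into the wide layout described above. It is an illustration only, not the client's implementation: flatten_row is a hypothetical helper, the sample values are made up, and the class_* and per_ngram_text_explanations columns are left out for brevity.

```python
def flatten_row(row_id, prediction, explanations):
    """Flatten one row's explanations into explanation_<N>_<field> keys."""
    flat = {"row_id": row_id, "prediction": prediction}
    fields = ("feature", "feature_value", "label",
              "qualitative_strength", "strength")
    for n, explanation in enumerate(explanations):
        for field in fields:
            flat[f"explanation_{n}_{field}"] = explanation.get(field)
    return flat


# Hypothetical data for a single explained row.
example = flatten_row(
    row_id=0,
    prediction=0.82,
    explanations=[
        {"feature": "age", "feature_value": 61, "label": "yes",
         "qualitative_strength": "++", "strength": 0.31},
    ],
)
```

Each additional explanation for a row adds another group of explanation_N_* keys, which is why the number of columns grows with max_explanations.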

Parameters:

exclude_adjusted_predictions (bool) – Optional, defaults to True. Set this to False to include adjusted prediction values in the returned dataframe.

Returns:

dataframe

Return type:

pandas.DataFrame

download_to_csv(filename, encoding='utf-8', exclude_adjusted_predictions=True)

Save prediction explanations rows into a CSV file.

Parameters:
  • filename (str or file object) – path or file object to save prediction explanations rows

  • encoding (string, optional) – A string representing the encoding to use in the output file, defaults to ‘utf-8’

  • exclude_adjusted_predictions (bool) – Optional, defaults to True. Set to False to include adjusted predictions, which will differ from the predictions on some projects, e.g. those with an exposure column specified.

get_prediction_explanations_page(limit=None, offset=None, exclude_adjusted_predictions=True)

Get prediction explanations.

If you don’t want to use a generator interface, you can access paginated prediction explanations directly.

Parameters:
  • limit (int or None) – the number of records to return; the server will use a (possibly finite) default if not specified

  • offset (int or None) – the number of records to skip, default 0

  • exclude_adjusted_predictions (bool) – Optional, defaults to True. Set to False to include adjusted predictions, which will differ from the predictions on some projects, e.g. those with an exposure column specified.

Returns:

prediction_explanations

Return type:

PredictionExplanationsPage
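
When paging manually, limit and offset can be advanced until a page reports no next_page. The sketch below is one way to write that loop, roughly what the generator interface does for you; iter_explanation_rows is a hypothetical helper, and it relies only on the data and next_page attributes documented for PredictionExplanationsPage.

```python
def iter_explanation_rows(prediction_explanations, page_size=100):
    """Yield raw explanation rows page by page using limit/offset."""
    offset = 0
    while True:
        page = prediction_explanations.get_prediction_explanations_page(
            limit=page_size, offset=offset
        )
        yield from page.data
        # Stop on the last page, or if the server returned nothing.
        if page.next_page is None or not page.data:
            break
        offset += len(page.data)
```

Advancing the offset by the number of rows actually received, rather than by page_size, keeps the loop correct if the server returns fewer rows than requested.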

delete()

Delete these prediction explanations.

class datarobot.models.prediction_explanations.PredictionExplanationsRow

Represents prediction explanations computed for a prediction row.

Notes

PredictionValue contains:

  • label : describes what this model output corresponds to. For regression projects, it is the name of the target feature. For classification projects, it is a level from the target feature.

  • value : the output of the prediction. For regression projects, it is the predicted value of the target. For classification projects, it is the predicted probability the row belongs to the class identified by the label.

PredictionExplanation contains:

  • label : describes what output was driven by this explanation. For regression projects, it is the name of the target feature. For classification projects, it is the class whose probability increasing would correspond to a positive strength of this prediction explanation.

  • feature : the name of the feature contributing to the prediction

  • feature_value : the value the feature took on for this row

  • strength : the amount this feature’s value affected the prediction

  • qualitative_strength : a human-readable description of how strongly the feature affected the prediction. A large positive effect is denoted ‘+++’, medium ‘++’, small ‘+’, very small ‘<+’. A large negative effect is denoted ‘---’, medium ‘--’, small ‘-’, very small ‘<-’.
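
The qualitative_strength symbols can be turned into readable text with a small lookup table. The mapping below is a sketch that assumes the negative symbols are runs of ASCII hyphens mirroring the positive ones; describe_strength is a hypothetical helper, not part of the client.

```python
# Symbol -> (direction, size), following the description above.
QUALITATIVE_STRENGTH = {
    "+++": ("positive", "large"),
    "++": ("positive", "medium"),
    "+": ("positive", "small"),
    "<+": ("positive", "very small"),
    "---": ("negative", "large"),
    "--": ("negative", "medium"),
    "-": ("negative", "small"),
    "<-": ("negative", "very small"),
}


def describe_strength(symbol: str) -> str:
    """Translate a qualitative_strength symbol into a short phrase."""
    direction, size = QUALITATIVE_STRENGTH[symbol]
    return f"{size} {direction} effect"
```

An unknown symbol raises KeyError, which is deliberate here so that unexpected values are noticed rather than silently mislabeled.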

Variables:
  • row_id (int) – which row this PredictionExplanationsRow describes

  • prediction (float) – the output of the model for this row

  • adjusted_prediction (float or None) – adjusted prediction value for projects that provide this information, None otherwise

  • prediction_values (list) – an array of dictionaries with a schema described as PredictionValue

  • adjusted_prediction_values (list) – same as prediction_values but for adjusted predictions

  • prediction_explanations (list) – an array of dictionaries with a schema described as PredictionExplanation
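
As an example of working with these attributes, the sketch below picks the explanation with the largest absolute strength for a row. It assumes the dictionaries in prediction_explanations use the field names listed in the Notes (for example strength and feature) as keys; strongest_explanation is a hypothetical helper.

```python
def strongest_explanation(row):
    """Return the explanation dict with the largest absolute strength, or None."""
    explanations = row.prediction_explanations
    if not explanations:
        return None  # the row has no explanations
    return max(explanations, key=lambda e: abs(e["strength"]))
```

Using the absolute value means a strong negative driver is treated the same as a strong positive one; drop abs() to rank by signed strength instead.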

class datarobot.models.prediction_explanations.PredictionExplanationsPage

Represents a batch of prediction explanations received by one request.

Variables:
  • id (str) – id of the prediction explanations computation result

  • data (list[dict]) – list of raw prediction explanations; each row corresponds to a row of the prediction dataset

  • count (int) – total number of rows computed

  • previous_page (str) – where to retrieve previous page of prediction explanations, None if current page is the first

  • next_page (str) – where to retrieve next page of prediction explanations, None if current page is the last

  • prediction_explanations_record_location (str) – where to retrieve the prediction explanations metadata

  • adjustment_method (str) – Adjustment method that was applied to predictions, or ‘N/A’ if no adjustments were done.

classmethod get(project_id, prediction_explanations_id, limit=None, offset=0, exclude_adjusted_predictions=True)

Retrieve prediction explanations.

Parameters:
  • project_id (str) – id of the project the model belongs to

  • prediction_explanations_id (str) – id of the prediction explanations

  • limit (int or None) – the number of records to return; the server will use a (possibly finite) default if not specified

  • offset (int or None) – the number of records to skip, default 0

  • exclude_adjusted_predictions (bool) – Optional, defaults to True. Set to False to include adjusted predictions, which will differ from the predictions on some projects, e.g. those with an exposure column specified.

Returns:

prediction_explanations – The queried instance.

Return type:

PredictionExplanationsPage

class datarobot.models.ShapMatrix

Represents SHAP based prediction explanations and provides access to score values.

Variables:
  • project_id (str) – id of the project the model belongs to

  • shap_matrix_id (str) – id of the generated SHAP matrix

  • model_id (str) – id of the model used to compute the SHAP values

  • dataset_id (str) – id of the prediction dataset SHAP values were computed for

Examples

import datarobot as dr

# request SHAP matrix calculation
shap_matrix_job = dr.ShapMatrix.create(project_id, model_id, dataset_id)
shap_matrix = shap_matrix_job.get_result_when_complete()

# list available SHAP matrices
shap_matrices = dr.ShapMatrix.list(project_id)
shap_matrix = shap_matrices[0]

# get SHAP matrix as dataframe
shap_matrix_values = shap_matrix.get_as_dataframe()

classmethod create(cls, project_id, model_id, dataset_id)

Calculate SHAP based prediction explanations against a previously uploaded dataset.

Parameters:
  • project_id (str) – id of the project the model belongs to

  • model_id (str) – id of the model for which prediction explanations are requested

  • dataset_id (str) – id of the prediction dataset for which prediction explanations are requested (as uploaded from Project.upload_dataset)

Returns:

job – The job computing the SHAP based prediction explanations

Return type:

ShapMatrixJob

Raises:
  • ClientError – If the server responded with 4xx status. Possible reasons: the project, model, or dataset doesn’t exist; the user is not allowed; or the model doesn’t support SHAP based prediction explanations

  • ServerError – If the server responded with 5xx status

classmethod list(cls, project_id)

Fetch all the computed SHAP prediction explanations for a project.

Parameters:

project_id (str) – id of the project

Returns:

A list of ShapMatrix objects

Return type:

List of ShapMatrix

classmethod get(cls, project_id, id)

Retrieve the specific SHAP matrix.

Parameters:
  • project_id (str) – id of the project the model belongs to

  • id (str) – id of the SHAP matrix

Return type:

ShapMatrix object representing specified record

get_as_dataframe(read_timeout=60)

Retrieve SHAP matrix values as dataframe.

Parameters:

read_timeout (int, optional, default 60) – Wait this many seconds for the server to respond. Added in version 2.29.

Returns:

dataframe – A dataframe with SHAP scores

Return type:

pandas.DataFrame

class datarobot.models.ClassListMode

Calculate prediction explanations for the specified classes in each row.

Variables:

class_names (list) – List of class names that will be explained for each dataset row.

get_api_parameters(batch_route=False)

Get parameters passed in corresponding API call

Parameters:

batch_route (bool) – Batch routes describe prediction calls with all possible parameters, so explanation parameters carry a prefix to distinguish them from the others.

Return type:

dict

class datarobot.models.TopPredictionsMode

Calculate prediction explanations for the number of top predicted classes in each row.

Variables:

num_top_classes (int) – Number of top predicted classes [1..10] that will be explained for each dataset row.

get_api_parameters(batch_route=False)

Get parameters passed in corresponding API call

Parameters:

batch_route (bool) – Batch routes describe prediction calls with all possible parameters, so explanation parameters carry a prefix to distinguish them from the others.

Return type:

dict
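
Choosing between the two modes can be wrapped in a small helper. The sketch below is hypothetical: pick_mode is not part of the client, models is assumed to be the datarobot.models module (passed in so the sketch can be tried with stand-ins), and both mode classes are assumed to take their single documented variable as a positional constructor argument. The range check mirrors the [1..10] bound stated for num_top_classes.

```python
def pick_mode(models, class_names=None, num_top_classes=None):
    """Build a multiclass explanation mode from one of the two options.

    `models` is assumed to be the `datarobot.models` module.
    """
    if class_names and num_top_classes is not None:
        raise ValueError("specify class_names or num_top_classes, not both")
    if class_names:
        # Explain exactly the listed classes for every row.
        return models.ClassListMode(list(class_names))
    # Default mirrors the documented server default of TopPredictionsMode(1).
    n = 1 if num_top_classes is None else num_top_classes
    if not 1 <= n <= 10:
        raise ValueError("num_top_classes must be in [1..10]")
    return models.TopPredictionsMode(n)
```

The returned object would then be passed as the mode argument of PredictionExplanations.create.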