Prediction Explanations
- class datarobot.PredictionExplanationsInitialization(project_id, model_id, prediction_explanations_sample=None)
Represents a prediction explanations initialization of a model.
- Attributes:
- project_id: str
id of the project the model belongs to
- model_id: str
id of the model the prediction explanations initialization is for
- prediction_explanations_sample: list of dict
a small sample of prediction explanations that could be generated for the model
- classmethod get(project_id, model_id)
Retrieve the prediction explanations initialization for a model.
Prediction explanations initializations are a prerequisite for computing prediction explanations, and include a sample of what the computed prediction explanations for a prediction dataset would look like.
- Parameters:
- project_id: str
id of the project the model belongs to
- model_id: str
id of the model the prediction explanations initialization is for
- Returns:
- prediction_explanations_initialization: PredictionExplanationsInitialization
The queried instance.
- Raises:
- ClientError (404)
If the project or model does not exist or the initialization has not been computed.
- classmethod create(project_id, model_id)
Create a prediction explanations initialization for the specified model.
- Parameters:
- project_id: str
id of the project the model belongs to
- model_id: str
id of the model for which initialization is requested
- Returns:
- job: Job
an instance of the created async job
- delete()
Delete this prediction explanations initialization.
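Because get() raises ClientError (404) when the initialization has not been computed yet, a common pattern is to fall back to create(). The sketch below assumes that datarobot.errors.ClientError exposes the HTTP status as status_code and that the returned job offers wait_for_completion(); ensure_initialization is a hypothetical helper name, and the client is imported inside the function only so the snippet can be loaded without it.

```python
def ensure_initialization(project_id, model_id):
    """Return the model's PredictionExplanationsInitialization, computing it if missing.

    Hypothetical helper: relies on get() raising ClientError (404) when the
    initialization has not been computed yet.
    """
    # Imported here so this sketch can be loaded without the client installed.
    import datarobot as dr
    from datarobot.errors import ClientError

    try:
        return dr.PredictionExplanationsInitialization.get(project_id, model_id)
    except ClientError as exc:
        # Anything other than "not found" is a real error: re-raise it.
        if exc.status_code != 404:
            raise
    # Not computed yet: start the async job and block until it finishes.
    job = dr.PredictionExplanationsInitialization.create(project_id, model_id)
    job.wait_for_completion()
    return dr.PredictionExplanationsInitialization.get(project_id, model_id)
```

A 404 can also mean that the project or model does not exist; in that case the create() call is expected to fail as well, so the error still surfaces to the caller.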
- class datarobot.PredictionExplanations(id, project_id, model_id, dataset_id, max_explanations, num_columns, finish_time, prediction_explanations_location, threshold_low=None, threshold_high=None, class_names=None, num_top_classes=None, source=None)
Represents prediction explanations metadata and provides access to computation results.
Examples
import datarobot as dr

prediction_explanations = dr.PredictionExplanations.get(project_id, explanations_id)
for row in prediction_explanations.get_rows():
    print(row)  # row is an instance of PredictionExplanationsRow
- Attributes:
- id: str
id of the record and prediction explanations computation result
- project_id: str
id of the project the model belongs to
- model_id: str
id of the model the prediction explanations are for
- dataset_id: str
id of the prediction dataset prediction explanations were computed for
- max_explanations: int
maximum number of prediction explanations to supply per row of the dataset
- threshold_low: float
the lower threshold, below which a prediction must score in order for prediction explanations to be computed for a row in the dataset
- threshold_high: float
the upper threshold, above which a prediction must score in order for prediction explanations to be computed for a row in the dataset
- num_columns: int
the number of columns prediction explanations were computed for
- finish_time: float
timestamp referencing when computation for these prediction explanations finished
- prediction_explanations_location: str
where to retrieve the prediction explanations
- source: str
For OTV/TS in-training predictions. Holds the portion of the training dataset used to generate predictions.
- classmethod get(project_id, prediction_explanations_id)
Retrieve the metadata for a specific prediction explanations computation.
- Parameters:
- project_id: str
id of the project the explanations belong to
- prediction_explanations_id: str
id of the prediction explanations
- Returns:
- prediction_explanations: PredictionExplanations
The queried instance.
- classmethod create(project_id, model_id, dataset_id, max_explanations=None, threshold_low=None, threshold_high=None, mode=None)
Create prediction explanations for the specified dataset.
In order to create PredictionExplanations for a particular model and dataset, you must first:
1. Compute feature impact for the model via datarobot.Model.get_feature_impact().
2. Compute a PredictionExplanationsInitialization for the model via datarobot.PredictionExplanationsInitialization.create(project_id, model_id).
3. Compute predictions for the model and dataset via datarobot.Model.request_predictions(dataset_id).
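The three prerequisite steps and the final request can be chained as follows. This is a sketch, not a workflow helper shipped with the library: explain_dataset is a hypothetical name, it assumes the initialization does not exist yet, and it uses Model.get_or_request_feature_impact() (rather than get_feature_impact()) so that feature impact is computed when missing; confirm that your client version provides that method. The client is imported inside the function only so the snippet can be loaded without it.

```python
def explain_dataset(project_id, model_id, dataset_id, max_explanations=3):
    """Run the prerequisite steps, then compute prediction explanations.

    Hypothetical helper; assumes no initialization has been created yet.
    """
    # Imported here so this sketch can be loaded without the client installed.
    import datarobot as dr

    model = dr.Model.get(project_id, model_id)

    # 1. Feature impact (computed on demand if it does not exist yet).
    model.get_or_request_feature_impact()

    # 2. Prediction explanations initialization for the model.
    dr.PredictionExplanationsInitialization.create(project_id, model_id).wait_for_completion()

    # 3. Predictions for the model on this dataset.
    model.request_predictions(dataset_id).wait_for_completion()

    # Finally, request the explanations and block until the job returns them.
    job = dr.PredictionExplanations.create(
        project_id, model_id, dataset_id, max_explanations=max_explanations
    )
    return job.get_result_when_complete()
```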
threshold_high and threshold_low are optional filters applied to speed up computation. When at least one is specified, only the selected outlier rows will have prediction explanations computed. Rows are considered to be outliers if their predicted value (in case of regression projects) or probability of being the positive class (in case of classification projects) is less than threshold_low or greater than threshold_high. If neither is specified, prediction explanations will be computed for all rows.
- Parameters:
- project_id: str
id of the project the model belongs to
- model_id: str
id of the model for which prediction explanations are requested
- dataset_id: str
id of the prediction dataset for which prediction explanations are requested
- threshold_low: float, optional
the lower threshold, below which a prediction must score in order for prediction explanations to be computed for a row in the dataset. If neither threshold_high nor threshold_low is specified, prediction explanations will be computed for all rows.
- threshold_high: float, optional
the upper threshold, above which a prediction must score in order for prediction explanations to be computed for a row in the dataset. If neither threshold_high nor threshold_low is specified, prediction explanations will be computed for all rows.
- max_explanations: int, optional
the maximum number of prediction explanations to supply per row of the dataset, default: 3.
- mode: PredictionExplanationsMode, optional
mode of calculation for multiclass models. If not specified, the server default is to explain only the predicted class, identical to passing TopPredictionsMode(1).
- Returns:
- job: Job
an instance of the created async job
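To make the threshold rule concrete, here is a client-side illustration of which rows would be selected. It only mirrors the documented semantics (strict "less than" and "greater than", all rows when neither threshold is set); the server performs the real filtering, and rows_to_explain is a hypothetical helper.

```python
def rows_to_explain(predictions, threshold_low=None, threshold_high=None):
    """Return indices of rows that would receive prediction explanations.

    `predictions` holds predicted values (regression) or positive-class
    probabilities (classification), one per row.
    """
    # No thresholds: every row is explained.
    if threshold_low is None and threshold_high is None:
        return list(range(len(predictions)))
    selected = []
    for i, p in enumerate(predictions):
        below = threshold_low is not None and p < threshold_low
        above = threshold_high is not None and p > threshold_high
        # A row is an outlier if it falls outside either specified bound.
        if below or above:
            selected.append(i)
    return selected
```

For example, rows_to_explain([0.05, 0.4, 0.6, 0.97], threshold_low=0.1, threshold_high=0.9) selects rows 0 and 3, so only those two would get explanations.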
- classmethod create_on_training_data(project_id, model_id, dataset_id, max_explanations=None, threshold_low=None, threshold_high=None, mode=None, datetime_prediction_partition=None)
Create prediction explanations for the dataset used to train the model. This can be retrieved by calling dr.Model.get().featurelist_id. For OTV and time series projects, datetime_prediction_partition is required and limited to the first backtest (‘0’) or holdout (‘holdout’).
In order to create PredictionExplanations for a particular model and dataset, you must first:
1. Compute Feature Impact for the model via datarobot.Model.get_feature_impact().
2. Compute a PredictionExplanationsInitialization for the model via datarobot.PredictionExplanationsInitialization.create(project_id, model_id).
3. Compute predictions for the model and dataset via datarobot.Model.request_predictions(dataset_id).
threshold_high and threshold_low are optional filters applied to speed up computation. When at least one is specified, only the selected outlier rows will have prediction explanations computed. Rows are considered to be outliers if their predicted value (in case of regression projects) or probability of being the positive class (in case of classification projects) is less than threshold_low or greater than threshold_high. If neither is specified, prediction explanations will be computed for all rows.
- Parameters:
- project_id: str
The ID of the project the model belongs to.
- model_id: str
The ID of the model for which prediction explanations are requested.
- dataset_id: str
The ID of the prediction dataset for which prediction explanations are requested.
- threshold_low: float, optional
The lower threshold, below which a prediction must score in order for prediction explanations to be computed for a row in the dataset. If neither threshold_high nor threshold_low is specified, prediction explanations will be computed for all rows.
- threshold_high: float, optional
The upper threshold, above which a prediction must score in order for prediction explanations to be computed for a row in the dataset. If neither threshold_high nor threshold_low is specified, prediction explanations will be computed for all rows.
- max_explanations: int, optional
The maximum number of prediction explanations to supply per row of the dataset (default: 3).
- mode: PredictionExplanationsMode, optional
The mode of calculation for multiclass models. If not specified, the server default is to explain only the predicted class, identical to passing TopPredictionsMode(1).
- datetime_prediction_partition: str
Options: ‘0’, ‘holdout’ or None. Used only by time series and OTV projects to indicate what part of the dataset will be used to generate predictions for computing prediction explanations. Current options are ‘0’ (first backtest) and ‘holdout’. Note that only the validation partition of the first backtest will be used to generate predictions.
- Returns:
- job: Job
An instance of the created async job.
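The datetime_prediction_partition rule can be checked before calling the API. The helper below is hypothetical and only encodes the constraints stated above (required for OTV and time series projects, limited to '0' or 'holdout'); it does not ask the server what kind of project it is, so the caller must supply is_datetime_project.

```python
VALID_PARTITIONS = ("0", "holdout")


def training_explanation_kwargs(is_datetime_project, datetime_prediction_partition=None, **options):
    """Build keyword arguments for PredictionExplanations.create_on_training_data.

    Hypothetical helper: OTV and time series projects must pass a partition of
    '0' (first backtest) or 'holdout'; other projects do not use the parameter,
    so it is not forwarded for them.
    """
    if is_datetime_project:
        if datetime_prediction_partition not in VALID_PARTITIONS:
            raise ValueError(
                "datetime_prediction_partition must be '0' or 'holdout' "
                "for OTV and time series projects"
            )
        options["datetime_prediction_partition"] = datetime_prediction_partition
    return options
```

The result can then be unpacked into the call, e.g. dr.PredictionExplanations.create_on_training_data(project_id, model_id, dataset_id, **training_explanation_kwargs(True, "holdout", max_explanations=5)).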
- classmethod list(project_id, model_id=None, limit=None, offset=None)
List prediction explanations metadata for a specified project.
- Parameters:
- project_id: str
id of the project to list prediction explanations for
- model_id: str, optional
if specified, only prediction explanations computed for this model will be returned
- limit: int or None
at most this many results are returned, default: no limit
- offset: int or None
this many results will be skipped, default: 0
- Returns:
- prediction_explanations: list[PredictionExplanations]
- get_rows(batch_size=None, exclude_adjusted_predictions=True)
Retrieve prediction explanations rows.
- Parameters:
- batch_size: int or None, optional
maximum number of prediction explanations rows to retrieve per request
- exclude_adjusted_predictions: bool
Optional, defaults to True. Set to False to include adjusted predictions, which will differ from the predictions on some projects, e.g. those with an exposure column specified.
- Yields:
- prediction_explanations_row: PredictionExplanationsRow
Represents prediction explanations computed for a prediction row.
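Since get_rows() yields PredictionExplanationsRow objects whose prediction_explanations attribute is a list of dicts (see PredictionExplanationsRow below for the keys), per-row summaries are a simple loop. strongest_explanations is a hypothetical helper; ranking by absolute strength is one reasonable choice, not a library convention.

```python
def strongest_explanations(rows):
    """Map each row_id to the (feature, strength) pair with the largest |strength|.

    Works on any iterable of objects shaped like PredictionExplanationsRow,
    such as the generator returned by get_rows(). Rows whose
    prediction_explanations is empty or None map to None.
    """
    result = {}
    for row in rows:
        explanations = row.prediction_explanations or []
        if not explanations:
            result[row.row_id] = None
            continue
        # Strength can be negative, so compare magnitudes.
        top = max(explanations, key=lambda e: abs(e["strength"]))
        result[row.row_id] = (top["feature"], top["strength"])
    return result
```

Typical usage would be strongest_explanations(prediction_explanations.get_rows(batch_size=500)), which streams rows without loading them all at once.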
- is_multiclass()
Whether these explanations are for a multiclass project or a non-multiclass project.
- is_unsupervised_clustering_or_multiclass()
Whether these explanations are for an unsupervised clustering or multiclass project. Clustering and multiclass XEMP explanations always have either num_top_classes or class_names set.
- get_number_of_explained_classes()
The number of classes for which explanations are attempted in each row.
- get_all_as_dataframe(exclude_adjusted_predictions=True)
Retrieve all prediction explanations rows and return them as a pandas.DataFrame.
The returned dataframe has the following structure:
row_id : row id from prediction dataset
prediction : the output of the model for this row
adjusted_prediction : adjusted prediction values (only appears for projects that utilize prediction adjustments, e.g. projects with an exposure column)
class_0_label : a class level from the target (only appears for classification projects)
class_0_probability : the probability that the target is this class (only appears for classification projects)
class_1_label : a class level from the target (only appears for classification projects)
class_1_probability : the probability that the target is this class (only appears for classification projects)
explanation_0_feature : the name of the feature contributing to the prediction for this explanation
explanation_0_feature_value : the value the feature took on
explanation_0_label : the output being driven by this explanation. For regression projects, this is the name of the target feature. For classification projects, this is the class label whose probability increasing would correspond to a positive strength.
explanation_0_qualitative_strength : a human-readable description of how strongly the feature affected the prediction (e.g. ‘+++’, ‘--’, ‘+’) for this explanation
explanation_0_per_ngram_text_explanations : text prediction explanations data as a JSON-formatted string
explanation_0_strength : the amount this feature’s value affected the prediction
…
explanation_N_feature : the name of the feature contributing to the prediction for this explanation
explanation_N_feature_value : the value the feature took on
explanation_N_label : the output being driven by this explanation. For regression projects, this is the name of the target feature. For classification projects, this is the class label whose probability increasing would correspond to a positive strength.
explanation_N_qualitative_strength : a human-readable description of how strongly the feature affected the prediction (e.g. ‘+++’, ‘--’, ‘+’) for this explanation
explanation_N_per_ngram_text_explanations : text prediction explanations data as a JSON-formatted string
explanation_N_strength : the amount this feature’s value affected the prediction
For classification projects, the server does not guarantee any ordering of the prediction values; this function sorts them so that class_X corresponds to the same class from row to row.
- Parameters:
- exclude_adjusted_predictions: bool
Optional, defaults to True. Set this to False to include adjusted prediction values in the returned dataframe.
- Returns:
- dataframe: pandas.DataFrame
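The wide layout above (one group of explanation_N_* columns per explanation) is often easier to analyze in long form. A minimal sketch, assuming the dataframe is first converted with df.to_dict(orient="records") and that rows with fewer than N explanations hold empty (None or NaN) cells in the unused slots; explanations_long is a hypothetical helper.

```python
import re

# Matches column names such as "explanation_0_feature" or "explanation_12_strength".
_EXPLANATION_COLUMN = re.compile(r"^explanation_(\d+)_(.+)$")


def explanations_long(records):
    """Reshape wide get_all_as_dataframe() records into one dict per explanation.

    `records` is a list of dicts, e.g. df.to_dict(orient="records"). Each output
    dict holds row_id, the explanation index, and that explanation's fields
    (feature, feature_value, strength, ...). Empty slots are skipped.
    """
    out = []
    for record in records:
        slots = {}
        for column, value in record.items():
            match = _EXPLANATION_COLUMN.match(column)
            if match:
                index, field = int(match.group(1)), match.group(2)
                slots.setdefault(index, {})[field] = value
        for index in sorted(slots):
            fields = slots[index]
            feature = fields.get("feature")
            # pandas fills missing cells with NaN, which is not equal to itself.
            if feature is None or feature != feature:
                continue
            out.append({"row_id": record["row_id"], "explanation": index, **fields})
    return out
```

Keeping the transformation on plain records avoids relying on any particular pandas reshaping API; the output can be passed straight back to pandas.DataFrame if a frame is needed.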
- download_to_csv(filename, encoding='utf-8', exclude_adjusted_predictions=True)
Save prediction explanations rows into a CSV file.
- Parameters:
- filename: str or file object
path or file object to save prediction explanations rows
- encoding: string, optional
A string representing the encoding to use in the output file, defaults to ‘utf-8’.
- exclude_adjusted_predictions: bool
Optional, defaults to True. Set to False to include adjusted predictions, which will differ from the predictions on some projects, e.g. those with an exposure column specified.
- get_prediction_explanations_page(limit=None, offset=None, exclude_adjusted_predictions=True)
Get prediction explanations.
If you don’t want to use a generator interface, you can access paginated prediction explanations directly.
- Parameters:
- limit: int or None
the number of records to return; the server will use a (possibly finite) default if not specified
- offset: int or None
the number of records to skip, default 0
- exclude_adjusted_predictions: bool
Optional, defaults to True. Set to False to include adjusted predictions, which will differ from the predictions on some projects, e.g. those with an exposure column specified.
- Returns:
- prediction_explanations: PredictionExplanationsPage
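A sketch of walking all pages with limit and offset, stopping when a page is empty or its next_page is None, as described for PredictionExplanationsPage below. iter_explanation_pages is a hypothetical helper; it is duck-typed, so it works with a PredictionExplanations instance and yields the raw dicts from each page's data list.

```python
def iter_explanation_pages(prediction_explanations, page_size=100):
    """Yield raw prediction explanation rows from every page in order.

    `prediction_explanations` is any object exposing
    get_prediction_explanations_page(limit=..., offset=...) that returns a
    page with `data` and `next_page` attributes.
    """
    offset = 0
    while True:
        page = prediction_explanations.get_prediction_explanations_page(
            limit=page_size, offset=offset
        )
        rows = page.data or []
        yield from rows
        # next_page is None on the last page.
        if not rows or page.next_page is None:
            break
        offset += len(rows)
```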
- delete()
Delete these prediction explanations.
- class datarobot.models.prediction_explanations.PredictionExplanationsRow(row_id, prediction, prediction_values, prediction_explanations=None, adjusted_prediction=None, adjusted_prediction_values=None)
Represents prediction explanations computed for a prediction row.
Notes
PredictionValue contains:
- label: describes what this model output corresponds to. For regression projects, it is the name of the target feature. For classification projects, it is a level from the target feature.
- value: the output of the prediction. For regression projects, it is the predicted value of the target. For classification projects, it is the predicted probability that the row belongs to the class identified by the label.
PredictionExplanation contains:
- label: describes what output was driven by this explanation. For regression projects, it is the name of the target feature. For classification projects, it is the class whose probability increasing would correspond to a positive strength of this prediction explanation.
- feature: the name of the feature contributing to the prediction
- feature_value: the value the feature took on for this row
- strength: the amount this feature’s value affected the prediction
- qualitative_strength: a human-readable description of how strongly the feature affected the prediction (e.g. ‘+++’, ‘--’, ‘+’)
- Attributes:
- row_id: int
which row this PredictionExplanationsRow describes
- prediction: float
the output of the model for this row
- adjusted_prediction: float or None
adjusted prediction value for projects that provide this information, None otherwise
- prediction_values: list
an array of dictionaries with a schema described as PredictionValue
- adjusted_prediction_values: list
same as prediction_values but for adjusted predictions
- prediction_explanations: list
an array of dictionaries with a schema described as PredictionExplanation
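For classification rows, each PredictionValue carries a class label and its predicted probability, so the most probable class is the entry with the largest value. top_class is a hypothetical helper operating on any object with a prediction_values list shaped as described above.

```python
def top_class(row):
    """Return (label, value) of the highest-valued entry in row.prediction_values.

    For a classification row this is the most probable class and its
    predicted probability.
    """
    best = max(row.prediction_values, key=lambda pv: pv["value"])
    return best["label"], best["value"]
```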
- class datarobot.models.prediction_explanations.PredictionExplanationsPage(id, count=None, previous=None, next=None, data=None, prediction_explanations_record_location=None, adjustment_method=None)
Represents a batch of prediction explanations received by one request.
- Attributes:
- id: str
id of the prediction explanations computation result
- data: list[dict]
list of raw prediction explanations; each row corresponds to a row of the prediction dataset
- count: int
total number of rows computed
- previous_page: str
where to retrieve the previous page of prediction explanations, None if the current page is the first
- next_page: str
where to retrieve the next page of prediction explanations, None if the current page is the last
- prediction_explanations_record_location: str
where to retrieve the prediction explanations metadata
- adjustment_method: str
Adjustment method that was applied to predictions, or ‘N/A’ if no adjustments were done.
- classmethod get(project_id, prediction_explanations_id, limit=None, offset=0, exclude_adjusted_predictions=True)
Retrieve prediction explanations.
- Parameters:
- project_id: str
id of the project the model belongs to
- prediction_explanations_id: str
id of the prediction explanations
- limit: int or None
the number of records to return; the server will use a (possibly finite) default if not specified
- offset: int or None
the number of records to skip, default 0
- exclude_adjusted_predictions: bool
Optional, defaults to True. Set to False to include adjusted predictions, which will differ from the predictions on some projects, e.g. those with an exposure column specified.
- Returns:
- prediction_explanations: PredictionExplanationsPage
The queried instance.
- class datarobot.models.ShapMatrix(project_id, id, model_id=None, dataset_id=None)
Represents SHAP based prediction explanations and provides access to score values.
Examples
import datarobot as dr

# request SHAP matrix calculation
shap_matrix_job = dr.ShapMatrix.create(project_id, model_id, dataset_id)
shap_matrix = shap_matrix_job.get_result_when_complete()

# list available SHAP matrices
shap_matrices = dr.ShapMatrix.list(project_id)
shap_matrix = shap_matrices[0]

# get SHAP matrix as dataframe
shap_matrix_values = shap_matrix.get_as_dataframe()
- Attributes:
- project_id: str
id of the project the model belongs to
- shap_matrix_id: str
id of the generated SHAP matrix
- model_id: str
id of the model used to compute the SHAP values
- dataset_id: str
id of the prediction dataset SHAP values were computed for
- classmethod create(cls, project_id, model_id, dataset_id)
Calculate SHAP based prediction explanations against a previously uploaded dataset.
- Parameters:
- project_id: str
id of the project the model belongs to
- model_id: str
id of the model for which prediction explanations are requested
- dataset_id: str
id of the prediction dataset for which prediction explanations are requested (as uploaded from Project.upload_dataset)
- Returns:
- job: ShapMatrixJob
The job computing the SHAP based prediction explanations
- Raises:
- ClientError
If the server responded with a 4xx status. Possible reasons are that the project, model, or dataset does not exist, the user is not allowed access, or the model does not support SHAP based prediction explanations.
- ServerError
If the server responded with a 5xx status.
- Return type:
ShapMatrixJob
- classmethod list(cls, project_id)
Fetch all the computed SHAP prediction explanations for a project.
- Parameters:
- project_id: str
id of the project
- Returns:
- List of ShapMatrix
A list of ShapMatrix objects
- Raises:
- datarobot.errors.ClientError
if the server responded with 4xx status
- datarobot.errors.ServerError
if the server responded with 5xx status
- Return type:
List[ShapMatrix]
- classmethod get(cls, project_id, id)
Retrieve the specific SHAP matrix.
- Parameters:
- project_id: str
id of the project the model belongs to
- id: str
id of the SHAP matrix
- Returns:
- ShapMatrix
object representing the specified record
- Return type:
ShapMatrix
- get_as_dataframe(read_timeout=60)
Retrieve SHAP matrix values as a dataframe.
- Parameters:
- read_timeout: int (optional, default 60)
Added in version 2.29.
Wait this many seconds for the server to respond.
- Returns:
- dataframe: pandas.DataFrame
A dataframe with SHAP scores
- Raises:
- datarobot.errors.ClientError
if the server responded with 4xx status.
- datarobot.errors.ServerError
if the server responded with 5xx status.
- Return type:
DataFrame
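A common way to summarize a SHAP matrix globally is the mean absolute SHAP value per feature. This section does not document the exact column layout returned by get_as_dataframe(), so the sketch assumes each feature is a column and lets you exclude any non-feature columns; it takes shap_df.to_dict(orient="list") as input, and mean_abs_shap is a hypothetical helper.

```python
def mean_abs_shap(columns, exclude=()):
    """Rank features by mean absolute SHAP value across rows.

    `columns` maps column name to a list of per-row values, e.g.
    shap_df.to_dict(orient="list"). Pass any non-feature columns (such as an
    index or row id column, if present) in `exclude`.
    """
    scores = {}
    for name, values in columns.items():
        if name in exclude or not values:
            continue
        scores[name] = sum(abs(v) for v in values) / len(values)
    # Highest average impact first.
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)
```

Taking absolute values before averaging keeps positive and negative contributions from cancelling, which is why this aggregate is used as a global importance measure rather than the plain mean.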
- class datarobot.models.ClassListMode(class_names)
Calculate prediction explanations for the specified classes in each row.
- Attributes:
- class_names: list
List of class names that will be explained for each dataset row.
- get_api_parameters(batch_route=False)
Get the parameters passed in the corresponding API call.
- Parameters:
- batch_route: bool
Batch routes describe prediction calls with all possible parameters, so explanation parameters are given a prefix to distinguish them from the other parameters.
- Returns:
- dict
- class datarobot.models.TopPredictionsMode(num_top_classes)
Calculate prediction explanations for the number of top predicted classes in each row.
- Attributes:
- num_top_classes: int
Number of top predicted classes [1..10] that will be explained for each dataset row.
- get_api_parameters(batch_route=False)
Get the parameters passed in the corresponding API call.
- Parameters:
- batch_route: bool
Batch routes describe prediction calls with all possible parameters, so explanation parameters are given a prefix to distinguish them from the other parameters.
- Returns:
- dict
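The two mode classes are passed to PredictionExplanations.create through its mode parameter. The sketch below picks one based on its arguments and enforces the [1..10] range documented for num_top_classes on the client side; explain_multiclass is a hypothetical helper, and the client is imported inside the function only so the snippet can be loaded without it.

```python
def explain_multiclass(project_id, model_id, dataset_id, top_n=None, class_names=None):
    """Request multiclass prediction explanations with an explicit mode.

    Hypothetical helper: pass either top_n (1 to 10 top predicted classes) or
    class_names (a fixed list of classes), not both. With neither, no mode is
    sent and the server default (explain only the predicted class) applies.
    """
    # Imported here so this sketch can be loaded without the client installed.
    import datarobot as dr
    from datarobot.models import ClassListMode, TopPredictionsMode

    if top_n is not None and class_names is not None:
        raise ValueError("pass top_n or class_names, not both")
    mode = None
    if top_n is not None:
        if not 1 <= top_n <= 10:
            raise ValueError("top_n must be between 1 and 10")
        mode = TopPredictionsMode(top_n)
    elif class_names is not None:
        mode = ClassListMode(list(class_names))
    return dr.PredictionExplanations.create(project_id, model_id, dataset_id, mode=mode)
```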