Insights
- class datarobot.insights.ShapMatrix
Class for SHAP Matrix calculations. Use the standard methods of BaseInsight to compute and retrieve: compute, create, list, get.
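For orientation, a minimal usage sketch is shown below (the model ID is a placeholder; create computes the insight and waits for it to finish):
from datarobot.insights import ShapMatrix

# compute (or fetch, if already computed) the SHAP matrix and wait for completion
shap_matrix = ShapMatrix.create(entity_id="5a8ac9ab07a57a0001be501f")  # placeholder model ID

values = shap_matrix.matrix          # SHAP values
base_value = shap_matrix.base_value  # SHAP base value
columns = shap_matrix.columns        # column names aligned with the matrix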
- property matrix: Any
SHAP matrix values.
- property base_value: float
SHAP base value for the matrix values.
- property columns: List[str]
List of columns associated with the SHAP matrix.
- property link_function: str
Link function used to generate the SHAP matrix.
- classmethod compute(entity_id, source=INSIGHTS_SOURCES.VALIDATION, data_slice_id=None, external_dataset_id=None, entity_type=ENTITY_TYPES.DATAROBOT_MODEL, quick_compute=None, **kwargs)
Submit an insight compute request. You can use create if you want to wait synchronously for the completion of the job. May be overridden by insight subclasses to accept additional parameters.
- Parameters:
entity_id (str) – The ID of the entity to compute the insight.
source (str) – The source type to use when computing the insight.
data_slice_id (Optional[str]) – Data slice ID to use when computing the insight.
external_dataset_id (Optional[str]) – External dataset ID to use when computing the insight.
entity_type (Optional[ENTITY_TYPES]) – The type of the entity associated with the insight. Select one of the ENTITY_TYPES enum values, or accept the default, "datarobotModel".
quick_compute (Optional[bool]) – Sets whether to use quick-compute for the insight. If True or unspecified, the insight is computed using a 2500-row data sample. If False, the insight is computed using all rows in the chosen source.
- Returns:
Status check job entity for the asynchronous insight calculation.
- Return type:
StatusCheckJob
- classmethod create(entity_id, source=INSIGHTS_SOURCES.VALIDATION, data_slice_id=None, external_dataset_id=None, entity_type=ENTITY_TYPES.DATAROBOT_MODEL, quick_compute=None, max_wait=600, **kwargs)
Create an insight and wait for completion. May be overridden by insight subclasses to accept additional parameters.
- Parameters:
entity_id (str) – The ID of the entity to compute the insight.
source (str) – The source type to use when computing the insight.
data_slice_id (Optional[str]) – Data slice ID to use when computing the insight.
external_dataset_id (Optional[str]) – External dataset ID to use when computing the insight.
entity_type (Optional[ENTITY_TYPES]) – The type of the entity associated with the insight. Select one of the ENTITY_TYPES enum values, or accept the default, "datarobotModel".
quick_compute (Optional[bool]) – Sets whether to use quick-compute for the insight. If True or unspecified, the insight is computed using a 2500-row data sample. If False, the insight is computed using all rows in the chosen source.
max_wait (int) – The number of seconds to wait for the result.
- Returns:
Entity of the newly or already computed insights.
- Return type:
Self
- classmethod from_data(data)
Instantiate an object of this class using a dict.
- Parameters:
data (dict) – Correctly snake_cased keys and their values.
- Return type:
TypeVar(T, bound=APIObject)
- classmethod from_server_data(data, keep_attrs=None)
Override from_server_data to handle paginated responses.
- Return type:
Self
- classmethod get(entity_id, source=INSIGHTS_SOURCES.VALIDATION, quick_compute=None, **kwargs)
Return the first matching insight based on the entity id and kwargs.
- Parameters:
entity_id (str) – The ID of the entity to retrieve generated insights.
source (str) – The source type to use when retrieving the insight.
quick_compute (Optional[bool]) – Sets whether to retrieve the insight that was computed using quick-compute. If not specified, quick_compute is not used for matching.
- Returns:
Previously computed insight.
- Return type:
Self
- classmethod get_as_csv(entity_id, **kwargs)
Retrieve a specific insight represented in CSV format.
- Parameters:
entity_id (str) – ID of the entity to retrieve the insight.
**kwargs (Any) – Additional keyword arguments to pass to the retrieve function.
- Returns:
The retrieved insight.
- Return type:
str
- classmethod get_as_dataframe(entity_id, **kwargs)
Retrieve a specific insight represented as a pandas DataFrame.
- Parameters:
entity_id (str) – ID of the entity to retrieve the insight.
**kwargs (Any) – Additional keyword arguments to pass to the retrieve function.
- Returns:
The retrieved insight.
- Return type:
DataFrame
- get_uri()
Defines the URI used for browser-based interactions with this entity.
- Return type:
str
- classmethod list(entity_id)
List all generated insights.
- Parameters:
entity_id (str) – The ID of the entity queried for listing all generated insights.
- Returns:
List of newly or previously computed insights.
- Return type:
List[Self]
- open_in_browser()
Opens the class' relevant web browser location. If a default browser is not available, the URL is logged.
Note: If text-mode browsers are used, the calling process will block until the user exits the browser.
- Return type:
None
- sort(key_name)
Sorts the insight data by the given key name.
- Return type:
None
- class datarobot.insights.ShapPreview
Class for SHAP Preview calculations. Use the standard methods of BaseInsight to compute and retrieve: compute, create, list, get.
- property previews: List[Dict[str, Any]]
SHAP preview values.
- Returns:
preview – A list of the ShapPreview values for each row.
- Return type:
List[Dict[str, Any]]
- property previews_count: int
The number of SHAP preview rows.
- Return type:
int
- classmethod get(entity_id, source=INSIGHTS_SOURCES.VALIDATION, quick_compute=None, prediction_filter_row_count=None, prediction_filter_percentiles=None, prediction_filter_operand_first=None, prediction_filter_operand_second=None, prediction_filter_operator=None, feature_filter_count=None, feature_filter_name=None, **kwargs)
Return the first matching ShapPreview insight based on the entity id and kwargs.
- Parameters:
entity_id (str) – The ID of the entity to retrieve generated insights.
source (str) – The source type to use when retrieving the insight.
quick_compute (Optional[bool]) – Sets whether to retrieve the insight that was computed using quick-compute. If not specified, quick_compute is not used for matching.
prediction_filter_row_count (Optional[int]) – The maximum number of preview rows to return.
prediction_filter_percentiles (Optional[int]) – The number of percentile intervals to select from the total number of rows. This field supersedes prediction_filter_row_count if both are present.
prediction_filter_operand_first (Optional[float]) – The first operand to apply to filtered predictions.
prediction_filter_operand_second (Optional[float]) – The second operand to apply to filtered predictions.
prediction_filter_operator (Optional[str]) – The operator to apply to filtered predictions.
feature_filter_count (Optional[int]) – The maximum number of features to return for each preview.
feature_filter_name (Optional[str]) – The names of specific features to return for each preview.
- Returns:
List of newly or already computed insights.
- Return type:
List[Any]
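An illustrative sketch of a filtered retrieval (the entity ID and filter values below are placeholders, not defaults):
from datarobot.insights import ShapPreview

preview = ShapPreview.get(
    entity_id="5a8ac9ab07a57a0001be501f",  # placeholder model ID
    prediction_filter_row_count=10,  # return at most 10 preview rows
    feature_filter_count=5,  # keep the 5 strongest features per row
)
print(preview.previews_count)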
- classmethod compute(entity_id, source=INSIGHTS_SOURCES.VALIDATION, data_slice_id=None, external_dataset_id=None, entity_type=ENTITY_TYPES.DATAROBOT_MODEL, quick_compute=None, **kwargs)
Submit an insight compute request. You can use create if you want to wait synchronously for the completion of the job. May be overridden by insight subclasses to accept additional parameters.
- Parameters:
entity_id (str) – The ID of the entity to compute the insight.
source (str) – The source type to use when computing the insight.
data_slice_id (Optional[str]) – Data slice ID to use when computing the insight.
external_dataset_id (Optional[str]) – External dataset ID to use when computing the insight.
entity_type (Optional[ENTITY_TYPES]) – The type of the entity associated with the insight. Select one of the ENTITY_TYPES enum values, or accept the default, "datarobotModel".
quick_compute (Optional[bool]) – Sets whether to use quick-compute for the insight. If True or unspecified, the insight is computed using a 2500-row data sample. If False, the insight is computed using all rows in the chosen source.
- Returns:
Status check job entity for the asynchronous insight calculation.
- Return type:
StatusCheckJob
- classmethod create(entity_id, source=INSIGHTS_SOURCES.VALIDATION, data_slice_id=None, external_dataset_id=None, entity_type=ENTITY_TYPES.DATAROBOT_MODEL, quick_compute=None, max_wait=600, **kwargs)
Create an insight and wait for completion. May be overridden by insight subclasses to accept additional parameters.
- Parameters:
entity_id (str) – The ID of the entity to compute the insight.
source (str) – The source type to use when computing the insight.
data_slice_id (Optional[str]) – Data slice ID to use when computing the insight.
external_dataset_id (Optional[str]) – External dataset ID to use when computing the insight.
entity_type (Optional[ENTITY_TYPES]) – The type of the entity associated with the insight. Select one of the ENTITY_TYPES enum values, or accept the default, "datarobotModel".
quick_compute (Optional[bool]) – Sets whether to use quick-compute for the insight. If True or unspecified, the insight is computed using a 2500-row data sample. If False, the insight is computed using all rows in the chosen source.
max_wait (int) – The number of seconds to wait for the result.
- Returns:
Entity of the newly or already computed insights.
- Return type:
Self
- classmethod from_data(data)
Instantiate an object of this class using a dict.
- Parameters:
data (dict) – Correctly snake_cased keys and their values.
- Return type:
TypeVar(T, bound=APIObject)
- classmethod from_server_data(data, keep_attrs=None)
Override from_server_data to handle paginated responses.
- Return type:
Self
- get_uri()
This should define the URI to their browser based interactions
- Return type:
str
- classmethod list(entity_id)
List all generated insights.
- Parameters:
entity_id (str) – The ID of the entity queried for listing all generated insights.
- Returns:
List of newly or previously computed insights.
- Return type:
List[Self]
- open_in_browser()
Opens the class' relevant web browser location. If a default browser is not available, the URL is logged.
Note: If text-mode browsers are used, the calling process will block until the user exits the browser.
- Return type:
None
- sort(key_name)
Sorts the insight data by the given key name.
- Return type:
None
- class datarobot.insights.ShapImpact
Class for SHAP Impact calculations. Use the standard methods of BaseInsight to compute and retrieve: compute, create, list, get.
- sort(key_name='-impact_normalized')
Sorts insights data by key name.
- Parameters:
key_name (str) – Item key name by which to sort the data. One of 'feature_name', 'impact_normalized' or 'impact_unnormalized'. Prefixing with '-' reverses the sort order. Default: '-impact_normalized'.
- Return type:
None
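For example (a sketch; the model ID is a placeholder):
from datarobot.insights import ShapImpact

shap_impact = ShapImpact.create(entity_id="5a8ac9ab07a57a0001be501f")
shap_impact.sort(key_name="feature_name")          # ascending by feature name
shap_impact.sort(key_name="-impact_unnormalized")  # descending by unnormalized impact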
- property shap_impacts: List[List[Any]]
SHAP impact values.
- Returns:
A list of the SHAP impact values.
- Return type:
List[List[Any]]
- property base_value: List[float]
A list of base prediction values.
- property capping: Dict[str, Any] | None
Capping for the models in the blender.
- property link: str | None
Shared link function of the models in the blender.
- property row_count: int | None
Number of SHAP impact rows. This attribute is deprecated.
- classmethod compute(entity_id, source=INSIGHTS_SOURCES.VALIDATION, data_slice_id=None, external_dataset_id=None, entity_type=ENTITY_TYPES.DATAROBOT_MODEL, quick_compute=None, **kwargs)
Submit an insight compute request. You can use create if you want to wait synchronously for the completion of the job. May be overridden by insight subclasses to accept additional parameters.
- Parameters:
entity_id (str) – The ID of the entity to compute the insight.
source (str) – The source type to use when computing the insight.
data_slice_id (Optional[str]) – Data slice ID to use when computing the insight.
external_dataset_id (Optional[str]) – External dataset ID to use when computing the insight.
entity_type (Optional[ENTITY_TYPES]) – The type of the entity associated with the insight. Select one of the ENTITY_TYPES enum values, or accept the default, "datarobotModel".
quick_compute (Optional[bool]) – Sets whether to use quick-compute for the insight. If True or unspecified, the insight is computed using a 2500-row data sample. If False, the insight is computed using all rows in the chosen source.
- Returns:
Status check job entity for the asynchronous insight calculation.
- Return type:
StatusCheckJob
- classmethod create(entity_id, source=INSIGHTS_SOURCES.VALIDATION, data_slice_id=None, external_dataset_id=None, entity_type=ENTITY_TYPES.DATAROBOT_MODEL, quick_compute=None, max_wait=600, **kwargs)
Create an insight and wait for completion. May be overridden by insight subclasses to accept additional parameters.
- Parameters:
entity_id (str) – The ID of the entity to compute the insight.
source (str) – The source type to use when computing the insight.
data_slice_id (Optional[str]) – Data slice ID to use when computing the insight.
external_dataset_id (Optional[str]) – External dataset ID to use when computing the insight.
entity_type (Optional[ENTITY_TYPES]) – The type of the entity associated with the insight. Select one of the ENTITY_TYPES enum values, or accept the default, "datarobotModel".
quick_compute (Optional[bool]) – Sets whether to use quick-compute for the insight. If True or unspecified, the insight is computed using a 2500-row data sample. If False, the insight is computed using all rows in the chosen source.
max_wait (int) – The number of seconds to wait for the result.
- Returns:
Entity of the newly or already computed insights.
- Return type:
Self
- classmethod from_data(data)
Instantiate an object of this class using a dict.
- Parameters:
data (dict) – Correctly snake_cased keys and their values.
- Return type:
TypeVar(T, bound=APIObject)
- classmethod from_server_data(data, keep_attrs=None)
Override from_server_data to handle paginated responses.
- Return type:
Self
- classmethod get(entity_id, source=INSIGHTS_SOURCES.VALIDATION, quick_compute=None, **kwargs)
Return the first matching insight based on the entity id and kwargs.
- Parameters:
entity_id (str) – The ID of the entity to retrieve generated insights.
source (str) – The source type to use when retrieving the insight.
quick_compute (Optional[bool]) – Sets whether to retrieve the insight that was computed using quick-compute. If not specified, quick_compute is not used for matching.
- Returns:
Previously computed insight.
- Return type:
Self
- get_uri()
Defines the URI used for browser-based interactions with this entity.
- Return type:
str
- classmethod list(entity_id)
List all generated insights.
- Parameters:
entity_id (str) – The ID of the entity queried for listing all generated insights.
- Returns:
List of newly or previously computed insights.
- Return type:
List[Self]
- open_in_browser()
Opens the class' relevant web browser location. If a default browser is not available, the URL is logged.
Note: If text-mode browsers are used, the calling process will block until the user exits the browser.
- Return type:
None
- class datarobot.insights.ShapDistributions
Class for SHAP Distributions calculations. Use the standard methods of BaseInsight to compute and retrieve: compute, create, list, get.
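A minimal sketch of the compute-and-retrieve pattern for this class (placeholder model ID; max_wait raised beyond the default for larger datasets):
from datarobot.insights import ShapDistributions

distributions = ShapDistributions.create(entity_id="5a8ac9ab07a57a0001be501f", max_wait=900)
print(distributions.total_features_count)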
- property features: List[Dict[str, Any]]
SHAP feature values.
- Returns:
features – A list of the ShapDistributions values for each row.
- Return type:
List[Dict[str, Any]]
- property total_features_count: int
The number of SHAP distributions features.
- Return type:
int
- classmethod compute(entity_id, source=INSIGHTS_SOURCES.VALIDATION, data_slice_id=None, external_dataset_id=None, entity_type=ENTITY_TYPES.DATAROBOT_MODEL, quick_compute=None, **kwargs)
Submit an insight compute request. You can use create if you want to wait synchronously for the completion of the job. May be overridden by insight subclasses to accept additional parameters.
- Parameters:
entity_id (str) – The ID of the entity to compute the insight.
source (str) – The source type to use when computing the insight.
data_slice_id (Optional[str]) – Data slice ID to use when computing the insight.
external_dataset_id (Optional[str]) – External dataset ID to use when computing the insight.
entity_type (Optional[ENTITY_TYPES]) – The type of the entity associated with the insight. Select one of the ENTITY_TYPES enum values, or accept the default, "datarobotModel".
quick_compute (Optional[bool]) – Sets whether to use quick-compute for the insight. If True or unspecified, the insight is computed using a 2500-row data sample. If False, the insight is computed using all rows in the chosen source.
- Returns:
Status check job entity for the asynchronous insight calculation.
- Return type:
StatusCheckJob
- classmethod create(entity_id, source=INSIGHTS_SOURCES.VALIDATION, data_slice_id=None, external_dataset_id=None, entity_type=ENTITY_TYPES.DATAROBOT_MODEL, quick_compute=None, max_wait=600, **kwargs)
Create an insight and wait for completion. May be overridden by insight subclasses to accept additional parameters.
- Parameters:
entity_id (str) – The ID of the entity to compute the insight.
source (str) – The source type to use when computing the insight.
data_slice_id (Optional[str]) – Data slice ID to use when computing the insight.
external_dataset_id (Optional[str]) – External dataset ID to use when computing the insight.
entity_type (Optional[ENTITY_TYPES]) – The type of the entity associated with the insight. Select one of the ENTITY_TYPES enum values, or accept the default, "datarobotModel".
quick_compute (Optional[bool]) – Sets whether to use quick-compute for the insight. If True or unspecified, the insight is computed using a 2500-row data sample. If False, the insight is computed using all rows in the chosen source.
max_wait (int) – The number of seconds to wait for the result.
- Returns:
Entity of the newly or already computed insights.
- Return type:
Self
- classmethod from_data(data)
Instantiate an object of this class using a dict.
- Parameters:
data (dict) – Correctly snake_cased keys and their values.
- Return type:
TypeVar(T, bound=APIObject)
- classmethod from_server_data(data, keep_attrs=None)
Override from_server_data to handle paginated responses.
- Return type:
Self
- classmethod get(entity_id, source=INSIGHTS_SOURCES.VALIDATION, quick_compute=None, **kwargs)
Return the first matching insight based on the entity id and kwargs.
- Parameters:
entity_id (str) – The ID of the entity to retrieve generated insights.
source (str) – The source type to use when retrieving the insight.
quick_compute (Optional[bool]) – Sets whether to retrieve the insight that was computed using quick-compute. If not specified, quick_compute is not used for matching.
- Returns:
Previously computed insight.
- Return type:
Self
- get_uri()
Defines the URI used for browser-based interactions with this entity.
- Return type:
str
- classmethod list(entity_id)
List all generated insights.
- Parameters:
entity_id (str) – The ID of the entity queried for listing all generated insights.
- Returns:
List of newly or previously computed insights.
- Return type:
List[Self]
- open_in_browser()
Opens the class' relevant web browser location. If a default browser is not available, the URL is logged.
Note: If text-mode browsers are used, the calling process will block until the user exits the browser.
- Return type:
None
- sort(key_name)
Sorts the insight data by the given key name.
- Return type:
None
Types
- class datarobot.models.RocCurveEstimatedMetric
Typed dict for an estimated metric.
- class datarobot.models.AnomalyAssessmentRecordMetadata
Typed dict for record metadata.
- class datarobot.models.AnomalyAssessmentPreviewBin
Typed dict for a preview bin.
- class datarobot.models.ShapleyFeatureContribution
Typed dict for a Shapley feature contribution.
- class datarobot.models.AnomalyAssessmentDataPoint
Typed dict for data points.
- class datarobot.models.RegionExplanationsData
Typed dict for region explanations.
Anomaly assessment
- class datarobot.models.anomaly_assessment.AnomalyAssessmentRecord
Object that keeps metadata about the anomaly assessment insight for a particular subset, backtest, and series, along with links to retrieve the anomaly assessment data.
Added in version v2.25.
- Variables:
record_id (str) – The ID of the record.
project_id (str) – The ID of the project the record belongs to.
model_id (str) – The ID of the model the record belongs to.
backtest (int or "holdout") – The backtest of the record.
source ("training" or "validation") – The source of the record.
series_id (str or None) – The series ID of the record for multiseries projects. Defined only for multiseries projects.
status (str) – The status of the insight. One of datarobot.enums.AnomalyAssessmentStatus.
status_details (str) – The explanation of the status.
start_date (str or None) – The ISO-formatted timestamp of the first prediction in the subset. Will be None if status is not AnomalyAssessmentStatus.COMPLETED.
end_date (str or None) – The ISO-formatted timestamp of the last prediction in the subset. Will be None if status is not AnomalyAssessmentStatus.COMPLETED.
prediction_threshold (float or None) – The threshold; all rows with anomaly scores greater than or equal to it have SHAP explanations computed.
preview_location (str or None) – The URL to retrieve the predictions preview for the subset. Will be None if status is not AnomalyAssessmentStatus.COMPLETED.
latest_explanations_location (str or None) – The URL to retrieve the latest predictions with SHAP explanations. Will be None if status is not AnomalyAssessmentStatus.COMPLETED.
delete_location (str) – The URL to delete the anomaly assessment record and relevant insight data.
- classmethod list(project_id, model_id, backtest=None, source=None, series_id=None, limit=100, offset=0, with_data_only=False)
Retrieve the list of the anomaly assessment records for the project and model. Output can be filtered and limited.
- Parameters:
project_id (str) – The ID of the project the record belongs to.
model_id (str) – The ID of the model the record belongs to.
backtest (int or "holdout") – The backtest to filter records by.
source ("training" or "validation") – The source to filter records by.
series_id (Optional[str]) – The series ID to filter records by. Can be specified for multiseries projects.
limit (Optional[int]) – 100 by default. At most this many results are returned.
offset (Optional[int]) – This many results will be skipped.
with_data_only (bool, False by default) – Filter by status == AnomalyAssessmentStatus.COMPLETED. If True, records with no data or that are not supported will be omitted.
- Returns:
A list of anomaly assessment records.
- Return type:
List[AnomalyAssessmentRecord]
- classmethod compute(project_id, model_id, backtest, source, series_id=None)
Request anomaly assessment insight computation on the specified subset.
- Parameters:
project_id (str) – The ID of the project to compute the insight for.
model_id (str) – The ID of the model to compute the insight for.
backtest (int or "holdout") – The backtest to compute the insight for.
source ("training" or "validation") – The source to compute the insight for.
series_id (Optional[str]) – The series ID to compute the insight for. Required for multiseries projects.
- Returns:
The anomaly assessment record.
- Return type:
AnomalyAssessmentRecord
- delete()
Delete anomaly assessment record with preview and explanations.
- Return type:
None
- get_predictions_preview()
Retrieve aggregated predictions statistics for the anomaly assessment record.
- Return type:
AnomalyAssessmentPredictionsPreview
- get_latest_explanations()
Retrieve the latest predictions along with SHAP explanations for the most anomalous records.
- Return type:
AnomalyAssessmentExplanations
- get_explanations(start_date=None, end_date=None, points_count=None)
Retrieve predictions along with SHAP explanations for the most anomalous records in the specified date range or for a defined number of points. Two of the three parameters (start_date, end_date, points_count) must be specified.
- Parameters:
start_date (Optional[str]) – The start of the date range to get explanations in. Example: 2020-01-01T00:00:00.000000Z
end_date (Optional[str]) – The end of the date range to get explanations in. Example: 2020-10-01T00:00:00.000000Z
points_count (Optional[int]) – The number of rows to return.
- Return type:
AnomalyAssessmentExplanations
- get_explanations_data_in_regions(regions, prediction_threshold=0.0)
Get predictions along with explanations for the specified regions, sorted by predictions in descending order.
- Parameters:
regions (list of AnomalyAssessmentPreviewBin) – For each region, explanations will be retrieved and merged.
prediction_threshold (Optional[float]) – If specified, only points with scores greater than or equal to the threshold will be returned.
- Returns:
A dict in the form {'explanations': explanations, 'shap_base_value': shap_base_value}.
- Return type:
RegionExplanationsData
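Taken together, a typical workflow looks like the following sketch (project_id and model_id are placeholders; in practice, wait until the record status is COMPLETED before requesting explanations):
from datarobot.models.anomaly_assessment import AnomalyAssessmentRecord

record = AnomalyAssessmentRecord.compute(project_id, model_id, backtest=0, source="validation")
# once the record status is COMPLETED:
preview = record.get_predictions_preview()
regions = preview.find_anomalous_regions(max_prediction_threshold=0.5)
explanations = record.get_explanations_data_in_regions(regions, prediction_threshold=0.5)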
- class datarobot.models.anomaly_assessment.AnomalyAssessmentExplanations
Object that keeps predictions along with SHAP explanations for the most anomalous records in the specified date range or for a defined number of points.
Added in version v2.25.
- Variables:
record_id (str) – The ID of the record.
project_id (str) – The ID of the project the record belongs to.
model_id (str) – The ID of the model the record belongs to.
backtest (int or "holdout") – The backtest of the record.
source ("training" or "validation") – The source of the record.
series_id (str or None) – The series ID of the record for multiseries projects. Defined only for multiseries projects.
start_date (str or None) – The ISO-formatted datetime of the first row in the data. Will be None if there is no data in the specified range.
end_date (str or None) – The ISO-formatted datetime of the last row in the data. Will be None if there is no data in the specified range.
shap_base_value (float) – SHAP base value.
count (int) – The number of points in data.
data (array of DataPoint objects or None) – The list of DataPoint objects in the specified date range.
Notes
DataPoint contains:
shap_explanation: None or an array of up to 10 ShapleyFeatureContribution objects. Only rows with the highest anomaly scores have Shapley explanations calculated. Value is None if the prediction is lower than prediction_threshold.
timestamp (str): ISO-formatted timestamp for the row.
prediction (float): The output of the model for this row.
ShapleyFeatureContribution contains:
feature_value (str): The feature value for this row. The first 50 characters are returned.
strength (float): The SHAP value for this feature and row.
feature (str): The feature name.
- classmethod get(project_id, record_id, start_date=None, end_date=None, points_count=None)
Retrieve predictions along with SHAP explanations for the most anomalous records in the specified date range or for a defined number of points. Two of the three parameters (start_date, end_date, points_count) must be specified.
- Parameters:
project_id (str) – The ID of the project.
record_id (str) – The ID of the anomaly assessment record.
start_date (Optional[str]) – The start of the date range to get explanations in. Example: 2020-01-01T00:00:00.000000Z
end_date (Optional[str]) – The end of the date range to get explanations in. Example: 2020-10-01T00:00:00.000000Z
points_count (Optional[int]) – The number of rows to return.
- Return type:
AnomalyAssessmentExplanations
- class datarobot.models.anomaly_assessment.AnomalyAssessmentPredictionsPreview
Aggregated predictions over time for the corresponding anomaly assessment record. Intended to help find the bins with the highest anomaly scores.
Added in version v2.25.
- Variables:
record_id (str) – The ID of the record.
project_id (str) – The ID of the project the record belongs to.
model_id (str) – The ID of the model the record belongs to.
backtest (int or "holdout") – The backtest of the record.
source ("training" or "validation") – The source of the record.
series_id (str or None) – The series ID of the record for multiseries projects. Defined only for multiseries projects.
start_date (str) – The ISO-formatted timestamp of the first prediction in the subset.
end_date (str) – The ISO-formatted timestamp of the last prediction in the subset.
preview_bins (list of PreviewBin objects) – The aggregated predictions for the subset. Bin boundaries may differ from the actual start/end dates because this is an aggregation.
Notes
PreviewBin contains:
start_date (str): The ISO-formatted datetime of the start of the bin.
end_date (str): The ISO-formatted datetime of the end of the bin.
avg_predicted (float or None): The average prediction of the model in the bin. None if there are no entries in the bin.
max_predicted (float or None): The maximum prediction of the model in the bin. None if there are no entries in the bin.
frequency (int): The number of rows in the bin.
- classmethod get(project_id, record_id)
Retrieve aggregated predictions over time.
- Parameters:
project_id (str) – The ID of the project.
record_id (str) – The ID of the anomaly assessment record.
- Return type:
AnomalyAssessmentPredictionsPreview
- find_anomalous_regions(max_prediction_threshold=0.0)
Sort preview bins by max_predicted value and select those with a max predicted value greater than or equal to the max prediction threshold. Sort the result by max predicted value in descending order.
- Parameters:
max_prediction_threshold (Optional[float]) – Return bins with a maximum anomaly score greater than or equal to max_prediction_threshold.
- Returns:
preview_bins – Filtered and sorted preview bins.
- Return type:
list of PreviewBin
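For example, a sketch of narrowing a preview down to its hottest bins (project_id, record_id, and the threshold are placeholders; bins are assumed to be plain dicts as described in the Notes above):
from datarobot.models.anomaly_assessment import AnomalyAssessmentPredictionsPreview

preview = AnomalyAssessmentPredictionsPreview.get(project_id, record_id)
hot_bins = preview.find_anomalous_regions(max_prediction_threshold=0.8)
for preview_bin in hot_bins:
    print(preview_bin["start_date"], preview_bin["end_date"], preview_bin["max_predicted"])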
Confusion chart
- class datarobot.models.confusion_chart.ConfusionChart
Confusion Chart data for model.
Notes
ClassMetrics is a dict containing the following:
class_name (string): The name of the class.
actual_count (int): The number of times this class is seen in the validation data.
predicted_count (int): The number of times this class has been predicted for the validation data.
f1 (float): The F1 score.
recall (float): The recall score.
precision (float): The precision score.
was_actual_percentages (list of dict): One-vs-all actual percentages, in the format specified below.
other_class_name (string): The name of the other class.
percentage (float): The percentage of the time this class was predicted when it was actually this class (from 0 to 1).
was_predicted_percentages (list of dict): One-vs-all predicted percentages, in the format specified below.
other_class_name (string): The name of the other class.
percentage (float): The percentage of the time this class was actually predicted (from 0 to 1).
confusion_matrix_one_vs_all (list of list): A 2D list representing the 2x2 one-vs-all matrix. This represents the True/False Negative/Positive rates, as integers, for each class. The data structure looks like:
[ [ True Negative, False Positive ], [ False Negative, True Positive ] ]
- Variables:
source (str) – Confusion chart data source. Can be 'validation', 'crossValidation' or 'holdout'.
raw_data (dict) – All of the raw data for the confusion chart.
confusion_matrix (list of list) – The N x N confusion matrix.
classes (list) – The names of each of the classes.
class_metrics (list of dicts) – List of dicts with the schema described as ClassMetrics above.
source_model_id (str) – ID of the model this confusion chart represents; in some cases, insights from the parent of a frozen model may be used.
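A short sketch of reading the per-class metrics; retrieval via Model.get_confusion_chart is an assumption here and may vary by SDK version:
import datarobot as dr

model = dr.Model.get(project_id, model_id)  # placeholder IDs
chart = model.get_confusion_chart(source="validation")  # assumed accessor

for metrics in chart.class_metrics:
    (tn, fp), (fn, tp) = metrics["confusion_matrix_one_vs_all"]
    print(metrics["class_name"], "F1:", metrics["f1"], "TP:", tp, "FP:", fp)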
Lift chart
- class datarobot.models.lift_chart.LiftChart
Lift chart data for model.
Notes
LiftChartBin is a dict containing the following:
actual (float): Sum of actual target values in the bin.
predicted (float): Sum of predicted target values in the bin.
bin_weight (float): The weight of the bin. For weighted projects, it is the sum of the weights of the rows in the bin. For unweighted projects, it is the number of rows in the bin.
- Variables:
source (str) – Lift chart data source. Can be 'validation', 'crossValidation' or 'holdout'.
bins (list of dict) – List of dicts with the schema described as LiftChartBin above.
source_model_id (str) – ID of the model this lift chart represents; in some cases, insights from the parent of a frozen model may be used.
target_class (Optional[str]) – For multiclass lift, the target class for this lift chart data.
data_slice_id (str or None) – The slice to retrieve the lift chart for; if None, retrieve unsliced data.
- classmethod from_server_data(data, keep_attrs=None, use_insights_format=False, **kwargs)
Override APIObject.from_server_data to handle lift chart data retrieved from either the legacy URL or the new /insights/ URL.
- Parameters:
data (dict) – The directly translated dict of JSON from the server. No casing fixes have taken place.
use_insights_format (Optional[bool]) – Whether to repack the data from the format used in the GET /insights/liftChart/ URL to the format used in the legacy URL.
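A hedged sketch of the round trip (server_payload stands in for a JSON dict fetched from the GET /insights/liftChart/ endpoint):
from datarobot.models.lift_chart import LiftChart

lift_chart = LiftChart.from_server_data(server_payload, use_insights_format=True)
total_weight = sum(b["bin_weight"] for b in lift_chart.bins)  # for unweighted projects: the row count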
Data slices
- class datarobot.models.data_slice.DataSlice
Definition of a data slice
- Variables:
id (str) – ID of the data slice.
name (str) – Name of the data slice definition.
filters (list[DataSliceFiltersType]) – List of DataSliceFiltersType with params:
operand (str): Name of the feature to use in the filter.
operator (str): Operator to use in the filter – eq, in, <, or >.
values (Union[str, int, float]): Values to use from the feature.
project_id (str) – ID of the project that the model is part of.
- classmethod list(project, offset=0, limit=100)
List the data slices in the given project.
- Parameters:
project (Union[str, Project]) – ID of the project or Project object from which to list data slices.
offset (Optional[int]) – Number of items to skip.
limit (Optional[int]) – Number of items to return.
- Returns:
data_slices
- Return type:
list[DataSlice]
Examples
>>> import datarobot as dr
>>> ... # set up your Client
>>> data_slices = dr.DataSlice.list("646d0ea0cd8eb2355a68b0e5")
>>> data_slices
[DataSlice(...), DataSlice(...), ...]
- classmethod create(name, filters, project)
Creates a data slice in the project with the given name and filters.
- Parameters:
name (str) – Name of the data slice definition.
filters (list[DataSliceFiltersType]) – List of filters (dict) with params:
operand (str): Name of the feature to use in the filter.
operator (str): Operator to use: 'eq', 'in', '<', or '>'.
values (Union[str, int, float]): Values to use from the feature.
project (Union[str, Project]) – ID of the project or Project object in which to create the data slice.
- Returns:
data_slice – The created data slice.
- Return type:
DataSlice
Examples
>>> import datarobot as dr
>>> ... # set up your Client and retrieve a project
>>> data_slice = dr.DataSlice.create(
...     name='yes',
...     filters=[{'operand': 'binary_target', 'operator': 'eq', 'values': ['Yes']}],
...     project=project,
... )
>>> data_slice
DataSlice(
    filters=[{'operand': 'binary_target', 'operator': 'eq', 'values': ['Yes']}],
    id=646d1296bd0c543d88923c9d,
    name=yes,
    project_id=646d0ea0cd8eb2355a68b0e5
)
- delete()
Deletes the data slice from storage.
- Return type:
None
Examples
>>> import datarobot as dr
>>> data_slice = dr.DataSlice.get('5a8ac9ab07a57a0001be501f')
>>> data_slice.delete()

>>> import datarobot as dr
>>> ... # get project or project_id
>>> data_slices = dr.DataSlice.list(project)  # project object or project_id
>>> data_slice = data_slices[0]  # choose a data slice from the list
>>> data_slice.delete()
- request_size(source, model=None)
Submits a request to validate the data slice's filters and calculate the data slice's number of rows on a given source.
- Parameters:
source (INSIGHTS_SOURCES) – Subset of data (partition or "source") on which to apply the data slice for estimating available rows.
model (Optional[Union[str, Model]]) – Model object or ID of the model. It is only required when source is "training".
- Returns:
status_check_job – Object containing all the logic needed to periodically check the status of the async job.
- Return type:
StatusCheckJob
Examples
>>> import datarobot as dr
>>> ... # get project or project_id
>>> data_slices = dr.DataSlice.list(project)  # project object or project_id
>>> data_slice = data_slices[0]  # choose a data slice from the list
>>> status_check_job = data_slice.request_size("validation")
Model is required when source is ‘training’
>>> import datarobot as dr
>>> ... # get project or project_id
>>> data_slices = dr.DataSlice.list(project)  # project object or project_id
>>> data_slice = data_slices[0]  # choose a data slice from the list
>>> status_check_job = data_slice.request_size("training", model)
- get_size_info(source, model=None)
Get information about the data slice applied to a source.
- Parameters:
source (INSIGHTS_SOURCES) – Source (partition or subset) to which the data slice was applied.
model (Optional[Union[str, Model]]) – ID of the model whose training data was sliced with this data slice. Required when the source is "training", and not used for other sources.
- Returns:
slice_size_info – Information about the data slice applied to the source.
- Return type:
DataSliceSizeInfo
Examples
>>> import datarobot as dr
>>> ... # set up your Client
>>> data_slices = dr.DataSlice.list("646d0ea0cd8eb2355a68b0e5")
>>> data_slice = data_slices[0]  # can be any slice in the list
>>> data_slice_size_info = data_slice.get_size_info("validation")
>>> data_slice_size_info
DataSliceSizeInfo(
    data_slice_id=6493a1776ea78e6644382535,
    messages=[
        {
            'level': 'WARNING',
            'description': 'Low Observation Count',
            'additional_info': 'Insufficient number of observations to compute some insights.'
        }
    ],
    model_id=None,
    project_id=646d0ea0cd8eb2355a68b0e5,
    slice_size=1,
    source=validation,
)
>>> data_slice_size_info.to_dict()
{
    'data_slice_id': '6493a1776ea78e6644382535',
    'messages': [
        {
            'level': 'WARNING',
            'description': 'Low Observation Count',
            'additional_info': 'Insufficient number of observations to compute some insights.'
        }
    ],
    'model_id': None,
    'project_id': '646d0ea0cd8eb2355a68b0e5',
    'slice_size': 1,
    'source': 'validation',
}

>>> import datarobot as dr
>>> ... # set up your Client
>>> data_slice = dr.DataSlice.get("6493a1776ea78e6644382535")
>>> data_slice_size_info = data_slice.get_size_info("validation")
When using source=’training’, the model param is required.
>>> import datarobot as dr
>>> ... # set up your Client
>>> model = dr.Model.get(project_id, model_id)
>>> data_slice = dr.DataSlice.get("6493a1776ea78e6644382535")
>>> data_slice_size_info = data_slice.get_size_info("training", model)

>>> import datarobot as dr
>>> ... # set up your Client
>>> data_slice = dr.DataSlice.get("6493a1776ea78e6644382535")
>>> data_slice_size_info = data_slice.get_size_info("training", model_id)
- classmethod get(data_slice_id)
Retrieve a specific data slice.
- Parameters:
data_slice_id (str) – The identifier of the data slice to retrieve.
- Returns:
data_slice – The requested data slice.
- Return type:
DataSlice
Examples
>>> import datarobot as dr
>>> dr.DataSlice.get('648b232b9da812a6aaa0b7a9')
DataSlice(
    filters=[{'operand': 'binary_target', 'operator': 'eq', 'values': ['Yes']}],
    id=648b232b9da812a6aaa0b7a9,
    name=test,
    project_id=644bc575572480b565ca42cd
)
- class datarobot.models.data_slice.DataSliceSizeInfo
Definition of a data slice applied to a source.
- Variables:
data_slice_id (str) – ID of the data slice.
project_id (str) – ID of the project.
source (str) – Data source used to calculate the number of rows (slice size) after applying the data slice's filters.
model_id (Optional[str]) – ID of the model; required when the source (subset) is 'training'.
slice_size (int) – Number of rows in the data slice for a given source.
messages (list[DataSliceSizeMessageType]) – List of user-relevant messages related to a data slice.
Datetime trend plots
- class datarobot.models.datetime_trend_plots.AccuracyOverTimePlotsMetadata
Accuracy over Time metadata for datetime model.
Added in version v2.25.
- Variables:
project_id (string) – The project ID.
model_id (string) – The model ID.
forecast_distance (int or None) – The forecast distance for which the metadata was retrieved. None for OTV projects.
resolutions (list of string) – A list of datarobot.enums.DATETIME_TREND_PLOTS_RESOLUTION, which represents the available time resolutions for which plots can be retrieved.
backtest_metadata (list of dict) – List of backtest metadata dicts. The list index of the metadata dict is the backtest index. See the backtest/holdout metadata info in Notes for more details.
holdout_metadata (dict) – Holdout metadata dict. See the backtest/holdout metadata info in Notes for more details.
backtest_statuses (list of dict) – List of backtest status dicts. The list index of the status dict is the backtest index. See the backtest/holdout status info in Notes for more details.
holdout_statuses (dict) – Holdout status dict. See the backtest/holdout status info in Notes for more details.
Notes
Backtest/holdout status is a dict containing the following:
- training: string
Status of the backtest/holdout training. One of datarobot.enums.DATETIME_TREND_PLOTS_STATUS
- validation: string
Status of the backtest/holdout validation. One of datarobot.enums.DATETIME_TREND_PLOTS_STATUS
Backtest/holdout metadata is a dict containing the following:
- training: dict
Start and end dates for the backtest/holdout training.
- validation: dict
Start and end dates for the backtest/holdout validation.
Each dict in the training and validation in backtest/holdout metadata is structured like:
- start_date: datetime.datetime or None
The datetime of the start of the chart data (inclusive). None if chart data is not computed.
- end_date: datetime.datetime or None
The datetime of the end of the chart data (exclusive). None if chart data is not computed.
- class datarobot.models.datetime_trend_plots.AccuracyOverTimePlot
Accuracy over Time plot for datetime model.
Added in version v2.25.
- Variables:
project_id (string) – The project ID.
model_id (string) – The model ID.
resolution (string) – The resolution that is used for binning. One of datarobot.enums.DATETIME_TREND_PLOTS_RESOLUTION
start_date (datetime.datetime) – The datetime of the start of the chart data (inclusive).
end_date (datetime.datetime) – The datetime of the end of the chart data (exclusive).
bins (list of dict) – List of plot bins. See the bin info in Notes for more details.
statistics (dict) – Statistics for the plot. See the statistics info in Notes for more details.
calendar_events (list of dict) – List of calendar events for the plot. See the calendar events info in Notes for more details.
Notes
Bin is a dict containing the following:
- start_date: datetime.datetime
The datetime of the start of the bin (inclusive).
- end_date: datetime.datetime
The datetime of the end of the bin (exclusive).
- actual: float or None
Average actual value of the target in the bin. None if there are no entries in the bin.
- predicted: float or None
Average prediction of the model in the bin. None if there are no entries in the bin.
- frequency: int or None
Indicates the number of values averaged in the bin.
Statistics is a dict containing the following:
- durbin_watson: float or None
The Durbin-Watson statistic for the chart data. Value is between 0 and 4. Durbin-Watson statistic is a test statistic used to detect the presence of autocorrelation at lag 1 in the residuals (prediction errors) from a regression analysis. More info https://wikipedia.org/wiki/Durbin%E2%80%93Watson_statistic
Calendar event is a dict containing the following:
- name: string
Name of the calendar event.
- date: datetime
Date of the calendar event.
- series_id: string or None
The series ID for the event. If this event does not specify a series ID, then this will be None, indicating that the event applies to all series.
- class datarobot.models.datetime_trend_plots.AccuracyOverTimePlotPreview
Accuracy over Time plot preview for datetime model.
Added in version v2.25.
- Variables:
project_id (string) – The project ID.
model_id (string) – The model ID.
start_date (datetime.datetime) – The datetime of the start of the chart data (inclusive).
end_date (datetime.datetime) – The datetime of the end of the chart data (exclusive).
bins (list of dict) – List of plot bins. See the bin info in Notes for more details.
Notes
Bin is a dict containing the following:
- start_date: datetime.datetime
The datetime of the start of the bin (inclusive).
- end_date: datetime.datetime
The datetime of the end of the bin (exclusive).
- actual: float or None
Average actual value of the target in the bin. None if there are no entries in the bin.
- predicted: float or None
Average prediction of the model in the bin. None if there are no entries in the bin.
- class datarobot.models.datetime_trend_plots.ForecastVsActualPlotsMetadata
Forecast vs Actual plots metadata for datetime model.
Added in version v2.25.
- Variables:
project_id (string) – The project ID.
model_id (string) – The model ID.
resolutions (list of string) – A list of datarobot.enums.DATETIME_TREND_PLOTS_RESOLUTION, which represents the available time resolutions for which plots can be retrieved.
backtest_metadata (list of dict) – List of backtest metadata dicts. The list index of the metadata dict is the backtest index. See the backtest/holdout metadata info in Notes for more details.
holdout_metadata (dict) – Holdout metadata dict. See the backtest/holdout metadata info in Notes for more details.
backtest_statuses (list of dict) – List of backtest status dicts. The list index of the status dict is the backtest index. See the backtest/holdout status info in Notes for more details.
holdout_statuses (dict) – Holdout status dict. See the backtest/holdout status info in Notes for more details.
Notes
Backtest/holdout status is a dict containing the following:
- training: dict
Dict containing each of datarobot.enums.DATETIME_TREND_PLOTS_STATUS as a dict key, and a list of forecast distances for the particular status as the dict value.
- validation: dict
Dict containing each of datarobot.enums.DATETIME_TREND_PLOTS_STATUS as a dict key, and a list of forecast distances for the particular status as the dict value.
Backtest/holdout metadata is a dict containing the following:
- training: dict
Start and end dates for the backtest/holdout training.
- validation: dict
Start and end dates for the backtest/holdout validation.
Each dict in the training and validation in backtest/holdout metadata is structured like:
- start_date: datetime.datetime or None
The datetime of the start of the chart data (inclusive). None if chart data is not computed.
- end_date: datetime.datetime or None
The datetime of the end of the chart data (exclusive). None if chart data is not computed.
- class datarobot.models.datetime_trend_plots.ForecastVsActualPlot
Forecast vs Actual plot for datetime model.
Added in version v2.25.
- Variables:
project_id (string) – The project ID.
model_id (string) – The model ID.
forecast_distances (list of int) – A list of forecast distances that were retrieved.
resolution (string) – The resolution that is used for binning. One of datarobot.enums.DATETIME_TREND_PLOTS_RESOLUTION
start_date (datetime.datetime) – The datetime of the start of the chart data (inclusive).
end_date (datetime.datetime) – The datetime of the end of the chart data (exclusive).
bins (list of dict) – List of plot bins. See the bin info in Notes for more details.
calendar_events (list of dict) – List of calendar events for the plot. See the calendar events info in Notes for more details.
Notes
Bin is a dict containing the following:
- start_date: datetime.datetime
The datetime of the start of the bin (inclusive).
- end_date: datetime.datetime
The datetime of the end of the bin (exclusive).
- actual: float or None
Average actual value of the target in the bin. None if there are no entries in the bin.
- forecasts: list of float
A list of average forecasts for the model for each forecast distance. Empty if there are no forecasts in the bin. Each index in the forecasts list maps to the same index in the forecast_distances list.
- error: float or None
Average absolute residual value of the bin. None if there are no entries in the bin.
- normalized_error: float or None
Normalized average absolute residual value of the bin. None if there are no entries in the bin.
- frequency: int or None
Indicates the number of values averaged in the bin.
Calendar event is a dict containing the following:
- name: string
Name of the calendar event.
- date: datetime
Date of the calendar event.
- series_id: string or None
The series ID for the event. If this event does not specify a series ID, then this will be None, indicating that the event applies to all series.
- class datarobot.models.datetime_trend_plots.ForecastVsActualPlotPreview
Forecast vs Actual plot preview for datetime model.
Added in version v2.25.
- Variables:
project_id (string) – The project ID.
model_id (string) – The model ID.
start_date (datetime.datetime) – The datetime of the start of the chart data (inclusive).
end_date (datetime.datetime) – The datetime of the end of the chart data (exclusive).
bins (list of dict) – List of plot bins. See the bin info in Notes for more details.
Notes
Bin is a dict containing the following:
- start_date: datetime.datetime
The datetime of the start of the bin (inclusive).
- end_date: datetime.datetime
The datetime of the end of the bin (exclusive).
- actual: float or None
Average actual value of the target in the bin. None if there are no entries in the bin.
- predicted: float or None
Average prediction of the model in the bin. None if there are no entries in the bin.
- class datarobot.models.datetime_trend_plots.AnomalyOverTimePlotsMetadata
Anomaly over Time metadata for datetime model.
Added in version v2.25.
- Variables:
project_id (string) – The project ID.
model_id (string) – The model ID.
resolutions (list of string) – A list of datarobot.enums.DATETIME_TREND_PLOTS_RESOLUTION, which represents the available time resolutions for which plots can be retrieved.
backtest_metadata (list of dict) – List of backtest metadata dicts. The list index of the metadata dict is the backtest index. See the backtest/holdout metadata info in Notes for more details.
holdout_metadata (dict) – Holdout metadata dict. See the backtest/holdout metadata info in Notes for more details.
backtest_statuses (list of dict) – List of backtest status dicts. The list index of the status dict is the backtest index. See the backtest/holdout status info in Notes for more details.
holdout_statuses (dict) – Holdout status dict. See the backtest/holdout status info in Notes for more details.
Notes
Backtest/holdout status is a dict containing the following:
- training: string
Status of the backtest/holdout training. One of datarobot.enums.DATETIME_TREND_PLOTS_STATUS
- validation: string
Status of the backtest/holdout validation. One of datarobot.enums.DATETIME_TREND_PLOTS_STATUS
Backtest/holdout metadata is a dict containing the following:
- training: dict
Start and end dates for the backtest/holdout training.
- validation: dict
Start and end dates for the backtest/holdout validation.
Each dict in the training and validation in backtest/holdout metadata is structured like:
- start_date: datetime.datetime or None
The datetime of the start of the chart data (inclusive). None if chart data is not computed.
- end_date: datetime.datetime or None
The datetime of the end of the chart data (exclusive). None if chart data is not computed.
- class datarobot.models.datetime_trend_plots.AnomalyOverTimePlot
Anomaly over Time plot for datetime model.
Added in version v2.25.
- Variables:
project_id (string) – The project ID.
model_id (string) – The model ID.
resolution (string) – The resolution that is used for binning. One of datarobot.enums.DATETIME_TREND_PLOTS_RESOLUTION
start_date (datetime.datetime) – The datetime of the start of the chart data (inclusive).
end_date (datetime.datetime) – The datetime of the end of the chart data (exclusive).
bins (list of dict) – List of plot bins. See the bin info in Notes for more details.
calendar_events (list of dict) – List of calendar events for the plot. See the calendar events info in Notes for more details.
Notes
Bin is a dict containing the following:
- start_date: datetime.datetime
The datetime of the start of the bin (inclusive).
- end_date: datetime.datetime
The datetime of the end of the bin (exclusive).
- predicted: float or None
Average prediction of the model in the bin. None if there are no entries in the bin.
- frequency: int or None
Indicates the number of values averaged in the bin.
Calendar event is a dict containing the following:
- name: string
Name of the calendar event.
- date: datetime
Date of the calendar event.
- series_id: string or None
The series ID for the event. If this event does not specify a series ID, then this will be None, indicating that the event applies to all series.
- class datarobot.models.datetime_trend_plots.AnomalyOverTimePlotPreview
Anomaly over Time plot preview for datetime model.
Added in version v2.25.
- Variables:
project_id (string) – The project ID.
model_id (string) – The model ID.
prediction_threshold (float) – Only bins with predictions exceeding this threshold are returned in the response.
start_date (datetime.datetime) – The datetime of the start of the chart data (inclusive).
end_date (datetime.datetime) – The datetime of the end of the chart data (exclusive).
bins (list of dict) – List of plot bins. See the bin info in Notes for more details.
Notes
Bin is a dict containing the following:
- start_date: datetime.datetime
The datetime of the start of the bin (inclusive).
- end_date: datetime.datetime
The datetime of the end of the bin (exclusive).
External scores and insights
- class datarobot.ExternalScores
Metric scores on a prediction dataset with a target column, or an actual value column in the unsupervised case. Contains project metrics for supervised projects and a special set of classification metrics for unsupervised projects.
Added in version v2.21.
- Variables:
project_id (str) – ID of the project the model belongs to.
model_id (str) – ID of the model.
dataset_id (str) – ID of the prediction dataset with a target or actual value column for the unsupervised case.
actual_value_column (Optional[str]) – For unsupervised projects only. The actual value column that was used to calculate the classification metrics and insights on the prediction dataset.
scores (list of dicts in the form {'label': metric_name, 'value': score}) – Scores on the dataset.
Examples
List all scores for a dataset
from datarobot.models.external_dataset_scores_insights.external_scores import ExternalScores

scores = ExternalScores.list(project_id, dataset_id=dataset_id)
- classmethod create(project_id, model_id, dataset_id, actual_value_column=None)
Compute external dataset insights for the specified model.
- Parameters:
project_id (str) – ID of the project the model belongs to.
model_id (str) – ID of the model for which insights are requested.
dataset_id (str) – ID of the dataset for which insights are requested.
actual_value_column (Optional[str]) – Actual values column label; for unsupervised projects only.
- Returns:
job – An instance of the created async job.
- Return type:
Job
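A sketch of computing and then fetching scores (the IDs are placeholders; wait_for_completion blocks until the async job finishes):
from datarobot.models.external_dataset_scores_insights.external_scores import ExternalScores

job = ExternalScores.create(project_id, model_id, dataset_id)
job.wait_for_completion()
scores = ExternalScores.get(project_id, model_id, dataset_id)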
- classmethod list(project_id, model_id=None, dataset_id=None, offset=0, limit=100)
Fetch the list of external scores for the project, and optionally for a specific model and dataset.
- Parameters:
project_id (str) – ID of the project.
model_id (Optional[str]) – If specified, only scores for this model will be retrieved.
dataset_id (Optional[str]) – If specified, only scores for this dataset will be retrieved.
offset (Optional[int]) – This many results will be skipped. Default: 0.
limit (Optional[int]) – At most this many results are returned. Default: 100; max 1000. To return all results, specify 0.
- Return type:
List[ExternalScores]
- Returns:
A list of ExternalScores objects.
- classmethod get(project_id, model_id, dataset_id)
Retrieve external scores for the project, model and dataset.
- Parameters:
project_id (str) – ID of the project.
model_id (str) – ID of the model whose scores will be retrieved.
dataset_id (str) – ID of the dataset whose scores will be retrieved.
- Return type:
ExternalScores
- Returns:
An ExternalScores object.
- class datarobot.ExternalLiftChart
Lift chart for the model and a prediction dataset with a target column, or an actual value column in the unsupervised case.
Added in version v2.21.
LiftChartBin is a dict containing the following:
actual (float): Sum of actual target values in the bin.
predicted (float): Sum of predicted target values in the bin.
bin_weight (float): The weight of the bin. For weighted projects, it is the sum of the weights of the rows in the bin. For unweighted projects, it is the number of rows in the bin.
- Variables:
dataset_id (str) – ID of the prediction dataset with a target or actual value column for the unsupervised case.
bins (list of dict) – List of dicts with the schema described as LiftChartBin above.
- classmethod list(project_id, model_id, dataset_id=None, offset=0, limit=100)
Retrieve list of the lift charts for the model.
- Parameters:
project_id (str) – id of the project
model_id (str) – id of the model to retrieve lift charts for
dataset_id (Optional[str]) – if specified, only the lift chart for this dataset will be retrieved
offset (Optional[int]) – this many results will be skipped, default: 0
limit (Optional[int]) – at most this many results are returned, default: 100, max 1000. To return all results, specify 0
- Return type:
List[ExternalLiftChart]
- Returns:
A list of ExternalLiftChart objects
- classmethod get(project_id, model_id, dataset_id)
Retrieve lift chart for the model and prediction dataset.
- Parameters:
project_id (str) – project id
model_id (str) – model id
dataset_id (str) – prediction dataset id with the target or actual value column for the unsupervised case
- Return type:
ExternalLiftChart
- Returns:
ExternalLiftChart object
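A usage sketch with placeholder IDs; the bin dict keys follow the LiftChartBin schema above:

from datarobot import ExternalLiftChart

project_id, model_id, dataset_id = "<project-id>", "<model-id>", "<dataset-id>"

chart = ExternalLiftChart.get(project_id, model_id, dataset_id)
for lift_bin in chart.bins:
    # average actual vs. predicted target value per bin
    avg_actual = lift_bin["actual"] / lift_bin["bin_weight"]
    avg_predicted = lift_bin["predicted"] / lift_bin["bin_weight"]
    print(round(avg_actual, 4), round(avg_predicted, 4))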
- class datarobot.ExternalRocCurve
ROC curve data for the model and prediction dataset with target or actual value column in unsupervised case.
Added in version v2.21.
- Variables:
dataset_id (str) – id of the prediction dataset with the target or actual value column for the unsupervised case
roc_points (list of dict) – List of precalculated metrics associated with thresholds for the ROC curve.
negative_class_predictions (list of float) – List of example predictions for the negative class
positive_class_predictions (list of float) – List of example predictions for the positive class
- classmethod list(project_id, model_id, dataset_id=None, offset=0, limit=100)
Retrieve a list of ROC curves for the model.
- Parameters:
project_id (str) – id of the project
model_id (str) – id of the model to retrieve ROC curves for
dataset_id (Optional[str]) – if specified, only the ROC curve for this dataset will be retrieved
offset (Optional[int]) – this many results will be skipped, default: 0
limit (Optional[int]) – at most this many results are returned, default: 100, max 1000. To return all results, specify 0
- Return type:
List[ExternalRocCurve]
- Returns:
A list of ExternalRocCurve objects
- classmethod get(project_id, model_id, dataset_id)
Retrieve ROC curve chart for the model and prediction dataset.
- Parameters:
project_id (str) – project id
model_id (str) – model id
dataset_id (str) – prediction dataset id with the target or actual value column for the unsupervised case
- Return type:
ExternalRocCurve
- Returns:
ExternalRocCurve object
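A retrieval sketch with placeholder IDs. The exact keys inside each roc_points dict (e.g. 'threshold', 'true_positive_rate', 'false_positive_rate') are an assumption here, since the precalculated metrics are not enumerated above; inspect one dict to confirm:

from datarobot import ExternalRocCurve

project_id, model_id, dataset_id = "<project-id>", "<model-id>", "<dataset-id>"

roc = ExternalRocCurve.get(project_id, model_id, dataset_id)
for point in roc.roc_points:
    # key names are assumptions -- use .get() so missing keys surface as None
    print(point.get("threshold"), point.get("true_positive_rate"), point.get("false_positive_rate"))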
Feature association
- class datarobot.models.FeatureAssociationMatrix
Feature association statistics for a project.
Notes
Projects created prior to v2.17 are not supported by this feature.
- Variables:
project_id (str) – Id of the associated project.
strengths (list of dict) – Pairwise statistics for the available features, as structured below.
features (list of dict) – Metadata for each feature and where it goes in the matrix.
Examples
import datarobot as dr

# retrieve feature association matrix
feature_association_matrix = dr.FeatureAssociationMatrix.get(project_id)
feature_association_matrix.strengths
feature_association_matrix.features

# retrieve feature association matrix for a metric, association type or a feature list
feature_association_matrix = dr.FeatureAssociationMatrix.get(
    project_id,
    metric=enums.FEATURE_ASSOCIATION_METRIC.SPEARMAN,
    association_type=enums.FEATURE_ASSOCIATION_TYPE.CORRELATION,
    featurelist_id=featurelist_id,
)
- classmethod get(project_id, metric=None, association_type=None, featurelist_id=None)
Get feature association statistics.
- Parameters:
project_id (str) – Id of the project that contains the requested associations.
metric (enums.FEATURE_ASSOCIATION_METRIC) – The name of a metric to get pairwise data for. Since v2.19 this is optional and defaults to enums.FEATURE_ASSOCIATION_METRIC.MUTUAL_INFO.
association_type (enums.FEATURE_ASSOCIATION_TYPE) – The type of dependence for the data. Since v2.19 this is optional and defaults to enums.FEATURE_ASSOCIATION_TYPE.ASSOCIATION.
featurelist_id (str or None) – Optional, the feature list to lookup FAM data for. By default, depending on the type of the project, the "Informative Features" or "Timeseries Informative Features" list will be used. (New in version v2.19)
- Returns:
Feature association pairwise metric strength data, feature clustering data, and ordering data for the Feature Association Matrix visualization.
- Return type:
FeatureAssociationMatrix
- classmethod create(project_id, featurelist_id)
Compute the Feature Association Matrix for a feature list.
- Parameters:
project_id (str) – The ID of the project that the feature list belongs to.
featurelist_id (str) – The ID of the feature list for which insights are requested.
- Returns:
status_check_job – Object contains all needed logic for a periodical status check of an async job.
- Return type:
StatusCheckJob
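A compute-then-fetch sketch with placeholder IDs, assuming StatusCheckJob exposes wait_for_completion() and that each strengths dict carries 'feature1', 'feature2' and 'statistic' keys (the key names are assumptions):

import datarobot as dr

project_id, featurelist_id = "<project-id>", "<featurelist-id>"

job = dr.FeatureAssociationMatrix.create(project_id, featurelist_id)
job.wait_for_completion()  # poll until the async computation finishes

fam = dr.FeatureAssociationMatrix.get(project_id, featurelist_id=featurelist_id)
# strongest pairwise association (key names are assumptions)
top = max(fam.strengths, key=lambda s: s["statistic"])
print(top["feature1"], top["feature2"], top["statistic"])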
Feature association matrix details
- class datarobot.models.FeatureAssociationMatrixDetails
Plotting details for a pair of passed features present in the feature association matrix.
Notes
Projects created prior to v2.17 are not supported by this feature.
- Variables:
project_id (str) – Id of the project that contains the requested associations.
chart_type (str) – The type of plot used for the pair of features in the UI, e.g. 'HORIZONTAL_BOX', 'VERTICAL_BOX', 'SCATTER' or 'CONTINGENCY'.
values (list) – The data triplets for pairwise plotting, e.g. {"values": [[460.0, 428.5, 0.001], [1679.3, 259.0, 0.001], ...]}. The first entry of each list is a value of feature1, the second is a value of feature2, and the third is the relative frequency of the pair of datapoints in the sample.
features (list) – A list of the requested features, [feature1, feature2].
types (list) – The types of feature1 and feature2. Possible values: "CATEGORICAL", "NUMERIC".
featurelist_id (str) – Id of the feature list to lookup FAM details for.
- classmethod get(project_id, feature1, feature2, featurelist_id=None)
Get a sample of the actual values used to measure the association between a pair of features.
Added in version v2.17.
- Parameters:
project_id (str) – Id of the project of interest.
feature1 (str) – Feature name for the first feature of interest.
feature2 (str) – Feature name for the second feature of interest.
featurelist_id (str) – Optional, the feature list to lookup FAM data for. By default, depending on the type of the project, the "Informative Features" or "Timeseries Informative Features" list will be used.
- Returns:
The feature association plotting data for the provided pair of features.
- Return type:
FeatureAssociationMatrixDetails
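A short sketch of fetching the plotting data for one pair of features (IDs and feature names are placeholders):

import datarobot as dr

project_id = "<project-id>"

details = dr.FeatureAssociationMatrixDetails.get(project_id, "feature_a", "feature_b")
print(details.chart_type, details.types)  # e.g. 'SCATTER', ['NUMERIC', 'NUMERIC']
for value_1, value_2, relative_freq in details.values:
    # one data triplet per sampled pair of datapoints
    print(value_1, value_2, relative_freq)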
Feature association featurelists
- class datarobot.models.FeatureAssociationFeaturelists
Featurelists with feature association matrix availability flags for a project.
- Variables:
project_id (str) – Id of the project that contains the requested associations.
featurelists (list of dict) – The featurelists, each with the featurelist_id, title and the has_fam flag.
- classmethod get(project_id)
Get featurelists with feature association status for each.
- Parameters:
project_id (str) – Id of the project of interest.
- Returns:
Featurelists with feature association status for each.
- Return type:
FeatureAssociationFeaturelists
Feature effects
- class datarobot.models.FeatureEffects
Feature Effects provides partial dependence and predicted vs actual values for the top 500 features, ordered by feature impact score.
The partial dependence shows the marginal effect of a feature on the target variable after accounting for the average effects of all other predictive features. It indicates how, holding all other features at their observed values, the value of the feature of interest affects the prediction.
- Variables:
project_id (string) – The project that contains the requested model
model_id (string) – The model to retrieve Feature Effects for
source (string) – The source to retrieve Feature Effects for
data_slice_id (string or None) – The slice to retrieve Feature Effects for; if None, retrieve unsliced data
feature_effects (list) – Feature Effects for every feature
backtest_index (string, required only for DatetimeModels) – The backtest index to retrieve Feature Effects for.
Notes
featureEffects is a dict containing the following:
feature_name (string) Name of the feature
feature_type (string) dr.enums.FEATURE_TYPE; the feature type, either numeric, categorical or datetime
feature_impact_score (float) Feature impact score
weight_label (string) optional; weight label if configured for the project, else null
partial_dependence (List) Partial dependence results
predicted_vs_actual (List) optional; predicted versus actual results, may be omitted if there are insufficient qualified samples

partial_dependence is a dict containing the following:
is_capped (bool) Indicates whether the data for computation is capped
data (List) Partial dependence results in the following format

data is a list of dicts containing the following:
label (string) Label for categorical and numeric features, as a string
dependence (float) Value of the partial dependence

predicted_vs_actual is a dict containing the following:
is_capped (bool) Indicates whether the data for computation is capped
data (List) Predicted vs actual results in the following format

data is a list of dicts containing the following:
label (string) Label for categorical features; for numeric features, contains the range or numeric value.
bin (List) optional; for numeric features, contains labels for the left and right bin limits
predicted (float) Predicted value
actual (float) Actual value. The actual value is null for unsupervised time series models
row_count (int or float) Number of rows for the label and bin. The type is float if weight or exposure is set for the project.
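A sketch of walking this structure, assuming Feature Effects were already computed for the model and are retrieved through a dr.Model instance (model.get_feature_effect(source) exists on models; request_feature_effect() can be used first if the insight has not been computed):

import datarobot as dr

project_id, model_id = "<project-id>", "<model-id>"
model = dr.Model.get(project_id, model_id)

feature_effects = model.get_feature_effect(source="validation")
for effect in feature_effects.feature_effects:
    name = effect["feature_name"]
    # partial dependence: one (label, dependence) pair per point
    for point in effect["partial_dependence"]["data"]:
        print(name, point["label"], point["dependence"])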
- classmethod from_server_data(data, *args, use_insights_format=False, **kwargs)
Instantiate an object of this class using the data directly from the server, meaning that the keys may have the wrong camel casing.
- Parameters:
data (dict) – The directly translated dict of JSON from the server. No casing fixes have taken place.
use_insights_format (Optional[bool]) – Whether to repack the data from the format used in the GET /insights/featureEffects/ URL to the format used in the legacy URL.
- class datarobot.models.FeatureEffectMetadata
Feature Effect Metadata for model, contains status and available model sources.
Notes
source is an expected parameter for retrieving Feature Effects. One of the provided sources must be used.
- class datarobot.models.FeatureEffectMetadataDatetime
Feature Effect Metadata for datetime model, contains list of feature effect metadata per backtest.
Notes
Feature effect metadata per backtest contains:
status : str
backtest_index : str
sources : List[str]
source is an expected parameter for retrieving Feature Effects. One of the provided sources must be used.
backtest_index is an expected parameter for submitting a compute request and for retrieving Feature Effects. One of the provided backtest indexes must be used.
- Variables:
data (list[FeatureEffectMetadataDatetimePerBacktest]) – List of feature effect metadata per backtest
- class datarobot.models.FeatureEffectMetadataDatetimePerBacktest
Converts a dictionary into feature effect metadata per backtest, which contains backtest_index, status and sources.
Payoff matrix
- class datarobot.models.PayoffMatrix
Represents a Payoff Matrix, a costs/benefit scenario used for creating a profit curve.
- Variables:
project_id (str) – id of the project with which the payoff matrix is associated.
id (str) – id of the payoff matrix.
name (str) – User-supplied label for the payoff matrix.
true_positive_value (float) – Cost or benefit of a true positive classification
true_negative_value (float) – Cost or benefit of a true negative classification
false_positive_value (float) – Cost or benefit of a false positive classification
false_negative_value (float) – Cost or benefit of a false negative classification
Examples
import datarobot as dr

# create a payoff matrix
payoff_matrix = dr.PayoffMatrix.create(
    project_id,
    name,
    true_positive_value=100,
    true_negative_value=10,
    false_positive_value=0,
    false_negative_value=-10,
)

# list available payoff matrices
payoff_matrices = dr.PayoffMatrix.list(project_id)
payoff_matrix = payoff_matrices[0]
- classmethod create(project_id, name, true_positive_value=1, true_negative_value=1, false_positive_value=-1, false_negative_value=-1)
Create a payoff matrix associated with a specific project.
- Parameters:
project_id (str) – id of the project with which the payoff matrix will be associated
name (str) – User-supplied label for the payoff matrix
true_positive_value (float) – Cost or benefit of a true positive classification
true_negative_value (float) – Cost or benefit of a true negative classification
false_positive_value (float) – Cost or benefit of a false positive classification
false_negative_value (float) – Cost or benefit of a false negative classification
- Returns:
payoff_matrix – The newly created payoff matrix
- Return type:
PayoffMatrix
- classmethod list(project_id)
Fetch all the payoff matrices for a project.
- Parameters:
project_id (str) – id of the project
- Returns:
A list of PayoffMatrix objects
- Return type:
List of PayoffMatrix
- Raises:
datarobot.errors.ClientError – if the server responded with 4xx status
datarobot.errors.ServerError – if the server responded with 5xx status
- classmethod get(project_id, id)
Retrieve a specified payoff matrix.
- Parameters:
project_id (str) – id of the project the model belongs to
id (str) – id of the payoff matrix
- Return type:
PayoffMatrix
- Returns:
PayoffMatrix object representing the specified payoff matrix
- Raises:
datarobot.errors.ClientError – if the server responded with 4xx status
datarobot.errors.ServerError – if the server responded with 5xx status
- classmethod update(project_id, id, name, true_positive_value, true_negative_value, false_positive_value, false_negative_value)
Update (replace) a payoff matrix. Note that all data fields are required.
- Parameters:
project_id (str) – id of the project to which the payoff matrix belongs
id (str) – id of the payoff matrix
name (str) – User-supplied label for the payoff matrix
true_positive_value (float) – True positive payoff value to use for the profit curve
true_negative_value (float) – True negative payoff value to use for the profit curve
false_positive_value (float) – False positive payoff value to use for the profit curve
false_negative_value (float) – False negative payoff value to use for the profit curve
- Returns:
payoff_matrix – PayoffMatrix with updated values
- Return type:
PayoffMatrix
- Raises:
datarobot.errors.ClientError – if the server responded with 4xx status
datarobot.errors.ServerError – if the server responded with 5xx status
- classmethod delete(project_id, id)
Delete a specified payoff matrix.
- Parameters:
project_id (str) – id of the project the model belongs to
id (str) – id of the payoff matrix
- Returns:
response – Empty response (204)
- Return type:
requests.Response
- Raises:
datarobot.errors.ClientError – if the server responded with 4xx status
datarobot.errors.ServerError – if the server responded with 5xx status
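A sketch of the full edit cycle with placeholder IDs. Because update replaces the matrix, every field must be passed:

import datarobot as dr

project_id = "<project-id>"

matrix = dr.PayoffMatrix.list(project_id)[0]
updated = dr.PayoffMatrix.update(
    project_id,
    matrix.id,
    name="Q3 payoff assumptions",
    true_positive_value=120,
    true_negative_value=10,
    false_positive_value=-5,
    false_negative_value=-40,
)
dr.PayoffMatrix.delete(project_id, matrix.id)  # returns an empty 204 response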
- classmethod from_data(data)
Instantiate an object of this class using a dict.
- Parameters:
data (dict) – Correctly snake_cased keys and their values.
- Return type:
TypeVar(T, bound=APIObject)
- classmethod from_server_data(data, keep_attrs=None)
Instantiate an object of this class using the data directly from the server, meaning that the keys may have the wrong camel casing
- Parameters:
data (dict) – The directly translated dict of JSON from the server. No casing fixes have taken place.
keep_attrs (iterable) – List, set or tuple of the dotted namespace notations for attributes to keep within the object structure even if their values are None
- Return type:
TypeVar(T, bound=APIObject)
Prediction explanations
- class datarobot.PredictionExplanationsInitialization
Represents a prediction explanations initialization of a model.
- Variables:
project_id (str) – id of the project the model belongs to
model_id (str) – id of the model the prediction explanations initialization is for
prediction_explanations_sample (list of dict) – a small sample of prediction explanations that could be generated for the model
- classmethod get(project_id, model_id)
Retrieve the prediction explanations initialization for a model.
Prediction explanations initializations are a prerequisite for computing prediction explanations, and include a sample of what the computed prediction explanations for a prediction dataset would look like.
- Parameters:
project_id (str) – id of the project the model belongs to
model_id (str) – id of the model the prediction explanations initialization is for
- Returns:
prediction_explanations_initialization – The queried instance.
- Return type:
PredictionExplanationsInitialization
- Raises:
ClientError – If the project or model does not exist or the initialization has not been computed.
- classmethod create(project_id, model_id)
Create a prediction explanations initialization for the specified model.
- Parameters:
project_id (str) – id of the project the model belongs to
model_id (str) – id of the model for which initialization is requested
- Returns:
job – an instance of created async job
- Return type:
Job
- delete()
Delete this prediction explanations initialization.
- class datarobot.PredictionExplanations
Represents prediction explanations metadata and provides access to computation results.
Examples
prediction_explanations = dr.PredictionExplanations.get(project_id, explanations_id)

for row in prediction_explanations.get_rows():
    print(row)  # row is an instance of PredictionExplanationsRow
- Variables:
id (str) – id of the record and prediction explanations computation result
project_id (str) – id of the project the model belongs to
model_id (str) – id of the model the prediction explanations are for
dataset_id (str) – id of the prediction dataset prediction explanations were computed for
max_explanations (int) – maximum number of prediction explanations to supply per row of the dataset
threshold_low (float) – the lower threshold, below which a prediction must score in order for prediction explanations to be computed for a row in the dataset
threshold_high (float) – the high threshold, above which a prediction must score in order for prediction explanations to be computed for a row in the dataset
num_columns (int) – the number of columns prediction explanations were computed for
finish_time (float) – timestamp referencing when computation for these prediction explanations finished
prediction_explanations_location (str) – where to retrieve the prediction explanations
source (str) – For OTV/TS in-training predictions. Holds the portion of the training dataset used to generate predictions.
- classmethod get(project_id, prediction_explanations_id)
Retrieve a specific prediction explanations metadata.
- Parameters:
project_id (str) – id of the project the explanations belong to
prediction_explanations_id (str) – id of the prediction explanations
- Returns:
prediction_explanations – The queried instance.
- Return type:
PredictionExplanations
- classmethod create(project_id, model_id, dataset_id, max_explanations=None, threshold_low=None, threshold_high=None, mode=None)
Create prediction explanations for the specified dataset.
In order to create PredictionExplanations for a particular model and dataset, you must first:
1. Compute feature impact for the model via datarobot.Model.get_feature_impact()
2. Compute a PredictionExplanationsInitialization for the model via datarobot.PredictionExplanationsInitialization.create(project_id, model_id)
3. Compute predictions for the model and dataset via datarobot.Model.request_predictions(dataset_id)
threshold_high and threshold_low are optional filters applied to speed up computation. When at least one is specified, only the selected outlier rows will have prediction explanations computed. Rows are considered outliers if their predicted value (in the case of regression projects) or probability of being the positive class (in the case of classification projects) is less than threshold_low or greater than threshold_high. If neither is specified, prediction explanations will be computed for all rows. A full workflow sketch follows this method.
- Parameters:
project_id (str) – id of the project the model belongs to
model_id (str) – id of the model for which prediction explanations are requested
dataset_id (str) – id of the prediction dataset for which prediction explanations are requested
threshold_low (Optional[float]) – the lower threshold, below which a prediction must score in order for prediction explanations to be computed for a row in the dataset. If neither threshold_high nor threshold_low is specified, prediction explanations will be computed for all rows.
threshold_high (Optional[float]) – the high threshold, above which a prediction must score in order for prediction explanations to be computed. If neither threshold_high nor threshold_low is specified, prediction explanations will be computed for all rows.
max_explanations (Optional[int]) – the maximum number of prediction explanations to supply per row of the dataset, default: 3.
mode (PredictionExplanationsMode, optional) – mode of calculation for multiclass models; if not specified, the server default is to explain only the predicted class, identical to passing TopPredictionsMode(1).
- Returns:
job – an instance of created async job
- Return type:
Job
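The three prerequisites and the create call fit together as below; a sketch with placeholder IDs, using the generic Job helpers (wait_for_completion / get_result_when_complete):

import datarobot as dr

project_id, model_id, dataset_id = "<project-id>", "<model-id>", "<dataset-id>"
model = dr.Model.get(project_id, model_id)

# 1. feature impact must exist for the model
#    (run model.request_feature_impact() first if it has not been computed)
model.get_feature_impact()
# 2. prediction explanations initialization
dr.PredictionExplanationsInitialization.create(project_id, model_id).wait_for_completion()
# 3. predictions for the dataset
model.request_predictions(dataset_id).wait_for_completion()

# now request explanations, restricted to outlier rows
job = dr.PredictionExplanations.create(
    project_id, model_id, dataset_id,
    max_explanations=5, threshold_low=0.2, threshold_high=0.8,
)
explanations = job.get_result_when_complete()
for row in explanations.get_rows():
    print(row.row_id, row.prediction)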
- classmethod create_on_training_data(project_id, model_id, dataset_id, max_explanations=None, threshold_low=None, threshold_high=None, mode=None, datetime_prediction_partition=None)
Create prediction explanations for the dataset used to train the model. This can be retrieved by calling dr.Model.get().featurelist_id. For OTV and time series projects, datetime_prediction_partition is required and limited to the first backtest ('0') or holdout ('holdout').
In order to create PredictionExplanations for a particular model and dataset, you must first:
1. Compute feature impact for the model via datarobot.Model.get_feature_impact().
2. Compute a PredictionExplanationsInitialization for the model via datarobot.PredictionExplanationsInitialization.create(project_id, model_id).
3. Compute predictions for the model and dataset via datarobot.Model.request_predictions(dataset_id).
threshold_high and threshold_low are optional filters applied to speed up computation. When at least one is specified, only the selected outlier rows will have prediction explanations computed. Rows are considered outliers if their predicted value (in the case of regression projects) or probability of being the positive class (in the case of classification projects) is less than threshold_low or greater than threshold_high. If neither is specified, prediction explanations will be computed for all rows.
- Parameters:
project_id (str) – The ID of the project the model belongs to.
model_id (str) – The ID of the model for which prediction explanations are requested.
dataset_id (str) – The ID of the prediction dataset for which prediction explanations are requested.
threshold_low (Optional[float]) – The lower threshold, below which a prediction must score in order for prediction explanations to be computed for a row in the dataset. If neither threshold_high nor threshold_low is specified, prediction explanations will be computed for all rows.
threshold_high (Optional[float]) – The high threshold, above which a prediction must score in order for prediction explanations to be computed. If neither threshold_high nor threshold_low is specified, prediction explanations will be computed for all rows.
max_explanations (Optional[int]) – The maximum number of prediction explanations to supply per row of the dataset (default: 3).
mode (PredictionExplanationsMode, optional) – The mode of calculation for multiclass models. If not specified, the server default is to explain only the predicted class, identical to passing TopPredictionsMode(1).
datetime_prediction_partition (str) – Options: '0', 'holdout' or None. Used only by time series and OTV projects to indicate which part of the dataset will be used to generate predictions for computing prediction explanations. Current options are '0' (first backtest) and 'holdout'. Note that only the validation partition of the first backtest will be used to generate predictions.
- Returns:
job – An instance of created async job.
- Return type:
Job
- classmethod list(project_id, model_id=None, limit=None, offset=None)
List prediction explanations metadata for a specified project.
- Parameters:
project_id (str) – id of the project to list prediction explanations for
model_id (Optional[str]) – if specified, only prediction explanations computed for this model will be returned
limit (int or None) – at most this many results are returned, default: no limit
offset (int or None) – this many results will be skipped, default: 0
- Returns:
prediction_explanations
- Return type:
list[PredictionExplanations]
- get_rows(batch_size=None, exclude_adjusted_predictions=True)
Retrieve prediction explanations rows.
- Parameters:
batch_size (int or None, optional) – maximum number of prediction explanations rows to retrieve per request
exclude_adjusted_predictions (bool) – Optional, defaults to True. Set to False to include adjusted predictions, which will differ from the predictions on some projects, e.g. those with an exposure column specified.
- Yields:
prediction_explanations_row (PredictionExplanationsRow) – Represents prediction explanations computed for a prediction row.
- is_multiclass()
Whether these explanations are for a multiclass project or a non-multiclass project.
- is_unsupervised_clustering_or_multiclass()
Clustering and multiclass XEMP explanations always have exactly one of the num_top_classes or class_names parameters set.
- get_number_of_explained_classes()
How many classes we attempt to explain for each row.
- get_all_as_dataframe(exclude_adjusted_predictions=True)
Retrieve all prediction explanations rows and return them as a pandas.DataFrame.
Returned dataframe has the following structure:
row_id : row id from prediction dataset
prediction : the output of the model for this row
adjusted_prediction : adjusted prediction values (only appears for projects that utilize prediction adjustments, e.g. projects with an exposure column)
class_0_label : a class level from the target (only appears for classification projects)
class_0_probability : the probability that the target is this class (only appears for classification projects)
class_1_label : a class level from the target (only appears for classification projects)
class_1_probability : the probability that the target is this class (only appears for classification projects)
explanation_0_feature : the name of the feature contributing to the prediction for this explanation
explanation_0_feature_value : the value the feature took on
explanation_0_label : the output being driven by this explanation. For regression projects, this is the name of the target feature. For classification projects, this is the class label whose probability increasing would correspond to a positive strength.
explanation_0_qualitative_strength : a human-readable description of how strongly the feature affected the prediction (e.g. '+++', '--', '+') for this explanation
explanation_0_per_ngram_text_explanations : Text prediction explanations data in json formatted string.
explanation_0_strength : the amount this feature’s value affected the prediction
…
explanation_N_feature : the name of the feature contributing to the prediction for this explanation
explanation_N_feature_value : the value the feature took on
explanation_N_label : the output being driven by this explanation. For regression projects, this is the name of the target feature. For classification projects, this is the class label whose probability increasing would correspond to a positive strength.
explanation_N_qualitative_strength : a human-readable description of how strongly the feature affected the prediction (e.g. '+++', '--', '+') for this explanation
explanation_N_per_ngram_text_explanations : Text prediction explanations data in json formatted string.
explanation_N_strength : the amount this feature’s value affected the prediction
For classification projects, the server does not guarantee any ordering on the prediction values, however within this function we sort the values so that class_X corresponds to the same class from row to row.
- Parameters:
exclude_adjusted_predictions (bool) – Optional, defaults to True. Set this to False to include adjusted prediction values in the returned dataframe.
- Returns:
dataframe
- Return type:
pandas.DataFrame
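For example, to pull the strongest single explanation per row into a compact view, assuming prediction_explanations was retrieved as in the earlier example (column names follow the structure above):

df = prediction_explanations.get_all_as_dataframe()
compact = df[["row_id", "prediction", "explanation_0_feature", "explanation_0_strength"]]
# rows whose top explanation pushed the prediction hardest
print(compact.sort_values("explanation_0_strength", ascending=False).head())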
- download_to_csv(filename, encoding='utf-8', exclude_adjusted_predictions=True)
Save prediction explanations rows into CSV file.
- Parameters:
filename (str or file object) – path or file object to save prediction explanations rows
encoding (string, optional) – A string representing the encoding to use in the output file, defaults to 'utf-8'
exclude_adjusted_predictions (bool) – Optional, defaults to True. Set to False to include adjusted predictions, which will differ from the predictions on some projects, e.g. those with an exposure column specified.
- get_prediction_explanations_page(limit=None, offset=None, exclude_adjusted_predictions=True)
Get prediction explanations.
If you don't want to use the generator interface, you can access paginated prediction explanations directly.
- Parameters:
limit (int or None) – the number of records to return; the server will use a (possibly finite) default if not specified
offset (int or None) – the number of records to skip, default 0
exclude_adjusted_predictions (bool) – Optional, defaults to True. Set to False to include adjusted predictions, which will differ from the predictions on some projects, e.g. those with an exposure column specified.
- Returns:
prediction_explanations
- Return type:
PredictionExplanationsPage
- delete()
Delete these prediction explanations.
- class datarobot.models.prediction_explanations.PredictionExplanationsRow
Represents prediction explanations computed for a prediction row.
Notes
PredictionValue contains:
label : describes what this model output corresponds to. For regression projects, it is the name of the target feature. For classification projects, it is a level from the target feature.
value : the output of the prediction. For regression projects, it is the predicted value of the target. For classification projects, it is the predicted probability that the row belongs to the class identified by the label.

PredictionExplanation contains:
label : describes what output was driven by this explanation. For regression projects, it is the name of the target feature. For classification projects, it is the class whose probability increasing would correspond to a positive strength of this prediction explanation.
feature : the name of the feature contributing to the prediction
feature_value : the value the feature took on for this row
strength : the amount this feature's value affected the prediction
qualitative_strength : a human-readable description of how strongly the feature affected the prediction. A large positive effect is denoted '+++', medium '++', small '+', very small '<+'. A large negative effect is denoted '---', medium '--', small '-', very small '<-'.
- Variables:
row_id (int) – which row this PredictionExplanationsRow describes
prediction (float) – the output of the model for this row
adjusted_prediction (float or None) – adjusted prediction value for projects that provide this information, None otherwise
prediction_values (list) – an array of dictionaries with a schema described as PredictionValue
adjusted_prediction_values (list) – same as prediction_values but for adjusted predictions
prediction_explanations (list) – an array of dictionaries with a schema described as PredictionExplanation
- class datarobot.models.prediction_explanations.PredictionExplanationsPage
Represents a batch of prediction explanations received by one request.
- Variables:
id (str) – id of the prediction explanations computation result
data (list[dict]) – list of raw prediction explanations; each row corresponds to a row of the prediction dataset
count (int) – total number of rows computed
previous_page (str) – where to retrieve the previous page of prediction explanations, None if the current page is the first
next_page (str) – where to retrieve the next page of prediction explanations, None if the current page is the last
prediction_explanations_record_location (str) – where to retrieve the prediction explanations metadata
adjustment_method (str) – Adjustment method that was applied to predictions, or 'N/A' if no adjustments were done.
- classmethod get(project_id, prediction_explanations_id, limit=None, offset=0, exclude_adjusted_predictions=True)
Retrieve prediction explanations.
- Parameters:
project_id (str) – id of the project the model belongs to
prediction_explanations_id (str) – id of the prediction explanations
limit (int or None) – the number of records to return; the server will use a (possibly finite) default if not specified
offset (int or None) – the number of records to skip, default 0
exclude_adjusted_predictions (bool) – Optional, defaults to True. Set to False to include adjusted predictions, which will differ from the predictions on some projects, e.g. those with an exposure column specified.
- Returns:
prediction_explanations – The queried instance.
- Return type:
PredictionExplanationsPage
- class datarobot.models.ShapMatrix
Represents SHAP-based prediction explanations and provides access to score values.
- Variables:
project_id (str) – id of the project the model belongs to
shap_matrix_id (str) – id of the generated SHAP matrix
model_id (str) – id of the model used to calculate the SHAP values
dataset_id (str) – id of the prediction dataset SHAP values were computed for
Examples
import datarobot as dr

# request SHAP matrix calculation
shap_matrix_job = dr.ShapMatrix.create(project_id, model_id, dataset_id)
shap_matrix = shap_matrix_job.get_result_when_complete()

# list available SHAP matrices
shap_matrices = dr.ShapMatrix.list(project_id)
shap_matrix = shap_matrices[0]

# get SHAP matrix as dataframe
shap_matrix_values = shap_matrix.get_as_dataframe()
- classmethod create(project_id, model_id, dataset_id)
Calculate SHAP-based prediction explanations against a previously uploaded dataset.
- Parameters:
project_id (str) – id of the project the model belongs to
model_id (str) – id of the model for which prediction explanations are requested
dataset_id (str) – id of the prediction dataset for which prediction explanations are requested (as uploaded from Project.upload_dataset)
- Returns:
job – The job computing the SHAP based prediction explanations
- Return type:
- Raises:
ClientError – If the server responded with 4xx status. Possible reasons are project, model or dataset don’t exist, user is not allowed or model doesn’t support SHAP based prediction explanations
ServerError – If the server responded with 5xx status
- classmethod list(project_id)
Fetch all the computed SHAP prediction explanations for a project.
- Parameters:
project_id (str) – id of the project
- Returns:
A list of ShapMatrix objects
- Return type:
List of ShapMatrix
- Raises:
datarobot.errors.ClientError – if the server responded with 4xx status
datarobot.errors.ServerError – if the server responded with 5xx status
- classmethod get(project_id, id)
Retrieve the specific SHAP matrix.
- Parameters:
project_id (str) – id of the project the model belongs to
id (str) – id of the SHAP matrix
- Return type:
ShapMatrix
- Returns:
ShapMatrix object representing the specified record
- get_as_dataframe(read_timeout=60)
Retrieve SHAP matrix values as dataframe.
- Parameters:
read_timeout (int, optional, default 60) – Wait this many seconds for the server to respond. (Added in version 2.29.)
- Returns:
dataframe – A dataframe with SHAP scores
- Return type:
pandas.DataFrame
- Raises:
datarobot.errors.ClientError – if the server responded with 4xx status.
datarobot.errors.ServerError – if the server responded with 5xx status.
- class datarobot.models.ClassListMode
Calculate prediction explanations for the specified classes in each row.
- Variables:
class_names (list) – List of class names that will be explained for each dataset row.
- get_api_parameters(batch_route=False)
Get the parameters passed in the corresponding API call.
- Parameters:
batch_route (bool) – Batch routes describe prediction calls with all possible parameters, so to distinguish explanation parameters from others they are prefixed in the parameters.
- Return type:
dict
- class datarobot.models.TopPredictionsMode
Calculate prediction explanations for the number of top predicted classes in each row.
- Variables:
num_top_classes (int) – Number of top predicted classes [1..10] that will be explained for each dataset row.
- get_api_parameters(batch_route=False)
Get the parameters passed in the corresponding API call.
- Parameters:
batch_route (bool) – Batch routes describe prediction calls with all possible parameters, so to distinguish explanation parameters from others they are prefixed in the parameters.
- Return type:
dict
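A sketch of passing either mode to PredictionExplanations.create for a multiclass project. The constructor arguments shown are assumptions based on the variables documented above (a class-name list for ClassListMode, an integer for TopPredictionsMode); IDs are placeholders:

import datarobot as dr
from datarobot.models import ClassListMode, TopPredictionsMode

project_id, model_id, dataset_id = "<project-id>", "<model-id>", "<dataset-id>"

# explain the three most probable classes per row
job = dr.PredictionExplanations.create(
    project_id, model_id, dataset_id, mode=TopPredictionsMode(3)
)

# or explain a fixed set of classes per row
job = dr.PredictionExplanations.create(
    project_id, model_id, dataset_id,
    mode=ClassListMode(["class_a", "class_b"]),
)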
Rating table
- class datarobot.models.RatingTable
Interface to modify and download rating tables.
- Variables:
id (str) – The id of the rating table.
project_id (str) – The id of the project this rating table belongs to.
rating_table_name (str) – The name of the rating table.
original_filename (str) – The name of the file used to create the rating table.
parent_model_id (str) – The model id of the model the rating table was validated against.
model_id (str) – The model id of the model that was created from the rating table. Can be None if a model has not been created from the rating table.
model_job_id (str) – The id of the job to create a model from this rating table. Can be None if a model has not been created from the rating table.
validation_job_id (str) – The id of the created job to validate the rating table. Can be None if the rating table has not been validated.
validation_error (str) – Contains a description of any errors caused during validation.
- classmethod from_server_data(data, should_warn=True, keep_attrs=None)
Instantiate an object of this class using the data directly from the server, meaning that the keys may have the wrong camel casing
- Parameters:
data (dict) – The directly translated dict of JSON from the server. No casing fixes have taken place.
should_warn (bool) – Whether or not to issue a warning if an invalid rating table is being retrieved.
- Return type:
RatingTable
- classmethod get(project_id, rating_table_id)
Retrieve a single rating table
- Parameters:
project_id (str) – The ID of the project the rating table is associated with.
rating_table_id (str) – The ID of the rating table.
- Returns:
rating_table – The queried instance
- Return type:
RatingTable
- classmethod create(project_id, parent_model_id, filename, rating_table_name='Uploaded Rating Table')
Uploads and validates a new rating table CSV.
- Parameters:
project_id (str) – id of the project the rating table belongs to
parent_model_id (str) – id of the model against which this rating table should be validated
filename (str) – The path of the CSV file containing the modified rating table.
rating_table_name (Optional[str]) – A human-friendly name for the new rating table. The string may be truncated and a suffix may be added to maintain unique names of all rating tables.
- Returns:
job – an instance of created async job
- Return type:
Job
- Raises:
InputNotUnderstoodError – Raised if filename isn't one of the supported types.
ClientError – Raised if parent_model_id is invalid.
- download(filepath)
Download a CSV file containing the contents of this rating table.
- Parameters:
filepath (str) – The path at which to save the rating table file.
- Return type:
None
- rename(rating_table_name)
Rename the rating table.
- Parameters:
rating_table_name (str) – The new name for the rating table.
- Return type:
None
- create_model()
Creates a new model from this rating table record. This rating table must not already be associated with a model and must be valid.
- Returns:
job – an instance of created async job
- Return type:
Job
- Raises:
ClientError – Raised if creating model from a RatingTable that failed validation
JobAlreadyRequested – Raised if creating model from a RatingTable that is already associated with a RatingTableModel
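The intended round trip is download, edit offline, re-upload, then build a model. A sketch with placeholder IDs:

import datarobot as dr

project_id, rating_table_id = "<project-id>", "<rating-table-id>"

table = dr.RatingTable.get(project_id, rating_table_id)
table.download("rating_table.csv")

# ... edit rating_table.csv offline, then validate it against the parent model
upload_job = dr.RatingTable.create(
    project_id, table.parent_model_id, "rating_table.csv",
    rating_table_name="Manually adjusted table",
)
new_table = upload_job.get_result_when_complete()

model_job = new_table.create_model()  # fails if validation reported errors
new_model = model_job.get_result_when_complete()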
ROC curve
- class datarobot.models.roc_curve.RocCurve
ROC curve data for model.
- Variables:
source (str) – ROC curve data source. Can be 'validation', 'crossValidation' or 'holdout'.
roc_points (list of dict) – List of precalculated metrics associated with thresholds for the ROC curve.
negative_class_predictions (list of float) – List of example predictions for the negative class
positive_class_predictions (list of float) – List of example predictions for the positive class
source_model_id (str) – ID of the model this ROC curve represents; in some cases, insights from the parent of a frozen model may be used
data_slice_id (str) – ID of the data slice this ROC curve represents.
- classmethod from_server_data(data, keep_attrs=None, use_insights_format=False, **kwargs)
Override APIObject.from_server_data to handle ROC curve data retrieved from either the legacy URL or the new /insights/ URL.
- Parameters:
data (dict) – The directly translated dict of JSON from the server. No casing fixes have taken place.
keep_attrs (iterable) – List, set or tuple of the dotted namespace notations for attributes to keep within the object structure even if their values are None
use_insights_format (Optional[bool]) – Whether to repack the data from the format used in the GET /insights/RocCur/ URL to the format used in the legacy URL.
- Return type:
RocCurve
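A sketch of choosing a classification threshold from the precalculated points, assuming the curve is retrieved via dr.Model.get_roc_curve(source) and that each roc point carries 'threshold' and 'f1_score' keys (the key names are assumptions; inspect one dict to confirm):

import datarobot as dr

project_id, model_id = "<project-id>", "<model-id>"
model = dr.Model.get(project_id, model_id)

roc = model.get_roc_curve("validation")
# pick the precalculated point with the best F1 score
best = max(roc.roc_points, key=lambda p: p.get("f1_score", 0.0))
print(best.get("threshold"), best.get("f1_score"))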
- class datarobot.models.roc_curve.LabelwiseRocCurve
Labelwise ROC curve data for one label and one source.
- Variables:
source (str) – ROC curve data source. Can be 'validation', 'crossValidation' or 'holdout'.
roc_points (list of dict) – List of precalculated metrics associated with thresholds for the ROC curve.
negative_class_predictions (list of float) – List of example predictions for the negative class
positive_class_predictions (list of float) – List of example predictions for the positive class
source_model_id (str) – ID of the model this ROC curve represents; in some cases, insights from the parent of a frozen model may be used
label (str) – Label name the ROC curve is calculated for
kolmogorov_smirnov_metric (float) – Kolmogorov-Smirnov metric value for the label
auc (float) – AUC metric value for the label
Word Cloud
- class datarobot.models.word_cloud.WordCloud
Word cloud data for the model.
Notes
WordCloudNgram is a dict containing the following:
ngram (str) Word or ngram value.
coefficient (float) Value from the [-1.0, 1.0] range; describes the effect of this ngram on the target. A large negative value means a strong effect toward the negative class in classification and a smaller target value in regression models. A large positive value means an effect toward the positive class and a bigger target value, respectively.
count (int) Number of rows in the training sample where this ngram appears.
frequency (float) Value from the (0.0, 1.0] range; the relative frequency of the given ngram to the most frequent ngram.
is_stopword (bool) True for ngrams that DataRobot evaluates as stopwords.
class (str or None) For classification, the value of the target class for the corresponding word or ngram. For regression, None.
- Variables:
ngrams (list of dict) – List of dicts with the schema described as WordCloudNgram above.
- most_frequent(top_n=5)
Return most frequent ngrams in the word cloud.
- Parameters:
top_n (int) – Number of ngrams to return
- Returns:
Up to top_n most frequent ngrams in the word cloud. If top_n is bigger than the total number of ngrams in the word cloud, all ngrams are returned, sorted by frequency in descending order.
- Return type:
list of dict
- most_important(top_n=5)
Return most important ngrams in the word cloud.
- Parameters:
top_n (int) – Number of ngrams to return
- Returns:
Up to top_n most important ngrams in the word cloud. If top_n is bigger than the total number of ngrams in the word cloud, all ngrams are returned, sorted by absolute coefficient value in descending order.
- Return type:
list of dict
- ngrams_per_class()
Split ngrams per target class values. Useful for multiclass models.
- Returns:
Dictionary in the format of (class label) -> (list of ngrams for that class)
- Return type:
dict
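A usage sketch with placeholder IDs, assuming the word cloud is retrieved through dr.Model.get_word_cloud():

import datarobot as dr

project_id, model_id = "<project-id>", "<model-id>"
model = dr.Model.get(project_id, model_id)

word_cloud = model.get_word_cloud()
for ngram in word_cloud.most_important(top_n=10):
    # each entry follows the WordCloudNgram schema above
    print(ngram["ngram"], round(ngram["coefficient"], 3), ngram["count"])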
- class datarobot.models.word_cloud.WordCloudNgram