Anomaly assessment
- class datarobot.models.anomaly_assessment.AnomalyAssessmentRecord
Holds metadata about the anomaly assessment insight for a particular subset, backtest, and series, along with the links used to retrieve the anomaly assessment data.
Added in version v2.25.
- Variables:
record_id (str) – The ID of the record.
project_id (str) – The ID of the project the record belongs to.
model_id (str) – The ID of the model the record belongs to.
backtest (int or "holdout") – The backtest of the record.
source ("training" or "validation") – The source of the record.
series_id (str or None) – The series ID of the record. Defined only for multiseries projects.
status (str) – The status of the insight. One of datarobot.enums.AnomalyAssessmentStatus.
status_details (str) – The explanation of the status.
start_date (str or None) – The ISO-formatted timestamp of the first prediction in the subset. Will be None if status is not AnomalyAssessmentStatus.COMPLETED.
end_date (str or None) – The ISO-formatted timestamp of the last prediction in the subset. Will be None if status is not AnomalyAssessmentStatus.COMPLETED.
prediction_threshold (float or None) – The threshold: SHAP explanations are computed for all rows with anomaly scores greater than or equal to it.
preview_location (str or None) – The URL to retrieve the predictions preview for the subset. Will be None if status is not AnomalyAssessmentStatus.COMPLETED.
latest_explanations_location (str or None) – The URL to retrieve the latest predictions with the SHAP explanations. Will be None if status is not AnomalyAssessmentStatus.COMPLETED.
delete_location (str) – The URL to delete the anomaly assessment record and the relevant insight data.
- classmethod list(project_id, model_id, backtest=None, source=None, series_id=None, limit=100, offset=0, with_data_only=False)
Retrieve the list of the anomaly assessment records for the project and model. Output can be filtered and limited.
- Parameters:
project_id (str) – The ID of the project the record belongs to.
model_id (str) – The ID of the model the record belongs to.
backtest (int or "holdout") – The backtest to filter records by.
source ("training" or "validation") – The source to filter records by.
series_id (Optional[str]) – The series ID to filter records by. Can be specified for multiseries projects.
limit (Optional[int]) – 100 by default. At most this many results are returned.
offset (Optional[int]) – This many results will be skipped.
with_data_only (bool, False by default) – Filter by status == AnomalyAssessmentStatus.COMPLETED. If True, records with no data or that are not supported will be omitted.
- Returns:
The list of anomaly assessment records.
- Return type:
list of AnomalyAssessmentRecord
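The limit and offset parameters described above can be combined to page through all records. The following is a minimal sketch, not part of the client: `list_all_completed` and `FakeRecord` are hypothetical names, and `FakeRecord` only stands in for a class exposing the documented `list` signature so that the helper can run without a server.

```python
def list_all_completed(record_cls, project_id, model_id, page_size=100):
    """Collect every record that has data by paging with limit/offset.

    record_cls is assumed to expose the documented classmethod
    list(project_id, model_id, ..., limit, offset, with_data_only).
    """
    records = []
    offset = 0
    while True:
        page = record_cls.list(
            project_id,
            model_id,
            limit=page_size,
            offset=offset,
            with_data_only=True,  # omit records without data
        )
        records.extend(page)
        if len(page) < page_size:  # a short page means nothing is left
            break
        offset += page_size
    return records


class FakeRecord:
    """Hypothetical in-memory stand-in, used only to exercise the helper."""

    _store = ["record-%d" % i for i in range(5)]

    @classmethod
    def list(cls, project_id, model_id, backtest=None, source=None,
             series_id=None, limit=100, offset=0, with_data_only=False):
        # Mimic limit/offset semantics on a plain list.
        return cls._store[offset:offset + limit]


all_records = list_all_completed(FakeRecord, "project-id", "model-id", page_size=2)
```

With the real client, `AnomalyAssessmentRecord` itself would presumably be passed as `record_cls`; that usage is untested here.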
- classmethod compute(project_id, model_id, backtest, source, series_id=None)
Request anomaly assessment insight computation on the specified subset.
- Parameters:
project_id (str) – The ID of the project to compute insight for.
model_id (str) – The ID of the model to compute insight for.
backtest (int or "holdout") – The backtest to compute insight for.
source ("training" or "validation") – The source to compute insight for.
series_id (Optional[str]) – The series ID to compute insight for. Required for multiseries projects.
- Returns:
The anomaly assessment record.
- Return type:
AnomalyAssessmentRecord
- delete()
Delete the anomaly assessment record along with its preview and explanations.
- Return type:
None
- get_predictions_preview()
Retrieve aggregated predictions statistics for the anomaly assessment record.
- Return type:
AnomalyAssessmentPredictionsPreview
- get_latest_explanations()
Retrieve the latest predictions along with SHAP explanations for the most anomalous records.
- Return type:
AnomalyAssessmentExplanations
- get_explanations(start_date=None, end_date=None, points_count=None)
Retrieve predictions along with SHAP explanations for the most anomalous records in the specified date range or for a defined number of points. Two of the three parameters start_date, end_date, and points_count must be specified.
- Parameters:
start_date (Optional[str]) – The start of the date range to get explanations in. Example: 2020-01-01T00:00:00.000000Z
end_date (Optional[str]) – The end of the date range to get explanations in. Example: 2020-10-01T00:00:00.000000Z
points_count (Optional[int]) – The number of rows to return.
- Return type:
AnomalyAssessmentExplanations
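The "two out of three" rule for start_date, end_date, and points_count can be checked before a request is made. The sketch below is a hypothetical helper, not part of the client, and it assumes the rule means exactly two arguments (passing all three is treated as invalid).

```python
def has_valid_range_args(start_date=None, end_date=None, points_count=None):
    """Return True when exactly two of the three arguments are provided.

    Assumption: "two out of three" is read as exactly two, so supplying
    all three arguments is rejected as well as supplying fewer than two.
    """
    provided = [arg is not None for arg in (start_date, end_date, points_count)]
    return sum(provided) == 2


# A date range, and a start date combined with a row count, both qualify.
range_ok = has_valid_range_args(
    start_date="2020-01-01T00:00:00.000000Z",
    end_date="2020-10-01T00:00:00.000000Z",
)
count_ok = has_valid_range_args(
    start_date="2020-01-01T00:00:00.000000Z",
    points_count=50,
)
```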
- get_explanations_data_in_regions(regions, prediction_threshold=0.0)
Get predictions along with explanations for the specified regions, sorted by predictions in descending order.
- Parameters:
regions (list of AnomalyAssessmentPreviewBin) – For each region, explanations will be retrieved and merged.
prediction_threshold (Optional[float]) – If specified, only points with a score greater than or equal to the threshold will be returned.
- Returns:
A dict of the form {'explanations': explanations, 'shap_base_value': shap_base_value}.
- Return type:
RegionExplanationsData
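The merge described above (retrieve explanations per region, keep points at or above the threshold, sort by prediction in descending order) can be sketched in plain Python. This is an illustration of the described behavior, not the client's implementation: `merge_region_explanations` and `fetch_region` are hypothetical, and regions and data points are assumed to be plain dicts with the documented field names.

```python
def merge_region_explanations(regions, fetch_region, prediction_threshold=0.0):
    """Merge per-region explanations into one sorted result.

    fetch_region(start_date, end_date) is a hypothetical callable that
    returns {'shap_base_value': float, 'data': list of data points or None}.
    """
    explanations = []
    shap_base_value = None
    for region in regions:
        result = fetch_region(region["start_date"], region["end_date"])
        shap_base_value = result["shap_base_value"]
        points = result["data"] or []  # data may be None for an empty range
        explanations.extend(
            p for p in points if p["prediction"] >= prediction_threshold
        )
    # Highest predictions (most anomalous rows) first.
    explanations.sort(key=lambda p: p["prediction"], reverse=True)
    return {"explanations": explanations, "shap_base_value": shap_base_value}


def fake_fetch(start_date, end_date):
    """Hypothetical fetcher returning canned data keyed by region start."""
    canned = {
        "2020-01-01": [{"timestamp": "2020-01-02", "prediction": 0.2, "shap_explanation": None}],
        "2020-02-01": [{"timestamp": "2020-02-03", "prediction": 0.9, "shap_explanation": []}],
    }
    return {"shap_base_value": 0.1, "data": canned.get(start_date)}


merged = merge_region_explanations(
    [
        {"start_date": "2020-01-01", "end_date": "2020-02-01"},
        {"start_date": "2020-02-01", "end_date": "2020-03-01"},
    ],
    fake_fetch,
    prediction_threshold=0.0,
)
```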
- class datarobot.models.anomaly_assessment.AnomalyAssessmentExplanations
Holds predictions along with SHAP explanations for the most anomalous records in the specified date range or for a defined number of points.
Added in version v2.25.
- Variables:
record_id (str) – The ID of the record.
project_id (str) – The ID of the project the record belongs to.
model_id (str) – The ID of the model the record belongs to.
backtest (int or "holdout") – The backtest of the record.
source ("training" or "validation") – The source of the record.
series_id (str or None) – The series ID of the record. Defined only for multiseries projects.
start_date (str or None) – The ISO-formatted datetime of the first row in the data. Will be None if there is no data in the specified range.
end_date (str or None) – The ISO-formatted datetime of the last row in the data. Will be None if there is no data in the specified range.
shap_base_value (float) – The SHAP base value.
count (int) – The number of points in the data.
data (array of DataPoint objects or None) – The list of DataPoint objects in the specified date range.
Notes
DataPoint contains:
- shap_explanation : None or an array of up to 10 ShapleyFeatureContribution objects. Only rows with the highest anomaly scores have Shapley explanations calculated. Value is None if prediction is lower than prediction_threshold.
- timestamp (str) : ISO-formatted timestamp for the row.
- prediction (float) : The output of the model for this row.
ShapleyFeatureContribution contains:
- feature_value (str) : the feature value for this row. First 50 characters are returned.
- strength (float) : the SHAP value for this feature and row.
- feature (str) : the feature name.
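To illustrate how the DataPoint and ShapleyFeatureContribution fields fit together, the sketch below picks the strongest contributing feature for each explained row. It assumes the data points are accessible as plain dicts keyed by the field names listed above; `strongest_features` is a hypothetical helper and the sample values are made up.

```python
def strongest_features(data):
    """Map each explained row's timestamp to its strongest feature name.

    Rows whose shap_explanation is None or empty are skipped, since no
    explanations were computed for them.
    """
    top = {}
    for point in data or []:  # data may be None when the range is empty
        contributions = point["shap_explanation"]
        if not contributions:
            continue
        # Rank by absolute SHAP value so negative contributions count too.
        best = max(contributions, key=lambda c: abs(c["strength"]))
        top[point["timestamp"]] = best["feature"]
    return top


# Made-up sample shaped like the documented DataPoint structure.
sample_data = [
    {
        "timestamp": "2020-01-01T00:00:00.000000Z",
        "prediction": 0.95,
        "shap_explanation": [
            {"feature": "temperature", "feature_value": "41.5", "strength": 0.30},
            {"feature": "pressure", "feature_value": "2.1", "strength": -0.45},
        ],
    },
    {
        "timestamp": "2020-01-02T00:00:00.000000Z",
        "prediction": 0.10,
        "shap_explanation": None,  # below prediction_threshold
    },
]

top_features = strongest_features(sample_data)
```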
- classmethod get(project_id, record_id, start_date=None, end_date=None, points_count=None)
Retrieve predictions along with SHAP explanations for the most anomalous records in the specified date range or for a defined number of points. Two of the three parameters start_date, end_date, and points_count must be specified.
- Parameters:
project_id (str) – The ID of the project.
record_id (str) – The ID of the anomaly assessment record.
start_date (Optional[str]) – The start of the date range to get explanations in. Example: 2020-01-01T00:00:00.000000Z
end_date (Optional[str]) – The end of the date range to get explanations in. Example: 2020-10-01T00:00:00.000000Z
points_count (Optional[int]) – The number of rows to return.
- Return type:
AnomalyAssessmentExplanations
- class datarobot.models.anomaly_assessment.AnomalyAssessmentPredictionsPreview
Aggregated predictions over time for the corresponding anomaly assessment record. Intended for finding the bins with the highest anomaly scores.
Added in version v2.25.
- Variables:
record_id (str) – The ID of the record.
project_id (str) – The ID of the project the record belongs to.
model_id (str) – The ID of the model the record belongs to.
backtest (int or "holdout") – The backtest of the record.
source ("training" or "validation") – The source of the record.
series_id (str or None) – The series ID of the record. Defined only for multiseries projects.
start_date (str) – The ISO-formatted timestamp of the first prediction in the subset.
end_date (str) – The ISO-formatted timestamp of the last prediction in the subset.
preview_bins (list of preview_bin objects) – The aggregated predictions for the subset. Bin boundaries may differ from the actual start/end dates because this is an aggregation.
Notes
PreviewBin contains:
- start_date (str) : the ISO-formatted datetime of the start of the bin.
- end_date (str) : the ISO-formatted datetime of the end of the bin.
- avg_predicted (float or None) : the average prediction of the model in the bin. None if there are no entries in the bin.
- max_predicted (float or None) : the maximum prediction of the model in the bin. None if there are no entries in the bin.
- frequency (int) : the number of rows in the bin.
- classmethod get(project_id, record_id)
Retrieve aggregated predictions over time.
- Parameters:
project_id (str) – The ID of the project.
record_id (str) – The ID of the anomaly assessment record.
- Return type:
AnomalyAssessmentPredictionsPreview
- find_anomalous_regions(max_prediction_threshold=0.0)
Select the preview bins whose max_predicted value is greater than or equal to max_prediction_threshold, and sort the result by max_predicted in descending order.
- Parameters:
max_prediction_threshold (Optional[float]) – Return bins with a maximum anomaly score greater than or equal to max_prediction_threshold.
- Returns:
preview_bins – Filtered and sorted preview bins.
- Return type:
list of preview_bin
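The filter-and-sort behavior described for find_anomalous_regions can be sketched on plain dicts shaped like the PreviewBin structure from the Notes. This illustrates the described logic and is not the client's implementation: `find_anomalous_bins` is a hypothetical name, and skipping bins whose max_predicted is None (empty bins) is an assumption.

```python
def find_anomalous_bins(preview_bins, max_prediction_threshold=0.0):
    """Keep bins at or above the threshold, most anomalous first.

    Assumption: bins with max_predicted of None (no entries) are skipped.
    """
    selected = [
        b for b in preview_bins
        if b["max_predicted"] is not None
        and b["max_predicted"] >= max_prediction_threshold
    ]
    return sorted(selected, key=lambda b: b["max_predicted"], reverse=True)


# Made-up bins following the documented PreviewBin fields.
sample_bins = [
    {"start_date": "2020-01-01", "end_date": "2020-01-08",
     "avg_predicted": 0.2, "max_predicted": 0.4, "frequency": 7},
    {"start_date": "2020-01-08", "end_date": "2020-01-15",
     "avg_predicted": None, "max_predicted": None, "frequency": 0},
    {"start_date": "2020-01-15", "end_date": "2020-01-22",
     "avg_predicted": 0.6, "max_predicted": 0.9, "frequency": 7},
]

anomalous = find_anomalous_bins(sample_bins, max_prediction_threshold=0.5)
```

The selected bins could then serve as the regions argument of get_explanations_data_in_regions, as that method's parameter description suggests.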