Anomaly assessment

class datarobot.models.anomaly_assessment.AnomalyAssessmentRecord

Holds metadata about the anomaly assessment insight for a particular subset, backtest, and series, along with the links used to retrieve the anomaly assessment data.

Added in version v2.25.

Variables:
  • record_id (str) – The ID of the record.

  • project_id (str) – The ID of the project the record belongs to.

  • model_id (str) – The ID of the model the record belongs to.

  • backtest (int or "holdout") – The backtest of the record.

  • source ("training" or "validation") – The source of the record.

  • series_id (str or None) – The series ID of the record. Defined only for multiseries projects.

  • status (str) – The status of the insight. One of the datarobot.enums.AnomalyAssessmentStatus values.

  • status_details (str) – The explanation of the status.

  • start_date (str or None) – The ISO-formatted timestamp of the first prediction in the subset. Will be None if status is not AnomalyAssessmentStatus.COMPLETED.

  • end_date (str or None) – The ISO-formatted timestamp of the last prediction in the subset. Will be None if status is not AnomalyAssessmentStatus.COMPLETED.

  • prediction_threshold (float or None) – The anomaly score threshold: shap explanations are computed for all rows with anomaly scores greater than or equal to it.

  • preview_location (str or None) – The URL to retrieve the predictions preview for the subset. Will be None if status is not AnomalyAssessmentStatus.COMPLETED.

  • latest_explanations_location (str or None) – The URL to retrieve the latest predictions with the shap explanations. Will be None if status is not AnomalyAssessmentStatus.COMPLETED.

  • delete_location (str) – The URL to delete the anomaly assessment record and the related insight data.
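The None rules above suggest a simple readiness check. The following is a minimal sketch, assuming only the attribute names listed above; has_insight_data is a hypothetical helper that is not part of the datarobot package, and the stand-in objects and placeholder URLs exist only to make the example runnable.

```python
from types import SimpleNamespace


def has_insight_data(record):
    """Return True when the record exposes both preview and explanations URLs.

    Hypothetical helper, not part of the datarobot package. Per the attribute
    descriptions, both locations are None unless the status is COMPLETED.
    """
    return (
        record.preview_location is not None
        and record.latest_explanations_location is not None
    )


# Stand-in objects used only for illustration; real records come from
# AnomalyAssessmentRecord.list() or AnomalyAssessmentRecord.compute().
# The URLs are placeholders, not real endpoints.
ready = SimpleNamespace(
    preview_location="https://example.invalid/preview",
    latest_explanations_location="https://example.invalid/explanations",
)
pending = SimpleNamespace(preview_location=None, latest_explanations_location=None)
```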

classmethod list(project_id, model_id, backtest=None, source=None, series_id=None, limit=100, offset=0, with_data_only=False)

Retrieve the list of the anomaly assessment records for the project and model. Output can be filtered and limited.

Parameters:
  • project_id (str) – The ID of the project the record belongs to.

  • model_id (str) – The ID of the model the record belongs to.

  • backtest (int or "holdout") – The backtest to filter records by.

  • source ("training" or "validation") – The source to filter records by.

  • series_id (Optional[str]) – The series id to filter records by. Can be specified for multiseries projects.

  • limit (Optional[int]) – The maximum number of results to return. Defaults to 100.

  • offset (Optional[int]) – The number of results to skip. Defaults to 0.

  • with_data_only (bool, False by default) – If True, return only records whose status is AnomalyAssessmentStatus.COMPLETED; records with no data or that are not supported are omitted.

Returns:

The anomaly assessment records.

Return type:

list of AnomalyAssessmentRecord
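Because limit and offset page through the results, collecting every record may take several calls. The sketch below shows one possible way to do that; iter_records is a hypothetical helper (not part of the datarobot package), and it takes the listing function as an argument so it can be exercised without a server.

```python
def iter_records(list_fn, project_id, model_id, page_size=100, **filters):
    """Yield every record by paging through list_fn with limit and offset.

    list_fn is expected to accept the same arguments as
    AnomalyAssessmentRecord.list. iter_records itself is a hypothetical
    helper, not part of the datarobot package.
    """
    offset = 0
    while True:
        page = list_fn(
            project_id, model_id, limit=page_size, offset=offset, **filters
        )
        yield from page
        # A short (or empty) page means there is nothing left to fetch.
        if len(page) < page_size:
            break
        offset += page_size
```

With the real client this would presumably be called as iter_records(AnomalyAssessmentRecord.list, project_id, model_id, source="validation", with_data_only=True).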

classmethod compute(project_id, model_id, backtest, source, series_id=None)

Request anomaly assessment insight computation on the specified subset.

Parameters:
  • project_id (str) – The ID of the project to compute insight for.

  • model_id (str) – The ID of the model to compute insight for.

  • backtest (int or "holdout") – The backtest to compute insight for.

  • source ("training" or "validation") – The source to compute insight for.

  • series_id (Optional[str]) – The series id to compute insight for. Required for multiseries projects.

Returns:

The anomaly assessment record.

Return type:

AnomalyAssessmentRecord
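Since backtest and source accept only a few values, it can help to validate them before sending a request. This is a minimal sketch under that assumption; build_compute_kwargs and VALID_SOURCES are hypothetical names, and the checks only mirror the parameter descriptions above.

```python
VALID_SOURCES = ("training", "validation")


def build_compute_kwargs(project_id, model_id, backtest, source, series_id=None):
    """Validate arguments against the documented types and return them as a dict.

    Hypothetical helper; the checks mirror the parameter descriptions above.
    """
    # bool is a subclass of int in Python, so reject it explicitly.
    is_index = isinstance(backtest, int) and not isinstance(backtest, bool)
    if not (is_index or backtest == "holdout"):
        raise ValueError("backtest must be an int or 'holdout'")
    if source not in VALID_SOURCES:
        raise ValueError("source must be 'training' or 'validation'")
    kwargs = {
        "project_id": project_id,
        "model_id": model_id,
        "backtest": backtest,
        "source": source,
    }
    # series_id is only meaningful for multiseries projects.
    if series_id is not None:
        kwargs["series_id"] = series_id
    return kwargs
```

The resulting dict uses the parameter names from the signature above, so it should be possible to unpack it into the call as AnomalyAssessmentRecord.compute(**kwargs).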

delete()

Delete the anomaly assessment record along with its preview and explanations.

Return type:

None

get_predictions_preview()

Retrieve aggregated prediction statistics for the anomaly assessment record.

Return type:

AnomalyAssessmentPredictionsPreview

get_latest_explanations()

Retrieve the latest predictions along with shap explanations for the most anomalous records.

Return type:

AnomalyAssessmentExplanations

get_explanations(start_date=None, end_date=None, points_count=None)

Retrieve predictions along with shap explanations for the most anomalous records, either within a date range or for a given number of points. Two of the three parameters (start_date, end_date, points_count) must be specified.

Parameters:
  • start_date (Optional[str]) – The start of the date range to get explanations in. Example: 2020-01-01T00:00:00.000000Z

  • end_date (Optional[str]) – The end of the date range to get explanations in. Example: 2020-10-01T00:00:00.000000Z

  • points_count (Optional[int]) – The number of rows to return.

Return type:

AnomalyAssessmentExplanations
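The two-of-three rule can be checked on the client side before a request is made. The sketch below reads the rule as "exactly two", which is an interpretation rather than something the text above states outright; explanation_window is a hypothetical helper name.

```python
def explanation_window(start_date=None, end_date=None, points_count=None):
    """Return only the supplied arguments, checking that two of three are set.

    Hypothetical helper; it reads the documented rule as "exactly two".
    """
    supplied = {
        name: value
        for name, value in (
            ("start_date", start_date),
            ("end_date", end_date),
            ("points_count", points_count),
        )
        if value is not None
    }
    if len(supplied) != 2:
        raise ValueError(
            "specify two of start_date, end_date and points_count; "
            f"got {sorted(supplied) or 'none'}"
        )
    return supplied
```

The returned dict can then be passed on as keyword arguments, for example record.get_explanations(**explanation_window(start_date=..., points_count=...)).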

get_explanations_data_in_regions(regions, prediction_threshold=0.0)

Get predictions along with explanations for the specified regions, sorted by predictions in descending order.

Parameters:
  • regions (list of AnomalyAssessmentPreviewBin) – The regions to retrieve explanations for. Explanations are retrieved for each region and merged.

  • prediction_threshold (Optional[float]) – If specified, only points with a score greater than or equal to the threshold will be returned.

Returns:

A dict of the form {'explanations': explanations, 'shap_base_value': shap_base_value}.

Return type:

RegionExplanationsData
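The preview, region selection, and region explanation methods can be chained. The following sketch shows one way to do so using only the methods documented in this section; explain_top_regions and its top_n parameter are hypothetical, and returning None when no bin qualifies is a design choice of this example, not library behavior.

```python
def explain_top_regions(record, max_prediction_threshold=0.0, top_n=3):
    """Fetch explanations for the top_n most anomalous preview bins.

    record is expected to behave like AnomalyAssessmentRecord. This function
    is a hypothetical convenience wrapper, not part of the datarobot package.
    """
    preview = record.get_predictions_preview()
    # find_anomalous_regions returns bins sorted by max_predicted, descending.
    regions = preview.find_anomalous_regions(
        max_prediction_threshold=max_prediction_threshold
    )
    if not regions:
        # Nothing crossed the threshold, so skip the explanations request.
        return None
    return record.get_explanations_data_in_regions(
        regions[:top_n], prediction_threshold=max_prediction_threshold
    )
```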

class datarobot.models.anomaly_assessment.AnomalyAssessmentExplanations

Holds predictions along with shap explanations for the most anomalous records, either within a specified date range or for a defined number of points.

Added in version v2.25.

Variables:
  • record_id (str) – The ID of the record.

  • project_id (str) – The ID of the project the record belongs to.

  • model_id (str) – The ID of the model the record belongs to.

  • backtest (int or "holdout") – The backtest of the record.

  • source ("training" or "validation") – The source of the record.

  • series_id (str or None) – The series ID of the record. Defined only for multiseries projects.

  • start_date (str or None) – The ISO-formatted datetime of the first row in the data. Will be None if there is no data in the specified range.

  • end_date (str or None) – The ISO-formatted datetime of the last row in the data. Will be None if there is no data in the specified range.

  • shap_base_value (float) – Shap base value.

  • count (int) – The number of points in data.

  • data (array of DataPoint objects or None) – The list of DataPoint objects in the specified date range.

Notes

DataPoint contains:

  • shap_explanation : None or an array of up to 10 ShapleyFeatureContribution objects. Only rows with the highest anomaly scores have Shapley explanations calculated. Value is None if prediction is lower than prediction_threshold.

  • timestamp (str) : ISO-formatted timestamp for the row.

  • prediction (float) : The output of the model for this row.

ShapleyFeatureContribution contains:

  • feature_value (str) : the feature value for this row. First 50 characters are returned.

  • strength (float) : the shap value for this feature and row.

  • feature (str) : the feature name.
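To illustrate how the DataPoint and ShapleyFeatureContribution fields fit together, the sketch below ranks a row's contributions by absolute strength. It assumes the entries are dict-like with the keys listed above; top_contributions is a hypothetical helper, and the sample row is made up for the example.

```python
def top_contributions(data_point, k=3):
    """Return the k contributions with the largest absolute shap strength.

    Assumes data_point is dict-like with the keys listed above. Returns an
    empty list when shap_explanation is None.
    """
    contributions = data_point["shap_explanation"] or []
    ranked = sorted(contributions, key=lambda c: abs(c["strength"]), reverse=True)
    return ranked[:k]


# Illustrative row; the feature names and values are made up for the example.
row = {
    "timestamp": "2020-01-01T00:00:00.000000Z",
    "prediction": 0.82,
    "shap_explanation": [
        {"feature": "temperature", "feature_value": "41.5", "strength": 0.30},
        {"feature": "pressure", "feature_value": "0.9", "strength": -0.45},
        {"feature": "humidity", "feature_value": "12", "strength": 0.05},
    ],
}
names = [c["feature"] for c in top_contributions(row, k=2)]
```

Ranking by absolute value keeps strongly negative contributions in view, which a plain descending sort would push to the end.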

classmethod get(project_id, record_id, start_date=None, end_date=None, points_count=None)

Retrieve predictions along with shap explanations for the most anomalous records, either within a date range or for a given number of points. Two of the three parameters (start_date, end_date, points_count) must be specified.

Parameters:
  • project_id (str) – The ID of the project.

  • record_id (str) – The ID of the anomaly assessment record.

  • start_date (Optional[str]) – The start of the date range to get explanations in. Example: 2020-01-01T00:00:00.000000Z

  • end_date (Optional[str]) – The end of the date range to get explanations in. Example: 2020-10-01T00:00:00.000000Z

  • points_count (Optional[int]) – The number of rows to return.

Return type:

AnomalyAssessmentExplanations
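The start_date and end_date examples above use a UTC timestamp with microseconds and a trailing Z. As a minimal sketch, a Python datetime can be formatted to match that pattern as follows; to_iso_z is a hypothetical helper name, and rejecting naive datetimes is a choice made for this example.

```python
from datetime import datetime, timezone


def to_iso_z(moment):
    """Format an aware datetime like 2020-01-01T00:00:00.000000Z."""
    if moment.tzinfo is None:
        raise ValueError("moment must be timezone-aware")
    # Convert to UTC first so the trailing "Z" is accurate.
    return moment.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%M:%S.%fZ")


start = to_iso_z(datetime(2020, 1, 1, tzinfo=timezone.utc))
```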

class datarobot.models.anomaly_assessment.AnomalyAssessmentPredictionsPreview

Aggregated predictions over time for the corresponding anomaly assessment record. Intended for finding the bins with the highest anomaly scores.

Added in version v2.25.

Variables:
  • record_id (str) – The ID of the record.

  • project_id (str) – The ID of the project the record belongs to.

  • model_id (str) – The ID of the model the record belongs to.

  • backtest (int or "holdout") – The backtest of the record.

  • source ("training" or "validation") – The source of the record.

  • series_id (str or None) – The series ID of the record. Defined only for multiseries projects.

  • start_date (str) – The ISO-formatted timestamp of the first prediction in the subset.

  • end_date (str) – The ISO-formatted timestamp of the last prediction in the subset.

  • preview_bins (list of preview_bin objects) – The aggregated predictions for the subset. Bin boundaries may differ from the actual start and end dates because this is an aggregation.

Notes

PreviewBin contains:

  • start_date (str) : the ISO-formatted datetime of the start of the bin.

  • end_date (str) : the ISO-formatted datetime of the end of the bin.

  • avg_predicted (float or None) : the average prediction of the model in the bin. None if there are no entries in the bin.

  • max_predicted (float or None) : the maximum prediction of the model in the bin. None if there are no entries in the bin.

  • frequency (int) : the number of rows in the bin.
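As an illustration of how the PreviewBin fields combine, the sketch below totals the rows and computes a frequency-weighted average prediction across bins. It assumes bins are dict-like with the keys listed above; summarize_bins is a hypothetical helper, not part of the datarobot package.

```python
def summarize_bins(preview_bins):
    """Return (total_rows, weighted_avg_prediction) across preview bins.

    Assumes each bin is dict-like with the keys listed above. Bins whose
    avg_predicted is None (no entries) are skipped for the average.
    """
    total_rows = sum(b["frequency"] for b in preview_bins)
    scored = [b for b in preview_bins if b["avg_predicted"] is not None]
    scored_rows = sum(b["frequency"] for b in scored)
    if scored_rows == 0:
        # No bin has any predictions, so there is no average to report.
        return total_rows, None
    weighted = sum(b["avg_predicted"] * b["frequency"] for b in scored)
    return total_rows, weighted / scored_rows
```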

classmethod get(project_id, record_id)

Retrieve aggregated predictions over time.

Parameters:
  • project_id (str) – The ID of the project.

  • record_id (str) – The ID of the anomaly assessment record.

Return type:

AnomalyAssessmentPredictionsPreview

find_anomalous_regions(max_prediction_threshold=0.0)

Select the preview bins whose max_predicted value is greater than or equal to max_prediction_threshold and sort them by max_predicted in descending order.

Parameters:
  • max_prediction_threshold (Optional[float]) – Return bins with a maximum anomaly score greater than or equal to max_prediction_threshold.

Returns:

preview_bins – The filtered and sorted preview bins.

Return type:

list of preview_bin
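The filter-and-sort behavior described above can be sketched in plain Python. This is an illustration on dict-like bins, not the library's implementation; select_anomalous_bins is a hypothetical name, and skipping bins whose max_predicted is None is an assumption made so the comparison is well defined.

```python
def select_anomalous_bins(preview_bins, max_prediction_threshold=0.0):
    """Filter bins by max_predicted and sort them in descending order.

    Illustration of the behavior described above on dict-like bins; this is
    not the library's implementation.
    """
    kept = [
        b
        for b in preview_bins
        # Assumption: empty bins (max_predicted is None) never qualify.
        if b["max_predicted"] is not None
        and b["max_predicted"] >= max_prediction_threshold
    ]
    return sorted(kept, key=lambda b: b["max_predicted"], reverse=True)
```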