Insights

class datarobot.insights.ShapMatrix

Class for SHAP Matrix calculations. Use the standard methods of BaseInsight to compute and retrieve: compute, create, list, get.
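
A minimal usage sketch, assuming a configured client and an existing model ID (model_id below is a placeholder):

import datarobot as dr
from datarobot.insights import ShapMatrix

dr.Client()  # reads credentials from your config file or environment
shap_matrix = ShapMatrix.create(entity_id=model_id)  # computes and waits for completion
print(shap_matrix.columns)
print(shap_matrix.matrix)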

property matrix: Any

SHAP matrix values.

property base_value: float

SHAP base value for the matrix values.

property columns: List[str]

List of columns associated with the SHAP matrix.

property link_function: str

Link function used to generate the SHAP matrix.

classmethod compute(entity_id, source=INSIGHTS_SOURCES.VALIDATION, data_slice_id=None, external_dataset_id=None, entity_type=ENTITY_TYPES.DATAROBOT_MODEL, quick_compute=None, **kwargs)

Submit an insight compute request. You can use create if you want to wait synchronously for the completion of the job. May be overridden by insight subclasses to accept additional parameters.

Parameters:
  • entity_id (str) – The ID of the entity to compute the insight.

  • source (str) – The source type to use when computing the insight.

  • data_slice_id (Optional[str]) – Data slice ID to use when computing the insight.

  • external_dataset_id (Optional[str]) – External dataset ID to use when computing the insight.

  • entity_type (Optional[ENTITY_TYPES]) – The type of the entity associated with the insight. Select one of the ENTITY_TYPES enum values, or accept the default, “datarobotModel”.

  • quick_compute (Optional[bool]) – Sets whether to use quick-compute for the insight. If True or unspecified, the insight is computed using a 2500-row data sample. If False, the insight is computed using all rows in the chosen source.

Returns:

Status check job entity for the asynchronous insight calculation.

Return type:

StatusCheckJob
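
A sketch of the asynchronous flow, assuming model_id is a placeholder and that the returned StatusCheckJob exposes wait_for_completion (available in recent client versions):

from datarobot.insights import ShapMatrix

job = ShapMatrix.compute(entity_id=model_id, source="validation")
job.wait_for_completion()  # block until the insight job finishes
shap_matrix = ShapMatrix.get(entity_id=model_id, source="validation")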

classmethod create(entity_id, source=INSIGHTS_SOURCES.VALIDATION, data_slice_id=None, external_dataset_id=None, entity_type=ENTITY_TYPES.DATAROBOT_MODEL, quick_compute=None, max_wait=600, **kwargs)

Create an insight and wait for completion. May be overridden by insight subclasses to accept additional parameters.

Parameters:
  • entity_id (str) – The ID of the entity to compute the insight.

  • source (str) – The source type to use when computing the insight.

  • data_slice_id (Optional[str]) – Data slice ID to use when computing the insight.

  • external_dataset_id (Optional[str]) – External dataset ID to use when computing the insight.

  • entity_type (Optional[ENTITY_TYPES]) – The type of the entity associated with the insight. Select one of the ENTITY_TYPES enum values, or accept the default, “datarobotModel”.

  • quick_compute (Optional[bool]) – Sets whether to use quick-compute for the insight. If True or unspecified, the insight is computed using a 2500-row data sample. If False, the insight is computed using all rows in the chosen source.

  • max_wait (int) – The number of seconds to wait for the result.

Returns:

Entity of the newly or already computed insights.

Return type:

Self

classmethod from_data(data)

Instantiate an object of this class using a dict.

Parameters:

data (dict) – Correctly snake_cased keys and their values.

Return type:

TypeVar(T, bound=APIObject)

classmethod from_server_data(data, keep_attrs=None)

Override from_server_data to handle paginated responses

Return type:

Self

classmethod get(entity_id, source=INSIGHTS_SOURCES.VALIDATION, quick_compute=None, **kwargs)

Return the first matching insight based on the entity id and kwargs.

Parameters:
  • entity_id (str) – The ID of the entity to retrieve generated insights.

  • source (str) – The source type to use when retrieving the insight.

  • quick_compute (Optional[bool]) – Sets whether to retrieve the insight that was computed using quick-compute. If not specified, quick_compute is not used for matching.

Returns:

Previously computed insight.

Return type:

Self

classmethod get_as_csv(entity_id, **kwargs)

Retrieve a specific insight represented in CSV format.

Parameters:
  • entity_id (str) – ID of the entity to retrieve the insight.

  • **kwargs (Any) – Additional keyword arguments to pass to the retrieve function.

Returns:

The retrieved insight.

Return type:

str

classmethod get_as_dataframe(entity_id, **kwargs)

Retrieve a specific insight represented as a pandas DataFrame.

Parameters:
  • entity_id (str) – ID of the entity to retrieve the insight.

  • **kwargs (Any) – Additional keyword arguments to pass to the retrieve function.

Returns:

The retrieved insight.

Return type:

DataFrame
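
For example, a sketch retrieving the same insight in both formats (model_id is a placeholder for an entity with a computed insight):

from datarobot.insights import ShapMatrix

csv_text = ShapMatrix.get_as_csv(entity_id=model_id)
df = ShapMatrix.get_as_dataframe(entity_id=model_id)  # pandas DataFrame
print(df.head())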

get_uri()

Returns the URI used for browser-based interactions with this entity.

Return type:

str

classmethod list(entity_id)

List all generated insights.

Parameters:

entity_id (str) – The ID of the entity queried for listing all generated insights.

Returns:

List of newly or previously computed insights.

Return type:

List[Self]

open_in_browser()

Opens the class's relevant web location in a browser. If a default browser is not available, the URL is logged instead.

Note: If text-mode browsers are used, the calling process will block until the user exits the browser.

Return type:

None

sort(key_name)

Sorts the insight's data by the given key name.

Return type:

None

class datarobot.insights.ShapPreview

Class for SHAP Preview calculations. Use the standard methods of BaseInsight to compute and retrieve: compute, create, list, get.

property previews: List[Dict[str, Any]]

SHAP preview values.

Returns:

preview – A list of the ShapPreview values for each row.

Return type:

List[Dict[str, Any]]

property previews_count: int

The number of SHAP preview rows.

Return type:

int

classmethod get(entity_id, source=INSIGHTS_SOURCES.VALIDATION, quick_compute=None, prediction_filter_row_count=None, prediction_filter_percentiles=None, prediction_filter_operand_first=None, prediction_filter_operand_second=None, prediction_filter_operator=None, feature_filter_count=None, feature_filter_name=None, **kwargs)

Return the first matching ShapPreview insight based on the entity id and kwargs.

Parameters:
  • entity_id (str) – The ID of the entity to retrieve generated insights.

  • source (str) – The source type to use when retrieving the insight.

  • quick_compute (Optional[bool]) – Sets whether to retrieve the insight that was computed using quick-compute. If not specified, quick_compute is not used for matching.

  • prediction_filter_row_count (Optional[int]) – The maximum number of preview rows to return.

  • prediction_filter_percentiles (Optional[int]) – The number of percentile intervals to select from the total number of rows. This field supersedes prediction_filter_row_count if both are present.

  • prediction_filter_operand_first (Optional[float]) – The first operand to apply to filtered predictions.

  • prediction_filter_operand_second (Optional[float]) – The second operand to apply to filtered predictions.

  • prediction_filter_operator (Optional[str]) – The operator to apply to filtered predictions.

  • feature_filter_count (Optional[int]) – The maximum number of features to return for each preview.

  • feature_filter_name (Optional[str]) – The names of specific features to return for each preview.

Returns:

The first matching ShapPreview insight.

Return type:

Self
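
A sketch of retrieving a filtered preview, assuming the insight has already been computed for a placeholder model_id:

from datarobot.insights import ShapPreview

preview = ShapPreview.get(
    entity_id=model_id,
    prediction_filter_row_count=10,  # at most 10 preview rows
    feature_filter_count=5,          # at most 5 features per row
)
for row in preview.previews:
    print(row)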

classmethod compute(entity_id, source=INSIGHTS_SOURCES.VALIDATION, data_slice_id=None, external_dataset_id=None, entity_type=ENTITY_TYPES.DATAROBOT_MODEL, quick_compute=None, **kwargs)

Submit an insight compute request. You can use create if you want to wait synchronously for the completion of the job. May be overridden by insight subclasses to accept additional parameters.

Parameters:
  • entity_id (str) – The ID of the entity to compute the insight.

  • source (str) – The source type to use when computing the insight.

  • data_slice_id (Optional[str]) – Data slice ID to use when computing the insight.

  • external_dataset_id (Optional[str]) – External dataset ID to use when computing the insight.

  • entity_type (Optional[ENTITY_TYPES]) – The type of the entity associated with the insight. Select one of the ENTITY_TYPES enum values, or accept the default, “datarobotModel”.

  • quick_compute (Optional[bool]) – Sets whether to use quick-compute for the insight. If True or unspecified, the insight is computed using a 2500-row data sample. If False, the insight is computed using all rows in the chosen source.

Returns:

Status check job entity for the asynchronous insight calculation.

Return type:

StatusCheckJob

classmethod create(entity_id, source=INSIGHTS_SOURCES.VALIDATION, data_slice_id=None, external_dataset_id=None, entity_type=ENTITY_TYPES.DATAROBOT_MODEL, quick_compute=None, max_wait=600, **kwargs)

Create an insight and wait for completion. May be overridden by insight subclasses to accept additional parameters.

Parameters:
  • entity_id (str) – The ID of the entity to compute the insight.

  • source (str) – The source type to use when computing the insight.

  • data_slice_id (Optional[str]) – Data slice ID to use when computing the insight.

  • external_dataset_id (Optional[str]) – External dataset ID to use when computing the insight.

  • entity_type (Optional[ENTITY_TYPES]) – The type of the entity associated with the insight. Select one of the ENTITY_TYPES enum values, or accept the default, “datarobotModel”.

  • quick_compute (Optional[bool]) – Sets whether to use quick-compute for the insight. If True or unspecified, the insight is computed using a 2500-row data sample. If False, the insight is computed using all rows in the chosen source.

  • max_wait (int) – The number of seconds to wait for the result.

Returns:

Entity of the newly or already computed insights.

Return type:

Self

classmethod from_data(data)

Instantiate an object of this class using a dict.

Parameters:

data (dict) – Correctly snake_cased keys and their values.

Return type:

TypeVar(T, bound=APIObject)

classmethod from_server_data(data, keep_attrs=None)

Override from_server_data to handle paginated responses

Return type:

Self

get_uri()

Returns the URI used for browser-based interactions with this entity.

Return type:

str

classmethod list(entity_id)

List all generated insights.

Parameters:

entity_id (str) – The ID of the entity queried for listing all generated insights.

Returns:

List of newly or previously computed insights.

Return type:

List[Self]

open_in_browser()

Opens the class's relevant web location in a browser. If a default browser is not available, the URL is logged instead.

Note: If text-mode browsers are used, the calling process will block until the user exits the browser.

Return type:

None

sort(key_name)

Sorts the insight's data by the given key name.

Return type:

None

class datarobot.insights.ShapImpact

Class for SHAP Impact calculations. Use the standard methods of BaseInsight to compute and retrieve: compute, create, list, get.

sort(key_name='-impact_normalized')

Sorts insights data by key name.

Parameters:

key_name (str) – item key name to sort data. One of ‘feature_name’, ‘impact_normalized’ or ‘impact_unnormalized’. Starting with ‘-’ reverses sort order. Default ‘-impact_normalized’

Return type:

None
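
For example, a sketch that re-sorts a retrieved insight (model_id is a placeholder):

from datarobot.insights import ShapImpact

shap_impact = ShapImpact.get(entity_id=model_id)
shap_impact.sort(key_name="-impact_unnormalized")  # descending by unnormalized impact
print(shap_impact.shap_impacts)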

property shap_impacts: List[List[Any]]

SHAP impact values.

Returns:

shap_impacts – A list of the SHAP impact values.

Return type:

List[List[Any]]

property base_value: List[float]

A list of base prediction values.

property capping: Dict[str, Any] | None

Capping for the models in the blender.

property link_function: str | None

Shared link function of the models in the blender.

property row_count: int | None

Deprecated. The number of SHAP impact rows.

classmethod compute(entity_id, source=INSIGHTS_SOURCES.VALIDATION, data_slice_id=None, external_dataset_id=None, entity_type=ENTITY_TYPES.DATAROBOT_MODEL, quick_compute=None, **kwargs)

Submit an insight compute request. You can use create if you want to wait synchronously for the completion of the job. May be overridden by insight subclasses to accept additional parameters.

Parameters:
  • entity_id (str) – The ID of the entity to compute the insight.

  • source (str) – The source type to use when computing the insight.

  • data_slice_id (Optional[str]) – Data slice ID to use when computing the insight.

  • external_dataset_id (Optional[str]) – External dataset ID to use when computing the insight.

  • entity_type (Optional[ENTITY_TYPES]) – The type of the entity associated with the insight. Select one of the ENTITY_TYPES enum values, or accept the default, “datarobotModel”.

  • quick_compute (Optional[bool]) – Sets whether to use quick-compute for the insight. If True or unspecified, the insight is computed using a 2500-row data sample. If False, the insight is computed using all rows in the chosen source.

Returns:

Status check job entity for the asynchronous insight calculation.

Return type:

StatusCheckJob

classmethod create(entity_id, source=INSIGHTS_SOURCES.VALIDATION, data_slice_id=None, external_dataset_id=None, entity_type=ENTITY_TYPES.DATAROBOT_MODEL, quick_compute=None, max_wait=600, **kwargs)

Create an insight and wait for completion. May be overridden by insight subclasses to accept additional parameters.

Parameters:
  • entity_id (str) – The ID of the entity to compute the insight.

  • source (str) – The source type to use when computing the insight.

  • data_slice_id (Optional[str]) – Data slice ID to use when computing the insight.

  • external_dataset_id (Optional[str]) – External dataset ID to use when computing the insight.

  • entity_type (Optional[ENTITY_TYPES]) – The type of the entity associated with the insight. Select one of the ENTITY_TYPES enum values, or accept the default, “datarobotModel”.

  • quick_compute (Optional[bool]) – Sets whether to use quick-compute for the insight. If True or unspecified, the insight is computed using a 2500-row data sample. If False, the insight is computed using all rows in the chosen source.

  • max_wait (int) – The number of seconds to wait for the result.

Returns:

Entity of the newly or already computed insights.

Return type:

Self

classmethod from_data(data)

Instantiate an object of this class using a dict.

Parameters:

data (dict) – Correctly snake_cased keys and their values.

Return type:

TypeVar(T, bound=APIObject)

classmethod from_server_data(data, keep_attrs=None)

Override from_server_data to handle paginated responses

Return type:

Self

classmethod get(entity_id, source=INSIGHTS_SOURCES.VALIDATION, quick_compute=None, **kwargs)

Return the first matching insight based on the entity id and kwargs.

Parameters:
  • entity_id (str) – The ID of the entity to retrieve generated insights.

  • source (str) – The source type to use when retrieving the insight.

  • quick_compute (Optional[bool]) – Sets whether to retrieve the insight that was computed using quick-compute. If not specified, quick_compute is not used for matching.

Returns:

Previously computed insight.

Return type:

Self

get_uri()

Returns the URI used for browser-based interactions with this entity.

Return type:

str

classmethod list(entity_id)

List all generated insights.

Parameters:

entity_id (str) – The ID of the entity queried for listing all generated insights.

Returns:

List of newly or previously computed insights.

Return type:

List[Self]

open_in_browser()

Opens the class's relevant web location in a browser. If a default browser is not available, the URL is logged instead.

Note: If text-mode browsers are used, the calling process will block until the user exits the browser.

Return type:

None

class datarobot.insights.ShapDistributions

Class for SHAP Distributions calculations. Use the standard methods of BaseInsight to compute and retrieve: compute, create, list, get.
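
A minimal sketch, assuming a configured client and a placeholder model_id:

from datarobot.insights import ShapDistributions

distributions = ShapDistributions.create(entity_id=model_id)
print(distributions.total_features_count)
for feature in distributions.features:
    print(feature)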

property features: List[Dict[str, Any]]

SHAP feature values.

Returns:

features – A list of the ShapDistributions values for each feature.

Return type:

List[Dict[str, Any]]

property total_features_count: int

The number of SHAP distributions features.

Return type:

int

classmethod compute(entity_id, source=INSIGHTS_SOURCES.VALIDATION, data_slice_id=None, external_dataset_id=None, entity_type=ENTITY_TYPES.DATAROBOT_MODEL, quick_compute=None, **kwargs)

Submit an insight compute request. You can use create if you want to wait synchronously for the completion of the job. May be overridden by insight subclasses to accept additional parameters.

Parameters:
  • entity_id (str) – The ID of the entity to compute the insight.

  • source (str) – The source type to use when computing the insight.

  • data_slice_id (Optional[str]) – Data slice ID to use when computing the insight.

  • external_dataset_id (Optional[str]) – External dataset ID to use when computing the insight.

  • entity_type (Optional[ENTITY_TYPES]) – The type of the entity associated with the insight. Select one of the ENTITY_TYPES enum values, or accept the default, “datarobotModel”.

  • quick_compute (Optional[bool]) – Sets whether to use quick-compute for the insight. If True or unspecified, the insight is computed using a 2500-row data sample. If False, the insight is computed using all rows in the chosen source.

Returns:

Status check job entity for the asynchronous insight calculation.

Return type:

StatusCheckJob

classmethod create(entity_id, source=INSIGHTS_SOURCES.VALIDATION, data_slice_id=None, external_dataset_id=None, entity_type=ENTITY_TYPES.DATAROBOT_MODEL, quick_compute=None, max_wait=600, **kwargs)

Create an insight and wait for completion. May be overridden by insight subclasses to accept additional parameters.

Parameters:
  • entity_id (str) – The ID of the entity to compute the insight.

  • source (str) – The source type to use when computing the insight.

  • data_slice_id (Optional[str]) – Data slice ID to use when computing the insight.

  • external_dataset_id (Optional[str]) – External dataset ID to use when computing the insight.

  • entity_type (Optional[ENTITY_TYPES]) – The type of the entity associated with the insight. Select one of the ENTITY_TYPES enum values, or accept the default, “datarobotModel”.

  • quick_compute (Optional[bool]) – Sets whether to use quick-compute for the insight. If True or unspecified, the insight is computed using a 2500-row data sample. If False, the insight is computed using all rows in the chosen source.

  • max_wait (int) – The number of seconds to wait for the result.

Returns:

Entity of the newly or already computed insights.

Return type:

Self

classmethod from_data(data)

Instantiate an object of this class using a dict.

Parameters:

data (dict) – Correctly snake_cased keys and their values.

Return type:

TypeVar(T, bound=APIObject)

classmethod from_server_data(data, keep_attrs=None)

Override from_server_data to handle paginated responses

Return type:

Self

classmethod get(entity_id, source=INSIGHTS_SOURCES.VALIDATION, quick_compute=None, **kwargs)

Return the first matching insight based on the entity id and kwargs.

Parameters:
  • entity_id (str) – The ID of the entity to retrieve generated insights.

  • source (str) – The source type to use when retrieving the insight.

  • quick_compute (Optional[bool]) – Sets whether to retrieve the insight that was computed using quick-compute. If not specified, quick_compute is not used for matching.

Returns:

Previously computed insight.

Return type:

Self

get_uri()

Returns the URI used for browser-based interactions with this entity.

Return type:

str

classmethod list(entity_id)

List all generated insights.

Parameters:

entity_id (str) – The ID of the entity queried for listing all generated insights.

Returns:

List of newly or previously computed insights.

Return type:

List[Self]

open_in_browser()

Opens the class's relevant web location in a browser. If a default browser is not available, the URL is logged instead.

Note: If text-mode browsers are used, the calling process will block until the user exits the browser.

Return type:

None

sort(key_name)

Sorts the insight's data by the given key name.

Return type:

None

Types

class datarobot.models.RocCurveEstimatedMetric

Typed dict for estimated metric

class datarobot.models.AnomalyAssessmentRecordMetadata

Typed dict for record metadata

class datarobot.models.AnomalyAssessmentPreviewBin

Typed dict for preview bin

class datarobot.models.ShapleyFeatureContribution

Typed dict for shapley feature contribution

class datarobot.models.AnomalyAssessmentDataPoint

Typed dict for data points

class datarobot.models.RegionExplanationsData

Typed dict for region explanations

Anomaly assessment

class datarobot.models.anomaly_assessment.AnomalyAssessmentRecord

Object which keeps metadata about the anomaly assessment insight for a particular subset, backtest, and series, along with links to retrieve the anomaly assessment data.

Added in version v2.25.

Variables:
  • record_id (str) – The ID of the record.

  • project_id (str) – The ID of the project the record belongs to.

  • model_id (str) – The ID of the model the record belongs to.

  • backtest (int or "holdout") – The backtest of the record.

  • source ("training" or "validation") – The source of the record.

  • series_id (str or None) – The series ID of the record. Defined only for multiseries projects.

  • status (str) – The status of the insight. One of datarobot.enums.AnomalyAssessmentStatus

  • status_details (str) – The explanation of the status.

  • start_date (str or None) – The ISO-formatted timestamp of the first prediction in the subset. Will be None if status is not AnomalyAssessmentStatus.COMPLETED.

  • end_date (str or None) – The ISO-formatted timestamp of the last prediction in the subset. Will be None if status is not AnomalyAssessmentStatus.COMPLETED.

  • prediction_threshold (float or None) – The threshold; SHAP explanations are computed for all rows with anomaly scores greater than or equal to it.

  • preview_location (str or None) – The URL to retrieve predictions preview for the subset. Will be None if status is not AnomalyAssessmentStatus.COMPLETED.

  • latest_explanations_location (str or None) – The URL to retrieve the latest predictions with the shap explanations. Will be None if status is not AnomalyAssessmentStatus.COMPLETED.

  • delete_location (str) – The URL to delete anomaly assessment record and relevant insight data.

classmethod list(project_id, model_id, backtest=None, source=None, series_id=None, limit=100, offset=0, with_data_only=False)

Retrieve the list of the anomaly assessment records for the project and model. Output can be filtered and limited.

Parameters:
  • project_id (str) – The ID of the project the record belongs to.

  • model_id (str) – The ID of the model the record belongs to.

  • backtest (int or "holdout") – The backtest to filter records by.

  • source ("training" or "validation") – The source to filter records by.

  • series_id (Optional[str]) – The series id to filter records by. Can be specified for multiseries projects.

  • limit (Optional[int]) – 100 by default. At most this many results are returned.

  • offset (Optional[int]) – This many results will be skipped.

  • with_data_only (bool, False by default) – Filter by status == AnomalyAssessmentStatus.COMPLETED. If True, records that have no data or are not supported will be omitted.

Returns:

The list of anomaly assessment records.

Return type:

list of AnomalyAssessmentRecord

classmethod compute(project_id, model_id, backtest, source, series_id=None)

Request anomaly assessment insight computation on the specified subset.

Parameters:
  • project_id (str) – The ID of the project to compute insight for.

  • model_id (str) – The ID of the model to compute insight for.

  • backtest (int or "holdout") – The backtest to compute insight for.

  • source ("training" or "validation") – The source to compute insight for.

  • series_id (Optional[str]) – The series id to compute insight for. Required for multiseries projects.

Returns:

The anomaly assessment record.

Return type:

AnomalyAssessmentRecord

delete()

Delete anomaly assessment record with preview and explanations.

Return type:

None

get_predictions_preview()

Retrieve aggregated predictions statistics for the anomaly assessment record.

Return type:

AnomalyAssessmentPredictionsPreview

get_latest_explanations()

Retrieve the latest predictions along with SHAP explanations for the most anomalous records.

Return type:

AnomalyAssessmentExplanations

get_explanations(start_date=None, end_date=None, points_count=None)

Retrieve predictions along with SHAP explanations for the most anomalous records in the specified date range or for the defined number of points. Two of the three parameters (start_date, end_date, points_count) must be specified.

Parameters:
  • start_date (Optional[str]) – The start of the date range to get explanations in. Example: 2020-01-01T00:00:00.000000Z

  • end_date (Optional[str]) – The end of the date range to get explanations in. Example: 2020-10-01T00:00:00.000000Z

  • points_count (Optional[int]) – The number of the rows to return.

Return type:

AnomalyAssessmentExplanations

get_explanations_data_in_regions(regions, prediction_threshold=0.0)

Get predictions along with explanations for the specified regions, sorted by predictions in descending order.

Parameters:
  • regions (list of AnomalyAssessmentPreviewBin) – Explanations will be retrieved and merged for each region.

  • prediction_threshold (Optional[float]) – If specified, only points with a score greater than or equal to the threshold will be returned.

Returns:

A dict in the form {'explanations': explanations, 'shap_base_value': shap_base_value}.

Return type:

RegionExplanationsData
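
An end-to-end sketch, assuming an existing time series anomaly detection project (project_id and model_id are placeholders) and that the record reaches COMPLETED status before the preview is requested:

from datarobot.models.anomaly_assessment import AnomalyAssessmentRecord

record = AnomalyAssessmentRecord.compute(
    project_id, model_id, backtest=0, source="validation"
)
preview = record.get_predictions_preview()
regions = preview.find_anomalous_regions(max_prediction_threshold=0.5)
result = record.get_explanations_data_in_regions(regions)
print(result["shap_base_value"])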

class datarobot.models.anomaly_assessment.AnomalyAssessmentExplanations

Object which keeps predictions along with SHAP explanations for the most anomalous records in the specified date range or for the defined number of points.

Added in version v2.25.

Variables:
  • record_id (str) – The ID of the record.

  • project_id (str) – The ID of the project the record belongs to.

  • model_id (str) – The ID of the model the record belongs to.

  • backtest (int or "holdout") – The backtest of the record.

  • source ("training" or "validation") – The source of the record.

  • series_id (str or None) – The series ID of the record. Defined only for multiseries projects.

  • start_date (str or None) – The ISO-formatted datetime of the first row in the data. Will be None if there is no data in the specified range.

  • end_date (str or None) – The ISO-formatted datetime of the last row in the data. Will be None if there is no data in the specified range.

  • shap_base_value (float) – The SHAP base value.

  • count (int) – The number of points in data.

  • data (array of DataPoint objects or None) – The list of DataPoint objects in the specified date range.

Notes

DataPoint contains:

  • shap_explanation : None or an array of up to 10 ShapleyFeatureContribution objects. Only rows with the highest anomaly scores have Shapley explanations calculated. Value is None if prediction is lower than prediction_threshold.

  • timestamp (str) : ISO-formatted timestamp for the row.

  • prediction (float) : The output of the model for this row.

ShapleyFeatureContribution contains:

  • feature_value (str) : the feature value for this row. First 50 characters are returned.

  • strength (float) : the shap value for this feature and row.

  • feature (str) : the feature name.
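
A sketch of how these structures compose (explanations is a hypothetical retrieved AnomalyAssessmentExplanations instance):

for point in explanations.data or []:
    if point["shap_explanation"] is not None:
        # pick the contribution with the largest absolute SHAP value
        top = max(point["shap_explanation"], key=lambda c: abs(c["strength"]))
        print(point["timestamp"], point["prediction"], top["feature"], top["strength"])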

classmethod get(project_id, record_id, start_date=None, end_date=None, points_count=None)

Retrieve predictions along with SHAP explanations for the most anomalous records in the specified date range or for the defined number of points. Two of the three parameters (start_date, end_date, points_count) must be specified.

Parameters:
  • project_id (str) – The ID of the project.

  • record_id (str) – The ID of the anomaly assessment record.

  • start_date (Optional[str]) – The start of the date range to get explanations in. Example: 2020-01-01T00:00:00.000000Z

  • end_date (Optional[str]) – The end of the date range to get explanations in. Example: 2020-10-01T00:00:00.000000Z

  • points_count (Optional[int]) – The number of the rows to return.

Return type:

AnomalyAssessmentExplanations

class datarobot.models.anomaly_assessment.AnomalyAssessmentPredictionsPreview

Aggregated predictions over time for the corresponding anomaly assessment record. Intended to find the bins with the highest anomaly scores.

Added in version v2.25.

Variables:
  • record_id (str) – The ID of the record.

  • project_id (str) – The ID of the project the record belongs to.

  • model_id (str) – The ID of the model the record belongs to.

  • backtest (int or "holdout") – The backtest of the record.

  • source ("training" or "validation") – The source of the record.

  • series_id (str or None) – The series ID of the record. Defined only for multiseries projects.

  • start_date (str) – The ISO-formatted timestamp of the first prediction in the subset.

  • end_date (str) – The ISO-formatted timestamp of the last prediction in the subset.

  • preview_bins (list of preview_bin objects) – The aggregated predictions for the subset. Bin boundaries may differ from actual start/end dates because this is an aggregation.

Notes

PreviewBin contains:

  • start_date (str) : the ISO-formatted datetime of the start of the bin.

  • end_date (str) : the ISO-formatted datetime of the end of the bin.

  • avg_predicted (float or None) : the average prediction of the model in the bin. None if there are no entries in the bin.

  • max_predicted (float or None) : the maximum prediction of the model in the bin. None if there are no entries in the bin.

  • frequency (int) : the number of the rows in the bin.

classmethod get(project_id, record_id)

Retrieve aggregated predictions over time.

Parameters:
  • project_id (str) – The ID of the project.

  • record_id (str) – The ID of the anomaly assessment record.

Return type:

AnomalyAssessmentPredictionsPreview

find_anomalous_regions(max_prediction_threshold=0.0)

Sort the preview bins by max_predicted value and select those with a max_predicted value greater than or equal to max_prediction_threshold. The result is sorted by max_predicted value in descending order.

Parameters:

max_prediction_threshold (Optional[float]) – Return bins with a maximum anomaly score greater than or equal to max_prediction_threshold.

Returns:

preview_bins – Filtered and sorted preview bins

Return type:

list of preview_bin

Confusion chart

class datarobot.models.confusion_chart.ConfusionChart

Confusion Chart data for a model.

Notes

ClassMetrics is a dict containing the following:

  • class_name (string) name of the class

  • actual_count (int) number of times this class is seen in the validation data

  • predicted_count (int) number of times this class has been predicted for the validation data

  • f1 (float) F1 score

  • recall (float) recall score

  • precision (float) precision score

  • was_actual_percentages (list of dict) one vs all actual percentages in format specified below.
    • other_class_name (string) the name of the other class

    • percentage (float) the percentage of the time this class was predicted when the actual class was other_class_name (from 0 to 1)

  • was_predicted_percentages (list of dict) one vs all predicted percentages in format specified below.
    • other_class_name (string) the name of the other class

    • percentage (float) the percentage of the time the actual class was other_class_name when this class was predicted (from 0 to 1)

  • confusion_matrix_one_vs_all (list of list) 2d list representing 2x2 one vs all matrix.
    • This represents the True/False Negative/Positive rates as integers for each class; see the sketch after this list. The data structure looks like:

    • [ [ True Negative, False Positive ], [ False Negative, True Positive ] ]
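
A small sketch of unpacking that structure (entry is a hypothetical element of class_metrics; assumes non-zero denominators):

(tn, fp), (fn, tp) = entry["confusion_matrix_one_vs_all"]
precision = tp / (tp + fp)  # should match the reported precision score
recall = tp / (tp + fn)     # should match the reported recall score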

Variables:
  • source (str) – Confusion Chart data source. Can be ‘validation’, ‘crossValidation’ or ‘holdout’.

  • raw_data (dict) – All of the raw data for the Confusion Chart

  • confusion_matrix (list of list) – The N x N confusion matrix

  • classes (list) – The names of each of the classes

  • class_metrics (list of dicts) – List of dicts with schema described as ClassMetrics above.

  • source_model_id (str) – ID of the model this Confusion chart represents; in some cases, insights from the parent of a frozen model may be used

Lift chart

class datarobot.models.lift_chart.LiftChart

Lift chart data for a model.

Notes

LiftChartBin is a dict containing the following:

  • actual (float) Sum of actual target values in bin

  • predicted (float) Sum of predicted target values in bin

  • bin_weight (float) The weight of the bin. For weighted projects, it is the sum of the weights of the rows in the bin. For unweighted projects, it is the number of rows in the bin.
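
For example, a sketch deriving per-bin averages from this schema (chart is a hypothetical retrieved LiftChart instance):

avg_actual = [b["actual"] / b["bin_weight"] for b in chart.bins]
avg_predicted = [b["predicted"] / b["bin_weight"] for b in chart.bins]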

Variables:
  • source (str) – Lift chart data source. Can be ‘validation’, ‘crossValidation’ or ‘holdout’.

  • bins (list of dict) – List of dicts with schema described as LiftChartBin above.

  • source_model_id (str) – ID of the model this lift chart represents; in some cases, insights from the parent of a frozen model may be used

  • target_class (Optional[str]) – For multiclass lift - target class for this lift chart data.

  • data_slice_id (string or None) – The slice to retrieve Lift Chart for; if None, retrieve unsliced data.

classmethod from_server_data(data, keep_attrs=None, use_insights_format=False, **kwargs)

Override APIObject.from_server_data to handle lift chart data retrieved from either the legacy URL or the new /insights/ URL.

Parameters:
  • data (dict) – The directly translated dict of JSON from the server. No casing fixes have taken place

  • use_insights_format (Optional[bool]) – Whether to repack the data from the format used in the GET /insights/liftChart/ URL to the format used in the legacy URL.

Data slices

class datarobot.models.data_slice.DataSlice

Definition of a data slice

Variables:
  • id (str) – ID of the data slice.

  • name (str) – Name of the data slice definition.

  • filters (list[DataSliceFiltersType]) –

    List of DataSliceFiltersType with params
    • operand (str) Name of the feature to use in the filter.

    • operator (str) Operator to use in the filter - eq, in, <, or >.

    • values (Union[str, int, float]) Values to use from the feature.

  • project_id (str) – ID of the project that the model is part of.

classmethod list(project, offset=0, limit=100)

List the data slices in the given project.

Parameters:
  • project (Union[str, Project]) – ID of the project or Project object from which to list data slices.

  • offset (Optional[int]) – Number of items to skip.

  • limit (Optional[int]) – Number of items to return.

Returns:

data_slices

Return type:

list[DataSlice]

Examples

>>> import datarobot as dr
>>> ...  # set up your Client
>>> data_slices = dr.DataSlice.list("646d0ea0cd8eb2355a68b0e5")
>>> data_slices
[DataSlice(...), DataSlice(...), ...]
classmethod create(name, filters, project)

Creates a data slice in the project with the given name and filters

Parameters:
  • name (str) – Name of the data slice definition.

  • filters (list[DataSliceFiltersType]) –

    List of filters (dict) with params:
    • operand (str)

      Name of the feature to use in filter.

    • operator (str)

      Operator to use: ‘eq’, ‘in’, ‘<’, or ‘>’.

    • values (Union[str, int, float])

      Values to use from the feature.

  • project (Union[str, Project]) – Project ID or Project object in which to create the data slice.

Returns:

data_slice – The data slice object created

Return type:

DataSlice

Examples

>>> import datarobot as dr
>>> ...  # set up your Client and retrieve a project
>>> data_slice = dr.DataSlice.create(
...     name='yes',
...     filters=[{'operand': 'binary_target', 'operator': 'eq', 'values': ['Yes']}],
...     project=project,
... )
>>> data_slice
DataSlice(
    filters=[{'operand': 'binary_target', 'operator': 'eq', 'values': ['Yes']}],
    id=646d1296bd0c543d88923c9d,
    name=yes,
    project_id=646d0ea0cd8eb2355a68b0e5
)
delete()

Deletes the data slice from storage.

Return type:

None

Examples

>>> import datarobot as dr
>>> data_slice = dr.DataSlice.get('5a8ac9ab07a57a0001be501f')
>>> data_slice.delete()
>>> import datarobot as dr
>>> ... # get project or project_id
>>> data_slices = dr.DataSlice.list(project)  # project object or project_id
>>> data_slice = data_slices[0]  # choose a data slice from the list
>>> data_slice.delete()
request_size(source, model=None)

Submits a request to validate the data slice’s filters and calculate the data slice’s number of rows on a given source

Parameters:
  • source (INSIGHTS_SOURCES) – Subset of data (partition or “source”) on which to apply the data slice for estimating available rows.

  • model (Optional[Union[str, Model]]) – Model object or ID of the model. It is only required when source is “training”.

Returns:

status_check_job – Object contains all needed logic for a periodical status check of an async job.

Return type:

StatusCheckJob

Examples

>>> import datarobot as dr
>>> ... # get project or project_id
>>> data_slices = dr.DataSlice.list(project)  # project object or project_id
>>> data_slice = data_slices[0]  # choose a data slice from the list
>>> status_check_job = data_slice.request_size("validation")

Model is required when source is ‘training’

>>> import datarobot as dr
>>> ... # get project or project_id
>>> data_slices = dr.DataSlice.list(project)  # project object or project_id
>>> data_slice = data_slices[0]  # choose a data slice from the list
>>> status_check_job = data_slice.request_size("training", model)
get_size_info(source, model=None)

Get information about the data slice applied to a source

Parameters:
  • source (INSIGHTS_SOURCES) – Source (partition or subset) to which the data slice was applied

  • model (Optional[Union[str, Model]]) – ID for the model whose training data was sliced with this data slice. Required when the source is “training”, and not used for other sources.

Returns:

slice_size_info – Information of the data slice applied to a source

Return type:

DataSliceSizeInfo

Examples

>>> import datarobot as dr
>>> ...  # set up your Client
>>> data_slices = dr.DataSlice.list("646d0ea0cd8eb2355a68b0e5")
>>> data_slice = data_slices[0]  # can be any slice in the list
>>> data_slice_size_info = data_slice.get_size_info("validation")
>>> data_slice_size_info
DataSliceSizeInfo(
    data_slice_id=6493a1776ea78e6644382535,
    messages=[
        {
            'level': 'WARNING',
            'description': 'Low Observation Count',
            'additional_info': 'Insufficient number of observations to compute some insights.'
        }
    ],
    model_id=None,
    project_id=646d0ea0cd8eb2355a68b0e5,
    slice_size=1,
    source=validation,
)
>>> data_slice_size_info.to_dict()
{
    'data_slice_id': '6493a1776ea78e6644382535',
    'messages': [
        {
            'level': 'WARNING',
            'description': 'Low Observation Count',
            'additional_info': 'Insufficient number of observations to compute some insights.'
        }
    ],
    'model_id': None,
    'project_id': '646d0ea0cd8eb2355a68b0e5',
    'slice_size': 1,
    'source': 'validation',
}
>>> import datarobot as dr
>>> ...  # set up your Client
>>> data_slice = dr.DataSlice.get("6493a1776ea78e6644382535")
>>> data_slice_size_info = data_slice.get_size_info("validation")

When using source=’training’, the model param is required.

>>> import datarobot as dr
>>> ...  # set up your Client
>>> model = dr.Model.get(project_id, model_id)
>>> data_slice = dr.DataSlice.get("6493a1776ea78e6644382535")
>>> data_slice_size_info = data_slice.get_size_info("training", model)
>>> import datarobot as dr
>>> ...  # set up your Client
>>> data_slice = dr.DataSlice.get("6493a1776ea78e6644382535")
>>> data_slice_size_info = data_slice.get_size_info("training", model_id)
classmethod get(data_slice_id)

Retrieve a specific data slice.

Parameters:

data_slice_id (str) – The identifier of the data slice to retrieve.

Returns:

data_slice – The required data slice.

Return type:

DataSlice

Examples

>>> import datarobot as dr
>>> dr.DataSlice.get('648b232b9da812a6aaa0b7a9')
DataSlice(filters=[{'operand': 'binary_target', 'operator': 'eq', 'values': ['Yes']}],
          id=648b232b9da812a6aaa0b7a9,
          name=test,
          project_id=644bc575572480b565ca42cd
          )
class datarobot.models.data_slice.DataSliceSizeInfo

Definition of a data slice applied to a source

Variables:
  • data_slice_id (str) – ID of the data slice

  • project_id (str) – ID of the project

  • source (str) – Data source used to calculate the number of rows (slice size) after applying the data slice’s filters

  • model_id (Optional[str]) – ID of the model, required when source (subset) is ‘training’

  • slice_size (int) – Number of rows in the data slice for a given source

  • messages (list[DataSliceSizeMessageType]) – List of user-relevant messages related to a data slice

Datetime trend plots

class datarobot.models.datetime_trend_plots.AccuracyOverTimePlotsMetadata

Accuracy over Time metadata for datetime model.

Added in version v2.25.

Variables:
  • project_id (string) – The project ID.

  • model_id (string) – The model ID.

  • forecast_distance (int or None) – The forecast distance for which the metadata was retrieved. None for OTV projects.

  • resolutions (list of string) – A list of datarobot.enums.DATETIME_TREND_PLOTS_RESOLUTION, which represents available time resolutions for which plots can be retrieved.

  • backtest_metadata (list of dict) – List of backtest metadata dicts. The list index of metadata dict is the backtest index. See backtest/holdout metadata info in Notes for more details.

  • holdout_metadata (dict) – Holdout metadata dict. See backtest/holdout metadata info in Notes for more details.

  • backtest_statuses (list of dict) – List of backtest statuses dict. The list index of status dict is the backtest index. See backtest/holdout status info in Notes for more details.

  • holdout_statuses (dict) – Holdout status dict. See backtest/holdout status info in Notes for more details.

Notes

Backtest/holdout status is a dict containing the following:

  • training: string

    Status of the backtest/holdout training. One of datarobot.enums.DATETIME_TREND_PLOTS_STATUS

  • validation: string

    Status of the backtest/holdout validation. One of datarobot.enums.DATETIME_TREND_PLOTS_STATUS

Backtest/holdout metadata is a dict containing the following:

  • training: dict

    Start and end dates for the backtest/holdout training.

  • validation: dict

    Start and end dates for the backtest/holdout validation.

Each dict in the training and validation in backtest/holdout metadata is structured like:

  • start_date: datetime.datetime or None

    The datetime of the start of the chart data (inclusive). None if chart data is not computed.

  • end_date: datetime.datetime or None

    The datetime of the end of the chart data (exclusive). None if chart data is not computed.

class datarobot.models.datetime_trend_plots.AccuracyOverTimePlot

Accuracy over Time plot for datetime model.

Added in version v2.25.

Variables:
  • project_id (string) – The project ID.

  • model_id (string) – The model ID.

  • resolution (string) – The resolution that is used for binning. One of datarobot.enums.DATETIME_TREND_PLOTS_RESOLUTION

  • start_date (datetime.datetime) – The datetime of the start of the chart data (inclusive).

  • end_date (datetime.datetime) – The datetime of the end of the chart data (exclusive).

  • bins (list of dict) – List of plot bins. See bin info in Notes for more details.

  • statistics (dict) – Statistics for plot. See statistics info in Notes for more details.

  • calendar_events (list of dict) – List of calendar events for the plot. See calendar events info in Notes for more details.

Notes

Bin is a dict containing the following:

  • start_date: datetime.datetime

    The datetime of the start of the bin (inclusive).

  • end_date: datetime.datetime

    The datetime of the end of the bin (exclusive).

  • actual: float or None

    Average actual value of the target in the bin. None if there are no entries in the bin.

  • predicted: float or None

    Average prediction of the model in the bin. None if there are no entries in the bin.

  • frequency: int or None

    Indicates number of values averaged in bin.

Statistics is a dict containing the following:

  • durbin_watson: float or None

    The Durbin-Watson statistic for the chart data. Value is between 0 and 4. Durbin-Watson statistic is a test statistic used to detect the presence of autocorrelation at lag 1 in the residuals (prediction errors) from a regression analysis. More info https://wikipedia.org/wiki/Durbin%E2%80%93Watson_statistic

Calendar event is a dict containing the following:

  • name: string

    Name of the calendar event.

  • date: datetime

    Date of the calendar event.

  • series_id: string or None

    The series ID for the event. If this event does not specify a series ID, then this will be None, indicating that the event applies to all series.
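
A sketch of consuming the bins and statistics (plot is a hypothetical retrieved AccuracyOverTimePlot instance):

residuals = [
    b["actual"] - b["predicted"]
    for b in plot.bins
    if b["actual"] is not None and b["predicted"] is not None
]
print(plot.statistics["durbin_watson"])  # between 0 and 4; ~2 means little autocorrelation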

class datarobot.models.datetime_trend_plots.AccuracyOverTimePlotPreview

Accuracy over Time plot preview for datetime model.

Added in version v2.25.

Variables:
  • project_id (string) – The project ID.

  • model_id (string) – The model ID.

  • start_date (datetime.datetime) – The datetime of the start of the chart data (inclusive).

  • end_date (datetime.datetime) – The datetime of the end of the chart data (exclusive).

  • bins (list of dict) – List of plot bins. See bin info in Notes for more details.

Notes

Bin is a dict containing the following:

  • start_date: datetime.datetime

    The datetime of the start of the bin (inclusive).

  • end_date: datetime.datetime

    The datetime of the end of the bin (exclusive).

  • actual: float or None

    Average actual value of the target in the bin. None if there are no entries in the bin.

  • predicted: float or None

    Average prediction of the model in the bin. None if there are no entries in the bin.

class datarobot.models.datetime_trend_plots.ForecastVsActualPlotsMetadata

Forecast vs Actual plots metadata for datetime model.

Added in version v2.25.

Variables:
  • project_id (string) – The project ID.

  • model_id (string) – The model ID.

  • resolutions (list of string) – A list of datarobot.enums.DATETIME_TREND_PLOTS_RESOLUTION, which represents available time resolutions for which plots can be retrieved.

  • backtest_metadata (list of dict) – List of backtest metadata dicts. The list index of metadata dict is the backtest index. See backtest/holdout metadata info in Notes for more details.

  • holdout_metadata (dict) – Holdout metadata dict. See backtest/holdout metadata info in Notes for more details.

  • backtest_statuses (list of dict) – List of backtest statuses dict. The list index of status dict is the backtest index. See backtest/holdout status info in Notes for more details.

  • holdout_statuses (dict) – Holdout status dict. See backtest/holdout status info in Notes for more details.

Notes

Backtest/holdout status is a dict containing the following:

  • training: dict

    Dict containing each of datarobot.enums.DATETIME_TREND_PLOTS_STATUS as dict key, and list of forecast distances for particular status as dict value.

  • validation: dict

    Dict containing each of datarobot.enums.DATETIME_TREND_PLOTS_STATUS as dict key, and list of forecast distances for particular status as dict value.

Backtest/holdout metadata is a dict containing the following:

  • training: dict

    Start and end dates for the backtest/holdout training.

  • validation: dict

    Start and end dates for the backtest/holdout validation.

Each dict in the training and validation in backtest/holdout metadata is structured like:

  • start_date: datetime.datetime or None

    The datetime of the start of the chart data (inclusive). None if chart data is not computed.

  • end_date: datetime.datetime or None

    The datetime of the end of the chart data (exclusive). None if chart data is not computed.

class datarobot.models.datetime_trend_plots.ForecastVsActualPlot

Forecast vs Actual plot for datetime model.

Added in version v2.25.

Variables:
  • project_id (string) – The project ID.

  • model_id (string) – The model ID.

  • forecast_distances (list of int) – A list of forecast distances that were retrieved.

  • resolution (string) – The resolution that is used for binning. One of datarobot.enums.DATETIME_TREND_PLOTS_RESOLUTION

  • start_date (datetime.datetime) – The datetime of the start of the chart data (inclusive).

  • end_date (datetime.datetime) – The datetime of the end of the chart data (exclusive).

  • bins (list of dict) – List of plot bins. See bin info in Notes for more details.

  • calendar_events (list of dict) – List of calendar events for the plot. See calendar events info in Notes for more details.

Notes

Bin is a dict containing the following:

  • start_date: datetime.datetime

    The datetime of the start of the bin (inclusive).

  • end_date: datetime.datetime

    The datetime of the end of the bin (exclusive).

  • actual: float or None

    Average actual value of the target in the bin. None if there are no entries in the bin.

  • forecasts: list of float

    A list of average forecasts for the model for each forecast distance. Empty if there are no forecasts in the bin. Each index in the forecasts list maps to the same index in the forecast_distances list.

  • error: float or None

    Average absolute residual value of the bin. None if there are no entries in the bin.

  • normalized_error: float or None

    Normalized average absolute residual value of the bin. None if there are no entries in the bin.

  • frequency: int or None

    Indicates number of values averaged in bin.

Calendar event is a dict containing the following:

  • name: string

    Name of the calendar event.

  • date: datetime

    Date of the calendar event.

  • series_id: string or None

    The series ID for the event. If this event does not specify a series ID, then this will be None, indicating that the event applies to all series.
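
A sketch of pairing each bin's forecasts with their forecast distances (plot is a hypothetical retrieved ForecastVsActualPlot instance):

for b in plot.bins:
    for distance, forecast in zip(plot.forecast_distances, b["forecasts"]):
        print(b["start_date"], distance, forecast, b["actual"])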

class datarobot.models.datetime_trend_plots.ForecastVsActualPlotPreview

Forecast vs Actual plot preview for datetime model.

Added in version v2.25.

Variables:
  • project_id (string) – The project ID.

  • model_id (string) – The model ID.

  • start_date (datetime.datetime) – The datetime of the start of the chart data (inclusive).

  • end_date (datetime.datetime) – The datetime of the end of the chart data (exclusive).

  • bins (list of dict) – List of plot bins. See bin info in Notes for more details.

Notes

Bin is a dict containing the following:

  • start_date: datetime.datetime

    The datetime of the start of the bin (inclusive).

  • end_date: datetime.datetime

    The datetime of the end of the bin (exclusive).

  • actual: float or None

    Average actual value of the target in the bin. None if there are no entries in the bin.

  • predicted: float or None

    Average prediction of the model in the bin. None if there are no entries in the bin.

class datarobot.models.datetime_trend_plots.AnomalyOverTimePlotsMetadata

Anomaly over Time metadata for datetime model.

Added in version v2.25.

Variables:
  • project_id (string) – The project ID.

  • model_id (string) – The model ID.

  • resolutions (list of string) – A list of datarobot.enums.DATETIME_TREND_PLOTS_RESOLUTION, which represents available time resolutions for which plots can be retrieved.

  • backtest_metadata (list of dict) – List of backtest metadata dicts. The list index of metadata dict is the backtest index. See backtest/holdout metadata info in Notes for more details.

  • holdout_metadata (dict) – Holdout metadata dict. See backtest/holdout metadata info in Notes for more details.

  • backtest_statuses (list of dict) – List of backtest statuses dict. The list index of status dict is the backtest index. See backtest/holdout status info in Notes for more details.

  • holdout_statuses (dict) – Holdout status dict. See backtest/holdout status info in Notes for more details.

Notes

Backtest/holdout status is a dict containing the following:

  • training: string

    Status of the backtest/holdout training. One of datarobot.enums.DATETIME_TREND_PLOTS_STATUS

  • validation: string

    Status of the backtest/holdout validation. One of datarobot.enums.DATETIME_TREND_PLOTS_STATUS

Backtest/holdout metadata is a dict containing the following:

  • training: dict

    Start and end dates for the backtest/holdout training.

  • validation: dict

    Start and end dates for the backtest/holdout validation.

Each dict in the training and validation in backtest/holdout metadata is structured like:

  • start_date: datetime.datetime or None

    The datetime of the start of the chart data (inclusive). None if chart data is not computed.

  • end_date: datetime.datetime or None

    The datetime of the end of the chart data (exclusive). None if chart data is not computed.

class datarobot.models.datetime_trend_plots.AnomalyOverTimePlot

Anomaly over Time plot for datetime model.

Added in version v2.25.

Variables:
  • project_id (string) – The project ID.

  • model_id (string) – The model ID.

  • resolution (string) – The resolution that is used for binning. One of datarobot.enums.DATETIME_TREND_PLOTS_RESOLUTION

  • start_date (datetime.datetime) – The datetime of the start of the chart data (inclusive).

  • end_date (datetime.datetime) – The datetime of the end of the chart data (exclusive).

  • bins (list of dict) – List of plot bins. See bin info in Notes for more details.

  • calendar_events (list of dict) – List of calendar events for the plot. See calendar events info in Notes for more details.

Notes

Bin is a dict containing the following:

  • start_date: datetime.datetime

    The datetime of the start of the bin (inclusive).

  • end_date: datetime.datetime

    The datetime of the end of the bin (exclusive).

  • predicted: float or None

    Average prediction of the model in the bin. None if there are no entries in the bin.

  • frequency: int or None

    Indicates number of values averaged in bin.

Calendar event is a dict containing the following:

  • name: string

    Name of the calendar event.

  • date: datetime

    Date of the calendar event.

  • series_id: string or None

    The series ID for the event. If this event does not specify a series ID, then this will be None, indicating that the event applies to all series.

class datarobot.models.datetime_trend_plots.AnomalyOverTimePlotPreview

Anomaly over Time plot preview for datetime model.

Added in version v2.25.

Variables:
  • project_id (string) – The project ID.

  • model_id (string) – The model ID.

  • prediction_threshold (float) – Only bins with predictions exceeding this threshold are returned in the response.

  • start_date (datetime.datetime) – The datetime of the start of the chart data (inclusive).

  • end_date (datetime.datetime) – The datetime of the end of the chart data (exclusive).

  • bins (list of dict) – List of plot bins. See bin info in Notes for more details.

Notes

Bin is a dict containing the following:

  • start_date: datetime.datetime

    The datetime of the start of the bin (inclusive).

  • end_date: datetime.datetime

    The datetime of the end of the bin (exclusive).

External scores and insights

class datarobot.ExternalScores

Metric scores on a prediction dataset with a target column, or with an actual value column in the unsupervised case. Contains project metrics for supervised projects and a special set of classification metrics for unsupervised projects.

Added in version v2.21.

Variables:
  • project_id (str) – id of the project the model belongs to

  • model_id (str) – id of the model

  • dataset_id (str) – id of the prediction dataset with target or actual value column for unsupervised case

  • actual_value_column (Optional[str]) – For unsupervised projects only. Actual value column which was used to calculate the classification metrics and insights on the prediction dataset.

  • scores (list of dicts in a form of {'label': metric_name, 'value': score}) – Scores on the dataset.

Examples

List all scores for a dataset

from datarobot.models.external_dataset_scores_insights.external_scores import ExternalScores
scores = ExternalScores.list(project_id, dataset_id=dataset_id)
classmethod create(project_id, model_id, dataset_id, actual_value_column=None)

Compute external dataset insights for the specified model.

Parameters:
  • project_id (str) – id of the project the model belongs to

  • model_id (str) – id of the model for which insights are requested

  • dataset_id (str) – id of the dataset for which insights are requested

  • actual_value_column (Optional[str]) – actual values column label, for unsupervised projects only

Returns:

job – an instance of created async job

Return type:

Job

classmethod list(project_id, model_id=None, dataset_id=None, offset=0, limit=100)

Fetch external scores list for the project and optionally for model and dataset.

Parameters:
  • project_id (str) – id of the project

  • model_id (Optional[str]) – if specified, only scores for this model will be retrieved

  • dataset_id (Optional[str]) – if specified, only scores for this dataset will be retrieved

  • offset (Optional[int]) – this many results will be skipped, default: 0

  • limit (Optional[int]) – at most this many results are returned, default: 100, max 1000. To return all results, specify 0

Return type:

List[ExternalScores]

Returns:

A list of External Scores objects

classmethod get(project_id, model_id, dataset_id)

Retrieve external scores for the project, model and dataset.

Parameters:
  • project_id (str) – id of the project

  • model_id (str) – id of the model to retrieve scores for

  • dataset_id (str) – id of the dataset to retrieve scores for

Return type:

ExternalScores

Returns:

External Scores object
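
A hedged end-to-end sketch: compute the scores, wait for the job, then retrieve and index them by metric name (assumes project_id, model_id, and dataset_id refer to existing entities):

import datarobot as dr

# request computation of scores and insights on the external dataset
job = dr.ExternalScores.create(project_id, model_id, dataset_id)
job.wait_for_completion()

# retrieve the computed scores; each entry is {'label': metric_name, 'value': score}
scores = dr.ExternalScores.get(project_id, model_id, dataset_id)
scores_by_metric = {score["label"]: score["value"] for score in scores.scores}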

class datarobot.ExternalLiftChart

Lift chart for the model and prediction dataset with target or actual value column in unsupervised case.

Added in version v2.21.

LiftChartBin is a dict containing the following:

  • actual (float) Sum of actual target values in bin

  • predicted (float) Sum of predicted target values in bin

  • bin_weight (float) The weight of the bin. For weighted projects, it is the sum of the weights of the rows in the bin. For unweighted projects, it is the number of rows in the bin.

Variables:
  • dataset_id (str) – id of the prediction dataset with target or actual value column for unsupervised case

  • bins (list of dict) – List of dicts with schema described as LiftChartBin above.

classmethod list(project_id, model_id, dataset_id=None, offset=0, limit=100)

Retrieve list of the lift charts for the model.

Parameters:
  • project_id (str) – id of the project

  • model_id (str) – id of the model to retrieve lift charts for

  • dataset_id (Optional[str]) – if specified, only lift chart for this dataset will be retrieved

  • offset (Optional[int]) – this many results will be skipped, default: 0

  • limit (Optional[int]) – at most this many results are returned, default: 100, max 1000. To return all results, specify 0

Return type:

List[ExternalLiftChart]

Returns:

A list of ExternalLiftChart objects

classmethod get(project_id, model_id, dataset_id)

Retrieve lift chart for the model and prediction dataset.

Parameters:
  • project_id (str) – project id

  • model_id (str) – model id

  • dataset_id (str) – prediction dataset id with target or actual value column for unsupervised case

Return type:

ExternalLiftChart

Returns:

ExternalLiftChart object
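
A short sketch of retrieving a lift chart and inspecting its bins (assumes the insights were already computed via ExternalScores.create):

import datarobot as dr

lift_chart = dr.ExternalLiftChart.get(project_id, model_id, dataset_id)
for chart_bin in lift_chart.bins:
    print(chart_bin["actual"], chart_bin["predicted"], chart_bin["bin_weight"])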

class datarobot.ExternalRocCurve

ROC curve data for the model and prediction dataset with target or actual value column in unsupervised case.

Added in version v2.21.

Variables:
  • dataset_id (str) – id of the prediction dataset with target or actual value column for unsupervised case

  • roc_points (list of dict) – List of precalculated metrics associated with thresholds for ROC curve.

  • negative_class_predictions (list of float) – List of predictions from example for negative class

  • positive_class_predictions (list of float) – List of predictions from example for positive class

classmethod list(project_id, model_id, dataset_id=None, offset=0, limit=100)

Retrieve a list of ROC curves for the model.

Parameters:
  • project_id (str) – id of the project

  • model_id (str) – id of the model to retrieve ROC curves for

  • dataset_id (Optional[str]) – if specified, only the ROC curve for this dataset will be retrieved

  • offset (Optional[int]) – this many results will be skipped, default: 0

  • limit (Optional[int]) – at most this many results are returned, default: 100, max 1000. To return all results, specify 0

Return type:

List[ExternalRocCurve]

Returns:

A list of ExternalRocCurve objects

classmethod get(project_id, model_id, dataset_id)

Retrieve ROC curve chart for the model and prediction dataset.

Parameters:
  • project_id (str) – project id

  • model_id (str) – model id

  • dataset_id (str) – prediction dataset id with target or actual value column for unsupervised case

Return type:

ExternalRocCurve

Returns:

ExternalRocCurve object
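
A matching retrieval sketch for the ROC curve data (again assuming the insights were already computed):

import datarobot as dr

roc_curve = dr.ExternalRocCurve.get(project_id, model_id, dataset_id)
# each ROC point is a dict of precalculated metrics for one threshold
for point in roc_curve.roc_points:
    print(point)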

Feature association

class datarobot.models.FeatureAssociationMatrix

Feature association statistics for a project.

Notes

Projects created prior to v2.17 are not supported by this feature.

Variables:
  • project_id (str) – Id of the associated project.

  • strengths (list of dict) – Pairwise statistics for the available features as structured below.

  • features (list of dict) – Metadata for each feature and where it goes in the matrix.

Examples

import datarobot as dr

# retrieve feature association matrix
feature_association_matrix = dr.FeatureAssociationMatrix.get(project_id)
feature_association_matrix.strengths
feature_association_matrix.features

# retrieve feature association matrix for a metric, association type or a feature list
feature_association_matrix = dr.FeatureAssociationMatrix.get(
    project_id,
    metric=dr.enums.FEATURE_ASSOCIATION_METRIC.SPEARMAN,
    association_type=dr.enums.FEATURE_ASSOCIATION_TYPE.CORRELATION,
    featurelist_id=featurelist_id,
)
classmethod get(project_id, metric=None, association_type=None, featurelist_id=None)

Get feature association statistics.

Parameters:
  • project_id (str) – Id of the project that contains the requested associations.

  • metric (enums.FEATURE_ASSOCIATION_METRIC) – The name of a metric to get pairwise data for. Since v2.19 this is optional and defaults to enums.FEATURE_ASSOCIATION_METRIC.MUTUAL_INFO.

  • association_type (enums.FEATURE_ASSOCIATION_TYPE) – The type of dependence for the data. Since v2.19 this is optional and defaults to enums.FEATURE_ASSOCIATION_TYPE.ASSOCIATION.

  • featurelist_id (str or None) – Optional, the feature list to look up FAM data for. By default, the “Informative Features” or “Timeseries Informative Features” list is used, depending on the project type. (New in version v2.19)

Returns:

Feature association pairwise metric strength data, feature clustering data, and ordering data for Feature Association Matrix visualization.

Return type:

FeatureAssociationMatrix

classmethod create(project_id, featurelist_id)

Compute the Feature Association Matrix for a Feature List

Parameters:
  • project_id (str) – The ID of the project that the feature list belongs to.

  • featurelist_id (str) – The ID of the feature list for which insights are requested.

Returns:

status_check_job – Object contains all needed logic for a periodical status check of an async job.

Return type:

StatusCheckJob
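
A hedged sketch of computing and then fetching the matrix for a feature list; it assumes StatusCheckJob.wait_for_completion is available in your client version:

import datarobot as dr

status_check_job = dr.FeatureAssociationMatrix.create(project_id, featurelist_id)
status_check_job.wait_for_completion()

feature_association_matrix = dr.FeatureAssociationMatrix.get(
    project_id,
    featurelist_id=featurelist_id,
)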

Feature association matrix details

class datarobot.models.FeatureAssociationMatrixDetails

Plotting details for a pair of passed features present in the feature association matrix.

Notes

Projects created prior to v2.17 are not supported by this feature.

Variables:
  • project_id (str) – Id of the project that contains the requested associations.

  • chart_type (str) – Which type of plotting the pair of features gets in the UI. e.g. ‘HORIZONTAL_BOX’, ‘VERTICAL_BOX’, ‘SCATTER’ or ‘CONTINGENCY’

  • values (list) – The data triplets for pairwise plotting, e.g. {“values”: [[460.0, 428.5, 0.001], [1679.3, 259.0, 0.001], …]}. The first entry of each list is a value of feature1, the second is a value of feature2, and the third is the relative frequency of the pair of datapoints in the sample.

  • features (list) – A list of the requested features, [feature1, feature2]

  • types (list) – The type of feature1 and feature2. Possible values: “CATEGORICAL”, “NUMERIC”

  • featurelist_id (str) – Id of the feature list to lookup FAM details for.

classmethod get(project_id, feature1, feature2, featurelist_id=None)

Get a sample of the actual values used to measure the association between a pair of features

Added in version v2.17.

Parameters:
  • project_id (str) – Id of the project of interest.

  • feature1 (str) – Feature name for the first feature of interest.

  • feature2 (str) – Feature name for the second feature of interest.

  • featurelist_id (str) – Optional, the feature list to look up FAM data for. By default, the “Informative Features” or “Timeseries Informative Features” list is used, depending on the project type.

Returns:

The feature association plotting for provided pair of features.

Return type:

FeatureAssociationMatrixDetails
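
A brief usage sketch; the feature names are illustrative placeholders:

from datarobot.models import FeatureAssociationMatrixDetails

details = FeatureAssociationMatrixDetails.get(project_id, "feature_a", "feature_b")
print(details.chart_type)  # e.g. 'SCATTER'
# each entry is [value_of_feature1, value_of_feature2, relative_frequency]
for value1, value2, relative_frequency in details.values:
    print(value1, value2, relative_frequency)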

Feature association featurelists

class datarobot.models.FeatureAssociationFeaturelists

Featurelists with feature association matrix availability flags for a project.

Variables:
  • project_id (str) – Id of the project that contains the requested associations.

  • featurelists (list of dict) – The featurelists with the featurelist_id, title and the has_fam flag.

classmethod get(project_id)

Get featurelists with feature association status for each.

Parameters:

project_id (str) – Id of the project of interest.

Returns:

Featurelist with feature association status for each.

Return type:

FeatureAssociationFeaturelists
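
A quick sketch that lists which featurelists already have FAM data:

from datarobot.models import FeatureAssociationFeaturelists

fam_featurelists = FeatureAssociationFeaturelists.get(project_id)
for featurelist in fam_featurelists.featurelists:
    print(featurelist["featurelist_id"], featurelist["title"], featurelist["has_fam"])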

Feature effects

class datarobot.models.FeatureEffects

Feature Effects provides partial dependence and predicted vs actual values for the top 500 features, ordered by feature impact score.

The partial dependence shows the marginal effect of a feature on the target variable after accounting for the average effects of all other predictive features. It indicates how the value of this feature affects the prediction while all other features are held at their observed values.

Variables:
  • project_id (string) – The project that contains the requested model

  • model_id (string) – The model to retrieve Feature Effects for

  • source (string) – The source to retrieve Feature Effects for

  • data_slice_id (string or None) – The slice to retrieve Feature Effects for; if None, retrieve unsliced data

  • feature_effects (list) – Feature Effects for every feature

  • backtest_index (string, required only for DatetimeModels) – The backtest index to retrieve Feature Effects for.

Notes

Each entry of feature_effects is a dict containing the following:

  • feature_name (string) Name of the feature

  • feature_type (string) Feature type: numeric, categorical or datetime (see dr.enums.FEATURE_TYPE)

  • feature_impact_score (float) Feature impact score

  • weight_label (string) optional, Weight label if configured for the project else null

  • partial_dependence (List) Partial dependence results

  • predicted_vs_actual (List) optional, Predicted versus actual results, may be omitted if there are insufficient qualified samples

partial_dependence is a dict containing the following:

  • is_capped (bool) Indicates whether the data for computation is capped

  • data (List) partial dependence results in the following format

data is a list of dict containing the following:

  • label (string) Contains label for categorical and numeric features as string

  • dependence (float) Value of partial dependence

predicted_vs_actual is a dict containing the following:

  • is_capped (bool) Indicates whether the data for computation is capped

  • data (List) pred vs actual results in the following format

data is a list of dict containing the following:

  • label (string) Contains the label for categorical features; for numeric features, contains the range or numeric value.

  • bin (List) optional, For numeric features contains labels for left and right bin limits

  • predicted (float) Predicted value

  • actual (float) Actual value. Actual value is null for unsupervised timeseries models

  • row_count (int or float) Number of rows for the label and bin. Type is float if weight or exposure is set for the project.
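
A traversal sketch of this structure; it assumes the Feature Effects were retrieved through Model.get_feature_effect and that each entry of feature_effects is a dict shaped as described above:

import datarobot as dr

model = dr.Model.get(project_id, model_id)
feature_effects = model.get_feature_effect(source="validation")

for effect in feature_effects.feature_effects:
    print(effect["feature_name"], effect["feature_impact_score"])
    # partial dependence: one (label, dependence) pair per computed point
    for point in effect["partial_dependence"]["data"]:
        print(point["label"], point["dependence"])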

classmethod from_server_data(data, *args, use_insights_format=False, **kwargs)

Instantiate an object of this class using the data directly from the server, meaning that the keys may have the wrong camel casing.

Parameters:
  • data (dict) – The directly translated dict of JSON from the server. No casing fixes have taken place

  • use_insights_format (Optional[bool]) – Whether to repack the data from the format used in the GET /insights/featureEffects/ URL to the format used in the legacy URL.

class datarobot.models.FeatureEffectMetadata

Feature Effect Metadata for model, contains status and available model sources.

Notes

source is a required parameter for retrieving Feature Effects; one of the provided sources must be used.

class datarobot.models.FeatureEffectMetadataDatetime

Feature Effect Metadata for datetime model, contains list of feature effect metadata per backtest.

Notes

feature effect metadata per backtest contains:

  • status : str.

  • backtest_index : str.

  • sources : List[str].

source is a required parameter for retrieving Feature Effects; one of the provided sources must be used.

backtest_index is a required parameter for submitting a compute request and retrieving Feature Effects; one of the provided backtest indexes must be used.

Variables:

data (list[FeatureEffectMetadataDatetimePerBacktest]) – List feature effect metadata per backtest

class datarobot.models.FeatureEffectMetadataDatetimePerBacktest

Feature Effect metadata for a single backtest, containing backtest_index, status and sources.

Payoff matrix

class datarobot.models.PayoffMatrix

Represents a Payoff Matrix, a costs/benefit scenario used for creating a profit curve.

Variables:
  • project_id (str) – id of the project with which the payoff matrix is associated.

  • id (str) – id of the payoff matrix.

  • name (str) – User-supplied label for the payoff matrix.

  • true_positive_value (float) – Cost or benefit of a true positive classification

  • true_negative_value (float) – Cost or benefit of a true negative classification

  • false_positive_value (float) – Cost or benefit of a false positive classification

  • false_negative_value (float) – Cost or benefit of a false negative classification

Examples

import datarobot as dr

# create a payoff matrix
payoff_matrix = dr.PayoffMatrix.create(
    project_id,
    name,
    true_positive_value=100,
    true_negative_value=10,
    false_positive_value=0,
    false_negative_value=-10,
)

# list available payoff matrices
payoff_matrices = dr.PayoffMatrix.list(project_id)
payoff_matrix = payoff_matrices[0]
classmethod create(project_id, name, true_positive_value=1, true_negative_value=1, false_positive_value=-1, false_negative_value=-1)

Create a payoff matrix associated with a specific project.

Parameters:
  • project_id (str) – id of the project with which the payoff matrix will be associated

  • name (str) – User-supplied label for the payoff matrix

  • true_positive_value (Optional[float]) – Cost or benefit of a true positive classification. Default: 1.

  • true_negative_value (Optional[float]) – Cost or benefit of a true negative classification. Default: 1.

  • false_positive_value (Optional[float]) – Cost or benefit of a false positive classification. Default: -1.

  • false_negative_value (Optional[float]) – Cost or benefit of a false negative classification. Default: -1.

Returns:

payoff_matrix – The newly created payoff matrix

Return type:

PayoffMatrix

classmethod list(project_id)

Fetch all the payoff matrices for a project.

Parameters:

project_id (str) – id of the project

Returns:

A list of PayoffMatrix objects

Return type:

List of PayoffMatrix

classmethod get(project_id, id)

Retrieve a specified payoff matrix.

Parameters:
  • project_id (str) – id of the project the model belongs to

  • id (str) – id of the payoff matrix

Return type:

PayoffMatrix

Returns:

The queried payoff matrix.

classmethod update(project_id, id, name, true_positive_value, true_negative_value, false_positive_value, false_negative_value)

Update (replace) a payoff matrix. Note that all data fields are required.

Parameters:
  • project_id (str) – id of the project to which the payoff matrix belongs

  • id (str) – id of the payoff matrix

  • name (str) – User-supplied label for the payoff matrix

  • true_positive_value (float) – True positive payoff value to use for the profit curve

  • true_negative_value (float) – True negative payoff value to use for the profit curve

  • false_positive_value (float) – False positive payoff value to use for the profit curve

  • false_negative_value (float) – False negative payoff value to use for the profit curve

Returns:

PayoffMatrix with updated values

Return type:

PayoffMatrix
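
For example, to rebalance the penalty for false negatives (a hedged sketch; every field must be resupplied because update replaces the whole matrix):

import datarobot as dr

payoff_matrix = dr.PayoffMatrix.update(
    project_id,
    payoff_matrix_id,
    name="Updated payoff matrix",
    true_positive_value=100,
    true_negative_value=10,
    false_positive_value=0,
    false_negative_value=-50,
)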

classmethod delete(project_id, id)

Delete a specified payoff matrix.

Parameters:
  • project_id (str) – id of the project the model belongs to

  • id (str) – id of the payoff matrix

Returns:

response – Empty response (204)

Return type:

requests.Response

classmethod from_data(data)

Instantiate an object of this class using a dict.

Parameters:

data (dict) – Correctly snake_cased keys and their values.

Return type:

TypeVar(T, bound= APIObject)

classmethod from_server_data(data, keep_attrs=None)

Instantiate an object of this class using the data directly from the server, meaning that the keys may have the wrong camel casing

Parameters:
  • data (dict) – The directly translated dict of JSON from the server. No casing fixes have taken place

  • keep_attrs (iterable) – List, set or tuple of the dotted namespace notations for attributes to keep within the object structure even if their values are None

Return type:

TypeVar(T, bound= APIObject)

Prediction explanations

class datarobot.PredictionExplanationsInitialization

Represents a prediction explanations initialization of a model.

Variables:
  • project_id (str) – id of the project the model belongs to

  • model_id (str) – id of the model the prediction explanations initialization is for

  • prediction_explanations_sample (list of dict) – a small sample of prediction explanations that could be generated for the model

classmethod get(project_id, model_id)

Retrieve the prediction explanations initialization for a model.

Prediction explanations initializations are a prerequisite for computing prediction explanations, and include a sample of what the computed prediction explanations for a prediction dataset would look like.

Parameters:
  • project_id (str) – id of the project the model belongs to

  • model_id (str) – id of the model the prediction explanations initialization is for

Returns:

prediction_explanations_initialization – The queried instance.

Return type:

PredictionExplanationsInitialization

Raises:

ClientError – If the project or model does not exist or the initialization has not been computed.

classmethod create(project_id, model_id)

Create a prediction explanations initialization for the specified model.

Parameters:
  • project_id (str) – id of the project the model belongs to

  • model_id (str) – id of the model for which initialization is requested

Returns:

job – an instance of created async job

Return type:

Job

delete()

Delete this prediction explanations initialization.

class datarobot.PredictionExplanations

Represents prediction explanations metadata and provides access to computation results.

Examples

prediction_explanations = dr.PredictionExplanations.get(project_id, explanations_id)
for row in prediction_explanations.get_rows():
    print(row)  # row is an instance of PredictionExplanationsRow
Variables:
  • id (str) – id of the record and prediction explanations computation result

  • project_id (str) – id of the project the model belongs to

  • model_id (str) – id of the model the prediction explanations are for

  • dataset_id (str) – id of the prediction dataset prediction explanations were computed for

  • max_explanations (int) – maximum number of prediction explanations to supply per row of the dataset

  • threshold_low (float) – the lower threshold, below which a prediction must score in order for prediction explanations to be computed for a row in the dataset

  • threshold_high (float) – the high threshold, above which a prediction must score in order for prediction explanations to be computed for a row in the dataset

  • num_columns (int) – the number of columns prediction explanations were computed for

  • finish_time (float) – timestamp referencing when computation for these prediction explanations finished

  • prediction_explanations_location (str) – where to retrieve the prediction explanations

  • source (str) – For OTV/TS in-training predictions. Holds the portion of the training dataset used to generate predictions.

classmethod get(project_id, prediction_explanations_id)

Retrieve a specific prediction explanations metadata.

Parameters:
  • project_id (str) – id of the project the explanations belong to

  • prediction_explanations_id (str) – id of the prediction explanations

Returns:

prediction_explanations – The queried instance.

Return type:

PredictionExplanations

classmethod create(project_id, model_id, dataset_id, max_explanations=None, threshold_low=None, threshold_high=None, mode=None)

Create prediction explanations for the specified dataset.

In order to create PredictionExplanations for a particular model and dataset, you must first:

  • Compute feature impact for the model via datarobot.Model.get_feature_impact()

  • Compute a PredictionExplanationsInitialization for the model via datarobot.PredictionExplanationsInitialization.create(project_id, model_id)

  • Compute predictions for the model and dataset via datarobot.Model.request_predictions(dataset_id)

threshold_high and threshold_low are optional filters applied to speed up computation. When at least one is specified, only the selected outlier rows will have prediction explanations computed. Rows are considered to be outliers if their predicted value (in case of regression projects) or probability of being the positive class (in case of classification projects) is less than threshold_low or greater than threshold_high. If neither is specified, prediction explanations will be computed for all rows. A workflow sketch follows the parameter list below.

Parameters:
  • project_id (str) – id of the project the model belongs to

  • model_id (str) – id of the model for which prediction explanations are requested

  • dataset_id (str) – id of the prediction dataset for which prediction explanations are requested

  • threshold_low (Optional[float]) – the lower threshold, below which a prediction must score in order for prediction explanations to be computed for a row in the dataset. If neither threshold_high nor threshold_low is specified, prediction explanations will be computed for all rows.

  • threshold_high (Optional[float]) – the high threshold, above which a prediction must score in order for prediction explanations to be computed. If neither threshold_high nor threshold_low is specified, prediction explanations will be computed for all rows.

  • max_explanations (Optional[int]) – the maximum number of prediction explanations to supply per row of the dataset, default: 3.

  • mode (PredictionExplanationsMode, optional) – Mode of calculation for multiclass models. If not specified, the server default is to explain only the predicted class, identical to passing TopPredictionsMode(1).

Returns:

job – an instance of created async job

Return type:

Job
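
A hedged sketch of the full workflow described above, from prerequisites to retrieving the computed explanations (assumes project_id, model_id and dataset_id refer to existing entities):

import datarobot as dr

model = dr.Model.get(project_id, model_id)

# 1. compute feature impact for the model
model.request_feature_impact().wait_for_completion()

# 2. compute the prediction explanations initialization
dr.PredictionExplanationsInitialization.create(project_id, model_id).wait_for_completion()

# 3. compute predictions for the uploaded dataset
model.request_predictions(dataset_id).wait_for_completion()

# 4. request the prediction explanations themselves
job = dr.PredictionExplanations.create(
    project_id, model_id, dataset_id, max_explanations=5
)
prediction_explanations = job.get_result_when_complete()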

classmethod create_on_training_data(project_id, model_id, dataset_id, max_explanations=None, threshold_low=None, threshold_high=None, mode=None, datetime_prediction_partition=None)

Create prediction explanations for the dataset used to train the model. This can be retrieved by calling dr.Model.get().featurelist_id. For OTV and time series projects, datetime_prediction_partition is required and limited to the first backtest (‘0’) or holdout (‘holdout’).

In order to create PredictionExplanations for a particular model and dataset, you must first:

  • Compute Feature Impact for the model via datarobot.Model.get_feature_impact().

  • Compute a PredictionExplanationsInitialization for the model via datarobot.PredictionExplanationsInitialization.create(project_id, model_id).

  • Compute predictions for the model and dataset via datarobot.Model.request_predictions(dataset_id).

threshold_high and threshold_low are optional filters applied to speed up computation. When at least one is specified, only the selected outlier rows will have prediction explanations computed. Rows are considered to be outliers if their predicted value (in case of regression projects) or probability of being the positive class (in case of classification projects) is less than threshold_low or greater than threshold_high. If neither is specified, prediction explanations will be computed for all rows.

Parameters:
  • project_id (str) – The ID of the project the model belongs to.

  • model_id (str) – The ID of the model for which prediction explanations are requested.

  • dataset_id (str) – The ID of the prediction dataset for which prediction explanations are requested.

  • threshold_low (Optional[float]) – The lower threshold, below which a prediction must score in order for prediction explanations to be computed for a row in the dataset. If neither threshold_high nor threshold_low is specified, prediction explanations will be computed for all rows.

  • threshold_high (Optional[float]) – The high threshold, above which a prediction must score in order for prediction explanations to be computed. If neither threshold_high nor threshold_low is specified, prediction explanations will be computed for all rows.

  • max_explanations (Optional[int]) – The maximum number of prediction explanations to supply per row of the dataset (default: 3).

  • mode (PredictionExplanationsMode, optional) – The mode of calculation for multiclass models. If not specified, the server default is to explain only the predicted class, identical to passing TopPredictionsMode(1).

  • datetime_prediction_partition (str) – Options: ‘0’, ‘holdout’ or None. Used only by time series and OTV projects to indicate which part of the dataset will be used to generate predictions for computing prediction explanations. Current options are ‘0’ (first backtest) and ‘holdout’. Note that only the validation partition of the first backtest will be used to generate predictions.

Returns:

job – An instance of created async job.

Return type:

Job

classmethod list(project_id, model_id=None, limit=None, offset=None)

List of prediction explanations metadata for a specified project.

Parameters:
  • project_id (str) – id of the project to list prediction explanations for

  • model_id (Optional[str]) – if specified, only prediction explanations computed for this model will be returned

  • limit (int or None) – at most this many results are returned, default: no limit

  • offset (int or None) – this many results will be skipped, default: 0

Returns:

prediction_explanations

Return type:

list[PredictionExplanations]

get_rows(batch_size=None, exclude_adjusted_predictions=True)

Retrieve prediction explanations rows.

Parameters:
  • batch_size (int or None, optional) – maximum number of prediction explanations rows to retrieve per request

  • exclude_adjusted_predictions (bool) – Optional, defaults to True. Set to False to include adjusted predictions, which will differ from the predictions on some projects, e.g. those with an exposure column specified.

Yields:

prediction_explanations_row (PredictionExplanationsRow) – Represents prediction explanations computed for a prediction row.

is_multiclass()

Whether these explanations are for a multiclass project or a non-multiclass project

is_unsupervised_clustering_or_multiclass()

Clustering and multiclass XEMP explanations always have exactly one of the num_top_classes or class_names parameters set.

get_number_of_explained_classes()

How many classes we attempt to explain for each row

get_all_as_dataframe(exclude_adjusted_predictions=True)

Retrieve all prediction explanations rows and return them as a pandas.DataFrame.

Returned dataframe has the following structure:

  • row_id : row id from prediction dataset

  • prediction : the output of the model for this row

  • adjusted_prediction : adjusted prediction values (only appears for projects that utilize prediction adjustments, e.g. projects with an exposure column)

  • class_0_label : a class level from the target (only appears for classification projects)

  • class_0_probability : the probability that the target is this class (only appears for classification projects)

  • class_1_label : a class level from the target (only appears for classification projects)

  • class_1_probability : the probability that the target is this class (only appears for classification projects)

  • explanation_0_feature : the name of the feature contributing to the prediction for this explanation

  • explanation_0_feature_value : the value the feature took on

  • explanation_0_label : the output being driven by this explanation. For regression projects, this is the name of the target feature. For classification projects, this is the class label whose probability increasing would correspond to a positive strength.

  • explanation_0_qualitative_strength : a human-readable description of how strongly the feature affected the prediction (e.g. ‘+++’, ‘--’, ‘+’) for this explanation

  • explanation_0_per_ngram_text_explanations : Text prediction explanations data in json formatted string.

  • explanation_0_strength : the amount this feature’s value affected the prediction

  • explanation_N_feature : the name of the feature contributing to the prediction for this explanation

  • explanation_N_feature_value : the value the feature took on

  • explanation_N_label : the output being driven by this explanation. For regression projects, this is the name of the target feature. For classification projects, this is the class label whose probability increasing would correspond to a positive strength.

  • explanation_N_qualitative_strength : a human-readable description of how strongly the feature affected the prediction (e.g. ‘+++’, ‘--’, ‘+’) for this explanation

  • explanation_N_per_ngram_text_explanations : Text prediction explanations data in json formatted string.

  • explanation_N_strength : the amount this feature’s value affected the prediction

For classification projects, the server does not guarantee any ordering on the prediction values, however within this function we sort the values so that class_X corresponds to the same class from row to row.

Parameters:

exclude_adjusted_predictions (bool) – Optional, defaults to True. Set this to False to include adjusted prediction values in the returned dataframe.

Returns:

dataframe

Return type:

pandas.DataFrame
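
For instance, reusing the prediction_explanations object from the class example:

df = prediction_explanations.get_all_as_dataframe()
# inspect the strongest explanation for each row
print(df[["row_id", "prediction", "explanation_0_feature", "explanation_0_strength"]])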

download_to_csv(filename, encoding='utf-8', exclude_adjusted_predictions=True)

Save prediction explanations rows into CSV file.

Parameters:
  • filename (str or file object) – path or file object to save prediction explanations rows

  • encoding (string, optional) – A string representing the encoding to use in the output file, defaults to ‘utf-8’

  • exclude_adjusted_predictions (bool) – Optional, defaults to True. Set to False to include adjusted predictions, which will differ from the predictions on some projects, e.g. those with an exposure column specified.

get_prediction_explanations_page(limit=None, offset=None, exclude_adjusted_predictions=True)

Get prediction explanations.

If you don’t want to use the generator interface, you can access paginated prediction explanations directly.

Parameters:
  • limit (int or None) – the number of records to return, the server will use a (possibly finite) default if not specified

  • offset (int or None) – the number of records to skip, default 0

  • exclude_adjusted_predictions (bool) – Optional, defaults to True. Set to False to include adjusted predictions, which will differ from the predictions on some projects, e.g. those with an exposure column specified.

Returns:

prediction_explanations

Return type:

PredictionExplanationsPage

delete()

Delete these prediction explanations.

class datarobot.models.prediction_explanations.PredictionExplanationsRow

Represents prediction explanations computed for a prediction row.

Notes

PredictionValue contains:

  • label : describes what this model output corresponds to. For regression projects, it is the name of the target feature. For classification projects, it is a level from the target feature.

  • value : the output of the prediction. For regression projects, it is the predicted value of the target. For classification projects, it is the predicted probability the row belongs to the class identified by the label.

PredictionExplanation contains:

  • label : describes what output was driven by this explanation. For regression projects, it is the name of the target feature. For classification projects, it is the class whose probability increasing would correspond to a positive strength of this prediction explanation.

  • feature : the name of the feature contributing to the prediction

  • feature_value : the value the feature took on for this row

  • strength : the amount this feature’s value affected the prediction

  • qualitative_strength : a human-readable description of how strongly the feature affected the prediction. A large positive effect is denoted ‘+++’, medium ‘++’, small ‘+’, very small ‘<+’. A large negative effect is denoted ‘---’, medium ‘--’, small ‘-’, very small ‘<-’.

Variables:
  • row_id (int) – which row this PredictionExplanationsRow describes

  • prediction (float) – the output of the model for this row

  • adjusted_prediction (float or None) – adjusted prediction value for projects that provide this information, None otherwise

  • prediction_values (list) – an array of dictionaries with a schema described as PredictionValue

  • adjusted_prediction_values (list) – same as prediction_values but for adjusted predictions

  • prediction_explanations (list) – an array of dictionaries with a schema described as PredictionExplanation

class datarobot.models.prediction_explanations.PredictionExplanationsPage

Represents a batch of prediction explanations received by one request.

Variables:
  • id (str) – id of the prediction explanations computation result

  • data (list[dict]) – list of raw prediction explanations; each row corresponds to a row of the prediction dataset

  • count (int) – total number of rows computed

  • previous_page (str) – where to retrieve previous page of prediction explanations, None if current page is the first

  • next_page (str) – where to retrieve next page of prediction explanations, None if current page is the last

  • prediction_explanations_record_location (str) – where to retrieve the prediction explanations metadata

  • adjustment_method (str) – Adjustment method that was applied to predictions, or ‘N/A’ if no adjustments were done.

classmethod get(project_id, prediction_explanations_id, limit=None, offset=0, exclude_adjusted_predictions=True)

Retrieve prediction explanations.

Parameters:
  • project_id (str) – id of the project the model belongs to

  • prediction_explanations_id (str) – id of the prediction explanations

  • limit (int or None) – the number of records to return; the server will use a (possibly finite) default if not specified

  • offset (int or None) – the number of records to skip, default 0

  • exclude_adjusted_predictions (bool) – Optional, defaults to True. Set to False to include adjusted predictions, which will differ from the predictions on some projects, e.g. those with an exposure column specified.

Returns:

prediction_explanations – The queried instance.

Return type:

PredictionExplanationsPage

class datarobot.models.ShapMatrix

Represents SHAP based prediction explanations and provides access to score values.

Variables:
  • project_id (str) – id of the project the model belongs to

  • shap_matrix_id (str) – id of the generated SHAP matrix

  • model_id (str) – id of the model used to compute the SHAP matrix

  • dataset_id (str) – id of the prediction dataset SHAP values were computed for

Examples

import datarobot as dr

# request SHAP matrix calculation
shap_matrix_job = dr.ShapMatrix.create(project_id, model_id, dataset_id)
shap_matrix = shap_matrix_job.get_result_when_complete()

# list available SHAP matrices
shap_matrices = dr.ShapMatrix.list(project_id)
shap_matrix = shap_matrices[0]

# get SHAP matrix as dataframe
shap_matrix_values = shap_matrix.get_as_dataframe()
classmethod create(project_id, model_id, dataset_id)

Calculate SHAP based prediction explanations against a previously uploaded dataset.

Parameters:
  • project_id (str) – id of the project the model belongs to

  • model_id (str) – id of the model for which prediction explanations are requested

  • dataset_id (str) – id of the prediction dataset for which prediction explanations are requested (as uploaded from Project.upload_dataset)

Returns:

job – The job computing the SHAP based prediction explanations

Return type:

ShapMatrixJob

Raises:
  • ClientError – If the server responded with 4xx status. Possible reasons are project, model or dataset don’t exist, user is not allowed or model doesn’t support SHAP based prediction explanations

  • ServerError – If the server responded with 5xx status

classmethod list(project_id)

Fetch all the computed SHAP prediction explanations for a project.

Parameters:

project_id (str) – id of the project

Returns:

A list of ShapMatrix objects

Return type:

List of ShapMatrix

classmethod get(project_id, id)

Retrieve a specific SHAP matrix.

Parameters:
  • project_id (str) – id of the project the model belongs to

  • id (str) – id of the SHAP matrix

Returns:

ShapMatrix object representing the specified record

Return type:

ShapMatrix

get_as_dataframe(read_timeout=60)

Retrieve SHAP matrix values as a dataframe.

Parameters:

read_timeout (Optional[int], default 60) – Wait this many seconds for the server to respond. (Added in version v2.29.)

Returns:

dataframe – A dataframe with SHAP scores

Return type:

pandas.DataFrame

class datarobot.models.ClassListMode

Calculate prediction explanations for the specified classes in each row.

Variables:

class_names (list) – List of class names that will be explained for each dataset row.

get_api_parameters(batch_route=False)

Get parameters passed in corresponding API call

Parameters:

batch_route (bool) – Batch routes describe prediction calls with all possible parameters, so explanation parameters carry a prefix to distinguish them from the others.

Return type:

dict

class datarobot.models.TopPredictionsMode

Calculate prediction explanations for the number of top predicted classes in each row.

Variables:

num_top_classes (int) – Number of top predicted classes [1..10] that will be explained for each dataset row.

get_api_parameters(batch_route=False)

Get parameters passed in corresponding API call

Parameters:

batch_route (bool) – Batch routes describe prediction calls with all possible parameters, so explanation parameters carry a prefix to distinguish them from the others.

Return type:

dict
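
A hedged sketch showing how either mode is passed to PredictionExplanations.create for a multiclass model (the class names are illustrative):

import datarobot as dr
from datarobot.models import ClassListMode, TopPredictionsMode

# explain the three most probable classes in each row
job = dr.PredictionExplanations.create(
    project_id, model_id, dataset_id, mode=TopPredictionsMode(3)
)

# or explain a fixed set of classes in each row
job = dr.PredictionExplanations.create(
    project_id, model_id, dataset_id, mode=ClassListMode(["setosa", "virginica"])
)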

Rating table

class datarobot.models.RatingTable

Interface to modify and download rating tables.

Variables:
  • id (str) – The id of the rating table.

  • project_id (str) – The id of the project this rating table belongs to.

  • rating_table_name (str) – The name of the rating table.

  • original_filename (str) – The name of the file used to create the rating table.

  • parent_model_id (str) – The model id of the model the rating table was validated against.

  • model_id (str) – The model id of the model that was created from the rating table. Can be None if a model has not been created from the rating table.

  • model_job_id (str) – The id of the job to create a model from this rating table. Can be None if a model has not been created from the rating table.

  • validation_job_id (str) – The id of the created job to validate the rating table. Can be None if the rating table has not been validated.

  • validation_error (str) – Contains a description of any errors caused during validation.

classmethod from_server_data(data, should_warn=True, keep_attrs=None)

Instantiate an object of this class using the data directly from the server, meaning that the keys may have the wrong camel casing

Parameters:
  • data (dict) – The directly translated dict of JSON from the server. No casing fixes have taken place

  • should_warn (bool) – Whether or not to issue a warning if an invalid rating table is being retrieved.

Return type:

RatingTable

classmethod get(project_id, rating_table_id)

Retrieve a single rating table

Parameters:
  • project_id (str) – The ID of the project the rating table is associated with.

  • rating_table_id (str) – The ID of the rating table

Returns:

rating_table – The queried instance

Return type:

RatingTable

classmethod create(project_id, parent_model_id, filename, rating_table_name='Uploaded Rating Table')

Uploads and validates a new rating table CSV

Parameters:
  • project_id (str) – id of the project the rating table belongs to

  • parent_model_id (str) – id of the model against which this rating table should be validated

  • filename (str) – The path of the CSV file containing the modified rating table.

  • rating_table_name (Optional[str]) – A human friendly name for the new rating table. The string may be truncated and a suffix may be added to maintain unique names of all rating tables.

Returns:

job – an instance of created async job

Return type:

Job

download(filepath)

Download a csv file containing the contents of this rating table

Parameters:

filepath (str) – The path at which to save the rating table file.

Return type:

None

rename(rating_table_name)

Renames the rating table.

Parameters:

rating_table_name (str) – The new name to rename the rating table to.

Return type:

None

create_model()

Creates a new model from this rating table record. This rating table must not already be associated with a model and must be valid.

Returns:

job – an instance of created async job

Return type:

Job

Raises:
  • ClientError – Raised if creating model from a RatingTable that failed validation

  • JobAlreadyRequested – Raised if creating model from a RatingTable that is already associated with a RatingTableModel
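
An end-to-end sketch of uploading a modified rating table and building a model from it (the file path and name are illustrative):

from datarobot.models import RatingTable

upload_job = RatingTable.create(
    project_id,
    parent_model_id,
    "./modified_rating_table.csv",
    rating_table_name="Modified Rating Table",
)
rating_table = upload_job.get_result_when_complete()

model_job = rating_table.create_model()
rating_table_model = model_job.get_result_when_complete()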

ROC curve

class datarobot.models.roc_curve.RocCurve

ROC curve data for model.

Variables:
  • source (str) – ROC curve data source. Can be ‘validation’, ‘crossValidation’ or ‘holdout’.

  • roc_points (list of dict) – List of precalculated metrics associated with thresholds for ROC curve.

  • negative_class_predictions (list of float) – List of predictions from example for negative class

  • positive_class_predictions (list of float) – List of predictions from example for positive class

  • source_model_id (str) – ID of the model this ROC curve represents; in some cases, insights from the parent of a frozen model may be used

  • data_slice_id (str) – ID of the data slice this ROC curve represents.

classmethod from_server_data(data, keep_attrs=None, use_insights_format=False, **kwargs)

Override APIObject.from_server_data to handle ROC curve data retrieved from either the legacy URL or the new /insights/ URL.

Parameters:
  • data (dict) – The directly translated dict of JSON from the server. No casing fixes have taken place.

  • keep_attrs (iterable) – List, set or tuple of the dotted namespace notations for attributes to keep within the object structure even if their values are None

  • use_insights_format (Optional[bool]) – Whether to repack the data from the format used in the GET /insights/RocCur/ URL to the format used in the legacy URL.

Return type:

RocCurve
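
ROC curve data is typically obtained through the model itself; a hedged sketch, assuming Model.get_roc_curve is available and that roc_points expose 'f1_score' and 'threshold' keys:

import datarobot as dr

model = dr.Model.get(project_id, model_id)
roc_curve = model.get_roc_curve("validation")

# pick the threshold with the best F1 score among the precalculated points
best_point = max(roc_curve.roc_points, key=lambda point: point["f1_score"])
print(best_point["threshold"])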

class datarobot.models.roc_curve.LabelwiseRocCurve

Labelwise ROC curve data for one label and one source.

Variables:
  • source (str) – ROC curve data source. Can be ‘validation’, ‘crossValidation’ or ‘holdout’.

  • roc_points (list of dict) – List of precalculated metrics associated with thresholds for ROC curve.

  • negative_class_predictions (list of float) – List of predictions from example for negative class

  • positive_class_predictions (list of float) – List of predictions from example for positive class

  • source_model_id (str) – ID of the model this ROC curve represents; in some cases, insights from the parent of a frozen model may be used

  • label (str) – Label name for which this labelwise ROC curve was computed

  • kolmogorov_smirnov_metric (float) – Kolmogorov-Smirnov metric value for label

  • auc (float) – AUC metric value for label

Word Cloud

class datarobot.models.word_cloud.WordCloud

Word cloud data for the model.

Notes

WordCloudNgram is a dict containing the following:

  • ngram (str) Word or ngram value.

  • coefficient (float) Value from the [-1.0, 1.0] range describing the effect of this ngram on the target. A large negative value means a strong effect toward the negative class in classification and a smaller target value in regression models; a large positive value means a strong effect toward the positive class and a bigger target value, respectively.

  • count (int) Number of rows in the training sample where this ngram appears.

  • frequency (float) Value from (0.0, 1.0] range, relative frequency of given ngram to most frequent ngram.

  • is_stopword (bool) True for ngrams that DataRobot evaluates as stopwords.

  • class (str or None) For classification - values of the target class for corresponding word or ngram. For regression - None.

Variables:

ngrams (list of dict) – List of dicts with schema described as WordCloudNgram above.

most_frequent(top_n=5)

Return most frequent ngrams in the word cloud.

Parameters:

top_n (int) – Number of ngrams to return

Returns:

Up to top_n most frequent ngrams in the word cloud, sorted by frequency in descending order. If top_n is bigger than the total number of ngrams in the word cloud, all ngrams are returned.

Return type:

list of dict

most_important(top_n=5)

Return most important ngrams in the word cloud.

Parameters:

top_n (int) – Number of ngrams to return

Returns:

Up to top_n most important ngrams in the word cloud, sorted by absolute coefficient value in descending order. If top_n is bigger than the total number of ngrams in the word cloud, all ngrams are returned.

Return type:

list of dict

ngrams_per_class()

Split ngrams per target class values. Useful for multiclass models.

Returns:

Dictionary in the format of (class label) -> (list of ngrams for that class)

Return type:

dict
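
A retrieval and inspection sketch, assuming the word cloud is fetched via Model.get_word_cloud:

import datarobot as dr

model = dr.Model.get(project_id, model_id)
word_cloud = model.get_word_cloud(exclude_stop_words=True)

# ngrams with the largest absolute coefficients
for ngram in word_cloud.most_important(top_n=10):
    print(ngram["ngram"], ngram["coefficient"], ngram["count"])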

class datarobot.models.word_cloud.WordCloudNgram