Models

Generic models

class datarobot.models.GenericModel

GenericModel (also known as ModelRecord) is the object returned from the /modelRecords list route. It contains the most generic model information.

Model

class datarobot.models.Model

A model trained on a project’s dataset capable of making predictions.

All durations are specified with a duration string such as those returned by the partitioning_methods.construct_duration_string helper method. See datetime partitioned project documentation for more information on duration strings.

Variables:
  • id (str) – ID of the model.

  • project_id (str) – ID of the project the model belongs to.

  • processes (List[str]) – Processes used by the model.

  • featurelist_name (str) – Name of the featurelist used by the model.

  • featurelist_id (str) – ID of the featurelist used by the model.

  • sample_pct (float or None) – Percentage of the project dataset used in model training. If the project uses datetime partitioning, the sample_pct will be None. See training_row_count, training_duration, and training_start_date / training_end_date instead.

  • training_row_count (int or None) – Number of rows of the project dataset used in model training. In a datetime partitioned project, if specified, defines the number of rows used to train the model and evaluate backtest scores; if unspecified, either training_duration or training_start_date and training_end_date is used to determine the training data instead.

  • training_duration (str or None) – For datetime partitioned projects only. If specified, defines the duration spanned by the data used to train the model and evaluate backtest scores.

  • training_start_date (datetime or None) – For frozen models in datetime partitioned projects only. If specified, the start date of the data used to train the model.

  • training_end_date (datetime or None) – For frozen models in datetime partitioned projects only. If specified, the end date of the data used to train the model.

  • model_type (str) – Type of model, for example ‘Nystroem Kernel SVM Regressor’.

  • model_category (str) – Category of model, for example ‘prime’ for DataRobot Prime models, ‘blend’ for blender models, and ‘model’ for other models.

  • is_frozen (bool) – Whether this model is a frozen model.

  • is_n_clusters_dynamically_determined (bool) – (New in version v2.27) Optional. Whether this model determines the number of clusters dynamically.

  • blueprint_id (str) – ID of the blueprint used to build this model.

  • metrics (dict) – Mapping from each metric to the model’s score for that metric.

  • monotonic_increasing_featurelist_id (str) – Optional. ID of the featurelist that defines the set of features with a monotonically increasing relationship to the target. If None, no such constraints are enforced.

  • monotonic_decreasing_featurelist_id (str) – Optional. ID of the featurelist that defines the set of features with a monotonically decreasing relationship to the target. If None, no such constraints are enforced.

  • n_clusters (int) – (New in version v2.27) Optional. Number of data clusters discovered by the model.

  • has_empty_clusters (bool) – (New in version v2.27) Optional. Whether the clustering model produces empty clusters.

  • supports_monotonic_constraints (bool) – Optional. Whether this model supports enforcing monotonic constraints.

  • is_starred (bool) – Whether this model is marked as a starred model.

  • prediction_threshold (float) – Binary classification projects only. Threshold used for predictions.

  • prediction_threshold_read_only (bool) – Whether modification of the prediction threshold is forbidden. Threshold modification is forbidden once a model has had a deployment created or predictions made via the dedicated prediction API.

  • model_number (integer) – Model number assigned to the model.

  • parent_model_id (str or None) – (New in version v2.20) ID of the model that tuning parameters are derived from.

  • supports_composable_ml (bool or None) – (New in version v2.26) Whether this model is supported in Composable ML.

__init__(id=None, processes=None, featurelist_name=None, featurelist_id=None, project_id=None, sample_pct=None, model_type=None, model_category=None, is_frozen=None, is_n_clusters_dynamically_determined=None, blueprint_id=None, metrics=None, monotonic_increasing_featurelist_id=None, monotonic_decreasing_featurelist_id=None, n_clusters=None, has_empty_clusters=None, supports_monotonic_constraints=None, is_starred=None, prediction_threshold=None, prediction_threshold_read_only=None, model_number=None, parent_model_id=None, supports_composable_ml=None, training_row_count=None, training_duration=None, training_start_date=None, training_end_date=None, data_selection_method=None, time_window_sample_pct=None, sampling_method=None, model_family_full_name=None, is_trained_into_validation=None, is_trained_into_holdout=None)
classmethod get(project, model_id)

Retrieve a specific model.

Parameters:
  • project (str) – Project ID.

  • model_id (str) – ID of the model to retrieve.

Returns:

model – Queried instance.

Return type:

Model

Raises:

ValueError – If the passed project parameter value is of an unsupported type.
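
For example, a minimal retrieval sketch; the project and model IDs are placeholders, and the metric lookup assumes the standard metrics structure described above:

import datarobot as dr

project = dr.Project.get('project-id')
model = dr.Model.get(project.id, 'model-id')
print(model.model_type)
print(model.metrics[project.metric]['validation'])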

advanced_tune(params, description=None)

Generate a new model with the specified advanced-tuning parameters

As of v2.17, all models other than blenders, open source, prime, baseline and user-created support Advanced Tuning.

Parameters:
  • params (dict) – Mapping of parameter ID to parameter value. The list of valid parameter IDs for a model can be found by calling get_advanced_tuning_parameters(). The request does not need to include values for all parameters; if a parameter is omitted, its current_value will be used.

  • description (str) – Human-readable string describing the newly advanced-tuned model

Returns:

The created job to build the model

Return type:

ModelJob
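
A hedged sketch of the tuning flow; the parameter name 'learning_rate' and the new value are illustrative and depend on the blueprint:

import datarobot as dr

model = dr.Model.get('project-id', 'model-id')
tuning_info = model.get_advanced_tuning_parameters()

# Build a {parameter_id: value} mapping for the parameters to change.
params = {
    param['parameter_id']: 0.05
    for param in tuning_info['tuning_parameters']
    if param['parameter_name'] == 'learning_rate'  # illustrative parameter name
}
model_job = model.advanced_tune(params, description='lowered learning rate')
tuned_model = model_job.get_result_when_complete(max_wait=600)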

cross_validate()

Run cross validation on the model.

Note

To perform Cross Validation on a new model with new parameters, use train instead.

Returns:

The created job to build the model

Return type:

ModelJob

delete()

Delete a model from the project’s leaderboard.

Return type:

None

download_scoring_code(file_name, source_code=False)

Download the Scoring Code JAR.

Parameters:
  • file_name (str) – File path where scoring code will be saved.

  • source_code (Optional[bool]) – Set to True to download source code archive. It will not be executable.

Return type:

None
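
A short sketch, assuming the model supports code generation (see get_supported_capabilities); the file names are placeholders:

import datarobot as dr

model = dr.Model.get('project-id', 'model-id')

# Executable Scoring Code JAR
model.download_scoring_code('scoring_code.jar')

# Source code archive (not executable)
model.download_scoring_code('scoring_code_source.jar', source_code=True)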

download_training_artifact(file_name)

Retrieve trained artifact(s) from a model containing one or more custom tasks.

Artifact(s) will be downloaded to the specified local filepath.

Parameters:

file_name (str) – File path where trained model artifact(s) will be saved.

classmethod from_data(data)

Instantiate an object of this class using a dict.

Parameters:

data (dict) – Correctly snake_cased keys and their values.

Return type:

TypeVar(T, bound= APIObject)

get_advanced_tuning_parameters()

Get the advanced-tuning parameters available for this model.

As of v2.17, all models other than blenders, open source, prime, baseline and user-created support Advanced Tuning.

Returns:

A dictionary describing the advanced-tuning parameters for the current model. There are two top-level keys, tuning_description and tuning_parameters.

tuning_description is an optional value. If not None, it indicates the user-specified description of this set of tuning parameters.

tuning_parameters is a list of dicts, each with the following keys:

  • parameter_name : (str) name of the parameter (unique per task, see below)

  • parameter_id : (str) opaque ID string uniquely identifying parameter

  • default_value : (*) the actual value used to train the model; either the single value of the parameter specified before training, or the best value from the list of grid-searched values (based on current_value)

  • current_value : (*) the single value or list of values of the parameter that were grid searched. Depending on the grid search specification, could be a single fixed value (no grid search), a list of discrete values, or a range.

  • task_name : (str) name of the task that this parameter belongs to

  • constraints: (dict) see the notes below

  • vertex_id: (str) ID of vertex that this parameter belongs to

Return type:

dict

Notes

The type of default_value and current_value is defined by the constraints structure. It will be a string or numeric Python type.

constraints is a dict with at least one, possibly more, of the following keys. The presence of a key indicates that the parameter may take on the specified type. (If a key is absent, this means that the parameter may not take on the specified type.) If a key on constraints is present, its value will be a dict containing all of the fields described below for that key.

"constraints": {
    "select": {
        "values": [<list(basestring or number) : possible values>]
    },
    "ascii": {},
    "unicode": {},
    "int": {
        "min": <int : minimum valid value>,
        "max": <int : maximum valid value>,
        "supports_grid_search": <bool : True if Grid Search may be
                                        requested for this param>
    },
    "float": {
        "min": <float : minimum valid value>,
        "max": <float : maximum valid value>,
        "supports_grid_search": <bool : True if Grid Search may be
                                        requested for this param>
    },
    "intList": {
        "min_length": <int : minimum valid length>,
        "max_length": <int : maximum valid length>
        "min_val": <int : minimum valid value>,
        "max_val": <int : maximum valid value>
        "supports_grid_search": <bool : True if Grid Search may be
                                        requested for this param>
    },
    "floatList": {
        "min_length": <int : minimum valid length>,
        "max_length": <int : maximum valid length>
        "min_val": <float : minimum valid value>,
        "max_val": <float : maximum valid value>
        "supports_grid_search": <bool : True if Grid Search may be
                                        requested for this param>
    }
}

The keys have meaning as follows:

  • select: Rather than specifying a specific data type, if present, it indicates that the parameter is permitted to take on any of the specified values. Listed values may be of any string or real (non-complex) numeric type.

  • ascii: The parameter may be a unicode object that encodes simple ASCII characters. (A-Z, a-z, 0-9, whitespace, and certain common symbols.) In addition to listed constraints, ASCII keys currently may not contain either newlines or semicolons.

  • unicode: The parameter may be any Python unicode object.

  • int: The value may be an object of type int within the specified range (inclusive). Please note that the value will be passed around using the JSON format, and some JSON parsers have undefined behavior with integers outside of the range [-(2**53)+1, (2**53)-1].

  • float: The value may be an object of type float within the specified range (inclusive).

  • intList, floatList: The value may be a list of int or float objects, respectively, following constraints as specified respectively by the int and float types (above).

Many parameters only specify one key under constraints. If a parameter specifies multiple keys, the parameter may take on any value permitted by any key.
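
As an illustration of how the constraints structure can be consumed, the sketch below lists which parameters advertise grid-search support; it relies only on the schema described above, and the IDs are placeholders:

import datarobot as dr

model = dr.Model.get('project-id', 'model-id')
tuning_info = model.get_advanced_tuning_parameters()

for param in tuning_info['tuning_parameters']:
    # A parameter supports grid search if any of its type constraints says so.
    grid_searchable = any(
        spec.get('supports_grid_search', False)
        for spec in param['constraints'].values()
        if isinstance(spec, dict)
    )
    print(param['task_name'], param['parameter_name'], grid_searchable)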

get_all_confusion_charts(fallback_to_parent_insights=False)

Retrieve a list of all confusion matrices available for the model.

Parameters:

fallback_to_parent_insights (bool) – (New in version v2.14) Optional, if True, this will return confusion chart data for this model’s parent for any source that is not available for this model, provided this model has a defined parent model. If omitted or False, or this model has no parent, this will not attempt to retrieve any data from this model’s parent.

Returns:

Data for all available confusion charts for model.

Return type:

list of ConfusionChart

get_all_feature_impacts(data_slice_filter=None)

Retrieve a list of all feature impact results available for the model.

Parameters:

data_slice_filter (DataSlice, optional) – A DataSlice used to filter the return values based on the DataSlice.id. By default, this function will use data_slice_filter.id == None, which returns an unsliced insight. If data_slice_filter is None, no data slice filtering will be applied when requesting the feature impacts.

Returns:

Data for all available model feature impacts, or an empty list if no data is found.

Return type:

list of dicts

Examples

model = datarobot.Model(id='model-id', project_id='project-id')

# Get feature impact insights for sliced data
data_slice = datarobot.DataSlice(id='data-slice-id')
sliced_fi = model.get_all_feature_impacts(data_slice_filter=data_slice)

# Get feature impact insights for unsliced data
data_slice = datarobot.DataSlice()
unsliced_fi = model.get_all_feature_impacts(data_slice_filter=data_slice)

# Get all feature impact insights
all_fi = model.get_all_feature_impacts()
get_all_lift_charts(fallback_to_parent_insights=False, data_slice_filter=None)

Retrieve a list of all Lift charts available for the model.

Parameters:
  • fallback_to_parent_insights (Optional[bool]) – (New in version v2.14) Optional, if True, this will return lift chart data for this model’s parent for any source that is not available for this model and if this model has a defined parent model. If omitted or False, or this model has no parent, this will not attempt to retrieve any data from this model’s parent.

  • data_slice_filter (DataSlice, optional) – Filters the returned lift chart by data_slice_filter.id. If None (the default) applies no filter based on data_slice_id.

Returns:

Data for all available model lift charts, or an empty list if no data is found.

Return type:

list of LiftChart

Examples

model = datarobot.Model.get('project-id', 'model-id')

# Get lift chart insights for sliced data
data_slice = datarobot.DataSlice(id='data-slice-id')
sliced_lift_charts = model.get_all_lift_charts(data_slice_filter=data_slice)

# Get lift chart insights for unsliced data
unsliced_lift_charts = model.get_all_lift_charts(data_slice_filter=datarobot.DataSlice())

# Get all lift chart insights
all_lift_charts = model.get_all_lift_charts()
get_all_multiclass_lift_charts(fallback_to_parent_insights=False, data_slice_filter=<datarobot.models.model.Sentinel object>, target_class=None)

Retrieve a list of all multiclass Lift charts available for the model.

Parameters:
  • fallback_to_parent_insights (bool) – (New in version v2.14) Optional, if True, this will return lift chart data for this model’s parent for any source that is not available for this model and if this model has a defined parent model. If omitted or False, or this model has no parent, this will not attempt to retrieve any data from this model’s parent.

  • data_slice_filter (DataSlice, optional) – A DataSlice used to filter the return values based on the DataSlice.id. By default this function will use data_slice_filter.id == None, which returns an unsliced insight. If data_slice_filter is None, a ValueError will be raised.

  • target_class (str, optional) – Lift chart target class name.

Returns:

Data for all available model lift charts.

Return type:

list of LiftChart

get_all_residuals_charts(fallback_to_parent_insights=False, data_slice_filter=None)

Retrieve a list of all residuals charts available for the model.

Parameters:
  • fallback_to_parent_insights (bool) – Optional, if True, this will return residuals chart data for this model’s parent for any source that is not available for this model and if this model has a defined parent model. If omitted or False, or this model has no parent, this will not attempt to retrieve any data from this model’s parent.

  • data_slice_filter (DataSlice, optional) – Filters the returned residuals charts by data_slice_filter.id. If None (the default) applies no filter based on data_slice_id.

Returns:

Data for all available model residuals charts.

Return type:

list of ResidualsChart

Examples

model = datarobot.Model.get('project-id', 'model-id')

# Get residuals chart insights for sliced data
data_slice = datarobot.DataSlice(id='data-slice-id')
sliced_residuals_charts = model.get_all_residuals_charts(data_slice_filter=data_slice)

# Get residuals chart insights for unsliced data
unsliced_residuals_charts = model.get_all_residuals_charts(data_slice_filter=datarobot.DataSlice())

# Get all residuals chart insights
all_residuals_charts = model.get_all_residuals_charts()
get_all_roc_curves(fallback_to_parent_insights=False, data_slice_filter=None)

Retrieve a list of all ROC curves available for the model.

Parameters:
  • fallback_to_parent_insights (bool) – (New in version v2.14) Optional, if True, this will return ROC curve data for this model’s parent for any source that is not available for this model and if this model has a defined parent model. If omitted or False, or this model has no parent, this will not attempt to retrieve any data from this model’s parent.

  • data_slice_filter (DataSlice, optional) – filters the returned roc_curve by data_slice_filter.id. If None (the default) applies no filter based on data_slice_id.

Returns:

Data for all available model ROC curves, or an empty list if no RocCurves are found.

Return type:

list of RocCurve

Examples

model = datarobot.Model.get('project-id', 'model-id')
ds_filter = datarobot.DataSlice(id='data-slice-id')

# Get roc curve insights for sliced data
sliced_roc = model.get_all_roc_curves(data_slice_filter=ds_filter)

# Get roc curve insights for unsliced data
unsliced_filter = datarobot.DataSlice(id=None)
unsliced_roc = model.get_all_roc_curves(data_slice_filter=unsliced_filter)

# Get all roc curve insights
all_roc_curves = model.get_all_roc_curves()
get_confusion_chart(source, fallback_to_parent_insights=False)

Retrieve a multiclass model’s confusion matrix for the specified source.

Parameters:
  • source (str) – Confusion chart source. Check datarobot.enums.CHART_DATA_SOURCE for possible values.

  • fallback_to_parent_insights (bool) – (New in version v2.14) Optional, if True, this will return confusion chart data for this model’s parent if the confusion chart is not available for this model and the model has a defined parent model. If omitted or False, or there is no parent model, will not attempt to return insight data from this model’s parent.

Returns:

Model ConfusionChart data

Return type:

ConfusionChart

Raises:

ClientError – If the insight is not available for this model

get_cross_class_accuracy_scores()

Retrieves a list of Cross Class Accuracy scores for the model.

Return type:

json

get_cross_validation_scores(partition=None, metric=None)

Return a dictionary, keyed by metric, showing cross validation scores per partition.

Cross Validation should already have been performed using cross_validate or train.

Note

Models that computed cross validation before this feature was added will need to be deleted and retrained before this method can be used.

Parameters:
  • partition (float) – Optional. The ID of the partition to filter results by (for example 1, 2, 3.0, 4.0); it can be a positive whole number or a float value. 0 corresponds to the validation partition.

  • metric (unicode) – Optional. Name of the metric to filter the resulting cross validation scores by.

Returns:

cross_validation_scores – A dictionary keyed by metric showing cross validation scores per partition.

Return type:

dict
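
A sketch of the full flow, assuming cross validation is available for this model; the IDs and metric name are illustrative:

import datarobot as dr

model = dr.Model.get('project-id', 'model-id')

cv_job = model.cross_validate()
cv_job.wait_for_completion(max_wait=600)

# A dictionary keyed by metric, showing scores per partition (see above).
scores = model.get_cross_validation_scores(metric='LogLoss')
print(scores)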

get_data_disparity_insights(feature, class_name1, class_name2)

Retrieve a list of Cross Class Data Disparity insights for the model.

Parameters:
  • feature (str) – Bias and Fairness protected feature name.

  • class_name1 (str) – One of the compared classes

  • class_name2 (str) – Another compared class

Return type:

json

get_fairness_insights(fairness_metrics_set=None, offset=0, limit=100)

Retrieve a list of Per Class Bias insights for the model.

Parameters:
  • fairness_metrics_set (Optional[str]) – Can be one of <datarobot.enums.FairnessMetricsSet>. The fairness metric used to calculate the fairness scores.

  • offset (Optional[int]) – Number of items to skip.

  • limit (Optional[int]) – Number of items to return.

Return type:

json

get_feature_effect(source, data_slice_id=None)

Retrieve Feature Effects for the model.

Feature Effects provides partial dependence and predicted vs actual values for top-500 features ordered by feature impact score.

The partial dependence shows marginal effect of a feature on the target variable after accounting for the average effects of all other predictive features. It indicates how, holding all other variables except the feature of interest as they were, the value of this feature affects your prediction.

Requires that Feature Effects has already been computed with request_feature_effect.

See get_feature_effect_metadata for retrieving information on the available sources.

Parameters:
  • source (string) – The source Feature Effects are retrieved for.

  • data_slice_id (string, optional) – ID for the data slice used in the request. If None, retrieve unsliced insight data.

Returns:

feature_effects – The feature effects data.

Return type:

FeatureEffects

Raises:

ClientError – If the feature effects have not been computed or source is not a valid value.
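
A sketch of the request-then-retrieve flow; the IDs are placeholders, and the 'validation' source is illustrative and should be chosen from the sources reported by get_feature_effect_metadata:

import datarobot as dr

model = dr.Model.get('project-id', 'model-id')

fe_job = model.request_feature_effect()
fe_job.wait_for_completion(max_wait=600)

feature_effects = model.get_feature_effect(source='validation')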

get_feature_effect_metadata()

Retrieve Feature Effects metadata. Response contains status and available model sources.

  • Feature Effect for the training partition is always available, with the exception of older projects that only supported Feature Effect for validation.

  • When a model is trained into validation or holdout without stacked predictions (i.e., no out-of-sample predictions in those partitions), Feature Effects is not available for validation or holdout.

  • Feature Effects for holdout is not available when holdout was not unlocked for the project.

Use source to retrieve Feature Effects, selecting one of the provided sources.

Returns:

feature_effect_metadata

Return type:

FeatureEffectMetadata

get_feature_effects_multiclass(source='training', class_=None)

Retrieve Feature Effects for the multiclass model.

Feature Effects provide partial dependence and predicted vs actual values for top-500 features ordered by feature impact score.

The partial dependence shows marginal effect of a feature on the target variable after accounting for the average effects of all other predictive features. It indicates how, holding all other variables except the feature of interest as they were, the value of this feature affects your prediction.

Requires that Feature Effects has already been computed with request_feature_effect.

See get_feature_effect_metadata for retrieving information on the available sources.

Parameters:
  • source (str) – The source Feature Effects are retrieved for.

  • class (str or None) – The class name Feature Effects are retrieved for.

Returns:

The list of multiclass feature effects.

Return type:

list

Raises:

ClientError – If Feature Effects have not been computed or source is not a valid value.

get_feature_impact(with_metadata=False, data_slice_filter=<datarobot.models.model.Sentinel object>)

Retrieve the computed Feature Impact results, a measure of the relevance of each feature in the model.

Feature Impact is computed for each column by creating new data with that column randomly permuted (but the others left unchanged), and seeing how the error metric score for the predictions is affected. The ‘impactUnnormalized’ is how much worse the error metric score is when making predictions on this modified data. The ‘impactNormalized’ is normalized so that the largest value is 1. In both cases, larger values indicate more important features.

If a feature is redundant, i.e. it contributes little once other features are considered, the ‘redundantWith’ value is the name of the feature that has the highest correlation with it. Note that redundancy detection is only available for jobs run after the addition of this capability. When retrieving data that predates this functionality, a NoRedundancyImpactAvailable warning will be issued.

Elsewhere this technique is sometimes called ‘Permutation Importance’.

Requires that Feature Impact has already been computed with request_feature_impact.

Parameters:
  • with_metadata (bool) – The flag indicating if the result should include the metadata as well.

  • data_slice_filter (DataSlice, optional) – A dataslice used to filter the return values based on the dataslice.id. By default, this function will use data_slice_filter.id == None which returns an unsliced insight. If data_slice_filter is None then get_feature_impact will raise a ValueError.

Returns:

The feature impact data response depends on the with_metadata parameter. The response is either a dict with metadata and a list with actual data or just a list with that data.

Each list item is a dict with the keys featureName, impactNormalized, impactUnnormalized, redundantWith, and count.

For the dict response, the available keys are:

  • featureImpacts - Feature Impact data. Each item is a dict with the keys featureName, impactNormalized, impactUnnormalized, and redundantWith.

  • shapBased - A boolean that indicates whether Feature Impact was calculated using Shapley values.

  • ranRedundancyDetection - A boolean that indicates whether redundant feature identification was run while calculating this Feature Impact.

  • rowCount - An integer or None that indicates the number of rows used to calculate Feature Impact. For Feature Impact calculated with the default logic, without specifying rowCount, None is returned here.

  • count - An integer with the number of features under featureImpacts.

Return type:

list or dict

Raises:
  • ClientError – If the feature impacts have not been computed.

  • ValueError – If data_slice_filter passed as None
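
A sketch using the get_or_request_feature_impact helper documented below to print the most impactful features; the IDs are placeholders and the key names follow the list schema described above:

import datarobot as dr

model = dr.Model.get('project-id', 'model-id')

feature_impacts = model.get_or_request_feature_impact(max_wait=600)
top_features = sorted(
    feature_impacts, key=lambda fi: fi['impactNormalized'], reverse=True
)[:5]
for fi in top_features:
    print(fi['featureName'], fi['impactNormalized'])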

get_features_used()

Query the server to determine which features were used.

Note that the data returned by this method may differ from the names of the features in the featurelist used by this model. This method returns the raw features that must be supplied in order for predictions to be generated on a new set of data. The featurelist, in contrast, would also include the names of derived features.

Returns:

features – The names of the features used in the model.

Return type:

List[str]

get_frozen_child_models()

Retrieve the IDs for all models that are frozen from this model.

Return type:

A list of Models

get_labelwise_roc_curves(source, fallback_to_parent_insights=False)

Retrieve a list of LabelwiseRocCurve instances for a multilabel model for the given source and all labels. This method is valid only for multilabel projects. For binary projects, use the Model.get_roc_curve API.

Added in version v2.24.

Parameters:
  • source (str) – ROC curve data source. Check datarobot.enums.CHART_DATA_SOURCE for possible values.

  • fallback_to_parent_insights (bool) – Optional, if True, this will return ROC curve data for this model’s parent if the ROC curve is not available for this model and the model has a defined parent model. If omitted or False, or there is no parent model, will not attempt to return data from this model’s parent.

Returns:

Labelwise ROC Curve instances for source and all labels

Return type:

list of LabelwiseRocCurve

Raises:

ClientError – If the insight is not available for this model

get_lift_chart(source, fallback_to_parent_insights=False, data_slice_filter=<datarobot.models.model.Sentinel object>)

Retrieve the model Lift chart for the specified source.

Parameters:
  • source (str) – Lift chart data source. Check datarobot.enums.CHART_DATA_SOURCE for possible values. (New in version v2.23) For time series and OTV models, also accepts values backtest_2, backtest_3, …, up to the number of backtests in the model.

  • fallback_to_parent_insights (bool) – (New in version v2.14) Optional, if True, this will return lift chart data for this model’s parent if the lift chart is not available for this model and the model has a defined parent model. If omitted or False, or there is no parent model, will not attempt to return insight data from this model’s parent.

  • data_slice_filter (DataSlice, optional) – A dataslice used to filter the return values based on the dataslice.id. By default this function will use data_slice_filter.id == None which returns an unsliced insight. If data_slice_filter is None then get_lift_chart will raise a ValueError.

Returns:

Model lift chart data

Return type:

LiftChart

Raises:
  • ClientError – If the insight is not available for this model

  • ValueError – If data_slice_filter passed as None
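
A minimal sketch; the IDs are placeholders, the 'validation' source is one of the datarobot.enums.CHART_DATA_SOURCE values, and ClientError is raised when the insight is unavailable, as noted above:

import datarobot as dr
from datarobot.errors import ClientError

model = dr.Model.get('project-id', 'model-id')

try:
    lift_chart = model.get_lift_chart('validation')
except ClientError:
    lift_chart = None  # insight not computed for this model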

get_missing_report_info()

Retrieve a report on missing training data that can be used to understand missing values treatment in the model. The report consists of missing value resolutions for the numeric or categorical features that were part of building the model.

Returns:

The queried model missing report, sorted by missing count (DESCENDING order).

Return type:

An iterable of MissingReportPerFeature

get_model_blueprint_chart()

Retrieve a diagram that can be used to understand data flow in the blueprint.

Returns:

The queried model blueprint chart.

Return type:

ModelBlueprintChart

get_model_blueprint_documents()

Get documentation for tasks used in this model.

Returns:

All documents available for the model.

Return type:

list of BlueprintTaskDocument

get_model_blueprint_json()

Get the blueprint json representation used by this model.

Returns:

Json representation of the blueprint stages.

Return type:

BlueprintJson

get_multiclass_feature_impact()

For multiclass it’s possible to calculate feature impact separately for each target class. The method for calculation is exactly the same, calculated in one-vs-all style for each target class.

Requires that Feature Impact has already been computed with request_feature_impact.

Returns:

feature_impacts – The feature impact data. Each item is a dict with the keys ‘featureImpacts’ (list), ‘class’ (str). Each item in ‘featureImpacts’ is a dict with the keys ‘featureName’, ‘impactNormalized’, and ‘impactUnnormalized’, and ‘redundantWith’.

Return type:

list of dict

Raises:

ClientError – If the multiclass feature impacts have not been computed.

get_multiclass_lift_chart(source, fallback_to_parent_insights=False, data_slice_filter=<datarobot.models.model.Sentinel object>, target_class=None)

Retrieve model Lift chart for the specified source.

Parameters:
  • source (str) – Lift chart data source. Check datarobot.enums.CHART_DATA_SOURCE for possible values.

  • fallback_to_parent_insights (bool) – Optional, if True, this will return lift chart data for this model’s parent if the lift chart is not available for this model and the model has a defined parent model. If omitted or False, or there is no parent model, will not attempt to return insight data from this model’s parent.

  • data_slice_filter (DataSlice, optional) – A DataSlice used to filter the return values based on the DataSlice.id. By default this function will use data_slice_filter.id == None, which returns an unsliced insight. If data_slice_filter is None, a ValueError will be raised.

  • target_class (str, optional) – Lift chart target class name.

Returns:

Model lift chart data for each saved target class

Return type:

list of LiftChart

Raises:

ClientError – If the insight is not available for this model

get_multilabel_lift_charts(source, fallback_to_parent_insights=False)

Retrieve model Lift charts for the specified source.

Added in version v2.24.

Parameters:
  • source (str) – Lift chart data source. Check datarobot.enums.CHART_DATA_SOURCE for possible values.

  • fallback_to_parent_insights (bool) – Optional, if True, this will return lift chart data for this model’s parent if the lift chart is not available for this model and the model has a defined parent model. If omitted or False, or there is no parent model, will not attempt to return insight data from this model’s parent.

Returns:

Model lift chart data for each saved target class

Return type:

list of LiftChart

Raises:

ClientError – If the insight is not available for this model

get_num_iterations_trained()

Retrieves the number of estimators trained by early-stopping tree-based models.

Added in version v2.22.

Returns:

  • projectId (str) – id of project containing the model

  • modelId (str) – id of the model

  • data (array) – list of numEstimatorsItem objects, one for each modeling stage.

  • numEstimatorsItem will be of the form

  • stage (str) – indicates the modeling stage (for multi-stage models); None for single-stage models

  • numIterations (int) – the number of estimators or iterations trained by the model

get_or_request_feature_effect(source, max_wait=600, row_count=None, data_slice_id=None)

Retrieve Feature Effects for the model, requesting a new job if it hasn’t been run previously.

See get_feature_effect_metadata for retrieving information of source.

Parameters:
  • source (string) – The source Feature Effects are retrieved for.

  • max_wait (Optional[int]) – The maximum time to wait for a requested Feature Effect job to complete before erroring.

  • row_count (Optional[int]) – (New in version v2.21) The sample size to use for Feature Impact computation. Minimum is 10 rows. Maximum is 100000 rows or the training sample size of the model, whichever is less.

  • data_slice_id (Optional[str]) – ID for the data slice used in the request. If None, request unsliced insight data.

Returns:

feature_effects – The Feature Effects data.

Return type:

FeatureEffects

get_or_request_feature_effects_multiclass(source, top_n_features=None, features=None, row_count=None, class_=None, max_wait=600)

Retrieve Feature Effects for the multiclass model, requesting a job if it hasn’t been run previously.

Parameters:
  • source (string) – The source Feature Effects are retrieved for.

  • class (str or None) – The class name Feature Effects are retrieved for.

  • row_count (int) – The number of rows from dataset to use for Feature Impact calculation.

  • top_n_features (int or None) – Number of top features (ranked by Feature Impact) used to calculate Feature Effects.

  • features (list or None) – The list of features used to calculate Feature Effects.

  • max_wait (Optional[int]) – The maximum time to wait for a requested Feature Effects job to complete before erroring.

Returns:

feature_effects – The list of multiclass feature effects data.

Return type:

list of FeatureEffectsMulticlass

get_or_request_feature_impact(max_wait=600, **kwargs)

Retrieve feature impact for the model, requesting a job if it hasn’t been run previously

Parameters:
  • max_wait (Optional[int]) – The maximum time to wait for a requested feature impact job to complete before erroring

  • **kwargs – Arbitrary keyword arguments passed to request_feature_impact.

Returns:

feature_impacts – The feature impact data. See get_feature_impact for the exact schema.

Return type:

list or dict

get_parameters()

Retrieve model parameters.

Returns:

Model parameters for this model.

Return type:

ModelParameters

get_pareto_front()

Retrieve the Pareto Front for a Eureqa model.

This method is only supported for Eureqa models.

Returns:

Model ParetoFront data

Return type:

ParetoFront

get_prime_eligibility()

Check if this model can be approximated with DataRobot Prime

Returns:

prime_eligibility – a dict indicating whether a model can be approximated with DataRobot Prime (key can_make_prime) and why it may be ineligible (key message)

Return type:

dict

get_residuals_chart(source, fallback_to_parent_insights=False, data_slice_filter=<datarobot.models.model.Sentinel object>)

Retrieve model residuals chart for the specified source.

Parameters:
  • source (str) – Residuals chart data source. Check datarobot.enums.CHART_DATA_SOURCE for possible values.

  • fallback_to_parent_insights (bool) – Optional, if True, this will return residuals chart data for this model’s parent if the residuals chart is not available for this model and the model has a defined parent model. If omitted or False, or there is no parent model, will not attempt to return residuals data from this model’s parent.

  • data_slice_filter (DataSlice, optional) – A dataslice used to filter the return values based on the dataslice.id. By default this function will use data_slice_filter.id == None which returns an unsliced insight. If data_slice_filter is None then get_residuals_chart will raise a ValueError.

Returns:

Model residuals chart data

Return type:

ResidualsChart

Raises:
  • ClientError – If the insight is not available for this model

  • ValueError – If data_slice_filter passed as None

get_roc_curve(source, fallback_to_parent_insights=False, data_slice_filter=<datarobot.models.model.Sentinel object>)

Retrieve the ROC curve for a binary model for the specified source. This method is valid only for binary projects. For multilabel projects, use Model.get_labelwise_roc_curves.

Parameters:
  • source (str) – ROC curve data source. Check datarobot.enums.CHART_DATA_SOURCE for possible values. (New in version v2.23) For time series and OTV models, also accepts values backtest_2, backtest_3, …, up to the number of backtests in the model.

  • fallback_to_parent_insights (bool) – (New in version v2.14) Optional, if True, this will return ROC curve data for this model’s parent if the ROC curve is not available for this model and the model has a defined parent model. If omitted or False, or there is no parent model, will not attempt to return data from this model’s parent.

  • data_slice_filter (DataSlice, optional) – A dataslice used to filter the return values based on the dataslice.id. By default this function will use data_slice_filter.id == None which returns an unsliced insight. If data_slice_filter is None then get_roc_curve will raise a ValueError.

Returns:

Model ROC curve data

Return type:

RocCurve

Raises:
  • ClientError – If the insight is not available for this model

  • (New in version v3.0) TypeError – If the underlying project type is multilabel

  • ValueError – If data_slice_filter passed as None
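
A minimal sketch for a binary project; the IDs and source value are illustrative, and the call falls back to the parent model's insight where one exists:

import datarobot as dr
from datarobot.errors import ClientError

model = dr.Model.get('project-id', 'model-id')

try:
    roc = model.get_roc_curve('validation', fallback_to_parent_insights=True)
except ClientError:
    roc = None  # ROC curve not available for this model or its parent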

get_rulesets()

List the rulesets approximating this model generated by DataRobot Prime

If this model hasn’t been approximated yet, will return an empty list. Note that these are rulesets approximating this model, not rulesets used to construct this model.

Returns:

rulesets

Return type:

list of Ruleset

get_supported_capabilities()

Retrieves a summary of the capabilities supported by a model.

Added in version v2.14.

Returns:

  • supportsBlending (bool) – whether the model supports blending

  • supportsMonotonicConstraints (bool) – whether the model supports monotonic constraints

  • hasWordCloud (bool) – whether the model has word cloud data available

  • eligibleForPrime (bool) – (Deprecated in version v3.6) whether the model is eligible for Prime

  • hasParameters (bool) – whether the model has parameters that can be retrieved

  • supportsCodeGeneration (bool) – (New in version v2.18) whether the model supports code generation

  • supportsShap (bool) – (New in version v2.18) True if the model supports the Shapley package, i.e., Shapley-based feature importance.

  • supportsEarlyStopping (bool) – (New in version v2.22) True if this is an early stopping tree-based model and number of trained iterations can be retrieved.

get_uri()
Returns:

url – Permanent static hyperlink to this model on the leaderboard.

Return type:

str

get_word_cloud(exclude_stop_words=False)

Retrieve word cloud data for the model.

Parameters:

exclude_stop_words (Optional[bool]) – Set to True if you want stopwords filtered out of response.

Returns:

Word cloud data for the model.

Return type:

WordCloud

incremental_train(data_stage_id, training_data_name=None)

Submit a job to the queue to perform incremental training on an existing model. See train_incremental documentation.

Return type:

ModelJob

classmethod list(project_id, sort_by_partition='validation', sort_by_metric=None, with_metric=None, search_term=None, featurelists=None, families=None, blueprints=None, labels=None, characteristics=None, training_filters=None, number_of_clusters=None, limit=100, offset=0)

Retrieve paginated model records, sorted by scores, with optional filtering.

Parameters:
  • sort_by_partition (str, one of validation, backtesting, crossValidation or holdout) – Set the partition to use for sorted (by score) list of models. validation is the default.

  • sort_by_metric (str) – Set the project metric to use for model sorting. DataRobot-selected project optimization metric is the default.

  • with_metric (str) – For a single-metric list of results, specify that project metric.

  • search_term (str) – If specified, only models containing the term in their name or processes are returned.

  • featurelists (List[str]) – If specified, only models trained on selected featurelists are returned.

  • families (List[str]) – If specified, only models belonging to selected families are returned.

  • blueprints (List[str]) – If specified, only models trained on specified blueprint IDs are returned.

  • labels (List[str], starred or prepared for deployment) – If specified, only models tagged with all listed labels are returned.

  • characteristics (List[str]) – If specified, only models matching all listed characteristics are returned.

  • training_filters (List[str]) – If specified, only models matching at least one of the listed training conditions are returned. The following formats are supported.

    For AutoML and datetime partitioned projects:

    • number of rows in the training subset

    For datetime partitioned projects:

    • <training duration>, for example P6Y0M0D

    • <training_duration>-<time_window_sample_percent>-<sampling_method>, for example P6Y0M0D-78-Random (returns models trained on 6 years of data, sampling rate 78%, random sampling)

    • start/end date

    • project settings

  • number_of_clusters (list of int) – Filter models by number of clusters. Applicable only in unsupervised clustering projects.

  • limit (int)

  • offset (int)

Returns:

generic_models

Return type:

list of GenericModel
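
A filtering sketch; the project ID, metric name, and search term are placeholders:

import datarobot as dr

model_records = dr.Model.list(
    project_id='project-id',
    sort_by_partition='holdout',
    sort_by_metric='LogLoss',  # illustrative metric name
    search_term='Gradient',    # illustrative search term
    limit=20,
)
for record in model_records:
    print(record.id)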

open_in_browser()

Opens the class’ relevant web browser location. If the default browser is not available, the URL is logged.

Note: If text-mode browsers are used, the calling process will block until the user exits the browser.

Return type:

None

request_approximation()

Request an approximation of this model using DataRobot Prime

This will create several rulesets that could be used to approximate this model. After comparing their scores and rule counts, the code used in the approximation can be downloaded and run locally.

Returns:

job – the job generating the rulesets

Return type:

Job

request_cross_class_accuracy_scores()

Request Cross Class Accuracy scores to be computed for the model.

Returns:

status_id – The status ID of the computation request.

Return type:

str

request_data_disparity_insights(feature, compared_class_names)

Request data disparity insights to be computed for the model.

Parameters:
  • feature (str) – Bias and Fairness protected feature name.

  • compared_class_names (list(str)) – List of two classes to compare

Returns:

status_id – The status ID of the computation request.

Return type:

str

request_external_test(dataset_id, actual_value_column=None)

Request external test to compute scores and insights on an external test dataset

Parameters:
  • dataset_id (string) – The ID of the dataset to make predictions against (as uploaded from Project.upload_dataset)

  • actual_value_column (string, optional) – (New in version v2.21) For time series unsupervised projects only. Actual value column can be used to calculate the classification metrics and insights on the prediction dataset. Can’t be provided with the forecast_point parameter.

Returns:

job – a Job representing external dataset insights computation

Return type:

Job

request_fairness_insights(fairness_metrics_set=None)

Request fairness insights to be computed for the model.

Parameters:

fairness_metrics_set (Optional[str]) – Can be one of <datarobot.enums.FairnessMetricsSet>. The fairness metric used to calculate the fairness scores.

Returns:

status_id – The status ID of the computation request.

Return type:

str

request_feature_effect(row_count=None, data_slice_id=None)

Submit request to compute Feature Effects for the model.

See get_feature_effect for more information on the result of the job.

Parameters:
  • row_count (int) – (New in version v2.21) The sample size to use for Feature Impact computation. Minimum is 10 rows. Maximum is 100000 rows or the training sample size of the model, whichever is less.

  • data_slice_id (Optional[str]) – ID for the data slice used in the request. If None, request unsliced insight data.

Returns:

job – A Job representing the feature effect computation. To get the completed feature effect data, use job.get_result or job.get_result_when_complete.

Return type:

Job

Raises:

JobAlreadyRequested – If the feature effect have already been requested.

request_feature_effects_multiclass(row_count=None, top_n_features=None, features=None)

Request Feature Effects computation for the multiclass model.

See get_feature_effect for more information on the result of the job.

Parameters:
  • row_count (int) – The number of rows from dataset to use for Feature Impact calculation.

  • top_n_features (int or None) – Number of top features (ranked by feature impact) used to calculate Feature Effects.

  • features (list or None) – The list of features used to calculate Feature Effects.

Returns:

job – A Job representing Feature Effect computation. To get the completed Feature Effect data, use job.get_result or job.get_result_when_complete.

Return type:

Job

request_feature_impact(row_count=None, with_metadata=False, data_slice_id=None)

Request feature impacts to be computed for the model.

See get_feature_impact for more information on the result of the job.

Parameters:
  • row_count (Optional[int]) – The sample size (specified in rows) to use for Feature Impact computation. This is not supported for unsupervised, multiclass (which has a separate method), and time series projects.

  • with_metadata (Optional[bool]) – Flag indicating whether the result should include the metadata. If true, metadata is included.

  • data_slice_id (Optional[str]) – ID for the data slice used in the request. If None, request unsliced insight data.

Returns:

job – Job representing the Feature Impact computation. To retrieve the completed Feature Impact data, use job.get_result or job.get_result_when_complete.

Return type:

Job or status_id

Raises:

JobAlreadyRequested – If the feature impacts have already been requested.

request_frozen_datetime_model(training_row_count=None, training_duration=None, training_start_date=None, training_end_date=None, time_window_sample_pct=None, sampling_method=None)

Train a new frozen model with parameters from this model.

Requires that this model belongs to a datetime partitioned project. If it does not, an error will occur when submitting the job.

Frozen models use the same tuning parameters as their parent model instead of independently optimizing them to allow efficiently retraining models on larger amounts of the training data.

In addition to training_row_count and training_duration, frozen datetime models may be trained on an exact date range. Only one of training_row_count, training_duration, or training_start_date and training_end_date should be specified.

Models specified using training_start_date and training_end_date are the only ones that can be trained into the holdout data (once the holdout is unlocked).

All durations should be specified with a duration string such as those returned by the partitioning_methods.construct_duration_string helper method. Please see datetime partitioned project documentation for more information on duration strings.

Parameters:
  • training_row_count (Optional[int]) – the number of rows of data that should be used to train the model. If specified, training_duration may not be specified.

  • training_duration (Optional[str]) – a duration string specifying what time range the data used to train the model should span. If specified, training_row_count may not be specified.

  • training_start_date (datetime.datetime, optional) – the start date of the data to train to model on. Only rows occurring at or after this datetime will be used. If training_start_date is specified, training_end_date must also be specified.

  • training_end_date (datetime.datetime, optional) – the end date of the data to train the model on. Only rows occurring strictly before this datetime will be used. If training_end_date is specified, training_start_date must also be specified.

  • time_window_sample_pct (Optional[int]) – May only be specified when the requested model is a time window (e.g. duration or start and end dates). An integer between 1 and 99 indicating the percentage to sample by within the window. The points kept are determined by a random uniform sample. If specified, training_duration must also be specified; otherwise an error will occur.

  • sampling_method (Optional[str]) – (New in version v2.23) defines the way training data is selected. Can be either random or latest. In combination with training_row_count defines how rows are selected from backtest (latest by default). When training data is defined using time range (training_duration or use_project_settings) this setting changes the way time_window_sample_pct is applied (random by default). Applicable to OTV projects only.

Returns:

model_job – the modeling job training a frozen model

Return type:

ModelJob
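
A sketch using the construct_duration_string helper referenced above; the IDs, import path, and three-month training window are illustrative assumptions:

import datarobot as dr
from datarobot.helpers.partitioning_methods import construct_duration_string

model = dr.Model.get('project-id', 'model-id')

duration = construct_duration_string(years=0, months=3, days=0)
model_job = model.request_frozen_datetime_model(training_duration=duration)
frozen_model = model_job.get_result_when_complete(max_wait=3600)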

request_frozen_model(sample_pct=None, training_row_count=None)

Train a new frozen model with parameters from this model

Note

This method only works if the project the model belongs to is not datetime partitioned. If it is, use request_frozen_datetime_model instead.

Frozen models use the same tuning parameters as their parent model instead of independently optimizing them to allow efficiently retraining models on larger amounts of the training data.

Parameters:
  • sample_pct (float) – optional, the percentage of the dataset to use with the model. If not provided, will use the value from this model.

  • training_row_count (int) – (New in version v2.9) optional, the integer number of rows of the dataset to use with the model. Only one of sample_pct and training_row_count should be specified.

Returns:

model_job – the modeling job training a frozen model

Return type:

ModelJob

request_lift_chart(source, data_slice_id=None)

Request the model Lift Chart for the specified source.

Parameters:
  • source (str) – Lift chart data source. Check datarobot.enums.CHART_DATA_SOURCE for possible values.

  • data_slice_id (string, optional) – ID for the data slice used in the request. If None, request unsliced insight data.

Returns:

status_check_job – Object contains all needed logic for a periodical status check of an async job.

Return type:

StatusCheckJob

request_per_class_fairness_insights(fairness_metrics_set=None)

Request per-class fairness insights be computed for the model.

Parameters:

fairness_metrics_set (Optional[str]) – The fairness metric used to calculate the fairness scores. Value can be any one of <datarobot.enums.FairnessMetricsSet>.

Returns:

status_check_job – The returned object contains all needed logic for a periodical status check of an async job.

Return type:

StatusCheckJob

request_predictions(dataset_id=None, dataset=None, dataframe=None, file_path=None, file=None, include_prediction_intervals=None, prediction_intervals_size=None, forecast_point=None, predictions_start_date=None, predictions_end_date=None, actual_value_column=None, explanation_algorithm=None, max_explanations=None, max_ngram_explanations=None)

Requests predictions against a previously uploaded dataset.

Parameters:
  • dataset_id (string, optional) – The ID of the dataset to make predictions against (as uploaded from Project.upload_dataset)

  • dataset (Dataset, optional) – The dataset to make predictions against (as uploaded from Project.upload_dataset)

  • dataframe (pd.DataFrame, optional) – (New in v3.0) The dataframe to make predictions against

  • file_path (Optional[str]) – (New in v3.0) Path to file to make predictions against

  • file (IOBase, optional) – (New in v3.0) File to make predictions against

  • include_prediction_intervals (Optional[bool]) – (New in v2.16) For time series projects only. Specifies whether prediction intervals should be calculated for this request. Defaults to True if prediction_intervals_size is specified, otherwise defaults to False.

  • prediction_intervals_size (Optional[int]) – (New in v2.16) For time series projects only. Represents the percentile to use for the size of the prediction intervals. Defaults to 80 if include_prediction_intervals is True. Prediction intervals size must be between 1 and 100 (inclusive).

  • forecast_point (datetime.datetime or None, optional) – (New in version v2.20) For time series projects only. This is the default point relative to which predictions will be generated, based on the forecast window of the project. See the time series prediction documentation for more information.

  • predictions_start_date (datetime.datetime or None, optional) – (New in version v2.20) For time series projects only. The start date for bulk predictions. Note that this parameter is for generating historical predictions using the training data. This parameter should be provided in conjunction with predictions_end_date. Can’t be provided with the forecast_point parameter.

  • predictions_end_date (datetime.datetime or None, optional) – (New in version v2.20) For time series projects only. The end date for bulk predictions, exclusive. Note that this parameter is for generating historical predictions using the training data. This parameter should be provided in conjunction with predictions_start_date. Can’t be provided with the forecast_point parameter.

  • actual_value_column (string, optional) – (New in version v2.21) For time series unsupervised projects only. Actual value column can be used to calculate the classification metrics and insights on the prediction dataset. Can’t be provided with the forecast_point parameter.

  • explanation_algorithm (str, optional) – (New in version v2.21) If set to 'shap', the response will include prediction explanations based on the SHAP explainer (SHapley Additive exPlanations). Defaults to null (no prediction explanations).

  • max_explanations (int, optional) – (New in version v2.21) Specifies the maximum number of explanation values that should be returned for each row, ordered by absolute value, greatest to least. If null, no limit. In the case of 'shap': if the number of features is greater than the limit, the sum of the remaining values will also be returned as shapRemainingTotal. Defaults to null. Cannot be set if explanation_algorithm is omitted.

  • max_ngram_explanations (int or str, optional) – (New in version v2.29) Specifies the maximum number of text explanation values that should be returned. If set to all, text explanations will be computed and all ngram explanations will be returned. If set to a non-zero positive integer value, text explanations will be computed and this number of ngram explanations, sorted in descending order, will be returned. By default, text explanations are not computed.

Returns:

job – The job computing the predictions

Return type:

PredictJob
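
A sketch of the basic flow; the CSV path and IDs are placeholders, and the project is assumed to be the one the model belongs to:

import datarobot as dr

project = dr.Project.get('project-id')
model = dr.Model.get(project.id, 'model-id')

prediction_dataset = project.upload_dataset('./data_to_predict.csv')
predict_job = model.request_predictions(dataset_id=prediction_dataset.id)
predictions = predict_job.get_result_when_complete(max_wait=600)  # pandas DataFrame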

request_residuals_chart(source, data_slice_id=None)

Request the model residuals chart for the specified source.

Parameters:
  • source (str) – Residuals chart data source. Check datarobot.enums.CHART_DATA_SOURCE for possible values.

  • data_slice_id (string, optional) – ID for the data slice used in the request. If None, request unsliced insight data.

Returns:

status_check_job – Object contains all needed logic for a periodical status check of an async job.

Return type:

StatusCheckJob

request_roc_curve(source, data_slice_id=None)

Request the model Roc Curve for the specified source.

Parameters:
  • source (str) – Roc Curve data source. Check datarobot.enums.CHART_DATA_SOURCE for possible values.

  • data_slice_id (string, optional) – ID for the data slice used in the request. If None, request unsliced insight data.

Returns:

status_check_job – Object contains all needed logic for a periodical status check of an async job.

Return type:

StatusCheckJob

request_training_predictions(data_subset, explanation_algorithm=None, max_explanations=None)

Start a job to build training predictions

Parameters:
  • data_subset (str) –

    data set definition to build predictions on. Choices are:

    • dr.enums.DATA_SUBSET.ALL or string all for all data available. Not valid for models in datetime partitioned projects

    • dr.enums.DATA_SUBSET.VALIDATION_AND_HOLDOUT or string validationAndHoldout for all data except the training set. Not valid for models in datetime partitioned projects

    • dr.enums.DATA_SUBSET.HOLDOUT or string holdout for the holdout data set only

    • dr.enums.DATA_SUBSET.ALL_BACKTESTS or string allBacktests for downloading the predictions for all backtest validation folds. Requires the model to have successfully scored all backtests. Datetime partitioned projects only.

  • explanation_algorithm (dr.enums.EXPLANATIONS_ALGORITHM) – (New in v2.21) Optional. If set to dr.enums.EXPLANATIONS_ALGORITHM.SHAP, the response will include prediction explanations based on the SHAP explainer (SHapley Additive exPlanations). Defaults to None (no prediction explanations).

  • max_explanations (int) – (New in v2.21) Optional. Specifies the maximum number of explanation values that should be returned for each row, ordered by absolute value, greatest to least. In the case of dr.enums.EXPLANATIONS_ALGORITHM.SHAP: If not set, explanations are returned for all features. If the number of features is greater than the max_explanations, the sum of remaining values will also be returned as shap_remaining_total. Max 100. Defaults to null for datasets narrower than 100 columns, defaults to 100 for datasets wider than 100 columns. Is ignored if explanation_algorithm is not set.

Returns:

an instance of created async job

Return type:

Job
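
A minimal sketch of requesting holdout training predictions with SHAP explanations; it assumes the job result is a TrainingPredictions object exposing get_all_as_dataframe (IDs are placeholders):

import datarobot as dr

model = dr.Model.get('project-id', 'model-id')

# Compute out-of-sample predictions on the holdout partition
job = model.request_training_predictions(
    dr.enums.DATA_SUBSET.HOLDOUT,
    explanation_algorithm=dr.enums.EXPLANATIONS_ALGORITHM.SHAP,
)

# Wait for the job, then pull the result into a pandas DataFrame
training_predictions = job.get_result_when_complete()
df = training_predictions.get_all_as_dataframe()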

retrain(sample_pct=None, featurelist_id=None, training_row_count=None, n_clusters=None)

Submit a job to the queue to retrain this model with a different sample size, featurelist, or number of clusters.

Parameters:
  • sample_pct (Optional[float]) – The sample size to use for training, as a percentage (1 to 100). If this parameter is used, training_row_count should not be given.

  • featurelist_id (Optional[str]) – The ID of the featurelist to use.

  • training_row_count (Optional[int]) – The number of rows used to train the model. If this parameter is used, then sample_pct should not be given.

  • n_clusters (Optional[int]) – (new in version 2.27) number of clusters to use in an unsupervised clustering model. This parameter is used only for unsupervised clustering models that do not determine the number of clusters automatically.

Returns:

job – The created job that is retraining the model

Return type:

ModelJob
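
A minimal retraining sketch (placeholder IDs; the featurelist must belong to the same project):

import datarobot as dr

model = dr.Model.get('project-id', 'model-id')

# Retrain the same blueprint on a larger sample with a different featurelist
model_job = model.retrain(sample_pct=85, featurelist_id='featurelist-id')

# Wait for the new Leaderboard entry
retrained_model = model_job.get_result_when_complete()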

set_prediction_threshold(threshold)

Set a custom prediction threshold for the model.

May not be used once prediction_threshold_read_only is True for this model.

Parameters:

threshold (float) – Only used for binary classification projects. The threshold to use when deciding between the positive and negative classes when making predictions. Should be between 0.0 and 1.0 (inclusive).

star_model()

Mark the model as starred.

Model stars propagate to the web application and the API, and can be used to filter when listing models.

Return type:

None

start_advanced_tuning_session()

Start an Advanced Tuning session. Returns an object that helps set up arguments for an Advanced Tuning model execution.

As of v2.17, all models other than blenders, open source, prime, baseline and user-created support Advanced Tuning.

Returns:

Session for setting up and running Advanced Tuning on a model

Return type:

AdvancedTuningSession
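
An illustrative tuning-session sketch; it assumes the AdvancedTuningSession exposes get_task_names, set_parameter, and run, and the task and parameter names shown are placeholders that depend on the blueprint:

import datarobot as dr

model = dr.Model.get('project-id', 'model-id')

tune = model.start_advanced_tuning_session()
tune.description = 'Lower learning rate'  # optional run description

# Inspect what can be tuned, then override a single parameter by name
print(tune.get_task_names())
tune.set_parameter(
    task_name='Gradient Boosted Trees Classifier',  # placeholder task name
    parameter_name='learning_rate',                 # placeholder parameter name
    value=0.05,
)

model_job = tune.run()  # returns a ModelJob for the tuned model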

train(sample_pct=None, featurelist_id=None, scoring_type=None, training_row_count=None, monotonic_increasing_featurelist_id=<object object>, monotonic_decreasing_featurelist_id=<object object>)

Train the blueprint used in model on a particular featurelist or amount of data.

This method creates a new training job for worker and appends it to the end of the queue for this project. After the job has finished you can get the newly trained model by retrieving it from the project leaderboard, or by retrieving the result of the job.

Either sample_pct or training_row_count can be used to specify the amount of data to use, but not both. If neither are specified, a default of the maximum amount of data that can safely be used to train any blueprint without going into the validation data will be selected.

In smart-sampled projects, sample_pct and training_row_count are assumed to be in terms of rows of the minority class.

Note

For datetime partitioned projects, see train_datetime instead.

Parameters:
  • sample_pct (Optional[float]) – The amount of data to use for training, as a percentage of the project dataset from 0 to 100.

  • featurelist_id (Optional[str]) – The identifier of the featurelist to use. If not defined, the featurelist of this model is used.

  • scoring_type (Optional[str]) – Either validation or crossValidation (also dr.SCORING_TYPE.validation or dr.SCORING_TYPE.cross_validation). validation is available for every partitioning type, and indicates that the default model validation should be used for the project. If the project uses a form of cross-validation partitioning, crossValidation can also be used to indicate that all of the available training/validation combinations should be used to evaluate the model.

  • training_row_count (Optional[int]) – The number of rows to use to train the requested model.

  • monotonic_increasing_featurelist_id (str) – (new in version 2.11) optional, the id of the featurelist that defines the set of features with a monotonically increasing relationship to the target. Passing None disables increasing monotonicity constraint. Default (dr.enums.MONOTONICITY_FEATURELIST_DEFAULT) is the one specified by the blueprint.

  • monotonic_decreasing_featurelist_id (str) – (new in version 2.11) optional, the id of the featurelist that defines the set of features with a monotonically decreasing relationship to the target. Passing None disables decreasing monotonicity constraint. Default (dr.enums.MONOTONICITY_FEATURELIST_DEFAULT) is the one specified by the blueprint.

Returns:

model_job_id – id of created job, can be used as parameter to ModelJob.get method or wait_for_async_model_creation function

Return type:

str

Examples

project = Project.get('project-id')
model = Model.get('project-id', 'model-id')
model_job_id = model.train(training_row_count=project.max_train_rows)
train_datetime(featurelist_id=None, training_row_count=None, training_duration=None, time_window_sample_pct=None, monotonic_increasing_featurelist_id=<object object>, monotonic_decreasing_featurelist_id=<object object>, use_project_settings=False, sampling_method=None, n_clusters=None)

Trains this model on a different featurelist or sample size.

Requires that this model is part of a datetime partitioned project; otherwise, an error will occur.

All durations should be specified with a duration string such as those returned by the partitioning_methods.construct_duration_string helper method. Please see datetime partitioned project documentation for more information on duration strings.

Parameters:
  • featurelist_id (Optional[str]) – the featurelist to use to train the model. If not specified, the featurelist of this model is used.

  • training_row_count (Optional[int]) – the number of rows of data that should be used to train the model. If specified, neither training_duration nor use_project_settings may be specified.

  • training_duration (Optional[str]) – a duration string specifying what time range the data used to train the model should span. If specified, neither training_row_count nor use_project_settings may be specified.

  • use_project_settings (Optional[bool]) – (New in version v2.20) defaults to False. If True, indicates that the custom backtest partitioning settings specified by the user will be used to train the model and evaluate backtest scores. If specified, neither training_row_count nor training_duration may be specified.

  • time_window_sample_pct (Optional[int]) – may only be specified when the requested model is a time window (e.g. duration or start and end dates). An integer between 1 and 99 indicating the percentage to sample by within the window. The points kept are determined by a random uniform sample. If specified, training_duration must also be specified; otherwise, an error will occur.

  • sampling_method (Optional[str]) – (New in version v2.23) defines the way training data is selected. Can be either random or latest. In combination with training_row_count defines how rows are selected from backtest (latest by default). When training data is defined using time range (training_duration or use_project_settings) this setting changes the way time_window_sample_pct is applied (random by default). Applicable to OTV projects only.

  • monotonic_increasing_featurelist_id (Optional[str]) – (New in version v2.18) optional, the id of the featurelist that defines the set of features with a monotonically increasing relationship to the target. Passing None disables increasing monotonicity constraint. Default (dr.enums.MONOTONICITY_FEATURELIST_DEFAULT) is the one specified by the blueprint.

  • monotonic_decreasing_featurelist_id (Optional[str]) – (New in version v2.18) optional, the id of the featurelist that defines the set of features with a monotonically decreasing relationship to the target. Passing None disables decreasing monotonicity constraint. Default (dr.enums.MONOTONICITY_FEATURELIST_DEFAULT) is the one specified by the blueprint.

  • n_clusters (Optional[int]) – (New in version 2.27) number of clusters to use in an unsupervised clustering model. This parameter is used only for unsupervised clustering models that don’t automatically determine the number of clusters.

Returns:

job – the created job to build the model

Return type:

ModelJob
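
A minimal sketch for a datetime partitioned project; the duration string is written literally here, though it could also be built with partitioning_methods.construct_duration_string:

import datarobot as dr

model = dr.Model.get('project-id', 'model-id')

# Retrain this blueprint on one year of data, sampling 50% of rows in the window
job = model.train_datetime(training_duration='P1Y0M0D', time_window_sample_pct=50)
new_model = job.get_result_when_complete()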

train_incremental(data_stage_id, training_data_name=None, data_stage_encoding=None, data_stage_delimiter=None, data_stage_compression=None)

Submit a job to the queue to perform incremental training on an existing model using additional data. The id of the additional data to use for training is specified with the data_stage_id. Optionally a name for the iteration can be supplied by the user to help identify the contents of data in the iteration.

This functionality requires the INCREMENTAL_LEARNING feature flag to be enabled.

Parameters:
  • data_stage_id (str) – The id of the data stage to use for training.

  • training_data_name (Optional[str]) – The name of the iteration or data stage to indicate what the incremental learning was performed on.

  • data_stage_encoding (Optional[str]) – The encoding type of the data in the data stage (default: UTF-8). Supported formats: UTF-8, ASCII, WINDOWS1252

  • data_stage_delimiter (Optional[str]) – The delimiter used by the data in the data stage (default: ‘,’).

  • data_stage_compression (Optional[str]) – The compression type of the data stage file, e.g. ‘zip’ (default: None). Supported formats: zip

Returns:

job – The created job that is retraining the model

Return type:

ModelJob
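
A hedged sketch of incremental training; it assumes the INCREMENTAL_LEARNING feature flag is enabled and that the additional data was already uploaded as a data stage (the data stage ID is a placeholder):

import datarobot as dr

model = dr.Model.get('project-id', 'model-id')

# Continue training on an additional chunk of data from an existing data stage
job = model.train_incremental(
    data_stage_id='data-stage-id',
    training_data_name='2024-Q1 refresh',
    data_stage_encoding='UTF-8',
)
updated_model = job.get_result_when_complete()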

unstar_model()

Unmark the model as starred.

Model stars propagate to the web application and the API, and can be used to filter when listing models.

Return type:

None

class datarobot.models.model.AdvancedTuningParamsType
class datarobot.models.model.BiasMitigationFeatureInfo

Prime models

class datarobot.models.PrimeModel

Represents a DataRobot Prime model approximating a parent model with downloadable code.

All durations are specified with a duration string such as those returned by the partitioning_methods.construct_duration_string helper method. Please see datetime partitioned project documentation for more information on duration strings.

Variables:
  • id (str) – the id of the model

  • project_id (str) – the id of the project the model belongs to

  • processes (List[str]) – the processes used by the model

  • featurelist_name (str) – the name of the featurelist used by the model

  • featurelist_id (str) – the id of the featurelist used by the model

  • sample_pct (float) – the percentage of the project dataset used in training the model

  • training_row_count (int or None) – the number of rows of the project dataset used in training the model. In a datetime partitioned project, if specified, defines the number of rows used to train the model and evaluate backtest scores; if unspecified, either training_duration or training_start_date and training_end_date was used to determine that instead.

  • training_duration (str or None) – only present for models in datetime partitioned projects. If specified, a duration string specifying the duration spanned by the data used to train the model and evaluate backtest scores.

  • training_start_date (datetime or None) – only present for frozen models in datetime partitioned projects. If specified, the start date of the data used to train the model.

  • training_end_date (datetime or None) – only present for frozen models in datetime partitioned projects. If specified, the end date of the data used to train the model.

  • model_type (str) – what model this is, e.g. ‘DataRobot Prime’

  • model_category (str) – what kind of model this is - always ‘prime’ for DataRobot Prime models

  • is_frozen (bool) – whether this model is a frozen model

  • blueprint_id (str) – the id of the blueprint used in this model

  • metrics (dict) – a mapping from each metric to the model’s scores for that metric

  • ruleset (Ruleset) – the ruleset used in the Prime model

  • parent_model_id (str) – the id of the model that this Prime model approximates

  • monotonic_increasing_featurelist_id (str) – optional, the id of the featurelist that defines the set of features with a monotonically increasing relationship to the target. If None, no such constraints are enforced.

  • monotonic_decreasing_featurelist_id (str) – optional, the id of the featurelist that defines the set of features with a monotonically decreasing relationship to the target. If None, no such constraints are enforced.

  • supports_monotonic_constraints (bool) – optional, whether this model supports enforcing monotonic constraints

  • is_starred (bool) – whether this model is marked as starred

  • prediction_threshold (float) – for binary classification projects, the threshold used for predictions

  • prediction_threshold_read_only (bool) – indicates whether modification of the prediction threshold is forbidden. Threshold modification is forbidden once a model has had a deployment created or predictions made via the dedicated prediction API.

  • supports_composable_ml (bool or None) – (New in version v2.26) whether this model is supported in Composable ML.

__init__(id=None, processes=None, featurelist_name=None, featurelist_id=None, project_id=None, sample_pct=None, model_type=None, model_category=None, is_frozen=None, blueprint_id=None, metrics=None, ruleset_id=None, rule_count=None, score=None, monotonic_increasing_featurelist_id=None, monotonic_decreasing_featurelist_id=None, supports_monotonic_constraints=None, is_starred=None, prediction_threshold=None, prediction_threshold_read_only=None, model_number=None, parent_model_id=None, supports_composable_ml=None, training_row_count=None, training_duration=None, training_start_date=None, training_end_date=None, data_selection_method=None, time_window_sample_pct=None, sampling_method=None, model_family_full_name=None, is_trained_into_validation=None, is_trained_into_holdout=None)
classmethod get(project_id, model_id)

Retrieve a specific prime model.

Parameters:
  • project_id (str) – The id of the project the prime model belongs to

  • model_id (str) – The model_id of the prime model to retrieve.

Returns:

model – The queried instance.

Return type:

PrimeModel

request_download_validation(language)

Prep and validate the downloadable code for the ruleset associated with this model.

Parameters:

language (str) – the language the code should be downloaded in - see datarobot.enums.PRIME_LANGUAGE for available languages

Returns:

job – A job tracking the code preparation and validation

Return type:

Job

advanced_tune(params, description=None)

Generate a new model with the specified advanced-tuning parameters

As of v2.17, all models other than blenders, open source, prime, baseline and user-created support Advanced Tuning.

Parameters:
  • params (dict) – Mapping of parameter ID to parameter value. The list of valid parameter IDs for a model can be found by calling get_advanced_tuning_parameters(). This endpoint does not need to include values for all parameters. If a parameter is omitted, its current_value will be used.

  • description (str) – Human-readable string describing the newly advanced-tuned model

Returns:

The created job to build the model

Return type:

ModelJob
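
A hedged sketch that looks up a tunable parameter and submits one new value; the parameter name used in the filter is a placeholder and depends on the blueprint:

import datarobot as dr

model = dr.Model.get('project-id', 'model-id')

# Find the opaque ID of the parameter to tune
tuning = model.get_advanced_tuning_parameters()
param = next(
    p for p in tuning['tuning_parameters']
    if p['parameter_name'] == 'learning_rate'  # placeholder parameter name
)

# Train a new model with the overridden value; omitted parameters keep current_value
job = model.advanced_tune({param['parameter_id']: 0.05}, description='Lower learning rate')
tuned_model = job.get_result_when_complete()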

cross_validate()

Run cross validation on the model.

Note

To perform Cross Validation on a new model with new parameters, use train instead.

Returns:

The created job to build the model

Return type:

ModelJob

delete()

Delete a model from the project’s leaderboard.

Return type:

None

download_scoring_code(file_name, source_code=False)

Download the Scoring Code JAR.

Parameters:
  • file_name (str) – File path where scoring code will be saved.

  • source_code (Optional[bool]) – Set to True to download source code archive. It will not be executable.

Return type:

None

download_training_artifact(file_name)

Retrieve trained artifact(s) from a model containing one or more custom tasks.

Artifact(s) will be downloaded to the specified local filepath.

Parameters:

file_name (str) – File path where trained model artifact(s) will be saved.

classmethod from_data(data)

Instantiate an object of this class using a dict.

Parameters:

data (dict) – Correctly snake_cased keys and their values.

Return type:

TypeVar(T, bound= APIObject)

get_advanced_tuning_parameters()

Get the advanced-tuning parameters available for this model.

As of v2.17, all models other than blenders, open source, prime, baseline and user-created support Advanced Tuning.

Returns:

A dictionary describing the advanced-tuning parameters for the current model. There are two top-level keys, tuning_description and tuning_parameters.

tuning_description is an optional value. If not None, it is the user-specified description of this set of tuning parameters.

tuning_parameters is a list of dicts, each with the following keys

  • parameter_name : (str) name of the parameter (unique per task, see below)

  • parameter_id : (str) opaque ID string uniquely identifying parameter

  • default_value : (*) the actual value used to train the model; either the single value of the parameter specified before training, or the best value from the list of grid-searched values (based on current_value)

  • current_value : (*) the single value or list of values of the parameter that were grid searched. Depending on the grid search specification, could be a single fixed value (no grid search), a list of discrete values, or a range.

  • task_name : (str) name of the task that this parameter belongs to

  • constraints: (dict) see the notes below

  • vertex_id: (str) ID of vertex that this parameter belongs to

Return type:

dict

Notes

The type of default_value and current_value is defined by the constraints structure. It will be a string or numeric Python type.

constraints is a dict with at least one, possibly more, of the following keys. The presence of a key indicates that the parameter may take on the specified type. (If a key is absent, this means that the parameter may not take on the specified type.) If a key on constraints is present, its value will be a dict containing all of the fields described below for that key.

"constraints": {
    "select": {
        "values": [<list(basestring or number) : possible values>]
    },
    "ascii": {},
    "unicode": {},
    "int": {
        "min": <int : minimum valid value>,
        "max": <int : maximum valid value>,
        "supports_grid_search": <bool : True if Grid Search may be
                                        requested for this param>
    },
    "float": {
        "min": <float : minimum valid value>,
        "max": <float : maximum valid value>,
        "supports_grid_search": <bool : True if Grid Search may be
                                        requested for this param>
    },
    "intList": {
        "min_length": <int : minimum valid length>,
        "max_length": <int : maximum valid length>
        "min_val": <int : minimum valid value>,
        "max_val": <int : maximum valid value>
        "supports_grid_search": <bool : True if Grid Search may be
                                        requested for this param>
    },
    "floatList": {
        "min_length": <int : minimum valid length>,
        "max_length": <int : maximum valid length>
        "min_val": <float : minimum valid value>,
        "max_val": <float : maximum valid value>
        "supports_grid_search": <bool : True if Grid Search may be
                                        requested for this param>
    }
}

The keys have meaning as follows:

  • select: Rather than specifying a specific data type, if present, it indicates that the parameter is permitted to take on any of the specified values. Listed values may be of any string or real (non-complex) numeric type.

  • ascii: The parameter may be a unicode object that encodes simple ASCII characters. (A-Z, a-z, 0-9, whitespace, and certain common symbols.) In addition to listed constraints, ASCII keys currently may not contain either newlines or semicolons.

  • unicode: The parameter may be any Python unicode object.

  • int: The value may be an object of type int within the specified range (inclusive). Please note that the value will be passed around using the JSON format, and some JSON parsers have undefined behavior with integers outside of the range [-(2**53)+1, (2**53)-1].

  • float: The value may be an object of type float within the specified range (inclusive).

  • intList, floatList: The value may be a list of int or float objects, respectively, following constraints as specified respectively by the int and float types (above).

Many parameters only specify one key under constraints. If a parameter specifies multiple keys, the parameter may take on any value permitted by any key.
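
As a worked illustration of the constraints schema above, the helper below (a sketch, not part of the client) summarizes which values a tuning parameter accepts:

import datarobot as dr

def allowed_values_summary(parameter):
    """Describe what a tuning parameter accepts, based on its constraints dict."""
    constraints = parameter['constraints']
    parts = []
    if 'select' in constraints:
        parts.append('one of {}'.format(constraints['select']['values']))
    if 'int' in constraints:
        c = constraints['int']
        parts.append('an int in [{}, {}]'.format(c['min'], c['max']))
    if 'float' in constraints:
        c = constraints['float']
        parts.append('a float in [{}, {}]'.format(c['min'], c['max']))
    if 'ascii' in constraints or 'unicode' in constraints:
        parts.append('a string')
    return ' or '.join(parts) or 'see intList/floatList constraints'

model = dr.Model.get('project-id', 'model-id')
tuning = model.get_advanced_tuning_parameters()
for parameter in tuning['tuning_parameters']:
    print(parameter['parameter_name'], '->', allowed_values_summary(parameter))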

get_all_confusion_charts(fallback_to_parent_insights=False)

Retrieve a list of all confusion matrices available for the model.

Parameters:

fallback_to_parent_insights (bool) – (New in version v2.14) Optional, if True, this will return confusion chart data for this model’s parent for any source that is not available for this model, provided this model has a defined parent model. If omitted or False, or this model has no parent, this will not attempt to retrieve any data from this model’s parent.

Returns:

Data for all available confusion charts for model.

Return type:

list of ConfusionChart

get_all_feature_impacts(data_slice_filter=None)

Retrieve a list of all feature impact results available for the model.

Parameters:

data_slice_filter (DataSlice, optional) – A dataslice used to filter the return values based on the dataslice.id. By default, this function will use data_slice_filter.id == None which returns an unsliced insight. If data_slice_filter is None then no data_slice filtering will be applied when requesting the feature impacts.

Returns:

Data for all available model feature impacts, or an empty list if no data is found.

Return type:

list of dicts

Examples

model = datarobot.Model(id='model-id', project_id='project-id')

# Get feature impact insights for sliced data
data_slice = datarobot.DataSlice(id='data-slice-id')
sliced_fi = model.get_all_feature_impacts(data_slice_filter=data_slice)

# Get feature impact insights for unsliced data
data_slice = datarobot.DataSlice()
unsliced_fi = model.get_all_feature_impacts(data_slice_filter=data_slice)

# Get all feature impact insights
all_fi = model.get_all_feature_impacts()
get_all_lift_charts(fallback_to_parent_insights=False, data_slice_filter=None)

Retrieve a list of all Lift charts available for the model.

Parameters:
  • fallback_to_parent_insights (Optional[bool]) – (New in version v2.14) Optional, if True, this will return lift chart data for this model’s parent for any source that is not available for this model and if this model has a defined parent model. If omitted or False, or this model has no parent, this will not attempt to retrieve any data from this model’s parent.

  • data_slice_filter (DataSlice, optional) – Filters the returned lift chart by data_slice_filter.id. If None (the default) applies no filter based on data_slice_id.

Returns:

Data for all available model lift charts, or an empty list if no data is found.

Return type:

list of LiftChart

Examples

model = datarobot.Model.get('project-id', 'model-id')

# Get lift chart insights for sliced data
sliced_lift_charts = model.get_all_lift_charts(data_slice_filter=datarobot.DataSlice(id='data-slice-id'))

# Get lift chart insights for unsliced data
unsliced_lift_charts = model.get_all_lift_charts(data_slice_filter=datarobot.DataSlice(id=None))

# Get all lift chart insights
all_lift_charts = model.get_all_lift_charts()
get_all_multiclass_lift_charts(fallback_to_parent_insights=False, data_slice_filter=<datarobot.models.model.Sentinel object>, target_class=None)

Retrieve a list of all multiclass Lift charts available for the model.

Parameters:
  • fallback_to_parent_insights (bool) – (New in version v2.14) Optional, if True, this will return lift chart data for this model’s parent for any source that is not available for this model and if this model has a defined parent model. If omitted or False, or this model has no parent, this will not attempt to retrieve any data from this model’s parent.

  • data_slice_filter (DataSlice, optional) – A dataslice used to filter the return values based on the dataslice.id. By default this function will use data_slice_filter.id == None which returns an unsliced insight. If data_slice_filter is None then get_lift_chart will raise a ValueError.

  • target_class (str, optional) – Lift chart target class name.

Returns:

Data for all available model lift charts.

Return type:

list of LiftChart

get_all_residuals_charts(fallback_to_parent_insights=False, data_slice_filter=None)

Retrieve a list of all residuals charts available for the model.

Parameters:
  • fallback_to_parent_insights (bool) – Optional, if True, this will return residuals chart data for this model’s parent for any source that is not available for this model and if this model has a defined parent model. If omitted or False, or this model has no parent, this will not attempt to retrieve any data from this model’s parent.

  • data_slice_filter (DataSlice, optional) – Filters the returned residuals charts by data_slice_filter.id. If None (the default) applies no filter based on data_slice_id.

Returns:

Data for all available model residuals charts.

Return type:

list of ResidualsChart

Examples

model = datarobot.Model.get('project-id', 'model-id')

# Get residuals chart insights for sliced data
sliced_residuals_charts = model.get_all_residuals_charts(data_slice_filter=datarobot.DataSlice(id='data-slice-id'))

# Get residuals chart insights for unsliced data
unsliced_residuals_charts = model.get_all_residuals_charts(data_slice_filter=datarobot.DataSlice(id=None))

# Get all residuals chart insights
all_residuals_charts = model.get_all_residuals_charts()
get_all_roc_curves(fallback_to_parent_insights=False, data_slice_filter=None)

Retrieve a list of all ROC curves available for the model.

Parameters:
  • fallback_to_parent_insights (bool) – (New in version v2.14) Optional, if True, this will return ROC curve data for this model’s parent for any source that is not available for this model and if this model has a defined parent model. If omitted or False, or this model has no parent, this will not attempt to retrieve any data from this model’s parent.

  • data_slice_filter (DataSlice, optional) – filters the returned roc_curve by data_slice_filter.id. If None (the default) applies no filter based on data_slice_id.

Returns:

Data for all available model ROC curves, or an empty list if no RocCurves are found.

Return type:

list of RocCurve

Examples

model = datarobot.Model.get('project-id', 'model-id')
ds_filter = DataSlice(id='data-slice-id')

# Get roc curve insights for sliced data
sliced_roc = model.get_all_roc_curves(data_slice_filter=ds_filter)

# Get roc curve insights for unsliced data
ds_filter = DataSlice(id=None)
unsliced_roc = model.get_all_roc_curves(data_slice_filter=ds_filter)

# Get all roc curve insights
all_roc_curves = model.get_all_roc_curves()
get_confusion_chart(source, fallback_to_parent_insights=False)

Retrieve a multiclass model’s confusion matrix for the specified source.

Parameters:
  • source (str) – Confusion chart source. Check datarobot.enums.CHART_DATA_SOURCE for possible values.

  • fallback_to_parent_insights (bool) – (New in version v2.14) Optional, if True, this will return confusion chart data for this model’s parent if the confusion chart is not available for this model and the model has a defined parent model. If omitted or False, or there is no parent model, this will not attempt to return insight data from this model’s parent.

Returns:

Model ConfusionChart data

Return type:

ConfusionChart

Raises:

ClientError – If the insight is not available for this model

get_cross_class_accuracy_scores()

Retrieves a list of Cross Class Accuracy scores for the model.

Return type:

json

get_cross_validation_scores(partition=None, metric=None)

Return a dictionary, keyed by metric, showing cross validation scores per partition.

Cross Validation should already have been performed using cross_validate or train.

Note

Models that computed cross validation before this feature was added will need to be deleted and retrained before this method can be used.

Parameters:
  • partition (float) – Optional. The ID of the partition to filter results by (1, 2, 3.0, 4.0, etc.); may be a positive whole number or float value. 0 corresponds to the validation partition.

  • metric (unicode) – Optional. Name of the metric to filter the resulting cross validation scores by.

Returns:

cross_validation_scores – A dictionary keyed by metric showing cross validation scores per partition.

Return type:

dict
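
A minimal sketch (the metric name is a placeholder; the shape of the returned dict follows the description above):

import datarobot as dr

model = dr.Model.get('project-id', 'model-id')

# Run cross validation if it has not been computed yet, then read the scores
cv_job = model.cross_validate()
cv_job.wait_for_completion()

scores = model.get_cross_validation_scores(metric='AUC')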

get_data_disparity_insights(feature, class_name1, class_name2)

Retrieve a list of Cross Class Data Disparity insights for the model.

Parameters:
  • feature (str) – Bias and Fairness protected feature name.

  • class_name1 (str) – One of the compared classes

  • class_name2 (str) – Another compared class

Return type:

json

get_fairness_insights(fairness_metrics_set=None, offset=0, limit=100)

Retrieve a list of Per Class Bias insights for the model.

Parameters:
  • fairness_metrics_set (Optional[str]) – Can be one of <datarobot.enums.FairnessMetricsSet>. The fairness metric used to calculate the fairness scores.

  • offset (Optional[int]) – Number of items to skip.

  • limit (Optional[int]) – Number of items to return.

Return type:

json

get_feature_effect(source, data_slice_id=None)

Retrieve Feature Effects for the model.

Feature Effects provides partial dependence and predicted vs actual values for top-500 features ordered by feature impact score.

The partial dependence shows marginal effect of a feature on the target variable after accounting for the average effects of all other predictive features. It indicates how, holding all other variables except the feature of interest as they were, the value of this feature affects your prediction.

Requires that Feature Effects has already been computed with request_feature_effect.

See get_feature_effect_metadata for retrieving information on the available sources.

Parameters:
  • source (string) – The source Feature Effects are retrieved for.

  • data_slice_id (string, optional) – ID for the data slice used in the request. If None, retrieve unsliced insight data.

Returns:

feature_effects – The feature effects data.

Return type:

FeatureEffects

Raises:

ClientError – If the feature effects have not been computed or source is not a valid value.

get_feature_effect_metadata()

Retrieve Feature Effects metadata. Response contains status and available model sources.

  • Feature Effect for the training partition is always available, with the exception of older projects that only supported Feature Effect for validation.

  • When a model is trained into validation or holdout without stacked predictions (i.e., no out-of-sample predictions in those partitions), Feature Effects is not available for validation or holdout.

  • Feature Effects for holdout is not available when holdout was not unlocked for the project.

Use source to retrieve Feature Effects, selecting one of the provided sources.

Returns:

feature_effect_metadata

Return type:

FeatureEffectMetadata

get_feature_effects_multiclass(source='training', class_=None)

Retrieve Feature Effects for the multiclass model.

Feature Effects provide partial dependence and predicted vs actual values for top-500 features ordered by feature impact score.

The partial dependence shows marginal effect of a feature on the target variable after accounting for the average effects of all other predictive features. It indicates how, holding all other variables except the feature of interest as they were, the value of this feature affects your prediction.

Requires that Feature Effects has already been computed with request_feature_effect.

See get_feature_effect_metadata for retrieving information on the available sources.

Parameters:
  • source (str) – The source Feature Effects are retrieved for.

  • class (str or None) – The class name Feature Effects are retrieved for.

Returns:

The list of multiclass feature effects.

Return type:

list

Raises:

ClientError – If Feature Effects have not been computed or source is not a valid value.

get_feature_impact(with_metadata=False, data_slice_filter=<datarobot.models.model.Sentinel object>)

Retrieve the computed Feature Impact results, a measure of the relevance of each feature in the model.

Feature Impact is computed for each column by creating new data with that column randomly permuted (but the others left unchanged), and seeing how the error metric score for the predictions is affected. The ‘impactUnnormalized’ is how much worse the error metric score is when making predictions on this modified data. The ‘impactNormalized’ is normalized so that the largest value is 1. In both cases, larger values indicate more important features.

If a feature is a redundant feature, i.e. once other features are considered it doesn’t contribute much in addition, the ‘redundantWith’ value is the name of feature that has the highest correlation with this feature. Note that redundancy detection is only available for jobs run after the addition of this feature. When retrieving data that predates this functionality, a NoRedundancyImpactAvailable warning will be used.

Elsewhere this technique is sometimes called ‘Permutation Importance’.

Requires that Feature Impact has already been computed with request_feature_impact.

Parameters:
  • with_metadata (bool) – The flag indicating if the result should include the metadata as well.

  • data_slice_filter (DataSlice, optional) – A dataslice used to filter the return values based on the dataslice.id. By default, this function will use data_slice_filter.id == None which returns an unsliced insight. If data_slice_filter is None then get_feature_impact will raise a ValueError.

Returns:

The feature impact data response depends on the with_metadata parameter. The response is either a dict with metadata and a list with actual data or just a list with that data.

Each list item is a dict with the keys featureName, impactNormalized, impactUnnormalized, redundantWith, and count.

For dict response available keys are:

  • featureImpacts - Feature Impact data as a dictionary. Each item is a dict with keys: featureName, impactNormalized, impactUnnormalized, and redundantWith.

  • shapBased - A boolean that indicates whether Feature Impact was calculated using Shapley values.

  • ranRedundancyDetection - A boolean that indicates whether redundant feature identification was run while calculating this Feature Impact.

  • rowCount - An integer or None that indicates the number of rows that was used to calculate Feature Impact. For the Feature Impact calculated with the default logic, without specifying the rowCount, we return None here.

  • count - An integer with the number of features under the featureImpacts.

Return type:

list or dict

Raises:
  • ClientError – If the feature impacts have not been computed.

  • ValueError – If data_slice_filter passed as None

get_features_used()

Query the server to determine which features were used.

Note that the data returned by this method is possibly different than the names of the features in the featurelist used by this model. This method will return the raw features that must be supplied in order for predictions to be generated on a new set of data. The featurelist, in contrast, would also include the names of derived features.

Returns:

features – The names of the features used in the model.

Return type:

List[str]

get_frozen_child_models()

Retrieve the IDs for all models that are frozen from this model.

Return type:

A list of Models

get_labelwise_roc_curves(source, fallback_to_parent_insights=False)

Retrieve a list of LabelwiseRocCurve instances for a multilabel model for the given source and all labels. This method is valid only for multilabel projects. For binary projects, use the Model.get_roc_curve API.

Added in version v2.24.

Parameters:
  • source (str) – ROC curve data source. Check datarobot.enums.CHART_DATA_SOURCE for possible values.

  • fallback_to_parent_insights (bool) – Optional, if True, this will return ROC curve data for this model’s parent if the ROC curve is not available for this model and the model has a defined parent model. If omitted or False, or there is no parent model, will not attempt to return data from this model’s parent.

Returns:

Labelwise ROC Curve instances for source and all labels

Return type:

list of LabelwiseRocCurve

Raises:

ClientError – If the insight is not available for this model

get_lift_chart(source, fallback_to_parent_insights=False, data_slice_filter=<datarobot.models.model.Sentinel object>)

Retrieve the model Lift chart for the specified source.

Parameters:
  • source (str) – Lift chart data source. Check datarobot.enums.CHART_DATA_SOURCE for possible values. (New in version v2.23) For time series and OTV models, also accepts values backtest_2, backtest_3, …, up to the number of backtests in the model.

  • fallback_to_parent_insights (bool) – (New in version v2.14) Optional, if True, this will return lift chart data for this model’s parent if the lift chart is not available for this model and the model has a defined parent model. If omitted or False, or there is no parent model, will not attempt to return insight data from this model’s parent.

  • data_slice_filter (DataSlice, optional) – A dataslice used to filter the return values based on the dataslice.id. By default this function will use data_slice_filter.id == None which returns an unsliced insight. If data_slice_filter is None then get_lift_chart will raise a ValueError.

Returns:

Model lift chart data

Return type:

LiftChart

Raises:
  • ClientError – If the insight is not available for this model

  • ValueError – If data_slice_filter passed as None

get_missing_report_info()

Retrieve a report on missing training data that can be used to understand missing values treatment in the model. The report consists of missing value resolutions for the numeric or categorical features that were part of building the model.

Returns:

The queried model missing report, sorted by missing count (DESCENDING order).

Return type:

An iterable of MissingReportPerFeature

get_model_blueprint_chart()

Retrieve a diagram that can be used to understand data flow in the blueprint.

Returns:

The queried model blueprint chart.

Return type:

ModelBlueprintChart

get_model_blueprint_documents()

Get documentation for tasks used in this model.

Returns:

All documents available for the model.

Return type:

list of BlueprintTaskDocument

get_model_blueprint_json()

Get the blueprint json representation used by this model.

Returns:

Json representation of the blueprint stages.

Return type:

BlueprintJson

get_multiclass_feature_impact()

For multiclass models it’s possible to calculate feature impact separately for each target class. The method of calculation is exactly the same, applied in a one-vs-all style for each target class.

Requires that Feature Impact has already been computed with request_feature_impact.

Returns:

feature_impacts – The feature impact data. Each item is a dict with the keys ‘featureImpacts’ (list), ‘class’ (str). Each item in ‘featureImpacts’ is a dict with the keys ‘featureName’, ‘impactNormalized’, and ‘impactUnnormalized’, and ‘redundantWith’.

Return type:

list of dict

Raises:

ClientError – If the multiclass feature impacts have not been computed.

get_multiclass_lift_chart(source, fallback_to_parent_insights=False, data_slice_filter=<datarobot.models.model.Sentinel object>, target_class=None)

Retrieve model Lift chart for the specified source.

Parameters:
  • source (str) – Lift chart data source. Check datarobot.enums.CHART_DATA_SOURCE for possible values.

  • fallback_to_parent_insights (bool) – Optional, if True, this will return lift chart data for this model’s parent if the lift chart is not available for this model and the model has a defined parent model. If omitted or False, or there is no parent model, will not attempt to return insight data from this model’s parent.

  • data_slice_filter (DataSlice, optional) – A dataslice used to filter the return values based on the dataslice.id. By default this function will use data_slice_filter.id == None which returns an unsliced insight. If data_slice_filter is None then get_lift_chart will raise a ValueError.

  • target_class (str, optional) – Lift chart target class name.

Returns:

Model lift chart data for each saved target class

Return type:

list of LiftChart

Raises:

ClientError – If the insight is not available for this model

get_multilabel_lift_charts(source, fallback_to_parent_insights=False)

Retrieve model Lift charts for the specified source.

Added in version v2.24.

Parameters:
  • source (str) – Lift chart data source. Check datarobot.enums.CHART_DATA_SOURCE for possible values.

  • fallback_to_parent_insights (bool) – Optional, if True, this will return lift chart data for this model’s parent if the lift chart is not available for this model and the model has a defined parent model. If omitted or False, or there is no parent model, will not attempt to return insight data from this model’s parent.

Returns:

Model lift chart data for each saved target class

Return type:

list of LiftChart

Raises:

ClientError – If the insight is not available for this model

get_num_iterations_trained()

Retrieves the number of estimators trained by early-stopping tree-based models.

Added in version v2.22.

Returns:

  • projectId (str) – id of project containing the model

  • modelId (str) – id of the model

  • data (array) – list of numEstimatorsItem objects, one for each modeling stage.

  • numEstimatorsItem will be of the form:

    • stage (str) – indicates the modeling stage (for multi-stage models); None for single-stage models

    • numIterations (int) – the number of estimators or iterations trained by the model

get_or_request_feature_effect(source, max_wait=600, row_count=None, data_slice_id=None)

Retrieve Feature Effects for the model, requesting a new job if it hasn’t been run previously.

See get_feature_effect_metadata for retrieving information of source.

Parameters:
  • source (string) – The source Feature Effects are retrieved for.

  • max_wait (Optional[int]) – The maximum time to wait for a requested Feature Effect job to complete before erroring.

  • row_count (Optional[int]) – (New in version v2.21) The sample size to use for Feature Impact computation. Minimum is 10 rows. Maximum is 100000 rows or the training sample size of the model, whichever is less.

  • data_slice_id (Optional[str]) – ID for the data slice used in the request. If None, request unsliced insight data.

Returns:

feature_effects – The Feature Effects data.

Return type:

FeatureEffects

get_or_request_feature_effects_multiclass(source, top_n_features=None, features=None, row_count=None, class_=None, max_wait=600)

Retrieve Feature Effects for the multiclass model, requesting a job if it hasn’t been run previously.

Parameters:
  • source (string) – The source Feature Effects are retrieved for.

  • class (str or None) – The class name Feature Effects are retrieved for.

  • row_count (int) – The number of rows from dataset to use for Feature Impact calculation.

  • top_n_features (int or None) – Number of top features (ranked by Feature Impact) used to calculate Feature Effects.

  • features (list or None) – The list of features used to calculate Feature Effects.

  • max_wait (Optional[int]) – The maximum time to wait for a requested Feature Effects job to complete before erroring.

Returns:

feature_effects – The list of multiclass feature effects data.

Return type:

list of FeatureEffectsMulticlass

get_or_request_feature_impact(max_wait=600, **kwargs)

Retrieve feature impact for the model, requesting a job if it hasn’t been run previously.

Parameters:
  • max_wait (Optional[int]) – The maximum time to wait for a requested feature impact job to complete before erroring

  • **kwargs – Arbitrary keyword arguments passed to request_feature_impact.

Returns:

feature_impacts – The feature impact data. See get_feature_impact for the exact schema.

Return type:

list or dict
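
A minimal sketch that computes Feature Impact if needed and prints the normalized scores; the response keys follow the schema described under get_feature_impact:

import datarobot as dr

model = dr.Model.get('project-id', 'model-id')

# Compute Feature Impact if necessary and fetch it together with metadata
fi = model.get_or_request_feature_impact(max_wait=600, with_metadata=True)

for item in fi['featureImpacts']:
    print(item['featureName'], round(item['impactNormalized'], 3))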

get_parameters()

Retrieve model parameters.

Returns:

Model parameters for this model.

Return type:

ModelParameters

get_pareto_front()

Retrieve the Pareto Front for a Eureqa model.

This method is only supported for Eureqa models.

Returns:

Model ParetoFront data

Return type:

ParetoFront

get_prime_eligibility()

Check if this model can be approximated with DataRobot Prime

Returns:

prime_eligibility – a dict indicating whether a model can be approximated with DataRobot Prime (key can_make_prime) and why it may be ineligible (key message)

Return type:

dict

get_residuals_chart(source, fallback_to_parent_insights=False, data_slice_filter=<datarobot.models.model.Sentinel object>)

Retrieve model residuals chart for the specified source.

Parameters:
  • source (str) – Residuals chart data source. Check datarobot.enums.CHART_DATA_SOURCE for possible values.

  • fallback_to_parent_insights (bool) – Optional, if True, this will return residuals chart data for this model’s parent if the residuals chart is not available for this model and the model has a defined parent model. If omitted or False, or there is no parent model, will not attempt to return residuals data from this model’s parent.

  • data_slice_filter (DataSlice, optional) – A dataslice used to filter the return values based on the dataslice.id. By default this function will use data_slice_filter.id == None which returns an unsliced insight. If data_slice_filter is None then get_residuals_chart will raise a ValueError.

Returns:

Model residuals chart data

Return type:

ResidualsChart

Raises:
  • ClientError – If the insight is not available for this model

  • ValueError – If data_slice_filter passed as None

get_roc_curve(source, fallback_to_parent_insights=False, data_slice_filter=<datarobot.models.model.Sentinel object>)

Retrieve the ROC curve for a binary model for the specified source. This method is valid only for binary projects. For multilabel projects, use Model.get_labelwise_roc_curves.

Parameters:
  • source (str) – ROC curve data source. Check datarobot.enums.CHART_DATA_SOURCE for possible values. (New in version v2.23) For time series and OTV models, also accepts values backtest_2, backtest_3, …, up to the number of backtests in the model.

  • fallback_to_parent_insights (bool) – (New in version v2.14) Optional, if True, this will return ROC curve data for this model’s parent if the ROC curve is not available for this model and the model has a defined parent model. If omitted or False, or there is no parent model, will not attempt to return data from this model’s parent.

  • data_slice_filter (DataSlice, optional) – A dataslice used to filter the return values based on the dataslice.id. By default this function will use data_slice_filter.id == None which returns an unsliced insight. If data_slice_filter is None then get_roc_curve will raise a ValueError.

Returns:

Model ROC curve data

Return type:

RocCurve

Raises:
  • ClientError – If the insight is not available for this model

  • (New in version v3.0) TypeError – If the underlying project type is multilabel

  • ValueError – If data_slice_filter passed as None

get_rulesets()

List the rulesets approximating this model generated by DataRobot Prime

If this model hasn’t been approximated yet, will return an empty list. Note that these are rulesets approximating this model, not rulesets used to construct this model.

Returns:

rulesets

Return type:

list of Ruleset

get_supported_capabilities()

Retrieves a summary of the capabilities supported by a model.

Added in version v2.14.

Returns:

  • supportsBlending (bool) – whether the model supports blending

  • supportsMonotonicConstraints (bool) – whether the model supports monotonic constraints

  • hasWordCloud (bool) – whether the model has word cloud data available

  • eligibleForPrime (bool) – (Deprecated in version v3.6) whether the model is eligible for Prime

  • hasParameters (bool) – whether the model has parameters that can be retrieved

  • supportsCodeGeneration (bool) – (New in version v2.18) whether the model supports code generation

  • supportsShap (bool) – (New in version v2.18) True if the model supports the Shapley package, i.e. Shapley-based feature importance

  • supportsEarlyStopping (bool) – (New in version v2.22) True if this is an early stopping tree-based model and number of trained iterations can be retrieved.

get_uri()
Returns:

url – Permanent static hyperlink to this model on the leaderboard.

Return type:

str

get_word_cloud(exclude_stop_words=False)

Retrieve word cloud data for the model.

Parameters:

exclude_stop_words (Optional[bool]) – Set to True if you want stopwords filtered out of response.

Returns:

Word cloud data for the model.

Return type:

WordCloud

incremental_train(data_stage_id, training_data_name=None)

Submit a job to the queue to perform incremental training on an existing model. See train_incremental documentation.

Return type:

ModelJob

classmethod list(project_id, sort_by_partition='validation', sort_by_metric=None, with_metric=None, search_term=None, featurelists=None, families=None, blueprints=None, labels=None, characteristics=None, training_filters=None, number_of_clusters=None, limit=100, offset=0)

Retrieve paginated model records, sorted by scores, with optional filtering.

Parameters:
  • sort_by_partition (str, one of validation, backtesting, crossValidation or holdout) – Set the partition to use for the sorted (by score) list of models. validation is the default.

  • sort_by_metric (str) – Set the project metric to use for model sorting. DataRobot-selected project optimization metric is the default.

  • with_metric (str) – For a single-metric list of results, specify that project metric.

  • search_term (str) – If specified, only models containing the term in their name or processes are returned.

  • featurelists (List[str]) – If specified, only models trained on selected featurelists are returned.

  • families (List[str]) – If specified, only models belonging to selected families are returned.

  • blueprints (List[str]) – If specified, only models trained on specified blueprint IDs are returned.

  • labels (List[str], starred or prepared for deployment) – If specified, only models tagged with all listed labels are returned.

  • characteristics (List[str]) – If specified, only models matching all listed characteristics are returned.

  • training_filters (List[str]) – If specified, only models matching at least one of the listed training conditions are returned. The following formats are supported.

    For autoML and datetime partitioned projects:

    • number of rows in training subset

    For datetime partitioned projects:

    • <training duration>, example P6Y0M0D

    • <training_duration>-<time_window_sample_percent>-<sampling_method>, example P6Y0M0D-78-Random (returns models trained on 6 years of data, sampling rate 78%, random sampling)

    • Start/end date

    • Project settings

  • number_of_clusters (list of int) – Filter models by number of clusters. Applicable only in unsupervised clustering projects.

  • limit (int)

  • offset (int)

Returns:

generic_models

Return type:

list of GenericModel
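
A minimal listing sketch (the metric name is a placeholder; attribute names on the returned records follow the GenericModel/Model fields documented above):

import datarobot as dr

# List the ten best starred models, sorted by AUC on the validation partition
models = dr.Model.list(
    'project-id',
    sort_by_partition='validation',
    sort_by_metric='AUC',
    labels=['starred'],
    limit=10,
)
for m in models:
    print(m.id, m.model_type)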

open_in_browser()

Opens the class’ relevant web browser location. If the default browser is not available, the URL is logged.

Note: If text-mode browsers are used, the calling process will block until the user exits the browser.

Return type:

None

request_cross_class_accuracy_scores()

Request Cross Class Accuracy scores to be computed for the model.

Returns:

status_id – A statusId of computation request.

Return type:

str

request_data_disparity_insights(feature, compared_class_names)

Request data disparity insights to be computed for the model.

Parameters:
  • feature (str) – Bias and Fairness protected feature name.

  • compared_class_names (list(str)) – List of two classes to compare

Returns:

status_id – A statusId of computation request.

Return type:

str

request_external_test(dataset_id, actual_value_column=None)

Request external test to compute scores and insights on an external test dataset

Parameters:
  • dataset_id (string) – The dataset to make predictions against (as uploaded from Project.upload_dataset)

  • actual_value_column (string, optional) – (New in version v2.21) For time series unsupervised projects only. Actual value column can be used to calculate the classification metrics and insights on the prediction dataset. Can’t be provided with the forecast_point parameter.

Returns:

job – a Job representing external dataset insights computation

Return type:

Job

request_fairness_insights(fairness_metrics_set=None)

Request fairness insights to be computed for the model.

Parameters:

fairness_metrics_set (Optional[str]) – Can be one of <datarobot.enums.FairnessMetricsSet>. The fairness metric used to calculate the fairness scores.

Returns:

status_id – A statusId of computation request.

Return type:

str

request_feature_effect(row_count=None, data_slice_id=None)

Submit request to compute Feature Effects for the model.

See get_feature_effect for more information on the result of the job.

Parameters:
  • row_count (int) – (New in version v2.21) The sample size to use for Feature Impact computation. Minimum is 10 rows. Maximum is 100000 rows or the training sample size of the model, whichever is less.

  • data_slice_id (Optional[str]) – ID for the data slice used in the request. If None, request unsliced insight data.

Returns:

job – A Job representing the feature effect computation. To get the completed feature effect data, use job.get_result or job.get_result_when_complete.

Return type:

Job

Raises:

JobAlreadyRequested – If the feature effects have already been requested.

request_feature_effects_multiclass(row_count=None, top_n_features=None, features=None)

Request Feature Effects computation for the multiclass model.

See get_feature_effect for more information on the result of the job.

Parameters:
  • row_count (int) – The number of rows from dataset to use for Feature Impact calculation.

  • top_n_features (int or None) – Number of top features (ranked by feature impact) used to calculate Feature Effects.

  • features (list or None) – The list of features used to calculate Feature Effects.

Returns:

job – A Job representing Feature Effect computation. To get the completed Feature Effect data, use job.get_result or job.get_result_when_complete.

Return type:

Job

request_feature_impact(row_count=None, with_metadata=False, data_slice_id=None)

Request feature impacts to be computed for the model.

See get_feature_impact for more information on the result of the job.

Parameters:
  • row_count (Optional[int]) – The sample size (specified in rows) to use for Feature Impact computation. This is not supported for unsupervised, multiclass (which has a separate method), and time series projects.

  • with_metadata (Optional[bool]) – Flag indicating whether the result should include the metadata. If true, metadata is included.

  • data_slice_id (Optional[str]) – ID for the data slice used in the request. If None, request unsliced insight data.

Returns:

job – Job representing the Feature Impact computation. To retrieve the completed Feature Impact data, use job.get_result or job.get_result_when_complete.

Return type:

Job or status_id

Raises:

JobAlreadyRequested – If the feature impacts have already been requested.
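
A brief sketch of requesting Feature Impact and reading the result; the IDs are placeholders. With with_metadata=True the completed result is a dict whose featureImpacts key holds the per-feature entries.

import datarobot as dr

model = dr.Model.get('project-id', 'model-id')

job = model.request_feature_impact(with_metadata=True)
feature_impact = job.get_result_when_complete()

for item in feature_impact['featureImpacts']:
    print(item['featureName'], item['impactNormalized'])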

request_lift_chart(source, data_slice_id=None)

Request the model Lift Chart for the specified source.

Parameters:
  • source (str) – Lift chart data source. Check datarobot.enums.CHART_DATA_SOURCE for possible values.

  • data_slice_id (string, optional) – ID for the data slice used in the request. If None, request unsliced insight data.

Returns:

status_check_job – Object contains all needed logic for a periodical status check of an async job.

Return type:

StatusCheckJob

request_per_class_fairness_insights(fairness_metrics_set=None)

Request per-class fairness insights to be computed for the model.

Parameters:

fairness_metrics_set (Optional[str]) – The fairness metric used to calculate the fairness scores. Value can be any one of <datarobot.enums.FairnessMetricsSet>.

Returns:

status_check_job – The returned object contains all needed logic for a periodical status check of an async job.

Return type:

StatusCheckJob

request_predictions(dataset_id=None, dataset=None, dataframe=None, file_path=None, file=None, include_prediction_intervals=None, prediction_intervals_size=None, forecast_point=None, predictions_start_date=None, predictions_end_date=None, actual_value_column=None, explanation_algorithm=None, max_explanations=None, max_ngram_explanations=None)

Requests predictions against a previously uploaded dataset.

Parameters:
  • dataset_id (string, optional) – The ID of the dataset to make predictions against (as uploaded from Project.upload_dataset)

  • dataset (Dataset, optional) – The dataset to make predictions against (as uploaded from Project.upload_dataset)

  • dataframe (pd.DataFrame, optional) – (New in v3.0) The dataframe to make predictions against

  • file_path (Optional[str]) – (New in v3.0) Path to file to make predictions against

  • file (IOBase, optional) – (New in v3.0) File to make predictions against

  • include_prediction_intervals (Optional[bool]) – (New in v2.16) For time series projects only. Specifies whether prediction intervals should be calculated for this request. Defaults to True if prediction_intervals_size is specified, otherwise defaults to False.

  • prediction_intervals_size (Optional[int]) – (New in v2.16) For time series projects only. Represents the percentile to use for the size of the prediction intervals. Defaults to 80 if include_prediction_intervals is True. Prediction intervals size must be between 1 and 100 (inclusive).

  • forecast_point (datetime.datetime or None, optional) – (New in version v2.20) For time series projects only. This is the default point relative to which predictions will be generated, based on the forecast window of the project. See the time series prediction documentation for more information.

  • predictions_start_date (datetime.datetime or None, optional) – (New in version v2.20) For time series projects only. The start date for bulk predictions. Note that this parameter is for generating historical predictions using the training data. This parameter should be provided in conjunction with predictions_end_date. Can’t be provided with the forecast_point parameter.

  • predictions_end_date (datetime.datetime or None, optional) – (New in version v2.20) For time series projects only. The end date for bulk predictions, exclusive. Note that this parameter is for generating historical predictions using the training data. This parameter should be provided in conjunction with predictions_start_date. Can’t be provided with the forecast_point parameter.

  • actual_value_column (string, optional) – (New in version v2.21) For time series unsupervised projects only. Actual value column can be used to calculate the classification metrics and insights on the prediction dataset. Can’t be provided with the forecast_point parameter.

  • explanation_algorithm (str, optional) – (New in version v2.21) If set to 'shap', the response will include prediction explanations based on the SHAP explainer (SHapley Additive exPlanations). Defaults to None (no prediction explanations).

  • max_explanations (int, optional) – (New in version v2.21) Specifies the maximum number of explanation values that should be returned for each row, ordered by absolute value, greatest to least. If None, no limit. In the case of 'shap': if the number of features is greater than the limit, the sum of the remaining values will also be returned as shapRemainingTotal. Defaults to None. Cannot be set if explanation_algorithm is omitted.

  • max_ngram_explanations (int or str, optional) – (New in version v2.29) Specifies the maximum number of text explanation values that should be returned. If set to all, text explanations will be computed and all ngram explanations will be returned. If set to a non-zero positive integer, text explanations will be computed and that many ngram explanations, sorted in descending order, will be returned. By default, text explanations are not computed.

Returns:

job – The job computing the predictions

Return type:

PredictJob
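
An illustrative sketch of scoring a previously uploaded dataset with this model; the IDs and file path are placeholders. The completed PredictJob result is returned as a pandas DataFrame.

import datarobot as dr

project = dr.Project.get('project-id')
model = dr.Model.get('project-id', 'model-id')

# Upload the data to score, then request predictions from this model
prediction_dataset = project.upload_dataset('./to_score.csv')
predict_job = model.request_predictions(dataset_id=prediction_dataset.id)
predictions = predict_job.get_result_when_complete()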

request_residuals_chart(source, data_slice_id=None)

Request the model residuals chart for the specified source.

Parameters:
  • source (str) – Residuals chart data source. Check datarobot.enums.CHART_DATA_SOURCE for possible values.

  • data_slice_id (string, optional) – ID for the data slice used in the request. If None, request unsliced insight data.

Returns:

status_check_job – Object contains all needed logic for a periodical status check of an async job.

Return type:

StatusCheckJob

request_roc_curve(source, data_slice_id=None)

Request the model Roc Curve for the specified source.

Parameters:
  • source (str) – Roc Curve data source. Check datarobot.enums.CHART_DATA_SOURCE for possible values.

  • data_slice_id (string, optional) – ID for the data slice used in the request. If None, request unsliced insight data.

Returns:

status_check_job – Object contains all needed logic for a periodical status check of an async job.

Return type:

StatusCheckJob

request_training_predictions(data_subset, explanation_algorithm=None, max_explanations=None)

Start a job to build training predictions

Parameters:
  • data_subset (str) – Data set definition to build predictions on. Choices are:

    • dr.enums.DATA_SUBSET.ALL or string all for all data available. Not valid for models in datetime partitioned projects.

    • dr.enums.DATA_SUBSET.VALIDATION_AND_HOLDOUT or string validationAndHoldout for all data except the training set. Not valid for models in datetime partitioned projects.

    • dr.enums.DATA_SUBSET.HOLDOUT or string holdout for the holdout data set only.

    • dr.enums.DATA_SUBSET.ALL_BACKTESTS or string allBacktests for downloading the predictions for all backtest validation folds. Requires the model to have successfully scored all backtests. Datetime partitioned projects only.

  • explanation_algorithm (dr.enums.EXPLANATIONS_ALGORITHM) – (New in v2.21) Optional. If set to dr.enums.EXPLANATIONS_ALGORITHM.SHAP, the response will include prediction explanations based on the SHAP explainer (SHapley Additive exPlanations). Defaults to None (no prediction explanations).

  • max_explanations (int) – (New in v2.21) Optional. Specifies the maximum number of explanation values that should be returned for each row, ordered by absolute value, greatest to least. In the case of dr.enums.EXPLANATIONS_ALGORITHM.SHAP: If not set, explanations are returned for all features. If the number of features is greater than the max_explanations, the sum of remaining values will also be returned as shap_remaining_total. Max 100. Defaults to null for datasets narrower than 100 columns, defaults to 100 for datasets wider than 100 columns. Is ignored if explanation_algorithm is not set.

Returns:

an instance of created async job

Return type:

Job
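
A minimal sketch of computing holdout training predictions and reading them back as a DataFrame; the IDs are placeholders.

import datarobot as dr

model = dr.Model.get('project-id', 'model-id')

job = model.request_training_predictions(dr.enums.DATA_SUBSET.HOLDOUT)
training_predictions = job.get_result_when_complete()
df = training_predictions.get_all_as_dataframe()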

retrain(sample_pct=None, featurelist_id=None, training_row_count=None, n_clusters=None)

Submit a job to the queue to retrain this model on a different sample size, row count, or featurelist.

Parameters:
  • sample_pct (Optional[float]) – The sample size in percents (1 to 100) to use in training. If this parameter is used then training_row_count should not be given.

  • featurelist_id (Optional[str]) – The featurelist id

  • training_row_count (Optional[int]) – The number of rows used to train the model. If this parameter is used, then sample_pct should not be given.

  • n_clusters (Optional[int]) – (new in version 2.27) number of clusters to use in an unsupervised clustering model. This parameter is used only for unsupervised clustering models that do not determine the number of clusters automatically.

Returns:

job – The created job that is retraining the model

Return type:

ModelJob
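
A short sketch of retraining this model's blueprint on a larger sample; the IDs and sample size are placeholders.

import datarobot as dr

model = dr.Model.get('project-id', 'model-id')

model_job = model.retrain(sample_pct=80)
retrained_model = model_job.get_result_when_complete()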

set_prediction_threshold(threshold)

Set a custom prediction threshold for the model.

May not be used once prediction_threshold_read_only is True for this model.

Parameters:

threshold (float) – Only used for binary classification projects. The threshold to use when deciding between the positive and negative classes when making predictions. Should be between 0.0 and 1.0 (inclusive).
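
A minimal sketch, assuming the threshold is still editable for this model; the IDs and threshold value are placeholders.

import datarobot as dr

model = dr.Model.get('project-id', 'model-id')

if not model.prediction_threshold_read_only:
    model.set_prediction_threshold(0.42)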

star_model()

Mark the model as starred.

Model stars propagate to the web application and the API, and can be used to filter when listing models.

Return type:

None

start_advanced_tuning_session()

Start an Advanced Tuning session. Returns an object that helps set up arguments for an Advanced Tuning model execution.

As of v2.17, all models other than blenders, open source, prime, baseline and user-created support Advanced Tuning.

Returns:

Session for setting up and running Advanced Tuning on a model

Return type:

AdvancedTuningSession
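
An illustrative sketch of an Advanced Tuning session; the task name, parameter name, and value below are hypothetical examples, not values taken from this documentation. Valid names for a given model can typically be looked up with the session's get_task_names and get_parameter_names helpers.

import datarobot as dr

model = dr.Model.get('project-id', 'model-id')

tune = model.start_advanced_tuning_session()
tune.description = 'example tuning run'

# Hypothetical task/parameter names; look up the real ones for your model first
tune.set_parameter(
    task_name='Gradient Boosted Trees Classifier',
    parameter_name='learning_rate',
    value=0.05,
)
model_job = tune.run()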

train_incremental(data_stage_id, training_data_name=None, data_stage_encoding=None, data_stage_delimiter=None, data_stage_compression=None)

Submit a job to the queue to perform incremental training on an existing model using additional data. The id of the additional data to use for training is specified with the data_stage_id. Optionally a name for the iteration can be supplied by the user to help identify the contents of data in the iteration.

This functionality requires the INCREMENTAL_LEARNING feature flag to be enabled.

Parameters:
  • data_stage_id (str) – The id of the data stage to use for training.

  • training_data_name (Optional[str]) – The name of the iteration or data stage to indicate what the incremental learning was performed on.

  • data_stage_encoding (Optional[str]) – The encoding type of the data in the data stage (default: UTF-8). Supported formats: UTF-8, ASCII, WINDOWS1252

  • data_stage_delimiter (Optional[str]) – The delimiter used by the data in the data stage (default: ‘,’).

  • data_stage_compression (Optional[str]) – The compression type of the data stage file, e.g. ‘zip’ (default: None). Supported formats: zip

Returns:

job – The created job that is retraining the model

Return type:

ModelJob
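
A brief sketch of incremental training, assuming the INCREMENTAL_LEARNING feature flag is enabled; the model and data stage IDs are placeholders.

import datarobot as dr

model = dr.Model.get('project-id', 'model-id')

job = model.train_incremental(
    'data-stage-id',
    training_data_name='march-refresh',
    data_stage_encoding='UTF-8',
)
updated_model = job.get_result_when_complete()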

unstar_model()

Unmark the model as starred.

Model stars propagate to the web application and the API, and can be used to filter when listing models.

Return type:

None

Blender models

class datarobot.models.BlenderModel

Represents blender model that combines prediction results from other models.

All durations are specified with a duration string such as those returned by the partitioning_methods.construct_duration_string helper method. Please see datetime partitioned project documentation for more information on duration strings.

Variables:
  • id (str) – the id of the model

  • project_id (str) – the id of the project the model belongs to

  • processes (List[str]) – the processes used by the model

  • featurelist_name (str) – the name of the featurelist used by the model

  • featurelist_id (str) – the id of the featurelist used by the model

  • sample_pct (float) – the percentage of the project dataset used in training the model

  • training_row_count (int or None) – the number of rows of the project dataset used in training the model. In a datetime partitioned project, if specified, defines the number of rows used to train the model and evaluate backtest scores; if unspecified, either training_duration or training_start_date and training_end_date was used to determine that instead.

  • training_duration (str or None) – only present for models in datetime partitioned projects. If specified, a duration string specifying the duration spanned by the data used to train the model and evaluate backtest scores.

  • training_start_date (datetime or None) – only present for frozen models in datetime partitioned projects. If specified, the start date of the data used to train the model.

  • training_end_date (datetime or None) – only present for frozen models in datetime partitioned projects. If specified, the end date of the data used to train the model.

  • model_type (str) – what model this is, e.g. ‘AVG Blender’

  • model_category (str) – what kind of model this is - always ‘blend’ for blender models

  • is_frozen (bool) – whether this model is a frozen model

  • blueprint_id (str) – the id of the blueprint used in this model

  • metrics (dict) – a mapping from each metric to the model’s scores for that metric

  • model_ids (List[str]) – List of model ids used in blender

  • blender_method (str) – Method used to blend results from underlying models

  • monotonic_increasing_featurelist_id (str) – optional, the id of the featurelist that defines the set of features with a monotonically increasing relationship to the target. If None, no such constraints are enforced.

  • monotonic_decreasing_featurelist_id (str) – optional, the id of the featurelist that defines the set of features with a monotonically decreasing relationship to the target. If None, no such constraints are enforced.

  • supports_monotonic_constraints (bool) – optional, whether this model supports enforcing monotonic constraints

  • is_starred (bool) – whether this model is marked as starred

  • prediction_threshold (float) – for binary classification projects, the threshold used for predictions

  • prediction_threshold_read_only (bool) – indicates whether modification of the prediction threshold is forbidden. Threshold modification is forbidden once a model has had a deployment created or predictions made via the dedicated prediction API.

  • model_number (integer) – model number assigned to a model

  • parent_model_id (str or None) – (New in version v2.20) the id of the model that tuning parameters are derived from

  • supports_composable_ml (bool or None) – (New in version v2.26) whether this model is supported in Composable ML.

__init__(id=None, processes=None, featurelist_name=None, featurelist_id=None, project_id=None, sample_pct=None, model_type=None, model_category=None, is_frozen=None, blueprint_id=None, metrics=None, model_ids=None, blender_method=None, monotonic_increasing_featurelist_id=None, monotonic_decreasing_featurelist_id=None, supports_monotonic_constraints=None, is_starred=None, prediction_threshold=None, prediction_threshold_read_only=None, model_number=None, parent_model_id=None, supports_composable_ml=None, training_row_count=None, training_duration=None, training_start_date=None, training_end_date=None, data_selection_method=None, time_window_sample_pct=None, sampling_method=None, model_family_full_name=None, is_trained_into_validation=None, is_trained_into_holdout=None)
classmethod get(project_id, model_id)

Retrieve a specific blender.

Parameters:
  • project_id (str) – The project’s id.

  • model_id (str) – The model_id of the leaderboard item to retrieve.

Returns:

model – The queried instance.

Return type:

BlenderModel

advanced_tune(params, description=None)

Generate a new model with the specified advanced-tuning parameters

As of v2.17, all models other than blenders, open source, prime, baseline and user-created support Advanced Tuning.

Parameters:
  • params (dict) – Mapping of parameter ID to parameter value. The list of valid parameter IDs for a model can be found by calling get_advanced_tuning_parameters(). This endpoint does not need to include values for all parameters. If a parameter is omitted, its current_value will be used.

  • description (str) – Human-readable string describing the newly advanced-tuned model

Returns:

The created job to build the model

Return type:

ModelJob

cross_validate()

Run cross validation on the model.

Note

To perform Cross Validation on a new model with new parameters, use train instead.

Returns:

The created job to build the model

Return type:

ModelJob

delete()

Delete a model from the project’s leaderboard.

Return type:

None

download_scoring_code(file_name, source_code=False)

Download the Scoring Code JAR.

Parameters:
  • file_name (str) – File path where scoring code will be saved.

  • source_code (Optional[bool]) – Set to True to download source code archive. It will not be executable.

Return type:

None

download_training_artifact(file_name)

Retrieve trained artifact(s) from a model containing one or more custom tasks.

Artifact(s) will be downloaded to the specified local filepath.

Parameters:

file_name (str) – File path where trained model artifact(s) will be saved.

classmethod from_data(data)

Instantiate an object of this class using a dict.

Parameters:

data (dict) – Correctly snake_cased keys and their values.

Return type:

TypeVar(T, bound= APIObject)

get_advanced_tuning_parameters()

Get the advanced-tuning parameters available for this model.

As of v2.17, all models other than blenders, open source, prime, baseline and user-created support Advanced Tuning.

Returns:

A dictionary describing the advanced-tuning parameters for the current model. There are two top-level keys, tuning_description and tuning_parameters.

tuning_description is an optional value. If not None, it is the user-specified description of this set of tuning parameters.

tuning_parameters is a list of dicts, each of which has the following keys

  • parameter_name : (str) name of the parameter (unique per task, see below)

  • parameter_id : (str) opaque ID string uniquely identifying parameter

  • default_value : (*) the actual value used to train the model; either the single value of the parameter specified before training, or the best value from the list of grid-searched values (based on current_value)

  • current_value : (*) the single value or list of values of the parameter that were grid searched. Depending on the grid search specification, could be a single fixed value (no grid search), a list of discrete values, or a range.

  • task_name : (str) name of the task that this parameter belongs to

  • constraints: (dict) see the notes below

  • vertex_id: (str) ID of vertex that this parameter belongs to

Return type:

dict

Notes

The type of default_value and current_value is defined by the constraints structure. It will be a string or numeric Python type.

constraints is a dict with at least one, possibly more, of the following keys. The presence of a key indicates that the parameter may take on the specified type. (If a key is absent, this means that the parameter may not take on the specified type.) If a key on constraints is present, its value will be a dict containing all of the fields described below for that key.

"constraints": {
    "select": {
        "values": [<list(basestring or number) : possible values>]
    },
    "ascii": {},
    "unicode": {},
    "int": {
        "min": <int : minimum valid value>,
        "max": <int : maximum valid value>,
        "supports_grid_search": <bool : True if Grid Search may be
                                        requested for this param>
    },
    "float": {
        "min": <float : minimum valid value>,
        "max": <float : maximum valid value>,
        "supports_grid_search": <bool : True if Grid Search may be
                                        requested for this param>
    },
    "intList": {
        "min_length": <int : minimum valid length>,
        "max_length": <int : maximum valid length>
        "min_val": <int : minimum valid value>,
        "max_val": <int : maximum valid value>
        "supports_grid_search": <bool : True if Grid Search may be
                                        requested for this param>
    },
    "floatList": {
        "min_length": <int : minimum valid length>,
        "max_length": <int : maximum valid length>
        "min_val": <float : minimum valid value>,
        "max_val": <float : maximum valid value>
        "supports_grid_search": <bool : True if Grid Search may be
                                        requested for this param>
    }
}

The keys have meaning as follows:

  • select: Rather than specifying a specific data type, if present, it indicates that the parameter is permitted to take on any of the specified values. Listed values may be of any string or real (non-complex) numeric type.

  • ascii: The parameter may be a unicode object that encodes simple ASCII characters. (A-Z, a-z, 0-9, whitespace, and certain common symbols.) In addition to listed constraints, ASCII keys currently may not contain either newlines or semicolons.

  • unicode: The parameter may be any Python unicode object.

  • int: The value may be an object of type int within the specified range (inclusive). Please note that the value will be passed around using the JSON format, and some JSON parsers have undefined behavior with integers outside of the range [-(2**53)+1, (2**53)-1].

  • float: The value may be an object of type float within the specified range (inclusive).

  • intList, floatList: The value may be a list of int or float objects, respectively, following constraints as specified respectively by the int and float types (above).

Many parameters only specify one key under constraints. If a parameter specifies multiple keys, the parameter may take on any value permitted by any key.

get_all_confusion_charts(fallback_to_parent_insights=False)

Retrieve a list of all confusion matrices available for the model.

Parameters:

fallback_to_parent_insights (bool) – (New in version v2.14) Optional, if True, this will return confusion chart data for this model’s parent for any source that is not available for this model and if this model has a defined parent model. If omitted or False, or this model has no parent, this will not attempt to retrieve any data from this model’s parent.

Returns:

Data for all available confusion charts for model.

Return type:

list of ConfusionChart

get_all_feature_impacts(data_slice_filter=None)

Retrieve a list of all feature impact results available for the model.

Parameters:

data_slice_filter (DataSlice, optional) – A dataslice used to filter the return values based on the dataslice.id. By default, this function will use data_slice_filter.id == None which returns an unsliced insight. If data_slice_filter is None then no data_slice filtering will be applied when requesting the feature impacts.

Returns:

Data for all available model feature impacts. Or an empty list if no data is found.

Return type:

list of dicts

Examples

model = datarobot.Model(id='model-id', project_id='project-id')

# Get feature impact insights for sliced data
data_slice = datarobot.DataSlice(id='data-slice-id')
sliced_fi = model.get_all_feature_impacts(data_slice_filter=data_slice)

# Get feature impact insights for unsliced data
data_slice = datarobot.DataSlice()
unsliced_fi = model.get_all_feature_impacts(data_slice_filter=data_slice)

# Get all feature impact insights
all_fi = model.get_all_feature_impacts()
get_all_lift_charts(fallback_to_parent_insights=False, data_slice_filter=None)

Retrieve a list of all Lift charts available for the model.

Parameters:
  • fallback_to_parent_insights (Optional[bool]) – (New in version v2.14) Optional, if True, this will return lift chart data for this model’s parent for any source that is not available for this model and if this model has a defined parent model. If omitted or False, or this model has no parent, this will not attempt to retrieve any data from this model’s parent.

  • data_slice_filter (DataSlice, optional) – Filters the returned lift chart by data_slice_filter.id. If None (the default) applies no filter based on data_slice_id.

Returns:

Data for all available model lift charts. Or an empty list if no data found.

Return type:

list of LiftChart

Examples

model = datarobot.Model.get('project-id', 'model-id')

# Get lift chart insights for sliced data
sliced_lift_charts = model.get_all_lift_charts(data_slice_filter=datarobot.DataSlice(id='data-slice-id'))

# Get lift chart insights for unsliced data
unsliced_lift_charts = model.get_all_lift_charts(data_slice_filter=datarobot.DataSlice(id=None))

# Get all lift chart insights
all_lift_charts = model.get_all_lift_charts()
get_all_multiclass_lift_charts(fallback_to_parent_insights=False, data_slice_filter=<datarobot.models.model.Sentinel object>, target_class=None)

Retrieve a list of all Lift charts available for the model.

Parameters:
  • fallback_to_parent_insights (bool) – (New in version v2.14) Optional, if True, this will return lift chart data for this model’s parent for any source that is not available for this model and if this model has a defined parent model. If omitted or False, or this model has no parent, this will not attempt to retrieve any data from this model’s parent.

  • data_slice_filter (DataSlice, optional) – A dataslice used to filter the return values based on the dataslice.id. By default this function will use data_slice_filter.id == None which returns an unsliced insight. If data_slice_filter is None then get_lift_chart will raise a ValueError.

  • target_class (str, optional) – Lift chart target class name.

Returns:

Data for all available model lift charts.

Return type:

list of LiftChart

get_all_residuals_charts(fallback_to_parent_insights=False, data_slice_filter=None)

Retrieve a list of all residuals charts available for the model.

Parameters:
  • fallback_to_parent_insights (bool) – Optional, if True, this will return residuals chart data for this model’s parent for any source that is not available for this model and if this model has a defined parent model. If omitted or False, or this model has no parent, this will not attempt to retrieve any data from this model’s parent.

  • data_slice_filter (DataSlice, optional) – Filters the returned residuals charts by data_slice_filter.id. If None (the default) applies no filter based on data_slice_id.

Returns:

Data for all available model residuals charts.

Return type:

list of ResidualsChart

Examples

model = datarobot.Model.get('project-id', 'model-id')

# Get residuals chart insights for sliced data
sliced_residuals_charts = model.get_all_residuals_charts(data_slice_filter=datarobot.DataSlice(id='data-slice-id'))

# Get residuals chart insights for unsliced data
unsliced_residuals_charts = model.get_all_residuals_charts(data_slice_filter=datarobot.DataSlice(id=None))

# Get all residuals chart insights
all_residuals_charts = model.get_all_residuals_charts()
get_all_roc_curves(fallback_to_parent_insights=False, data_slice_filter=None)

Retrieve a list of all ROC curves available for the model.

Parameters:
  • fallback_to_parent_insights (bool) – (New in version v2.14) Optional, if True, this will return ROC curve data for this model’s parent for any source that is not available for this model and if this model has a defined parent model. If omitted or False, or this model has no parent, this will not attempt to retrieve any data from this model’s parent.

  • data_slice_filter (DataSlice, optional) – filters the returned roc_curve by data_slice_filter.id. If None (the default) applies no filter based on data_slice_id.

Returns:

Data for all available model ROC curves. Or an empty list if no RocCurves are found.

Return type:

list of RocCurve

Examples

model = datarobot.Model.get('project-id', 'model-id')
ds_filter = datarobot.DataSlice(id='data-slice-id')

# Get roc curve insights for sliced data
sliced_roc = model.get_all_roc_curves(data_slice_filter=ds_filter)

# Get roc curve insights for unsliced data
ds_filter = datarobot.DataSlice(id=None)
unsliced_roc = model.get_all_roc_curves(data_slice_filter=ds_filter)

# Get all roc curve insights
all_roc_curves = model.get_all_roc_curves()
get_confusion_chart(source, fallback_to_parent_insights=False)

Retrieve a multiclass model’s confusion matrix for the specified source.

Parameters:
  • source (str) – Confusion chart source. Check datarobot.enums.CHART_DATA_SOURCE for possible values.

  • fallback_to_parent_insights (bool) – (New in version v2.14) Optional, if True, this will return confusion chart data for this model’s parent if the confusion chart is not available for this model and the model has a defined parent model. If omitted or False, or there is no parent model, will not attempt to return insight data from this model’s parent.

Returns:

Model ConfusionChart data

Return type:

ConfusionChart

Raises:

ClientError – If the insight is not available for this model

get_cross_class_accuracy_scores()

Retrieves a list of Cross Class Accuracy scores for the model.

Return type:

json

get_cross_validation_scores(partition=None, metric=None)

Return a dictionary, keyed by metric, showing cross validation scores per partition.

Cross Validation should already have been performed using cross_validate or train.

Note

Models that computed cross validation before this feature was added will need to be deleted and retrained before this method can be used.

Parameters:
  • partition (float) – Optional. The ID of the partition (e.g. 1, 2, 3.0, 4.0) to filter results by; can be a positive whole number or float value. 0 corresponds to the validation partition.

  • metric (unicode) – Optional. Name of the metric by which to filter the resulting cross validation scores.

Returns:

cross_validation_scores – A dictionary keyed by metric showing cross validation scores per partition.

Return type:

dict
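
A short sketch: run cross validation if it has not been computed yet, then read the per-partition scores for one metric. The IDs and metric name are placeholders.

import datarobot as dr

model = dr.Model.get('project-id', 'model-id')

cv_job = model.cross_validate()
cv_job.wait_for_completion()

scores = model.get_cross_validation_scores(metric='AUC')
print(scores)  # dictionary keyed by metric, with scores per partition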

get_data_disparity_insights(feature, class_name1, class_name2)

Retrieve a list of Cross Class Data Disparity insights for the model.

Parameters:
  • feature (str) – Bias and Fairness protected feature name.

  • class_name1 (str) – One of the compared classes

  • class_name2 (str) – Another compared class

Return type:

json

get_fairness_insights(fairness_metrics_set=None, offset=0, limit=100)

Retrieve a list of Per Class Bias insights for the model.

Parameters:
  • fairness_metrics_set (Optional[str]) – Can be one of <datarobot.enums.FairnessMetricsSet>. The fairness metric used to calculate the fairness scores.

  • offset (Optional[int]) – Number of items to skip.

  • limit (Optional[int]) – Number of items to return.

Return type:

json

get_feature_effect(source, data_slice_id=None)

Retrieve Feature Effects for the model.

Feature Effects provides partial dependence and predicted vs actual values for top-500 features ordered by feature impact score.

The partial dependence shows marginal effect of a feature on the target variable after accounting for the average effects of all other predictive features. It indicates how, holding all other variables except the feature of interest as they were, the value of this feature affects your prediction.

Requires that Feature Effects has already been computed with request_feature_effect.

See get_feature_effect_metadata for information on the available sources.

Parameters:
  • source (string) – The source Feature Effects are retrieved for.

  • data_slice_id (string, optional) – ID for the data slice used in the request. If None, retrieve unsliced insight data.

Returns:

feature_effects – The feature effects data.

Return type:

FeatureEffects

Raises:

ClientError – If the feature effects have not been computed or source is not a valid value.

get_feature_effect_metadata()

Retrieve Feature Effects metadata. Response contains status and available model sources.

  • Feature Effect for the training partition is always available, with the exception of older projects that only supported Feature Effect for validation.

  • When a model is trained into validation or holdout without stacked predictions (i.e., no out-of-sample predictions in those partitions), Feature Effects is not available for validation or holdout.

  • Feature Effects for holdout is not available when holdout was not unlocked for the project.

Use source to retrieve Feature Effects, selecting one of the provided sources.

Returns:

feature_effect_metadata

Return type:

FeatureEffectMetadata
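
A minimal sketch of checking which sources are available before retrieving Feature Effects; the IDs are placeholders, and reading the available sources from a sources attribute on the metadata object is an assumption of this example.

import datarobot as dr

model = dr.Model.get('project-id', 'model-id')

fe_metadata = model.get_feature_effect_metadata()
source = fe_metadata.sources[0]  # assumes the metadata exposes available sources
feature_effects = model.get_feature_effect(source)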

get_feature_effects_multiclass(source='training', class_=None)

Retrieve Feature Effects for the multiclass model.

Feature Effects provide partial dependence and predicted vs actual values for top-500 features ordered by feature impact score.

The partial dependence shows marginal effect of a feature on the target variable after accounting for the average effects of all other predictive features. It indicates how, holding all other variables except the feature of interest as they were, the value of this feature affects your prediction.

Requires that Feature Effects has already been computed with request_feature_effect.

See get_feature_effect_metadata for information on the available sources.

Parameters:
  • source (str) – The source Feature Effects are retrieved for.

  • class (str or None) – The class name Feature Effects are retrieved for.

Returns:

The list of multiclass feature effects.

Return type:

list

Raises:

ClientError – If Feature Effects have not been computed or source is not a valid value.

get_feature_impact(with_metadata=False, data_slice_filter=<datarobot.models.model.Sentinel object>)

Retrieve the computed Feature Impact results, a measure of the relevance of each feature in the model.

Feature Impact is computed for each column by creating new data with that column randomly permuted (but the others left unchanged), and seeing how the error metric score for the predictions is affected. The ‘impactUnnormalized’ is how much worse the error metric score is when making predictions on this modified data. The ‘impactNormalized’ is normalized so that the largest value is 1. In both cases, larger values indicate more important features.

If a feature is a redundant feature, i.e. once other features are considered it doesn’t contribute much in addition, the ‘redundantWith’ value is the name of the feature that has the highest correlation with this feature. Note that redundancy detection is only available for jobs run after the addition of this feature. When retrieving data that predates this functionality, a NoRedundancyImpactAvailable warning will be issued.

Elsewhere this technique is sometimes called ‘Permutation Importance’.

Requires that Feature Impact has already been computed with request_feature_impact.

Parameters:
  • with_metadata (bool) – The flag indicating if the result should include the metadata as well.

  • data_slice_filter (DataSlice, optional) – A dataslice used to filter the return values based on the dataslice.id. By default, this function will use data_slice_filter.id == None which returns an unsliced insight. If data_slice_filter is None then get_feature_impact will raise a ValueError.

Returns:

The feature impact data response depends on the with_metadata parameter. The response is either a dict with metadata and a list with actual data or just a list with that data.

Each list item is a dict with the keys featureName, impactNormalized, impactUnnormalized, redundantWith, and count.

For a dict response, the available keys are:

  • featureImpacts - Feature Impact data as a list. Each item is a dict with the keys featureName, impactNormalized, impactUnnormalized, and redundantWith.

  • shapBased - A boolean that indicates whether Feature Impact was calculated using Shapley values.

  • ranRedundancyDetection - A boolean that indicates whether redundant feature identification was run while calculating this Feature Impact.

  • rowCount - An integer or None that indicates the number of rows used to calculate Feature Impact. For Feature Impact calculated with the default logic, without specifying the rowCount, None is returned here.

  • count - An integer with the number of features under featureImpacts.

Return type:

list or dict

Raises:
  • ClientError – If the feature impacts have not been computed.

  • ValueError – If data_slice_filter is passed as None

get_features_used()

Query the server to determine which features were used.

Note that the data returned by this method is possibly different than the names of the features in the featurelist used by this model. This method will return the raw features that must be supplied in order for predictions to be generated on a new set of data. The featurelist, in contrast, would also include the names of derived features.

Returns:

features – The names of the features used in the model.

Return type:

List[str]

get_frozen_child_models()

Retrieve the IDs for all models that are frozen from this model.

Return type:

A list of Models

get_labelwise_roc_curves(source, fallback_to_parent_insights=False)

Retrieve a list of LabelwiseRocCurve instances for a multilabel model for the given source and all labels. This method is valid only for multilabel projects. For binary projects, use the Model.get_roc_curve API.

Added in version v2.24.

Parameters:
  • source (str) – ROC curve data source. Check datarobot.enums.CHART_DATA_SOURCE for possible values.

  • fallback_to_parent_insights (bool) – Optional, if True, this will return ROC curve data for this model’s parent if the ROC curve is not available for this model and the model has a defined parent model. If omitted or False, or there is no parent model, will not attempt to return data from this model’s parent.

Returns:

Labelwise ROC Curve instances for source and all labels

Return type:

list of LabelwiseRocCurve

Raises:

ClientError – If the insight is not available for this model

get_lift_chart(source, fallback_to_parent_insights=False, data_slice_filter=<datarobot.models.model.Sentinel object>)

Retrieve the model Lift chart for the specified source.

Parameters:
  • source (str) – Lift chart data source. Check datarobot.enums.CHART_DATA_SOURCE for possible values. (New in version v2.23) For time series and OTV models, also accepts values backtest_2, backtest_3, …, up to the number of backtests in the model.

  • fallback_to_parent_insights (bool) – (New in version v2.14) Optional, if True, this will return lift chart data for this model’s parent if the lift chart is not available for this model and the model has a defined parent model. If omitted or False, or there is no parent model, will not attempt to return insight data from this model’s parent.

  • data_slice_filter (DataSlice, optional) – A dataslice used to filter the return values based on the dataslice.id. By default this function will use data_slice_filter.id == None which returns an unsliced insight. If data_slice_filter is None then get_lift_chart will raise a ValueError.

Returns:

Model lift chart data

Return type:

LiftChart

Raises:
  • ClientError – If the insight is not available for this model

  • ValueError – If data_slice_filter is passed as None

get_missing_report_info()

Retrieve a report on missing training data that can be used to understand missing values treatment in the model. The report consists of missing value resolutions for the numeric or categorical features that were part of building the model.

Returns:

The queried model missing report, sorted by missing count (DESCENDING order).

Return type:

An iterable of MissingReportPerFeature

get_model_blueprint_chart()

Retrieve a diagram that can be used to understand data flow in the blueprint.

Returns:

The queried model blueprint chart.

Return type:

ModelBlueprintChart

get_model_blueprint_documents()

Get documentation for tasks used in this model.

Returns:

All documents available for the model.

Return type:

list of BlueprintTaskDocument

get_model_blueprint_json()

Get the blueprint json representation used by this model.

Returns:

Json representation of the blueprint stages.

Return type:

BlueprintJson

get_multiclass_feature_impact()

For multiclass it’s possible to calculate feature impact separately for each target class. The method for calculation is exactly the same, calculated in one-vs-all style for each target class.

Requires that Feature Impact has already been computed with request_feature_impact.

Returns:

feature_impacts – The feature impact data. Each item is a dict with the keys ‘featureImpacts’ (list), ‘class’ (str). Each item in ‘featureImpacts’ is a dict with the keys ‘featureName’, ‘impactNormalized’, and ‘impactUnnormalized’, and ‘redundantWith’.

Return type:

list of dict

Raises:

ClientError – If the multiclass feature impacts have not been computed.

get_multiclass_lift_chart(source, fallback_to_parent_insights=False, data_slice_filter=<datarobot.models.model.Sentinel object>, target_class=None)

Retrieve model Lift chart for the specified source.

Parameters:
  • source (str) – Lift chart data source. Check datarobot.enums.CHART_DATA_SOURCE for possible values.

  • fallback_to_parent_insights (bool) – Optional, if True, this will return lift chart data for this model’s parent if the lift chart is not available for this model and the model has a defined parent model. If omitted or False, or there is no parent model, will not attempt to return insight data from this model’s parent.

  • data_slice_filter (DataSlice, optional) – A dataslice used to filter the return values based on the dataslice.id. By default this function will use data_slice_filter.id == None which returns an unsliced insight. If data_slice_filter is None then get_lift_chart will raise a ValueError.

  • target_class (str, optional) – Lift chart target class name.

Returns:

Model lift chart data for each saved target class

Return type:

list of LiftChart

Raises:

ClientError – If the insight is not available for this model

get_multilabel_lift_charts(source, fallback_to_parent_insights=False)

Retrieve model Lift charts for the specified source.

Added in version v2.24.

Parameters:
  • source (str) – Lift chart data source. Check datarobot.enums.CHART_DATA_SOURCE for possible values.

  • fallback_to_parent_insights (bool) – Optional, if True, this will return lift chart data for this model’s parent if the lift chart is not available for this model and the model has a defined parent model. If omitted or False, or there is no parent model, will not attempt to return insight data from this model’s parent.

Returns:

Model lift chart data for each saved target class

Return type:

list of LiftChart

Raises:

ClientError – If the insight is not available for this model

get_num_iterations_trained()

Retrieves the number of estimators trained by early-stopping tree-based models.

Added in version v2.22.

Returns:

  • projectId (str) – id of project containing the model

  • modelId (str) – id of the model

  • data (array) – list of numEstimatorsItem objects, one for each modeling stage.

  • Each numEstimatorsItem is of the form

  • stage (str) – indicates the modeling stage (for multi-stage models); None for single-stage models

  • numIterations (int) – the number of estimators or iterations trained by the model

get_or_request_feature_effect(source, max_wait=600, row_count=None, data_slice_id=None)

Retrieve Feature Effects for the model, requesting a new job if it hasn’t been run previously.

See get_feature_effect_metadata for retrieving information of source.

Parameters:
  • source (string) – The source Feature Effects are retrieved for.

  • max_wait (Optional[int]) – The maximum time to wait for a requested Feature Effect job to complete before erroring.

  • row_count (Optional[int]) – (New in version v2.21) The sample size to use for Feature Impact computation. Minimum is 10 rows. Maximum is 100000 rows or the training sample size of the model, whichever is less.

  • data_slice_id (Optional[str]) – ID for the data slice used in the request. If None, request unsliced insight data.

Returns:

feature_effects – The Feature Effects data.

Return type:

FeatureEffects

get_or_request_feature_effects_multiclass(source, top_n_features=None, features=None, row_count=None, class_=None, max_wait=600)

Retrieve Feature Effects for the multiclass model, requesting a job if it hasn’t been run previously.

Parameters:
  • source (string) – The source Feature Effects are retrieved for.

  • class (str or None) – The class name Feature Effects are retrieved for.

  • row_count (int) – The number of rows from dataset to use for Feature Impact calculation.

  • top_n_features (int or None) – Number of top features (ranked by Feature Impact) used to calculate Feature Effects.

  • features (list or None) – The list of features used to calculate Feature Effects.

  • max_wait (Optional[int]) – The maximum time to wait for a requested Feature Effects job to complete before erroring.

Returns:

feature_effects – The list of multiclass feature effects data.

Return type:

list of FeatureEffectsMulticlass

get_or_request_feature_impact(max_wait=600, **kwargs)

Retrieve feature impact for the model, requesting a job if it hasn’t been run previously

Parameters:
  • max_wait (Optional[int]) – The maximum time to wait for a requested feature impact job to complete before erroring

  • **kwargs – Arbitrary keyword arguments passed to request_feature_impact.

Returns:

feature_impacts – The feature impact data. See get_feature_impact for the exact schema.

Return type:

list or dict
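
A short sketch of the request-or-retrieve convenience call; the IDs are placeholders. Without with_metadata the result is a plain list of per-feature dicts.

import datarobot as dr

model = dr.Model.get('project-id', 'model-id')

feature_impact = model.get_or_request_feature_impact(max_wait=600)
for item in feature_impact:
    print(item['featureName'], item['impactNormalized'])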

get_parameters()

Retrieve model parameters.

Returns:

Model parameters for this model.

Return type:

ModelParameters

get_pareto_front()

Retrieve the Pareto Front for a Eureqa model.

This method is only supported for Eureqa models.

Returns:

Model ParetoFront data

Return type:

ParetoFront

get_prime_eligibility()

Check if this model can be approximated with DataRobot Prime

Returns:

prime_eligibility – a dict indicating whether a model can be approximated with DataRobot Prime (key can_make_prime) and why it may be ineligible (key message)

Return type:

dict

get_residuals_chart(source, fallback_to_parent_insights=False, data_slice_filter=<datarobot.models.model.Sentinel object>)

Retrieve model residuals chart for the specified source.

Parameters:
  • source (str) – Residuals chart data source. Check datarobot.enums.CHART_DATA_SOURCE for possible values.

  • fallback_to_parent_insights (bool) – Optional, if True, this will return residuals chart data for this model’s parent if the residuals chart is not available for this model and the model has a defined parent model. If omitted or False, or there is no parent model, will not attempt to return residuals data from this model’s parent.

  • data_slice_filter (DataSlice, optional) – A dataslice used to filter the return values based on the dataslice.id. By default this function will use data_slice_filter.id == None which returns an unsliced insight. If data_slice_filter is None then get_residuals_chart will raise a ValueError.

Returns:

Model residuals chart data

Return type:

ResidualsChart

Raises:
  • ClientError – If the insight is not available for this model

  • ValueError – If data_slice_filter is passed as None

get_roc_curve(source, fallback_to_parent_insights=False, data_slice_filter=<datarobot.models.model.Sentinel object>)

Retrieve the ROC curve for a binary model for the specified source. This method is valid only for binary projects. For multilabel projects, use Model.get_labelwise_roc_curves.

Parameters:
  • source (str) – ROC curve data source. Check datarobot.enums.CHART_DATA_SOURCE for possible values. (New in version v2.23) For time series and OTV models, also accepts values backtest_2, backtest_3, …, up to the number of backtests in the model.

  • fallback_to_parent_insights (bool) – (New in version v2.14) Optional, if True, this will return ROC curve data for this model’s parent if the ROC curve is not available for this model and the model has a defined parent model. If omitted or False, or there is no parent model, will not attempt to return data from this model’s parent.

  • data_slice_filter (DataSlice, optional) – A dataslice used to filter the return values based on the dataslice.id. By default this function will use data_slice_filter.id == None which returns an unsliced insight. If data_slice_filter is None then get_roc_curve will raise a ValueError.

Returns:

Model ROC curve data

Return type:

RocCurve

Raises:
  • ClientError – If the insight is not available for this model

  • (New in version v3.0) TypeError – If the underlying project type is multilabel

  • ValueError – If data_slice_filter is passed as None
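
An illustrative sketch of reading ROC curve points for the validation source; the IDs are placeholders, and the exact keys of each ROC point shown here are assumptions of this example.

import datarobot as dr

model = dr.Model.get('project-id', 'model-id')

roc = model.get_roc_curve(dr.enums.CHART_DATA_SOURCE.VALIDATION)
for point in roc.roc_points[:5]:
    # keys such as 'threshold', 'true_positive_rate', and 'false_positive_rate'
    # are assumed here; inspect roc.roc_points to see the full set
    print(point['threshold'], point['true_positive_rate'], point['false_positive_rate'])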

get_rulesets()

List the rulesets approximating this model generated by DataRobot Prime

If this model hasn’t been approximated yet, will return an empty list. Note that these are rulesets approximating this model, not rulesets used to construct this model.

Returns:

rulesets

Return type:

list of Ruleset

get_supported_capabilities()

Retrieves a summary of the capabilities supported by a model.

Added in version v2.14.

Returns:

  • supportsBlending (bool) – whether the model supports blending

  • supportsMonotonicConstraints (bool) – whether the model supports monotonic constraints

  • hasWordCloud (bool) – whether the model has word cloud data available

  • eligibleForPrime (bool) – (Deprecated in version v3.6) whether the model is eligible for Prime

  • hasParameters (bool) – whether the model has parameters that can be retrieved

  • supportsCodeGeneration (bool) – (New in version v2.18) whether the model supports code generation

  • supportsShap (bool) – (New in version v2.18) True if the model supports the Shapley package, i.e. Shapley-based feature importance

  • supportsEarlyStopping (bool) – (New in version v2.22) True if this is an early stopping tree-based model and number of trained iterations can be retrieved.

get_uri()
Returns:

url – Permanent static hyperlink to this model in the leaderboard.

Return type:

str

get_word_cloud(exclude_stop_words=False)

Retrieve word cloud data for the model.

Parameters:

exclude_stop_words (Optional[bool]) – Set to True if you want stopwords filtered out of response.

Returns:

Word cloud data for the model.

Return type:

WordCloud

incremental_train(data_stage_id, training_data_name=None)

Submit a job to the queue to perform incremental training on an existing model. See train_incremental documentation.

Return type:

ModelJob

classmethod list(project_id, sort_by_partition='validation', sort_by_metric=None, with_metric=None, search_term=None, featurelists=None, families=None, blueprints=None, labels=None, characteristics=None, training_filters=None, number_of_clusters=None, limit=100, offset=0)

Retrieve paginated model records, sorted by scores, with optional filtering.

Parameters:
  • sort_by_partition (str, one of validation, backtesting, crossValidation or holdout) – Set the partition to use for sorted (by score) list of models. validation is the default.

  • sort_by_metric (str) – Set the project metric to use for model sorting. DataRobot-selected project optimization metric is the default.

  • with_metric (str) – For a single-metric list of results, specify that project metric.

  • search_term (str) – If specified, only models containing the term in their name or processes are returned.

  • featurelists (List[str]) – If specified, only models trained on selected featurelists are returned.

  • families (List[str]) – If specified, only models belonging to selected families are returned.

  • blueprints (List[str]) – If specified, only models trained on specified blueprint IDs are returned.

  • labels (List[str], starred or prepared for deployment) – If specified, only models tagged with all listed labels are returned.

  • characteristics (List[str]) – If specified, only models matching all listed characteristics are returned.

  • training_filters (List[str]) – If specified, only models matching at least one of the listed training conditions are returned. The following formats are supported.

    For AutoML and datetime partitioned projects:

    • number of rows in the training subset

    For datetime partitioned projects:

    • <training duration>, for example P6Y0M0D

    • <training_duration>-<time_window_sample_percent>-<sampling_method>, for example P6Y0M0D-78-Random (returns models trained on 6 years of data, sampling rate 78%, random sampling)

    • start/end date

    • project settings

  • number_of_clusters (list of int) – Filter models by number of clusters. Applicable only in unsupervised clustering projects.

  • limit (int)

  • offset (int)

Returns:

generic_models

Return type:

list of GenericModel
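
A brief sketch of listing model records sorted by a metric; the project ID and metric name are placeholders.

import datarobot as dr

models = dr.Model.list(
    'project-id',
    sort_by_partition='validation',
    sort_by_metric='AUC',
    limit=20,
)
for record in models:
    print(record.id)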

open_in_browser()

Opens the relevant web browser location for this object. If the default browser is not available, the URL is logged.

Note: If text-mode browsers are used, the calling process will block until the user exits the browser.

Return type:

None

request_approximation()

Request an approximation of this model using DataRobot Prime

This will create several rulesets that could be used to approximate this model. After comparing their scores and rule counts, the code used in the approximation can be downloaded and run locally.

Returns:

job – the job generating the rulesets

Return type:

Job

request_cross_class_accuracy_scores()

Request Cross Class Accuracy scores to be computed for the model.

Returns:

status_id – A statusId of computation request.

Return type:

str

request_data_disparity_insights(feature, compared_class_names)

Request data disparity insights to be computed for the model.

Parameters:
  • feature (str) – Bias and Fairness protected feature name.

  • compared_class_names (list(str)) – List of two classes to compare

Returns:

status_id – A statusId of computation request.

Return type:

str

request_external_test(dataset_id, actual_value_column=None)

Request an external test to compute scores and insights on an external test dataset.

Parameters:
  • dataset_id (string) – The dataset to make predictions against (as uploaded from Project.upload_dataset)

  • actual_value_column (string, optional) – (New in version v2.21) For time series unsupervised projects only. Actual value column can be used to calculate the classification metrics and insights on the prediction dataset. Can’t be provided with the forecast_point parameter.

Returns:

job – a Job representing external dataset insights computation

Return type:

Job

request_fairness_insights(fairness_metrics_set=None)

Request fairness insights to be computed for the model.

Parameters:

fairness_metrics_set (Optional[str]) – Can be one of <datarobot.enums.FairnessMetricsSet>. The fairness metric used to calculate the fairness scores.

Returns:

status_id – A statusId of computation request.

Return type:

str

request_feature_effect(row_count=None, data_slice_id=None)

Submit request to compute Feature Effects for the model.

See get_feature_effect for more information on the result of the job.

Parameters:
  • row_count (int) – (New in version v2.21) The sample size to use for Feature Effects computation. Minimum is 10 rows. Maximum is 100000 rows or the training sample size of the model, whichever is less.

  • data_slice_id (Optional[str]) – ID for the data slice used in the request. If None, request unsliced insight data.

Returns:

job – A Job representing the feature effect computation. To get the completed feature effect data, use job.get_result or job.get_result_when_complete.

Return type:

Job

Raises:

JobAlreadyRequested – If Feature Effects have already been requested.

request_feature_effects_multiclass(row_count=None, top_n_features=None, features=None)

Request Feature Effects computation for the multiclass model.

See get_feature_effect for more information on the result of the job.

Parameters:
  • row_count (int) – The number of rows from dataset to use for Feature Impact calculation.

  • top_n_features (int or None) – Number of top features (ranked by feature impact) used to calculate Feature Effects.

  • features (list or None) – The list of features used to calculate Feature Effects.

Returns:

job – A Job representing Feature Effect computation. To get the completed Feature Effect data, use job.get_result or job.get_result_when_complete.

Return type:

Job

request_feature_impact(row_count=None, with_metadata=False, data_slice_id=None)

Request feature impacts to be computed for the model.

See get_feature_impact for more information on the result of the job.

Parameters:
  • row_count (Optional[int]) – The sample size (specified in rows) to use for Feature Impact computation. This is not supported for unsupervised, multiclass (which has a separate method), and time series projects.

  • with_metadata (Optional[bool]) – Flag indicating whether the result should include the metadata. If true, metadata is included.

  • data_slice_id (Optional[str]) – ID for the data slice used in the request. If None, request unsliced insight data.

Returns:

job – Job representing the Feature Impact computation. To retrieve the completed Feature Impact data, use job.get_result or job.get_result_when_complete.

Return type:

Job or status_id

Raises:

JobAlreadyRequested – If the feature impacts have already been requested.
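
Example

A minimal sketch that requests Feature Impact and falls back to the stored result if it was already computed; the IDs are placeholders.

import datarobot as dr
from datarobot.errors import JobAlreadyRequested

model = dr.Model.get('project-id', 'model-id')
try:
    job = model.request_feature_impact(with_metadata=True)
    feature_impacts = job.get_result_when_complete()
except JobAlreadyRequested:
    # The insight was requested earlier; fetch the previously computed result instead.
    feature_impacts = model.get_feature_impact(with_metadata=True)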

request_frozen_datetime_model(training_row_count=None, training_duration=None, training_start_date=None, training_end_date=None, time_window_sample_pct=None, sampling_method=None)

Train a new frozen model with parameters from this model.

Requires that this model belongs to a datetime partitioned project. If it does not, an error will occur when submitting the job.

Frozen models use the same tuning parameters as their parent model instead of independently optimizing them to allow efficiently retraining models on larger amounts of the training data.

In addition to training_row_count and training_duration, frozen datetime models may be trained on an exact date range. Only one of training_row_count, training_duration, or training_start_date and training_end_date should be specified.

Models specified using training_start_date and training_end_date are the only ones that can be trained into the holdout data (once the holdout is unlocked).

All durations should be specified with a duration string such as those returned by the partitioning_methods.construct_duration_string helper method. Please see datetime partitioned project documentation for more information on duration strings.

Parameters:
  • training_row_count (Optional[int]) – the number of rows of data that should be used to train the model. If specified, training_duration may not be specified.

  • training_duration (Optional[str]) – a duration string specifying what time range the data used to train the model should span. If specified, training_row_count may not be specified.

  • training_start_date (datetime.datetime, optional) – the start date of the data to train to model on. Only rows occurring at or after this datetime will be used. If training_start_date is specified, training_end_date must also be specified.

  • training_end_date (datetime.datetime, optional) – the end date of the data to train the model on. Only rows occurring strictly before this datetime will be used. If training_end_date is specified, training_start_date must also be specified.

  • time_window_sample_pct (Optional[int]) – may only be specified when the requested model is a time window (e.g. duration or start and end dates). An integer between 1 and 99 indicating the percentage to sample by within the window. The points kept are determined by a random uniform sample. If specified, training_duration must also be specified; otherwise an error will occur.

  • sampling_method (Optional[str]) – (New in version v2.23) defines the way training data is selected. Can be either random or latest. In combination with training_row_count defines how rows are selected from backtest (latest by default). When training data is defined using time range (training_duration or use_project_settings) this setting changes the way time_window_sample_pct is applied (random by default). Applicable to OTV projects only.

Returns:

model_job – the modeling job training a frozen model

Return type:

ModelJob
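
Example

A minimal sketch of training a frozen model on the most recent three months of data; the IDs are placeholders and construct_duration_string is assumed to be importable from datarobot.helpers.partitioning_methods.

import datarobot as dr
from datarobot.helpers.partitioning_methods import construct_duration_string

model = dr.Model.get('project-id', 'model-id')
# Reuse this model's tuning parameters on a three-month training window.
model_job = model.request_frozen_datetime_model(
    training_duration=construct_duration_string(months=3)
)
frozen_model = model_job.get_result_when_complete()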

request_frozen_model(sample_pct=None, training_row_count=None)

Train a new frozen model with parameters from this model

Note

This method only works if project the model belongs to is not datetime partitioned. If it is, use request_frozen_datetime_model instead.

Frozen models use the same tuning parameters as their parent model instead of independently optimizing them to allow efficiently retraining models on larger amounts of the training data.

Parameters:
  • sample_pct (float) – optional, the percentage of the dataset to use with the model. If not provided, will use the value from this model.

  • training_row_count (int) – (New in version v2.9) optional, the integer number of rows of the dataset to use with the model. Only one of sample_pct and training_row_count should be specified.

Returns:

model_job – the modeling job training a frozen model

Return type:

ModelJob
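
Example

A minimal sketch of training a frozen model on a larger sample; the IDs are placeholders.

import datarobot as dr

model = dr.Model.get('project-id', 'model-id')
# Reuse this model's tuning parameters, but train on 80% of the project data.
model_job = model.request_frozen_model(sample_pct=80)
frozen_model = model_job.get_result_when_complete()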

request_lift_chart(source, data_slice_id=None)

Request the model Lift Chart for the specified source.

Parameters:
  • source (str) – Lift chart data source. Check datarobot.enums.CHART_DATA_SOURCE for possible values.

  • data_slice_id (string, optional) – ID for the data slice used in the request. If None, request unsliced insight data.

Returns:

status_check_job – An object that contains all the logic needed for periodic status checks of the async job.

Return type:

StatusCheckJob

request_per_class_fairness_insights(fairness_metrics_set=None)

Request per-class fairness insights to be computed for the model.

Parameters:

fairness_metrics_set (Optional[str]) – The fairness metric used to calculate the fairness scores. Value can be any one of <datarobot.enums.FairnessMetricsSet>.

Returns:

status_check_job – The returned object contains all the logic needed for periodic status checks of the async job.

Return type:

StatusCheckJob

request_predictions(dataset_id=None, dataset=None, dataframe=None, file_path=None, file=None, include_prediction_intervals=None, prediction_intervals_size=None, forecast_point=None, predictions_start_date=None, predictions_end_date=None, actual_value_column=None, explanation_algorithm=None, max_explanations=None, max_ngram_explanations=None)

Requests predictions against a previously uploaded dataset.

Parameters:
  • dataset_id (string, optional) – The ID of the dataset to make predictions against (as uploaded from Project.upload_dataset)

  • dataset (Dataset, optional) – The dataset to make predictions against (as uploaded from Project.upload_dataset)

  • dataframe (pd.DataFrame, optional) – (New in v3.0) The dataframe to make predictions against

  • file_path (Optional[str]) – (New in v3.0) Path to file to make predictions against

  • file (IOBase, optional) – (New in v3.0) File to make predictions against

  • include_prediction_intervals (Optional[bool]) – (New in v2.16) For time series projects only. Specifies whether prediction intervals should be calculated for this request. Defaults to True if prediction_intervals_size is specified, otherwise defaults to False.

  • prediction_intervals_size (Optional[int]) – (New in v2.16) For time series projects only. Represents the percentile to use for the size of the prediction intervals. Defaults to 80 if include_prediction_intervals is True. Prediction intervals size must be between 1 and 100 (inclusive).

  • forecast_point (datetime.datetime or None, optional) – (New in version v2.20) For time series projects only. This is the default point relative to which predictions will be generated, based on the forecast window of the project. See the time series prediction documentation for more information.

  • predictions_start_date (datetime.datetime or None, optional) – (New in version v2.20) For time series projects only. The start date for bulk predictions. Note that this parameter is for generating historical predictions using the training data. This parameter should be provided in conjunction with predictions_end_date. Can’t be provided with the forecast_point parameter.

  • predictions_end_date (datetime.datetime or None, optional) – (New in version v2.20) For time series projects only. The end date for bulk predictions, exclusive. Note that this parameter is for generating historical predictions using the training data. This parameter should be provided in conjunction with predictions_start_date. Can’t be provided with the forecast_point parameter.

  • actual_value_column (string, optional) – (New in version v2.21) For time series unsupervised projects only. Actual value column can be used to calculate the classification metrics and insights on the prediction dataset. Can’t be provided with the forecast_point parameter.

  • explanation_algorithm (Optional[str]) – (New in version v2.21) If set to 'shap', the response will include prediction explanations based on the SHAP explainer (SHapley Additive exPlanations). Defaults to null (no prediction explanations).

  • max_explanations (Optional[int]) – (New in version v2.21) Specifies the maximum number of explanation values that should be returned for each row, ordered by absolute value, greatest to least. If null, no limit. In the case of 'shap': if the number of features is greater than the limit, the sum of remaining values will also be returned as shapRemainingTotal. Defaults to null. Cannot be set if explanation_algorithm is omitted.

  • max_ngram_explanations (int or str, optional) – (New in version v2.29) Specifies the maximum number of text explanation values that should be returned. If set to all, text explanations will be computed and all the ngram explanations will be returned. If set to a non-zero positive integer value, text explanations will be computed and that number of ngram explanations, sorted in descending order, will be returned. By default, text explanations are not computed.

Returns:

job – The job computing the predictions

Return type:

PredictJob
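
Example

A minimal sketch of uploading a prediction dataset and requesting predictions from this model; the IDs and the file path are placeholders.

import datarobot as dr

project = dr.Project.get('project-id')
model = dr.Model.get('project-id', 'model-id')
dataset = project.upload_dataset('./to_predict.csv')
predict_job = model.request_predictions(dataset_id=dataset.id)
# Blocks until scoring finishes and returns the predictions as a pandas DataFrame.
predictions = predict_job.get_result_when_complete()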

request_residuals_chart(source, data_slice_id=None)

Request the model residuals chart for the specified source.

Parameters:
  • source (str) – Residuals chart data source. Check datarobot.enums.CHART_DATA_SOURCE for possible values.

  • data_slice_id (string, optional) – ID for the data slice used in the request. If None, request unsliced insight data.

Returns:

status_check_job – An object that contains all the logic needed for periodic status checks of the async job.

Return type:

StatusCheckJob

request_roc_curve(source, data_slice_id=None)

Request the model Roc Curve for the specified source.

Parameters:
  • source (str) – Roc Curve data source. Check datarobot.enums.CHART_DATA_SOURCE for possible values.

  • data_slice_id (string, optional) – ID for the data slice used in the request. If None, request unsliced insight data.

Returns:

status_check_job – An object that contains all the logic needed for periodic status checks of the async job.

Return type:

StatusCheckJob

request_training_predictions(data_subset, explanation_algorithm=None, max_explanations=None)

Start a job to build training predictions

Parameters:
  • data_subset (str) –

    data set definition to build predictions on. Choices are:

    • dr.enums.DATA_SUBSET.ALL or string all for all data available. Not valid for models in datetime partitioned projects.

    • dr.enums.DATA_SUBSET.VALIDATION_AND_HOLDOUT or string validationAndHoldout for all data except the training set. Not valid for models in datetime partitioned projects.

    • dr.enums.DATA_SUBSET.HOLDOUT or string holdout for the holdout data set only.

    • dr.enums.DATA_SUBSET.ALL_BACKTESTS or string allBacktests for downloading the predictions for all backtest validation folds. Requires the model to have successfully scored all backtests. Datetime partitioned projects only.

  • explanation_algorithm (dr.enums.EXPLANATIONS_ALGORITHM) – (New in v2.21) Optional. If set to dr.enums.EXPLANATIONS_ALGORITHM.SHAP, the response will include prediction explanations based on the SHAP explainer (SHapley Additive exPlanations). Defaults to None (no prediction explanations).

  • max_explanations (int) – (New in v2.21) Optional. Specifies the maximum number of explanation values that should be returned for each row, ordered by absolute value, greatest to least. In the case of dr.enums.EXPLANATIONS_ALGORITHM.SHAP: If not set, explanations are returned for all features. If the number of features is greater than the max_explanations, the sum of remaining values will also be returned as shap_remaining_total. Max 100. Defaults to null for datasets narrower than 100 columns, defaults to 100 for datasets wider than 100 columns. Is ignored if explanation_algorithm is not set.

Returns:

an instance of created async job

Return type:

Job
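
Example

A minimal sketch of computing training predictions for the holdout and reading them into a DataFrame; the IDs are placeholders.

import datarobot as dr

model = dr.Model.get('project-id', 'model-id')
job = model.request_training_predictions(dr.enums.DATA_SUBSET.HOLDOUT)
# The completed job is expected to yield a TrainingPredictions object.
training_predictions = job.get_result_when_complete()
df = training_predictions.get_all_as_dataframe()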

retrain(sample_pct=None, featurelist_id=None, training_row_count=None, n_clusters=None)

Submit a job to the queue to retrain this model with a different sample size, featurelist, or number of clusters.

Parameters:
  • sample_pct (Optional[float]) – The sample size in percents (1 to 100) to use in training. If this parameter is used then training_row_count should not be given.

  • featurelist_id (Optional[str]) – The featurelist id

  • training_row_count (Optional[int]) – The number of rows used to train the model. If this parameter is used, then sample_pct should not be given.

  • n_clusters (Optional[int]) – (new in version 2.27) number of clusters to use in an unsupervised clustering model. This parameter is used only for unsupervised clustering models that do not determine the number of clusters automatically.

Returns:

job – The created job that is retraining the model

Return type:

ModelJob
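
Example

A minimal sketch of retraining this model's blueprint on a different sample size and featurelist; the IDs are placeholders.

import datarobot as dr

model = dr.Model.get('project-id', 'model-id')
model_job = model.retrain(sample_pct=85, featurelist_id='featurelist-id')
new_model = model_job.get_result_when_complete()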

set_prediction_threshold(threshold)

Set a custom prediction threshold for the model.

May not be used once prediction_threshold_read_only is True for this model.

Parameters:

threshold (float) – Only used for binary classification projects. The threshold to use when deciding between the positive and negative classes when making predictions. Should be between 0.0 and 1.0 (inclusive).

star_model()

Mark the model as starred.

Model stars propagate to the web application and the API, and can be used to filter when listing models.

Return type:

None

start_advanced_tuning_session()

Start an Advanced Tuning session. Returns an object that helps set up arguments for an Advanced Tuning model execution.

As of v2.17, all models other than blenders, open source, prime, baseline and user-created support Advanced Tuning.

Returns:

Session for setting up and running Advanced Tuning on a model

Return type:

AdvancedTuningSession
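
Example

A minimal sketch of an Advanced Tuning run; the IDs are placeholders and the parameter name shown is only illustrative, since the available parameters depend on the blueprint.

import datarobot as dr

model = dr.Model.get('project-id', 'model-id')
tune = model.start_advanced_tuning_session()
# Set one hyperparameter by name, then submit the tuning run to the queue.
tune.set_parameter(parameter_name='colsample_bytree', value=0.9)
model_job = tune.run()
tuned_model = model_job.get_result_when_complete()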

train(sample_pct=None, featurelist_id=None, scoring_type=None, training_row_count=None, monotonic_increasing_featurelist_id=<object object>, monotonic_decreasing_featurelist_id=<object object>)

Train the blueprint used in model on a particular featurelist or amount of data.

This method creates a new training job and appends it to the end of the queue for this project. After the job has finished, you can get the newly trained model by retrieving it from the project leaderboard, or by retrieving the result of the job.

Either sample_pct or training_row_count can be used to specify the amount of data to use, but not both. If neither are specified, a default of the maximum amount of data that can safely be used to train any blueprint without going into the validation data will be selected.

In smart-sampled projects, sample_pct and training_row_count are assumed to be in terms of rows of the minority class.

Note

For datetime partitioned projects, see train_datetime instead.

Parameters:
  • sample_pct (Optional[float]) – The amount of data to use for training, as a percentage of the project dataset from 0 to 100.

  • featurelist_id (Optional[str]) – The identifier of the featurelist to use. If not defined, the featurelist of this model is used.

  • scoring_type (Optional[str]) – Either validation or crossValidation (also dr.SCORING_TYPE.validation or dr.SCORING_TYPE.cross_validation). validation is available for every partitioning type, and indicates that the default model validation should be used for the project. If the project uses a form of cross-validation partitioning, crossValidation can also be used to indicate that all of the available training/validation combinations should be used to evaluate the model.

  • training_row_count (Optional[int]) – The number of rows to use to train the requested model.

  • monotonic_increasing_featurelist_id (str) – (new in version 2.11) optional, the id of the featurelist that defines the set of features with a monotonically increasing relationship to the target. Passing None disables increasing monotonicity constraint. Default (dr.enums.MONOTONICITY_FEATURELIST_DEFAULT) is the one specified by the blueprint.

  • monotonic_decreasing_featurelist_id (str) – (new in version 2.11) optional, the id of the featurelist that defines the set of features with a monotonically decreasing relationship to the target. Passing None disables decreasing monotonicity constraint. Default (dr.enums.MONOTONICITY_FEATURELIST_DEFAULT) is the one specified by the blueprint.

Returns:

model_job_id – id of created job, can be used as parameter to ModelJob.get method or wait_for_async_model_creation function

Return type:

str

Examples

from datarobot import Model, Project

project = Project.get('project-id')
model = Model.get('project-id', 'model-id')
model_job_id = model.train(training_row_count=project.max_train_rows)
train_datetime(featurelist_id=None, training_row_count=None, training_duration=None, time_window_sample_pct=None, monotonic_increasing_featurelist_id=<object object>, monotonic_decreasing_featurelist_id=<object object>, use_project_settings=False, sampling_method=None, n_clusters=None)

Trains this model on a different featurelist or sample size.

Requires that this model is part of a datetime partitioned project; otherwise, an error will occur.

All durations should be specified with a duration string such as those returned by the partitioning_methods.construct_duration_string helper method. Please see datetime partitioned project documentation for more information on duration strings.

Parameters:
  • featurelist_id (Optional[str]) – the featurelist to use to train the model. If not specified, the featurelist of this model is used.

  • training_row_count (Optional[int]) – the number of rows of data that should be used to train the model. If specified, neither training_duration nor use_project_settings may be specified.

  • training_duration (Optional[str]) – a duration string specifying what time range the data used to train the model should span. If specified, neither training_row_count nor use_project_settings may be specified.

  • use_project_settings (Optional[bool]) – (New in version v2.20) defaults to False. If True, indicates that the custom backtest partitioning settings specified by the user will be used to train the model and evaluate backtest scores. If specified, neither training_row_count nor training_duration may be specified.

  • time_window_sample_pct (Optional[int]) – may only be specified when the requested model is a time window (e.g. duration or start and end dates). An integer between 1 and 99 indicating the percentage to sample by within the window. The points kept are determined by a random uniform sample. If specified, training_duration must also be specified; otherwise an error will occur.

  • sampling_method (Optional[str]) – (New in version v2.23) defines the way training data is selected. Can be either random or latest. In combination with training_row_count defines how rows are selected from backtest (latest by default). When training data is defined using time range (training_duration or use_project_settings) this setting changes the way time_window_sample_pct is applied (random by default). Applicable to OTV projects only.

  • monotonic_increasing_featurelist_id (Optional[str]) – (New in version v2.18) optional, the id of the featurelist that defines the set of features with a monotonically increasing relationship to the target. Passing None disables increasing monotonicity constraint. Default (dr.enums.MONOTONICITY_FEATURELIST_DEFAULT) is the one specified by the blueprint.

  • monotonic_decreasing_featurelist_id (Optional[str]) – (New in version v2.18) optional, the id of the featurelist that defines the set of features with a monotonically decreasing relationship to the target. Passing None disables decreasing monotonicity constraint. Default (dr.enums.MONOTONICITY_FEATURELIST_DEFAULT) is the one specified by the blueprint.

  • n_clusters (Optional[int]) – (New in version 2.27) number of clusters to use in an unsupervised clustering model. This parameter is used only for unsupervised clustering models that don’t automatically determine the number of clusters.

Returns:

job – the created job to build the model

Return type:

ModelJob
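
Example

A minimal sketch of retraining this blueprint on a two-month training window in a datetime partitioned project; the IDs are placeholders and construct_duration_string is assumed to be importable from datarobot.helpers.partitioning_methods.

import datarobot as dr
from datarobot.helpers.partitioning_methods import construct_duration_string

model = dr.Model.get('project-id', 'model-id')
model_job = model.train_datetime(training_duration=construct_duration_string(months=2))
new_model = model_job.get_result_when_complete()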

train_incremental(data_stage_id, training_data_name=None, data_stage_encoding=None, data_stage_delimiter=None, data_stage_compression=None)

Submit a job to the queue to perform incremental training on an existing model using additional data. The id of the additional data to use for training is specified with the data_stage_id. Optionally a name for the iteration can be supplied by the user to help identify the contents of data in the iteration.

This functionality requires the INCREMENTAL_LEARNING feature flag to be enabled.

Parameters:
  • data_stage_id (str) – The id of the data stage to use for training.

  • training_data_name (Optional[str]) – The name of the iteration or data stage to indicate what the incremental learning was performed on.

  • data_stage_encoding (Optional[str]) – The encoding type of the data in the data stage (default: UTF-8). Supported formats: UTF-8, ASCII, WINDOWS1252

  • data_stage_delimiter (Optional[str]) – The delimiter used by the data in the data stage (default: ‘,’).

  • data_stage_compression (Optional[str]) – The compression type of the data stage file, e.g. ‘zip’ (default: None). Supported formats: zip

Returns:

job – The created job that is retraining the model

Return type:

ModelJob
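
Example

A minimal sketch of an incremental training request; the IDs and the data stage ID are placeholders, and the INCREMENTAL_LEARNING feature flag must be enabled.

import datarobot as dr

model = dr.Model.get('project-id', 'model-id')
# 'data-stage-id' refers to a previously created data stage holding the additional rows.
model_job = model.train_incremental('data-stage-id', training_data_name='iteration 2')
updated_model = model_job.get_result_when_complete()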

unstar_model()

Unmark the model as starred.

Model stars propagate to the web application and the API, and can be used to filter when listing models.

Return type:

None

Datetime models

class datarobot.models.DatetimeModel

Represents a model from a datetime partitioned project

All durations are specified with a duration string such as those returned by the partitioning_methods.construct_duration_string helper method. Please see datetime partitioned project documentation for more information on duration strings.

Note that only one of training_row_count, training_duration, and training_start_date and training_end_date will be specified, depending on the data_selection_method of the model. Whichever method was selected determines the amount of data used to train on when making predictions and scoring the backtests and the holdout.

Variables:
  • id (str) – the id of the model

  • project_id (str) – the id of the project the model belongs to

  • processes (List[str]) – the processes used by the model

  • featurelist_name (str) – the name of the featurelist used by the model

  • featurelist_id (str) – the id of the featurelist used by the model

  • sample_pct (float) – the percentage of the project dataset used in training the model

  • training_row_count (int or None) – If specified, an int specifying the number of rows used to train the model and evaluate backtest scores.

  • training_duration (str or None) – If specified, a duration string specifying the duration spanned by the data used to train the model and evaluate backtest scores.

  • training_start_date (datetime or None) – only present for frozen models in datetime partitioned projects. If specified, the start date of the data used to train the model.

  • training_end_date (datetime or None) – only present for frozen models in datetime partitioned projects. If specified, the end date of the data used to train the model.

  • time_window_sample_pct (int or None) – An integer between 1 and 99 indicating the percentage of sampling within the training window. The points kept are determined by a random uniform sample. If not specified, no sampling was done.

  • sampling_method (str or None) – (New in v2.23) indicates the way training data has been selected (either how rows have been selected within backtest or how time_window_sample_pct has been applied).

  • model_type (str) – what model this is, e.g. ‘Nystroem Kernel SVM Regressor’

  • model_category (str) – what kind of model this is - ‘prime’ for DataRobot Prime models, ‘blend’ for blender models, and ‘model’ for other models

  • is_frozen (bool) – whether this model is a frozen model

  • blueprint_id (str) – the id of the blueprint used in this model

  • metrics (dict) – a mapping from each metric to the model’s scores for that metric. The keys in metrics are the different metrics used to evaluate the model, and the values are the results. The dictionaries inside of metrics will be as described here: ‘validation’, the score for a single backtest; ‘crossValidation’, always None; ‘backtesting’, the average score for all backtests if all are available and computed, or None otherwise; ‘backtestingScores’, a list of scores for all backtests where the score is None if that backtest does not have a score available; and ‘holdout’, the score for the holdout or None if the holdout is locked or the score is unavailable.

  • backtests (list of dict) – describes what data was used to fit each backtest, the score for the project metric, and why the backtest score is unavailable if it is not provided.

  • data_selection_method (str) – which of training_row_count, training_duration, or training_start_date and training_end_date were used to determine the data used to fit the model. One of ‘rowCount’, ‘duration’, or ‘selectedDateRange’.

  • training_info (dict) – describes which data was used to train on when scoring the holdout and making predictions. training_info will have the following keys: holdout_training_start_date, holdout_training_duration, holdout_training_row_count, holdout_training_end_date, prediction_training_start_date, prediction_training_duration, prediction_training_row_count, prediction_training_end_date. Start and end dates will be datetimes, durations will be duration strings, and rows will be integers.

  • holdout_score (float or None) – the score against the holdout, if available and the holdout is unlocked, according to the project metric.

  • holdout_status (string or None) – the status of the holdout score, e.g. “COMPLETED”, “HOLDOUT_BOUNDARIES_EXCEEDED”. Unavailable if the holdout fold was disabled in the partitioning configuration.

  • monotonic_increasing_featurelist_id (str) – optional, the id of the featurelist that defines the set of features with a monotonically increasing relationship to the target. If None, no such constraints are enforced.

  • monotonic_decreasing_featurelist_id (str) – optional, the id of the featurelist that defines the set of features with a monotonically decreasing relationship to the target. If None, no such constraints are enforced.

  • supports_monotonic_constraints (bool) – optional, whether this model supports enforcing monotonic constraints

  • is_starred (bool) – whether this model is marked as starred

  • prediction_threshold (float) – for binary classification projects, the threshold used for predictions

  • prediction_threshold_read_only (bool) – indicates whether modification of the prediction threshold is forbidden. Threshold modification is forbidden once a model has had a deployment created or predictions made via the dedicated prediction API.

  • effective_feature_derivation_window_start (int or None) – (New in v2.16) For time series projects only. How many units of the windows_basis_unit into the past relative to the forecast point the user needs to provide history for at prediction time. This can differ from the feature_derivation_window_start set on the project due to the differencing method and period selected, or if the model is a time series native model such as ARIMA. Will be a negative integer in time series projects and None otherwise.

  • effective_feature_derivation_window_end (int or None) – (New in v2.16) For time series projects only. How many units of the windows_basis_unit into the past relative to the forecast point the feature derivation window should end. Will be a non-positive integer in time series projects and None otherwise.

  • forecast_window_start (int or None) – (New in v2.16) For time series projects only. How many units of the windows_basis_unit into the future relative to the forecast point the forecast window should start. Note that this field will be the same as what is shown in the project settings. Will be a non-negative integer in time series projects and None otherwise.

  • forecast_window_end (int or None) – (New in v2.16) For time series projects only. How many units of the windows_basis_unit into the future relative to the forecast point the forecast window should end. Note that this field will be the same as what is shown in the project settings. Will be a non-negative integer in time series projects and None otherwise.

  • windows_basis_unit (str or None) – (New in v2.16) For time series projects only. Indicates which unit is the basis for the feature derivation window and the forecast window. Note that this field will be the same as what is shown in the project settings. In time series projects, will be either the detected time unit or “ROW”, and None otherwise.

  • model_number (integer) – model number assigned to a model

  • parent_model_id (str or None) – (New in version v2.20) the id of the model that tuning parameters are derived from

  • supports_composable_ml (bool or None) – (New in version v2.26) whether this model is supported in the Composable ML.

  • is_n_clusters_dynamically_determined (Optional[bool]) – (New in version 2.27) if True, indicates that model determines number of clusters automatically.

  • n_clusters (Optional[int]) – (New in version 2.27) Number of clusters to use in an unsupervised clustering model. This parameter is used only for unsupervised clustering models that don’t automatically determine the number of clusters.

__init__(id=None, processes=None, featurelist_name=None, featurelist_id=None, project_id=None, sample_pct=None, training_row_count=None, training_duration=None, training_start_date=None, training_end_date=None, time_window_sample_pct=None, sampling_method=None, model_type=None, model_category=None, is_frozen=None, blueprint_id=None, metrics=None, training_info=None, holdout_score=None, holdout_status=None, data_selection_method=None, backtests=None, monotonic_increasing_featurelist_id=None, monotonic_decreasing_featurelist_id=None, supports_monotonic_constraints=None, is_starred=None, prediction_threshold=None, prediction_threshold_read_only=None, effective_feature_derivation_window_start=None, effective_feature_derivation_window_end=None, forecast_window_start=None, forecast_window_end=None, windows_basis_unit=None, model_number=None, parent_model_id=None, supports_composable_ml=None, n_clusters=None, is_n_clusters_dynamically_determined=None, has_empty_clusters=None, model_family_full_name=None, is_trained_into_validation=None, is_trained_into_holdout=None, **kwargs)
classmethod get(project, model_id)

Retrieve a specific datetime model.

If the project does not use datetime partitioning, a ClientError will occur.

Parameters:
  • project (str) – the id of the project the model belongs to

  • model_id (str) – the id of the model to retrieve

Returns:

model – the model

Return type:

DatetimeModel

score_backtests()

Compute the scores for all available backtests.

Some backtests may be unavailable if the model is trained into their validation data.

Returns:

job – a job tracking the backtest computation. When it is complete, all available backtests will have scores computed.

Return type:

Job
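
Example

A minimal sketch of scoring all available backtests and reading the averaged backtest score afterwards; the IDs are placeholders and 'RMSE' stands in for the project metric.

import datarobot as dr

model = dr.DatetimeModel.get('project-id', 'model-id')
job = model.score_backtests()
job.wait_for_completion()
# Re-fetch the model so the metrics reflect the newly computed backtest scores.
model = dr.DatetimeModel.get('project-id', 'model-id')
print(model.metrics['RMSE']['backtesting'])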

cross_validate()

Inherited from the model. DatetimeModels cannot request cross validation scores; use backtests instead.

Return type:

NoReturn

get_cross_validation_scores(partition=None, metric=None)

Inherited from Model. DatetimeModels cannot request cross-validation scores; use backtests instead.

Return type:

NoReturn

request_training_predictions(data_subset, *args, **kwargs)

Start a job that builds training predictions.

Parameters:

data_subset (str) –

data set definition to build predictions on. Choices are:

  • dr.enums.DATA_SUBSET.HOLDOUT for holdout data set only

  • dr.enums.DATA_SUBSET.ALL_BACKTESTS for downloading the predictions for all

    backtest validation folds. Requires the model to have successfully scored all backtests.

Returns:

an instance of created async job

Return type:

Job

get_series_accuracy_as_dataframe(offset=0, limit=100, metric=None, multiseries_value=None, order_by=None, reverse=False)

Retrieve series accuracy results for the specified model as a pandas.DataFrame.

Parameters:
  • offset (Optional[int]) – The number of results to skip. Defaults to 0 if not specified.

  • limit (Optional[int]) – The maximum number of results to return. Defaults to 100 if not specified.

  • metric (Optional[str]) – The name of the metric to retrieve scores for. If omitted, the default project metric will be used.

  • multiseries_value (Optional[str]) – If specified, only the series containing the given value in one of the series ID columns will be returned.

  • order_by (Optional[str]) – Used for sorting the series. Attribute must be one of datarobot.enums.SERIES_ACCURACY_ORDER_BY.

  • reverse (Optional[bool]) – Used for sorting the series. If True, will sort the series in descending order by the attribute specified by order_by.

Returns:

A pandas.DataFrame with the Series Accuracy for the specified model.

Return type:

data

download_series_accuracy_as_csv(filename, encoding='utf-8', offset=0, limit=100, metric=None, multiseries_value=None, order_by=None, reverse=False)

Save series accuracy results for the specified model in a CSV file.

Parameters:
  • filename (str or file object) – The path or file object to save the data to.

  • encoding (Optional[str]) – A string representing the encoding to use in the output csv file. Defaults to ‘utf-8’.

  • offset (Optional[int]) – The number of results to skip. Defaults to 0 if not specified.

  • limit (Optional[int]) – The maximum number of results to return. Defaults to 100 if not specified.

  • metric (Optional[str]) – The name of the metric to retrieve scores for. If omitted, the default project metric will be used.

  • multiseries_value (Optional[str]) – If specified, only the series containing the given value in one of the series ID columns will be returned.

  • order_by (Optional[str]) – Used for sorting the series. Attribute must be one of datarobot.enums.SERIES_ACCURACY_ORDER_BY.

  • reverse (Optional[bool]) – Used for sorting the series. If True, will sort the series in descending order by the attribute specified by order_by.

get_series_clusters(offset=0, limit=100, order_by=None, reverse=False)

Retrieve a dictionary of series and the clusters assigned to each series. This is only usable for clustering projects.

Parameters:
  • offset (Optional[int]) – The number of results to skip. Defaults to 0 if not specified.

  • limit (Optional[int]) – The maximum number of results to return. Defaults to 100 if not specified.

  • order_by (Optional[str]) – Used for sorting the series. Attribute must be one of datarobot.enums.SERIES_ACCURACY_ORDER_BY.

  • reverse (Optional[bool]) – Used for sorting the series. If True, will sort the series in descending order by the attribute specified by order_by.

Returns:

A dictionary of the series in the dataset with their associated cluster

Return type:

Dict

Raises:
  • ValueError – If the model type returns an unsupported insight

  • ClientError – If the insight is not available for this model

compute_series_accuracy(compute_all_series=False)

Compute series accuracy for the model.

Parameters:

compute_all_series (Optional[bool]) – Whether to calculate accuracy for all series or only the first 1000.

Returns:

an instance of the created async job

Return type:

Job
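
Example

A minimal sketch of computing series accuracy and then retrieving the per-series results; the IDs are placeholders.

import datarobot as dr

model = dr.DatetimeModel.get('project-id', 'model-id')
job = model.compute_series_accuracy()
job.wait_for_completion()
df = model.get_series_accuracy_as_dataframe(limit=100)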

retrain(time_window_sample_pct=None, featurelist_id=None, training_row_count=None, training_duration=None, training_start_date=None, training_end_date=None, sampling_method=None, n_clusters=None)

Retrain an existing datetime model using a new training period for the model’s training set (with optional time window sampling) or a different feature list.

All durations should be specified with a duration string such as those returned by the partitioning_methods.construct_duration_string helper method. Please see datetime partitioned project documentation for more information on duration strings.

Parameters:
  • featurelist_id (Optional[str]) – The ID of the featurelist to use.

  • training_row_count (Optional[int]) – The number of rows to train the model on. If this parameter is used then sample_pct cannot be specified.

  • time_window_sample_pct (Optional[int]) – An int between 1 and 99 indicating the percentage of sampling within the time window. The points kept are determined by a random uniform sample. If specified, training_row_count must not be specified and either training_duration or training_start_date and training_end_date must be specified.

  • training_duration (Optional[str]) – A duration string representing the training duration for the submitted model. If specified then training_row_count, training_start_date, and training_end_date cannot be specified.

  • training_start_date (Optional[str]) – A datetime string representing the start date of the data to use for training this model. If specified, training_end_date must also be specified, and training_duration cannot be specified. The value must be before the training_end_date value.

  • training_end_date (Optional[str]) – A datetime string representing the end date of the data to use for training this model. If specified, training_start_date must also be specified, and training_duration cannot be specified. The value must be after the training_start_date value.

  • sampling_method (Optional[str]) – (New in version v2.23) defines the way training data is selected. Can be either random or latest. In combination with training_row_count defines how rows are selected from backtest (latest by default). When training data is defined using time range (training_duration or use_project_settings) this setting changes the way time_window_sample_pct is applied (random by default). Applicable to OTV projects only.

  • n_clusters (Optional[int]) – (New in version 2.27) Number of clusters to use in an unsupervised clustering model. This parameter is used only for unsupervised clustering models that don’t automatically determine the number of clusters.

Returns:

job – The created job that is retraining the model

Return type:

ModelJob

get_feature_effect_metadata()

Retrieve Feature Effect metadata for each backtest. Response contains status and available sources for each backtest of the model.

  • Each backtest is available for training and validation

  • If holdout is configured for the project, it is listed with holdout as its backtestIndex and has training and holdout sources available.

Start/stop models contain a single response item with startstop value for backtestIndex.

  • Feature Effects for the training source are always available (except for older projects, which support Feature Effects only for validation).

  • When a model is trained into validation or holdout without stacked prediction (e.g. no out-of-sample prediction in validation or holdout), Feature Effect is not available for validation or holdout.

  • Feature Effect for holdout is not available when there is no holdout configured for the project.

source is a required parameter when retrieving Feature Effects; one of the provided sources must be used.

backtestIndex is a required parameter when submitting a compute request and when retrieving Feature Effects; one of the provided backtest indexes must be used.

Returns:

feature_effect_metadata

Return type:

FeatureEffectMetadataDatetime

request_feature_effect(backtest_index, data_slice_filter=<datarobot.models.model.Sentinel object>)

Request feature effects to be computed for the model.

See get_feature_effect for more information on the result of the job.

See get_feature_effect_metadata for retrieving information of backtest_index.

Parameters:

backtest_index (string, FeatureEffectMetadataDatetime.backtest_index.) – The backtest index to retrieve Feature Effects for.

Returns:

job – A Job representing the feature effect computation. To get the completed feature effect data, use job.get_result or job.get_result_when_complete.

Return type:

Job

Raises:

JobAlreadyRequested – If the feature effects have already been requested.
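
Example

A minimal sketch of requesting Feature Effects for the first backtest of a datetime model; the IDs are placeholders.

import datarobot as dr

model = dr.DatetimeModel.get('project-id', 'model-id')
# The metadata lists the backtest indexes and sources that can be requested; '0' is the first backtest.
metadata = model.get_feature_effect_metadata()
job = model.request_feature_effect(backtest_index='0')
feature_effects = job.get_result_when_complete()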

get_feature_effect(source, backtest_index, data_slice_filter=<datarobot.models.model.Sentinel object>)

Retrieve Feature Effects for the model.

Feature Effects provides partial dependence and predicted vs actual values for top-500 features ordered by feature impact score.

The partial dependence shows marginal effect of a feature on the target variable after accounting for the average effects of all other predictive features. It indicates how, holding all other variables except the feature of interest as they were, the value of this feature affects your prediction.

Requires that Feature Effects has already been computed with request_feature_effect.

See get_feature_effect_metadata for retrieving information of source, backtest_index.

Parameters:
  • source (string) – The source to retrieve Feature Effects for. Must be one of the values listed in FeatureEffectMetadataDatetime.sources, which provides the available sources for Feature Effects.

  • backtest_index (string, FeatureEffectMetadataDatetime.backtest_index.) – The backtest index to retrieve Feature Effects for.

Returns:

feature_effects – The feature effects data.

Return type:

FeatureEffects

Raises:

ClientError – If the feature effects have not been computed or source is not a valid value.

get_or_request_feature_effect(source, backtest_index, max_wait=600, data_slice_filter=<datarobot.models.model.Sentinel object>)

Retrieve Feature Effects computations for the model, requesting a new job if it hasn’t been run previously.

See get_feature_effect_metadata for retrieving information of source, backtest_index.

Parameters:
  • max_wait (Optional[int]) – The maximum time to wait for a requested feature effect job to complete before erroring

  • source (string) – The source to retrieve Feature Effects for. Must be one of the values listed in FeatureEffectMetadataDatetime.sources, which provides the available sources for Feature Effects.

  • backtest_index (string, FeatureEffectMetadataDatetime.backtest_index.) – The backtest index to retrieve Feature Effects for.

Returns:

feature_effects – The feature effects data.

Return type:

FeatureEffects

request_feature_effects_multiclass(backtest_index, row_count=None, top_n_features=None, features=None)

Request feature effects to be computed for the multiclass datetime model.

See get_feature_effect for more information on the result of the job.

Parameters:
  • backtest_index (str) – The backtest index to use for Feature Effects calculation.

  • row_count (int) – The number of rows from dataset to use for Feature Impact calculation.

  • top_n_features (int or None) – Number of top features (ranked by Feature Impact) used to calculate Feature Effects.

  • features (list or None) – The list of features to use to calculate Feature Effects.

Returns:

job – A Job representing Feature Effects computation. To get the completed Feature Effect data, use job.get_result or job.get_result_when_complete.

Return type:

Job

get_feature_effects_multiclass(backtest_index, source='training', class_=None)

Retrieve Feature Effects for the multiclass datetime model.

Feature Effects provides partial dependence and predicted vs actual values for top-500 features ordered by feature impact score.

The partial dependence shows marginal effect of a feature on the target variable after accounting for the average effects of all other predictive features. It indicates how, holding all other variables except the feature of interest as they were, the value of this feature affects your prediction.

Requires that Feature Effects has already been computed with request_feature_effect.

See get_feature_effect_metadata for retrieving information about the available sources.

Parameters:
  • backtest_index (str) – The backtest index to retrieve Feature Effects for.

  • source (str) – The source Feature Effects are retrieved for.

  • class_ (str or None) – The class name to retrieve Feature Effects for.

Returns:

The list of multiclass Feature Effects.

Return type:

list

Raises:

ClientError – If the Feature Effects have not been computed or source is not a valid value.

get_or_request_feature_effects_multiclass(backtest_index, source, top_n_features=None, features=None, row_count=None, class_=None, max_wait=600)

Retrieve Feature Effects for a datetime multiclass model, and request a job if it hasn’t been run previously.

Parameters:
  • backtest_index (str) – The backtest index to retrieve Feature Effects for.

  • source (string) – The source from which Feature Effects are retrieved.

  • class_ (str or None) – The class name to retrieve Feature Effects for.

  • row_count (int) – The number of rows used from the dataset for Feature Impact calculation.

  • top_n_features (int or None) – Number of top features (ranked by feature impact) used to calculate Feature Effects.

  • features (list or None) – The list of features used to calculate Feature Effects.

  • max_wait (Optional[int]) – The maximum time to wait for a requested feature effect job to complete before erroring.

Returns:

feature_effects – The list of multiclass feature effects data.

Return type:

list of FeatureEffectsMulticlass

calculate_prediction_intervals(prediction_intervals_size)

Calculate prediction intervals for this DatetimeModel for the specified size.

Added in version v2.19.

Parameters:

prediction_intervals_size (int) – The prediction interval’s size to calculate for this model. See the prediction intervals documentation for more information.

Returns:

job – a Job tracking the prediction intervals computation

Return type:

Job
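
Example

A minimal sketch of calculating 80% prediction intervals and then listing the sizes already computed for this model (see get_calculated_prediction_intervals below); the IDs are placeholders.

import datarobot as dr

model = dr.DatetimeModel.get('project-id', 'model-id')
job = model.calculate_prediction_intervals(80)
job.wait_for_completion()
sizes = model.get_calculated_prediction_intervals()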

get_calculated_prediction_intervals(offset=None, limit=None)

Retrieve a list of already-calculated prediction intervals for this model

Added in version v2.19.

Parameters:
  • offset (Optional[int]) – If provided, this many results will be skipped

  • limit (Optional[int]) – If provided, at most this many results will be returned. If not provided, will return at most 100 results.

Returns:

A descending-ordered list of already-calculated prediction interval sizes

Return type:

list[int]

compute_datetime_trend_plots(backtest=0, source=SOURCE_TYPE.VALIDATION, forecast_distance_start=None, forecast_distance_end=None)

Computes datetime trend plots (Accuracy over Time, Forecast vs Actual, Anomaly over Time) for this model

Added in version v2.25.

Parameters:
  • backtest (int or string, optional) – Compute plots for a specific backtest (use the backtest index starting from zero). To compute plots for holdout, use dr.enums.DATA_SUBSET.HOLDOUT

  • source (string, optional) – The source of the data for the backtest/holdout. Attribute must be one of dr.enums.SOURCE_TYPE

  • forecast_distance_start (Optional[int]) – The start of forecast distance range (forecast window) to compute. If not specified, the first forecast distance for this project will be used. Only for time series supervised models

  • forecast_distance_end (Optional[int]) – The end of forecast distance range (forecast window) to compute. If not specified, the last forecast distance for this project will be used. Only for time series supervised models

Returns:

job – a Job tracking the datetime trend plots computation

Return type:

Job

Notes

  • Forecast distance specifies the number of time steps between the predicted point and the origin point.

  • For multiseries models, only the first 1000 series (in alphabetical order) and an average plot for them will be computed.

  • A maximum of 100 forecast distances can be requested for calculation in time series supervised projects.

get_accuracy_over_time_plots_metadata(forecast_distance=None)

Retrieve Accuracy over Time plots metadata for this model.

Added in version v2.25.

Parameters:

forecast_distance (Optional[int]) – Forecast distance to retrieve the metadata for. If not specified, the first forecast distance for this project will be used. Only available for time series projects.

Returns:

metadata – an AccuracyOverTimePlotsMetadata representing Accuracy over Time plots metadata

Return type:

AccuracyOverTimePlotsMetadata

get_accuracy_over_time_plot(backtest=0, source=SOURCE_TYPE.VALIDATION, forecast_distance=None, series_id=None, resolution=None, max_bin_size=None, start_date=None, end_date=None, max_wait=600)

Retrieve Accuracy over Time plots for this model.

Added in version v2.25.

Parameters:
  • backtest (int or string, optional) – Retrieve plots for a specific backtest (use the backtest index starting from zero). To retrieve plots for holdout, use dr.enums.DATA_SUBSET.HOLDOUT

  • source (string, optional) – The source of the data for the backtest/holdout. Attribute must be one of dr.enums.SOURCE_TYPE

  • forecast_distance (Optional[int]) – Forecast distance to retrieve the plots for. If not specified, the first forecast distance for this project will be used. Only available for time series projects.

  • series_id (string, optional) – The name of the series to retrieve for multiseries projects. If not provided an average plot for the first 1000 series will be retrieved.

  • resolution (string, optional) – Specifies the resolution at which the data should be binned. If not provided, an optimal resolution will be used to build chart data with a number of bins <= max_bin_size. One of dr.enums.DATETIME_TREND_PLOTS_RESOLUTION.

  • max_bin_size (Optional[int]) – An int between 1 and 1000, which specifies the maximum number of bins for the retrieval. Default is 500.

  • start_date (datetime.datetime, optional) – The start of the date range to return. If not specified, start date for requested plot will be used.

  • end_date (datetime.datetime, optional) – The end of the date range to return. If not specified, end date for requested plot will be used.

  • max_wait (int or None, optional) – The maximum time to wait for a compute job to complete before retrieving the plots. Default is dr.enums.DEFAULT_MAX_WAIT. If 0 or None, the plots would be retrieved without attempting the computation.

Returns:

plot – an AccuracyOverTimePlot representing the Accuracy over Time plot

Return type:

AccuracyOverTimePlot

Examples

import datarobot as dr
import pandas as pd
model = dr.DatetimeModel(project_id=project_id, id=model_id)
plot = model.get_accuracy_over_time_plot()
df = pd.DataFrame.from_dict(plot.bins)
figure = df.plot("start_date", ["actual", "predicted"]).get_figure()
figure.savefig("accuracy_over_time.png")
get_accuracy_over_time_plot_preview(backtest=0, source=SOURCE_TYPE.VALIDATION, forecast_distance=None, series_id=None, max_wait=600)

Retrieve Accuracy over Time preview plots for this model.

Added in version v2.25.

Parameters:
  • backtest (int or string, optional) – Retrieve plots for a specific backtest (use the backtest index starting from zero). To retrieve plots for holdout, use dr.enums.DATA_SUBSET.HOLDOUT

  • source (string, optional) – The source of the data for the backtest/holdout. Attribute must be one of dr.enums.SOURCE_TYPE

  • forecast_distance (Optional[int]) – Forecast distance to retrieve the plots for. If not specified, the first forecast distance for this project will be used. Only available for time series projects.

  • series_id (string, optional) – The name of the series to retrieve for multiseries projects. If not provided an average plot for the first 1000 series will be retrieved.

  • max_wait (int or None, optional) – The maximum time to wait for a compute job to complete before retrieving the plots. Default is dr.enums.DEFAULT_MAX_WAIT. If 0 or None, the plots would be retrieved without attempting the computation.

Returns:

plot – an AccuracyOverTimePlotPreview representing the Accuracy over Time plot preview

Return type:

AccuracyOverTimePlotPreview

Examples

import datarobot as dr
import pandas as pd
model = dr.DatetimeModel(project_id=project_id, id=model_id)
plot = model.get_accuracy_over_time_plot_preview()
df = pd.DataFrame.from_dict(plot.bins)
figure = df.plot("start_date", ["actual", "predicted"]).get_figure()
figure.savefig("accuracy_over_time_preview.png")
get_forecast_vs_actual_plots_metadata()

Retrieve Forecast vs Actual plots metadata for this model.

Added in version v2.25.

Returns:

metadata – a ForecastVsActualPlotsMetadata representing Forecast vs Actual plots metadata

Return type:

ForecastVsActualPlotsMetadata

get_forecast_vs_actual_plot(backtest=0, source=SOURCE_TYPE.VALIDATION, forecast_distance_start=None, forecast_distance_end=None, series_id=None, resolution=None, max_bin_size=None, start_date=None, end_date=None, max_wait=600)

Retrieve Forecast vs Actual plots for this model.

Added in version v2.25.

Parameters:
  • backtest (int or string, optional) – Retrieve plots for a specific backtest (use the backtest index starting from zero). To retrieve plots for holdout, use dr.enums.DATA_SUBSET.HOLDOUT

  • source (string, optional) – The source of the data for the backtest/holdout. Attribute must be one of dr.enums.SOURCE_TYPE

  • forecast_distance_start (Optional[int]) – The start of forecast distance range (forecast window) to retrieve. If not specified, the first forecast distance for this project will be used.

  • forecast_distance_end (Optional[int]) – The end of forecast distance range (forecast window) to retrieve. If not specified, the last forecast distance for this project will be used.

  • series_id (string, optional) – The name of the series to retrieve for multiseries projects. If not provided an average plot for the first 1000 series will be retrieved.

  • resolution (string, optional) – Specifies the resolution at which the data should be binned. If not provided, an optimal resolution will be used to build chart data with a number of bins <= max_bin_size. One of dr.enums.DATETIME_TREND_PLOTS_RESOLUTION.

  • max_bin_size (Optional[int]) – An int between 1 and 1000, which specifies the maximum number of bins for the retrieval. Default is 500.

  • start_date (datetime.datetime, optional) – The start of the date range to return. If not specified, start date for requested plot will be used.

  • end_date (datetime.datetime, optional) – The end of the date range to return. If not specified, end date for requested plot will be used.

  • max_wait (int or None, optional) – The maximum time to wait for a compute job to complete before retrieving the plots. Default is dr.enums.DEFAULT_MAX_WAIT. If 0 or None, the plots would be retrieved without attempting the computation.

Returns:

plot – a ForecastVsActualPlot representing Forecast vs Actual plot

Return type:

ForecastVsActualPlot

Examples

import datarobot as dr
import pandas as pd
import matplotlib.pyplot as plt

model = dr.DatetimeModel(project_id=project_id, id=model_id)
plot = model.get_forecast_vs_actual_plot()
df = pd.DataFrame.from_dict(plot.bins)

# As an example, get the forecasts for the 10th point
forecast_point_index = 10
# Pad the forecasts for plotting. The forecasts length must match the df length
forecasts = [None] * forecast_point_index + df.forecasts[forecast_point_index]
forecasts = forecasts + [None] * (len(df) - len(forecasts))

plt.plot(df.start_date, df.actual, label="Actual")
plt.plot(df.start_date, forecasts, label="Forecast")
forecast_point = df.start_date[forecast_point_index]
plt.title("Forecast vs Actual (Forecast Point {})".format(forecast_point))
plt.legend()
plt.savefig("forecast_vs_actual.png")
get_forecast_vs_actual_plot_preview(backtest=0, source=SOURCE_TYPE.VALIDATION, series_id=None, max_wait=600)

Retrieve Forecast vs Actual preview plots for this model.

Added in version v2.25.

Parameters:
  • backtest (int or string, optional) – Retrieve plots for a specific backtest (use the backtest index starting from zero). To retrieve plots for holdout, use dr.enums.DATA_SUBSET.HOLDOUT

  • source (string, optional) – The source of the data for the backtest/holdout. Attribute must be one of dr.enums.SOURCE_TYPE

  • series_id (string, optional) – The name of the series to retrieve for multiseries projects. If not provided an average plot for the first 1000 series will be retrieved.

  • max_wait (int or None, optional) – The maximum time to wait for a compute job to complete before retrieving the plots. Default is dr.enums.DEFAULT_MAX_WAIT. If 0 or None, the plots will be retrieved without attempting the computation.

Returns:

plot – a ForecastVsActualPlotPreview representing the Forecast vs Actual plot preview

Return type:

ForecastVsActualPlotPreview

Examples

import datarobot as dr
import pandas as pd
model = dr.DatetimeModel(project_id=project_id, id=model_id)
plot = model.get_forecast_vs_actual_plot_preview()
df = pd.DataFrame.from_dict(plot.bins)
figure = df.plot("start_date", ["actual", "predicted"]).get_figure()
figure.savefig("forecast_vs_actual_preview.png")
get_anomaly_over_time_plots_metadata()

Retrieve Anomaly over Time plots metadata for this model.

Added in version v2.25.

Returns:

metadata – an AnomalyOverTimePlotsMetadata representing the Anomaly over Time plots metadata

Return type:

AnomalyOverTimePlotsMetadata

get_anomaly_over_time_plot(backtest=0, source=SOURCE_TYPE.VALIDATION, series_id=None, resolution=None, max_bin_size=None, start_date=None, end_date=None, max_wait=600)

Retrieve Anomaly over Time plots for this model.

Added in version v2.25.

Parameters:
  • backtest (int or string, optional) – Retrieve plots for a specific backtest (use the backtest index starting from zero). To retrieve plots for holdout, use dr.enums.DATA_SUBSET.HOLDOUT

  • source (string, optional) – The source of the data for the backtest/holdout. Attribute must be one of dr.enums.SOURCE_TYPE

  • series_id (string, optional) – The name of the series to retrieve for multiseries projects. If not provided, an average plot for the first 1000 series will be retrieved.

  • resolution (string, optional) – The resolution at which the data should be binned. If not provided, an optimal resolution will be used to build chart data with the number of bins <= max_bin_size. One of dr.enums.DATETIME_TREND_PLOTS_RESOLUTION.

  • max_bin_size (Optional[int]) – An int between 1 and 1000, which specifies the maximum number of bins for the retrieval. Default is 500.

  • start_date (datetime.datetime, optional) – The start of the date range to return. If not specified, the start date for the requested plot will be used.

  • end_date (datetime.datetime, optional) – The end of the date range to return. If not specified, the end date for the requested plot will be used.

  • max_wait (int or None, optional) – The maximum time to wait for a compute job to complete before retrieving the plots. Default is dr.enums.DEFAULT_MAX_WAIT. If 0 or None, the plots will be retrieved without attempting the computation.

Returns:

plot – an AnomalyOverTimePlot representing the Anomaly over Time plot

Return type:

AnomalyOverTimePlot

Examples

import datarobot as dr
import pandas as pd
model = dr.DatetimeModel(project_id=project_id, id=model_id)
plot = model.get_anomaly_over_time_plot()
df = pd.DataFrame.from_dict(plot.bins)
figure = df.plot("start_date", "predicted").get_figure()
figure.savefig("anomaly_over_time.png")
get_anomaly_over_time_plot_preview(prediction_threshold=0.5, backtest=0, source=SOURCE_TYPE.VALIDATION, series_id=None, max_wait=600)

Retrieve Anomaly over Time preview plots for this model.

Added in version v2.25.

Parameters:
  • prediction_threshold (Optional[float]) – Only bins with predictions exceeding this threshold will be returned in the response.

  • backtest (int or string, optional) – Retrieve plots for a specific backtest (use the backtest index starting from zero). To retrieve plots for holdout, use dr.enums.DATA_SUBSET.HOLDOUT

  • source (string, optional) – The source of the data for the backtest/holdout. Attribute must be one of dr.enums.SOURCE_TYPE

  • series_id (string, optional) – The name of the series to retrieve for multiseries projects. If not provided, an average plot for the first 1000 series will be retrieved.

  • max_wait (int or None, optional) – The maximum time to wait for a compute job to complete before retrieving the plots. Default is dr.enums.DEFAULT_MAX_WAIT. If 0 or None, the plots will be retrieved without attempting the computation.

Returns:

plot – an AnomalyOverTimePlotPreview representing the Anomaly over Time plot preview

Return type:

AnomalyOverTimePlotPreview

Examples

import datarobot as dr
import pandas as pd
import matplotlib.pyplot as plt

model = dr.DatetimeModel(project_id=project_id, id=model_id)
plot = model.get_anomaly_over_time_plot_preview(prediction_threshold=0.01)
df = pd.DataFrame.from_dict(plot.bins)
x = pd.date_range(
    plot.start_date, plot.end_date, freq=df.end_date[0] - df.start_date[0]
)
plt.plot(x, [0] * len(x), label="Date range")
plt.plot(df.start_date, [0] * len(df.start_date), "ro", label="Anomaly")
plt.yticks([])
plt.legend()
plt.savefig("anomaly_over_time_preview.png")
initialize_anomaly_assessment(backtest, source, series_id=None)

Initialize the anomaly assessment insight and calculate Shapley explanations for the most anomalous points in the subset. The insight is available for anomaly detection models in time series unsupervised projects which also support calculation of Shapley values.

Parameters:
  • backtest (int starting with 0 or "holdout") – The backtest to compute insight for.

  • source ("training" or "validation") – The source to compute insight for.

  • series_id (string) – Required for multiseries projects. The series ID to compute the insight for. For example, if there is a series column containing city names, a valid series ID to pass would be "Boston".

Return type:

AnomalyAssessmentRecord

get_anomaly_assessment_records(backtest=None, source=None, series_id=None, limit=100, offset=0, with_data_only=False)

Retrieve computed Anomaly Assessment records for this model. The model must be an anomaly detection model in a time series unsupervised project that also supports calculation of Shapley values.

Records can be filtered by backtest, source, and series_id, and the results can be limited.

Added in version v2.25.

Parameters:
  • backtest (int starting with 0 or "holdout") – The backtest of the data to filter records by.

  • source ("training" or "validation") – The source of the data to filter records by.

  • series_id (string) – The series id to filter records by.

  • limit (Optional[int])

  • offset (Optional[int])

  • with_data_only (Optional[bool]) – Whether to return only records with preview and explanations available. False by default.

Returns:

records – a list of AnomalyAssessmentRecord objects representing the Anomaly Assessment records

Return type:

list of AnomalyAssessmentRecord
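
Examples

The following is an illustrative sketch, assuming project_id and model_id identify an existing anomaly detection model in a time series unsupervised project and that "Boston" is a valid series ID for a multiseries project.

import datarobot as dr

model = dr.DatetimeModel(project_id=project_id, id=model_id)
# Compute the insight for backtest 0 on the validation data; series_id is
# required only for multiseries projects.
record = model.initialize_anomaly_assessment(
    backtest=0, source="validation", series_id="Boston"
)
# Later, list only the records that already have preview and explanation data.
records = model.get_anomaly_assessment_records(
    backtest=0, source="validation", with_data_only=True
)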

get_feature_impact(with_metadata=False, backtest=None, data_slice_filter=<datarobot.models.model.Sentinel object>)

Retrieve the computed Feature Impact results, a measure of the relevance of each feature in the model.

Feature Impact is computed for each column by creating new data with that column randomly permuted (but the others left unchanged), and seeing how the error metric score for the predictions is affected. The ‘impactUnnormalized’ is how much worse the error metric score is when making predictions on this modified data. The ‘impactNormalized’ is normalized so that the largest value is 1. In both cases, larger values indicate more important features.

If a feature is a redundant feature, i.e. once other features are considered it doesn’t contribute much in addition, the ‘redundantWith’ value is the name of the feature that has the highest correlation with this feature. Note that redundancy detection is only available for jobs run after the addition of this feature. When retrieving data that predates this functionality, a NoRedundancyImpactAvailable warning will be issued.

Elsewhere, this technique is sometimes called ‘Permutation Importance’.

Requires that Feature Impact has already been computed with request_feature_impact.

Parameters:
  • with_metadata (bool) – The flag indicating if the result should include the metadata as well.

  • backtest (int or string) – The index of the backtest unless it is holdout then it is string ‘holdout’. This is supported only in DatetimeModels

  • data_slice_filter (DataSlice, optional) – (New in version v3.4) A data slice used to filter the return values based on the dataslice.id. By default, this function will use data_slice_filter.id == None which returns an unsliced insight. If data_slice_filter is None then get_feature_impact will raise a ValueError.

Returns:

The feature impact data response depends on the with_metadata parameter. The response is either a dict with metadata and a list with the actual data, or just a list with that data.

Each list item is a dict with the keys featureName, impactNormalized, impactUnnormalized, redundantWith, and count.

For the dict response, the available keys are:

  • featureImpacts - Feature Impact data as a dictionary. Each item is a dict with the keys featureName, impactNormalized, impactUnnormalized, and redundantWith.

  • shapBased - A boolean that indicates whether Feature Impact was calculated using Shapley values.

  • ranRedundancyDetection - A boolean that indicates whether redundant feature identification was run while calculating this Feature Impact.

  • rowCount - An integer or None that indicates the number of rows that were used to calculate Feature Impact. For Feature Impact calculated with the default logic, without specifying the rowCount, None is returned here.

  • count - An integer with the number of features under featureImpacts.

Return type:

list or dict

Raises:

ClientError – If the feature impacts have not been computed.

request_feature_impact(row_count=None, with_metadata=False, backtest=None, data_slice_filter=<datarobot.models.model.Sentinel object>)

Request feature impacts to be computed for the model.

See get_feature_impact for more information on the result of the job.

Parameters:
  • row_count (int) – The sample size (specified in rows) to use for Feature Impact computation. This is not supported for unsupervised, multiclass (which has a separate method), or time series projects.

  • with_metadata (bool) – The flag indicating if the result should include the metadata as well.

  • backtest (int or string) – The index of the backtest, or the string ‘holdout’ for the holdout. This is supported only for DatetimeModels.

  • data_slice_filter (DataSlice, optional) – (New in version v3.4) A data slice used to filter the return values based on the dataslice.id. By default, this function will use data_slice_filter.id == None which returns an unsliced insight. If data_slice_filter is None then request_feature_impact will raise a ValueError.

Returns:

job – A Job representing the feature impact computation. To get the completed feature impact data, use job.get_result or job.get_result_when_complete.

Return type:

Job

Raises:

JobAlreadyRequested – If the feature impacts have already been requested.

get_or_request_feature_impact(max_wait=600, row_count=None, with_metadata=False, backtest=None, data_slice_filter=<datarobot.models.model.Sentinel object>)

Retrieve feature impact for the model, requesting a job if it hasn’t been run previously.

Parameters:
  • max_wait (Optional[int]) – The maximum time to wait for a requested feature impact job to complete before erroring.

  • row_count (int) – The sample size (specified in rows) to use for Feature Impact computation. This is not supported for unsupervised, multiclass (which has a separate method), or time series projects.

  • with_metadata (bool) – The flag indicating if the result should include the metadata as well.

  • backtest (str) – Feature Impact backtest. Can be ‘holdout’ or numbers from 0 up to max number of backtests in project.

  • data_slice_filter (DataSlice, optional) – (New in version v3.4) A data slice used to filter the return values based on the dataslice.id. By default, this function will use data_slice_filter.id == None which returns an unsliced insight. If data_slice_filter is None then get_or_request_feature_impact will raise a ValueError.

Returns:

feature_impacts – The feature impact data. See get_feature_impact for the exact schema.

Return type:

list or dict
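
Examples

A minimal sketch of retrieving Feature Impact, assuming project_id and model_id identify an existing model; the keys used below are the ones documented above.

import datarobot as dr

model = dr.Model.get(project_id, model_id)
# Request the computation if needed, wait for it, and fetch the results.
feature_impact = model.get_or_request_feature_impact(with_metadata=True)
for item in feature_impact["featureImpacts"]:
    print(item["featureName"], item["impactNormalized"])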

request_lift_chart(source=None, backtest_index=None, data_slice_filter=<datarobot.models.model.Sentinel object>)

(New in version v3.4) Request the model Lift Chart for the specified backtest data slice.

Parameters:
  • source (str) – (Deprecated in version v3.4) Lift chart data source. Check datarobot.enums.CHART_DATA_SOURCE for possible values. If backtest_index is present then this will be ignored.

  • backtest_index (str) – Lift chart data backtest. Can be ‘holdout’ or numbers from 0 up to max number of backtests in project.

  • data_slice_filter (DataSlice, optional) – A data slice used to filter the return values based on the dataslice.id. By default this function will use data_slice_filter.id == None which returns an unsliced insight. If data_slice_filter is None then request_lift_chart will raise a ValueError.

Returns:

status_check_job – Object contains all needed logic for a periodic status check of an async job.

Return type:

StatusCheckJob

get_lift_chart(source=None, backtest_index=None, fallback_to_parent_insights=False, data_slice_filter=<datarobot.models.model.Sentinel object>)

(New in version v3.4) Retrieve the model Lift chart for the specified backtest and data slice.

Parameters:
  • source (str) – (Deprecated in version v3.4) Lift chart data source. Check datarobot.enums.CHART_DATA_SOURCE for possible values. For time series and OTV models, also accepts values backtest_2, backtest_3, …, up to the number of backtests in the model. If backtest_index is present then this will be ignored.

  • backtest_index (str) – Lift chart data backtest. Can be ‘holdout’ or numbers from 0 up to max number of backtests in project.

  • fallback_to_parent_insights (bool) – Optional, if True, this will return lift chart data for this model’s parent if the lift chart is not available for this model and the model has a defined parent model. If omitted or False, or there is no parent model, will not attempt to return insight data from this model’s parent.

  • data_slice_filter (DataSlice, optional) – A data slice used to filter the return values based on the dataslice.id. By default this function will use data_slice_filter.id == None which returns an unsliced insight. If data_slice_filter is None then get_lift_chart will raise a ValueError.

Returns:

Model lift chart data

Return type:

LiftChart

Raises:
  • ClientError – If the insight is not available for this model

  • ValueError – If data_slice_filter is passed as None
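
Examples

An illustrative sketch of requesting and then retrieving a Lift chart for the holdout backtest of a datetime model; project_id and model_id are assumptions.

import datarobot as dr

model = dr.DatetimeModel(project_id=project_id, id=model_id)
# Kick off the computation; poll the returned StatusCheckJob until it finishes.
status_check_job = model.request_lift_chart(backtest_index="holdout")
# Once the insight is computed, retrieve it (unsliced by default).
lift_chart = model.get_lift_chart(backtest_index="holdout")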

request_roc_curve(source=None, backtest_index=None, data_slice_filter=<datarobot.models.model.Sentinel object>)

(New in version v3.4) Request the binary model Roc Curve for the specified backtest and data slice.

Parameters:
  • source (str) – (Deprecated in version v3.4) Roc Curve data source. Check datarobot.enums.CHART_DATA_SOURCE for possible values. If backtest_index is present then this will be ignored.

  • backtest_index (str) – ROC curve data backtest. Can be ‘holdout’ or numbers from 0 up to max number of backtests in project.

  • data_slice_filter (DataSlice, optional) – A data slice used to filter the return values based on the dataslice.id. By default this function will use data_slice_filter.id == None which returns an unsliced insight. If data_slice_filter is None then request_roc_curve will raise a ValueError.

Returns:

status_check_job – Object contains all needed logic for a periodic status check of an async job.

Return type:

StatusCheckJob

get_roc_curve(source=None, backtest_index=None, fallback_to_parent_insights=False, data_slice_filter=<datarobot.models.model.Sentinel object>)

(New in version v3.4) Retrieve the ROC curve for a binary model for the specified backtest and data slice.

Parameters:
  • source (str) – (Deprecated in version v3.4) ROC curve data source. Check datarobot.enums.CHART_DATA_SOURCE for possible values. For time series and OTV models, also accepts values backtest_2, backtest_3, …, up to the number of backtests in the model. If backtest_index is present then this will be ignored.

  • backtest_index (str) – ROC curve data backtest. Can be ‘holdout’ or numbers from 0 up to max number of backtests in project.

  • fallback_to_parent_insights (bool) – Optional, if True, this will return ROC curve data for this model’s parent if the ROC curve is not available for this model and the model has a defined parent model. If omitted or False, or there is no parent model, will not attempt to return data from this model’s parent.

  • data_slice_filter (DataSlice, optional) – A data slice used to filter the return values based on the dataslice.id. By default, this function will use data_slice_filter.id == None which returns an unsliced insight. If data_slice_filter is None then get_roc_curve will raise a ValueError.

Returns:

Model ROC curve data

Return type:

RocCurve

Raises:
  • ClientError – If the insight is not available for this model

  • TypeError – If the underlying project type is multilabel

  • ValueError – If data_slice_filter is passed as None
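
Examples

An illustrative sketch, assuming a binary classification datetime model whose ROC curve insight has already been computed; roc_points holds the per-threshold statistics of the returned RocCurve.

import datarobot as dr
import pandas as pd

model = dr.DatetimeModel(project_id=project_id, id=model_id)
roc = model.get_roc_curve(backtest_index="0")
# Inspect the per-threshold statistics as a DataFrame.
df = pd.DataFrame(roc.roc_points)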

advanced_tune(params, description=None)

Generate a new model with the specified advanced-tuning parameters

As of v2.17, all models other than blenders, open source, prime, baseline and user-created support Advanced Tuning.

Parameters:
  • params (dict) – Mapping of parameter ID to parameter value. The list of valid parameter IDs for a model can be found by calling get_advanced_tuning_parameters(). This endpoint does not need to include values for all parameters. If a parameter is omitted, its current_value will be used.

  • description (str) – Human-readable string describing the newly advanced-tuned model

Returns:

The created job to build the model

Return type:

ModelJob
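
Examples

A minimal sketch combining get_advanced_tuning_parameters and advanced_tune; the parameter chosen and the value passed are placeholders, and a real value must respect the constraints described under get_advanced_tuning_parameters.

import datarobot as dr

model = dr.Model.get(project_id, model_id)
tuning_info = model.get_advanced_tuning_parameters()
# Override a single parameter; omitted parameters keep their current_value.
param = tuning_info["tuning_parameters"][0]
job = model.advanced_tune(
    params={param["parameter_id"]: param["default_value"]},
    description="Retrain with a hand-picked parameter value",
)
tuned_model = job.get_result_when_complete()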

delete()

Delete a model from the project’s leaderboard.

Return type:

None

download_scoring_code(file_name, source_code=False)

Download the Scoring Code JAR.

Parameters:
  • file_name (str) – File path where scoring code will be saved.

  • source_code (Optional[bool]) – Set to True to download source code archive. It will not be executable.

Return type:

None
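
Examples

A minimal sketch; the model is assumed to support Scoring Code generation.

import datarobot as dr

model = dr.Model.get(project_id, model_id)
# Save the compiled Scoring Code JAR locally.
model.download_scoring_code("scoring_code.jar")
# Or download the (non-executable) source code archive instead.
model.download_scoring_code("scoring_code_source.jar", source_code=True)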

download_training_artifact(file_name)

Retrieve trained artifact(s) from a model containing one or more custom tasks.

Artifact(s) will be downloaded to the specified local filepath.

Parameters:

file_name (str) – File path where trained model artifact(s) will be saved.

classmethod from_data(data)

Instantiate an object of this class using a dict.

Parameters:

data (dict) – Correctly snake_cased keys and their values.

Return type:

TypeVar(T, bound= APIObject)

get_advanced_tuning_parameters()

Get the advanced-tuning parameters available for this model.

As of v2.17, all models other than blenders, open source, prime, baseline and user-created support Advanced Tuning.

Returns:

A dictionary describing the advanced-tuning parameters for the current model. There are two top-level keys, tuning_description and tuning_parameters.

tuning_description is an optional value. If not None, it is the user-specified description of this set of tuning parameters.

tuning_parameters is a list of dicts, each of which has the following keys

  • parameter_name : (str) name of the parameter (unique per task, see below)

  • parameter_id : (str) opaque ID string uniquely identifying parameter

  • default_value : (*) the actual value used to train the model; either the single value of the parameter specified before training, or the best value from the list of grid-searched values (based on current_value)

  • current_value : (*) the single value or list of values of the parameter that were grid searched. Depending on the grid search specification, could be a single fixed value (no grid search), a list of discrete values, or a range.

  • task_name : (str) name of the task that this parameter belongs to

  • constraints: (dict) see the notes below

  • vertex_id: (str) ID of vertex that this parameter belongs to

Return type:

dict

Notes

The type of default_value and current_value is defined by the constraints structure. It will be a string or numeric Python type.

constraints is a dict with at least one, possibly more, of the following keys. The presence of a key indicates that the parameter may take on the specified type. (If a key is absent, this means that the parameter may not take on the specified type.) If a key on constraints is present, its value will be a dict containing all of the fields described below for that key.

"constraints": {
    "select": {
        "values": [<list(basestring or number) : possible values>]
    },
    "ascii": {},
    "unicode": {},
    "int": {
        "min": <int : minimum valid value>,
        "max": <int : maximum valid value>,
        "supports_grid_search": <bool : True if Grid Search may be
                                        requested for this param>
    },
    "float": {
        "min": <float : minimum valid value>,
        "max": <float : maximum valid value>,
        "supports_grid_search": <bool : True if Grid Search may be
                                        requested for this param>
    },
    "intList": {
        "min_length": <int : minimum valid length>,
        "max_length": <int : maximum valid length>
        "min_val": <int : minimum valid value>,
        "max_val": <int : maximum valid value>
        "supports_grid_search": <bool : True if Grid Search may be
                                        requested for this param>
    },
    "floatList": {
        "min_length": <int : minimum valid length>,
        "max_length": <int : maximum valid length>
        "min_val": <float : minimum valid value>,
        "max_val": <float : maximum valid value>
        "supports_grid_search": <bool : True if Grid Search may be
                                        requested for this param>
    }
}

The keys have meaning as follows:

  • select: Rather than specifying a specific data type, if present, it indicates that the parameter is permitted to take on any of the specified values. Listed values may be of any string or real (non-complex) numeric type.

  • ascii: The parameter may be a unicode object that encodes simple ASCII characters. (A-Z, a-z, 0-9, whitespace, and certain common symbols.) In addition to listed constraints, ASCII keys currently may not contain either newlines or semicolons.

  • unicode: The parameter may be any Python unicode object.

  • int: The value may be an object of type int within the specified range (inclusive). Please note that the value will be passed around using the JSON format, and some JSON parsers have undefined behavior with integers outside of the range [-(2**53)+1, (2**53)-1].

  • float: The value may be an object of type float within the specified range (inclusive).

  • intList, floatList: The value may be a list of int or float objects, respectively, following constraints as specified respectively by the int and float types (above).

Many parameters only specify one key under constraints. If a parameter specifies multiple keys, the parameter may take on any value permitted by any key.

get_all_confusion_charts(fallback_to_parent_insights=False)

Retrieve a list of all confusion matrices available for the model.

Parameters:

fallback_to_parent_insights (bool) – (New in version v2.14) Optional, if True, this will return confusion chart data for this model’s parent for any source that is not available for this model and if this model has a defined parent model. If omitted or False, or this model has no parent, this will not attempt to retrieve any data from this model’s parent.

Returns:

Data for all available confusion charts for model.

Return type:

list of ConfusionChart

get_all_feature_impacts(data_slice_filter=None)

Retrieve a list of all feature impact results available for the model.

Parameters:

data_slice_filter (DataSlice, optional) – A dataslice used to filter the return values based on the dataslice.id. By default, this function will use data_slice_filter.id == None which returns an unsliced insight. If data_slice_filter is None then no data_slice filtering will be applied when requesting the feature impacts.

Returns:

Data for all available model feature impacts, or an empty list if no data is found.

Return type:

list of dicts

Examples

model = datarobot.Model(id='model-id', project_id='project-id')

# Get feature impact insights for sliced data
data_slice = datarobot.DataSlice(id='data-slice-id')
sliced_fi = model.get_all_feature_impacts(data_slice_filter=data_slice)

# Get feature impact insights for unsliced data
data_slice = datarobot.DataSlice()
unsliced_fi = model.get_all_feature_impacts(data_slice_filter=data_slice)

# Get all feature impact insights
all_fi = model.get_all_feature_impacts()
get_all_lift_charts(fallback_to_parent_insights=False, data_slice_filter=None)

Retrieve a list of all Lift charts available for the model.

Parameters:
  • fallback_to_parent_insights (Optional[bool]) – (New in version v2.14) Optional, if True, this will return lift chart data for this model’s parent for any source that is not available for this model and if this model has a defined parent model. If omitted or False, or this model has no parent, this will not attempt to retrieve any data from this model’s parent.

  • data_slice_filter (DataSlice, optional) – Filters the returned lift chart by data_slice_filter.id. If None (the default) applies no filter based on data_slice_id.

Returns:

Data for all available model lift charts, or an empty list if no data is found.

Return type:

list of LiftChart

Examples

model = datarobot.Model.get('project-id', 'model-id')

# Get lift chart insights for sliced data
data_slice = datarobot.DataSlice(id='data-slice-id')
sliced_lift_charts = model.get_all_lift_charts(data_slice_filter=data_slice)

# Get lift chart insights for unsliced data
unsliced_filter = datarobot.DataSlice(id=None)
unsliced_lift_charts = model.get_all_lift_charts(data_slice_filter=unsliced_filter)

# Get all lift chart insights
all_lift_charts = model.get_all_lift_charts()
get_all_multiclass_lift_charts(fallback_to_parent_insights=False, data_slice_filter=<datarobot.models.model.Sentinel object>, target_class=None)

Retrieve a list of all Lift charts available for the model.

Parameters:
  • fallback_to_parent_insights (bool) – (New in version v2.14) Optional, if True, this will return lift chart data for this model’s parent for any source that is not available for this model and if this model has a defined parent model. If omitted or False, or this model has no parent, this will not attempt to retrieve any data from this model’s parent.

  • data_slice_filter (DataSlice, optional) – A dataslice used to filter the return values based on the dataslice.id. By default this function will use data_slice_filter.id == None which returns an unsliced insight. If data_slice_filter is None then get_lift_chart will raise a ValueError.

  • target_class (str, optional) – Lift chart target class name.

Returns:

Data for all available model lift charts.

Return type:

list of LiftChart

get_all_residuals_charts(fallback_to_parent_insights=False, data_slice_filter=None)

Retrieve a list of all residuals charts available for the model.

Parameters:
  • fallback_to_parent_insights (bool) – Optional, if True, this will return residuals chart data for this model’s parent for any source that is not available for this model and if this model has a defined parent model. If omitted or False, or this model has no parent, this will not attempt to retrieve any data from this model’s parent.

  • data_slice_filter (DataSlice, optional) – Filters the returned residuals charts by data_slice_filter.id. If None (the default) applies no filter based on data_slice_id.

Returns:

Data for all available model residuals charts.

Return type:

list of ResidualsChart

Examples

model = datarobot.Model.get('project-id', 'model-id')

# Get residuals chart insights for sliced data
data_slice = datarobot.DataSlice(id='data-slice-id')
sliced_residuals_charts = model.get_all_residuals_charts(data_slice_filter=data_slice)

# Get residuals chart insights for unsliced data
unsliced_filter = datarobot.DataSlice(id=None)
unsliced_residuals_charts = model.get_all_residuals_charts(data_slice_filter=unsliced_filter)

# Get all residuals chart insights
all_residuals_charts = model.get_all_residuals_charts()
get_all_roc_curves(fallback_to_parent_insights=False, data_slice_filter=None)

Retrieve a list of all ROC curves available for the model.

Parameters:
  • fallback_to_parent_insights (bool) – (New in version v2.14) Optional, if True, this will return ROC curve data for this model’s parent for any source that is not available for this model and if this model has a defined parent model. If omitted or False, or this model has no parent, this will not attempt to retrieve any data from this model’s parent.

  • data_slice_filter (DataSlice, optional) – filters the returned roc_curve by data_slice_filter.id. If None (the default) applies no filter based on data_slice_id.

Returns:

Data for all available model ROC curves. Or an empty list if no RocCurves are found.

Return type:

list of RocCurve

Examples

model = datarobot.Model.get('project-id', 'model-id')
ds_filter = datarobot.DataSlice(id='data-slice-id')

# Get roc curve insights for sliced data
sliced_roc = model.get_all_roc_curves(data_slice_filter=ds_filter)

# Get roc curve insights for unsliced data
unsliced_filter = datarobot.DataSlice(id=None)
unsliced_roc = model.get_all_roc_curves(data_slice_filter=unsliced_filter)

# Get all roc curve insights
all_roc_curves = model.get_all_roc_curves()
get_confusion_chart(source, fallback_to_parent_insights=False)

Retrieve a multiclass model’s confusion matrix for the specified source.

Parameters:
  • source (str) – Confusion chart source. Check datarobot.enums.CHART_DATA_SOURCE for possible values.

  • fallback_to_parent_insights (bool) – (New in version v2.14) Optional, if True, this will return confusion chart data for this model’s parent if the confusion chart is not available for this model and the model has a defined parent model. If omitted or False, or there is no parent model, will not attempt to return insight data from this model’s parent.

Returns:

Model ConfusionChart data

Return type:

ConfusionChart

Raises:

ClientError – If the insight is not available for this model

get_cross_class_accuracy_scores()

Retrieves a list of Cross Class Accuracy scores for the model.

Return type:

json

get_data_disparity_insights(feature, class_name1, class_name2)

Retrieve a list of Cross Class Data Disparity insights for the model.

Parameters:
  • feature (str) – Bias and Fairness protected feature name.

  • class_name1 (str) – One of the compared classes

  • class_name2 (str) – Another compared class

Return type:

json

get_fairness_insights(fairness_metrics_set=None, offset=0, limit=100)

Retrieve a list of Per Class Bias insights for the model.

Parameters:
  • fairness_metrics_set (Optional[str]) – Can be one of <datarobot.enums.FairnessMetricsSet>. The fairness metric used to calculate the fairness scores.

  • offset (Optional[int]) – Number of items to skip.

  • limit (Optional[int]) – Number of items to return.

Return type:

json

get_features_used()

Query the server to determine which features were used.

Note that the data returned by this method is possibly different than the names of the features in the featurelist used by this model. This method will return the raw features that must be supplied in order for predictions to be generated on a new set of data. The featurelist, in contrast, would also include the names of derived features.

Returns:

features – The names of the features used in the model.

Return type:

List[str]

get_frozen_child_models()

Retrieve the IDs for all models that are frozen from this model.

Return type:

A list of Models

get_labelwise_roc_curves(source, fallback_to_parent_insights=False)

Retrieve a list of LabelwiseRocCurve instances for a multilabel model for the given source and all labels. This method is valid only for multilabel projects. For binary projects, use the Model.get_roc_curve API.

Added in version v2.24.

Parameters:
  • source (str) – ROC curve data source. Check datarobot.enums.CHART_DATA_SOURCE for possible values.

  • fallback_to_parent_insights (bool) – Optional, if True, this will return ROC curve data for this model’s parent if the ROC curve is not available for this model and the model has a defined parent model. If omitted or False, or there is no parent model, will not attempt to return data from this model’s parent.

Returns:

Labelwise ROC Curve instances for source and all labels

Return type:

list of LabelwiseRocCurve

Raises:

ClientError – If the insight is not available for this model

get_missing_report_info()

Retrieve a report on missing training data that can be used to understand missing values treatment in the model. The report consists of missing values resolutions for numeric or categorical features that were part of building the model.

Returns:

The queried model missing report, sorted by missing count (DESCENDING order).

Return type:

An iterable of MissingReportPerFeature

get_model_blueprint_chart()

Retrieve a diagram that can be used to understand data flow in the blueprint.

Returns:

The queried model blueprint chart.

Return type:

ModelBlueprintChart

get_model_blueprint_documents()

Get documentation for tasks used in this model.

Returns:

All documents available for the model.

Return type:

list of BlueprintTaskDocument

get_model_blueprint_json()

Get the blueprint json representation used by this model.

Returns:

Json representation of the blueprint stages.

Return type:

BlueprintJson

get_multiclass_feature_impact()

For multiclass it’s possible to calculate feature impact separately for each target class. The method for calculation is exactly the same, calculated in one-vs-all style for each target class.

Requires that Feature Impact has already been computed with request_feature_impact.

Returns:

feature_impacts – The feature impact data. Each item is a dict with the keys ‘featureImpacts’ (list), ‘class’ (str). Each item in ‘featureImpacts’ is a dict with the keys ‘featureName’, ‘impactNormalized’, and ‘impactUnnormalized’, and ‘redundantWith’.

Return type:

list of dict

Raises:

ClientError – If the multiclass feature impacts have not been computed.

get_multiclass_lift_chart(source, fallback_to_parent_insights=False, data_slice_filter=<datarobot.models.model.Sentinel object>, target_class=None)

Retrieve model Lift chart for the specified source.

Parameters:
  • source (str) – Lift chart data source. Check datarobot.enums.CHART_DATA_SOURCE for possible values.

  • fallback_to_parent_insights (bool) – Optional, if True, this will return lift chart data for this model’s parent if the lift chart is not available for this model and the model has a defined parent model. If omitted or False, or there is no parent model, will not attempt to return insight data from this model’s parent.

  • data_slice_filter (DataSlice, optional) – A dataslice used to filter the return values based on the dataslice.id. By default this function will use data_slice_filter.id == None which returns an unsliced insight. If data_slice_filter is None then get_lift_chart will raise a ValueError.

  • target_class (str, optional) – Lift chart target class name.

Returns:

Model lift chart data for each saved target class

Return type:

list of LiftChart

Raises:

ClientError – If the insight is not available for this model

get_multilabel_lift_charts(source, fallback_to_parent_insights=False)

Retrieve model Lift charts for the specified source.

Added in version v2.24.

Parameters:
  • source (str) – Lift chart data source. Check datarobot.enums.CHART_DATA_SOURCE for possible values.

  • fallback_to_parent_insights (bool) – Optional, if True, this will return lift chart data for this model’s parent if the lift chart is not available for this model and the model has a defined parent model. If omitted or False, or there is no parent model, will not attempt to return insight data from this model’s parent.

Returns:

Model lift chart data for each saved target class

Return type:

list of LiftChart

Raises:

ClientError – If the insight is not available for this model

get_num_iterations_trained()

Retrieves the number of estimators trained by early-stopping tree-based models.

Added in version v2.22.

Returns:

  • projectId (str) – id of project containing the model

  • modelId (str) – id of the model

  • data (array) – list of numEstimatorsItem objects, one for each modeling stage. Each numEstimatorsItem is of the form:

  • stage (str) – indicates the modeling stage (for multi-stage models); None for single-stage models

  • numIterations (int) – the number of estimators or iterations trained by the model

get_parameters()

Retrieve model parameters.

Returns:

Model parameters for this model.

Return type:

ModelParameters

get_pareto_front()

Retrieve the Pareto Front for a Eureqa model.

This method is only supported for Eureqa models.

Returns:

Model ParetoFront data

Return type:

ParetoFront

get_prime_eligibility()

Check if this model can be approximated with DataRobot Prime

Returns:

prime_eligibility – a dict indicating whether a model can be approximated with DataRobot Prime (key can_make_prime) and why it may be ineligible (key message)

Return type:

dict

get_residuals_chart(source, fallback_to_parent_insights=False, data_slice_filter=<datarobot.models.model.Sentinel object>)

Retrieve model residuals chart for the specified source.

Parameters:
  • source (str) – Residuals chart data source. Check datarobot.enums.CHART_DATA_SOURCE for possible values.

  • fallback_to_parent_insights (bool) – Optional, if True, this will return residuals chart data for this model’s parent if the residuals chart is not available for this model and the model has a defined parent model. If omitted or False, or there is no parent model, will not attempt to return residuals data from this model’s parent.

  • data_slice_filter (DataSlice, optional) – A dataslice used to filter the return values based on the dataslice.id. By default this function will use data_slice_filter.id == None which returns an unsliced insight. If data_slice_filter is None then get_residuals_chart will raise a ValueError.

Returns:

Model residuals chart data

Return type:

ResidualsChart

Raises:
  • ClientError – If the insight is not available for this model

  • ValueError – If data_slice_filter passed as None

get_rulesets()

List the rulesets approximating this model generated by DataRobot Prime

If this model hasn’t been approximated yet, will return an empty list. Note that these are rulesets approximating this model, not rulesets used to construct this model.

Returns:

rulesets

Return type:

list of Ruleset

get_supported_capabilities()

Retrieves a summary of the capabilities supported by a model.

Added in version v2.14.

Returns:

  • supportsBlending (bool) – whether the model supports blending

  • supportsMonotonicConstraints (bool) – whether the model supports monotonic constraints

  • hasWordCloud (bool) – whether the model has word cloud data available

  • eligibleForPrime (bool) – (Deprecated in version v3.6) whether the model is eligible for Prime

  • hasParameters (bool) – whether the model has parameters that can be retrieved

  • supportsCodeGeneration (bool) – (New in version v2.18) whether the model supports code generation

  • supportsShap (bool) – (New in version v2.18) True if the model supports the Shapley package, i.e. Shapley-based feature importance

  • supportsEarlyStopping (bool) – (New in version v2.22) True if this is an early stopping tree-based model and number of trained iterations can be retrieved.

get_uri()
Returns:

url – Permanent static hyperlink to this model on the leaderboard.

Return type:

str

get_word_cloud(exclude_stop_words=False)

Retrieve word cloud data for the model.

Parameters:

exclude_stop_words (Optional[bool]) – Set to True if you want stopwords filtered out of response.

Returns:

Word cloud data for the model.

Return type:

WordCloud

incremental_train(data_stage_id, training_data_name=None)

Submit a job to the queue to perform incremental training on an existing model. See train_incremental documentation.

Return type:

ModelJob

classmethod list(project_id, sort_by_partition='validation', sort_by_metric=None, with_metric=None, search_term=None, featurelists=None, families=None, blueprints=None, labels=None, characteristics=None, training_filters=None, number_of_clusters=None, limit=100, offset=0)

Retrieve paginated model records, sorted by scores, with optional filtering.

Parameters:
  • sort_by_partition (str, one of validation, backtesting, crossValidation or holdout) – Set the partition to use for sorted (by score) list of models. validation is the default.

  • sort_by_metric (str) – Set the project metric to use for model sorting. DataRobot-selected project optimization metric is the default.

  • with_metric (str) – For a single-metric list of results, specify that project metric.

  • search_term (str) – If specified, only models containing the term in their name or processes are returned.

  • featurelists (List[str]) – If specified, only models trained on selected featurelists are returned.

  • families (List[str]) – If specified, only models belonging to selected families are returned.

  • blueprints (List[str]) – If specified, only models trained on specified blueprint IDs are returned.

  • labels (List[str], starred or prepared for deployment) – If specified, only models tagged with all listed labels are returned.

  • characteristics (List[str]) – If specified, only models matching all listed characteristics are returned.

  • training_filters (List[str]) – If specified, only models matching at least one of the listed training conditions are returned. The following formats are supported. For AutoML and datetime partitioned projects: the number of rows in the training subset. For datetime partitioned projects only: a training duration, for example P6Y0M0D; a <training_duration>-<time_window_sample_percent>-<sampling_method> string, for example P6Y0M0D-78-Random (returns models trained on 6 years of data, 78% sampling rate, random sampling); a start/end date; or project settings.

  • number_of_clusters (list of int) – Filter models by number of clusters. Applicable only in unsupervised clustering projects.

  • limit (int)

  • offset (int)

Returns:

generic_models

Return type:

list of GenericModel
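
Examples

An illustrative sketch of listing models with filters; project_id is an assumption.

import datarobot as dr

# Top 10 starred models, sorted by their holdout partition score.
models = dr.Model.list(
    project_id,
    sort_by_partition="holdout",
    labels=["starred"],
    limit=10,
)
for model_record in models:
    print(model_record.id)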

open_in_browser()

Opens class’ relevant web browser location. If default browser is not available the URL is logged.

Note: If text-mode browsers are used, the calling process will block until the user exits the browser.

Return type:

None

request_approximation()

Request an approximation of this model using DataRobot Prime

This will create several rulesets that could be used to approximate this model. After comparing their scores and rule counts, the code used in the approximation can be downloaded and run locally.

Returns:

job – the job generating the rulesets

Return type:

Job

request_cross_class_accuracy_scores()

Request Cross Class Accuracy scores to be computed for the model.

Returns:

status_id – A statusId of computation request.

Return type:

str

request_data_disparity_insights(feature, compared_class_names)

Request data disparity insights to be computed for the model.

Parameters:
  • feature (str) – Bias and Fairness protected feature name.

  • compared_class_names (list(str)) – List of two classes to compare

Returns:

status_id – A statusId of computation request.

Return type:

str

request_external_test(dataset_id, actual_value_column=None)

Request external test to compute scores and insights on an external test dataset

Parameters:
  • dataset_id (string) – The dataset to make predictions against (as uploaded from Project.upload_dataset)

  • actual_value_column (string, optional) – (New in version v2.21) For time series unsupervised projects only. Actual value column can be used to calculate the classification metrics and insights on the prediction dataset. Can’t be provided with the forecast_point parameter.

Returns:

job – a Job representing external dataset insights computation

Return type:

Job
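
Examples

A minimal sketch; dataset_id is assumed to reference a prediction dataset uploaded via Project.upload_dataset.

import datarobot as dr

model = dr.Model.get(project_id, model_id)
job = model.request_external_test(dataset_id=dataset_id)
# Block until the external test scores and insights are computed.
job.wait_for_completion()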

request_fairness_insights(fairness_metrics_set=None)

Request fairness insights to be computed for the model.

Parameters:

fairness_metrics_set (Optional[str]) – Can be one of <datarobot.enums.FairnessMetricsSet>. The fairness metric used to calculate the fairness scores.

Returns:

status_id – A statusId of computation request.

Return type:

str

request_frozen_datetime_model(training_row_count=None, training_duration=None, training_start_date=None, training_end_date=None, time_window_sample_pct=None, sampling_method=None)

Train a new frozen model with parameters from this model.

Requires that this model belongs to a datetime partitioned project. If it does not, an error will occur when submitting the job.

Frozen models use the same tuning parameters as their parent model instead of independently optimizing them, which allows efficiently retraining models on larger amounts of the training data.

In addition to training_row_count and training_duration, frozen datetime models may be trained on an exact date range. Only one of training_row_count, training_duration, or training_start_date and training_end_date should be specified.

Models specified using training_start_date and training_end_date are the only ones that can be trained into the holdout data (once the holdout is unlocked).

All durations should be specified with a duration string such as those returned by the partitioning_methods.construct_duration_string helper method. Please see datetime partitioned project documentation for more information on duration strings.

Parameters:
  • training_row_count (Optional[int]) – the number of rows of data that should be used to train the model. If specified, training_duration may not be specified.

  • training_duration (Optional[str]) – a duration string specifying what time range the data used to train the model should span. If specified, training_row_count may not be specified.

  • training_start_date (datetime.datetime, optional) – the start date of the data to train to model on. Only rows occurring at or after this datetime will be used. If training_start_date is specified, training_end_date must also be specified.

  • training_end_date (datetime.datetime, optional) – the end date of the data to train the model on. Only rows occurring strictly before this datetime will be used. If training_end_date is specified, training_start_date must also be specified.

  • time_window_sample_pct (Optional[int]) – may only be specified when the requested model is a time window (e.g. duration or start and end dates). An integer between 1 and 99 indicating the percentage to sample by within the window. The points kept are determined by a random uniform sample. If specified, training_duration must also be specified; otherwise, an error will occur.

  • sampling_method (Optional[str]) – (New in version v2.23) defines the way training data is selected. Can be either random or latest. In combination with training_row_count defines how rows are selected from backtest (latest by default). When training data is defined using time range (training_duration or use_project_settings) this setting changes the way time_window_sample_pct is applied (random by default). Applicable to OTV projects only.

Returns:

model_job – the modeling job training a frozen model

Return type:

ModelJob
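
Examples

A minimal sketch; the duration string follows the format described above, and project_id and model_id are assumptions.

import datarobot as dr

model = dr.DatetimeModel(project_id=project_id, id=model_id)
# Retrain a frozen copy of this model on the most recent year of data.
model_job = model.request_frozen_datetime_model(training_duration="P1Y0M0D")
frozen_model = model_job.get_result_when_complete()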

request_per_class_fairness_insights(fairness_metrics_set=None)

Request per-class fairness insights be computed for the model.

Parameters:

fairness_metrics_set (Optional[str]) – The fairness metric used to calculate the fairness scores. Value can be any one of <datarobot.enums.FairnessMetricsSet>.

Returns:

status_check_job – The returned object contains all needed logic for a periodic status check of an async job.

Return type:

StatusCheckJob

request_predictions(dataset_id=None, dataset=None, dataframe=None, file_path=None, file=None, include_prediction_intervals=None, prediction_intervals_size=None, forecast_point=None, predictions_start_date=None, predictions_end_date=None, actual_value_column=None, explanation_algorithm=None, max_explanations=None, max_ngram_explanations=None)

Requests predictions against a previously uploaded dataset.

Parameters:
  • dataset_id (string, optional) – The ID of the dataset to make predictions against (as uploaded from Project.upload_dataset)

  • dataset (Dataset, optional) – The dataset to make predictions against (as uploaded from Project.upload_dataset)

  • dataframe (pd.DataFrame, optional) – (New in v3.0) The dataframe to make predictions against

  • file_path (Optional[str]) – (New in v3.0) Path to file to make predictions against

  • file (IOBase, optional) – (New in v3.0) File to make predictions against

  • include_prediction_intervals (Optional[bool]) – (New in v2.16) For time series projects only. Specifies whether prediction intervals should be calculated for this request. Defaults to True if prediction_intervals_size is specified, otherwise defaults to False.

  • prediction_intervals_size (Optional[int]) – (New in v2.16) For time series projects only. Represents the percentile to use for the size of the prediction intervals. Defaults to 80 if include_prediction_intervals is True. Prediction intervals size must be between 1 and 100 (inclusive).

  • forecast_point (datetime.datetime or None, optional) – (New in version v2.20) For time series projects only. This is the default point relative to which predictions will be generated, based on the forecast window of the project. See the time series prediction documentation for more information.

  • predictions_start_date (datetime.datetime or None, optional) – (New in version v2.20) For time series projects only. The start date for bulk predictions. Note that this parameter is for generating historical predictions using the training data. This parameter should be provided in conjunction with predictions_end_date. Can’t be provided with the forecast_point parameter.

  • predictions_end_date (datetime.datetime or None, optional) – (New in version v2.20) For time series projects only. The end date for bulk predictions, exclusive. Note that this parameter is for generating historical predictions using the training data. This parameter should be provided in conjunction with predictions_start_date. Can’t be provided with the forecast_point parameter.

  • actual_value_column (string, optional) – (New in version v2.21) For time series unsupervised projects only. Actual value column can be used to calculate the classification metrics and insights on the prediction dataset. Can’t be provided with the forecast_point parameter.

  • explanation_algorithm (Optional[str]) – (New in version v2.21) If set to 'shap', the response will include prediction explanations based on the SHAP explainer (SHapley Additive exPlanations). Defaults to null (no prediction explanations).

  • max_explanations (Optional[int]) – (New in version v2.21) Specifies the maximum number of explanation values that should be returned for each row, ordered by absolute value, greatest to least. If null, no limit. In the case of ‘shap’: if the number of features is greater than the limit, the sum of remaining values will also be returned as shapRemainingTotal. Defaults to null. Cannot be set if explanation_algorithm is omitted.

  • max_ngram_explanations (int or str, optional) – (New in version v2.29) Specifies the maximum number of text explanation values that should be returned. If set to all, text explanations will be computed and all the ngram explanations will be returned. If set to a nonzero positive integer value, text explanations will be computed and this number of ngram explanations, sorted in descending order, will be returned. By default, text explanations will not be computed.

Returns:

job – The job computing the predictions

Return type:

PredictJob
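
Examples

A minimal sketch of scoring an already-uploaded dataset; dataset_id is assumed to come from Project.upload_dataset.

import datarobot as dr

model = dr.Model.get(project_id, model_id)
predict_job = model.request_predictions(dataset_id=dataset_id)
# Block until the predictions are ready and fetch them.
predictions = predict_job.get_result_when_complete()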

request_residuals_chart(source, data_slice_id=None)

Request the model residuals chart for the specified source.

Parameters:
  • source (str) – Residuals chart data source. Check datarobot.enums.CHART_DATA_SOURCE for possible values.

  • data_slice_id (string, optional) – ID for the data slice used in the request. If None, request unsliced insight data.

Returns:

status_check_job – Object contains all needed logic for a periodic status check of an async job.

Return type:

StatusCheckJob

set_prediction_threshold(threshold)

Set a custom prediction threshold for the model.

May not be used once prediction_threshold_read_only is True for this model.

Parameters:

threshold (float) – Only used for binary classification projects. The threshold to use when deciding between the positive and negative classes when making predictions. Should be between 0.0 and 1.0 (inclusive).
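
Examples

A minimal sketch; only meaningful for a binary classification model whose threshold is still editable.

import datarobot as dr

model = dr.Model.get(project_id, model_id)
if not model.prediction_threshold_read_only:
    model.set_prediction_threshold(0.6)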

star_model()

Mark the model as starred.

Model stars propagate to the web application and the API, and can be used to filter when listing models.

Return type:

None

start_advanced_tuning_session()

Start an Advanced Tuning session. Returns an object that helps set up arguments for an Advanced Tuning model execution.

As of v2.17, all models other than blenders, open source, prime, baseline and user-created support Advanced Tuning.

Returns:

Session for setting up and running Advanced Tuning on a model

Return type:

AdvancedTuningSession

train_datetime(featurelist_id=None, training_row_count=None, training_duration=None, time_window_sample_pct=None, monotonic_increasing_featurelist_id=<object object>, monotonic_decreasing_featurelist_id=<object object>, use_project_settings=False, sampling_method=None, n_clusters=None)

Trains this model on a different featurelist or sample size.

Requires that this model is part of a datetime partitioned project; otherwise, an error will occur.

All durations should be specified with a duration string such as those returned by the partitioning_methods.construct_duration_string helper method. Please see datetime partitioned project documentation for more information on duration strings.

Parameters:
  • featurelist_id (Optional[str]) – the featurelist to use to train the model. If not specified, the featurelist of this model is used.

  • training_row_count (Optional[int]) – the number of rows of data that should be used to train the model. If specified, neither training_duration nor use_project_settings may be specified.

  • training_duration (Optional[str]) – a duration string specifying what time range the data used to train the model should span. If specified, neither training_row_count nor use_project_settings may be specified.

  • use_project_settings (Optional[bool]) – (New in version v2.20) defaults to False. If True, indicates that the custom backtest partitioning settings specified by the user will be used to train the model and evaluate backtest scores. If specified, neither training_row_count nor training_duration may be specified.

  • time_window_sample_pct (Optional[int]) – may only be specified when the requested model is a time window (e.g. duration or start and end dates). An integer between 1 and 99 indicating the percentage to sample by within the window. The points kept are determined by a random uniform sample. If specified, training_duration must also be specified; otherwise, an error will occur.

  • sampling_method (Optional[str]) – (New in version v2.23) defines the way training data is selected. Can be either random or latest. In combination with training_row_count defines how rows are selected from backtest (latest by default). When training data is defined using time range (training_duration or use_project_settings) this setting changes the way time_window_sample_pct is applied (random by default). Applicable to OTV projects only.

  • monotonic_increasing_featurelist_id (Optional[str]) – (New in version v2.18) optional, the id of the featurelist that defines the set of features with a monotonically increasing relationship to the target. Passing None disables increasing monotonicity constraint. Default (dr.enums.MONOTONICITY_FEATURELIST_DEFAULT) is the one specified by the blueprint.

  • monotonic_decreasing_featurelist_id (Optional[str]) – (New in version v2.18) optional, the id of the featurelist that defines the set of features with a monotonically decreasing relationship to the target. Passing None disables decreasing monotonicity constraint. Default (dr.enums.MONOTONICITY_FEATURELIST_DEFAULT) is the one specified by the blueprint.

  • n_clusters (Optional[int]) – (New in version 2.27) number of clusters to use in an unsupervised clustering model. This parameter is used only for unsupervised clustering models that don’t automatically determine the number of clusters.

Returns:

job – the created job to build the model

Return type:

ModelJob
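
For illustration, a minimal sketch of retraining a model in a datetime partitioned project; the project ID, model ID, and duration string are placeholders, and the datarobot client is assumed to be already configured.

import datarobot as dr

model = dr.Model.get('project-id', 'model-id')

# Retrain the same blueprint on the last two years of data
job = model.train_datetime(training_duration='P2Y0M0D')
new_model = job.get_result_when_complete()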

train_incremental(data_stage_id, training_data_name=None, data_stage_encoding=None, data_stage_delimiter=None, data_stage_compression=None)

Submit a job to the queue to perform incremental training on an existing model using additional data. The id of the additional data to use for training is specified with the data_stage_id. Optionally a name for the iteration can be supplied by the user to help identify the contents of data in the iteration.

This functionality requires the INCREMENTAL_LEARNING feature flag to be enabled.

Parameters:
  • data_stage_id (str) – The id of the data stage to use for training.

  • training_data_name (Optional[str]) – The name of the iteration or data stage to indicate what the incremental learning was performed on.

  • data_stage_encoding (Optional[str]) – The encoding type of the data in the data stage (default: UTF-8). Supported formats: UTF-8, ASCII, WINDOWS1252

  • data_stage_delimiter (Optional[str]) – The delimiter used by the data in the data stage (default: ‘,’).

  • data_stage_compression (Optional[str]) – The compression type of the data stage file, e.g. ‘zip’ (default: None). Supported formats: zip

Returns:

job – The created job that is retraining the model

Return type:

ModelJob
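
A hedged sketch of incremental training (requires the INCREMENTAL_LEARNING feature flag); the data stage ID and iteration name are placeholders.

import datarobot as dr

model = dr.Model.get('project-id', 'model-id')

# Continue training the existing model on an additional data stage
job = model.train_incremental(
    data_stage_id='data-stage-id',
    training_data_name='2024 Q1 refresh',
)
updated_model = job.get_result_when_complete()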

unstar_model()

Unmark the model as starred.

Model stars propagate to the web application and the API, and can be used to filter when listing models.

Return type:

None

Frozen models

class datarobot.models.FrozenModel

Represents a model tuned with parameters which are derived from another model.

All durations are specified with a duration string such as those returned by the partitioning_methods.construct_duration_string helper method. Please see datetime partitioned project documentation for more information on duration strings.

Variables:
  • id (str) – the id of the model

  • project_id (str) – the id of the project the model belongs to

  • processes (List[str]) – the processes used by the model

  • featurelist_name (str) – the name of the featurelist used by the model

  • featurelist_id (str) – the id of the featurelist used by the model

  • sample_pct (float) – the percentage of the project dataset used in training the model

  • training_row_count (int or None) – the number of rows of the project dataset used in training the model. In a datetime partitioned project, if specified, defines the number of rows used to train the model and evaluate backtest scores; if unspecified, either training_duration or training_start_date and training_end_date was used to determine that instead.

  • training_duration (str or None) – only present for models in datetime partitioned projects. If specified, a duration string specifying the duration spanned by the data used to train the model and evaluate backtest scores.

  • training_start_date (datetime or None) – only present for frozen models in datetime partitioned projects. If specified, the start date of the data used to train the model.

  • training_end_date (datetime or None) – only present for frozen models in datetime partitioned projects. If specified, the end date of the data used to train the model.

  • model_type (str) – what model this is, e.g. ‘Nystroem Kernel SVM Regressor’

  • model_category (str) – what kind of model this is - ‘prime’ for DataRobot Prime models, ‘blend’ for blender models, and ‘model’ for other models

  • is_frozen (bool) – whether this model is a frozen model

  • parent_model_id (str) – the id of the model that tuning parameters are derived from

  • blueprint_id (str) – the id of the blueprint used in this model

  • metrics (dict) – a mapping from each metric to the model’s scores for that metric

  • monotonic_increasing_featurelist_id (str) – optional, the id of the featurelist that defines the set of features with a monotonically increasing relationship to the target. If None, no such constraints are enforced.

  • monotonic_decreasing_featurelist_id (str) – optional, the id of the featurelist that defines the set of features with a monotonically decreasing relationship to the target. If None, no such constraints are enforced.

  • supports_monotonic_constraints (bool) – optional, whether this model supports enforcing monotonic constraints

  • is_starred (bool) – whether this model is marked as starred

  • prediction_threshold (float) – for binary classification projects, the threshold used for predictions

  • prediction_threshold_read_only (bool) – indicates whether modification of the prediction threshold is forbidden. Threshold modification is forbidden once a model has had a deployment created or predictions made via the dedicated prediction API.

  • model_number (integer) – model number assigned to a model

  • supports_composable_ml (bool or None) – (New in version v2.26) whether this model is supported in Composable ML.

classmethod get(project_id, model_id)

Retrieve a specific frozen model.

Parameters:
  • project_id (str) – The project’s id.

  • model_id (str) – The model_id of the leaderboard item to retrieve.

Returns:

model – The queried instance.

Return type:

FrozenModel
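
A short illustrative retrieval; the IDs are placeholders.

import datarobot as dr

frozen = dr.models.FrozenModel.get('project-id', 'model-id')
# A frozen model keeps a reference to the model its tuning parameters came from
print(frozen.parent_model_id, frozen.training_row_count)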

Rating table models

class datarobot.models.RatingTableModel

A model that has a rating table.

All durations are specified with a duration string such as those returned by the partitioning_methods.construct_duration_string helper method. Please see datetime partitioned project documentation for more information on duration strings.

Variables:
  • id (str) – the id of the model

  • project_id (str) – the id of the project the model belongs to

  • processes (List[str]) – the processes used by the model

  • featurelist_name (str) – the name of the featurelist used by the model

  • featurelist_id (str) – the id of the featurelist used by the model

  • sample_pct (float or None) – the percentage of the project dataset used in training the model. If the project uses datetime partitioning, the sample_pct will be None. See training_row_count, training_duration, and training_start_date and training_end_date instead.

  • training_row_count (int or None) – the number of rows of the project dataset used in training the model. In a datetime partitioned project, if specified, defines the number of rows used to train the model and evaluate backtest scores; if unspecified, either training_duration or training_start_date and training_end_date was used to determine that instead.

  • training_duration (str or None) – only present for models in datetime partitioned projects. If specified, a duration string specifying the duration spanned by the data used to train the model and evaluate backtest scores.

  • training_start_date (datetime or None) – only present for frozen models in datetime partitioned projects. If specified, the start date of the data used to train the model.

  • training_end_date (datetime or None) – only present for frozen models in datetime partitioned projects. If specified, the end date of the data used to train the model.

  • model_type (str) – what model this is, e.g. ‘Nystroem Kernel SVM Regressor’

  • model_category (str) – what kind of model this is - ‘prime’ for DataRobot Prime models, ‘blend’ for blender models, and ‘model’ for other models

  • is_frozen (bool) – whether this model is a frozen model

  • blueprint_id (str) – the id of the blueprint used in this model

  • metrics (dict) – a mapping from each metric to the model’s scores for that metric

  • rating_table_id (str) – the id of the rating table that belongs to this model

  • monotonic_increasing_featurelist_id (str) – optional, the id of the featurelist that defines the set of features with a monotonically increasing relationship to the target. If None, no such constraints are enforced.

  • monotonic_decreasing_featurelist_id (str) – optional, the id of the featurelist that defines the set of features with a monotonically decreasing relationship to the target. If None, no such constraints are enforced.

  • supports_monotonic_constraints (bool) – optional, whether this model supports enforcing monotonic constraints

  • is_starred (bool) – whether this model is marked as starred

  • prediction_threshold (float) – for binary classification projects, the threshold used for predictions

  • prediction_threshold_read_only (bool) – indicates whether modification of the prediction threshold is forbidden. Threshold modification is forbidden once a model has had a deployment created or predictions made via the dedicated prediction API.

  • model_number (integer) – model number assigned to a model

  • supports_composable_ml (bool or None) – (New in version v2.26) whether this model is supported in Composable ML.

__init__(id=None, processes=None, featurelist_name=None, featurelist_id=None, project_id=None, sample_pct=None, model_type=None, model_category=None, is_frozen=None, blueprint_id=None, metrics=None, rating_table_id=None, monotonic_increasing_featurelist_id=None, monotonic_decreasing_featurelist_id=None, supports_monotonic_constraints=None, is_starred=None, prediction_threshold=None, prediction_threshold_read_only=None, model_number=None, parent_model_id=None, supports_composable_ml=None, training_row_count=None, training_duration=None, training_start_date=None, training_end_date=None, data_selection_method=None, time_window_sample_pct=None, sampling_method=None, model_family_full_name=None, is_trained_into_validation=None, is_trained_into_holdout=None)
classmethod get(project_id, model_id)

Retrieve a specific rating table model

If the project does not have a rating table, a ClientError will occur.

Parameters:
  • project_id (str) – the id of the project the model belongs to

  • model_id (str) – the id of the model to retrieve

Returns:

model – the model

Return type:

RatingTableModel

classmethod create_from_rating_table(project_id, rating_table_id)

Creates a new model from a validated rating table record. The RatingTable must not be associated with an existing model.

Parameters:
  • project_id (str) – the id of the project the rating table belongs to

  • rating_table_id (str) – the id of the rating table to create this model from

Returns:

job – an instance of created async job

Return type:

Job

Raises:
  • ClientError – Raised if creating model from a RatingTable that failed validation

  • JobAlreadyRequested – Raised if creating model from a RatingTable that is already associated with a RatingTableModel
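
For example, a minimal sketch of building a model from an existing validated rating table; the IDs are placeholders, and the returned job is assumed to resolve to the new RatingTableModel once it finishes.

import datarobot as dr

job = dr.models.RatingTableModel.create_from_rating_table(
    project_id='project-id', rating_table_id='rating-table-id'
)
rating_table_model = job.get_result_when_complete()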

advanced_tune(params, description=None)

Generate a new model with the specified advanced-tuning parameters

As of v2.17, all models other than blenders, open source, prime, baseline and user-created support Advanced Tuning.

Parameters:
  • params (dict) – Mapping of parameter ID to parameter value. The list of valid parameter IDs for a model can be found by calling get_advanced_tuning_parameters(). This endpoint does not need to include values for all parameters. If a parameter is omitted, its current_value will be used.

  • description (str) – Human-readable string describing the newly advanced-tuned model

Returns:

The created job to build the model

Return type:

ModelJob
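
As an illustrative sketch, the parameter ID is looked up first and the new value (0.05) is an arbitrary placeholder; model is assumed to be a previously retrieved model object.

# Discover tunable parameters, then request a re-tuned copy of the model
tuning = model.get_advanced_tuning_parameters()
param_id = tuning['tuning_parameters'][0]['parameter_id']

job = model.advanced_tune({param_id: 0.05}, description='manually tuned copy')
tuned_model = job.get_result_when_complete()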

cross_validate()

Run cross validation on the model.

Note

To perform Cross Validation on a new model with new parameters, use train instead.

Returns:

The created job to build the model

Return type:

ModelJob

delete()

Delete a model from the project’s leaderboard.

Return type:

None

download_scoring_code(file_name, source_code=False)

Download the Scoring Code JAR.

Parameters:
  • file_name (str) – File path where scoring code will be saved.

  • source_code (Optional[bool]) – Set to True to download source code archive. It will not be executable.

Return type:

None
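
For example (file names are placeholders; model is a previously retrieved model that supports code generation):

# Download the executable Scoring Code JAR
model.download_scoring_code('model_scoring_code.jar')

# Download the source-code archive instead (not executable)
model.download_scoring_code('model_scoring_code_source.jar', source_code=True)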

download_training_artifact(file_name)

Retrieve trained artifact(s) from a model containing one or more custom tasks.

Artifact(s) will be downloaded to the specified local filepath.

Parameters:

file_name (str) – File path where trained model artifact(s) will be saved.

classmethod from_data(data)

Instantiate an object of this class using a dict.

Parameters:

data (dict) – Correctly snake_cased keys and their values.

Return type:

TypeVar(T, bound= APIObject)

get_advanced_tuning_parameters()

Get the advanced-tuning parameters available for this model.

As of v2.17, all models other than blenders, open source, prime, baseline and user-created support Advanced Tuning.

Returns:

A dictionary describing the advanced-tuning parameters for the current model. There are two top-level keys, tuning_description and tuning_parameters.

tuning_description is an optional value. If not None, it is the user-specified description of this set of tuning parameters.

tuning_parameters is a list of dicts, each with the following keys:

  • parameter_name : (str) name of the parameter (unique per task, see below)

  • parameter_id : (str) opaque ID string uniquely identifying parameter

  • default_value : (*) the actual value used to train the model; either the single value of the parameter specified before training, or the best value from the list of grid-searched values (based on current_value)

  • current_value : (*) the single value or list of values of the parameter that were grid searched. Depending on the grid search specification, could be a single fixed value (no grid search), a list of discrete values, or a range.

  • task_name : (str) name of the task that this parameter belongs to

  • constraints: (dict) see the notes below

  • vertex_id: (str) ID of vertex that this parameter belongs to

Return type:

dict

Notes

The type of default_value and current_value is defined by the constraints structure. It will be a string or numeric Python type.

constraints is a dict with at least one, possibly more, of the following keys. The presence of a key indicates that the parameter may take on the specified type. (If a key is absent, this means that the parameter may not take on the specified type.) If a key on constraints is present, its value will be a dict containing all of the fields described below for that key.

"constraints": {
    "select": {
        "values": [<list(basestring or number) : possible values>]
    },
    "ascii": {},
    "unicode": {},
    "int": {
        "min": <int : minimum valid value>,
        "max": <int : maximum valid value>,
        "supports_grid_search": <bool : True if Grid Search may be
                                        requested for this param>
    },
    "float": {
        "min": <float : minimum valid value>,
        "max": <float : maximum valid value>,
        "supports_grid_search": <bool : True if Grid Search may be
                                        requested for this param>
    },
    "intList": {
        "min_length": <int : minimum valid length>,
        "max_length": <int : maximum valid length>
        "min_val": <int : minimum valid value>,
        "max_val": <int : maximum valid value>
        "supports_grid_search": <bool : True if Grid Search may be
                                        requested for this param>
    },
    "floatList": {
        "min_length": <int : minimum valid length>,
        "max_length": <int : maximum valid length>
        "min_val": <float : minimum valid value>,
        "max_val": <float : maximum valid value>
        "supports_grid_search": <bool : True if Grid Search may be
                                        requested for this param>
    }
}

The keys have meaning as follows:

  • select: Rather than specifying a specific data type, if present, it indicates that the parameter is permitted to take on any of the specified values. Listed values may be of any string or real (non-complex) numeric type.

  • ascii: The parameter may be a unicode object that encodes simple ASCII characters. (A-Z, a-z, 0-9, whitespace, and certain common symbols.) In addition to listed constraints, ASCII keys currently may not contain either newlines or semicolons.

  • unicode: The parameter may be any Python unicode object.

  • int: The value may be an object of type int within the specified range (inclusive). Please note that the value will be passed around using the JSON format, and some JSON parsers have undefined behavior with integers outside of the range [-(2**53)+1, (2**53)-1].

  • float: The value may be an object of type float within the specified range (inclusive).

  • intList, floatList: The value may be a list of int or float objects, respectively, following constraints as specified respectively by the int and float types (above).

Many parameters only specify one key under constraints. If a parameter specifies multiple keys, the parameter may take on any value permitted by any key.
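
As a hedged sketch of how the constraints structure might be inspected in practice (model is a previously retrieved model object):

tuning = model.get_advanced_tuning_parameters()
for param in tuning['tuning_parameters']:
    constraints = param['constraints']
    # A parameter may take on any type whose key is present in its constraints
    if 'select' in constraints:
        print(param['parameter_name'], 'choices:', constraints['select']['values'])
    elif 'float' in constraints:
        bounds = constraints['float']
        print(param['parameter_name'], 'range:', bounds['min'], '-', bounds['max'])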

get_all_confusion_charts(fallback_to_parent_insights=False)

Retrieve a list of all confusion matrices available for the model.

Parameters:

fallback_to_parent_insights (bool) – (New in version v2.14) Optional, if True, this will return confusion chart data for this model’s parent for any source that is not available for this model and if this has a defined parent model. If omitted or False, or this model has no parent, this will not attempt to retrieve any data from this model’s parent.

Returns:

Data for all available confusion charts for model.

Return type:

list of ConfusionChart

get_all_feature_impacts(data_slice_filter=None)

Retrieve a list of all feature impact results available for the model.

Parameters:

data_slice_filter (DataSlice, optional) – A dataslice used to filter the return values based on the dataslice.id. By default, this function will use data_slice_filter.id == None which returns an unsliced insight. If data_slice_filter is None, no data slice filtering will be applied when requesting the feature impacts.

Returns:

Data for all available model feature impacts, or an empty list if no data is found.

Return type:

list of dicts

Examples

model = datarobot.Model(id='model-id', project_id='project-id')

# Get feature impact insights for sliced data
data_slice = datarobot.DataSlice(id='data-slice-id')
sliced_fi = model.get_all_feature_impacts(data_slice_filter=data_slice)

# Get feature impact insights for unsliced data
data_slice = datarobot.DataSlice()
unsliced_fi = model.get_all_feature_impacts(data_slice_filter=data_slice)

# Get all feature impact insights
all_fi = model.get_all_feature_impacts()
get_all_lift_charts(fallback_to_parent_insights=False, data_slice_filter=None)

Retrieve a list of all Lift charts available for the model.

Parameters:
  • fallback_to_parent_insights (Optional[bool]) – (New in version v2.14) Optional, if True, this will return lift chart data for this model’s parent for any source that is not available for this model and if this model has a defined parent model. If omitted or False, or this model has no parent, this will not attempt to retrieve any data from this model’s parent.

  • data_slice_filter (DataSlice, optional) – Filters the returned lift chart by data_slice_filter.id. If None (the default) applies no filter based on data_slice_id.

Returns:

Data for all available model lift charts. Or an empty list if no data found.

Return type:

list of LiftChart

Examples

model = datarobot.Model.get('project-id', 'model-id')

# Get lift chart insights for sliced data
data_slice = datarobot.DataSlice(id='data-slice-id')
sliced_lift_charts = model.get_all_lift_charts(data_slice_filter=data_slice)

# Get lift chart insights for unsliced data
unsliced_lift_charts = model.get_all_lift_charts(data_slice_filter=datarobot.DataSlice())

# Get all lift chart insights
all_lift_charts = model.get_all_lift_charts()
get_all_multiclass_lift_charts(fallback_to_parent_insights=False, data_slice_filter=<datarobot.models.model.Sentinel object>, target_class=None)

Retrieve a list of all Lift charts available for the model.

Parameters:
  • fallback_to_parent_insights (bool) – (New in version v2.14) Optional, if True, this will return lift chart data for this model’s parent for any source that is not available for this model and if this model has a defined parent model. If omitted or False, or this model has no parent, this will not attempt to retrieve any data from this model’s parent.

  • data_slice_filter (DataSlice, optional) – A dataslice used to filter the return values based on the dataslice.id. By default this function will use data_slice_filter.id == None which returns an unsliced insight. If data_slice_filter is None then this method will raise a ValueError.

  • target_class (str, optional) – Lift chart target class name.

Returns:

Data for all available model lift charts.

Return type:

list of LiftChart

get_all_residuals_charts(fallback_to_parent_insights=False, data_slice_filter=None)

Retrieve a list of all residuals charts available for the model.

Parameters:
  • fallback_to_parent_insights (bool) – Optional, if True, this will return residuals chart data for this model’s parent for any source that is not available for this model and if this model has a defined parent model. If omitted or False, or this model has no parent, this will not attempt to retrieve any data from this model’s parent.

  • data_slice_filter (DataSlice, optional) – Filters the returned residuals charts by data_slice_filter.id. If None (the default) applies no filter based on data_slice_id.

Returns:

Data for all available model residuals charts.

Return type:

list of ResidualsChart

Examples

model = datarobot.Model.get('project-id', 'model-id')

# Get residuals chart insights for sliced data
data_slice = datarobot.DataSlice(id='data-slice-id')
sliced_residuals_charts = model.get_all_residuals_charts(data_slice_filter=data_slice)

# Get residuals chart insights for unsliced data
unsliced_residuals_charts = model.get_all_residuals_charts(data_slice_filter=datarobot.DataSlice())

# Get all residuals chart insights
all_residuals_charts = model.get_all_residuals_charts()
get_all_roc_curves(fallback_to_parent_insights=False, data_slice_filter=None)

Retrieve a list of all ROC curves available for the model.

Parameters:
  • fallback_to_parent_insights (bool) – (New in version v2.14) Optional, if True, this will return ROC curve data for this model’s parent for any source that is not available for this model and if this model has a defined parent model. If omitted or False, or this model has no parent, this will not attempt to retrieve any data from this model’s parent.

  • data_slice_filter (DataSlice, optional) – filters the returned roc_curve by data_slice_filter.id. If None (the default) applies no filter based on data_slice_id.

Returns:

Data for all available model ROC curves. Or an empty list if no RocCurves are found.

Return type:

list of RocCurve

Examples

model = datarobot.Model.get('project-id', 'model-id')
ds_filter = DataSlice(id='data-slice-id')

# Get roc curve insights for sliced data
sliced_roc = model.get_all_roc_curves(data_slice_filter=ds_filter)

# Get roc curve insights for unsliced data
ds_filter = DataSlice(id=None)
unsliced_roc = model.get_all_roc_curves(data_slice_filter=ds_filter)

# Get all roc curve insights
all_roc_curves = model.get_all_roc_curves()
get_confusion_chart(source, fallback_to_parent_insights=False)

Retrieve a multiclass model’s confusion matrix for the specified source.

Parameters:
  • source (str) – Confusion chart source. Check datarobot.enums.CHART_DATA_SOURCE for possible values.

  • fallback_to_parent_insights (bool) – (New in version v2.14) Optional, if True, this will return confusion chart data for this model’s parent if the confusion chart is not available for this model and the model has a defined parent model. If omitted or False, or there is no parent model, will not attempt to return insight data from this model’s parent.

Returns:

Model ConfusionChart data

Return type:

ConfusionChart

Raises:

ClientError – If the insight is not available for this model

get_cross_class_accuracy_scores()

Retrieves a list of Cross Class Accuracy scores for the model.

Return type:

json

get_cross_validation_scores(partition=None, metric=None)

Return a dictionary, keyed by metric, showing cross validation scores per partition.

Cross Validation should already have been performed using cross_validate or train.

Note

Models that computed cross validation before this feature was added will need to be deleted and retrained before this method can be used.

Parameters:
  • partition (float) – optional, the id of the partition (1, 2, 3.0, 4.0, etc.) to filter results by; can be a positive whole number or float value. 0 corresponds to the validation partition.

  • metric (unicode) – optional, name of the metric to filter the resulting cross validation scores by

Returns:

cross_validation_scores – A dictionary keyed by metric showing cross validation scores per partition.

Return type:

dict
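
A minimal usage sketch; 'RMSE' is a placeholder for whichever metric the project uses, and model is a previously retrieved model object.

# Run cross validation (if not already run), then fetch per-partition scores
cv_job = model.cross_validate()
cv_job.get_result_when_complete()

scores = model.get_cross_validation_scores(metric='RMSE')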

get_data_disparity_insights(feature, class_name1, class_name2)

Retrieve a list of Cross Class Data Disparity insights for the model.

Parameters:
  • feature (str) – Bias and Fairness protected feature name.

  • class_name1 (str) – One of the compared classes

  • class_name2 (str) – Another compared class

Return type:

json

get_fairness_insights(fairness_metrics_set=None, offset=0, limit=100)

Retrieve a list of Per Class Bias insights for the model.

Parameters:
  • fairness_metrics_set (Optional[str]) – Can be one of <datarobot.enums.FairnessMetricsSet>. The fairness metric used to calculate the fairness scores.

  • offset (Optional[int]) – Number of items to skip.

  • limit (Optional[int]) – Number of items to return.

Return type:

json

get_feature_effect(source, data_slice_id=None)

Retrieve Feature Effects for the model.

Feature Effects provides partial dependence and predicted vs actual values for top-500 features ordered by feature impact score.

The partial dependence shows marginal effect of a feature on the target variable after accounting for the average effects of all other predictive features. It indicates how, holding all other variables except the feature of interest as they were, the value of this feature affects your prediction.

Requires that Feature Effects has already been computed with request_feature_effect.

See get_feature_effect_metadata for retrieving information on the available sources.

Parameters:
  • source (string) – The source Feature Effects are retrieved for.

  • data_slice_id (string, optional) – ID for the data slice used in the request. If None, retrieve unsliced insight data.

Returns:

feature_effects – The feature effects data.

Return type:

FeatureEffects

Raises:

ClientError – If the feature effects have not been computed or source is not valid value.

get_feature_effect_metadata()

Retrieve Feature Effects metadata. Response contains status and available model sources.

  • Feature Effect for the training partition is always available, with the exception of older projects that only supported Feature Effect for validation.

  • When a model is trained into validation or holdout without stacked predictions (i.e., no out-of-sample predictions in those partitions), Feature Effects is not available for validation or holdout.

  • Feature Effects for holdout is not available when holdout was not unlocked for the project.

Use source to retrieve Feature Effects, selecting one of the provided sources.

Returns:

feature_effect_metadata

Return type:

FeatureEffectMetadata

get_feature_effects_multiclass(source='training', class_=None)

Retrieve Feature Effects for the multiclass model.

Feature Effects provide partial dependence and predicted vs actual values for top-500 features ordered by feature impact score.

The partial dependence shows marginal effect of a feature on the target variable after accounting for the average effects of all other predictive features. It indicates how, holding all other variables except the feature of interest as they were, the value of this feature affects your prediction.

Requires that Feature Effects has already been computed with request_feature_effect.

See get_feature_effect_metadata for retrieving information on the available sources.

Parameters:
  • source (str) – The source Feature Effects are retrieved for.

  • class (str or None) – The class name Feature Effects are retrieved for.

Returns:

The list of multiclass feature effects.

Return type:

list

Raises:

ClientError – If Feature Effects have not been computed or source is not valid value.

get_feature_impact(with_metadata=False, data_slice_filter=<datarobot.models.model.Sentinel object>)

Retrieve the computed Feature Impact results, a measure of the relevance of each feature in the model.

Feature Impact is computed for each column by creating new data with that column randomly permuted (but the others left unchanged), and seeing how the error metric score for the predictions is affected. The ‘impactUnnormalized’ is how much worse the error metric score is when making predictions on this modified data. The ‘impactNormalized’ is normalized so that the largest value is 1. In both cases, larger values indicate more important features.

If a feature is a redundant feature, i.e. once other features are considered it doesn’t contribute much in addition, the ‘redundantWith’ value is the name of feature that has the highest correlation with this feature. Note that redundancy detection is only available for jobs run after the addition of this feature. When retrieving data that predates this functionality, a NoRedundancyImpactAvailable warning will be used.

Elsewhere this technique is sometimes called ‘Permutation Importance’.

Requires that Feature Impact has already been computed with request_feature_impact.

Parameters:
  • with_metadata (bool) – The flag indicating if the result should include the metadata as well.

  • data_slice_filter (DataSlice, optional) – A dataslice used to filter the return values based on the dataslice.id. By default, this function will use data_slice_filter.id == None which returns an unsliced insight. If data_slice_filter is None then get_feature_impact will raise a ValueError.

Returns:

The feature impact data response depends on the with_metadata parameter. The response is either a dict with metadata and a list with actual data or just a list with that data.

Each list item is a dict with the keys featureName, impactNormalized, impactUnnormalized, redundantWith, and count.

For the dict response, the available keys are:

  • featureImpacts - Feature Impact data as a dictionary. Each item is a dict with the keys featureName, impactNormalized, impactUnnormalized, and redundantWith.

  • shapBased - A boolean that indicates whether Feature Impact was calculated using Shapley values.

  • ranRedundancyDetection - A boolean that indicates whether redundant feature identification was run while calculating this Feature Impact.

  • rowCount - An integer or None that indicates the number of rows that was used to calculate Feature Impact. For Feature Impact calculated with the default logic, without specifying rowCount, None is returned here.

  • count - An integer with the number of features under featureImpacts.

Return type:

list or dict

Raises:
  • ClientError – If the feature impacts have not been computed.

  • ValueError – If data_slice_filter passed as None

get_features_used()

Query the server to determine which features were used.

Note that the data returned by this method is possibly different than the names of the features in the featurelist used by this model. This method will return the raw features that must be supplied in order for predictions to be generated on a new set of data. The featurelist, in contrast, would also include the names of derived features.

Returns:

features – The names of the features used in the model.

Return type:

List[str]

get_frozen_child_models()

Retrieve the IDs for all models that are frozen from this model.

Return type:

A list of Models

get_labelwise_roc_curves(source, fallback_to_parent_insights=False)

Retrieve a list of LabelwiseRocCurve instances for a multilabel model for the given source and all labels. This method is valid only for multilabel projects. For binary projects, use the Model.get_roc_curve API.

Added in version v2.24.

Parameters:
  • source (str) – ROC curve data source. Check datarobot.enums.CHART_DATA_SOURCE for possible values.

  • fallback_to_parent_insights (bool) – Optional, if True, this will return ROC curve data for this model’s parent if the ROC curve is not available for this model and the model has a defined parent model. If omitted or False, or there is no parent model, will not attempt to return data from this model’s parent.

Returns:

Labelwise ROC Curve instances for source and all labels

Return type:

list of LabelwiseRocCurve

Raises:

ClientError – If the insight is not available for this model

get_lift_chart(source, fallback_to_parent_insights=False, data_slice_filter=<datarobot.models.model.Sentinel object>)

Retrieve the model Lift chart for the specified source.

Parameters:
  • source (str) – Lift chart data source. Check datarobot.enums.CHART_DATA_SOURCE for possible values. (New in version v2.23) For time series and OTV models, also accepts values backtest_2, backtest_3, …, up to the number of backtests in the model.

  • fallback_to_parent_insights (bool) – (New in version v2.14) Optional, if True, this will return lift chart data for this model’s parent if the lift chart is not available for this model and the model has a defined parent model. If omitted or False, or there is no parent model, will not attempt to return insight data from this model’s parent.

  • data_slice_filter (DataSlice, optional) – A dataslice used to filter the return values based on the dataslice.id. By default this function will use data_slice_filter.id == None which returns an unsliced insight. If data_slice_filter is None then get_lift_chart will raise a ValueError.

Returns:

Model lift chart data

Return type:

LiftChart

Raises:
  • ClientError – If the insight is not available for this model

  • ValueError – If data_slice_filter passed as None

get_missing_report_info()

Retrieve a report on missing training data that can be used to understand missing values treatment in the model. The report consists of missing value resolutions for the numeric or categorical features that were part of building the model.

Returns:

The queried model missing report, sorted by missing count (DESCENDING order).

Return type:

An iterable of MissingReportPerFeature

get_model_blueprint_chart()

Retrieve a diagram that can be used to understand data flow in the blueprint.

Returns:

The queried model blueprint chart.

Return type:

ModelBlueprintChart

get_model_blueprint_documents()

Get documentation for tasks used in this model.

Returns:

All documents available for the model.

Return type:

list of BlueprintTaskDocument

get_model_blueprint_json()

Get the blueprint json representation used by this model.

Returns:

Json representation of the blueprint stages.

Return type:

BlueprintJson

get_multiclass_feature_impact()

For multiclass it’s possible to calculate feature impact separately for each target class. The method for calculation is exactly the same, calculated in one-vs-all style for each target class.

Requires that Feature Impact has already been computed with request_feature_impact.

Returns:

feature_impacts – The feature impact data. Each item is a dict with the keys ‘featureImpacts’ (list), ‘class’ (str). Each item in ‘featureImpacts’ is a dict with the keys ‘featureName’, ‘impactNormalized’, and ‘impactUnnormalized’, and ‘redundantWith’.

Return type:

list of dict

Raises:

ClientError – If the multiclass feature impacts have not been computed.

get_multiclass_lift_chart(source, fallback_to_parent_insights=False, data_slice_filter=<datarobot.models.model.Sentinel object>, target_class=None)

Retrieve model Lift chart for the specified source.

Parameters:
  • source (str) – Lift chart data source. Check datarobot.enums.CHART_DATA_SOURCE for possible values.

  • fallback_to_parent_insights (bool) – Optional, if True, this will return lift chart data for this model’s parent if the lift chart is not available for this model and the model has a defined parent model. If omitted or False, or there is no parent model, will not attempt to return insight data from this model’s parent.

  • data_slice_filter (DataSlice, optional) – A dataslice used to filter the return values based on the dataslice.id. By default this function will use data_slice_filter.id == None which returns an unsliced insight. If data_slice_filter is None then this method will raise a ValueError.

  • target_class (str, optional) – Lift chart target class name.

Returns:

Model lift chart data for each saved target class

Return type:

list of LiftChart

Raises:

ClientError – If the insight is not available for this model

get_multilabel_lift_charts(source, fallback_to_parent_insights=False)

Retrieve model Lift charts for the specified source.

Added in version v2.24.

Parameters:
  • source (str) – Lift chart data source. Check datarobot.enums.CHART_DATA_SOURCE for possible values.

  • fallback_to_parent_insights (bool) – Optional, if True, this will return lift chart data for this model’s parent if the lift chart is not available for this model and the model has a defined parent model. If omitted or False, or there is no parent model, will not attempt to return insight data from this model’s parent.

Returns:

Model lift chart data for each saved target class

Return type:

list of LiftChart

Raises:

ClientError – If the insight is not available for this model

get_num_iterations_trained()

Retrieves the number of estimators trained by early-stopping tree-based models.

Added in version v2.22.

Returns:

  • projectId (str) – id of project containing the model

  • modelId (str) – id of the model

  • data (array) – list of numEstimatorsItem objects, one for each modeling stage.

  • numEstimatorsItem will be of the form

  • stage (str) – indicates the modeling stage (for multi-stage models); None for single-stage models

  • numIterations (int) – the number of estimators or iterations trained by the model

get_or_request_feature_effect(source, max_wait=600, row_count=None, data_slice_id=None)

Retrieve Feature Effects for the model, requesting a new job if it hasn’t been run previously.

See get_feature_effect_metadata for retrieving information of source.

Parameters:
  • source (string) – The source Feature Effects are retrieved for.

  • max_wait (Optional[int]) – The maximum time to wait for a requested Feature Effect job to complete before erroring.

  • row_count (Optional[int]) – (New in version v2.21) The sample size to use for Feature Impact computation. Minimum is 10 rows. Maximum is 100000 rows or the training sample size of the model, whichever is less.

  • data_slice_id (Optional[str]) – ID for the data slice used in the request. If None, request unsliced insight data.

Returns:

feature_effects – The Feature Effects data.

Return type:

FeatureEffects
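
For example (model is a previously retrieved model object; the source value comes from get_feature_effect_metadata, with 'validation' shown as a typical choice):

# Compute Feature Effects if needed, otherwise fetch the existing insight
feature_effects = model.get_or_request_feature_effect(source='validation')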

get_or_request_feature_effects_multiclass(source, top_n_features=None, features=None, row_count=None, class_=None, max_wait=600)

Retrieve Feature Effects for the multiclass model, requesting a job if it hasn’t been run previously.

Parameters:
  • source (string) – The source Feature Effects are retrieved for.

  • class (str or None) – The class name Feature Effects are retrieved for.

  • row_count (int) – The number of rows from dataset to use for Feature Impact calculation.

  • top_n_features (int or None) – Number of top features (ranked by Feature Impact) used to calculate Feature Effects.

  • features (list or None) – The list of features used to calculate Feature Effects.

  • max_wait (Optional[int]) – The maximum time to wait for a requested Feature Effects job to complete before erroring.

Returns:

feature_effects – The list of multiclass feature effects data.

Return type:

list of FeatureEffectsMulticlass

get_or_request_feature_impact(max_wait=600, **kwargs)

Retrieve feature impact for the model, requesting a job if it hasn’t been run previously

Parameters:
  • max_wait (Optional[int]) – The maximum time to wait for a requested feature impact job to complete before erroring

  • **kwargs – Arbitrary keyword arguments passed to request_feature_impact.

Returns:

feature_impacts – The feature impact data. See get_feature_impact for the exact schema.

Return type:

list or dict
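
A short illustrative sketch (model is a previously retrieved model object; the key names follow the schema described in get_feature_impact):

# Compute Feature Impact if needed, otherwise fetch the existing results
feature_impacts = model.get_or_request_feature_impact()
for item in feature_impacts:
    print(item['featureName'], item['impactNormalized'])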

get_parameters()

Retrieve model parameters.

Returns:

Model parameters for this model.

Return type:

ModelParameters

get_pareto_front()

Retrieve the Pareto Front for a Eureqa model.

This method is only supported for Eureqa models.

Returns:

Model ParetoFront data

Return type:

ParetoFront

get_prime_eligibility()

Check if this model can be approximated with DataRobot Prime

Returns:

prime_eligibility – a dict indicating whether a model can be approximated with DataRobot Prime (key can_make_prime) and why it may be ineligible (key message)

Return type:

dict

get_residuals_chart(source, fallback_to_parent_insights=False, data_slice_filter=<datarobot.models.model.Sentinel object>)

Retrieve model residuals chart for the specified source.

Parameters:
  • source (str) – Residuals chart data source. Check datarobot.enums.CHART_DATA_SOURCE for possible values.

  • fallback_to_parent_insights (bool) – Optional, if True, this will return residuals chart data for this model’s parent if the residuals chart is not available for this model and the model has a defined parent model. If omitted or False, or there is no parent model, will not attempt to return residuals data from this model’s parent.

  • data_slice_filter (DataSlice, optional) – A dataslice used to filter the return values based on the dataslice.id. By default this function will use data_slice_filter.id == None which returns an unsliced insight. If data_slice_filter is None then get_residuals_chart will raise a ValueError.

Returns:

Model residuals chart data

Return type:

ResidualsChart

Raises:
  • ClientError – If the insight is not available for this model

  • ValueError – If data_slice_filter passed as None

get_roc_curve(source, fallback_to_parent_insights=False, data_slice_filter=<datarobot.models.model.Sentinel object>)

Retrieve the ROC curve for a binary model for the specified source. This method is valid only for binary projects. For multilabel projects, use Model.get_labelwise_roc_curves.

Parameters:
  • source (str) – ROC curve data source. Check datarobot.enums.CHART_DATA_SOURCE for possible values. (New in version v2.23) For time series and OTV models, also accepts values backtest_2, backtest_3, …, up to the number of backtests in the model.

  • fallback_to_parent_insights (bool) – (New in version v2.14) Optional, if True, this will return ROC curve data for this model’s parent if the ROC curve is not available for this model and the model has a defined parent model. If omitted or False, or there is no parent model, will not attempt to return data from this model’s parent.

  • data_slice_filter (DataSlice, optional) – A dataslice used to filter the return values based on the dataslice.id. By default this function will use data_slice_filter.id == None which returns an unsliced insight. If data_slice_filter is None then get_roc_curve will raise a ValueError.

Returns:

Model ROC curve data

Return type:

RocCurve

Raises:
  • ClientError – If the insight is not available for this model

  • (New in version v3.0) TypeError – If the underlying project type is multilabel

  • ValueError – If data_slice_filter passed as None
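
For example (model is a previously retrieved model object; the source string is one of the values listed in datarobot.enums.CHART_DATA_SOURCE, with 'validation' shown as a typical choice):

# Retrieve the ROC curve computed on the validation partition
roc = model.get_roc_curve('validation')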

get_rulesets()

List the rulesets approximating this model generated by DataRobot Prime

If this model hasn’t been approximated yet, will return an empty list. Note that these are rulesets approximating this model, not rulesets used to construct this model.

Returns:

rulesets

Return type:

list of Ruleset

get_supported_capabilities()

Retrieves a summary of the capabilities supported by a model.

Added in version v2.14.

Returns:

  • supportsBlending (bool) – whether the model supports blending

  • supportsMonotonicConstraints (bool) – whether the model supports monotonic constraints

  • hasWordCloud (bool) – whether the model has word cloud data available

  • eligibleForPrime (bool) – (Deprecated in version v3.6) whether the model is eligible for Prime

  • hasParameters (bool) – whether the model has parameters that can be retrieved

  • supportsCodeGeneration (bool) – (New in version v2.18) whether the model supports code generation

  • supportsShap (bool) – (New in version v2.18) True if the model supports the Shapley package, i.e. Shapley-based feature importance

  • supportsEarlyStopping (bool) – (New in version v2.22) True if this is an early stopping tree-based model and number of trained iterations can be retrieved.

get_uri()
Returns:

url – Permanent static hyperlink to this model in the leaderboard.

Return type:

str

get_word_cloud(exclude_stop_words=False)

Retrieve word cloud data for the model.

Parameters:

exclude_stop_words (Optional[bool]) – Set to True if you want stopwords filtered out of response.

Returns:

Word cloud data for the model.

Return type:

WordCloud

incremental_train(data_stage_id, training_data_name=None)

Submit a job to the queue to perform incremental training on an existing model. See train_incremental documentation.

Return type:

ModelJob

classmethod list(project_id, sort_by_partition='validation', sort_by_metric=None, with_metric=None, search_term=None, featurelists=None, families=None, blueprints=None, labels=None, characteristics=None, training_filters=None, number_of_clusters=None, limit=100, offset=0)

Retrieve paginated model records, sorted by scores, with optional filtering.

Parameters:
  • sort_by_partition (str, one of validation, backtesting, crossValidation or holdout) – Set the partition to use for sorted (by score) list of models. validation is the default.

  • sort_by_metric (str) – Set the project metric to use for model sorting. DataRobot-selected project optimization metric is the default.

  • with_metric (str) – For a single-metric list of results, specify that project metric.

  • search_term (str) – If specified, only models containing the term in their name or processes are returned.

  • featurelists (List[str]) – If specified, only models trained on selected featurelists are returned.

  • families (List[str]) – If specified, only models belonging to selected families are returned.

  • blueprints (List[str]) – If specified, only models trained on specified blueprint IDs are returned.

  • labels (List[str], starred or prepared for deployment) – If specified, only models tagged with all listed labels are returned.

  • characteristics (List[str]) – If specified, only models matching all listed characteristics are returned.

  • training_filters (List[str]) – If specified, only models matching at least one of the listed training conditions are returned. The following formats are supported:

    For autoML and datetime partitioned projects:

      – number of rows in the training subset

    For datetime partitioned projects:

      – <training duration>, for example P6Y0M0D

      – <training_duration>-<time_window_sample_percent>-<sampling_method>, for example P6Y0M0D-78-Random (returns models trained on 6 years of data, 78% sampling rate, random sampling)

      – start/end date

      – project settings

  • number_of_clusters (list of int) – Filter models by number of clusters. Applicable only in unsupervised clustering projects.

  • limit (int)

  • offset (int)

Returns:

generic_models

Return type:

list of GenericModel
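
As an illustrative sketch of filtered listing (the project ID is a placeholder):

import datarobot as dr

# Starred models only, sorted by their cross validation score on the project metric
starred_models = dr.Model.list(
    'project-id',
    sort_by_partition='crossValidation',
    labels=['starred'],
    limit=20,
)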

open_in_browser()

Opens the class’s relevant web browser location. If the default browser is not available, the URL is logged.

Note: If text-mode browsers are used, the calling process will block until the user exits the browser.

Return type:

None

request_approximation()

Request an approximation of this model using DataRobot Prime

This will create several rulesets that could be used to approximate this model. After comparing their scores and rule counts, the code used in the approximation can be downloaded and run locally.

Returns:

job – the job generating the rulesets

Return type:

Job

request_cross_class_accuracy_scores()

Request Cross Class Accuracy scores to be computed for the model.

Returns:

status_id – A statusId of computation request.

Return type:

str

request_data_disparity_insights(feature, compared_class_names)

Request data disparity insights to be computed for the model.

Parameters:
  • feature (str) – Bias and Fairness protected feature name.

  • compared_class_names (list(str)) – List of two classes to compare

Returns:

status_id – A statusId of computation request.

Return type:

str

request_external_test(dataset_id, actual_value_column=None)

Request external test to compute scores and insights on an external test dataset

Parameters:
  • dataset_id (string) – The dataset to make predictions against (as uploaded from Project.upload_dataset)

  • actual_value_column (string, optional) – (New in version v2.21) For time series unsupervised projects only. Actual value column can be used to calculate the classification metrics and insights on the prediction dataset. Can’t be provided with the forecast_point parameter.

Returns:

job – a Job representing external dataset insights computation

Return type:

Job

request_fairness_insights(fairness_metrics_set=None)

Request fairness insights to be computed for the model.

Parameters:

fairness_metrics_set (Optional[str]) – Can be one of <datarobot.enums.FairnessMetricsSet>. The fairness metric used to calculate the fairness scores.

Returns:

status_id – A statusId of computation request.

Return type:

str

request_feature_effect(row_count=None, data_slice_id=None)

Submit request to compute Feature Effects for the model.

See get_feature_effect for more information on the result of the job.

Parameters:
  • row_count (int) – (New in version v2.21) The sample size to use for Feature Impact computation. Minimum is 10 rows. Maximum is 100000 rows or the training sample size of the model, whichever is less.

  • data_slice_id (Optional[str]) – ID for the data slice used in the request. If None, request unsliced insight data.

Returns:

job – A Job representing the feature effect computation. To get the completed feature effect data, use job.get_result or job.get_result_when_complete.

Return type:

Job

Raises:

JobAlreadyRequested – If the feature effects have already been requested.

request_feature_effects_multiclass(row_count=None, top_n_features=None, features=None)

Request Feature Effects computation for the multiclass model.

See get_feature_effect for more information on the result of the job.

Parameters:
  • row_count (int) – The number of rows from dataset to use for Feature Impact calculation.

  • top_n_features (int or None) – Number of top features (ranked by feature impact) used to calculate Feature Effects.

  • features (list or None) – The list of features used to calculate Feature Effects.

Returns:

job – A Job representing Feature Effect computation. To get the completed Feature Effect data, use job.get_result or job.get_result_when_complete.

Return type:

Job

request_feature_impact(row_count=None, with_metadata=False, data_slice_id=None)

Request feature impacts to be computed for the model.

See get_feature_impact for more information on the result of the job.

Parameters:
  • row_count (Optional[int]) – The sample size (specified in rows) to use for Feature Impact computation. This is not supported for unsupervised, multiclass (which has a separate method), and time series projects.

  • with_metadata (Optional[bool]) – Flag indicating whether the result should include the metadata. If true, metadata is included.

  • data_slice_id (Optional[str]) – ID for the data slice used in the request. If None, request unsliced insight data.

Returns:

job – Job representing the Feature Impact computation. To retrieve the completed Feature Impact data, use job.get_result or job.get_result_when_complete.

Return type:

Job or status_id

Raises:

JobAlreadyRequested – If the feature impacts have already been requested.

request_frozen_datetime_model(training_row_count=None, training_duration=None, training_start_date=None, training_end_date=None, time_window_sample_pct=None, sampling_method=None)

Train a new frozen model with parameters from this model.

Requires that this model belongs to a datetime partitioned project. If it does not, an error will occur when submitting the job.

Frozen models use the same tuning parameters as their parent model instead of independently optimizing them to allow efficiently retraining models on larger amounts of the training data.

In addition to training_row_count and training_duration, frozen datetime models may be trained on an exact date range. Only one of training_row_count, training_duration, or training_start_date and training_end_date should be specified.

Models specified using training_start_date and training_end_date are the only ones that can be trained into the holdout data (once the holdout is unlocked).

All durations should be specified with a duration string such as those returned by the partitioning_methods.construct_duration_string helper method. Please see datetime partitioned project documentation for more information on duration strings.

Parameters:
  • training_row_count (Optional[int]) – the number of rows of data that should be used to train the model. If specified, training_duration may not be specified.

  • training_duration (Optional[str]) – a duration string specifying what time range the data used to train the model should span. If specified, training_row_count may not be specified.

  • training_start_date (datetime.datetime, optional) – the start date of the data to train to model on. Only rows occurring at or after this datetime will be used. If training_start_date is specified, training_end_date must also be specified.

  • training_end_date (datetime.datetime, optional) – the end date of the data to train the model on. Only rows occurring strictly before this datetime will be used. If training_end_date is specified, training_start_date must also be specified.

  • time_window_sample_pct (Optional[int]) – may only be specified when the requested model is a time window (e.g. duration or start and end dates). An integer between 1 and 99 indicating the percentage to sample by within the window. The points kept are determined by a random uniform sample. If specified, training_duration must also be specified; otherwise, an error will occur.

  • sampling_method (Optional[str]) – (New in version v2.23) defines the way training data is selected. Can be either random or latest. In combination with training_row_count defines how rows are selected from backtest (latest by default). When training data is defined using time range (training_duration or use_project_settings) this setting changes the way time_window_sample_pct is applied (random by default). Applicable to OTV projects only.

Returns:

model_job – the modeling job training a frozen model

Return type:

ModelJob

request_frozen_model(sample_pct=None, training_row_count=None)

Train a new frozen model with parameters from this model

Note

This method only works if the project the model belongs to is not datetime partitioned. If it is, use request_frozen_datetime_model instead.

Frozen models use the same tuning parameters as their parent model instead of independently optimizing them to allow efficiently retraining models on larger amounts of the training data.

Parameters:
  • sample_pct (float) – optional, the percentage of the dataset to use with the model. If not provided, will use the value from this model.

  • training_row_count (int) – (New in version v2.9) optional, the integer number of rows of the dataset to use with the model. Only one of sample_pct and training_row_count should be specified.

Returns:

model_job – the modeling job training a frozen model

Return type:

ModelJob
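
For example (model is a previously retrieved model in a non-datetime partitioned project):

# Train a frozen copy of this model on 80% of the project dataset
job = model.request_frozen_model(sample_pct=80)
frozen_model = job.get_result_when_complete()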

request_lift_chart(source, data_slice_id=None)

Request the model Lift Chart for the specified source.

Parameters:
  • source (str) – Lift chart data source. Check datarobot.enums.CHART_DATA_SOURCE for possible values.

  • data_slice_id (string, optional) – ID for the data slice used in the request. If None, request unsliced insight data.

Returns:

status_check_job – Object contains all needed logic for a periodical status check of an async job.

Return type:

StatusCheckJob

request_per_class_fairness_insights(fairness_metrics_set=None)

Request per-class fairness insights be computed for the model.

Parameters:

fairness_metrics_set (Optional[str]) – The fairness metric used to calculate the fairness scores. Value can be any one of <datarobot.enums.FairnessMetricsSet>.

Returns:

status_check_job – The returned object contains all needed logic for a periodical status check of an async job.

Return type:

StatusCheckJob
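
A brief sketch, leaving fairness_metrics_set unset so the default fairness metric is used (IDs are placeholders; the same wait_for_completion assumption as above applies):

from datarobot import Model

model = Model.get('project-id', 'model-id')

# Compute per-class fairness insights for this model
status_job = model.request_per_class_fairness_insights()
status_job.wait_for_completion()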

request_predictions(dataset_id=None, dataset=None, dataframe=None, file_path=None, file=None, include_prediction_intervals=None, prediction_intervals_size=None, forecast_point=None, predictions_start_date=None, predictions_end_date=None, actual_value_column=None, explanation_algorithm=None, max_explanations=None, max_ngram_explanations=None)

Requests predictions against a previously uploaded dataset.

Parameters:
  • dataset_id (string, optional) – The ID of the dataset to make predictions against (as uploaded from Project.upload_dataset)

  • dataset (Dataset, optional) – The dataset to make predictions against (as uploaded from Project.upload_dataset)

  • dataframe (pd.DataFrame, optional) – (New in v3.0) The dataframe to make predictions against

  • file_path (Optional[str]) – (New in v3.0) Path to file to make predictions against

  • file (IOBase, optional) – (New in v3.0) File to make predictions against

  • include_prediction_intervals (Optional[bool]) – (New in v2.16) For time series projects only. Specifies whether prediction intervals should be calculated for this request. Defaults to True if prediction_intervals_size is specified, otherwise defaults to False.

  • prediction_intervals_size (Optional[int]) – (New in v2.16) For time series projects only. Represents the percentile to use for the size of the prediction intervals. Defaults to 80 if include_prediction_intervals is True. Prediction intervals size must be between 1 and 100 (inclusive).

  • forecast_point (datetime.datetime or None, optional) – (New in version v2.20) For time series projects only. This is the default point relative to which predictions will be generated, based on the forecast window of the project. See the time series prediction documentation for more information.

  • predictions_start_date (datetime.datetime or None, optional) – (New in version v2.20) For time series projects only. The start date for bulk predictions. Note that this parameter is for generating historical predictions using the training data. This parameter should be provided in conjunction with predictions_end_date. Can’t be provided with the forecast_point parameter.

  • predictions_end_date (datetime.datetime or None, optional) – (New in version v2.20) For time series projects only. The end date for bulk predictions, exclusive. Note that this parameter is for generating historical predictions using the training data. This parameter should be provided in conjunction with predictions_start_date. Can’t be provided with the forecast_point parameter.

  • actual_value_column (string, optional) – (New in version v2.21) For time series unsupervised projects only. Actual value column can be used to calculate the classification metrics and insights on the prediction dataset. Can’t be provided with the forecast_point parameter.

  • explanation_algorithm (Optional[str]) – (New in version v2.21) If set to 'shap', the response will include prediction explanations based on the SHAP explainer (SHapley Additive exPlanations). Defaults to null (no prediction explanations).

  • max_explanations (Optional[int]) – (New in version v2.21) Specifies the maximum number of explanation values that should be returned for each row, ordered by absolute value, greatest to least. If null, no limit. In the case of 'shap': if the number of features is greater than the limit, the sum of the remaining values will also be returned as shapRemainingTotal. Defaults to null. Cannot be set if explanation_algorithm is omitted.

  • max_ngram_explanations (Optional[int or str]) – (New in version v2.29) Specifies the maximum number of text explanation values that should be returned. If set to all, text explanations will be computed and all ngram explanations will be returned. If set to a non-zero positive integer, text explanations will be computed and that many ngram explanations, sorted in descending order, will be returned. By default, text explanations are not computed.

Returns:

job – The job computing the predictions

Return type:

PredictJob
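
A minimal sketch of scoring an uploaded dataset with this model. The file path and IDs are placeholders; the data to score is first uploaded to the project, then referenced by its dataset ID:

from datarobot import Model, Project

project = Project.get('project-id')
model = Model.get('project-id', 'model-id')

# Upload the data to score, then request predictions from this model
prediction_dataset = project.upload_dataset('./data_to_predict.csv')
predict_job = model.request_predictions(dataset_id=prediction_dataset.id)

# Wait for the asynchronous job and retrieve the predictions
predictions = predict_job.get_result_when_complete()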

request_residuals_chart(source, data_slice_id=None)

Request the model residuals chart for the specified source.

Parameters:
  • source (str) – Residuals chart data source. Check datarobot.enums.CHART_DATA_SOURCE for possible values.

  • data_slice_id (string, optional) – ID for the data slice used in the request. If None, request unsliced insight data.

Returns:

status_check_job – Object contains all needed logic for a periodical status check of an async job.

Return type:

StatusCheckJob

request_roc_curve(source, data_slice_id=None)

Request the model Roc Curve for the specified source.

Parameters:
  • source (str) – Roc Curve data source. Check datarobot.enums.CHART_DATA_SOURCE for possible values.

  • data_slice_id (string, optional) – ID for the data slice used in the request. If None, request unsliced insight data.

Returns:

status_check_job – Object contains all needed logic for a periodical status check of an async job.

Return type:

StatusCheckJob

request_training_predictions(data_subset, explanation_algorithm=None, max_explanations=None)

Start a job to build training predictions

Parameters:
  • data_subset (str) –

    data set definition to build predictions on. Choices are:

    • dr.enums.DATA_SUBSET.ALL or string all for all data available. Not valid for models in datetime partitioned projects.

    • dr.enums.DATA_SUBSET.VALIDATION_AND_HOLDOUT or string validationAndHoldout for all data except the training set. Not valid for models in datetime partitioned projects.

    • dr.enums.DATA_SUBSET.HOLDOUT or string holdout for the holdout data set only.

    • dr.enums.DATA_SUBSET.ALL_BACKTESTS or string allBacktests for downloading the predictions for all backtest validation folds. Requires the model to have successfully scored all backtests. Datetime partitioned projects only.

  • explanation_algorithm (dr.enums.EXPLANATIONS_ALGORITHM) – (New in v2.21) Optional. If set to dr.enums.EXPLANATIONS_ALGORITHM.SHAP, the response will include prediction explanations based on the SHAP explainer (SHapley Additive exPlanations). Defaults to None (no prediction explanations).

  • max_explanations (int) – (New in v2.21) Optional. Specifies the maximum number of explanation values that should be returned for each row, ordered by absolute value, greatest to least. In the case of dr.enums.EXPLANATIONS_ALGORITHM.SHAP: If not set, explanations are returned for all features. If the number of features is greater than the max_explanations, the sum of remaining values will also be returned as shap_remaining_total. Max 100. Defaults to null for datasets narrower than 100 columns, defaults to 100 for datasets wider than 100 columns. Is ignored if explanation_algorithm is not set.

Returns:

an instance of created async job

Return type:

Job
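
A sketch of building holdout training predictions and pulling them into a DataFrame, assuming the resulting TrainingPredictions object exposes get_all_as_dataframe as in recent client versions (IDs are placeholders):

import datarobot as dr

model = dr.Model.get('project-id', 'model-id')

# Build predictions on the holdout partition only
job = model.request_training_predictions(dr.enums.DATA_SUBSET.HOLDOUT)

# The job result is a TrainingPredictions object
training_predictions = job.get_result_when_complete()
df = training_predictions.get_all_as_dataframe()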

retrain(sample_pct=None, featurelist_id=None, training_row_count=None, n_clusters=None)

Submit a job to the queue to retrain the model on a different featurelist, sample size, or number of clusters.

Parameters:
  • sample_pct (Optional[float]) – The sample size in percents (1 to 100) to use in training. If this parameter is used then training_row_count should not be given.

  • featurelist_id (Optional[str]) – The featurelist id

  • training_row_count (Optional[int]) – The number of rows used to train the model. If this parameter is used, then sample_pct should not be given.

  • n_clusters (Optional[int]) – (new in version 2.27) number of clusters to use in an unsupervised clustering model. This parameter is used only for unsupervised clustering models that do not determine the number of clusters automatically.

Returns:

job – The created job that is retraining the model

Return type:

ModelJob
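
A brief sketch of retraining this model on a larger sample with a different featurelist (IDs are placeholders):

from datarobot import Model

model = Model.get('project-id', 'model-id')

# Retrain the same blueprint on 85% of the data using another featurelist
model_job = model.retrain(sample_pct=85, featurelist_id='featurelist-id')
retrained_model = model_job.get_result_when_complete()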

set_prediction_threshold(threshold)

Set a custom prediction threshold for the model.

May not be used once prediction_threshold_read_only is True for this model.

Parameters:

threshold (float) – Only used for binary classification projects. The threshold to use when deciding between the positive and negative classes when making predictions. Should be between 0.0 and 1.0 (inclusive).
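
For example, a binary classification model whose threshold is still editable could be adjusted as follows (IDs and the threshold value are illustrative):

from datarobot import Model

model = Model.get('project-id', 'model-id')

if not model.prediction_threshold_read_only:
    # Raise the threshold so fewer rows are assigned to the positive class
    model.set_prediction_threshold(0.65)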

star_model()

Mark the model as starred.

Model stars propagate to the web application and the API, and can be used to filter when listing models.

Return type:

None

start_advanced_tuning_session()

Start an Advanced Tuning session. Returns an object that helps set up arguments for an Advanced Tuning model execution.

As of v2.17, all models other than blenders, open source, prime, baseline and user-created support Advanced Tuning.

Returns:

Session for setting up and running Advanced Tuning on a model

Return type:

AdvancedTuningSession
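
A sketch of the session workflow, assuming AdvancedTuningSession exposes get_task_names, get_parameter_names, set_parameter, and run as in recent client versions ('some_parameter' is a placeholder; use a name returned by get_parameter_names):

from datarobot import Model

model = Model.get('project-id', 'model-id')

session = model.start_advanced_tuning_session()

# Inspect which tasks and parameters can be tuned for this blueprint
task_names = session.get_task_names()
print(session.get_parameter_names(task_names[0]))

# Override one parameter value, then submit the tuned model for training
session.set_parameter(task_name=task_names[0], parameter_name='some_parameter', value=42)
model_job = session.run()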

train(sample_pct=None, featurelist_id=None, scoring_type=None, training_row_count=None, monotonic_increasing_featurelist_id=<object object>, monotonic_decreasing_featurelist_id=<object object>)

Train the blueprint used in model on a particular featurelist or amount of data.

This method creates a new training job and appends it to the end of the queue for this project. After the job has finished, you can get the newly trained model by retrieving it from the project leaderboard, or by retrieving the result of the job.

Either sample_pct or training_row_count can be used to specify the amount of data to use, but not both. If neither is specified, the default is the maximum amount of data that can safely be used to train any blueprint without going into the validation data.

In smart-sampled projects, sample_pct and training_row_count are assumed to be in terms of rows of the minority class.

Note

For datetime partitioned projects, see train_datetime instead.

Parameters:
  • sample_pct (Optional[float]) – The amount of data to use for training, as a percentage of the project dataset from 0 to 100.

  • featurelist_id (Optional[str]) – The identifier of the featurelist to use. If not defined, the featurelist of this model is used.

  • scoring_type (Optional[str]) – Either validation or crossValidation (also dr.SCORING_TYPE.validation or dr.SCORING_TYPE.cross_validation). validation is available for every partitioning type, and indicates that the default model validation should be used for the project. If the project uses a form of cross-validation partitioning, crossValidation can also be used to indicate that all of the available training/validation combinations should be used to evaluate the model.

  • training_row_count (Optional[int]) – The number of rows to use to train the requested model.

  • monotonic_increasing_featurelist_id (str) – (new in version 2.11) optional, the id of the featurelist that defines the set of features with a monotonically increasing relationship to the target. Passing None disables increasing monotonicity constraint. Default (dr.enums.MONOTONICITY_FEATURELIST_DEFAULT) is the one specified by the blueprint.

  • monotonic_decreasing_featurelist_id (str) – (new in version 2.11) optional, the id of the featurelist that defines the set of features with a monotonically decreasing relationship to the target. Passing None disables decreasing monotonicity constraint. Default (dr.enums.MONOTONICITY_FEATURELIST_DEFAULT) is the one specified by the blueprint.

Returns:

model_job_id – id of created job, can be used as parameter to ModelJob.get method or wait_for_async_model_creation function

Return type:

str

Examples

from datarobot import Model, Project

project = Project.get('project-id')
model = Model.get('project-id', 'model-id')

# Train this model's blueprint on the largest row count that stays out of the validation data
model_job_id = model.train(training_row_count=project.max_train_rows)

train_datetime(featurelist_id=None, training_row_count=None, training_duration=None, time_window_sample_pct=None, monotonic_increasing_featurelist_id=<object object>, monotonic_decreasing_featurelist_id=<object object>, use_project_settings=False, sampling_method=None, n_clusters=None)

Trains this model on a different featurelist or sample size.

Requires that this model is part of a datetime partitioned project; otherwise, an error will occur.

All durations should be specified with a duration string such as those returned by the partitioning_methods.construct_duration_string helper method. Please see datetime partitioned project documentation for more information on duration strings.

Parameters:
  • featurelist_id (Optional[str]) – the featurelist to use to train the model. If not specified, the featurelist of this model is used.

  • training_row_count (Optional[int]) – the number of rows of data that should be used to train the model. If specified, neither training_duration nor use_project_settings may be specified.

  • training_duration (Optional[str]) – a duration string specifying what time range the data used to train the model should span. If specified, neither training_row_count nor use_project_settings may be specified.

  • use_project_settings (Optional[bool]) – (New in version v2.20) defaults to False. If True, indicates that the custom backtest partitioning settings specified by the user will be used to train the model and evaluate backtest scores. If specified, neither training_row_count nor training_duration may be specified.

  • time_window_sample_pct (Optional[int]) – May only be specified when the requested model is trained on a time window (e.g. a duration or start and end dates). An integer between 1 and 99 indicating the percentage of rows to sample within the window. The rows kept are determined by a uniform random sample. If specified, training_duration must also be specified; otherwise an error will occur.

  • sampling_method (Optional[str]) – (New in version v2.23) Defines the way training data is selected. Can be either random or latest. In combination with training_row_count, defines how rows are selected from the backtest (latest by default). When training data is defined using a time range (training_duration or use_project_settings), this setting changes how time_window_sample_pct is applied (random by default). Applicable to OTV projects only.

  • monotonic_increasing_featurelist_id (Optional[str]) – (New in version v2.18) optional, the id of the featurelist that defines the set of features with a monotonically increasing relationship to the target. Passing None disables increasing monotonicity constraint. Default (dr.enums.MONOTONICITY_FEATURELIST_DEFAULT) is the one specified by the blueprint.

  • monotonic_decreasing_featurelist_id (Optional[str]) – (New in version v2.18) optional, the id of the featurelist that defines the set of features with a monotonically decreasing relationship to the target. Passing None disables decreasing monotonicity constraint. Default (dr.enums.MONOTONICITY_FEATURELIST_DEFAULT) is the one specified by the blueprint.

  • n_clusters (Optional[int]) – (New in version 2.27) number of clusters to use in an unsupervised clustering model. This parameter is used only for unsupervised clustering models that don’t automatically determine the number of clusters.

Returns:

job – the created job to build the model

Return type:

ModelJob
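
A sketch of retraining this blueprint in a datetime partitioned project using the project's backtest settings (IDs are placeholders):

from datarobot import Model

model = Model.get('project-id', 'model-id')

# Retrain using the custom backtest partitioning settings configured on the project
model_job = model.train_datetime(use_project_settings=True)
new_model = model_job.get_result_when_complete()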

train_incremental(data_stage_id, training_data_name=None, data_stage_encoding=None, data_stage_delimiter=None, data_stage_compression=None)

Submit a job to the queue to perform incremental training on an existing model using additional data. The ID of the additional data to use for training is specified with data_stage_id. Optionally, a name for the iteration can be supplied to help identify the contents of the data used in that iteration.

This functionality requires the INCREMENTAL_LEARNING feature flag to be enabled.

Parameters:
  • data_stage_id (str) – The id of the data stage to use for training.

  • training_data_name (Optional[str]) – The name of the iteration or data stage to indicate what the incremental learning was performed on.

  • data_stage_encoding (Optional[str]) – The encoding type of the data in the data stage (default: UTF-8). Supported formats: UTF-8, ASCII, WINDOWS1252

  • data_stage_delimiter (Optional[str]) – The delimiter used by the data in the data stage (default: ‘,’).

  • data_stage_compression (Optional[str]) – The compression type of the data stage file, e.g. ‘zip’ (default: None). Supported formats: zip

Returns:

job – The created job that is retraining the model

Return type:

ModelJob
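
A minimal sketch, assuming the INCREMENTAL_LEARNING feature flag is enabled and a data stage holding the additional rows already exists (the data stage ID and iteration name are placeholders):

from datarobot import Model

model = Model.get('project-id', 'model-id')

# Train one more iteration of the model on the data held in an existing data stage
model_job = model.train_incremental(
    'data-stage-id',
    training_data_name='Q1 additional rows',
)
updated_model = model_job.get_result_when_complete()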

unstar_model()

Unmark the model as starred.

Model stars propagate to the web application and the API, and can be used to filter when listing models.

Return type:

None

Combined models

See API reference for Combined Model in Segmented Modeling API Reference