Models

GenericModel

class datarobot.models.GenericModel(id=None, processes=None, featurelist_name=None, featurelist_id=None, project_id=None, sample_pct=None, training_row_count=None, training_duration=None, training_start_date=None, training_end_date=None, model_type=None, model_category=None, is_frozen=None, blueprint_id=None, metrics=None, is_starred=None, model_family=None, model_number=None, parent_model_id=None, data_selection_method=None, time_window_sample_pct=None, sampling_method=None, is_trained_into_validation=None, is_trained_into_holdout=None, number_of_clusters=None)

GenericModel (also called ModelRecord) is the object returned from the /modelRecords list route. It contains the most generic model information.

Model

class datarobot.models.Model(id=None, processes=None, featurelist_name=None, featurelist_id=None, project_id=None, sample_pct=None, model_type=None, model_category=None, is_frozen=None, is_n_clusters_dynamically_determined=None, blueprint_id=None, metrics=None, monotonic_increasing_featurelist_id=None, monotonic_decreasing_featurelist_id=None, n_clusters=None, has_empty_clusters=None, supports_monotonic_constraints=None, is_starred=None, prediction_threshold=None, prediction_threshold_read_only=None, model_number=None, parent_model_id=None, supports_composable_ml=None, training_row_count=None, training_duration=None, training_start_date=None, training_end_date=None, data_selection_method=None, time_window_sample_pct=None, sampling_method=None, model_family_full_name=None, is_trained_into_validation=None, is_trained_into_holdout=None)

A model trained on a project’s dataset capable of making predictions.

All durations are specified with a duration string such as those returned by the partitioning_methods.construct_duration_string helper method. See datetime partitioned project documentation for more information on duration strings.

Attributes:
idstr

ID of the model.

project_idstr

ID of the project the model belongs to.

processeslist of str

Processes used by the model.

featurelist_namestr

Name of the featurelist used by the model.

featurelist_idstr

ID of the featurelist used by the model.

sample_pctfloat or None

Percentage of the project dataset used in model training. If the project uses datetime partitioning, the sample_pct will be None. See training_row_count, training_duration, and training_start_date / training_end_date instead.

training_row_countint or None

Number of rows of the project dataset used in model training. In a datetime partitioned project, if specified, defines the number of rows used to train the model and evaluate backtest scores; if unspecified, either training_duration or training_start_date and training_end_date is used instead.

training_durationstr or None

For datetime partitioned projects only. If specified, defines the duration spanned by the data used to train the model and evaluate backtest scores.

training_start_datedatetime or None

For frozen models in datetime partitioned projects only. If specified, the start date of the data used to train the model.

training_end_datedatetime or None

For frozen models in datetime partitioned projects only. If specified, the end date of the data used to train the model.

model_typestr

Type of model, for example ‘Nystroem Kernel SVM Regressor’.

model_categorystr

Category of model, for example ‘prime’ for DataRobot Prime models, ‘blend’ for blender models, and ‘model’ for other models.

is_frozenbool

Whether this model is a frozen model.

is_n_clusters_dynamically_determinedbool

(New in version v2.27) Optional. Whether this model determines the number of clusters dynamically.

blueprint_idstr

ID of the blueprint used to build this model.

metricsdict

Mapping from each metric to the model’s score for that metric.

monotonic_increasing_featurelist_idstr

Optional. ID of the featurelist that defines the set of features with a monotonically increasing relationship to the target. If None, no such constraints are enforced.

monotonic_decreasing_featurelist_idstr

Optional. ID of the featurelist that defines the set of features with a monotonically decreasing relationship to the target. If None, no such constraints are enforced.

n_clustersint

(New in version v2.27) Optional. Number of data clusters discovered by the model.

has_empty_clusters: bool

(New in version v2.27) Optional. Whether the clustering model produces empty clusters.

supports_monotonic_constraintsbool

Optional. Whether this model supports enforcing monotonic constraints.

is_starredbool

Whether this model is marked as a starred model.

prediction_thresholdfloat

Binary classification projects only. Threshold used for predictions.

prediction_threshold_read_onlybool

Whether modification of the prediction threshold is forbidden. Threshold modification is forbidden once a model has had a deployment created or predictions made via the dedicated prediction API.

model_numberinteger

Model number assigned to the model.

parent_model_idstr or None

(New in version v2.20) ID of the model that tuning parameters are derived from.

supports_composable_mlbool or None

(New in version v2.26) Whether this model is supported in Composable ML.

classmethod get(project, model_id)

Retrieve a specific model.

Parameters:
projectstr

Project ID.

model_idstr

ID of the model to retrieve.

Returns:
modelModel

Queried instance.

Raises:
ValueError

The passed project parameter value is of an unsupported type.

Return type:

Model
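
Example (a minimal sketch; the project and model IDs below are placeholders):

import datarobot

model = datarobot.Model.get('project-id', 'model-id')
print(model.model_type, model.featurelist_name)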

advanced_tune(params, description=None)

Generate a new model with the specified advanced-tuning parameters

As of v2.17, all models other than blenders, open source, prime, baseline and user-created support Advanced Tuning.

Parameters:
paramsdict

Mapping of parameter ID to parameter value. The list of valid parameter IDs for a model can be found by calling get_advanced_tuning_parameters(). The request does not need to include values for all parameters; if a parameter is omitted, its current_value will be used.

descriptionstr

Human-readable string describing the newly advanced-tuned model

Returns:
ModelJob

The created job to build the model

Return type:

ModelJob
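
Example (a hedged sketch; the chosen parameter and value are purely illustrative and may not be valid for every model):

model = datarobot.Model.get('project-id', 'model-id')
tuning = model.get_advanced_tuning_parameters()
param_id = tuning['tuning_parameters'][0]['parameter_id']
# Omitted parameters keep their current_value
model_job = model.advanced_tune({param_id: 0.5}, description='single-parameter tune')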

cross_validate()

Run cross validation on the model.

Note

To perform Cross Validation on a new model with new parameters, use train instead.

Returns:
ModelJob

The created job to build the model
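
Example (a minimal sketch; assumes the project's partitioning supports cross validation and that the standard job helpers are available on the returned ModelJob):

model = datarobot.Model.get('project-id', 'model-id')
cv_job = model.cross_validate()
cv_model = cv_job.get_result_when_complete()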

delete()

Delete a model from the project’s leaderboard.

Return type:

None

download_scoring_code(file_name, source_code=False)

Download the Scoring Code JAR.

Parameters:
file_namestr

File path where scoring code will be saved.

source_codebool, optional

Set to True to download source code archive. It will not be executable.

Return type:

None
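
Example (a minimal sketch; Scoring Code availability depends on the model, and the file paths are placeholders):

model = datarobot.Model.get('project-id', 'model-id')
model.download_scoring_code('/tmp/model.jar')
# Source code archive variant (not executable)
model.download_scoring_code('/tmp/model-source.jar', source_code=True)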

download_training_artifact(file_name)

Retrieve trained artifact(s) from a model containing one or more custom tasks.

Artifact(s) will be downloaded to the specified local filepath.

Parameters:
file_namestr

File path where trained model artifact(s) will be saved.

classmethod from_data(data)

Instantiate an object of this class using a dict.

Parameters:
datadict

Correctly snake_cased keys and their values.

Return type:

TypeVar(T, bound= APIObject)

get_advanced_tuning_parameters()

Get the advanced-tuning parameters available for this model.

As of v2.17, all models other than blenders, open source, prime, baseline and user-created support Advanced Tuning.

Returns:
dict

A dictionary describing the advanced-tuning parameters for the current model. There are two top-level keys, tuning_description and tuning_parameters.

tuning_description is an optional value. If not None, it contains the user-specified description of this set of tuning parameters.

tuning_parameters is a list of dicts, each with the following keys:

  • parameter_name : (str) name of the parameter (unique per task, see below)

  • parameter_id : (str) opaque ID string uniquely identifying parameter

  • default_value : (*) the actual value used to train the model; either the single value of the parameter specified before training, or the best value from the list of grid-searched values (based on current_value)

  • current_value : (*) the single value or list of values of the parameter that were grid searched. Depending on the grid search specification, could be a single fixed value (no grid search), a list of discrete values, or a range.

  • task_name : (str) name of the task that this parameter belongs to

  • constraints: (dict) see the notes below

  • vertex_id: (str) ID of vertex that this parameter belongs to

Return type:

AdvancedTuningParamsType

Notes

The type of default_value and current_value is defined by the constraints structure. It will be a string or numeric Python type.

constraints is a dict with at least one, possibly more, of the following keys. The presence of a key indicates that the parameter may take on the specified type. (If a key is absent, this means that the parameter may not take on the specified type.) If a key on constraints is present, its value will be a dict containing all of the fields described below for that key.

"constraints": {
    "select": {
        "values": [<list(basestring or number) : possible values>]
    },
    "ascii": {},
    "unicode": {},
    "int": {
        "min": <int : minimum valid value>,
        "max": <int : maximum valid value>,
        "supports_grid_search": <bool : True if Grid Search may be
                                        requested for this param>
    },
    "float": {
        "min": <float : minimum valid value>,
        "max": <float : maximum valid value>,
        "supports_grid_search": <bool : True if Grid Search may be
                                        requested for this param>
    },
    "intList": {
        "min_length": <int : minimum valid length>,
        "max_length": <int : maximum valid length>
        "min_val": <int : minimum valid value>,
        "max_val": <int : maximum valid value>
        "supports_grid_search": <bool : True if Grid Search may be
                                        requested for this param>
    },
    "floatList": {
        "min_length": <int : minimum valid length>,
        "max_length": <int : maximum valid length>
        "min_val": <float : minimum valid value>,
        "max_val": <float : maximum valid value>
        "supports_grid_search": <bool : True if Grid Search may be
                                        requested for this param>
    }
}

The keys have meaning as follows:

  • select: Rather than specifying a specific data type, if present, it indicates that the parameter is permitted to take on any of the specified values. Listed values may be of any string or real (non-complex) numeric type.

  • ascii: The parameter may be a unicode object that encodes simple ASCII characters. (A-Z, a-z, 0-9, whitespace, and certain common symbols.) In addition to listed constraints, ASCII keys currently may not contain either newlines or semicolons.

  • unicode: The parameter may be any Python unicode object.

  • int: The value may be an object of type int within the specified range (inclusive). Please note that the value will be passed around using the JSON format, and some JSON parsers have undefined behavior with integers outside of the range [-(2**53)+1, (2**53)-1].

  • float: The value may be an object of type float within the specified range (inclusive).

  • intList, floatList: The value may be a list of int or float objects, respectively, following constraints as specified respectively by the int and float types (above).

Many parameters only specify one key under constraints. If a parameter specifies multiple keys, the parameter may take on any value permitted by any key.
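
Example (a minimal sketch for inspecting the tuning space; the keys used are those documented above):

model = datarobot.Model.get('project-id', 'model-id')
params = model.get_advanced_tuning_parameters()
print(params['tuning_description'])
for p in params['tuning_parameters']:
    print(p['task_name'], p['parameter_name'], p['current_value'], sorted(p['constraints']))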

get_all_confusion_charts(fallback_to_parent_insights=False)

Retrieve a list of all confusion matrices available for the model.

Parameters:
fallback_to_parent_insightsbool

(New in version v2.14) Optional, if True, this will return confusion chart data for this model's parent for any source that is not available for this model and if this model has a defined parent model. If omitted or False, or this model has no parent, this will not attempt to retrieve any data from this model's parent.

Returns:
list of ConfusionChart

Data for all available confusion charts for model.

get_all_feature_impacts(data_slice_filter=None)

Retrieve a list of all feature impact results available for the model.

Parameters:
data_slice_filterDataSlice, optional

A dataslice used to filter the return values based on the dataslice.id. By default, this function will use data_slice_filter.id == None, which returns an unsliced insight. If data_slice_filter is None then no data_slice filtering will be applied when requesting the feature impact results.

Returns:
list of dicts

Data for all available model feature impacts, or an empty list if no data is found.

Examples

model = datarobot.Model(id='model-id', project_id='project-id')

# Get feature impact insights for sliced data
data_slice = datarobot.DataSlice(id='data-slice-id')
sliced_fi = model.get_all_feature_impacts(data_slice_filter=data_slice)

# Get feature impact insights for unsliced data
data_slice = datarobot.DataSlice()
unsliced_fi = model.get_all_feature_impacts(data_slice_filter=data_slice)

# Get all feature impact insights
all_fi = model.get_all_feature_impacts()
get_all_lift_charts(fallback_to_parent_insights=False, data_slice_filter=None)

Retrieve a list of all Lift charts available for the model.

Parameters:
fallback_to_parent_insightsbool, optional

(New in version v2.14) Optional, if True, this will return lift chart data for this model’s parent for any source that is not available for this model and if this model has a defined parent model. If omitted or False, or this model has no parent, this will not attempt to retrieve any data from this model’s parent.

data_slice_filterDataSlice, optional

Filters the returned lift chart by data_slice_filter.id. If None (the default) applies no filter based on data_slice_id.

Returns:
list of LiftChart

Data for all available model lift charts, or an empty list if no data is found.

Examples

model = datarobot.Model.get('project-id', 'model-id')

# Get lift chart insights for sliced data
sliced_lift_charts = model.get_all_lift_charts(data_slice_filter=datarobot.DataSlice(id='data-slice-id'))

# Get lift chart insights for unsliced data
unsliced_lift_charts = model.get_all_lift_charts(data_slice_filter=datarobot.DataSlice(id=None))

# Get all lift chart insights
all_lift_charts = model.get_all_lift_charts()
get_all_multiclass_lift_charts(fallback_to_parent_insights=False)

Retrieve a list of all Lift charts available for the model.

Parameters:
fallback_to_parent_insightsbool

(New in version v2.14) Optional, if True, this will return lift chart data for this model’s parent for any source that is not available for this model and if this model has a defined parent model. If omitted or False, or this model has no parent, this will not attempt to retrieve any data from this model’s parent.

Returns:
list of LiftChart

Data for all available model lift charts.

get_all_residuals_charts(fallback_to_parent_insights=False, data_slice_filter=None)

Retrieve a list of all residuals charts available for the model.

Parameters:
fallback_to_parent_insightsbool

Optional, if True, this will return residuals chart data for this model’s parent for any source that is not available for this model and if this model has a defined parent model. If omitted or False, or this model has no parent, this will not attempt to retrieve any data from this model’s parent.

data_slice_filterDataSlice, optional

Filters the returned residuals charts by data_slice_filter.id. If None (the default) applies no filter based on data_slice_id.

Returns:
list of ResidualsChart

Data for all available model residuals charts.

Examples

model = datarobot.Model.get('project-id', 'model-id')

# Get residuals chart insights for sliced data
sliced_residuals_charts = model.get_all_residuals_charts(data_slice_filter=datarobot.DataSlice(id='data-slice-id'))

# Get residuals chart insights for unsliced data
unsliced_residuals_charts = model.get_all_residuals_charts(data_slice_filter=datarobot.DataSlice(id=None))

# Get all residuals chart insights
all_residuals_charts = model.get_all_residuals_charts()
get_all_roc_curves(fallback_to_parent_insights=False, data_slice_filter=None)

Retrieve a list of all ROC curves available for the model.

Parameters:
fallback_to_parent_insightsbool

(New in version v2.14) Optional, if True, this will return ROC curve data for this model’s parent for any source that is not available for this model and if this model has a defined parent model. If omitted or False, or this model has no parent, this will not attempt to retrieve any data from this model’s parent.

data_slice_filterDataSlice, optional

Filters the returned ROC curves by data_slice_filter.id. If None (the default) applies no filter based on data_slice_id.

Returns:
list of RocCurve

Data for all available model ROC curves. Or an empty list if no RocCurves are found.

Examples

model = datarobot.Model.get('project-id', 'model-id')
ds_filter = DataSlice(id='data-slice-id')

# Get roc curve insights for sliced data
sliced_roc = model.get_all_roc_curves(data_slice_filter=ds_filter)

# Get roc curve insights for unsliced data
ds_filter = DataSlice(id=None)
unsliced_roc = model.get_all_roc_curves(data_slice_filter=ds_filter)

# Get all roc curve insights
all_roc_curves = model.get_all_roc_curves()
get_confusion_chart(source, fallback_to_parent_insights=False)

Retrieve a multiclass model’s confusion matrix for the specified source.

Parameters:
sourcestr

Confusion chart source. Check datarobot.enums.CHART_DATA_SOURCE for possible values.

fallback_to_parent_insightsbool

(New in version v2.14) Optional, if True, this will return confusion chart data for this model's parent if the confusion chart is not available for this model and the model has a defined parent model. If omitted or False, or there is no parent model, will not attempt to return insight data from this model's parent.

Returns:
ConfusionChart

Model ConfusionChart data

Raises:
ClientError

If the insight is not available for this model

get_cross_class_accuracy_scores()

Retrieves a list of Cross Class Accuracy scores for the model.

Returns:
json
get_cross_validation_scores(partition=None, metric=None)

Return a dictionary, keyed by metric, showing cross validation scores per partition.

Cross Validation should already have been performed using cross_validate or train.

Note

Models that computed cross validation before this feature was added will need to be deleted and retrained before this method can be used.

Parameters:
partitionfloat

optional, the ID of the partition (1, 2, 3.0, 4.0, etc.) to filter results by; can be a positive whole number or a float value. 0 corresponds to the validation partition.

metric: unicode

optional, name of the metric to filter the resulting cross validation scores by

Returns:
cross_validation_scores: dict

A dictionary keyed by metric showing cross validation scores per partition.
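
Example (a minimal sketch; 'AUC' is an illustrative metric name that assumes a binary classification project):

model = datarobot.Model.get('project-id', 'model-id')
scores = model.get_cross_validation_scores(metric='AUC')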

get_data_disparity_insights(feature, class_name1, class_name2)

Retrieve a list of Cross Class Data Disparity insights for the model.

Parameters:
featurestr

Bias and Fairness protected feature name.

class_name1str

One of the compared classes

class_name2str

Another compared class

Returns:
json
get_fairness_insights(fairness_metrics_set=None, offset=0, limit=100)

Retrieve a list of Per Class Bias insights for the model.

Parameters:
fairness_metrics_setstr, optional

Can be one of <datarobot.enums.FairnessMetricsSet>. The fairness metric used to calculate the fairness scores.

offsetint, optional

Number of items to skip.

limitint, optional

Number of items to return.

Returns:
json
get_feature_effect(source, data_slice_id=None)

Retrieve Feature Effects for the model.

Feature Effects provides partial dependence and predicted vs actual values for top-500 features ordered by feature impact score.

The partial dependence shows marginal effect of a feature on the target variable after accounting for the average effects of all other predictive features. It indicates how, holding all other variables except the feature of interest as they were, the value of this feature affects your prediction.

Requires that Feature Effects has already been computed with request_feature_effect.

See get_feature_effect_metadata for retrieving information on the available sources.

Parameters:
sourcestring

The source Feature Effects are retrieved for.

data_slice_idstring, optional

ID for the data slice used in the request. If None, retrieve unsliced insight data.

Returns:
feature_effectsFeatureEffects

The feature effects data.

Raises:
ClientError (404)

If the feature effects have not been computed or source is not a valid value.
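
Example (a minimal sketch; assumes Feature Effects were already computed with request_feature_effect and that 'training' is an available source):

model = datarobot.Model.get('project-id', 'model-id')
feature_effects = model.get_feature_effect(source='training')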

get_feature_effect_metadata()

Retrieve Feature Effects metadata. Response contains status and available model sources.

  • Feature Effect for the training partition is always available, with the exception of older projects that only supported Feature Effect for validation.

  • When a model is trained into validation or holdout without stacked predictions (i.e., no out-of-sample predictions in those partitions), Feature Effects is not available for validation or holdout.

  • Feature Effects for holdout is not available when holdout was not unlocked for the project.

Use source to retrieve Feature Effects, selecting one of the provided sources.

Returns:
feature_effect_metadata: FeatureEffectMetadata
get_feature_effects_multiclass(source='training', class_=None)

Retrieve Feature Effects for the multiclass model.

Feature Effects provide partial dependence and predicted vs actual values for top-500 features ordered by feature impact score.

The partial dependence shows marginal effect of a feature on the target variable after accounting for the average effects of all other predictive features. It indicates how, holding all other variables except the feature of interest as they were, the value of this feature affects your prediction.

Requires that Feature Effects has already been computed with request_feature_effect.

See get_feature_effect_metadata for retrieving information on the available sources.

Parameters:
sourcestr

The source Feature Effects are retrieved for.

class_str or None

The class name Feature Effects are retrieved for.

Returns:
list

The list of multiclass feature effects.

Raises:
ClientError (404)

If Feature Effects have not been computed or source is not a valid value.

get_feature_impact(with_metadata=False, data_slice_filter=<datarobot.models.model.Sentinel object>)

Retrieve the computed Feature Impact results, a measure of the relevance of each feature in the model.

Feature Impact is computed for each column by creating new data with that column randomly permuted (but the others left unchanged), and seeing how the error metric score for the predictions is affected. The ‘impactUnnormalized’ is how much worse the error metric score is when making predictions on this modified data. The ‘impactNormalized’ is normalized so that the largest value is 1. In both cases, larger values indicate more important features.

If a feature is a redundant feature, i.e. once other features are considered it doesn’t contribute much in addition, the ‘redundantWith’ value is the name of feature that has the highest correlation with this feature. Note that redundancy detection is only available for jobs run after the addition of this feature. When retrieving data that predates this functionality, a NoRedundancyImpactAvailable warning will be used.

Elsewhere this technique is sometimes called ‘Permutation Importance’.

Requires that Feature Impact has already been computed with request_feature_impact.

Parameters:
with_metadatabool

The flag indicating if the result should include the metadata as well.

data_slice_filterDataSlice, optional

A dataslice used to filter the return values based on the dataslice.id. By default, this function will use data_slice_filter.id == None which returns an unsliced insight. If data_slice_filter is None then get_feature_impact will raise a ValueError.

Returns:
list or dict

The feature impact data response depends on the with_metadata parameter. The response is either a dict with metadata and a list with actual data or just a list with that data.

Each list item is a dict with the keys featureName, impactNormalized, impactUnnormalized, redundantWith, and count.

For dict response available keys are:

  • featureImpacts - Feature Impact data. Each item is a dict with the keys featureName, impactNormalized, impactUnnormalized, and redundantWith.

  • shapBased - A boolean that indicates whether Feature Impact was calculated using Shapley values.

  • ranRedundancyDetection - A boolean that indicates whether redundant feature identification was run while calculating this Feature Impact.

  • rowCount - An integer or None that indicates the number of rows used to calculate Feature Impact. For Feature Impact calculated with the default logic, without specifying rowCount, None is returned here.

  • count - An integer with the number of features under featureImpacts.

Raises:
ClientError (404)

If the feature impacts have not been computed.

ValueError

If data_slice_filter is passed as None
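
Example (a minimal sketch; assumes Feature Impact was already computed with request_feature_impact):

model = datarobot.Model.get('project-id', 'model-id')
feature_impacts = model.get_feature_impact()
for item in feature_impacts:
    print(item['featureName'], item['impactNormalized'], item['redundantWith'])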

get_features_used()

Query the server to determine which features were used.

Note that the data returned by this method is possibly different than the names of the features in the featurelist used by this model. This method will return the raw features that must be supplied in order for predictions to be generated on a new set of data. The featurelist, in contrast, would also include the names of derived features.

Returns:
featureslist of str

The names of the features used in the model.

Return type:

List[str]

get_frozen_child_models()

Retrieve the IDs for all models that are frozen from this model.

Returns:
A list of Models
get_labelwise_roc_curves(source, fallback_to_parent_insights=False)

Retrieve a list of LabelwiseRocCurve instances for a multilabel model for the given source and all labels. This method is valid only for multilabel projects. For binary projects, use the Model.get_roc_curve API.

Added in version v2.24.

Parameters:
sourcestr

ROC curve data source. Check datarobot.enums.CHART_DATA_SOURCE for possible values.

fallback_to_parent_insightsbool

Optional, if True, this will return ROC curve data for this model’s parent if the ROC curve is not available for this model and the model has a defined parent model. If omitted or False, or there is no parent model, will not attempt to return data from this model’s parent.

Returns:
list of LabelwiseRocCurve

Labelwise ROC Curve instances for source and all labels

Raises:
ClientError

If the insight is not available for this model

get_lift_chart(source, fallback_to_parent_insights=False, data_slice_filter=<datarobot.models.model.Sentinel object>)

Retrieve the model Lift chart for the specified source.

Parameters:
sourcestr

Lift chart data source. Check datarobot.enums.CHART_DATA_SOURCE for possible values. (New in version v2.23) For time series and OTV models, also accepts values backtest_2, backtest_3, …, up to the number of backtests in the model.

fallback_to_parent_insightsbool

(New in version v2.14) Optional, if True, this will return lift chart data for this model’s parent if the lift chart is not available for this model and the model has a defined parent model. If omitted or False, or there is no parent model, will not attempt to return insight data from this model’s parent.

data_slice_filterDataSlice, optional

A dataslice used to filter the return values based on the dataslice.id. By default this function will use data_slice_filter.id == None which returns an unsliced insight. If data_slice_filter is None then get_lift_chart will raise a ValueError.

Returns:
LiftChart

Model lift chart data

Raises:
ClientError

If the insight is not available for this model

ValueError

If data_slice_filter is passed as None
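
Example (a minimal sketch; 'validation' is assumed to be one of the sources listed in datarobot.enums.CHART_DATA_SOURCE):

model = datarobot.Model.get('project-id', 'model-id')
lift_chart = model.get_lift_chart('validation')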

get_missing_report_info()

Retrieve a report on missing training data that can be used to understand missing values treatment in the model. The report consists of missing value resolutions for the numeric and categorical features that were part of building the model.

Returns:
An iterable of MissingReportPerFeature

The queried model missing report, sorted by missing count (DESCENDING order).

get_model_blueprint_chart()

Retrieve a diagram that can be used to understand data flow in the blueprint.

Returns:
ModelBlueprintChart

The queried model blueprint chart.

get_model_blueprint_documents()

Get documentation for tasks used in this model.

Returns:
list of BlueprintTaskDocument

All documents available for the model.

get_model_blueprint_json()

Get the blueprint json representation used by this model.

Returns:
BlueprintJson

Json representation of the blueprint stages.

Return type:

Dict[str, Tuple[List[str], List[str], str]]

get_multiclass_feature_impact()

For multiclass it’s possible to calculate feature impact separately for each target class. The method for calculation is exactly the same, calculated in one-vs-all style for each target class.

Requires that Feature Impact has already been computed with request_feature_impact.

Returns:
feature_impactslist of dict

The feature impact data. Each item is a dict with the keys 'featureImpacts' (list) and 'class' (str). Each item in 'featureImpacts' is a dict with the keys 'featureName', 'impactNormalized', 'impactUnnormalized', and 'redundantWith'.

Raises:
ClientError (404)

If the multiclass feature impacts have not been computed.

get_multiclass_lift_chart(source, fallback_to_parent_insights=False)

Retrieve model Lift chart for the specified source.

Parameters:
sourcestr

Lift chart data source. Check datarobot.enums.CHART_DATA_SOURCE for possible values.

fallback_to_parent_insightsbool

Optional, if True, this will return lift chart data for this model’s parent if the lift chart is not available for this model and the model has a defined parent model. If omitted or False, or there is no parent model, will not attempt to return insight data from this model’s parent.

Returns:
list of LiftChart

Model lift chart data for each saved target class

Raises:
ClientError

If the insight is not available for this model

get_multilabel_lift_charts(source, fallback_to_parent_insights=False)

Retrieve model Lift charts for the specified source.

Added in version v2.24.

Parameters:
sourcestr

Lift chart data source. Check datarobot.enums.CHART_DATA_SOURCE for possible values.

fallback_to_parent_insightsbool

Optional, if True, this will return lift chart data for this model’s parent if the lift chart is not available for this model and the model has a defined parent model. If omitted or False, or there is no parent model, will not attempt to return insight data from this model’s parent.

Returns:
list of LiftChart

Model lift chart data for each saved target class

Raises:
ClientError

If the insight is not available for this model

get_num_iterations_trained()

Retrieves the number of estimators trained by early-stopping tree-based models.

Added in version v2.22.

Returns:
projectId: str

id of project containing the model

modelId: str

id of the model

data: array

list of numEstimatorsItem objects, one for each modeling stage.

numEstimatorsItem will be of the form:
stage: str

indicates the modeling stage (for multi-stage models); None for single-stage models

numIterations: int

the number of estimators or iterations trained by the model

get_or_request_feature_effect(source, max_wait=600, row_count=None, data_slice_id=None)

Retrieve Feature Effects for the model, requesting a new job if it hasn’t been run previously.

See get_feature_effect_metadata for retrieving information on the available sources.

Parameters:
sourcestring

The source Feature Effects are retrieved for.

max_waitint, optional

The maximum time to wait for a requested Feature Effect job to complete before erroring.

row_countint, optional

(New in version v2.21) The sample size to use for Feature Effects computation. Minimum is 10 rows. Maximum is 100000 rows or the training sample size of the model, whichever is less.

data_slice_idstr, optional

ID for the data slice used in the request. If None, request unsliced insight data.

Returns:
feature_effectsFeatureEffects

The Feature Effects data.

get_or_request_feature_effects_multiclass(source, top_n_features=None, features=None, row_count=None, class_=None, max_wait=600)

Retrieve Feature Effects for the multiclass model, requesting a job if it hasn’t been run previously.

Parameters:
sourcestring

The source Feature Effects are retrieved for.

class_str or None

The class name Feature Effects are retrieved for.

row_countint

The number of rows from dataset to use for Feature Impact calculation.

top_n_featuresint or None

Number of top features (ranked by Feature Impact) used to calculate Feature Effects.

featureslist or None

The list of features used to calculate Feature Effects.

max_waitint, optional

The maximum time to wait for a requested Feature Effects job to complete before erroring.

Returns:
feature_effectslist of FeatureEffectsMulticlass

The list of multiclass feature effects data.

get_or_request_feature_impact(max_wait=600, **kwargs)

Retrieve feature impact for the model, requesting a job if it hasn’t been run previously

Parameters:
max_waitint, optional

The maximum time to wait for a requested feature impact job to complete before erroring

**kwargs

Arbitrary keyword arguments passed to request_feature_impact.

Returns:
feature_impactslist or dict

The feature impact data. See get_feature_impact for the exact schema.
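
Example (a minimal sketch; keyword arguments are forwarded to request_feature_impact):

model = datarobot.Model.get('project-id', 'model-id')
feature_impacts = model.get_or_request_feature_impact(max_wait=1200, with_metadata=True)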

get_parameters()

Retrieve model parameters.

Returns:
ModelParameters

Model parameters for this model.

get_pareto_front()

Retrieve the Pareto Front for a Eureqa model.

This method is only supported for Eureqa models.

Returns:
ParetoFront

Model ParetoFront data

get_prime_eligibility()

Check if this model can be approximated with DataRobot Prime

Returns:
prime_eligibilitydict

a dict indicating whether a model can be approximated with DataRobot Prime (key can_make_prime) and why it may be ineligible (key message)

get_residuals_chart(source, fallback_to_parent_insights=False, data_slice_filter=<datarobot.models.model.Sentinel object>)

Retrieve model residuals chart for the specified source.

Parameters:
sourcestr

Residuals chart data source. Check datarobot.enums.CHART_DATA_SOURCE for possible values.

fallback_to_parent_insightsbool

Optional, if True, this will return residuals chart data for this model’s parent if the residuals chart is not available for this model and the model has a defined parent model. If omitted or False, or there is no parent model, will not attempt to return residuals data from this model’s parent.

data_slice_filterDataSlice, optional

A dataslice used to filter the return values based on the dataslice.id. By default this function will use data_slice_filter.id == None which returns an unsliced insight. If data_slice_filter is None then get_residuals_chart will raise a ValueError.

Returns:
ResidualsChart

Model residuals chart data

Raises:
ClientError

If the insight is not available for this model

ValueError

If data_slice_filter is passed as None

get_roc_curve(source, fallback_to_parent_insights=False, data_slice_filter=<datarobot.models.model.Sentinel object>)

Retrieve the ROC curve for a binary model for the specified source. This method is valid only for binary projects. For multilabel projects, use Model.get_labelwise_roc_curves.

Parameters:
sourcestr

ROC curve data source. Check datarobot.enums.CHART_DATA_SOURCE for possible values. (New in version v2.23) For time series and OTV models, also accepts values backtest_2, backtest_3, …, up to the number of backtests in the model.

fallback_to_parent_insightsbool

(New in version v2.14) Optional, if True, this will return ROC curve data for this model’s parent if the ROC curve is not available for this model and the model has a defined parent model. If omitted or False, or there is no parent model, will not attempt to return data from this model’s parent.

data_slice_filterDataSlice, optional

A dataslice used to filter the return values based on the dataslice.id. By default this function will use data_slice_filter.id == None which returns an unsliced insight. If data_slice_filter is None then get_roc_curve will raise a ValueError.

Returns:
RocCurve

Model ROC curve data

Raises:
ClientError

If the insight is not available for this model

(New in version v3.0) TypeError

If the underlying project type is multilabel

ValueError

If data_slice_filter is passed as None
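
Example (a minimal sketch for a binary classification model; 'validation' is assumed to be an available source in datarobot.enums.CHART_DATA_SOURCE):

model = datarobot.Model.get('project-id', 'model-id')
roc = model.get_roc_curve('validation')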

get_rulesets()

List the rulesets approximating this model generated by DataRobot Prime

If this model hasn’t been approximated yet, will return an empty list. Note that these are rulesets approximating this model, not rulesets used to construct this model.

Returns:
rulesetslist of Ruleset
Return type:

List[Ruleset]

get_supported_capabilities()

Retrieves a summary of the capabilities supported by a model.

Added in version v2.14.

Returns:
supportsBlending: bool

whether the model supports blending

supportsMonotonicConstraints: bool

whether the model supports monotonic constraints

hasWordCloud: bool

whether the model has word cloud data available

eligibleForPrime: bool

(Deprecated in version v3.6) whether the model is eligible for Prime

hasParameters: bool

whether the model has parameters that can be retrieved

supportsCodeGeneration: bool

(New in version v2.18) whether the model supports code generation

supportsShap: bool

(New in version v2.18) True if the model supports the SHAP package, i.e., Shapley-based feature importance.

supportsEarlyStopping: bool

(New in version v2.22) True if this is an early stopping tree-based model and number of trained iterations can be retrieved.

get_uri()
Returns:
urlstr

Permanent static hyperlink to this model in the leaderboard.

Return type:

str

get_word_cloud(exclude_stop_words=False)

Retrieve word cloud data for the model.

Parameters:
exclude_stop_wordsbool, optional

Set to True if you want stopwords filtered out of the response.

Returns:
WordCloud

Word cloud data for the model.

incremental_train(data_stage_id, training_data_name=None)

Submit a job to the queue to perform incremental training on an existing model. See train_incremental documentation.

Return type:

ModelJob

classmethod list(project_id, sort_by_partition='validation', sort_by_metric=None, with_metric=None, search_term=None, featurelists=None, families=None, blueprints=None, labels=None, characteristics=None, training_filters=None, number_of_clusters=None, limit=100, offset=0)

Retrieve paginated model records, sorted by scores, with optional filtering.

Parameters:
sort_by_partition: str, one of `validation`, `backtesting`, `crossValidation` or `holdout`

Set the partition to use for sorted (by score) list of models. validation is the default.

sort_by_metric: str

Set the project metric to use for model sorting. DataRobot-selected project optimization metric is the default.

with_metric: str

For a single-metric list of results, specify that project metric.

search_term: str

If specified, only models containing the term in their name or processes are returned.

featurelists: list of str

If specified, only models trained on selected featurelists are returned.

families: list of str

If specified, only models belonging to selected families are returned.

blueprints: list of str

If specified, only models trained on specified blueprint IDs are returned.

labels: list of str, `starred` or `prepared for deployment`

If specified, only models tagged with all listed labels are returned.

characteristics: list of str

If specified, only models matching all listed characteristics are returned.

training_filters: list of str

If specified, only models matching at least one of the listed training conditions are returned. The following formats are supported.

For autoML and datetime partitioned projects:

  • number of rows in the training subset

For datetime partitioned projects:

  • <training duration>, for example P6Y0M0D

  • <training_duration>-<time_window_sample_percent>-<sampling_method>, for example P6Y0M0D-78-Random (returns models trained on 6 years of data, with a 78% sampling rate and random sampling)

  • start/end date

  • project settings

number_of_clusters: list of int

Filter models by number of clusters. Applicable only in unsupervised clustering projects.

limit: int
offset: int
Returns:
generic_models: list of GenericModel
Return type:

List[GenericModel]
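
Example (a minimal sketch; the project ID and search term are placeholders):

models = datarobot.Model.list('project-id', search_term='Gradient', limit=20)
for m in models:
    print(m.model_number, m.model_type)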

open_in_browser()

Opens the relevant web page for this object in the default browser. If a default browser is not available, the URL is logged.

Note: If text-mode browsers are used, the calling process will block until the user exits the browser.

Return type:

None

request_approximation()

Request an approximation of this model using DataRobot Prime

This will create several rulesets that could be used to approximate this model. After comparing their scores and rule counts, the code used in the approximation can be downloaded and run locally.

Returns:
jobJob

the job generating the rulesets

request_cross_class_accuracy_scores()

Request Cross Class Accuracy scores to be computed for the model.

Returns:
status_idstr

A statusId of computation request.

request_data_disparity_insights(feature, compared_class_names)

Request data disparity insights to be computed for the model.

Parameters:
featurestr

Bias and Fairness protected feature name.

compared_class_nameslist(str)

List of two classes to compare

Returns:
status_idstr

A statusId of computation request.

request_external_test(dataset_id, actual_value_column=None)

Request external test to compute scores and insights on an external test dataset

Parameters:
dataset_idstring

The dataset to make predictions against (as uploaded from Project.upload_dataset)

actual_value_columnstring, optional

(New in version v2.21) For time series unsupervised projects only. Actual value column can be used to calculate the classification metrics and insights on the prediction dataset. Can’t be provided with the forecast_point parameter.

Returns:
jobJob

a Job representing external dataset insights computation

request_fairness_insights(fairness_metrics_set=None)

Request fairness insights to be computed for the model.

Parameters:
fairness_metrics_setstr, optional

Can be one of <datarobot.enums.FairnessMetricsSet>. The fairness metric used to calculate the fairness scores.

Returns:
status_idstr

A statusId of computation request.

Return type:

str

request_feature_effect(row_count=None, data_slice_id=None)

Submit request to compute Feature Effects for the model.

See get_feature_effect for more information on the result of the job.

Parameters:
row_countint

(New in version v2.21) The sample size to use for Feature Effects computation. Minimum is 10 rows. Maximum is 100000 rows or the training sample size of the model, whichever is less.

data_slice_idstr, optional

ID for the data slice used in the request. If None, request unsliced insight data.

Returns:
jobJob

A Job representing the feature effect computation. To get the completed feature effect data, use job.get_result or job.get_result_when_complete.

Raises:
JobAlreadyRequested (422)

If the Feature Effects have already been requested.

request_feature_effects_multiclass(row_count=None, top_n_features=None, features=None)

Request Feature Effects computation for the multiclass model.

See get_feature_effect for more information on the result of the job.

Parameters:
row_countint

The number of rows from dataset to use for Feature Impact calculation.

top_n_featuresint or None

Number of top features (ranked by feature impact) used to calculate Feature Effects.

featureslist or None

The list of features used to calculate Feature Effects.

Returns:
jobJob

A Job representing Feature Effect computation. To get the completed Feature Effect data, use job.get_result or job.get_result_when_complete.

request_feature_impact(row_count=None, with_metadata=False, data_slice_id=None)

Request feature impacts to be computed for the model.

See get_feature_impact for more information on the result of the job.

Parameters:
row_countint, optional

The sample size (specified in rows) to use for Feature Impact computation. This is not supported for unsupervised, multiclass (which has a separate method), and time series projects.

with_metadatabool, optional

Flag indicating whether the result should include the metadata. If true, metadata is included.

data_slice_idstr, optional

ID for the data slice used in the request. If None, request unsliced insight data.

Returns:
jobJob or status_id

Job representing the Feature Impact computation. To retrieve the completed Feature Impact data, use job.get_result or job.get_result_when_complete.

Raises:
JobAlreadyRequested (422)

If the feature impacts have already been requested.
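
Example (a minimal sketch; the row count is illustrative):

model = datarobot.Model.get('project-id', 'model-id')
job = model.request_feature_impact(row_count=5000, with_metadata=True)
feature_impact = job.get_result_when_complete()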

request_frozen_datetime_model(training_row_count=None, training_duration=None, training_start_date=None, training_end_date=None, time_window_sample_pct=None, sampling_method=None)

Train a new frozen model with parameters from this model.

Requires that this model belongs to a datetime partitioned project. If it does not, an error will occur when submitting the job.

Frozen models use the same tuning parameters as their parent model instead of independently optimizing them to allow efficiently retraining models on larger amounts of the training data.

In addition to training_row_count and training_duration, frozen datetime models may be trained on an exact date range. Only one of training_row_count, training_duration, or training_start_date and training_end_date should be specified.

Models specified using training_start_date and training_end_date are the only ones that can be trained into the holdout data (once the holdout is unlocked).

All durations should be specified with a duration string such as those returned by the partitioning_methods.construct_duration_string helper method. Please see datetime partitioned project documentation for more information on duration strings.

Parameters:
training_row_countint, optional

the number of rows of data that should be used to train the model. If specified, training_duration may not be specified.

training_durationstr, optional

a duration string specifying what time range the data used to train the model should span. If specified, training_row_count may not be specified.

training_start_datedatetime.datetime, optional

the start date of the data to train to model on. Only rows occurring at or after this datetime will be used. If training_start_date is specified, training_end_date must also be specified.

training_end_datedatetime.datetime, optional

the end date of the data to train the model on. Only rows occurring strictly before this datetime will be used. If training_end_date is specified, training_start_date must also be specified.

time_window_sample_pctint, optional

may only be specified when the requested model is trained on a time window (e.g., a duration or start and end dates). An integer between 1 and 99 indicating the percentage to sample by within the window. The points kept are determined by a random uniform sample. If specified, training_duration must also be specified; otherwise an error will occur.

sampling_methodstr, optional

(New in version v2.23) defines the way training data is selected. Can be either random or latest. In combination with training_row_count defines how rows are selected from backtest (latest by default). When training data is defined using time range (training_duration or use_project_settings) this setting changes the way time_window_sample_pct is applied (random by default). Applicable to OTV projects only.

Returns:
model_jobModelJob

the modeling job training a frozen model

Return type:

ModelJob
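
Example (a hedged sketch; assumes the construct_duration_string helper referenced above can be imported from datarobot.helpers.partitioning_methods):

from datarobot.helpers.partitioning_methods import construct_duration_string

model = datarobot.Model.get('project-id', 'model-id')
duration = construct_duration_string(years=1)
model_job = model.request_frozen_datetime_model(training_duration=duration)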

request_frozen_model(sample_pct=None, training_row_count=None)

Train a new frozen model with parameters from this model.

Note

This method only works if the project the model belongs to is not datetime partitioned. If it is, use request_frozen_datetime_model instead.

Frozen models use the same tuning parameters as their parent model instead of independently optimizing them to allow efficiently retraining models on larger amounts of the training data.

Parameters:
sample_pctfloat

optional, the percentage of the dataset to use with the model. If not provided, will use the value from this model.

training_row_countint

(New in version v2.9) optional, the integer number of rows of the dataset to use with the model. Only one of sample_pct and training_row_count should be specified.

Returns:
model_jobModelJob

the modeling job training a frozen model

request_lift_chart(source, data_slice_id=None)

Request the model Lift Chart for the specified source.

Parameters:
sourcestr

Lift chart data source. Check datarobot.enums.CHART_DATA_SOURCE for possible values.

data_slice_idstring, optional

ID for the data slice used in the request. If None, request unsliced insight data.

Returns:
status_check_jobStatusCheckJob

Object contains all needed logic for a periodical status check of an async job.

Return type:

StatusCheckJob

request_per_class_fairness_insights(fairness_metrics_set=None)

Request per-class fairness insights be computed for the model.

Parameters:
fairness_metrics_setstr, optional

The fairness metric used to calculate the fairness scores. Value can be any one of <datarobot.enums.FairnessMetricsSet>.

Returns:
status_check_jobStatusCheckJob

The returned object contains all needed logic for a periodical status check of an async job.

Return type:

StatusCheckJob

request_predictions(dataset_id=None, dataset=None, dataframe=None, file_path=None, file=None, include_prediction_intervals=None, prediction_intervals_size=None, forecast_point=None, predictions_start_date=None, predictions_end_date=None, actual_value_column=None, explanation_algorithm=None, max_explanations=None, max_ngram_explanations=None)

Requests predictions against a previously uploaded dataset.

Parameters:
dataset_idstring, optional

The ID of the dataset to make predictions against (as uploaded from Project.upload_dataset)

datasetDataset, optional

The dataset to make predictions against (as uploaded from Project.upload_dataset)

dataframepd.DataFrame, optional

(New in v3.0) The dataframe to make predictions against

file_pathstr, optional

(New in v3.0) Path to file to make predictions against

fileIOBase, optional

(New in v3.0) File to make predictions against

include_prediction_intervalsbool, optional

(New in v2.16) For time series projects only. Specifies whether prediction intervals should be calculated for this request. Defaults to True if prediction_intervals_size is specified, otherwise defaults to False.

prediction_intervals_sizeint, optional

(New in v2.16) For time series projects only. Represents the percentile to use for the size of the prediction intervals. Defaults to 80 if include_prediction_intervals is True. Prediction intervals size must be between 1 and 100 (inclusive).

forecast_pointdatetime.datetime or None, optional

(New in version v2.20) For time series projects only. This is the default point relative to which predictions will be generated, based on the forecast window of the project. See the time series prediction documentation for more information.

predictions_start_datedatetime.datetime or None, optional

(New in version v2.20) For time series projects only. The start date for bulk predictions. Note that this parameter is for generating historical predictions using the training data. This parameter should be provided in conjunction with predictions_end_date. Can’t be provided with the forecast_point parameter.

predictions_end_datedatetime.datetime or None, optional

(New in version v2.20) For time series projects only. The end date for bulk predictions, exclusive. Note that this parameter is for generating historical predictions using the training data. This parameter should be provided in conjunction with predictions_start_date. Can’t be provided with the forecast_point parameter.

actual_value_columnstring, optional

(New in version v2.21) For time series unsupervised projects only. Actual value column can be used to calculate the classification metrics and insights on the prediction dataset. Can’t be provided with the forecast_point parameter.

explanation_algorithm: str, optional

(New in version v2.21) If set to 'shap', the response will include prediction explanations based on the SHAP explainer (SHapley Additive exPlanations). Defaults to null (no prediction explanations).

max_explanations: int, optional

(New in version v2.21) Specifies the maximum number of explanation values that should be returned for each row, ordered by absolute value, greatest to least. If null, no limit. In the case of 'shap': if the number of features is greater than the limit, the sum of remaining values will also be returned as shapRemainingTotal. Defaults to null. Cannot be set if explanation_algorithm is omitted.

max_ngram_explanations: int or str, optional

(New in version v2.29) Specifies the maximum number of text explanation values that should be returned. If set to all, text explanations will be computed and all the ngram explanations will be returned. If set to a non-zero positive integer value, text explanations will be computed and that number of ngram explanations, sorted in descending order, will be returned. By default, text explanations are not computed.

Returns:
jobPredictJob

The job computing the predictions

Return type:

PredictJob
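
Example (a minimal sketch; assumes a prediction dataset is uploaded with Project.upload_dataset, and the project ID and file path are placeholders):

model = datarobot.Model.get('project-id', 'model-id')
project = datarobot.Project.get('project-id')
prediction_dataset = project.upload_dataset('./data_to_predict.csv')
predict_job = model.request_predictions(dataset_id=prediction_dataset.id)
predictions = predict_job.get_result_when_complete()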

request_residuals_chart(source, data_slice_id=None)

Request the model residuals chart for the specified source.

Parameters:
sourcestr

Residuals chart data source. Check datarobot.enums.CHART_DATA_SOURCE for possible values.

data_slice_idstring, optional

ID for the data slice used in the request. If None, request unsliced insight data.

Returns:
status_check_jobStatusCheckJob

Object contains all needed logic for a periodical status check of an async job.

Return type:

StatusCheckJob

request_roc_curve(source, data_slice_id=None)

Request the model Roc Curve for the specified source.

Parameters:
sourcestr

Roc Curve data source. Check datarobot.enums.CHART_DATA_SOURCE for possible values.

data_slice_idstring, optional

ID for the data slice used in the request. If None, request unsliced insight data.

Returns:
status_check_jobStatusCheckJob

Object contains all needed logic for a periodical status check of an async job.

Return type:

StatusCheckJob

request_training_predictions(data_subset, explanation_algorithm=None, max_explanations=None)

Start a job to build training predictions

Parameters:
data_subsetstr

data set definition to build predictions on. Choices are:

  • dr.enums.DATA_SUBSET.ALL or string all for all data available. Not valid for models in datetime partitioned projects.

  • dr.enums.DATA_SUBSET.VALIDATION_AND_HOLDOUT or string validationAndHoldout for all data except the training set. Not valid for models in datetime partitioned projects.

  • dr.enums.DATA_SUBSET.HOLDOUT or string holdout for the holdout data set only.

  • dr.enums.DATA_SUBSET.ALL_BACKTESTS or string allBacktests for downloading the predictions for all backtest validation folds. Requires the model to have successfully scored all backtests. Datetime partitioned projects only.

explanation_algorithmdr.enums.EXPLANATIONS_ALGORITHM

(New in v2.21) Optional. If set to dr.enums.EXPLANATIONS_ALGORITHM.SHAP, the response will include prediction explanations based on the SHAP explainer (SHapley Additive exPlanations). Defaults to None (no prediction explanations).

max_explanationsint

(New in v2.21) Optional. Specifies the maximum number of explanation values that should be returned for each row, ordered by absolute value, greatest to least. In the case of dr.enums.EXPLANATIONS_ALGORITHM.SHAP: If not set, explanations are returned for all features. If the number of features is greater than the max_explanations, the sum of remaining values will also be returned as shap_remaining_total. Max 100. Defaults to null for datasets narrower than 100 columns, defaults to 100 for datasets wider than 100 columns. Is ignored if explanation_algorithm is not set.

Returns:
Job

an instance of created async job
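
Example (a minimal sketch; uses the holdout subset, which assumes the holdout was unlocked where required):

import datarobot as dr

model = dr.Model.get('project-id', 'model-id')
job = model.request_training_predictions(dr.enums.DATA_SUBSET.HOLDOUT)
training_predictions = job.get_result_when_complete()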

retrain(sample_pct=None, featurelist_id=None, training_row_count=None, n_clusters=None)

Submit a job to the queue to retrain this model with a different sample size, featurelist, or number of clusters.

Parameters:
sample_pct: float, optional

The sample size in percents (1 to 100) to use in training. If this parameter is used then training_row_count should not be given.

featurelist_idstr, optional

The featurelist id

training_row_countint, optional

The number of rows used to train the model. If this parameter is used, then sample_pct should not be given.

n_clusters: int, optional

(new in version 2.27) number of clusters to use in an unsupervised clustering model. This parameter is used only for unsupervised clustering models that do not determine the number of clusters automatically.

Returns:
jobModelJob

The created job that is retraining the model

Return type:

ModelJob
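
For example, a minimal sketch of retraining on a fixed row count (the row count is a hypothetical value; pass either sample_pct or training_row_count, not both):

import datarobot

model = datarobot.Model.get('project-id', 'model-id')
job = model.retrain(training_row_count=10000)  # hypothetical row count
retrained_model = job.get_result_when_complete()  # waits for the ModelJob to finish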

set_prediction_threshold(threshold)

Set a custom prediction threshold for the model.

May not be used once prediction_threshold_read_only is True for this model.

Parameters:
thresholdfloat

Only used for binary classification projects. The threshold to use when deciding between the positive and negative classes when making predictions. Should be between 0.0 and 1.0 (inclusive).
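
A minimal sketch that checks the read-only flag before changing the threshold (0.6 is a hypothetical value):

import datarobot

model = datarobot.Model.get('project-id', 'model-id')
if not model.prediction_threshold_read_only:  # threshold can no longer be changed once read-only
    model.set_prediction_threshold(0.6)  # hypothetical threshold for a binary classification project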

star_model()

Mark the model as starred.

Model stars propagate to the web application and the API, and can be used to filter when listing models.

Return type:

None

start_advanced_tuning_session()

Start an Advanced Tuning session. Returns an object that helps set up arguments for an Advanced Tuning model execution.

As of v2.17, all models other than blenders, open source, prime, baseline and user-created support Advanced Tuning.

Returns:
AdvancedTuningSession

Session for setting up and running Advanced Tuning on a model

train(sample_pct=None, featurelist_id=None, scoring_type=None, training_row_count=None, monotonic_increasing_featurelist_id=<object object>, monotonic_decreasing_featurelist_id=<object object>)

Train the blueprint used in model on a particular featurelist or amount of data.

This method creates a new training job for worker and appends it to the end of the queue for this project. After the job has finished you can get the newly trained model by retrieving it from the project leaderboard, or by retrieving the result of the job.

Either sample_pct or training_row_count can be used to specify the amount of data to use, but not both. If neither are specified, a default of the maximum amount of data that can safely be used to train any blueprint without going into the validation data will be selected.

In smart-sampled projects, sample_pct and training_row_count are assumed to be in terms of rows of the minority class.

Note

For datetime partitioned projects, see train_datetime instead.

Parameters:
sample_pctfloat, optional

The amount of data to use for training, as a percentage of the project dataset from 0 to 100.

featurelist_idstr, optional

The identifier of the featurelist to use. If not defined, the featurelist of this model is used.

scoring_typestr, optional

Either validation or crossValidation (also dr.SCORING_TYPE.validation or dr.SCORING_TYPE.cross_validation). validation is available for every partitioning type, and indicates that the default model validation should be used for the project. If the project uses a form of cross-validation partitioning, crossValidation can also be used to indicate that all of the available training/validation combinations should be used to evaluate the model.

training_row_countint, optional

The number of rows to use to train the requested model.

monotonic_increasing_featurelist_idstr

(new in version 2.11) optional, the id of the featurelist that defines the set of features with a monotonically increasing relationship to the target. Passing None disables increasing monotonicity constraint. Default (dr.enums.MONOTONICITY_FEATURELIST_DEFAULT) is the one specified by the blueprint.

monotonic_decreasing_featurelist_idstr

(new in version 2.11) optional, the id of the featurelist that defines the set of features with a monotonically decreasing relationship to the target. Passing None disables decreasing monotonicity constraint. Default (dr.enums.MONOTONICITY_FEATURELIST_DEFAULT) is the one specified by the blueprint.

Returns:
model_job_idstr

id of created job, can be used as parameter to ModelJob.get method or wait_for_async_model_creation function

Examples

project = Project.get('project-id')
model = Model.get('project-id', 'model-id')
model_job_id = model.train(training_row_count=project.max_train_rows)
train_datetime(featurelist_id=None, training_row_count=None, training_duration=None, time_window_sample_pct=None, monotonic_increasing_featurelist_id=<object object>, monotonic_decreasing_featurelist_id=<object object>, use_project_settings=False, sampling_method=None, n_clusters=None)

Trains this model on a different featurelist or sample size.

Requires that this model is part of a datetime partitioned project; otherwise, an error will occur.

All durations should be specified with a duration string such as those returned by the partitioning_methods.construct_duration_string helper method. Please see datetime partitioned project documentation for more information on duration strings.

Parameters:
featurelist_idstr, optional

the featurelist to use to train the model. If not specified, the featurelist of this model is used.

training_row_countint, optional

the number of rows of data that should be used to train the model. If specified, neither training_duration nor use_project_settings may be specified.

training_durationstr, optional

a duration string specifying what time range the data used to train the model should span. If specified, neither training_row_count nor use_project_settings may be specified.

use_project_settingsbool, optional

(New in version v2.20) defaults to False. If True, indicates that the custom backtest partitioning settings specified by the user will be used to train the model and evaluate backtest scores. If specified, neither training_row_count nor training_duration may be specified.

time_window_sample_pctint, optional

may only be specified when the requested model is a time window (e.g. duration or start and end dates). An integer between 1 and 99 indicating the percentage to sample by within the window. The points kept are determined by a random uniform sample. If specified, training_duration must also be specified; otherwise, an error will occur.

sampling_methodstr, optional

(New in version v2.23) defines the way training data is selected. Can be either random or latest. In combination with training_row_count defines how rows are selected from backtest (latest by default). When training data is defined using time range (training_duration or use_project_settings) this setting changes the way time_window_sample_pct is applied (random by default). Applicable to OTV projects only.

monotonic_increasing_featurelist_idstr, optional

(New in version v2.18) optional, the id of the featurelist that defines the set of features with a monotonically increasing relationship to the target. Passing None disables increasing monotonicity constraint. Default (dr.enums.MONOTONICITY_FEATURELIST_DEFAULT) is the one specified by the blueprint.

monotonic_decreasing_featurelist_idstr, optional

(New in version v2.18) optional, the id of the featurelist that defines the set of features with a monotonically decreasing relationship to the target. Passing None disables decreasing monotonicity constraint. Default (dr.enums.MONOTONICITY_FEATURELIST_DEFAULT) is the one specified by the blueprint.

n_clusters: int, optional

(New in version 2.27) number of clusters to use in an unsupervised clustering model. This parameter is used only for unsupervised clustering models that don’t automatically determine the number of clusters.

Returns:
jobModelJob

the created job to build the model

Return type:

ModelJob
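
For illustration, a minimal sketch that trains on one year of data; the import path for construct_duration_string is an assumption, and the duration value is only an example:

import datarobot as dr
from datarobot.helpers.partitioning_methods import construct_duration_string  # assumed module path

model = dr.Model.get('project-id', 'model-id')
duration = construct_duration_string(years=1)  # 'P1Y0M0D'
job = model.train_datetime(training_duration=duration)
new_model = job.get_result_when_complete()  # waits for the ModelJob to finish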

train_incremental(data_stage_id, training_data_name=None, data_stage_encoding=None, data_stage_delimiter=None, data_stage_compression=None)

Submit a job to the queue to perform incremental training on an existing model using additional data. The id of the additional data to use for training is specified with the data_stage_id. Optionally a name for the iteration can be supplied by the user to help identify the contents of data in the iteration.

This functionality requires the INCREMENTAL_LEARNING feature flag to be enabled.

Parameters:
data_stage_id: str

The id of the data stage to use for training.

training_data_namestr, optional

The name of the iteration or data stage to indicate what the incremental learning was performed on.

data_stage_encodingstr, optional

The encoding type of the data in the data stage (default: UTF-8). Supported formats: UTF-8, ASCII, WINDOWS1252

data_stage_delimiterstr, optional

The delimiter used by the data in the data stage (default: ‘,’).

data_stage_compressionstr, optional

The compression type of the data stage file, e.g. ‘zip’ (default: None). Supported formats: zip

Returns:
jobModelJob

The created job that is retraining the model
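
A minimal sketch, assuming the INCREMENTAL_LEARNING feature flag is enabled; the data stage ID and iteration name are hypothetical placeholders:

import datarobot

model = datarobot.Model.get('project-id', 'model-id')
job = model.train_incremental('data-stage-id', training_data_name='additional March data')
updated_model = job.get_result_when_complete()  # waits for the ModelJob to finish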

unstar_model()

Unmark the model as starred.

Model stars propagate to the web application and the API, and can be used to filter when listing models.

Return type:

None

class datarobot.models.model.AdvancedTuningParamsType(*args, **kwargs)
class datarobot.models.model.BiasMitigationFeatureInfo(messages)

PrimeModel

class datarobot.models.PrimeModel(id=None, processes=None, featurelist_name=None, featurelist_id=None, project_id=None, sample_pct=None, model_type=None, model_category=None, is_frozen=None, blueprint_id=None, metrics=None, ruleset_id=None, rule_count=None, score=None, monotonic_increasing_featurelist_id=None, monotonic_decreasing_featurelist_id=None, supports_monotonic_constraints=None, is_starred=None, prediction_threshold=None, prediction_threshold_read_only=None, model_number=None, parent_model_id=None, supports_composable_ml=None, training_row_count=None, training_duration=None, training_start_date=None, training_end_date=None, data_selection_method=None, time_window_sample_pct=None, sampling_method=None, model_family_full_name=None, is_trained_into_validation=None, is_trained_into_holdout=None)

Represents a DataRobot Prime model approximating a parent model with downloadable code.

All durations are specified with a duration string such as those returned by the partitioning_methods.construct_duration_string helper method. Please see datetime partitioned project documentation for more information on duration strings.

Attributes:
idstr

the id of the model

project_idstr

the id of the project the model belongs to

processeslist of str

the processes used by the model

featurelist_namestr

the name of the featurelist used by the model

featurelist_idstr

the id of the featurelist used by the model

sample_pctfloat

the percentage of the project dataset used in training the model

training_row_countint or None

the number of rows of the project dataset used in training the model. In a datetime partitioned project, if specified, defines the number of rows used to train the model and evaluate backtest scores; if unspecified, either training_duration or training_start_date and training_end_date was used to determine that instead.

training_durationstr or None

only present for models in datetime partitioned projects. If specified, a duration string specifying the duration spanned by the data used to train the model and evaluate backtest scores.

training_start_datedatetime or None

only present for frozen models in datetime partitioned projects. If specified, the start date of the data used to train the model.

training_end_datedatetime or None

only present for frozen models in datetime partitioned projects. If specified, the end date of the data used to train the model.

model_typestr

what model this is, e.g. ‘DataRobot Prime’

model_categorystr

what kind of model this is - always ‘prime’ for DataRobot Prime models

is_frozenbool

whether this model is a frozen model

blueprint_idstr

the id of the blueprint used in this model

metricsdict

a mapping from each metric to the model’s scores for that metric

rulesetRuleset

the ruleset used in the Prime model

parent_model_idstr

the id of the model that this Prime model approximates

monotonic_increasing_featurelist_idstr

optional, the id of the featurelist that defines the set of features with a monotonically increasing relationship to the target. If None, no such constraints are enforced.

monotonic_decreasing_featurelist_idstr

optional, the id of the featurelist that defines the set of features with a monotonically decreasing relationship to the target. If None, no such constraints are enforced.

supports_monotonic_constraintsbool

optional, whether this model supports enforcing monotonic constraints

is_starredbool

whether this model is marked as starred

prediction_thresholdfloat

for binary classification projects, the threshold used for predictions

prediction_threshold_read_onlybool

indicates whether modification of the prediction threshold is forbidden. Threshold modification is forbidden once a model has had a deployment created or predictions made via the dedicated prediction API.

supports_composable_mlbool or None

(New in version v2.26) whether this model is supported in Composable ML.

classmethod get(project_id, model_id)

Retrieve a specific prime model.

Parameters:
project_idstr

The id of the project the prime model belongs to

model_idstr

The model_id of the prime model to retrieve.

Returns:
modelPrimeModel

The queried instance.

request_download_validation(language)

Prep and validate the downloadable code for the ruleset associated with this model.

Parameters:
languagestr

the language the code should be downloaded in - see datarobot.enums.PRIME_LANGUAGE for available languages

Returns:
jobJob

A job tracking the code preparation and validation
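
For example, a minimal sketch that prepares Python code for the Prime model's ruleset (assuming Python is among the languages listed in datarobot.enums.PRIME_LANGUAGE):

import datarobot as dr

prime_model = dr.PrimeModel.get('project-id', 'prime-model-id')
job = prime_model.request_download_validation(dr.enums.PRIME_LANGUAGE.PYTHON)
job.wait_for_completion()  # code is prepared and validated once the job finishes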

advanced_tune(params, description=None)

Generate a new model with the specified advanced-tuning parameters

As of v2.17, all models other than blenders, open source, prime, baseline and user-created support Advanced Tuning.

Parameters:
paramsdict

Mapping of parameter ID to parameter value. The list of valid parameter IDs for a model can be found by calling get_advanced_tuning_parameters(). This mapping does not need to include values for all parameters. If a parameter is omitted, its current_value will be used.

descriptionstr

Human-readable string describing the newly advanced-tuned model

Returns:
ModelJob

The created job to build the model

Return type:

ModelJob

cross_validate()

Run cross validation on the model.

Note

To perform Cross Validation on a new model with new parameters, use train instead.

Returns:
ModelJob

The created job to build the model

delete()

Delete a model from the project’s leaderboard.

Return type:

None

download_scoring_code(file_name, source_code=False)

Download the Scoring Code JAR.

Parameters:
file_namestr

File path where scoring code will be saved.

source_codebool, optional

Set to True to download source code archive. It will not be executable.

Return type:

None

download_training_artifact(file_name)

Retrieve trained artifact(s) from a model containing one or more custom tasks.

Artifact(s) will be downloaded to the specified local filepath.

Parameters:
file_namestr

File path where trained model artifact(s) will be saved.

classmethod from_data(data)

Instantiate an object of this class using a dict.

Parameters:
datadict

Correctly snake_cased keys and their values.

Return type:

TypeVar(T, bound= APIObject)

get_advanced_tuning_parameters()

Get the advanced-tuning parameters available for this model.

As of v2.17, all models other than blenders, open source, prime, baseline and user-created support Advanced Tuning.

Returns:
dict

A dictionary describing the advanced-tuning parameters for the current model. There are two top-level keys, tuning_description and tuning_parameters.

tuning_description is an optional value. If not None, it is the user-specified description of this set of tuning parameters.

tuning_parameters is a list of dicts, each of which has the following keys

  • parameter_name : (str) name of the parameter (unique per task, see below)

  • parameter_id : (str) opaque ID string uniquely identifying parameter

  • default_value : (*) the actual value used to train the model; either the single value of the parameter specified before training, or the best value from the list of grid-searched values (based on current_value)

  • current_value : (*) the single value or list of values of the parameter that were grid searched. Depending on the grid search specification, could be a single fixed value (no grid search), a list of discrete values, or a range.

  • task_name : (str) name of the task that this parameter belongs to

  • constraints: (dict) see the notes below

  • vertex_id: (str) ID of vertex that this parameter belongs to

Return type:

AdvancedTuningParamsType

Notes

The type of default_value and current_value is defined by the constraints structure. It will be a string or numeric Python type.

constraints is a dict with at least one, possibly more, of the following keys. The presence of a key indicates that the parameter may take on the specified type. (If a key is absent, this means that the parameter may not take on the specified type.) If a key on constraints is present, its value will be a dict containing all of the fields described below for that key.

"constraints": {
    "select": {
        "values": [<list(basestring or number) : possible values>]
    },
    "ascii": {},
    "unicode": {},
    "int": {
        "min": <int : minimum valid value>,
        "max": <int : maximum valid value>,
        "supports_grid_search": <bool : True if Grid Search may be
                                        requested for this param>
    },
    "float": {
        "min": <float : minimum valid value>,
        "max": <float : maximum valid value>,
        "supports_grid_search": <bool : True if Grid Search may be
                                        requested for this param>
    },
    "intList": {
        "min_length": <int : minimum valid length>,
        "max_length": <int : maximum valid length>
        "min_val": <int : minimum valid value>,
        "max_val": <int : maximum valid value>
        "supports_grid_search": <bool : True if Grid Search may be
                                        requested for this param>
    },
    "floatList": {
        "min_length": <int : minimum valid length>,
        "max_length": <int : maximum valid length>
        "min_val": <float : minimum valid value>,
        "max_val": <float : maximum valid value>
        "supports_grid_search": <bool : True if Grid Search may be
                                        requested for this param>
    }
}

The keys have meaning as follows:

  • select: Rather than specifying a specific data type, if present, it indicates that the parameter is permitted to take on any of the specified values. Listed values may be of any string or real (non-complex) numeric type.

  • ascii: The parameter may be a unicode object that encodes simple ASCII characters. (A-Z, a-z, 0-9, whitespace, and certain common symbols.) In addition to listed constraints, ASCII keys currently may not contain either newlines or semicolons.

  • unicode: The parameter may be any Python unicode object.

  • int: The value may be an object of type int within the specified range (inclusive). Please note that the value will be passed around using the JSON format, and some JSON parsers have undefined behavior with integers outside of the range [-(2**53)+1, (2**53)-1].

  • float: The value may be an object of type float within the specified range (inclusive).

  • intList, floatList: The value may be a list of int or float objects, respectively, following constraints as specified respectively by the int and float types (above).

Many parameters only specify one key under constraints. If a parameter specifies multiple keys, the parameter may take on any value permitted by any key.
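
Putting get_advanced_tuning_parameters and advanced_tune together, a minimal sketch of tuning a single parameter; the parameter name and new value are hypothetical and must be looked up per model:

import datarobot

model = datarobot.Model.get('project-id', 'model-id')
tuning_info = model.get_advanced_tuning_parameters()

# Build the parameter_id -> value mapping expected by advanced_tune
params = {}
for param in tuning_info['tuning_parameters']:
    if param['parameter_name'] == 'learning_rate':  # hypothetical parameter name
        params[param['parameter_id']] = 0.05  # hypothetical value
job = model.advanced_tune(params, description='lower learning rate')
tuned_model = job.get_result_when_complete()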

get_all_confusion_charts(fallback_to_parent_insights=False)

Retrieve a list of all confusion matrices available for the model.

Parameters:
fallback_to_parent_insightsbool

(New in version v2.14) Optional, if True, this will return confusion chart data for this model’s parent for any source that is not available for this model and if this has a defined parent model. If omitted or False, or this model has no parent, this will not attempt to retrieve any data from this model’s parent.

Returns:
list of ConfusionChart

Data for all available confusion charts for model.

get_all_feature_impacts(data_slice_filter=None)

Retrieve a list of all feature impact results available for the model.

Parameters:
data_slice_filterDataSlice, optional

A dataslice used to filter the return values based on the dataslice.id. By default, this function will use data_slice_filter.id == None, which returns an unsliced insight. If data_slice_filter is None, then no data_slice filtering will be applied when requesting the feature impacts.

Returns:
list of dicts

Data for all available model feature impacts, or an empty list if no data is found.

Examples

model = datarobot.Model(id='model-id', project_id='project-id')

# Get feature impact insights for sliced data
data_slice = datarobot.DataSlice(id='data-slice-id')
sliced_fi = model.get_all_feature_impacts(data_slice_filter=data_slice)

# Get feature impact insights for unsliced data
data_slice = datarobot.DataSlice()
unsliced_fi = model.get_all_feature_impacts(data_slice_filter=data_slice)

# Get all feature impact insights
all_fi = model.get_all_feature_impacts()
get_all_lift_charts(fallback_to_parent_insights=False, data_slice_filter=None)

Retrieve a list of all Lift charts available for the model.

Parameters:
fallback_to_parent_insightsbool, optional

(New in version v2.14) Optional, if True, this will return lift chart data for this model’s parent for any source that is not available for this model and if this model has a defined parent model. If omitted or False, or this model has no parent, this will not attempt to retrieve any data from this model’s parent.

data_slice_filterDataSlice, optional

Filters the returned lift chart by data_slice_filter.id. If None (the default) applies no filter based on data_slice_id.

Returns:
list of LiftChart

Data for all available model lift charts. Or an empty list if no data found.

Examples

model = datarobot.Model.get('project-id', 'model-id')

# Get lift chart insights for sliced data
sliced_lift_charts = model.get_all_lift_charts(data_slice_filter=datarobot.DataSlice(id='data-slice-id'))

# Get lift chart insights for unsliced data
unsliced_lift_charts = model.get_all_lift_charts(data_slice_filter=datarobot.DataSlice())

# Get all lift chart insights
all_lift_charts = model.get_all_lift_charts()
get_all_multiclass_lift_charts(fallback_to_parent_insights=False)

Retrieve a list of all Lift charts available for the model.

Parameters:
fallback_to_parent_insightsbool

(New in version v2.14) Optional, if True, this will return lift chart data for this model’s parent for any source that is not available for this model and if this model has a defined parent model. If omitted or False, or this model has no parent, this will not attempt to retrieve any data from this model’s parent.

Returns:
list of LiftChart

Data for all available model lift charts.

get_all_residuals_charts(fallback_to_parent_insights=False, data_slice_filter=None)

Retrieve a list of all residuals charts available for the model.

Parameters:
fallback_to_parent_insightsbool

Optional, if True, this will return residuals chart data for this model’s parent for any source that is not available for this model and if this model has a defined parent model. If omitted or False, or this model has no parent, this will not attempt to retrieve any data from this model’s parent.

data_slice_filterDataSlice, optional

Filters the returned residuals charts by data_slice_filter.id. If None (the default) applies no filter based on data_slice_id.

Returns:
list of ResidualsChart

Data for all available model residuals charts.

Examples

model = datarobot.Model.get('project-id', 'model-id')

# Get residuals chart insights for sliced data
sliced_residuals_charts = model.get_all_residuals_charts(data_slice_filter=datarobot.DataSlice(id='data-slice-id'))

# Get residuals chart insights for unsliced data
unsliced_residuals_charts = model.get_all_residuals_charts(data_slice_filter=datarobot.DataSlice())

# Get all residuals chart insights
all_residuals_charts = model.get_all_residuals_charts()
get_all_roc_curves(fallback_to_parent_insights=False, data_slice_filter=None)

Retrieve a list of all ROC curves available for the model.

Parameters:
fallback_to_parent_insightsbool

(New in version v2.14) Optional, if True, this will return ROC curve data for this model’s parent for any source that is not available for this model and if this model has a defined parent model. If omitted or False, or this model has no parent, this will not attempt to retrieve any data from this model’s parent.

data_slice_filterDataSlice, optional

filters the returned roc_curve by data_slice_filter.id. If None (the default) applies no filter based on data_slice_id.

Returns:
list of RocCurve

Data for all available model ROC curves. Or an empty list if no RocCurves are found.

Examples

model = datarobot.Model.get('project-id', 'model-id')
ds_filter=DataSlice(id='data-slice-id')

# Get roc curve insights for sliced data
sliced_roc = model.get_all_roc_curves(data_slice_filter=ds_filter)

# Get roc curve insights for unsliced data
ds_filter = DataSlice(id=None)
unsliced_roc = model.get_all_roc_curves(data_slice_filter=ds_filter)

# Get all roc curve insights
all_roc_curves = model.get_all_roc_curves()
get_confusion_chart(source, fallback_to_parent_insights=False)

Retrieve a multiclass model’s confusion matrix for the specified source.

Parameters:
sourcestr

Confusion chart source. Check datarobot.enums.CHART_DATA_SOURCE for possible values.

fallback_to_parent_insightsbool

(New in version v2.14) Optional, if True, this will return confusion chart data for this model’s parent if the confusion chart is not available for this model and the model has a defined parent model. If omitted or False, or there is no parent model, will not attempt to return insight data from this model’s parent.

Returns:
ConfusionChart

Model ConfusionChart data

Raises:
ClientError

If the insight is not available for this model

get_cross_class_accuracy_scores()

Retrieves a list of Cross Class Accuracy scores for the model.

Returns:
json
get_cross_validation_scores(partition=None, metric=None)

Return a dictionary, keyed by metric, showing cross validation scores per partition.

Cross Validation should already have been performed using cross_validate or train.

Note

Models that computed cross validation before this feature was added will need to be deleted and retrained before this method can be used.

Parameters:
partitionfloat

optional, the ID of the partition (1, 2, 3.0, 4.0, etc.) to filter results by. Can be a positive whole number given as an integer or float value. 0 corresponds to the validation partition.

metric: unicode

optional, the name of the metric to filter the resulting cross validation scores by

Returns:
cross_validation_scores: dict

A dictionary keyed by metric showing cross validation scores per partition.
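
A minimal sketch that runs cross validation and then reads the scores back; the metric name is a hypothetical example and should match one of the project's metrics:

import datarobot

model = datarobot.Model.get('project-id', 'model-id')
cv_job = model.cross_validate()  # queue cross validation for this model
cv_job.wait_for_completion()
scores = model.get_cross_validation_scores(metric='LogLoss')  # hypothetical metric name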

get_data_disparity_insights(feature, class_name1, class_name2)

Retrieve a list of Cross Class Data Disparity insights for the model.

Parameters:
featurestr

Bias and Fairness protected feature name.

class_name1str

One of the compared classes

class_name2str

Another compared class

Returns:
json
get_fairness_insights(fairness_metrics_set=None, offset=0, limit=100)

Retrieve a list of Per Class Bias insights for the model.

Parameters:
fairness_metrics_setstr, optional

Can be one of <datarobot.enums.FairnessMetricsSet>. The fairness metric used to calculate the fairness scores.

offsetint, optional

Number of items to skip.

limitint, optional

Number of items to return.

Returns:
json
get_feature_effect(source, data_slice_id=None)

Retrieve Feature Effects for the model.

Feature Effects provides partial dependence and predicted vs actual values for top-500 features ordered by feature impact score.

The partial dependence shows marginal effect of a feature on the target variable after accounting for the average effects of all other predictive features. It indicates how, holding all other variables except the feature of interest as they were, the value of this feature affects your prediction.

Requires that Feature Effects has already been computed with request_feature_effect.

See get_feature_effect_metadata for retrieving information on the available sources.

Parameters:
sourcestring

The source Feature Effects are retrieved for.

data_slice_idstring, optional

ID for the data slice used in the request. If None, retrieve unsliced insight data.

Returns:
feature_effectsFeatureEffects

The feature effects data.

Raises:
ClientError (404)

If the feature effects have not been computed or source is not a valid value.

get_feature_effect_metadata()

Retrieve Feature Effects metadata. Response contains status and available model sources.

  • Feature Effect for the training partition is always available, with the exception of older projects that only supported Feature Effect for validation.

  • When a model is trained into validation or holdout without stacked predictions (i.e., no out-of-sample predictions in those partitions), Feature Effects is not available for validation or holdout.

  • Feature Effects for holdout is not available when holdout was not unlocked for the project.

Use source to retrieve Feature Effects, selecting one of the provided sources.

Returns:
feature_effect_metadata: FeatureEffectMetadata
get_feature_effects_multiclass(source='training', class_=None)

Retrieve Feature Effects for the multiclass model.

Feature Effects provide partial dependence and predicted vs actual values for top-500 features ordered by feature impact score.

The partial dependence shows marginal effect of a feature on the target variable after accounting for the average effects of all other predictive features. It indicates how, holding all other variables except the feature of interest as they were, the value of this feature affects your prediction.

Requires that Feature Effects has already been computed with request_feature_effect.

See get_feature_effect_metadata for retrieving information on the available sources.

Parameters:
sourcestr

The source Feature Effects are retrieved for.

class_str or None

The class name Feature Effects are retrieved for.

Returns:
list

The list of multiclass feature effects.

Raises:
ClientError (404)

If Feature Effects have not been computed or source is not a valid value.

get_feature_impact(with_metadata=False, data_slice_filter=<datarobot.models.model.Sentinel object>)

Retrieve the computed Feature Impact results, a measure of the relevance of each feature in the model.

Feature Impact is computed for each column by creating new data with that column randomly permuted (but the others left unchanged), and seeing how the error metric score for the predictions is affected. The ‘impactUnnormalized’ is how much worse the error metric score is when making predictions on this modified data. The ‘impactNormalized’ is normalized so that the largest value is 1. In both cases, larger values indicate more important features.

If a feature is a redundant feature, i.e. once other features are considered it doesn’t contribute much in addition, the ‘redundantWith’ value is the name of feature that has the highest correlation with this feature. Note that redundancy detection is only available for jobs run after the addition of this feature. When retrieving data that predates this functionality, a NoRedundancyImpactAvailable warning will be used.

Elsewhere this technique is sometimes called ‘Permutation Importance’.

Requires that Feature Impact has already been computed with request_feature_impact.

Parameters:
with_metadatabool

The flag indicating if the result should include the metadata as well.

data_slice_filterDataSlice, optional

A dataslice used to filter the return values based on the dataslice.id. By default, this function will use data_slice_filter.id == None which returns an unsliced insight. If data_slice_filter is None then get_feature_impact will raise a ValueError.

Returns:
list or dict

The feature impact data response depends on the with_metadata parameter. The response is either a dict with metadata and a list with actual data or just a list with that data.

Each List item is a dict with the keys featureName, impactNormalized, and impactUnnormalized, redundantWith and count.

For dict response available keys are:

  • featureImpacts - Feature Impact data as a dictionary. Each item is a dict with

    keys: featureName, impactNormalized, and impactUnnormalized, and redundantWith.

  • shapBased - A boolean that indicates whether Feature Impact was calculated using

    Shapley values.

  • ranRedundancyDetection - A boolean that indicates whether redundant feature

    identification was run while calculating this Feature Impact.

  • rowCount - An integer or None that indicates the number of rows that was used to

    calculate Feature Impact. For the Feature Impact calculated with the default logic, without specifying the rowCount, we return None here.

  • count - An integer with the number of features under the featureImpacts.

Raises:
ClientError (404)

If the feature impacts have not been computed.

ValueError

If data_slice_filter passed as None

get_features_used()

Query the server to determine which features were used.

Note that the data returned by this method is possibly different than the names of the features in the featurelist used by this model. This method will return the raw features that must be supplied in order for predictions to be generated on a new set of data. The featurelist, in contrast, would also include the names of derived features.

Returns:
featureslist of str

The names of the features used in the model.

Return type:

List[str]

get_frozen_child_models()

Retrieve the IDs for all models that are frozen from this model.

Returns:
A list of Models
get_labelwise_roc_curves(source, fallback_to_parent_insights=False)

Retrieve a list of LabelwiseRocCurve instances for a multilabel model for the given source and all labels. This method is valid only for multilabel projects. For binary projects, use Model.get_roc_curve API .

Added in version v2.24.

Parameters:
sourcestr

ROC curve data source. Check datarobot.enums.CHART_DATA_SOURCE for possible values.

fallback_to_parent_insightsbool

Optional, if True, this will return ROC curve data for this model’s parent if the ROC curve is not available for this model and the model has a defined parent model. If omitted or False, or there is no parent model, will not attempt to return data from this model’s parent.

Returns:
list of LabelwiseRocCurve

Labelwise ROC Curve instances for source and all labels

Raises:
ClientError

If the insight is not available for this model

get_lift_chart(source, fallback_to_parent_insights=False, data_slice_filter=<datarobot.models.model.Sentinel object>)

Retrieve the model Lift chart for the specified source.

Parameters:
sourcestr

Lift chart data source. Check datarobot.enums.CHART_DATA_SOURCE for possible values. (New in version v2.23) For time series and OTV models, also accepts values backtest_2, backtest_3, …, up to the number of backtests in the model.

fallback_to_parent_insightsbool

(New in version v2.14) Optional, if True, this will return lift chart data for this model’s parent if the lift chart is not available for this model and the model has a defined parent model. If omitted or False, or there is no parent model, will not attempt to return insight data from this model’s parent.

data_slice_filterDataSlice, optional

A dataslice used to filter the return values based on the dataslice.id. By default this function will use data_slice_filter.id == None which returns an unsliced insight. If data_slice_filter is None then get_lift_chart will raise a ValueError.

Returns:
LiftChart

Model lift chart data

Raises:
ClientError

If the insight is not available for this model

ValueError

If data_slice_filter passed as None

get_missing_report_info()

Retrieve a report on missing training data that can be used to understand missing values treatment in the model. The report consists of missing values resolutions for the numeric and categorical features that were part of building the model.

Returns:
An iterable of MissingReportPerFeature

The queried model missing report, sorted by missing count (DESCENDING order).

get_model_blueprint_chart()

Retrieve a diagram that can be used to understand data flow in the blueprint.

Returns:
ModelBlueprintChart

The queried model blueprint chart.

get_model_blueprint_documents()

Get documentation for tasks used in this model.

Returns:
list of BlueprintTaskDocument

All documents available for the model.

get_model_blueprint_json()

Get the blueprint json representation used by this model.

Returns:
BlueprintJson

Json representation of the blueprint stages.

Return type:

Dict[str, Tuple[List[str], List[str], str]]

get_multiclass_feature_impact()

For multiclass it’s possible to calculate feature impact separately for each target class. The method for calculation is exactly the same, calculated in one-vs-all style for each target class.

Requires that Feature Impact has already been computed with request_feature_impact.

Returns:
feature_impactslist of dict

The feature impact data. Each item is a dict with the keys ‘featureImpacts’ (list), ‘class’ (str). Each item in ‘featureImpacts’ is a dict with the keys ‘featureName’, ‘impactNormalized’, and ‘impactUnnormalized’, and ‘redundantWith’.

Raises:
ClientError (404)

If the multiclass feature impacts have not been computed.

get_multiclass_lift_chart(source, fallback_to_parent_insights=False)

Retrieve model Lift chart for the specified source.

Parameters:
sourcestr

Lift chart data source. Check datarobot.enums.CHART_DATA_SOURCE for possible values.

fallback_to_parent_insightsbool

Optional, if True, this will return lift chart data for this model’s parent if the lift chart is not available for this model and the model has a defined parent model. If omitted or False, or there is no parent model, will not attempt to return insight data from this model’s parent.

Returns:
list of LiftChart

Model lift chart data for each saved target class

Raises:
ClientError

If the insight is not available for this model

get_multilabel_lift_charts(source, fallback_to_parent_insights=False)

Retrieve model Lift charts for the specified source.

Added in version v2.24.

Parameters:
sourcestr

Lift chart data source. Check datarobot.enums.CHART_DATA_SOURCE for possible values.

fallback_to_parent_insightsbool

Optional, if True, this will return lift chart data for this model’s parent if the lift chart is not available for this model and the model has a defined parent model. If omitted or False, or there is no parent model, will not attempt to return insight data from this model’s parent.

Returns:
list of LiftChart

Model lift chart data for each saved target class

Raises:
ClientError

If the insight is not available for this model

get_num_iterations_trained()

Retrieves the number of estimators trained by early-stopping tree-based models.

Added in version v2.22.

Returns:
projectId: str

id of project containing the model

modelId: str

id of the model

data: array

list of numEstimatorsItem objects, one for each modeling stage.

numEstimatorsItem will be of the form:
stage: str

indicates the modeling stage (for multi-stage models); None for single-stage models

numIterations: int

the number of estimators or iterations trained by the model

get_or_request_feature_effect(source, max_wait=600, row_count=None, data_slice_id=None)

Retrieve Feature Effects for the model, requesting a new job if it hasn’t been run previously.

See get_feature_effect_metadata for retrieving information of source.

Parameters:
sourcestring

The source Feature Effects are retrieved for.

max_waitint, optional

The maximum time to wait for a requested Feature Effect job to complete before erroring.

row_countint, optional

(New in version v2.21) The sample size to use for Feature Impact computation. Minimum is 10 rows. Maximum is 100000 rows or the training sample size of the model, whichever is less.

data_slice_idstr, optional

ID for the data slice used in the request. If None, request unsliced insight data.

Returns:
feature_effectsFeatureEffects

The Feature Effects data.
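
For illustration, a minimal sketch that computes (if needed) and retrieves validation Feature Effects; the source value and wait time are examples:

import datarobot

model = datarobot.Model.get('project-id', 'model-id')
# Requests the job only if Feature Effects have not been computed yet
feature_effects = model.get_or_request_feature_effect('validation', max_wait=900)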

get_or_request_feature_effects_multiclass(source, top_n_features=None, features=None, row_count=None, class_=None, max_wait=600)

Retrieve Feature Effects for the multiclass model, requesting a job if it hasn’t been run previously.

Parameters:
sourcestring

The source Feature Effects are retrieved for.

class_str or None

The class name Feature Effects are retrieved for.

row_countint

The number of rows from dataset to use for Feature Impact calculation.

top_n_featuresint or None

Number of top features (ranked by Feature Impact) used to calculate Feature Effects.

featureslist or None

The list of features used to calculate Feature Effects.

max_waitint, optional

The maximum time to wait for a requested Feature Effects job to complete before erroring.

Returns:
feature_effectslist of FeatureEffectsMulticlass

The list of multiclass feature effects data.

get_or_request_feature_impact(max_wait=600, **kwargs)

Retrieve feature impact for the model, requesting a job if it hasn’t been run previously

Parameters:
max_waitint, optional

The maximum time to wait for a requested feature impact job to complete before erroring

**kwargs

Arbitrary keyword arguments passed to request_feature_impact.

Returns:
feature_impactslist or dict

The feature impact data. See get_feature_impact for the exact schema.
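
A minimal sketch that computes (if needed) and ranks Feature Impact; the row count is a hypothetical sample size forwarded to request_feature_impact:

import datarobot

model = datarobot.Model.get('project-id', 'model-id')
impacts = model.get_or_request_feature_impact(row_count=2500)  # hypothetical sample size
top_features = sorted(impacts, key=lambda item: item['impactNormalized'], reverse=True)[:5]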

get_parameters()

Retrieve model parameters.

Returns:
ModelParameters

Model parameters for this model.

get_pareto_front()

Retrieve the Pareto Front for a Eureqa model.

This method is only supported for Eureqa models.

Returns:
ParetoFront

Model ParetoFront data

get_prime_eligibility()

Check if this model can be approximated with DataRobot Prime

Returns:
prime_eligibilitydict

a dict indicating whether a model can be approximated with DataRobot Prime (key can_make_prime) and why it may be ineligible (key message)

get_residuals_chart(source, fallback_to_parent_insights=False, data_slice_filter=<datarobot.models.model.Sentinel object>)

Retrieve model residuals chart for the specified source.

Parameters:
sourcestr

Residuals chart data source. Check datarobot.enums.CHART_DATA_SOURCE for possible values.

fallback_to_parent_insightsbool

Optional, if True, this will return residuals chart data for this model’s parent if the residuals chart is not available for this model and the model has a defined parent model. If omitted or False, or there is no parent model, will not attempt to return residuals data from this model’s parent.

data_slice_filterDataSlice, optional

A dataslice used to filter the return values based on the dataslice.id. By default this function will use data_slice_filter.id == None which returns an unsliced insight. If data_slice_filter is None then get_residuals_chart will raise a ValueError.

Returns:
ResidualsChart

Model residuals chart data

Raises:
ClientError

If the insight is not available for this model

ValueError

If data_slice_filter passed as None

get_roc_curve(source, fallback_to_parent_insights=False, data_slice_filter=<datarobot.models.model.Sentinel object>)

Retrieve the ROC curve for a binary model for the specified source. This method is valid only for binary projects. For multilabel projects, use Model.get_labelwise_roc_curves.

Parameters:
sourcestr

ROC curve data source. Check datarobot.enums.CHART_DATA_SOURCE for possible values. (New in version v2.23) For time series and OTV models, also accepts values backtest_2, backtest_3, …, up to the number of backtests in the model.

fallback_to_parent_insightsbool

(New in version v2.14) Optional, if True, this will return ROC curve data for this model’s parent if the ROC curve is not available for this model and the model has a defined parent model. If omitted or False, or there is no parent model, will not attempt to return data from this model’s parent.

data_slice_filterDataSlice, optional

A dataslice used to filter the return values based on the dataslice.id. By default this function will use data_slice_filter.id == None which returns an unsliced insight. If data_slice_filter is None then get_roc_curve will raise a ValueError.

Returns:
RocCurve

Model ROC curve data

Raises:
ClientError

If the insight is not available for this model

(New in version v3.0) TypeError

If the underlying project type is multilabel

ValueError

If data_slice_filter passed as None

get_rulesets()

List the rulesets approximating this model generated by DataRobot Prime

If this model hasn’t been approximated yet, will return an empty list. Note that these are rulesets approximating this model, not rulesets used to construct this model.

Returns:
rulesetslist of Ruleset
Return type:

List[Ruleset]

get_supported_capabilities()

Retrieves a summary of the capabilities supported by a model.

Added in version v2.14.

Returns:
supportsBlending: bool

whether the model supports blending

supportsMonotonicConstraints: bool

whether the model supports monotonic constraints

hasWordCloud: bool

whether the model has word cloud data available

eligibleForPrime: bool

(Deprecated in version v3.6) whether the model is eligible for Prime

hasParameters: bool

whether the model has parameters that can be retrieved

supportsCodeGeneration: bool

(New in version v2.18) whether the model supports code generation

supportsShap: bool

(New in version v2.18) True if the model supports SHAP-based (Shapley values) feature importance.

supportsEarlyStopping: bool

(New in version v2.22) True if this is an early stopping tree-based model and number of trained iterations can be retrieved.

get_uri()
Returns:
urlstr

Permanent static hyperlink to this model at leaderboard.

Return type:

str

get_word_cloud(exclude_stop_words=False)

Retrieve word cloud data for the model.

Parameters:
exclude_stop_wordsbool, optional

Set to True if you want stopwords filtered out of response.

Returns:
WordCloud

Word cloud data for the model.

incremental_train(data_stage_id, training_data_name=None)

Submit a job to the queue to perform incremental training on an existing model. See train_incremental documentation.

Return type:

ModelJob

classmethod list(project_id, sort_by_partition='validation', sort_by_metric=None, with_metric=None, search_term=None, featurelists=None, families=None, blueprints=None, labels=None, characteristics=None, training_filters=None, number_of_clusters=None, limit=100, offset=0)

Retrieve paginated model records, sorted by scores, with optional filtering.

Parameters:
sort_by_partition: str, one of `validation`, `backtesting`, `crossValidation` or `holdout`

Set the partition to use for the sorted (by score) list of models. validation is the default.

sort_by_metric: str

Set the project metric to use for model sorting. DataRobot-selected project optimization metric is the default.

with_metric: str

For a single-metric list of results, specify that project metric.

search_term: str

If specified, only models containing the term in their name or processes are returned.

featurelists: list of str

If specified, only models trained on selected featurelists are returned.

families: list of str

If specified, only models belonging to selected families are returned.

blueprints: list of str

If specified, only models trained on specified blueprint IDs are returned.

labels: list of str, `starred` or `prepared for deployment`

If specified, only models tagged with all listed labels are returned.

characteristics: list of str

If specified, only models matching all listed characteristics are returned.

training_filters: list of str

If specified, only models matching at least one of the listed training conditions are returned.

The following format is supported for autoML and datetime partitioned projects:

  • number of rows in training subset

For datetime partitioned projects only:

  • <training duration>, for example P6Y0M0D

  • <training_duration>-<time_window_sample_percent>-<sampling_method>, for example P6Y0M0D-78-Random (models trained on 6 years of data, 78% sampling rate, random sampling)

  • start/end date

  • project settings

number_of_clusters: list of int

Filter models by number of clusters. Applicable only in unsupervised clustering projects.

limit: int
offset: int
Returns:
generic_models: list of GenericModel
Return type:

List[GenericModel]
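
For example, a minimal sketch listing the top models by a chosen metric; the metric name and search term are hypothetical:

import datarobot

records = datarobot.Model.list(
    'project-id',
    sort_by_metric='AUC',  # hypothetical project metric
    search_term='Gradient',  # hypothetical model-name filter
    limit=20,
)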

open_in_browser()

Opens the class’ relevant web location in the default browser. If a default browser is not available, the URL is logged.

Note: If text-mode browsers are used, the calling process will block until the user exits the browser.

Return type:

None

request_cross_class_accuracy_scores()

Request Cross Class Accuracy scores to be computed for the model.

Returns:
status_idstr

A statusId of computation request.

request_data_disparity_insights(feature, compared_class_names)

Request data disparity insights to be computed for the model.

Parameters:
featurestr

Bias and Fairness protected feature name.

compared_class_nameslist(str)

List of two classes to compare

Returns:
status_idstr

A statusId of computation request.

request_external_test(dataset_id, actual_value_column=None)

Request external test to compute scores and insights on an external test dataset

Parameters:
dataset_idstring

The dataset to make predictions against (as uploaded from Project.upload_dataset)

actual_value_columnstring, optional

(New in version v2.21) For time series unsupervised projects only. Actual value column can be used to calculate the classification metrics and insights on the prediction dataset. Can’t be provided with the forecast_point parameter.

Returns:
jobJob

a Job representing external dataset insights computation

request_fairness_insights(fairness_metrics_set=None)

Request fairness insights to be computed for the model.

Parameters:
fairness_metrics_setstr, optional

Can be one of <datarobot.enums.FairnessMetricsSet>. The fairness metric used to calculate the fairness scores.

Returns:
status_idstr

A statusId of computation request.

Return type:

str

request_feature_effect(row_count=None, data_slice_id=None)

Submit request to compute Feature Effects for the model.

See get_feature_effect for more information on the result of the job.

Parameters:
row_countint

(New in version v2.21) The sample size to use for Feature Impact computation. Minimum is 10 rows. Maximum is 100000 rows or the training sample size of the model, whichever is less.

data_slice_idstr, optional

ID for the data slice used in the request. If None, request unsliced insight data.

Returns:
jobJob

A Job representing the feature effect computation. To get the completed feature effect data, use job.get_result or job.get_result_when_complete.

Raises:
JobAlreadyRequested (422)

If the feature effects have already been requested.

request_feature_effects_multiclass(row_count=None, top_n_features=None, features=None)

Request Feature Effects computation for the multiclass model.

See get_feature_effect for more information on the result of the job.

Parameters:
row_countint

The number of rows from dataset to use for Feature Impact calculation.

top_n_featuresint or None

Number of top features (ranked by feature impact) used to calculate Feature Effects.

featureslist or None

The list of features used to calculate Feature Effects.

Returns:
jobJob

A Job representing Feature Effect computation. To get the completed Feature Effect data, use job.get_result or job.get_result_when_complete.

request_feature_impact(row_count=None, with_metadata=False, data_slice_id=None)

Request feature impacts to be computed for the model.

See get_feature_impact for more information on the result of the job.

Parameters:
row_countint, optional

The sample size (specified in rows) to use for Feature Impact computation. This is not supported for unsupervised, multiclass (which has a separate method), and time series projects.

with_metadatabool, optional

Flag indicating whether the result should include the metadata. If true, metadata is included.

data_slice_idstr, optional

ID for the data slice used in the request. If None, request unsliced insight data.

Returns:
jobJob or status_id

Job representing the Feature Impact computation. To retrieve the completed Feature Impact data, use job.get_result or job.get_result_when_complete.

Raises:
JobAlreadyRequested (422)

If the feature impacts have already been requested.

request_lift_chart(source, data_slice_id=None)

Request the model Lift Chart for the specified source.

Parameters:
sourcestr

Lift chart data source. Check datarobot.enums.CHART_DATA_SOURCE for possible values.

data_slice_idstring, optional

ID for the data slice used in the request. If None, request unsliced insight data.

Returns:
status_check_jobStatusCheckJob

Object contains all needed logic for a periodical status check of an async job.

Return type:

StatusCheckJob

request_per_class_fairness_insights(fairness_metrics_set=None)

Request per-class fairness insights be computed for the model.

Parameters:
fairness_metrics_setstr, optional

The fairness metric used to calculate the fairness scores. Value can be any one of <datarobot.enums.FairnessMetricsSet>.

Returns:
status_check_jobStatusCheckJob

The returned object contains all needed logic for a periodical status check of an async job.

Return type:

StatusCheckJob

request_predictions(dataset_id=None, dataset=None, dataframe=None, file_path=None, file=None, include_prediction_intervals=None, prediction_intervals_size=None, forecast_point=None, predictions_start_date=None, predictions_end_date=None, actual_value_column=None, explanation_algorithm=None, max_explanations=None, max_ngram_explanations=None)

Requests predictions against a previously uploaded dataset.

Parameters:
dataset_idstring, optional

The ID of the dataset to make predictions against (as uploaded from Project.upload_dataset)

datasetDataset, optional

The dataset to make predictions against (as uploaded from Project.upload_dataset)

dataframepd.DataFrame, optional

(New in v3.0) The dataframe to make predictions against

file_pathstr, optional

(New in v3.0) Path to file to make predictions against

fileIOBase, optional

(New in v3.0) File to make predictions against

include_prediction_intervalsbool, optional

(New in v2.16) For time series projects only. Specifies whether prediction intervals should be calculated for this request. Defaults to True if prediction_intervals_size is specified, otherwise defaults to False.

prediction_intervals_sizeint, optional

(New in v2.16) For time series projects only. Represents the percentile to use for the size of the prediction intervals. Defaults to 80 if include_prediction_intervals is True. Prediction intervals size must be between 1 and 100 (inclusive).

forecast_pointdatetime.datetime or None, optional

(New in version v2.20) For time series projects only. This is the default point relative to which predictions will be generated, based on the forecast window of the project. See the time series prediction documentation for more information.

predictions_start_datedatetime.datetime or None, optional

(New in version v2.20) For time series projects only. The start date for bulk predictions. Note that this parameter is for generating historical predictions using the training data. This parameter should be provided in conjunction with predictions_end_date. Can’t be provided with the forecast_point parameter.

predictions_end_datedatetime.datetime or None, optional

(New in version v2.20) For time series projects only. The end date for bulk predictions, exclusive. Note that this parameter is for generating historical predictions using the training data. This parameter should be provided in conjunction with predictions_start_date. Can’t be provided with the forecast_point parameter.

actual_value_columnstring, optional

(New in version v2.21) For time series unsupervised projects only. Actual value column can be used to calculate the classification metrics and insights on the prediction dataset. Can’t be provided with the forecast_point parameter.

explanation_algorithmstr, optional

(New in version v2.21) If set to ‘shap’, the response will include prediction explanations based on the SHAP explainer (SHapley Additive exPlanations). Defaults to null (no prediction explanations).

max_explanationsint, optional

(New in version v2.21) Specifies the maximum number of explanation values that should be returned for each row, ordered by absolute value, greatest to least. If null, no limit. In the case of ‘shap’: if the number of features is greater than the limit, the sum of remaining values will also be returned as shapRemainingTotal. Defaults to null. Cannot be set if explanation_algorithm is omitted.

max_ngram_explanationsint or str, optional

(New in version v2.29) Specifies the maximum number of text explanation values that should be returned. If set to all, text explanations are computed and all ngram explanations are returned. If set to a non-zero positive integer, text explanations are computed and that many ngram explanations, sorted in descending order, are returned. By default, text explanations are not computed.

Returns:
jobPredictJob

The job computing the predictions

Return type:

PredictJob
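
Example

A minimal, hedged sketch of requesting predictions against an uploaded dataset (project/model IDs and the file path are placeholders, and a configured DataRobot client is assumed):

import datarobot as dr

project = dr.Project.get('project-id')
model = dr.Model.get('project-id', 'model-id')

# Upload the scoring data, then queue predictions against it
prediction_dataset = project.upload_dataset('./data_to_predict.csv')
predict_job = model.request_predictions(dataset_id=prediction_dataset.id)

# Block until the async job finishes and fetch the Predictions result
predictions = predict_job.get_result_when_complete(max_wait=600)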

request_residuals_chart(source, data_slice_id=None)

Request the model residuals chart for the specified source.

Parameters:
sourcestr

Residuals chart data source. Check datarobot.enums.CHART_DATA_SOURCE for possible values.

data_slice_idstring, optional

ID for the data slice used in the request. If None, request unsliced insight data.

Returns:
status_check_jobStatusCheckJob

The returned object contains all the logic needed to periodically check the status of the async job.

Return type:

StatusCheckJob

request_roc_curve(source, data_slice_id=None)

Request the model Roc Curve for the specified source.

Parameters:
sourcestr

Roc Curve data source. Check datarobot.enums.CHART_DATA_SOURCE for possible values.

data_slice_idstring, optional

ID for the data slice used in the request. If None, request unsliced insight data.

Returns:
status_check_jobStatusCheckJob

The returned object contains all the logic needed to periodically check the status of the async job.

Return type:

StatusCheckJob

request_training_predictions(data_subset, explanation_algorithm=None, max_explanations=None)

Start a job to build training predictions

Parameters:
data_subsetstr

data set definition to build predictions on. Choices are:

  • dr.enums.DATA_SUBSET.ALL or string all for all data available. Not valid for

    models in datetime partitioned projects

  • dr.enums.DATA_SUBSET.VALIDATION_AND_HOLDOUT or string validationAndHoldout for

    all data except training set. Not valid for models in datetime partitioned projects

  • dr.enums.DATA_SUBSET.HOLDOUT or string holdout for holdout data set only

  • dr.enums.DATA_SUBSET.ALL_BACKTESTS or string allBacktests for downloading

    the predictions for all backtest validation folds. Requires the model to have successfully scored all backtests. Datetime partitioned projects only.

explanation_algorithmdr.enums.EXPLANATIONS_ALGORITHM

(New in v2.21) Optional. If set to dr.enums.EXPLANATIONS_ALGORITHM.SHAP, the response will include prediction explanations based on the SHAP explainer (SHapley Additive exPlanations). Defaults to None (no prediction explanations).

max_explanationsint

(New in v2.21) Optional. Specifies the maximum number of explanation values that should be returned for each row, ordered by absolute value, greatest to least. In the case of dr.enums.EXPLANATIONS_ALGORITHM.SHAP: If not set, explanations are returned for all features. If the number of features is greater than the max_explanations, the sum of remaining values will also be returned as shap_remaining_total. Max 100. Defaults to null for datasets narrower than 100 columns, defaults to 100 for datasets wider than 100 columns. Is ignored if explanation_algorithm is not set.

Returns:
Job

An instance of the created async job
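
Example

A minimal sketch of requesting holdout training predictions and reading them into a DataFrame (IDs are placeholders; get_all_as_dataframe is the TrainingPredictions accessor assumed here):

import datarobot as dr

model = dr.Model.get('project-id', 'model-id')

# Compute out-of-sample predictions on the holdout partition only
job = model.request_training_predictions(dr.enums.DATA_SUBSET.HOLDOUT)

# Wait for the async job, then read the result as a DataFrame
training_predictions = job.get_result_when_complete(max_wait=600)
df = training_predictions.get_all_as_dataframe()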

retrain(sample_pct=None, featurelist_id=None, training_row_count=None, n_clusters=None)

Submit a job to the queue to retrain the model.

Parameters:
sample_pct: float, optional

The sample size, in percent (1 to 100), to use in training. If this parameter is used, then training_row_count should not be given.

featurelist_idstr, optional

The featurelist id

training_row_countint, optional

The number of rows used to train the model. If this parameter is used, then sample_pct should not be given.

n_clusters: int, optional

(New in version v2.27) Number of clusters to use in an unsupervised clustering model. This parameter is used only for unsupervised clustering models that do not determine the number of clusters automatically.

Returns:
jobModelJob

The created job that is retraining the model

Return type:

ModelJob
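
Example

A hedged sketch of retraining this model on a different featurelist and sample size (IDs and the sample size are placeholders):

import datarobot as dr

model = dr.Model.get('project-id', 'model-id')

# Queue a retraining job on another featurelist, using 64% of the data
model_job = model.retrain(featurelist_id='featurelist-id', sample_pct=64)

# Wait for training to finish and get the newly created model
new_model = model_job.get_result_when_complete(max_wait=3600)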

set_prediction_threshold(threshold)

Set a custom prediction threshold for the model.

May not be used once prediction_threshold_read_only is True for this model.

Parameters:
thresholdfloat

Only used for binary classification projects. The threshold to use when deciding between the positive and negative classes when making predictions. Should be between 0.0 and 1.0 (inclusive).
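
Example

A short sketch for a binary classification project (the threshold value is arbitrary):

import datarobot as dr

model = dr.Model.get('project-id', 'model-id')

# Only permitted while the threshold is still editable
if not model.prediction_threshold_read_only:
    model.set_prediction_threshold(0.42)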

star_model()

Mark the model as starred.

Model stars propagate to the web application and the API, and can be used to filter when listing models.

Return type:

None

start_advanced_tuning_session()

Start an Advanced Tuning session. Returns an object that helps set up arguments for an Advanced Tuning model execution.

As of v2.17, all models other than blenders, open source, prime, baseline and user-created support Advanced Tuning.

Returns:
AdvancedTuningSession

Session for setting up and running Advanced Tuning on a model

train_incremental(data_stage_id, training_data_name=None, data_stage_encoding=None, data_stage_delimiter=None, data_stage_compression=None)

Submit a job to the queue to perform incremental training on an existing model using additional data. The ID of the additional data to use for training is specified with data_stage_id. Optionally, a name for the iteration can be supplied to help identify the data it contains.

This functionality requires the INCREMENTAL_LEARNING feature flag to be enabled.

Parameters:
data_stage_id: str

The id of the data stage to use for training.

training_data_namestr, optional

The name of the iteration or data stage to indicate what the incremental learning was performed on.

data_stage_encodingstr, optional

The encoding type of the data in the data stage (default: UTF-8). Supported formats: UTF-8, ASCII, WINDOWS1252

data_stage_delimiterstr, optional

The delimiter used by the data in the data stage (default: ‘,’).

data_stage_compressionstr, optional

The compression type of the data stage file, e.g. ‘zip’ (default: None). Supported formats: zip

Returns:
jobModelJob

The created job that is incrementally training the model
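
Example

A minimal sketch, assuming the INCREMENTAL_LEARNING feature flag is enabled and the data stage has already been created (IDs and the iteration name are placeholders):

import datarobot as dr

model = dr.Model.get('project-id', 'model-id')

# Continue training the existing model on an additional data stage
model_job = model.train_incremental(
    data_stage_id='data-stage-id',
    training_data_name='2024 Q1 increment',
    data_stage_encoding='UTF-8',
)
updated_model = model_job.get_result_when_complete(max_wait=3600)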

unstar_model()

Unmark the model as starred.

Model stars propagate to the web application and the API, and can be used to filter when listing models.

Return type:

None

BlenderModel

class datarobot.models.BlenderModel(id=None, processes=None, featurelist_name=None, featurelist_id=None, project_id=None, sample_pct=None, model_type=None, model_category=None, is_frozen=None, blueprint_id=None, metrics=None, model_ids=None, blender_method=None, monotonic_increasing_featurelist_id=None, monotonic_decreasing_featurelist_id=None, supports_monotonic_constraints=None, is_starred=None, prediction_threshold=None, prediction_threshold_read_only=None, model_number=None, parent_model_id=None, supports_composable_ml=None, training_row_count=None, training_duration=None, training_start_date=None, training_end_date=None, data_selection_method=None, time_window_sample_pct=None, sampling_method=None, model_family_full_name=None, is_trained_into_validation=None, is_trained_into_holdout=None)

Represents a blender model that combines prediction results from other models.

All durations are specified with a duration string such as those returned by the partitioning_methods.construct_duration_string helper method. Please see datetime partitioned project documentation for more information on duration strings.

Attributes:
idstr

the id of the model

project_idstr

the id of the project the model belongs to

processeslist of str

the processes used by the model

featurelist_namestr

the name of the featurelist used by the model

featurelist_idstr

the id of the featurelist used by the model

sample_pctfloat

the percentage of the project dataset used in training the model

training_row_countint or None

the number of rows of the project dataset used in training the model. In a datetime partitioned project, if specified, defines the number of rows used to train the model and evaluate backtest scores; if unspecified, either training_duration or training_start_date and training_end_date was used to determine that instead.

training_durationstr or None

only present for models in datetime partitioned projects. If specified, a duration string specifying the duration spanned by the data used to train the model and evaluate backtest scores.

training_start_datedatetime or None

only present for frozen models in datetime partitioned projects. If specified, the start date of the data used to train the model.

training_end_datedatetime or None

only present for frozen models in datetime partitioned projects. If specified, the end date of the data used to train the model.

model_typestr

what model this is, e.g. ‘AVG Blender’

model_categorystr

what kind of model this is - always ‘blend’ for blender models

is_frozenbool

whether this model is a frozen model

blueprint_idstr

the id of the blueprint used in this model

metricsdict

a mapping from each metric to the model’s scores for that metric

model_idslist of str

List of model ids used in blender

blender_methodstr

Method used to blend results from underlying models

monotonic_increasing_featurelist_idstr

optional, the id of the featurelist that defines the set of features with a monotonically increasing relationship to the target. If None, no such constraints are enforced.

monotonic_decreasing_featurelist_idstr

optional, the id of the featurelist that defines the set of features with a monotonically decreasing relationship to the target. If None, no such constraints are enforced.

supports_monotonic_constraintsbool

optional, whether this model supports enforcing monotonic constraints

is_starredbool

whether this model is marked as starred

prediction_thresholdfloat

for binary classification projects, the threshold used for predictions

prediction_threshold_read_onlybool

indicates whether modification of the prediction threshold is forbidden. Threshold modification is forbidden once a model has had a deployment created or predictions made via the dedicated prediction API.

model_numberinteger

model number assigned to a model

parent_model_idstr or None

(New in version v2.20) the id of the model that tuning parameters are derived from

supports_composable_mlbool or None

(New in version v2.26) whether this model is supported in Composable ML.

classmethod get(project_id, model_id)

Retrieve a specific blender.

Parameters:
project_idstr

The project’s id.

model_idstr

The model_id of the leaderboard item to retrieve.

Returns:
modelBlenderModel

The queried instance.

advanced_tune(params, description=None)

Generate a new model with the specified advanced-tuning parameters

As of v2.17, all models other than blenders, open source, prime, baseline and user-created support Advanced Tuning.

Parameters:
paramsdict

Mapping of parameter ID to parameter value. The list of valid parameter IDs for a model can be found by calling get_advanced_tuning_parameters(). This endpoint does not need to include values for all parameters. If a parameter is omitted, its current_value will be used.

descriptionstr

Human-readable string describing the newly advanced-tuned model

Returns:
ModelJob

The created job to build the model

Return type:

ModelJob

cross_validate()

Run cross validation on the model.

Note

To perform Cross Validation on a new model with new parameters, use train instead.

Returns:
ModelJob

The created job to build the model

delete()

Delete a model from the project’s leaderboard.

Return type:

None

download_scoring_code(file_name, source_code=False)

Download the Scoring Code JAR.

Parameters:
file_namestr

File path where scoring code will be saved.

source_codebool, optional

Set to True to download source code archive. It will not be executable.

Return type:

None

download_training_artifact(file_name)

Retrieve trained artifact(s) from a model containing one or more custom tasks.

Artifact(s) will be downloaded to the specified local filepath.

Parameters:
file_namestr

File path where trained model artifact(s) will be saved.

classmethod from_data(data)

Instantiate an object of this class using a dict.

Parameters:
datadict

Correctly snake_cased keys and their values.

Return type:

TypeVar(T, bound= APIObject)

get_advanced_tuning_parameters()

Get the advanced-tuning parameters available for this model.

As of v2.17, all models other than blenders, open source, prime, baseline and user-created support Advanced Tuning.

Returns:
dict

A dictionary describing the advanced-tuning parameters for the current model. There are two top-level keys, tuning_description and tuning_parameters.

tuning_description is an optional value. If not None, it indicates the user-specified description of this set of tuning parameters.

tuning_parameters is a list of dicts, each of which has the following keys:

  • parameter_name : (str) name of the parameter (unique per task, see below)

  • parameter_id : (str) opaque ID string uniquely identifying parameter

  • default_value : (*) the actual value used to train the model; either the single value of the parameter specified before training, or the best value from the list of grid-searched values (based on current_value)

  • current_value : (*) the single value or list of values of the parameter that were grid searched. Depending on the grid search specification, could be a single fixed value (no grid search), a list of discrete values, or a range.

  • task_name : (str) name of the task that this parameter belongs to

  • constraints: (dict) see the notes below

  • vertex_id: (str) ID of vertex that this parameter belongs to

Return type:

AdvancedTuningParamsType

Notes

The type of default_value and current_value is defined by the constraints structure. It will be a string or numeric Python type.

constraints is a dict with at least one, possibly more, of the following keys. The presence of a key indicates that the parameter may take on the specified type. (If a key is absent, this means that the parameter may not take on the specified type.) If a key on constraints is present, its value will be a dict containing all of the fields described below for that key.

"constraints": {
    "select": {
        "values": [<list(basestring or number) : possible values>]
    },
    "ascii": {},
    "unicode": {},
    "int": {
        "min": <int : minimum valid value>,
        "max": <int : maximum valid value>,
        "supports_grid_search": <bool : True if Grid Search may be
                                        requested for this param>
    },
    "float": {
        "min": <float : minimum valid value>,
        "max": <float : maximum valid value>,
        "supports_grid_search": <bool : True if Grid Search may be
                                        requested for this param>
    },
    "intList": {
        "min_length": <int : minimum valid length>,
        "max_length": <int : maximum valid length>
        "min_val": <int : minimum valid value>,
        "max_val": <int : maximum valid value>
        "supports_grid_search": <bool : True if Grid Search may be
                                        requested for this param>
    },
    "floatList": {
        "min_length": <int : minimum valid length>,
        "max_length": <int : maximum valid length>
        "min_val": <float : minimum valid value>,
        "max_val": <float : maximum valid value>
        "supports_grid_search": <bool : True if Grid Search may be
                                        requested for this param>
    }
}

The keys have meaning as follows:

  • select: Rather than specifying a specific data type, if present, it indicates that the parameter is permitted to take on any of the specified values. Listed values may be of any string or real (non-complex) numeric type.

  • ascii: The parameter may be a unicode object that encodes simple ASCII characters. (A-Z, a-z, 0-9, whitespace, and certain common symbols.) In addition to listed constraints, ASCII keys currently may not contain either newlines or semicolons.

  • unicode: The parameter may be any Python unicode object.

  • int: The value may be an object of type int within the specified range (inclusive). Please note that the value will be passed around using the JSON format, and some JSON parsers have undefined behavior with integers outside of the range [-(2**53)+1, (2**53)-1].

  • float: The value may be an object of type float within the specified range (inclusive).

  • intList, floatList: The value may be a list of int or float objects, respectively, following constraints as specified respectively by the int and float types (above).

Many parameters only specify one key under constraints. If a parameter specifies multiple keys, the parameter may take on any value permitted by any key.
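
Example

Combined with advanced_tune, a typical flow is to read the tunable parameters, override one of them by its parameter_id, and submit a tuning job. This hedged sketch uses the snake_cased keys documented above; the chosen parameter and value are purely illustrative and must respect that parameter's constraints:

import datarobot as dr

model = dr.Model.get('project-id', 'model-id')

# Inspect the tunable parameters and pick one to override
tuning_info = model.get_advanced_tuning_parameters()
param = tuning_info['tuning_parameters'][0]
print(param['parameter_name'], param['constraints'])

# Re-train with a single overridden value; omitted parameters keep their current_value
model_job = model.advanced_tune(
    params={param['parameter_id']: 'new-value'},  # replace with a value allowed by the constraints
    description='Example advanced-tuning run',
)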

get_all_confusion_charts(fallback_to_parent_insights=False)

Retrieve a list of all confusion matrices available for the model.

Parameters:
fallback_to_parent_insightsbool

(New in version v2.14) Optional, if True, this will return confusion chart data for this model’s parent for any source that is not available for this model and if this model has a defined parent model. If omitted or False, or this model has no parent, this will not attempt to retrieve any data from this model’s parent.

Returns:
list of ConfusionChart

Data for all available confusion charts for model.

get_all_feature_impacts(data_slice_filter=None)

Retrieve a list of all feature impact results available for the model.

Parameters:
data_slice_filterDataSlice, optional

A dataslice used to filter the return values based on the dataslice.id. By default, this function will use data_slice_filter.id == None which returns an unsliced insight. If data_slice_filter is None then no data_slice filtering will be applied when requesting the feature impacts.

Returns:
list of dicts

Data for all available model feature impacts, or an empty list if no data is found.

Examples

model = datarobot.Model(id='model-id', project_id='project-id')

# Get feature impact insights for sliced data
data_slice = datarobot.DataSlice(id='data-slice-id')
sliced_fi = model.get_all_feature_impacts(data_slice_filter=data_slice)

# Get feature impact insights for unsliced data
data_slice = datarobot.DataSlice()
unsliced_fi = model.get_all_feature_impacts(data_slice_filter=data_slice)

# Get all feature impact insights
all_fi = model.get_all_feature_impacts()
get_all_lift_charts(fallback_to_parent_insights=False, data_slice_filter=None)

Retrieve a list of all Lift charts available for the model.

Parameters:
fallback_to_parent_insightsbool, optional

(New in version v2.14) Optional, if True, this will return lift chart data for this model’s parent for any source that is not available for this model and if this model has a defined parent model. If omitted or False, or this model has no parent, this will not attempt to retrieve any data from this model’s parent.

data_slice_filterDataSlice, optional

Filters the returned lift chart by data_slice_filter.id. If None (the default) applies no filter based on data_slice_id.

Returns:
list of LiftChart

Data for all available model lift charts, or an empty list if no data is found.

Examples

model = datarobot.Model.get('project-id', 'model-id')

# Get lift chart insights for sliced data
sliced_lift_charts = model.get_all_lift_charts(data_slice_filter=datarobot.DataSlice(id='data-slice-id'))

# Get lift chart insights for unsliced data
unsliced_lift_charts = model.get_all_lift_charts(data_slice_filter=datarobot.DataSlice(id=None))

# Get all lift chart insights
all_lift_charts = model.get_all_lift_charts()
get_all_multiclass_lift_charts(fallback_to_parent_insights=False)

Retrieve a list of all Lift charts available for the model.

Parameters:
fallback_to_parent_insightsbool

(New in version v2.14) Optional, if True, this will return lift chart data for this model’s parent for any source that is not available for this model and if this model has a defined parent model. If omitted or False, or this model has no parent, this will not attempt to retrieve any data from this model’s parent.

Returns:
list of LiftChart

Data for all available model lift charts.

get_all_residuals_charts(fallback_to_parent_insights=False, data_slice_filter=None)

Retrieve a list of all residuals charts available for the model.

Parameters:
fallback_to_parent_insightsbool

Optional, if True, this will return residuals chart data for this model’s parent for any source that is not available for this model and if this model has a defined parent model. If omitted or False, or this model has no parent, this will not attempt to retrieve any data from this model’s parent.

data_slice_filterDataSlice, optional

Filters the returned residuals charts by data_slice_filter.id. If None (the default) applies no filter based on data_slice_id.

Returns:
list of ResidualsChart

Data for all available model residuals charts.

Examples

model = datarobot.Model.get('project-id', 'model-id')

# Get residuals chart insights for sliced data
sliced_residuals_charts = model.get_all_residuals_charts(data_slice_filter=datarobot.DataSlice(id='data-slice-id'))

# Get residuals chart insights for unsliced data
unsliced_residuals_charts = model.get_all_residuals_charts(data_slice_filter=datarobot.DataSlice(id=None))

# Get all residuals chart insights
all_residuals_charts = model.get_all_residuals_charts()
get_all_roc_curves(fallback_to_parent_insights=False, data_slice_filter=None)

Retrieve a list of all ROC curves available for the model.

Parameters:
fallback_to_parent_insightsbool

(New in version v2.14) Optional, if True, this will return ROC curve data for this model’s parent for any source that is not available for this model and if this model has a defined parent model. If omitted or False, or this model has no parent, this will not attempt to retrieve any data from this model’s parent.

data_slice_filterDataSlice, optional

filters the returned roc_curve by data_slice_filter.id. If None (the default) applies no filter based on data_slice_id.

Returns:
list of RocCurve

Data for all available model ROC curves. Or an empty list if no RocCurves are found.

Examples

model = datarobot.Model.get('project-id', 'model-id')
ds_filter = DataSlice(id='data-slice-id')

# Get roc curve insights for sliced data
sliced_roc = model.get_all_roc_curves(data_slice_filter=ds_filter)

# Get roc curve insights for unsliced data
unsliced_filter = DataSlice(id=None)
unsliced_roc = model.get_all_roc_curves(data_slice_filter=unsliced_filter)

# Get all roc curve insights
all_roc_curves = model.get_all_roc_curves()
get_confusion_chart(source, fallback_to_parent_insights=False)

Retrieve a multiclass model’s confusion matrix for the specified source.

Parameters:
sourcestr

Confusion chart source. Check datarobot.enums.CHART_DATA_SOURCE for possible values.

fallback_to_parent_insightsbool

(New in version v2.14) Optional, if True, this will return confusion chart data for this model’s parent if the confusion chart is not available for this model and the model has a defined parent model. If omitted or False, or there is no parent model, will not attempt to return insight data from this model’s parent.

Returns:
ConfusionChart

Model ConfusionChart data

Raises:
ClientError

If the insight is not available for this model

get_cross_class_accuracy_scores()

Retrieves a list of Cross Class Accuracy scores for the model.

Returns:
json
get_cross_validation_scores(partition=None, metric=None)

Return a dictionary, keyed by metric, showing cross validation scores per partition.

Cross Validation should already have been performed using cross_validate or train.

Note

Models that computed cross validation before this feature was added will need to be deleted and retrained before this method can be used.

Parameters:
partitionfloat

Optional. The ID of the partition (1, 2, 3.0, 4.0, etc.) to filter results by; can be a positive whole-number integer or float value. 0 corresponds to the validation partition.

metric: unicode

Optional. Name of the metric to filter the resulting cross validation scores by.

Returns:
cross_validation_scores: dict

A dictionary keyed by metric showing cross validation scores per partition.
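
Example

A hedged sketch of running cross validation and then reading the per-partition scores for a single metric (the metric name is illustrative):

import datarobot as dr

model = dr.Model.get('project-id', 'model-id')

# Run cross validation if it has not been performed yet
cv_job = model.cross_validate()
cv_job.wait_for_completion(max_wait=3600)

# Scores keyed by metric, then by partition
scores = model.get_cross_validation_scores(metric='LogLoss')
print(scores)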

get_data_disparity_insights(feature, class_name1, class_name2)

Retrieve a list of Cross Class Data Disparity insights for the model.

Parameters:
featurestr

Bias and Fairness protected feature name.

class_name1str

One of the compared classes

class_name2str

Another compared class

Returns:
json
get_fairness_insights(fairness_metrics_set=None, offset=0, limit=100)

Retrieve a list of Per Class Bias insights for the model.

Parameters:
fairness_metrics_setstr, optional

Can be one of <datarobot.enums.FairnessMetricsSet>. The fairness metric used to calculate the fairness scores.

offsetint, optional

Number of items to skip.

limitint, optional

Number of items to return.

Returns:
json
get_feature_effect(source, data_slice_id=None)

Retrieve Feature Effects for the model.

Feature Effects provides partial dependence and predicted vs actual values for the top 500 features, ordered by feature impact score.

The partial dependence shows marginal effect of a feature on the target variable after accounting for the average effects of all other predictive features. It indicates how, holding all other variables except the feature of interest as they were, the value of this feature affects your prediction.

Requires that Feature Effects has already been computed with request_feature_effect.

See get_feature_effect_metadata for retrieving information on the available sources.

Parameters:
sourcestring

The source Feature Effects are retrieved for.

data_slice_idstring, optional

ID for the data slice used in the request. If None, retrieve unsliced insight data.

Returns:
feature_effectsFeatureEffects

The feature effects data.

Raises:
ClientError (404)

If the feature effects have not been computed or source is not a valid value.

get_feature_effect_metadata()

Retrieve Feature Effects metadata. Response contains status and available model sources.

  • Feature Effect for the training partition is always available, with the exception of older projects that only supported Feature Effect for validation.

  • When a model is trained into validation or holdout without stacked predictions (i.e., no out-of-sample predictions in those partitions), Feature Effects is not available for validation or holdout.

  • Feature Effects for holdout is not available when holdout was not unlocked for the project.

Use source to retrieve Feature Effects, selecting one of the provided sources.

Returns:
feature_effect_metadata: FeatureEffectMetadata
get_feature_effects_multiclass(source='training', class_=None)

Retrieve Feature Effects for the multiclass model.

Feature Effects provide partial dependence and predicted vs actual values for the top 500 features, ordered by feature impact score.

The partial dependence shows marginal effect of a feature on the target variable after accounting for the average effects of all other predictive features. It indicates how, holding all other variables except the feature of interest as they were, the value of this feature affects your prediction.

Requires that Feature Effects has already been computed with request_feature_effect.

See get_feature_effect_metadata for retrieving information on the available sources.

Parameters:
sourcestr

The source Feature Effects are retrieved for.

class_str or None

The class name Feature Effects are retrieved for.

Returns:
list

The list of multiclass feature effects.

Raises:
ClientError (404)

If Feature Effects have not been computed or source is not a valid value.

get_feature_impact(with_metadata=False, data_slice_filter=<datarobot.models.model.Sentinel object>)

Retrieve the computed Feature Impact results, a measure of the relevance of each feature in the model.

Feature Impact is computed for each column by creating new data with that column randomly permuted (but the others left unchanged), and seeing how the error metric score for the predictions is affected. The ‘impactUnnormalized’ is how much worse the error metric score is when making predictions on this modified data. The ‘impactNormalized’ is normalized so that the largest value is 1. In both cases, larger values indicate more important features.

If a feature is a redundant feature, i.e. once other features are considered it doesn’t contribute much in addition, the ‘redundantWith’ value is the name of the feature that has the highest correlation with this feature. Note that redundancy detection is only available for jobs run after the addition of this feature. When retrieving data that predates this functionality, a NoRedundancyImpactAvailable warning will be issued.

Elsewhere this technique is sometimes called ‘Permutation Importance’.

Requires that Feature Impact has already been computed with request_feature_impact.

Parameters:
with_metadatabool

The flag indicating if the result should include the metadata as well.

data_slice_filterDataSlice, optional

A dataslice used to filter the return values based on the dataslice.id. By default, this function will use data_slice_filter.id == None which returns an unsliced insight. If data_slice_filter is None then get_feature_impact will raise a ValueError.

Returns:
list or dict

The feature impact data response depends on the with_metadata parameter. The response is either a dict with metadata and a list with actual data or just a list with that data.

Each List item is a dict with the keys featureName, impactNormalized, and impactUnnormalized, redundantWith and count.

For dict response available keys are:

  • featureImpacts - Feature Impact data as a dictionary. Each item is a dict with

    keys: featureName, impactNormalized, and impactUnnormalized, and redundantWith.

  • shapBased - A boolean that indicates whether Feature Impact was calculated using

    Shapley values.

  • ranRedundancyDetection - A boolean that indicates whether redundant feature

    identification was run while calculating this Feature Impact.

  • rowCount - An integer or None that indicates the number of rows that was used to

    calculate Feature Impact. For the Feature Impact calculated with the default logic, without specifying the rowCount, we return None here.

  • count - An integer with the number of features under the featureImpacts.

Raises:
ClientError (404)

If the feature impacts have not been computed.

ValueError

If data_slice_filter is passed as None
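
Example

A minimal sketch that computes Feature Impact if needed and prints the top features, left unsliced (the default sentinel for data_slice_filter is not overridden):

import datarobot as dr

model = dr.Model.get('project-id', 'model-id')

# Request the computation if necessary, then retrieve the results
feature_impacts = model.get_or_request_feature_impact(max_wait=600)

# Each item carries featureName, impactNormalized, impactUnnormalized and redundantWith
for item in feature_impacts[:5]:
    print(item['featureName'], item['impactNormalized'])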

get_features_used()

Query the server to determine which features were used.

Note that the data returned by this method may differ from the names of the features in the featurelist used by this model. This method will return the raw features that must be supplied in order for predictions to be generated on a new set of data. The featurelist, in contrast, would also include the names of derived features.

Returns:
featureslist of str

The names of the features used in the model.

Return type:

List[str]

get_frozen_child_models()

Retrieve all models that are frozen from this model.

Returns:
A list of Models
get_labelwise_roc_curves(source, fallback_to_parent_insights=False)

Retrieve a list of LabelwiseRocCurve instances for a multilabel model for the given source and all labels. This method is valid only for multilabel projects. For binary projects, use the Model.get_roc_curve API.

Added in version v2.24.

Parameters:
sourcestr

ROC curve data source. Check datarobot.enums.CHART_DATA_SOURCE for possible values.

fallback_to_parent_insightsbool

Optional, if True, this will return ROC curve data for this model’s parent if the ROC curve is not available for this model and the model has a defined parent model. If omitted or False, or there is no parent model, will not attempt to return data from this model’s parent.

Returns:
list of LabelwiseRocCurve

Labelwise ROC Curve instances for source and all labels

Raises:
ClientError

If the insight is not available for this model

get_lift_chart(source, fallback_to_parent_insights=False, data_slice_filter=<datarobot.models.model.Sentinel object>)

Retrieve the model Lift chart for the specified source.

Parameters:
sourcestr

Lift chart data source. Check datarobot.enums.CHART_DATA_SOURCE for possible values. (New in version v2.23) For time series and OTV models, also accepts values backtest_2, backtest_3, …, up to the number of backtests in the model.

fallback_to_parent_insightsbool

(New in version v2.14) Optional, if True, this will return lift chart data for this model’s parent if the lift chart is not available for this model and the model has a defined parent model. If omitted or False, or there is no parent model, will not attempt to return insight data from this model’s parent.

data_slice_filterDataSlice, optional

A dataslice used to filter the return values based on the dataslice.id. By default this function will use data_slice_filter.id == None which returns an unsliced insight. If data_slice_filter is None then get_lift_chart will raise a ValueError.

Returns:
LiftChart

Model lift chart data

Raises:
ClientError

If the insight is not available for this model

ValueError

If data_slice_filter is passed as None
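
Example

A sketch of retrieving the validation lift chart, first unsliced and then restricted to an existing data slice (the slice ID is a placeholder):

import datarobot as dr

model = dr.Model.get('project-id', 'model-id')

# Unsliced lift chart for the validation partition
lift_chart = model.get_lift_chart(source=dr.enums.CHART_DATA_SOURCE.VALIDATION)

# Sliced variant, filtered by an existing data slice
sliced_lift_chart = model.get_lift_chart(
    source=dr.enums.CHART_DATA_SOURCE.VALIDATION,
    data_slice_filter=dr.DataSlice(id='data-slice-id'),
)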

get_missing_report_info()

Retrieve a report on missing training data that can be used to understand missing values treatment in the model. The report consists of missing value resolutions for the numeric and categorical features that were part of building the model.

Returns:
An iterable of MissingReportPerFeature

The queried model missing report, sorted by missing count (DESCENDING order).

get_model_blueprint_chart()

Retrieve a diagram that can be used to understand data flow in the blueprint.

Returns:
ModelBlueprintChart

The queried model blueprint chart.

get_model_blueprint_documents()

Get documentation for tasks used in this model.

Returns:
list of BlueprintTaskDocument

All documents available for the model.

get_model_blueprint_json()

Get the blueprint json representation used by this model.

Returns:
BlueprintJson

Json representation of the blueprint stages.

Return type:

Dict[str, Tuple[List[str], List[str], str]]

get_multiclass_feature_impact()

For multiclass it’s possible to calculate feature impact separately for each target class. The method for calculation is exactly the same, calculated in one-vs-all style for each target class.

Requires that Feature Impact has already been computed with request_feature_impact.

Returns:
feature_impactslist of dict

The feature impact data. Each item is a dict with the keys ‘featureImpacts’ (list), ‘class’ (str). Each item in ‘featureImpacts’ is a dict with the keys ‘featureName’, ‘impactNormalized’, and ‘impactUnnormalized’, and ‘redundantWith’.

Raises:
ClientError (404)

If the multiclass feature impacts have not been computed.

get_multiclass_lift_chart(source, fallback_to_parent_insights=False)

Retrieve model Lift chart for the specified source.

Parameters:
sourcestr

Lift chart data source. Check datarobot.enums.CHART_DATA_SOURCE for possible values.

fallback_to_parent_insightsbool

Optional, if True, this will return lift chart data for this model’s parent if the lift chart is not available for this model and the model has a defined parent model. If omitted or False, or there is no parent model, will not attempt to return insight data from this model’s parent.

Returns:
list of LiftChart

Model lift chart data for each saved target class

Raises:
ClientError

If the insight is not available for this model

get_multilabel_lift_charts(source, fallback_to_parent_insights=False)

Retrieve model Lift charts for the specified source.

Added in version v2.24.

Parameters:
sourcestr

Lift chart data source. Check datarobot.enums.CHART_DATA_SOURCE for possible values.

fallback_to_parent_insightsbool

Optional, if True, this will return lift chart data for this model’s parent if the lift chart is not available for this model and the model has a defined parent model. If omitted or False, or there is no parent model, will not attempt to return insight data from this model’s parent.

Returns:
list of LiftChart

Model lift chart data for each saved target class

Raises:
ClientError

If the insight is not available for this model

get_num_iterations_trained()

Retrieves the number of estimators trained by early-stopping tree-based models.

Added in version v2.22.

Returns:
projectId: str

id of project containing the model

modelId: str

id of the model

data: array

list of numEstimatorsItem objects, one for each modeling stage.

numEstimatorsItem will be of the form:
stage: str

indicates the modeling stage (for multi-stage models); None for single-stage models

numIterations: int

the number of estimators or iterations trained by the model

get_or_request_feature_effect(source, max_wait=600, row_count=None, data_slice_id=None)

Retrieve Feature Effects for the model, requesting a new job if it hasn’t been run previously.

See get_feature_effect_metadata for retrieving information on the available sources.

Parameters:
sourcestring

The source Feature Effects are retrieved for.

max_waitint, optional

The maximum time to wait for a requested Feature Effect job to complete before erroring.

row_countint, optional

(New in version v2.21) The sample size to use for Feature Impact computation. Minimum is 10 rows. Maximum is 100000 rows or the training sample size of the model, whichever is less.

data_slice_idstr, optional

ID for the data slice used in the request. If None, request unsliced insight data.

Returns:
feature_effectsFeatureEffects

The Feature Effects data.
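
Example

A hedged sketch that first checks which sources are available and then computes (if needed) and retrieves Feature Effects for the validation partition (the sources attribute on the metadata object is assumed here):

import datarobot as dr

model = dr.Model.get('project-id', 'model-id')

# Check which sources Feature Effects can be retrieved for
metadata = model.get_feature_effect_metadata()
print(metadata.sources)

# Compute if necessary, then retrieve for the validation partition
feature_effects = model.get_or_request_feature_effect(source='validation', max_wait=600)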

get_or_request_feature_effects_multiclass(source, top_n_features=None, features=None, row_count=None, class_=None, max_wait=600)

Retrieve Feature Effects for the multiclass model, requesting a job if it hasn’t been run previously.

Parameters:
sourcestring

The source Feature Effects are retrieved for.

class_str or None

The class name Feature Effects are retrieved for.

row_countint

The number of rows from dataset to use for Feature Impact calculation.

top_n_featuresint or None

Number of top features (ranked by Feature Impact) used to calculate Feature Effects.

featureslist or None

The list of features used to calculate Feature Effects.

max_waitint, optional

The maximum time to wait for a requested Feature Effects job to complete before erroring.

Returns:
feature_effectslist of FeatureEffectsMulticlass

The list of multiclass feature effects data.

get_or_request_feature_impact(max_wait=600, **kwargs)

Retrieve feature impact for the model, requesting a job if it hasn’t been run previously

Parameters:
max_waitint, optional

The maximum time to wait for a requested feature impact job to complete before erroring

**kwargs

Arbitrary keyword arguments passed to request_feature_impact.

Returns:
feature_impactslist or dict

The feature impact data. See get_feature_impact for the exact schema.

get_parameters()

Retrieve model parameters.

Returns:
ModelParameters

Model parameters for this model.

get_pareto_front()

Retrieve the Pareto Front for a Eureqa model.

This method is only supported for Eureqa models.

Returns:
ParetoFront

Model ParetoFront data

get_prime_eligibility()

Check if this model can be approximated with DataRobot Prime

Returns:
prime_eligibilitydict

a dict indicating whether a model can be approximated with DataRobot Prime (key can_make_prime) and why it may be ineligible (key message)

get_residuals_chart(source, fallback_to_parent_insights=False, data_slice_filter=<datarobot.models.model.Sentinel object>)

Retrieve model residuals chart for the specified source.

Parameters:
sourcestr

Residuals chart data source. Check datarobot.enums.CHART_DATA_SOURCE for possible values.

fallback_to_parent_insightsbool

Optional, if True, this will return residuals chart data for this model’s parent if the residuals chart is not available for this model and the model has a defined parent model. If omitted or False, or there is no parent model, will not attempt to return residuals data from this model’s parent.

data_slice_filterDataSlice, optional

A dataslice used to filter the return values based on the dataslice.id. By default this function will use data_slice_filter.id == None which returns an unsliced insight. If data_slice_filter is None then get_residuals_chart will raise a ValueError.

Returns:
ResidualsChart

Model residuals chart data

Raises:
ClientError

If the insight is not available for this model

ValueError

If data_slice_filter is passed as None

get_roc_curve(source, fallback_to_parent_insights=False, data_slice_filter=<datarobot.models.model.Sentinel object>)

Retrieve the ROC curve for a binary model for the specified source. This method is valid only for binary projects. For multilabel projects, use Model.get_labelwise_roc_curves.

Parameters:
sourcestr

ROC curve data source. Check datarobot.enums.CHART_DATA_SOURCE for possible values. (New in version v2.23) For time series and OTV models, also accepts values backtest_2, backtest_3, …, up to the number of backtests in the model.

fallback_to_parent_insightsbool

(New in version v2.14) Optional, if True, this will return ROC curve data for this model’s parent if the ROC curve is not available for this model and the model has a defined parent model. If omitted or False, or there is no parent model, will not attempt to return data from this model’s parent.

data_slice_filterDataSlice, optional

A dataslice used to filter the return values based on the dataslice.id. By default this function will use data_slice_filter.id == None which returns an unsliced insight. If data_slice_filter is None then get_roc_curve will raise a ValueError.

Returns:
RocCurve

Model ROC curve data

Raises:
ClientError

If the insight is not available for this model

(New in version v3.0) TypeError

If the underlying project type is multilabel

ValueError

If data_slice_filter is passed as None
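
Example

A minimal sketch of retrieving the validation ROC curve for a binary model (the roc_points attribute used below follows the RocCurve schema and is assumed here):

import datarobot as dr

model = dr.Model.get('project-id', 'model-id')

# ROC curve for the validation partition (unsliced by default)
roc = model.get_roc_curve(source=dr.enums.CHART_DATA_SOURCE.VALIDATION)

# Each point carries rates and metrics at a candidate threshold
print(roc.roc_points[0])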

get_rulesets()

List the rulesets approximating this model generated by DataRobot Prime

If this model hasn’t been approximated yet, will return an empty list. Note that these are rulesets approximating this model, not rulesets used to construct this model.

Returns:
rulesetslist of Ruleset
Return type:

List[Ruleset]

get_supported_capabilities()

Retrieves a summary of the capabilities supported by a model.

Added in version v2.14.

Returns:
supportsBlending: bool

whether the model supports blending

supportsMonotonicConstraints: bool

whether the model supports monotonic constraints

hasWordCloud: bool

whether the model has word cloud data available

eligibleForPrime: bool

(Deprecated in version v3.6) whether the model is eligible for Prime

hasParameters: bool

whether the model has parameters that can be retrieved

supportsCodeGeneration: bool

(New in version v2.18) whether the model supports code generation

supportsShap: bool

(New in version v2.18) True if the model supports the Shapley package, i.e., Shapley-based feature importance.

supportsEarlyStopping: bool

(New in version v2.22) True if this is an early stopping tree-based model and number of trained iterations can be retrieved.

get_uri()
Returns:
urlstr

Permanent static hyperlink to this model in the leaderboard.

Return type:

str

get_word_cloud(exclude_stop_words=False)

Retrieve word cloud data for the model.

Parameters:
exclude_stop_wordsbool, optional

Set to True if you want stop words filtered out of the response.

Returns:
WordCloud

Word cloud data for the model.

incremental_train(data_stage_id, training_data_name=None)

Submit a job to the queue to perform incremental training on an existing model. See train_incremental documentation.

Return type:

ModelJob

classmethod list(project_id, sort_by_partition='validation', sort_by_metric=None, with_metric=None, search_term=None, featurelists=None, families=None, blueprints=None, labels=None, characteristics=None, training_filters=None, number_of_clusters=None, limit=100, offset=0)

Retrieve paginated model records, sorted by scores, with optional filtering.

Parameters:
sort_by_partition: str, one of `validation`, `backtesting`, `crossValidation` or `holdout`

Set the partition to use for the sorted (by score) list of models. validation is the default.

sort_by_metric: str

Set the project metric to use for model sorting. DataRobot-selected project optimization metric is the default.

with_metric: str

For a single-metric list of results, specify that project metric.

search_term: str

If specified, only models containing the term in their name or processes are returned.

featurelists: list of str

If specified, only models trained on selected featurelists are returned.

families: list of str

If specified, only models belonging to selected families are returned.

blueprints: list of str

If specified, only models trained on specified blueprint IDs are returned.

labels: list of str, `starred` or `prepared for deployment`

If specified, only models tagged with all listed labels are returned.

characteristics: list of str

If specified, only models matching all listed characteristics are returned.

training_filters: list of str

If specified, only models matching at least one of the listed training conditions are returned. The following formats are supported.

For AutoML and datetime partitioned projects:

  • number of rows in the training subset

For datetime partitioned projects:

  • <training duration>, for example P6Y0M0D

  • <training_duration>-<time_window_sample_percent>-<sampling_method>, for example P6Y0M0D-78-Random (returns models trained on 6 years of data, with a 78% sampling rate, random sampling)

  • start/end date

  • project settings

number_of_clusters: list of int

Filter models by number of clusters. Applicable only in unsupervised clustering projects.

limit: int
offset: int
Returns:
generic_models: list of GenericModel
Return type:

List[GenericModel]
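
Example

A sketch of listing the top starred models sorted by their holdout score on the project metric, calling the classmethod directly on datarobot.Model as documented above (the label and limit are illustrative):

import datarobot as dr

# Top 10 starred models, sorted by holdout score on the project metric
models = dr.Model.list(
    'project-id',
    sort_by_partition='holdout',
    labels=['starred'],
    limit=10,
)
for generic_model in models:
    print(generic_model.model_type, generic_model.metrics)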

open_in_browser()

Opens the class’ relevant web browser location. If the default browser is not available, the URL is logged.

Note: If text-mode browsers are used, the calling process will block until the user exits the browser.

Return type:

None

request_approximation()

Request an approximation of this model using DataRobot Prime

This will create several rulesets that could be used to approximate this model. After comparing their scores and rule counts, the code used in the approximation can be downloaded and run locally.

Returns:
jobJob

the job generating the rulesets

request_cross_class_accuracy_scores()

Request Cross Class Accuracy scores to be computed for the model.

Returns:
status_idstr

The statusId of the computation request.

request_data_disparity_insights(feature, compared_class_names)

Request data disparity insights to be computed for the model.

Parameters:
featurestr

Bias and Fairness protected feature name.

compared_class_nameslist(str)

List of two classes to compare

Returns:
status_idstr

The statusId of the computation request.

request_external_test(dataset_id, actual_value_column=None)

Request an external test to compute scores and insights on an external test dataset.

Parameters:
dataset_idstring

The ID of the dataset to make predictions against (as uploaded from Project.upload_dataset)

actual_value_columnstring, optional

(New in version v2.21) For time series unsupervised projects only. Actual value column can be used to calculate the classification metrics and insights on the prediction dataset. Can’t be provided with the forecast_point parameter.

Returns:
jobJob

a Job representing external dataset insights computation
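
Example

A minimal sketch (the external dataset must first be uploaded to the project; IDs and the file path are placeholders):

import datarobot as dr

project = dr.Project.get('project-id')
model = dr.Model.get('project-id', 'model-id')

# Upload the external test set and compute scores and insights against it
external_dataset = project.upload_dataset('./external_test.csv')
job = model.request_external_test(external_dataset.id)
job.wait_for_completion(max_wait=3600)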

request_fairness_insights(fairness_metrics_set=None)

Request fairness insights to be computed for the model.

Parameters:
fairness_metrics_setstr, optional

Can be one of <datarobot.enums.FairnessMetricsSet>. The fairness metric used to calculate the fairness scores.

Returns:
status_idstr

The statusId of the computation request.

Return type:

str

request_feature_effect(row_count=None, data_slice_id=None)

Submit request to compute Feature Effects for the model.

See get_feature_effect for more information on the result of the job.

Parameters:
row_countint

(New in version v2.21) The sample size to use for Feature Impact computation. Minimum is 10 rows. Maximum is 100000 rows or the training sample size of the model, whichever is less.

data_slice_idstr, optional

ID for the data slice used in the request. If None, request unsliced insight data.

Returns:
jobJob

A Job representing the feature effect computation. To get the completed feature effect data, use job.get_result or job.get_result_when_complete.

Raises:
JobAlreadyRequested (422)

If the feature effect has already been requested.

request_feature_effects_multiclass(row_count=None, top_n_features=None, features=None)

Request Feature Effects computation for the multiclass model.

See get_feature_effect for more information on the result of the job.

Parameters:
row_countint

The number of rows from dataset to use for Feature Impact calculation.

top_n_featuresint or None

Number of top features (ranked by feature impact) used to calculate Feature Effects.

featureslist or None

The list of features used to calculate Feature Effects.

Returns:
jobJob

A Job representing Feature Effect computation. To get the completed Feature Effect data, use job.get_result or job.get_result_when_complete.

request_feature_impact(row_count=None, with_metadata=False, data_slice_id=None)

Request feature impacts to be computed for the model.

See get_feature_impact for more information on the result of the job.

Parameters:
row_countint, optional

The sample size (specified in rows) to use for Feature Impact computation. This is not supported for unsupervised, multiclass (which has a separate method), and time series projects.

with_metadatabool, optional

Flag indicating whether the result should include the metadata. If true, metadata is included.

data_slice_idstr, optional

ID for the data slice used in the request. If None, request unsliced insight data.

Returns:
jobJob or status_id

Job representing the Feature Impact computation. To retrieve the completed Feature Impact data, use job.get_result or job.get_result_when_complete.

Raises:
JobAlreadyRequested (422)

If the feature impacts have already been requested.

request_frozen_datetime_model(training_row_count=None, training_duration=None, training_start_date=None, training_end_date=None, time_window_sample_pct=None, sampling_method=None)

Train a new frozen model with parameters from this model.

Requires that this model belongs to a datetime partitioned project. If it does not, an error will occur when submitting the job.

Frozen models use the same tuning parameters as their parent model instead of independently optimizing them to allow efficiently retraining models on larger amounts of the training data.

In addition to training_row_count and training_duration, frozen datetime models may be trained on an exact date range. Only one of training_row_count, training_duration, or training_start_date and training_end_date should be specified.

Models specified using training_start_date and training_end_date are the only ones that can be trained into the holdout data (once the holdout is unlocked).

All durations should be specified with a duration string such as those returned by the partitioning_methods.construct_duration_string helper method. Please see datetime partitioned project documentation for more information on duration strings.

Parameters:
training_row_countint, optional

the number of rows of data that should be used to train the model. If specified, training_duration may not be specified.

training_durationstr, optional

a duration string specifying what time range the data used to train the model should span. If specified, training_row_count may not be specified.

training_start_datedatetime.datetime, optional

the start date of the data to train the model on. Only rows occurring at or after this datetime will be used. If training_start_date is specified, training_end_date must also be specified.

training_end_datedatetime.datetime, optional

the end date of the data to train the model on. Only rows occurring strictly before this datetime will be used. If training_end_date is specified, training_start_date must also be specified.

time_window_sample_pctint, optional

may only be specified when the requested model is a time window (e.g. duration or start and end dates). An integer between 1 and 99 indicating the percentage to sample by within the window. The points kept are determined by a random uniform sample. If specified, training_duration must also be specified (rather than training_row_count); otherwise, an error will occur.

sampling_methodstr, optional

(New in version v2.23) defines the way training data is selected. Can be either random or latest. In combination with training_row_count defines how rows are selected from backtest (latest by default). When training data is defined using time range (training_duration or use_project_settings) this setting changes the way time_window_sample_pct is applied (random by default). Applicable to OTV projects only.

Returns:
model_jobModelJob

the modeling job training a frozen model

Return type:

ModelJob
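
Example

A sketch of training a frozen model on the most recent three months of data, with the duration built by the construct_duration_string helper mentioned above (the import path is the one assumed here; IDs are placeholders):

import datarobot as dr
from datarobot.helpers.partitioning_methods import construct_duration_string

model = dr.Model.get('project-id', 'model-id')

# Reuse this model's tuning parameters, retraining on a 3-month window
duration = construct_duration_string(months=3)
model_job = model.request_frozen_datetime_model(training_duration=duration)
frozen_model = model_job.get_result_when_complete(max_wait=3600)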

request_frozen_model(sample_pct=None, training_row_count=None)

Train a new frozen model with parameters from this model.

Note

This method only works if the project the model belongs to is not datetime partitioned. If it is, use request_frozen_datetime_model instead.

Frozen models use the same tuning parameters as their parent model instead of independently optimizing them to allow efficiently retraining models on larger amounts of the training data.

Parameters:
sample_pctfloat

optional, the percentage of the dataset to use with the model. If not provided, will use the value from this model.

training_row_countint

(New in version v2.9) optional, the integer number of rows of the dataset to use with the model. Only one of sample_pct and training_row_count should be specified.

Returns:
model_jobModelJob

the modeling job training a frozen model

request_lift_chart(source, data_slice_id=None)

Request the model Lift Chart for the specified source.

Parameters:
sourcestr

Lift chart data source. Check datarobot.enums.CHART_DATA_SOURCE for possible values.

data_slice_idstring, optional

ID for the data slice used in the request. If None, request unsliced insight data.

Returns:
status_check_jobStatusCheckJob

The returned object contains all the logic needed to periodically check the status of the async job.

Return type:

StatusCheckJob

request_per_class_fairness_insights(fairness_metrics_set=None)

Request per-class fairness insights to be computed for the model.

Parameters:
fairness_metrics_setstr, optional

The fairness metric used to calculate the fairness scores. Value can be any one of <datarobot.enums.FairnessMetricsSet>.

Returns:
status_check_jobStatusCheckJob

The returned object contains all the logic needed to periodically check the status of the async job.

Return type:

StatusCheckJob

request_predictions(dataset_id=None, dataset=None, dataframe=None, file_path=None, file=None, include_prediction_intervals=None, prediction_intervals_size=None, forecast_point=None, predictions_start_date=None, predictions_end_date=None, actual_value_column=None, explanation_algorithm=None, max_explanations=None, max_ngram_explanations=None)

Requests predictions against a previously uploaded dataset.

Parameters:
dataset_idstring, optional

The ID of the dataset to make predictions against (as uploaded from Project.upload_dataset)

datasetDataset, optional

The dataset to make predictions against (as uploaded from Project.upload_dataset)

dataframepd.DataFrame, optional

(New in v3.0) The dataframe to make predictions against

file_pathstr, optional

(New in v3.0) Path to file to make predictions against

fileIOBase, optional

(New in v3.0) File to make predictions against

include_prediction_intervalsbool, optional

(New in v2.16) For time series projects only. Specifies whether prediction intervals should be calculated for this request. Defaults to True if prediction_intervals_size is specified, otherwise defaults to False.

prediction_intervals_sizeint, optional

(New in v2.16) For time series projects only. Represents the percentile to use for the size of the prediction intervals. Defaults to 80 if include_prediction_intervals is True. Prediction intervals size must be between 1 and 100 (inclusive).

forecast_pointdatetime.datetime or None, optional

(New in version v2.20) For time series projects only. This is the default point relative to which predictions will be generated, based on the forecast window of the project. See the time series prediction documentation for more information.

predictions_start_datedatetime.datetime or None, optional

(New in version v2.20) For time series projects only. The start date for bulk predictions. Note that this parameter is for generating historical predictions using the training data. This parameter should be provided in conjunction with predictions_end_date. Can’t be provided with the forecast_point parameter.

predictions_end_datedatetime.datetime or None, optional

(New in version v2.20) For time series projects only. The end date for bulk predictions, exclusive. Note that this parameter is for generating historical predictions using the training data. This parameter should be provided in conjunction with predictions_start_date. Can’t be provided with the forecast_point parameter.

actual_value_columnstring, optional

(New in version v2.21) For time series unsupervised projects only. Actual value column can be used to calculate the classification metrics and insights on the prediction dataset. Can’t be provided with the forecast_point parameter.

explanation_algorithmstr, optional

(New in version v2.21) If set to ‘shap’, the response will include prediction explanations based on the SHAP explainer (SHapley Additive exPlanations). Defaults to null (no prediction explanations).

max_explanationsint, optional

(New in version v2.21) Specifies the maximum number of explanation values to return for each row, ordered by absolute value, greatest to least. If null, there is no limit. In the case of ‘shap’: if the number of features is greater than the limit, the sum of the remaining values is also returned as shapRemainingTotal. Defaults to null. Cannot be set if explanation_algorithm is omitted.

max_ngram_explanationsint or str, optional

(New in version v2.29) Specifies the maximum number of text explanation values to return. If set to all, text explanations are computed and all ngram explanations are returned. If set to a non-zero positive integer, text explanations are computed and that many ngram explanations, sorted in descending order, are returned. By default, text explanations are not computed.

Returns:
jobPredictJob

The job computing the predictions

Return type:

PredictJob
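
Examples

A minimal sketch, assuming a supervised project and a local scoring file; project_id, model_id, and the file path are placeholders:

import datarobot as dr

project = dr.Project.get(project_id)
model = dr.Model.get(project_id, model_id)
# Upload the scoring data, then request predictions from this model
dataset = project.upload_dataset("./data_to_predict.csv")
predict_job = model.request_predictions(dataset_id=dataset.id)
# Wait for the job and collect the predictions as a pandas DataFrame
predictions = predict_job.get_result_when_complete()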

request_residuals_chart(source, data_slice_id=None)

Request the model residuals chart for the specified source.

Parameters:
sourcestr

Residuals chart data source. Check datarobot.enums.CHART_DATA_SOURCE for possible values.

data_slice_idstring, optional

ID for the data slice used in the request. If None, request unsliced insight data.

Returns:
status_check_jobStatusCheckJob

Object contains all needed logic for a periodical status check of an async job.

Return type:

StatusCheckJob

request_roc_curve(source, data_slice_id=None)

Request the model Roc Curve for the specified source.

Parameters:
sourcestr

Roc Curve data source. Check datarobot.enums.CHART_DATA_SOURCE for possible values.

data_slice_idstring, optional

ID for the data slice used in the request. If None, request unsliced insight data.

Returns:
status_check_jobStatusCheckJob

Object contains all needed logic for a periodical status check of an async job.

Return type:

StatusCheckJob

request_training_predictions(data_subset, explanation_algorithm=None, max_explanations=None)

Start a job to build training predictions

Parameters:
data_subsetstr

data set definition to build predictions on. Choices are:

  • dr.enums.DATA_SUBSET.ALL or string all for all data available. Not valid for models in datetime partitioned projects.

  • dr.enums.DATA_SUBSET.VALIDATION_AND_HOLDOUT or string validationAndHoldout for all data except the training set. Not valid for models in datetime partitioned projects.

  • dr.enums.DATA_SUBSET.HOLDOUT or string holdout for the holdout data set only.

  • dr.enums.DATA_SUBSET.ALL_BACKTESTS or string allBacktests for downloading the predictions for all backtest validation folds. Requires the model to have successfully scored all backtests. Datetime partitioned projects only.

explanation_algorithmdr.enums.EXPLANATIONS_ALGORITHM

(New in v2.21) Optional. If set to dr.enums.EXPLANATIONS_ALGORITHM.SHAP, the response will include prediction explanations based on the SHAP explainer (SHapley Additive exPlanations). Defaults to None (no prediction explanations).

max_explanationsint

(New in v2.21) Optional. Specifies the maximum number of explanation values to return for each row, ordered by absolute value, greatest to least. In the case of dr.enums.EXPLANATIONS_ALGORITHM.SHAP: if not set, explanations are returned for all features; if the number of features exceeds max_explanations, the sum of the remaining values is also returned as shap_remaining_total. The maximum is 100. Defaults to null for datasets narrower than 100 columns and to 100 for datasets wider than 100 columns. Ignored if explanation_algorithm is not set.

Returns:
Job

an instance of created async job
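
Examples

A minimal sketch, assuming a non-datetime project with an unlocked holdout; get_all_as_dataframe is the TrainingPredictions helper for materializing the results (project_id and model_id are placeholders):

import datarobot as dr

model = dr.Model.get(project_id, model_id)
job = model.request_training_predictions(dr.enums.DATA_SUBSET.HOLDOUT)
# The finished job yields a TrainingPredictions object
training_predictions = job.get_result_when_complete()
df = training_predictions.get_all_as_dataframe()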

retrain(sample_pct=None, featurelist_id=None, training_row_count=None, n_clusters=None)

Submit a job to the queue to retrain this model.

Parameters:
sample_pct: float, optional

The sample size, as a percentage (1 to 100), to use in training. If this parameter is used, training_row_count should not be given.

featurelist_idstr, optional

The featurelist id

training_row_countint, optional

The number of rows used to train the model. If this parameter is used, then sample_pct should not be given.

n_clusters: int, optional

(new in version 2.27) number of clusters to use in an unsupervised clustering model. This parameter is used only for unsupervised clustering models that do not determine the number of clusters automatically.

Returns:
jobModelJob

The created job that is retraining the model

Return type:

ModelJob
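
Examples

A minimal sketch, assuming the project has enough rows to support the larger training size used here; project_id and model_id are placeholders:

import datarobot as dr

model = dr.Model.get(project_id, model_id)
# Retrain the same blueprint and featurelist on 100,000 rows
job = model.retrain(training_row_count=100000)
retrained_model = job.get_result_when_complete()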

set_prediction_threshold(threshold)

Set a custom prediction threshold for the model.

May not be used once prediction_threshold_read_only is True for this model.

Parameters:
thresholdfloat

Only used for binary classification projects. The threshold to use when deciding between the positive and negative classes when making predictions. Should be between 0.0 and 1.0 (inclusive).
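
Examples

A minimal sketch for a binary classification model whose threshold is still editable; project_id and model_id are placeholders and 0.42 is an arbitrary value:

import datarobot as dr

model = dr.Model.get(project_id, model_id)
# Thresholds can only be changed while they are still editable
if not model.prediction_threshold_read_only:
    model.set_prediction_threshold(0.42)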

star_model()

Mark the model as starred.

Model stars propagate to the web application and the API, and can be used to filter when listing models.

Return type:

None

start_advanced_tuning_session()

Start an Advanced Tuning session. Returns an object that helps set up arguments for an Advanced Tuning model execution.

As of v2.17, all models other than blenders, open source, prime, baseline and user-created support Advanced Tuning.

Returns:
AdvancedTuningSession

Session for setting up and running Advanced Tuning on a model
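
Examples

A minimal sketch, assuming the model supports Advanced Tuning; the first task and parameter and the value 1 are arbitrary placeholders, so consult the session's own listings before tuning for real:

import datarobot as dr

model = dr.Model.get(project_id, model_id)
session = model.start_advanced_tuning_session()
# Inspect which tasks and parameters the blueprint exposes for tuning
task_name = session.get_task_names()[0]
parameter_name = session.get_parameter_names(task_name)[0]
# Override a single value and submit the tuning job
session.set_parameter(task_name=task_name, parameter_name=parameter_name, value=1)
tuning_job = session.run()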

train(sample_pct=None, featurelist_id=None, scoring_type=None, training_row_count=None, monotonic_increasing_featurelist_id=<object object>, monotonic_decreasing_featurelist_id=<object object>)

Train the blueprint used in this model on a particular featurelist or amount of data.

This method creates a new training job and appends it to the end of the queue for this project. After the job has finished, you can get the newly trained model by retrieving it from the project leaderboard or by retrieving the result of the job.

Either sample_pct or training_row_count can be used to specify the amount of data to use, but not both. If neither is specified, the default is the maximum amount of data that can safely be used to train any blueprint without going into the validation data.

In smart-sampled projects, sample_pct and training_row_count are assumed to be in terms of rows of the minority class.

Note

For datetime partitioned projects, see train_datetime instead.

Parameters:
sample_pctfloat, optional

The amount of data to use for training, as a percentage of the project dataset from 0 to 100.

featurelist_idstr, optional

The identifier of the featurelist to use. If not defined, the featurelist of this model is used.

scoring_typestr, optional

Either validation or crossValidation (also dr.SCORING_TYPE.validation or dr.SCORING_TYPE.cross_validation). validation is available for every partitioning type, and indicates that the default model validation should be used for the project. If the project uses a form of cross-validation partitioning, crossValidation can also be used to indicate that all of the available training/validation combinations should be used to evaluate the model.

training_row_countint, optional

The number of rows to use to train the requested model.

monotonic_increasing_featurelist_idstr

(new in version 2.11) optional, the id of the featurelist that defines the set of features with a monotonically increasing relationship to the target. Passing None disables increasing monotonicity constraint. Default (dr.enums.MONOTONICITY_FEATURELIST_DEFAULT) is the one specified by the blueprint.

monotonic_decreasing_featurelist_idstr

(new in version 2.11) optional, the id of the featurelist that defines the set of features with a monotonically decreasing relationship to the target. Passing None disables decreasing monotonicity constraint. Default (dr.enums.MONOTONICITY_FEATURELIST_DEFAULT) is the one specified by the blueprint.

Returns:
model_job_idstr

id of created job, can be used as parameter to ModelJob.get method or wait_for_async_model_creation function

Examples

project = Project.get('project-id')
model = Model.get('project-id', 'model-id')
model_job_id = model.train(training_row_count=project.max_train_rows)

train_datetime(featurelist_id=None, training_row_count=None, training_duration=None, time_window_sample_pct=None, monotonic_increasing_featurelist_id=<object object>, monotonic_decreasing_featurelist_id=<object object>, use_project_settings=False, sampling_method=None, n_clusters=None)

Trains this model on a different featurelist or sample size.

Requires that this model is part of a datetime partitioned project; otherwise, an error will occur.

All durations should be specified with a duration string such as those returned by the partitioning_methods.construct_duration_string helper method. Please see datetime partitioned project documentation for more information on duration strings.

Parameters:
featurelist_idstr, optional

the featurelist to use to train the model. If not specified, the featurelist of this model is used.

training_row_countint, optional

the number of rows of data that should be used to train the model. If specified, neither training_duration nor use_project_settings may be specified.

training_durationstr, optional

a duration string specifying what time range the data used to train the model should span. If specified, neither training_row_count nor use_project_settings may be specified.

use_project_settingsbool, optional

(New in version v2.20) defaults to False. If True, indicates that the custom backtest partitioning settings specified by the user will be used to train the model and evaluate backtest scores. If specified, neither training_row_count nor training_duration may be specified.

time_window_sample_pctint, optional

may only be specified when the requested model is defined by a time window (e.g. a duration or start and end dates). An integer between 1 and 99 indicating the percentage to sample by within the window. The points kept are determined by a random uniform sample. If specified, training_duration must also be specified (training_row_count may not be used); otherwise an error will occur.

sampling_methodstr, optional

(New in version v2.23) defines the way training data is selected. Can be either random or latest. In combination with training_row_count defines how rows are selected from backtest (latest by default). When training data is defined using time range (training_duration or use_project_settings) this setting changes the way time_window_sample_pct is applied (random by default). Applicable to OTV projects only.

monotonic_increasing_featurelist_idstr, optional

(New in version v2.18) optional, the id of the featurelist that defines the set of features with a monotonically increasing relationship to the target. Passing None disables increasing monotonicity constraint. Default (dr.enums.MONOTONICITY_FEATURELIST_DEFAULT) is the one specified by the blueprint.

monotonic_decreasing_featurelist_idstr, optional

(New in version v2.18) optional, the id of the featurelist that defines the set of features with a monotonically decreasing relationship to the target. Passing None disables decreasing monotonicity constraint. Default (dr.enums.MONOTONICITY_FEATURELIST_DEFAULT) is the one specified by the blueprint.

n_clusters: int, optional

(New in version 2.27) number of clusters to use in an unsupervised clustering model. This parameter is used only for unsupervised clustering models that don’t automatically determine the number of clusters.

Returns:
jobModelJob

the created job to build the model

Return type:

ModelJob
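
Examples

A minimal sketch for a datetime partitioned project, using construct_duration_string to build the duration; the 90-day window is an arbitrary example and project_id and model_id are placeholders:

import datarobot as dr
from datarobot.helpers.partitioning_methods import construct_duration_string

model = dr.Model.get(project_id, model_id)
# Retrain this blueprint on the most recent 90 days of data
duration = construct_duration_string(days=90)
job = model.train_datetime(training_duration=duration)
new_model = job.get_result_when_complete()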

train_incremental(data_stage_id, training_data_name=None, data_stage_encoding=None, data_stage_delimiter=None, data_stage_compression=None)

Submit a job to the queue to perform incremental training on an existing model using additional data. The id of the additional data to use for training is specified with data_stage_id. Optionally, a name for the iteration can be supplied to help identify the contents of the data used in the iteration.

This functionality requires the INCREMENTAL_LEARNING feature flag to be enabled.

Parameters:
data_stage_id: str

The id of the data stage to use for training.

training_data_namestr, optional

The name of the iteration or data stage to indicate what the incremental learning was performed on.

data_stage_encodingstr, optional

The encoding type of the data in the data stage (default: UTF-8). Supported formats: UTF-8, ASCII, WINDOWS1252

data_stage_delimiterstr, optional

The delimiter used by the data in the data stage (default: ‘,’).

data_stage_compressionstr, optional

The compression type of the data stage file, e.g. ‘zip’ (default: None). Supported formats: zip

Returns:
jobModelJob

The created job that is training the model incrementally

unstar_model()

Unmark the model as starred.

Model stars propagate to the web application and the API, and can be used to filter when listing models.

Return type:

None

DatetimeModel

class datarobot.models.DatetimeModel(id=None, processes=None, featurelist_name=None, featurelist_id=None, project_id=None, sample_pct=None, training_row_count=None, training_duration=None, training_start_date=None, training_end_date=None, time_window_sample_pct=None, sampling_method=None, model_type=None, model_category=None, is_frozen=None, blueprint_id=None, metrics=None, training_info=None, holdout_score=None, holdout_status=None, data_selection_method=None, backtests=None, monotonic_increasing_featurelist_id=None, monotonic_decreasing_featurelist_id=None, supports_monotonic_constraints=None, is_starred=None, prediction_threshold=None, prediction_threshold_read_only=None, effective_feature_derivation_window_start=None, effective_feature_derivation_window_end=None, forecast_window_start=None, forecast_window_end=None, windows_basis_unit=None, model_number=None, parent_model_id=None, supports_composable_ml=None, n_clusters=None, is_n_clusters_dynamically_determined=None, has_empty_clusters=None, model_family_full_name=None, is_trained_into_validation=None, is_trained_into_holdout=None, **kwargs)

Represents a model from a datetime partitioned project

All durations are specified with a duration string such as those returned by the partitioning_methods.construct_duration_string helper method. Please see datetime partitioned project documentation for more information on duration strings.

Note that only one of training_row_count, training_duration, and training_start_date and training_end_date will be specified, depending on the data_selection_method of the model. Whichever method was selected determines the amount of data used to train on when making predictions and scoring the backtests and the holdout.

Attributes:
idstr

the id of the model

project_idstr

the id of the project the model belongs to

processeslist of str

the processes used by the model

featurelist_namestr

the name of the featurelist used by the model

featurelist_idstr

the id of the featurelist used by the model

sample_pctfloat

the percentage of the project dataset used in training the model

training_row_countint or None

If specified, an int specifying the number of rows used to train the model and evaluate backtest scores.

training_durationstr or None

If specified, a duration string specifying the duration spanned by the data used to train the model and evaluate backtest scores.

training_start_datedatetime or None

only present for frozen models in datetime partitioned projects. If specified, the start date of the data used to train the model.

training_end_datedatetime or None

only present for frozen models in datetime partitioned projects. If specified, the end date of the data used to train the model.

time_window_sample_pctint or None

An integer between 1 and 99 indicating the percentage of sampling within the training window. The points kept are determined by a random uniform sample. If not specified, no sampling was done.

sampling_methodstr or None

(New in v2.23) indicates the way training data has been selected (either how rows have been selected within backtest or how time_window_sample_pct has been applied).

model_typestr

what model this is, e.g. ‘Nystroem Kernel SVM Regressor’

model_categorystr

what kind of model this is - ‘prime’ for DataRobot Prime models, ‘blend’ for blender models, and ‘model’ for other models

is_frozenbool

whether this model is a frozen model

blueprint_idstr

the id of the blueprint used in this model

metricsdict

a mapping from each metric to the model’s scores for that metric. The keys in metrics are the different metrics used to evaluate the model, and the values are the results. The dictionaries inside of metrics will be as described here: ‘validation’, the score for a single backtest; ‘crossValidation’, always None; ‘backtesting’, the average score for all backtests if all are available and computed, or None otherwise; ‘backtestingScores’, a list of scores for all backtests where the score is None if that backtest does not have a score available; and ‘holdout’, the score for the holdout or None if the holdout is locked or the score is unavailable.

backtestslist of dict

describes what data was used to fit each backtest, the score for the project metric, and why the backtest score is unavailable if it is not provided.

data_selection_methodstr

which of training_row_count, training_duration, or training_start_date and training_end_date were used to determine the data used to fit the model. One of ‘rowCount’, ‘duration’, or ‘selectedDateRange’.

training_infodict

describes which data was used to train on when scoring the holdout and making predictions. training_info will have the following keys: holdout_training_start_date, holdout_training_duration, holdout_training_row_count, holdout_training_end_date, prediction_training_start_date, prediction_training_duration, prediction_training_row_count, prediction_training_end_date. Start and end dates will be datetimes, durations will be duration strings, and rows will be integers.

holdout_scorefloat or None

the score against the holdout, if available and the holdout is unlocked, according to the project metric.

holdout_statusstring or None

the status of the holdout score, e.g. “COMPLETED”, “HOLDOUT_BOUNDARIES_EXCEEDED”. Unavailable if the holdout fold was disabled in the partitioning configuration.

monotonic_increasing_featurelist_idstr

optional, the id of the featurelist that defines the set of features with a monotonically increasing relationship to the target. If None, no such constraints are enforced.

monotonic_decreasing_featurelist_idstr

optional, the id of the featurelist that defines the set of features with a monotonically decreasing relationship to the target. If None, no such constraints are enforced.

supports_monotonic_constraintsbool

optional, whether this model supports enforcing monotonic constraints

is_starredbool

whether this model marked as starred

prediction_thresholdfloat

for binary classification projects, the threshold used for predictions

prediction_threshold_read_onlybool

indicates whether modification of the prediction threshold is forbidden. Threshold modification is forbidden once a model has had a deployment created or predictions made via the dedicated prediction API.

effective_feature_derivation_window_startint or None

(New in v2.16) For time series projects only. How many units of the windows_basis_unit into the past relative to the forecast point the user needs to provide history for at prediction time. This can differ from the feature_derivation_window_start set on the project due to the differencing method and period selected, or if the model is a time series native model such as ARIMA. Will be a negative integer in time series projects and None otherwise.

effective_feature_derivation_window_endint or None

(New in v2.16) For time series projects only. How many units of the windows_basis_unit into the past relative to the forecast point the feature derivation window should end. Will be a non-positive integer in time series projects and None otherwise.

forecast_window_startint or None

(New in v2.16) For time series projects only. How many units of the windows_basis_unit into the future relative to the forecast point the forecast window should start. Note that this field will be the same as what is shown in the project settings. Will be a non-negative integer in time series projects and None otherwise.

forecast_window_endint or None

(New in v2.16) For time series projects only. How many units of the windows_basis_unit into the future relative to the forecast point the forecast window should end. Note that this field will be the same as what is shown in the project settings. Will be a non-negative integer in time series projects and None otherwise.

windows_basis_unitstr or None

(New in v2.16) For time series projects only. Indicates which unit is the basis for the feature derivation window and the forecast window. Note that this field will be the same as what is shown in the project settings. In time series projects, will be either the detected time unit or “ROW”, and None otherwise.

model_numberinteger

model number assigned to a model

parent_model_idstr or None

(New in version v2.20) the id of the model that tuning parameters are derived from

supports_composable_mlbool or None

(New in version v2.26) whether this model is supported in the Composable ML.

is_n_clusters_dynamically_determinedbool, optional

(New in version 2.27) if True, indicates that model determines number of clusters automatically.

n_clustersint, optional

(New in version 2.27) Number of clusters to use in an unsupervised clustering model. This parameter is used only for unsupervised clustering models that don’t automatically determine the number of clusters.

classmethod get(project, model_id)

Retrieve a specific datetime model.

If the project does not use datetime partitioning, a ClientError will occur.

Parameters:
projectstr

the id of the project the model belongs to

model_idstr

the id of the model to retrieve

Returns:
modelDatetimeModel

the model

score_backtests()

Compute the scores for all available backtests.

Some backtests may be unavailable if the model is trained into their validation data.

Returns:
jobJob

a job tracking the backtest computation. When it is complete, all available backtests will have scores computed.
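
Examples

A minimal sketch; wait_for_completion blocks until all available backtests have been scored, and project_id and model_id are placeholders:

import datarobot as dr

model = dr.DatetimeModel.get(project_id, model_id)
job = model.score_backtests()
job.wait_for_completion()
# Refresh the model to see the newly populated backtest scores,
# e.g. model.metrics[<metric name>]["backtesting"]
model = dr.DatetimeModel.get(project_id, model_id)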

cross_validate()

Inherited from Model. DatetimeModels cannot request cross validation scores; use backtests instead.

Return type:

NoReturn

get_cross_validation_scores(partition=None, metric=None)

Inherited from Model. DatetimeModels cannot request cross validation scores; use backtests instead.

Return type:

NoReturn

request_training_predictions(data_subset, *args, **kwargs)

Start a job that builds training predictions.

Parameters:
data_subsetstr

data set definition to build predictions on. Choices are:

  • dr.enums.DATA_SUBSET.HOLDOUT for holdout data set only

  • dr.enums.DATA_SUBSET.ALL_BACKTESTS for downloading the predictions for all backtest validation folds. Requires the model to have successfully scored all backtests.

Returns:
Job

an instance of created async job

get_series_accuracy_as_dataframe(offset=0, limit=100, metric=None, multiseries_value=None, order_by=None, reverse=False)

Retrieve series accuracy results for the specified model as a pandas.DataFrame.

Parameters:
offsetint, optional

The number of results to skip. Defaults to 0 if not specified.

limitint, optional

The maximum number of results to return. Defaults to 100 if not specified.

metricstr, optional

The name of the metric to retrieve scores for. If omitted, the default project metric will be used.

multiseries_valuestr, optional

If specified, only the series containing the given value in one of the series ID columns will be returned.

order_bystr, optional

Used for sorting the series. Attribute must be one of datarobot.enums.SERIES_ACCURACY_ORDER_BY.

reversebool, optional

Used for sorting the series. If True, will sort the series in descending order by the attribute specified by order_by.

Returns:
data

A pandas.DataFrame with the Series Accuracy for the specified model.
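
Examples

A minimal sketch for a multiseries project; series accuracy must be computed before it can be retrieved, and project_id and model_id are placeholders:

import datarobot as dr

model = dr.DatetimeModel.get(project_id, model_id)
job = model.compute_series_accuracy()
job.wait_for_completion()
df = model.get_series_accuracy_as_dataframe(limit=10)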

download_series_accuracy_as_csv(filename, encoding='utf-8', offset=0, limit=100, metric=None, multiseries_value=None, order_by=None, reverse=False)

Save series accuracy results for the specified model in a CSV file.

Parameters:
filenamestr or file object

The path or file object to save the data to.

encodingstr, optional

A string representing the encoding to use in the output csv file. Defaults to ‘utf-8’.

offsetint, optional

The number of results to skip. Defaults to 0 if not specified.

limitint, optional

The maximum number of results to return. Defaults to 100 if not specified.

metricstr, optional

The name of the metric to retrieve scores for. If omitted, the default project metric will be used.

multiseries_valuestr, optional

If specified, only the series containing the given value in one of the series ID columns will be returned.

order_bystr, optional

Used for sorting the series. Attribute must be one of datarobot.enums.SERIES_ACCURACY_ORDER_BY.

reversebool, optional

Used for sorting the series. If True, will sort the series in descending order by the attribute specified by order_by.

get_series_clusters(offset=0, limit=100, order_by=None, reverse=False)

Retrieve a dictionary of series and the clusters assigned to each series. This is only usable for clustering projects.

Parameters:
offsetint, optional

The number of results to skip. Defaults to 0 if not specified.

limitint, optional

The maximum number of results to return. Defaults to 100 if not specified.

order_bystr, optional

Used for sorting the series. Attribute must be one of datarobot.enums.SERIES_ACCURACY_ORDER_BY.

reversebool, optional

Used for sorting the series. If True, will sort the series in descending order by the attribute specified by order_by.

Returns:
Dict

A dictionary of the series in the dataset with their associated cluster

Raises:
ValueError

If the model type returns an unsupported insight

ClientError

If the insight is not available for this model

Return type:

Dict[str, str]

compute_series_accuracy(compute_all_series=False)

Compute series accuracy for the model.

Parameters:
compute_all_seriesbool, optional

Calculate accuracy for all series or only the first 1000.

Returns:
Job

an instance of the created async job

retrain(time_window_sample_pct=None, featurelist_id=None, training_row_count=None, training_duration=None, training_start_date=None, training_end_date=None, sampling_method=None, n_clusters=None)

Retrain an existing datetime model using a new training period for the model’s training set (with optional time window sampling) or a different feature list.

All durations should be specified with a duration string such as those returned by the partitioning_methods.construct_duration_string helper method. Please see datetime partitioned project documentation for more information on duration strings.

Parameters:
featurelist_idstr, optional

The ID of the featurelist to use.

training_row_countint, optional

The number of rows to train the model on. If specified, training_duration, training_start_date, and training_end_date cannot be specified.

time_window_sample_pctint, optional

An int between 1 and 99 indicating the percentage of sampling within the time window. The points kept are determined by a random uniform sample. If specified, training_row_count must not be specified and either training_duration or training_start_date and training_end_date must be specified.

training_durationstr, optional

A duration string representing the training duration for the submitted model. If specified then training_row_count, training_start_date, and training_end_date cannot be specified.

training_start_datestr, optional

A datetime string representing the start date of the data to use for training this model. If specified, training_end_date must also be specified, and training_duration cannot be specified. The value must be before the training_end_date value.

training_end_datestr, optional

A datetime string representing the end date of the data to use for training this model. If specified, training_start_date must also be specified, and training_duration cannot be specified. The value must be after the training_start_date value.

sampling_methodstr, optional

(New in version v2.23) defines the way training data is selected. Can be either random or latest. In combination with training_row_count defines how rows are selected from backtest (latest by default). When training data is defined using time range (training_duration or use_project_settings) this setting changes the way time_window_sample_pct is applied (random by default). Applicable to OTV projects only.

n_clustersint, optional

(New in version 2.27) Number of clusters to use in an unsupervised clustering model. This parameter is used only for unsupervised clustering models that don’t automatically determine the number of clusters.

Returns:
jobModelJob

The created job that is retraining the model

get_feature_effect_metadata()

Retrieve Feature Effect metadata for each backtest. Response contains status and available sources for each backtest of the model.

  • Each backtest is available for training and validation

  • If holdout is configured for the project it has holdout as backtestIndex. It has training and holdout sources available.

Start/stop models contain a single response item with startstop value for backtestIndex.

  • Feature Effect of training is always available (except for older projects, which support Feature Effect only for validation).

  • When a model is trained into validation or holdout without stacked prediction (e.g. no out-of-sample prediction in validation or holdout), Feature Effect is not available for validation or holdout.

  • Feature Effect for holdout is not available when there is no holdout configured for the project.

source is a required parameter for retrieving Feature Effects. One of the provided sources must be used.

backtestIndex is a required parameter for submitting a compute request and retrieving Feature Effects. One of the provided backtest indexes must be used.

Returns:
feature_effect_metadata: FeatureEffectMetadataDatetime

request_feature_effect(backtest_index, data_slice_filter=<datarobot.models.model.Sentinel object>)

Request feature effects to be computed for the model.

See get_feature_effect for more information on the result of the job.

See get_feature_effect_metadata for retrieving information of backtest_index.

Parameters:
backtest_index: string, FeatureEffectMetadataDatetime.backtest_index.

The backtest index to retrieve Feature Effects for.

Returns:
jobJob

A Job representing the feature effect computation. To get the completed feature effect data, use job.get_result or job.get_result_when_complete.

Raises:
JobAlreadyRequested (422)

If the feature effects have already been requested.

get_feature_effect(source, backtest_index, data_slice_filter=<datarobot.models.model.Sentinel object>)

Retrieve Feature Effects for the model.

Feature Effects provides partial dependence and predicted vs actual values for top-500 features ordered by feature impact score.

The partial dependence shows marginal effect of a feature on the target variable after accounting for the average effects of all other predictive features. It indicates how, holding all other variables except the feature of interest as they were, the value of this feature affects your prediction.

Requires that Feature Effects has already been computed with request_feature_effect.

See get_feature_effect_metadata for retrieving information of source, backtest_index.

Parameters:
source: string

The source to retrieve Feature Effects for. Must be one of the values in FeatureEffectMetadataDatetime.sources. Use get_feature_effect_metadata to retrieve the available sources.

backtest_index: string, FeatureEffectMetadataDatetime.backtest_index.

The backtest index to retrieve Feature Effects for.

Returns:
feature_effects: FeatureEffects

The feature effects data.

Raises:
ClientError (404)

If the feature effects have not been computed or source is not a valid value.

get_or_request_feature_effect(source, backtest_index, max_wait=600, data_slice_filter=<datarobot.models.model.Sentinel object>)

Retrieve Feature Effects computations for the model, requesting a new job if it hasn’t been run previously.

See get_feature_effect_metadata for retrieving information of source, backtest_index.

Parameters:
max_waitint, optional

The maximum time to wait for a requested feature effect job to complete before erroring

sourcestring

The source to retrieve Feature Effects for. Must be one of the values in FeatureEffectMetadataDatetime.sources. Use get_feature_effect_metadata to retrieve the available sources.

backtest_index: string, FeatureEffectMetadataDatetime.backtest_index.

The backtest index to retrieve Feature Effects for.

Returns:
feature_effectsFeatureEffects

The feature effects data.
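
Examples

A minimal sketch; the "training" source and backtest index "0" are illustrative only, so use get_feature_effect_metadata to confirm what is available (project_id and model_id are placeholders):

import datarobot as dr

model = dr.DatetimeModel.get(project_id, model_id)
# Check which backtests and sources are available for this model
metadata = model.get_feature_effect_metadata()
# Compute (or fetch, if already computed) Feature Effects for one combination
feature_effects = model.get_or_request_feature_effect(source="training", backtest_index="0")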

request_feature_effects_multiclass(backtest_index, row_count=None, top_n_features=None, features=None)

Request feature effects to be computed for the multiclass datetime model.

See get_feature_effect for more information on the result of the job.

Parameters:
backtest_indexstr

The backtest index to use for Feature Effects calculation.

row_countint

The number of rows from dataset to use for Feature Impact calculation.

top_n_featuresint or None

Number of top features (ranked by Feature Impact) used to calculate Feature Effects.

featureslist or None

The list of features to use to calculate Feature Effects.

Returns:
jobJob

A Job representing Feature Effects computation. To get the completed Feature Effect data, use job.get_result or job.get_result_when_complete.

get_feature_effects_multiclass(backtest_index, source='training', class_=None)

Retrieve Feature Effects for the multiclass datetime model.

Feature Effects provides partial dependence and predicted vs actual values for top-500 features ordered by feature impact score.

The partial dependence shows marginal effect of a feature on the target variable after accounting for the average effects of all other predictive features. It indicates how, holding all other variables except the feature of interest as they were, the value of this feature affects your prediction.

Requires that Feature Effects has already been computed with request_feature_effect.

See get_feature_effect_metadata for retrieving information about the available sources.

Parameters:
backtest_indexstr

The backtest index to retrieve Feature Effects for.

sourcestr

The source Feature Effects are retrieved for.

class_str or None

The class name Feature Effects are retrieved for.

Returns:
list

The list of multiclass Feature Effects.

Raises:
ClientError (404)

If the Feature Effects have not been computed or source is not a valid value.

get_or_request_feature_effects_multiclass(backtest_index, source, top_n_features=None, features=None, row_count=None, class_=None, max_wait=600)

Retrieve Feature Effects for a datetime multiclass model, and request a job if it hasn’t been run previously.

Parameters:
backtest_indexstr

The backtest index to retrieve Feature Effects for.

sourcestring

The source from which Feature Effects are retrieved.

class_str or None

The class name to retrieve Feature Effects for.

row_countint

The number of rows used from the dataset for Feature Impact calculation.

top_n_featuresint or None

Number of top features (ranked by feature impact) used to calculate Feature Effects.

featureslist or None

The list of features used to calculate Feature Effects.

max_waitint, optional

The maximum time to wait for a requested feature effect job to complete before erroring.

Returns:
feature_effectslist of FeatureEffectsMulticlass

The list of multiclass feature effects data.

calculate_prediction_intervals(prediction_intervals_size)

Calculate prediction intervals for this DatetimeModel for the specified size.

Added in version v2.19.

Parameters:
prediction_intervals_sizeint

The prediction interval’s size to calculate for this model. See the prediction intervals documentation for more information.

Returns:
jobJob

a Job tracking the prediction intervals computation

get_calculated_prediction_intervals(offset=None, limit=None)

Retrieve a list of already-calculated prediction intervals for this model

Added in version v2.19.

Parameters:
offsetint, optional

If provided, this many results will be skipped

limitint, optional

If provided, at most this many results will be returned. If not provided, will return at most 100 results.

Returns:
list[int]

A descending-ordered list of already-calculated prediction interval sizes
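
Examples

A minimal sketch; the 95th-percentile interval size is an arbitrary choice, and project_id and model_id are placeholders:

import datarobot as dr

model = dr.DatetimeModel.get(project_id, model_id)
job = model.calculate_prediction_intervals(prediction_intervals_size=95)
job.wait_for_completion()
# Returns the interval sizes already computed for this model, e.g. [95]
sizes = model.get_calculated_prediction_intervals()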

compute_datetime_trend_plots(backtest=0, source=SOURCE_TYPE.VALIDATION, forecast_distance_start=None, forecast_distance_end=None)

Computes datetime trend plots (Accuracy over Time, Forecast vs Actual, Anomaly over Time) for this model

Added in version v2.25.

Parameters:
backtestint or string, optional

Compute plots for a specific backtest (use the backtest index starting from zero). To compute plots for holdout, use dr.enums.DATA_SUBSET.HOLDOUT

sourcestring, optional

The source of the data for the backtest/holdout. Attribute must be one of dr.enums.SOURCE_TYPE

forecast_distance_startint, optional:

The start of forecast distance range (forecast window) to compute. If not specified, the first forecast distance for this project will be used. Only for time series supervised models

forecast_distance_endint, optional:

The end of forecast distance range (forecast window) to compute. If not specified, the last forecast distance for this project will be used. Only for time series supervised models

Returns:
jobJob

a Job tracking the datetime trend plots computation

Notes

  • Forecast distance specifies the number of time steps between the predicted point and the origin point.

  • For multiseries models, only the first 1000 series (in alphabetical order) and an average plot for them will be computed.

  • A maximum of 100 forecast distances can be requested for calculation in time series supervised projects.

get_accuracy_over_time_plots_metadata(forecast_distance=None)

Retrieve Accuracy over Time plots metadata for this model.

Added in version v2.25.

Parameters:
forecast_distanceint, optional

Forecast distance to retrieve the metadata for. If not specified, the first forecast distance for this project will be used. Only available for time series projects.

Returns:
metadataAccuracyOverTimePlotsMetadata

an AccuracyOverTimePlotsMetadata representing Accuracy over Time plots metadata

get_accuracy_over_time_plot(backtest=0, source=SOURCE_TYPE.VALIDATION, forecast_distance=None, series_id=None, resolution=None, max_bin_size=None, start_date=None, end_date=None, max_wait=600)

Retrieve Accuracy over Time plots for this model.

Added in version v2.25.

Parameters:
backtestint or string, optional

Retrieve plots for a specific backtest (use the backtest index starting from zero). To retrieve plots for holdout, use dr.enums.DATA_SUBSET.HOLDOUT

sourcestring, optional

The source of the data for the backtest/holdout. Attribute must be one of dr.enums.SOURCE_TYPE

forecast_distanceint, optional

Forecast distance to retrieve the plots for. If not specified, the first forecast distance for this project will be used. Only available for time series projects.

series_idstring, optional

The name of the series to retrieve for multiseries projects. If not provided an average plot for the first 1000 series will be retrieved.

resolutionstring, optional

Specifies the resolution at which the data should be binned. If not provided, an optimal resolution will be used to build chart data with a number of bins <= max_bin_size. One of dr.enums.DATETIME_TREND_PLOTS_RESOLUTION.

max_bin_sizeint, optional

An int between 1 and 1000, which specifies the maximum number of bins for the retrieval. Default is 500.

start_datedatetime.datetime, optional

The start of the date range to return. If not specified, start date for requested plot will be used.

end_datedatetime.datetime, optional

The end of the date range to return. If not specified, end date for requested plot will be used.

max_waitint or None, optional

The maximum time to wait for a compute job to complete before retrieving the plots. Default is dr.enums.DEFAULT_MAX_WAIT. If 0 or None, the plots would be retrieved without attempting the computation.

Returns:
plotAccuracyOverTimePlot

an AccuracyOverTimePlot representing the Accuracy over Time plot

Examples

import datarobot as dr
import pandas as pd
model = dr.DatetimeModel(project_id=project_id, id=model_id)
plot = model.get_accuracy_over_time_plot()
df = pd.DataFrame.from_dict(plot.bins)
figure = df.plot("start_date", ["actual", "predicted"]).get_figure()
figure.savefig("accuracy_over_time.png")

get_accuracy_over_time_plot_preview(backtest=0, source=SOURCE_TYPE.VALIDATION, forecast_distance=None, series_id=None, max_wait=600)

Retrieve Accuracy over Time preview plots for this model.

Added in version v2.25.

Parameters:
backtestint or string, optional

Retrieve plots for a specific backtest (use the backtest index starting from zero). To retrieve plots for holdout, use dr.enums.DATA_SUBSET.HOLDOUT

sourcestring, optional

The source of the data for the backtest/holdout. Attribute must be one of dr.enums.SOURCE_TYPE

forecast_distanceint, optional

Forecast distance to retrieve the plots for. If not specified, the first forecast distance for this project will be used. Only available for time series projects.

series_idstring, optional

The name of the series to retrieve for multiseries projects. If not provided an average plot for the first 1000 series will be retrieved.

max_waitint or None, optional

The maximum time to wait for a compute job to complete before retrieving the plots. Default is dr.enums.DEFAULT_MAX_WAIT. If 0 or None, the plots would be retrieved without attempting the computation.

Returns:
plotAccuracyOverTimePlotPreview

an AccuracyOverTimePlotPreview representing the Accuracy over Time plot preview

Examples

import datarobot as dr
import pandas as pd
model = dr.DatetimeModel(project_id=project_id, id=model_id)
plot = model.get_accuracy_over_time_plot_preview()
df = pd.DataFrame.from_dict(plot.bins)
figure = df.plot("start_date", ["actual", "predicted"]).get_figure()
figure.savefig("accuracy_over_time_preview.png")

get_forecast_vs_actual_plots_metadata()

Retrieve Forecast vs Actual plots metadata for this model.

Added in version v2.25.

Returns:
metadataForecastVsActualPlotsMetadata

a ForecastVsActualPlotsMetadata representing Forecast vs Actual plots metadata

get_forecast_vs_actual_plot(backtest=0, source=SOURCE_TYPE.VALIDATION, forecast_distance_start=None, forecast_distance_end=None, series_id=None, resolution=None, max_bin_size=None, start_date=None, end_date=None, max_wait=600)

Retrieve Forecast vs Actual plots for this model.

Added in version v2.25.

Parameters:
backtestint or string, optional

Retrieve plots for a specific backtest (use the backtest index starting from zero). To retrieve plots for holdout, use dr.enums.DATA_SUBSET.HOLDOUT

sourcestring, optional

The source of the data for the backtest/holdout. Attribute must be one of dr.enums.SOURCE_TYPE

forecast_distance_startint, optional:

The start of forecast distance range (forecast window) to retrieve. If not specified, the first forecast distance for this project will be used.

forecast_distance_endint, optional:

The end of forecast distance range (forecast window) to retrieve. If not specified, the last forecast distance for this project will be used.

series_idstring, optional

The name of the series to retrieve for multiseries projects. If not provided an average plot for the first 1000 series will be retrieved.

resolutionstring, optional

Specifies the resolution at which the data should be binned. If not provided, an optimal resolution will be used to build chart data with a number of bins <= max_bin_size. One of dr.enums.DATETIME_TREND_PLOTS_RESOLUTION.

max_bin_sizeint, optional

An int between 1 and 1000, which specifies the maximum number of bins for the retrieval. Default is 500.

start_datedatetime.datetime, optional

The start of the date range to return. If not specified, start date for requested plot will be used.

end_datedatetime.datetime, optional

The end of the date range to return. If not specified, end date for requested plot will be used.

max_waitint or None, optional

The maximum time to wait for a compute job to complete before retrieving the plots. Default is dr.enums.DEFAULT_MAX_WAIT. If 0 or None, the plots would be retrieved without attempting the computation.

Returns:
plotForecastVsActualPlot

a ForecastVsActualPlot representing Forecast vs Actual plot

Examples

import datarobot as dr
import pandas as pd
import matplotlib.pyplot as plt

model = dr.DatetimeModel(project_id=project_id, id=model_id)
plot = model.get_forecast_vs_actual_plot()
df = pd.DataFrame.from_dict(plot.bins)

# As an example, get the forecasts for the 10th point
forecast_point_index = 10
# Pad the forecasts for plotting. The forecasts length must match the df length
forecasts = [None] * forecast_point_index + df.forecasts[forecast_point_index]
forecasts = forecasts + [None] * (len(df) - len(forecasts))

plt.plot(df.start_date, df.actual, label="Actual")
plt.plot(df.start_date, forecasts, label="Forecast")
forecast_point = df.start_date[forecast_point_index]
plt.title("Forecast vs Actual (Forecast Point {})".format(forecast_point))
plt.legend()
plt.savefig("forecast_vs_actual.png")

get_forecast_vs_actual_plot_preview(backtest=0, source=SOURCE_TYPE.VALIDATION, series_id=None, max_wait=600)

Retrieve Forecast vs Actual preview plots for this model.

Added in version v2.25.

Parameters:
backtestint or string, optional

Retrieve plots for a specific backtest (use the backtest index starting from zero). To retrieve plots for holdout, use dr.enums.DATA_SUBSET.HOLDOUT

sourcestring, optional

The source of the data for the backtest/holdout. Attribute must be one of dr.enums.SOURCE_TYPE

series_idstring, optional

The name of the series to retrieve for multiseries projects. If not provided an average plot for the first 1000 series will be retrieved.

max_waitint or None, optional

The maximum time to wait for a compute job to complete before retrieving the plots. Default is dr.enums.DEFAULT_MAX_WAIT. If 0 or None, the plots would be retrieved without attempting the computation.

Returns:
plotForecastVsActualPlotPreview

a ForecastVsActualPlotPreview representing Forecast vs Actual plot preview

Examples

import datarobot as dr
import pandas as pd
model = dr.DatetimeModel(project_id=project_id, id=model_id)
plot = model.get_forecast_vs_actual_plot_preview()
df = pd.DataFrame.from_dict(plot.bins)
figure = df.plot("start_date", ["actual", "predicted"]).get_figure()
figure.savefig("forecast_vs_actual_preview.png")

get_anomaly_over_time_plots_metadata()

Retrieve Anomaly over Time plots metadata for this model.

Added in version v2.25.

Returns:
metadataAnomalyOverTimePlotsMetadata

an AnomalyOverTimePlotsMetadata representing Anomaly over Time plots metadata

get_anomaly_over_time_plot(backtest=0, source=SOURCE_TYPE.VALIDATION, series_id=None, resolution=None, max_bin_size=None, start_date=None, end_date=None, max_wait=600)

Retrieve Anomaly over Time plots for this model.

Added in version v2.25.

Parameters:
backtestint or string, optional

Retrieve plots for a specific backtest (use the backtest index starting from zero). To retrieve plots for holdout, use dr.enums.DATA_SUBSET.HOLDOUT

sourcestring, optional

The source of the data for the backtest/holdout. Attribute must be one of dr.enums.SOURCE_TYPE

series_idstring, optional

The name of the series to retrieve for multiseries projects. If not provided an average plot for the first 1000 series will be retrieved.

resolutionstring, optional

Specifies the resolution at which the data should be binned. If not provided, an optimal resolution will be used to build chart data with a number of bins <= max_bin_size. One of dr.enums.DATETIME_TREND_PLOTS_RESOLUTION.

max_bin_sizeint, optional

An int between 1 and 1000, which specifies the maximum number of bins for the retrieval. Default is 500.

start_datedatetime.datetime, optional

The start of the date range to return. If not specified, start date for requested plot will be used.

end_datedatetime.datetime, optional

The end of the date range to return. If not specified, end date for requested plot will be used.

max_waitint or None, optional

The maximum time to wait for a compute job to complete before retrieving the plots. Default is dr.enums.DEFAULT_MAX_WAIT. If 0 or None, the plots would be retrieved without attempting the computation.

Returns:
plotAnomalyOverTimePlot

an AnomalyOverTimePlot representing the Anomaly over Time plot

Examples

import datarobot as dr
import pandas as pd
model = dr.DatetimeModel(project_id=project_id, id=model_id)
plot = model.get_anomaly_over_time_plot()
df = pd.DataFrame.from_dict(plot.bins)
figure = df.plot("start_date", "predicted").get_figure()
figure.savefig("anomaly_over_time.png")

get_anomaly_over_time_plot_preview(prediction_threshold=0.5, backtest=0, source=SOURCE_TYPE.VALIDATION, series_id=None, max_wait=600)

Retrieve Anomaly over Time preview plots for this model.

Added in version v2.25.

Parameters:
prediction_threshold: float, optional

Only bins with predictions exceeding this threshold will be returned in the response.

backtestint or string, optional

Retrieve plots for a specific backtest (use the backtest index starting from zero). To retrieve plots for holdout, use dr.enums.DATA_SUBSET.HOLDOUT

sourcestring, optional

The source of the data for the backtest/holdout. Attribute must be one of dr.enums.SOURCE_TYPE

series_idstring, optional

The name of the series to retrieve for multiseries projects. If not provided an average plot for the first 1000 series will be retrieved.

max_waitint or None, optional

The maximum time to wait for a compute job to complete before retrieving the plots. Default is dr.enums.DEFAULT_MAX_WAIT. If 0 or None, the plots would be retrieved without attempting the computation.

Returns:
plotAnomalyOverTimePlotPreview

an AnomalyOverTimePlotPreview representing the Anomaly over Time plot preview

Examples

import datarobot as dr
import pandas as pd
import matplotlib.pyplot as plt

model = dr.DatetimeModel(project_id=project_id, id=model_id)
plot = model.get_anomaly_over_time_plot_preview(prediction_threshold=0.01)
df = pd.DataFrame.from_dict(plot.bins)
x = pd.date_range(
    plot.start_date, plot.end_date, freq=df.end_date[0] - df.start_date[0]
)
plt.plot(x, [0] * len(x), label="Date range")
plt.plot(df.start_date, [0] * len(df.start_date), "ro", label="Anomaly")
plt.yticks([])
plt.legend()
plt.savefig("anomaly_over_time_preview.png")

initialize_anomaly_assessment(backtest, source, series_id=None)

Initialize the anomaly assessment insight and calculate Shapley explanations for the most anomalous points in the subset. The insight is available for anomaly detection models in time series unsupervised projects which also support calculation of Shapley values.

Parameters:
backtest: int starting with 0 or “holdout”

The backtest to compute insight for.

source: “training” or “validation”

The source to compute insight for.

series_id: string

Required for multiseries projects. The series id to compute the insight for. For example, if there is a series column containing cities, a series name to pass would be “Boston”.

Returns:
AnomalyAssessmentRecord

get_anomaly_assessment_records(backtest=None, source=None, series_id=None, limit=100, offset=0, with_data_only=False)

Retrieve computed Anomaly Assessment records for this model. Model must be an anomaly detection model in time series unsupervised project which also supports calculation of Shapley values.

Records can be filtered by the data backtest, source and series_id. The results can be limited.

Added in version v2.25.

Parameters:
backtest: int starting with 0 or “holdout”

The backtest of the data to filter records by.

source: “training” or “validation”

The source of the data to filter records by.

series_id: string

The series id to filter records by.

limit: int, optional
offset: int, optional
with_data_only: bool, optional

Whether to return only records with preview and explanations available. False by default.

Returns:
recordslist of AnomalyAssessmentRecord

a list of AnomalyAssessmentRecord objects representing the Anomaly Assessment records

get_feature_impact(with_metadata=False, backtest=None, data_slice_filter=<datarobot.models.model.Sentinel object>)

Retrieve the computed Feature Impact results, a measure of the relevance of each feature in the model.

Feature Impact is computed for each column by creating new data with that column randomly permuted (but the others left unchanged), and seeing how the error metric score for the predictions is affected. The ‘impactUnnormalized’ is how much worse the error metric score is when making predictions on this modified data. The ‘impactNormalized’ is normalized so that the largest value is 1. In both cases, larger values indicate more important features.

If a feature is a redundant feature, i.e. once other features are considered it doesn’t contribute much in addition, the ‘redundantWith’ value is the name of the feature that has the highest correlation with this feature. Note that redundancy detection is only available for jobs run after the addition of this feature. When retrieving data that predates this functionality, a NoRedundancyImpactAvailable warning will be issued.

Elsewhere, this technique is sometimes called ‘Permutation Importance’.

Requires that Feature Impact has already been computed with request_feature_impact.

Parameters:
with_metadatabool

The flag indicating if the result should include the metadata as well.

backtestint or string

The index of the backtest, or the string ‘holdout’ for the holdout. This is supported only for DatetimeModels.

data_slice_filterDataSlice, optional

(New in version v3.4) A data slice used to filter the return values based on the dataslice.id. By default, this function will use data_slice_filter.id == None, which returns an unsliced insight. If data_slice_filter is None, then get_feature_impact will raise a ValueError.

Returns:
list or dict

The feature impact data response depends on the with_metadata parameter. The response is either a dict with metadata and a list with actual data or just a list with that data.

Each list item is a dict with the keys featureName, impactNormalized, impactUnnormalized, redundantWith, and count.

For a dict response, the available keys are:

  • featureImpacts - Feature Impact data as a dictionary. Each item is a dict with the keys featureName, impactNormalized, impactUnnormalized, and redundantWith.

  • shapBased - A boolean that indicates whether Feature Impact was calculated using Shapley values.

  • ranRedundancyDetection - A boolean that indicates whether redundant feature identification was run while calculating this Feature Impact.

  • rowCount - An integer or None that indicates the number of rows used to calculate Feature Impact. For Feature Impact calculated with the default logic, without specifying rowCount, None is returned here.

  • count - An integer with the number of features under featureImpacts.

Raises:
ClientError (404)

If the feature impacts have not been computed.
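
Examples

A minimal sketch of reading back a previously computed insight (placeholder ids; assumes request_feature_impact has already completed for this model).

import datarobot

model = datarobot.Model.get('project-id', 'model-id')

# with_metadata=True returns a dict; the per-feature rows live under 'featureImpacts'
fi = model.get_feature_impact(with_metadata=True)
for item in fi['featureImpacts']:
    print(item['featureName'], item['impactNormalized'], item['redundantWith'])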

request_feature_impact(row_count=None, with_metadata=False, backtest=None, data_slice_filter=<datarobot.models.model.Sentinel object>)

Request feature impacts to be computed for the model.

See get_feature_impact for more information on the result of the job.

Parameters:
row_countint

The sample size (specified in rows) to use for Feature Impact computation. This is not supported for unsupervised, multiclass (which has a separate method), or time series projects.

with_metadatabool

The flag indicating if the result should include the metadata as well.

backtestint or string

The index of the backtest, or the string ‘holdout’ for the holdout. This is supported only for DatetimeModels.

data_slice_filterDataSlice, optional

(New in version v3.4) A data slice used to filter the return values based on the dataslice.id. By default, this function will use data_slice_filter.id == None which returns an unsliced insight. If data_slice_filter is None then request_feature_impact will raise a ValueError.

Returns:
jobJob

A Job representing the feature impact computation. To get the completed feature impact data, use job.get_result or job.get_result_when_complete.

Raises:
JobAlreadyRequested (422)

If the feature impacts have already been requested.

get_or_request_feature_impact(max_wait=600, row_count=None, with_metadata=False, backtest=None, data_slice_filter=<datarobot.models.model.Sentinel object>)

Retrieve feature impact for the model, requesting a job if it hasn’t been run previously

Parameters:
max_waitint, optional

The maximum time to wait for a requested feature impact job to complete before erroring

row_countint

The sample size (specified in rows) to use for Feature Impact computation. This is not supported for unsupervised, multiclass (which has a separate method), or time series projects.

with_metadatabool

The flag indicating if the result should include the metadata as well.

backteststr

Feature Impact backtest. Can be ‘holdout’ or numbers from 0 up to max number of backtests in project.

data_slice_filterDataSlice, optional

(New in version v3.4) A data slice used to filter the return values based on the dataslice.id. By default, this function will use data_slice_filter.id == None which returns an unsliced insight. If data_slice_filter is None then get_or_request_feature_impact will raise a ValueError.

Returns:
feature_impactslist or dict

The feature impact data. See get_feature_impact for the exact schema.
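
Examples

A sketch of the convenience wrapper, which only submits a job when the insight does not already exist (placeholder ids; max_wait is in seconds).

import datarobot

model = datarobot.Model.get('project-id', 'model-id')

feature_impacts = model.get_or_request_feature_impact(max_wait=600)
top_feature = max(feature_impacts, key=lambda fi: fi['impactNormalized'])
print(top_feature['featureName'])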

request_lift_chart(source=None, backtest_index=None, data_slice_filter=<datarobot.models.model.Sentinel object>)

(New in version v3.4) Request the model Lift Chart for the specified backtest data slice.

Parameters:
sourcestr

(Deprecated in version v3.4) Lift chart data source. Check datarobot.enums.CHART_DATA_SOURCE for possible values. If backtest_index is present then this will be ignored.

backtest_indexstr

Lift chart data backtest. Can be ‘holdout’ or numbers from 0 up to max number of backtests in project.

data_slice_filterDataSlice, optional

A data slice used to filter the return values based on the dataslice.id. By default this function will use data_slice_filter.id == None which returns an unsliced insight. If data_slice_filter is None then request_lift_chart will raise a ValueError.

Returns:
status_check_jobStatusCheckJob

The returned object contains all needed logic for periodic status checks of an async job.

Return type:

StatusCheckJob

get_lift_chart(source=None, backtest_index=None, fallback_to_parent_insights=False, data_slice_filter=<datarobot.models.model.Sentinel object>)

(New in version v3.4) Retrieve the model Lift chart for the specified backtest and data slice.

Parameters:
sourcestr

(Deprecated in version v3.4) Lift chart data source. Check datarobot.enums.CHART_DATA_SOURCE for possible values. For time series and OTV models, also accepts values backtest_2, backtest_3, …, up to the number of backtests in the model. If backtest_index is present then this will be ignored.

backtest_indexstr

Lift chart data backtest. Can be ‘holdout’ or numbers from 0 up to max number of backtests in project.

fallback_to_parent_insightsbool

Optional, if True, this will return lift chart data for this model’s parent if the lift chart is not available for this model and the model has a defined parent model. If omitted or False, or there is no parent model, will not attempt to return insight data from this model’s parent.

data_slice_filterDataSlice, optional

A data slice used to filter the return values based on the dataslice.id. By default this function will use data_slice_filter.id == None which returns an unsliced insight. If data_slice_filter is None then get_lift_chart will raise a ValueError.

Returns:
LiftChart

Model lift chart data

Raises:
ClientError

If the insight is not available for this model

ValueError

If data_slice_filter is passed as None
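
Examples

A sketch of requesting and then retrieving a Lift Chart for the holdout backtest (placeholder ids; assumes a datetime partitioned project and that the asynchronous computation has finished before the chart is retrieved).

import datarobot

model = datarobot.Model.get('project-id', 'model-id')

# Kick off computation; wait on the returned StatusCheckJob before retrieving
status_check_job = model.request_lift_chart(backtest_index='holdout')

lift_chart = model.get_lift_chart(backtest_index='holdout')
print(len(lift_chart.bins))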

request_roc_curve(source=None, backtest_index=None, data_slice_filter=<datarobot.models.model.Sentinel object>)

(New in version v3.4) Request the binary model Roc Curve for the specified backtest and data slice.

Parameters:
sourcestr

(Deprecated in version v3.4) Roc Curve data source. Check datarobot.enums.CHART_DATA_SOURCE for possible values. If backtest_index is present then this will be ignored.

backtest_indexstr

ROC curve data backtest. Can be ‘holdout’ or numbers from 0 up to max number of backtests in project.

data_slice_filterDataSlice, optional

A data slice used to filter the return values based on the dataslice.id. By default this function will use data_slice_filter.id == None which returns an unsliced insight. If data_slice_filter is None then request_roc_curve will raise a ValueError.

Returns:
status_check_jobStatusCheckJob

The returned object contains all needed logic for periodic status checks of an async job.

Return type:

StatusCheckJob

get_roc_curve(source=None, backtest_index=None, fallback_to_parent_insights=False, data_slice_filter=<datarobot.models.model.Sentinel object>)

(New in version v3.4) Retrieve the ROC curve for a binary model for the specified backtest and data slice.

Parameters:
sourcestr

(Deprecated in version v3.4) ROC curve data source. Check datarobot.enums.CHART_DATA_SOURCE for possible values. For time series and OTV models, also accepts values backtest_2, backtest_3, …, up to the number of backtests in the model. If backtest_index is present then this will be ignored.

backtest_indexstr

ROC curve data backtest. Can be ‘holdout’ or numbers from 0 up to max number of backtests in project.

fallback_to_parent_insightsbool

Optional, if True, this will return ROC curve data for this model’s parent if the ROC curve is not available for this model and the model has a defined parent model. If omitted or False, or there is no parent model, will not attempt to return data from this model’s parent.

data_slice_filterDataSlice, optional

A data slice used to filter the return values based on the dataslice.id. By default, this function will use data_slice_filter.id == None which returns an unsliced insight. If data_slice_filter is None then get_roc_curve will raise a ValueError.

Returns:
RocCurve

Model ROC curve data

Raises:
ClientError

If the insight is not available for this model

TypeError

If the underlying project type is multilabel

ValueError

If data_slice_filter is passed as None
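
Examples

A sketch of retrieving the validation ROC curve for a binary, non-datetime model (placeholder ids; uses the unsliced insight by default).

import datarobot

model = datarobot.Model.get('project-id', 'model-id')

roc = model.get_roc_curve(source='validation')

# roc_points holds one dict of threshold statistics per candidate threshold
print(len(roc.roc_points))
print(roc.roc_points[0])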

advanced_tune(params, description=None)

Generate a new model with the specified advanced-tuning parameters

As of v2.17, all models other than blenders, open source, prime, baseline and user-created support Advanced Tuning.

Parameters:
paramsdict

Mapping of parameter ID to parameter value. The list of valid parameter IDs for a model can be found by calling get_advanced_tuning_parameters(). This endpoint does not need to include values for all parameters. If a parameter is omitted, its current_value will be used.

descriptionstr

Human-readable string describing the newly advanced-tuned model

Returns:
ModelJob

The created job to build the model

Return type:

ModelJob

delete()

Delete a model from the project’s leaderboard.

Return type:

None

download_scoring_code(file_name, source_code=False)

Download the Scoring Code JAR.

Parameters:
file_namestr

File path where scoring code will be saved.

source_codebool, optional

Set to True to download source code archive. It will not be executable.

Return type:

None
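
Examples

A sketch of exporting Scoring Code in both forms (placeholder ids; assumes the model supports code generation).

import datarobot

model = datarobot.Model.get('project-id', 'model-id')

# Executable JAR for scoring outside of DataRobot
model.download_scoring_code('/tmp/model_scoring_code.jar')

# Non-executable source archive of the same Scoring Code
model.download_scoring_code('/tmp/model_source.jar', source_code=True)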

download_training_artifact(file_name)

Retrieve trained artifact(s) from a model containing one or more custom tasks.

Artifact(s) will be downloaded to the specified local filepath.

Parameters:
file_namestr

File path where trained model artifact(s) will be saved.

classmethod from_data(data)

Instantiate an object of this class using a dict.

Parameters:
datadict

Correctly snake_cased keys and their values.

Return type:

TypeVar(T, bound= APIObject)

get_advanced_tuning_parameters()

Get the advanced-tuning parameters available for this model.

As of v2.17, all models other than blenders, open source, prime, baseline and user-created support Advanced Tuning.

Returns:
dict

A dictionary describing the advanced-tuning parameters for the current model. There are two top-level keys, tuning_description and tuning_parameters.

tuning_description is an optional value. If not None, it indicates the user-specified description of this set of tuning parameters.

tuning_parameters is a list of dicts, each of which has the following keys:

  • parameter_name : (str) name of the parameter (unique per task, see below)

  • parameter_id : (str) opaque ID string uniquely identifying parameter

  • default_value : (*) the actual value used to train the model; either the single value of the parameter specified before training, or the best value from the list of grid-searched values (based on current_value)

  • current_value : (*) the single value or list of values of the parameter that were grid searched. Depending on the grid search specification, could be a single fixed value (no grid search), a list of discrete values, or a range.

  • task_name : (str) name of the task that this parameter belongs to

  • constraints: (dict) see the notes below

  • vertex_id: (str) ID of vertex that this parameter belongs to

Return type:

AdvancedTuningParamsType

Notes

The type of default_value and current_value is defined by the constraints structure. It will be a string or numeric Python type.

constraints is a dict with at least one, possibly more, of the following keys. The presence of a key indicates that the parameter may take on the specified type. (If a key is absent, this means that the parameter may not take on the specified type.) If a key on constraints is present, its value will be a dict containing all of the fields described below for that key.

"constraints": {
    "select": {
        "values": [<list(basestring or number) : possible values>]
    },
    "ascii": {},
    "unicode": {},
    "int": {
        "min": <int : minimum valid value>,
        "max": <int : maximum valid value>,
        "supports_grid_search": <bool : True if Grid Search may be
                                        requested for this param>
    },
    "float": {
        "min": <float : minimum valid value>,
        "max": <float : maximum valid value>,
        "supports_grid_search": <bool : True if Grid Search may be
                                        requested for this param>
    },
    "intList": {
        "min_length": <int : minimum valid length>,
        "max_length": <int : maximum valid length>
        "min_val": <int : minimum valid value>,
        "max_val": <int : maximum valid value>
        "supports_grid_search": <bool : True if Grid Search may be
                                        requested for this param>
    },
    "floatList": {
        "min_length": <int : minimum valid length>,
        "max_length": <int : maximum valid length>
        "min_val": <float : minimum valid value>,
        "max_val": <float : maximum valid value>
        "supports_grid_search": <bool : True if Grid Search may be
                                        requested for this param>
    }
}

The keys have meaning as follows:

  • select: Rather than specifying a specific data type, if present, it indicates that the parameter is permitted to take on any of the specified values. Listed values may be of any string or real (non-complex) numeric type.

  • ascii: The parameter may be a unicode object that encodes simple ASCII characters. (A-Z, a-z, 0-9, whitespace, and certain common symbols.) In addition to listed constraints, ASCII keys currently may not contain either newlines or semicolons.

  • unicode: The parameter may be any Python unicode object.

  • int: The value may be an object of type int within the specified range (inclusive). Please note that the value will be passed around using the JSON format, and some JSON parsers have undefined behavior with integers outside of the range [-(2**53)+1, (2**53)-1].

  • float: The value may be an object of type float within the specified range (inclusive).

  • intList, floatList: The value may be a list of int or float objects, respectively, following constraints as specified respectively by the int and float types (above).

Many parameters only specify one key under constraints. If a parameter specifies multiple keys, the parameter may take on any value permitted by any key.
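
Examples

A sketch tying get_advanced_tuning_parameters to advanced_tune (placeholder ids; the parameter name used below is hypothetical, since the available parameters depend on the blueprint).

import datarobot

model = datarobot.Model.get('project-id', 'model-id')

params = model.get_advanced_tuning_parameters()
by_name = {p['parameter_name']: p for p in params['tuning_parameters']}

# 'learning_rate' is a hypothetical parameter name; omitted parameters keep their current_value
chosen = by_name['learning_rate']
model_job = model.advanced_tune(
    {chosen['parameter_id']: 0.05},
    description='Lower learning rate',
)
tuned_model = model_job.get_result_when_complete()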

get_all_confusion_charts(fallback_to_parent_insights=False)

Retrieve a list of all confusion matrices available for the model.

Parameters:
fallback_to_parent_insightsbool

(New in version v2.14) Optional, if True, this will return confusion chart data for this model’s parent for any source that is not available for this model and if this has a defined parent model. If omitted or False, or this model has no parent, this will not attempt to retrieve any data from this model’s parent.

Returns:
list of ConfusionChart

Data for all available confusion charts for the model.

get_all_feature_impacts(data_slice_filter=None)

Retrieve a list of all feature impact results available for the model.

Parameters:
data_slice_filterDataSlice, optional

A dataslice used to filter the return values based on the dataslice.id. By default, this function will use data_slice_filter.id == None which returns an unsliced insight. If data_slice_filter is None then no data_slice filtering will be applied when requesting the feature impacts.

Returns:
list of dicts

Data for all available model feature impacts, or an empty list if no data is found.

Examples

model = datarobot.Model(id='model-id', project_id='project-id')

# Get feature impact insights for sliced data
data_slice = datarobot.DataSlice(id='data-slice-id')
sliced_fi = model.get_all_feature_impacts(data_slice_filter=data_slice)

# Get feature impact insights for unsliced data
data_slice = datarobot.DataSlice()
unsliced_fi = model.get_all_feature_impacts(data_slice_filter=data_slice)

# Get all feature impact insights
all_fi = model.get_all_feature_impacts()
get_all_lift_charts(fallback_to_parent_insights=False, data_slice_filter=None)

Retrieve a list of all Lift charts available for the model.

Parameters:
fallback_to_parent_insightsbool, optional

(New in version v2.14) Optional, if True, this will return lift chart data for this model’s parent for any source that is not available for this model and if this model has a defined parent model. If omitted or False, or this model has no parent, this will not attempt to retrieve any data from this model’s parent.

data_slice_filterDataSlice, optional

Filters the returned lift chart by data_slice_filter.id. If None (the default) applies no filter based on data_slice_id.

Returns:
list of LiftChart

Data for all available model lift charts, or an empty list if no data is found.

Examples

model = datarobot.Model.get('project-id', 'model-id')

# Get lift chart insights for sliced data
sliced_lift_charts = model.get_all_lift_charts(
    data_slice_filter=datarobot.DataSlice(id='data-slice-id')
)

# Get lift chart insights for unsliced data
unsliced_lift_charts = model.get_all_lift_charts(
    data_slice_filter=datarobot.DataSlice(id=None)
)

# Get all lift chart insights
all_lift_charts = model.get_all_lift_charts()
get_all_multiclass_lift_charts(fallback_to_parent_insights=False)

Retrieve a list of all Lift charts available for the model.

Parameters:
fallback_to_parent_insightsbool

(New in version v2.14) Optional, if True, this will return lift chart data for this model’s parent for any source that is not available for this model and if this model has a defined parent model. If omitted or False, or this model has no parent, this will not attempt to retrieve any data from this model’s parent.

Returns:
list of LiftChart

Data for all available model lift charts.

get_all_residuals_charts(fallback_to_parent_insights=False, data_slice_filter=None)

Retrieve a list of all residuals charts available for the model.

Parameters:
fallback_to_parent_insightsbool

Optional, if True, this will return residuals chart data for this model’s parent for any source that is not available for this model and if this model has a defined parent model. If omitted or False, or this model has no parent, this will not attempt to retrieve any data from this model’s parent.

data_slice_filterDataSlice, optional

Filters the returned residuals charts by data_slice_filter.id. If None (the default) applies no filter based on data_slice_id.

Returns:
list of ResidualsChart

Data for all available model residuals charts.

Examples

model = datarobot.Model.get('project-id', 'model-id')

# Get residuals chart insights for sliced data
sliced_residuals_charts = model.get_all_residuals_charts(
    data_slice_filter=datarobot.DataSlice(id='data-slice-id')
)

# Get residuals chart insights for unsliced data
unsliced_residuals_charts = model.get_all_residuals_charts(
    data_slice_filter=datarobot.DataSlice(id=None)
)

# Get all residuals chart insights
all_residuals_charts = model.get_all_residuals_charts()
get_all_roc_curves(fallback_to_parent_insights=False, data_slice_filter=None)

Retrieve a list of all ROC curves available for the model.

Parameters:
fallback_to_parent_insightsbool

(New in version v2.14) Optional, if True, this will return ROC curve data for this model’s parent for any source that is not available for this model and if this model has a defined parent model. If omitted or False, or this model has no parent, this will not attempt to retrieve any data from this model’s parent.

data_slice_filterDataSlice, optional

Filters the returned roc_curve by data_slice_filter.id. If None (the default) applies no filter based on data_slice_id.

Returns:
list of RocCurve

Data for all available model ROC curves. Or an empty list if no RocCurves are found.

Examples

model = datarobot.Model.get('project-id', 'model-id')
ds_filter = datarobot.DataSlice(id='data-slice-id')

# Get roc curve insights for sliced data
sliced_roc = model.get_all_roc_curves(data_slice_filter=ds_filter)

# Get roc curve insights for unsliced data
ds_filter = datarobot.DataSlice(id=None)
unsliced_roc = model.get_all_roc_curves(data_slice_filter=ds_filter)

# Get all roc curve insights
all_roc_curves = model.get_all_roc_curves()
get_confusion_chart(source, fallback_to_parent_insights=False)

Retrieve a multiclass model’s confusion matrix for the specified source.

Parameters:
sourcestr

Confusion chart source. Check datarobot.enums.CHART_DATA_SOURCE for possible values.

fallback_to_parent_insightsbool

(New in version v2.14) Optional, if True, this will return confusion chart data for this model’s parent if the confusion chart is not available for this model and the model has a defined parent model. If omitted or False, or there is no parent model, this will not attempt to return insight data from this model’s parent.

Returns:
ConfusionChart

Model ConfusionChart data

Raises:
ClientError

If the insight is not available for this model

get_cross_class_accuracy_scores()

Retrieves a list of Cross Class Accuracy scores for the model.

Returns:
json
get_data_disparity_insights(feature, class_name1, class_name2)

Retrieve a list of Cross Class Data Disparity insights for the model.

Parameters:
featurestr

Bias and Fairness protected feature name.

class_name1str

One of the compared classes

class_name2str

Another compared class

Returns:
json
get_fairness_insights(fairness_metrics_set=None, offset=0, limit=100)

Retrieve a list of Per Class Bias insights for the model.

Parameters:
fairness_metrics_setstr, optional

Can be one of <datarobot.enums.FairnessMetricsSet>. The fairness metric used to calculate the fairness scores.

offsetint, optional

Number of items to skip.

limitint, optional

Number of items to return.

Returns:
json
get_features_used()

Query the server to determine which features were used.

Note that the data returned by this method is possibly different than the names of the features in the featurelist used by this model. This method will return the raw features that must be supplied in order for predictions to be generated on a new set of data. The featurelist, in contrast, would also include the names of derived features.

Returns:
featureslist of str

The names of the features used in the model.

Return type:

List[str]

get_frozen_child_models()

Retrieve all models that are frozen from this model.

Returns:
A list of Models
get_labelwise_roc_curves(source, fallback_to_parent_insights=False)

Retrieve a list of LabelwiseRocCurve instances for a multilabel model for the given source and all labels. This method is valid only for multilabel projects. For binary projects, use the Model.get_roc_curve API.

Added in version v2.24.

Parameters:
sourcestr

ROC curve data source. Check datarobot.enums.CHART_DATA_SOURCE for possible values.

fallback_to_parent_insightsbool

Optional, if True, this will return ROC curve data for this model’s parent if the ROC curve is not available for this model and the model has a defined parent model. If omitted or False, or there is no parent model, will not attempt to return data from this model’s parent.

Returns:
list of LabelwiseRocCurve

Labelwise ROC Curve instances for source and all labels

Raises:
ClientError

If the insight is not available for this model

get_missing_report_info()

Retrieve a report on missing training data that can be used to understand missing values treatment in the model. The report consists of missing-value resolutions for the numeric or categorical features that were part of building the model.

Returns:
An iterable of MissingReportPerFeature

The queried model missing report, sorted by missing count (DESCENDING order).

get_model_blueprint_chart()

Retrieve a diagram that can be used to understand data flow in the blueprint.

Returns:
ModelBlueprintChart

The queried model blueprint chart.

get_model_blueprint_documents()

Get documentation for tasks used in this model.

Returns:
list of BlueprintTaskDocument

All documents available for the model.

get_model_blueprint_json()

Get the blueprint json representation used by this model.

Returns:
BlueprintJson

Json representation of the blueprint stages.

Return type:

Dict[str, Tuple[List[str], List[str], str]]

get_multiclass_feature_impact()

For multiclass it’s possible to calculate feature impact separately for each target class. The method for calculation is exactly the same, calculated in one-vs-all style for each target class.

Requires that Feature Impact has already been computed with request_feature_impact.

Returns:
feature_impactslist of dict

The feature impact data. Each item is a dict with the keys ‘featureImpacts’ (list), ‘class’ (str). Each item in ‘featureImpacts’ is a dict with the keys ‘featureName’, ‘impactNormalized’, and ‘impactUnnormalized’, and ‘redundantWith’.

Raises:
ClientError (404)

If the multiclass feature impacts have not been computed.

get_multiclass_lift_chart(source, fallback_to_parent_insights=False)

Retrieve model Lift chart for the specified source.

Parameters:
sourcestr

Lift chart data source. Check datarobot.enums.CHART_DATA_SOURCE for possible values.

fallback_to_parent_insightsbool

Optional, if True, this will return lift chart data for this model’s parent if the lift chart is not available for this model and the model has a defined parent model. If omitted or False, or there is no parent model, will not attempt to return insight data from this model’s parent.

Returns:
list of LiftChart

Model lift chart data for each saved target class

Raises:
ClientError

If the insight is not available for this model

get_multilabel_lift_charts(source, fallback_to_parent_insights=False)

Retrieve model Lift charts for the specified source.

Added in version v2.24.

Parameters:
sourcestr

Lift chart data source. Check datarobot.enums.CHART_DATA_SOURCE for possible values.

fallback_to_parent_insightsbool

Optional, if True, this will return lift chart data for this model’s parent if the lift chart is not available for this model and the model has a defined parent model. If omitted or False, or there is no parent model, will not attempt to return insight data from this model’s parent.

Returns:
list of LiftChart

Model lift chart data for each saved target class

Raises:
ClientError

If the insight is not available for this model

get_num_iterations_trained()

Retrieves the number of estimators trained by early-stopping tree-based models.

Added in version v2.22.

Returns:
projectId: str

id of project containing the model

modelId: str

id of the model

data: array

list of numEstimatorsItem objects, one for each modeling stage.

numEstimatorsItem will be of the form:
stage: str

indicates the modeling stage (for multi-stage models); None for single-stage models

numIterations: int

the number of estimators or iterations trained by the model

get_parameters()

Retrieve model parameters.

Returns:
ModelParameters

Model parameters for this model.

get_pareto_front()

Retrieve the Pareto Front for a Eureqa model.

This method is only supported for Eureqa models.

Returns:
ParetoFront

Model ParetoFront data

get_prime_eligibility()

Check if this model can be approximated with DataRobot Prime

Returns:
prime_eligibilitydict

a dict indicating whether a model can be approximated with DataRobot Prime (key can_make_prime) and why it may be ineligible (key message)

get_residuals_chart(source, fallback_to_parent_insights=False, data_slice_filter=<datarobot.models.model.Sentinel object>)

Retrieve model residuals chart for the specified source.

Parameters:
sourcestr

Residuals chart data source. Check datarobot.enums.CHART_DATA_SOURCE for possible values.

fallback_to_parent_insightsbool

Optional, if True, this will return residuals chart data for this model’s parent if the residuals chart is not available for this model and the model has a defined parent model. If omitted or False, or there is no parent model, will not attempt to return residuals data from this model’s parent.

data_slice_filterDataSlice, optional

A dataslice used to filter the return values based on the dataslice.id. By default this function will use data_slice_filter.id == None which returns an unsliced insight. If data_slice_filter is None then get_residuals_chart will raise a ValueError.

Returns:
ResidualsChart

Model residuals chart data

Raises:
ClientError

If the insight is not available for this model

ValueError

If data_slice_filter is passed as None

get_rulesets()

List the rulesets approximating this model generated by DataRobot Prime

If this model hasn’t been approximated yet, will return an empty list. Note that these are rulesets approximating this model, not rulesets used to construct this model.

Returns:
rulesetslist of Ruleset
Return type:

List[Ruleset]

get_supported_capabilities()

Retrieves a summary of the capabilities supported by a model.

Added in version v2.14.

Returns:
supportsBlending: bool

whether the model supports blending

supportsMonotonicConstraints: bool

whether the model supports monotonic constraints

hasWordCloud: bool

whether the model has word cloud data available

eligibleForPrime: bool

(Deprecated in version v3.6) whether the model is eligible for Prime

hasParameters: bool

whether the model has parameters that can be retrieved

supportsCodeGeneration: bool

(New in version v2.18) whether the model supports code generation

supportsShap: bool

(New in version v2.18) True if the model supports the Shapley package, i.e. Shapley-based feature importance.

supportsEarlyStopping: bool

(New in version v2.22) True if this is an early stopping tree-based model and number of trained iterations can be retrieved.

get_uri()
Returns:
urlstr

Permanent static hyperlink to this model in the leaderboard.

Return type:

str

get_word_cloud(exclude_stop_words=False)

Retrieve word cloud data for the model.

Parameters:
exclude_stop_wordsbool, optional

Set to True if you want stopwords filtered out of the response.

Returns:
WordCloud

Word cloud data for the model.

incremental_train(data_stage_id, training_data_name=None)

Submit a job to the queue to perform incremental training on an existing model. See train_incremental documentation.

Return type:

ModelJob

classmethod list(project_id, sort_by_partition='validation', sort_by_metric=None, with_metric=None, search_term=None, featurelists=None, families=None, blueprints=None, labels=None, characteristics=None, training_filters=None, number_of_clusters=None, limit=100, offset=0)

Retrieve paginated model records, sorted by scores, with optional filtering.

Parameters:
sort_by_partition: str, one of `validation`, `backtesting`, `crossValidation` or `holdout`

Set the partition to use for sorted (by score) list of models. validation is the default.

sort_by_metric: str

Set the project metric to use for model sorting. DataRobot-selected project optimization metric is the default.

with_metric: str

For a single-metric list of results, specify that project metric.

search_term: str

If specified, only models containing the term in their name or processes are returned.

featurelists: list of str

If specified, only models trained on selected featurelists are returned.

families: list of str

If specified, only models belonging to selected families are returned.

blueprints: list of str

If specified, only models trained on specified blueprint IDs are returned.

labels: list of str, `starred` or `prepared for deployment`

If specified, only models tagged with all listed labels are returned.

characteristics: list of str

If specified, only models matching all listed characteristics are returned.

training_filters: list of str

If specified, only models matching at least one of the listed training conditions are returned. The following formats are supported:

For AutoML and datetime partitioned projects:

  • number of rows in training subset

For datetime partitioned projects:

  • <training duration>, example P6Y0M0D

  • <training_duration>-<time_window_sample_percent>-<sampling_method>, example P6Y0M0D-78-Random (returns models trained on 6 years of data, sampling rate 78%, random sampling)

  • Start/end date

  • Project settings

number_of_clusters: list of int

Filter models by number of clusters. Applicable only in unsupervised clustering projects.

limit: int
offset: int
Returns:
generic_models: list of GenericModel
Return type:

List[GenericModel]
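
Examples

A filtering sketch (placeholder project id; the sort metric must be one of the project’s metrics, ‘AUC’ here being an assumption).

import datarobot

# Starred models containing 'Gradient' in their name or processes, sorted by holdout AUC
records = datarobot.Model.list(
    'project-id',
    sort_by_partition='holdout',
    sort_by_metric='AUC',
    search_term='Gradient',
    labels=['starred'],
    limit=20,
)
for record in records:
    print(record.model_type, record.metrics.get('AUC'))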

open_in_browser()

Opens the class’ relevant web browser location. If a default browser is not available, the URL is logged.

Note: If text-mode browsers are used, the calling process will block until the user exits the browser.

Return type:

None

request_approximation()

Request an approximation of this model using DataRobot Prime

This will create several rulesets that could be used to approximate this model. After comparing their scores and rule counts, the code used in the approximation can be downloaded and run locally.

Returns:
jobJob

the job generating the rulesets

request_cross_class_accuracy_scores()

Request Cross Class Accuracy scores to be computed for the model.

Returns:
status_idstr

A statusId of the computation request.

request_data_disparity_insights(feature, compared_class_names)

Request data disparity insights to be computed for the model.

Parameters:
featurestr

Bias and Fairness protected feature name.

compared_class_nameslist(str)

List of two classes to compare

Returns:
status_idstr

A statusId of the computation request.

request_external_test(dataset_id, actual_value_column=None)

Request external test to compute scores and insights on an external test dataset

Parameters:
dataset_idstring

The dataset to make predictions against (as uploaded from Project.upload_dataset)

actual_value_columnstring, optional

(New in version v2.21) For time series unsupervised projects only. Actual value column can be used to calculate the classification metrics and insights on the prediction dataset. Can’t be provided with the forecast_point parameter.

Returns:
jobJob

a Job representing external dataset insights computation

request_fairness_insights(fairness_metrics_set=None)

Request fairness insights to be computed for the model.

Parameters:
fairness_metrics_setstr, optional

Can be one of <datarobot.enums.FairnessMetricsSet>. The fairness metric used to calculate the fairness scores.

Returns:
status_idstr

A statusId of the computation request.

Return type:

str

request_frozen_datetime_model(training_row_count=None, training_duration=None, training_start_date=None, training_end_date=None, time_window_sample_pct=None, sampling_method=None)

Train a new frozen model with parameters from this model.

Requires that this model belongs to a datetime partitioned project. If it does not, an error will occur when submitting the job.

Frozen models use the same tuning parameters as their parent model instead of independently optimizing them to allow efficiently retraining models on larger amounts of the training data.

In addition to training_row_count and training_duration, frozen datetime models may be trained on an exact date range. Only one of training_row_count, training_duration, or training_start_date and training_end_date should be specified.

Models specified using training_start_date and training_end_date are the only ones that can be trained into the holdout data (once the holdout is unlocked).

All durations should be specified with a duration string such as those returned by the partitioning_methods.construct_duration_string helper method. Please see datetime partitioned project documentation for more information on duration strings.

Parameters:
training_row_countint, optional

the number of rows of data that should be used to train the model. If specified, training_duration may not be specified.

training_durationstr, optional

a duration string specifying what time range the data used to train the model should span. If specified, training_row_count may not be specified.

training_start_datedatetime.datetime, optional

the start date of the data to train the model on. Only rows occurring at or after this datetime will be used. If training_start_date is specified, training_end_date must also be specified.

training_end_datedatetime.datetime, optional

the end date of the data to train the model on. Only rows occurring strictly before this datetime will be used. If training_end_date is specified, training_start_date must also be specified.

time_window_sample_pctint, optional

may only be specified when the requested model is a time window (e.g. duration or start and end dates). An integer between 1 and 99 indicating the percentage to sample by within the window. The points kept are determined by a random uniform sample. If specified, training_duration must also be specified; otherwise an error will occur.

sampling_methodstr, optional

(New in version v2.23) defines the way training data is selected. Can be either random or latest. In combination with training_row_count defines how rows are selected from backtest (latest by default). When training data is defined using time range (training_duration or use_project_settings) this setting changes the way time_window_sample_pct is applied (random by default). Applicable to OTV projects only.

Returns:
model_jobModelJob

the modeling job training a frozen model

Return type:

ModelJob
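
Examples

A sketch of freezing this model onto a longer training window (placeholder ids; the duration string follows the documented P<years>Y<months>M<days>D format).

import datarobot

model = datarobot.Model.get('project-id', 'model-id')

# Reuse the parent's tuning parameters on two years of data,
# randomly sampling 50% of the rows within that window
model_job = model.request_frozen_datetime_model(
    training_duration='P2Y0M0D',
    time_window_sample_pct=50,
    sampling_method='random',
)
frozen_model = model_job.get_result_when_complete()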

request_per_class_fairness_insights(fairness_metrics_set=None)

Request per-class fairness insights to be computed for the model.

Parameters:
fairness_metrics_setstr, optional

The fairness metric used to calculate the fairness scores. Value can be any one of <datarobot.enums.FairnessMetricsSet>.

Returns:
status_check_jobStatusCheckJob

The returned object contains all needed logic for periodic status checks of an async job.

Return type:

StatusCheckJob

request_predictions(dataset_id=None, dataset=None, dataframe=None, file_path=None, file=None, include_prediction_intervals=None, prediction_intervals_size=None, forecast_point=None, predictions_start_date=None, predictions_end_date=None, actual_value_column=None, explanation_algorithm=None, max_explanations=None, max_ngram_explanations=None)

Requests predictions against a previously uploaded dataset.

Parameters:
dataset_idstring, optional

The ID of the dataset to make predictions against (as uploaded from Project.upload_dataset)

datasetDataset, optional

The dataset to make predictions against (as uploaded from Project.upload_dataset)

dataframepd.DataFrame, optional

(New in v3.0) The dataframe to make predictions against

file_pathstr, optional

(New in v3.0) Path to file to make predictions against

fileIOBase, optional

(New in v3.0) File to make predictions against

include_prediction_intervalsbool, optional

(New in v2.16) For time series projects only. Specifies whether prediction intervals should be calculated for this request. Defaults to True if prediction_intervals_size is specified, otherwise defaults to False.

prediction_intervals_sizeint, optional

(New in v2.16) For time series projects only. Represents the percentile to use for the size of the prediction intervals. Defaults to 80 if include_prediction_intervals is True. Prediction intervals size must be between 1 and 100 (inclusive).

forecast_pointdatetime.datetime or None, optional

(New in version v2.20) For time series projects only. This is the default point relative to which predictions will be generated, based on the forecast window of the project. See the time series prediction documentation for more information.

predictions_start_datedatetime.datetime or None, optional

(New in version v2.20) For time series projects only. The start date for bulk predictions. Note that this parameter is for generating historical predictions using the training data. This parameter should be provided in conjunction with predictions_end_date. Can’t be provided with the forecast_point parameter.

predictions_end_datedatetime.datetime or None, optional

(New in version v2.20) For time series projects only. The end date for bulk predictions, exclusive. Note that this parameter is for generating historical predictions using the training data. This parameter should be provided in conjunction with predictions_start_date. Can’t be provided with the forecast_point parameter.

actual_value_columnstring, optional

(New in version v2.21) For time series unsupervised projects only. Actual value column can be used to calculate the classification metrics and insights on the prediction dataset. Can’t be provided with the forecast_point parameter.

explanation_algorithmstr, optional

(New in version v2.21) If set to ‘shap’, the response will include prediction explanations based on the SHAP explainer (SHapley Additive exPlanations). Defaults to null (no prediction explanations).

max_explanationsint, optional

(New in version v2.21) Specifies the maximum number of explanation values that should be returned for each row, ordered by absolute value, greatest to least. If null, no limit. In the case of ‘shap’: if the number of features is greater than the limit, the sum of remaining values will also be returned as shapRemainingTotal. Defaults to null. Cannot be set if explanation_algorithm is omitted.

max_ngram_explanationsint or str, optional

(New in version v2.29) Specifies the maximum number of text explanation values that should be returned. If set to all, text explanations will be computed and all the ngram explanations will be returned. If set to a positive integer, text explanations will be computed and that number of ngram explanations, sorted in descending order, will be returned. By default, text explanations are not computed.

Returns:
jobPredictJob

The job computing the predictions

Return type:

PredictJob
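
Examples

A sketch of scoring a locally stored file (placeholder ids and file path; assumes a non-time-series project, so no forecast point is required).

import datarobot

project = datarobot.Project.get('project-id')
model = datarobot.Model.get('project-id', 'model-id')

dataset = project.upload_dataset('./to_score.csv')
predict_job = model.request_predictions(dataset_id=dataset.id)

# Blocks until the asynchronous job finishes, then returns a DataFrame of predictions
predictions = predict_job.get_result_when_complete()
print(predictions.head())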

request_residuals_chart(source, data_slice_id=None)

Request the model residuals chart for the specified source.

Parameters:
sourcestr

Residuals chart data source. Check datarobot.enums.CHART_DATA_SOURCE for possible values.

data_slice_idstring, optional

ID for the data slice used in the request. If None, request unsliced insight data.

Returns:
status_check_jobStatusCheckJob

The returned object contains all needed logic for periodic status checks of an async job.

Return type:

StatusCheckJob

set_prediction_threshold(threshold)

Set a custom prediction threshold for the model.

May not be used once prediction_threshold_read_only is True for this model.

Parameters:
thresholdfloat

Only used for binary classification projects. The threshold used when deciding between the positive and negative classes when making predictions. Should be between 0.0 and 1.0 (inclusive).
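
Examples

A sketch for a binary classification model whose threshold is still editable (placeholder ids).

import datarobot

model = datarobot.Model.get('project-id', 'model-id')

# Setting the threshold fails once prediction_threshold_read_only is True
# (for example, after a deployment has been created)
if not model.prediction_threshold_read_only:
    model.set_prediction_threshold(0.42)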

star_model()

Mark the model as starred.

Model stars propagate to the web application and the API, and can be used to filter when listing models.

Return type:

None

start_advanced_tuning_session()

Start an Advanced Tuning session. Returns an object that helps set up arguments for an Advanced Tuning model execution.

As of v2.17, all models other than blenders, open source, prime, baseline and user-created support Advanced Tuning.

Returns:
AdvancedTuningSession

Session for setting up and running Advanced Tuning on a model

train_datetime(featurelist_id=None, training_row_count=None, training_duration=None, time_window_sample_pct=None, monotonic_increasing_featurelist_id=<object object>, monotonic_decreasing_featurelist_id=<object object>, use_project_settings=False, sampling_method=None, n_clusters=None)

Trains this model on a different featurelist or sample size.

Requires that this model is part of a datetime partitioned project; otherwise, an error will occur.

All durations should be specified with a duration string such as those returned by the partitioning_methods.construct_duration_string helper method. Please see datetime partitioned project documentation for more information on duration strings.

Parameters:
featurelist_idstr, optional

the featurelist to use to train the model. If not specified, the featurelist of this model is used.

training_row_countint, optional

the number of rows of data that should be used to train the model. If specified, neither training_duration nor use_project_settings may be specified.

training_durationstr, optional

a duration string specifying what time range the data used to train the model should span. If specified, neither training_row_count nor use_project_settings may be specified.

use_project_settingsbool, optional

(New in version v2.20) defaults to False. If True, indicates that the custom backtest partitioning settings specified by the user will be used to train the model and evaluate backtest scores. If specified, neither training_row_count nor training_duration may be specified.

time_window_sample_pctint, optional

may only be specified when the requested model is a time window (e.g. duration or start and end dates). An integer between 1 and 99 indicating the percentage to sample by within the window. The points kept are determined by a random uniform sample. If specified, training_duration must also be specified; otherwise an error will occur.

sampling_methodstr, optional

(New in version v2.23) defines the way training data is selected. Can be either random or latest. In combination with training_row_count defines how rows are selected from backtest (latest by default). When training data is defined using time range (training_duration or use_project_settings) this setting changes the way time_window_sample_pct is applied (random by default). Applicable to OTV projects only.

monotonic_increasing_featurelist_idstr, optional

(New in version v2.18) optional, the id of the featurelist that defines the set of features with a monotonically increasing relationship to the target. Passing None disables increasing monotonicity constraint. Default (dr.enums.MONOTONICITY_FEATURELIST_DEFAULT) is the one specified by the blueprint.

monotonic_decreasing_featurelist_idstr, optional

(New in version v2.18) optional, the id of the featurelist that defines the set of features with a monotonically decreasing relationship to the target. Passing None disables decreasing monotonicity constraint. Default (dr.enums.MONOTONICITY_FEATURELIST_DEFAULT) is the one specified by the blueprint.

n_clusters: int, optional

(New in version 2.27) number of clusters to use in an unsupervised clustering model. This parameter is used only for unsupervised clustering models that don’t automatically determine the number of clusters.

Returns:
jobModelJob

the created job to build the model

Return type:

ModelJob
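
Examples

A sketch of retraining this blueprint on a different featurelist and a six-month training window (placeholder ids; the duration string follows the documented format).

import datarobot

model = datarobot.Model.get('project-id', 'model-id')

model_job = model.train_datetime(
    featurelist_id='featurelist-id',
    training_duration='P0Y6M0D',
)
new_model = model_job.get_result_when_complete()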

train_incremental(data_stage_id, training_data_name=None, data_stage_encoding=None, data_stage_delimiter=None, data_stage_compression=None)

Submit a job to the queue to perform incremental training on an existing model using additional data. The id of the additional data to use for training is specified with data_stage_id. Optionally, a name for the iteration can be supplied by the user to help identify the contents of the data in the iteration.

This functionality requires the INCREMENTAL_LEARNING feature flag to be enabled.

Parameters:
data_stage_id: str

The id of the data stage to use for training.

training_data_namestr, optional

The name of the iteration or data stage to indicate what the incremental learning was performed on.

data_stage_encodingstr, optional

The encoding type of the data in the data stage (default: UTF-8). Supported formats: UTF-8, ASCII, WINDOWS1252

data_stage_delimiterstr, optional

The delimiter used by the data in the data stage (default: ‘,’).

data_stage_compressionstr, optional

The compression type of the data stage file, e.g. ‘zip’ (default: None). Supported formats: zip

Returns:
jobModelJob

The created job that is retraining the model
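
Examples

A sketch of one incremental-learning iteration (the data stage id is a placeholder; requires the INCREMENTAL_LEARNING feature flag).

import datarobot

model = datarobot.Model.get('project-id', 'model-id')

model_job = model.train_incremental(
    data_stage_id='data-stage-id',
    training_data_name='2024-Q2 top-up',
    data_stage_encoding='UTF-8',
)
updated_model = model_job.get_result_when_complete()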

unstar_model()

Unmark the model as starred.

Model stars propagate to the web application and the API, and can be used to filter when listing models.

Return type:

None

Frozen Model

class datarobot.models.FrozenModel(id=None, processes=None, featurelist_name=None, featurelist_id=None, project_id=None, sample_pct=None, model_type=None, model_category=None, is_frozen=None, is_n_clusters_dynamically_determined=None, blueprint_id=None, metrics=None, monotonic_increasing_featurelist_id=None, monotonic_decreasing_featurelist_id=None, n_clusters=None, has_empty_clusters=None, supports_monotonic_constraints=None, is_starred=None, prediction_threshold=None, prediction_threshold_read_only=None, model_number=None, parent_model_id=None, supports_composable_ml=None, training_row_count=None, training_duration=None, training_start_date=None, training_end_date=None, data_selection_method=None, time_window_sample_pct=None, sampling_method=None, model_family_full_name=None, is_trained_into_validation=None, is_trained_into_holdout=None)

Represents a model tuned with parameters which are derived from another model

All durations are specified with a duration string such as those returned by the partitioning_methods.construct_duration_string helper method. Please see datetime partitioned project documentation for more information on duration strings.

Attributes:
idstr

the id of the model

project_idstr

the id of the project the model belongs to

processeslist of str

the processes used by the model

featurelist_namestr

the name of the featurelist used by the model

featurelist_idstr

the id of the featurelist used by the model

sample_pctfloat

the percentage of the project dataset used in training the model

training_row_countint or None

the number of rows of the project dataset used in training the model. In a datetime partitioned project, if specified, defines the number of rows used to train the model and evaluate backtest scores; if unspecified, either training_duration or training_start_date and training_end_date was used to determine that instead.

training_durationstr or None

only present for models in datetime partitioned projects. If specified, a duration string specifying the duration spanned by the data used to train the model and evaluate backtest scores.

training_start_datedatetime or None

only present for frozen models in datetime partitioned projects. If specified, the start date of the data used to train the model.

training_end_datedatetime or None

only present for frozen models in datetime partitioned projects. If specified, the end date of the data used to train the model.

model_typestr

what model this is, e.g. ‘Nystroem Kernel SVM Regressor’

model_categorystr

what kind of model this is - ‘prime’ for DataRobot Prime models, ‘blend’ for blender models, and ‘model’ for other models

is_frozenbool

whether this model is a frozen model

parent_model_idstr

the id of the model that tuning parameters are derived from

blueprint_idstr

the id of the blueprint used in this model

metricsdict

a mapping from each metric to the model’s scores for that metric

monotonic_increasing_featurelist_idstr

optional, the id of the featurelist that defines the set of features with a monotonically increasing relationship to the target. If None, no such constraints are enforced.

monotonic_decreasing_featurelist_idstr

optional, the id of the featurelist that defines the set of features with a monotonically decreasing relationship to the target. If None, no such constraints are enforced.

supports_monotonic_constraintsbool

optional, whether this model supports enforcing monotonic constraints

is_starredbool

whether this model is marked as starred

prediction_thresholdfloat

for binary classification projects, the threshold used for predictions

prediction_threshold_read_onlybool

indicates whether modification of the prediction threshold is forbidden. Threshold modification is forbidden once a model has had a deployment created or predictions made via the dedicated prediction API.

model_numberinteger

model number assigned to a model

supports_composable_mlbool or None

(New in version v2.26) whether this model is supported in Composable ML.

classmethod get(project_id, model_id)

Retrieve a specific frozen model.

Parameters:
project_idstr

The project’s id.

model_idstr

The model_id of the leaderboard item to retrieve.

Returns:
modelFrozenModel

The queried instance.

RatingTableModel

class datarobot.models.RatingTableModel(id=None, processes=None, featurelist_name=None, featurelist_id=None, project_id=None, sample_pct=None, model_type=None, model_category=None, is_frozen=None, blueprint_id=None, metrics=None, rating_table_id=None, monotonic_increasing_featurelist_id=None, monotonic_decreasing_featurelist_id=None, supports_monotonic_constraints=None, is_starred=None, prediction_threshold=None, prediction_threshold_read_only=None, model_number=None, parent_model_id=None, supports_composable_ml=None, training_row_count=None, training_duration=None, training_start_date=None, training_end_date=None, data_selection_method=None, time_window_sample_pct=None, sampling_method=None, model_family_full_name=None, is_trained_into_validation=None, is_trained_into_holdout=None)

A model that has a rating table.

All durations are specified with a duration string such as those returned by the partitioning_methods.construct_duration_string helper method. Please see datetime partitioned project documentation for more information on duration strings.

Attributes:
idstr

the id of the model

project_idstr

the id of the project the model belongs to

processeslist of str

the processes used by the model

featurelist_namestr

the name of the featurelist used by the model

featurelist_idstr

the id of the featurelist used by the model

sample_pctfloat or None

the percentage of the project dataset used in training the model. If the project uses datetime partitioning, the sample_pct will be None. See training_row_count, training_duration, and training_start_date and training_end_date instead.

training_row_countint or None

the number of rows of the project dataset used in training the model. In a datetime partitioned project, if specified, defines the number of rows used to train the model and evaluate backtest scores; if unspecified, either training_duration or training_start_date and training_end_date was used to determine that instead.

training_durationstr or None

only present for models in datetime partitioned projects. If specified, a duration string specifying the duration spanned by the data used to train the model and evaluate backtest scores.

training_start_datedatetime or None

only present for frozen models in datetime partitioned projects. If specified, the start date of the data used to train the model.

training_end_datedatetime or None

only present for frozen models in datetime partitioned projects. If specified, the end date of the data used to train the model.

model_typestr

what model this is, e.g. ‘Nystroem Kernel SVM Regressor’

model_categorystr

what kind of model this is - ‘prime’ for DataRobot Prime models, ‘blend’ for blender models, and ‘model’ for other models

is_frozenbool

whether this model is a frozen model

blueprint_idstr

the id of the blueprint used in this model

metricsdict

a mapping from each metric to the model’s scores for that metric

rating_table_idstr

the id of the rating table that belongs to this model

monotonic_increasing_featurelist_idstr

optional, the id of the featurelist that defines the set of features with a monotonically increasing relationship to the target. If None, no such constraints are enforced.

monotonic_decreasing_featurelist_idstr

optional, the id of the featurelist that defines the set of features with a monotonically decreasing relationship to the target. If None, no such constraints are enforced.

supports_monotonic_constraintsbool

optional, whether this model supports enforcing monotonic constraints

is_starredbool

whether this model is marked as starred

prediction_thresholdfloat

for binary classification projects, the threshold used for predictions

prediction_threshold_read_onlybool

indicates whether modification of the prediction threshold is forbidden. Threshold modification is forbidden once a model has had a deployment created or predictions made via the dedicated prediction API.

model_numberinteger

model number assigned to a model

supports_composable_mlbool or None

(New in version v2.26) whether this model is supported in Composable ML.

classmethod get(project_id, model_id)

Retrieve a specific rating table model

If the project does not have a rating table, a ClientError will occur.

Parameters:
project_idstr

the id of the project the model belongs to

model_idstr

the id of the model to retrieve

Returns:
modelRatingTableModel

the model

classmethod create_from_rating_table(project_id, rating_table_id)

Creates a new model from a validated rating table record. The RatingTable must not be associated with an existing model.

Parameters:
project_idstr

the id of the project the rating table belongs to

rating_table_idstr

the id of the rating table to create this model from

Returns:
job: Job

an instance of created async job

Raises:
ClientError (422)

Raised if creating model from a RatingTable that failed validation

JobAlreadyRequested

Raised if creating model from a RatingTable that is already associated with a RatingTableModel

Return type:

Job
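
Examples

A sketch of building a model from an already validated rating table (placeholder ids).

import datarobot

job = datarobot.models.RatingTableModel.create_from_rating_table(
    project_id='project-id',
    rating_table_id='rating-table-id',
)

# Resolves to the newly built model once the asynchronous job finishes
new_model = job.get_result_when_complete()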

advanced_tune(params, description=None)

Generate a new model with the specified advanced-tuning parameters

As of v2.17, all models other than blenders, open source, prime, baseline and user-created support Advanced Tuning.

Parameters:
paramsdict

Mapping of parameter ID to parameter value. The list of valid parameter IDs for a model can be found by calling get_advanced_tuning_parameters(). This endpoint does not need to include values for all parameters. If a parameter is omitted, its current_value will be used.

descriptionstr

Human-readable string describing the newly advanced-tuned model

Returns:
ModelJob

The created job to build the model

Return type:

ModelJob
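A minimal sketch of the tuning workflow, assuming a trained model; the parameter value below is a placeholder, and the parameter ID is taken from get_advanced_tuning_parameters rather than hard-coded:

import datarobot as dr

model = dr.Model.get('project-id', 'model-id')

# Look up the valid parameter IDs for this model, then tune one of them
tuning_info = model.get_advanced_tuning_parameters()
param_id = tuning_info['tuning_parameters'][0]['parameter_id']

model_job = model.advanced_tune({param_id: 0.5}, description='example tuning run')
tuned_model = model_job.get_result_when_complete()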

cross_validate()

Run cross validation on the model.

Note

To perform Cross Validation on a new model with new parameters, use train instead.

Returns:
ModelJob

The created job to build the model
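A minimal sketch, assuming the project's partitioning supports cross validation:

import datarobot as dr

model = dr.Model.get('project-id', 'model-id')

cv_job = model.cross_validate()   # queue the cross validation job
cv_job.wait_for_completion()      # block until the scores are computed

scores = model.get_cross_validation_scores()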

delete()

Delete a model from the project’s leaderboard.

Return type:

None

download_scoring_code(file_name, source_code=False)

Download the Scoring Code JAR.

Parameters:
file_namestr

File path where scoring code will be saved.

source_codebool, optional

Set to True to download source code archive. It will not be executable.

Return type:

None
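For example, a sketch that saves the JAR locally (the file paths are placeholders):

import datarobot as dr

model = dr.Model.get('project-id', 'model-id')

# Executable Scoring Code JAR
model.download_scoring_code('/tmp/scoring_code.jar')

# Source code archive instead (not executable)
model.download_scoring_code('/tmp/scoring_code_source.jar', source_code=True)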

download_training_artifact(file_name)

Retrieve trained artifact(s) from a model containing one or more custom tasks.

Artifact(s) will be downloaded to the specified local filepath.

Parameters:
file_namestr

File path where trained model artifact(s) will be saved.

classmethod from_data(data)

Instantiate an object of this class using a dict.

Parameters:
datadict

Correctly snake_cased keys and their values.

Return type:

TypeVar(T, bound= APIObject)

get_advanced_tuning_parameters()

Get the advanced-tuning parameters available for this model.

As of v2.17, all models other than blenders, open source, prime, baseline and user-created support Advanced Tuning.

Returns:
dict

A dictionary describing the advanced-tuning parameters for the current model. There are two top-level keys, tuning_description and tuning_parameters.

tuning_description is an optional value. If not None, it indicates the user-specified description of this set of tuning parameters.

tuning_parameters is a list of dicts, each of which has the following keys

  • parameter_name : (str) name of the parameter (unique per task, see below)

  • parameter_id : (str) opaque ID string uniquely identifying parameter

  • default_value : (*) the actual value used to train the model; either the single value of the parameter specified before training, or the best value from the list of grid-searched values (based on current_value)

  • current_value : (*) the single value or list of values of the parameter that were grid searched. Depending on the grid search specification, could be a single fixed value (no grid search), a list of discrete values, or a range.

  • task_name : (str) name of the task that this parameter belongs to

  • constraints: (dict) see the notes below

  • vertex_id: (str) ID of vertex that this parameter belongs to

Return type:

AdvancedTuningParamsType

Notes

The type of default_value and current_value is defined by the constraints structure. It will be a string or numeric Python type.

constraints is a dict with at least one, possibly more, of the following keys. The presence of a key indicates that the parameter may take on the specified type. (If a key is absent, this means that the parameter may not take on the specified type.) If a key on constraints is present, its value will be a dict containing all of the fields described below for that key.

"constraints": {
    "select": {
        "values": [<list(basestring or number) : possible values>]
    },
    "ascii": {},
    "unicode": {},
    "int": {
        "min": <int : minimum valid value>,
        "max": <int : maximum valid value>,
        "supports_grid_search": <bool : True if Grid Search may be
                                        requested for this param>
    },
    "float": {
        "min": <float : minimum valid value>,
        "max": <float : maximum valid value>,
        "supports_grid_search": <bool : True if Grid Search may be
                                        requested for this param>
    },
    "intList": {
        "min_length": <int : minimum valid length>,
        "max_length": <int : maximum valid length>
        "min_val": <int : minimum valid value>,
        "max_val": <int : maximum valid value>
        "supports_grid_search": <bool : True if Grid Search may be
                                        requested for this param>
    },
    "floatList": {
        "min_length": <int : minimum valid length>,
        "max_length": <int : maximum valid length>
        "min_val": <float : minimum valid value>,
        "max_val": <float : maximum valid value>
        "supports_grid_search": <bool : True if Grid Search may be
                                        requested for this param>
    }
}

The keys have meaning as follows:

  • select: Rather than specifying a specific data type, if present, it indicates that the parameter is permitted to take on any of the specified values. Listed values may be of any string or real (non-complex) numeric type.

  • ascii: The parameter may be a unicode object that encodes simple ASCII characters. (A-Z, a-z, 0-9, whitespace, and certain common symbols.) In addition to listed constraints, ASCII keys currently may not contain either newlines or semicolons.

  • unicode: The parameter may be any Python unicode object.

  • int: The value may be an object of type int within the specified range (inclusive). Please note that the value will be passed around using the JSON format, and some JSON parsers have undefined behavior with integers outside of the range [-(2**53)+1, (2**53)-1].

  • float: The value may be an object of type float within the specified range (inclusive).

  • intList, floatList: The value may be a list of int or float objects, respectively, following constraints as specified respectively by the int and float types (above).

Many parameters only specify one key under constraints. If a parameter specifies multiple keys, the parameter may take on any value permitted by any key.
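As an illustration, the constraints structure can be inspected to decide which values are safe to submit for tuning. This is a sketch only; the fields read below follow the schema described above:

import datarobot as dr

model = dr.Model.get('project-id', 'model-id')
tuning_info = model.get_advanced_tuning_parameters()

for param in tuning_info['tuning_parameters']:
    constraints = param['constraints']
    if 'float' in constraints:
        # Numeric parameter: print its valid range
        bounds = constraints['float']
        print(param['parameter_name'], bounds['min'], bounds['max'])
    elif 'select' in constraints:
        # Enumerated parameter: print its allowed values
        print(param['parameter_name'], constraints['select']['values'])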

get_all_confusion_charts(fallback_to_parent_insights=False)

Retrieve a list of all confusion matrices available for the model.

Parameters:
fallback_to_parent_insightsbool

(New in version v2.14) Optional, if True, this will return confusion chart data for this model’s parent for any source that is not available for this model, provided this model has a defined parent model. If omitted or False, or this model has no parent, this will not attempt to retrieve any data from this model’s parent.

Returns:
list of ConfusionChart

Data for all available confusion charts for model.

get_all_feature_impacts(data_slice_filter=None)

Retrieve a list of all feature impact results available for the model.

Parameters:
data_slice_filterDataSlice, optional

A dataslice used to filter the return values based on the dataslice.id. By default, this function will use data_slice_filter.id == None which returns an unsliced insight. If data_slice_filter is None then no data_slice filtering will be applied when requesting the feature impact.

Returns:
list of dicts

Data for all available model feature impacts, or an empty list if no data is found.

Examples

model = datarobot.Model(id='model-id', project_id='project-id')

# Get feature impact insights for sliced data
data_slice = datarobot.DataSlice(id='data-slice-id')
sliced_fi = model.get_all_feature_impacts(data_slice_filter=data_slice)

# Get feature impact insights for unsliced data
data_slice = datarobot.DataSlice()
unsliced_fi = model.get_all_feature_impacts(data_slice_filter=data_slice)

# Get all feature impact insights
all_fi = model.get_all_feature_impacts()
get_all_lift_charts(fallback_to_parent_insights=False, data_slice_filter=None)

Retrieve a list of all Lift charts available for the model.

Parameters:
fallback_to_parent_insightsbool, optional

(New in version v2.14) Optional, if True, this will return lift chart data for this model’s parent for any source that is not available for this model and if this model has a defined parent model. If omitted or False, or this model has no parent, this will not attempt to retrieve any data from this model’s parent.

data_slice_filterDataSlice, optional

Filters the returned lift chart by data_slice_filter.id. If None (the default) applies no filter based on data_slice_id.

Returns:
list of LiftChart

Data for all available model lift charts, or an empty list if no data is found.

Examples

model = datarobot.Model.get('project-id', 'model-id')

# Get lift chart insights for sliced data
sliced_lift_charts = model.get_all_lift_charts(data_slice_filter=datarobot.DataSlice(id='data-slice-id'))

# Get lift chart insights for unsliced data
unsliced_lift_charts = model.get_all_lift_charts(data_slice_filter=datarobot.DataSlice())

# Get all lift chart insights
all_lift_charts = model.get_all_lift_charts()
get_all_multiclass_lift_charts(fallback_to_parent_insights=False)

Retrieve a list of all Lift charts available for the multiclass model.

Parameters:
fallback_to_parent_insightsbool

(New in version v2.14) Optional, if True, this will return lift chart data for this model’s parent for any source that is not available for this model and if this model has a defined parent model. If omitted or False, or this model has no parent, this will not attempt to retrieve any data from this model’s parent.

Returns:
list of LiftChart

Data for all available model lift charts.

get_all_residuals_charts(fallback_to_parent_insights=False, data_slice_filter=None)

Retrieve a list of all residuals charts available for the model.

Parameters:
fallback_to_parent_insightsbool

Optional, if True, this will return residuals chart data for this model’s parent for any source that is not available for this model and if this model has a defined parent model. If omitted or False, or this model has no parent, this will not attempt to retrieve any data from this model’s parent.

data_slice_filterDataSlice, optional

Filters the returned residuals charts by data_slice_filter.id. If None (the default) applies no filter based on data_slice_id.

Returns:
list of ResidualsChart

Data for all available model residuals charts.

Examples

model = datarobot.Model.get('project-id', 'model-id')

# Get residuals chart insights for sliced data
sliced_residuals_charts = model.get_all_residuals_charts(data_slice_filter=datarobot.DataSlice(id='data-slice-id'))

# Get residuals chart insights for unsliced data
unsliced_residuals_charts = model.get_all_residuals_charts(data_slice_filter=datarobot.DataSlice())

# Get all residuals chart insights
all_residuals_charts = model.get_all_residuals_charts()
get_all_roc_curves(fallback_to_parent_insights=False, data_slice_filter=None)

Retrieve a list of all ROC curves available for the model.

Parameters:
fallback_to_parent_insightsbool

(New in version v2.14) Optional, if True, this will return ROC curve data for this model’s parent for any source that is not available for this model and if this model has a defined parent model. If omitted or False, or this model has no parent, this will not attempt to retrieve any data from this model’s parent.

data_slice_filterDataSlice, optional

filters the returned roc_curve by data_slice_filter.id. If None (the default) applies no filter based on data_slice_id.

Returns:
list of RocCurve

Data for all available model ROC curves, or an empty list if no ROC curves are found.

Examples

model = datarobot.Model.get('project-id', 'model-id')
ds_filter=DataSlice(id='data-slice-id')

# Get roc curve insights for sliced data
sliced_roc = model.get_all_roc_curves(data_slice_filter=ds_filter)

# Get roc curve insights for unsliced data
ds_filter = DataSlice(id=None)
unsliced_roc = model.get_all_roc_curves(data_slice_filter=ds_filter)

# Get all roc curve insights
all_roc_curves = model.get_all_roc_curves()
get_confusion_chart(source, fallback_to_parent_insights=False)

Retrieve a multiclass model’s confusion matrix for the specified source.

Parameters:
sourcestr

Confusion chart source. Check datarobot.enums.CHART_DATA_SOURCE for possible values.

fallback_to_parent_insightsbool

(New in version v2.14) Optional, if True, this will return confusion chart data for this model’s parent if the confusion chart is not available for this model and the model has a defined parent model. If omitted or False, or there is no parent model, will not attempt to return insight data from this model’s parent.

Returns:
ConfusionChart

Model ConfusionChart data

Raises:
ClientError

If the insight is not available for this model
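A minimal sketch for a multiclass model, using one of the documented chart sources:

import datarobot as dr
from datarobot.enums import CHART_DATA_SOURCE

model = dr.Model.get('project-id', 'model-id')
confusion_chart = model.get_confusion_chart(CHART_DATA_SOURCE.VALIDATION)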

get_cross_class_accuracy_scores()

Retrieves a list of Cross Class Accuracy scores for the model.

Returns:
json
get_cross_validation_scores(partition=None, metric=None)

Return a dictionary, keyed by metric, showing cross validation scores per partition.

Cross Validation should already have been performed using cross_validate or train.

Note

Models that computed cross validation before this feature was added will need to be deleted and retrained before this method can be used.

Parameters:
partitionfloat

optional, the id of the partition (1, 2, 3.0, 4.0, etc.) to filter results by. This can be a positive whole number or a float value. 0 corresponds to the validation partition.

metric: unicode

optional, the name of the metric to filter the resulting cross validation scores by

Returns:
cross_validation_scores: dict

A dictionary keyed by metric showing cross validation scores per partition.

get_data_disparity_insights(feature, class_name1, class_name2)

Retrieve a list of Cross Class Data Disparity insights for the model.

Parameters:
featurestr

Bias and Fairness protected feature name.

class_name1str

One of the compared classes

class_name2str

Another compared class

Returns:
json
get_fairness_insights(fairness_metrics_set=None, offset=0, limit=100)

Retrieve a list of Per Class Bias insights for the model.

Parameters:
fairness_metrics_setstr, optional

Can be one of <datarobot.enums.FairnessMetricsSet>. The fairness metric used to calculate the fairness scores.

offsetint, optional

Number of items to skip.

limitint, optional

Number of items to return.

Returns:
json
get_feature_effect(source, data_slice_id=None)

Retrieve Feature Effects for the model.

Feature Effects provides partial dependence and predicted vs actual values for top-500 features ordered by feature impact score.

The partial dependence shows marginal effect of a feature on the target variable after accounting for the average effects of all other predictive features. It indicates how, holding all other variables except the feature of interest as they were, the value of this feature affects your prediction.

Requires that Feature Effects has already been computed with request_feature_effect.

See get_feature_effect_metadata for retrieving information about the available sources.

Parameters:
sourcestring

The source Feature Effects are retrieved for.

data_slice_idstring, optional

ID for the data slice used in the request. If None, retrieve unsliced insight data.

Returns:
feature_effectsFeatureEffects

The feature effects data.

Raises:
ClientError (404)

If the feature effects have not been computed or source is not a valid value.

get_feature_effect_metadata()

Retrieve Feature Effects metadata. Response contains status and available model sources.

  • Feature Effect for the training partition is always available, with the exception of older projects that only supported Feature Effect for validation.

  • When a model is trained into validation or holdout without stacked predictions (i.e., no out-of-sample predictions in those partitions), Feature Effects is not available for validation or holdout.

  • Feature Effects for holdout is not available when holdout was not unlocked for the project.

Use source to retrieve Feature Effects, selecting one of the provided sources.

Returns:
feature_effect_metadata: FeatureEffectMetadata
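A minimal sketch that checks the metadata before requesting the insight for one of the available sources (assuming Feature Effects has already been computed and that the metadata object exposes its available sources as a sources attribute):

import datarobot as dr

model = dr.Model.get('project-id', 'model-id')

fe_metadata = model.get_feature_effect_metadata()
source = fe_metadata.sources[0]    # pick one of the available sources

feature_effects = model.get_feature_effect(source)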
get_feature_effects_multiclass(source='training', class_=None)

Retrieve Feature Effects for the multiclass model.

Feature Effects provide partial dependence and predicted vs actual values for top-500 features ordered by feature impact score.

The partial dependence shows marginal effect of a feature on the target variable after accounting for the average effects of all other predictive features. It indicates how, holding all other variables except the feature of interest as they were, the value of this feature affects your prediction.

Requires that Feature Effects has already been computed with request_feature_effect.

See get_feature_effect_metadata for retrieving information about the available sources.

Parameters:
sourcestr

The source Feature Effects are retrieved for.

class_str or None

The class name Feature Effects are retrieved for.

Returns:
list

The list of multiclass feature effects.

Raises:
ClientError (404)

If Feature Effects have not been computed or source is not a valid value.

get_feature_impact(with_metadata=False, data_slice_filter=<datarobot.models.model.Sentinel object>)

Retrieve the computed Feature Impact results, a measure of the relevance of each feature in the model.

Feature Impact is computed for each column by creating new data with that column randomly permuted (but the others left unchanged), and seeing how the error metric score for the predictions is affected. The ‘impactUnnormalized’ is how much worse the error metric score is when making predictions on this modified data. The ‘impactNormalized’ is normalized so that the largest value is 1. In both cases, larger values indicate more important features.

If a feature is a redundant feature, i.e. once other features are considered it doesn’t contribute much in addition, the ‘redundantWith’ value is the name of feature that has the highest correlation with this feature. Note that redundancy detection is only available for jobs run after the addition of this feature. When retrieving data that predates this functionality, a NoRedundancyImpactAvailable warning will be used.

Elsewhere this technique is sometimes called ‘Permutation Importance’.

Requires that Feature Impact has already been computed with request_feature_impact.

Parameters:
with_metadatabool

The flag indicating if the result should include the metadata as well.

data_slice_filterDataSlice, optional

A dataslice used to filter the return values based on the dataslice.id. By default, this function will use data_slice_filter.id == None which returns an unsliced insight. If data_slice_filter is None then get_feature_impact will raise a ValueError.

Returns:
list or dict

The feature impact data response depends on the with_metadata parameter. The response is either a dict with metadata and a list with actual data or just a list with that data.

Each list item is a dict with the keys featureName, impactNormalized, impactUnnormalized, redundantWith, and count.

For dict response available keys are:

  • featureImpacts - Feature Impact data as a dictionary. Each item is a dict with the keys: featureName, impactNormalized, impactUnnormalized, and redundantWith.

  • shapBased - A boolean that indicates whether Feature Impact was calculated using Shapley values.

  • ranRedundancyDetection - A boolean that indicates whether redundant feature identification was run while calculating this Feature Impact.

  • rowCount - An integer or None that indicates the number of rows that was used to calculate Feature Impact. For the Feature Impact calculated with the default logic, without specifying the rowCount, we return None here.

  • count - An integer with the number of features under the featureImpacts.

Raises:
ClientError (404)

If the feature impacts have not been computed.

ValueError

If data_slice_filter is passed as None
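A minimal sketch that requests Feature Impact if it has not been computed yet and then retrieves it:

import datarobot as dr

model = dr.Model.get('project-id', 'model-id')

try:
    feature_impacts = model.get_feature_impact()
except dr.errors.ClientError:
    # Not computed yet: request the job and wait for the result
    impact_job = model.request_feature_impact()
    feature_impacts = impact_job.get_result_when_complete()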

get_features_used()

Query the server to determine which features were used.

Note that the data returned by this method is possibly different than the names of the features in the featurelist used by this model. This method will return the raw features that must be supplied in order for predictions to be generated on a new set of data. The featurelist, in contrast, would also include the names of derived features.

Returns:
featureslist of str

The names of the features used in the model.

Return type:

List[str]

get_frozen_child_models()

Retrieve the IDs for all models that are frozen from this model.

Returns:
A list of Models
get_labelwise_roc_curves(source, fallback_to_parent_insights=False)

Retrieve a list of LabelwiseRocCurve instances for a multilabel model for the given source and all labels. This method is valid only for multilabel projects. For binary projects, use the Model.get_roc_curve API.

Added in version v2.24.

Parameters:
sourcestr

ROC curve data source. Check datarobot.enums.CHART_DATA_SOURCE for possible values.

fallback_to_parent_insightsbool

Optional, if True, this will return ROC curve data for this model’s parent if the ROC curve is not available for this model and the model has a defined parent model. If omitted or False, or there is no parent model, will not attempt to return data from this model’s parent.

Returns:
list of LabelwiseRocCurve

Labelwise ROC Curve instances for source and all labels

Raises:
ClientError

If the insight is not available for this model

get_lift_chart(source, fallback_to_parent_insights=False, data_slice_filter=<datarobot.models.model.Sentinel object>)

Retrieve the model Lift chart for the specified source.

Parameters:
sourcestr

Lift chart data source. Check datarobot.enums.CHART_DATA_SOURCE for possible values. (New in version v2.23) For time series and OTV models, also accepts values backtest_2, backtest_3, …, up to the number of backtests in the model.

fallback_to_parent_insightsbool

(New in version v2.14) Optional, if True, this will return lift chart data for this model’s parent if the lift chart is not available for this model and the model has a defined parent model. If omitted or False, or there is no parent model, will not attempt to return insight data from this model’s parent.

data_slice_filterDataSlice, optional

A dataslice used to filter the return values based on the dataslice.id. By default this function will use data_slice_filter.id == None which returns an unsliced insight. If data_slice_filter is None then get_lift_chart will raise a ValueError.

Returns:
LiftChart

Model lift chart data

Raises:
ClientError

If the insight is not available for this model

ValueError

If data_slice_filter is passed as None

get_missing_report_info()

Retrieve a report on missing training data that can be used to understand missing values treatment in the model. The report consists of missing value resolutions for the numeric and categorical features that were part of building the model.

Returns:
An iterable of MissingReportPerFeature

The queried model missing report, sorted by missing count (DESCENDING order).

get_model_blueprint_chart()

Retrieve a diagram that can be used to understand data flow in the blueprint.

Returns:
ModelBlueprintChart

The queried model blueprint chart.

get_model_blueprint_documents()

Get documentation for tasks used in this model.

Returns:
list of BlueprintTaskDocument

All documents available for the model.

get_model_blueprint_json()

Get the blueprint json representation used by this model.

Returns:
BlueprintJson

Json representation of the blueprint stages.

Return type:

Dict[str, Tuple[List[str], List[str], str]]

get_multiclass_feature_impact()

For multiclass it’s possible to calculate feature impact separately for each target class. The method for calculation is exactly the same, calculated in one-vs-all style for each target class.

Requires that Feature Impact has already been computed with request_feature_impact.

Returns:
feature_impactslist of dict

The feature impact data. Each item is a dict with the keys ‘featureImpacts’ (list), ‘class’ (str). Each item in ‘featureImpacts’ is a dict with the keys ‘featureName’, ‘impactNormalized’, and ‘impactUnnormalized’, and ‘redundantWith’.

Raises:
ClientError (404)

If the multiclass feature impacts have not been computed.

get_multiclass_lift_chart(source, fallback_to_parent_insights=False)

Retrieve model Lift chart for the specified source.

Parameters:
sourcestr

Lift chart data source. Check datarobot.enums.CHART_DATA_SOURCE for possible values.

fallback_to_parent_insightsbool

Optional, if True, this will return lift chart data for this model’s parent if the lift chart is not available for this model and the model has a defined parent model. If omitted or False, or there is no parent model, will not attempt to return insight data from this model’s parent.

Returns:
list of LiftChart

Model lift chart data for each saved target class

Raises:
ClientError

If the insight is not available for this model

get_multilabel_lift_charts(source, fallback_to_parent_insights=False)

Retrieve model Lift charts for the specified source.

Added in version v2.24.

Parameters:
sourcestr

Lift chart data source. Check datarobot.enums.CHART_DATA_SOURCE for possible values.

fallback_to_parent_insightsbool

Optional, if True, this will return lift chart data for this model’s parent if the lift chart is not available for this model and the model has a defined parent model. If omitted or False, or there is no parent model, will not attempt to return insight data from this model’s parent.

Returns:
list of LiftChart

Model lift chart data for each saved target class

Raises:
ClientError

If the insight is not available for this model

get_num_iterations_trained()

Retrieves the number of estimators trained by early-stopping tree-based models.

Added in version v2.22.

Returns:
projectId: str

id of project containing the model

modelId: str

id of the model

data: array

list of numEstimatorsItem objects, one for each modeling stage.

numEstimatorsItem will be of the form:
stage: str

indicates the modeling stage (for multi-stage models); None for single-stage models

numIterations: int

the number of estimators or iterations trained by the model

get_or_request_feature_effect(source, max_wait=600, row_count=None, data_slice_id=None)

Retrieve Feature Effects for the model, requesting a new job if it hasn’t been run previously.

See get_feature_effect_metadata for retrieving information of source.

Parameters:
sourcestring

The source Feature Effects are retrieved for.

max_waitint, optional

The maximum time to wait for a requested Feature Effect job to complete before erroring.

row_countint, optional

(New in version v2.21) The sample size to use for Feature Effects computation. Minimum is 10 rows. Maximum is 100000 rows or the training sample size of the model, whichever is less.

data_slice_idstr, optional

ID for the data slice used in the request. If None, request unsliced insight data.

Returns:
feature_effectsFeatureEffects

The Feature Effects data.

get_or_request_feature_effects_multiclass(source, top_n_features=None, features=None, row_count=None, class_=None, max_wait=600)

Retrieve Feature Effects for the multiclass model, requesting a job if it hasn’t been run previously.

Parameters:
sourcestring

The source Feature Effects are retrieved for.

class_str or None

The class name Feature Effects are retrieved for.

row_countint

The number of rows from dataset to use for Feature Impact calculation.

top_n_featuresint or None

Number of top features (ranked by Feature Impact) used to calculate Feature Effects.

featureslist or None

The list of features used to calculate Feature Effects.

max_waitint, optional

The maximum time to wait for a requested Feature Effects job to complete before erroring.

Returns:
feature_effectslist of FeatureEffectsMulticlass

The list of multiclass feature effects data.

get_or_request_feature_impact(max_wait=600, **kwargs)

Retrieve feature impact for the model, requesting a job if it hasn’t been run previously

Parameters:
max_waitint, optional

The maximum time to wait for a requested feature impact job to complete before erroring

**kwargs

Arbitrary keyword arguments passed to request_feature_impact.

Returns:
feature_impactslist or dict

The feature impact data. See get_feature_impact for the exact schema.

get_parameters()

Retrieve model parameters.

Returns:
ModelParameters

Model parameters for this model.

get_pareto_front()

Retrieve the Pareto Front for a Eureqa model.

This method is only supported for Eureqa models.

Returns:
ParetoFront

Model ParetoFront data

get_prime_eligibility()

Check if this model can be approximated with DataRobot Prime

Returns:
prime_eligibilitydict

a dict indicating whether a model can be approximated with DataRobot Prime (key can_make_prime) and why it may be ineligible (key message)

get_residuals_chart(source, fallback_to_parent_insights=False, data_slice_filter=<datarobot.models.model.Sentinel object>)

Retrieve model residuals chart for the specified source.

Parameters:
sourcestr

Residuals chart data source. Check datarobot.enums.CHART_DATA_SOURCE for possible values.

fallback_to_parent_insightsbool

Optional, if True, this will return residuals chart data for this model’s parent if the residuals chart is not available for this model and the model has a defined parent model. If omitted or False, or there is no parent model, will not attempt to return residuals data from this model’s parent.

data_slice_filterDataSlice, optional

A dataslice used to filter the return values based on the dataslice.id. By default this function will use data_slice_filter.id == None which returns an unsliced insight. If data_slice_filter is None then get_residuals_chart will raise a ValueError.

Returns:
ResidualsChart

Model residuals chart data

Raises:
ClientError

If the insight is not available for this model

ValueError

If data_slice_filter is passed as None

get_roc_curve(source, fallback_to_parent_insights=False, data_slice_filter=<datarobot.models.model.Sentinel object>)

Retrieve the ROC curve for a binary model for the specified source. This method is valid only for binary projects. For multilabel projects, use Model.get_labelwise_roc_curves.

Parameters:
sourcestr

ROC curve data source. Check datarobot.enums.CHART_DATA_SOURCE for possible values. (New in version v2.23) For time series and OTV models, also accepts values backtest_2, backtest_3, …, up to the number of backtests in the model.

fallback_to_parent_insightsbool

(New in version v2.14) Optional, if True, this will return ROC curve data for this model’s parent if the ROC curve is not available for this model and the model has a defined parent model. If omitted or False, or there is no parent model, will not attempt to return data from this model’s parent.

data_slice_filterDataSlice, optional

A dataslice used to filter the return values based on the dataslice.id. By default this function will use data_slice_filter.id == None which returns an unsliced insight. If data_slice_filter is None then get_roc_curve will raise a ValueError.

Returns:
RocCurve

Model ROC curve data

Raises:
ClientError

If the insight is not available for this model

(New in version v3.0) TypeError

If the underlying project type is multilabel

ValueError

If data_slice_filter is passed as None
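A minimal sketch for a binary classification model; the data slice ID is a placeholder:

import datarobot as dr
from datarobot.enums import CHART_DATA_SOURCE

model = dr.Model.get('project-id', 'model-id')

# Unsliced ROC curve for the validation partition
roc = model.get_roc_curve(CHART_DATA_SOURCE.VALIDATION)

# ROC curve restricted to a data slice
sliced_roc = model.get_roc_curve(
    CHART_DATA_SOURCE.VALIDATION,
    data_slice_filter=dr.DataSlice(id='data-slice-id'),
)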

get_rulesets()

List the rulesets approximating this model generated by DataRobot Prime

If this model hasn’t been approximated yet, will return an empty list. Note that these are rulesets approximating this model, not rulesets used to construct this model.

Returns:
rulesetslist of Ruleset
Return type:

List[Ruleset]

get_supported_capabilities()

Retrieves a summary of the capabilities supported by a model.

Added in version v2.14.

Returns:
supportsBlending: bool

whether the model supports blending

supportsMonotonicConstraints: bool

whether the model supports monotonic constraints

hasWordCloud: bool

whether the model has word cloud data available

eligibleForPrime: bool

(Deprecated in version v3.6) whether the model is eligible for Prime

hasParameters: bool

whether the model has parameters that can be retrieved

supportsCodeGeneration: bool

(New in version v2.18) whether the model supports code generation

supportsShap: bool

(New in version v2.18) True if the model supports the Shapley (SHAP) package, i.e. Shapley-based feature importance.

supportsEarlyStopping: bool

(New in version v2.22) True if this is an early stopping tree-based model and number of trained iterations can be retrieved.

get_uri()
Returns:
urlstr

Permanent static hyperlink to this model in the leaderboard.

Return type:

str

get_word_cloud(exclude_stop_words=False)

Retrieve word cloud data for the model.

Parameters:
exclude_stop_wordsbool, optional

Set to True to filter stop words out of the response.

Returns:
WordCloud

Word cloud data for the model.
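For example (a sketch for a model trained with text features):

import datarobot as dr

model = dr.Model.get('project-id', 'model-id')
word_cloud = model.get_word_cloud(exclude_stop_words=True)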

incremental_train(data_stage_id, training_data_name=None)

Submit a job to the queue to perform incremental training on an existing model. See train_incremental documentation.

Return type:

ModelJob

classmethod list(project_id, sort_by_partition='validation', sort_by_metric=None, with_metric=None, search_term=None, featurelists=None, families=None, blueprints=None, labels=None, characteristics=None, training_filters=None, number_of_clusters=None, limit=100, offset=0)

Retrieve paginated model records, sorted by scores, with optional filtering.

Parameters:
sort_by_partition: str, one of `validation`, `backtesting`, `crossValidation` or `holdout`

Set the partition to use for sorted (by score) list of models. validation is the default.

sort_by_metric: str

Set the project metric to use for model sorting. DataRobot-selected project optimization metric is the default.

with_metric: str

For a single-metric list of results, specify that project metric.

search_term: str

If specified, only models containing the term in their name or processes are returned.

featurelists: list of str

If specified, only models trained on selected featurelists are returned.

families: list of str

If specified, only models belonging to selected families are returned.

blueprints: list of str

If specified, only models trained on specified blueprint IDs are returned.

labels: list of str, `starred` or `prepared for deployment`

If specified, only models tagged with all listed labels are returned.

characteristics: list of str

If specified, only models matching all listed characteristics are returned.

training_filters: list of str

If specified, only models matching at least one of the listed training conditions are returned. The following formats are supported.

For AutoML and datetime partitioned projects:

  • number of rows in the training subset

For datetime partitioned projects:

  • <training duration>, for example P6Y0M0D

  • <training_duration>-<time_window_sample_percent>-<sampling_method>, for example P6Y0M0D-78-Random (returns models trained on 6 years of data, sampling rate 78%, random sampling)

  • start/end date

  • project settings

number_of_clusters: list of int

Filter models by number of clusters. Applicable only in unsupervised clustering projects.

limit: int
offset: int
Returns:
generic_models: list of GenericModel
Return type:

List[GenericModel]
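A minimal sketch combining a few of the filters described above; the metric name and filter values are placeholders and should match metrics and labels that exist in the project:

import datarobot as dr

models = dr.Model.list(
    'project-id',
    sort_by_partition='crossValidation',
    sort_by_metric='AUC',
    labels=['starred'],
    limit=20,
)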

open_in_browser()

Opens the class’ relevant web browser location. If the default browser is not available, the URL is logged.

Note: If text-mode browsers are used, the calling process will block until the user exits the browser.

Return type:

None

request_approximation()

Request an approximation of this model using DataRobot Prime

This will create several rulesets that could be used to approximate this model. After comparing their scores and rule counts, the code used in the approximation can be downloaded and run locally.

Returns:
jobJob

the job generating the rulesets

request_cross_class_accuracy_scores()

Request data disparity insights to be computed for the model.

Returns:
status_idstr

A statusId of computation request.

request_data_disparity_insights(feature, compared_class_names)

Request data disparity insights to be computed for the model.

Parameters:
featurestr

Bias and Fairness protected feature name.

compared_class_nameslist(str)

List of two classes to compare

Returns:
status_idstr

A statusId of computation request.

request_external_test(dataset_id, actual_value_column=None)

Request external test to compute scores and insights on an external test dataset

Parameters:
dataset_idstring

The dataset to make predictions against (as uploaded from Project.upload_dataset)

actual_value_columnstring, optional

(New in version v2.21) For time series unsupervised projects only. Actual value column can be used to calculate the classification metrics and insights on the prediction dataset. Can’t be provided with the forecast_point parameter.

Returns:
jobJob

a Job representing external dataset insights computation

request_fairness_insights(fairness_metrics_set=None)

Request fairness insights to be computed for the model.

Parameters:
fairness_metrics_setstr, optional

Can be one of <datarobot.enums.FairnessMetricsSet>. The fairness metric used to calculate the fairness scores.

Returns:
status_idstr

A statusId of computation request.

Return type:

str

request_feature_effect(row_count=None, data_slice_id=None)

Submit request to compute Feature Effects for the model.

See get_feature_effect for more information on the result of the job.

Parameters:
row_countint

(New in version v2.21) The sample size to use for Feature Effects computation. Minimum is 10 rows. Maximum is 100000 rows or the training sample size of the model, whichever is less.

data_slice_idstr, optional

ID for the data slice used in the request. If None, request unsliced insight data.

Returns:
jobJob

A Job representing the feature effect computation. To get the completed feature effect data, use job.get_result or job.get_result_when_complete.

Raises:
JobAlreadyRequested (422)

If the feature effect have already been requested.

request_feature_effects_multiclass(row_count=None, top_n_features=None, features=None)

Request Feature Effects computation for the multiclass model.

See get_feature_effect for more information on the result of the job.

Parameters:
row_countint

The number of rows from dataset to use for Feature Impact calculation.

top_n_featuresint or None

Number of top features (ranked by feature impact) used to calculate Feature Effects.

featureslist or None

The list of features used to calculate Feature Effects.

Returns:
jobJob

A Job representing Feature Effect computation. To get the completed Feature Effect data, use job.get_result or job.get_result_when_complete.

request_feature_impact(row_count=None, with_metadata=False, data_slice_id=None)

Request feature impacts to be computed for the model.

See get_feature_impact for more information on the result of the job.

Parameters:
row_countint, optional

The sample size (specified in rows) to use for Feature Impact computation. This is not supported for unsupervised, multiclass (which has a separate method), and time series projects.

with_metadatabool, optional

Flag indicating whether the result should include the metadata. If true, metadata is included.

data_slice_idstr, optional

ID for the data slice used in the request. If None, request unsliced insight data.

Returns:
jobJob or status_id

Job representing the Feature Impact computation. To retrieve the completed Feature Impact data, use job.get_result or job.get_result_when_complete.

Raises:
JobAlreadyRequested (422)

If the feature impacts have already been requested.

request_frozen_datetime_model(training_row_count=None, training_duration=None, training_start_date=None, training_end_date=None, time_window_sample_pct=None, sampling_method=None)

Train a new frozen model with parameters from this model.

Requires that this model belongs to a datetime partitioned project. If it does not, an error will occur when submitting the job.

Frozen models use the same tuning parameters as their parent model instead of independently optimizing them to allow efficiently retraining models on larger amounts of the training data.

In addition to training_row_count and training_duration, frozen datetime models may be trained on an exact date range. Only one of training_row_count, training_duration, or training_start_date and training_end_date should be specified.

Models specified using training_start_date and training_end_date are the only ones that can be trained into the holdout data (once the holdout is unlocked).

All durations should be specified with a duration string such as those returned by the partitioning_methods.construct_duration_string helper method. Please see datetime partitioned project documentation for more information on duration strings.

Parameters:
training_row_countint, optional

the number of rows of data that should be used to train the model. If specified, training_duration may not be specified.

training_durationstr, optional

a duration string specifying what time range the data used to train the model should span. If specified, training_row_count may not be specified.

training_start_datedatetime.datetime, optional

the start date of the data to train the model on. Only rows occurring at or after this datetime will be used. If training_start_date is specified, training_end_date must also be specified.

training_end_datedatetime.datetime, optional

the end date of the data to train the model on. Only rows occurring strictly before this datetime will be used. If training_end_date is specified, training_start_date must also be specified.

time_window_sample_pctint, optional

may only be specified when the requested model is a time window (e.g. duration or start and end dates). An integer between 1 and 99 indicating the percentage to sample by within the window. The points kept are determined by a random uniform sample. If specified, training_duration must also be specified; otherwise an error will occur.

sampling_methodstr, optional

(New in version v2.23) defines the way training data is selected. Can be either random or latest. In combination with training_row_count defines how rows are selected from backtest (latest by default). When training data is defined using time range (training_duration or use_project_settings) this setting changes the way time_window_sample_pct is applied (random by default). Applicable to OTV projects only.

Returns:
model_jobModelJob

the modeling job training a frozen model

Return type:

ModelJob
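A minimal sketch for a datetime partitioned project; the import path for the duration helper is assumed to match the current client layout:

import datarobot as dr
from datarobot.helpers.partitioning_methods import construct_duration_string

model = dr.Model.get('project-id', 'model-id')

# Train a frozen copy of this model on two years of data
duration = construct_duration_string(years=2)
model_job = model.request_frozen_datetime_model(training_duration=duration)
frozen_model = model_job.get_result_when_complete()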

request_frozen_model(sample_pct=None, training_row_count=None)

Train a new frozen model with parameters from this model.

Note

This method only works if the project the model belongs to is not datetime partitioned. If it is, use request_frozen_datetime_model instead.

Frozen models use the same tuning parameters as their parent model instead of independently optimizing them to allow efficiently retraining models on larger amounts of the training data.

Parameters:
sample_pctfloat

optional, the percentage of the dataset to use with the model. If not provided, will use the value from this model.

training_row_countint

(New in version v2.9) optional, the integer number of rows of the dataset to use with the model. Only one of sample_pct and training_row_count should be specified.

Returns:
model_jobModelJob

the modeling job training a frozen model

request_lift_chart(source, data_slice_id=None)

Request the model Lift Chart for the specified source.

Parameters:
sourcestr

Lift chart data source. Check datarobot.enums.CHART_DATA_SOURCE for possible values.

data_slice_idstring, optional

ID for the data slice used in the request. If None, request unsliced insight data.

Returns:
status_check_jobStatusCheckJob

Object contains all needed logic for a periodical status check of an async job.

Return type:

StatusCheckJob

request_per_class_fairness_insights(fairness_metrics_set=None)

Request per-class fairness insights be computed for the model.

Parameters:
fairness_metrics_setstr, optional

The fairness metric used to calculate the fairness scores. Value can be any one of <datarobot.enums.FairnessMetricsSet>.

Returns:
status_check_jobStatusCheckJob

The returned object contains all needed logic for a periodical status check of an async job.

Return type:

StatusCheckJob

request_predictions(dataset_id=None, dataset=None, dataframe=None, file_path=None, file=None, include_prediction_intervals=None, prediction_intervals_size=None, forecast_point=None, predictions_start_date=None, predictions_end_date=None, actual_value_column=None, explanation_algorithm=None, max_explanations=None, max_ngram_explanations=None)

Requests predictions against a previously uploaded dataset.

Parameters:
dataset_idstring, optional

The ID of the dataset to make predictions against (as uploaded from Project.upload_dataset)

datasetDataset, optional

The dataset to make predictions against (as uploaded from Project.upload_dataset)

dataframepd.DataFrame, optional

(New in v3.0) The dataframe to make predictions against

file_pathstr, optional

(New in v3.0) Path to file to make predictions against

fileIOBase, optional

(New in v3.0) File to make predictions against

include_prediction_intervalsbool, optional

(New in v2.16) For time series projects only. Specifies whether prediction intervals should be calculated for this request. Defaults to True if prediction_intervals_size is specified, otherwise defaults to False.

prediction_intervals_sizeint, optional

(New in v2.16) For time series projects only. Represents the percentile to use for the size of the prediction intervals. Defaults to 80 if include_prediction_intervals is True. Prediction intervals size must be between 1 and 100 (inclusive).

forecast_pointdatetime.datetime or None, optional

(New in version v2.20) For time series projects only. This is the default point relative to which predictions will be generated, based on the forecast window of the project. See the time series prediction documentation for more information.

predictions_start_datedatetime.datetime or None, optional

(New in version v2.20) For time series projects only. The start date for bulk predictions. Note that this parameter is for generating historical predictions using the training data. This parameter should be provided in conjunction with predictions_end_date. Can’t be provided with the forecast_point parameter.

predictions_end_datedatetime.datetime or None, optional

(New in version v2.20) For time series projects only. The end date for bulk predictions, exclusive. Note that this parameter is for generating historical predictions using the training data. This parameter should be provided in conjunction with predictions_start_date. Can’t be provided with the forecast_point parameter.

actual_value_columnstring, optional

(New in version v2.21) For time series unsupervised projects only. Actual value column can be used to calculate the classification metrics and insights on the prediction dataset. Can’t be provided with the forecast_point parameter.

explanation_algorithmstr, optional

(New in version v2.21) If set to ‘shap’, the response will include prediction explanations based on the SHAP explainer (SHapley Additive exPlanations). Defaults to null (no prediction explanations).

max_explanationsint, optional

(New in version v2.21) Specifies the maximum number of explanation values that should be returned for each row, ordered by absolute value, greatest to least. If null, no limit. In the case of ‘shap’: if the number of features is greater than the limit, the sum of remaining values will also be returned as shapRemainingTotal. Defaults to null. Cannot be set if explanation_algorithm is omitted.

max_ngram_explanations: optional; int or str

(New in version v2.29) Specifies the maximum number of text explanation values that should be returned. If set to all, text explanations will be computed and all the ngram explanations will be returned. If set to a non zero positive integer value, text explanations will be computed and this amount of descendingly sorted ngram explanations will be returned. By default text explanation won’t be triggered to be computed.

Returns:
jobPredictJob

The job computing the predictions

Return type:

PredictJob
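A minimal sketch; the CSV path is a placeholder, and the dataset is uploaded to the project before predictions are requested:

import datarobot as dr

project = dr.Project.get('project-id')
model = dr.Model.get('project-id', 'model-id')

prediction_dataset = project.upload_dataset('./to_predict.csv')
predict_job = model.request_predictions(dataset_id=prediction_dataset.id)
predictions = predict_job.get_result_when_complete()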

request_residuals_chart(source, data_slice_id=None)

Request the model residuals chart for the specified source.

Parameters:
sourcestr

Residuals chart data source. Check datarobot.enums.CHART_DATA_SOURCE for possible values.

data_slice_idstring, optional

ID for the data slice used in the request. If None, request unsliced insight data.

Returns:
status_check_jobStatusCheckJob

Object contains all needed logic for a periodical status check of an async job.

Return type:

StatusCheckJob

request_roc_curve(source, data_slice_id=None)

Request the model Roc Curve for the specified source.

Parameters:
sourcestr

Roc Curve data source. Check datarobot.enums.CHART_DATA_SOURCE for possible values.

data_slice_idstring, optional

ID for the data slice used in the request. If None, request unsliced insight data.

Returns:
status_check_jobStatusCheckJob

Object contains all needed logic for a periodical status check of an async job.

Return type:

StatusCheckJob

request_training_predictions(data_subset, explanation_algorithm=None, max_explanations=None)

Start a job to build training predictions

Parameters:
data_subsetstr

data set definition to build predictions on. Choices are:

  • dr.enums.DATA_SUBSET.ALL or string all for all data available. Not valid for models in datetime partitioned projects.

  • dr.enums.DATA_SUBSET.VALIDATION_AND_HOLDOUT or string validationAndHoldout for all data except the training set. Not valid for models in datetime partitioned projects.

  • dr.enums.DATA_SUBSET.HOLDOUT or string holdout for the holdout data set only.

  • dr.enums.DATA_SUBSET.ALL_BACKTESTS or string allBacktests for downloading the predictions for all backtest validation folds. Requires the model to have successfully scored all backtests. Datetime partitioned projects only.

explanation_algorithmdr.enums.EXPLANATIONS_ALGORITHM

(New in v2.21) Optional. If set to dr.enums.EXPLANATIONS_ALGORITHM.SHAP, the response will include prediction explanations based on the SHAP explainer (SHapley Additive exPlanations). Defaults to None (no prediction explanations).

max_explanationsint

(New in v2.21) Optional. Specifies the maximum number of explanation values that should be returned for each row, ordered by absolute value, greatest to least. In the case of dr.enums.EXPLANATIONS_ALGORITHM.SHAP: If not set, explanations are returned for all features. If the number of features is greater than the max_explanations, the sum of remaining values will also be returned as shap_remaining_total. Max 100. Defaults to null for datasets narrower than 100 columns, defaults to 100 for datasets wider than 100 columns. Is ignored if explanation_algorithm is not set.

Returns:
Job

an instance of created async job
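A minimal sketch that computes holdout training predictions and loads them as a dataframe:

import datarobot as dr
from datarobot.enums import DATA_SUBSET

model = dr.Model.get('project-id', 'model-id')

job = model.request_training_predictions(DATA_SUBSET.HOLDOUT)
training_predictions = job.get_result_when_complete()
df = training_predictions.get_all_as_dataframe()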

retrain(sample_pct=None, featurelist_id=None, training_row_count=None, n_clusters=None)

Submit a job to the queue to retrain this model on a different sample size, featurelist, or number of clusters.

Parameters:
sample_pct: float, optional

The sample size in percent (1 to 100) to use in training. If this parameter is used then training_row_count should not be given.

featurelist_idstr, optional

The featurelist id

training_row_countint, optional

The number of rows used to train the model. If this parameter is used, then sample_pct should not be given.

n_clusters: int, optional

(new in version 2.27) number of clusters to use in an unsupervised clustering model. This parameter is used only for unsupervised clustering models that do not determine the number of clusters automatically.

Returns:
jobModelJob

The created job that is retraining the model

Return type:

ModelJob

set_prediction_threshold(threshold)

Set a custom prediction threshold for the model.

May not be used once prediction_threshold_read_only is True for this model.

Parameters:
thresholdfloat

Only used for binary classification projects. The threshold to use when deciding between the positive and negative classes when making predictions. Should be between 0.0 and 1.0 (inclusive).
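A minimal sketch; the threshold value is a placeholder:

import datarobot as dr

model = dr.Model.get('project-id', 'model-id')

if not model.prediction_threshold_read_only:
    model.set_prediction_threshold(0.65)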

star_model()

Mark the model as starred.

Model stars propagate to the web application and the API, and can be used to filter when listing models.

Return type:

None

start_advanced_tuning_session()

Start an Advanced Tuning session. Returns an object that helps set up arguments for an Advanced Tuning model execution.

As of v2.17, all models other than blenders, open source, prime, baseline and user-created support Advanced Tuning.

Returns:
AdvancedTuningSession

Session for setting up and running Advanced Tuning on a model

train(sample_pct=None, featurelist_id=None, scoring_type=None, training_row_count=None, monotonic_increasing_featurelist_id=<object object>, monotonic_decreasing_featurelist_id=<object object>)

Train the blueprint used in model on a particular featurelist or amount of data.

This method creates a new training job for worker and appends it to the end of the queue for this project. After the job has finished you can get the newly trained model by retrieving it from the project leaderboard, or by retrieving the result of the job.

Either sample_pct or training_row_count can be used to specify the amount of data to use, but not both. If neither are specified, a default of the maximum amount of data that can safely be used to train any blueprint without going into the validation data will be selected.

In smart-sampled projects, sample_pct and training_row_count are assumed to be in terms of rows of the minority class.

Note

For datetime partitioned projects, see train_datetime instead.

Parameters:
sample_pctfloat, optional

The amount of data to use for training, as a percentage of the project dataset from 0 to 100.

featurelist_idstr, optional

The identifier of the featurelist to use. If not defined, the featurelist of this model is used.

scoring_typestr, optional

Either validation or crossValidation (also dr.SCORING_TYPE.validation or dr.SCORING_TYPE.cross_validation). validation is available for every partitioning type, and indicates that the default model validation should be used for the project. If the project uses a form of cross-validation partitioning, crossValidation can also be used to indicate that all of the available training/validation combinations should be used to evaluate the model.

training_row_countint, optional

The number of rows to use to train the requested model.

monotonic_increasing_featurelist_idstr

(new in version 2.11) optional, the id of the featurelist that defines the set of features with a monotonically increasing relationship to the target. Passing None disables increasing monotonicity constraint. Default (dr.enums.MONOTONICITY_FEATURELIST_DEFAULT) is the one specified by the blueprint.

monotonic_decreasing_featurelist_idstr

(new in version 2.11) optional, the id of the featurelist that defines the set of features with a monotonically decreasing relationship to the target. Passing None disables decreasing monotonicity constraint. Default (dr.enums.MONOTONICITY_FEATURELIST_DEFAULT) is the one specified by the blueprint.

Returns:
model_job_idstr

id of created job, can be used as parameter to ModelJob.get method or wait_for_async_model_creation function

Examples

project = Project.get('project-id')
model = Model.get('project-id', 'model-id')
model_job_id = model.train(training_row_count=project.max_train_rows)
train_datetime(featurelist_id=None, training_row_count=None, training_duration=None, time_window_sample_pct=None, monotonic_increasing_featurelist_id=<object object>, monotonic_decreasing_featurelist_id=<object object>, use_project_settings=False, sampling_method=None, n_clusters=None)

Trains this model on a different featurelist or sample size.

Requires that this model is part of a datetime partitioned project; otherwise, an error will occur.

All durations should be specified with a duration string such as those returned by the partitioning_methods.construct_duration_string helper method. Please see datetime partitioned project documentation for more information on duration strings.

Parameters:
featurelist_idstr, optional

the featurelist to use to train the model. If not specified, the featurelist of this model is used.

training_row_countint, optional

the number of rows of data that should be used to train the model. If specified, neither training_duration nor use_project_settings may be specified.

training_durationstr, optional

a duration string specifying what time range the data used to train the model should span. If specified, neither training_row_count nor use_project_settings may be specified.

use_project_settingsbool, optional

(New in version v2.20) defaults to False. If True, indicates that the custom backtest partitioning settings specified by the user will be used to train the model and evaluate backtest scores. If specified, neither training_row_count nor training_duration may be specified.

time_window_sample_pctint, optional

may only be specified when the requested model is a time window (e.g. duration or start and end dates). An integer between 1 and 99 indicating the percentage to sample by within the window. The points kept are determined by a random uniform sample. If specified, training_duration must also be specified; otherwise an error will occur.

sampling_methodstr, optional

(New in version v2.23) defines the way training data is selected. Can be either random or latest. In combination with training_row_count defines how rows are selected from backtest (latest by default). When training data is defined using time range (training_duration or use_project_settings) this setting changes the way time_window_sample_pct is applied (random by default). Applicable to OTV projects only.

monotonic_increasing_featurelist_idstr, optional

(New in version v2.18) optional, the id of the featurelist that defines the set of features with a monotonically increasing relationship to the target. Passing None disables increasing monotonicity constraint. Default (dr.enums.MONOTONICITY_FEATURELIST_DEFAULT) is the one specified by the blueprint.

monotonic_decreasing_featurelist_idstr, optional

(New in version v2.18) optional, the id of the featurelist that defines the set of features with a monotonically decreasing relationship to the target. Passing None disables decreasing monotonicity constraint. Default (dr.enums.MONOTONICITY_FEATURELIST_DEFAULT) is the one specified by the blueprint.

n_clusters: int, optional

(New in version 2.27) number of clusters to use in an unsupervised clustering model. This parameter is used only for unsupervised clustering models that don’t automatically determine the number of clusters.

Returns:
jobModelJob

the created job to build the model

Return type:

ModelJob
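A minimal sketch for a datetime partitioned project; the featurelist ID is a placeholder:

import datarobot as dr

model = dr.Model.get('project-id', 'model-id')

model_job = model.train_datetime(
    featurelist_id='featurelist-id',
    training_duration='P1Y0M0D',   # train on one year of data
)
new_model = model_job.get_result_when_complete()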

train_incremental(data_stage_id, training_data_name=None, data_stage_encoding=None, data_stage_delimiter=None, data_stage_compression=None)

Submit a job to the queue to perform incremental training on an existing model using additional data. The id of the additional data to use for training is specified with the data_stage_id. Optionally a name for the iteration can be supplied by the user to help identify the contents of data in the iteration.

This functionality requires the INCREMENTAL_LEARNING feature flag to be enabled.

Parameters:
data_stage_id: str

The id of the data stage to use for training.

training_data_namestr, optional

The name of the iteration or data stage to indicate what the incremental learning was performed on.

data_stage_encodingstr, optional

The encoding type of the data in the data stage (default: UTF-8). Supported formats: UTF-8, ASCII, WINDOWS1252

data_stage_delimiterstr, optional

The delimiter used by the data in the data stage (default: ‘,’).

data_stage_compressionstr, optional

The compression type of the data stage file, e.g. ‘zip’ (default: None). Supported formats: zip

Returns:
jobModelJob

The created job that is retraining the model
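A minimal sketch; the data stage ID and iteration name are placeholders, and the INCREMENTAL_LEARNING feature flag must be enabled:

import datarobot as dr

model = dr.Model.get('project-id', 'model-id')

model_job = model.train_incremental(
    'data-stage-id',
    training_data_name='incremental-batch-1',
)
updated_model = model_job.get_result_when_complete()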

unstar_model()

Unmark the model as starred.

Model stars propagate to the web application and the API, and can be used to filter when listing models.

Return type:

None

Combined Model

See API reference for Combined Model in Segmented Modeling API Reference