Models
GenericModel
- class datarobot.models.GenericModel(id=None, processes=None, featurelist_name=None, featurelist_id=None, project_id=None, sample_pct=None, training_row_count=None, training_duration=None, training_start_date=None, training_end_date=None, model_type=None, model_category=None, is_frozen=None, blueprint_id=None, metrics=None, is_starred=None, model_family=None, model_number=None, parent_model_id=None, data_selection_method=None, time_window_sample_pct=None, sampling_method=None, is_trained_into_validation=None, is_trained_into_holdout=None, number_of_clusters=None)
GenericModel (ModelRecord) is the object returned from the /modelRecords list route. It contains the most generic model information.
Model
- class datarobot.models.Model(id=None, processes=None, featurelist_name=None, featurelist_id=None, project_id=None, sample_pct=None, model_type=None, model_category=None, is_frozen=None, is_n_clusters_dynamically_determined=None, blueprint_id=None, metrics=None, monotonic_increasing_featurelist_id=None, monotonic_decreasing_featurelist_id=None, n_clusters=None, has_empty_clusters=None, supports_monotonic_constraints=None, is_starred=None, prediction_threshold=None, prediction_threshold_read_only=None, model_number=None, parent_model_id=None, supports_composable_ml=None, training_row_count=None, training_duration=None, training_start_date=None, training_end_date=None, data_selection_method=None, time_window_sample_pct=None, sampling_method=None, model_family_full_name=None, is_trained_into_validation=None, is_trained_into_holdout=None)
A model trained on a project’s dataset capable of making predictions.
All durations are specified with a duration string such as those returned by the partitioning_methods.construct_duration_string helper method. See the datetime partitioned project documentation for more information on duration strings.
- Attributes:
- id : str
ID of the model.
- project_id : str
ID of the project the model belongs to.
- processes : list of str
Processes used by the model.
- featurelist_name : str
Name of the featurelist used by the model.
- featurelist_id : str
ID of the featurelist used by the model.
- sample_pct : float or None
Percentage of the project dataset used in model training. If the project uses datetime partitioning, the sample_pct will be None. See training_row_count, training_duration, and training_start_date / training_end_date instead.
- training_row_count : int or None
Number of rows of the project dataset used in model training. In a datetime partitioned project, if specified, defines the number of rows used to train the model and evaluate backtest scores; if unspecified, either training_duration or training_start_date and training_end_date is used for training_row_count.
- training_duration : str or None
For datetime partitioned projects only. If specified, defines the duration spanned by the data used to train the model and evaluate backtest scores.
- training_start_date : datetime or None
For frozen models in datetime partitioned projects only. If specified, the start date of the data used to train the model.
- training_end_date : datetime or None
For frozen models in datetime partitioned projects only. If specified, the end date of the data used to train the model.
- model_type : str
Type of model, for example ‘Nystroem Kernel SVM Regressor’.
- model_category : str
Category of model, for example ‘prime’ for DataRobot Prime models, ‘blend’ for blender models, and ‘model’ for other models.
- is_frozen : bool
Whether this model is a frozen model.
- is_n_clusters_dynamically_determined : bool
(New in version v2.27) Optional. Whether this model determines the number of clusters dynamically.
- blueprint_id : str
ID of the blueprint used to build this model.
- metrics : dict
Mapping from each metric to the model’s score for that metric.
- monotonic_increasing_featurelist_id : str
Optional. ID of the featurelist that defines the set of features with a monotonically increasing relationship to the target. If None, no such constraints are enforced.
- monotonic_decreasing_featurelist_id : str
Optional. ID of the featurelist that defines the set of features with a monotonically decreasing relationship to the target. If None, no such constraints are enforced.
- n_clusters : int
(New in version v2.27) Optional. Number of data clusters discovered by the model.
- has_empty_clusters : bool
(New in version v2.27) Optional. Whether the clustering model produces empty clusters.
- supports_monotonic_constraints : bool
Optional. Whether this model supports enforcing monotonic constraints.
- is_starred : bool
Whether this model is marked as a starred model.
- prediction_threshold : float
Binary classification projects only. Threshold used for predictions.
- prediction_threshold_read_only : bool
Whether modification of the prediction threshold is forbidden. Threshold modification is forbidden once a model has had a deployment created or predictions made via the dedicated prediction API.
- model_number : int
Model number assigned to the model.
- parent_model_id : str or None
(New in version v2.20) ID of the model that tuning parameters are derived from.
- supports_composable_ml : bool or None
(New in version v2.26) Whether this model is supported in Composable ML.
- classmethod get(project, model_id)
Retrieve a specific model.
- Parameters:
- project : str
Project ID.
- model_id : str
ID of the model to retrieve.
- Returns:
- model : Model
Queried instance.
- Raises:
- ValueError
If the passed project parameter value is of an unsupported type.
- Return type:
Model
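A minimal usage sketch; the project and model IDs below are placeholders:

import datarobot as dr

# Retrieve a model by its project and model IDs.
model = dr.Model.get('project-id', 'model-id')
print(model.model_type, model.metrics)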
- advanced_tune(params, description=None)
Generate a new model with the specified advanced-tuning parameters
As of v2.17, all models other than blenders, open source, prime, baseline and user-created support Advanced Tuning.
- Parameters:
- params : dict
Mapping of parameter ID to parameter value. The list of valid parameter IDs for a model can be found by calling get_advanced_tuning_parameters(). This endpoint does not need to include values for all parameters. If a parameter is omitted, its current_value will be used.
- description : str
Human-readable string describing the newly advanced-tuned model.
- Returns:
- ModelJob
The created job to build the model.
- Return type:
ModelJob
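A sketch of one possible tuning flow, assuming model was retrieved as above; the chosen parameter is illustrative only (it just re-submits the trained value to show the call shape):

# Discover valid parameter IDs, then submit a tuning job for one of them.
tuning_info = model.get_advanced_tuning_parameters()
param = tuning_info['tuning_parameters'][0]  # pick any listed parameter

job = model.advanced_tune(
    params={param['parameter_id']: param['default_value']},
    description='Example advanced-tuned model',
)
tuned_model = job.get_result_when_complete()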
- cross_validate()
Run cross validation on the model.
Note
To perform Cross Validation on a new model with new parameters, use train instead.
- Returns:
- ModelJob
The created job to build the model.
- delete()
Delete a model from the project’s leaderboard.
- Return type:
None
- download_scoring_code(file_name, source_code=False)
Download the Scoring Code JAR.
- Parameters:
- file_name : str
File path where the scoring code will be saved.
- source_code : bool, optional
Set to True to download the source code archive. It will not be executable.
- Return type:
None
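For example, assuming the model supports code generation (see get_supported_capabilities); the file paths are placeholders:

# Download the executable Scoring Code JAR to a local path.
model.download_scoring_code('/tmp/model.jar')

# Or download the human-readable source archive instead.
model.download_scoring_code('/tmp/model-source.jar', source_code=True)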
- download_training_artifact(file_name)
Retrieve trained artifact(s) from a model containing one or more custom tasks.
Artifact(s) will be downloaded to the specified local filepath.
- Parameters:
- file_name : str
File path where the trained model artifact(s) will be saved.
- classmethod from_data(data)
Instantiate an object of this class using a dict.
- Parameters:
- data : dict
Correctly snake_cased keys and their values.
- Return type:
T (bound=APIObject)
- get_advanced_tuning_parameters()
Get the advanced-tuning parameters available for this model.
As of v2.17, all models other than blenders, open source, prime, baseline and user-created support Advanced Tuning.
- Returns:
- dict
A dictionary describing the advanced-tuning parameters for the current model. There are two top-level keys, tuning_description and tuning_parameters.
tuning_description is an optional value. If not None, it indicates the user-specified description of this set of tuning parameters.
tuning_parameters is a list of dicts, each of which has the following keys:
parameter_name : (str) name of the parameter (unique per task, see below)
parameter_id : (str) opaque ID string uniquely identifying the parameter
default_value : (*) the actual value used to train the model; either the single value of the parameter specified before training, or the best value from the list of grid-searched values (based on current_value)
current_value : (*) the single value or list of values of the parameter that were grid searched. Depending on the grid search specification, this could be a single fixed value (no grid search), a list of discrete values, or a range.
task_name : (str) name of the task that this parameter belongs to
constraints : (dict) see the notes below
vertex_id : (str) ID of the vertex that this parameter belongs to
- Return type:
dict
Notes
The type of default_value and current_value is defined by the constraints structure. It will be a string or numeric Python type.
constraints is a dict with at least one, possibly more, of the following keys. The presence of a key indicates that the parameter may take on the specified type. (If a key is absent, this means that the parameter may not take on the specified type.) If a key on constraints is present, its value will be a dict containing all of the fields described below for that key.
"constraints": { "select": { "values": [<list(basestring or number) : possible values>] }, "ascii": {}, "unicode": {}, "int": { "min": <int : minimum valid value>, "max": <int : maximum valid value>, "supports_grid_search": <bool : True if Grid Search may be requested for this param> }, "float": { "min": <float : minimum valid value>, "max": <float : maximum valid value>, "supports_grid_search": <bool : True if Grid Search may be requested for this param> }, "intList": { "min_length": <int : minimum valid length>, "max_length": <int : maximum valid length> "min_val": <int : minimum valid value>, "max_val": <int : maximum valid value> "supports_grid_search": <bool : True if Grid Search may be requested for this param> }, "floatList": { "min_length": <int : minimum valid length>, "max_length": <int : maximum valid length> "min_val": <float : minimum valid value>, "max_val": <float : maximum valid value> "supports_grid_search": <bool : True if Grid Search may be requested for this param> } }
The keys have meaning as follows:
select: Rather than specifying a specific data type, if present, it indicates that the parameter is permitted to take on any of the specified values. Listed values may be of any string or real (non-complex) numeric type.
ascii: The parameter may be a unicode object that encodes simple ASCII characters. (A-Z, a-z, 0-9, whitespace, and certain common symbols.) In addition to listed constraints, ASCII keys currently may not contain either newlines or semicolons.
unicode: The parameter may be any Python unicode object.
int: The value may be an object of type int within the specified range (inclusive). Please note that the value will be passed around using the JSON format, and some JSON parsers have undefined behavior with integers outside of the range [-(2**53)+1, (2**53)-1].
float: The value may be an object of type float within the specified range (inclusive).
intList, floatList: The value may be a list of int or float objects, respectively, following constraints as specified respectively by the int and float types (above).
Many parameters only specify one key under constraints. If a parameter specifies multiple keys, the parameter may take on any value permitted by any key.
- get_all_confusion_charts(fallback_to_parent_insights=False)
Retrieve a list of all confusion matrices available for the model.
- Parameters:
- fallback_to_parent_insights : bool
(New in version v2.14) Optional. If True, this will return confusion chart data for this model’s parent for any source that is not available for this model, if this model has a defined parent model. If omitted or False, or if this model has no parent, this will not attempt to retrieve any data from this model’s parent.
- Returns:
- list of ConfusionChart
Data for all available confusion charts for the model.
- get_all_feature_impacts(data_slice_filter=None)
Retrieve a list of all feature impact results available for the model.
- Parameters:
- data_slice_filter : DataSlice, optional
A data slice used to filter the return values based on the dataslice.id. By default, this function uses data_slice_filter.id == None, which returns an unsliced insight. If data_slice_filter is None, then no data slice filtering will be applied when requesting the feature impacts.
- Returns:
- list of dicts
Data for all available model feature impacts, or an empty list if no data is found.
Examples
model = datarobot.Model(id='model-id', project_id='project-id')

# Get feature impact insights for sliced data
data_slice = datarobot.DataSlice(id='data-slice-id')
sliced_fi = model.get_all_feature_impacts(data_slice_filter=data_slice)

# Get feature impact insights for unsliced data
data_slice = datarobot.DataSlice()
unsliced_fi = model.get_all_feature_impacts(data_slice_filter=data_slice)

# Get all feature impact insights
all_fi = model.get_all_feature_impacts()
- get_all_lift_charts(fallback_to_parent_insights=False, data_slice_filter=None)
Retrieve a list of all Lift charts available for the model.
- Parameters:
- fallback_to_parent_insights : bool, optional
(New in version v2.14) Optional. If True, this will return lift chart data for this model’s parent for any source that is not available for this model, if this model has a defined parent model. If omitted or False, or if this model has no parent, this will not attempt to retrieve any data from this model’s parent.
- data_slice_filter : DataSlice, optional
Filters the returned lift chart by data_slice_filter.id. If None (the default), applies no filter based on data_slice_id.
- Returns:
- list of LiftChart
Data for all available model lift charts, or an empty list if no data is found.
Examples
model = datarobot.Model.get('project-id', 'model-id')

# Get lift chart insights for sliced data
data_slice = datarobot.DataSlice(id='data-slice-id')
sliced_lift_charts = model.get_all_lift_charts(data_slice_filter=data_slice)

# Get lift chart insights for unsliced data
unsliced_lift_charts = model.get_all_lift_charts(data_slice_filter=datarobot.DataSlice())

# Get all lift chart insights
all_lift_charts = model.get_all_lift_charts()
- get_all_multiclass_lift_charts(fallback_to_parent_insights=False)
Retrieve a list of all Lift charts available for the model.
- Parameters:
- fallback_to_parent_insights : bool
(New in version v2.14) Optional. If True, this will return lift chart data for this model’s parent for any source that is not available for this model, if this model has a defined parent model. If omitted or False, or if this model has no parent, this will not attempt to retrieve any data from this model’s parent.
- Returns:
- list of LiftChart
Data for all available model lift charts.
- get_all_residuals_charts(fallback_to_parent_insights=False, data_slice_filter=None)
Retrieve a list of all residuals charts available for the model.
- Parameters:
- fallback_to_parent_insights : bool
Optional. If True, this will return residuals chart data for this model’s parent for any source that is not available for this model, if this model has a defined parent model. If omitted or False, or if this model has no parent, this will not attempt to retrieve any data from this model’s parent.
- data_slice_filter : DataSlice, optional
Filters the returned residuals charts by data_slice_filter.id. If None (the default), applies no filter based on data_slice_id.
- Returns:
- list of ResidualsChart
Data for all available model residuals charts.
Examples
model = datarobot.Model.get('project-id', 'model-id')

# Get residuals chart insights for sliced data
data_slice = datarobot.DataSlice(id='data-slice-id')
sliced_residuals_charts = model.get_all_residuals_charts(data_slice_filter=data_slice)

# Get residuals chart insights for unsliced data
unsliced_residuals_charts = model.get_all_residuals_charts(data_slice_filter=datarobot.DataSlice())

# Get all residuals chart insights
all_residuals_charts = model.get_all_residuals_charts()
- get_all_roc_curves(fallback_to_parent_insights=False, data_slice_filter=None)
Retrieve a list of all ROC curves available for the model.
- Parameters:
- fallback_to_parent_insights : bool
(New in version v2.14) Optional. If True, this will return ROC curve data for this model’s parent for any source that is not available for this model, if this model has a defined parent model. If omitted or False, or if this model has no parent, this will not attempt to retrieve any data from this model’s parent.
- data_slice_filter : DataSlice, optional
Filters the returned roc_curve by data_slice_filter.id. If None (the default), applies no filter based on data_slice_id.
- Returns:
- list of RocCurve
Data for all available model ROC curves, or an empty list if no ROC curves are found.
Examples
model = datarobot.Model.get('project-id', 'model-id')
ds_filter = DataSlice(id='data-slice-id')

# Get roc curve insights for sliced data
sliced_roc = model.get_all_roc_curves(data_slice_filter=ds_filter)

# Get roc curve insights for unsliced data
unsliced_filter = DataSlice(id=None)
unsliced_roc = model.get_all_roc_curves(data_slice_filter=unsliced_filter)

# Get all roc curve insights
all_roc_curves = model.get_all_roc_curves()
- get_confusion_chart(source, fallback_to_parent_insights=False)
Retrieve a multiclass model’s confusion matrix for the specified source.
- Parameters:
- source : str
Confusion chart source. Check datarobot.enums.CHART_DATA_SOURCE for possible values.
- fallback_to_parent_insights : bool
(New in version v2.14) Optional. If True, this will return confusion chart data for this model’s parent if the confusion chart is not available for this model and this model has a defined parent model. If omitted or False, or if there is no parent model, this will not attempt to return insight data from this model’s parent.
- Returns:
- ConfusionChart
Model ConfusionChart data
- Raises:
- ClientError
If the insight is not available for this model
- get_cross_class_accuracy_scores()
Retrieves a list of Cross Class Accuracy scores for the model.
- Returns:
- json
- get_cross_validation_scores(partition=None, metric=None)
Return a dictionary, keyed by metric, showing cross validation scores per partition.
Cross Validation should already have been performed using
cross_validate
ortrain
.Note
Models that computed cross validation before this feature was added will need to be deleted and retrained before this method can be used.
- Parameters:
- partition : float
Optional. The ID of the partition (1, 2, 3.0, 4.0, etc.) to filter results by; can be a positive whole-number integer or float value. 0 corresponds to the validation partition.
- metric : unicode
Optional. Name of the metric to filter the resulting cross validation scores by.
- Returns:
- cross_validation_scores: dict
A dictionary keyed by metric showing cross validation scores per partition.
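A sketch of running cross validation and then reading the per-partition scores; the metric name is an assumption and depends on the project:

# Queue cross validation, wait for it, then fetch scores.
cv_job = model.cross_validate()
cv_job.get_result_when_complete()

scores = model.get_cross_validation_scores(metric='RMSE')  # metric name is project-dependent
print(scores)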
- get_data_disparity_insights(feature, class_name1, class_name2)
Retrieve a list of Cross Class Data Disparity insights for the model.
- Parameters:
- feature : str
Bias and Fairness protected feature name.
- class_name1 : str
One of the compared classes.
- class_name2 : str
Another compared class.
- Returns:
- json
- get_fairness_insights(fairness_metrics_set=None, offset=0, limit=100)
Retrieve a list of Per Class Bias insights for the model.
- Parameters:
- fairness_metrics_set : str, optional
Can be one of <datarobot.enums.FairnessMetricsSet>. The fairness metric used to calculate the fairness scores.
- offset : int, optional
Number of items to skip.
- limit : int, optional
Number of items to return.
- Returns:
- json
- get_feature_effect(source, data_slice_id=None)
Retrieve Feature Effects for the model.
Feature Effects provides partial dependence and predicted vs actual values for top-500 features ordered by feature impact score.
The partial dependence shows marginal effect of a feature on the target variable after accounting for the average effects of all other predictive features. It indicates how, holding all other variables except the feature of interest as they were, the value of this feature affects your prediction.
Requires that Feature Effects has already been computed with request_feature_effect.
See get_feature_effect_metadata for retrieving information on the available sources.
- Parameters:
- source : string
The source Feature Effects are retrieved for.
- data_slice_id : string, optional
ID for the data slice used in the request. If None, retrieve unsliced insight data.
- Returns:
- feature_effects : FeatureEffects
The feature effects data.
- Raises:
- ClientError (404)
If the feature effects have not been computed, or the source is not a valid value.
- get_feature_effect_metadata()
Retrieve Feature Effects metadata. Response contains status and available model sources.
Feature Effect for the training partition is always available, with the exception of older projects that only supported Feature Effect for validation.
When a model is trained into validation or holdout without stacked predictions (i.e., no out-of-sample predictions in those partitions), Feature Effects is not available for validation or holdout.
Feature Effects for holdout is not available when holdout was not unlocked for the project.
Use source to retrieve Feature Effects, selecting one of the provided sources.
- Returns:
- feature_effect_metadata: FeatureEffectMetadata
- get_feature_effects_multiclass(source='training', class_=None)
Retrieve Feature Effects for the multiclass model.
Feature Effects provide partial dependence and predicted vs actual values for top-500 features ordered by feature impact score.
The partial dependence shows marginal effect of a feature on the target variable after accounting for the average effects of all other predictive features. It indicates how, holding all other variables except the feature of interest as they were, the value of this feature affects your prediction.
Requires that Feature Effects has already been computed with request_feature_effect.
See get_feature_effect_metadata for retrieving information on the available sources.
- Parameters:
- source : str
The source Feature Effects are retrieved for.
- class_ : str or None
The class name Feature Effects are retrieved for.
- Returns:
- list
The list of multiclass feature effects.
- Raises:
- ClientError (404)
If Feature Effects have not been computed, or the source is not a valid value.
- get_feature_impact(with_metadata=False, data_slice_filter=<datarobot.models.model.Sentinel object>)
Retrieve the computed Feature Impact results, a measure of the relevance of each feature in the model.
Feature Impact is computed for each column by creating new data with that column randomly permuted (but the others left unchanged), and seeing how the error metric score for the predictions is affected. The ‘impactUnnormalized’ is how much worse the error metric score is when making predictions on this modified data. The ‘impactNormalized’ is normalized so that the largest value is 1. In both cases, larger values indicate more important features.
If a feature is redundant, i.e. it doesn’t contribute much once other features are considered, the ‘redundantWith’ value is the name of the feature that has the highest correlation with this feature. Note that redundancy detection is only available for jobs run after the addition of this feature. When retrieving data that predates this functionality, a NoRedundancyImpactAvailable warning will be issued.
Elsewhere this technique is sometimes called ‘Permutation Importance’.
Requires that Feature Impact has already been computed with request_feature_impact.
- Parameters:
- with_metadata : bool
The flag indicating if the result should include the metadata as well.
- data_slice_filter : DataSlice, optional
A data slice used to filter the return values based on the dataslice.id. By default, this function uses data_slice_filter.id == None, which returns an unsliced insight. If data_slice_filter is None, then get_feature_impact will raise a ValueError.
- Returns:
- list or dict
The feature impact data response depends on the with_metadata parameter. The response is either a dict with metadata and a list with the actual data, or just a list with that data.
Each list item is a dict with the keys featureName, impactNormalized, impactUnnormalized, redundantWith and count.
For the dict response, the available keys are:
featureImpacts - Feature Impact data as a dictionary. Each item is a dict with the keys featureName, impactNormalized, impactUnnormalized, and redundantWith.
shapBased - A boolean that indicates whether Feature Impact was calculated using Shapley values.
ranRedundancyDetection - A boolean that indicates whether redundant feature identification was run while calculating this Feature Impact.
rowCount - An integer or None that indicates the number of rows used to calculate Feature Impact. For Feature Impact calculated with the default logic, without specifying the rowCount, None is returned here.
count - An integer with the number of features under featureImpacts.
- Raises:
- ClientError (404)
If the feature impacts have not been computed.
- ValueError
If data_slice_filter is passed as None.
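As a sketch, the get_or_request_feature_impact convenience wrapper (documented below) avoids handling the 404 yourself; the keys follow the dict schema above:

# Compute (if needed) and fetch Feature Impact with metadata.
fi = model.get_or_request_feature_impact(with_metadata=True)
for item in fi['featureImpacts']:
    print(item['featureName'], item['impactNormalized'])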
- get_features_used()
Query the server to determine which features were used.
Note that the data returned by this method is possibly different than the names of the features in the featurelist used by this model. This method will return the raw features that must be supplied in order for predictions to be generated on a new set of data. The featurelist, in contrast, would also include the names of derived features.
- Returns:
- features : list of str
The names of the features used in the model.
- Return type:
List[str]
- get_frozen_child_models()
Retrieve the IDs for all models that are frozen from this model.
- Returns:
- A list of Models
- get_labelwise_roc_curves(source, fallback_to_parent_insights=False)
Retrieve a list of LabelwiseRocCurve instances for a multilabel model for the given source and all labels. This method is valid only for multilabel projects. For binary projects, use the Model.get_roc_curve API.
Added in version v2.24.
- Parameters:
- source : str
ROC curve data source. Check datarobot.enums.CHART_DATA_SOURCE for possible values.
- fallback_to_parent_insights : bool
Optional. If True, this will return ROC curve data for this model’s parent if the ROC curve is not available for this model and the model has a defined parent model. If omitted or False, or if there is no parent model, this will not attempt to return data from this model’s parent.
- Returns:
- list of LabelwiseRocCurve
Labelwise ROC Curve instances for source and all labels.
- Raises:
- ClientError
If the insight is not available for this model
- get_lift_chart(source, fallback_to_parent_insights=False, data_slice_filter=<datarobot.models.model.Sentinel object>)
Retrieve the model Lift chart for the specified source.
- Parameters:
- source : str
Lift chart data source. Check datarobot.enums.CHART_DATA_SOURCE for possible values. (New in version v2.23) For time series and OTV models, also accepts values backtest_2, backtest_3, …, up to the number of backtests in the model.
- fallback_to_parent_insights : bool
(New in version v2.14) Optional. If True, this will return lift chart data for this model’s parent if the lift chart is not available for this model and the model has a defined parent model. If omitted or False, or if there is no parent model, this will not attempt to return insight data from this model’s parent.
- data_slice_filter : DataSlice, optional
A data slice used to filter the return values based on the dataslice.id. By default, this function uses data_slice_filter.id == None, which returns an unsliced insight. If data_slice_filter is None, then get_lift_chart will raise a ValueError.
- Returns:
- LiftChart
Model lift chart data
- Raises:
- ClientError
If the insight is not available for this model
- ValueError
If data_slice_filter is passed as None.
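For instance ('validation' is one typical datarobot.enums.CHART_DATA_SOURCE value):

# Fetch the validation lift chart; bins hold predicted vs. actual values per bin.
lift = model.get_lift_chart('validation')
print(lift.source, len(lift.bins))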
- get_missing_report_info()
Retrieve a report on missing training data that can be used to understand missing values treatment in the model. The report consists of missing value resolutions for the numeric and categorical features that were used to build the model.
- Returns:
- An iterable of MissingReportPerFeature
The queried model missing report, sorted by missing count (descending order).
- get_model_blueprint_chart()
Retrieve a diagram that can be used to understand data flow in the blueprint.
- Returns:
- ModelBlueprintChart
The queried model blueprint chart.
- get_model_blueprint_documents()
Get documentation for tasks used in this model.
- Returns:
- list of BlueprintTaskDocument
All documents available for the model.
- get_model_blueprint_json()
Get the blueprint json representation used by this model.
- Returns:
- BlueprintJson
Json representation of the blueprint stages.
- Return type:
Dict[str, Tuple[List[str], List[str], str]]
- get_multiclass_feature_impact()
For multiclass it’s possible to calculate feature impact separately for each target class. The method for calculation is exactly the same, calculated in one-vs-all style for each target class.
Requires that Feature Impact has already been computed with request_feature_impact.
- Returns:
- feature_impacts : list of dict
The feature impact data. Each item is a dict with the keys ‘featureImpacts’ (list) and ‘class’ (str). Each item in ‘featureImpacts’ is a dict with the keys ‘featureName’, ‘impactNormalized’, ‘impactUnnormalized’, and ‘redundantWith’.
- Raises:
- ClientError (404)
If the multiclass feature impacts have not been computed.
- get_multiclass_lift_chart(source, fallback_to_parent_insights=False)
Retrieve model Lift chart for the specified source.
- Parameters:
- source : str
Lift chart data source. Check datarobot.enums.CHART_DATA_SOURCE for possible values.
- fallback_to_parent_insights : bool
Optional. If True, this will return lift chart data for this model’s parent if the lift chart is not available for this model and the model has a defined parent model. If omitted or False, or if there is no parent model, this will not attempt to return insight data from this model’s parent.
- Returns:
- list of LiftChart
Model lift chart data for each saved target class
- Raises:
- ClientError
If the insight is not available for this model
- get_multilabel_lift_charts(source, fallback_to_parent_insights=False)
Retrieve model Lift charts for the specified source.
Added in version v2.24.
- Parameters:
- source : str
Lift chart data source. Check datarobot.enums.CHART_DATA_SOURCE for possible values.
- fallback_to_parent_insights : bool
Optional. If True, this will return lift chart data for this model’s parent if the lift chart is not available for this model and the model has a defined parent model. If omitted or False, or if there is no parent model, this will not attempt to return insight data from this model’s parent.
- Returns:
- list of LiftChart
Model lift chart data for each saved target class
- Raises:
- ClientError
If the insight is not available for this model
- get_num_iterations_trained()
Retrieves the number of estimators trained by early-stopping tree-based models.
Added in version v2.22.
- Returns:
- projectId: str
The ID of the project containing the model.
- modelId: str
The ID of the model.
- data: array
List of numEstimatorsItem objects, one for each modeling stage.
- numEstimatorsItem will be of the form:
- stage: str
Indicates the modeling stage (for multi-stage models); None for single-stage models.
- numIterations: int
The number of estimators or iterations trained by the model.
- get_or_request_feature_effect(source, max_wait=600, row_count=None, data_slice_id=None)
Retrieve Feature Effects for the model, requesting a new job if it hasn’t been run previously.
See get_feature_effect_metadata for retrieving information on the source.
- Parameters:
- source : string
The source Feature Effects are retrieved for.
- max_wait : int, optional
The maximum time to wait for a requested Feature Effect job to complete before erroring.
- row_count : int, optional
(New in version v2.21) The sample size to use for Feature Impact computation. Minimum is 10 rows. Maximum is 100000 rows or the training sample size of the model, whichever is less.
- data_slice_id : str, optional
ID for the data slice used in the request. If None, request unsliced insight data.
- Returns:
- feature_effects : FeatureEffects
The Feature Effects data.
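A minimal sketch, assuming 'training' is among the sources reported by get_feature_effect_metadata:

# Compute (if needed) and retrieve Feature Effects for the training source.
fe = model.get_or_request_feature_effect(source='training', max_wait=1200)
print(fe)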
- get_or_request_feature_effects_multiclass(source, top_n_features=None, features=None, row_count=None, class_=None, max_wait=600)
Retrieve Feature Effects for the multiclass model, requesting a job if it hasn’t been run previously.
- Parameters:
- source : string
The source Feature Effects are retrieved for.
- class_ : str or None
The class name Feature Effects are retrieved for.
- row_count : int
The number of rows from the dataset to use for Feature Impact calculation.
- top_n_features : int or None
Number of top features (ranked by Feature Impact) used to calculate Feature Effects.
- features : list or None
The list of features used to calculate Feature Effects.
- max_wait : int, optional
The maximum time to wait for a requested Feature Effects job to complete before erroring.
- Returns:
- feature_effects : list of FeatureEffectsMulticlass
The list of multiclass feature effects data.
- get_or_request_feature_impact(max_wait=600, **kwargs)
Retrieve feature impact for the model, requesting a job if it hasn’t been run previously.
- Parameters:
- max_wait : int, optional
The maximum time to wait for a requested feature impact job to complete before erroring.
- **kwargs
Arbitrary keyword arguments passed to request_feature_impact.
- Returns:
- feature_impacts : list or dict
The feature impact data. See get_feature_impact for the exact schema.
- get_parameters()
Retrieve model parameters.
- Returns:
- ModelParameters
Model parameters for this model.
- get_pareto_front()
Retrieve the Pareto Front for a Eureqa model.
This method is only supported for Eureqa models.
- Returns:
- ParetoFront
Model ParetoFront data
- get_prime_eligibility()
Check if this model can be approximated with DataRobot Prime.
- Returns:
- prime_eligibility : dict
A dict indicating whether a model can be approximated with DataRobot Prime (key can_make_prime) and why it may be ineligible (key message).
- get_residuals_chart(source, fallback_to_parent_insights=False, data_slice_filter=<datarobot.models.model.Sentinel object>)
Retrieve model residuals chart for the specified source.
- Parameters:
- source : str
Residuals chart data source. Check datarobot.enums.CHART_DATA_SOURCE for possible values.
- fallback_to_parent_insights : bool
Optional. If True, this will return residuals chart data for this model’s parent if the residuals chart is not available for this model and the model has a defined parent model. If omitted or False, or if there is no parent model, this will not attempt to return residuals data from this model’s parent.
- data_slice_filter : DataSlice, optional
A data slice used to filter the return values based on the dataslice.id. By default, this function uses data_slice_filter.id == None, which returns an unsliced insight. If data_slice_filter is None, then get_residuals_chart will raise a ValueError.
- Returns:
- ResidualsChart
Model residuals chart data
- Raises:
- ClientError
If the insight is not available for this model
- ValueError
If data_slice_filter is passed as None.
- get_roc_curve(source, fallback_to_parent_insights=False, data_slice_filter=<datarobot.models.model.Sentinel object>)
Retrieve the ROC curve for a binary model for the specified source. This method is valid only for binary projects. For multilabel projects, use Model.get_labelwise_roc_curves.
- Parameters:
- source : str
ROC curve data source. Check datarobot.enums.CHART_DATA_SOURCE for possible values. (New in version v2.23) For time series and OTV models, also accepts values backtest_2, backtest_3, …, up to the number of backtests in the model.
- fallback_to_parent_insights : bool
(New in version v2.14) Optional. If True, this will return ROC curve data for this model’s parent if the ROC curve is not available for this model and the model has a defined parent model. If omitted or False, or if there is no parent model, this will not attempt to return data from this model’s parent.
- data_slice_filter : DataSlice, optional
A data slice used to filter the return values based on the dataslice.id. By default, this function uses data_slice_filter.id == None, which returns an unsliced insight. If data_slice_filter is None, then get_roc_curve will raise a ValueError.
- Returns:
- RocCurve
Model ROC curve data
- Raises:
- ClientError
If the insight is not available for this model
- (New in version v3.0) TypeError
If the underlying project type is multilabel
- ValueError
If data_slice_filter is passed as None.
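A sketch for a binary project; 'validation' is a typical source value:

# Retrieve the validation ROC curve (unsliced by default).
roc = model.get_roc_curve('validation')
print(roc.roc_points[0])  # each point carries threshold-level statistics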
- get_rulesets()
List the rulesets approximating this model generated by DataRobot Prime.
If this model hasn’t been approximated yet, this will return an empty list. Note that these are rulesets approximating this model, not rulesets used to construct this model.
- Returns:
- rulesets : list of Ruleset
- Return type:
List[Ruleset]
- get_supported_capabilities()
Retrieves a summary of the capabilities supported by a model.
Added in version v2.14.
- Returns:
- supportsBlending: bool
whether the model supports blending
- supportsMonotonicConstraints: bool
whether the model supports monotonic constraints
- hasWordCloud: bool
whether the model has word cloud data available
- eligibleForPrime: bool
(Deprecated in version v3.6) whether the model is eligible for Prime
- hasParameters: bool
whether the model has parameters that can be retrieved
- supportsCodeGeneration: bool
(New in version v2.18) whether the model supports code generation
- supportsShap: bool
(New in version v2.18) True if the model supports the Shapley package, i.e., Shapley-based feature importance.
- supportsEarlyStopping: bool
(New in version v2.22) True if this is an early stopping tree-based model and number of trained iterations can be retrieved.
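These capability flags can gate follow-up calls, as in this sketch:

caps = model.get_supported_capabilities()
if caps['supportsEarlyStopping']:
    # Only early-stopping tree-based models report trained iterations.
    print(model.get_num_iterations_trained())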
- get_uri()
- Returns:
- urlstr
Permanent static hyperlink to this model on the leaderboard.
- Return type:
str
- get_word_cloud(exclude_stop_words=False)
Retrieve word cloud data for the model.
- Parameters:
- exclude_stop_words : bool, optional
Set to True if you want stopwords filtered out of the response.
- Returns:
- WordCloud
Word cloud data for the model.
- incremental_train(data_stage_id, training_data_name=None)
Submit a job to the queue to perform incremental training on an existing model. See train_incremental documentation.
- Return type:
ModelJob
- classmethod list(project_id, sort_by_partition='validation', sort_by_metric=None, with_metric=None, search_term=None, featurelists=None, families=None, blueprints=None, labels=None, characteristics=None, training_filters=None, number_of_clusters=None, limit=100, offset=0)
Retrieve paginated model records, sorted by scores, with optional filtering.
- Parameters:
- sort_by_partition: str, one of `validation`, `backtesting`, `crossValidation` or `holdout`
Set the partition to use for sorted (by score) list of models. validation is the default.
- sort_by_metric: str
Set the project metric to use for model sorting. DataRobot-selected project optimization metric is the default.
- with_metric: str
For a single-metric list of results, specify that project metric.
- search_term: str
If specified, only models containing the term in their name or processes are returned.
- featurelists: list of str
If specified, only models trained on selected featurelists are returned.
- families: list of str
If specified, only models belonging to selected families are returned.
- blueprints: list of str
If specified, only models trained on specified blueprint IDs are returned.
- labels: list of str, `starred` or `prepared for deployment`
If specified, only models tagged with all listed labels are returned.
- characteristics: list of str
If specified, only models matching all listed characteristics are returned.
- training_filters: list of str
If specified, only models matching at least one of the listed training conditions are returned. The following formats are supported.
For autoML and datetime partitioned projects:
- number of rows in training subset
For datetime partitioned projects:
- <training duration>, for example P6Y0M0D
- <training_duration>-<time_window_sample_percent>-<sampling_method>, for example P6Y0M0D-78-Random (returns models trained on 6 years of data, with a 78% sampling rate, random sampling)
- start/end date
- project settings
- number_of_clusters: list of int
Filter models by number of clusters. Applicable only in unsupervised clustering projects.
- limit: int
- offset: int
- Returns:
- generic_models: list of GenericModel
- Return type:
List[GenericModel]
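For example (the project ID and filter values are illustrative):

import datarobot as dr

# List up to 20 starred models, sorted by validation score.
models = dr.Model.list('project-id', labels=['starred'], limit=20)
for m in models:
    print(m.model_type, m.metrics)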
- open_in_browser()
Opens the class’ relevant web browser location. If the default browser is not available, the URL is logged.
Note: If text-mode browsers are used, the calling process will block until the user exits the browser.
- Return type:
None
- request_approximation()
Request an approximation of this model using DataRobot Prime
This will create several rulesets that could be used to approximate this model. After comparing their scores and rule counts, the code used in the approximation can be downloaded and run locally.
- Returns:
- jobJob
the job generating the rulesets
- request_cross_class_accuracy_scores()
Request Cross Class Accuracy scores to be computed for the model.
- Returns:
- status_id : str
A statusId of the computation request.
- request_data_disparity_insights(feature, compared_class_names)
Request data disparity insights to be computed for the model.
- Parameters:
- feature : str
Bias and Fairness protected feature name.
- compared_class_names : list(str)
List of two classes to compare.
- Returns:
- status_id : str
A statusId of the computation request.
- request_external_test(dataset_id, actual_value_column=None)
Request external test to compute scores and insights on an external test dataset
- Parameters:
- dataset_id : string
The dataset to make predictions against (as uploaded from Project.upload_dataset).
- actual_value_column : string, optional
(New in version v2.21) For time series unsupervised projects only. The actual value column can be used to calculate the classification metrics and insights on the prediction dataset. Can’t be provided with the forecast_point parameter.
- Returns:
- job : Job
A Job representing the external dataset insights computation.
- request_fairness_insights(fairness_metrics_set=None)
Request fairness insights to be computed for the model.
- Parameters:
- fairness_metrics_set : str, optional
Can be one of <datarobot.enums.FairnessMetricsSet>. The fairness metric used to calculate the fairness scores.
- Returns:
- status_id : str
A statusId of the computation request.
- Return type:
str
- request_feature_effect(row_count=None, data_slice_id=None)
Submit request to compute Feature Effects for the model.
See get_feature_effect for more information on the result of the job.
- Parameters:
- row_count : int
(New in version v2.21) The sample size to use for Feature Impact computation. Minimum is 10 rows. Maximum is 100000 rows or the training sample size of the model, whichever is less.
- data_slice_id : str, optional
ID for the data slice used in the request. If None, request unsliced insight data.
- Returns:
- job : Job
A Job representing the feature effect computation. To get the completed feature effect data, use job.get_result or job.get_result_when_complete.
- Raises:
- JobAlreadyRequested (422)
If the feature effect has already been requested.
- request_feature_effects_multiclass(row_count=None, top_n_features=None, features=None)
Request Feature Effects computation for the multiclass model.
See get_feature_effect for more information on the result of the job.
- Parameters:
- row_count : int
The number of rows from the dataset to use for Feature Impact calculation.
- top_n_features : int or None
Number of top features (ranked by feature impact) used to calculate Feature Effects.
- features : list or None
The list of features used to calculate Feature Effects.
- Returns:
- job : Job
A Job representing Feature Effect computation. To get the completed Feature Effect data, use job.get_result or job.get_result_when_complete.
- request_feature_impact(row_count=None, with_metadata=False, data_slice_id=None)
Request feature impacts to be computed for the model.
See get_feature_impact for more information on the result of the job.
- Parameters:
- row_count : int, optional
The sample size (specified in rows) to use for Feature Impact computation. This is not supported for unsupervised, multiclass (which has a separate method), and time series projects.
- with_metadata : bool, optional
Flag indicating whether the result should include the metadata. If True, metadata is included.
- data_slice_id : str, optional
ID for the data slice used in the request. If None, request unsliced insight data.
- Returns:
- job : Job or status_id
Job representing the Feature Impact computation. To retrieve the completed Feature Impact data, use job.get_result or job.get_result_when_complete.
- Raises:
- JobAlreadyRequested (422)
If the feature impacts have already been requested.
- request_frozen_datetime_model(training_row_count=None, training_duration=None, training_start_date=None, training_end_date=None, time_window_sample_pct=None, sampling_method=None)
Train a new frozen model with parameters from this model.
Requires that this model belongs to a datetime partitioned project. If it does not, an error will occur when submitting the job.
Frozen models use the same tuning parameters as their parent model instead of independently optimizing them to allow efficiently retraining models on larger amounts of the training data.
In addition to training_row_count and training_duration, frozen datetime models may be trained on an exact date range. Only one of training_row_count, training_duration, or training_start_date and training_end_date should be specified.
Models specified using training_start_date and training_end_date are the only ones that can be trained into the holdout data (once the holdout is unlocked).
All durations should be specified with a duration string such as those returned by the partitioning_methods.construct_duration_string helper method. Please see the datetime partitioned project documentation for more information on duration strings.
- Parameters:
- training_row_count : int, optional
The number of rows of data that should be used to train the model. If specified, training_duration may not be specified.
- training_duration : str, optional
A duration string specifying what time range the data used to train the model should span. If specified, training_row_count may not be specified.
- training_start_date : datetime.datetime, optional
The start date of the data to train the model on. Only rows occurring at or after this datetime will be used. If training_start_date is specified, training_end_date must also be specified.
- training_end_date : datetime.datetime, optional
The end date of the data to train the model on. Only rows occurring strictly before this datetime will be used. If training_end_date is specified, training_start_date must also be specified.
- time_window_sample_pct : int, optional
May only be specified when the requested model is a time window (e.g. duration or start and end dates). An integer between 1 and 99 indicating the percentage to sample by within the window. The points kept are determined by a random uniform sample. If specified, training_duration must also be specified; otherwise, an error will occur.
- sampling_method : str, optional
(New in version v2.23) Defines the way training data is selected. Can be either random or latest. In combination with training_row_count, defines how rows are selected from the backtest (latest by default). When training data is defined using a time range (training_duration or use_project_settings), this setting changes the way time_window_sample_pct is applied (random by default). Applicable to OTV projects only.
- Returns:
- model_job : ModelJob
The modeling job training a frozen model.
- Return type:
ModelJob
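A sketch for a datetime partitioned project, using the duration helper referenced above (the import path is an assumption based on that reference):

from datarobot.helpers.partitioning_methods import construct_duration_string

# Freeze this model's parameters and retrain on one year of data.
duration = construct_duration_string(years=1)
model_job = model.request_frozen_datetime_model(training_duration=duration)
frozen_model = model_job.get_result_when_complete()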
- request_frozen_model(sample_pct=None, training_row_count=None)
Train a new frozen model with parameters from this model.
Note
This method only works if the project the model belongs to is not datetime partitioned. If it is, use request_frozen_datetime_model instead.
Frozen models use the same tuning parameters as their parent model instead of independently optimizing them, to allow efficiently retraining models on larger amounts of the training data.
- Parameters:
- sample_pct : float
Optional. The percentage of the dataset to use with the model. If not provided, will use the value from this model.
- training_row_count : int
(New in version v2.9) Optional. The integer number of rows of the dataset to use with the model. Only one of sample_pct and training_row_count should be specified.
- Returns:
- model_job : ModelJob
The modeling job training a frozen model.
- Return type:
ModelJob
- request_lift_chart(source, data_slice_id=None)
Request the model Lift Chart for the specified source.
- Parameters:
- source : str
Lift chart data source. Check datarobot.enums.CHART_DATA_SOURCE for possible values.
- data_slice_id : string, optional
ID for the data slice used in the request. If None, request unsliced insight data.
- Returns:
- status_check_job : StatusCheckJob
Object containing all needed logic for a periodical status check of an async job.
- Return type:
StatusCheckJob
- request_per_class_fairness_insights(fairness_metrics_set=None)
Request per-class fairness insights be computed for the model.
- Parameters:
- fairness_metrics_set : str, optional
The fairness metric used to calculate the fairness scores. Value can be any one of <datarobot.enums.FairnessMetricsSet>.
- Returns:
- status_check_job : StatusCheckJob
The returned object contains all needed logic for a periodical status check of an async job.
- Return type:
StatusCheckJob
- request_predictions(dataset_id=None, dataset=None, dataframe=None, file_path=None, file=None, include_prediction_intervals=None, prediction_intervals_size=None, forecast_point=None, predictions_start_date=None, predictions_end_date=None, actual_value_column=None, explanation_algorithm=None, max_explanations=None, max_ngram_explanations=None)
Requests predictions against a previously uploaded dataset.
- Parameters:
- dataset_id : string, optional
The ID of the dataset to make predictions against (as uploaded from Project.upload_dataset).
- dataset : Dataset, optional
The dataset to make predictions against (as uploaded from Project.upload_dataset).
- dataframe : pd.DataFrame, optional
(New in v3.0) The dataframe to make predictions against.
- file_path : str, optional
(New in v3.0) Path to a file to make predictions against.
- file : IOBase, optional
(New in v3.0) File to make predictions against.
- include_prediction_intervals : bool, optional
(New in v2.16) For time series projects only. Specifies whether prediction intervals should be calculated for this request. Defaults to True if prediction_intervals_size is specified, otherwise defaults to False.
- prediction_intervals_size : int, optional
(New in v2.16) For time series projects only. Represents the percentile to use for the size of the prediction intervals. Defaults to 80 if include_prediction_intervals is True. Prediction intervals size must be between 1 and 100 (inclusive).
- forecast_point : datetime.datetime or None, optional
(New in version v2.20) For time series projects only. This is the default point relative to which predictions will be generated, based on the forecast window of the project. See the time series prediction documentation for more information.
- predictions_start_date : datetime.datetime or None, optional
(New in version v2.20) For time series projects only. The start date for bulk predictions. Note that this parameter is for generating historical predictions using the training data. This parameter should be provided in conjunction with predictions_end_date. Can’t be provided with the forecast_point parameter.
- predictions_end_date : datetime.datetime or None, optional
(New in version v2.20) For time series projects only. The end date for bulk predictions, exclusive. Note that this parameter is for generating historical predictions using the training data. This parameter should be provided in conjunction with predictions_start_date. Can’t be provided with the forecast_point parameter.
- actual_value_column : string, optional
(New in version v2.21) For time series unsupervised projects only. The actual value column can be used to calculate the classification metrics and insights on the prediction dataset. Can’t be provided with the forecast_point parameter.
- explanation_algorithm : optional
(New in version v2.21) If set to ‘shap’, the response will include prediction explanations based on the SHAP explainer (SHapley Additive exPlanations). Defaults to null (no prediction explanations).
- max_explanations : int, optional
(New in version v2.21) Specifies the maximum number of explanation values that should be returned for each row, ordered by absolute value, greatest to least. If null, no limit. In the case of ‘shap’: if the number of features is greater than the limit, the sum of remaining values will also be returned as shapRemainingTotal. Defaults to null. Cannot be set if explanation_algorithm is omitted.
- max_ngram_explanations : int or str, optional
(New in version v2.29) Specifies the maximum number of text explanation values that should be returned. If set to all, text explanations will be computed and all the ngram explanations will be returned. If set to a non-zero positive integer value, text explanations will be computed and this number of descendingly sorted ngram explanations will be returned. By default, text explanations won’t be triggered to be computed.
- Returns:
- job : PredictJob
The job computing the predictions.
- Return type:
PredictJob
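A typical flow (a sketch; assumes a project object is at hand, and the file name is a placeholder):

# Upload a prediction dataset, then score it with this model.
dataset = project.upload_dataset('to_predict.csv')
predict_job = model.request_predictions(dataset_id=dataset.id)
predictions = predict_job.get_result_when_complete()  # a pandas DataFrame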
- request_residuals_chart(source, data_slice_id=None)
Request the model residuals chart for the specified source.
- Parameters:
- source : str
Residuals chart data source. Check datarobot.enums.CHART_DATA_SOURCE for possible values.
- data_slice_id : string, optional
ID for the data slice used in the request. If None, request unsliced insight data.
- Returns:
- status_check_job : StatusCheckJob
Object containing all needed logic for a periodical status check of an async job.
- Return type:
StatusCheckJob
- request_roc_curve(source, data_slice_id=None)
Request the model Roc Curve for the specified source.
- Parameters:
- source : str
ROC curve data source. Check datarobot.enums.CHART_DATA_SOURCE for possible values.
- data_slice_id : string, optional
ID for the data slice used in the request. If None, request unsliced insight data.
- Returns:
- status_check_job : StatusCheckJob
Object containing all needed logic for a periodical status check of an async job.
- Return type:
StatusCheckJob
- request_training_predictions(data_subset, explanation_algorithm=None, max_explanations=None)
Start a job to build training predictions.
- Parameters:
- data_subset : str
Data set definition to build predictions on. Choices are:
- dr.enums.DATA_SUBSET.ALL or string all: all data available. Not valid for models in datetime partitioned projects.
- dr.enums.DATA_SUBSET.VALIDATION_AND_HOLDOUT or string validationAndHoldout: all data except the training set. Not valid for models in datetime partitioned projects.
- dr.enums.DATA_SUBSET.HOLDOUT or string holdout: holdout data set only.
- dr.enums.DATA_SUBSET.ALL_BACKTESTS or string allBacktests: downloads the predictions for all backtest validation folds. Requires the model to have successfully scored all backtests. Datetime partitioned projects only.
- explanation_algorithm : dr.enums.EXPLANATIONS_ALGORITHM
(New in v2.21) Optional. If set to dr.enums.EXPLANATIONS_ALGORITHM.SHAP, the response will include prediction explanations based on the SHAP explainer (SHapley Additive exPlanations). Defaults to None (no prediction explanations).
- max_explanations : int
(New in v2.21) Optional. Specifies the maximum number of explanation values that should be returned for each row, ordered by absolute value, greatest to least. In the case of dr.enums.EXPLANATIONS_ALGORITHM.SHAP: if not set, explanations are returned for all features. If the number of features is greater than max_explanations, the sum of remaining values will also be returned as shap_remaining_total. Max 100. Defaults to null for datasets narrower than 100 columns, and to 100 for datasets wider than 100 columns. Ignored if explanation_algorithm is not set.
- Returns:
- Job
An instance of the created async job.
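For example (a sketch; the holdout must be available for this subset choice):

import datarobot as dr

# Request training predictions on the holdout subset and wait for them.
job = model.request_training_predictions(dr.enums.DATA_SUBSET.HOLDOUT)
training_predictions = job.get_result_when_complete()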
- retrain(sample_pct=None, featurelist_id=None, training_row_count=None, n_clusters=None)
Submit a job to the queue to retrain the model.
- Parameters:
- sample_pct : float, optional
The sample size in percent (1 to 100) to use in training. If this parameter is used, then training_row_count should not be given.
- featurelist_id : str, optional
The featurelist ID.
- training_row_count : int, optional
The number of rows used to train the model. If this parameter is used, then sample_pct should not be given.
- n_clusters : int, optional
(New in version 2.27) The number of clusters to use in an unsupervised clustering model. This parameter is used only for unsupervised clustering models that do not determine the number of clusters automatically.
- Returns:
- jobModelJob
The created job that is retraining the model
- Return type:
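Examples
An illustrative sketch with hypothetical IDs:
job = model.retrain(featurelist_id='featurelist-id', training_row_count=10000)
retrained_model = job.get_result_when_complete()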
- set_prediction_threshold(threshold)
Set a custom prediction threshold for the model.
May not be used once
prediction_threshold_read_only
is True for this model.- Parameters:
- thresholdfloat
only used for binary classification projects. The threshold to use when deciding between the positive and negative classes when making predictions. Should be between 0.0 and 1.0 (inclusive).
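Examples
A minimal sketch for a binary classification model:
import datarobot as dr

model = dr.Model.get('project-id', 'model-id')
# Threshold edits are rejected once the model has a deployment or served predictions
if not model.prediction_threshold_read_only:
    model.set_prediction_threshold(0.6)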
- star_model()
Mark the model as starred.
Model stars propagate to the web application and the API, and can be used to filter when listing models.
- Return type:
None
- start_advanced_tuning_session()
Start an Advanced Tuning session. Returns an object that helps set up arguments for an Advanced Tuning model execution.
As of v2.17, all models other than blenders, open source, prime, baseline and user-created support Advanced Tuning.
- Returns:
- AdvancedTuningSession
Session for setting up and running Advanced Tuning on a model
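Examples
A sketch of a tuning session; the task and parameter names here are hypothetical and should be discovered via get_advanced_tuning_parameters():
session = model.start_advanced_tuning_session()
print(session.get_task_names())
# Override a single parameter and submit the tuned model to the queue
session.set_parameter(task_name='Gradient Boosted Trees Classifier',
                      parameter_name='learning_rate', value=0.05)
job = session.run()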
- train(sample_pct=None, featurelist_id=None, scoring_type=None, training_row_count=None, monotonic_increasing_featurelist_id=<object object>, monotonic_decreasing_featurelist_id=<object object>)
Train the blueprint used in model on a particular featurelist or amount of data.
This method creates a new training job for worker and appends it to the end of the queue for this project. After the job has finished you can get the newly trained model by retrieving it from the project leaderboard, or by retrieving the result of the job.
Either sample_pct or training_row_count can be used to specify the amount of data to use, but not both. If neither are specified, a default of the maximum amount of data that can safely be used to train any blueprint without going into the validation data will be selected.
In smart-sampled projects, sample_pct and training_row_count are assumed to be in terms of rows of the minority class.
Note
For datetime partitioned projects, see
train_datetime
instead.- Parameters:
- sample_pctfloat, optional
The amount of data to use for training, as a percentage of the project dataset from 0 to 100.
- featurelist_idstr, optional
The identifier of the featurelist to use. If not defined, the featurelist of this model is used.
- scoring_typestr, optional
Either validation or crossValidation (also dr.SCORING_TYPE.validation or dr.SCORING_TYPE.cross_validation). validation is available for every partitioning type, and indicates that the default model validation should be used for the project. If the project uses a form of cross-validation partitioning, crossValidation can also be used to indicate that all of the available training/validation combinations should be used to evaluate the model.
- training_row_countint, optional
The number of rows to use to train the requested model.
- monotonic_increasing_featurelist_idstr
(new in version 2.11) optional, the ID of the featurelist that defines the set of features with a monotonically increasing relationship to the target. Passing None disables the increasing monotonicity constraint. The default (dr.enums.MONOTONICITY_FEATURELIST_DEFAULT) is the one specified by the blueprint.
- monotonic_decreasing_featurelist_idstr
(new in version 2.11) optional, the ID of the featurelist that defines the set of features with a monotonically decreasing relationship to the target. Passing None disables the decreasing monotonicity constraint. The default (dr.enums.MONOTONICITY_FEATURELIST_DEFAULT) is the one specified by the blueprint.
- Returns:
- model_job_idstr
ID of the created job; can be used as a parameter to the ModelJob.get method or the wait_for_async_model_creation function
Examples
project = Project.get('project-id')
model = Model.get('project-id', 'model-id')
model_job_id = model.train(training_row_count=project.max_train_rows)
- train_datetime(featurelist_id=None, training_row_count=None, training_duration=None, time_window_sample_pct=None, monotonic_increasing_featurelist_id=<object object>, monotonic_decreasing_featurelist_id=<object object>, use_project_settings=False, sampling_method=None, n_clusters=None)
Trains this model on a different featurelist or sample size.
Requires that this model is part of a datetime partitioned project; otherwise, an error will occur.
All durations should be specified with a duration string such as those returned by the
partitioning_methods.construct_duration_string
helper method. Please see datetime partitioned project documentation for more information on duration strings.- Parameters:
- featurelist_idstr, optional
the featurelist to use to train the model. If not specified, the featurelist of this model is used.
- training_row_countint, optional
the number of rows of data that should be used to train the model. If specified, neither training_duration nor use_project_settings may be specified.
- training_durationstr, optional
a duration string specifying what time range the data used to train the model should span. If specified, neither training_row_count nor use_project_settings may be specified.
- use_project_settingsbool, optional
(New in version v2.20) defaults to False. If True, indicates that the custom backtest partitioning settings specified by the user will be used to train the model and evaluate backtest scores. If specified, neither training_row_count nor training_duration may be specified.
- time_window_sample_pctint, optional
may only be specified when the requested model is a time window (e.g. duration or start and end dates). An integer between 1 and 99 indicating the percentage to sample by within the window. The points kept are determined by a random uniform sample. If specified, training_duration must also be specified; otherwise an error will occur.
- sampling_methodstr, optional
(New in version v2.23) defines the way training data is selected. Can be either random or latest. In combination with training_row_count, defines how rows are selected from the backtest (latest by default). When training data is defined using a time range (training_duration or use_project_settings), this setting changes the way time_window_sample_pct is applied (random by default). Applicable to OTV projects only.
- monotonic_increasing_featurelist_idstr, optional
(New in version v2.18) optional, the ID of the featurelist that defines the set of features with a monotonically increasing relationship to the target. Passing None disables the increasing monotonicity constraint. The default (dr.enums.MONOTONICITY_FEATURELIST_DEFAULT) is the one specified by the blueprint.
- monotonic_decreasing_featurelist_idstr, optional
(New in version v2.18) optional, the ID of the featurelist that defines the set of features with a monotonically decreasing relationship to the target. Passing None disables the decreasing monotonicity constraint. The default (dr.enums.MONOTONICITY_FEATURELIST_DEFAULT) is the one specified by the blueprint.
- n_clusters: int, optional
(New in version 2.27) number of clusters to use in an unsupervised clustering model. This parameter is used only for unsupervised clustering models that don’t automatically determine the number of clusters.
- Returns:
- jobModelJob
the created job to build the model
- Return type:
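Examples
A minimal sketch for a datetime partitioned project; the one-year duration is illustrative:
from datarobot.helpers import partitioning_methods

duration = partitioning_methods.construct_duration_string(years=1)
model_job = model.train_datetime(training_duration=duration)
new_model = model_job.get_result_when_complete()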
- train_incremental(data_stage_id, training_data_name=None, data_stage_encoding=None, data_stage_delimiter=None, data_stage_compression=None)
Submit a job to the queue to perform incremental training on an existing model using additional data. The ID of the additional data to use for training is specified with the data_stage_id. Optionally, a name for the iteration can be supplied to help identify the contents of the data in the iteration.
This functionality requires the INCREMENTAL_LEARNING feature flag to be enabled.
- Parameters:
- data_stage_id: str
The id of the data stage to use for training.
- training_data_namestr, optional
The name of the iteration or data stage to indicate what the incremental learning was performed on.
- data_stage_encodingstr, optional
The encoding type of the data in the data stage (default: UTF-8). Supported formats: UTF-8, ASCII, WINDOWS1252
- data_stage_delimiterstr, optional
The delimiter used by the data in the data stage (default: ‘,’).
- data_stage_compressionstr, optional
The compression type of the data stage file, e.g. ‘zip’ (default: None). Supported formats: zip
- Returns:
- jobModelJob
The created job that is retraining the model
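Examples
Illustrative only; requires the INCREMENTAL_LEARNING feature flag and uses a hypothetical data stage ID:
job = model.train_incremental('data-stage-id', training_data_name='2024-03 increment')
updated_model = job.get_result_when_complete()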
- unstar_model()
Unmark the model as starred.
Model stars propagate to the web application and the API, and can be used to filter when listing models.
- Return type:
None
- class datarobot.models.model.AdvancedTuningParamsType(*args, **kwargs)
- class datarobot.models.model.BiasMitigationFeatureInfo(messages)
PrimeModel
- class datarobot.models.PrimeModel(id=None, processes=None, featurelist_name=None, featurelist_id=None, project_id=None, sample_pct=None, model_type=None, model_category=None, is_frozen=None, blueprint_id=None, metrics=None, ruleset_id=None, rule_count=None, score=None, monotonic_increasing_featurelist_id=None, monotonic_decreasing_featurelist_id=None, supports_monotonic_constraints=None, is_starred=None, prediction_threshold=None, prediction_threshold_read_only=None, model_number=None, parent_model_id=None, supports_composable_ml=None, training_row_count=None, training_duration=None, training_start_date=None, training_end_date=None, data_selection_method=None, time_window_sample_pct=None, sampling_method=None, model_family_full_name=None, is_trained_into_validation=None, is_trained_into_holdout=None)
Represents a DataRobot Prime model approximating a parent model with downloadable code.
All durations are specified with a duration string such as those returned by the
partitioning_methods.construct_duration_string
helper method. Please see datetime partitioned project documentation for more information on duration strings.- Attributes:
- idstr
the id of the model
- project_idstr
the id of the project the model belongs to
- processeslist of str
the processes used by the model
- featurelist_namestr
the name of the featurelist used by the model
- featurelist_idstr
the id of the featurelist used by the model
- sample_pctfloat
the percentage of the project dataset used in training the model
- training_row_countint or None
the number of rows of the project dataset used in training the model. In a datetime partitioned project, if specified, defines the number of rows used to train the model and evaluate backtest scores; if unspecified, either training_duration or training_start_date and training_end_date was used to determine that instead.
- training_durationstr or None
only present for models in datetime partitioned projects. If specified, a duration string specifying the duration spanned by the data used to train the model and evaluate backtest scores.
- training_start_datedatetime or None
only present for frozen models in datetime partitioned projects. If specified, the start date of the data used to train the model.
- training_end_datedatetime or None
only present for frozen models in datetime partitioned projects. If specified, the end date of the data used to train the model.
- model_typestr
what model this is, e.g. ‘DataRobot Prime’
- model_categorystr
what kind of model this is - always ‘prime’ for DataRobot Prime models
- is_frozenbool
whether this model is a frozen model
- blueprint_idstr
the id of the blueprint used in this model
- metricsdict
a mapping from each metric to the model’s scores for that metric
- rulesetRuleset
the ruleset used in the Prime model
- parent_model_idstr
the id of the model that this Prime model approximates
- monotonic_increasing_featurelist_idstr
optional, the id of the featurelist that defines the set of features with a monotonically increasing relationship to the target. If None, no such constraints are enforced.
- monotonic_decreasing_featurelist_idstr
optional, the id of the featurelist that defines the set of features with a monotonically decreasing relationship to the target. If None, no such constraints are enforced.
- supports_monotonic_constraintsbool
optional, whether this model supports enforcing monotonic constraints
- is_starredbool
whether this model is marked as starred
- prediction_thresholdfloat
for binary classification projects, the threshold used for predictions
- prediction_threshold_read_onlybool
indicates whether modification of the prediction threshold is forbidden. Threshold modification is forbidden once a model has had a deployment created or predictions made via the dedicated prediction API.
- supports_composable_mlbool or None
(New in version v2.26) whether this model is supported in Composable ML.
- classmethod get(project_id, model_id)
Retrieve a specific prime model.
- Parameters:
- project_idstr
The id of the project the prime model belongs to
- model_idstr
The
model_id
of the prime model to retrieve.
- Returns:
- modelPrimeModel
The queried instance.
- request_download_validation(language)
Prep and validate the downloadable code for the ruleset associated with this model.
- Parameters:
- languagestr
the language the code should be downloaded in - see
datarobot.enums.PRIME_LANGUAGE
for available languages
- Returns:
- jobJob
A job tracking the code preparation and validation
- advanced_tune(params, description=None)
Generate a new model with the specified advanced-tuning parameters
As of v2.17, all models other than blenders, open source, prime, baseline and user-created support Advanced Tuning.
- Parameters:
- paramsdict
Mapping of parameter ID to parameter value. The list of valid parameter IDs for a model can be found by calling get_advanced_tuning_parameters(). The request does not need to include values for all parameters; if a parameter is omitted, its current_value will be used.
- descriptionstr
Human-readable string describing the newly advanced-tuned model
- Returns:
- ModelJob
The created job to build the model
- Return type:
- cross_validate()
Run cross validation on the model.
Note
To perform Cross Validation on a new model with new parameters, use
train
instead.- Returns:
- ModelJob
The created job to build the model
- delete()
Delete a model from the project’s leaderboard.
- Return type:
None
- download_scoring_code(file_name, source_code=False)
Download the Scoring Code JAR.
- Parameters:
- file_namestr
File path where scoring code will be saved.
- source_codebool, optional
Set to True to download source code archive. It will not be executable.
- Return type:
None
- download_training_artifact(file_name)
Retrieve trained artifact(s) from a model containing one or more custom tasks.
Artifact(s) will be downloaded to the specified local filepath.
- Parameters:
- file_namestr
File path where trained model artifact(s) will be saved.
- classmethod from_data(data)
Instantiate an object of this class using a dict.
- Parameters:
- datadict
Correctly snake_cased keys and their values.
- Return type:
TypeVar
(T
, bound= APIObject)
- get_advanced_tuning_parameters()
Get the advanced-tuning parameters available for this model.
As of v2.17, all models other than blenders, open source, prime, baseline and user-created support Advanced Tuning.
- Returns:
- dict
A dictionary describing the advanced-tuning parameters for the current model. There are two top-level keys, tuning_description and tuning_parameters.
tuning_description is an optional value. If not None, it indicates the user-specified description of this set of tuning parameters.
tuning_parameters is a list of dicts, each of which has the following keys:
parameter_name : (str) name of the parameter (unique per task, see below)
parameter_id : (str) opaque ID string uniquely identifying parameter
default_value : (*) the actual value used to train the model; either the single value of the parameter specified before training, or the best value from the list of grid-searched values (based on current_value)
current_value : (*) the single value or list of values of the parameter that were grid searched. Depending on the grid search specification, could be a single fixed value (no grid search), a list of discrete values, or a range.
task_name : (str) name of the task that this parameter belongs to
constraints: (dict) see the notes below
vertex_id: (str) ID of vertex that this parameter belongs to
- Return type:
Notes
The type of default_value and current_value is defined by the constraints structure. It will be a string or numeric Python type.
constraints is a dict with at least one, possibly more, of the following keys. The presence of a key indicates that the parameter may take on the specified type. (If a key is absent, this means that the parameter may not take on the specified type.) If a key on constraints is present, its value will be a dict containing all of the fields described below for that key.
"constraints": { "select": { "values": [<list(basestring or number) : possible values>] }, "ascii": {}, "unicode": {}, "int": { "min": <int : minimum valid value>, "max": <int : maximum valid value>, "supports_grid_search": <bool : True if Grid Search may be requested for this param> }, "float": { "min": <float : minimum valid value>, "max": <float : maximum valid value>, "supports_grid_search": <bool : True if Grid Search may be requested for this param> }, "intList": { "min_length": <int : minimum valid length>, "max_length": <int : maximum valid length> "min_val": <int : minimum valid value>, "max_val": <int : maximum valid value> "supports_grid_search": <bool : True if Grid Search may be requested for this param> }, "floatList": { "min_length": <int : minimum valid length>, "max_length": <int : maximum valid length> "min_val": <float : minimum valid value>, "max_val": <float : maximum valid value> "supports_grid_search": <bool : True if Grid Search may be requested for this param> } }
The keys have meaning as follows:
select: Rather than specifying a specific data type, if present, it indicates that the parameter is permitted to take on any of the specified values. Listed values may be of any string or real (non-complex) numeric type.
ascii: The parameter may be a unicode object that encodes simple ASCII characters. (A-Z, a-z, 0-9, whitespace, and certain common symbols.) In addition to listed constraints, ASCII keys currently may not contain either newlines or semicolons.
unicode: The parameter may be any Python unicode object.
int: The value may be an object of type int within the specified range (inclusive). Please note that the value will be passed around using the JSON format, and some JSON parsers have undefined behavior with integers outside of the range [-(2**53)+1, (2**53)-1].
float: The value may be an object of type float within the specified range (inclusive).
intList, floatList: The value may be a list of int or float objects, respectively, following constraints as specified respectively by the int and float types (above).
Many parameters only specify one key under constraints. If a parameter specifies multiple keys, the parameter may take on any value permitted by any key.
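Examples
A minimal sketch that walks the structure described above:
params = model.get_advanced_tuning_parameters()
for param in params['tuning_parameters']:
    # Each entry carries its name, value, and a constraints dict keyed by permitted type
    print(param['task_name'], param['parameter_name'],
          param['current_value'], sorted(param['constraints']))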
- get_all_confusion_charts(fallback_to_parent_insights=False)
Retrieve a list of all confusion matrices available for the model.
- Parameters:
- fallback_to_parent_insightsbool
(New in version v2.14) Optional, if True, this will return confusion chart data for this model’s parent for any source that is not available for this model and if this has a defined parent model. If omitted or False, or this model has no parent, this will not attempt to retrieve any data from this model’s parent.
- Returns:
- list of ConfusionChart
Data for all available confusion charts for model.
- get_all_feature_impacts(data_slice_filter=None)
Retrieve a list of all feature impact results available for the model.
- Parameters:
- data_slice_filterDataSlice, optional
A dataslice used to filter the return values based on the dataslice.id. By default, this function will use data_slice_filter.id == None, which returns an unsliced insight. If data_slice_filter is None then no data_slice filtering will be applied when requesting the feature impact.
- Returns:
- list of dicts
Data for all available model feature impacts, or an empty list if no data is found.
Examples
model = datarobot.Model(id='model-id', project_id='project-id')
# Get feature impact insights for sliced data
data_slice = datarobot.DataSlice(id='data-slice-id')
sliced_fi = model.get_all_feature_impacts(data_slice_filter=data_slice)
# Get feature impact insights for unsliced data
data_slice = datarobot.DataSlice()
unsliced_fi = model.get_all_feature_impacts(data_slice_filter=data_slice)
# Get all feature impact insights
all_fi = model.get_all_feature_impacts()
- get_all_lift_charts(fallback_to_parent_insights=False, data_slice_filter=None)
Retrieve a list of all Lift charts available for the model.
- Parameters:
- fallback_to_parent_insightsbool, optional
(New in version v2.14) Optional, if True, this will return lift chart data for this model’s parent for any source that is not available for this model and if this model has a defined parent model. If omitted or False, or this model has no parent, this will not attempt to retrieve any data from this model’s parent.
- data_slice_filterDataSlice, optional
Filters the returned lift chart by data_slice_filter.id. If None (the default) applies no filter based on data_slice_id.
- Returns:
- list of LiftChart
Data for all available model lift charts. Or an empty list if no data found.
Examples
model = datarobot.Model.get('project-id', 'model-id')
# Get lift chart insights for sliced data
sliced_lift_charts = model.get_all_lift_charts(data_slice_filter=datarobot.DataSlice(id='data-slice-id'))
# Get lift chart insights for unsliced data
unsliced_lift_charts = model.get_all_lift_charts(data_slice_filter=datarobot.DataSlice(id=None))
# Get all lift chart insights
all_lift_charts = model.get_all_lift_charts()
- get_all_multiclass_lift_charts(fallback_to_parent_insights=False)
Retrieve a list of all Lift charts available for the model.
- Parameters:
- fallback_to_parent_insightsbool
(New in version v2.14) Optional, if True, this will return lift chart data for this model’s parent for any source that is not available for this model and if this model has a defined parent model. If omitted or False, or this model has no parent, this will not attempt to retrieve any data from this model’s parent.
- Returns:
- list of LiftChart
Data for all available model lift charts.
- get_all_residuals_charts(fallback_to_parent_insights=False, data_slice_filter=None)
Retrieve a list of all residuals charts available for the model.
- Parameters:
- fallback_to_parent_insightsbool
Optional, if True, this will return residuals chart data for this model’s parent for any source that is not available for this model and if this model has a defined parent model. If omitted or False, or this model has no parent, this will not attempt to retrieve any data from this model’s parent.
- data_slice_filterDataSlice, optional
Filters the returned residuals charts by data_slice_filter.id. If None (the default) applies no filter based on data_slice_id.
- Returns:
- list of ResidualsChart
Data for all available model residuals charts.
Examples
model = datarobot.Model.get('project-id', 'model-id')
# Get residuals chart insights for sliced data
sliced_residuals_charts = model.get_all_residuals_charts(data_slice_filter=datarobot.DataSlice(id='data-slice-id'))
# Get residuals chart insights for unsliced data
unsliced_residuals_charts = model.get_all_residuals_charts(data_slice_filter=datarobot.DataSlice(id=None))
# Get all residuals chart insights
all_residuals_charts = model.get_all_residuals_charts()
- get_all_roc_curves(fallback_to_parent_insights=False, data_slice_filter=None)
Retrieve a list of all ROC curves available for the model.
- Parameters:
- fallback_to_parent_insightsbool
(New in version v2.14) Optional, if True, this will return ROC curve data for this model’s parent for any source that is not available for this model and if this model has a defined parent model. If omitted or False, or this model has no parent, this will not attempt to retrieve any data from this model’s parent.
- data_slice_filterDataSlice, optional
filters the returned roc_curve by data_slice_filter.id. If None (the default) applies no filter based on data_slice_id.
- Returns:
- list of RocCurve
Data for all available model ROC curves. Or an empty list if no RocCurves are found.
Examples
model = datarobot.Model.get('project-id', 'model-id')
# Get roc curve insights for sliced data
ds_filter = DataSlice(id='data-slice-id')
sliced_roc = model.get_all_roc_curves(data_slice_filter=ds_filter)
# Get roc curve insights for unsliced data
ds_filter = DataSlice(id=None)
unsliced_roc = model.get_all_roc_curves(data_slice_filter=ds_filter)
# Get all roc curve insights
all_roc_curves = model.get_all_roc_curves()
- get_confusion_chart(source, fallback_to_parent_insights=False)
Retrieve a multiclass model’s confusion matrix for the specified source.
- Parameters:
- sourcestr
Confusion chart source. Check datarobot.enums.CHART_DATA_SOURCE for possible values.
- fallback_to_parent_insightsbool
(New in version v2.14) Optional, if True, this will return confusion chart data for this model’s parent if the confusion chart is not available for this model and this model has a defined parent model. If omitted or False, or there is no parent model, this will not attempt to return insight data from this model’s parent.
- Returns:
- ConfusionChart
Model ConfusionChart data
- Raises:
- ClientError
If the insight is not available for this model
- get_cross_class_accuracy_scores()
Retrieves a list of Cross Class Accuracy scores for the model.
- Returns:
- json
- get_cross_validation_scores(partition=None, metric=None)
Return a dictionary, keyed by metric, showing cross validation scores per partition.
Cross Validation should already have been performed using
cross_validate
ortrain
.Note
Models that computed cross validation before this feature was added will need to be deleted and retrained before this method can be used.
- Parameters:
- partitionfloat
Optional. The ID of the partition to filter results by (e.g. 1, 2, 3.0, 4.0); can be a positive whole-number integer or float. 0 corresponds to the validation partition.
- metric: unicode
Optional. Name of the metric to filter the resulting cross validation scores by.
- Returns:
- cross_validation_scores: dict
A dictionary keyed by metric showing cross validation scores per partition.
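Examples
A minimal sketch, assuming cross validation has already been run for this model:
# The exact partition keying depends on the project's partitioning settings
auc_scores = model.get_cross_validation_scores(metric='AUC')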
- get_data_disparity_insights(feature, class_name1, class_name2)
Retrieve a list of Cross Class Data Disparity insights for the model.
- Parameters:
- featurestr
Bias and Fairness protected feature name.
- class_name1str
One of the compared classes
- class_name2str
Another compared class
- Returns:
- json
- get_fairness_insights(fairness_metrics_set=None, offset=0, limit=100)
Retrieve a list of Per Class Bias insights for the model.
- Parameters:
- fairness_metrics_setstr, optional
Can be one of <datarobot.enums.FairnessMetricsSet>. The fairness metric used to calculate the fairness scores.
- offsetint, optional
Number of items to skip.
- limitint, optional
Number of items to return.
- Returns:
- json
- get_feature_effect(source, data_slice_id=None)
Retrieve Feature Effects for the model.
Feature Effects provides partial dependence and predicted vs actual values for top-500 features ordered by feature impact score.
The partial dependence shows marginal effect of a feature on the target variable after accounting for the average effects of all other predictive features. It indicates how, holding all other variables except the feature of interest as they were, the value of this feature affects your prediction.
Requires that Feature Effects has already been computed with
request_feature_effect
.See
get_feature_effect_metadata
for retrieving information on the available sources.- Parameters:
- sourcestring
The source Feature Effects are retrieved for.
- data_slice_idstring, optional
ID for the data slice used in the request. If None, retrieve unsliced insight data.
- Returns:
- feature_effectsFeatureEffects
The feature effects data.
- Raises:
- ClientError (404)
If the feature effects have not been computed or the source is not a valid value.
- get_feature_effect_metadata()
Retrieve Feature Effects metadata. Response contains status and available model sources.
Feature Effect for the training partition is always available, with the exception of older projects that only supported Feature Effect for validation.
When a model is trained into validation or holdout without stacked predictions (i.e., no out-of-sample predictions in those partitions), Feature Effects is not available for validation or holdout.
Feature Effects for holdout is not available when holdout was not unlocked for the project.
Use source to retrieve Feature Effects, selecting one of the provided sources.
- Returns:
- feature_effect_metadata: FeatureEffectMetadata
- get_feature_effects_multiclass(source='training', class_=None)
Retrieve Feature Effects for the multiclass model.
Feature Effects provide partial dependence and predicted vs actual values for top-500 features ordered by feature impact score.
The partial dependence shows marginal effect of a feature on the target variable after accounting for the average effects of all other predictive features. It indicates how, holding all other variables except the feature of interest as they were, the value of this feature affects your prediction.
Requires that Feature Effects has already been computed with
request_feature_effect
.See
get_feature_effect_metadata
for retrieving information on the available sources.- Parameters:
- sourcestr
The source Feature Effects are retrieved for.
- class_str or None
The class name Feature Effects are retrieved for.
- Returns:
- list
The list of multiclass feature effects.
- Raises:
- ClientError (404)
If Feature Effects have not been computed or the source is not a valid value.
- get_feature_impact(with_metadata=False, data_slice_filter=<datarobot.models.model.Sentinel object>)
Retrieve the computed Feature Impact results, a measure of the relevance of each feature in the model.
Feature Impact is computed for each column by creating new data with that column randomly permuted (but the others left unchanged), and seeing how the error metric score for the predictions is affected. The ‘impactUnnormalized’ is how much worse the error metric score is when making predictions on this modified data. The ‘impactNormalized’ is normalized so that the largest value is 1. In both cases, larger values indicate more important features.
If a feature is a redundant feature, i.e. once other features are considered it doesn’t contribute much in addition, the ‘redundantWith’ value is the name of the feature that has the highest correlation with this feature. Note that redundancy detection is only available for jobs run after the addition of this feature. When retrieving data that predates this functionality, a NoRedundancyImpactAvailable warning will be issued.
Elsewhere this technique is sometimes called ‘Permutation Importance’.
Requires that Feature Impact has already been computed with
request_feature_impact
.- Parameters:
- with_metadatabool
The flag indicating if the result should include the metadata as well.
- data_slice_filterDataSlice, optional
A dataslice used to filter the return values based on the dataslice.id. By default, this function will use data_slice_filter.id == None which returns an unsliced insight. If data_slice_filter is None then get_feature_impact will raise a ValueError.
- Returns:
- list or dict
The feature impact data response depends on the with_metadata parameter. The response is either a dict with metadata and a list with the actual data, or just a list with that data.
Each list item is a dict with the keys featureName, impactNormalized, impactUnnormalized, redundantWith and count.
For the dict response, the available keys are:
featureImpacts - Feature Impact data as a dictionary. Each item is a dict with the keys: featureName, impactNormalized, impactUnnormalized, and redundantWith.
shapBased - A boolean that indicates whether Feature Impact was calculated using Shapley values.
ranRedundancyDetection - A boolean that indicates whether redundant feature identification was run while calculating this Feature Impact.
rowCount - An integer or None that indicates the number of rows used to calculate Feature Impact. For Feature Impact calculated with the default logic, without specifying the rowCount, None is returned here.
count - An integer with the number of features under featureImpacts.
- Raises:
- ClientError (404)
If the feature impacts have not been computed.
- ValueError
If data_slice_filter passed as None
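Examples
A common request-then-retrieve sketch:
from datarobot.errors import ClientError

try:
    feature_impacts = model.get_feature_impact()
except ClientError:
    # Not computed yet: request the job and wait for the result
    job = model.request_feature_impact()
    feature_impacts = job.get_result_when_complete()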
- get_features_used()
Query the server to determine which features were used.
Note that the data returned by this method is possibly different than the names of the features in the featurelist used by this model. This method will return the raw features that must be supplied in order for predictions to be generated on a new set of data. The featurelist, in contrast, would also include the names of derived features.
- Returns:
- featureslist of str
The names of the features used in the model.
- Return type:
List
[str
]
- get_frozen_child_models()
Retrieve the IDs for all models that are frozen from this model.
- Returns:
- A list of Models
- get_labelwise_roc_curves(source, fallback_to_parent_insights=False)
Retrieve a list of LabelwiseRocCurve instances for a multilabel model for the given source and all labels. This method is valid only for multilabel projects. For binary projects, use the Model.get_roc_curve API.
Added in version v2.24.
- Parameters:
- sourcestr
ROC curve data source. Check datarobot.enums.CHART_DATA_SOURCE for possible values.
- fallback_to_parent_insightsbool
Optional, if True, this will return ROC curve data for this model’s parent if the ROC curve is not available for this model and the model has a defined parent model. If omitted or False, or there is no parent model, will not attempt to return data from this model’s parent.
- Returns:
- list of
LabelwiseRocCurve
Labelwise ROC Curve instances for
source
and all labels
- list of
- Raises:
- ClientError
If the insight is not available for this model
- get_lift_chart(source, fallback_to_parent_insights=False, data_slice_filter=<datarobot.models.model.Sentinel object>)
Retrieve the model Lift chart for the specified source.
- Parameters:
- sourcestr
Lift chart data source. Check datarobot.enums.CHART_DATA_SOURCE for possible values. (New in version v2.23) For time series and OTV models, also accepts values backtest_2, backtest_3, …, up to the number of backtests in the model.
- fallback_to_parent_insightsbool
(New in version v2.14) Optional, if True, this will return lift chart data for this model’s parent if the lift chart is not available for this model and the model has a defined parent model. If omitted or False, or there is no parent model, will not attempt to return insight data from this model’s parent.
- data_slice_filterDataSlice, optional
A dataslice used to filter the return values based on the dataslice.id. By default this function will use data_slice_filter.id == None which returns an unsliced insight. If data_slice_filter is None then get_lift_chart will raise a ValueError.
- Returns:
- LiftChart
Model lift chart data
- Raises:
- ClientError
If the insight is not available for this model
- ValueError
If data_slice_filter passed as None
- get_missing_report_info()
Retrieve a report on missing training data that can be used to understand missing values treatment in the model. The report consists of missing value resolutions for the numeric and categorical features that were part of building the model.
- Returns:
- An iterable of MissingReportPerFeature
The queried model missing report, sorted by missing count (DESCENDING order).
- get_model_blueprint_chart()
Retrieve a diagram that can be used to understand data flow in the blueprint.
- Returns:
- ModelBlueprintChart
The queried model blueprint chart.
- get_model_blueprint_documents()
Get documentation for tasks used in this model.
- Returns:
- list of BlueprintTaskDocument
All documents available for the model.
- get_model_blueprint_json()
Get the blueprint json representation used by this model.
- Returns:
- BlueprintJson
Json representation of the blueprint stages.
- Return type:
Dict
[str
,Tuple
[List
[str
],List
[str
],str
]]
- get_multiclass_feature_impact()
For multiclass models it’s possible to calculate feature impact separately for each target class. The calculation method is exactly the same, applied in one-vs-all style for each target class.
Requires that Feature Impact has already been computed with
request_feature_impact
.- Returns:
- feature_impactslist of dict
The feature impact data. Each item is a dict with the keys ‘featureImpacts’ (list) and ‘class’ (str). Each item in ‘featureImpacts’ is a dict with the keys ‘featureName’, ‘impactNormalized’, ‘impactUnnormalized’, and ‘redundantWith’.
- Raises:
- ClientError (404)
If the multiclass feature impacts have not been computed.
- get_multiclass_lift_chart(source, fallback_to_parent_insights=False)
Retrieve model Lift chart for the specified source.
- Parameters:
- sourcestr
Lift chart data source. Check datarobot.enums.CHART_DATA_SOURCE for possible values.
- fallback_to_parent_insightsbool
Optional, if True, this will return lift chart data for this model’s parent if the lift chart is not available for this model and the model has a defined parent model. If omitted or False, or there is no parent model, will not attempt to return insight data from this model’s parent.
- Returns:
- list of LiftChart
Model lift chart data for each saved target class
- Raises:
- ClientError
If the insight is not available for this model
- get_multilabel_lift_charts(source, fallback_to_parent_insights=False)
Retrieve model Lift charts for the specified source.
Added in version v2.24.
- Parameters:
- sourcestr
Lift chart data source. Check datarobot.enums.CHART_DATA_SOURCE for possible values.
- fallback_to_parent_insightsbool
Optional, if True, this will return lift chart data for this model’s parent if the lift chart is not available for this model and the model has a defined parent model. If omitted or False, or there is no parent model, will not attempt to return insight data from this model’s parent.
- Returns:
- list of LiftChart
Model lift chart data for each saved target class
- Raises:
- ClientError
If the insight is not available for this model
- get_num_iterations_trained()
Retrieves the number of estimators trained by early-stopping tree-based models.
Added in version v2.22.
- Returns:
- projectId: str
id of project containing the model
- modelId: str
id of the model
- data: array
list of numEstimatorsItem objects, one for each modeling stage.
- numEstimatorsItem will be of the form:
- stage: str
indicates the modeling stage (for multi-stage models); None for single-stage models
- numIterations: int
the number of estimators or iterations trained by the model
- get_or_request_feature_effect(source, max_wait=600, row_count=None, data_slice_id=None)
Retrieve Feature Effects for the model, requesting a new job if it hasn’t been run previously.
See
get_feature_effect_metadata
for retrieving information of source.- Parameters:
- sourcestring
The source Feature Effects are retrieved for.
- max_waitint, optional
The maximum time to wait for a requested Feature Effect job to complete before erroring.
- row_countint, optional
(New in version v2.21) The sample size to use for Feature Impact computation. Minimum is 10 rows. Maximum is 100000 rows or the training sample size of the model, whichever is less.
- data_slice_idstr, optional
ID for the data slice used in the request. If None, request unsliced insight data.
- Returns:
- feature_effectsFeatureEffects
The Feature Effects data.
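Examples
A minimal sketch; 'validation' must be one of the sources reported by get_feature_effect_metadata:
feature_effects = model.get_or_request_feature_effect(source='validation')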
- get_or_request_feature_effects_multiclass(source, top_n_features=None, features=None, row_count=None, class_=None, max_wait=600)
Retrieve Feature Effects for the multiclass model, requesting a job if it hasn’t been run previously.
- Parameters:
- sourcestring
The source Feature Effects are retrieved for.
- class_str or None
The class name Feature Effects are retrieved for.
- row_countint
The number of rows from dataset to use for Feature Impact calculation.
- top_n_featuresint or None
Number of top features (ranked by Feature Impact) used to calculate Feature Effects.
- featureslist or None
The list of features used to calculate Feature Effects.
- max_waitint, optional
The maximum time to wait for a requested Feature Effects job to complete before erroring.
- Returns:
- feature_effectslist of FeatureEffectsMulticlass
The list of multiclass feature effects data.
- get_or_request_feature_impact(max_wait=600, **kwargs)
Retrieve feature impact for the model, requesting a job if it hasn’t been run previously.
- Parameters:
- max_waitint, optional
The maximum time to wait for a requested feature impact job to complete before erroring
- **kwargs
Arbitrary keyword arguments passed to
request_feature_impact
.
- Returns:
- feature_impactslist or dict
The feature impact data. See
get_feature_impact
for the exact schema.
- get_parameters()
Retrieve model parameters.
- Returns:
- ModelParameters
Model parameters for this model.
- get_pareto_front()
Retrieve the Pareto Front for a Eureqa model.
This method is only supported for Eureqa models.
- Returns:
- ParetoFront
Model ParetoFront data
- get_prime_eligibility()
Check if this model can be approximated with DataRobot Prime
- Returns:
- prime_eligibilitydict
a dict indicating whether a model can be approximated with DataRobot Prime (key can_make_prime) and why it may be ineligible (key message)
- get_residuals_chart(source, fallback_to_parent_insights=False, data_slice_filter=<datarobot.models.model.Sentinel object>)
Retrieve model residuals chart for the specified source.
- Parameters:
- sourcestr
Residuals chart data source. Check datarobot.enums.CHART_DATA_SOURCE for possible values.
- fallback_to_parent_insightsbool
Optional, if True, this will return residuals chart data for this model’s parent if the residuals chart is not available for this model and the model has a defined parent model. If omitted or False, or there is no parent model, will not attempt to return residuals data from this model’s parent.
- data_slice_filterDataSlice, optional
A dataslice used to filter the return values based on the dataslice.id. By default this function will use data_slice_filter.id == None which returns an unsliced insight. If data_slice_filter is None then get_residuals_chart will raise a ValueError.
- Returns:
- ResidualsChart
Model residuals chart data
- Raises:
- ClientError
If the insight is not available for this model
- ValueError
If data_slice_filter passed as None
- get_roc_curve(source, fallback_to_parent_insights=False, data_slice_filter=<datarobot.models.model.Sentinel object>)
Retrieve the ROC curve for a binary model for the specified source. This method is valid only for binary projects. For multilabel projects, use Model.get_labelwise_roc_curves.
- Parameters:
- sourcestr
ROC curve data source. Check datarobot.enums.CHART_DATA_SOURCE for possible values. (New in version v2.23) For time series and OTV models, also accepts values backtest_2, backtest_3, …, up to the number of backtests in the model.
- fallback_to_parent_insightsbool
(New in version v2.14) Optional, if True, this will return ROC curve data for this model’s parent if the ROC curve is not available for this model and the model has a defined parent model. If omitted or False, or there is no parent model, will not attempt to return data from this model’s parent.
- data_slice_filterDataSlice, optional
A dataslice used to filter the return values based on the dataslice.id. By default this function will use data_slice_filter.id == None which returns an unsliced insight. If data_slice_filter is None then get_roc_curve will raise a ValueError.
- Returns:
- RocCurve
Model ROC curve data
- Raises:
- ClientError
If the insight is not available for this model
- (New in version v3.0) TypeError
If the underlying project type is multilabel
- ValueError
If data_slice_filter passed as None
- get_rulesets()
List the rulesets approximating this model generated by DataRobot Prime
If this model hasn’t been approximated yet, will return an empty list. Note that these are rulesets approximating this model, not rulesets used to construct this model.
- Returns:
- rulesetslist of Ruleset
- Return type:
List
[Ruleset
]
- get_supported_capabilities()
Retrieves a summary of the capabilities supported by a model.
Added in version v2.14.
- Returns:
- supportsBlending: bool
whether the model supports blending
- supportsMonotonicConstraints: bool
whether the model supports monotonic constraints
- hasWordCloud: bool
whether the model has word cloud data available
- eligibleForPrime: bool
(Deprecated in version v3.6) whether the model is eligible for Prime
- hasParameters: bool
whether the model has parameters that can be retrieved
- supportsCodeGeneration: bool
(New in version v2.18) whether the model supports code generation
- supportsShap: bool
(New in version v2.18) True if the model supports the Shapley package, i.e. Shapley-based feature importance.
- supportsEarlyStopping: bool
(New in version v2.22) True if this is an early stopping tree-based model and number of trained iterations can be retrieved.
- get_uri()
- Returns:
- urlstr
Permanent static hyperlink to this model in the leaderboard.
- Return type:
str
- get_word_cloud(exclude_stop_words=False)
Retrieve word cloud data for the model.
- Parameters:
- exclude_stop_wordsbool, optional
Set to True if you want stopwords filtered out of the response.
- Returns:
- WordCloud
Word cloud data for the model.
- incremental_train(data_stage_id, training_data_name=None)
Submit a job to the queue to perform incremental training on an existing model. See the train_incremental documentation for details.
- Return type:
- classmethod list(project_id, sort_by_partition='validation', sort_by_metric=None, with_metric=None, search_term=None, featurelists=None, families=None, blueprints=None, labels=None, characteristics=None, training_filters=None, number_of_clusters=None, limit=100, offset=0)
Retrieve paginated model records, sorted by scores, with optional filtering.
- Parameters:
- sort_by_partition: str, one of `validation`, `backtesting`, `crossValidation` or `holdout`
Set the partition to use for sorted (by score) list of models. validation is the default.
- sort_by_metric: str
Set the project metric to use for model sorting. DataRobot-selected project optimization metric is the default.
- with_metric: str
For a single-metric list of results, specify that project metric.
- search_term: str
If specified, only models containing the term in their name or processes are returned.
- featurelists: list of str
If specified, only models trained on selected featurelists are returned.
- families: list of str
If specified, only models belonging to selected families are returned.
- blueprints: list of str
If specified, only models trained on specified blueprint IDs are returned.
- labels: list of str, `starred` or `prepared for deployment`
If specified, only models tagged with all listed labels are returned.
- characteristics: list of str
If specified, only models matching all listed characteristics are returned.
- training_filters: list of str
If specified, only models matching at least one of the listed training conditions are returned. The following formats are supported.
For autoML and datetime partitioned projects:
- number of rows in the training subset
For datetime partitioned projects:
- <training duration>, for example P6Y0M0D
- <training_duration>-<time_window_sample_percent>-<sampling_method>, for example P6Y0M0D-78-Random (returns models trained on 6 years of data, sampling rate 78%, random sampling)
- start/end date
- project settings
- number_of_clusters: list of int
Filter models by number of clusters. Applicable only in unsupervised clustering projects.
- limit: int
- offset: int
- Returns:
- generic_models: list of GenericModel
- Return type:
List
[GenericModel
]
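Examples
An illustrative sketch with a hypothetical project ID:
import datarobot as dr

# The 20 best starred models, sorted by validation score
models = dr.Model.list('project-id', sort_by_partition='validation',
                       labels=['starred'], limit=20)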
- open_in_browser()
Opens the class’ relevant web browser location. If the default browser is not available, the URL is logged.
Note: If a text-mode browser is used, the calling process will block until the user exits the browser.
- Return type:
None
- request_cross_class_accuracy_scores()
Request Cross Class Accuracy scores to be computed for the model.
- Returns:
- status_idstr
The statusId of the computation request.
- request_data_disparity_insights(feature, compared_class_names)
Request data disparity insights to be computed for the model.
- Parameters:
- featurestr
Bias and Fairness protected feature name.
- compared_class_nameslist(str)
List of two classes to compare
- Returns:
- status_idstr
The statusId of the computation request.
- request_external_test(dataset_id, actual_value_column=None)
Request external test to compute scores and insights on an external test dataset
- Parameters:
- dataset_idstring
The dataset to make predictions against (as uploaded from Project.upload_dataset)
- actual_value_columnstring, optional
(New in version v2.21) For time series unsupervised projects only. Actual value column can be used to calculate the classification metrics and insights on the prediction dataset. Can’t be provided with the
forecast_point
parameter.- Returns
- ——-
- jobJob
a Job representing external dataset insights computation
- request_fairness_insights(fairness_metrics_set=None)
Request fairness insights to be computed for the model.
- Parameters:
- fairness_metrics_setstr, optional
Can be one of <datarobot.enums.FairnessMetricsSet>. The fairness metric used to calculate the fairness scores.
- Returns:
- status_idstr
The statusId of the computation request.
- Return type:
str
- request_feature_effect(row_count=None, data_slice_id=None)
Submit request to compute Feature Effects for the model.
See
get_feature_effect
for more information on the result of the job.- Parameters:
- row_countint
(New in version v2.21) The sample size to use for Feature Impact computation. Minimum is 10 rows. Maximum is 100000 rows or the training sample size of the model, whichever is less.
- data_slice_idstr, optional
ID for the data slice used in the request. If None, request unsliced insight data.
- Returns:
- jobJob
A Job representing the feature effect computation. To get the completed feature effect data, use job.get_result or job.get_result_when_complete.
- Raises:
- JobAlreadyRequested (422)
If the feature effects have already been requested.
- request_feature_effects_multiclass(row_count=None, top_n_features=None, features=None)
Request Feature Effects computation for the multiclass model.
See
get_feature_effect
for more information on the result of the job.- Parameters:
- row_countint
The number of rows from dataset to use for Feature Impact calculation.
- top_n_featuresint or None
Number of top features (ranked by feature impact) used to calculate Feature Effects.
- featureslist or None
The list of features used to calculate Feature Effects.
- Returns:
- jobJob
A Job representing Feature Effect computation. To get the completed Feature Effect data, use job.get_result or job.get_result_when_complete.
- request_feature_impact(row_count=None, with_metadata=False, data_slice_id=None)
Request feature impacts to be computed for the model.
See
get_feature_impact
for more information on the result of the job.- Parameters:
- row_countint, optional
The sample size (specified in rows) to use for Feature Impact computation. This is not supported for unsupervised, multiclass (which has a separate method), and time series projects.
- with_metadatabool, optional
Flag indicating whether the result should include the metadata. If true, metadata is included.
- data_slice_idstr, optional
ID for the data slice used in the request. If None, request unsliced insight data.
- Returns:
- jobJob or status_id
Job representing the Feature Impact computation. To retrieve the completed Feature Impact data, use job.get_result or job.get_result_when_complete.
- Raises:
- JobAlreadyRequested (422)
If the feature impacts have already been requested.
- request_lift_chart(source, data_slice_id=None)
Request the model Lift Chart for the specified source.
- Parameters:
- sourcestr
Lift chart data source. Check datarobot.enums.CHART_DATA_SOURCE for possible values.
- data_slice_idstring, optional
ID for the data slice used in the request. If None, request unsliced insight data.
- Returns:
- status_check_jobStatusCheckJob
Object contains all needed logic for a periodical status check of an async job.
- Return type:
- request_per_class_fairness_insights(fairness_metrics_set=None)
Request per-class fairness insights be computed for the model.
- Parameters:
- fairness_metrics_setstr, optional
The fairness metric used to calculate the fairness scores. Value can be any one of <datarobot.enums.FairnessMetricsSet>.
- Returns:
- status_check_jobStatusCheckJob
The returned object contains all needed logic for a periodical status check of an async job.
- Return type:
- request_predictions(dataset_id=None, dataset=None, dataframe=None, file_path=None, file=None, include_prediction_intervals=None, prediction_intervals_size=None, forecast_point=None, predictions_start_date=None, predictions_end_date=None, actual_value_column=None, explanation_algorithm=None, max_explanations=None, max_ngram_explanations=None)
Requests predictions against a previously uploaded dataset.
- Parameters:
- dataset_idstring, optional
The ID of the dataset to make predictions against (as uploaded from Project.upload_dataset)
- dataset
Dataset
, optional The dataset to make predictions against (as uploaded from Project.upload_dataset)
- dataframepd.DataFrame, optional
(New in v3.0) The dataframe to make predictions against
- file_pathstr, optional
(New in v3.0) Path to file to make predictions against
- fileIOBase, optional
(New in v3.0) File to make predictions against
- include_prediction_intervalsbool, optional
(New in v2.16) For time series projects only. Specifies whether prediction intervals should be calculated for this request. Defaults to True if prediction_intervals_size is specified, otherwise defaults to False.
- prediction_intervals_sizeint, optional
(New in v2.16) For time series projects only. Represents the percentile to use for the size of the prediction intervals. Defaults to 80 if include_prediction_intervals is True. Prediction intervals size must be between 1 and 100 (inclusive).
- forecast_pointdatetime.datetime or None, optional
(New in version v2.20) For time series projects only. This is the default point relative to which predictions will be generated, based on the forecast window of the project. See the time series prediction documentation for more information.
- predictions_start_datedatetime.datetime or None, optional
(New in version v2.20) For time series projects only. The start date for bulk predictions. Note that this parameter is for generating historical predictions using the training data. This parameter should be provided in conjunction with
predictions_end_date
. Can’t be provided with theforecast_point
parameter.- predictions_end_datedatetime.datetime or None, optional
(New in version v2.20) For time series projects only. The end date for bulk predictions, exclusive. Note that this parameter is for generating historical predictions using the training data. This parameter should be provided in conjunction with
predictions_start_date
. Can’t be provided with theforecast_point
parameter.- actual_value_columnstring, optional
(New in version v2.21) For time series unsupervised projects only. Actual value column can be used to calculate the classification metrics and insights on the prediction dataset. Can’t be provided with the
forecast_point
parameter.- explanation_algorithm: (New in version v2.21) optional; If set to ‘shap’, the
response will include prediction explanations based on the SHAP explainer (SHapley Additive exPlanations). Defaults to null (no prediction explanations).
- max_explanations: (New in version v2.21) int optional; specifies the maximum number of
explanation values that should be returned for each row, ordered by absolute value, greatest to least. If null, no limit. In the case of ‘shap’: if the number of features is greater than the limit, the sum of remaining values will also be returned as shapRemainingTotal. Defaults to null. Cannot be set if explanation_algorithm is omitted.
- max_ngram_explanations: optional; int or str
(New in version v2.29) Specifies the maximum number of text explanation values to return. If set to all, text explanations are computed and all ngram explanations are returned. If set to a nonzero positive integer, text explanations are computed and that many ngram explanations, sorted in descending order, are returned. By default, text explanations are not computed.
- Returns:
- jobPredictJob
The job computing the predictions
- Return type:
PredictJob
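Example
A minimal sketch of requesting predictions against a newly uploaded dataset; the project ID, model ID, and file path are placeholders:
import datarobot as dr

project = dr.Project.get('project-id')
model = dr.Model.get('project-id', 'model-id')
dataset = project.upload_dataset('./data_to_predict.csv')
predict_job = model.request_predictions(dataset_id=dataset.id)
predictions = predict_job.get_result_when_complete()  # pandas DataFrame of predictions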
- request_residuals_chart(source, data_slice_id=None)
Request the model residuals chart for the specified source.
- Parameters:
- sourcestr
Residuals chart data source. Check datarobot.enums.CHART_DATA_SOURCE for possible values.
- data_slice_idstring, optional
ID for the data slice used in the request. If None, request unsliced insight data.
- Returns:
- status_check_jobStatusCheckJob
Object contains all needed logic for a periodical status check of an async job.
- Return type:
StatusCheckJob
- request_roc_curve(source, data_slice_id=None)
Request the model ROC curve for the specified source.
- Parameters:
- sourcestr
ROC curve data source. Check datarobot.enums.CHART_DATA_SOURCE for possible values.
- data_slice_idstring, optional
ID for the data slice used in the request. If None, request unsliced insight data.
- Returns:
- status_check_jobStatusCheckJob
Object contains all needed logic for a periodical status check of an async job.
- Return type:
StatusCheckJob
- request_training_predictions(data_subset, explanation_algorithm=None, max_explanations=None)
Start a job to build training predictions
- Parameters:
- data_subsetstr
Data set definition to build predictions on. Choices are:
- dr.enums.DATA_SUBSET.ALL or string all for all data available. Not valid for models in datetime partitioned projects
- dr.enums.DATA_SUBSET.VALIDATION_AND_HOLDOUT or string validationAndHoldout for all data except training set. Not valid for models in datetime partitioned projects
- dr.enums.DATA_SUBSET.HOLDOUT or string holdout for holdout data set only
- dr.enums.DATA_SUBSET.ALL_BACKTESTS or string allBacktests for downloading the predictions for all backtest validation folds. Requires the model to have successfully scored all backtests. Datetime partitioned projects only.
- explanation_algorithmdr.enums.EXPLANATIONS_ALGORITHM
(New in v2.21) Optional. If set to dr.enums.EXPLANATIONS_ALGORITHM.SHAP, the response will include prediction explanations based on the SHAP explainer (SHapley Additive exPlanations). Defaults to None (no prediction explanations).
- max_explanationsint
(New in v2.21) Optional. Specifies the maximum number of explanation values that should be returned for each row, ordered by absolute value, greatest to least. In the case of dr.enums.EXPLANATIONS_ALGORITHM.SHAP: If not set, explanations are returned for all features. If the number of features is greater than max_explanations, the sum of remaining values will also be returned as shap_remaining_total. Max 100. Defaults to null for datasets narrower than 100 columns, defaults to 100 for datasets wider than 100 columns. Is ignored if explanation_algorithm is not set.
- Returns:
- Job
An instance of the created async job
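Example
A minimal sketch of requesting and retrieving holdout training predictions; IDs are placeholders:
import datarobot as dr

model = dr.Model.get('project-id', 'model-id')
job = model.request_training_predictions(dr.enums.DATA_SUBSET.HOLDOUT)
training_predictions = job.get_result_when_complete()
df = training_predictions.get_all_as_dataframe()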
- retrain(sample_pct=None, featurelist_id=None, training_row_count=None, n_clusters=None)
Submit a job to the queue to retrain the model.
- Parameters:
- sample_pct: float, optional
The sample size as a percentage (1 to 100) to use in training. If this parameter is used, then training_row_count should not be given.
- featurelist_idstr, optional
The ID of the featurelist to use.
- training_row_countint, optional
The number of rows used to train the model. If this parameter is used, then sample_pct should not be given.
- n_clusters: int, optional
(New in version v2.27) The number of clusters to use in an unsupervised clustering model. This parameter is used only for unsupervised clustering models that do not determine the number of clusters automatically.
- Returns:
- jobModelJob
The created job that is retraining the model
- Return type:
ModelJob
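Example
A sketch of retraining the model on a different sample size and waiting for the result; IDs are placeholders:
import datarobot as dr

model = dr.Model.get('project-id', 'model-id')
model_job = model.retrain(sample_pct=80)
retrained_model = model_job.get_result_when_complete()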
- set_prediction_threshold(threshold)
Set a custom prediction threshold for the model.
May not be used once prediction_threshold_read_only is True for this model.
- Parameters:
- thresholdfloat
Only used for binary classification projects. The threshold to use when deciding between the positive and negative classes when making predictions. Should be between 0.0 and 1.0 (inclusive).
- star_model()
Mark the model as starred.
Model stars propagate to the web application and the API, and can be used to filter when listing models.
- Return type:
None
- start_advanced_tuning_session()
Start an Advanced Tuning session. Returns an object that helps set up arguments for an Advanced Tuning model execution.
As of v2.17, all models other than blenders, open source, prime, baseline and user-created support Advanced Tuning.
- Returns:
- AdvancedTuningSession
Session for setting up and running Advanced Tuning on a model
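Example
A sketch of a typical tuning flow; the task and parameter names below are hypothetical placeholders and should be taken from get_advanced_tuning_parameters():
import datarobot as dr

model = dr.Model.get('project-id', 'model-id')
session = model.start_advanced_tuning_session()
# 'eXtreme Gradient Boosted Trees Classifier' / 'learning_rate' are placeholder names
session.set_parameter(
    task_name='eXtreme Gradient Boosted Trees Classifier',
    parameter_name='learning_rate',
    value=0.05,
)
model_job = session.run()
tuned_model = model_job.get_result_when_complete()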
- train_incremental(data_stage_id, training_data_name=None, data_stage_encoding=None, data_stage_delimiter=None, data_stage_compression=None)
Submit a job to the queue to perform incremental training on an existing model using additional data. The ID of the additional data to use for training is specified with data_stage_id. Optionally, a name for the iteration can be supplied to help identify the contents of the data in the iteration.
This functionality requires the INCREMENTAL_LEARNING feature flag to be enabled.
- Parameters:
- data_stage_id: str
The id of the data stage to use for training.
- training_data_namestr, optional
The name of the iteration or data stage to indicate what the incremental learning was performed on.
- data_stage_encodingstr, optional
The encoding type of the data in the data stage (default: UTF-8). Supported formats: UTF-8, ASCII, WINDOWS1252
- data_stage_delimiterstr, optional
The delimiter used by the data in the data stage (default: ‘,’).
- data_stage_compressionstr, optional
The compression type of the data stage file, e.g. ‘zip’ (default: None). Supported formats: zip
- Returns:
- jobModelJob
The created job that is retraining the model
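Example
A minimal sketch, assuming the INCREMENTAL_LEARNING feature flag is enabled and a data stage has already been created; the IDs and iteration name are placeholders:
import datarobot as dr

model = dr.Model.get('project-id', 'model-id')
model_job = model.train_incremental(
    data_stage_id='data-stage-id',
    training_data_name='March increment',  # placeholder name
    data_stage_encoding='UTF-8',
)
new_model = model_job.get_result_when_complete()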
- unstar_model()
Unmark the model as starred.
Model stars propagate to the web application and the API, and can be used to filter when listing models.
- Return type:
None
BlenderModel
- class datarobot.models.BlenderModel(id=None, processes=None, featurelist_name=None, featurelist_id=None, project_id=None, sample_pct=None, model_type=None, model_category=None, is_frozen=None, blueprint_id=None, metrics=None, model_ids=None, blender_method=None, monotonic_increasing_featurelist_id=None, monotonic_decreasing_featurelist_id=None, supports_monotonic_constraints=None, is_starred=None, prediction_threshold=None, prediction_threshold_read_only=None, model_number=None, parent_model_id=None, supports_composable_ml=None, training_row_count=None, training_duration=None, training_start_date=None, training_end_date=None, data_selection_method=None, time_window_sample_pct=None, sampling_method=None, model_family_full_name=None, is_trained_into_validation=None, is_trained_into_holdout=None)
Represents a blender model that combines prediction results from other models.
All durations are specified with a duration string such as those returned by the
partitioning_methods.construct_duration_string
helper method. Please see datetime partitioned project documentation for more information on duration strings.
- Attributes:
- idstr
the id of the model
- project_idstr
the id of the project the model belongs to
- processeslist of str
the processes used by the model
- featurelist_namestr
the name of the featurelist used by the model
- featurelist_idstr
the id of the featurelist used by the model
- sample_pctfloat
the percentage of the project dataset used in training the model
- training_row_countint or None
the number of rows of the project dataset used in training the model. In a datetime partitioned project, if specified, defines the number of rows used to train the model and evaluate backtest scores; if unspecified, either training_duration or training_start_date and training_end_date was used to determine that instead.
- training_durationstr or None
only present for models in datetime partitioned projects. If specified, a duration string specifying the duration spanned by the data used to train the model and evaluate backtest scores.
- training_start_datedatetime or None
only present for frozen models in datetime partitioned projects. If specified, the start date of the data used to train the model.
- training_end_datedatetime or None
only present for frozen models in datetime partitioned projects. If specified, the end date of the data used to train the model.
- model_typestr
what model this is, e.g. 'AVG Blender'
- model_categorystr
what kind of model this is - always 'blend' for blender models
- is_frozenbool
whether this model is a frozen model
- blueprint_idstr
the id of the blueprint used in this model
- metricsdict
a mapping from each metric to the model’s scores for that metric
- model_idslist of str
List of model ids used in blender
- blender_methodstr
Method used to blend results from underlying models
- monotonic_increasing_featurelist_idstr
optional, the id of the featurelist that defines the set of features with a monotonically increasing relationship to the target. If None, no such constraints are enforced.
- monotonic_decreasing_featurelist_idstr
optional, the id of the featurelist that defines the set of features with a monotonically decreasing relationship to the target. If None, no such constraints are enforced.
- supports_monotonic_constraintsbool
optional, whether this model supports enforcing monotonic constraints
- is_starredbool
whether this model is marked as starred
- prediction_thresholdfloat
for binary classification projects, the threshold used for predictions
- prediction_threshold_read_onlybool
indicates whether modification of the prediction threshold is forbidden. Threshold modification is forbidden once a model has had a deployment created or predictions made via the dedicated prediction API.
- model_numberinteger
model number assigned to a model
- parent_model_idstr or None
(New in version v2.20) the id of the model that tuning parameters are derived from
- supports_composable_mlbool or None
(New in version v2.26) whether this model is supported in Composable ML.
- classmethod get(project_id, model_id)
Retrieve a specific blender.
- Parameters:
- project_idstr
The project’s id.
- model_idstr
The model_id of the leaderboard item to retrieve.
- Returns:
- modelBlenderModel
The queried instance.
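Example
For example (IDs are placeholders):
import datarobot as dr

blender = dr.models.BlenderModel.get('project-id', 'model-id')
print(blender.blender_method)  # method used to blend, e.g. 'AVG'
print(blender.model_ids)       # IDs of the blended models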
- advanced_tune(params, description=None)
Generate a new model with the specified advanced-tuning parameters
As of v2.17, all models other than blenders, open source, prime, baseline and user-created support Advanced Tuning.
- Parameters:
- paramsdict
Mapping of parameter ID to parameter value. The list of valid parameter IDs for a model can be found by calling get_advanced_tuning_parameters(). This endpoint does not need to include values for all parameters. If a parameter is omitted, its current_value will be used.
- descriptionstr
Human-readable string describing the newly advanced-tuned model
- Returns:
- ModelJob
The created job to build the model
- Return type:
ModelJob
- cross_validate()
Run cross validation on the model.
Note
To perform Cross Validation on a new model with new parameters, use train instead.
- Returns:
- ModelJob
The created job to build the model
- delete()
Delete a model from the project’s leaderboard.
- Return type:
None
- download_scoring_code(file_name, source_code=False)
Download the Scoring Code JAR.
- Parameters:
- file_namestr
File path where scoring code will be saved.
- source_codebool, optional
Set to True to download source code archive. It will not be executable.
- Return type:
None
- download_training_artifact(file_name)
Retrieve trained artifact(s) from a model containing one or more custom tasks.
Artifact(s) will be downloaded to the specified local filepath.
- Parameters:
- file_namestr
File path where trained model artifact(s) will be saved.
- classmethod from_data(data)
Instantiate an object of this class using a dict.
- Parameters:
- datadict
Correctly snake_cased keys and their values.
- Return type:
TypeVar(T, bound=APIObject)
- get_advanced_tuning_parameters()
Get the advanced-tuning parameters available for this model.
As of v2.17, all models other than blenders, open source, prime, baseline and user-created support Advanced Tuning.
- Returns:
- dict
A dictionary describing the advanced-tuning parameters for the current model. There are two top-level keys, tuning_description and tuning_parameters.
tuning_description is an optional value. If not None, it indicates the user-specified description of this set of tuning parameters.
tuning_parameters is a list of dicts, each with the following keys:
parameter_name : (str) name of the parameter (unique per task, see below)
parameter_id : (str) opaque ID string uniquely identifying parameter
default_value : (*) the actual value used to train the model; either the single value of the parameter specified before training, or the best value from the list of grid-searched values (based on current_value)
current_value : (*) the single value or list of values of the parameter that were grid searched. Depending on the grid search specification, could be a single fixed value (no grid search), a list of discrete values, or a range.
task_name : (str) name of the task that this parameter belongs to
constraints: (dict) see the notes below
vertex_id: (str) ID of vertex that this parameter belongs to
- Return type:
dict
Notes
The type of default_value and current_value is defined by the constraints structure. It will be a string or numeric Python type.
constraints is a dict with at least one, possibly more, of the following keys. The presence of a key indicates that the parameter may take on the specified type. (If a key is absent, this means that the parameter may not take on the specified type.) If a key on constraints is present, its value will be a dict containing all of the fields described below for that key.
"constraints": { "select": { "values": [<list(basestring or number) : possible values>] }, "ascii": {}, "unicode": {}, "int": { "min": <int : minimum valid value>, "max": <int : maximum valid value>, "supports_grid_search": <bool : True if Grid Search may be requested for this param> }, "float": { "min": <float : minimum valid value>, "max": <float : maximum valid value>, "supports_grid_search": <bool : True if Grid Search may be requested for this param> }, "intList": { "min_length": <int : minimum valid length>, "max_length": <int : maximum valid length> "min_val": <int : minimum valid value>, "max_val": <int : maximum valid value> "supports_grid_search": <bool : True if Grid Search may be requested for this param> }, "floatList": { "min_length": <int : minimum valid length>, "max_length": <int : maximum valid length> "min_val": <float : minimum valid value>, "max_val": <float : maximum valid value> "supports_grid_search": <bool : True if Grid Search may be requested for this param> } }
The keys have meaning as follows:
select: Rather than specifying a specific data type, if present, it indicates that the parameter is permitted to take on any of the specified values. Listed values may be of any string or real (non-complex) numeric type.
ascii: The parameter may be a unicode object that encodes simple ASCII characters. (A-Z, a-z, 0-9, whitespace, and certain common symbols.) In addition to listed constraints, ASCII keys currently may not contain either newlines or semicolons.
unicode: The parameter may be any Python unicode object.
int: The value may be an object of type int within the specified range (inclusive). Please note that the value will be passed around using the JSON format, and some JSON parsers have undefined behavior with integers outside of the range [-(2**53)+1, (2**53)-1].
float: The value may be an object of type float within the specified range (inclusive).
intList, floatList: The value may be a list of int or float objects, respectively, following constraints as specified respectively by the int and float types (above).
Many parameters only specify one key under constraints. If a parameter specifies multiple keys, the parameter may take on any value permitted by any key.
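Example
A short sketch of inspecting the available parameters and their constraints, using only the documented keys above; IDs are placeholders:
import datarobot as dr

model = dr.Model.get('project-id', 'model-id')
params = model.get_advanced_tuning_parameters()
for param in params['tuning_parameters']:
    print(param['task_name'], param['parameter_name'])
    print('  current value:', param['current_value'])
    print('  allowed types:', list(param['constraints'].keys()))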
- get_all_confusion_charts(fallback_to_parent_insights=False)
Retrieve a list of all confusion matrices available for the model.
- Parameters:
- fallback_to_parent_insightsbool
(New in version v2.14) Optional, if True, this will return confusion chart data for this model's parent for any source that is not available for this model, provided this model has a defined parent model. If omitted or False, or this model has no parent, this will not attempt to retrieve any data from this model's parent.
- Returns:
- list of ConfusionChart
Data for all available confusion charts for model.
- get_all_feature_impacts(data_slice_filter=None)
Retrieve a list of all feature impact results available for the model.
- Parameters:
- data_slice_filterDataSlice, optional
A dataslice used to filter the return values based on the dataslice.id. By default, this function will use data_slice_filter.id == None which returns an unsliced insight. If data_slice_filter is None then no data_slice filtering will be applied when requesting the feature impacts.
- Returns:
- list of dicts
Data for all available model feature impacts, or an empty list if no data is found.
Examples
model = datarobot.Model(id='model-id', project_id='project-id')

# Get feature impact insights for sliced data
data_slice = datarobot.DataSlice(id='data-slice-id')
sliced_fi = model.get_all_feature_impacts(data_slice_filter=data_slice)

# Get feature impact insights for unsliced data
data_slice = datarobot.DataSlice()
unsliced_fi = model.get_all_feature_impacts(data_slice_filter=data_slice)

# Get all feature impact insights
all_fi = model.get_all_feature_impacts()
- get_all_lift_charts(fallback_to_parent_insights=False, data_slice_filter=None)
Retrieve a list of all Lift charts available for the model.
- Parameters:
- fallback_to_parent_insightsbool, optional
(New in version v2.14) Optional, if True, this will return lift chart data for this model’s parent for any source that is not available for this model and if this model has a defined parent model. If omitted or False, or this model has no parent, this will not attempt to retrieve any data from this model’s parent.
- data_slice_filterDataSlice, optional
Filters the returned lift chart by data_slice_filter.id. If None (the default) applies no filter based on data_slice_id.
- Returns:
- list of LiftChart
Data for all available model lift charts. Or an empty list if no data found.
Examples
model = datarobot.Model.get('project-id', 'model-id')

# Get lift chart insights for sliced data
sliced_lift_charts = model.get_all_lift_charts(data_slice_filter=datarobot.DataSlice(id='data-slice-id'))

# Get lift chart insights for unsliced data
unsliced_lift_charts = model.get_all_lift_charts(data_slice_filter=datarobot.DataSlice())

# Get all lift chart insights
all_lift_charts = model.get_all_lift_charts()
- get_all_multiclass_lift_charts(fallback_to_parent_insights=False)
Retrieve a list of all multiclass Lift charts available for the model.
- Parameters:
- fallback_to_parent_insightsbool
(New in version v2.14) Optional, if True, this will return lift chart data for this model’s parent for any source that is not available for this model and if this model has a defined parent model. If omitted or False, or this model has no parent, this will not attempt to retrieve any data from this model’s parent.
- Returns:
- list of LiftChart
Data for all available model lift charts.
- get_all_residuals_charts(fallback_to_parent_insights=False, data_slice_filter=None)
Retrieve a list of all residuals charts available for the model.
- Parameters:
- fallback_to_parent_insightsbool
Optional, if True, this will return residuals chart data for this model’s parent for any source that is not available for this model and if this model has a defined parent model. If omitted or False, or this model has no parent, this will not attempt to retrieve any data from this model’s parent.
- data_slice_filterDataSlice, optional
Filters the returned residuals charts by data_slice_filter.id. If None (the default) applies no filter based on data_slice_id.
- Returns:
- list of ResidualsChart
Data for all available model residuals charts.
Examples
model = datarobot.Model.get('project-id', 'model-id')

# Get residuals chart insights for sliced data
sliced_residuals_charts = model.get_all_residuals_charts(data_slice_filter=datarobot.DataSlice(id='data-slice-id'))

# Get residuals chart insights for unsliced data
unsliced_residuals_charts = model.get_all_residuals_charts(data_slice_filter=datarobot.DataSlice())

# Get all residuals chart insights
all_residuals_charts = model.get_all_residuals_charts()
- get_all_roc_curves(fallback_to_parent_insights=False, data_slice_filter=None)
Retrieve a list of all ROC curves available for the model.
- Parameters:
- fallback_to_parent_insightsbool
(New in version v2.14) Optional, if True, this will return ROC curve data for this model’s parent for any source that is not available for this model and if this model has a defined parent model. If omitted or False, or this model has no parent, this will not attempt to retrieve any data from this model’s parent.
- data_slice_filterDataSlice, optional
filters the returned roc_curve by data_slice_filter.id. If None (the default) applies no filter based on data_slice_id.
- Returns:
- list of RocCurve
Data for all available model ROC curves. Or an empty list if no RocCurves are found.
Examples
model = datarobot.Model.get('project-id', 'model-id')

# Get roc curve insights for sliced data
ds_filter = datarobot.DataSlice(id='data-slice-id')
sliced_roc = model.get_all_roc_curves(data_slice_filter=ds_filter)

# Get roc curve insights for unsliced data
ds_filter = datarobot.DataSlice(id=None)
unsliced_roc = model.get_all_roc_curves(data_slice_filter=ds_filter)

# Get all roc curve insights
all_roc_curves = model.get_all_roc_curves()
- get_confusion_chart(source, fallback_to_parent_insights=False)
Retrieve a multiclass model’s confusion matrix for the specified source.
- Parameters:
- sourcestr
Confusion chart source. Check datarobot.enums.CHART_DATA_SOURCE for possible values.
- fallback_to_parent_insightsbool
(New in version v2.14) Optional, if True, this will return confusion chart data for this model's parent if the confusion chart is not available for this model and this model has a defined parent model. If omitted or False, or there is no parent model, will not attempt to return insight data from this model's parent.
- Returns:
- ConfusionChart
Model ConfusionChart data
- Raises:
- ClientError
If the insight is not available for this model
- get_cross_class_accuracy_scores()
Retrieves a list of Cross Class Accuracy scores for the model.
- Returns:
- json
- get_cross_validation_scores(partition=None, metric=None)
Return a dictionary, keyed by metric, showing cross validation scores per partition.
Cross Validation should already have been performed using cross_validate or train.
Note
Models that computed cross validation before this feature was added will need to be deleted and retrained before this method can be used.
- Parameters:
- partitionfloat
Optional. The ID of the partition to filter results by; can be a positive whole number or float value (e.g. 1, 2, 3.0, 4.0). 0 corresponds to the validation partition.
- metric: unicode
Optional. Name of the metric to filter the resulting cross validation scores by.
- Returns:
- cross_validation_scores: dict
A dictionary keyed by metric showing cross validation scores per partition.
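Example
A sketch of running cross validation and reading back the scores for a single metric; the metric name is an example:
import datarobot as dr

model = dr.Model.get('project-id', 'model-id')
cv_job = model.cross_validate()
cv_job.wait_for_completion()
scores = model.get_cross_validation_scores(metric='AUC')
print(scores)  # scores keyed by metric, then partition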
- get_data_disparity_insights(feature, class_name1, class_name2)
Retrieve a list of Cross Class Data Disparity insights for the model.
- Parameters:
- featurestr
Bias and Fairness protected feature name.
- class_name1str
One of the compared classes
- class_name2str
Another compared class
- Returns:
- json
- get_fairness_insights(fairness_metrics_set=None, offset=0, limit=100)
Retrieve a list of Per Class Bias insights for the model.
- Parameters:
- fairness_metrics_setstr, optional
Can be one of <datarobot.enums.FairnessMetricsSet>. The fairness metric used to calculate the fairness scores.
- offsetint, optional
Number of items to skip.
- limitint, optional
Number of items to return.
- Returns:
- json
- get_feature_effect(source, data_slice_id=None)
Retrieve Feature Effects for the model.
Feature Effects provides partial dependence and predicted vs actual values for the top 500 features ordered by feature impact score.
The partial dependence shows marginal effect of a feature on the target variable after accounting for the average effects of all other predictive features. It indicates how, holding all other variables except the feature of interest as they were, the value of this feature affects your prediction.
Requires that Feature Effects has already been computed with request_feature_effect.
See get_feature_effect_metadata for retrieving information about the available sources.
- Parameters:
- sourcestring
The source Feature Effects are retrieved for.
- data_slice_idstring, optional
ID for the data slice used in the request. If None, retrieve unsliced insight data.
- Returns:
- feature_effectsFeatureEffects
The feature effects data.
- Raises:
- ClientError (404)
If the feature effects have not been computed or source is not a valid value.
- get_feature_effect_metadata()
Retrieve Feature Effects metadata. Response contains status and available model sources.
Feature Effect for the training partition is always available, with the exception of older projects that only supported Feature Effect for validation.
When a model is trained into validation or holdout without stacked predictions (i.e., no out-of-sample predictions in those partitions), Feature Effects is not available for validation or holdout.
Feature Effects for holdout is not available when holdout was not unlocked for the project.
Use source to retrieve Feature Effects, selecting one of the provided sources.
- Returns:
- feature_effect_metadata: FeatureEffectMetadata
- get_feature_effects_multiclass(source='training', class_=None)
Retrieve Feature Effects for the multiclass model.
Feature Effects provides partial dependence and predicted vs actual values for the top 500 features ordered by feature impact score.
The partial dependence shows marginal effect of a feature on the target variable after accounting for the average effects of all other predictive features. It indicates how, holding all other variables except the feature of interest as they were, the value of this feature affects your prediction.
Requires that Feature Effects has already been computed with request_feature_effect.
See get_feature_effect_metadata for retrieving information about the available sources.
- Parameters:
- sourcestr
The source Feature Effects are retrieved for.
- class_str or None
The class name Feature Effects are retrieved for.
- Returns:
- list
The list of multiclass feature effects.
- Raises:
- ClientError (404)
If Feature Effects have not been computed or source is not a valid value.
- get_feature_impact(with_metadata=False, data_slice_filter=<datarobot.models.model.Sentinel object>)
Retrieve the computed Feature Impact results, a measure of the relevance of each feature in the model.
Feature Impact is computed for each column by creating new data with that column randomly permuted (but the others left unchanged), and seeing how the error metric score for the predictions is affected. The ‘impactUnnormalized’ is how much worse the error metric score is when making predictions on this modified data. The ‘impactNormalized’ is normalized so that the largest value is 1. In both cases, larger values indicate more important features.
If a feature is a redundant feature, i.e. once other features are considered it doesn't contribute much in addition, the 'redundantWith' value is the name of the feature that has the highest correlation with this feature. Note that redundancy detection is only available for jobs run after the addition of this feature. When retrieving data that predates this functionality, a NoRedundancyImpactAvailable warning will be issued.
Elsewhere this technique is sometimes called ‘Permutation Importance’.
Requires that Feature Impact has already been computed with
request_feature_impact
.- Parameters:
- with_metadatabool
The flag indicating if the result should include the metadata as well.
- data_slice_filterDataSlice, optional
A dataslice used to filter the return values based on the dataslice.id. By default, this function will use data_slice_filter.id == None which returns an unsliced insight. If data_slice_filter is None then get_feature_impact will raise a ValueError.
- Returns:
- list or dict
The feature impact data response depends on the with_metadata parameter. The response is either a dict with metadata and a list with actual data or just a list with that data.
Each list item is a dict with the keys featureName, impactNormalized, impactUnnormalized, redundantWith and count.
For dict response available keys are:
- featureImpacts - Feature Impact data as a dictionary. Each item is a dict with keys: featureName, impactNormalized, impactUnnormalized, and redundantWith.
- shapBased - A boolean that indicates whether Feature Impact was calculated using Shapley values.
- ranRedundancyDetection - A boolean that indicates whether redundant feature identification was run while calculating this Feature Impact.
- rowCount - An integer or None that indicates the number of rows that was used to calculate Feature Impact. For the Feature Impact calculated with the default logic, without specifying the rowCount, we return None here.
- count - An integer with the number of features under the featureImpacts.
- Raises:
- ClientError (404)
If the feature impacts have not been computed.
- ValueError
If data_slice_filter is passed as None
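Example
A minimal sketch of the request-then-retrieve pattern (IDs are placeholders); in practice get_or_request_feature_impact wraps this same flow:
import datarobot as dr

model = dr.Model.get('project-id', 'model-id')
try:
    feature_impact = model.get_feature_impact(with_metadata=True)
except dr.errors.ClientError:
    # not computed yet, so request the computation and wait for it
    job = model.request_feature_impact()
    job.wait_for_completion()
    feature_impact = model.get_feature_impact(with_metadata=True)
for item in feature_impact['featureImpacts']:
    print(item['featureName'], item['impactNormalized'])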
- get_features_used()
Query the server to determine which features were used.
Note that the data returned by this method is possibly different than the names of the features in the featurelist used by this model. This method will return the raw features that must be supplied in order for predictions to be generated on a new set of data. The featurelist, in contrast, would also include the names of derived features.
- Returns:
- featureslist of str
The names of the features used in the model.
- Return type:
List[str]
- get_frozen_child_models()
Retrieve the IDs for all models that are frozen from this model.
- Returns:
- A list of Models
- get_labelwise_roc_curves(source, fallback_to_parent_insights=False)
Retrieve a list of LabelwiseRocCurve instances for a multilabel model for the given source and all labels. This method is valid only for multilabel projects. For binary projects, use Model.get_roc_curve API .
Added in version v2.24.
- Parameters:
- sourcestr
ROC curve data source. Check datarobot.enums.CHART_DATA_SOURCE for possible values.
- fallback_to_parent_insightsbool
Optional, if True, this will return ROC curve data for this model’s parent if the ROC curve is not available for this model and the model has a defined parent model. If omitted or False, or there is no parent model, will not attempt to return data from this model’s parent.
- Returns:
- list of LabelwiseRocCurve
Labelwise ROC Curve instances for source and all labels
- Raises:
- ClientError
If the insight is not available for this model
- get_lift_chart(source, fallback_to_parent_insights=False, data_slice_filter=<datarobot.models.model.Sentinel object>)
Retrieve the model Lift chart for the specified source.
- Parameters:
- sourcestr
Lift chart data source. Check datarobot.enums.CHART_DATA_SOURCE for possible values. (New in version v2.23) For time series and OTV models, also accepts values backtest_2, backtest_3, …, up to the number of backtests in the model.
- fallback_to_parent_insightsbool
(New in version v2.14) Optional, if True, this will return lift chart data for this model’s parent if the lift chart is not available for this model and the model has a defined parent model. If omitted or False, or there is no parent model, will not attempt to return insight data from this model’s parent.
- data_slice_filterDataSlice, optional
A dataslice used to filter the return values based on the dataslice.id. By default this function will use data_slice_filter.id == None which returns an unsliced insight. If data_slice_filter is None then get_lift_chart will raise a ValueError.
- Returns:
- LiftChart
Model lift chart data
- Raises:
- ClientError
If the insight is not available for this model
- ValueError
If data_slice_filter is passed as None
- get_missing_report_info()
Retrieve a report on missing training data that can be used to understand missing values treatment in the model. The report consists of missing values resolutions for numeric or categorical features that were part of building the model.
- Returns:
- An iterable of MissingReportPerFeature
The queried model missing report, sorted by missing count (DESCENDING order).
- get_model_blueprint_chart()
Retrieve a diagram that can be used to understand data flow in the blueprint.
- Returns:
- ModelBlueprintChart
The queried model blueprint chart.
- get_model_blueprint_documents()
Get documentation for tasks used in this model.
- Returns:
- list of BlueprintTaskDocument
All documents available for the model.
- get_model_blueprint_json()
Get the blueprint json representation used by this model.
- Returns:
- BlueprintJson
Json representation of the blueprint stages.
- Return type:
Dict[str, Tuple[List[str], List[str], str]]
- get_multiclass_feature_impact()
For multiclass it’s possible to calculate feature impact separately for each target class. The method for calculation is exactly the same, calculated in one-vs-all style for each target class.
Requires that Feature Impact has already been computed with request_feature_impact.
- Returns:
- feature_impactslist of dict
The feature impact data. Each item is a dict with the keys 'featureImpacts' (list) and 'class' (str). Each item in 'featureImpacts' is a dict with the keys 'featureName', 'impactNormalized', 'impactUnnormalized', and 'redundantWith'.
- Raises:
- ClientError (404)
If the multiclass feature impacts have not been computed.
- get_multiclass_lift_chart(source, fallback_to_parent_insights=False)
Retrieve model Lift chart for the specified source.
- Parameters:
- sourcestr
Lift chart data source. Check datarobot.enums.CHART_DATA_SOURCE for possible values.
- fallback_to_parent_insightsbool
Optional, if True, this will return lift chart data for this model’s parent if the lift chart is not available for this model and the model has a defined parent model. If omitted or False, or there is no parent model, will not attempt to return insight data from this model’s parent.
- Returns:
- list of LiftChart
Model lift chart data for each saved target class
- Raises:
- ClientError
If the insight is not available for this model
- get_multilabel_lift_charts(source, fallback_to_parent_insights=False)
Retrieve model Lift charts for the specified source.
Added in version v2.24.
- Parameters:
- sourcestr
Lift chart data source. Check datarobot.enums.CHART_DATA_SOURCE for possible values.
- fallback_to_parent_insightsbool
Optional, if True, this will return lift chart data for this model’s parent if the lift chart is not available for this model and the model has a defined parent model. If omitted or False, or there is no parent model, will not attempt to return insight data from this model’s parent.
- Returns:
- list of LiftChart
Model lift chart data for each saved target class
- Raises:
- ClientError
If the insight is not available for this model
- get_num_iterations_trained()
Retrieves the number of estimators trained by early-stopping tree-based models.
Added in version v2.22.
- Returns:
- projectId: str
id of project containing the model
- modelId: str
id of the model
- data: array
list of numEstimatorsItem objects, one for each modeling stage.
- numEstimatorsItem will be of the form:
- stage: str
indicates the modeling stage (for multi-stage models); None for single-stage models
- numIterations: int
the number of estimators or iterations trained by the model
- get_or_request_feature_effect(source, max_wait=600, row_count=None, data_slice_id=None)
Retrieve Feature Effects for the model, requesting a new job if it hasn’t been run previously.
See get_feature_effect_metadata for retrieving information about the available sources.
- Parameters:
- sourcestring
The source Feature Effects are retrieved for.
- max_waitint, optional
The maximum time to wait for a requested Feature Effect job to complete before erroring.
- row_countint, optional
(New in version v2.21) The sample size to use for Feature Impact computation. Minimum is 10 rows. Maximum is 100000 rows or the training sample size of the model, whichever is less.
- data_slice_idstr, optional
ID for the data slice used in the request. If None, request unsliced insight data.
- Returns:
- feature_effectsFeatureEffects
The Feature Effects data.
- get_or_request_feature_effects_multiclass(source, top_n_features=None, features=None, row_count=None, class_=None, max_wait=600)
Retrieve Feature Effects for the multiclass model, requesting a job if it hasn’t been run previously.
- Parameters:
- sourcestring
The source Feature Effects are retrieved for.
- class_str or None
The class name Feature Effects are retrieved for.
- row_countint
The number of rows from dataset to use for Feature Impact calculation.
- top_n_featuresint or None
Number of top features (ranked by Feature Impact) used to calculate Feature Effects.
- featureslist or None
The list of features used to calculate Feature Effects.
- max_waitint, optional
The maximum time to wait for a requested Feature Effects job to complete before erroring.
- Returns:
- feature_effectslist of FeatureEffectsMulticlass
The list of multiclass feature effects data.
- get_or_request_feature_impact(max_wait=600, **kwargs)
Retrieve feature impact for the model, requesting a job if it hasn’t been run previously
- Parameters:
- max_waitint, optional
The maximum time to wait for a requested feature impact job to complete before erroring
- **kwargs
Arbitrary keyword arguments passed to request_feature_impact.
- Returns:
- feature_impactslist or dict
The feature impact data. See get_feature_impact for the exact schema.
- get_parameters()
Retrieve model parameters.
- Returns:
- ModelParameters
Model parameters for this model.
- get_pareto_front()
Retrieve the Pareto Front for a Eureqa model.
This method is only supported for Eureqa models.
- Returns:
- ParetoFront
Model ParetoFront data
- get_prime_eligibility()
Check if this model can be approximated with DataRobot Prime
- Returns:
- prime_eligibilitydict
a dict indicating whether a model can be approximated with DataRobot Prime (key can_make_prime) and why it may be ineligible (key message)
- get_residuals_chart(source, fallback_to_parent_insights=False, data_slice_filter=<datarobot.models.model.Sentinel object>)
Retrieve model residuals chart for the specified source.
- Parameters:
- sourcestr
Residuals chart data source. Check datarobot.enums.CHART_DATA_SOURCE for possible values.
- fallback_to_parent_insightsbool
Optional, if True, this will return residuals chart data for this model’s parent if the residuals chart is not available for this model and the model has a defined parent model. If omitted or False, or there is no parent model, will not attempt to return residuals data from this model’s parent.
- data_slice_filterDataSlice, optional
A dataslice used to filter the return values based on the dataslice.id. By default this function will use data_slice_filter.id == None which returns an unsliced insight. If data_slice_filter is None then get_residuals_chart will raise a ValueError.
- Returns:
- ResidualsChart
Model residuals chart data
- Raises:
- ClientError
If the insight is not available for this model
- ValueError
If data_slice_filter is passed as None
- get_roc_curve(source, fallback_to_parent_insights=False, data_slice_filter=<datarobot.models.model.Sentinel object>)
Retrieve the ROC curve for a binary model for the specified source. This method is valid only for binary projects. For multilabel projects, use Model.get_labelwise_roc_curves.
- Parameters:
- sourcestr
ROC curve data source. Check datarobot.enums.CHART_DATA_SOURCE for possible values. (New in version v2.23) For time series and OTV models, also accepts values backtest_2, backtest_3, …, up to the number of backtests in the model.
- fallback_to_parent_insightsbool
(New in version v2.14) Optional, if True, this will return ROC curve data for this model’s parent if the ROC curve is not available for this model and the model has a defined parent model. If omitted or False, or there is no parent model, will not attempt to return data from this model’s parent.
- data_slice_filterDataSlice, optional
A dataslice used to filter the return values based on the dataslice.id. By default this function will use data_slice_filter.id == None which returns an unsliced insight. If data_slice_filter is None then get_roc_curve will raise a ValueError.
- Returns:
- RocCurve
Model ROC curve data
- Raises:
- ClientError
If the insight is not available for this model
- (New in version v3.0) TypeError
If the underlying project type is multilabel
- ValueError
If data_slice_filter is passed as None
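Example
A short sketch of retrieving the validation ROC curve for a binary project; IDs are placeholders:
import datarobot as dr

model = dr.Model.get('project-id', 'model-id')
roc = model.get_roc_curve('validation')
# each point holds metrics such as the threshold and true/false positive rates
print(len(roc.roc_points))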
- get_rulesets()
List the rulesets approximating this model generated by DataRobot Prime
If this model hasn't been approximated yet, this will return an empty list. Note that these are rulesets approximating this model, not rulesets used to construct this model.
- Returns:
- rulesetslist of Ruleset
- Return type:
List[Ruleset]
- get_supported_capabilities()
Retrieves a summary of the capabilities supported by a model.
Added in version v2.14.
- Returns:
- supportsBlending: bool
whether the model supports blending
- supportsMonotonicConstraints: bool
whether the model supports monotonic constraints
- hasWordCloud: bool
whether the model has word cloud data available
- eligibleForPrime: bool
(Deprecated in version v3.6) whether the model is eligible for Prime
- hasParameters: bool
whether the model has parameters that can be retrieved
- supportsCodeGeneration: bool
(New in version v2.18) whether the model supports code generation
- supportsShap: bool
(New in version v2.18) True if the model supports the Shapley package, i.e. Shapley-based feature importance
- supportsEarlyStopping: bool
(New in version v2.22) True if this is an early stopping tree-based model and number of trained iterations can be retrieved.
- get_uri()
- Returns:
- urlstr
Permanent static hyperlink to this model in the leaderboard.
- Return type:
str
- get_word_cloud(exclude_stop_words=False)
Retrieve word cloud data for the model.
- Parameters:
- exclude_stop_wordsbool, optional
Set to True if you want stopwords filtered out of response.
- Returns:
- WordCloud
Word cloud data for the model.
- incremental_train(data_stage_id, training_data_name=None)
Submit a job to the queue to perform incremental training on an existing model. See train_incremental documentation.
- Return type:
ModelJob
- classmethod list(project_id, sort_by_partition='validation', sort_by_metric=None, with_metric=None, search_term=None, featurelists=None, families=None, blueprints=None, labels=None, characteristics=None, training_filters=None, number_of_clusters=None, limit=100, offset=0)
Retrieve paginated model records, sorted by scores, with optional filtering.
- Parameters:
- sort_by_partition: str, one of `validation`, `backtesting`, `crossValidation` or `holdout`
Set the partition to use for the sorted (by score) list of models. validation is the default.
- sort_by_metric: str
Set the project metric to use for model sorting. DataRobot-selected project optimization metric is the default.
- with_metric: str
For a single-metric list of results, specify that project metric.
- search_term: str
If specified, only models containing the term in their name or processes are returned.
- featurelists: list of str
If specified, only models trained on selected featurelists are returned.
- families: list of str
If specified, only models belonging to selected families are returned.
- blueprints: list of str
If specified, only models trained on specified blueprint IDs are returned.
- labels: list of str, `starred` or `prepared for deployment`
If specified, only models tagged with all listed labels are returned.
- characteristics: list of str
If specified, only models matching all listed characteristics are returned.
- training_filters: list of str
If specified, only models matching at least one of the listed training conditions are returned. The following formats are supported.
For autoML and datetime partitioned projects:
- number of rows in training subset
For datetime partitioned projects:
- <training duration>, example P6Y0M0D
- <training_duration>-<time_window_sample_percent>-<sampling_method>, example P6Y0M0D-78-Random (returns models trained on 6 years of data, sampling rate 78%, random sampling)
- start/end date
- project settings
- number_of_clusters: list of int
Filter models by number of clusters. Applicable only in unsupervised clustering projects.
- limit: int
- offset: int
- Returns:
- generic_models: list of GenericModel
- Return type:
List[GenericModel]
- open_in_browser()
Opens the class' relevant web browser location. If a default browser is not available, the URL is logged.
Note: If text-mode browsers are used, the calling process will block until the user exits the browser.
- Return type:
None
- request_approximation()
Request an approximation of this model using DataRobot Prime
This will create several rulesets that could be used to approximate this model. After comparing their scores and rule counts, the code used in the approximation can be downloaded and run locally.
- Returns:
- jobJob
the job generating the rulesets
- request_cross_class_accuracy_scores()
Request Cross Class Accuracy scores to be computed for the model.
- Returns:
- status_idstr
A statusId of the computation request.
- request_data_disparity_insights(feature, compared_class_names)
Request data disparity insights to be computed for the model.
- Parameters:
- featurestr
Bias and Fairness protected feature name.
- compared_class_nameslist(str)
List of two classes to compare
- Returns:
- status_idstr
A statusId of the computation request.
- request_external_test(dataset_id, actual_value_column=None)
Request external test to compute scores and insights on an external test dataset
- Parameters:
- dataset_idstring
The dataset to make predictions against (as uploaded from Project.upload_dataset)
- actual_value_columnstring, optional
(New in version v2.21) For time series unsupervised projects only. Actual value column can be used to calculate the classification metrics and insights on the prediction dataset. Can't be provided with the forecast_point parameter.
- Returns:
- jobJob
a Job representing external dataset insights computation
- request_fairness_insights(fairness_metrics_set=None)
Request fairness insights to be computed for the model.
- Parameters:
- fairness_metrics_setstr, optional
Can be one of <datarobot.enums.FairnessMetricsSet>. The fairness metric used to calculate the fairness scores.
- Returns:
- status_idstr
A statusId of the computation request.
- Return type:
str
- request_feature_effect(row_count=None, data_slice_id=None)
Submit request to compute Feature Effects for the model.
See get_feature_effect for more information on the result of the job.
- Parameters:
- row_countint
(New in version v2.21) The sample size to use for Feature Impact computation. Minimum is 10 rows. Maximum is 100000 rows or the training sample size of the model, whichever is less.
- data_slice_idstr, optional
ID for the data slice used in the request. If None, request unsliced insight data.
- Returns:
- jobJob
A Job representing the feature effect computation. To get the completed feature effect data, use job.get_result or job.get_result_when_complete.
- Raises:
- JobAlreadyRequested (422)
If the feature effect has already been requested.
- request_feature_effects_multiclass(row_count=None, top_n_features=None, features=None)
Request Feature Effects computation for the multiclass model.
See get_feature_effect for more information on the result of the job.
- Parameters:
- row_countint
The number of rows from dataset to use for Feature Impact calculation.
- top_n_featuresint or None
Number of top features (ranked by feature impact) used to calculate Feature Effects.
- featureslist or None
The list of features used to calculate Feature Effects.
- Returns:
- jobJob
A Job representing Feature Effect computation. To get the completed Feature Effect data, use job.get_result or job.get_result_when_complete.
- request_feature_impact(row_count=None, with_metadata=False, data_slice_id=None)
Request feature impacts to be computed for the model.
See get_feature_impact for more information on the result of the job.
- Parameters:
- row_countint, optional
The sample size (specified in rows) to use for Feature Impact computation. This is not supported for unsupervised, multiclass (which has a separate method), and time series projects.
- with_metadatabool, optional
Flag indicating whether the result should include the metadata. If true, metadata is included.
- data_slice_idstr, optional
ID for the data slice used in the request. If None, request unsliced insight data.
- Returns:
- jobJob or status_id
Job representing the Feature Impact computation. To retrieve the completed Feature Impact data, use job.get_result or job.get_result_when_complete.
- Raises:
- JobAlreadyRequested (422)
If the feature impacts have already been requested.
- request_frozen_datetime_model(training_row_count=None, training_duration=None, training_start_date=None, training_end_date=None, time_window_sample_pct=None, sampling_method=None)
Train a new frozen model with parameters from this model.
Requires that this model belongs to a datetime partitioned project. If it does not, an error will occur when submitting the job.
Frozen models use the same tuning parameters as their parent model instead of independently optimizing them to allow efficiently retraining models on larger amounts of the training data.
In addition to training_row_count and training_duration, frozen datetime models may be trained on an exact date range. Only one of training_row_count, training_duration, or training_start_date and training_end_date should be specified.
Models specified using training_start_date and training_end_date are the only ones that can be trained into the holdout data (once the holdout is unlocked).
All durations should be specified with a duration string such as those returned by the
partitioning_methods.construct_duration_string
helper method. Please see datetime partitioned project documentation for more information on duration strings.
- Parameters:
- training_row_countint, optional
the number of rows of data that should be used to train the model. If specified, training_duration may not be specified.
- training_durationstr, optional
a duration string specifying what time range the data used to train the model should span. If specified, training_row_count may not be specified.
- training_start_datedatetime.datetime, optional
the start date of the data to train to model on. Only rows occurring at or after this datetime will be used. If training_start_date is specified, training_end_date must also be specified.
- training_end_datedatetime.datetime, optional
the end date of the data to train the model on. Only rows occurring strictly before this datetime will be used. If training_end_date is specified, training_start_date must also be specified.
- time_window_sample_pctint, optional
may only be specified when the requested model is a time window (e.g. duration or start and end dates). An integer between 1 and 99 indicating the percentage to sample by within the window. The points kept are determined by a random uniform sample. If specified, training_duration must also be specified, otherwise an error will occur.
- sampling_methodstr, optional
(New in version v2.23) defines the way training data is selected. Can be either random or latest. In combination with training_row_count defines how rows are selected from backtest (latest by default). When training data is defined using time range (training_duration or use_project_settings) this setting changes the way time_window_sample_pct is applied (random by default). Applicable to OTV projects only.
- Returns:
- model_jobModelJob
the modeling job training a frozen model
- Return type:
ModelJob
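Example
A sketch of training a frozen model on a fixed duration using the construct_duration_string helper, assuming a datetime partitioned project; IDs and the sampling settings are placeholders:
import datarobot as dr
from datarobot.helpers.partitioning_methods import construct_duration_string

model = dr.Model.get('project-id', 'model-id')
duration = construct_duration_string(years=1)  # 'P1Y0M0D'
model_job = model.request_frozen_datetime_model(
    training_duration=duration,
    time_window_sample_pct=50,
    sampling_method='random',
)
frozen_model = model_job.get_result_when_complete()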
- request_frozen_model(sample_pct=None, training_row_count=None)
Train a new frozen model with parameters from this model.
Note
This method only works if the project the model belongs to is not datetime partitioned. If it is, use request_frozen_datetime_model instead.
Frozen models use the same tuning parameters as their parent model instead of independently optimizing them to allow efficiently retraining models on larger amounts of the training data.
- Parameters:
- sample_pctfloat
optional, the percentage of the dataset to use with the model. If not provided, will use the value from this model.
- training_row_countint
(New in version v2.9) optional, the integer number of rows of the dataset to use with the model. Only one of sample_pct and training_row_count should be specified.
- Returns:
- model_jobModelJob
the modeling job training a frozen model
- request_lift_chart(source, data_slice_id=None)
Request the model Lift Chart for the specified source.
- Parameters:
- sourcestr
Lift chart data source. Check datarobot.enums.CHART_DATA_SOURCE for possible values.
- data_slice_idstring, optional
ID for the data slice used in the request. If None, request unsliced insight data.
- Returns:
- status_check_jobStatusCheckJob
Object contains all needed logic for a periodical status check of an async job.
- Return type:
- request_per_class_fairness_insights(fairness_metrics_set=None)
Request per-class fairness insights to be computed for the model.
- Parameters:
- fairness_metrics_setstr, optional
The fairness metric used to calculate the fairness scores. Value can be any one of <datarobot.enums.FairnessMetricsSet>.
- Returns:
- status_check_jobStatusCheckJob
The returned object contains all needed logic for a periodical status check of an async job.
- Return type:
- request_predictions(dataset_id=None, dataset=None, dataframe=None, file_path=None, file=None, include_prediction_intervals=None, prediction_intervals_size=None, forecast_point=None, predictions_start_date=None, predictions_end_date=None, actual_value_column=None, explanation_algorithm=None, max_explanations=None, max_ngram_explanations=None)
Requests predictions against a previously uploaded dataset.
- Parameters:
- dataset_idstring, optional
The ID of the dataset to make predictions against (as uploaded from Project.upload_dataset)
- datasetDataset, optional
The dataset to make predictions against (as uploaded from Project.upload_dataset)
- dataframepd.DataFrame, optional
(New in v3.0) The dataframe to make predictions against
- file_pathstr, optional
(New in v3.0) Path to file to make predictions against
- fileIOBase, optional
(New in v3.0) File to make predictions against
- include_prediction_intervalsbool, optional
(New in v2.16) For time series projects only. Specifies whether prediction intervals should be calculated for this request. Defaults to True if prediction_intervals_size is specified, otherwise defaults to False.
- prediction_intervals_sizeint, optional
(New in v2.16) For time series projects only. Represents the percentile to use for the size of the prediction intervals. Defaults to 80 if include_prediction_intervals is True. Prediction intervals size must be between 1 and 100 (inclusive).
- forecast_pointdatetime.datetime or None, optional
(New in version v2.20) For time series projects only. This is the default point relative to which predictions will be generated, based on the forecast window of the project. See the time series prediction documentation for more information.
- predictions_start_datedatetime.datetime or None, optional
(New in version v2.20) For time series projects only. The start date for bulk predictions. Note that this parameter is for generating historical predictions using the training data. This parameter should be provided in conjunction with predictions_end_date. Can't be provided with the forecast_point parameter.
- predictions_end_datedatetime.datetime or None, optional
(New in version v2.20) For time series projects only. The end date for bulk predictions, exclusive. Note that this parameter is for generating historical predictions using the training data. This parameter should be provided in conjunction with predictions_start_date. Can't be provided with the forecast_point parameter.
- actual_value_columnstring, optional
(New in version v2.21) For time series unsupervised projects only. The actual value column can be used to calculate the classification metrics and insights on the prediction dataset. Can't be provided with the forecast_point parameter.
- explanation_algorithmstr, optional
(New in version v2.21) If set to 'shap', the response will include prediction explanations based on the SHAP explainer (SHapley Additive exPlanations). Defaults to null (no prediction explanations).
- max_explanationsint, optional
(New in version v2.21) Specifies the maximum number of explanation values that should be returned for each row, ordered by absolute value, greatest to least. If null, no limit. In the case of 'shap': if the number of features is greater than the limit, the sum of remaining values will also be returned as shapRemainingTotal. Defaults to null. Cannot be set if explanation_algorithm is omitted.
- max_ngram_explanationsint or str, optional
(New in version v2.29) Specifies the maximum number of text explanation values that should be returned. If set to 'all', text explanations will be computed and all the ngram explanations will be returned. If set to a positive integer, text explanations will be computed and that many ngram explanations, sorted in descending order, will be returned. By default, text explanations are not computed.
- Returns:
- jobPredictJob
The job computing the predictions
- Return type:
PredictJob
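Examples
A minimal sketch of the upload-then-predict flow (IDs and the file path are placeholders):
import datarobot as dr
project = dr.Project.get('project-id')
model = dr.Model.get('project-id', 'model-id')
# Upload the data to score, then request predictions against it
prediction_dataset = project.upload_dataset('./data_to_predict.csv')
predict_job = model.request_predictions(dataset_id=prediction_dataset.id)
# Block until the job finishes and fetch the predictions
predictions = predict_job.get_result_when_complete()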
- request_residuals_chart(source, data_slice_id=None)
Request the model residuals chart for the specified source.
- Parameters:
- sourcestr
Residuals chart data source. Check datarobot.enums.CHART_DATA_SOURCE for possible values.
- data_slice_idstring, optional
ID for the data slice used in the request. If None, request unsliced insight data.
- Returns:
- status_check_jobStatusCheckJob
Object contains all needed logic for a periodical status check of an async job.
- Return type:
StatusCheckJob
- request_roc_curve(source, data_slice_id=None)
Request the model Roc Curve for the specified source.
- Parameters:
- sourcestr
Roc Curve data source. Check datarobot.enums.CHART_DATA_SOURCE for possible values.
- data_slice_idstring, optional
ID for the data slice used in the request. If None, request unsliced insight data.
- Returns:
- status_check_jobStatusCheckJob
Object contains all needed logic for a periodical status check of an async job.
- Return type:
StatusCheckJob
- request_training_predictions(data_subset, explanation_algorithm=None, max_explanations=None)
Start a job to build training predictions
- Parameters:
- data_subsetstr
data set definition to build predictions on. Choices are:
- dr.enums.DATA_SUBSET.ALL or string all for all data available. Not valid for models in datetime partitioned projects.
- dr.enums.DATA_SUBSET.VALIDATION_AND_HOLDOUT or string validationAndHoldout for all data except the training set. Not valid for models in datetime partitioned projects.
- dr.enums.DATA_SUBSET.HOLDOUT or string holdout for the holdout data set only.
- dr.enums.DATA_SUBSET.ALL_BACKTESTS or string allBacktests for downloading the predictions for all backtest validation folds. Requires the model to have successfully scored all backtests. Datetime partitioned projects only.
- explanation_algorithmdr.enums.EXPLANATIONS_ALGORITHM
(New in v2.21) Optional. If set to dr.enums.EXPLANATIONS_ALGORITHM.SHAP, the response will include prediction explanations based on the SHAP explainer (SHapley Additive exPlanations). Defaults to None (no prediction explanations).
- max_explanationsint
(New in v2.21) Optional. Specifies the maximum number of explanation values that should be returned for each row, ordered by absolute value, greatest to least. In the case of dr.enums.EXPLANATIONS_ALGORITHM.SHAP: if not set, explanations are returned for all features; if the number of features is greater than max_explanations, the sum of the remaining values will also be returned as shap_remaining_total. Max 100. Defaults to null for datasets narrower than 100 columns, defaults to 100 for datasets wider than 100 columns. Ignored if explanation_algorithm is not set.
- Returns:
- Job
an instance of the created async job
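Examples
A minimal sketch for holdout training predictions (assumes the job result is a TrainingPredictions object):
import datarobot as dr
model = dr.Model.get('project-id', 'model-id')
job = model.request_training_predictions(dr.enums.DATA_SUBSET.HOLDOUT)
training_predictions = job.get_result_when_complete()
df = training_predictions.get_all_as_dataframe()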
- retrain(sample_pct=None, featurelist_id=None, training_row_count=None, n_clusters=None)
Submit a job to the queue to retrain the model using a different sample size or featurelist.
- Parameters:
- sample_pct: float, optional
The sample size, as a percentage (1 to 100), to use in training. If this parameter is used, then training_row_count should not be given.
- featurelist_idstr, optional
The featurelist id
- training_row_countint, optional
The number of rows used to train the model. If this parameter is used, then sample_pct should not be given.
- n_clusters: int, optional
(new in version 2.27) number of clusters to use in an unsupervised clustering model. This parameter is used only for unsupervised clustering models that do not determine the number of clusters automatically.
- Returns:
- jobModelJob
The created job that is retraining the model
- Return type:
ModelJob
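Examples
A minimal sketch (the IDs are placeholders):
import datarobot as dr
model = dr.Model.get('project-id', 'model-id')
# Retrain the same blueprint on a larger sample
model_job = model.retrain(sample_pct=80)
retrained_model = model_job.get_result_when_complete()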
- set_prediction_threshold(threshold)
Set a custom prediction threshold for the model.
May not be used once prediction_threshold_read_only is True for this model.
- Parameters:
- thresholdfloat
Only used for binary classification projects. The threshold to use when deciding between the positive and negative classes when making predictions. Should be between 0.0 and 1.0 (inclusive).
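Examples
A minimal sketch for a binary classification model whose threshold is still editable:
import datarobot as dr
model = dr.Model.get('project-id', 'model-id')
if not model.prediction_threshold_read_only:
    model.set_prediction_threshold(0.6)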
- star_model()
Mark the model as starred.
Model stars propagate to the web application and the API, and can be used to filter when listing models.
- Return type:
None
- start_advanced_tuning_session()
Start an Advanced Tuning session. Returns an object that helps set up arguments for an Advanced Tuning model execution.
As of v2.17, all models other than blenders, open source, prime, baseline and user-created support Advanced Tuning.
- Returns:
- AdvancedTuningSession
Session for setting up and running Advanced Tuning on a model
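Examples
A minimal sketch (the parameter name and value set here are illustrative; use get_task_names and get_parameter_names to discover what is tunable for the model):
import datarobot as dr
model = dr.Model.get('project-id', 'model-id')
session = model.start_advanced_tuning_session()
task_name = session.get_task_names()[0]
print(session.get_parameter_names(task_name))
# Set one of the printed parameters, then submit the tuned model
session.set_parameter(task_name=task_name, parameter_name='some_parameter', value=42)
model_job = session.run()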
- train(sample_pct=None, featurelist_id=None, scoring_type=None, training_row_count=None, monotonic_increasing_featurelist_id=<object object>, monotonic_decreasing_featurelist_id=<object object>)
Train the blueprint used in model on a particular featurelist or amount of data.
This method creates a new training job for worker and appends it to the end of the queue for this project. After the job has finished you can get the newly trained model by retrieving it from the project leaderboard, or by retrieving the result of the job.
Either sample_pct or training_row_count can be used to specify the amount of data to use, but not both. If neither are specified, a default of the maximum amount of data that can safely be used to train any blueprint without going into the validation data will be selected.
In smart-sampled projects, sample_pct and training_row_count are assumed to be in terms of rows of the minority class.
Note
For datetime partitioned projects, see train_datetime instead.
- Parameters:
- sample_pctfloat, optional
The amount of data to use for training, as a percentage of the project dataset from 0 to 100.
- featurelist_idstr, optional
The identifier of the featurelist to use. If not defined, the featurelist of this model is used.
- scoring_typestr, optional
Either validation or crossValidation (also dr.SCORING_TYPE.validation or dr.SCORING_TYPE.cross_validation). validation is available for every partitioning type, and indicates that the default model validation should be used for the project. If the project uses a form of cross-validation partitioning, crossValidation can also be used to indicate that all of the available training/validation combinations should be used to evaluate the model.
- training_row_countint, optional
The number of rows to use to train the requested model.
- monotonic_increasing_featurelist_idstr
(new in version 2.11) Optional. The id of the featurelist that defines the set of features with a monotonically increasing relationship to the target. Passing None disables the increasing monotonicity constraint. The default (dr.enums.MONOTONICITY_FEATURELIST_DEFAULT) is the one specified by the blueprint.
- monotonic_decreasing_featurelist_idstr
(new in version 2.11) Optional. The id of the featurelist that defines the set of features with a monotonically decreasing relationship to the target. Passing None disables the decreasing monotonicity constraint. The default (dr.enums.MONOTONICITY_FEATURELIST_DEFAULT) is the one specified by the blueprint.
- Returns:
- model_job_idstr
the id of the created job; can be used as a parameter to the ModelJob.get method or the wait_for_async_model_creation function
Examples
project = Project.get('project-id')
model = Model.get('project-id', 'model-id')
model_job_id = model.train(training_row_count=project.max_train_rows)
- train_datetime(featurelist_id=None, training_row_count=None, training_duration=None, time_window_sample_pct=None, monotonic_increasing_featurelist_id=<object object>, monotonic_decreasing_featurelist_id=<object object>, use_project_settings=False, sampling_method=None, n_clusters=None)
Trains this model on a different featurelist or sample size.
Requires that this model is part of a datetime partitioned project; otherwise, an error will occur.
All durations should be specified with a duration string such as those returned by the partitioning_methods.construct_duration_string helper method. Please see the datetime partitioned project documentation for more information on duration strings.
- Parameters:
- featurelist_idstr, optional
the featurelist to use to train the model. If not specified, the featurelist of this model is used.
- training_row_countint, optional
the number of rows of data that should be used to train the model. If specified, neither training_duration nor use_project_settings may be specified.
- training_durationstr, optional
a duration string specifying what time range the data used to train the model should span. If specified, neither training_row_count nor use_project_settings may be specified.
- use_project_settingsbool, optional
(New in version v2.20) defaults to False. If True, indicates that the custom backtest partitioning settings specified by the user will be used to train the model and evaluate backtest scores. If specified, neither training_row_count nor training_duration may be specified.
- time_window_sample_pctint, optional
may only be specified when the requested model is a time window (e.g. duration or start and end dates). An integer between 1 and 99 indicating the percentage to sample by within the window. The points kept are determined by a random uniform sample. If specified, training_duration must also be specified; otherwise an error will occur.
- sampling_methodstr, optional
(New in version v2.23) defines the way training data is selected. Can be either random or latest. In combination with training_row_count, defines how rows are selected from the backtest (latest by default). When training data is defined using a time range (training_duration or use_project_settings), this setting changes the way time_window_sample_pct is applied (random by default). Applicable to OTV projects only.
- monotonic_increasing_featurelist_idstr, optional
(New in version v2.18) The id of the featurelist that defines the set of features with a monotonically increasing relationship to the target. Passing None disables the increasing monotonicity constraint. The default (dr.enums.MONOTONICITY_FEATURELIST_DEFAULT) is the one specified by the blueprint.
- monotonic_decreasing_featurelist_idstr, optional
(New in version v2.18) The id of the featurelist that defines the set of features with a monotonically decreasing relationship to the target. Passing None disables the decreasing monotonicity constraint. The default (dr.enums.MONOTONICITY_FEATURELIST_DEFAULT) is the one specified by the blueprint.
- n_clusters: int, optional
(New in version 2.27) Number of clusters to use in an unsupervised clustering model. This parameter is used only for unsupervised clustering models that don't automatically determine the number of clusters.
- Returns:
- jobModelJob
the created job to build the model
- Return type:
ModelJob
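Examples
A minimal sketch training on a one-year window (assumes a datetime partitioned project; construct_duration_string is the documented helper):
import datarobot as dr
from datarobot.helpers.partitioning_methods import construct_duration_string
model = dr.Model.get('project-id', 'model-id')
model_job = model.train_datetime(
    training_duration=construct_duration_string(years=1)
)
new_model = model_job.get_result_when_complete()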
- train_incremental(data_stage_id, training_data_name=None, data_stage_encoding=None, data_stage_delimiter=None, data_stage_compression=None)
Submit a job to the queue to perform incremental training on an existing model using additional data. The id of the additional data to use for training is specified with data_stage_id. Optionally, a name for the iteration can be supplied to help identify the contents of the data in the iteration.
This functionality requires the INCREMENTAL_LEARNING feature flag to be enabled.
- Parameters:
- data_stage_id: str
The id of the data stage to use for training.
- training_data_namestr, optional
The name of the iteration or data stage to indicate what the incremental learning was performed on.
- data_stage_encodingstr, optional
The encoding type of the data in the data stage (default: UTF-8). Supported formats: UTF-8, ASCII, WINDOWS1252
- data_stage_delimiterstr, optional
The delimiter used by the data in the data stage (default: ',').
- data_stage_compressionstr, optional
The compression type of the data stage file, e.g. ‘zip’ (default: None). Supported formats: zip
- Returns:
- jobModelJob
The created job that is retraining the model
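Examples
A minimal sketch (the data stage id is a placeholder; requires the INCREMENTAL_LEARNING feature flag):
import datarobot as dr
model = dr.Model.get('project-id', 'model-id')
model_job = model.train_incremental(
    data_stage_id='data-stage-id',
    training_data_name='iteration 2',
)
updated_model = model_job.get_result_when_complete()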
- unstar_model()
Unmark the model as starred.
Model stars propagate to the web application and the API, and can be used to filter when listing models.
- Return type:
None
DatetimeModel
- class datarobot.models.DatetimeModel(id=None, processes=None, featurelist_name=None, featurelist_id=None, project_id=None, sample_pct=None, training_row_count=None, training_duration=None, training_start_date=None, training_end_date=None, time_window_sample_pct=None, sampling_method=None, model_type=None, model_category=None, is_frozen=None, blueprint_id=None, metrics=None, training_info=None, holdout_score=None, holdout_status=None, data_selection_method=None, backtests=None, monotonic_increasing_featurelist_id=None, monotonic_decreasing_featurelist_id=None, supports_monotonic_constraints=None, is_starred=None, prediction_threshold=None, prediction_threshold_read_only=None, effective_feature_derivation_window_start=None, effective_feature_derivation_window_end=None, forecast_window_start=None, forecast_window_end=None, windows_basis_unit=None, model_number=None, parent_model_id=None, supports_composable_ml=None, n_clusters=None, is_n_clusters_dynamically_determined=None, has_empty_clusters=None, model_family_full_name=None, is_trained_into_validation=None, is_trained_into_holdout=None, **kwargs)
Represents a model from a datetime partitioned project
All durations are specified with a duration string such as those returned by the partitioning_methods.construct_duration_string helper method. Please see the datetime partitioned project documentation for more information on duration strings.
Note that only one of training_row_count, training_duration, or training_start_date and training_end_date will be specified, depending on the data_selection_method of the model. Whichever method was selected determines the amount of data used to train on when making predictions and scoring the backtests and the holdout.
- Attributes:
- idstr
the id of the model
- project_idstr
the id of the project the model belongs to
- processeslist of str
the processes used by the model
- featurelist_namestr
the name of the featurelist used by the model
- featurelist_idstr
the id of the featurelist used by the model
- sample_pctfloat
the percentage of the project dataset used in training the model
- training_row_countint or None
If specified, an int specifying the number of rows used to train the model and evaluate backtest scores.
- training_durationstr or None
If specified, a duration string specifying the duration spanned by the data used to train the model and evaluate backtest scores.
- training_start_datedatetime or None
only present for frozen models in datetime partitioned projects. If specified, the start date of the data used to train the model.
- training_end_datedatetime or None
only present for frozen models in datetime partitioned projects. If specified, the end date of the data used to train the model.
- time_window_sample_pctint or None
An integer between 1 and 99 indicating the percentage of sampling within the training window. The points kept are determined by a random uniform sample. If not specified, no sampling was done.
- sampling_methodstr or None
(New in v2.23) indicates the way training data has been selected (either how rows have been selected within the backtest or how time_window_sample_pct has been applied).
- model_typestr
what model this is, e.g. ‘Nystroem Kernel SVM Regressor’
- model_categorystr
what kind of model this is - ‘prime’ for DataRobot Prime models, ‘blend’ for blender models, and ‘model’ for other models
- is_frozenbool
whether this model is a frozen model
- blueprint_idstr
the id of the blueprint used in this model
- metricsdict
a mapping from each metric to the model’s scores for that metric. The keys in metrics are the different metrics used to evaluate the model, and the values are the results. The dictionaries inside of metrics will be as described here: ‘validation’, the score for a single backtest; ‘crossValidation’, always None; ‘backtesting’, the average score for all backtests if all are available and computed, or None otherwise; ‘backtestingScores’, a list of scores for all backtests where the score is None if that backtest does not have a score available; and ‘holdout’, the score for the holdout or None if the holdout is locked or the score is unavailable.
- backtestslist of dict
describes what data was used to fit each backtest, the score for the project metric, and why the backtest score is unavailable if it is not provided.
- data_selection_methodstr
which of training_row_count, training_duration, or training_start_date and training_end_date were used to determine the data used to fit the model. One of 'rowCount', 'duration', or 'selectedDateRange'.
- training_infodict
describes which data was used to train on when scoring the holdout and making predictions. training_info will have the following keys: holdout_training_start_date, holdout_training_duration, holdout_training_row_count, holdout_training_end_date, prediction_training_start_date, prediction_training_duration, prediction_training_row_count, prediction_training_end_date. Start and end dates will be datetimes, durations will be duration strings, and rows will be integers.
- holdout_scorefloat or None
the score against the holdout, if available and the holdout is unlocked, according to the project metric.
- holdout_statusstring or None
the status of the holdout score, e.g. “COMPLETED”, “HOLDOUT_BOUNDARIES_EXCEEDED”. Unavailable if the holdout fold was disabled in the partitioning configuration.
- monotonic_increasing_featurelist_idstr
optional, the id of the featurelist that defines the set of features with a monotonically increasing relationship to the target. If None, no such constraints are enforced.
- monotonic_decreasing_featurelist_idstr
optional, the id of the featurelist that defines the set of features with a monotonically decreasing relationship to the target. If None, no such constraints are enforced.
- supports_monotonic_constraintsbool
optional, whether this model supports enforcing monotonic constraints
- is_starredbool
whether this model is marked as starred
- prediction_thresholdfloat
for binary classification projects, the threshold used for predictions
- prediction_threshold_read_onlybool
indicates whether modification of the prediction threshold is forbidden. Threshold modification is forbidden once a model has had a deployment created or predictions made via the dedicated prediction API.
- effective_feature_derivation_window_startint or None
(New in v2.16) For time series projects only. How many units of the windows_basis_unit into the past relative to the forecast point the user needs to provide history for at prediction time. This can differ from the feature_derivation_window_start set on the project due to the differencing method and period selected, or if the model is a time series native model such as ARIMA. Will be a negative integer in time series projects and None otherwise.
- effective_feature_derivation_window_endint or None
(New in v2.16) For time series projects only. How many units of the windows_basis_unit into the past relative to the forecast point the feature derivation window should end. Will be a non-positive integer in time series projects and None otherwise.
- forecast_window_startint or None
(New in v2.16) For time series projects only. How many units of the windows_basis_unit into the future relative to the forecast point the forecast window should start. Note that this field will be the same as what is shown in the project settings. Will be a non-negative integer in time series projects and None otherwise.
- forecast_window_endint or None
(New in v2.16) For time series projects only. How many units of the windows_basis_unit into the future relative to the forecast point the forecast window should end. Note that this field will be the same as what is shown in the project settings. Will be a non-negative integer in time series projects and None otherwise.
- windows_basis_unitstr or None
(New in v2.16) For time series projects only. Indicates which unit is the basis for the feature derivation window and the forecast window. Note that this field will be the same as what is shown in the project settings. In time series projects, will be either the detected time unit or "ROW", and None otherwise.
- model_numberinteger
model number assigned to a model
- parent_model_idstr or None
(New in version v2.20) the id of the model that tuning parameters are derived from
- supports_composable_mlbool or None
(New in version v2.26) whether this model is supported in Composable ML.
- is_n_clusters_dynamically_determinedbool, optional
(New in version 2.27) If True, indicates that the model determines the number of clusters automatically.
- n_clustersint, optional
(New in version 2.27) Number of clusters to use in an unsupervised clustering model. This parameter is used only for unsupervised clustering models that don't automatically determine the number of clusters.
- classmethod get(project, model_id)
Retrieve a specific datetime model.
If the project does not use datetime partitioning, a ClientError will occur.
- Parameters:
- projectstr
the id of the project the model belongs to
- model_idstr
the id of the model to retrieve
- Returns:
- modelDatetimeModel
the model
- score_backtests()
Compute the scores for all available backtests.
Some backtests may be unavailable if the model is trained into their validation data.
- Returns:
- jobJob
a job tracking the backtest computation. When it is complete, all available backtests will have scores computed.
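Examples
A minimal sketch (assumes a datetime partitioned project):
import datarobot as dr
project = dr.Project.get('project-id')
model = dr.DatetimeModel.get(project.id, 'model-id')
job = model.score_backtests()
job.wait_for_completion()
# Re-fetch the model to see the newly computed backtest scores
model = dr.DatetimeModel.get(project.id, 'model-id')
print(model.metrics[project.metric]['backtestingScores'])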
- cross_validate()
Inherited from Model. DatetimeModels cannot request cross validation scores; use backtests instead.
- Return type:
NoReturn
- get_cross_validation_scores(partition=None, metric=None)
Inherited from Model. DatetimeModels cannot request Cross Validation scores; use backtests instead.
- Return type:
NoReturn
- request_training_predictions(data_subset, *args, **kwargs)
Start a job that builds training predictions.
- Parameters:
- data_subsetstr
data set definition to build predictions on. Choices are:
- dr.enums.DATA_SUBSET.HOLDOUT for the holdout data set only.
- dr.enums.DATA_SUBSET.ALL_BACKTESTS for downloading the predictions for all backtest validation folds. Requires the model to have successfully scored all backtests.
- Returns:
- Job
an instance of the created async job
- get_series_accuracy_as_dataframe(offset=0, limit=100, metric=None, multiseries_value=None, order_by=None, reverse=False)
Retrieve series accuracy results for the specified model as a pandas.DataFrame.
- Parameters:
- offsetint, optional
The number of results to skip. Defaults to 0 if not specified.
- limitint, optional
The maximum number of results to return. Defaults to 100 if not specified.
- metricstr, optional
The name of the metric to retrieve scores for. If omitted, the default project metric will be used.
- multiseries_valuestr, optional
If specified, only the series containing the given value in one of the series ID columns will be returned.
- order_bystr, optional
Used for sorting the series. Attribute must be one of datarobot.enums.SERIES_ACCURACY_ORDER_BY.
- reversebool, optional
Used for sorting the series. If True, will sort the series in descending order by the attribute specified by order_by.
- Returns:
- data
A pandas.DataFrame with the Series Accuracy for the specified model.
- download_series_accuracy_as_csv(filename, encoding='utf-8', offset=0, limit=100, metric=None, multiseries_value=None, order_by=None, reverse=False)
Save series accuracy results for the specified model in a CSV file.
- Parameters:
- filenamestr or file object
The path or file object to save the data to.
- encodingstr, optional
A string representing the encoding to use in the output csv file. Defaults to ‘utf-8’.
- offsetint, optional
The number of results to skip. Defaults to 0 if not specified.
- limitint, optional
The maximum number of results to return. Defaults to 100 if not specified.
- metricstr, optional
The name of the metric to retrieve scores for. If omitted, the default project metric will be used.
- multiseries_valuestr, optional
If specified, only the series containing the given value in one of the series ID columns will be returned.
- order_bystr, optional
Used for sorting the series. Attribute must be one of datarobot.enums.SERIES_ACCURACY_ORDER_BY.
- reversebool, optional
Used for sorting the series. If True, will sort the series in descending order by the attribute specified by order_by.
- get_series_clusters(offset=0, limit=100, order_by=None, reverse=False)
Retrieve a dictionary of series and the clusters assigned to each series. This is only usable for clustering projects.
- Parameters:
- offsetint, optional
The number of results to skip. Defaults to 0 if not specified.
- limitint, optional
The maximum number of results to return. Defaults to 100 if not specified.
- order_bystr, optional
Used for sorting the series. Attribute must be one of datarobot.enums.SERIES_ACCURACY_ORDER_BY.
- reversebool, optional
Used for sorting the series. If True, will sort the series in descending order by the attribute specified by order_by.
- Returns:
- Dict
A dictionary of the series in the dataset with their associated cluster
- Raises:
- ValueError
If the model type returns an unsupported insight
- ClientError
If the insight is not available for this model
- Return type:
Dict[str, str]
- compute_series_accuracy(compute_all_series=False)
Compute series accuracy for the model.
- Parameters:
- compute_all_seriesbool, optional
Whether to calculate accuracy for all series or only the first 1000.
- Returns:
- Job
an instance of the created async job
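Examples
A minimal sketch combining computation and retrieval (assumes a multiseries model):
import datarobot as dr
model = dr.DatetimeModel.get('project-id', 'model-id')
job = model.compute_series_accuracy(compute_all_series=True)
job.wait_for_completion()
df = model.get_series_accuracy_as_dataframe(limit=1000)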
- retrain(time_window_sample_pct=None, featurelist_id=None, training_row_count=None, training_duration=None, training_start_date=None, training_end_date=None, sampling_method=None, n_clusters=None)
Retrain an existing datetime model using a new training period for the model’s training set (with optional time window sampling) or a different feature list.
All durations should be specified with a duration string such as those returned by the partitioning_methods.construct_duration_string helper method. Please see the datetime partitioned project documentation for more information on duration strings.
- Parameters:
- featurelist_idstr, optional
The ID of the featurelist to use.
- training_row_countint, optional
The number of rows to train the model on. If this parameter is used, training_duration cannot be specified.
- time_window_sample_pctint, optional
An int between 1 and 99 indicating the percentage of sampling within the time window. The points kept are determined by a random uniform sample. If specified, training_row_count must not be specified and either training_duration or training_start_date and training_end_date must be specified.
- training_durationstr, optional
A duration string representing the training duration for the submitted model. If specified then training_row_count, training_start_date, and training_end_date cannot be specified.
- training_start_datestr, optional
A datetime string representing the start date of the data to use for training this model. If specified, training_end_date must also be specified, and training_duration cannot be specified. The value must be before the training_end_date value.
- training_end_datestr, optional
A datetime string representing the end date of the data to use for training this model. If specified, training_start_date must also be specified, and training_duration cannot be specified. The value must be after the training_start_date value.
- sampling_methodstr, optional
(New in version v2.23) defines the way training data is selected. Can be either random or latest. In combination with training_row_count, defines how rows are selected from the backtest (latest by default). When training data is defined using a time range (training_duration or use_project_settings), this setting changes the way time_window_sample_pct is applied (random by default). Applicable to OTV projects only.
- n_clustersint, optional
(New in version 2.27) Number of clusters to use in an unsupervised clustering model. This parameter is used only for unsupervised clustering models that don’t automatically determine the number of clusters.
- Returns:
- jobModelJob
The created job that is retraining the model
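Examples
A minimal sketch retraining on the most recent six months of data (assumes an OTV or time series project):
import datarobot as dr
from datarobot.helpers.partitioning_methods import construct_duration_string
model = dr.DatetimeModel.get('project-id', 'model-id')
model_job = model.retrain(
    training_duration=construct_duration_string(months=6)
)
new_model = model_job.get_result_when_complete()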
- get_feature_effect_metadata()
Retrieve Feature Effect metadata for each backtest. Response contains status and available sources for each backtest of the model.
Each backtest is available for training and validation.
If holdout is configured for the project, it is also available as a backtestIndex, with training and holdout sources available.
Start/stop models contain a single response item with startstop as the backtestIndex value.
Feature Effects for training are always available (except for old projects that support Feature Effects for validation only).
When a model is trained into validation or holdout without stacked predictions (i.e. no out-of-sample predictions in validation or holdout), Feature Effects are not available for validation or holdout.
Feature Effects for holdout are not available when no holdout is configured for the project.
source is the expected parameter for retrieving Feature Effects; one of the provided sources should be used.
backtestIndex is the expected parameter for submitting a compute request and retrieving Feature Effects; one of the provided backtest indexes should be used.
- Returns:
- feature_effect_metadata: FeatureEffectMetadataDatetime
- request_feature_effect(backtest_index, data_slice_filter=<datarobot.models.model.Sentinel object>)
Request feature effects to be computed for the model.
See get_feature_effect for more information on the result of the job.
See get_feature_effect_metadata for information on retrieving valid values of backtest_index.
- Parameters:
- backtest_index: string, FeatureEffectMetadataDatetime.backtest_index.
The backtest index to retrieve Feature Effects for.
- Returns:
- jobJob
A Job representing the feature effect computation. To get the completed feature effect data, use job.get_result or job.get_result_when_complete.
- Raises:
- JobAlreadyRequested (422)
If the feature effects have already been requested.
- get_feature_effect(source, backtest_index, data_slice_filter=<datarobot.models.model.Sentinel object>)
Retrieve Feature Effects for the model.
Feature Effects provides partial dependence and predicted vs actual values for the top 500 features, ordered by feature impact score.
The partial dependence shows marginal effect of a feature on the target variable after accounting for the average effects of all other predictive features. It indicates how, holding all other variables except the feature of interest as they were, the value of this feature affects your prediction.
Requires that Feature Effects has already been computed with request_feature_effect.
See get_feature_effect_metadata for information on retrieving valid values of source and backtest_index.
- Parameters:
- source: string
The source Feature Effects are retrieved for. Must be one of the values in FeatureEffectMetadataDatetime.sources; use get_feature_effect_metadata to retrieve the available sources.
- backtest_index: string, FeatureEffectMetadataDatetime.backtest_index.
The backtest index to retrieve Feature Effects for.
- Returns:
- feature_effects: FeatureEffects
The feature effects data.
- Raises:
- ClientError (404)
If the feature effects have not been computed or the source is not a valid value.
- get_or_request_feature_effect(source, backtest_index, max_wait=600, data_slice_filter=<datarobot.models.model.Sentinel object>)
Retrieve Feature Effects computations for the model, requesting a new job if it hasn’t been run previously.
See get_feature_effect_metadata for information on retrieving valid values of source and backtest_index.
- Parameters:
- max_waitint, optional
The maximum time to wait for a requested feature effect job to complete before erroring
- sourcestring
The source Feature Effects are retrieved for. Must be one of the values in FeatureEffectMetadataDatetime.sources; use get_feature_effect_metadata to retrieve the available sources.
- backtest_index: string, FeatureEffectMetadataDatetime.backtest_index.
The backtest index to retrieve Feature Effects for.
- Returns:
- feature_effectsFeatureEffects
The feature effects data.
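Examples
A minimal sketch (valid source and backtest_index values come from get_feature_effect_metadata):
import datarobot as dr
model = dr.DatetimeModel.get('project-id', 'model-id')
metadata = model.get_feature_effect_metadata()
feature_effects = model.get_or_request_feature_effect(
    source='training', backtest_index='0'
)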
- request_feature_effects_multiclass(backtest_index, row_count=None, top_n_features=None, features=None)
Request feature effects to be computed for the multiclass datetime model.
See get_feature_effect for more information on the result of the job.
- Parameters:
- backtest_indexstr
The backtest index to use for Feature Effects calculation.
- row_countint
The number of rows from dataset to use for Feature Impact calculation.
- top_n_featuresint or None
Number of top features (ranked by Feature Impact) used to calculate Feature Effects.
- featureslist or None
The list of features to use to calculate Feature Effects.
- Returns:
- jobJob
A Job representing Feature Effects computation. To get the completed Feature Effect data, use job.get_result or job.get_result_when_complete.
- get_feature_effects_multiclass(backtest_index, source='training', class_=None)
Retrieve Feature Effects for the multiclass datetime model.
Feature Effects provides partial dependence and predicted vs actual values for the top 500 features, ordered by feature impact score.
The partial dependence shows marginal effect of a feature on the target variable after accounting for the average effects of all other predictive features. It indicates how, holding all other variables except the feature of interest as they were, the value of this feature affects your prediction.
Requires that Feature Effects has already been computed with request_feature_effect.
See get_feature_effect_metadata for information on the available sources.
- Parameters:
- backtest_indexstr
The backtest index to retrieve Feature Effects for.
- sourcestr
The source Feature Effects are retrieved for.
- class_str or None
The class name Feature Effects are retrieved for.
- Returns:
- list
The list of multiclass Feature Effects.
- Raises:
- ClientError (404)
If the Feature Effects have not been computed or the source is not a valid value.
- get_or_request_feature_effects_multiclass(backtest_index, source, top_n_features=None, features=None, row_count=None, class_=None, max_wait=600)
Retrieve Feature Effects for a datetime multiclass model, and request a job if it hasn’t been run previously.
- Parameters:
- backtest_indexstr
The backtest index to retrieve Feature Effects for.
- sourcestring
The source from which Feature Effects are retrieved.
- class_str or None
The class name Feature Effects are retrieved for.
- row_countint
The number of rows used from the dataset for Feature Impact calculation.
- top_n_featuresint or None
Number of top features (ranked by feature impact) used to calculate Feature Effects.
- featureslist or None
The list of features used to calculate Feature Effects.
- max_waitint, optional
The maximum time to wait for a requested feature effect job to complete before erroring.
- Returns:
- feature_effectslist of FeatureEffectsMulticlass
The list of multiclass feature effects data.
- calculate_prediction_intervals(prediction_intervals_size)
Calculate prediction intervals for this DatetimeModel for the specified size.
Added in version v2.19.
- Parameters:
- prediction_intervals_sizeint
The size of the prediction intervals to calculate for this model. See the prediction intervals documentation for more information.
- Returns:
- jobJob
a Job tracking the prediction intervals computation
- get_calculated_prediction_intervals(offset=None, limit=None)
Retrieve a list of already-calculated prediction intervals for this model
Added in version v2.19.
- Parameters:
- offsetint, optional
If provided, this many results will be skipped
- limitint, optional
If provided, at most this many results will be returned. If not provided, will return at most 100 results.
- Returns:
- list[int]
A descending-ordered list of already-calculated prediction interval sizes
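Examples
A minimal sketch (assumes a time series model):
import datarobot as dr
model = dr.DatetimeModel.get('project-id', 'model-id')
# Compute 80% prediction intervals, then list what has been calculated
job = model.calculate_prediction_intervals(80)
job.wait_for_completion()
print(model.get_calculated_prediction_intervals())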
- compute_datetime_trend_plots(backtest=0, source=SOURCE_TYPE.VALIDATION, forecast_distance_start=None, forecast_distance_end=None)
Computes datetime trend plots (Accuracy over Time, Forecast vs Actual, Anomaly over Time) for this model
Added in version v2.25.
- Parameters:
- backtestint or string, optional
Compute plots for a specific backtest (use the backtest index starting from zero). To compute plots for holdout, use dr.enums.DATA_SUBSET.HOLDOUT.
- sourcestring, optional
The source of the data for the backtest/holdout. Attribute must be one of dr.enums.SOURCE_TYPE.
- forecast_distance_startint, optional:
The start of forecast distance range (forecast window) to compute. If not specified, the first forecast distance for this project will be used. Only for time series supervised models
- forecast_distance_endint, optional:
The end of forecast distance range (forecast window) to compute. If not specified, the last forecast distance for this project will be used. Only for time series supervised models
- Returns:
- jobJob
a Job tracking the datetime trend plots computation
Notes
Forecast distance specifies the number of time steps between the predicted point and the origin point.
For multiseries models, only the first 1000 series in alphabetical order and an average plot for them will be computed.
A maximum of 100 forecast distances can be requested for calculation in time series supervised projects.
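Examples
A minimal sketch computing the plots for the first backtest and then retrieving one of them:
import datarobot as dr
model = dr.DatetimeModel.get('project-id', 'model-id')
job = model.compute_datetime_trend_plots(
    backtest=0, source=dr.enums.SOURCE_TYPE.VALIDATION
)
job.wait_for_completion()
plot = model.get_accuracy_over_time_plot(backtest=0)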
- get_accuracy_over_time_plots_metadata(forecast_distance=None)
Retrieve Accuracy over Time plots metadata for this model.
Added in version v2.25.
- Parameters:
- forecast_distanceint, optional
Forecast distance to retrieve the metadata for. If not specified, the first forecast distance for this project will be used. Only available for time series projects.
- Returns:
- metadataAccuracyOverTimePlotsMetadata
an AccuracyOverTimePlotsMetadata representing Accuracy over Time plots metadata
- get_accuracy_over_time_plot(backtest=0, source=SOURCE_TYPE.VALIDATION, forecast_distance=None, series_id=None, resolution=None, max_bin_size=None, start_date=None, end_date=None, max_wait=600)
Retrieve Accuracy over Time plots for this model.
Added in version v2.25.
- Parameters:
- backtestint or string, optional
Retrieve plots for a specific backtest (use the backtest index starting from zero). To retrieve plots for holdout, use dr.enums.DATA_SUBSET.HOLDOUT.
- sourcestring, optional
The source of the data for the backtest/holdout. Attribute must be one of dr.enums.SOURCE_TYPE.
- forecast_distanceint, optional
Forecast distance to retrieve the plots for. If not specified, the first forecast distance for this project will be used. Only available for time series projects.
- series_idstring, optional
The name of the series to retrieve for multiseries projects. If not provided an average plot for the first 1000 series will be retrieved.
- resolutionstring, optional
Specifies at which resolution the data should be binned. If not provided, an optimal resolution will be used to build chart data with a number of bins <= max_bin_size. One of dr.enums.DATETIME_TREND_PLOTS_RESOLUTION.
- max_bin_sizeint, optional
An int between 1 and 1000, which specifies the maximum number of bins for the retrieval. Default is 500.
- start_datedatetime.datetime, optional
The start of the date range to return. If not specified, start date for requested plot will be used.
- end_datedatetime.datetime, optional
The end of the date range to return. If not specified, end date for requested plot will be used.
- max_waitint or None, optional
The maximum time to wait for a compute job to complete before retrieving the plots. Default is dr.enums.DEFAULT_MAX_WAIT. If 0 or None, the plots will be retrieved without attempting the computation.
- Returns:
- plotAccuracyOverTimePlot
an AccuracyOverTimePlot representing the Accuracy over Time plot
Examples
import datarobot as dr
import pandas as pd
model = dr.DatetimeModel(project_id=project_id, id=model_id)
plot = model.get_accuracy_over_time_plot()
df = pd.DataFrame.from_dict(plot.bins)
figure = df.plot("start_date", ["actual", "predicted"]).get_figure()
figure.savefig("accuracy_over_time.png")
- get_accuracy_over_time_plot_preview(backtest=0, source=SOURCE_TYPE.VALIDATION, forecast_distance=None, series_id=None, max_wait=600)
Retrieve Accuracy over Time preview plots for this model.
Added in version v2.25.
- Parameters:
- backtestint or string, optional
Retrieve plots for a specific backtest (use the backtest index starting from zero). To retrieve plots for holdout, use dr.enums.DATA_SUBSET.HOLDOUT.
- sourcestring, optional
The source of the data for the backtest/holdout. Attribute must be one of dr.enums.SOURCE_TYPE.
- forecast_distanceint, optional
Forecast distance to retrieve the plots for. If not specified, the first forecast distance for this project will be used. Only available for time series projects.
- series_idstring, optional
The name of the series to retrieve for multiseries projects. If not provided an average plot for the first 1000 series will be retrieved.
- max_waitint or None, optional
The maximum time to wait for a compute job to complete before retrieving the plots. Default is dr.enums.DEFAULT_MAX_WAIT. If 0 or None, the plots will be retrieved without attempting the computation.
- Returns:
- plotAccuracyOverTimePlotPreview
an AccuracyOverTimePlotPreview representing the Accuracy over Time plot preview
Examples
import datarobot as dr
import pandas as pd
model = dr.DatetimeModel(project_id=project_id, id=model_id)
plot = model.get_accuracy_over_time_plot_preview()
df = pd.DataFrame.from_dict(plot.bins)
figure = df.plot("start_date", ["actual", "predicted"]).get_figure()
figure.savefig("accuracy_over_time_preview.png")
- get_forecast_vs_actual_plots_metadata()
Retrieve Forecast vs Actual plots metadata for this model.
Added in version v2.25.
- Returns:
- metadataForecastVsActualPlotsMetadata
a ForecastVsActualPlotsMetadata representing Forecast vs Actual plots metadata
- get_forecast_vs_actual_plot(backtest=0, source=SOURCE_TYPE.VALIDATION, forecast_distance_start=None, forecast_distance_end=None, series_id=None, resolution=None, max_bin_size=None, start_date=None, end_date=None, max_wait=600)
Retrieve Forecast vs Actual plots for this model.
Added in version v2.25.
- Parameters:
- backtestint or string, optional
Retrieve plots for a specific backtest (use the backtest index starting from zero). To retrieve plots for holdout, use dr.enums.DATA_SUBSET.HOLDOUT.
- sourcestring, optional
The source of the data for the backtest/holdout. Attribute must be one of dr.enums.SOURCE_TYPE.
- forecast_distance_startint, optional:
The start of forecast distance range (forecast window) to retrieve. If not specified, the first forecast distance for this project will be used.
- forecast_distance_endint, optional:
The end of forecast distance range (forecast window) to retrieve. If not specified, the last forecast distance for this project will be used.
- series_idstring, optional
The name of the series to retrieve for multiseries projects. If not provided an average plot for the first 1000 series will be retrieved.
- resolutionstring, optional
Specifies at which resolution the data should be binned. If not provided, an optimal resolution will be used to build chart data with a number of bins <= max_bin_size. One of dr.enums.DATETIME_TREND_PLOTS_RESOLUTION.
- max_bin_sizeint, optional
An int between 1 and 1000, which specifies the maximum number of bins for the retrieval. Default is 500.
- start_datedatetime.datetime, optional
The start of the date range to return. If not specified, start date for requested plot will be used.
- end_datedatetime.datetime, optional
The end of the date range to return. If not specified, end date for requested plot will be used.
- max_waitint or None, optional
The maximum time to wait for a compute job to complete before retrieving the plots. Default is dr.enums.DEFAULT_MAX_WAIT. If 0 or None, the plots will be retrieved without attempting the computation.
- Returns:
- plotForecastVsActualPlot
a ForecastVsActualPlot representing the Forecast vs Actual plot
Examples
import datarobot as dr
import pandas as pd
import matplotlib.pyplot as plt
model = dr.DatetimeModel(project_id=project_id, id=model_id)
plot = model.get_forecast_vs_actual_plot()
df = pd.DataFrame.from_dict(plot.bins)
# As an example, get the forecasts for the 10th point
forecast_point_index = 10
# Pad the forecasts for plotting. The forecasts length must match the df length
forecasts = [None] * forecast_point_index + df.forecasts[forecast_point_index]
forecasts = forecasts + [None] * (len(df) - len(forecasts))
plt.plot(df.start_date, df.actual, label="Actual")
plt.plot(df.start_date, forecasts, label="Forecast")
forecast_point = df.start_date[forecast_point_index]
plt.title("Forecast vs Actual (Forecast Point {})".format(forecast_point))
plt.legend()
plt.savefig("forecast_vs_actual.png")
- get_forecast_vs_actual_plot_preview(backtest=0, source=SOURCE_TYPE.VALIDATION, series_id=None, max_wait=600)
Retrieve Forecast vs Actual preview plots for this model.
Added in version v2.25.
- Parameters:
- backtestint or string, optional
Retrieve plots for a specific backtest (use the backtest index starting from zero). To retrieve plots for holdout, use dr.enums.DATA_SUBSET.HOLDOUT.
- sourcestring, optional
The source of the data for the backtest/holdout. Attribute must be one of dr.enums.SOURCE_TYPE.
- series_idstring, optional
The name of the series to retrieve for multiseries projects. If not provided an average plot for the first 1000 series will be retrieved.
- max_waitint or None, optional
The maximum time to wait for a compute job to complete before retrieving the plots. Default is dr.enums.DEFAULT_MAX_WAIT. If 0 or None, the plots will be retrieved without attempting the computation.
- Returns:
- plotForecastVsActualPlotPreview
a ForecastVsActualPlotPreview representing the Forecast vs Actual plot preview
Examples
import datarobot as dr
import pandas as pd
model = dr.DatetimeModel(project_id=project_id, id=model_id)
plot = model.get_forecast_vs_actual_plot_preview()
df = pd.DataFrame.from_dict(plot.bins)
figure = df.plot("start_date", ["actual", "predicted"]).get_figure()
figure.savefig("forecast_vs_actual_preview.png")
- get_anomaly_over_time_plots_metadata()
Retrieve Anomaly over Time plots metadata for this model.
Added in version v2.25.
- Returns:
- metadataAnomalyOverTimePlotsMetadata
an AnomalyOverTimePlotsMetadata representing Anomaly over Time plots metadata
- get_anomaly_over_time_plot(backtest=0, source=SOURCE_TYPE.VALIDATION, series_id=None, resolution=None, max_bin_size=None, start_date=None, end_date=None, max_wait=600)
Retrieve Anomaly over Time plots for this model.
Added in version v2.25.
- Parameters:
- backtestint or string, optional
Retrieve plots for a specific backtest (use the backtest index starting from zero). To retrieve plots for holdout, use dr.enums.DATA_SUBSET.HOLDOUT.
- sourcestring, optional
The source of the data for the backtest/holdout. Attribute must be one of dr.enums.SOURCE_TYPE.
- series_idstring, optional
The name of the series to retrieve for multiseries projects. If not provided an average plot for the first 1000 series will be retrieved.
- resolutionstring, optional
Specifies at which resolution the data should be binned. If not provided, an optimal resolution will be used to build chart data with a number of bins <= max_bin_size. One of dr.enums.DATETIME_TREND_PLOTS_RESOLUTION.
- max_bin_sizeint, optional
An int between 1 and 1000, which specifies the maximum number of bins for the retrieval. Default is 500.
- start_datedatetime.datetime, optional
The start of the date range to return. If not specified, start date for requested plot will be used.
- end_datedatetime.datetime, optional
The end of the date range to return. If not specified, end date for requested plot will be used.
- max_waitint or None, optional
The maximum time to wait for a compute job to complete before retrieving the plots. Default is dr.enums.DEFAULT_MAX_WAIT. If 0 or None, the plots will be retrieved without attempting the computation.
- Returns:
- plotAnomalyOverTimePlot
an AnomalyOverTimePlot representing the Anomaly over Time plot
Examples
import datarobot as dr
import pandas as pd
model = dr.DatetimeModel(project_id=project_id, id=model_id)
plot = model.get_anomaly_over_time_plot()
df = pd.DataFrame.from_dict(plot.bins)
figure = df.plot("start_date", "predicted").get_figure()
figure.savefig("anomaly_over_time.png")
- get_anomaly_over_time_plot_preview(prediction_threshold=0.5, backtest=0, source=SOURCE_TYPE.VALIDATION, series_id=None, max_wait=600)
Retrieve Anomaly over Time preview plots for this model.
Added in version v2.25.
- Parameters:
- prediction_threshold: float, optional
Only bins with predictions exceeding this threshold will be returned in the response.
- backtestint or string, optional
Retrieve plots for a specific backtest (use the backtest index starting from zero). To retrieve plots for holdout, use dr.enums.DATA_SUBSET.HOLDOUT.
- sourcestring, optional
The source of the data for the backtest/holdout. Attribute must be one of dr.enums.SOURCE_TYPE.
- series_idstring, optional
The name of the series to retrieve for multiseries projects. If not provided an average plot for the first 1000 series will be retrieved.
- max_waitint or None, optional
The maximum time to wait for a compute job to complete before retrieving the plots. Default is dr.enums.DEFAULT_MAX_WAIT. If 0 or None, the plots will be retrieved without attempting the computation.
- Returns:
- plotAnomalyOverTimePlotPreview
an AnomalyOverTimePlotPreview representing the Anomaly over Time plot preview
Examples
import datarobot as dr
import pandas as pd
import matplotlib.pyplot as plt
model = dr.DatetimeModel(project_id=project_id, id=model_id)
plot = model.get_anomaly_over_time_plot_preview(prediction_threshold=0.01)
df = pd.DataFrame.from_dict(plot.bins)
x = pd.date_range(
    plot.start_date, plot.end_date, freq=df.end_date[0] - df.start_date[0]
)
plt.plot(x, [0] * len(x), label="Date range")
plt.plot(df.start_date, [0] * len(df.start_date), "ro", label="Anomaly")
plt.yticks([])
plt.legend()
plt.savefig("anomaly_over_time_preview.png")
- initialize_anomaly_assessment(backtest, source, series_id=None)
Initialize the anomaly assessment insight and calculate Shapley explanations for the most anomalous points in the subset. The insight is available for anomaly detection models in time series unsupervised projects which also support calculation of Shapley values.
- Parameters:
- backtest: int starting with 0 or “holdout”
The backtest to compute insight for.
- source: “training” or “validation”
The source to compute insight for.
- series_id: string
Required for multiseries projects. The series id to compute the insight for. For example, if there is a series column containing cities, a valid series id to pass would be "Boston".
- Returns:
- AnomalyAssessmentRecord
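Examples
A minimal sketch (assumes an anomaly detection model in a time series unsupervised project; the series id is a placeholder):
import datarobot as dr
model = dr.DatetimeModel.get('project-id', 'model-id')
record = model.initialize_anomaly_assessment(
    backtest=0, source='validation', series_id='Boston'
)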
- get_anomaly_assessment_records(backtest=None, source=None, series_id=None, limit=100, offset=0, with_data_only=False)
Retrieve computed Anomaly Assessment records for this model. The model must be an anomaly detection model in a time series unsupervised project which also supports calculation of Shapley values.
Records can be filtered by backtest, source, and series_id, and the number of results can be limited.
Added in version v2.25.
- Parameters:
- backtest: int starting with 0 or “holdout”
The backtest of the data to filter records by.
- source: “training” or “validation”
The source of the data to filter records by.
- series_id: string
The series id to filter records by.
- limit: int, optional
- offset: int, optional
- with_data_only: bool, optional
Whether to return only records with preview and explanations available. False by default.
- Returns:
- recordslist of AnomalyAssessmentRecord
a list of AnomalyAssessmentRecord objects
- get_feature_impact(with_metadata=False, backtest=None, data_slice_filter=<datarobot.models.model.Sentinel object>)
Retrieve the computed Feature Impact results, a measure of the relevance of each feature in the model.
Feature Impact is computed for each column by creating new data with that column randomly permuted (but the others left unchanged), and seeing how the error metric score for the predictions is affected. The ‘impactUnnormalized’ is how much worse the error metric score is when making predictions on this modified data. The ‘impactNormalized’ is normalized so that the largest value is 1. In both cases, larger values indicate more important features.
If a feature is a redundant feature, i.e. once other features are considered it doesn't contribute much in addition, the 'redundantWith' value is the name of the feature that has the highest correlation with this feature. Note that redundancy detection is only available for jobs run after the addition of this feature. When retrieving data that predates this functionality, a NoRedundancyImpactAvailable warning will be issued.
Elsewhere this technique is sometimes called 'Permutation Importance'.
Requires that Feature Impact has already been computed with request_feature_impact.
- Parameters:
- with_metadatabool
The flag indicating if the result should include the metadata as well.
- backtestint or string
The index of the backtest, or the string 'holdout' for the holdout. This is supported only in DatetimeModels.
- data_slice_filterDataSlice, optional
(New in version v3.4) A data slice used to filter the return values based on the dataslice.id. By default, this function will use data_slice_filter.id == None, which returns an unsliced insight. If data_slice_filter is None then get_feature_impact will raise a ValueError.
- Returns:
- list or dict
The feature impact data response depends on the with_metadata parameter. The response is either a dict with metadata and a list with actual data or just a list with that data.
Each list item is a dict with the keys featureName, impactNormalized, impactUnnormalized, redundantWith and count.
For the dict response, the available keys are:
- featureImpacts - Feature Impact data as a dictionary. Each item is a dict with the keys: featureName, impactNormalized, impactUnnormalized, and redundantWith.
- shapBased - A boolean that indicates whether Feature Impact was calculated using Shapley values.
- ranRedundancyDetection - A boolean that indicates whether redundant feature identification was run while calculating this Feature Impact.
- rowCount - An integer or None that indicates the number of rows that was used to calculate Feature Impact. For Feature Impact calculated with the default logic, without specifying the rowCount, None is returned here.
- count - An integer with the number of features under featureImpacts.
- Raises:
- ClientError (404)
If the feature impacts have not been computed.
- request_feature_impact(row_count=None, with_metadata=False, backtest=None, data_slice_filter=<datarobot.models.model.Sentinel object>)
Request feature impacts to be computed for the model.
See get_feature_impact for more information on the result of the job.
- Parameters:
- row_countint
The sample size (specified in rows) to use for Feature Impact computation. This is not supported for unsupervised, multi-class (that has a separate method) and time series projects.
- with_metadatabool
The flag indicating if the result should include the metadata as well.
- backtestint or string
The index of the backtest, or the string 'holdout' for the holdout. This is supported only in DatetimeModels.
- data_slice_filterDataSlice, optional
(New in version v3.4) A data slice used to filter the return values based on the dataslice.id. By default, this function will use data_slice_filter.id == None, which returns an unsliced insight. If data_slice_filter is None then request_feature_impact will raise a ValueError.
- Returns:
- jobJob
A Job representing the feature impact computation. To get the completed feature impact data, use job.get_result or job.get_result_when_complete.
- Raises:
- JobAlreadyRequested (422)
If the feature impacts have already been requested.
- get_or_request_feature_impact(max_wait=600, row_count=None, with_metadata=False, backtest=None, data_slice_filter=<datarobot.models.model.Sentinel object>)
Retrieve feature impact for the model, requesting a job if it hasn’t been run previously
- Parameters:
- max_waitint, optional
The maximum time to wait for a requested feature impact job to complete before erroring
- row_countint
The sample size (specified in rows) to use for Feature Impact computation. This is not supported for unsupervised, multiclass (which has a separate method), or time series projects.
- with_metadatabool
The flag indicating if the result should include the metadata as well.
- backteststr
Feature Impact backtest. Can be ‘holdout’ or numbers from 0 up to max number of backtests in project.
- data_slice_filterDataSlice, optional
(New in version v3.4) A data slice used to filter the return values based on the dataslice.id. By default, this function will use data_slice_filter.id == None which returns an unsliced insight. If data_slice_filter is None then get_or_request_feature_impact will raise a ValueError.
- Returns:
- feature_impactslist or dict
The feature impact data. See
get_feature_impact
for the exact schema.
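For example, a minimal sketch of computing and retrieving Feature Impact with metadata (the project and model IDs are hypothetical):
import datarobot as dr

model = dr.Model.get('project-id', 'model-id')  # hypothetical IDs

# Compute Feature Impact if it hasn't been run yet, and wait for the result
feature_impacts = model.get_or_request_feature_impact(with_metadata=True)

# With metadata requested, the response is a dict as described above
print(feature_impacts['ranRedundancyDetection'])
for fi in feature_impacts['featureImpacts']:
    print(fi['featureName'], fi['impactNormalized'])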
- request_lift_chart(source=None, backtest_index=None, data_slice_filter=<datarobot.models.model.Sentinel object>)
(New in version v3.4) Request the model Lift Chart for the specified backtest data slice.
- Parameters:
- sourcestr
(Deprecated in version v3.4) Lift chart data source. Check datarobot.enums.CHART_DATA_SOURCE for possible values. If backtest_index is present then this will be ignored.
- backtest_indexstr
Lift chart data backtest. Can be ‘holdout’ or numbers from 0 up to max number of backtests in project.
- data_slice_filterDataSlice, optional
A data slice used to filter the return values based on the dataslice.id. By default this function will use data_slice_filter.id == None which returns an unsliced insight. If data_slice_filter is None then request_lift_chart will raise a ValueError.
- Returns:
- status_check_jobStatusCheckJob
The returned object contains all the logic needed to periodically check the status of the async job.
- Return type:
- get_lift_chart(source=None, backtest_index=None, fallback_to_parent_insights=False, data_slice_filter=<datarobot.models.model.Sentinel object>)
(New in version v3.4) Retrieve the model Lift chart for the specified backtest and data slice.
- Parameters:
- sourcestr
(Deprecated in version v3.4) Lift chart data source. Check datarobot.enums.CHART_DATA_SOURCE for possible values. For time series and OTV models, also accepts values backtest_2, backtest_3, …, up to the number of backtests in the model. If backtest_index is present then this will be ignored.
- backtest_indexstr
Lift chart data backtest. Can be ‘holdout’ or numbers from 0 up to max number of backtests in project.
- fallback_to_parent_insightsbool
Optional, if True, this will return lift chart data for this model’s parent if the lift chart is not available for this model and the model has a defined parent model. If omitted or False, or there is no parent model, will not attempt to return insight data from this model’s parent.
- data_slice_filterDataSlice, optional
A data slice used to filter the return values based on the dataslice.id. By default this function will use data_slice_filter.id == None which returns an unsliced insight. If data_slice_filter is None then get_lift_chart will raise a ValueError.
- Returns:
- LiftChart
Model lift chart data
- Raises:
- ClientError
If the insight is not available for this model
- ValueError
If data_slice_filter is passed as None
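A minimal sketch of requesting and then retrieving a holdout Lift chart (IDs are hypothetical, and it assumes StatusCheckJob.wait_for_completion is available):
import datarobot as dr

model = dr.Model.get('project-id', 'model-id')  # hypothetical IDs

# Kick off computation for the holdout backtest, then poll until done
status_check_job = model.request_lift_chart(backtest_index='holdout')
status_check_job.wait_for_completion()

# Retrieve the unsliced insight (the default data_slice_filter)
lift_chart = model.get_lift_chart(backtest_index='holdout')
print(lift_chart.bins[:3])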
- request_roc_curve(source=None, backtest_index=None, data_slice_filter=<datarobot.models.model.Sentinel object>)
(New in version v3.4) Request the binary model ROC curve for the specified backtest and data slice.
- Parameters:
- sourcestr
(Deprecated in version v3.4) Roc Curve data source. Check datarobot.enums.CHART_DATA_SOURCE for possible values. If backtest_index is present then this will be ignored.
- backtest_indexstr
ROC curve data backtest. Can be ‘holdout’ or numbers from 0 up to max number of backtests in project.
- data_slice_filterDataSlice, optional
A data slice used to filter the return values based on the dataslice.id. By default this function will use data_slice_filter.id == None which returns an unsliced insight. If data_slice_filter is None then request_roc_curve will raise a ValueError.
- Returns:
- status_check_jobStatusCheckJob
The returned object contains all the logic needed to periodically check the status of the async job.
- Return type:
- get_roc_curve(source=None, backtest_index=None, fallback_to_parent_insights=False, data_slice_filter=<datarobot.models.model.Sentinel object>)
(New in version v3.4) Retrieve the ROC curve for a binary model for the specified backtest and data slice.
- Parameters:
- sourcestr
(Deprecated in version v3.4) ROC curve data source. Check datarobot.enums.CHART_DATA_SOURCE for possible values. For time series and OTV models, also accepts values backtest_2, backtest_3, …, up to the number of backtests in the model. If backtest_index is present then this will be ignored.
- backtest_indexstr
ROC curve data backtest. Can be ‘holdout’ or numbers from 0 up to max number of backtests in project.
- fallback_to_parent_insightsbool
Optional, if True, this will return ROC curve data for this model’s parent if the ROC curve is not available for this model and the model has a defined parent model. If omitted or False, or there is no parent model, will not attempt to return data from this model’s parent.
- data_slice_filterDataSlice, optional
A data slice used to filter the return values based on the dataslice.id. By default, this function will use data_slice_filter.id == None which returns an unsliced insight. If data_slice_filter is None then get_roc_curve will raise a ValueError.
- Returns:
- RocCurve
Model ROC curve data
- Raises:
- ClientError
If the insight is not available for this model
- TypeError
If the underlying project type is multilabel
- ValueError
If data_slice_filter is passed as None
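A minimal sketch of retrieving a sliced ROC curve (the IDs are hypothetical):
import datarobot as dr

model = dr.Model.get('project-id', 'model-id')  # hypothetical IDs

# Filter the insight to a specific data slice
ds_filter = dr.DataSlice(id='data-slice-id')  # hypothetical slice ID
roc = model.get_roc_curve(backtest_index='holdout',
                          data_slice_filter=ds_filter)
print(roc.roc_points[0])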
- advanced_tune(params, description=None)
Generate a new model with the specified advanced-tuning parameters
As of v2.17, all models other than blenders, open source, prime, baseline and user-created support Advanced Tuning.
- Parameters:
- paramsdict
Mapping of parameter ID to parameter value. The list of valid parameter IDs for a model can be found by calling get_advanced_tuning_parameters(). This endpoint does not need to include values for all parameters. If a parameter is omitted, its current_value will be used.
- descriptionstr
Human-readable string describing the newly advanced-tuned model
- Returns:
- ModelJob
The created job to build the model
- Return type:
- delete()
Delete a model from the project’s leaderboard.
- Return type:
None
- download_scoring_code(file_name, source_code=False)
Download the Scoring Code JAR.
- Parameters:
- file_namestr
File path where scoring code will be saved.
- source_codebool, optional
Set to True to download source code archive. It will not be executable.
- Return type:
None
- download_training_artifact(file_name)
Retrieve trained artifact(s) from a model containing one or more custom tasks.
Artifact(s) will be downloaded to the specified local filepath.
- Parameters:
- file_namestr
File path where trained model artifact(s) will be saved.
- classmethod from_data(data)
Instantiate an object of this class using a dict.
- Parameters:
- datadict
Correctly snake_cased keys and their values.
- Return type:
TypeVar(T, bound=APIObject)
- get_advanced_tuning_parameters()
Get the advanced-tuning parameters available for this model.
As of v2.17, all models other than blenders, open source, prime, baseline and user-created support Advanced Tuning.
- Returns:
- dict
A dictionary describing the advanced-tuning parameters for the current model. There are two top-level keys, tuning_description and tuning_parameters.
tuning_description is an optional value. If not None, it indicates the user-specified description of this set of tuning parameters.
tuning_parameters is a list of dicts, each of which has the following keys:
parameter_name : (str) name of the parameter (unique per task, see below)
parameter_id : (str) opaque ID string uniquely identifying parameter
default_value : (*) the actual value used to train the model; either the single value of the parameter specified before training, or the best value from the list of grid-searched values (based on current_value)
current_value : (*) the single value or list of values of the parameter that were grid searched. Depending on the grid search specification, could be a single fixed value (no grid search), a list of discrete values, or a range.
task_name : (str) name of the task that this parameter belongs to
constraints: (dict) see the notes below
vertex_id: (str) ID of vertex that this parameter belongs to
- Return type:
Notes
The type of default_value and current_value is defined by the constraints structure. It will be a string or numeric Python type.
constraints is a dict with at least one, possibly more, of the following keys. The presence of a key indicates that the parameter may take on the specified type. (If a key is absent, this means that the parameter may not take on the specified type.) If a key on constraints is present, its value will be a dict containing all of the fields described below for that key.
"constraints": { "select": { "values": [<list(basestring or number) : possible values>] }, "ascii": {}, "unicode": {}, "int": { "min": <int : minimum valid value>, "max": <int : maximum valid value>, "supports_grid_search": <bool : True if Grid Search may be requested for this param> }, "float": { "min": <float : minimum valid value>, "max": <float : maximum valid value>, "supports_grid_search": <bool : True if Grid Search may be requested for this param> }, "intList": { "min_length": <int : minimum valid length>, "max_length": <int : maximum valid length> "min_val": <int : minimum valid value>, "max_val": <int : maximum valid value> "supports_grid_search": <bool : True if Grid Search may be requested for this param> }, "floatList": { "min_length": <int : minimum valid length>, "max_length": <int : maximum valid length> "min_val": <float : minimum valid value>, "max_val": <float : maximum valid value> "supports_grid_search": <bool : True if Grid Search may be requested for this param> } }
The keys have meaning as follows:
select: Rather than specifying a specific data type, if present, it indicates that the parameter is permitted to take on any of the specified values. Listed values may be of any string or real (non-complex) numeric type.
ascii: The parameter may be a unicode object that encodes simple ASCII characters. (A-Z, a-z, 0-9, whitespace, and certain common symbols.) In addition to listed constraints, ASCII keys currently may not contain either newlines or semicolons.
unicode: The parameter may be any Python unicode object.
int: The value may be an object of type int within the specified range (inclusive). Please note that the value will be passed around using the JSON format, and some JSON parsers have undefined behavior with integers outside of the range [-(2**53)+1, (2**53)-1].
float: The value may be an object of type float within the specified range (inclusive).
intList, floatList: The value may be a list of int or float objects, respectively, following constraints as specified respectively by the int and float types (above).
Many parameters only specify one key under constraints. If a parameter specifies multiple keys, the parameter may take on any value permitted by any key.
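As a hedged illustration of how these structures are used together (the parameter values below are placeholders, not recommendations):
params_info = model.get_advanced_tuning_parameters()
print(params_info['tuning_description'])

# Pick one tunable parameter and inspect its constraints
param = params_info['tuning_parameters'][0]
print(param['parameter_name'], param['constraints'])

# Re-submit its current value under a new description; in practice you
# would choose a new value permitted by the constraints above
job = model.advanced_tune({param['parameter_id']: param['current_value']},
                          description='tuning sketch')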
- get_all_confusion_charts(fallback_to_parent_insights=False)
Retrieve a list of all confusion matrices available for the model.
- Parameters:
- fallback_to_parent_insightsbool
(New in version v2.14) Optional, if True, this will return confusion chart data for this model’s parent for any source that is not available for this model and if this has a defined parent model. If omitted or False, or this model has no parent, this will not attempt to retrieve any data from this model’s parent.
- Returns:
- list of ConfusionChart
Data for all available confusion charts for model.
- get_all_feature_impacts(data_slice_filter=None)
Retrieve a list of all feature impact results available for the model.
- Parameters:
- data_slice_filterDataSlice, optional
A dataslice used to filter the return values based on the dataslice.id. By default, this function will use data_slice_filter.id == None which returns an unsliced insight. If data_slice_filter is None then no data_slice filtering will be applied when requesting the feature impacts.
- Returns:
- list of dicts
Data for all available model feature impacts, or an empty list if no data is found.
Examples
model = datarobot.Model(id='model-id', project_id='project-id')

# Get feature impact insights for sliced data
data_slice = datarobot.DataSlice(id='data-slice-id')
sliced_fi = model.get_all_feature_impacts(data_slice_filter=data_slice)

# Get feature impact insights for unsliced data
data_slice = datarobot.DataSlice()
unsliced_fi = model.get_all_feature_impacts(data_slice_filter=data_slice)

# Get all feature impact insights
all_fi = model.get_all_feature_impacts()
- get_all_lift_charts(fallback_to_parent_insights=False, data_slice_filter=None)
Retrieve a list of all Lift charts available for the model.
- Parameters:
- fallback_to_parent_insightsbool, optional
(New in version v2.14) Optional, if True, this will return lift chart data for this model’s parent for any source that is not available for this model and if this model has a defined parent model. If omitted or False, or this model has no parent, this will not attempt to retrieve any data from this model’s parent.
- data_slice_filterDataSlice, optional
Filters the returned lift chart by data_slice_filter.id. If None (the default) applies no filter based on data_slice_id.
- Returns:
- list of LiftChart
Data for all available model lift charts, or an empty list if no data is found.
Examples
model = datarobot.Model.get('project-id', 'model-id')

# Get lift chart insights for sliced data
data_slice = datarobot.DataSlice(id='data-slice-id')
sliced_lift_charts = model.get_all_lift_charts(data_slice_filter=data_slice)

# Get lift chart insights for unsliced data
data_slice = datarobot.DataSlice()
unsliced_lift_charts = model.get_all_lift_charts(data_slice_filter=data_slice)

# Get all lift chart insights
all_lift_charts = model.get_all_lift_charts()
- get_all_multiclass_lift_charts(fallback_to_parent_insights=False)
Retrieve a list of all Lift charts available for the model.
- Parameters:
- fallback_to_parent_insightsbool
(New in version v2.14) Optional, if True, this will return lift chart data for this model’s parent for any source that is not available for this model and if this model has a defined parent model. If omitted or False, or this model has no parent, this will not attempt to retrieve any data from this model’s parent.
- Returns:
- list of LiftChart
Data for all available model lift charts.
- get_all_residuals_charts(fallback_to_parent_insights=False, data_slice_filter=None)
Retrieve a list of all residuals charts available for the model.
- Parameters:
- fallback_to_parent_insightsbool
Optional, if True, this will return residuals chart data for this model’s parent for any source that is not available for this model and if this model has a defined parent model. If omitted or False, or this model has no parent, this will not attempt to retrieve any data from this model’s parent.
- data_slice_filterDataSlice, optional
Filters the returned residuals charts by data_slice_filter.id. If None (the default) applies no filter based on data_slice_id.
- Returns:
- list of ResidualsChart
Data for all available model residuals charts.
Examples
model = datarobot.Model.get('project-id', 'model-id')

# Get residuals chart insights for sliced data
data_slice = datarobot.DataSlice(id='data-slice-id')
sliced_residuals_charts = model.get_all_residuals_charts(data_slice_filter=data_slice)

# Get residuals chart insights for unsliced data
data_slice = datarobot.DataSlice()
unsliced_residuals_charts = model.get_all_residuals_charts(data_slice_filter=data_slice)

# Get all residuals chart insights
all_residuals_charts = model.get_all_residuals_charts()
- get_all_roc_curves(fallback_to_parent_insights=False, data_slice_filter=None)
Retrieve a list of all ROC curves available for the model.
- Parameters:
- fallback_to_parent_insightsbool
(New in version v2.14) Optional, if True, this will return ROC curve data for this model’s parent for any source that is not available for this model and if this model has a defined parent model. If omitted or False, or this model has no parent, this will not attempt to retrieve any data from this model’s parent.
- data_slice_filterDataSlice, optional
Filters the returned roc_curve by data_slice_filter.id. If None (the default), applies no filter based on data_slice_id.
- Returns:
- list of RocCurve
Data for all available model ROC curves, or an empty list if no RocCurves are found.
Examples
model = datarobot.Model.get('project-id', 'model-id')

# Get ROC curve insights for sliced data
ds_filter = DataSlice(id='data-slice-id')
sliced_roc = model.get_all_roc_curves(data_slice_filter=ds_filter)

# Get ROC curve insights for unsliced data
ds_filter = DataSlice(id=None)
unsliced_roc = model.get_all_roc_curves(data_slice_filter=ds_filter)

# Get all ROC curve insights
all_roc_curves = model.get_all_roc_curves()
- get_confusion_chart(source, fallback_to_parent_insights=False)
Retrieve a multiclass model’s confusion matrix for the specified source.
- Parameters:
- sourcestr
Confusion chart source. Check datarobot.enums.CHART_DATA_SOURCE for possible values.
- fallback_to_parent_insightsbool
(New in version v2.14) Optional, if True, this will return confusion chart data for this model’s parent if the confusion chart is not available for this model and the model has a defined parent model. If omitted or False, or there is no parent model, will not attempt to return insight data from this model’s parent.
- Returns:
- ConfusionChart
Model ConfusionChart data
- Raises:
- ClientError
If the insight is not available for this model
- get_cross_class_accuracy_scores()
Retrieves a list of Cross Class Accuracy scores for the model.
- Returns:
- json
- get_data_disparity_insights(feature, class_name1, class_name2)
Retrieve a list of Cross Class Data Disparity insights for the model.
- Parameters:
- featurestr
Bias and Fairness protected feature name.
- class_name1str
One of the compared classes
- class_name2str
Another compared class
- Returns:
- json
- get_fairness_insights(fairness_metrics_set=None, offset=0, limit=100)
Retrieve a list of Per Class Bias insights for the model.
- Parameters:
- fairness_metrics_setstr, optional
Can be one of <datarobot.enums.FairnessMetricsSet>. The fairness metric used to calculate the fairness scores.
- offsetint, optional
Number of items to skip.
- limitint, optional
Number of items to return.
- Returns:
- json
- get_features_used()
Query the server to determine which features were used.
Note that the data returned by this method is possibly different than the names of the features in the featurelist used by this model. This method will return the raw features that must be supplied in order for predictions to be generated on a new set of data. The featurelist, in contrast, would also include the names of derived features.
- Returns:
- featureslist of str
The names of the features used in the model.
- Return type:
List[str]
- get_frozen_child_models()
Retrieve the IDs for all models that are frozen from this model.
- Returns:
- A list of Models
- get_labelwise_roc_curves(source, fallback_to_parent_insights=False)
Retrieve a list of LabelwiseRocCurve instances for a multilabel model for the given source and all labels. This method is valid only for multilabel projects. For binary projects, use the Model.get_roc_curve API.
Added in version v2.24.
- Parameters:
- sourcestr
ROC curve data source. Check datarobot.enums.CHART_DATA_SOURCE for possible values.
- fallback_to_parent_insightsbool
Optional, if True, this will return ROC curve data for this model’s parent if the ROC curve is not available for this model and the model has a defined parent model. If omitted or False, or there is no parent model, will not attempt to return data from this model’s parent.
- Returns:
- list of LabelwiseRocCurve
Labelwise ROC Curve instances for source and all labels
- Raises:
- ClientError
If the insight is not available for this model
- get_missing_report_info()
Retrieve a report on missing training data that can be used to understand missing values treatment in the model. The report consists of missing value resolutions for the numeric and categorical features that were part of building the model.
- Returns:
- An iterable of MissingReportPerFeature
The queried model missing report, sorted by missing count (DESCENDING order).
- get_model_blueprint_chart()
Retrieve a diagram that can be used to understand data flow in the blueprint.
- Returns:
- ModelBlueprintChart
The queried model blueprint chart.
- get_model_blueprint_documents()
Get documentation for tasks used in this model.
- Returns:
- list of BlueprintTaskDocument
All documents available for the model.
- get_model_blueprint_json()
Get the blueprint json representation used by this model.
- Returns:
- BlueprintJson
Json representation of the blueprint stages.
- Return type:
Dict[str, Tuple[List[str], List[str], str]]
- get_multiclass_feature_impact()
For multiclass models it’s possible to calculate feature impact separately for each target class. The method of calculation is exactly the same, applied in one-vs-all style for each target class.
Requires that Feature Impact has already been computed with request_feature_impact.
- Returns:
- feature_impactslist of dict
The feature impact data. Each item is a dict with the keys ‘featureImpacts’ (list) and ‘class’ (str). Each item in ‘featureImpacts’ is a dict with the keys ‘featureName’, ‘impactNormalized’, ‘impactUnnormalized’, and ‘redundantWith’.
- Raises:
- ClientError (404)
If the multiclass feature impacts have not been computed.
- get_multiclass_lift_chart(source, fallback_to_parent_insights=False)
Retrieve model Lift chart for the specified source.
- Parameters:
- sourcestr
Lift chart data source. Check datarobot.enums.CHART_DATA_SOURCE for possible values.
- fallback_to_parent_insightsbool
Optional, if True, this will return lift chart data for this model’s parent if the lift chart is not available for this model and the model has a defined parent model. If omitted or False, or there is no parent model, will not attempt to return insight data from this model’s parent.
- Returns:
- list of LiftChart
Model lift chart data for each saved target class
- Raises:
- ClientError
If the insight is not available for this model
- get_multilabel_lift_charts(source, fallback_to_parent_insights=False)
Retrieve model Lift charts for the specified source.
Added in version v2.24.
- Parameters:
- sourcestr
Lift chart data source. Check datarobot.enums.CHART_DATA_SOURCE for possible values.
- fallback_to_parent_insightsbool
Optional, if True, this will return lift chart data for this model’s parent if the lift chart is not available for this model and the model has a defined parent model. If omitted or False, or there is no parent model, will not attempt to return insight data from this model’s parent.
- Returns:
- list of LiftChart
Model lift chart data for each saved target class
- Raises:
- ClientError
If the insight is not available for this model
- get_num_iterations_trained()
Retrieves the number of estimators trained by early-stopping tree-based models.
Added in version v2.22.
- Returns:
- projectId: str
id of project containing the model
- modelId: str
id of the model
- data: array
list of numEstimatorsItem objects, one for each modeling stage.
- numEstimatorsItem will be of the form:
- stage: str
indicates the modeling stage (for multi-stage models); None for single-stage models
- numIterations: int
the number of estimators or iterations trained by the model
- get_parameters()
Retrieve model parameters.
- Returns:
- ModelParameters
Model parameters for this model.
- get_pareto_front()
Retrieve the Pareto Front for a Eureqa model.
This method is only supported for Eureqa models.
- Returns:
- ParetoFront
Model ParetoFront data
- get_prime_eligibility()
Check if this model can be approximated with DataRobot Prime
- Returns:
- prime_eligibilitydict
a dict indicating whether a model can be approximated with DataRobot Prime (key can_make_prime) and why it may be ineligible (key message)
- get_residuals_chart(source, fallback_to_parent_insights=False, data_slice_filter=<datarobot.models.model.Sentinel object>)
Retrieve model residuals chart for the specified source.
- Parameters:
- sourcestr
Residuals chart data source. Check datarobot.enums.CHART_DATA_SOURCE for possible values.
- fallback_to_parent_insightsbool
Optional, if True, this will return residuals chart data for this model’s parent if the residuals chart is not available for this model and the model has a defined parent model. If omitted or False, or there is no parent model, will not attempt to return residuals data from this model’s parent.
- data_slice_filterDataSlice, optional
A dataslice used to filter the return values based on the dataslice.id. By default this function will use data_slice_filter.id == None which returns an unsliced insight. If data_slice_filter is None then get_residuals_chart will raise a ValueError.
- Returns:
- ResidualsChart
Model residuals chart data
- Raises:
- ClientError
If the insight is not available for this model
- ValueError
If data_slice_filter is passed as None
- get_rulesets()
List the rulesets approximating this model generated by DataRobot Prime
If this model hasn’t been approximated yet, will return an empty list. Note that these are rulesets approximating this model, not rulesets used to construct this model.
- Returns:
- rulesetslist of Ruleset
- Return type:
List[Ruleset]
- get_supported_capabilities()
Retrieves a summary of the capabilities supported by a model.
Added in version v2.14.
- Returns:
- supportsBlending: bool
whether the model supports blending
- supportsMonotonicConstraints: bool
whether the model supports monotonic constraints
- hasWordCloud: bool
whether the model has word cloud data available
- eligibleForPrime: bool
(Deprecated in version v3.6) whether the model is eligible for Prime
- hasParameters: bool
whether the model has parameters that can be retrieved
- supportsCodeGeneration: bool
(New in version v2.18) whether the model supports code generation
- supportsShap: bool
(New in version v2.18) True if the model supports the Shapley package, i.e. Shapley-based feature importance.
- supportsEarlyStopping: bool
(New in version v2.22) True if this is an early stopping tree-based model and number of trained iterations can be retrieved.
- get_uri()
- Returns:
- urlstr
Permanent static hyperlink to this model in the leaderboard.
- Return type:
str
- get_word_cloud(exclude_stop_words=False)
Retrieve word cloud data for the model.
- Parameters:
- exclude_stop_wordsbool, optional
Set to True if you want stopwords filtered out of response.
- Returns:
- WordCloud
Word cloud data for the model.
- incremental_train(data_stage_id, training_data_name=None)
Submit a job to the queue to perform incremental training on an existing model. See train_incremental documentation.
- Return type:
- classmethod list(project_id, sort_by_partition='validation', sort_by_metric=None, with_metric=None, search_term=None, featurelists=None, families=None, blueprints=None, labels=None, characteristics=None, training_filters=None, number_of_clusters=None, limit=100, offset=0)
Retrieve paginated model records, sorted by scores, with optional filtering.
- Parameters:
- sort_by_partition: str, one of `validation`, `backtesting`, `crossValidation` or `holdout`
Set the partition to use for sorted (by score) list of models. validation is the default.
- sort_by_metric: str
Set the project metric to use for model sorting. DataRobot-selected project optimization metric is the default.
- with_metric: str
For a single-metric list of results, specify that project metric.
- search_term: str
If specified, only models containing the term in their name or processes are returned.
- featurelists: list of str
If specified, only models trained on selected featurelists are returned.
- families: list of str
If specified, only models belonging to selected families are returned.
- blueprints: list of str
If specified, only models trained on specified blueprint IDs are returned.
- labels: list of str, `starred` or `prepared for deployment`
If specified, only models tagged with all listed labels are returned.
- characteristics: list of str
If specified, only models matching all listed characteristics are returned.
- training_filters: list of str
If specified, only models matching at least one of the listed training conditions are returned. The following formats are supported. For AutoML and datetime partitioned projects: the number of rows in the training subset. For datetime partitioned projects: <training duration>, for example P6Y0M0D; <training_duration>-<time_window_sample_percent>-<sampling_method>, for example P6Y0M0D-78-Random (returns models trained on 6 years of data, with a 78% sampling rate and random sampling); start/end date; and project settings.
- number_of_clusters: list of int
Filter models by number of clusters. Applicable only in unsupervised clustering projects.
- limit: int
- offset: int
- Returns:
- generic_models: list of GenericModel
- Return type:
List[GenericModel]
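For example, a minimal sketch of listing model records (the project ID is hypothetical):
import datarobot as dr

# Top ten models by validation score of the project metric
models = dr.Model.list('project-id', sort_by_partition='validation', limit=10)

# Only starred models whose name or processes mention "Gradient"
starred = dr.Model.list('project-id', search_term='Gradient', labels=['starred'])
for m in models:
    print(m.model_type, m.metrics)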
- open_in_browser()
Opens the model’s relevant web browser location. If a default browser is not available, the URL is logged.
Note: If a text-mode browser is used, the calling process will block until the user exits the browser.
- Return type:
None
- request_approximation()
Request an approximation of this model using DataRobot Prime
This will create several rulesets that could be used to approximate this model. After comparing their scores and rule counts, the code used in the approximation can be downloaded and run locally.
- Returns:
- jobJob
the job generating the rulesets
- request_cross_class_accuracy_scores()
Request Cross Class Accuracy scores to be computed for the model.
- Returns:
- status_idstr
The statusId of the computation request.
- request_data_disparity_insights(feature, compared_class_names)
Request data disparity insights to be computed for the model.
- Parameters:
- featurestr
Bias and Fairness protected feature name.
- compared_class_nameslist(str)
List of two classes to compare
- Returns:
- status_idstr
The statusId of the computation request.
- request_external_test(dataset_id, actual_value_column=None)
Request external test to compute scores and insights on an external test dataset
- Parameters:
- dataset_idstring
The dataset to make predictions against (as uploaded from Project.upload_dataset)
- actual_value_columnstring, optional
(New in version v2.21) For time series unsupervised projects only. Actual value column can be used to calculate the classification metrics and insights on the prediction dataset. Can’t be provided with the forecast_point parameter.
- Returns:
- jobJob
a Job representing external dataset insights computation
- request_fairness_insights(fairness_metrics_set=None)
Request fairness insights to be computed for the model.
- Parameters:
- fairness_metrics_setstr, optional
Can be one of <datarobot.enums.FairnessMetricsSet>. The fairness metric used to calculate the fairness scores.
- Returns:
- status_idstr
The statusId of the computation request.
- Return type:
str
- request_frozen_datetime_model(training_row_count=None, training_duration=None, training_start_date=None, training_end_date=None, time_window_sample_pct=None, sampling_method=None)
Train a new frozen model with parameters from this model.
Requires that this model belongs to a datetime partitioned project. If it does not, an error will occur when submitting the job.
Frozen models use the same tuning parameters as their parent model instead of independently optimizing them to allow efficiently retraining models on larger amounts of the training data.
In addition to training_row_count and training_duration, frozen datetime models may be trained on an exact date range. Only one of training_row_count, training_duration, or training_start_date and training_end_date should be specified.
Models specified using training_start_date and training_end_date are the only ones that can be trained into the holdout data (once the holdout is unlocked).
All durations should be specified with a duration string such as those returned by the partitioning_methods.construct_duration_string helper method. Please see datetime partitioned project documentation for more information on duration strings.
- Parameters:
- training_row_countint, optional
the number of rows of data that should be used to train the model. If specified, training_duration may not be specified.
- training_durationstr, optional
a duration string specifying what time range the data used to train the model should span. If specified, training_row_count may not be specified.
- training_start_datedatetime.datetime, optional
the start date of the data to train the model on. Only rows occurring at or after this datetime will be used. If training_start_date is specified, training_end_date must also be specified.
- training_end_datedatetime.datetime, optional
the end date of the data to train the model on. Only rows occurring strictly before this datetime will be used. If training_end_date is specified, training_start_date must also be specified.
- time_window_sample_pctint, optional
may only be specified when the requested model is a time window (e.g. duration or start and end dates). An integer between 1 and 99 indicating the percentage to sample by within the window. The points kept are determined by a random uniform sample. If specified, training_duration must also be specified; otherwise, an error will occur.
- sampling_methodstr, optional
(New in version v2.23) defines the way training data is selected. Can be either random or latest. In combination with training_row_count, defines how rows are selected from the backtest (latest by default). When training data is defined using a time range (training_duration or use_project_settings), this setting changes the way time_window_sample_pct is applied (random by default). Applicable to OTV projects only.
- Returns:
- model_jobModelJob
the modeling job training a frozen model
- Return type:
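A minimal sketch of training a frozen model on an exact date range (the dates are hypothetical):
from datetime import datetime

# Train into an exact window; requires the holdout to be unlocked if the
# range overlaps it
model_job = model.request_frozen_datetime_model(
    training_start_date=datetime(2020, 1, 1),
    training_end_date=datetime(2022, 1, 1),
)
frozen_model = model_job.get_result_when_complete()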
- request_per_class_fairness_insights(fairness_metrics_set=None)
Request per-class fairness insights be computed for the model.
- Parameters:
- fairness_metrics_setstr, optional
The fairness metric used to calculate the fairness scores. Value can be any one of <datarobot.enums.FairnessMetricsSet>.
- Returns:
- status_check_jobStatusCheckJob
The returned object contains all the logic needed to periodically check the status of the async job.
- Return type:
- request_predictions(dataset_id=None, dataset=None, dataframe=None, file_path=None, file=None, include_prediction_intervals=None, prediction_intervals_size=None, forecast_point=None, predictions_start_date=None, predictions_end_date=None, actual_value_column=None, explanation_algorithm=None, max_explanations=None, max_ngram_explanations=None)
Requests predictions against a previously uploaded dataset.
- Parameters:
- dataset_idstring, optional
The ID of the dataset to make predictions against (as uploaded from Project.upload_dataset)
- datasetDataset, optional
The dataset to make predictions against (as uploaded from Project.upload_dataset)
- dataframepd.DataFrame, optional
(New in v3.0) The dataframe to make predictions against
- file_pathstr, optional
(New in v3.0) Path to file to make predictions against
- fileIOBase, optional
(New in v3.0) File to make predictions against
- include_prediction_intervalsbool, optional
(New in v2.16) For time series projects only. Specifies whether prediction intervals should be calculated for this request. Defaults to True if prediction_intervals_size is specified, otherwise defaults to False.
- prediction_intervals_sizeint, optional
(New in v2.16) For time series projects only. Represents the percentile to use for the size of the prediction intervals. Defaults to 80 if include_prediction_intervals is True. Prediction intervals size must be between 1 and 100 (inclusive).
- forecast_pointdatetime.datetime or None, optional
(New in version v2.20) For time series projects only. This is the default point relative to which predictions will be generated, based on the forecast window of the project. See the time series prediction documentation for more information.
- predictions_start_datedatetime.datetime or None, optional
(New in version v2.20) For time series projects only. The start date for bulk predictions. Note that this parameter is for generating historical predictions using the training data. This parameter should be provided in conjunction with predictions_end_date. Can’t be provided with the forecast_point parameter.
- predictions_end_datedatetime.datetime or None, optional
(New in version v2.20) For time series projects only. The end date for bulk predictions, exclusive. Note that this parameter is for generating historical predictions using the training data. This parameter should be provided in conjunction with predictions_start_date. Can’t be provided with the forecast_point parameter.
- actual_value_columnstring, optional
(New in version v2.21) For time series unsupervised projects only. Actual value column can be used to calculate the classification metrics and insights on the prediction dataset. Can’t be provided with the forecast_point parameter.
- explanation_algorithmstr, optional
(New in version v2.21) If set to ‘shap’, the response will include prediction explanations based on the SHAP explainer (SHapley Additive exPlanations). Defaults to null (no prediction explanations).
- max_explanationsint, optional
(New in version v2.21) Specifies the maximum number of explanation values that should be returned for each row, ordered by absolute value, greatest to least. If null, no limit. In the case of ‘shap’: if the number of features is greater than the limit, the sum of remaining values will also be returned as shapRemainingTotal. Defaults to null. Cannot be set if explanation_algorithm is omitted.
- max_ngram_explanationsint or str, optional
(New in version v2.29) Specifies the maximum number of text explanation values that should be returned. If set to ‘all’, text explanations will be computed and all ngram explanations will be returned. If set to a positive integer, text explanations will be computed and that many ngram explanations, sorted in descending order of importance, will be returned. By default, text explanations are not computed.
- Returns:
- jobPredictJob
The job computing the predictions
- Return type:
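For instance, a hedged sketch of requesting predictions with SHAP explanations against a previously uploaded dataset (the file path is hypothetical and 'project' is assumed to be an existing datarobot.Project instance):
# Upload the scoring data, then request predictions with explanations
dataset = project.upload_dataset('./to_predict.csv')  # hypothetical file
predict_job = model.request_predictions(
    dataset_id=dataset.id,
    explanation_algorithm='shap',
    max_explanations=5,
)
predictions = predict_job.get_result_when_complete()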
- request_residuals_chart(source, data_slice_id=None)
Request the model residuals chart for the specified source.
- Parameters:
- sourcestr
Residuals chart data source. Check datarobot.enums.CHART_DATA_SOURCE for possible values.
- data_slice_idstring, optional
ID for the data slice used in the request. If None, request unsliced insight data.
- Returns:
- status_check_jobStatusCheckJob
The returned object contains all the logic needed to periodically check the status of the async job.
- Return type:
- set_prediction_threshold(threshold)
Set a custom prediction threshold for the model.
May not be used once prediction_threshold_read_only is True for this model.
- Parameters:
- thresholdfloat
Only used for binary classification projects. The threshold to use when deciding between the positive and negative classes when making predictions. Should be between 0.0 and 1.0 (inclusive).
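For example (the threshold value is illustrative):
# Thresholds can only be changed while no deployment or dedicated-API
# predictions exist for the model
if not model.prediction_threshold_read_only:
    model.set_prediction_threshold(0.6)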
- star_model()
Mark the model as starred.
Model stars propagate to the web application and the API, and can be used to filter when listing models.
- Return type:
None
- start_advanced_tuning_session()
Start an Advanced Tuning session. Returns an object that helps set up arguments for an Advanced Tuning model execution.
As of v2.17, all models other than blenders, open source, prime, baseline and user-created support Advanced Tuning.
- Returns:
- AdvancedTuningSession
Session for setting up and running Advanced Tuning on a model
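A hedged sketch of a session-based tuning run; the task and parameter names below are hypothetical and should be looked up via get_advanced_tuning_parameters():
session = model.start_advanced_tuning_session()
session.set_parameter(
    task_name='Gradient Boosted Trees Classifier',  # hypothetical task
    parameter_name='learning_rate',                 # hypothetical parameter
    value=0.05,
)
job = session.run()  # submits the tuned model build as a ModelJob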
- train_datetime(featurelist_id=None, training_row_count=None, training_duration=None, time_window_sample_pct=None, monotonic_increasing_featurelist_id=<object object>, monotonic_decreasing_featurelist_id=<object object>, use_project_settings=False, sampling_method=None, n_clusters=None)
Trains this model on a different featurelist or sample size.
Requires that this model is part of a datetime partitioned project; otherwise, an error will occur.
All durations should be specified with a duration string such as those returned by the partitioning_methods.construct_duration_string helper method. Please see datetime partitioned project documentation for more information on duration strings.
- Parameters:
- featurelist_idstr, optional
the featurelist to use to train the model. If not specified, the featurelist of this model is used.
- training_row_countint, optional
the number of rows of data that should be used to train the model. If specified, neither training_duration nor use_project_settings may be specified.
- training_durationstr, optional
a duration string specifying what time range the data used to train the model should span. If specified, neither training_row_count nor use_project_settings may be specified.
- use_project_settingsbool, optional
(New in version v2.20) defaults to False. If True, indicates that the custom backtest partitioning settings specified by the user will be used to train the model and evaluate backtest scores. If specified, neither training_row_count nor training_duration may be specified.
- time_window_sample_pctint, optional
may only be specified when the requested model is a time window (e.g. duration or start and end dates). An integer between 1 and 99 indicating the percentage to sample by within the window. The points kept are determined by a random uniform sample. If specified, training_duration must also be specified; otherwise, an error will occur.
- sampling_methodstr, optional
(New in version v2.23) defines the way training data is selected. Can be either random or latest. In combination with training_row_count, defines how rows are selected from the backtest (latest by default). When training data is defined using a time range (training_duration or use_project_settings), this setting changes the way time_window_sample_pct is applied (random by default). Applicable to OTV projects only.
- monotonic_increasing_featurelist_idstr, optional
(New in version v2.18) optional, the id of the featurelist that defines the set of features with a monotonically increasing relationship to the target. Passing None disables the increasing monotonicity constraint. The default (dr.enums.MONOTONICITY_FEATURELIST_DEFAULT) is the one specified by the blueprint.
- monotonic_decreasing_featurelist_idstr, optional
(New in version v2.18) optional, the id of the featurelist that defines the set of features with a monotonically decreasing relationship to the target. Passing None disables the decreasing monotonicity constraint. The default (dr.enums.MONOTONICITY_FEATURELIST_DEFAULT) is the one specified by the blueprint.
- n_clusters: int, optional
(New in version 2.27) number of clusters to use in an unsupervised clustering model. This parameter is used only for unsupervised clustering models that don’t automatically determine the number of clusters.
- Returns:
- jobModelJob
the created job to build the model
- Return type:
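For example, two hedged retraining sketches for a datetime partitioned project (the row count is illustrative):
# Retrain using the project's custom backtest partitioning settings
model_job = model.train_datetime(use_project_settings=True)

# Or retrain on the latest 10,000 rows within the backtest
model_job = model.train_datetime(training_row_count=10000,
                                 sampling_method='latest')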
- train_incremental(data_stage_id, training_data_name=None, data_stage_encoding=None, data_stage_delimiter=None, data_stage_compression=None)
Submit a job to the queue to perform incremental training on an existing model using additional data. The id of the additional data to use for training is specified with the data_stage_id. Optionally a name for the iteration can be supplied by the user to help identify the contents of data in the iteration.
This functionality requires the INCREMENTAL_LEARNING feature flag to be enabled.
- Parameters:
- data_stage_id: str
The id of the data stage to use for training.
- training_data_namestr, optional
The name of the iteration or data stage to indicate what the incremental learning was performed on.
- data_stage_encodingstr, optional
The encoding type of the data in the data stage (default: UTF-8). Supported formats: UTF-8, ASCII, WINDOWS1252
- data_stage_delimiterstr, optional
The delimiter used by the data in the data stage (default: ‘,’).
- data_stage_compressionstr, optional
The compression type of the data stage file, e.g. ‘zip’ (default: None). Supported formats: zip
- Returns:
- jobModelJob
The created job that is retraining the model
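A minimal sketch (the data stage ID is hypothetical and must reference previously staged data):
job = model.train_incremental(
    data_stage_id='data-stage-id',      # hypothetical staged data
    training_data_name='iteration-2',
    data_stage_encoding='UTF-8',
)
updated_model = job.get_result_when_complete()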
- unstar_model()
Unmark the model as starred.
Model stars propagate to the web application and the API, and can be used to filter when listing models.
- Return type:
None
Frozen Model
- class datarobot.models.FrozenModel(id=None, processes=None, featurelist_name=None, featurelist_id=None, project_id=None, sample_pct=None, model_type=None, model_category=None, is_frozen=None, is_n_clusters_dynamically_determined=None, blueprint_id=None, metrics=None, monotonic_increasing_featurelist_id=None, monotonic_decreasing_featurelist_id=None, n_clusters=None, has_empty_clusters=None, supports_monotonic_constraints=None, is_starred=None, prediction_threshold=None, prediction_threshold_read_only=None, model_number=None, parent_model_id=None, supports_composable_ml=None, training_row_count=None, training_duration=None, training_start_date=None, training_end_date=None, data_selection_method=None, time_window_sample_pct=None, sampling_method=None, model_family_full_name=None, is_trained_into_validation=None, is_trained_into_holdout=None)
Represents a model tuned with parameters which are derived from another model
All durations are specified with a duration string such as those returned by the partitioning_methods.construct_duration_string helper method. Please see datetime partitioned project documentation for more information on duration strings.
- Attributes:
- idstr
the id of the model
- project_idstr
the id of the project the model belongs to
- processeslist of str
the processes used by the model
- featurelist_namestr
the name of the featurelist used by the model
- featurelist_idstr
the id of the featurelist used by the model
- sample_pctfloat
the percentage of the project dataset used in training the model
- training_row_countint or None
the number of rows of the project dataset used in training the model. In a datetime partitioned project, if specified, defines the number of rows used to train the model and evaluate backtest scores; if unspecified, either training_duration or training_start_date and training_end_date was used to determine that instead.
- training_durationstr or None
only present for models in datetime partitioned projects. If specified, a duration string specifying the duration spanned by the data used to train the model and evaluate backtest scores.
- training_start_datedatetime or None
only present for frozen models in datetime partitioned projects. If specified, the start date of the data used to train the model.
- training_end_datedatetime or None
only present for frozen models in datetime partitioned projects. If specified, the end date of the data used to train the model.
- model_typestr
what model this is, e.g. ‘Nystroem Kernel SVM Regressor’
- model_categorystr
what kind of model this is - ‘prime’ for DataRobot Prime models, ‘blend’ for blender models, and ‘model’ for other models
- is_frozenbool
whether this model is a frozen model
- parent_model_idstr
the id of the model that tuning parameters are derived from
- blueprint_idstr
the id of the blueprint used in this model
- metricsdict
a mapping from each metric to the model’s scores for that metric
- monotonic_increasing_featurelist_idstr
optional, the id of the featurelist that defines the set of features with a monotonically increasing relationship to the target. If None, no such constraints are enforced.
- monotonic_decreasing_featurelist_idstr
optional, the id of the featurelist that defines the set of features with a monotonically decreasing relationship to the target. If None, no such constraints are enforced.
- supports_monotonic_constraintsbool
optional, whether this model supports enforcing monotonic constraints
- is_starredbool
whether this model is marked as starred
- prediction_thresholdfloat
for binary classification projects, the threshold used for predictions
- prediction_threshold_read_onlybool
indicates whether modification of the prediction threshold is forbidden. Threshold modification is forbidden once a model has had a deployment created or predictions made via the dedicated prediction API.
- model_numberinteger
model number assigned to a model
- supports_composable_mlbool or None
(New in version v2.26) whether this model is supported in Composable ML.
- classmethod get(project_id, model_id)
Retrieve a specific frozen model.
- Parameters:
- project_idstr
The project’s id.
- model_idstr
The model_id of the leaderboard item to retrieve.
- Returns:
- modelFrozenModel
The queried instance.
RatingTableModel
- class datarobot.models.RatingTableModel(id=None, processes=None, featurelist_name=None, featurelist_id=None, project_id=None, sample_pct=None, model_type=None, model_category=None, is_frozen=None, blueprint_id=None, metrics=None, rating_table_id=None, monotonic_increasing_featurelist_id=None, monotonic_decreasing_featurelist_id=None, supports_monotonic_constraints=None, is_starred=None, prediction_threshold=None, prediction_threshold_read_only=None, model_number=None, parent_model_id=None, supports_composable_ml=None, training_row_count=None, training_duration=None, training_start_date=None, training_end_date=None, data_selection_method=None, time_window_sample_pct=None, sampling_method=None, model_family_full_name=None, is_trained_into_validation=None, is_trained_into_holdout=None)
A model that has a rating table.
All durations are specified with a duration string such as those returned by the partitioning_methods.construct_duration_string helper method. Please see datetime partitioned project documentation for more information on duration strings.
- Attributes:
- idstr
the id of the model
- project_idstr
the id of the project the model belongs to
- processeslist of str
the processes used by the model
- featurelist_namestr
the name of the featurelist used by the model
- featurelist_idstr
the id of the featurelist used by the model
- sample_pctfloat or None
the percentage of the project dataset used in training the model. If the project uses datetime partitioning, the sample_pct will be None. See training_row_count, training_duration, and training_start_date and training_end_date instead.
- training_row_countint or None
the number of rows of the project dataset used in training the model. In a datetime partitioned project, if specified, defines the number of rows used to train the model and evaluate backtest scores; if unspecified, either training_duration or training_start_date and training_end_date was used to determine that instead.
- training_durationstr or None
only present for models in datetime partitioned projects. If specified, a duration string specifying the duration spanned by the data used to train the model and evaluate backtest scores.
- training_start_datedatetime or None
only present for frozen models in datetime partitioned projects. If specified, the start date of the data used to train the model.
- training_end_datedatetime or None
only present for frozen models in datetime partitioned projects. If specified, the end date of the data used to train the model.
- model_typestr
what model this is, e.g. ‘Nystroem Kernel SVM Regressor’
- model_categorystr
what kind of model this is - ‘prime’ for DataRobot Prime models, ‘blend’ for blender models, and ‘model’ for other models
- is_frozenbool
whether this model is a frozen model
- blueprint_idstr
the id of the blueprint used in this model
- metricsdict
a mapping from each metric to the model’s scores for that metric
- rating_table_idstr
the id of the rating table that belongs to this model
- monotonic_increasing_featurelist_idstr
optional, the id of the featurelist that defines the set of features with a monotonically increasing relationship to the target. If None, no such constraints are enforced.
- monotonic_decreasing_featurelist_idstr
optional, the id of the featurelist that defines the set of features with a monotonically decreasing relationship to the target. If None, no such constraints are enforced.
- supports_monotonic_constraintsbool
optional, whether this model supports enforcing monotonic constraints
- is_starredbool
whether this model is marked as starred
- prediction_thresholdfloat
for binary classification projects, the threshold used for predictions
- prediction_threshold_read_onlybool
indicates whether modification of the prediction threshold is forbidden. Threshold modification is forbidden once a model has had a deployment created or predictions made via the dedicated prediction API.
- model_numberinteger
model number assigned to a model
- supports_composable_mlbool or None
(New in version v2.26) whether this model is supported in Composable ML.
- classmethod get(project_id, model_id)
Retrieve a specific rating table model
If the project does not have a rating table, a ClientError will occur.
- Parameters:
- project_idstr
the id of the project the model belongs to
- model_idstr
the id of the model to retrieve
- Returns:
- modelRatingTableModel
the model
- classmethod create_from_rating_table(project_id, rating_table_id)
Creates a new model from a validated rating table record. The RatingTable must not be associated with an existing model.
- Parameters:
- project_idstr
the id of the project the rating table belongs to
- rating_table_idstr
the id of the rating table to create this model from
- Returns:
- job: Job
an instance of created async job
- Raises:
- ClientError (422)
Raised if creating model from a RatingTable that failed validation
- JobAlreadyRequested
Raised if creating model from a RatingTable that is already associated with a RatingTableModel
- Return type:
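For example (the IDs are hypothetical; the rating table must be validated and not yet attached to a model):
import datarobot as dr

job = dr.RatingTableModel.create_from_rating_table(
    project_id='project-id',
    rating_table_id='rating-table-id',
)
rating_table_model = job.get_result_when_complete()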
- advanced_tune(params, description=None)
Generate a new model with the specified advanced-tuning parameters
As of v2.17, all models other than blenders, open source, prime, baseline and user-created support Advanced Tuning.
- Parameters:
- paramsdict
Mapping of parameter ID to parameter value. The list of valid parameter IDs for a model can be found by calling get_advanced_tuning_parameters(). This endpoint does not need to include values for all parameters. If a parameter is omitted, its current_value will be used.
- descriptionstr
Human-readable string describing the newly advanced-tuned model
- Returns:
- ModelJob
The created job to build the model
- Return type:
- cross_validate()
Run cross validation on the model.
Note
To perform Cross Validation on a new model with new parameters, use train instead.
- Returns:
- ModelJob
The created job to build the model
- delete()
Delete a model from the project’s leaderboard.
- Return type:
None
- download_scoring_code(file_name, source_code=False)
Download the Scoring Code JAR.
- Parameters:
- file_namestr
File path where scoring code will be saved.
- source_codebool, optional
Set to True to download source code archive. It will not be executable.
- Return type:
None
- download_training_artifact(file_name)
Retrieve trained artifact(s) from a model containing one or more custom tasks.
Artifact(s) will be downloaded to the specified local filepath.
- Parameters:
- file_namestr
File path where trained model artifact(s) will be saved.
- classmethod from_data(data)
Instantiate an object of this class using a dict.
- Parameters:
- datadict
Correctly snake_cased keys and their values.
- Return type:
TypeVar(T, bound=APIObject)
- get_advanced_tuning_parameters()
Get the advanced-tuning parameters available for this model.
As of v2.17, all models other than blenders, open source, prime, baseline and user-created support Advanced Tuning.
- Returns:
- dict
A dictionary describing the advanced-tuning parameters for the current model. There are two top-level keys, tuning_description and tuning_parameters.
tuning_description is an optional value. If not None, it indicates the user-specified description of this set of tuning parameters.
tuning_parameters is a list of dicts, each of which has the following keys:
parameter_name : (str) name of the parameter (unique per task, see below)
parameter_id : (str) opaque ID string uniquely identifying parameter
default_value : (*) the actual value used to train the model; either the single value of the parameter specified before training, or the best value from the list of grid-searched values (based on current_value)
current_value : (*) the single value or list of values of the parameter that were grid searched. Depending on the grid search specification, could be a single fixed value (no grid search), a list of discrete values, or a range.
task_name : (str) name of the task that this parameter belongs to
constraints: (dict) see the notes below
vertex_id: (str) ID of vertex that this parameter belongs to
- Return type:
dict
Notes
The type of default_value and current_value is defined by the constraints structure. It will be a string or numeric Python type.
constraints is a dict with at least one, possibly more, of the following keys. The presence of a key indicates that the parameter may take on the specified type. (If a key is absent, this means that the parameter may not take on the specified type.) If a key on constraints is present, its value will be a dict containing all of the fields described below for that key.
"constraints": { "select": { "values": [<list(basestring or number) : possible values>] }, "ascii": {}, "unicode": {}, "int": { "min": <int : minimum valid value>, "max": <int : maximum valid value>, "supports_grid_search": <bool : True if Grid Search may be requested for this param> }, "float": { "min": <float : minimum valid value>, "max": <float : maximum valid value>, "supports_grid_search": <bool : True if Grid Search may be requested for this param> }, "intList": { "min_length": <int : minimum valid length>, "max_length": <int : maximum valid length> "min_val": <int : minimum valid value>, "max_val": <int : maximum valid value> "supports_grid_search": <bool : True if Grid Search may be requested for this param> }, "floatList": { "min_length": <int : minimum valid length>, "max_length": <int : maximum valid length> "min_val": <float : minimum valid value>, "max_val": <float : maximum valid value> "supports_grid_search": <bool : True if Grid Search may be requested for this param> } }
The keys have meaning as follows:
select: Rather than specifying a specific data type, if present, it indicates that the parameter is permitted to take on any of the specified values. Listed values may be of any string or real (non-complex) numeric type.
ascii: The parameter may be a unicode object that encodes simple ASCII characters. (A-Z, a-z, 0-9, whitespace, and certain common symbols.) In addition to listed constraints, ASCII keys currently may not contain either newlines or semicolons.
unicode: The parameter may be any Python unicode object.
int: The value may be an object of type int within the specified range (inclusive). Please note that the value will be passed around using the JSON format, and some JSON parsers have undefined behavior with integers outside of the range [-(2**53)+1, (2**53)-1].
float: The value may be an object of type float within the specified range (inclusive).
intList, floatList: The value may be a list of int or float objects, respectively, following constraints as specified respectively by the int and float types (above).
Many parameters only specify one key under constraints. If a parameter specifies multiple keys, the parameter may take on any value permitted by any key.
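For instance, a minimal sketch of the tuning round trip described above; the parameter ID and value are hypothetical placeholders that should be taken from the output of get_advanced_tuning_parameters():
import datarobot
model = datarobot.Model.get('project-id', 'model-id')
tuning = model.get_advanced_tuning_parameters()
# Inspect the tunable parameters and their constraints
for param in tuning['tuning_parameters']:
    print(param['parameter_name'], param['parameter_id'], param['constraints'])
# Re-train with one parameter overridden; omitted parameters keep their current_value
job = model.advanced_tune({'hypothetical-parameter-id': 0.5}, description='tuning experiment')
tuned_model = job.get_result_when_complete()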
- get_all_confusion_charts(fallback_to_parent_insights=False)
Retrieve a list of all confusion matrices available for the model.
- Parameters:
- fallback_to_parent_insightsbool
(New in version v2.14) Optional, if True, this will return confusion chart data for this model’s parent for any source that is not available for this model, if this model has a defined parent model. If omitted or False, or this model has no parent, this will not attempt to retrieve any data from this model’s parent.
- Returns:
- list of ConfusionChart
Data for all available confusion charts for model.
- get_all_feature_impacts(data_slice_filter=None)
Retrieve a list of all feature impact results available for the model.
- Parameters:
- data_slice_filterDataSlice, optional
A dataslice used to filter the return values based on the dataslice.id. By default, this function will use data_slice_filter.id == None which returns an unsliced insight. If data_slice_filter is None then no data_slice filtering will be applied when requesting the feature impacts.
- Returns:
- list of dicts
Data for all available model feature impacts, or an empty list if no data is found.
Examples
model = datarobot.Model(id='model-id', project_id='project-id')
# Get feature impact insights for sliced data
data_slice = datarobot.DataSlice(id='data-slice-id')
sliced_fi = model.get_all_feature_impacts(data_slice_filter=data_slice)
# Get feature impact insights for unsliced data
data_slice = datarobot.DataSlice()
unsliced_fi = model.get_all_feature_impacts(data_slice_filter=data_slice)
# Get all feature impact insights
all_fi = model.get_all_feature_impacts()
- get_all_lift_charts(fallback_to_parent_insights=False, data_slice_filter=None)
Retrieve a list of all Lift charts available for the model.
- Parameters:
- fallback_to_parent_insightsbool, optional
(New in version v2.14) Optional, if True, this will return lift chart data for this model’s parent for any source that is not available for this model and if this model has a defined parent model. If omitted or False, or this model has no parent, this will not attempt to retrieve any data from this model’s parent.
- data_slice_filterDataSlice, optional
Filters the returned lift chart by data_slice_filter.id. If None (the default) applies no filter based on data_slice_id.
- Returns:
- list of LiftChart
Data for all available model lift charts, or an empty list if no data is found.
Examples
model = datarobot.Model.get('project-id', 'model-id')
# Get lift chart insights for sliced data
sliced_lift_charts = model.get_all_lift_charts(data_slice_filter=datarobot.DataSlice(id='data-slice-id'))
# Get lift chart insights for unsliced data
unsliced_lift_charts = model.get_all_lift_charts(data_slice_filter=datarobot.DataSlice(id=None))
# Get all lift chart insights
all_lift_charts = model.get_all_lift_charts()
- get_all_multiclass_lift_charts(fallback_to_parent_insights=False)
Retrieve a list of all Lift charts available for the model.
- Parameters:
- fallback_to_parent_insightsbool
(New in version v2.14) Optional, if True, this will return lift chart data for this model’s parent for any source that is not available for this model and if this model has a defined parent model. If omitted or False, or this model has no parent, this will not attempt to retrieve any data from this model’s parent.
- Returns:
- list of LiftChart
Data for all available model lift charts.
- get_all_residuals_charts(fallback_to_parent_insights=False, data_slice_filter=None)
Retrieve a list of all residuals charts available for the model.
- Parameters:
- fallback_to_parent_insightsbool
Optional, if True, this will return residuals chart data for this model’s parent for any source that is not available for this model and if this model has a defined parent model. If omitted or False, or this model has no parent, this will not attempt to retrieve any data from this model’s parent.
- data_slice_filterDataSlice, optional
Filters the returned residuals charts by data_slice_filter.id. If None (the default) applies no filter based on data_slice_id.
- Returns:
- list of ResidualsChart
Data for all available model residuals charts.
Examples
model = datarobot.Model.get('project-id', 'model-id')
# Get residuals chart insights for sliced data
sliced_residuals_charts = model.get_all_residuals_charts(data_slice_filter=datarobot.DataSlice(id='data-slice-id'))
# Get residuals chart insights for unsliced data
unsliced_residuals_charts = model.get_all_residuals_charts(data_slice_filter=datarobot.DataSlice(id=None))
# Get all residuals chart insights
all_residuals_charts = model.get_all_residuals_charts()
- get_all_roc_curves(fallback_to_parent_insights=False, data_slice_filter=None)
Retrieve a list of all ROC curves available for the model.
- Parameters:
- fallback_to_parent_insightsbool
(New in version v2.14) Optional, if True, this will return ROC curve data for this model’s parent for any source that is not available for this model and if this model has a defined parent model. If omitted or False, or this model has no parent, this will not attempt to retrieve any data from this model’s parent.
- data_slice_filterDataSlice, optional
filters the returned roc_curve by data_slice_filter.id. If None (the default) applies no filter based on data_slice_id.
- Returns:
- list of RocCurve
Data for all available model ROC curves. Or an empty list if no RocCurves are found.
Examples
model = datarobot.Model.get('project-id', 'model-id')
# Get roc curve insights for sliced data
ds_filter = DataSlice(id='data-slice-id')
sliced_roc = model.get_all_roc_curves(data_slice_filter=ds_filter)
# Get roc curve insights for unsliced data
ds_filter = DataSlice(id=None)
unsliced_roc = model.get_all_roc_curves(data_slice_filter=ds_filter)
# Get all roc curve insights
all_roc_curves = model.get_all_roc_curves()
- get_confusion_chart(source, fallback_to_parent_insights=False)
Retrieve a multiclass model’s confusion matrix for the specified source.
- Parameters:
- sourcestr
Confusion chart source. Check datarobot.enums.CHART_DATA_SOURCE for possible values.
- fallback_to_parent_insightsbool
(New in version v2.14) Optional, if True, this will return confusion chart data for this model’s parent if the confusion chart is not available for this model and the model has a defined parent model. If omitted or False, or there is no parent model, will not attempt to return insight data from this model’s parent.
- Returns:
- ConfusionChart
Model ConfusionChart data
- Raises:
- ClientError
If the insight is not available for this model
- get_cross_class_accuracy_scores()
Retrieves a list of Cross Class Accuracy scores for the model.
- Returns:
- json
- get_cross_validation_scores(partition=None, metric=None)
Return a dictionary, keyed by metric, showing cross validation scores per partition.
Cross Validation should already have been performed using cross_validate or train.
Note
Models that computed cross validation before this feature was added will need to be deleted and retrained before this method can be used.
- Parameters:
- partitionfloat
optional, the id of the partition (1, 2, 3.0, 4.0, etc.) to filter results by; can be a positive whole-number integer or float value. 0 corresponds to the validation partition.
- metric: unicode
optional, name of the metric to filter the resulting cross validation scores by
- Returns:
- cross_validation_scores: dict
A dictionary keyed by metric showing cross validation scores per partition.
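A minimal sketch of the full workflow, assuming cross validation has not yet been run and using an illustrative metric name:
import datarobot
model = datarobot.Model.get('project-id', 'model-id')
# Run cross validation, then read per-partition scores
cv_job = model.cross_validate()
cv_job.wait_for_completion()
scores = model.get_cross_validation_scores(metric='RMSE')  # 'RMSE' is illustrative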
- get_data_disparity_insights(feature, class_name1, class_name2)
Retrieve a list of Cross Class Data Disparity insights for the model.
- Parameters:
- featurestr
Bias and Fairness protected feature name.
- class_name1str
One of the compared classes
- class_name2str
Another compared class
- Returns:
- json
- get_fairness_insights(fairness_metrics_set=None, offset=0, limit=100)
Retrieve a list of Per Class Bias insights for the model.
- Parameters:
- fairness_metrics_setstr, optional
Can be one of <datarobot.enums.FairnessMetricsSet>. The fairness metric used to calculate the fairness scores.
- offsetint, optional
Number of items to skip.
- limitint, optional
Number of items to return.
- Returns:
- json
- get_feature_effect(source, data_slice_id=None)
Retrieve Feature Effects for the model.
Feature Effects provides partial dependence and predicted vs actual values for the top 500 features, ordered by feature impact score.
The partial dependence shows marginal effect of a feature on the target variable after accounting for the average effects of all other predictive features. It indicates how, holding all other variables except the feature of interest as they were, the value of this feature affects your prediction.
Requires that Feature Effects has already been computed with request_feature_effect.
See get_feature_effect_metadata for retrieving information on the available sources.
- Parameters:
- sourcestring
The source Feature Effects are retrieved for.
- data_slice_idstring, optional
ID for the data slice used in the request. If None, retrieve unsliced insight data.
- Returns:
- feature_effectsFeatureEffects
The feature effects data.
- Raises:
- ClientError (404)
If the feature effects have not been computed or source is not a valid value.
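A brief sketch tying request_feature_effect and get_feature_effect together (IDs are placeholders; 'training' is one of the sources reported by get_feature_effect_metadata):
import datarobot
model = datarobot.Model.get('project-id', 'model-id')
# Compute Feature Effects first, then retrieve them
job = model.request_feature_effect()
job.wait_for_completion()
feature_effects = model.get_feature_effect(source='training')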
- get_feature_effect_metadata()
Retrieve Feature Effects metadata. Response contains status and available model sources.
Feature Effect for the training partition is always available, with the exception of older projects that only supported Feature Effect for validation.
When a model is trained into validation or holdout without stacked predictions (i.e., no out-of-sample predictions in those partitions), Feature Effects is not available for validation or holdout.
Feature Effects for holdout is not available when holdout was not unlocked for the project.
Use source to retrieve Feature Effects, selecting one of the provided sources.
- Returns:
- feature_effect_metadata: FeatureEffectMetadata
- get_feature_effects_multiclass(source='training', class_=None)
Retrieve Feature Effects for the multiclass model.
Feature Effects provide partial dependence and predicted vs actual values for the top 500 features, ordered by feature impact score.
The partial dependence shows marginal effect of a feature on the target variable after accounting for the average effects of all other predictive features. It indicates how, holding all other variables except the feature of interest as they were, the value of this feature affects your prediction.
Requires that Feature Effects has already been computed with request_feature_effect.
See get_feature_effect_metadata for retrieving information on the available sources.
- Parameters:
- sourcestr
The source Feature Effects are retrieved for.
- class_str or None
The class name Feature Effects are retrieved for.
- Returns:
- list
The list of multiclass feature effects.
- Raises:
- ClientError (404)
If Feature Effects have not been computed or source is not a valid value.
- get_feature_impact(with_metadata=False, data_slice_filter=<datarobot.models.model.Sentinel object>)
Retrieve the computed Feature Impact results, a measure of the relevance of each feature in the model.
Feature Impact is computed for each column by creating new data with that column randomly permuted (but the others left unchanged), and seeing how the error metric score for the predictions is affected. The ‘impactUnnormalized’ is how much worse the error metric score is when making predictions on this modified data. The ‘impactNormalized’ is normalized so that the largest value is 1. In both cases, larger values indicate more important features.
If a feature is a redundant feature, i.e. once other features are considered it doesn’t contribute much in addition, the ‘redundantWith’ value is the name of the feature that has the highest correlation with this feature. Note that redundancy detection is only available for jobs run after the addition of this feature. When retrieving data that predates this functionality, a NoRedundancyImpactAvailable warning will be used.
Elsewhere this technique is sometimes called ‘Permutation Importance’.
Requires that Feature Impact has already been computed with request_feature_impact.
- Parameters:
- with_metadatabool
The flag indicating if the result should include the metadata as well.
- data_slice_filterDataSlice, optional
A dataslice used to filter the return values based on the dataslice.id. By default, this function will use data_slice_filter.id == None which returns an unsliced insight. If data_slice_filter is None then get_feature_impact will raise a ValueError.
- Returns:
- list or dict
The feature impact data response depends on the with_metadata parameter. The response is either a dict with metadata and a list with actual data or just a list with that data.
Each list item is a dict with the keys featureName, impactNormalized, impactUnnormalized, redundantWith and count.
For the dict response, the available keys are:
featureImpacts - Feature Impact data as a list. Each item is a dict with the keys featureName, impactNormalized, impactUnnormalized, and redundantWith.
shapBased - A boolean that indicates whether Feature Impact was calculated using Shapley values.
ranRedundancyDetection - A boolean that indicates whether redundant feature identification was run while calculating this Feature Impact.
rowCount - An integer or None that indicates the number of rows that was used to calculate Feature Impact. For the Feature Impact calculated with the default logic, without specifying the rowCount, we return None here.
count - An integer with the number of features under featureImpacts.
- Raises:
- ClientError (404)
If the feature impacts have not been computed.
- ValueError
If data_slice_filter is passed as None
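A minimal sketch of computing and then reading Feature Impact, assuming it has not been requested before:
import datarobot
model = datarobot.Model.get('project-id', 'model-id')
impact_job = model.request_feature_impact()
impact_job.wait_for_completion()
feature_impacts = model.get_feature_impact()
# Rank features by normalized impact, largest first
ranked = sorted(feature_impacts, key=lambda fi: fi['impactNormalized'], reverse=True)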
- get_features_used()
Query the server to determine which features were used.
Note that the data returned by this method is possibly different than the names of the features in the featurelist used by this model. This method will return the raw features that must be supplied in order for predictions to be generated on a new set of data. The featurelist, in contrast, would also include the names of derived features.
- Returns:
- featureslist of str
The names of the features used in the model.
- Return type:
List[str]
- get_frozen_child_models()
Retrieve the IDs for all models that are frozen from this model.
- Returns:
- A list of Models
- get_labelwise_roc_curves(source, fallback_to_parent_insights=False)
Retrieve a list of LabelwiseRocCurve instances for a multilabel model for the given source and all labels. This method is valid only for multilabel projects. For binary projects, use the Model.get_roc_curve API.
Added in version v2.24.
- Parameters:
- sourcestr
ROC curve data source. Check datarobot.enums.CHART_DATA_SOURCE for possible values.
- fallback_to_parent_insightsbool
Optional, if True, this will return ROC curve data for this model’s parent if the ROC curve is not available for this model and the model has a defined parent model. If omitted or False, or there is no parent model, will not attempt to return data from this model’s parent.
- Returns:
- list of LabelwiseRocCurve
Labelwise ROC Curve instances for source and all labels
- Raises:
- ClientError
If the insight is not available for this model
- get_lift_chart(source, fallback_to_parent_insights=False, data_slice_filter=<datarobot.models.model.Sentinel object>)
Retrieve the model Lift chart for the specified source.
- Parameters:
- sourcestr
Lift chart data source. Check datarobot.enums.CHART_DATA_SOURCE for possible values. (New in version v2.23) For time series and OTV models, also accepts values backtest_2, backtest_3, …, up to the number of backtests in the model.
- fallback_to_parent_insightsbool
(New in version v2.14) Optional, if True, this will return lift chart data for this model’s parent if the lift chart is not available for this model and the model has a defined parent model. If omitted or False, or there is no parent model, will not attempt to return insight data from this model’s parent.
- data_slice_filterDataSlice, optional
A dataslice used to filter the return values based on the dataslice.id. By default this function will use data_slice_filter.id == None which returns an unsliced insight. If data_slice_filter is None then get_lift_chart will raise a ValueError.
- Returns:
- LiftChart
Model lift chart data
- Raises:
- ClientError
If the insight is not available for this model
- ValueError
If data_slice_filter is passed as None
- get_missing_report_info()
Retrieve a report on missing training data that can be used to understand missing values treatment in the model. The report consists of missing value resolutions for numeric or categorical features that were part of building the model.
- Returns:
- An iterable of MissingReportPerFeature
The queried model missing report, sorted by missing count (DESCENDING order).
- get_model_blueprint_chart()
Retrieve a diagram that can be used to understand data flow in the blueprint.
- Returns:
- ModelBlueprintChart
The queried model blueprint chart.
- get_model_blueprint_documents()
Get documentation for tasks used in this model.
- Returns:
- list of BlueprintTaskDocument
All documents available for the model.
- get_model_blueprint_json()
Get the blueprint JSON representation used by this model.
- Returns:
- BlueprintJson
JSON representation of the blueprint stages.
- Return type:
Dict[str, Tuple[List[str], List[str], str]]
- get_multiclass_feature_impact()
For multiclass models it’s possible to calculate feature impact separately for each target class. The method of calculation is exactly the same, performed in one-vs-all style for each target class.
Requires that Feature Impact has already been computed with
request_feature_impact
.- Returns:
- feature_impactslist of dict
The feature impact data. Each item is a dict with the keys ‘featureImpacts’ (list) and ‘class’ (str). Each item in ‘featureImpacts’ is a dict with the keys ‘featureName’, ‘impactNormalized’, ‘impactUnnormalized’, and ‘redundantWith’.
- Raises:
- ClientError (404)
If the multiclass feature impacts have not been computed.
- get_multiclass_lift_chart(source, fallback_to_parent_insights=False)
Retrieve model Lift chart for the specified source.
- Parameters:
- sourcestr
Lift chart data source. Check datarobot.enums.CHART_DATA_SOURCE for possible values.
- fallback_to_parent_insightsbool
Optional, if True, this will return lift chart data for this model’s parent if the lift chart is not available for this model and the model has a defined parent model. If omitted or False, or there is no parent model, will not attempt to return insight data from this model’s parent.
- Returns:
- list of LiftChart
Model lift chart data for each saved target class
- Raises:
- ClientError
If the insight is not available for this model
- get_multilabel_lift_charts(source, fallback_to_parent_insights=False)
Retrieve model Lift charts for the specified source.
Added in version v2.24.
- Parameters:
- sourcestr
Lift chart data source. Check datarobot.enums.CHART_DATA_SOURCE for possible values.
- fallback_to_parent_insightsbool
Optional, if True, this will return lift chart data for this model’s parent if the lift chart is not available for this model and the model has a defined parent model. If omitted or False, or there is no parent model, will not attempt to return insight data from this model’s parent.
- Returns:
- list of LiftChart
Model lift chart data for each saved target class
- Raises:
- ClientError
If the insight is not available for this model
- get_num_iterations_trained()
Retrieves the number of estimators trained by early-stopping tree-based models.
Added in version v2.22.
- Returns:
- projectId: str
id of project containing the model
- modelId: str
id of the model
- data: array
list of numEstimatorsItem objects, one for each modeling stage.
- numEstimatorsItem will be of the form:
- stage: str
indicates the modeling stage (for multi-stage models); None for single-stage models
- numIterations: int
the number of estimators or iterations trained by the model
- get_or_request_feature_effect(source, max_wait=600, row_count=None, data_slice_id=None)
Retrieve Feature Effects for the model, requesting a new job if it hasn’t been run previously.
See get_feature_effect_metadata for retrieving information on the available sources.
- Parameters:
- sourcestring
The source Feature Effects are retrieved for.
- max_waitint, optional
The maximum time to wait for a requested Feature Effect job to complete before erroring.
- row_countint, optional
(New in version v2.21) The sample size to use for Feature Impact computation. Minimum is 10 rows. Maximum is 100000 rows or the training sample size of the model, whichever is less.
- data_slice_idstr, optional
ID for the data slice used in the request. If None, request unsliced insight data.
- Returns:
- feature_effectsFeatureEffects
The Feature Effects data.
- get_or_request_feature_effects_multiclass(source, top_n_features=None, features=None, row_count=None, class_=None, max_wait=600)
Retrieve Feature Effects for the multiclass model, requesting a job if it hasn’t been run previously.
- Parameters:
- sourcestring
The source Feature Effects are retrieved for.
- class_str or None
The class name Feature Effects are retrieved for.
- row_countint
The number of rows from dataset to use for Feature Impact calculation.
- top_n_featuresint or None
Number of top features (ranked by Feature Impact) used to calculate Feature Effects.
- featureslist or None
The list of features used to calculate Feature Effects.
- max_waitint, optional
The maximum time to wait for a requested Feature Effects job to complete before erroring.
- Returns:
- feature_effectslist of FeatureEffectsMulticlass
The list of multiclass feature effects data.
- get_or_request_feature_impact(max_wait=600, **kwargs)
Retrieve feature impact for the model, requesting a job if it hasn’t been run previously
- Parameters:
- max_waitint, optional
The maximum time to wait for a requested feature impact job to complete before erroring
- **kwargs
Arbitrary keyword arguments passed to
request_feature_impact
.
- Returns:
- feature_impactslist or dict
The feature impact data. See
get_feature_impact
for the exact schema.
- get_parameters()
Retrieve model parameters.
- Returns:
- ModelParameters
Model parameters for this model.
- get_pareto_front()
Retrieve the Pareto Front for a Eureqa model.
This method is only supported for Eureqa models.
- Returns:
- ParetoFront
Model ParetoFront data
- get_prime_eligibility()
Check if this model can be approximated with DataRobot Prime
- Returns:
- prime_eligibilitydict
a dict indicating whether a model can be approximated with DataRobot Prime (key can_make_prime) and why it may be ineligible (key message)
- get_residuals_chart(source, fallback_to_parent_insights=False, data_slice_filter=<datarobot.models.model.Sentinel object>)
Retrieve model residuals chart for the specified source.
- Parameters:
- sourcestr
Residuals chart data source. Check datarobot.enums.CHART_DATA_SOURCE for possible values.
- fallback_to_parent_insightsbool
Optional, if True, this will return residuals chart data for this model’s parent if the residuals chart is not available for this model and the model has a defined parent model. If omitted or False, or there is no parent model, will not attempt to return residuals data from this model’s parent.
- data_slice_filterDataSlice, optional
A dataslice used to filter the return values based on the dataslice.id. By default this function will use data_slice_filter.id == None which returns an unsliced insight. If data_slice_filter is None then get_residuals_chart will raise a ValueError.
- Returns:
- ResidualsChart
Model residuals chart data
- Raises:
- ClientError
If the insight is not available for this model
- ValueError
If data_slice_filter is passed as None
- get_roc_curve(source, fallback_to_parent_insights=False, data_slice_filter=<datarobot.models.model.Sentinel object>)
Retrieve the ROC curve for a binary model for the specified source. This method is valid only for binary projects. For multilabel projects, use Model.get_labelwise_roc_curves.
- Parameters:
- sourcestr
ROC curve data source. Check datarobot.enums.CHART_DATA_SOURCE for possible values. (New in version v2.23) For time series and OTV models, also accepts values backtest_2, backtest_3, …, up to the number of backtests in the model.
- fallback_to_parent_insightsbool
(New in version v2.14) Optional, if True, this will return ROC curve data for this model’s parent if the ROC curve is not available for this model and the model has a defined parent model. If omitted or False, or there is no parent model, will not attempt to return data from this model’s parent.
- data_slice_filterDataSlice, optional
A dataslice used to filter the return values based on the dataslice.id. By default this function will use data_slice_filter.id == None which returns an unsliced insight. If data_slice_filter is None then get_roc_curve will raise a ValueError.
- Returns:
- RocCurve
Model ROC curve data
- Raises:
- ClientError
If the insight is not available for this model
- (New in version v3.0) TypeError
If the underlying project type is multilabel
- ValueError
If data_slice_filter is passed as None
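For example, a short sketch retrieving the validation ROC curve for a binary model ('validation' is one of the datarobot.enums.CHART_DATA_SOURCE values):
import datarobot
model = datarobot.Model.get('project-id', 'model-id')
roc = model.get_roc_curve(source='validation')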
- get_rulesets()
List the rulesets approximating this model generated by DataRobot Prime
If this model hasn’t been approximated yet, will return an empty list. Note that these are rulesets approximating this model, not rulesets used to construct this model.
- Returns:
- rulesetslist of Ruleset
- Return type:
List[Ruleset]
- get_supported_capabilities()
Retrieves a summary of the capabilities supported by a model.
Added in version v2.14.
- Returns:
- supportsBlending: bool
whether the model supports blending
- supportsMonotonicConstraints: bool
whether the model supports monotonic constraints
- hasWordCloud: bool
whether the model has word cloud data available
- eligibleForPrime: bool
(Deprecated in version v3.6) whether the model is eligible for Prime
- hasParameters: bool
whether the model has parameters that can be retrieved
- supportsCodeGeneration: bool
(New in version v2.18) whether the model supports code generation
- supportsShap: bool
(New in version v2.18) True if the model supports the SHAP package, i.e. Shapley-based feature importance
- supportsEarlyStopping: bool
(New in version v2.22) True if this is an early stopping tree-based model and number of trained iterations can be retrieved.
- get_uri()
- Returns:
- urlstr
Permanent static hyperlink to this model in the leaderboard.
- Return type:
str
- get_word_cloud(exclude_stop_words=False)
Retrieve word cloud data for the model.
- Parameters:
- exclude_stop_wordsbool, optional
Set to True if you want stopwords filtered out of the response.
- Returns:
- WordCloud
Word cloud data for the model.
- incremental_train(data_stage_id, training_data_name=None)
Submit a job to the queue to perform incremental training on an existing model. See the train_incremental documentation.
- Return type:
ModelJob
- classmethod list(project_id, sort_by_partition='validation', sort_by_metric=None, with_metric=None, search_term=None, featurelists=None, families=None, blueprints=None, labels=None, characteristics=None, training_filters=None, number_of_clusters=None, limit=100, offset=0)
Retrieve paginated model records, sorted by scores, with optional filtering.
- Parameters:
- sort_by_partition: str, one of `validation`, `backtesting`, `crossValidation` or `holdout`
Set the partition to use for the sorted (by score) list of models. validation is the default.
- sort_by_metric: str
Set the project metric to use for model sorting. DataRobot-selected project optimization metric is the default.
- with_metric: str
For a single-metric list of results, specify that project metric.
- search_term: str
If specified, only models containing the term in their name or processes are returned.
- featurelists: list of str
If specified, only models trained on selected featurelists are returned.
- families: list of str
If specified, only models belonging to selected families are returned.
- blueprints: list of str
If specified, only models trained on specified blueprint IDs are returned.
- labels: list of str, `starred` or `prepared for deployment`
If specified, only models tagged with all listed labels are returned.
- characteristics: list of str
If specified, only models matching all listed characteristics are returned.
- training_filters: list of str
If specified, only models matching at least one of the listed training conditions are returned. The following formats are supported.
For autoML and datetime partitioned projects:
- number of rows in the training subset
For datetime partitioned projects:
- <training duration>, for example P6Y0M0D
- <training_duration>-<time_window_sample_percent>-<sampling_method>, for example P6Y0M0D-78-Random (returns models trained on 6 years of data, sampling rate 78%, random sampling)
- start/end date
- project settings
- number_of_clusters: list of int
Filter models by number of clusters. Applicable only in unsupervised clustering projects.
- limit: int
- offset: int
- Returns:
- generic_models: list of GenericModel
- Return type:
List[GenericModel]
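For instance, a sketch of a filtered, sorted listing (the search term is arbitrary):
import datarobot
models = datarobot.Model.list(
    'project-id',
    sort_by_partition='crossValidation',
    search_term='Gradient',
    limit=10,
)
for m in models:
    print(m.model_type, m.metrics)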
- open_in_browser()
Opens the class’ relevant web browser location. If the default browser is not available, the URL is logged.
Note: If a text-mode browser is used, the calling process will block until the user exits the browser.
- Return type:
None
- request_approximation()
Request an approximation of this model using DataRobot Prime
This will create several rulesets that could be used to approximate this model. After comparing their scores and rule counts, the code used in the approximation can be downloaded and run locally.
- Returns:
- jobJob
the job generating the rulesets
- request_cross_class_accuracy_scores()
Request Cross Class Accuracy scores to be computed for the model.
- Returns:
- status_idstr
A statusId of the computation request.
- request_data_disparity_insights(feature, compared_class_names)
Request data disparity insights to be computed for the model.
- Parameters:
- featurestr
Bias and Fairness protected feature name.
- compared_class_nameslist(str)
List of two classes to compare
- Returns:
- status_idstr
A statusId of the computation request.
- request_external_test(dataset_id, actual_value_column=None)
Request external test to compute scores and insights on an external test dataset
- Parameters:
- dataset_idstring
The dataset to make predictions against (as uploaded from Project.upload_dataset)
- actual_value_columnstring, optional
(New in version v2.21) For time series unsupervised projects only. The actual value column can be used to calculate the classification metrics and insights on the prediction dataset. Can’t be provided with the forecast_point parameter.
- Returns:
- jobJob
a Job representing external dataset insights computation
- request_fairness_insights(fairness_metrics_set=None)
Request fairness insights to be computed for the model.
- Parameters:
- fairness_metrics_setstr, optional
Can be one of <datarobot.enums.FairnessMetricsSet>. The fairness metric used to calculate the fairness scores.
- Returns:
- status_idstr
A statusId of the computation request.
- Return type:
str
- request_feature_effect(row_count=None, data_slice_id=None)
Submit request to compute Feature Effects for the model.
See get_feature_effect for more information on the result of the job.
- Parameters:
- row_countint
(New in version v2.21) The sample size to use for Feature Impact computation. Minimum is 10 rows. Maximum is 100000 rows or the training sample size of the model, whichever is less.
- data_slice_idstr, optional
ID for the data slice used in the request. If None, request unsliced insight data.
- Returns:
- jobJob
A Job representing the feature effect computation. To get the completed feature effect data, use job.get_result or job.get_result_when_complete.
- Raises:
- JobAlreadyRequested (422)
If the feature effect has already been requested.
- request_feature_effects_multiclass(row_count=None, top_n_features=None, features=None)
Request Feature Effects computation for the multiclass model.
See get_feature_effect for more information on the result of the job.
- Parameters:
- row_countint
The number of rows from dataset to use for Feature Impact calculation.
- top_n_featuresint or None
Number of top features (ranked by feature impact) used to calculate Feature Effects.
- featureslist or None
The list of features used to calculate Feature Effects.
- Returns:
- jobJob
A Job representing Feature Effect computation. To get the completed Feature Effect data, use job.get_result or job.get_result_when_complete.
- request_feature_impact(row_count=None, with_metadata=False, data_slice_id=None)
Request feature impacts to be computed for the model.
See get_feature_impact for more information on the result of the job.
- Parameters:
- row_countint, optional
The sample size (specified in rows) to use for Feature Impact computation. This is not supported for unsupervised, multiclass (which has a separate method), and time series projects.
- with_metadatabool, optional
Flag indicating whether the result should include the metadata. If true, metadata is included.
- data_slice_idstr, optional
ID for the data slice used in the request. If None, request unsliced insight data.
- Returns:
- jobJob or status_id
Job representing the Feature Impact computation. To retrieve the completed Feature Impact data, use job.get_result or job.get_result_when_complete.
- Raises:
- JobAlreadyRequested (422)
If the feature impacts have already been requested.
- request_frozen_datetime_model(training_row_count=None, training_duration=None, training_start_date=None, training_end_date=None, time_window_sample_pct=None, sampling_method=None)
Train a new frozen model with parameters from this model.
Requires that this model belongs to a datetime partitioned project. If it does not, an error will occur when submitting the job.
Frozen models use the same tuning parameters as their parent model instead of independently optimizing them to allow efficiently retraining models on larger amounts of the training data.
In addition to training_row_count and training_duration, frozen datetime models may be trained on an exact date range. Only one of training_row_count, training_duration, or training_start_date and training_end_date should be specified.
Models specified using training_start_date and training_end_date are the only ones that can be trained into the holdout data (once the holdout is unlocked).
All durations should be specified with a duration string such as those returned by the partitioning_methods.construct_duration_string helper method. Please see the datetime partitioned project documentation for more information on duration strings.
- Parameters:
- training_row_countint, optional
the number of rows of data that should be used to train the model. If specified, training_duration may not be specified.
- training_durationstr, optional
a duration string specifying what time range the data used to train the model should span. If specified, training_row_count may not be specified.
- training_start_datedatetime.datetime, optional
the start date of the data to train to model on. Only rows occurring at or after this datetime will be used. If training_start_date is specified, training_end_date must also be specified.
- training_end_datedatetime.datetime, optional
the end date of the data to train the model on. Only rows occurring strictly before this datetime will be used. If training_end_date is specified, training_start_date must also be specified.
- time_window_sample_pctint, optional
may only be specified when the requested model is a time window (e.g. duration or start and end dates). An integer between 1 and 99 indicating the percentage to sample by within the window. The points kept are determined by a random uniform sample. If specified, training_duration must also be specified, otherwise an error will occur.
- sampling_methodstr, optional
(New in version v2.23) defines the way training data is selected. Can be either random or latest. In combination with training_row_count defines how rows are selected from the backtest (latest by default). When training data is defined using a time range (training_duration or use_project_settings) this setting changes the way time_window_sample_pct is applied (random by default). Applicable to OTV projects only.
- Returns:
- model_jobModelJob
the modeling job training a frozen model
- Return type:
ModelJob
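A minimal sketch, assuming a datetime partitioned project and a one-year training window:
import datarobot
model = datarobot.Model.get('project-id', 'model-id')
# Duration strings follow the same format as construct_duration_string output
model_job = model.request_frozen_datetime_model(training_duration='P1Y0M0D')
frozen_model = model_job.get_result_when_complete()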
- request_frozen_model(sample_pct=None, training_row_count=None)
Train a new frozen model with parameters from this model.
Note
This method only works if the project the model belongs to is not datetime partitioned. If it is, use request_frozen_datetime_model instead.
Frozen models use the same tuning parameters as their parent model instead of independently optimizing them to allow efficiently retraining models on larger amounts of the training data.
- Parameters:
- sample_pctfloat
optional, the percentage of the dataset to use with the model. If not provided, will use the value from this model.
- training_row_countint
(New in version v2.9) optional, the integer number of rows of the dataset to use with the model. Only one of sample_pct and training_row_count should be specified.
- Returns:
- model_jobModelJob
the modeling job training a frozen model
- request_lift_chart(source, data_slice_id=None)
Request the model Lift Chart for the specified source.
- Parameters:
- sourcestr
Lift chart data source. Check datarobot.enums.CHART_DATA_SOURCE for possible values.
- data_slice_idstring, optional
ID for the data slice used in the request. If None, request unsliced insight data.
- Returns:
- status_check_jobStatusCheckJob
Object contains all needed logic for a periodical status check of an async job.
- Return type:
StatusCheckJob
- request_per_class_fairness_insights(fairness_metrics_set=None)
Request per-class fairness insights to be computed for the model.
- Parameters:
- fairness_metrics_setstr, optional
The fairness metric used to calculate the fairness scores. Value can be any one of <datarobot.enums.FairnessMetricsSet>.
- Returns:
- status_check_jobStatusCheckJob
The returned object contains all needed logic for a periodical status check of an async job.
- Return type:
StatusCheckJob
- request_predictions(dataset_id=None, dataset=None, dataframe=None, file_path=None, file=None, include_prediction_intervals=None, prediction_intervals_size=None, forecast_point=None, predictions_start_date=None, predictions_end_date=None, actual_value_column=None, explanation_algorithm=None, max_explanations=None, max_ngram_explanations=None)
Requests predictions against a previously uploaded dataset.
- Parameters:
- dataset_idstring, optional
The ID of the dataset to make predictions against (as uploaded from Project.upload_dataset)
- datasetDataset, optional
The dataset to make predictions against (as uploaded from Project.upload_dataset)
- dataframepd.DataFrame, optional
(New in v3.0) The dataframe to make predictions against
- file_pathstr, optional
(New in v3.0) Path to file to make predictions against
- fileIOBase, optional
(New in v3.0) File to make predictions against
- include_prediction_intervalsbool, optional
(New in v2.16) For time series projects only. Specifies whether prediction intervals should be calculated for this request. Defaults to True if prediction_intervals_size is specified, otherwise defaults to False.
- prediction_intervals_sizeint, optional
(New in v2.16) For time series projects only. Represents the percentile to use for the size of the prediction intervals. Defaults to 80 if include_prediction_intervals is True. Prediction intervals size must be between 1 and 100 (inclusive).
- forecast_pointdatetime.datetime or None, optional
(New in version v2.20) For time series projects only. This is the default point relative to which predictions will be generated, based on the forecast window of the project. See the time series prediction documentation for more information.
- predictions_start_datedatetime.datetime or None, optional
(New in version v2.20) For time series projects only. The start date for bulk predictions. Note that this parameter is for generating historical predictions using the training data. This parameter should be provided in conjunction with predictions_end_date. Can’t be provided with the forecast_point parameter.
- predictions_end_datedatetime.datetime or None, optional
(New in version v2.20) For time series projects only. The end date for bulk predictions, exclusive. Note that this parameter is for generating historical predictions using the training data. This parameter should be provided in conjunction with predictions_start_date. Can’t be provided with the forecast_point parameter.
- actual_value_columnstring, optional
(New in version v2.21) For time series unsupervised projects only. Actual value column can be used to calculate the classification metrics and insights on the prediction dataset. Can’t be provided with the forecast_point parameter.
- explanation_algorithm: (New in version v2.21) optional; If set to ‘shap’, the
response will include prediction explanations based on the SHAP explainer (SHapley Additive exPlanations). Defaults to null (no prediction explanations).
- max_explanations: (New in version v2.21) int optional; specifies the maximum number of
explanation values that should be returned for each row, ordered by absolute value, greatest to least. If null, no limit. In the case of ‘shap’: if the number of features is greater than the limit, the sum of remaining values will also be returned as shapRemainingTotal. Defaults to null. Cannot be set if explanation_algorithm is omitted.
- max_ngram_explanations: optional; int or str
(New in version v2.29) Specifies the maximum number of text explanation values that should be returned. If set to all, text explanations will be computed and all the ngram explanations will be returned. If set to a non-zero positive integer value, text explanations will be computed and this number of ngram explanations, sorted in descending order, will be returned. By default, text explanations are not computed.
- Returns:
- jobPredictJob
The job computing the predictions
- Return type:
PredictJob
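A short sketch of scoring a newly uploaded dataset (the CSV path is a placeholder):
import datarobot
project = datarobot.Project.get('project-id')
model = datarobot.Model.get('project-id', 'model-id')
dataset = project.upload_dataset('./data_to_predict.csv')
predict_job = model.request_predictions(dataset_id=dataset.id)
predictions = predict_job.get_result_when_complete()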
- request_residuals_chart(source, data_slice_id=None)
Request the model residuals chart for the specified source.
- Parameters:
- sourcestr
Residuals chart data source. Check datarobot.enums.CHART_DATA_SOURCE for possible values.
- data_slice_idstring, optional
ID for the data slice used in the request. If None, request unsliced insight data.
- Returns:
- status_check_jobStatusCheckJob
Object contains all needed logic for a periodical status check of an async job.
- Return type:
StatusCheckJob
- request_roc_curve(source, data_slice_id=None)
Request the model Roc Curve for the specified source.
- Parameters:
- sourcestr
Roc Curve data source. Check datarobot.enums.CHART_DATA_SOURCE for possible values.
- data_slice_idstring, optional
ID for the data slice used in the request. If None, request unsliced insight data.
- Returns:
- status_check_jobStatusCheckJob
Object contains all needed logic for a periodical status check of an async job.
- Return type:
StatusCheckJob
- request_training_predictions(data_subset, explanation_algorithm=None, max_explanations=None)
Start a job to build training predictions
- Parameters:
- data_subsetstr
data set definition to build predictions on. Choices are:
- dr.enums.DATA_SUBSET.ALL or string all for all data available. Not valid for models in datetime partitioned projects
- dr.enums.DATA_SUBSET.VALIDATION_AND_HOLDOUT or string validationAndHoldout for all data except the training set. Not valid for models in datetime partitioned projects
- dr.enums.DATA_SUBSET.HOLDOUT or string holdout for the holdout data set only
- dr.enums.DATA_SUBSET.ALL_BACKTESTS or string allBacktests for downloading the predictions for all backtest validation folds. Requires the model to have successfully scored all backtests. Datetime partitioned projects only.
- explanation_algorithmdr.enums.EXPLANATIONS_ALGORITHM
(New in v2.21) Optional. If set to dr.enums.EXPLANATIONS_ALGORITHM.SHAP, the response will include prediction explanations based on the SHAP explainer (SHapley Additive exPlanations). Defaults to None (no prediction explanations).
- max_explanationsint
(New in v2.21) Optional. Specifies the maximum number of explanation values that should be returned for each row, ordered by absolute value, greatest to least. In the case of dr.enums.EXPLANATIONS_ALGORITHM.SHAP: If not set, explanations are returned for all features. If the number of features is greater than max_explanations, the sum of remaining values will also be returned as shap_remaining_total. Max 100. Defaults to null for datasets narrower than 100 columns, defaults to 100 for datasets wider than 100 columns. Is ignored if explanation_algorithm is not set.
- Returns:
- Job
an instance of created async job
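For example, a sketch requesting holdout training predictions and reading them back as a DataFrame:
import datarobot
model = datarobot.Model.get('project-id', 'model-id')
job = model.request_training_predictions(datarobot.enums.DATA_SUBSET.HOLDOUT)
training_predictions = job.get_result_when_complete()
df = training_predictions.get_all_as_dataframe()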
- retrain(sample_pct=None, featurelist_id=None, training_row_count=None, n_clusters=None)
Submit a job to the queue to retrain the model.
- Parameters:
- sample_pct: float, optional
The sample size as a percentage (1 to 100) to use in training. If this parameter is used then training_row_count should not be given.
- featurelist_idstr, optional
The featurelist id
- training_row_countint, optional
The number of rows used to train the model. If this parameter is used, then sample_pct should not be given.
- n_clusters: int, optional
(new in version 2.27) number of clusters to use in an unsupervised clustering model. This parameter is used only for unsupervised clustering models that do not determine the number of clusters automatically.
- Returns:
- jobModelJob
The created job that is retraining the model
- Return type:
ModelJob
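A minimal sketch (the featurelist id and row count are placeholders):
import datarobot
model = datarobot.Model.get('project-id', 'model-id')
job = model.retrain(featurelist_id='featurelist-id', training_row_count=50000)
retrained_model = job.get_result_when_complete()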
- set_prediction_threshold(threshold)
Set a custom prediction threshold for the model.
May not be used once prediction_threshold_read_only is True for this model.
- Parameters:
- thresholdfloat
only used for binary classification projects. The threshold to use when deciding between the positive and negative classes when making predictions. Should be between 0.0 and 1.0 (inclusive).
- star_model()
Mark the model as starred.
Model stars propagate to the web application and the API, and can be used to filter when listing models.
- Return type:
None
- start_advanced_tuning_session()
Start an Advanced Tuning session. Returns an object that helps set up arguments for an Advanced Tuning model execution.
As of v2.17, all models other than blenders, open source, prime, baseline and user-created support Advanced Tuning.
- Returns:
- AdvancedTuningSession
Session for setting up and running Advanced Tuning on a model
- train(sample_pct=None, featurelist_id=None, scoring_type=None, training_row_count=None, monotonic_increasing_featurelist_id=<object object>, monotonic_decreasing_featurelist_id=<object object>)
Train the blueprint used in model on a particular featurelist or amount of data.
This method creates a new training job for worker and appends it to the end of the queue for this project. After the job has finished you can get the newly trained model by retrieving it from the project leaderboard, or by retrieving the result of the job.
Either sample_pct or training_row_count can be used to specify the amount of data to use, but not both. If neither are specified, a default of the maximum amount of data that can safely be used to train any blueprint without going into the validation data will be selected.
In smart-sampled projects, sample_pct and training_row_count are assumed to be in terms of rows of the minority class.
Note
For datetime partitioned projects, see train_datetime instead.
- Parameters:
- sample_pctfloat, optional
The amount of data to use for training, as a percentage of the project dataset from 0 to 100.
- featurelist_idstr, optional
The identifier of the featurelist to use. If not defined, the featurelist of this model is used.
- scoring_typestr, optional
Either validation or crossValidation (also dr.SCORING_TYPE.validation or dr.SCORING_TYPE.cross_validation). validation is available for every partitioning type, and indicates that the default model validation should be used for the project. If the project uses a form of cross-validation partitioning, crossValidation can also be used to indicate that all of the available training/validation combinations should be used to evaluate the model.
- training_row_countint, optional
The number of rows to use to train the requested model.
- monotonic_increasing_featurelist_idstr
(new in version 2.11) optional, the id of the featurelist that defines the set of features with a monotonically increasing relationship to the target. Passing None disables the increasing monotonicity constraint. The default (dr.enums.MONOTONICITY_FEATURELIST_DEFAULT) is the one specified by the blueprint.
- monotonic_decreasing_featurelist_idstr
(new in version 2.11) optional, the id of the featurelist that defines the set of features with a monotonically decreasing relationship to the target. Passing None disables the decreasing monotonicity constraint. The default (dr.enums.MONOTONICITY_FEATURELIST_DEFAULT) is the one specified by the blueprint.
- Returns:
- model_job_idstr
id of the created job; can be used as a parameter to the ModelJob.get method or the wait_for_async_model_creation function
Examples
project = Project.get('project-id')
model = Model.get('project-id', 'model-id')
model_job_id = model.train(training_row_count=project.max_train_rows)
- train_datetime(featurelist_id=None, training_row_count=None, training_duration=None, time_window_sample_pct=None, monotonic_increasing_featurelist_id=<object object>, monotonic_decreasing_featurelist_id=<object object>, use_project_settings=False, sampling_method=None, n_clusters=None)
Trains this model on a different featurelist or sample size.
Requires that this model is part of a datetime partitioned project; otherwise, an error will occur.
All durations should be specified with a duration string such as those returned by the partitioning_methods.construct_duration_string helper method. Please see the datetime partitioned project documentation for more information on duration strings.
- Parameters:
- featurelist_idstr, optional
the featurelist to use to train the model. If not specified, the featurelist of this model is used.
- training_row_countint, optional
the number of rows of data that should be used to train the model. If specified, neither training_duration nor use_project_settings may be specified.
- training_durationstr, optional
a duration string specifying what time range the data used to train the model should span. If specified, neither training_row_count nor use_project_settings may be specified.
- use_project_settingsbool, optional
(New in version v2.20) defaults to False. If True, indicates that the custom backtest partitioning settings specified by the user will be used to train the model and evaluate backtest scores. If specified, neither training_row_count nor training_duration may be specified.
- time_window_sample_pctint, optional
may only be specified when the requested model is a time window (e.g. duration or start and end dates). An integer between 1 and 99 indicating the percentage to sample by within the window. The points kept are determined by a random uniform sample. If specified, training_duration must also be specified, otherwise an error will occur.
- sampling_methodstr, optional
(New in version v2.23) defines the way training data is selected. Can be either random or latest. In combination with training_row_count defines how rows are selected from the backtest (latest by default). When training data is defined using a time range (training_duration or use_project_settings) this setting changes the way time_window_sample_pct is applied (random by default). Applicable to OTV projects only.
- monotonic_increasing_featurelist_idstr, optional
(New in version v2.18) the id of the featurelist that defines the set of features with a monotonically increasing relationship to the target. Passing None disables the increasing monotonicity constraint. The default (dr.enums.MONOTONICITY_FEATURELIST_DEFAULT) is the one specified by the blueprint.
- monotonic_decreasing_featurelist_idstr, optional
(New in version v2.18) the id of the featurelist that defines the set of features with a monotonically decreasing relationship to the target. Passing None disables the decreasing monotonicity constraint. The default (dr.enums.MONOTONICITY_FEATURELIST_DEFAULT) is the one specified by the blueprint.
- n_clusters: int, optional
(New in version 2.27) number of clusters to use in an unsupervised clustering model. This parameter is used only for unsupervised clustering models that don’t automatically determine the number of clusters.
- Returns:
- jobModelJob
the created job to build the model
- Return type:
ModelJob
- train_incremental(data_stage_id, training_data_name=None, data_stage_encoding=None, data_stage_delimiter=None, data_stage_compression=None)
Submit a job to the queue to perform incremental training on an existing model using additional data. The id of the additional data to use for training is specified with data_stage_id. Optionally, a name for the iteration can be supplied by the user to help identify the contents of the data in the iteration.
This functionality requires the INCREMENTAL_LEARNING feature flag to be enabled.
- Parameters:
- data_stage_id: str
The id of the data stage to use for training.
- training_data_namestr, optional
The name of the iteration or data stage to indicate what the incremental learning was performed on.
- data_stage_encodingstr, optional
The encoding type of the data in the data stage (default: UTF-8). Supported formats: UTF-8, ASCII, WINDOWS1252
- data_stage_delimiterstr, optional
The delimiter used by the data in the data stage (default: ‘,’).
- data_stage_compressionstr, optional
The compression type of the data stage file, e.g. ‘zip’ (default: None). Supported formats: zip
- Returns:
- jobModelJob
The created job that is retraining the model
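A brief sketch, assuming the INCREMENTAL_LEARNING feature flag is enabled and a data stage has already been created (its id is a placeholder):
import datarobot
model = datarobot.Model.get('project-id', 'model-id')
job = model.train_incremental('data-stage-id', training_data_name='iteration 2')
new_model = job.get_result_when_complete()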
- unstar_model()
Unmark the model as starred.
Model stars propagate to the web application and the API, and can be used to filter when listing models.
- Return type:
None
Combined Model
See API reference for Combined Model in Segmented Modeling API Reference