Model API

class datarobot.models.Model(id=None, processes=None, featurelist_name=None, featurelist_id=None, project_id=None, sample_pct=None, training_row_count=None, training_duration=None, training_start_date=None, training_end_date=None, model_type=None, model_category=None, is_frozen=None, blueprint_id=None, metrics=None, project=None, data=None, monotonic_increasing_featurelist_id=None, monotonic_decreasing_featurelist_id=None, supports_monotonic_constraints=None)

A model trained on a project’s dataset capable of making predictions

Attributes

id (str) the id of the model
project_id (str) the id of the project the model belongs to
processes (list of str) the processes used by the model
featurelist_name (str) the name of the featurelist used by the model
featurelist_id (str) the id of the featurelist used by the model
sample_pct (float or None) the percentage of the project dataset used in training the model. If the project uses datetime partitioning, the sample_pct will be None. See training_row_count, training_duration, and training_start_date and training_end_date instead.
training_row_count (int or None) the number of rows of the project dataset used in training the model. In a datetime partitioned project, if specified, defines the number of rows used to train the model and evaluate backtest scores; if unspecified, either training_duration or training_start_date and training_end_date was used to determine that instead.
training_duration (str or None) only present for models in datetime partitioned projects. If specified, a duration string specifying the duration spanned by the data used to train the model and evaluate backtest scores.
training_start_date (datetime or None) only present for frozen models in datetime partitioned projects. If specified, the start date of the data used to train the model.
training_end_date (datetime or None) only present for frozen models in datetime partitioned projects. If specified, the end date of the data used to train the model.
model_type (str) what model this is, e.g. ‘Nystroem Kernel SVM Regressor’
model_category (str) what kind of model this is - ‘prime’ for DataRobot Prime models, ‘blend’ for blender models, and ‘model’ for other models
is_frozen (bool) whether this model is a frozen model
blueprint_id (str) the id of the blueprint used in this model
metrics (dict) a mapping from each metric to the model’s scores for that metric
monotonic_increasing_featurelist_id (str) optional, the id of the featurelist that defines the set of features with a monotonically increasing relationship to the target. If None, no such constraints are enforced.
monotonic_decreasing_featurelist_id (str) optional, the id of the featurelist that defines the set of features with a monotonically decreasing relationship to the target. If None, no such constraints are enforced.
supports_monotonic_constraints (bool) optional, whether this model supports enforcing monotonic constraints

classmethod get(project, model_id)

Retrieve a specific model.

Parameters:

project : str

The project’s id.

model_id : str

The model_id of the leaderboard item to retrieve.

Returns:

model : Model

The queried instance.

Raises:

ValueError

the passed project parameter value is of an unsupported type

classmethod fetch_resource_data(*args, **kwargs)

(Deprecated.) Used to acquire model data directly from its url.

Consider using get instead, as this is a convenience function used for development of the datarobot package.

Parameters:

url : string

The resource we are acquiring

join_endpoint : boolean, optional

Whether the client’s endpoint should be joined to the URL before sending the request. Location headers are returned as absolute locations, so they will not need the endpoint.

Returns:

model_data : dict

The queried model’s data

get_features_used()

Query the server to determine which features were used.

Note that the data returned by this method is possibly different than the names of the features in the featurelist used by this model. This method will return the raw features that must be supplied in order for predictions to be generated on a new set of data. The featurelist, in contrast, would also include the names of derived features.

Returns:

features : list of str

The names of the features used in the model.
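
Examples

A minimal sketch (the ids here are placeholders):

model = datarobot.Model.get('p-id', 'l-id')
features = model.get_features_used()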

delete()

Delete a model from the project’s leaderboard.

get_leaderboard_ui_permalink()

Returns:

url : str

Permanent static hyperlink to this model at the leaderboard.

open_model_browser()

Opens model at project leaderboard in web browser.

Note: If text-mode browsers are used, the calling process will block until the user exits the browser.

train(sample_pct=None, featurelist_id=None, scoring_type=None, training_row_count=None, monotonic_increasing_featurelist_id=MONOTONICITY_FEATURELIST_DEFAULT, monotonic_decreasing_featurelist_id=MONOTONICITY_FEATURELIST_DEFAULT)

Train the blueprint used in model on a particular featurelist or amount of data.

This method creates a new training job for worker and appends it to the end of the queue for this project. After the job has finished you can get the newly trained model by retrieving it from the project leaderboard, or by retrieving the result of the job.

Either sample_pct or training_row_count can be used to specify the amount of data to use, but not both. If neither is specified, the default is the maximum amount of data that can safely be used to train any blueprint without going into the validation data.

In smart-sampled projects, sample_pct and training_row_count are assumed to be in terms of rows of the minority class.

Note

For datetime partitioned projects, use train_datetime instead.

Parameters:

sample_pct : float, optional

The amount of data to use for training, as a percentage of the project dataset from 0 to 100.

featurelist_id : str, optional

The identifier of the featurelist to use. If not defined, the featurelist of this model is used.

scoring_type : str, optional

Either SCORING_TYPE.validation or SCORING_TYPE.cross_validation. SCORING_TYPE.validation is available for every partitioning type, and indicates that the default model validation should be used for the project. If the project uses a form of cross-validation partitioning, SCORING_TYPE.cross_validation can also be used to indicate that all of the available training/validation combinations should be used to evaluate the model.

training_row_count : int, optional

The number of rows to use to train the requested model.

monotonic_increasing_featurelist_id : str

(new in version 2.11) optional, the id of the featurelist that defines the set of features with a monotonically increasing relationship to the target. Passing None disables the increasing monotonicity constraint. The default (dr.enums.MONOTONICITY_FEATURELIST_DEFAULT) uses the featurelist specified by the blueprint.

monotonic_decreasing_featurelist_id : str

(new in version 2.11) optional, the id of the featurelist that defines the set of features with a monotonically decreasing relationship to the target. Passing None disables the decreasing monotonicity constraint. The default (dr.enums.MONOTONICITY_FEATURELIST_DEFAULT) uses the featurelist specified by the blueprint.

Returns:

model_job_id : str

the id of the created job, which can be used as a parameter to the ModelJob.get method or the wait_for_async_model_creation function

Examples

project = Project.get('p-id')
model = Model.get('p-id', 'l-id')
model_job_id = model.train(training_row_count=project.max_train_rows)

train_datetime(featurelist_id=None, training_row_count=None, training_duration=None, time_window_sample_pct=None)

Train this model on a different featurelist or amount of data

Requires that this model is part of a datetime partitioned project; otherwise, an error will occur.

Parameters:

featurelist_id : str, optional

the featurelist to use to train the model. If not specified, the featurelist of this model is used.

training_row_count : int, optional

the number of rows of data that should be used to train the model. If specified, training_duration may not be specified.

training_duration : str, optional

a duration string specifying what time range the data used to train the model should span. If specified, training_row_count may not be specified.

time_window_sample_pct : int, optional

may only be specified when the requested model is a time window (e.g. duration or start and end dates). An integer between 1 and 99 indicating the percentage to sample by within the window. The points kept are determined by a random uniform sample.

Returns:

job : ModelJob

the created job to build the model
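
Examples

A sketch, assuming this model belongs to a datetime partitioned project (the ids and duration are placeholders):

model = datarobot.Model.get('p-id', 'l-id')
job = model.train_datetime(training_duration='P1Y')  # duration strings follow ISO 8601, e.g. one year
new_model = job.get_result_when_complete()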

request_predictions(dataset_id)

Request predictions against a previously uploaded dataset

Parameters:

dataset_id : string

The dataset to make predictions against (as uploaded from Project.upload_dataset)

Returns:

job : PredictJob

The job computing the predictions
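
Examples

A typical flow (the ids and file path are placeholders):

project = datarobot.Project.get('p-id')
dataset = project.upload_dataset('./data_to_predict.csv')
model = datarobot.Model.get('p-id', 'l-id')
predict_job = model.request_predictions(dataset.id)
predictions = predict_job.get_result_when_complete()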

get_feature_impact()

Retrieve the computed Feature Impact results, a measure of the relevance of each feature in the model.

Elsewhere this technique is sometimes called ‘Permutation Importance’.

Requires that Feature Impact has already been computed with request_feature_impact.

Returns:

feature_impacts : list[dict]

The feature impact data. Each item is a dict with the keys ‘featureName’, ‘impactNormalized’, and ‘impactUnnormalized’. See the help for Model.request_feature_impact for more details.

Raises:

ClientError (404)

If the feature impacts have not been computed.

request_feature_impact()

Request feature impacts to be computed for the model.

Feature Impact is computed for each column by creating new data with that column randomly permuted (but the others left unchanged), and seeing how the error metric score for the predictions is affected. The ‘impactUnnormalized’ is how much worse the error metric score is when making predictions on this modified data. The ‘impactNormalized’ is normalized so that the largest value is 1. In both cases, larger values indicate more important features.

Returns:

job : Job

A Job representing the feature impact computation. To get the completed feature impact data, use job.get_result or job.get_result_when_complete.

Raises:

JobAlreadyRequested (422)

If the feature impacts have already been requested.
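
Examples

A sketch of the request-then-retrieve pattern (the ids are placeholders):

model = datarobot.Model.get('p-id', 'l-id')
feature_impacts = model.request_feature_impact().get_result_when_complete()

# Equivalent, once the job has finished:
feature_impacts = model.get_feature_impact()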

get_prime_eligibility()

Check if this model can be approximated with DataRobot Prime

Returns:

prime_eligibility : dict

a dict indicating whether a model can be approximated with DataRobot Prime (key can_make_prime) and why it may be ineligible (key message)

request_approximation()

Request an approximation of this model using DataRobot Prime

This will create several rulesets that could be used to approximate this model. After comparing their scores and rule counts, the code used in the approximation can be downloaded and run locally.

Returns:

job : Job

the job generating the rulesets
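
Examples

A sketch comparing the generated rulesets (the ids are placeholders; see get_rulesets below):

model = datarobot.Model.get('p-id', 'l-id')
job = model.request_approximation()
job.wait_for_completion()
rulesets = model.get_rulesets()
smallest = min(rulesets, key=lambda ruleset: ruleset.rule_count)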

get_rulesets()

List the rulesets approximating this model generated by DataRobot Prime

If this model hasn’t been approximated yet, this will return an empty list. Note that these are rulesets approximating this model, not rulesets used to construct this model.

Returns:

rulesets : list of Ruleset

download_export(filepath)

Download an exportable model file for use in an on-premise DataRobot standalone prediction environment.

This function can only be used if model export is enabled, and will only be useful if you have an on-premise environment in which to import it.

Parameters:

filepath : str

The path at which to save the exported model file.

request_transferable_export()

Request generation of an exportable model file for use in an on-premise DataRobot standalone prediction environment.

This function can only be used if model export is enabled, and will only be useful if you have an on-premise environment in which to import it.

This function does not download the exported file. Use download_export for that.

Examples

model = datarobot.Model.get('p-id', 'l-id')
job = model.request_transferable_export()
job.wait_for_completion()
model.download_export('my_exported_model.drmodel')

# Client must be configured to use standalone prediction server for import:
datarobot.Client(token='my-token-at-standalone-server',
                 endpoint='standalone-server-url/api/v2')

imported_model = datarobot.ImportedModel.create('my_exported_model.drmodel')

request_frozen_model(sample_pct=None, training_row_count=None)

Train a new frozen model with parameters from this model

Note

This method only works if the project the model belongs to is not datetime partitioned. If it is, use request_frozen_datetime_model instead.

Frozen models use the same tuning parameters as their parent model instead of independently optimizing them, which allows models to be retrained efficiently on larger amounts of the training data.

Parameters:

sample_pct : float

optional, the percentage of the dataset to use with the model. If not provided, will use the value from this model.

training_row_count : int

(New in version v2.9) optional, the integer number of rows of the dataset to use with the model. Only one of sample_pct and training_row_count should be specified.

Returns:

model_job : ModelJob

the modeling job training a frozen model
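
Examples

A sketch retraining a frozen copy of this model on 80% of the data (the ids are placeholders):

model = datarobot.Model.get('p-id', 'l-id')
model_job = model.request_frozen_model(sample_pct=80)
frozen_model = model_job.get_result_when_complete()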

request_frozen_datetime_model(training_row_count=None, training_duration=None, training_start_date=None, training_end_date=None, time_window_sample_pct=None)

Train a new frozen model with parameters from this model

Requires that this model belongs to a datetime partitioned project. If it does not, an error will occur when submitting the job.

Frozen models use the same tuning parameters as their parent model instead of independently optimizing them, which allows models to be retrained efficiently on larger amounts of the training data.

In addition to training_row_count and training_duration, frozen datetime models may be trained on an exact date range. Only one of training_row_count, training_duration, or training_start_date and training_end_date should be specified.

Models specified using training_start_date and training_end_date are the only ones that can be trained into the holdout data (once the holdout is unlocked).

Parameters:

training_row_count : int, optional

the number of rows of data that should be used to train the model. If specified, training_duration may not be specified.

training_duration : str, optional

a duration string specifying what time range the data used to train the model should span. If specified, training_row_count may not be specified.

training_start_date : datetime.datetime, optional

the start date of the data to train the model on. Only rows occurring at or after this datetime will be used. If training_start_date is specified, training_end_date must also be specified.

training_end_date : datetime.datetime, optional

the end date of the data to train the model on. Only rows occurring strictly before this datetime will be used. If training_end_date is specified, training_start_date must also be specified.

time_window_sample_pct : int, optional

may only be specified when the requested model is a time window (e.g. duration or start and end dates). An integer between 1 and 99 indicating the percentage to sample by within the window. The points kept are determined by a random uniform sample.

Returns:

model_job : ModelJob

the modeling job training a frozen model
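
Examples

A sketch training a frozen model on an exact date range (the ids and dates are placeholders):

from datetime import datetime

model = datarobot.Model.get('p-id', 'l-id')
model_job = model.request_frozen_datetime_model(
    training_start_date=datetime(2017, 1, 1),
    training_end_date=datetime(2018, 1, 1),
)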

get_parameters()

Retrieve model parameters.

Returns:

ModelParameters

Model parameters for this model.

get_lift_chart(source)

Retrieve model lift chart for the specified source.

Parameters:

source : str

Lift chart data source. Check datarobot.enums.CHART_DATA_SOURCE for possible values.

Returns:

LiftChart

Model lift chart data
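
Examples

A sketch fetching the validation-partition lift chart (the ids are placeholders):

model = datarobot.Model.get('p-id', 'l-id')
lift_chart = model.get_lift_chart(datarobot.enums.CHART_DATA_SOURCE.VALIDATION)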

get_all_lift_charts()

Retrieve a list of all lift charts available for the model.

Returns:

list of LiftChart

Data for all available model lift charts.

get_confusion_chart(source)

Retrieve model’s confusion chart for the specified source.

Parameters:

source : str

Confusion chart source. Check datarobot.enums.CHART_DATA_SOURCE for possible values.

Returns:

ConfusionChart

Model ConfusionChart data

get_all_confusion_charts()

Retrieve a list of all confusion charts available for the model.

Returns:

list of ConfusionChart

Data for all available confusion charts for model.

get_roc_curve(source)

Retrieve model ROC curve for the specified source.

Parameters:

source : str

ROC curve data source. Check datarobot.enums.CHART_DATA_SOURCE for possible values.

Returns:

RocCurve

Model ROC curve data

get_all_roc_curves()

Retrieve a list of all ROC curves available for the model.

Returns:

list of RocCurve

Data for all available model ROC curves.

get_word_cloud(exclude_stop_words=False)

Retrieve word cloud data for the model.

Parameters:

exclude_stop_words : bool, optional

Set to True if you want stopwords filtered out of the response.

Returns:

WordCloud

Word cloud data for the model.

download_scoring_code(file_name, source_code=False)

Download scoring code JAR.

Parameters:

file_name : str

File path where scoring code will be saved.

source_code : bool, optional

Set to True to download source code archive. It will not be executable.
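
Examples

A sketch (the ids and file names are placeholders):

model = datarobot.Model.get('p-id', 'l-id')
model.download_scoring_code('model.jar')  # executable scoring code JAR
model.download_scoring_code('model-sources.jar', source_code=True)  # source archive, not executable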

get_model_blueprint_documents()

Get documentation for tasks used in this model.

Returns:

list of BlueprintTaskDocument

All documents available for the model.

get_model_blueprint_chart()

Retrieve a model blueprint chart that can be used to understand data flow in blueprint.

Returns:

ModelBlueprintChart

The queried model blueprint chart.

request_training_predictions(data_subset)

Start a job to build training predictions

Parameters:

data_subset : str

data set definition to build predictions on. Choices are:

  • dr.enums.DATA_SUBSET.ALL for all data available
  • dr.enums.DATA_SUBSET.VALIDATION_AND_HOLDOUT for all data except training set
  • dr.enums.DATA_SUBSET.HOLDOUT for holdout data set only

Returns:

Job

an instance of created async job
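
Examples

A sketch requesting predictions on the holdout subset (the ids are placeholders):

model = datarobot.Model.get('p-id', 'l-id')
job = model.request_training_predictions(datarobot.enums.DATA_SUBSET.HOLDOUT)
training_predictions = job.get_result_when_complete()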

cross_validate()

Run Cross Validation on this model.

Note

To perform Cross Validation on a new model with new parameters, use train instead.

Returns:

ModelJob

The created job to build the model

PrimeModel API

class datarobot.models.PrimeModel(id=None, processes=None, featurelist_name=None, featurelist_id=None, project_id=None, sample_pct=None, training_row_count=None, training_duration=None, training_start_date=None, training_end_date=None, model_type=None, model_category=None, is_frozen=None, blueprint_id=None, metrics=None, parent_model_id=None, ruleset_id=None, rule_count=None, score=None, monotonic_increasing_featurelist_id=None, monotonic_decreasing_featurelist_id=None, supports_monotonic_constraints=None)

A DataRobot Prime model approximating a parent model with downloadable code

Attributes

id (str) the id of the model
project_id (str) the id of the project the model belongs to
processes (list of str) the processes used by the model
featurelist_name (str) the name of the featurelist used by the model
featurelist_id (str) the id of the featurelist used by the model
sample_pct (float) the percentage of the project dataset used in training the model
training_row_count (int or None) the number of rows of the project dataset used in training the model. In a datetime partitioned project, if specified, defines the number of rows used to train the model and evaluate backtest scores; if unspecified, either training_duration or training_start_date and training_end_date was used to determine that instead.
training_duration (str or None) only present for models in datetime partitioned projects. If specified, a duration string specifying the duration spanned by the data used to train the model and evaluate backtest scores.
training_start_date (datetime or None) only present for frozen models in datetime partitioned projects. If specified, the start date of the data used to train the model.
training_end_date (datetime or None) only present for frozen models in datetime partitioned projects. If specified, the end date of the data used to train the model.
model_type (str) what model this is, e.g. ‘DataRobot Prime’
model_category (str) what kind of model this is - always ‘prime’ for DataRobot Prime models
is_frozen (bool) whether this model is a frozen model
blueprint_id (str) the id of the blueprint used in this model
metrics (dict) a mapping from each metric to the model’s scores for that metric
ruleset (Ruleset) the ruleset used in the Prime model
parent_model_id (str) the id of the model that this Prime model approximates
monotonic_increasing_featurelist_id (str) optional, the id of the featurelist that defines the set of features with a monotonically increasing relationship to the target. If None, no such constraints are enforced.
monotonic_decreasing_featurelist_id (str) optional, the id of the featurelist that defines the set of features with a monotonically decreasing relationship to the target. If None, no such constraints are enforced.
supports_monotonic_constraints (bool) optional, whether this model supports enforcing monotonic constraints

classmethod get(project_id, model_id)

Retrieve a specific prime model.

Parameters:

project_id : str

The id of the project the prime model belongs to

model_id : str

The model_id of the prime model to retrieve.

Returns:

model : PrimeModel

The queried instance.

request_download_validation(language)

Prep and validate the downloadable code for the ruleset associated with this model

Parameters:

language : str

the language the code should be downloaded in - see datarobot.enums.PRIME_LANGUAGE for available languages

Returns:

job : Job

A job tracking the code preparation and validation
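
Examples

A sketch preparing Python code for download (the ids are placeholders; PRIME_LANGUAGE.PYTHON is assumed to be among the available languages):

prime_model = datarobot.PrimeModel.get('p-id', 'prime-l-id')
job = prime_model.request_download_validation(datarobot.enums.PRIME_LANGUAGE.PYTHON)
job.wait_for_completion()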

cross_validate()

Run Cross Validation on this model.

Note

To perform Cross Validation on a new model with new parameters, use train instead.

Returns:

ModelJob

The created job to build the model

delete()

Delete a model from the project’s leaderboard.

download_export(filepath)

Download an exportable model file for use in an on-premise DataRobot standalone prediction environment.

This function can only be used if model export is enabled, and will only be useful if you have an on-premise environment in which to import it.

Parameters:

filepath : str

The path at which to save the exported model file.

download_scoring_code(file_name, source_code=False)

Download scoring code JAR.

Parameters:

file_name : str

File path where scoring code will be saved.

source_code : bool, optional

Set to True to download source code archive. It will not be executable.

fetch_resource_data(*args, **kwargs)

(Deprecated.) Used to acquire model data directly from its url.

Consider using get instead, as this is a convenience function used for development of the datarobot package.

Parameters:

url : string

The resource we are acquiring

join_endpoint : boolean, optional

Whether the client’s endpoint should be joined to the URL before sending the request. Location headers are returned as absolute locations, so they will not need the endpoint.

Returns:

model_data : dict

The queried model’s data

get_all_confusion_charts()

Retrieve a list of all confusion charts available for the model.

Returns:

list of ConfusionChart

Data for all available confusion charts for model.

get_all_lift_charts()

Retrieve a list of all lift charts available for the model.

Returns:

list of LiftChart

Data for all available model lift charts.

get_all_roc_curves()

Retrieve a list of all ROC curves available for the model.

Returns:

list of RocCurve

Data for all available model ROC curves.

get_confusion_chart(source)

Retrieve model’s confusion chart for the specified source.

Parameters:

source : str

Confusion chart source. Check datarobot.enums.CHART_DATA_SOURCE for possible values.

Returns:

ConfusionChart

Model ConfusionChart data

get_feature_impact()

Retrieve the computed Feature Impact results, a measure of the relevance of each feature in the model.

Elsewhere this technique is sometimes called ‘Permutation Importance’.

Requires that Feature Impact has already been computed with request_feature_impact.

Returns:

feature_impacts : list[dict]

The feature impact data. Each item is a dict with the keys ‘featureName’, ‘impactNormalized’, and ‘impactUnnormalized’. See the help for Model.request_feature_impact for more details.

Raises:

ClientError (404)

If the feature impacts have not been computed.

get_features_used()

Query the server to determine which features were used.

Note that the data returned by this method is possibly different than the names of the features in the featurelist used by this model. This method will return the raw features that must be supplied in order for predictions to be generated on a new set of data. The featurelist, in contrast, would also include the names of derived features.

Returns:

features : list of str

The names of the features used in the model.

get_leaderboard_ui_permalink()

Returns:

url : str

Permanent static hyperlink to this model at the leaderboard.

get_lift_chart(source)

Retrieve model lift chart for the specified source.

Parameters:

source : str

Lift chart data source. Check datarobot.enums.CHART_DATA_SOURCE for possible values.

Returns:

LiftChart

Model lift chart data

get_model_blueprint_chart()

Retrieve a model blueprint chart that can be used to understand data flow in blueprint.

Returns:

ModelBlueprintChart

The queried model blueprint chart.

get_model_blueprint_documents()

Get documentation for tasks used in this model.

Returns:

list of BlueprintTaskDocument

All documents available for the model.

get_parameters()

Retrieve model parameters.

Returns:

ModelParameters

Model parameters for this model.

get_prime_eligibility()

Check if this model can be approximated with DataRobot Prime

Returns:

prime_eligibility : dict

a dict indicating whether a model can be approximated with DataRobot Prime (key can_make_prime) and why it may be ineligible (key message)

get_roc_curve(source)

Retrieve model ROC curve for the specified source.

Parameters:

source : str

ROC curve data source. Check datarobot.enums.CHART_DATA_SOURCE for possible values.

Returns:

RocCurve

Model ROC curve data

get_rulesets()

List the rulesets approximating this model generated by DataRobot Prime

If this model hasn’t been approximated yet, this will return an empty list. Note that these are rulesets approximating this model, not rulesets used to construct this model.

Returns:

rulesets : list of Ruleset

get_word_cloud(exclude_stop_words=False)

Retrieve word cloud data for the model.

Parameters:

exclude_stop_words : bool, optional

Set to True if you want stopwords filtered out of the response.

Returns:

WordCloud

Word cloud data for the model.

open_model_browser()

Opens model at project leaderboard in web browser.

Note: If text-mode browsers are used, the calling process will block until the user exits the browser.

request_feature_impact()

Request feature impacts to be computed for the model.

Feature Impact is computed for each column by creating new data with that column randomly permuted (but the others left unchanged), and seeing how the error metric score for the predictions is affected. The ‘impactUnnormalized’ is how much worse the error metric score is when making predictions on this modified data. The ‘impactNormalized’ is normalized so that the largest value is 1. In both cases, larger values indicate more important features.

Returns:

job : Job

A Job representing the feature impact computation. To get the completed feature impact data, use job.get_result or job.get_result_when_complete.

Raises:

JobAlreadyRequested (422)

If the feature impacts have already been requested.

request_predictions(dataset_id)

Request predictions against a previously uploaded dataset

Parameters:

dataset_id : string

The dataset to make predictions against (as uploaded from Project.upload_dataset)

Returns:

job : PredictJob

The job computing the predictions

request_training_predictions(data_subset)

Start a job to build training predictions

Parameters:

data_subset : str

data set definition to build predictions on. Choices are:

  • dr.enums.DATA_SUBSET.ALL for all data available
  • dr.enums.DATA_SUBSET.VALIDATION_AND_HOLDOUT for all data except training set
  • dr.enums.DATA_SUBSET.HOLDOUT for holdout data set only

Returns:

Job

an instance of created async job

request_transferable_export()

Request generation of an exportable model file for use in an on-premise DataRobot standalone prediction environment.

This function can only be used if model export is enabled, and will only be useful if you have an on-premise environment in which to import it.

This function does not download the exported file. Use download_export for that.

Examples

model = datarobot.Model.get('p-id', 'l-id')
job = model.request_transferable_export()
job.wait_for_completion()
model.download_export('my_exported_model.drmodel')

# Client must be configured to use standalone prediction server for import:
datarobot.Client(token='my-token-at-standalone-server',
                 endpoint='standalone-server-url/api/v2')

imported_model = datarobot.ImportedModel.create('my_exported_model.drmodel')

BlenderModel API

class datarobot.models.BlenderModel(id=None, processes=None, featurelist_name=None, featurelist_id=None, project_id=None, sample_pct=None, training_row_count=None, training_duration=None, training_start_date=None, training_end_date=None, model_type=None, model_category=None, is_frozen=None, blueprint_id=None, metrics=None, model_ids=None, blender_method=None, monotonic_increasing_featurelist_id=None, monotonic_decreasing_featurelist_id=None, supports_monotonic_constraints=None)

Blender model that combines prediction results from other models.

Attributes

id (str) the id of the model
project_id (str) the id of the project the model belongs to
processes (list of str) the processes used by the model
featurelist_name (str) the name of the featurelist used by the model
featurelist_id (str) the id of the featurelist used by the model
sample_pct (float) the percentage of the project dataset used in training the model
training_row_count (int or None) the number of rows of the project dataset used in training the model. In a datetime partitioned project, if specified, defines the number of rows used to train the model and evaluate backtest scores; if unspecified, either training_duration or training_start_date and training_end_date was used to determine that instead.
training_duration (str or None) only present for models in datetime partitioned projects. If specified, a duration string specifying the duration spanned by the data used to train the model and evaluate backtest scores.
training_start_date (datetime or None) only present for frozen models in datetime partitioned projects. If specified, the start date of the data used to train the model.
training_end_date (datetime or None) only present for frozen models in datetime partitioned projects. If specified, the end date of the data used to train the model.
model_type (str) what model this is, e.g. ‘AVG Blender’
model_category (str) what kind of model this is - always ‘blend’ for blender models
is_frozen (bool) whether this model is a frozen model
blueprint_id (str) the id of the blueprint used in this model
metrics (dict) a mapping from each metric to the model’s scores for that metric
model_ids (list of str) List of model ids used in blender
blender_method (str) Method used to blend results from underlying models
monotonic_increasing_featurelist_id (str) optional, the id of the featurelist that defines the set of features with a monotonically increasing relationship to the target. If None, no such constraints are enforced.
monotonic_decreasing_featurelist_id (str) optional, the id of the featurelist that defines the set of features with a monotonically decreasing relationship to the target. If None, no such constraints are enforced.
supports_monotonic_constraints (bool) optional, whether this model supports enforcing monotonic constraints

classmethod get(project_id, model_id)

Retrieve a specific blender.

Parameters:

project_id : str

The project’s id.

model_id : str

The model_id of the leaderboard item to retrieve.

Returns:

model : BlenderModel

The queried instance.

cross_validate()

Run Cross Validation on this model.

Note

To perform Cross Validation on a new model with new parameters, use train instead.

Returns:

ModelJob

The created job to build the model

delete()

Delete a model from the project’s leaderboard.

download_export(filepath)

Download an exportable model file for use in an on-premise DataRobot standalone prediction environment.

This function can only be used if model export is enabled, and will only be useful if you have an on-premise environment in which to import it.

Parameters:

filepath : str

The path at which to save the exported model file.

download_scoring_code(file_name, source_code=False)

Download scoring code JAR.

Parameters:

file_name : str

File path where scoring code will be saved.

source_code : bool, optional

Set to True to download source code archive. It will not be executable.

fetch_resource_data(*args, **kwargs)

(Deprecated.) Used to acquire model data directly from its url.

Consider using get instead, as this is a convenience function used for development of the datarobot package.

Parameters:

url : string

The resource we are acquiring

join_endpoint : boolean, optional

Whether the client’s endpoint should be joined to the URL before sending the request. Location headers are returned as absolute locations, so they will not need the endpoint.

Returns:

model_data : dict

The queried model’s data

get_all_confusion_charts()

Retrieve a list of all confusion charts available for the model.

Returns:

list of ConfusionChart

Data for all available confusion charts for model.

get_all_lift_charts()

Retrieve a list of all lift charts available for the model.

Returns:

list of LiftChart

Data for all available model lift charts.

get_all_roc_curves()

Retrieve a list of all ROC curves available for the model.

Returns:

list of RocCurve

Data for all available model ROC curves.

get_confusion_chart(source)

Retrieve model’s confusion chart for the specified source.

Parameters:

source : str

Confusion chart source. Check datarobot.enums.CHART_DATA_SOURCE for possible values.

Returns:

ConfusionChart

Model ConfusionChart data

get_feature_impact()

Retrieve the computed Feature Impact results, a measure of the relevance of each feature in the model.

Elsewhere this technique is sometimes called ‘Permutation Importance’.

Requires that Feature Impact has already been computed with request_feature_impact.

Returns:

feature_impacts : list[dict]

The feature impact data. Each item is a dict with the keys ‘featureName’, ‘impactNormalized’, and ‘impactUnnormalized’. See the help for Model.request_feature_impact for more details.

Raises:

ClientError (404)

If the feature impacts have not been computed.

get_features_used()

Query the server to determine which features were used.

Note that the data returned by this method is possibly different than the names of the features in the featurelist used by this model. This method will return the raw features that must be supplied in order for predictions to be generated on a new set of data. The featurelist, in contrast, would also include the names of derived features.

Returns:

features : list of str

The names of the features used in the model.

get_leaderboard_ui_permalink()

Returns:

url : str

Permanent static hyperlink to this model at the leaderboard.

get_lift_chart(source)

Retrieve model lift chart for the specified source.

Parameters:

source : str

Lift chart data source. Check datarobot.enums.CHART_DATA_SOURCE for possible values.

Returns:

LiftChart

Model lift chart data

get_model_blueprint_chart()

Retrieve a model blueprint chart that can be used to understand data flow in blueprint.

Returns:

ModelBlueprintChart

The queried model blueprint chart.

get_model_blueprint_documents()

Get documentation for tasks used in this model.

Returns:

list of BlueprintTaskDocument

All documents available for the model.

get_parameters()

Retrieve model parameters.

Returns:

ModelParameters

Model parameters for this model.

get_prime_eligibility()

Check if this model can be approximated with DataRobot Prime

Returns:

prime_eligibility : dict

a dict indicating whether a model can be approximated with DataRobot Prime (key can_make_prime) and why it may be ineligible (key message)

get_roc_curve(source)

Retrieve model ROC curve for the specified source.

Parameters:

source : str

ROC curve data source. Check datarobot.enums.CHART_DATA_SOURCE for possible values.

Returns:

RocCurve

Model ROC curve data

get_rulesets()

List the rulesets approximating this model generated by DataRobot Prime

If this model hasn’t been approximated yet, this will return an empty list. Note that these are rulesets approximating this model, not rulesets used to construct this model.

Returns:

rulesets : list of Ruleset

get_word_cloud(exclude_stop_words=False)

Retrieve word cloud data for the model.

Parameters:

exclude_stop_words : bool, optional

Set to True if you want stopwords filtered out of the response.

Returns:

WordCloud

Word cloud data for the model.

open_model_browser()

Opens model at project leaderboard in web browser.

Note: If text-mode browsers are used, the calling process will block until the user exits the browser.

request_approximation()

Request an approximation of this model using DataRobot Prime

This will create several rulesets that could be used to approximate this model. After comparing their scores and rule counts, the code used in the approximation can be downloaded and run locally.

Returns:

job : Job

the job generating the rulesets

request_feature_impact()

Request feature impacts to be computed for the model.

Feature Impact is computed for each column by creating new data with that column randomly permuted (but the others left unchanged), and seeing how the error metric score for the predictions is affected. The ‘impactUnnormalized’ is how much worse the error metric score is when making predictions on this modified data. The ‘impactNormalized’ is normalized so that the largest value is 1. In both cases, larger values indicate more important features.

Returns:

job : Job

A Job representing the feature impact computation. To get the completed feature impact data, use job.get_result or job.get_result_when_complete.

Raises:

JobAlreadyRequested (422)

If the feature impacts have already been requested.

request_frozen_datetime_model(training_row_count=None, training_duration=None, training_start_date=None, training_end_date=None, time_window_sample_pct=None)

Train a new frozen model with parameters from this model

Requires that this model belongs to a datetime partitioned project. If it does not, an error will occur when submitting the job.

Frozen models use the same tuning parameters as their parent model instead of independently optimizing them, which allows models to be retrained efficiently on larger amounts of the training data.

In addition to training_row_count and training_duration, frozen datetime models may be trained on an exact date range. Only one of training_row_count, training_duration, or training_start_date and training_end_date should be specified.

Models specified using training_start_date and training_end_date are the only ones that can be trained into the holdout data (once the holdout is unlocked).

Parameters:

training_row_count : int, optional

the number of rows of data that should be used to train the model. If specified, training_duration may not be specified.

training_duration : str, optional

a duration string specifying what time range the data used to train the model should span. If specified, training_row_count may not be specified.

training_start_date : datetime.datetime, optional

the start date of the data to train the model on. Only rows occurring at or after this datetime will be used. If training_start_date is specified, training_end_date must also be specified.

training_end_date : datetime.datetime, optional

the end date of the data to train the model on. Only rows occurring strictly before this datetime will be used. If training_end_date is specified, training_start_date must also be specified.

time_window_sample_pct : int, optional

may only be specified when the requested model is a time window (e.g. duration or start and end dates). An integer between 1 and 99 indicating the percentage to sample by within the window. The points kept are determined by a random uniform sample.

Returns:

model_job : ModelJob

the modeling job training a frozen model

request_frozen_model(sample_pct=None, training_row_count=None)

Train a new frozen model with parameters from this model

Note

This method only works if the project the model belongs to is not datetime partitioned. If it is, use request_frozen_datetime_model instead.

Frozen models use the same tuning parameters as their parent model instead of independently optimizing them, which allows models to be retrained efficiently on larger amounts of the training data.

Parameters:

sample_pct : float

optional, the percentage of the dataset to use with the model. If not provided, will use the value from this model.

training_row_count : int

(New in version v2.9) optional, the integer number of rows of the dataset to use with the model. Only one of sample_pct and training_row_count should be specified.

Returns:

model_job : ModelJob

the modeling job training a frozen model

request_predictions(dataset_id)

Request predictions against a previously uploaded dataset

Parameters:

dataset_id : string

The dataset to make predictions against (as uploaded from Project.upload_dataset)

Returns:

job : PredictJob

The job computing the predictions

request_training_predictions(data_subset)

Start a job to build training predictions

Parameters:

data_subset : str

data set definition to build predictions on. Choices are:

  • dr.enums.DATA_SUBSET.ALL for all data available
  • dr.enums.DATA_SUBSET.VALIDATION_AND_HOLDOUT for all data except training set
  • dr.enums.DATA_SUBSET.HOLDOUT for holdout data set only

Returns:

Job

an instance of created async job

request_transferable_export()

Request generation of an exportable model file for use in an on-premise DataRobot standalone prediction environment.

This function can only be used if model export is enabled, and will only be useful if you have an on-premise environment in which to import it.

This function does not download the exported file. Use download_export for that.

Examples

model = datarobot.Model.get('p-id', 'l-id')
job = model.request_transferable_export()
job.wait_for_completion()
model.download_export('my_exported_model.drmodel')

# Client must be configured to use standalone prediction server for import:
datarobot.Client(token='my-token-at-standalone-server',
                 endpoint='standalone-server-url/api/v2')

imported_model = datarobot.ImportedModel.create('my_exported_model.drmodel')

train(sample_pct=None, featurelist_id=None, scoring_type=None, training_row_count=None, monotonic_increasing_featurelist_id=MONOTONICITY_FEATURELIST_DEFAULT, monotonic_decreasing_featurelist_id=MONOTONICITY_FEATURELIST_DEFAULT)

Train the blueprint used in model on a particular featurelist or amount of data.

This method creates a new training job for worker and appends it to the end of the queue for this project. After the job has finished you can get the newly trained model by retrieving it from the project leaderboard, or by retrieving the result of the job.

Either sample_pct or training_row_count can be used to specify the amount of data to use, but not both. If neither is specified, the default is the maximum amount of data that can safely be used to train any blueprint without going into the validation data.

In smart-sampled projects, sample_pct and training_row_count are assumed to be in terms of rows of the minority class.

Note

For datetime partitioned projects, use train_datetime instead.

Parameters:

sample_pct : float, optional

The amount of data to use for training, as a percentage of the project dataset from 0 to 100.

featurelist_id : str, optional

The identifier of the featurelist to use. If not defined, the featurelist of this model is used.

scoring_type : str, optional

Either SCORING_TYPE.validation or SCORING_TYPE.cross_validation. SCORING_TYPE.validation is available for every partitioning type, and indicates that the default model validation should be used for the project. If the project uses a form of cross-validation partitioning, SCORING_TYPE.cross_validation can also be used to indicate that all of the available training/validation combinations should be used to evaluate the model.

training_row_count : int, optional

The number of rows to use to train the requested model.

monotonic_increasing_featurelist_id : str

(new in version 2.11) optional, the id of the featurelist that defines the set of features with a monotonically increasing relationship to the target. Passing None disables the increasing monotonicity constraint. The default (dr.enums.MONOTONICITY_FEATURELIST_DEFAULT) uses the featurelist specified by the blueprint.

monotonic_decreasing_featurelist_id : str

(new in version 2.11) optional, the id of the featurelist that defines the set of features with a monotonically decreasing relationship to the target. Passing None disables the decreasing monotonicity constraint. The default (dr.enums.MONOTONICITY_FEATURELIST_DEFAULT) uses the featurelist specified by the blueprint.

Returns:

model_job_id : str

the id of the created job, which can be used as a parameter to the ModelJob.get method or the wait_for_async_model_creation function

Examples

project = Project.get('p-id')
model = Model.get('p-id', 'l-id')
model_job_id = model.train(training_row_count=project.max_train_rows)

train_datetime(featurelist_id=None, training_row_count=None, training_duration=None, time_window_sample_pct=None)

Train this model on a different featurelist or amount of data

Requires that this model is part of a datetime partitioned project; otherwise, an error will occur.

Parameters:

featurelist_id : str, optional

the featurelist to use to train the model. If not specified, the featurelist of this model is used.

training_row_count : int, optional

the number of rows of data that should be used to train the model. If specified, training_duration may not be specified.

training_duration : str, optional

a duration string specifying what time range the data used to train the model should span. If specified, training_row_count may not be specified.

time_window_sample_pct : int, optional

may only be specified when the requested model is a time window (e.g. duration or start and end dates). An integer between 1 and 99 indicating the percentage to sample by within the window. The points kept are determined by a random uniform sample.

Returns:

job : ModelJob

the created job to build the model

DatetimeModel API

class datarobot.models.DatetimeModel(id=None, processes=None, featurelist_name=None, featurelist_id=None, project_id=None, sample_pct=None, training_row_count=None, training_duration=None, training_start_date=None, training_end_date=None, time_window_sample_pct=None, model_type=None, model_category=None, is_frozen=None, blueprint_id=None, metrics=None, training_info=None, holdout_score=None, holdout_status=None, data_selection_method=None, backtests=None, monotonic_increasing_featurelist_id=None, monotonic_decreasing_featurelist_id=None, supports_monotonic_constraints=None)

A model from a datetime partitioned project

Only one of training_row_count, training_duration, and training_start_date and training_end_date will be specified, depending on the data_selection_method of the model. Whichever method was selected determines the amount of data used to train on when making predictions and scoring the backtests and the holdout.

Attributes

id (str) the id of the model
project_id (str) the id of the project the model belongs to
processes (list of str) the processes used by the model
featurelist_name (str) the name of the featurelist used by the model
featurelist_id (str) the id of the featurelist used by the model
sample_pct (float) the percentage of the project dataset used in training the model
training_row_count (int or None) If specified, an int specifying the number of rows used to train the model and evaluate backtest scores.
training_duration (str or None) If specified, a duration string specifying the duration spanned by the data used to train the model and evaluate backtest scores.
training_start_date (datetime or None) only present for frozen models in datetime partitioned projects. If specified, the start date of the data used to train the model.
training_end_date (datetime or None) only present for frozen models in datetime partitioned projects. If specified, the end date of the data used to train the model.
time_window_sample_pct (int or None) An integer between 1 and 99 indicating the percentage of sampling within the training window. The points kept are determined by a random uniform sample. If not specified, no sampling was done.
model_type (str) what model this is, e.g. ‘Nystroem Kernel SVM Regressor’
model_category (str) what kind of model this is - ‘prime’ for DataRobot Prime models, ‘blend’ for blender models, and ‘model’ for other models
is_frozen (bool) whether this model is a frozen model
blueprint_id (str) the id of the blueprint used in this model
metrics (dict) a mapping from each metric to the model’s scores for that metric. The keys in metrics are the different metrics used to evaluate the model, and the values are the results. The dictionaries inside of metrics will be as described here: ‘validation’, the score for a single backtest; ‘crossValidation’, always None; ‘backtesting’, the average score for all backtests if all are available and computed, or None otherwise; ‘backtestingScores’, a list of scores for all backtests where the score is None if that backtest does not have a score available; and ‘holdout’, the score for the holdout or None if the holdout is locked or the score is unavailable.
backtests (list of dict) describes what data was used to fit each backtest, the score for the project metric, and why the backtest score is unavailable if it is not provided.
data_selection_method (str) which of training_row_count, training_duration, or training_start_date and training_end_date were used to determine the data used to fit the model. One of ‘rowCount’, ‘duration’, or ‘selectedDateRange’.
training_info (dict) describes which data was used to train on when scoring the holdout and making predictions. training_info will have the following keys: holdout_training_start_date, holdout_training_duration, holdout_training_row_count, holdout_training_end_date, prediction_training_start_date, prediction_training_duration, prediction_training_row_count, prediction_training_end_date. Start and end dates will be datetimes, durations will be duration strings, and rows will be integers.
holdout_score (float or None) the score against the holdout, if available and the holdout is unlocked, according to the project metric.
holdout_status (string or None) the status of the holdout score, e.g. “COMPLETED”, “HOLDOUT_BOUNDARIES_EXCEEDED”. Unavailable if the holdout fold was disabled in the partitioning configuration.
monotonic_increasing_featurelist_id (str) optional, the id of the featurelist that defines the set of features with a monotonically increasing relationship to the target. If None, no such constraints are enforced.
monotonic_decreasing_featurelist_id (str) optional, the id of the featurelist that defines the set of features with a monotonically decreasing relationship to the target. If None, no such constraints are enforced.
supports_monotonic_constraints (bool) optional, whether this model supports enforcing monotonic constraints

classmethod get(project, model_id)

Retrieve a specific datetime model

If the project does not use datetime partitioning, a ClientError will occur.

Parameters:

project : str

the id of the project the model belongs to

model_id : str

the id of the model to retrieve

Returns:

model : DatetimeModel

the model

score_backtests()

Compute the scores for all available backtests

Some backtests may be unavailable if the model is trained into their validation data.

Returns:

job : Job

a job tracking the backtest computation. When it is complete, all available backtests will have scores computed.
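
Examples

A sketch (the ids are placeholders):

model = datarobot.DatetimeModel.get('p-id', 'l-id')
job = model.score_backtests()
job.wait_for_completion()
model = datarobot.DatetimeModel.get('p-id', 'l-id')  # re-fetch to pick up the new backtest scores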

cross_validate()

Inherited from Model - DatetimeModels cannot request Cross Validation; use score_backtests instead.

delete()

Delete a model from the project’s leaderboard.

download_export(filepath)

Download an exportable model file for use in an on-premise DataRobot standalone prediction environment.

This function can only be used if model export is enabled, and will only be useful if you have an on-premise environment in which to import it.

Parameters:

filepath : str

The path at which to save the exported model file.

download_scoring_code(file_name, source_code=False)

Download scoring code JAR.

Parameters:

file_name : str

File path where scoring code will be saved.

source_code : bool, optional

Set to True to download source code archive. It will not be executable.

fetch_resource_data(*args, **kwargs)

(Deprecated.) Used to acquire model data directly from its url.

Consider using get instead, as this is a convenience function used for development of the datarobot package.

Parameters:

url : string

The resource we are acquiring

join_endpoint : boolean, optional

Whether the client’s endpoint should be joined to the URL before sending the request. Location headers are returned as absolute locations, so they will not need the endpoint.

Returns:

model_data : dict

The queried model’s data

get_all_confusion_charts()

Retrieve a list of all confusion charts available for the model.

Returns:

list of ConfusionChart

Data for all available confusion charts for model.

get_all_lift_charts()

Retrieve a list of all lift charts available for the model.

Returns:

list of LiftChart

Data for all available model lift charts.

get_all_roc_curves()

Retrieve a list of all ROC curves available for the model.

Returns:

list of RocCurve

Data for all available model ROC curves.

get_confusion_chart(source)

Retrieve model’s confusion chart for the specified source.

Parameters:

source : str

Confusion chart source. Check datarobot.enums.CHART_DATA_SOURCE for possible values.

Returns:

ConfusionChart

Model ConfusionChart data

get_feature_impact()

Retrieve the computed Feature Impact results, a measure of the relevance of each feature in the model.

Elsewhere this technique is sometimes called ‘Permutation Importance’.

Requires that Feature Impact has already been computed with request_feature_impact.

Returns:

feature_impacts : list[dict]

The feature impact data. Each item is a dict with the keys ‘featureName’, ‘impactNormalized’, and ‘impactUnnormalized’. See the help for Model.request_feature_impact for more details.

Raises:

ClientError (404)

If the feature impacts have not been computed.

get_features_used()

Query the server to determine which features were used.

Note that the data returned by this method is possibly different than the names of the features in the featurelist used by this model. This method will return the raw features that must be supplied in order for predictions to be generated on a new set of data. The featurelist, in contrast, would also include the names of derived features.

Returns:

features : list of str

The names of the features used in the model.

get_leaderboard_ui_permalink()

Returns:

url : str

Permanent static hyperlink to this model at the leaderboard.

get_lift_chart(source)

Retrieve model lift chart for the specified source.

Parameters:

source : str

Lift chart data source. Check datarobot.enums.CHART_DATA_SOURCE for possible values.

Returns:

LiftChart

Model lift chart data

get_model_blueprint_chart()

Retrieve a model blueprint chart that can be used to understand data flow in blueprint.

Returns:

ModelBlueprintChart

The queried model blueprint chart.

get_model_blueprint_documents()

Get documentation for tasks used in this model.

Returns:

list of BlueprintTaskDocument

All documents available for the model.

get_parameters()

Retrieve model parameters.

Returns:

ModelParameters

Model parameters for this model.

get_prime_eligibility()

Check if this model can be approximated with DataRobot Prime

Returns:

prime_eligibility : dict

a dict indicating whether a model can be approximated with DataRobot Prime (key can_make_prime) and why it may be ineligible (key message)

get_roc_curve(source)

Retrieve model ROC curve for the specified source.

Parameters:

source : str

ROC curve data source. Check datarobot.enums.CHART_DATA_SOURCE for possible values.

Returns:

RocCurve

Model ROC curve data

get_rulesets()

List the rulesets approximating this model generated by DataRobot Prime

If this model hasn’t been approximated yet, an empty list will be returned. Note that these are rulesets approximating this model, not rulesets used to construct this model.

Returns:

rulesets : list of Ruleset

The rulesets approximating this model.

get_word_cloud(exclude_stop_words=False)

Retrieve word cloud data for the model.

Parameters:

exclude_stop_words : bool, optional

Set to True if you want stop words filtered out of the response.

Returns:

WordCloud

Word cloud data for the model.

open_model_browser()

Opens the model at the project leaderboard in a web browser.

Note: If text-mode browsers are used, the calling process will block until the user exits the browser.

request_approximation()

Request an approximation of this model using DataRobot Prime

This will create several rulesets that could be used to approximate this model. After comparing their scores and rule counts, the code used in the approximation can be downloaded and run locally.

Returns:

job : Job

the job generating the rulesets
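
For instance, a sketch of the Prime workflow, assuming model is a Model instance and checking eligibility first via get_prime_eligibility:

eligibility = model.get_prime_eligibility()
if eligibility['can_make_prime']:
    prime_job = model.request_approximation()
    prime_job.wait_for_completion()
    # inspect candidate rulesets, trading off score against rule count
    rulesets = model.get_rulesets()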

request_feature_impact()

Request feature impacts to be computed for the model.

Feature Impact is computed for each column by creating new data with that column randomly permuted (but the others left unchanged), and seeing how the error metric score for the predictions is affected. The ‘impactUnnormalized’ is how much worse the error metric score is when making predictions on this modified data. The ‘impactNormalized’ is normalized so that the largest value is 1. In both cases, larger values indicate more important features.

Returns:

job : Job

A Job representing the feature impact computation. To get the completed feature impact data, use job.get_result or job.get_result_when_complete.

Raises:

JobAlreadyRequested (422)

If the feature impacts have already been requested.
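
A typical usage sketch, assuming model is a Model instance on the leaderboard:

# start the permutation-based computation on the server
impact_job = model.request_feature_impact()

# block until done, then get the list of dicts described above
feature_impacts = impact_job.get_result_when_complete()

# rank features by normalized impact, most important first
ranked = sorted(feature_impacts, key=lambda fi: fi['impactNormalized'], reverse=True)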

request_frozen_datetime_model(training_row_count=None, training_duration=None, training_start_date=None, training_end_date=None, time_window_sample_pct=None)

Train a new frozen model with parameters from this model

Requires that this model belongs to a datetime partitioned project. If it does not, an error will occur when submitting the job.

Frozen models use the same tuning parameters as their parent model instead of independently optimizing them, allowing models to be retrained efficiently on larger amounts of the training data.

In addition to training_row_count and training_duration, frozen datetime models may be trained on an exact date range. Only one of training_row_count, training_duration, or training_start_date and training_end_date should be specified.

Models specified using training_start_date and training_end_date are the only ones that can be trained into the holdout data (once the holdout is unlocked).

Parameters:

training_row_count : int, optional

the number of rows of data that should be used to train the model. If specified, training_duration may not be specified.

training_duration : str, optional

a duration string specifying what time range the data used to train the model should span. If specified, training_row_count may not be specified.

training_start_date : datetime.datetime, optional

the start date of the data to train the model on. Only rows occurring at or after this datetime will be used. If training_start_date is specified, training_end_date must also be specified.

training_end_date : datetime.datetime, optional

the end date of the data to train the model on. Only rows occurring strictly before this datetime will be used. If training_end_date is specified, training_start_date must also be specified.

time_window_sample_pct : int, optional

may only be specified when the requested model is defined by a time window (e.g. a duration, or start and end dates). An integer between 1 and 99 indicating the percentage to sample within the window. The rows kept are determined by a uniform random sample.

Returns:

model_job : ModelJob

the modeling job training a frozen model
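
For example, a sketch (assuming model is a Model in a datetime partitioned project) of retraining over a trailing window; the 'P3M' (three months) value is only an illustration of the duration-string format:

model_job = model.request_frozen_datetime_model(training_duration='P3M')
frozen_model = model_job.get_result_when_complete()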

request_predictions(dataset_id)

Request predictions against a previously uploaded dataset

Parameters:

dataset_id : string

The dataset to make predictions against (as uploaded from Project.upload_dataset)

Returns:

job : PredictJob

The job computing the predictions
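
For example, a sketch assuming project is the Project this model belongs to and 'to_predict.csv' is a local file:

dataset = project.upload_dataset('to_predict.csv')
predict_job = model.request_predictions(dataset.id)
predictions = predict_job.get_result_when_complete()  # the completed predictions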

request_training_predictions(data_subset)

Start a job to build training predictions

Parameters:

data_subset : str

data set definition to build predictions on. Choices are:

  • dr.enums.DATA_SUBSET.ALL for all data available
  • dr.enums.DATA_SUBSET.VALIDATION_AND_HOLDOUT for all data except training set
  • dr.enums.DATA_SUBSET.HOLDOUT for holdout data set only
Returns:

Job

an instance of created async job
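
A minimal sketch (assuming model is a Model instance) requesting predictions on the holdout only:

import datarobot as dr

training_predictions_job = model.request_training_predictions(dr.enums.DATA_SUBSET.HOLDOUT)
training_predictions = training_predictions_job.get_result_when_complete()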

request_transferable_export()

Request generation of an exportable model file for use in an on-premise DataRobot standalone prediction environment.

This function can only be used if model export is enabled, and will only be useful if you have an on-premise environment in which to import it.

This function does not download the exported file. Use download_export for that.

Examples

import datarobot

model = datarobot.Model.get('p-id', 'l-id')
job = model.request_transferable_export()
job.wait_for_completion()
model.download_export('my_exported_model.drmodel')

# Client must be configured to use standalone prediction server for import:
datarobot.Client(token='my-token-at-standalone-server',
                 endpoint='standalone-server-url/api/v2')

imported_model = datarobot.ImportedModel.create('my_exported_model.drmodel')

train_datetime(featurelist_id=None, training_row_count=None, training_duration=None, time_window_sample_pct=None)

Train this model on a different featurelist or amount of data

Requires that this model is part of a datetime partitioned project; otherwise, an error will occur.

Parameters:

featurelist_id : str, optional

the featurelist to use to train the model. If not specified, the featurelist of this model is used.

training_row_count : int, optional

the number of rows of data that should be used to train the model. If specified, training_duration may not be specified.

training_duration : str, optional

a duration string specifying what time range the data used to train the model should span. If specified, training_row_count may not be specified.

time_window_sample_pct : int, optional

may only be specified when the requested model is defined by a time window (e.g. a duration, or start and end dates). An integer between 1 and 99 indicating the percentage to sample within the window. The rows kept are determined by a uniform random sample.

Returns:

job : ModelJob

the created job to build the model
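
For example, a sketch (assuming model is a Model instance) retraining on a fixed number of rows; the count is illustrative:

model_job = model.train_datetime(training_row_count=5000)
new_model = model_job.get_result_when_complete()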

RatingTableModel API

class datarobot.models.RatingTableModel(id=None, processes=None, featurelist_name=None, featurelist_id=None, project_id=None, sample_pct=None, training_row_count=None, training_duration=None, training_start_date=None, training_end_date=None, model_type=None, model_category=None, is_frozen=None, blueprint_id=None, metrics=None, rating_table_id=None, monotonic_increasing_featurelist_id=None, monotonic_decreasing_featurelist_id=None, supports_monotonic_constraints=None)

A model that has a rating table.

Attributes

id (str) the id of the model
project_id (str) the id of the project the model belongs to
processes (list of str) the processes used by the model
featurelist_name (str) the name of the featurelist used by the model
featurelist_id (str) the id of the featurelist used by the model
sample_pct (float or None) the percentage of the project dataset used in training the model. If the project uses datetime partitioning, the sample_pct will be None. See training_row_count, training_duration, and training_start_date and training_end_date instead.
training_row_count (int or None) the number of rows of the project dataset used in training the model. In a datetime partitioned project, if specified, defines the number of rows used to train the model and evaluate backtest scores; if unspecified, either training_duration or training_start_date and training_end_date was used to determine that instead.
training_duration (str or None) only present for models in datetime partitioned projects. If specified, a duration string specifying the duration spanned by the data used to train the model and evaluate backtest scores.
training_start_date (datetime or None) only present for frozen models in datetime partitioned projects. If specified, the start date of the data used to train the model.
training_end_date (datetime or None) only present for frozen models in datetime partitioned projects. If specified, the end date of the data used to train the model.
model_type (str) what model this is, e.g. ‘Nystroem Kernel SVM Regressor’
model_category (str) what kind of model this is - ‘prime’ for DataRobot Prime models, ‘blend’ for blender models, and ‘model’ for other models
is_frozen (bool) whether this model is a frozen model
blueprint_id (str) the id of the blueprint used in this model
metrics (dict) a mapping from each metric to the model’s scores for that metric
rating_table_id (str) the id of the rating table that belongs to this model
monotonic_increasing_featurelist_id (str) optional, the id of the featurelist that defines the set of features with a monotonically increasing relationship to the target. If None, no such constraints are enforced.
monotonic_decreasing_featurelist_id (str) optional, the id of the featurelist that defines the set of features with a monotonically decreasing relationship to the target. If None, no such constraints are enforced.
supports_monotonic_constraints (bool) optional, whether this model supports enforcing monotonic constraints
classmethod get(project_id, model_id)

Retrieve a specific rating table model

If the project does not have a rating table, a ClientError will occur.

Parameters:

project_id : str

the id of the project the model belongs to

model_id : str

the id of the model to retrieve

Returns:

model : RatingTableModel

the model

classmethod create_from_rating_table(project_id, rating_table_id)

Creates a new model from a validated rating table record. The RatingTable must not be associated with an existing model.

Parameters:

project_id : str

the id of the project the rating table belongs to

rating_table_id : str

the id of the rating table to create this model from

Returns:

job: Job

an instance of created async job

Raises:

ClientError (422)

Raised if creating model from a RatingTable that failed validation

JobAlreadyRequested

Raised if creating model from a RatingTable that is already associated with a RatingTableModel
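
For example, a sketch assuming project_id is the project's id and rating_table is a validated RatingTable not yet tied to a model:

from datarobot.models import RatingTableModel

job = RatingTableModel.create_from_rating_table(project_id, rating_table.id)
rating_table_model = job.get_result_when_complete()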

cross_validate()

Run Cross Validation on this model.

Note

To perform Cross Validation on a new model with new parameters, use train instead.

Returns:

ModelJob

The created job to build the model

delete()

Delete a model from the project’s leaderboard.

download_export(filepath)

Download an exportable model file for use in an on-premise DataRobot standalone prediction environment.

This function can only be used if model export is enabled, and will only be useful if you have an on-premise environment in which to import it.

Parameters:

filepath : str

The path at which to save the exported model file.

download_scoring_code(file_name, source_code=False)

Download scoring code JAR.

Parameters:

file_name : str

File path where scoring code will be saved.

source_code : bool, optional

Set to True to download source code archive. It will not be executable.
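
For instance (assuming model is a RatingTableModel instance; the file names are placeholders):

# executable scoring code JAR
model.download_scoring_code('model.jar')
# non-executable source code archive
model.download_scoring_code('model_source.jar', source_code=True)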

fetch_resource_data(*args, **kwargs)

(Deprecated.) Used to acquire model data directly from its url.

Consider using get instead, as this is a convenience function used for development of datarobot

Parameters:

url : string

The resource we are acquiring

join_endpoint : boolean, optional

Whether the client’s endpoint should be joined to the URL before sending the request. Location headers are returned as absolute locations, so they will _not_ need the endpoint

Returns:

model_data : dict

The queried model’s data

get_all_confusion_charts()

Retrieve a list of all confusion charts available for the model.

Returns:

list of ConfusionChart

Data for all available confusion charts for model.

get_all_lift_charts()

Retrieve a list of all lift charts available for the model.

Returns:

list of LiftChart

Data for all available model lift charts.

get_all_roc_curves()

Retrieve a list of all ROC curves available for the model.

Returns:

list of RocCurve

Data for all available model ROC curves.

get_confusion_chart(source)

Retrieve model’s confusion chart for the specified source.

Parameters:

source : str

Confusion chart source. Check datarobot.enums.CHART_DATA_SOURCE for possible values.

Returns:

ConfusionChart

Model ConfusionChart data

get_feature_impact()

Retrieve the computed Feature Impact results, a measure of the relevance of each feature in the model.

Elsewhere this technique is sometimes called ‘Permutation Importance’.

Requires that Feature Impact has already been computed with request_feature_impact.

Returns:

feature_impacts : list[dict]

The feature impact data. Each item is a dict with the keys ‘featureName’, ‘impactNormalized’, and ‘impactUnnormalized’. See the help for Model.request_feature_impact for more details.

Raises:

ClientError (404)

If the feature impacts have not been computed.

get_features_used()

Query the server to determine which features were used.

Note that the data returned by this method may differ from the names of the features in the featurelist used by this model. This method returns the raw features that must be supplied in order for predictions to be generated on a new set of data; the featurelist, in contrast, would also include the names of derived features.

Returns:

features : list of str

The names of the features used in the model.

get_leaderboard_ui_permalink()

Returns:

url : str

Permanent static hyperlink to this model at the leaderboard.

get_lift_chart(source)

Retrieve model lift chart for the specified source.

Parameters:

source : str

Lift chart data source. Check datarobot.enums.CHART_DATA_SOURCE for possible values.

Returns:

LiftChart

Model lift chart data

get_model_blueprint_chart()

Retrieve a model blueprint chart that can be used to understand data flow in the blueprint.

Returns:

ModelBlueprintChart

The queried model blueprint chart.

get_model_blueprint_documents()

Get documentation for tasks used in this model.

Returns:

list of BlueprintTaskDocument

All documents available for the model.

get_parameters()

Retrieve model parameters.

Returns:

ModelParameters

Model parameters for this model.

get_prime_eligibility()

Check if this model can be approximated with DataRobot Prime

Returns:

prime_eligibility : dict

a dict indicating whether a model can be approximated with DataRobot Prime (key can_make_prime) and why it may be ineligible (key message)

get_roc_curve(source)

Retrieve model ROC curve for the specified source.

Parameters:

source : str

ROC curve data source. Check datarobot.enums.CHART_DATA_SOURCE for possible values.

Returns:

RocCurve

Model ROC curve data

get_rulesets()

List the rulesets approximating this model generated by DataRobot Prime

If this model hasn’t been approximated yet, an empty list will be returned. Note that these are rulesets approximating this model, not rulesets used to construct this model.

Returns:

rulesets : list of Ruleset

The rulesets approximating this model.

get_word_cloud(exclude_stop_words=False)

Retrieve word cloud data for the model.

Parameters:

exclude_stop_words : bool, optional

Set to True if you want stop words filtered out of the response.

Returns:

WordCloud

Word cloud data for the model.

open_model_browser()

Opens the model at the project leaderboard in a web browser.

Note: If text-mode browsers are used, the calling process will block until the user exits the browser.

request_approximation()

Request an approximation of this model using DataRobot Prime

This will create several rulesets that could be used to approximate this model. After comparing their scores and rule counts, the code used in the approximation can be downloaded and run locally.

Returns:

job : Job

the job generating the rulesets

request_feature_impact()

Request feature impacts to be computed for the model.

Feature Impact is computed for each column by creating new data with that column randomly permuted (but the others left unchanged), and seeing how the error metric score for the predictions is affected. The ‘impactUnnormalized’ is how much worse the error metric score is when making predictions on this modified data. The ‘impactNormalized’ is normalized so that the largest value is 1. In both cases, larger values indicate more important features.

Returns:

job : Job

A Job representing the feature impact computation. To get the completed feature impact data, use job.get_result or job.get_result_when_complete.

Raises:

JobAlreadyRequested (422)

If the feature impacts have already been requested.

request_frozen_datetime_model(training_row_count=None, training_duration=None, training_start_date=None, training_end_date=None, time_window_sample_pct=None)

Train a new frozen model with parameters from this model

Requires that this model belongs to a datetime partitioned project. If it does not, an error will occur when submitting the job.

Frozen models use the same tuning parameters as their parent model instead of independently optimizing them, allowing models to be retrained efficiently on larger amounts of the training data.

In addition to training_row_count and training_duration, frozen datetime models may be trained on an exact date range. Only one of training_row_count, training_duration, or training_start_date and training_end_date should be specified.

Models specified using training_start_date and training_end_date are the only ones that can be trained into the holdout data (once the holdout is unlocked).

Parameters:

training_row_count : int, optional

the number of rows of data that should be used to train the model. If specified, training_duration may not be specified.

training_duration : str, optional

a duration string specifying what time range the data used to train the model should span. If specified, training_row_count may not be specified.

training_start_date : datetime.datetime, optional

the start date of the data to train the model on. Only rows occurring at or after this datetime will be used. If training_start_date is specified, training_end_date must also be specified.

training_end_date : datetime.datetime, optional

the end date of the data to train the model on. Only rows occurring strictly before this datetime will be used. If training_end_date is specified, training_start_date must also be specified.

time_window_sample_pct : int, optional

may only be specified when the requested model is defined by a time window (e.g. a duration, or start and end dates). An integer between 1 and 99 indicating the percentage to sample within the window. The rows kept are determined by a uniform random sample.

Returns:

model_job : ModelJob

the modeling job training a frozen model

request_frozen_model(sample_pct=None, training_row_count=None)

Train a new frozen model with parameters from this model

Note

This method only works if the project the model belongs to is not datetime partitioned. If it is, use request_frozen_datetime_model instead.

Frozen models use the same tuning parameters as their parent model instead of independently optimizing them, allowing models to be retrained efficiently on larger amounts of the training data.

Parameters:

sample_pct : float

optional, the percentage of the dataset to use with the model. If not provided, will use the value from this model.

training_row_count : int

(New in version v2.9) optional, the integer number of rows of the dataset to use with the model. Only one of sample_pct and training_row_count should be specified.

Returns:

model_job : ModelJob

the modeling job training a frozen model
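
A sketch (assuming model is a model instance in a non-datetime-partitioned project; the row count is illustrative):

model_job = model.request_frozen_model(training_row_count=10000)
frozen_model = model_job.get_result_when_complete()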

request_predictions(dataset_id)

Request predictions against a previously uploaded dataset

Parameters:

dataset_id : string

The dataset to make predictions against (as uploaded from Project.upload_dataset)

Returns:

job : PredictJob

The job computing the predictions

request_training_predictions(data_subset)

Start a job to build training predictions

Parameters:

data_subset : str

data set definition to build predictions on. Choices are:

  • dr.enums.DATA_SUBSET.ALL for all data available
  • dr.enums.DATA_SUBSET.VALIDATION_AND_HOLDOUT for all data except training set
  • dr.enums.DATA_SUBSET.HOLDOUT for holdout data set only
Returns:

Job

an instance of created async job

request_transferable_export()

Request generation of an exportable model file for use in an on-premise DataRobot standalone prediction environment.

This function can only be used if model export is enabled, and will only be useful if you have an on-premise environment in which to import it.

This function does not download the exported file. Use download_export for that.

Examples

import datarobot

model = datarobot.Model.get('p-id', 'l-id')
job = model.request_transferable_export()
job.wait_for_completion()
model.download_export('my_exported_model.drmodel')

# Client must be configured to use standalone prediction server for import:
datarobot.Client(token='my-token-at-standalone-server',
                 endpoint='standalone-server-url/api/v2')

imported_model = datarobot.ImportedModel.create('my_exported_model.drmodel')

train(sample_pct=None, featurelist_id=None, scoring_type=None, training_row_count=None, monotonic_increasing_featurelist_id=<object object>, monotonic_decreasing_featurelist_id=<object object>)

Train the blueprint used in model on a particular featurelist or amount of data.

This method creates a new training job for worker and appends it to the end of the queue for this project. After the job has finished you can get the newly trained model by retrieving it from the project leaderboard, or by retrieving the result of the job.

Either sample_pct or training_row_count can be used to specify the amount of data to use, but not both. If neither is specified, the default is the maximum amount of data that can safely be used to train any blueprint without going into the validation data.

In smart-sampled projects, sample_pct and training_row_count are assumed to be in terms of rows of the minority class.

Note

For datetime partitioned projects, use train_datetime instead.

Parameters:

sample_pct : float, optional

The amount of data to use for training, as a percentage of the project dataset from 0 to 100.

featurelist_id : str, optional

The identifier of the featurelist to use. If not defined, the featurelist of this model is used.

scoring_type : str, optional

Either SCORING_TYPE.validation or SCORING_TYPE.cross_validation. SCORING_TYPE.validation is available for every partitioning type, and indicates that the default model validation should be used for the project. If the project uses a form of cross-validation partitioning, SCORING_TYPE.cross_validation can also be used to indicate that all of the available training/validation combinations should be used to evaluate the model.

training_row_count : int, optional

The number of rows to use to train the requested model.

monotonic_increasing_featurelist_id : str

(new in version 2.11) optional, the id of the featurelist that defines the set of features with a monotonically increasing relationship to the target. Passing None disables the increasing monotonicity constraint. The default (dr.enums.MONOTONICITY_FEATURELIST_DEFAULT) is the featurelist specified by the blueprint.

monotonic_decreasing_featurelist_id : str

(new in version 2.11) optional, the id of the featurelist that defines the set of features with a monotonically decreasing relationship to the target. Passing None disables the decreasing monotonicity constraint. The default (dr.enums.MONOTONICITY_FEATURELIST_DEFAULT) is the featurelist specified by the blueprint.

Returns:

model_job_id : str

id of created job, can be used as parameter to ModelJob.get method or wait_for_async_model_creation function

Examples

project = Project.get('p-id')
model = Model.get('p-id', 'l-id')
model_job_id = model.train(training_row_count=project.max_train_rows)
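
To apply a monotonicity constraint while retraining, a sketch where increasing_flist stands for a featurelist of features that should increase monotonically with the target:

# constrain the listed features to increase monotonically with the target
model_job_id = model.train(monotonic_increasing_featurelist_id=increasing_flist.id)
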
train_datetime(featurelist_id=None, training_row_count=None, training_duration=None, time_window_sample_pct=None)

Train this model on a different featurelist or amount of data

Requires that this model is part of a datetime partitioned project; otherwise, an error will occur.

Parameters:

featurelist_id : str, optional

the featurelist to use to train the model. If not specified, the featurelist of this model is used.

training_row_count : int, optional

the number of rows of data that should be used to train the model. If specified, training_duration may not be specified.

training_duration : str, optional

a duration string specifying what time range the data used to train the model should span. If specified, training_row_count may not be specified.

time_window_sample_pct : int, optional

may only be specified when the requested model is defined by a time window (e.g. a duration, or start and end dates). An integer between 1 and 99 indicating the percentage to sample within the window. The rows kept are determined by a uniform random sample.

Returns:

job : ModelJob

the created job to build the model