Model API

class datarobot.models.Model(id=None, processes=None, featurelist_name=None, featurelist_id=None, project_id=None, sample_pct=None, training_row_count=None, training_duration=None, training_start_date=None, training_end_date=None, model_type=None, model_category=None, is_frozen=None, blueprint_id=None, metrics=None, project=None, data=None, monotonic_increasing_featurelist_id=None, monotonic_decreasing_featurelist_id=None, supports_monotonic_constraints=None, is_starred=None, prediction_threshold=None, prediction_threshold_read_only=None)

A model trained on a project’s dataset capable of making predictions

Attributes

id (str) the id of the model
project_id (str) the id of the project the model belongs to
processes (list of str) the processes used by the model
featurelist_name (str) the name of the featurelist used by the model
featurelist_id (str) the id of the featurelist used by the model
sample_pct (float or None) the percentage of the project dataset used in training the model. If the project uses datetime partitioning, the sample_pct will be None. See training_row_count, training_duration, and training_start_date and training_end_date instead.
training_row_count (int or None) the number of rows of the project dataset used in training the model. In a datetime partitioned project, if specified, defines the number of rows used to train the model and evaluate backtest scores; if unspecified, either training_duration or training_start_date and training_end_date was used to determine that instead.
training_duration (str or None) only present for models in datetime partitioned projects. If specified, a duration string specifying the duration spanned by the data used to train the model and evaluate backtest scores.
training_start_date (datetime or None) only present for frozen models in datetime partitioned projects. If specified, the start date of the data used to train the model.
training_end_date (datetime or None) only present for frozen models in datetime partitioned projects. If specified, the end date of the data used to train the model.
model_type (str) what model this is, e.g. ‘Nystroem Kernel SVM Regressor’
model_category (str) what kind of model this is - ‘prime’ for DataRobot Prime models, ‘blend’ for blender models, and ‘model’ for other models
is_frozen (bool) whether this model is a frozen model
blueprint_id (str) the id of the blueprint used in this model
metrics (dict) a mapping from each metric to the model’s scores for that metric
monotonic_increasing_featurelist_id (str) optional, the id of the featurelist that defines the set of features with a monotonically increasing relationship to the target. If None, no such constraints are enforced.
monotonic_decreasing_featurelist_id (str) optional, the id of the featurelist that defines the set of features with a monotonically decreasing relationship to the target. If None, no such constraints are enforced.
supports_monotonic_constraints (bool) optional, whether this model supports enforcing monotonic constraints
is_starred (bool) whether this model is marked as starred
prediction_threshold (float) for binary classification projects, the threshold used for predictions
prediction_threshold_read_only (bool) indicates whether modification of the prediction threshold is forbidden. Threshold modification is forbidden once a model has had a deployment created or predictions made via the dedicated prediction API.
classmethod get(project, model_id)

Retrieve a specific model.

Parameters:

project : str

The project’s id.

model_id : str

The model_id of the leaderboard item to retrieve.

Returns:

model : Model

The queried instance.

Raises:

ValueError

The passed project parameter value is of an unsupported type.
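
Examples

A minimal usage sketch, assuming 'p-id' and 'l-id' are placeholder project and model ids:

import datarobot as dr

model = dr.models.Model.get(project='p-id', model_id='l-id')
print(model.model_type, model.metrics)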

classmethod fetch_resource_data(*args, **kwargs)

(Deprecated.) Used to acquire model data directly from its url.

Consider using get instead, as this is a convenience function used for development of the datarobot package.

Parameters:

url : string

The resource we are acquiring

join_endpoint : boolean, optional

Whether the client’s endpoint should be joined to the URL before sending the request. Location headers are returned as absolute locations, so they will not need the endpoint.

Returns:

model_data : dict

The queried model’s data

get_features_used()

Query the server to determine which features were used.

Note that the data returned by this method is possibly different than the names of the features in the featurelist used by this model. This method will return the raw features that must be supplied in order for predictions to be generated on a new set of data. The featurelist, in contrast, would also include the names of derived features.

Returns:

features : list of str

The names of the features used in the model.

delete()

Delete a model from the project’s leaderboard.

get_leaderboard_ui_permalink()

Returns:

url : str

Permanent static hyperlink to this model at leaderboard.

open_model_browser()

Opens model at project leaderboard in web browser.

Note: If text-mode browsers are used, the calling process will block until the user exits the browser.

train(sample_pct=None, featurelist_id=None, scoring_type=None, training_row_count=None, monotonic_increasing_featurelist_id=<object object>, monotonic_decreasing_featurelist_id=<object object>)

Train the blueprint used in model on a particular featurelist or amount of data.

This method creates a new training job for worker and appends it to the end of the queue for this project. After the job has finished you can get the newly trained model by retrieving it from the project leaderboard, or by retrieving the result of the job.

Either sample_pct or training_row_count can be used to specify the amount of data to use, but not both. If neither is specified, the default is the maximum amount of data that can safely be used to train any blueprint without going into the validation data.

In smart-sampled projects, sample_pct and training_row_count are assumed to be in terms of rows of the minority class.

Note

For datetime partitioned projects, use train_datetime instead.

Parameters:

sample_pct : float, optional

The amount of data to use for training, as a percentage of the project dataset from 0 to 100.

featurelist_id : str, optional

The identifier of the featurelist to use. If not defined, the featurelist of this model is used.

scoring_type : str, optional

Either SCORING_TYPE.validation or SCORING_TYPE.cross_validation. SCORING_TYPE.validation is available for every partitioning type, and indicates that the default model validation should be used for the project. If the project uses a form of cross-validation partitioning, SCORING_TYPE.cross_validation can also be used to indicate that all of the available training/validation combinations should be used to evaluate the model.

training_row_count : int, optional

The number of rows to use to train the requested model.

monotonic_increasing_featurelist_id : str

(new in version 2.11) optional, the id of the featurelist that defines the set of features with a monotonically increasing relationship to the target. Passing None disables the increasing monotonicity constraint. The default (dr.enums.MONOTONICITY_FEATURELIST_DEFAULT) is the one specified by the blueprint.

monotonic_decreasing_featurelist_id : str

(new in version 2.11) optional, the id of the featurelist that defines the set of features with a monotonically decreasing relationship to the target. Passing None disables the decreasing monotonicity constraint. The default (dr.enums.MONOTONICITY_FEATURELIST_DEFAULT) is the one specified by the blueprint.

Returns:

model_job_id : str

id of created job, can be used as parameter to ModelJob.get method or wait_for_async_model_creation function

Examples

project = Project.get('p-id')
model = Model.get('p-id', 'l-id')
model_job_id = model.train(training_row_count=project.max_train_rows)

train_datetime(featurelist_id=None, training_row_count=None, training_duration=None, time_window_sample_pct=None)

Train this model on a different featurelist or amount of data

Requires that this model is part of a datetime partitioned project; otherwise, an error will occur.

Parameters:

featurelist_id : str, optional

the featurelist to use to train the model. If not specified, the featurelist of this model is used.

training_row_count : int, optional

the number of rows of data that should be used to train the model. If specified, training_duration may not be specified.

training_duration : str, optional

a duration string specifying what time range the data used to train the model should span. If specified, training_row_count may not be specified.

time_window_sample_pct : int, optional

may only be specified when the requested model is a time window (e.g. duration or start and end dates). An integer between 1 and 99 indicating the percentage to sample by within the window. The points kept are determined by a random uniform sample.

Returns:

job : ModelJob

the created job to build the model
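
Examples

A sketch of retraining in a datetime partitioned project; the duration string below ('P3M', three months) is illustrative:

model = Model.get('p-id', 'l-id')
job = model.train_datetime(training_duration='P3M')
new_model = job.get_result_when_complete()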

request_predictions(dataset_id)

Request predictions against a previously uploaded dataset

Parameters:

dataset_id : string

The dataset to make predictions against (as uploaded from Project.upload_dataset)

Returns:

job : PredictJob

The job computing the predictions
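
Examples

A sketch assuming the project and model objects were retrieved earlier; the completed result is typically a DataFrame of predictions:

dataset = project.upload_dataset('./new_data.csv')
predict_job = model.request_predictions(dataset.id)
predictions = predict_job.get_result_when_complete()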

get_feature_impact()

Retrieve the computed Feature Impact results, a measure of the relevance of each feature in the model.

Feature Impact is computed for each column by creating new data with that column randomly permuted (but the others left unchanged), and seeing how the error metric score for the predictions is affected. The ‘impactUnnormalized’ is how much worse the error metric score is when making predictions on this modified data. The ‘impactNormalized’ is normalized so that the largest value is 1. In both cases, larger values indicate more important features.

If a feature is redundant, i.e. once other features are considered it does not contribute much additional information, the ‘redundantWith’ value is the name of the feature that has the highest correlation with this feature. Note that redundancy detection is only available for jobs run after the addition of this feature. When retrieving data that predates this functionality, a NoRedundancyImpactAvailable warning will be issued.

Elsewhere this technique is sometimes called ‘Permutation Importance’.

Requires that Feature Impact has already been computed with request_feature_impact.

Returns:

feature_impacts : list of dict

The feature impact data. Each item is a dict with the keys ‘featureName’, ‘impactNormalized’, ‘impactUnnormalized’, and ‘redundantWith’.

Raises:

ClientError (404)

If the feature impacts have not been computed.

request_feature_impact()

Request feature impacts to be computed for the model.

See get_feature_impact for more information on the result of the job.

Returns:

job : Job

A Job representing the feature impact computation. To get the completed feature impact data, use job.get_result or job.get_result_when_complete.

Raises:

JobAlreadyRequested (422)

If the feature impacts have already been requested.
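
Examples

A request-then-retrieve sketch, assuming datarobot is imported as dr:

try:
    impact_job = model.request_feature_impact()
    feature_impacts = impact_job.get_result_when_complete()
except dr.errors.JobAlreadyRequested:
    # impact was already requested earlier; just fetch the results
    feature_impacts = model.get_feature_impact()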

get_or_request_feature_impact(max_wait=600)

Retrieve feature impact for the model, requesting a job if it hasn’t been run previously

Parameters:

max_wait : int, optional

The maximum time to wait for a requested feature impact job to complete before erroring

Returns:

feature_impacts : list of dict

The feature impact data. See get_feature_impact for the exact schema.

get_prime_eligibility()

Check if this model can be approximated with DataRobot Prime

Returns:

prime_eligibility : dict

a dict indicating whether a model can be approximated with DataRobot Prime (key can_make_prime) and why it may be ineligible (key message)

request_approximation()

Request an approximation of this model using DataRobot Prime

This will create several rulesets that could be used to approximate this model. After comparing their scores and rule counts, the code used in the approximation can be downloaded and run locally.

Returns:

job : Job

the job generating the rulesets

get_rulesets()

List the rulesets approximating this model generated by DataRobot Prime

If this model hasn’t been approximated yet, will return an empty list. Note that these are rulesets approximating this model, not rulesets used to construct this model.

Returns:

rulesets : list of Ruleset
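
Examples

A sketch of approximating a model with DataRobot Prime and inspecting the resulting rulesets:

job = model.request_approximation()
job.wait_for_completion()
rulesets = model.get_rulesets()
smallest = min(rulesets, key=lambda ruleset: ruleset.rule_count)
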
download_export(filepath)

Download an exportable model file for use in an on-premise DataRobot standalone prediction environment.

This function can only be used if model export is enabled, and will only be useful if you have an on-premise environment in which to import it.

Parameters:

filepath : str

The path at which to save the exported model file.

request_transferable_export()

Request generation of an exportable model file for use in an on-premise DataRobot standalone prediction environment.

This function can only be used if model export is enabled, and will only be useful if you have an on-premise environment in which to import it.

This function does not download the exported file. Use download_export for that.

Examples

model = datarobot.Model.get('p-id', 'l-id')
job = model.request_transferable_export()
job.wait_for_completion()
model.download_export('my_exported_model.drmodel')

# Client must be configured to use standalone prediction server for import:
datarobot.Client(token='my-token-at-standalone-server',
                 endpoint='standalone-server-url/api/v2')

imported_model = datarobot.ImportedModel.create('my_exported_model.drmodel')

request_frozen_model(sample_pct=None, training_row_count=None)

Train a new frozen model with parameters from this model

Note

This method only works if the project the model belongs to is not datetime partitioned. If it is, use request_frozen_datetime_model instead.

Frozen models use the same tuning parameters as their parent model instead of independently optimizing them to allow efficiently retraining models on larger amounts of the training data.

Parameters:

sample_pct : float

optional, the percentage of the dataset to use with the model. If not provided, will use the value from this model.

training_row_count : int

(New in version v2.9) optional, the integer number of rows of the dataset to use with the model. Only one of sample_pct and training_row_count should be specified.

Returns:

model_job : ModelJob

the modeling job training a frozen model
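
Examples

A sketch of training a frozen copy of this model on 80% of the data (only for projects that are not datetime partitioned):

model_job = model.request_frozen_model(sample_pct=80)
frozen_model = model_job.get_result_when_complete()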

request_frozen_datetime_model(training_row_count=None, training_duration=None, training_start_date=None, training_end_date=None, time_window_sample_pct=None)

Train a new frozen model with parameters from this model

Requires that this model belongs to a datetime partitioned project. If it does not, an error will occur when submitting the job.

Frozen models use the same tuning parameters as their parent model instead of independently optimizing them to allow efficiently retraining models on larger amounts of the training data.

In addition to training_row_count and training_duration, frozen datetime models may be trained on an exact date range. Only one of training_row_count, training_duration, or training_start_date and training_end_date should be specified.

Models specified using training_start_date and training_end_date are the only ones that can be trained into the holdout data (once the holdout is unlocked).

Parameters:

training_row_count : int, optional

the number of rows of data that should be used to train the model. If specified, training_duration may not be specified.

training_duration : str, optional

a duration string specifying what time range the data used to train the model should span. If specified, training_row_count may not be specified.

training_start_date : datetime.datetime, optional

the start date of the data to train the model on. Only rows occurring at or after this datetime will be used. If training_start_date is specified, training_end_date must also be specified.

training_end_date : datetime.datetime, optional

the end date of the data to train the model on. Only rows occurring strictly before this datetime will be used. If training_end_date is specified, training_start_date must also be specified.

time_window_sample_pct : int, optional

may only be specified when the requested model is a time window (e.g. duration or start and end dates). An integer between 1 and 99 indicating the percentage to sample by within the window. The points kept are determined by a random uniform sample.

Returns:

model_job : ModelJob

the modeling job training a frozen model

get_parameters()

Retrieve model parameters.

Returns:

ModelParameters

Model parameters for this model.

get_lift_chart(source)

Retrieve model lift chart for the specified source.

Parameters:

source : str

Lift chart data source. Check datarobot.enums.CHART_DATA_SOURCE for possible values.

Returns:

LiftChart

Model lift chart data

get_all_lift_charts()

Retrieve a list of all lift charts available for the model.

Returns:

list of LiftChart

Data for all available model lift charts.

get_pareto_front()

Retrieve the Pareto Front for a Eureqa model.

This method is only supported for Eureqa models.

Returns:

ParetoFront

Model ParetoFront data

get_confusion_chart(source)

Retrieve model’s confusion chart for the specified source.

Parameters:

source : str

Confusion chart source. Check datarobot.enums.CHART_DATA_SOURCE for possible values.

Returns:

ConfusionChart

Model ConfusionChart data

get_all_confusion_charts()

Retrieve a list of all confusion charts available for the model.

Returns:

list of ConfusionChart

Data for all available confusion charts for the model.

get_roc_curve(source)

Retrieve model ROC curve for the specified source.

Parameters:

source : str

ROC curve data source. Check datarobot.enums.CHART_DATA_SOURCE for possible values.

Returns:

RocCurve

Model ROC curve data
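
Examples

A sketch for a binary classification project; the VALIDATION member of CHART_DATA_SOURCE is assumed:

from datarobot.enums import CHART_DATA_SOURCE

roc = model.get_roc_curve(CHART_DATA_SOURCE.VALIDATION)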

get_all_roc_curves()

Retrieve a list of all ROC curves available for the model.

Returns:

list of RocCurve

Data for all available model ROC curves.

get_word_cloud(exclude_stop_words=False)

Retrieve word cloud data for the model.

Parameters:

exclude_stop_words : bool, optional

Set to True if you want stopwords filtered out of the response.

Returns:

WordCloud

Word cloud data for the model.

download_scoring_code(file_name, source_code=False)

Download scoring code JAR.

Parameters:

file_name : str

File path where scoring code will be saved.

source_code : bool, optional

Set to True to download the source code archive. It will not be executable.
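
Examples

A brief sketch; the file names are arbitrary:

model.download_scoring_code('model.jar')
# or download the (non-executable) source code archive instead
model.download_scoring_code('model-sources.jar', source_code=True)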

get_model_blueprint_documents()

Get documentation for tasks used in this model.

Returns:

list of BlueprintTaskDocument

All documents available for the model.

get_model_blueprint_chart()

Retrieve a model blueprint chart that can be used to understand data flow in blueprint.

Returns:

ModelBlueprintChart

The queried model blueprint chart.

get_missing_report_info()

Retrieve the model's missing data report on training data, which can be used to understand how missing values are treated in the model. The report consists of missing value reports for the numeric and categorical features that took part in modeling.

Returns:

An iterable of MissingReportPerFeature

The queried model missing report, sorted by missing count (DESCENDING order).

request_training_predictions(data_subset)

Start a job to build training predictions

Parameters:

data_subset : str

data set definition to build predictions on. Choices are:

  • dr.enums.DATA_SUBSET.ALL for all data available
  • dr.enums.DATA_SUBSET.VALIDATION_AND_HOLDOUT for all data except training set
  • dr.enums.DATA_SUBSET.HOLDOUT for holdout data set only
Returns:

Job

an instance of the created async job
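
Examples

A sketch computing holdout training predictions using the enum listed above:

from datarobot.enums import DATA_SUBSET

job = model.request_training_predictions(DATA_SUBSET.HOLDOUT)
training_predictions = job.get_result_when_complete()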

cross_validate()

Run Cross Validation on this model.

Note

To perform Cross Validation on a new model with new parameters, use train instead.

Returns:

ModelJob

The created job to build the model

get_cross_validation_scores(partition=None, metric=None)

Returns a dictionary keyed by metric showing cross validation scores per partition.

Cross Validation should already have been performed using cross_validate or train.

Note

Models that computed cross validation before this feature was added will need to be deleted and retrained before this method can be used.

Parameters:

partition : float

optional, the id of the partition (1, 2, 3.0, 4.0, etc.) to filter results by; can be a positive integer or float value.

metric : unicode

optional, the name of the metric by which to filter the resulting cross validation scores

Returns:

cross_validation_scores: dict

A dictionary keyed by metric showing cross validation scores per partition.
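
Examples

A sketch of running cross validation and reading back the scores; the metric name is illustrative:

cv_job = model.cross_validate()
cv_job.wait_for_completion()
scores = model.get_cross_validation_scores(metric='LogLoss')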

advanced_tune(params, description=None)

Generate a new model with the specified advanced-tuning parameters

As of v2.13, only Eureqa blueprints (blueprints whose title starts with ‘Eureqa’) support Advanced Tuning.

Parameters:

params : dict

Mapping of parameter ID to parameter value. The list of valid parameter IDs for a model can be found by calling get_advanced_tuning_parameters(). The mapping does not need to include values for all parameters. If a parameter is omitted, its current_value will be used.

description : unicode

Human-readable string describing the newly advanced-tuned model

Returns:

ModelJob

The created job to build the model

get_advanced_tuning_parameters()

Get the advanced-tuning parameters available for this model.

As of v2.13, only Eureqa blueprints (blueprints whose title starts with ‘Eureqa’) support Advanced Tuning.

Returns:

dict

A dictionary describing the advanced-tuning parameters for the current model. Dictionary is of the following form:

{
    "tuningDescription": <User-specified description>  (or `None` if not available),
    "tuningParameters": [
        {
            "parameterName": <unicode : name of parameter (unique per task, see below)>,
            "parameterId": <unicode : opaque ID string uniquely identifying parameter>,
            "defaultValue": <* : default value of the parameter for the blueprint>,
            "currentValue": <* : value of the parameter that was used for this model>,
            "taskName": <unicode : name of the task that this parameter belongs to>,
            "constraints": {
                (...)
            }
        },
        (...)
    ]
}

The type of defaultValue and currentValue is defined by the constraints structure. It will be a string or numeric Python type.

constraints is a dict with at least one, possibly more, of the following keys. The presence of a key indicates that the parameter may take on the specified type; if a key is absent, the parameter may not take on that type. If a key on constraints is present, its value will be a dict containing all of the fields described below for that key.

"constraints": {
    "select": {
        "values": [<list(basestring or number) : possible values>]
    },
    "ascii": {},
    "unicode": {},
    "int": {
        "min": <int : minimum valid value>,
        "max": <int : maximum valid value>,
        "supports_grid_search": <bool : True if Grid Search may be
                                        requested for this param>
    },
    "float": {
        "min": <float : minimum valid value>,
        "max": <float : maximum valid value>,
        "supports_grid_search": <bool : True if Grid Search may be
                                        requested for this param>
    },
    "intList": {
        "length": {
        "min_length": <int : minimum valid length>,
        "max_length": <int : maximum valid length>
        "min_val": <int : minimum valid value>,
        "max_val": <int : maximum valid value>
        "supports_grid_search": <bool : True if Grid Search may be
                                        requested for this param>
    },
    "floatList": {
        "min_length": <int : minimum valid length>,
        "max_length": <int : maximum valid length>
        "min_val": <float : minimum valid value>,
        "max_val": <float : maximum valid value>
        "supports_grid_search": <bool : True if Grid Search may be
                                        requested for this param>
    }
}

The keys have meaning as follows:

  • select:

    Rather than specifying a specific data type, if present, it indicates that the parameter is permitted to take on any of the specified values. Listed values may be of any string or real (non-complex) numeric type.

  • ascii:

    The parameter may be a unicode object that encodes simple ASCII characters. (A-Z, a-z, 0-9, whitespace, and certain common symbols.) In addition to listed constraints, ASCII keys currently may not contain either newlines or semicolons.

  • unicode:

    The parameter may be any Python unicode object.

  • int:

    The value may be an object of type int within the specified range (inclusive). Please note that the value will be passed around using the JSON format, and some JSON parsers have undefined behavior with integers outside of the range [-(2**53)+1, (2**53)-1].

  • float:

    The value may be an object of type float within the specified range (inclusive).

  • intList, floatList:

    The value may be a list of int or float objects, respectively, following constraints as specified respectively by the int and float types (above).

Many parameters only specify one key under constraints. If a parameter specifies multiple keys, the parameter may take on any value permitted by any key.
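
Examples

A hedged sketch for an eligible (e.g. Eureqa) model, reusing a parameter id from the structure above:

params_info = model.get_advanced_tuning_parameters()
first_param = params_info['tuningParameters'][0]
job = model.advanced_tune(
    {first_param['parameterId']: first_param['defaultValue']},
    description='example tuning run',
)
tuned_model = job.get_result_when_complete()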

start_advanced_tuning_session()

Start an Advanced Tuning session. Returns an object that helps set up arguments for an Advanced Tuning model execution.

As of v2.13, only Eureqa blueprints (blueprints whose title starts with ‘Eureqa’) support Advanced Tuning.

Returns:

AdvancedTuningSession

Session for setting up and running Advanced Tuning on a model

star_model()

Mark the model as starred

Model stars propagate to the web application and the API, and can be used to filter when listing models.

unstar_model()

Unmark the model as starred

Model stars propagate to the web application and the API, and can be used to filter when listing models.

set_prediction_threshold(threshold)

Set a custom prediction threshold for the model

May not be used once prediction_threshold_read_only is True for this model.

Parameters:

threshold : float

Only used for binary classification projects. The threshold to use when deciding between the positive and negative classes when making predictions. Should be between 0.0 and 1.0 (inclusive).
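
Examples

For example, for a binary classification model whose threshold is still editable:

if not model.prediction_threshold_read_only:
    model.set_prediction_threshold(0.42)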

PrimeModel API

class datarobot.models.PrimeModel(id=None, processes=None, featurelist_name=None, featurelist_id=None, project_id=None, sample_pct=None, training_row_count=None, training_duration=None, training_start_date=None, training_end_date=None, model_type=None, model_category=None, is_frozen=None, blueprint_id=None, metrics=None, parent_model_id=None, ruleset_id=None, rule_count=None, score=None, monotonic_increasing_featurelist_id=None, monotonic_decreasing_featurelist_id=None, supports_monotonic_constraints=None, is_starred=None, prediction_threshold=None, prediction_threshold_read_only=None)

A DataRobot Prime model approximating a parent model with downloadable code

Attributes

id (str) the id of the model
project_id (str) the id of the project the model belongs to
processes (list of str) the processes used by the model
featurelist_name (str) the name of the featurelist used by the model
featurelist_id (str) the id of the featurelist used by the model
sample_pct (float) the percentage of the project dataset used in training the model
training_row_count (int or None) the number of rows of the project dataset used in training the model. In a datetime partitioned project, if specified, defines the number of rows used to train the model and evaluate backtest scores; if unspecified, either training_duration or training_start_date and training_end_date was used to determine that instead.
training_duration (str or None) only present for models in datetime partitioned projects. If specified, a duration string specifying the duration spanned by the data used to train the model and evaluate backtest scores.
training_start_date (datetime or None) only present for frozen models in datetime partitioned projects. If specified, the start date of the data used to train the model.
training_end_date (datetime or None) only present for frozen models in datetime partitioned projects. If specified, the end date of the data used to train the model.
model_type (str) what model this is, e.g. ‘DataRobot Prime’
model_category (str) what kind of model this is - always ‘prime’ for DataRobot Prime models
is_frozen (bool) whether this model is a frozen model
blueprint_id (str) the id of the blueprint used in this model
metrics (dict) a mapping from each metric to the model’s scores for that metric
ruleset (Ruleset) the ruleset used in the Prime model
parent_model_id (str) the id of the model that this Prime model approximates
monotonic_increasing_featurelist_id (str) optional, the id of the featurelist that defines the set of features with a monotonically increasing relationship to the target. If None, no such constraints are enforced.
monotonic_decreasing_featurelist_id (str) optional, the id of the featurelist that defines the set of features with a monotonically decreasing relationship to the target. If None, no such constraints are enforced.
supports_monotonic_constraints (bool) optional, whether this model supports enforcing monotonic constraints
is_starred (bool) whether this model is marked as starred
prediction_threshold (float) for binary classification projects, the threshold used for predictions
prediction_threshold_read_only (bool) indicates whether modification of the prediction threshold is forbidden. Threshold modification is forbidden once a model has had a deployment created or predictions made via the dedicated prediction API.
classmethod get(project_id, model_id)

Retrieve a specific prime model.

Parameters:

project_id : str

The id of the project the prime model belongs to

model_id : str

The model_id of the prime model to retrieve.

Returns:

model : PrimeModel

The queried instance.

request_download_validation(language)

Prep and validate the downloadable code for the ruleset associated with this model

Parameters:

language : str

the language the code should be downloaded in - see datarobot.enums.PRIME_LANGUAGE for available languages

Returns:

job : Job

A job tracking the code preparation and validation
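
Examples

A sketch preparing Python code for this Prime model's ruleset; the PYTHON member of PRIME_LANGUAGE is assumed:

from datarobot.enums import PRIME_LANGUAGE

prime_model = PrimeModel.get('p-id', 'prime-model-id')
validation_job = prime_model.request_download_validation(PRIME_LANGUAGE.PYTHON)
validation_job.wait_for_completion()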

advanced_tune(params, description=None)

Generate a new model with the specified advanced-tuning parameters

As of v2.13, only Eureqa blueprints (blueprints whose title starts with ‘Eureqa’) support Advanced Tuning.

Parameters:

params : dict

Mapping of parameter ID to parameter value. The list of valid parameter IDs for a model can be found by calling get_advanced_tuning_parameters(). The mapping does not need to include values for all parameters. If a parameter is omitted, its current_value will be used.

description : unicode

Human-readable string describing the newly advanced-tuned model

Returns:

ModelJob

The created job to build the model

cross_validate()

Run Cross Validation on this model.

Note

To perform Cross Validation on a new model with new parameters, use train instead.

Returns:

ModelJob

The created job to build the model

delete()

Delete a model from the project’s leaderboard.

download_export(filepath)

Download an exportable model file for use in an on-premise DataRobot standalone prediction environment.

This function can only be used if model export is enabled, and will only be useful if you have an on-premise environment in which to import it.

Parameters:

filepath : str

The path at which to save the exported model file.

download_scoring_code(file_name, source_code=False)

Download scoring code JAR.

Parameters:

file_name : str

File path where scoring code will be saved.

source_code : bool, optional

Set to True to download the source code archive. It will not be executable.

fetch_resource_data(*args, **kwargs)

(Deprecated.) Used to acquire model data directly from its url.

Consider using get instead, as this is a convenience function used for development of the datarobot package.

Parameters:

url : string

The resource we are acquiring

join_endpoint : boolean, optional

Whether the client’s endpoint should be joined to the URL before sending the request. Location headers are returned as absolute locations, so they will not need the endpoint.

Returns:

model_data : dict

The queried model’s data

get_advanced_tuning_parameters()

Get the advanced-tuning parameters available for this model.

As of v2.13, only Eureqa blueprints (blueprints whose title starts with ‘Eureqa’) support Advanced Tuning.

Returns:

dict

A dictionary describing the advanced-tuning parameters for the current model. Dictionary is of the following form:

{
    "tuningDescription": <User-specified description>  (or `None` if not available),
    "tuningParameters": [
        {
            "parameterName": <unicode : name of parameter (unique per task, see below)>,
            "parameterId": <unicode : opaque ID string uniquely identifying parameter>,
            "defaultValue": <* : default value of the parameter for the blueprint>,
            "currentValue": <* : value of the parameter that was used for this model>,
            "taskName": <unicode : name of the task that this parameter belongs to>,
            "constraints": {
                (...)
            }
        },
        (...)
    ]
}

The type of defaultValue and currentValue is defined by the constraints structure. It will be a string or numeric Python type.

constraints is a dict with at least one, possibly more, of the following keys. The presence of a key indicates that the parameter may take on the specified type; if a key is absent, the parameter may not take on that type. If a key on constraints is present, its value will be a dict containing all of the fields described below for that key.

"constraints": {
    "select": {
        "values": [<list(basestring or number) : possible values>]
    },
    "ascii": {},
    "unicode": {},
    "int": {
        "min": <int : minimum valid value>,
        "max": <int : maximum valid value>,
        "supports_grid_search": <bool : True if Grid Search may be
                                        requested for this param>
    },
    "float": {
        "min": <float : minimum valid value>,
        "max": <float : maximum valid value>,
        "supports_grid_search": <bool : True if Grid Search may be
                                        requested for this param>
    },
    "intList": {
        "length": {
        "min_length": <int : minimum valid length>,
        "max_length": <int : maximum valid length>
        "min_val": <int : minimum valid value>,
        "max_val": <int : maximum valid value>
        "supports_grid_search": <bool : True if Grid Search may be
                                        requested for this param>
    },
    "floatList": {
        "min_length": <int : minimum valid length>,
        "max_length": <int : maximum valid length>
        "min_val": <float : minimum valid value>,
        "max_val": <float : maximum valid value>
        "supports_grid_search": <bool : True if Grid Search may be
                                        requested for this param>
    }
}

The keys have meaning as follows:

  • select:

    Rather than specifying a specific data type, if present, it indicates that the parameter is permitted to take on any of the specified values. Listed values may be of any string or real (non-complex) numeric type.

  • ascii:

    The parameter may be a unicode object that encodes simple ASCII characters. (A-Z, a-z, 0-9, whitespace, and certain common symbols.) In addition to listed constraints, ASCII keys currently may not contain either newlines or semicolons.

  • unicode:

    The parameter may be any Python unicode object.

  • int:

    The value may be an object of type int within the specified range (inclusive). Please note that the value will be passed around using the JSON format, and some JSON parsers have undefined behavior with integers outside of the range [-(2**53)+1, (2**53)-1].

  • float:

    The value may be an object of type float within the specified range (inclusive).

  • intList, floatList:

    The value may be a list of int or float objects, respectively, following constraints as specified respectively by the int and float types (above).

Many parameters only specify one key under constraints. If a parameter specifies multiple keys, the parameter may take on any value permitted by any key.

get_all_confusion_charts()

Retrieve a list of all confusion charts available for the model.

Returns:

list of ConfusionChart

Data for all available confusion charts for the model.

get_all_lift_charts()

Retrieve a list of all lift charts available for the model.

Returns:

list of LiftChart

Data for all available model lift charts.

get_all_roc_curves()

Retrieve a list of all ROC curves available for the model.

Returns:

list of RocCurve

Data for all available model ROC curves.

get_confusion_chart(source)

Retrieve model’s confusion chart for the specified source.

Parameters:

source : str

Confusion chart source. Check datarobot.enums.CHART_DATA_SOURCE for possible values.

Returns:

ConfusionChart

Model ConfusionChart data

get_cross_validation_scores(partition=None, metric=None)

Returns a dictionary keyed by metric showing cross validation scores per partition.

Cross Validation should already have been performed using cross_validate or train.

Note

Models that computed cross validation before this feature was added will need to be deleted and retrained before this method can be used.

Parameters:

partition : float

optional, the id of the partition (1, 2, 3.0, 4.0, etc.) to filter results by; can be a positive integer or float value.

metric : unicode

optional, the name of the metric by which to filter the resulting cross validation scores

Returns:

cross_validation_scores: dict

A dictionary keyed by metric showing cross validation scores per partition.

get_feature_impact()

Retrieve the computed Feature Impact results, a measure of the relevance of each feature in the model.

Feature Impact is computed for each column by creating new data with that column randomly permuted (but the others left unchanged), and seeing how the error metric score for the predictions is affected. The ‘impactUnnormalized’ is how much worse the error metric score is when making predictions on this modified data. The ‘impactNormalized’ is normalized so that the largest value is 1. In both cases, larger values indicate more important features.

If a feature is redundant, i.e. once other features are considered it does not contribute much additional information, the ‘redundantWith’ value is the name of the feature that has the highest correlation with this feature. Note that redundancy detection is only available for jobs run after the addition of this feature. When retrieving data that predates this functionality, a NoRedundancyImpactAvailable warning will be issued.

Elsewhere this technique is sometimes called ‘Permutation Importance’.

Requires that Feature Impact has already been computed with request_feature_impact.

Returns:

feature_impacts : list of dict

The feature impact data. Each item is a dict with the keys ‘featureName’, ‘impactNormalized’, ‘impactUnnormalized’, and ‘redundantWith’.

Raises:

ClientError (404)

If the feature impacts have not been computed.

get_features_used()

Query the server to determine which features were used.

Note that the data returned by this method is possibly different than the names of the features in the featurelist used by this model. This method will return the raw features that must be supplied in order for predictions to be generated on a new set of data. The featurelist, in contrast, would also include the names of derived features.

Returns:

features : list of str

The names of the features used in the model.

get_leaderboard_ui_permalink()

Returns:

url : str

Permanent static hyperlink to this model at leaderboard.

get_lift_chart(source)

Retrieve model lift chart for the specified source.

Parameters:

source : str

Lift chart data source. Check datarobot.enums.CHART_DATA_SOURCE for possible values.

Returns:

LiftChart

Model lift chart data

get_missing_report_info()

Retrieve the model's missing data report on training data, which can be used to understand how missing values are treated in the model. The report consists of missing value reports for the numeric and categorical features that took part in modeling.

Returns:

An iterable of MissingReportPerFeature

The queried model missing report, sorted by missing count (DESCENDING order).

get_model_blueprint_chart()

Retrieve a model blueprint chart that can be used to understand data flow in blueprint.

Returns:

ModelBlueprintChart

The queried model blueprint chart.

get_model_blueprint_documents()

Get documentation for tasks used in this model.

Returns:

list of BlueprintTaskDocument

All documents available for the model.

get_or_request_feature_impact(max_wait=600)

Retrieve feature impact for the model, requesting a job if it hasn’t been run previously

Parameters:

max_wait : int, optional

The maximum time to wait for a requested feature impact job to complete before erroring

Returns:

feature_impacts : list of dict

The feature impact data. See get_feature_impact for the exact schema.

get_parameters()

Retrieve model parameters.

Returns:

ModelParameters

Model parameters for this model.

get_pareto_front()

Retrieve the Pareto Front for a Eureqa model.

This method is only supported for Eureqa models.

Returns:

ParetoFront

Model ParetoFront data

get_prime_eligibility()

Check if this model can be approximated with DataRobot Prime

Returns:

prime_eligibility : dict

a dict indicating whether a model can be approximated with DataRobot Prime (key can_make_prime) and why it may be ineligible (key message)

get_roc_curve(source)

Retrieve model ROC curve for the specified source.

Parameters:

source : str

ROC curve data source. Check datarobot.enums.CHART_DATA_SOURCE for possible values.

Returns:

RocCurve

Model ROC curve data

get_rulesets()

List the rulesets approximating this model generated by DataRobot Prime

If this model hasn’t been approximated yet, will return an empty list. Note that these are rulesets approximating this model, not rulesets used to construct this model.

Returns:

rulesets : list of Ruleset

get_word_cloud(exclude_stop_words=False)

Retrieve word cloud data for the model.

Parameters:

exclude_stop_words : bool, optional

Set to True if you want stopwords filtered out of the response.

Returns:

WordCloud

Word cloud data for the model.

open_model_browser()

Opens model at project leaderboard in web browser.

Note: If text-mode browsers are used, the calling process will block until the user exits the browser.

request_feature_impact()

Request feature impacts to be computed for the model.

See get_feature_impact for more information on the result of the job.

Returns:

job : Job

A Job representing the feature impact computation. To get the completed feature impact data, use job.get_result or job.get_result_when_complete.

Raises:

JobAlreadyRequested (422)

If the feature impacts have already been requested.

request_predictions(dataset_id)

Request predictions against a previously uploaded dataset

Parameters:

dataset_id : string

The dataset to make predictions against (as uploaded from Project.upload_dataset)

Returns:

job : PredictJob

The job computing the predictions

request_training_predictions(data_subset)

Start a job to build training predictions

Parameters:

data_subset : str

data set definition to build predictions on. Choices are:

  • dr.enums.DATA_SUBSET.ALL for all data available
  • dr.enums.DATA_SUBSET.VALIDATION_AND_HOLDOUT for all data except training set
  • dr.enums.DATA_SUBSET.HOLDOUT for holdout data set only
Returns:

Job

an instance of the created async job

request_transferable_export()

Request generation of an exportable model file for use in an on-premise DataRobot standalone prediction environment.

This function can only be used if model export is enabled, and will only be useful if you have an on-premise environment in which to import it.

This function does not download the exported file. Use download_export for that.

Examples

model = datarobot.Model.get('p-id', 'l-id')
job = model.request_transferable_export()
job.wait_for_completion()
model.download_export('my_exported_model.drmodel')

# Client must be configured to use standalone prediction server for import:
datarobot.Client(token='my-token-at-standalone-server',
                 endpoint='standalone-server-url/api/v2')

imported_model = datarobot.ImportedModel.create('my_exported_model.drmodel')

set_prediction_threshold(threshold)

Set a custom prediction threshold for the model

May not be used once prediction_threshold_read_only is True for this model.

Parameters:

threshold : float

Only used for binary classification projects. The threshold to use when deciding between the positive and negative classes when making predictions. Should be between 0.0 and 1.0 (inclusive).

star_model()

Mark the model as starred

Model stars propagate to the web application and the API, and can be used to filter when listing models.

start_advanced_tuning_session()

Start an Advanced Tuning session. Returns an object that helps set up arguments for an Advanced Tuning model execution.

As of v2.13, only Eureqa blueprints (blueprints whose title starts with ‘Eureqa’) support Advanced Tuning.

Returns:

AdvancedTuningSession

Session for setting up and running Advanced Tuning on a model

unstar_model()

Unmark the model as starred

Model stars propagate to the web application and the API, and can be used to filter when listing models.

BlenderModel API

class datarobot.models.BlenderModel(id=None, processes=None, featurelist_name=None, featurelist_id=None, project_id=None, sample_pct=None, training_row_count=None, training_duration=None, training_start_date=None, training_end_date=None, model_type=None, model_category=None, is_frozen=None, blueprint_id=None, metrics=None, model_ids=None, blender_method=None, monotonic_increasing_featurelist_id=None, monotonic_decreasing_featurelist_id=None, supports_monotonic_constraints=None, is_starred=None, prediction_threshold=None, prediction_threshold_read_only=None)

Blender model that combines prediction results from other models.

Attributes

id (str) the id of the model
project_id (str) the id of the project the model belongs to
processes (list of str) the processes used by the model
featurelist_name (str) the name of the featurelist used by the model
featurelist_id (str) the id of the featurelist used by the model
sample_pct (float) the percentage of the project dataset used in training the model
training_row_count (int or None) the number of rows of the project dataset used in training the model. In a datetime partitioned project, if specified, defines the number of rows used to train the model and evaluate backtest scores; if unspecified, either training_duration or training_start_date and training_end_date was used to determine that instead.
training_duration (str or None) only present for models in datetime partitioned projects. If specified, a duration string specifying the duration spanned by the data used to train the model and evaluate backtest scores.
training_start_date (datetime or None) only present for frozen models in datetime partitioned projects. If specified, the start date of the data used to train the model.
training_end_date (datetime or None) only present for frozen models in datetime partitioned projects. If specified, the end date of the data used to train the model.
model_type (str) what model this is, e.g. ‘AVG Blender’
model_category (str) what kind of model this is - always ‘blend’ for blender models
is_frozen (bool) whether this model is a frozen model
blueprint_id (str) the id of the blueprint used in this model
metrics (dict) a mapping from each metric to the model’s scores for that metric
model_ids (list of str) List of model ids used in blender
blender_method (str) Method used to blend results from underlying models
monotonic_increasing_featurelist_id (str) optional, the id of the featurelist that defines the set of features with a monotonically increasing relationship to the target. If None, no such constraints are enforced.
monotonic_decreasing_featurelist_id (str) optional, the id of the featurelist that defines the set of features with a monotonically decreasing relationship to the target. If None, no such constraints are enforced.
supports_monotonic_constraints (bool) optional, whether this model supports enforcing monotonic constraints
is_starred (bool) whether this model is marked as starred
prediction_threshold (float) for binary classification projects, the threshold used for predictions
prediction_threshold_read_only (bool) indicates whether modification of the prediction threshold is forbidden. Threshold modification is forbidden once a model has had a deployment created or predictions made via the dedicated prediction API.
classmethod get(project_id, model_id)

Retrieve a specific blender.

Parameters:

project_id : str

The project’s id.

model_id : str

The model_id of the leaderboard item to retrieve.

Returns:

model : BlenderModel

The queried instance.
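
Examples

A minimal sketch with placeholder ids:

from datarobot.models import BlenderModel

blender = BlenderModel.get('p-id', 'blender-model-id')
print(blender.blender_method, blender.model_ids)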

advanced_tune(params, description=None)

Generate a new model with the specified advanced-tuning parameters

As of v2.13, only Eureqa blueprints (blueprints whose title starts with ‘Eureqa’) support Advanced Tuning.

Parameters:

params : dict

Mapping of parameter ID to parameter value. The list of valid parameter IDs for a model can be found by calling get_advanced_tuning_parameters(). The mapping does not need to include values for all parameters. If a parameter is omitted, its current_value will be used.

description : unicode

Human-readable string describing the newly advanced-tuned model

Returns:

ModelJob

The created job to build the model

cross_validate()

Run Cross Validation on this model.

Note

To perform Cross Validation on a new model with new parameters, use train instead.

Returns:

ModelJob

The created job to build the model

delete()

Delete a model from the project’s leaderboard.

download_export(filepath)

Download an exportable model file for use in an on-premise DataRobot standalone prediction environment.

This function can only be used if model export is enabled, and will only be useful if you have an on-premise environment in which to import it.

Parameters:

filepath : str

The path at which to save the exported model file.

download_scoring_code(file_name, source_code=False)

Download scoring code JAR.

Parameters:

file_name : str

File path where scoring code will be saved.

source_code : bool, optional

Set to True to download the source code archive. It will not be executable.

fetch_resource_data(*args, **kwargs)

(Deprecated.) Used to acquire model data directly from its url.

Consider using get instead, as this is a convenience function used for development of the datarobot package.

Parameters:

url : string

The resource we are acquiring

join_endpoint : boolean, optional

Whether the client’s endpoint should be joined to the URL before sending the request. Location headers are returned as absolute locations, so they will not need the endpoint.

Returns:

model_data : dict

The queried model’s data

get_advanced_tuning_parameters()

Get the advanced-tuning parameters available for this model.

As of v2.13, only Eureqa blueprints (blueprints whose title starts with ‘Eureqa’) support Advanced Tuning.

Returns:

dict

A dictionary describing the advanced-tuning parameters for the current model. Dictionary is of the following form:

{
    "tuningDescription": <User-specified description>  (or `None` if not available),
    "tuningParameters": [
        {
            "parameterName": <unicode : name of parameter (unique per task, see below)>,
            "parameterId": <unicode : opaque ID string uniquely identifying parameter>,
            "defaultValue": <* : default value of the parameter for the blueprint>,
            "currentValue": <* : value of the parameter that was used for this model>,
            "taskName": <unicode : name of the task that this parameter belongs to>,
            "constraints": {
                (...)
            }
        },
        (...)
    ]
}

The type of defaultValue and currentValue is defined by the constraints structure. It will be a string or numeric Python type.

constraints is a dict with at least one, possibly more, of the following keys. The presence of a key indicates that the parameter may take on the specified type; if a key is absent, the parameter may not take on that type. If a key on constraints is present, its value will be a dict containing all of the fields described below for that key.

"constraints": {
    "select": {
        "values": [<list(basestring or number) : possible values>]
    },
    "ascii": {},
    "unicode": {},
    "int": {
        "min": <int : minimum valid value>,
        "max": <int : maximum valid value>,
        "supports_grid_search": <bool : True if Grid Search may be
                                        requested for this param>
    },
    "float": {
        "min": <float : minimum valid value>,
        "max": <float : maximum valid value>,
        "supports_grid_search": <bool : True if Grid Search may be
                                        requested for this param>
    },
    "intList": {
        "length": {
        "min_length": <int : minimum valid length>,
        "max_length": <int : maximum valid length>
        "min_val": <int : minimum valid value>,
        "max_val": <int : maximum valid value>
        "supports_grid_search": <bool : True if Grid Search may be
                                        requested for this param>
    },
    "floatList": {
        "min_length": <int : minimum valid length>,
        "max_length": <int : maximum valid length>
        "min_val": <float : minimum valid value>,
        "max_val": <float : maximum valid value>
        "supports_grid_search": <bool : True if Grid Search may be
                                        requested for this param>
    }
}

The keys have meaning as follows:

  • select:

    Rather than specifying a specific data type, if present, it indicates that the parameter is permitted to take on any of the specified values. Listed values may be of any string or real (non-complex) numeric type.

  • ascii:

    The parameter may be a unicode object that encodes simple ASCII characters. (A-Z, a-z, 0-9, whitespace, and certain common symbols.) In addition to listed constraints, ASCII keys currently may not contain either newlines or semicolons.

  • unicode:

    The parameter may be any Python unicode object.

  • int:

    The value may be an object of type int within the specified range (inclusive). Please note that the value will be passed around using the JSON format, and some JSON parsers have undefined behavior with integers outside of the range [-(2**53)+1, (2**53)-1].

  • float:

    The value may be an object of type float within the specified range (inclusive).

  • intList, floatList:

    The value may be a list of int or float objects, respectively, following constraints as specified respectively by the int and float types (above).

Many parameters only specify one key under constraints. If a parameter specifies multiple keys, the parameter may take on any value permitted by any key.

get_all_confusion_charts()

Retrieve a list of all confusion charts available for the model.

Returns:

list of ConfusionChart

Data for all available confusion charts for the model.

get_all_lift_charts()

Retrieve a list of all lift charts available for the model.

Returns:

list of LiftChart

Data for all available model lift charts.

get_all_roc_curves()

Retrieve a list of all ROC curves available for the model.

Returns:

list of RocCurve

Data for all available model ROC curves.

get_confusion_chart(source)

Retrieve model’s confusion chart for the specified source.

Parameters:

source : str

Confusion chart source. Check datarobot.enums.CHART_DATA_SOURCE for possible values.

Returns:

ConfusionChart

Model ConfusionChart data

get_cross_validation_scores(partition=None, metric=None)

Returns a dictionary keyed by metric showing cross validation scores per partition.

Cross Validation should already have been performed using cross_validate or train.

Note

Models that computed cross validation before this feature was added will need to be deleted and retrained before this method can be used.

Parameters:

partition : float

optional, the id of the partition (1, 2, 3.0, 4.0, etc.) to filter results by. Can be a positive whole-number integer or a float value.

metric : unicode

optional, the name of the metric by which to filter the resulting cross validation scores

Returns:

cross_validation_scores: dict

A dictionary keyed by metric showing cross validation scores per partition.
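
For example, a minimal sketch (placeholder ids; the 'AUC' metric name assumes a binary classification project, and the project is assumed to use a cross-validation partitioning scheme):

import datarobot as dr

model = dr.Model.get('p-id', 'l-id')

# Run cross validation first if it has not been computed for this model yet.
cv_job = model.cross_validate()
cv_job.wait_for_completion()

# All metrics and partitions, then just the AUC score for partition 1.
all_scores = model.get_cross_validation_scores()
auc_partition_1 = model.get_cross_validation_scores(partition=1, metric='AUC')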

get_feature_impact()

Retrieve the computed Feature Impact results, a measure of the relevance of each feature in the model.

Feature Impact is computed for each column by creating new data with that column randomly permuted (but the others left unchanged), and seeing how the error metric score for the predictions is affected. The ‘impactUnnormalized’ is how much worse the error metric score is when making predictions on this modified data. The ‘impactNormalized’ is normalized so that the largest value is 1. In both cases, larger values indicate more important features.

If a feature is a redundant feature, i.e. once other features are considered it doesn’t contribute much additional information, the ‘redundantWith’ value is the name of the feature that has the highest correlation with this feature. Note that redundancy detection is only available for jobs run after the addition of this feature. When retrieving data that predates this functionality, a NoRedundancyImpactAvailable warning will be issued.

Elsewhere this technique is sometimes called ‘Permutation Importance’.

Requires that Feature Impact has already been computed with request_feature_impact.

Returns:

feature_impacts : list of dict

The feature impact data. Each item is a dict with the keys ‘featureName’, ‘impactNormalized’, ‘impactUnnormalized’, and ‘redundantWith’.

Raises:

ClientError (404)

If the feature impacts have not been computed.
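
A minimal request-then-retrieve sketch (placeholder ids; not part of the original docs):

import datarobot as dr

model = dr.Model.get('p-id', 'l-id')

# Kick off the computation and block until the results are available.
impact_job = model.request_feature_impact()
feature_impact = impact_job.get_result_when_complete()

# Print features from most to least impactful.
for row in sorted(feature_impact, key=lambda r: r['impactNormalized'], reverse=True):
    print(row['featureName'], row['impactNormalized'], row['redundantWith'])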

get_features_used()

Query the server to determine which features were used.

Note that the data returned by this method is possibly different from the names of the features in the featurelist used by this model. This method will return the raw features that must be supplied in order for predictions to be generated on a new set of data. The featurelist, in contrast, would also include the names of derived features.

Returns:

features : list of str

The names of the features used in the model.

get_leaderboard_ui_permalink()

Returns:

url : str

Permanent static hyperlink to this model at leaderboard.

get_lift_chart(source)

Retrieve model lift chart for the specified source.

Parameters:

source : str

Lift chart data source. Check datarobot.enums.CHART_DATA_SOURCE for possible values.

Returns:

LiftChart

Model lift chart data

get_missing_report_info()

Retrieve the missing data report on training data, which can be used to understand how missing values are treated in the model. The report consists of missing value reports for the numeric and categorical features that took part in modelling.

Returns:

An iterable of MissingReportPerFeature

The queried model missing report, sorted by missing count (DESCENDING order).

get_model_blueprint_chart()

Retrieve a model blueprint chart that can be used to understand data flow in blueprint.

Returns:

ModelBlueprintChart

The queried model blueprint chart.

get_model_blueprint_documents()

Get documentation for tasks used in this model.

Returns:

list of BlueprintTaskDocument

All documents available for the model.

get_or_request_feature_impact(max_wait=600)

Retrieve feature impact for the model, requesting a job if it hasn’t been run previously

Parameters:

max_wait : int, optional

The maximum time to wait for a requested feature impact job to complete before erroring

Returns:

feature_impacts : list of dict

The feature impact data. See get_feature_impact for the exact schema.

get_parameters()

Retrieve model parameters.

Returns:

ModelParameters

Model parameters for this model.

get_pareto_front()

Retrieve the Pareto Front for a Eureqa model.

This method is only supported for Eureqa models.

Returns:

ParetoFront

Model ParetoFront data

get_prime_eligibility()

Check if this model can be approximated with DataRobot Prime

Returns:

prime_eligibility : dict

a dict indicating whether a model can be approximated with DataRobot Prime (key can_make_prime) and why it may be ineligible (key message)

get_roc_curve(source)

Retrieve model ROC curve for the specified source.

Parameters:

source : str

ROC curve data source. Check datarobot.enums.CHART_DATA_SOURCE for possible values.

Returns:

RocCurve

Model ROC curve data
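
For instance (placeholder ids; VALIDATION is assumed to be one of the CHART_DATA_SOURCE values):

import datarobot as dr
from datarobot.enums import CHART_DATA_SOURCE

model = dr.Model.get('p-id', 'l-id')

# The same source enum is shared by the ROC curve, lift chart, and
# confusion chart retrieval methods.
roc = model.get_roc_curve(CHART_DATA_SOURCE.VALIDATION)
lift = model.get_lift_chart(CHART_DATA_SOURCE.VALIDATION)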

get_rulesets()

List the rulesets approximating this model generated by DataRobot Prime

If this model hasn’t been approximated yet, this will return an empty list. Note that these are rulesets approximating this model, not rulesets used to construct this model.

Returns:

rulesets : list of Ruleset

get_word_cloud(exclude_stop_words=False)

Retrieve word cloud data for the model.

Parameters:

exclude_stop_words : bool, optional

Set to True if you want stop words filtered out of the response.

Returns:

WordCloud

Word cloud data for the model.

open_model_browser()

Opens model at project leaderboard in web browser.

Note: If text-mode browsers are used, the calling process will block until the user exits the browser.

request_approximation()

Request an approximation of this model using DataRobot Prime

This will create several rulesets that could be used to approximate this model. After comparing their scores and rule counts, the code used in the approximation can be downloaded and run locally.

Returns:

job : Job

the job generating the rulesets
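
A rough sketch of the Prime approximation flow (placeholder ids; assumes DataRobot Prime is available and the model is eligible, which can be checked with get_prime_eligibility):

import datarobot as dr

model = dr.Model.get('p-id', 'l-id')

# Generate candidate rulesets, wait for the job, then inspect them.
approx_job = model.request_approximation()
approx_job.wait_for_completion()

for ruleset in model.get_rulesets():
    print(ruleset)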

request_feature_impact()

Request feature impacts to be computed for the model.

See get_feature_impact for more information on the result of the job.

Returns:

job : Job

A Job representing the feature impact computation. To get the completed feature impact data, use job.get_result or job.get_result_when_complete.

Raises:

JobAlreadyRequested (422)

If the feature impacts have already been requested.

request_frozen_datetime_model(training_row_count=None, training_duration=None, training_start_date=None, training_end_date=None, time_window_sample_pct=None)

Train a new frozen model with parameters from this model

Requires that this model belongs to a datetime partitioned project. If it does not, an error will occur when submitting the job.

Frozen models use the same tuning parameters as their parent model instead of independently optimizing them to allow efficiently retraining models on larger amounts of the training data.

In addition to training_row_count and training_duration, frozen datetime models may be trained on an exact date range. Only one of training_row_count, training_duration, or training_start_date and training_end_date should be specified.

Models specified using training_start_date and training_end_date are the only ones that can be trained into the holdout data (once the holdout is unlocked).

Parameters:

training_row_count : int, optional

the number of rows of data that should be used to train the model. If specified, training_duration may not be specified.

training_duration : str, optional

a duration string specifying what time range the data used to train the model should span. If specified, training_row_count may not be specified.

training_start_date : datetime.datetime, optional

the start date of the data to train the model on. Only rows occurring at or after this datetime will be used. If training_start_date is specified, training_end_date must also be specified.

training_end_date : datetime.datetime, optional

the end date of the data to train the model on. Only rows occurring strictly before this datetime will be used. If training_end_date is specified, training_start_date must also be specified.

time_window_sample_pct : int, optional

may only be specified when the requested model is a time window (e.g. duration or start and end dates). An integer between 1 and 99 indicating the percentage to sample by within the window. The points kept are determined by a random uniform sample.

Returns:

model_job : ModelJob

the modeling job training a frozen model

request_frozen_model(sample_pct=None, training_row_count=None)

Train a new frozen model with parameters from this model

Note

This method only works if the project the model belongs to is not datetime partitioned. If it is, use request_frozen_datetime_model instead.

Frozen models use the same tuning parameters as their parent model instead of independently optimizing them to allow efficiently retraining models on larger amounts of the training data.

Parameters:

sample_pct : float

optional, the percentage of the dataset to use with the model. If not provided, will use the value from this model.

training_row_count : int

(New in version v2.9) optional, the integer number of rows of the dataset to use with the model. Only one of sample_pct and training_row_count should be specified.

Returns:

model_job : ModelJob

the modeling job training a frozen model
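
For example (placeholder ids; only valid when the project is not datetime partitioned, and the 80% sample is an arbitrary illustrative choice):

import datarobot as dr

model = dr.Model.get('p-id', 'l-id')

# Reuse this model's tuning parameters, but train on 80% of the data.
frozen_job = model.request_frozen_model(sample_pct=80)
frozen_model = frozen_job.get_result_when_complete()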

request_predictions(dataset_id)

Request predictions against a previously uploaded dataset

Parameters:

dataset_id : string

The dataset to make predictions against (as uploaded from Project.upload_dataset)

Returns:

job : PredictJob

The job computing the predictions
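
A minimal scoring sketch (placeholder ids and file name are assumptions):

import datarobot as dr

project = dr.Project.get('p-id')
model = dr.Model.get('p-id', 'l-id')

# Upload the data to score, then request predictions from this model.
dataset = project.upload_dataset('to_predict.csv')
predict_job = model.request_predictions(dataset.id)
predictions = predict_job.get_result_when_complete()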

request_training_predictions(data_subset)

Start a job to build training predictions

Parameters:

data_subset : str

data set definition to build predictions on. Choices are:

  • dr.enums.DATA_SUBSET.ALL for all data available
  • dr.enums.DATA_SUBSET.VALIDATION_AND_HOLDOUT for all data except training set
  • dr.enums.DATA_SUBSET.HOLDOUT for holdout data set only
Returns:

Job

an instance of created async job
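
For example, to build predictions on the holdout only (placeholder ids; the exact type resolved by the completed job is not described above, so it is simply assigned here):

import datarobot as dr
from datarobot.enums import DATA_SUBSET

model = dr.Model.get('p-id', 'l-id')

# DATA_SUBSET.ALL and DATA_SUBSET.VALIDATION_AND_HOLDOUT are the other choices.
job = model.request_training_predictions(DATA_SUBSET.HOLDOUT)
training_predictions = job.get_result_when_complete()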

request_transferable_export()

Request generation of an exportable model file for use in an on-premise DataRobot standalone prediction environment.

This function can only be used if model export is enabled, and will only be useful if you have an on-premise environment in which to import it.

This function does not download the exported file. Use download_export for that.

Examples

model = datarobot.Model.get('p-id', 'l-id')
job = model.request_transferable_export()
job.wait_for_completion()
model.download_export('my_exported_model.drmodel')

# Client must be configured to use standalone prediction server for import:
datarobot.Client(token='my-token-at-standalone-server',
                 endpoint='standalone-server-url/api/v2')

imported_model = datarobot.ImportedModel.create('my_exported_model.drmodel')

set_prediction_threshold(threshold)

Set a custom prediction threshold for the model

May not be used once prediction_threshold_read_only is True for this model.

Parameters:

threshold : float

only used for binary classification projects. The threshold to use when deciding between the positive and negative classes when making predictions. Should be between 0.0 and 1.0 (inclusive).
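
For instance (placeholder ids; only meaningful for a binary classification project whose threshold is still editable, and 0.6 is an arbitrary illustrative value):

import datarobot as dr

model = dr.Model.get('p-id', 'l-id')

# Setting the threshold fails once prediction_threshold_read_only is True.
if not model.prediction_threshold_read_only:
    model.set_prediction_threshold(0.6)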

star_model()

Mark the model as starred

Model stars propagate to the web application and the API, and can be used to filter when listing models.

start_advanced_tuning_session()

Start an Advanced Tuning session. Returns an object that helps set up arguments for an Advanced Tuning model execution.

As of v2.13, only Eureqa blueprints (blueprints whose title starts with ‘Eureqa’) support Advanced Tuning.

Returns:

AdvancedTuningSession

Session for setting up and running Advanced Tuning on a model

train(sample_pct=None, featurelist_id=None, scoring_type=None, training_row_count=None, monotonic_increasing_featurelist_id=<object object>, monotonic_decreasing_featurelist_id=<object object>)

Train the blueprint used in model on a particular featurelist or amount of data.

This method creates a new training job for worker and appends it to the end of the queue for this project. After the job has finished you can get the newly trained model by retrieving it from the project leaderboard, or by retrieving the result of the job.

Either sample_pct or training_row_count can be used to specify the amount of data to use, but not both. If neither are specified, a default of the maximum amount of data that can safely be used to train any blueprint without going into the validation data will be selected.

In smart-sampled projects, sample_pct and training_row_count are assumed to be in terms of rows of the minority class.

Note

For datetime partitioned projects, use train_datetime instead.

Parameters:

sample_pct : float, optional

The amount of data to use for training, as a percentage of the project dataset from 0 to 100.

featurelist_id : str, optional

The identifier of the featurelist to use. If not defined, the featurelist of this model is used.

scoring_type : str, optional

Either SCORING_TYPE.validation or SCORING_TYPE.cross_validation. SCORING_TYPE.validation is available for every partitioning type, and indicates that the default model validation should be used for the project. If the project uses a form of cross-validation partitioning, SCORING_TYPE.cross_validation can also be used to indicate that all of the available training/validation combinations should be used to evaluate the model.

training_row_count : int, optional

The number of rows to use to train the requested model.

monotonic_increasing_featurelist_id : str

(new in version 2.11) optional, the id of the featurelist that defines the set of features with a monotonically increasing relationship to the target. Passing None disables increasing monotonicity constraint. Default (dr.enums.MONOTONICITY_FEATURELIST_DEFAULT) is the one specified by the blueprint.

monotonic_decreasing_featurelist_id : str

(new in version 2.11) optional, the id of the featurelist that defines the set of features with a monotonically decreasing relationship to the target. Passing None disables decreasing monotonicity constraint. Default (dr.enums.MONOTONICITY_FEATURELIST_DEFAULT) is the one specified by the blueprint.

Returns:

model_job_id : str

id of created job, can be used as parameter to ModelJob.get method or wait_for_async_model_creation function

Examples

project = Project.get('p-id')
model = Model.get('p-id', 'l-id')
model_job_id = model.train(training_row_count=project.max_train_rows)

train_datetime(featurelist_id=None, training_row_count=None, training_duration=None, time_window_sample_pct=None)

Train this model on a different featurelist or amount of data

Requires that this model is part of a datetime partitioned project; otherwise, an error will occur.

Parameters:

featurelist_id : str, optional

the featurelist to use to train the model. If not specified, the featurelist of this model is used.

training_row_count : int, optional

the number of rows of data that should be used to train the model. If specified, training_duration may not be specified.

training_duration : str, optional

a duration string specifying what time range the data used to train the model should span. If specified, training_row_count may not be specified.

time_window_sample_pct : int, optional

may only be specified when the requested model is a time window (e.g. duration or start and end dates). An integer between 1 and 99 indicating the percentage to sample by within the window. The points kept are determined by a random uniform sample.

Returns:

job : ModelJob

the created job to build the model
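
As a sketch (placeholder ids; 'P100D', an ISO 8601-style duration string, and the 50% sampling rate are illustrative assumptions):

import datarobot as dr

model = dr.Model.get('p-id', 'l-id')

# Retrain the same blueprint on a 100-day window, sampling 50% of the
# rows inside that window.
job = model.train_datetime(training_duration='P100D', time_window_sample_pct=50)
new_model = job.get_result_when_complete()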

unstar_model()

Unmark the model as starred

Model stars propagate to the web application and the API, and can be used to filter when listing models.

DatetimeModel API

class datarobot.models.DatetimeModel(id=None, processes=None, featurelist_name=None, featurelist_id=None, project_id=None, sample_pct=None, training_row_count=None, training_duration=None, training_start_date=None, training_end_date=None, time_window_sample_pct=None, model_type=None, model_category=None, is_frozen=None, blueprint_id=None, metrics=None, training_info=None, holdout_score=None, holdout_status=None, data_selection_method=None, backtests=None, monotonic_increasing_featurelist_id=None, monotonic_decreasing_featurelist_id=None, supports_monotonic_constraints=None, is_starred=None, prediction_threshold=None, prediction_threshold_read_only=None)

A model from a datetime partitioned project

Only one of training_row_count, training_duration, and training_start_date and training_end_date will be specified, depending on the data_selection_method of the model. Whichever method was selected determines the amount of data used to train on when making predictions and scoring the backtests and the holdout.

Attributes

id (str) the id of the model
project_id (str) the id of the project the model belongs to
processes (list of str) the processes used by the model
featurelist_name (str) the name of the featurelist used by the model
featurelist_id (str) the id of the featurelist used by the model
sample_pct (float) the percentage of the project dataset used in training the model
training_row_count (int or None) If specified, an int specifying the number of rows used to train the model and evaluate backtest scores.
training_duration (str or None) If specified, a duration string specifying the duration spanned by the data used to train the model and evaluate backtest scores.
training_start_date (datetime or None) only present for frozen models in datetime partitioned projects. If specified, the start date of the data used to train the model.
training_end_date (datetime or None) only present for frozen models in datetime partitioned projects. If specified, the end date of the data used to train the model.
time_window_sample_pct (int or None) An integer between 1 and 99 indicating the percentage of sampling within the training window. The points kept are determined by a random uniform sample. If not specified, no sampling was done.
model_type (str) what model this is, e.g. ‘Nystroem Kernel SVM Regressor’
model_category (str) what kind of model this is - ‘prime’ for DataRobot Prime models, ‘blend’ for blender models, and ‘model’ for other models
is_frozen (bool) whether this model is a frozen model
blueprint_id (str) the id of the blueprint used in this model
metrics (dict) a mapping from each metric to the model’s scores for that metric. The keys in metrics are the different metrics used to evaluate the model, and the values are the results. The dictionaries inside of metrics will be as described here: ‘validation’, the score for a single backtest; ‘crossValidation’, always None; ‘backtesting’, the average score for all backtests if all are available and computed, or None otherwise; ‘backtestingScores’, a list of scores for all backtests where the score is None if that backtest does not have a score available; and ‘holdout’, the score for the holdout or None if the holdout is locked or the score is unavailable.
backtests (list of dict) describes what data was used to fit each backtest, the score for the project metric, and why the backtest score is unavailable if it is not provided.
data_selection_method (str) which of training_row_count, training_duration, or training_start_date and training_end_date were used to determine the data used to fit the model. One of ‘rowCount’, ‘duration’, or ‘selectedDateRange’.
training_info (dict) describes which data was used to train on when scoring the holdout and making predictions. training_info will have the following keys: holdout_training_start_date, holdout_training_duration, holdout_training_row_count, holdout_training_end_date, prediction_training_start_date, prediction_training_duration, prediction_training_row_count, prediction_training_end_date. Start and end dates will be datetimes, durations will be duration strings, and rows will be integers.
holdout_score (float or None) the score against the holdout, if available and the holdout is unlocked, according to the project metric.
holdout_status (string or None) the status of the holdout score, e.g. “COMPLETED”, “HOLDOUT_BOUNDARIES_EXCEEDED”. Unavailable if the holdout fold was disabled in the partitioning configuration.
monotonic_increasing_featurelist_id (str) optional, the id of the featurelist that defines the set of features with a monotonically increasing relationship to the target. If None, no such constraints are enforced.
monotonic_decreasing_featurelist_id (str) optional, the id of the featurelist that defines the set of features with a monotonically decreasing relationship to the target. If None, no such constraints are enforced.
supports_monotonic_constraints (bool) optional, whether this model supports enforcing monotonic constraints
is_starred (bool) whether this model is marked as starred
prediction_threshold (float) for binary classification projects, the threshold used for predictions
prediction_threshold_read_only (bool) indicates whether modification of the prediction threshold is forbidden. Threshold modification is forbidden once a model has had a deployment created or predictions made via the dedicated prediction API.
classmethod get(project, model_id)

Retrieve a specific datetime model

If the project does not use datetime partitioning, a ClientError will occur.

Parameters:

project : str

the id of the project the model belongs to

model_id : str

the id of the model to retrieve

Returns:

model : DatetimeModel

the model

score_backtests()

Compute the scores for all available backtests

Some backtests may be unavailable if the model is trained into their validation data.

Returns:

job : Job

a job tracking the backtest computation. When it is complete, all available backtests will have scores computed.
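
A short sketch (placeholder ids; re-fetching the model afterwards to see the refreshed backtest scores is an assumption about usage, not something stated above):

from datarobot.models import DatetimeModel

model = DatetimeModel.get('p-id', 'l-id')

# Compute every backtest that is still available for this model.
job = model.score_backtests()
job.wait_for_completion()

# Re-fetch the model to pick up the newly computed scores.
model = DatetimeModel.get('p-id', 'l-id')
print(model.metrics)
print(model.backtests)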

cross_validate()

Inherited from Model - DatetimeModels cannot request Cross Validation; use score_backtests instead.

get_cross_validation_scores(partition=None, metric=None)

Inherited from Model - DatetimeModels cannot request Cross Validation scores; use backtests instead.

advanced_tune(params, description=None)

Generate a new model with the specified advanced-tuning parameters

As of v2.13, only Eureqa blueprints (blueprints whose title starts with ‘Eureqa’) support Advanced Tuning.

Parameters:

params : dict

Mapping of parameter ID to parameter value. The list of valid parameter IDs for a model can be found by calling get_advanced_tuning_parameters(). This endpoint does not need to include values for all parameters. If a parameter is omitted, its current_value will be used.

description : unicode

Human-readable string describing the newly advanced-tuned model

Returns:

ModelJob

The created job to build the model

delete()

Delete a model from the project’s leaderboard.

download_export(filepath)

Download an exportable model file for use in an on-premise DataRobot standalone prediction environment.

This function can only be used if model export is enabled, and will only be useful if you have an on-premise environment in which to import it.

Parameters:

filepath : str

The path at which to save the exported model file.

download_scoring_code(file_name, source_code=False)

Download scoring code JAR.

Parameters:

file_name : str

File path where scoring code will be saved.

source_code : bool, optional

Set to True to download source code archive. It will not be executable.
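
For example (placeholder ids and output path; assumes scoring code export is available for this model):

from datarobot.models import DatetimeModel

model = DatetimeModel.get('p-id', 'l-id')

# Downloads the executable scoring JAR; pass source_code=True for the
# non-executable source archive instead.
model.download_scoring_code('model_scoring_code.jar')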

fetch_resource_data(*args, **kwargs)

(Deprecated.) Used to acquire model data directly from its url.

Consider using get instead, as this is a convenience function used for development of the datarobot client.

Parameters:

url : string

The resource we are acquiring

join_endpoint : boolean, optional

Whether the client’s endpoint should be joined to the URL before sending the request. Location headers are returned as absolute locations, so they will _not_ need the endpoint.

Returns:

model_data : dict

The queried model’s data

get_advanced_tuning_parameters()

Get the advanced-tuning parameters available for this model.

As of v2.13, only Eureqa blueprints (blueprints whose title starts with ‘Eureqa’) support Advanced Tuning.

Returns:

dict

A dictionary describing the advanced-tuning parameters for the current model. Dictionary is of the following form:

{
    "tuningDescription": <User-specified description>  (or `None` if not available),
    "tuningParameters": [
        {
            "parameterName": <unicode : name of parameter (unique per task, see below)>,
            "parameterId": <unicode : opaque ID string uniquely identifying parameter>,
            "defaultValue": <* : default value of the parameter for the blueprint>,
            "currentValue": <* : value of the parameter that was used for this model>,
            "taskName": <unicode : name of the task that this parameter belongs to>,
            "constraints": {
                (...)
            }
        },
        (...)
    ]
}

The type of defaultValue and currentValue is defined by the constraints structure. It will be a string or numeric Python type.

constraints is a dict with at least one, possibly more, of the following keys. The presence of a key indicates that the parameter may take on the specified type. (If a key is absent, this means that the parameter may not take on the specified type.) If a key on constraints is present, its value will be a dict containing all of the fields described below for that key.

"constraints": {
    "select": {
        "values": [<list(basestring or number) : possible values>]
    },
    "ascii": {},
    "unicode": {},
    "int": {
        "min": <int : minimum valid value>,
        "max": <int : maximum valid value>,
        "supports_grid_search": <bool : True if Grid Search may be
                                        requested for this param>
    },
    "float": {
        "min": <float : minimum valid value>,
        "max": <float : maximum valid value>,
        "supports_grid_search": <bool : True if Grid Search may be
                                        requested for this param>
    },
    "intList": {
        "length": {
        "min_length": <int : minimum valid length>,
        "max_length": <int : maximum valid length>
        "min_val": <int : minimum valid value>,
        "max_val": <int : maximum valid value>
        "supports_grid_search": <bool : True if Grid Search may be
                                        requested for this param>
    },
    "floatList": {
        "min_length": <int : minimum valid length>,
        "max_length": <int : maximum valid length>
        "min_val": <float : minimum valid value>,
        "max_val": <float : maximum valid value>
        "supports_grid_search": <bool : True if Grid Search may be
                                        requested for this param>
    }
}

The keys have meaning as follows:

  • select:

    Rather than specifying a specific data type, if present, it indicates that the parameter is permitted to take on any of the specified values. Listed values may be of any string or real (non-complex) numeric type.

  • ascii:

    The parameter may be a unicode object that encodes simple ASCII characters. (A-Z, a-z, 0-9, whitespace, and certain common symbols.) In addition to listed constraints, ASCII keys currently may not contain either newlines or semicolons.

  • unicode:

    The parameter may be any Python unicode object.

  • int:

    The value may be an object of type int within the specified range (inclusive). Please note that the value will be passed around using the JSON format, and some JSON parsers have undefined behavior with integers outside of the range [-(2**53)+1, (2**53)-1].

  • float:

    The value may be an object of type float within the specified range (inclusive).

  • intList, floatList:

    The value may be a list of int or float objects, respectively, following constraints as specified respectively by the int and float types (above).

Many parameters only specify one key under constraints. If a parameter specifies multiple keys, the parameter may take on any value permitted by any key.

get_all_confusion_charts()

Retrieve a list of all confusion charts available for the model.

Returns:

list of ConfusionChart

Data for all available confusion charts for model.

get_all_lift_charts()

Retrieve a list of all lift charts available for the model.

Returns:

list of LiftChart

Data for all available model lift charts.

get_all_roc_curves()

Retrieve a list of all ROC curves available for the model.

Returns:

list of RocCurve

Data for all available model ROC curves.

get_confusion_chart(source)

Retrieve model’s confusion chart for the specified source.

Parameters:

source : str

Confusion chart source. Check datarobot.enums.CHART_DATA_SOURCE for possible values.

Returns:

ConfusionChart

Model ConfusionChart data

get_feature_impact()

Retrieve the computed Feature Impact results, a measure of the relevance of each feature in the model.

Feature Impact is computed for each column by creating new data with that column randomly permuted (but the others left unchanged), and seeing how the error metric score for the predictions is affected. The ‘impactUnnormalized’ is how much worse the error metric score is when making predictions on this modified data. The ‘impactNormalized’ is normalized so that the largest value is 1. In both cases, larger values indicate more important features.

If a feature is a redundant feature, i.e. once other features are considered it doesn’t contribute much additional information, the ‘redundantWith’ value is the name of the feature that has the highest correlation with this feature. Note that redundancy detection is only available for jobs run after the addition of this feature. When retrieving data that predates this functionality, a NoRedundancyImpactAvailable warning will be issued.

Elsewhere this technique is sometimes called ‘Permutation Importance’.

Requires that Feature Impact has already been computed with request_feature_impact.

Returns:

feature_impacts : list of dict

The feature impact data. Each item is a dict with the keys ‘featureName’, ‘impactNormalized’, ‘impactUnnormalized’, and ‘redundantWith’.

Raises:

ClientError (404)

If the feature impacts have not been computed.

get_features_used()

Query the server to determine which features were used.

Note that the data returned by this method is possibly different from the names of the features in the featurelist used by this model. This method will return the raw features that must be supplied in order for predictions to be generated on a new set of data. The featurelist, in contrast, would also include the names of derived features.

Returns:

features : list of str

The names of the features used in the model.

get_leaderboard_ui_permalink()

Returns:

url : str

Permanent static hyperlink to this model at leaderboard.

get_lift_chart(source)

Retrieve model lift chart for the specified source.

Parameters:

source : str

Lift chart data source. Check datarobot.enums.CHART_DATA_SOURCE for possible values.

Returns:

LiftChart

Model lift chart data

get_missing_report_info()

Retrieve the missing data report on training data, which can be used to understand how missing values are treated in the model. The report consists of missing value reports for the numeric and categorical features that took part in modelling.

Returns:

An iterable of MissingReportPerFeature

The queried model missing report, sorted by missing count (DESCENDING order).

get_model_blueprint_chart()

Retrieve a model blueprint chart that can be used to understand data flow in blueprint.

Returns:

ModelBlueprintChart

The queried model blueprint chart.

get_model_blueprint_documents()

Get documentation for tasks used in this model.

Returns:

list of BlueprintTaskDocument

All documents available for the model.

get_or_request_feature_impact(max_wait=600)

Retrieve feature impact for the model, requesting a job if it hasn’t been run previously

Parameters:

max_wait : int, optional

The maximum time to wait for a requested feature impact job to complete before erroring

Returns:

feature_impacts : list of dict

The feature impact data. See get_feature_impact for the exact schema.

get_parameters()

Retrieve model parameters.

Returns:

ModelParameters

Model parameters for this model.

get_pareto_front()

Retrieve the Pareto Front for a Eureqa model.

This method is only supported for Eureqa models.

Returns:

ParetoFront

Model ParetoFront data

get_prime_eligibility()

Check if this model can be approximated with DataRobot Prime

Returns:

prime_eligibility : dict

a dict indicating whether a model can be approximated with DataRobot Prime (key can_make_prime) and why it may be ineligible (key message)

get_roc_curve(source)

Retrieve model ROC curve for the specified source.

Parameters:

source : str

ROC curve data source. Check datarobot.enums.CHART_DATA_SOURCE for possible values.

Returns:

RocCurve

Model ROC curve data

get_rulesets()

List the rulesets approximating this model generated by DataRobot Prime

If this model hasn’t been approximated yet, this will return an empty list. Note that these are rulesets approximating this model, not rulesets used to construct this model.

Returns:

rulesets : list of Ruleset

get_word_cloud(exclude_stop_words=False)

Retrieve word cloud data for the model.

Parameters:

exclude_stop_words : bool, optional

Set to True if you want stop words filtered out of the response.

Returns:

WordCloud

Word cloud data for the model.

open_model_browser()

Opens model at project leaderboard in web browser.

Note: If text-mode browsers are used, the calling process will block until the user exits the browser.

request_approximation()

Request an approximation of this model using DataRobot Prime

This will create several rulesets that could be used to approximate this model. After comparing their scores and rule counts, the code used in the approximation can be downloaded and run locally.

Returns:

job : Job

the job generating the rulesets

request_feature_impact()

Request feature impacts to be computed for the model.

See get_feature_impact for more information on the result of the job.

Returns:

job : Job

A Job representing the feature impact computation. To get the completed feature impact data, use job.get_result or job.get_result_when_complete.

Raises:

JobAlreadyRequested (422)

If the feature impacts have already been requested.

request_frozen_datetime_model(training_row_count=None, training_duration=None, training_start_date=None, training_end_date=None, time_window_sample_pct=None)

Train a new frozen model with parameters from this model

Requires that this model belongs to a datetime partitioned project. If it does not, an error will occur when submitting the job.

Frozen models use the same tuning parameters as their parent model instead of independently optimizing them to allow efficiently retraining models on larger amounts of the training data.

In addition to training_row_count and training_duration, frozen datetime models may be trained on an exact date range. Only one of training_row_count, training_duration, or training_start_date and training_end_date should be specified.

Models specified using training_start_date and training_end_date are the only ones that can be trained into the holdout data (once the holdout is unlocked).

Parameters:

training_row_count : int, optional

the number of rows of data that should be used to train the model. If specified, training_duration may not be specified.

training_duration : str, optional

a duration string specifying what time range the data used to train the model should span. If specified, training_row_count may not be specified.

training_start_date : datetime.datetime, optional

the start date of the data to train the model on. Only rows occurring at or after this datetime will be used. If training_start_date is specified, training_end_date must also be specified.

training_end_date : datetime.datetime, optional

the end date of the data to train the model on. Only rows occurring strictly before this datetime will be used. If training_end_date is specified, training_start_date must also be specified.

time_window_sample_pct : int, optional

may only be specified when the requested model is a time window (e.g. duration or start and end dates). An integer between 1 and 99 indicating the percentage to sample by within the window. The points kept are determined by a random uniform sample.

Returns:

model_job : ModelJob

the modeling job training a frozen model

request_predictions(dataset_id)

Request predictions against a previously uploaded dataset

Parameters:

dataset_id : string

The dataset to make predictions against (as uploaded from Project.upload_dataset)

Returns:

job : PredictJob

The job computing the predictions

request_training_predictions(data_subset)

Start a job to build training predictions

Parameters:

data_subset : str

data set definition to build predictions on. Choices are:

  • dr.enums.DATA_SUBSET.ALL for all data available
  • dr.enums.DATA_SUBSET.VALIDATION_AND_HOLDOUT for all data except training set
  • dr.enums.DATA_SUBSET.HOLDOUT for holdout data set only
Returns:

Job

an instance of created async job

request_transferable_export()

Request generation of an exportable model file for use in an on-premise DataRobot standalone prediction environment.

This function can only be used if model export is enabled, and will only be useful if you have an on-premise environment in which to import it.

This function does not download the exported file. Use download_export for that.

Examples

model = datarobot.Model.get('p-id', 'l-id')
job = model.request_transferable_export()
job.wait_for_completion()
model.download_export('my_exported_model.drmodel')

# Client must be configured to use standalone prediction server for import:
datarobot.Client(token='my-token-at-standalone-server',
                 endpoint='standalone-server-url/api/v2')

imported_model = datarobot.ImportedModel.create('my_exported_model.drmodel')

set_prediction_threshold(threshold)

Set a custom prediction threshold for the model

May not be used once prediction_threshold_read_only is True for this model.

Parameters:

threshold : float

only used for binary classification projects. The threshold to use when deciding between the positive and negative classes when making predictions. Should be between 0.0 and 1.0 (inclusive).

star_model()

Mark the model as starred

Model stars propagate to the web application and the API, and can be used to filter when listing models.

start_advanced_tuning_session()

Start an Advanced Tuning session. Returns an object that helps set up arguments for an Advanced Tuning model execution.

As of v2.13, only Eureqa blueprints (blueprints whose title starts with ‘Eureqa’) support Advanced Tuning.

Returns:

AdvancedTuningSession

Session for setting up and running Advanced Tuning on a model

train_datetime(featurelist_id=None, training_row_count=None, training_duration=None, time_window_sample_pct=None)

Train this model on a different featurelist or amount of data

Requires that this model is part of a datetime partitioned project; otherwise, an error will occur.

Parameters:

featurelist_id : str, optional

the featurelist to use to train the model. If not specified, the featurelist of this model is used.

training_row_count : int, optional

the number of rows of data that should be used to train the model. If specified, training_duration may not be specified.

training_duration : str, optional

a duration string specifying what time range the data used to train the model should span. If specified, training_row_count may not be specified.

time_window_sample_pct : int, optional

may only be specified when the requested model is a time window (e.g. duration or start and end dates). An integer between 1 and 99 indicating the percentage to sample by within the window. The points kept are determined by a random uniform sample.

Returns:

job : ModelJob

the created job to build the model

unstar_model()

Unmark the model as starred

Model stars propagate to the web application and the API, and can be used to filter when listing models.

RatingTableModel API

class datarobot.models.RatingTableModel(id=None, processes=None, featurelist_name=None, featurelist_id=None, project_id=None, sample_pct=None, training_row_count=None, training_duration=None, training_start_date=None, training_end_date=None, model_type=None, model_category=None, is_frozen=None, blueprint_id=None, metrics=None, rating_table_id=None, monotonic_increasing_featurelist_id=None, monotonic_decreasing_featurelist_id=None, supports_monotonic_constraints=None, is_starred=None, prediction_threshold=None, prediction_threshold_read_only=None)

A model that has a rating table.

Attributes

id (str) the id of the model
project_id (str) the id of the project the model belongs to
processes (list of str) the processes used by the model
featurelist_name (str) the name of the featurelist used by the model
featurelist_id (str) the id of the featurelist used by the model
sample_pct (float or None) the percentage of the project dataset used in training the model. If the project uses datetime partitioning, the sample_pct will be None. See training_row_count, training_duration, and training_start_date and training_end_date instead.
training_row_count (int or None) the number of rows of the project dataset used in training the model. In a datetime partitioned project, if specified, defines the number of rows used to train the model and evaluate backtest scores; if unspecified, either training_duration or training_start_date and training_end_date was used to determine that instead.
training_duration (str or None) only present for models in datetime partitioned projects. If specified, a duration string specifying the duration spanned by the data used to train the model and evaluate backtest scores.
training_start_date (datetime or None) only present for frozen models in datetime partitioned projects. If specified, the start date of the data used to train the model.
training_end_date (datetime or None) only present for frozen models in datetime partitioned projects. If specified, the end date of the data used to train the model.
model_type (str) what model this is, e.g. ‘Nystroem Kernel SVM Regressor’
model_category (str) what kind of model this is - ‘prime’ for DataRobot Prime models, ‘blend’ for blender models, and ‘model’ for other models
is_frozen (bool) whether this model is a frozen model
blueprint_id (str) the id of the blueprint used in this model
metrics (dict) a mapping from each metric to the model’s scores for that metric
rating_table_id (str) the id of the rating table that belongs to this model
monotonic_increasing_featurelist_id (str) optional, the id of the featurelist that defines the set of features with a monotonically increasing relationship to the target. If None, no such constraints are enforced.
monotonic_decreasing_featurelist_id (str) optional, the id of the featurelist that defines the set of features with a monotonically decreasing relationship to the target. If None, no such constraints are enforced.
supports_monotonic_constraints (bool) optional, whether this model supports enforcing monotonic constraints
is_starred (bool) whether this model is marked as starred
prediction_threshold (float) for binary classification projects, the threshold used for predictions
prediction_threshold_read_only (bool) indicates whether modification of the prediction threshold is forbidden. Threshold modification is forbidden once a model has had a deployment created or predictions made via the dedicated prediction API.
classmethod get(project_id, model_id)

Retrieve a specific rating table model

If the project does not have a rating table, a ClientError will occur.

Parameters:

project_id : str

the id of the project the model belongs to

model_id : str

the id of the model to retrieve

Returns:

model : RatingTableModel

the model

classmethod create_from_rating_table(project_id, rating_table_id)

Creates a new model from a validated rating table record. The RatingTable must not be associated with an existing model.

Parameters:

project_id : str

the id of the project the rating table belongs to

rating_table_id : str

the id of the rating table to create this model from

Returns:

job: Job

an instance of created async job

Raises:

ClientError (422)

Raised when creating a model from a RatingTable that failed validation

JobAlreadyRequested

Raised when creating a model from a RatingTable that is already associated with a RatingTableModel
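
A minimal sketch (placeholder ids; 'rt-id' stands for a validated rating table that is not yet attached to a model, and the completed job is assumed to resolve to the new model):

from datarobot.models import RatingTableModel

# Build a new model from an uploaded and validated rating table.
job = RatingTableModel.create_from_rating_table('p-id', 'rt-id')
rating_table_model = job.get_result_when_complete()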

advanced_tune(params, description=None)

Generate a new model with the specified advanced-tuning parameters

As of v2.13, only Eureqa blueprints (blueprints whose title starts with ‘Eureqa’) support Advanced Tuning.

Parameters:

params : dict

Mapping of parameter ID to parameter value. The list of valid parameter IDs for a model can be found by calling get_advanced_tuning_parameters(). This endpoint does not need to include values for all parameters. If a parameter is omitted, its current_value will be used.

description : unicode

Human-readable string describing the newly advanced-tuned model

Returns:

ModelJob

The created job to build the model

cross_validate()

Run Cross Validation on this model.

Note

To perform Cross Validation on a new model with new parameters, use train instead.

Returns:

ModelJob

The created job to build the model

delete()

Delete a model from the project’s leaderboard.

download_export(filepath)

Download an exportable model file for use in an on-premise DataRobot standalone prediction environment.

This function can only be used if model export is enabled, and will only be useful if you have an on-premise environment in which to import it.

Parameters:

filepath : str

The path at which to save the exported model file.

download_scoring_code(file_name, source_code=False)

Download scoring code JAR.

Parameters:

file_name : str

File path where scoring code will be saved.

source_code : bool, optional

Set to True to download source code archive. It will not be executable.

fetch_resource_data(*args, **kwargs)

(Deprecated.) Used to acquire model data directly from its url.

Consider using get instead, as this is a convenience function used for development of the datarobot client.

Parameters:

url : string

The resource we are acquiring

join_endpoint : boolean, optional

Whether the client’s endpoint should be joined to the URL before sending the request. Location headers are returned as absolute locations, so they will _not_ need the endpoint.

Returns:

model_data : dict

The queried model’s data

get_advanced_tuning_parameters()

Get the advanced-tuning parameters available for this model.

As of v2.13, only Eureqa blueprints (blueprints whose title starts with ‘Eureqa’) support Advanced Tuning.

Returns:

dict

A dictionary describing the advanced-tuning parameters for the current model. Dictionary is of the following form:

{
    "tuningDescription": <User-specified description>  (or `None` if not available),
    "tuningParameters": [
        {
            "parameterName": <unicode : name of parameter (unique per task, see below)>,
            "parameterId": <unicode : opaque ID string uniquely identifying parameter>,
            "defaultValue": <* : default value of the parameter for the blueprint>,
            "currentValue": <* : value of the parameter that was used for this model>,
            "taskName": <unicode : name of the task that this parameter belongs to>,
            "constraints": {
                (...)
            }
        },
        (...)
    ]
}

The type of defaultValue and currentValue is defined by the constraints structure. It will be a string or numeric Python type.

constraints is a dict with at least one, possibly more, of the following keys. The presence of a key indicates that the parameter may take on the specified type. (If a key is absent, this means that the parameter may not take on the specified type.) If a key on constraints is present, its value will be a dict containing all of the fields described below for that key.

"constraints": {
    "select": {
        "values": [<list(basestring or number) : possible values>]
    },
    "ascii": {},
    "unicode": {},
    "int": {
        "min": <int : minimum valid value>,
        "max": <int : maximum valid value>,
        "supports_grid_search": <bool : True if Grid Search may be
                                        requested for this param>
    },
    "float": {
        "min": <float : minimum valid value>,
        "max": <float : maximum valid value>,
        "supports_grid_search": <bool : True if Grid Search may be
                                        requested for this param>
    },
    "intList": {
        "length": {
        "min_length": <int : minimum valid length>,
        "max_length": <int : maximum valid length>
        "min_val": <int : minimum valid value>,
        "max_val": <int : maximum valid value>
        "supports_grid_search": <bool : True if Grid Search may be
                                        requested for this param>
    },
    "floatList": {
        "min_length": <int : minimum valid length>,
        "max_length": <int : maximum valid length>
        "min_val": <float : minimum valid value>,
        "max_val": <float : maximum valid value>
        "supports_grid_search": <bool : True if Grid Search may be
                                        requested for this param>
    }
}

The keys have meaning as follows:

  • select:

    Rather than specifying a specific data type, if present, it indicates that the parameter is permitted to take on any of the specified values. Listed values may be of any string or real (non-complex) numeric type.

  • ascii:

    The parameter may be a unicode object that encodes simple ASCII characters. (A-Z, a-z, 0-9, whitespace, and certain common symbols.) In addition to listed constraints, ASCII keys currently may not contain either newlines or semicolons.

  • unicode:

    The parameter may be any Python unicode object.

  • int:

    The value may be an object of type int within the specified range (inclusive). Please note that the value will be passed around using the JSON format, and some JSON parsers have undefined behavior with integers outside of the range [-(2**53)+1, (2**53)-1].

  • float:

    The value may be an object of type float within the specified range (inclusive).

  • intList, floatList:

    The value may be a list of int or float objects, respectively, following constraints as specified respectively by the int and float types (above).

Many parameters only specify one key under constraints. If a parameter specifies multiple keys, the parameter may take on any value permitted by any key.

get_all_confusion_charts()

Retrieve a list of all confusion charts available for the model.

Returns:

list of ConfusionChart

Data for all available confusion charts for model.

get_all_lift_charts()

Retrieve a list of all lift charts available for the model.

Returns:

list of LiftChart

Data for all available model lift charts.

get_all_roc_curves()

Retrieve a list of all ROC curves available for the model.

Returns:

list of RocCurve

Data for all available model ROC curves.

get_confusion_chart(source)

Retrieve model’s confusion chart for the specified source.

Parameters:

source : str

Confusion chart source. Check datarobot.enums.CHART_DATA_SOURCE for possible values.

Returns:

ConfusionChart

Model ConfusionChart data

get_cross_validation_scores(partition=None, metric=None)

Returns a dictionary keyed by metric showing cross validation scores per partition.

Cross Validation should already have been performed using cross_validate or train.

Note

Models that computed cross validation before this feature was added will need to be deleted and retrained before this method can be used.

Parameters:

partition : float

optional, the id of the partition (1, 2, 3.0, 4.0, etc.) to filter results by. Can be a positive whole-number integer or a float value.

metric : unicode

optional, the name of the metric by which to filter the resulting cross validation scores

Returns:

cross_validation_scores: dict

A dictionary keyed by metric showing cross validation scores per partition.

get_feature_impact()

Retrieve the computed Feature Impact results, a measure of the relevance of each feature in the model.

Feature Impact is computed for each column by creating new data with that column randomly permuted (but the others left unchanged), and seeing how the error metric score for the predictions is affected. The ‘impactUnnormalized’ is how much worse the error metric score is when making predictions on this modified data. The ‘impactNormalized’ is normalized so that the largest value is 1. In both cases, larger values indicate more important features.

If a feature is a redundant feature, i.e. once other features are considered it doesn’t contribute much additional information, the ‘redundantWith’ value is the name of the feature that has the highest correlation with this feature. Note that redundancy detection is only available for jobs run after the addition of this feature. When retrieving data that predates this functionality, a NoRedundancyImpactAvailable warning will be issued.

Elsewhere this technique is sometimes called ‘Permutation Importance’.

Requires that Feature Impact has already been computed with request_feature_impact.

Returns:

feature_impacts : list of dict

The feature impact data. Each item is a dict with the keys ‘featureName’, ‘impactNormalized’, ‘impactUnnormalized’, and ‘redundantWith’.

Raises:

ClientError (404)

If the feature impacts have not been computed.

get_features_used()

Query the server to determine which features were used.

Note that the data returned by this method is possibly different from the names of the features in the featurelist used by this model. This method will return the raw features that must be supplied in order for predictions to be generated on a new set of data. The featurelist, in contrast, would also include the names of derived features.

Returns:

features : list of str

The names of the features used in the model.

get_leaderboard_ui_permalink()

Returns:

url : str

Permanent static hyperlink to this model at leaderboard.

get_lift_chart(source)

Retrieve model lift chart for the specified source.

Parameters:

source : str

Lift chart data source. Check datarobot.enums.CHART_DATA_SOURCE for possible values.

Returns:

LiftChart

Model lift chart data

get_missing_report_info()

Retrieve the missing data report on training data, which can be used to understand how missing values are treated in the model. The report consists of missing value reports for the numeric and categorical features that took part in modelling.

Returns:

An iterable of MissingReportPerFeature

The queried model missing report, sorted by missing count (DESCENDING order).

get_model_blueprint_chart()

Retrieve a model blueprint chart that can be used to understand data flow in blueprint.

Returns:

ModelBlueprintChart

The queried model blueprint chart.

get_model_blueprint_documents()

Get documentation for tasks used in this model.

Returns:

list of BlueprintTaskDocument

All documents available for the model.

get_or_request_feature_impact(max_wait=600)

Retrieve feature impact for the model, requesting a job if it hasn’t been run previously

Parameters:

max_wait : int, optional

The maximum time to wait for a requested feature impact job to complete before erroring

Returns:

feature_impacts : list of dict

The feature impact data. See get_feature_impact for the exact schema.

get_parameters()

Retrieve model parameters.

Returns:

ModelParameters

Model parameters for this model.

get_pareto_front()

Retrieve the Pareto Front for a Eureqa model.

This method is only supported for Eureqa models.

Returns:

ParetoFront

Model ParetoFront data

get_prime_eligibility()

Check if this model can be approximated with DataRobot Prime

Returns:

prime_eligibility : dict

a dict indicating whether a model can be approximated with DataRobot Prime (key can_make_prime) and why it may be ineligible (key message)

get_roc_curve(source)

Retrieve model ROC curve for the specified source.

Parameters:

source : str

ROC curve data source. Check datarobot.enums.CHART_DATA_SOURCE for possible values.

Returns:

RocCurve

Model ROC curve data

get_rulesets()

List the rulesets approximating this model generated by DataRobot Prime

If this model hasn’t been approximated yet, this will return an empty list. Note that these are rulesets approximating this model, not rulesets used to construct this model.

Returns:

rulesets : list of Ruleset

get_word_cloud(exclude_stop_words=False)

Retrieve word cloud data for the model.

Parameters:

exclude_stop_words : bool, optional

Set to True if you want stop words filtered out of the response.

Returns:

WordCloud

Word cloud data for the model.

open_model_browser()

Opens model at project leaderboard in web browser.

Note: If text-mode browsers are used, the calling process will block until the user exits the browser.

request_approximation()

Request an approximation of this model using DataRobot Prime

This will create several rulesets that could be used to approximate this model. After comparing their scores and rule counts, the code used in the approximation can be downloaded and run locally.

Returns:

job : Job

the job generating the rulesets
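
A sketch of the approximation flow; the ruleset attribute names are assumed from the Ruleset reference:

prime_job = model.request_approximation()
rulesets = prime_job.get_result_when_complete()
# compare the candidate rulesets by validation score and rule count
for ruleset in rulesets:
    print(ruleset.rule_count, ruleset.score)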

request_feature_impact()

Request feature impacts to be computed for the model.

See get_feature_impact for more information on the result of the job.

Returns:

job : Job

A Job representing the feature impact computation. To get the completed feature impact data, use job.get_result or job.get_result_when_complete.

Raises:

JobAlreadyRequested (422)

If the feature impacts have already been requested.
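
A sketch of requesting the computation and handling a repeat request; JobAlreadyRequested is assumed to be importable from datarobot.errors:

from datarobot.errors import JobAlreadyRequested

try:
    impact_job = model.request_feature_impact()
    feature_impacts = impact_job.get_result_when_complete()
except JobAlreadyRequested:
    # the job already exists, so fall back to the combined helper
    feature_impacts = model.get_or_request_feature_impact()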

request_frozen_datetime_model(training_row_count=None, training_duration=None, training_start_date=None, training_end_date=None, time_window_sample_pct=None)

Train a new frozen model with parameters from this model

Requires that this model belongs to a datetime partitioned project. If it does not, an error will occur when submitting the job.

Frozen models use the same tuning parameters as their parent model instead of independently optimizing them to allow efficiently retraining models on larger amounts of the training data.

In addition to training_row_count and training_duration, frozen datetime models may be trained on an exact date range. Only one of training_row_count, training_duration, or training_start_date and training_end_date should be specified.

Models specified using training_start_date and training_end_date are the only ones that can be trained into the holdout data (once the holdout is unlocked).

Parameters:

training_row_count : int, optional

the number of rows of data that should be used to train the model. If specified, training_duration may not be specified.

training_duration : str, optional

a duration string specifying what time range the data used to train the model should span. If specified, training_row_count may not be specified.

training_start_date : datetime.datetime, optional

the start date of the data to train the model on. Only rows occurring at or after this datetime will be used. If training_start_date is specified, training_end_date must also be specified.

training_end_date : datetime.datetime, optional

the end date of the data to train the model on. Only rows occurring strictly before this datetime will be used. If training_end_date is specified, training_start_date must also be specified.

time_window_sample_pct : int, optional

may only be specified when the requested model is a time window (e.g. duration or start and end dates). An integer between 1 and 99 indicating the percentage to sample by within the window. The points kept are determined by a random uniform sample.

Returns:

model_job : ModelJob

the modeling job training a frozen model
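
For example, a frozen model can be trained on an exact date range (a sketch; the dates are placeholders):

from datetime import datetime

model_job = model.request_frozen_datetime_model(
    training_start_date=datetime(2016, 1, 1),
    training_end_date=datetime(2017, 1, 1),
)
frozen_model = model_job.get_result_when_complete()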

request_frozen_model(sample_pct=None, training_row_count=None)

Train a new frozen model with parameters from this model

Note

This method only works if the project the model belongs to is not datetime partitioned. If it is, use request_frozen_datetime_model instead.

Frozen models use the same tuning parameters as their parent model instead of independently optimizing them to allow efficiently retraining models on larger amounts of the training data.

Parameters:

sample_pct : float

optional, the percentage of the dataset to use with the model. If not provided, will use the value from this model.

training_row_count : int

(New in version v2.9) optional, the integer number of rows of the dataset to use with the model. Only one of sample_pct and training_row_count should be specified.

Returns:

model_job : ModelJob

the modeling job training a frozen model
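
For example, retraining with the parent model's tuning parameters on 80% of the data (a sketch):

model_job = model.request_frozen_model(sample_pct=80)
frozen_model = model_job.get_result_when_complete()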

request_predictions(dataset_id)

Request predictions against a previously uploaded dataset

Parameters:

dataset_id : string

The dataset to make predictions against (as uploaded from Project.upload_dataset)

Returns:

job : PredictJob

The job computing the predictions
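
A sketch of the upload-then-predict flow; the job result is assumed to be a DataFrame of predictions:

project = datarobot.Project.get('p-id')
dataset = project.upload_dataset('./data_to_predict.csv')
predict_job = model.request_predictions(dataset.id)
predictions = predict_job.get_result_when_complete()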

request_training_predictions(data_subset)

Start a job to build training predictions

Parameters:

data_subset : str

data set definition to build predictions on. Choices are:

  • dr.enums.DATA_SUBSET.ALL for all data available
  • dr.enums.DATA_SUBSET.VALIDATION_AND_HOLDOUT for all data except training set
  • dr.enums.DATA_SUBSET.HOLDOUT for holdout data set only
Returns:

Job

an instance of the created async job
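
A hedged sketch; the TrainingPredictions result object and its get_all_as_dataframe method are assumed from the training predictions reference:

import datarobot as dr

job = model.request_training_predictions(dr.enums.DATA_SUBSET.HOLDOUT)
training_predictions = job.get_result_when_complete()
df = training_predictions.get_all_as_dataframe()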

request_transferable_export()

Request generation of an exportable model file for use in an on-premise DataRobot standalone prediction environment.

This function can only be used if model export is enabled, and will only be useful if you have an on-premise environment in which to import it.

This function does not download the exported file. Use download_export for that.

Examples

model = datarobot.Model.get('p-id', 'l-id')
job = model.request_transferable_export()
job.wait_for_completion()
model.download_export('my_exported_model.drmodel')

# Client must be configured to use standalone prediction server for import:
datarobot.Client(token='my-token-at-standalone-server',
                 endpoint='standalone-server-url/api/v2')

imported_model = datarobot.ImportedModel.create('my_exported_model.drmodel')

set_prediction_threshold(threshold)

Set a custom prediction threshold for the model

May not be used once prediction_threshold_read_only is True for this model.

Parameters:

threshold : float

Only used for binary classification projects. The threshold to use when deciding between the positive and negative classes when making predictions. Should be between 0.0 and 1.0 (inclusive).
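
For example (a sketch; 0.6 is an arbitrary threshold):

if not model.prediction_threshold_read_only:
    model.set_prediction_threshold(0.6)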

star_model()

Mark the model as starred

Model stars propagate to the web application and the API, and can be used to filter when listing models.

start_advanced_tuning_session()

Start an Advanced Tuning session. Returns an object that helps set up arguments for an Advanced Tuning model execution.

As of v2.13, only Eureqa blueprints (blueprints whose title starts with ‘Eureqa’) support Advanced Tuning.

Returns:

AdvancedTuningSession

Session for setting up and running Advanced Tuning on a model
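
For example, a session can be started and inspected before any parameters are set (a sketch):

session = model.start_advanced_tuning_session()
print(session.get_task_names())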

train(sample_pct=None, featurelist_id=None, scoring_type=None, training_row_count=None, monotonic_increasing_featurelist_id=<object object>, monotonic_decreasing_featurelist_id=<object object>)

Train the blueprint used in model on a particular featurelist or amount of data.

This method creates a new training job for a worker and appends it to the end of the queue for this project. After the job has finished you can get the newly trained model by retrieving it from the project leaderboard, or by retrieving the result of the job.

Either sample_pct or training_row_count can be used to specify the amount of data to use, but not both. If neither are specified, a default of the maximum amount of data that can safely be used to train any blueprint without going into the validation data will be selected.

In smart-sampled projects, sample_pct and training_row_count are assumed to be in terms of rows of the minority class.

Note

For datetime partitioned projects, use train_datetime instead.

Parameters:

sample_pct : float, optional

The amount of data to use for training, as a percentage of the project dataset from 0 to 100.

featurelist_id : str, optional

The identifier of the featurelist to use. If not defined, the featurelist of this model is used.

scoring_type : str, optional

Either SCORING_TYPE.validation or SCORING_TYPE.cross_validation. SCORING_TYPE.validation is available for every partitioning type, and indicates that the default model validation should be used for the project. If the project uses a form of cross-validation partitioning, SCORING_TYPE.cross_validation can also be used to indicate that all of the available training/validation combinations should be used to evaluate the model.

training_row_count : int, optional

The number of rows to use to train the requested model.

monotonic_increasing_featurelist_id : str

(new in version 2.11) optional, the id of the featurelist that defines the set of features with a monotonically increasing relationship to the target. Passing None disables the increasing monotonicity constraint. The default (dr.enums.MONOTONICITY_FEATURELIST_DEFAULT) is the featurelist specified by the blueprint.

monotonic_decreasing_featurelist_id : str

(new in version 2.11) optional, the id of the featurelist that defines the set of features with a monotonically decreasing relationship to the target. Passing None disables the decreasing monotonicity constraint. The default (dr.enums.MONOTONICITY_FEATURELIST_DEFAULT) is the featurelist specified by the blueprint.

Returns:

model_job_id : str

id of the created job; can be used as a parameter to the ModelJob.get method or to the wait_for_async_model_creation function

Examples

project = Project.get('p-id')
model = Model.get('p-id', 'l-id')
model_job_id = model.train(training_row_count=project.max_train_rows)

train_datetime(featurelist_id=None, training_row_count=None, training_duration=None, time_window_sample_pct=None)

Train this model on a different featurelist or amount of data

Requires that this model is part of a datetime partitioned project; otherwise, an error will occur.

Parameters:

featurelist_id : str, optional

the featurelist to use to train the model. If not specified, the featurelist of this model is used.

training_row_count : int, optional

the number of rows of data that should be used to train the model. If specified, training_duration may not be specified.

training_duration : str, optional

a duration string specifying what time range the data used to train the model should span. If specified, training_row_count may not be specified.

time_window_sample_pct : int, optional

may only be specified when the requested model is a time window (e.g. duration or start and end dates). An integer between 1 and 99 indicating the percentage to sample by within the window. The points kept are determined by a random uniform sample.

Returns:

job : ModelJob

the created job to build the model
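
For example, retraining on the most recent 100 days of data (a sketch; the duration string is a placeholder):

model_job = model.train_datetime(training_duration='P100D')
new_model = model_job.get_result_when_complete()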

unstar_model()

Unmark the model as starred

Model stars propagate to the web application and the API, and can be used to filter when listing models.

Advanced Tuning API

class datarobot.models.advanced_tuning.AdvancedTuningSession(model)

A session enabling users to configure and run advanced tuning for a model.

Every model contains a set of one or more tasks. Every task contains a set of zero or more parameters. This class allows tuning the values of each parameter on each task of a model, before running that model.

This session is client-side only and is not persistent. Only the final model, constructed when run is called, is persisted on the DataRobot server.

Attributes

description (basestring) Description for the new advanced-tuned model. Defaults to the same description as the base model.

get_task_names()

Get the list of task names that are available for this model

Returns:

list(basestring)

List of task names

get_parameter_names(task_name)

Get the list of parameter names available for a specific task

Returns:

list(basestring)

List of parameter names

set_parameter(value, task_name=None, parameter_name=None, parameter_id=None)

Set the value of a parameter to be used

The caller must supply enough of the optional arguments to uniquely identify the parameter being set. For example, a less common parameter name such as ‘building_block__complementary_error_function’ might be used by at most one task in a model, in which case specifying ‘parameter_name’ alone is sufficient. A more common name such as ‘random_seed’, however, might be used by several of the model’s tasks, so ‘task_name’ must also be specified to clarify which task’s random seed is to be set.

Parameters:

task_name : basestring

Name of the task whose parameter needs to be set

parameter_name : basestring

Name of the parameter to set

parameter_id : basestring

ID of the parameter to set

value : int, float, list, or basestring

New value for the parameter, with legal values determined by the parameter being set

Raises:

NoParametersFoundException

if no matching parameters are found.

NonUniqueParametersException

if multiple parameters matched the specified filtering criteria

get_parameters()

Returns the set of parameters available to this model

The returned parameters have one additional key, “value”, reflecting any new values that have been set in this AdvancedTuningSession. When the session is run, “value” will be used, or if it is unset, “current_value”.

Returns:

parameters : dict

“Parameters” dictionary, same as specified on Model.get_advanced_tuning_params.

An additional field is added per parameter to the ‘tuningParameters’ list in the dictionary:

value : int, float, list, or basestring

The current value of the parameter. None if none has been specified.

run()

Submit this model for Advanced Tuning.

Returns:

datarobot.models.modeljob.ModelJob

The created job to build the model
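
A sketch of a complete tuning session; ‘random_seed’ is an illustrative parameter name, and real names should be taken from get_parameter_names:

session = model.start_advanced_tuning_session()
# set a single parameter by name, then submit the tuned model for training
session.set_parameter(value=42, parameter_name='random_seed')
model_job = session.run()
tuned_model = model_job.get_result_when_complete()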