Project API

class datarobot.models.Project(id=None, project_name=None, mode=None, target=None, target_type=None, holdout_unlocked=None, metric=None, stage=None, partition=None, positive_class=None, created=None, advanced_options=None, recommender=None, max_train_pct=None, file_name=None)

A project built from a particular training dataset

Attributes

id (str) the id of the project
project_name (str) the name of the project
mode (int) the autopilot mode currently selected for the project - 0 for full autopilot, 1 for semi-automatic, and 2 for manual
target (str) the name of the selected target feature
target_type (str) indicates what kind of modeling is being done in this project. Options are: 'Regression', 'Binary' (binary classification), 'Multiclass' (multiclass classification)
holdout_unlocked (bool) whether the holdout has been unlocked
metric (str) the selected project metric (e.g. LogLoss)
stage (str) the stage the project has reached - one of datarobot.enums.PROJECT_STAGE
partition (dict) information about the selected partitioning options
positive_class (str) for binary classification projects, the selected positive class; otherwise, None
created (datetime) the time the project was created
advanced_options (dict) information on the advanced options that were selected for the project settings, e.g. a weights column or a cap on the runtime of models that can advance autopilot stages
recommender (dict) information on the recommender settings of the project (i.e. whether it is a recommender project, or the id columns)
max_train_pct (float) the maximum percentage of the training dataset that can be used without going into the validation set
file_name (str) the name of the file uploaded for the project dataset
classmethod get(project_id)

Gets information about a project.

Parameters:

project_id : str

The identifier of the project you want to load.

Returns:

project : Project

The queried project

Examples

import datarobot as dr
p = dr.Project.get(project_id='54e639a18bd88f08078ca831')
p.id
>>>'54e639a18bd88f08078ca831'
p.project_name
>>>'Some project name'
classmethod create(sourcedata, project_name='Untitled Project', max_wait=600, read_timeout=600)

Creates a project with provided data.

Project creation is an asynchronous process: after the initial request, the client keeps polling the status of the async process responsible for project creation until it finishes. For SDK users, this only means that this method might raise exceptions related to its async nature.

Parameters:

sourcedata : basestring, file or pandas.DataFrame

Data to be used to create the project. If a string, it can be either a path to a local file, a URL to a publicly available file, or raw file content. If using a file, the filename must consist of ASCII characters only.

project_name : str, unicode, optional

The name to assign to the empty project.

max_wait : int, optional

Time in seconds after which project creation is considered unsuccessful

read_timeout : int, optional

The maximum number of seconds to wait for the server to respond indicating that the initial upload is complete

Returns:

project : Project

Instance with initialized data.

Raises:

InputNotUnderstoodError

Raised if sourcedata isn't one of the supported types.

AsyncFailureError

Polling for status of async process resulted in response with unsupported status code. Beginning in version 2.1, this will be ProjectAsyncFailureError, a subclass of AsyncFailureError

AsyncProcessUnsuccessfulError

Raised if project creation was unsuccessful

AsyncTimeoutError

Raised if project creation took more time than specified by the max_wait parameter

Examples

p = Project.create('/home/datasets/somedataset.csv',
                   project_name="New API project")
p.id
>>> '5921731dkqshda8yd28h'
p.project_name
>>> 'New API project'
classmethod encrypted_string(plaintext)

Sends a string to DataRobot to be encrypted

This is used for passwords that DataRobot uses to access external data sources

Parameters:

plaintext : str

The string to encrypt

Returns:

ciphertext : str

The encrypted string

classmethod create_from_mysql(server, database, table, user, port=None, prefetch=None, project_name=None, password=None, encrypted_password=None, max_wait=600)

Create a project from a MySQL table

Parameters:

server : str

The address of the MySQL server

database : str

The name of the database to use

table : str

The name of the table to fetch

user : str

The username to use to access the database

port : int, optional

The port to reach the MySQL server. If not specified, will use the default specified by DataRobot (3306).

prefetch : int, optional

The number of rows to stream at a time from the database. If not specified, all results are fetched at once. This is an optimization for reading from the database.

project_name : str, optional

A name to give to the project

password : str, optional

The plaintext password for this user. Will be first encrypted with DataRobot. Only use this _or_ encrypted_password, not both.

encrypted_password : str, optional

The encrypted password for this user. Will be sent directly to DataRobot. Only use this _or_ password, not both.

max_wait : int

The maximum number of seconds to wait before giving up.

Returns:

Project

Raises:

ValueError

If both password and encrypted_password were used.
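The rule that password and encrypted_password are mutually exclusive can be pictured with a minimal sketch. This is an illustration only, not the SDK's code: resolve_encrypted_password and fake_encrypt are hypothetical names, and the encrypt argument stands in for whatever performs the encryption (in real use, presumably Project.encrypted_string as documented above). The sketch simply returns None when neither argument is given, which is an assumption.

```python
def resolve_encrypted_password(password=None, encrypted_password=None,
                               encrypt=None):
    """Return the ciphertext to send, or None when no password is given.

    encrypt is a hypothetical callable that turns plaintext into ciphertext.
    """
    if password is not None and encrypted_password is not None:
        raise ValueError("pass password or encrypted_password, not both")
    if password is not None:
        # A plaintext password is encrypted first.
        return encrypt(password)
    # An already encrypted password is passed through unchanged.
    return encrypted_password


def fake_encrypt(plaintext):
    """Placeholder only; this is not real encryption."""
    return "enc:" + plaintext[::-1]


from_plain = resolve_encrypted_password(password="secret", encrypt=fake_encrypt)
from_cipher = resolve_encrypted_password(encrypted_password="enc:abc")

try:
    resolve_encrypted_password(password="secret", encrypted_password="enc:abc",
                               encrypt=fake_encrypt)
except ValueError as exc:
    error_message = str(exc)
```

Checking the arguments before any network call means a mistaken invocation fails fast and never sends credentials.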

classmethod create_from_oracle(dbq, username, table, fetch_buffer_size=None, project_name=None, password=None, encrypted_password=None, max_wait=600)

Create a project from an Oracle table

Parameters:

dbq : str

tnsnames.ora entry in host:port/sid format

table : str

The name of the table to fetch

username : str

The username to use to access the database

fetch_buffer_size : int, optional

The size of the buffer that will be used to stream data from the database. If not specified, the DataRobot default value is used.

project_name : str, optional

A name to give to the project

password : str, optional

The plaintext password for this user. Will be first encrypted with DataRobot. Only use this _or_ encrypted_password, not both.

encrypted_password : str, optional

The encrypted password for this user. Will be sent directly to DataRobot. Only use this _or_ password, not both.

max_wait : int

The maximum number of seconds to wait before giving up.

Returns:

Project

Raises:

ValueError

If both password and encrypted_password were used.

classmethod create_from_postgresql(server, database, table, username, port=None, driver=None, fetch=None, use_declare_fetch=None, project_name=None, password=None, encrypted_password=None, max_wait=600)

Create a project from a PostgreSQL table

Parameters:

server : str

The address of the PostgreSQL server

database : str

The name of the database to use

table : str

The name of the table to fetch

username : str

The username to use to access the database

port : int, optional

The port to reach the PostgreSQL server. If not specified, will use the default specified by DataRobot (5432).

driver : str, optional

Specify ODBC driver to use. If not specified - use DataRobot default. See the values within datarobot.enums.POSTGRESQL_DRIVER

fetch : int, optional

The number of rows to stream at a time from the database. If not specified, the DataRobot default value is used.

use_declare_fetch : bool, optional

On True, server will fetch result as available using DB cursor. On False it will try to retrieve entire result set - not recommended for big tables. If not specified - use the default specified by DataRobot.

project_name : str, optional

A name to give to the project

password : str, optional

The plaintext password for this user. Will be first encrypted with DataRobot. Only use this _or_ encrypted_password, not both.

encrypted_password : str, optional

The encrypted password for this user. Will be sent directly to DataRobot. Only use this _or_ password, not both.

max_wait : int

The maximum number of seconds to wait before giving up.

Returns:

Project

Raises:

ValueError

If both password and encrypted_password were used.

classmethod create_from_hdfs(url, port=None, project_name=None, max_wait=600)

Create a project from a datasource on a WebHDFS server.

Parameters:

url : str

The location of the WebHDFS file, both server and full path. Per the DataRobot specification, must begin with hdfs://

port : int, optional

The port to use. If not specified, will default to the server default (50070)

project_name : str, optional

A name to give to the project

max_wait : int

The maximum number of seconds to wait before giving up.

Returns:

Project

classmethod from_async(async_location, max_wait=600)

Given a temporary async status location, poll for no more than max_wait seconds until the async process (project creation or setting the target, for example) finishes successfully, then return the ready project.

Parameters:

async_location : str

The URL for the temporary async status resource. This is returned as a header in the response to a request that initiates an async process

max_wait : int

The maximum number of seconds to wait before giving up.

Returns:

project : Project

The project, now ready

Raises:

ProjectAsyncFailureError

If the server returned an unexpected response while polling for the asynchronous operation to resolve

AsyncProcessUnsuccessfulError

If the final result of the asynchronous operation was a failure

AsyncTimeoutError

If the asynchronous operation did not resolve within the time specified
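The polling behaviour described for this method can be pictured with a minimal sketch. It is not the SDK's implementation: poll_until_ready, check_status, and the interval argument are hypothetical names, the "running" / "done" / "failed" states are invented for the illustration, and the two exception classes are local stand-ins that only borrow the names listed under Raises.

```python
import time


class AsyncProcessUnsuccessfulError(Exception):
    """Local stand-in for the SDK exception of the same name."""


class AsyncTimeoutError(Exception):
    """Local stand-in for the SDK exception of the same name."""


def poll_until_ready(check_status, max_wait=600, interval=1.0,
                     clock=time.monotonic, sleep=time.sleep):
    """Call check_status() until it reports completion or max_wait elapses.

    check_status is a hypothetical callable returning a (state, result) pair,
    where state is "running", "done", or "failed".
    """
    deadline = clock() + max_wait
    while True:
        state, result = check_status()
        if state == "done":
            return result
        if state == "failed":
            raise AsyncProcessUnsuccessfulError(result)
        if clock() >= deadline:
            raise AsyncTimeoutError(
                "not finished after {} seconds".format(max_wait))
        sleep(interval)


# Simulated status resource: running twice, then done.
responses = iter([("running", None), ("running", None), ("done", "ready")])
outcome = poll_until_ready(lambda: next(responses), max_wait=5,
                           sleep=lambda seconds: None)

# With max_wait=0 the first unfinished check already reaches the deadline.
try:
    poll_until_ready(lambda: ("running", None), max_wait=0,
                     sleep=lambda seconds: None)
except AsyncTimeoutError as exc:
    timeout_message = str(exc)
```

The clock and sleep arguments exist only so the loop can be exercised without real waiting.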

classmethod start(sourcedata, target, project_name='Untitled Project', worker_count=None, metric=None, autopilot_on=True, blueprint_threshold=None, response_cap=None, partitioning_method=None, positive_class=None, target_type=None)

Chain together project creation, file upload, and target selection.

Parameters:

sourcedata : str or pandas.DataFrame

The path to the file to upload. Can be either a path to a local file or a publicly accessible URL. If the source is a DataFrame, it will be serialized to a temporary buffer. If using a file, the filename must consist of ASCII characters only.

target : str

The name of the target column in the uploaded file.

project_name : str

The project name.

Returns:

project : Project

The newly created and initialized project.

Other Parameters:
 

worker_count : int, optional

The number of workers that you want to allocate to this project.

metric : str, optional

The name of metric to use.

autopilot_on : boolean, default True

Whether or not to begin modeling automatically.

blueprint_threshold : int, optional

Number of hours the model is permitted to run. Minimum 1

response_cap : float, optional

Quantile of the response distribution to use for response capping. Must be in range 0.5 .. 1.0

partitioning_method : PartitioningMethod object, optional

It should be a PartitioningMethod object.

positive_class : str, float, or int; optional

Specifies a level of the target column that should be used for binary classification. Can be used to specify any of the available levels as the binary target - all other levels will be treated as a single negative class.

target_type : str, optional

Override the automatically selected target_type. An example usage would be setting target_type='Multiclass' when you want to perform a multiclass classification task on a numeric column that has a low cardinality. You can use the TARGET_TYPE enum.

Raises:

AsyncFailureError

Polling for status of async process resulted in response with unsupported status code

AsyncProcessUnsuccessfulError

Raised if project creation or target setting was unsuccessful

AsyncTimeoutError

Raised if project creation or target setting timed out

Examples

Project.start("./tests/fixtures/file.csv",
              "a_target",
              project_name="test_name",
              worker_count=4,
              metric="a_metric")
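The effect of positive_class described above (one chosen level becomes the positive class, every other level is folded into a single negative class) can be illustrated with plain Python. This is only a picture of the documented semantics, not how DataRobot processes the target; binarize_target and the sample levels are hypothetical.

```python
def binarize_target(values, positive_class):
    """Map each target level to True (positive) or False (negative)."""
    return [value == positive_class for value in values]


# A target column with three levels; "high" is chosen as the positive class.
target = ["low", "medium", "high", "medium"]
labels = binarize_target(target, positive_class="high")
print(labels)  # [False, False, True, False]
```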
classmethod list(search_params=None)

Returns the projects associated with this account.

Parameters:

search_params : dict, optional.

If not None, the returned projects are filtered by lookup. Currently you can query projects by:

  • project_name
Returns:

projects : list of Project instances

Contains a list of projects associated with this user account.

Raises:

TypeError

Raised if search_params parameter is provided, but is not of supported type.

Examples

List all projects

p_list = Project.list()
p_list
>>> [Project('Project One'), Project('Two')]

Search for projects by name

Project.list(search_params={'project_name': 'red'})
>>> [Project('Predtime'), Project('Fred Project')]
refresh()

Fetches the latest state of the project, and updates this object with that information. This is an in-place update, not a new object.

Returns:

self : Project

the now-updated project

delete()

Removes this project from your account.

set_target(target, mode='auto', metric=None, quickrun=None, worker_count=None, positive_class=None, partitioning_method=None, featurelist_id=None, advanced_options=None, max_wait=600, target_type=None)

Set target variable of an existing project that has a file uploaded to it.

Target setting is an asynchronous process: after the initial request, the client keeps polling the status of the async process responsible for target setting until it finishes. For SDK users, this only means that this method might raise exceptions related to its async nature.

Parameters:

target : str

Name of variable.

mode : str, optional

You can use AUTOPILOT_MODE enum to choose between

  • AUTOPILOT_MODE.FULL_AUTO
  • AUTOPILOT_MODE.MANUAL
  • AUTOPILOT_MODE.QUICK

If unspecified, FULL_AUTO is used

metric : str, optional

Name of the metric to use for evaluating models. You can query the metrics available for the target by way of Project.get_metrics. If none is specified, then the default recommended by DataRobot is used.

quickrun : bool, optional

Deprecated - pass AUTOPILOT_MODE.QUICK as mode instead. Sets whether project should be run in quick run mode. This setting causes DataRobot to recommend a more limited set of models in order to get a base set of models and insights more quickly.

worker_count : int, optional

The number of concurrent workers to request for this project. If None, then the default is used

partitioning_method : PartitioningMethod object, optional

It should be a PartitioningMethod object.

positive_class : str, float, or int; optional

Specifies a level of the target column that should be treated as the positive class for binary classification. May only be specified for binary classification targets.

featurelist_id : str, optional

Specifies which feature list to use.

advanced_options : AdvancedOptions, optional

Used to set advanced options of project creation.

max_wait : int, optional

Time in seconds after which target setting is considered unsuccessful.

target_type : str, optional

Override the automatically selected target_type. An example usage would be setting target_type='Multiclass' when you want to perform a multiclass classification task on a numeric column that has a low cardinality. You can use the TARGET_TYPE enum.

Returns:

project : Project

The instance with updated attributes.

Raises:

AsyncFailureError

Polling for status of async process resulted in response with unsupported status code

AsyncProcessUnsuccessfulError

Raised if target setting was unsuccessful

AsyncTimeoutError

Raised if target setting took more time than specified by the max_wait parameter

TypeError

Raised if advanced_options, partitioning_method or target_type is provided, but is not of supported type

See also

Project.start
combines project creation, file upload, and target selection
get_models(order_by=None, search_params=None, with_metric=None)

List all completed, successful models in the leaderboard for the given project.

Parameters:

order_by : str or list of strings, optional

If not None, the returned models are ordered by this attribute. If None, the default return is the order of default project metric.

Allowed attributes to sort by are:

  • metric
  • sample_pct

If the sort attribute is preceded by a hyphen, models will be sorted in descending order, otherwise in ascending order.

Multiple sort attributes can be included as a comma-delimited string or in a list, e.g. order_by='sample_pct,-metric' or order_by=['sample_pct', '-metric']

Sorting by metric ranks models by their validation score, according to how well they did on the project metric.

search_params : dict, optional.

If not None, the returned models are filtered by lookup. Currently you can query models by:

  • name
  • sample_pct

with_metric : str, optional.

If not None, the returned models will only have scores for this metric. Otherwise all the metrics are returned.

Returns:

models : a list of Model instances.

All of the models that have been trained in this project.

Raises:

TypeError

Raised if order_by or search_params parameter is provided, but is not of supported type.

Examples

Project.get('pid').get_models(order_by=['-sample_pct',
                              'metric'])

# Getting models that contain "Ridge" in name
# and with sample_pct more than 64
Project.get('pid').get_models(
    search_params={
        'sample_pct__gt': 64,
        'name': "Ridge"
    })
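The order_by rules described under Parameters (a hyphen prefix for descending order, given either as a comma-delimited string or as a list) can be sketched as a small parser plus a local sort. This is an illustration under assumptions, not the SDK's or the server's code: parse_order_by and sort_models are hypothetical helpers, the models are plain dicts, and metric is sorted by raw value here, whereas the real sort ranks by how well models did on the project metric.

```python
def parse_order_by(order_by):
    """Turn an order_by value into a list of (attribute, descending) pairs."""
    if isinstance(order_by, str):
        order_by = order_by.split(",")
    parsed = []
    for item in order_by:
        item = item.strip()
        descending = item.startswith("-")
        parsed.append((item.lstrip("-"), descending))
    return parsed


def sort_models(models, order_by):
    """Sort dicts by several keys, relying on the stability of list.sort."""
    ordered = list(models)
    # Apply the least significant key first so earlier keys take priority.
    for attribute, descending in reversed(parse_order_by(order_by)):
        ordered.sort(key=lambda m: m[attribute], reverse=descending)
    return ordered


models = [
    {"name": "A", "sample_pct": 64, "metric": 0.30},
    {"name": "B", "sample_pct": 80, "metric": 0.25},
    {"name": "C", "sample_pct": 64, "metric": 0.20},
]
names = [m["name"] for m in sort_models(models, "-sample_pct,metric")]
print(names)  # ['B', 'C', 'A']
```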
get_datetime_models()

List all models in the project as DatetimeModels

Requires the project to be datetime partitioned. If it is not, a ClientError will occur.

Returns:

models : list of DatetimeModel

the datetime models

get_prime_models()

List all DataRobot Prime models for the project. Prime models were created to approximate a parent model, and have downloadable code.

Returns:

models : list of PrimeModel
get_prime_files(parent_model_id=None, model_id=None)

List all downloadable code files from DataRobot Prime for the project

Parameters:

parent_model_id : str, optional

Filter for only those prime files approximating this parent model

model_id : str, optional

Filter for only those prime files with code for this prime model

Returns:

files : list of PrimeFile

get_datasets()

List all the datasets that have been uploaded for predictions

Returns:

datasets : list of PredictionDataset instances
upload_dataset(sourcedata, max_wait=600, read_timeout=600, forecast_point=None)

Upload a new dataset to make predictions against

Parameters:

sourcedata : str, file or pandas.DataFrame

Data to be used for predictions. If a string, it can be either a path to a local file, a URL to a publicly available file, or raw file content. If using a file on disk, the filename must consist of ASCII characters only.

max_wait : int, optional

The maximum number of seconds to wait for the uploaded dataset to be processed before raising an error

read_timeout : int, optional

The maximum number of seconds to wait for the server to respond indicating that the initial upload is complete

forecast_point : datetime.datetime or None, optional

(New in version v2.8) May only be specified for time series projects, otherwise the upload will be rejected. The time in the dataset relative to which predictions should be generated in a time series project. See the time series documentation for more information.

Returns:

dataset : PredictionDataset

the newly uploaded dataset

Raises:

InputNotUnderstoodError

Raised if sourcedata isn't one of the supported types.

AsyncFailureError

Polling for status of async process resulted in response with unsupported status code

AsyncProcessUnsuccessfulError

Raised if the dataset upload was unsuccessful (i.e. the server reported an error in uploading the dataset)

AsyncTimeoutError

Raised if processing the uploaded dataset took more time than specified by max_wait parameter

ValueError

Raised if forecast_point is provided, but is not of supported type

get_blueprints()

List all blueprints recommended for a project.

Returns:

menu : list of Blueprint instances

All the blueprints recommended by DataRobot for a project

get_features()

List all features for this project

Returns:

list of Feature

all features for this project

get_modeling_features(batch_size=None)

List all modeling features for this project

Only available once the target and partitioning settings have been set. For more information on the distinction between input and modeling features, see the time series documentation.

Parameters:

batch_size : int, optional

The number of features to retrieve in a single API call. If specified, the client may make multiple calls to retrieve the full list of features. If not specified, an appropriate default will be chosen by the server.

Returns:

list of ModelingFeature

All modeling features in this project
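How a batch_size can lead to several API calls may be easier to see in a small sketch. The paging scheme below (an offset plus a limit, stopping at the first short page) is an assumption made for illustration; the actual request parameters used by the client are not documented here, and fetch_all and fake_fetch_page are hypothetical names.

```python
def fetch_all(fetch_page, batch_size):
    """Collect every item by requesting batch_size items at a time."""
    items = []
    offset = 0
    while True:
        page = fetch_page(offset, batch_size)
        items.extend(page)
        if len(page) < batch_size:
            # A short (or empty) page means there is nothing left to fetch.
            return items
        offset += batch_size


# Stand-in for the server: seven features, served in slices.
all_features = ["feature_{}".format(i) for i in range(7)]
requested_offsets = []


def fake_fetch_page(offset, limit):
    requested_offsets.append(offset)
    return all_features[offset:offset + limit]


result = fetch_all(fake_fetch_page, batch_size=3)
print(requested_offsets)  # [0, 3, 6]
```

A smaller batch_size trades more round trips for smaller individual responses.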

get_featurelists()

List all featurelists created for this project

Returns:

list of Featurelist

all featurelists created for this project

get_modeling_featurelists(batch_size=None)

List all modeling featurelists created for this project

Modeling featurelists can only be created after the target and partitioning options have been set for a project. In time series projects, these are the featurelists that can be used for modeling; in other projects, they behave the same as regular featurelists.

See the time series documentation for more information.

Parameters:

batch_size : int, optional

The number of featurelists to retrieve in a single API call. If specified, the client may make multiple calls to retrieve the full list of featurelists. If not specified, an appropriate default will be chosen by the server.

Returns:

list of ModelingFeaturelist

all modeling featurelists in this project

create_type_transform_feature(name, parent_name, variable_type, replacement=None, date_extraction=None, max_wait=600)

Create a new feature by transforming the type of an existing feature in the project

Note that only the following transformations are supported:

  1. Text to categorical or numeric
  2. Categorical to text or numeric
  3. Numeric to categorical
  4. Date to categorical or numeric

Note

Special considerations when casting numeric to categorical

There are two values which can be used for variable_type to convert numeric data to categorical levels. These differ in the assumptions they make about the input data, and are very important when considering the data that will be used to make predictions. The assumptions that each makes are:

  • categorical : The data in the column is all integral, and there are no missing values. If either of these conditions does not hold in the training set, the transformation will be rejected. During predictions, if any of the values in the parent column are missing, the predictions will error.
  • categoricalInt : (New in v2.6) All of the data in the column should be considered categorical in its string form when cast to an int by truncation. For example, the value 3 will be cast as the string 3 and the value 3.14 will also be cast as the string 3. Further, the value -3.6 will become the string -3. Missing values will still be recognized as missing.

For convenience these are represented in the enum VARIABLE_TYPE_TRANSFORM with the names CATEGORICAL and CATEGORICAL_INT
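The truncation rule given for categoricalInt can be reproduced in plain Python, whose int() also truncates toward zero. This is only an illustration of the stated examples, not DataRobot's implementation; categorical_int_level is a hypothetical helper, and treating None and NaN as "missing" is an assumption.

```python
import math


def categorical_int_level(value):
    """Return the string level for a numeric value, or None if missing."""
    if value is None or (isinstance(value, float) and math.isnan(value)):
        return None  # missing values stay missing
    return str(int(value))  # int() truncates toward zero


levels = [categorical_int_level(v) for v in (3, 3.14, -3.6, float("nan"))]
print(levels)  # ['3', '3', '-3', None]
```

Note that truncation differs from rounding and from flooring: -3.6 becomes -3, not -4.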

Parameters:

name : str

The name to give to the new feature

parent_name : str

The name of the feature to transform

variable_type : str

The type the new column should have. See the values within datarobot.enums.VARIABLE_TYPE_TRANSFORM

replacement : str or float, optional

The value that missing or unconvertible data should have

date_extraction : str, optional

Must be specified when parent_name is a date column (and left None otherwise). Specifies which value from a date should be extracted. See the list of values in datarobot.enums.DATE_EXTRACTION

max_wait : int, optional

The maximum amount of time to wait for DataRobot to finish processing the new column. This process can take more time with more data to process. If this operation times out, an AsyncTimeoutError will occur. DataRobot continues the processing and the new column may successfully be constructed.

Returns:

Feature

The data of the new Feature

Raises:

AsyncFailureError

If any of the responses from the server are unexpected

AsyncProcessUnsuccessfulError

If the job being waited for has failed or has been cancelled

AsyncTimeoutError

If the resource did not resolve in time

create_featurelist(name, features)

Creates a new featurelist

Parameters:

name : str

The name to give to this new featurelist. Names must be unique, so an error will be returned from the server if this name has already been used in this project.

features : list of str

The names of the features. Each feature must exist in the project already.

Returns:

Featurelist

newly created featurelist

Raises:

DuplicateFeaturesError

Raised if features variable contains duplicate features

Examples

project = Project.get('5223deadbeefdeadbeef0101')
flists = project.get_featurelists()

# Create a new featurelist using a subset of features from an
# existing featurelist
flist = flists[0]
features = flist.features[::2]  # Half of the features

new_flist = project.create_featurelist(name='Feature Subset',
                                       features=features)
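Since a features list containing duplicates raises DuplicateFeaturesError, it can help to check or clean the list before calling create_featurelist. The helpers below are a hypothetical, SDK-independent sketch; the feature names are made up.

```python
from collections import Counter


def find_duplicate_features(features):
    """Return the feature names that appear more than once, sorted."""
    counts = Counter(features)
    return sorted(name for name, count in counts.items() if count > 1)


def dedupe_preserving_order(features):
    """Drop repeated names while keeping the first occurrence of each."""
    return list(dict.fromkeys(features))


candidate = ["age", "income", "age", "zip_code", "income"]
duplicates = find_duplicate_features(candidate)
cleaned = dedupe_preserving_order(candidate)
print(duplicates)  # ['age', 'income']
print(cleaned)     # ['age', 'income', 'zip_code']
```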
create_modeling_featurelist(name, features)

Create a new modeling featurelist

Modeling featurelists can only be created after the target and partitioning options have been set for a project. In time series projects, these are the featurelists that can be used for modeling; in other projects, they behave the same as regular featurelists.

See the time series documentation for more information.

Parameters:

name : str

the name of the modeling featurelist to create. Names must be unique within the project, or the server will return an error.

features : list of str

the names of the features to include in the modeling featurelist. Each feature must be a modeling feature.

Returns:

featurelist : ModelingFeaturelist

the newly created featurelist

Examples

project = Project.get('1234deadbeeffeeddead4321')
modeling_features = project.get_modeling_features()
selected_features = [feat.name for feat in modeling_features][:5]  # select first five
new_flist = project.create_modeling_featurelist('Model This', selected_features)
get_metrics(feature_name)

Get the metrics recommended for modeling on the given feature.

Parameters:

feature_name : str

The name of the feature to query regarding which metrics are recommended for modeling.

Returns:

names : list of str

The names of the recommended metrics.

get_status()

Query the server for project status.

Returns:

status : dict

Contains:

  • autopilot_done : a boolean.
  • stage : a short string indicating which stage the project is in.
  • stage_description : a description of what stage means.

Examples

{"autopilot_done": False,
 "stage": "modeling",
 "stage_description": "Ready for modeling"}
pause_autopilot()

Pause autopilot, which stops processing the next jobs in the queue.

Returns:

paused : boolean

Whether the command was acknowledged

unpause_autopilot()

Unpause autopilot, which restarts processing the next jobs in the queue.

Returns:

unpaused : boolean

Whether the command was acknowledged.

start_autopilot(featurelist_id)

Starts autopilot on provided featurelist.

Only one autopilot can be running at a time, so any ongoing autopilot on a different featurelist will be halted: modeling jobs already in the queue are not affected, but the halted autopilot will not add new jobs to the queue.

Parameters:

featurelist_id : str

Identifier of featurelist that should be used for autopilot

Raises:

AppPlatformError

Raised if autopilot is currently running on or has already finished running on the provided featurelist. Also raised if project’s target was not selected.

train(trainable, sample_pct=None, featurelist_id=None, source_project_id=None, scoring_type=None)

Submit a job to the queue.

Note

If the project uses datetime partitioning, use train_datetime instead

Parameters:

trainable : str or Blueprint

For str, this is assumed to be a blueprint_id. If no source_project_id is provided, the project_id will be assumed to be the project that this instance represents.

Otherwise, for a Blueprint, it contains the blueprint_id and source_project_id that we want to use. featurelist_id will assume the default for this project if not provided, and sample_pct will default to using the maximum training value allowed for this project’s partition setup. source_project_id will be ignored if a Blueprint instance is used for this parameter

sample_pct : float, optional

The amount of training data to use. Defaults to the maximum amount available based on the project configuration.

featurelist_id : str, optional

The identifier of the featurelist to use. If not defined, the default for this project is used.

source_project_id : str, optional

Which project created this blueprint_id. If None, it defaults to looking in this project. Note that you must have read permissions in this project.

scoring_type : str, optional

Either SCORING_TYPE.validation or SCORING_TYPE.cross_validation. SCORING_TYPE.validation is available for every partitioning type, and indicates that the default model validation should be used for the project. If the project uses a form of cross-validation partitioning, SCORING_TYPE.cross_validation can also be used to indicate that all of the available training/validation combinations should be used to evaluate the model.

Returns:

model_job_id : str

id of created job, can be used as parameter to ModelJob.get method or wait_for_async_model_creation function

Examples

Use a Blueprint instance:

blueprint = project.get_blueprints()[0]
model_job_id = project.train(blueprint, sample_pct=64)

Use a blueprint_id, which is a string. In the first case, it is assumed that the blueprint was created by this project. If you are using a blueprint used by another project, you will need to pass the id of that other project as well.

blueprint_id = 'e1c7fc29ba2e612a72272324b8a842af'
project.train(blueprint_id, sample_pct=64)

another_project.train(blueprint_id, source_project_id=project.id)

You can also easily use this interface to train a new model using the data from an existing model:

model = project.get_models()[0]
model_job_id = project.train(model.blueprint.id,
                             sample_pct=100)
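The way trainable and source_project_id interact, as described under Parameters, can be summarised in a short sketch. It is not the SDK's code: resolve_trainable is a hypothetical helper, and the Blueprint below is a local stand-in whose id and project_id field names are assumptions.

```python
from collections import namedtuple

# Local stand-in for the SDK's Blueprint; only the two fields used here.
Blueprint = namedtuple("Blueprint", ["id", "project_id"])


def resolve_trainable(trainable, this_project_id, source_project_id=None):
    """Return the (blueprint_id, source_project_id) pair a train call would use."""
    if isinstance(trainable, Blueprint):
        # A Blueprint carries its own project; source_project_id is ignored.
        return trainable.id, trainable.project_id
    # Otherwise trainable is treated as a blueprint_id string, and the
    # source project defaults to the project the call is made on.
    return trainable, source_project_id or this_project_id


from_string = resolve_trainable("bp-1", this_project_id="proj-A")
from_other = resolve_trainable("bp-1", this_project_id="proj-B",
                               source_project_id="proj-A")
from_blueprint = resolve_trainable(Blueprint("bp-2", "proj-C"),
                                   this_project_id="proj-B",
                                   source_project_id="proj-A")
```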
train_datetime(blueprint_id, featurelist_id=None, training_row_count=None, training_duration=None, source_project_id=None)

Create a new model in a datetime partitioned project

If the project is not datetime partitioned, an error will occur.

Parameters:

blueprint_id : str

the blueprint to use to train the model

featurelist_id : str, optional

the featurelist to use to train the model. If not specified, the project default will be used.

training_row_count : int, optional

the number of rows of data that should be used to train the model. If specified, training_duration may not be specified.

training_duration : str, optional

a duration string specifying what time range the data used to train the model should span. If specified, training_row_count may not be specified.

source_project_id : str, optional

the id of the project this blueprint comes from, if not this project. If left unspecified, the blueprint must belong to this project.

Returns:

job : ModelJob

the created job to build the model

blend(model_ids, blender_method)

Submit a job for creating a blender model. Upon success, the new job will be added to the end of the queue.

Parameters:

model_ids : list of str

List of model ids that will be used to create the blender. These models should have completed the validation stage without errors, and can't be blenders, DataRobot Prime or scaleout models.

blender_method : str

Chosen blend method, one from datarobot.enums.BLENDER_METHOD

Returns:

model_job : ModelJob

New ModelJob instance for the blender creation job in queue.

get_all_jobs(status=None)

Get a list of jobs

This will give Jobs representing any type of job, including modeling or predict jobs.

Parameters:

status : QUEUE_STATUS enum, optional

If called with QUEUE_STATUS.INPROGRESS, will return the jobs that are currently running.

If called with QUEUE_STATUS.QUEUE, will return the jobs that are waiting to be run.

If called with QUEUE_STATUS.ERROR, will return the jobs that have errored.

If no value is provided, will return all jobs currently running or waiting to be run.

Returns:

jobs : list

Each is an instance of Job
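
Since calling get_all_jobs without a status omits errored jobs, one way to see the whole queue is to query each status separately. The sketch below assumes only the get_all_jobs(status=...) call documented above; jobs_by_status is a hypothetical helper name.

```python
def jobs_by_status(project, statuses):
    """Map each requested status to the list of jobs in that status.

    Makes one get_all_jobs call per status (hypothetical helper).
    """
    return {status: project.get_all_jobs(status=status) for status in statuses}


# Usage, assuming `project` is an existing Project:
# from datarobot.enums import QUEUE_STATUS
# grouped = jobs_by_status(
#     project,
#     [QUEUE_STATUS.INPROGRESS, QUEUE_STATUS.QUEUE, QUEUE_STATUS.ERROR],
# )
# for status, jobs in grouped.items():
#     print(status, len(jobs))
```

The same pattern should carry over to get_model_jobs and get_predict_jobs, which accept the same status argument.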

get_blenders()

Get a list of blender models.

Returns:

list of BlenderModel

list of all blender models in project.

get_frozen_models()

Get a list of frozen models

Returns:

list of FrozenModel

list of all frozen models in project.

get_model_jobs(status=None)

Get a list of modeling jobs

Parameters:

status : QUEUE_STATUS enum, optional

If called with QUEUE_STATUS.INPROGRESS, will return the modeling jobs that are currently running.

If called with QUEUE_STATUS.QUEUE, will return the modeling jobs that are waiting to be run.

If called with QUEUE_STATUS.ERROR, will return the modeling jobs that have errored.

If no value is provided, will return all modeling jobs currently running or waiting to be run.

Returns:

jobs : list

Each is an instance of ModelJob

get_predict_jobs(status=None)

Get a list of prediction jobs

Parameters:

status : QUEUE_STATUS enum, optional

If called with QUEUE_STATUS.INPROGRESS, will return the prediction jobs that are currently running.

If called with QUEUE_STATUS.QUEUE, will return the prediction jobs that are waiting to be run.

If called with QUEUE_STATUS.ERROR, will return the prediction jobs that have errored.

If called without a status, will return all prediction jobs currently running or waiting to be run.

Returns:

jobs : list

Each is an instance of PredictJob

wait_for_autopilot(check_interval=20.0, timeout=86400, verbosity=1)

Blocks until autopilot is finished. This will raise an exception if the autopilot mode is changed from AUTOPILOT_MODE.FULL_AUTO.

It makes API calls to sync the project state with the server and to look at which jobs are enqueued.

Parameters:

check_interval : float or int

The maximum time (in seconds) to wait between checks for whether autopilot is finished

timeout : float or int or None

The maximum time (in seconds) to wait before giving up. If None, never time out.

verbosity : VERBOSITY_LEVEL enum

This should be VERBOSITY_LEVEL.SILENT or VERBOSITY_LEVEL.VERBOSE. For VERBOSITY_LEVEL.SILENT, nothing will be displayed about progress. For VERBOSITY_LEVEL.VERBOSE, the number of jobs in progress or queued is shown. Note that new jobs are added to the queue along the way.

Raises:

AsyncTimeoutError

If autopilot does not finish in the amount of time specified

RuntimeError

If a condition is detected that indicates that autopilot will not complete on its own
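
A caller may want to treat a timeout as a normal outcome while still letting RuntimeError surface. The following is a minimal sketch under that assumption: autopilot_finished is a hypothetical wrapper, and the exception class is passed in as a parameter so the wrapper does not hard-code where AsyncTimeoutError is imported from (the usage comment assumes datarobot.errors).

```python
def autopilot_finished(project, timeout_error, timeout=3600, check_interval=20.0):
    """Wait for autopilot and report whether it finished in time.

    Returns True if wait_for_autopilot returned normally and False if it
    raised `timeout_error`. Any other exception, including RuntimeError,
    propagates to the caller (hypothetical wrapper).
    """
    try:
        project.wait_for_autopilot(check_interval=check_interval, timeout=timeout)
    except timeout_error:
        return False
    return True


# Usage, assuming `project` is an existing Project running full autopilot:
# from datarobot.errors import AsyncTimeoutError
# if not autopilot_finished(project, AsyncTimeoutError, timeout=600):
#     print("Autopilot is still running; check again later.")
```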

rename(project_name)

Update the name of the project.

Parameters:

project_name : str

The new name

unlock_holdout()

Unlock the holdout for this project.

This will cause subsequent queries of the models of this project to contain the metric values for the holdout set, if it exists.

Take care, as this cannot be undone. Best practice is to select a model before analyzing model performance on the holdout set.

set_worker_count(worker_count)

Sets the number of workers allocated to this project.

Note that this value is limited to the number allowed by your account. Lowering the number will not stop currently running jobs, but will cause the queue to wait for the appropriate number of jobs to finish before attempting to run more jobs.

Parameters:

worker_count : int

The number of concurrent workers to request from the pool of workers

get_leaderboard_ui_permalink()

Returns:

url : str

Permanent static hyperlink to a project leaderboard.

open_leaderboard_browser()

Opens project leaderboard in web browser.

Note: If text-mode browsers are used, the calling process will block until the user exits the browser.

get_rating_table_models()

Get a list of models with a rating table

Returns:

list of RatingTableModel

list of all models with a rating table in project.

get_rating_tables()

Get a list of rating tables

Returns:

list of RatingTable

list of rating tables in project.