API Reference

Advanced Options

class datarobot.helpers.AdvancedOptions(weights=None, response_cap=None, blueprint_threshold=None, seed=None, smart_downsampled=False, majority_downsampling_rate=None, offset=None, exposure=None, accuracy_optimized_mb=None, scaleout_modeling_mode=None, events_count=None, monotonic_increasing_featurelist_id=None, monotonic_decreasing_featurelist_id=None, only_include_monotonic_blueprints=None, allowed_pairwise_interaction_groups=None, blend_best_models=None, scoring_code_only=None, prepare_model_for_deployment=None, consider_blenders_in_recommendation=None, min_secondary_validation_model_count=None, shap_only_mode=None)

Used when setting the target of a project to specify advanced options for the modeling process.

Parameters:
weights : string, optional

The name of a column indicating the weight of each row

response_cap : float in [0.5, 1), optional

Quantile of the response distribution to use for response capping.

blueprint_threshold : int, optional

Number of hours models are permitted to run before being excluded from later autopilot stages. Minimum is 1.

seed : int

a seed to use for randomization

smart_downsampled : bool

whether to use smart downsampling to throw away excess rows of the majority class. Only applicable to classification and zero-boosted regression projects.

majority_downsampling_rate : float

the percentage between 0 and 100 of the majority rows that should be kept. Specify only if using smart downsampling. May not cause the majority class to become smaller than the minority class.

offset : list of str, optional

(New in version v2.6) the list of the names of the columns containing the offset of each row

exposure : string, optional

(New in version v2.6) the name of a column containing the exposure of each row

accuracy_optimized_mb : bool, optional

(New in version v2.6) Include additional, longer-running models that will be run by the autopilot and available to run manually.

scaleout_modeling_mode : string, optional

(New in version v2.8) Specifies the behavior of Scaleout models for the project. This is one of datarobot.enums.SCALEOUT_MODELING_MODE. If datarobot.enums.SCALEOUT_MODELING_MODE.DISABLED, no models will run during autopilot or show in the list of available blueprints. Scaleout models must be disabled for some partitioning settings including projects using datetime partitioning or projects using offset or exposure columns. If datarobot.enums.SCALEOUT_MODELING_MODE.REPOSITORY_ONLY, scaleout models will be in the list of available blueprints but not run during autopilot. If datarobot.enums.SCALEOUT_MODELING_MODE.AUTOPILOT, scaleout models will run during autopilot and be in the list of available blueprints. Scaleout models are only supported in the Hadoop environment with the corresponding user permission set.

events_count : string, optional

(New in version v2.8) the name of a column specifying events count.

monotonic_increasing_featurelist_id : string, optional

(new in version 2.11) the id of the featurelist that defines the set of features with a monotonically increasing relationship to the target. If None, no such constraints are enforced. When specified, this will set a default for the project that can be overridden at model submission time if desired.

monotonic_decreasing_featurelist_id : string, optional

(new in version 2.11) the id of the featurelist that defines the set of features with a monotonically decreasing relationship to the target. If None, no such constraints are enforced. When specified, this will set a default for the project that can be overridden at model submission time if desired.

only_include_monotonic_blueprints : bool, optional

(new in version 2.11) when true, only blueprints that support enforcing monotonic constraints will be available in the project or selected for the autopilot.

allowed_pairwise_interaction_groups : list of tuple, optional

(New in version v2.19) For GAM models - specify groups of columns for which pairwise interactions will be allowed. E.g. if set to [(A, B, C), (C, D)] then GAM models will allow interactions between columns AxB, BxC, AxC, CxD. All others (AxD, BxD) will not be considered.

blend_best_models: bool, optional

(New in version v2.19) blend best models during Autopilot run

scoring_code_only: bool, optional

(New in version v2.19) Keep only models that can be converted to scorable java code during Autopilot run

shap_only_mode: bool, optional

(New in version v2.21) Keep only models that support SHAP values during Autopilot run. Use SHAP-based insights wherever possible. Defaults to False.

prepare_model_for_deployment: bool, optional

(New in version v2.19) Prepare model for deployment during Autopilot run. The preparation includes creating reduced feature list models, retraining best model on higher sample size, computing insights and assigning “RECOMMENDED FOR DEPLOYMENT” label.

consider_blenders_in_recommendation: bool, optional

(New in version 2.22.0) Include blenders when selecting a model to prepare for deployment in an Autopilot Run. Defaults to False.

min_secondary_validation_model_count: int, optional

(New in version v2.19) Compute “All backtest” scores (datetime models) or cross validation scores for the specified number of highest ranking models on the Leaderboard, if over the Autopilot default.

Examples

import datarobot as dr
advanced_options = dr.AdvancedOptions(
    weights='weights_column',
    offset=['offset_column'],
    exposure='exposure_column',
    response_cap=0.7,
    blueprint_threshold=2,
    smart_downsampled=True, majority_downsampling_rate=75.0)

Batch Predictions

class datarobot.models.BatchPredictionJob(data, completed_resource_url=None)

A Batch Prediction Job is used to score large data sets on prediction servers using the Batch Prediction API.

Attributes:
id : str

the id of the job

classmethod score(deployment, intake_settings=None, output_settings=None, csv_settings=None, timeseries_settings=None, num_concurrent=None, passthrough_columns=None, passthrough_columns_set=None, max_explanations=None, threshold_high=None, threshold_low=None, prediction_warning_enabled=None, include_prediction_status=False, skip_drift_tracking=False, prediction_instance=None, abort_on_error=True, column_names_remapping=None, include_probabilities=True, include_probabilities_classes=None, download_timeout=120, download_read_timeout=660)

Create new batch prediction job, upload the scoring dataset and return a batch prediction job.

The default intake and output options are both localFile, which requires the caller to pass the file parameter and then either download the results using the download() method or pass a path to a file where the scored data will be saved.

Returns:
BatchPredictionJob

Instance of BatchPredictionJob

Attributes:
deployment : Deployment or string ID

Deployment which will be used for scoring.

intake_settings : dict (optional)

A dict configuring where the scoring data comes from. Supported options:

  • type : string, either localFile, s3, azure, gcp, dataset or jdbc

To score from a local file, add this parameter to the settings:

  • file : file-like object, string path to file or a pandas.DataFrame of scoring data

To score from S3, add the next parameters to the settings:

  • url : string, the URL to score (e.g.: s3://bucket/key)
  • credential_id : string (optional)

To score from JDBC, add the next parameters to the settings:

  • data_store_id : string, the ID of the external data store connected to the JDBC data source (see Database Connectivity).
  • query : string (optional if table, schema and/or catalog is specified), a self-supplied SELECT statement of the data set you wish to predict.
  • table : string (optional if query is specified), the name of specified database table.
  • schema : string (optional if query is specified), the name of specified database schema.
  • catalog : string (optional if query is specified), (new in v2.22) the name of specified database catalog.
  • fetch_size : int (optional), Changing the fetchSize can be used to balance throughput and memory usage.
  • credential_id : string (optional) the ID of the credentials holding information about a user with read-access to the JDBC data source (see Credentials).
output_settings : dict (optional)

A dict configuring how scored data is to be saved. Supported options:

  • type : string, either localFile, s3 or jdbc

To save scored data to a local file, add this parameter to the settings:

  • path : string (optional), path to save the scored data as CSV. If a path is not specified, you must download the scored data yourself with job.download(). If a path is specified, the call will block until the job is done. If there are no other jobs currently processing for the targeted prediction instance, uploading, scoring and downloading will happen in parallel without waiting for a full job to complete. Otherwise, it will still block, but start downloading the scored data as soon as it starts being generated. This is the fastest method to get predictions.

To save scored data to S3, add the next parameters to the settings:

  • url : string, the URL for storing the results (e.g.: s3://bucket/key)
  • credential_id : string (optional)

To save scored data to JDBC, add the next parameters to the settings:

  • data_store_id : string, the ID of the external data store connected to the JDBC data source (see Database Connectivity).
  • table : string, the name of specified database table.
  • schema : string (optional), the name of specified database schema.
  • catalog : string (optional), (new in v2.22) the name of specified database catalog.
  • statement_type : string, the type of insertion statement to create, one of datarobot.enums.AVAILABLE_STATEMENT_TYPES.
  • update_columns : list(string) (optional), a list of strings containing those column names to be updated in case statement_type is set to a value related to update or upsert.
  • where_columns : list(string) (optional), a list of strings containing those column names to be selected in case statement_type is set to a value related to insert or update.
  • credential_id : string, the ID of the credentials holding information about a user with write-access to the JDBC data source (see Credentials).
csv_settings : dict (optional)

CSV intake and output settings. Supported options:

  • delimiter : string (optional, default ,), fields are delimited by this character. Use the string tab to denote TSV (TAB separated values). Must be either a one-character string or the string tab.
  • quotechar : string (optional, default "), fields containing the delimiter must be quoted using this character.
  • encoding : string (optional, default utf-8), encoding for the CSV files. For example (but not limited to): shift_jis, latin_1 or mskanji.
timeseries_settings : dict (optional)

Configuration for time-series scoring. Supported options:

  • type : string, must be forecast or historical (default if not passed is forecast). forecast mode makes predictions using forecast_point or rows in the dataset without target. historical enables bulk prediction mode which calculates predictions for all possible forecast points and forecast distances in the dataset within predictions_start_date/predictions_end_date range.
  • forecast_point : datetime (optional), forecast point for the dataset, used for the forecast predictions, by default value will be inferred from the dataset. May be passed if timeseries_settings.type=forecast.
  • predictions_start_date : datetime (optional), used for historical predictions in order to override date from which predictions should be calculated. By default value will be inferred automatically from the dataset. May be passed if timeseries_settings.type=historical.
  • predictions_end_date : datetime (optional), used for historical predictions in order to override the date up to which predictions should be calculated. By default value will be inferred automatically from the dataset. May be passed if timeseries_settings.type=historical.
  • relax_known_in_advance_features_check : bool, (default False). If True, missing values in the known in advance features are allowed in the forecast window at the prediction time. If omitted or False, missing values are not allowed.
num_concurrent : int (optional)

Number of concurrent chunks to score simultaneously. Defaults to the available number of cores of the deployment. Lower it to leave resources for real-time scoring.

passthrough_columns : list[string] (optional)

Keep these columns from the scoring dataset in the scored dataset. This is useful for correlating predictions with source data.

passthrough_columns_set : string (optional)

To pass through every column from the scoring dataset, set this to all. Takes precedence over passthrough_columns if set.

max_explanations : int (optional)

Compute prediction explanations for this amount of features.

threshold_high : float (optional)

Only compute prediction explanations for predictions above this threshold. Can be combined with threshold_low.

threshold_low : float (optional)

Only compute prediction explanations for predictions below this threshold. Can be combined with threshold_high.

prediction_warning_enabled : boolean (optional)

Add prediction warnings to the scored data. Currently only supported for regression models.

include_prediction_status : boolean (optional)

Include the prediction_status column in the output, defaults to False.

skip_drift_tracking : boolean (optional)

Skips drift tracking on any predictions made from this job. This is useful when running non-production workloads to not affect drift tracking and cause unnecessary alerts. Defaults to False.

prediction_instance : dict (optional)

Defaults to instance specified by deployment or system configuration. Supported options:

  • hostName : string
  • sslEnabled : boolean (optional, default true). Set to false to run prediction requests from the batch prediction job without SSL.
  • datarobotKey : string (optional), if running a job against a prediction instance in the Managed AI Cloud, you must provide the organization level DataRobot-Key
  • apiKey : string (optional), by default, prediction requests will use the API key of the user that created the job. This allows you to make requests on behalf of other users.
abort_on_error : boolean (optional)

Default behaviour is to abort the job if too many rows fail scoring. This will free up resources for other jobs that may score successfully. Set to false to unconditionally score every row no matter how many errors are encountered. Defaults to True.

column_names_remapping : dict (optional)

Mapping with column renaming for output table. Defaults to {}.

include_probabilities : boolean (optional)

Flag that enables returning of all probability columns. Defaults to True.

include_probabilities_classes : list (optional)

List the subset of classes if a user doesn’t want all the classes. Defaults to [].

download_timeout : int (optional)

New in version 2.22.

If using localFile output, wait this many seconds for the download to become available. See download().

download_read_timeout : int (optional, default 660)

New in version 2.22.

If using localFile output, wait this many seconds for the server to respond between chunks.
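
Example (a minimal sketch; the deployment ID, passthrough column and file names below are hypothetical):

import datarobot as dr
import pandas as pd

# Hypothetical local scoring data
scoring_df = pd.read_csv('to_score.csv')

job = dr.BatchPredictionJob.score(
    'deployment-id',
    intake_settings={
        'type': 'localFile',
        'file': scoring_df,  # file-like object, path to a file or pandas.DataFrame
    },
    output_settings={
        'type': 'localFile',
        'path': 'scored.csv',  # blocks until the job is done, then the file is written
    },
    passthrough_columns=['customer_id'],
    max_explanations=3,
)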

classmethod score_to_file(deployment, intake_path, output_path, **kwargs)

Create new batch prediction job, upload the scoring dataset and download the scored CSV file concurrently.

Will block until the entire file is scored.

Refer to the create method for details on the other kwargs parameters.

Returns:
BatchPredictionJob

Instance of BatchPredictionJob

Attributes:
deployment : Deployment or string ID

Deployment which will be used for scoring.

intake_path : file-like object/string path to file/pandas.DataFrame

Scoring data

output_path : str

Filename to save the result under
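
Example (a minimal sketch; the deployment ID and file paths are hypothetical):

import datarobot as dr

job = dr.BatchPredictionJob.score_to_file(
    'deployment-id',
    intake_path='to_score.csv',
    output_path='scored.csv',
)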

classmethod score_s3(deployment, source_url, destination_url, credential=None, **kwargs)

Create a new batch prediction job with a scoring dataset from S3, writing the result back to S3.

This returns immediately after the job has been created. You must poll for job completion using get_status() or wait_for_completion().

Refer to the create method for details on the other kwargs parameters.

Returns:
BatchPredictionJob

Instance of BatchPredictionJob

Attributes:
deployment : Deployment or string ID

Deployment which will be used for scoring.

source_url : string

The URL for the prediction dataset (e.g.: s3://bucket/key)

destination_url : string

The URL for the scored dataset (e.g.: s3://bucket/key)

credential : string or Credential (optional)

The AWS Credential object or credential id
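
Example (a minimal sketch; the deployment ID, bucket URLs and credential ID are hypothetical):

import datarobot as dr

job = dr.BatchPredictionJob.score_s3(
    'deployment-id',
    source_url='s3://mybucket/data/to_score.csv',
    destination_url='s3://mybucket/results/scored.csv',
    credential='credential-id',
)
job.wait_for_completion()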

classmethod score_azure(deployment, source_url, destination_url, credential=None, **kwargs)

Create a new batch prediction job with a scoring dataset from Azure blob storage, writing the result back to Azure blob storage.

This returns immediately after the job has been created. You must poll for job completion using get_status() or wait_for_completion().

Refer to the create method for details on the other kwargs parameters.

Returns:
BatchPredictionJob

Instance of BatchPredictionJob

Attributes:
deployment : Deployment or string ID

Deployment which will be used for scoring.

source_url : string

The URL for the prediction dataset (e.g.: https://storage_account.blob.endpoint/container/blob_name)

destination_url : string

The URL for the scored dataset (e.g.: https://storage_account.blob.endpoint/container/blob_name)

credential : string or Credential (optional)

The Azure Credential object or credential id

classmethod score_gcp(deployment, source_url, destination_url, credential=None, **kwargs)

Create a new batch prediction job with a scoring dataset from Google Cloud Storage, writing the result back to Google Cloud Storage.

This returns immediately after the job has been created. You must poll for job completion using get_status() or wait_for_completion().

Refer to the create method for details on the other kwargs parameters.

Returns:
BatchPredictionJob

Instance of BatchPredictionJob

Attributes:
deployment : Deployment or string ID

Deployment which will be used for scoring.

source_url : string

The URL for the prediction dataset (e.g.: http(s)://storage.googleapis.com/[bucket]/[object])

destination_url : string

The URL for the scored dataset (e.g.: http(s)://storage.googleapis.com/[bucket]/[object])

credential : string or Credential (optional)

The GCP Credential object or credential id

classmethod score_from_existing(batch_prediction_job_id)

Create a new batch prediction job based on the settings from a previously created one

Returns:
BatchPredictionJob

Instance of BatchPredictionJob

Attributes:
batch_prediction_job_id: str

ID of the previous batch prediction job
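
Example (a minimal sketch; the job ID is hypothetical):

import datarobot as dr

job = dr.BatchPredictionJob.score_from_existing('batch-prediction-job-id')
job.wait_for_completion()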

classmethod get(batch_prediction_job_id)

Get batch prediction job

Returns:
BatchPredictionJob

Instance of BatchPredictionJob

Attributes:
batch_prediction_job_id: str

ID of batch prediction job

download(fileobj, timeout=120, read_timeout=660)

Downloads the CSV result of a prediction job

Attributes:
fileobj: file-like object

Write CSV data to this file-like object

timeout : int (optional, default 120)

New in version 2.22.

Seconds to wait for the download to become available.

The download will not be available before the job has started processing. In case other jobs are occupying the queue, processing may not start immediately.

If the timeout is reached, the job will be aborted and RuntimeError is raised.

Set to -1 to wait infinitely.

read_timeout : int (optional, default 660)

New in version 2.22.

Seconds to wait for the server to respond between chunks.
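
Example (a minimal sketch; the job ID and output path are hypothetical, and the file is opened in binary mode on the assumption that raw CSV bytes are written):

import datarobot as dr

job = dr.BatchPredictionJob.get('batch-prediction-job-id')
with open('scored.csv', 'wb') as fileobj:
    job.download(fileobj)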

delete()

Cancel this job. If this job has not finished running, it will be removed and canceled.

get_status()

Get status of batch prediction job

Returns:
BatchPredictionJob status data

Dict with job status

classmethod list_by_status(statuses=None)

Get the collection of jobs for a specific set of statuses

Returns:
BatchPredictionJob statuses

List of job status dicts with the specified statuses

Attributes:
statuses

List of statuses to filter jobs by ([ABORTED|COMPLETED…]). If statuses is not provided, returns all jobs for the user.
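
Example (a minimal sketch using the status names shown above):

import datarobot as dr

jobs = dr.BatchPredictionJob.list_by_status(['COMPLETED', 'ABORTED'])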

Blueprint

class datarobot.models.Blueprint(id=None, processes=None, model_type=None, project_id=None, blueprint_category=None, monotonic_increasing_featurelist_id=None, monotonic_decreasing_featurelist_id=None, supports_monotonic_constraints=None, recommended_featurelist_id=None)

A Blueprint which can be used to fit models

Attributes:
id : str

the id of the blueprint

processes : list of str

the processes used by the blueprint

model_type : str

the model produced by the blueprint

project_id : str

the project the blueprint belongs to

blueprint_category : str

(New in version v2.6) Describes the category of the blueprint and the kind of model it produces.

recommended_featurelist_id: str or null

(New in v2.18) The ID of the feature list recommended for this blueprint. If this field is not present, then there is no recommended feature list.

classmethod get(project_id, blueprint_id)

Retrieve a blueprint.

Parameters:
project_id : str

The project’s id.

blueprint_id : str

Id of blueprint to retrieve.

Returns:
blueprint : Blueprint

The queried blueprint.

get_chart()

Retrieve a chart.

Returns:
BlueprintChart

The current blueprint chart.

get_documents()

Get documentation for tasks used in the blueprint.

Returns:
list of BlueprintTaskDocument

All documents available for blueprint.
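
Example (a minimal sketch; the project and blueprint IDs are hypothetical):

import datarobot as dr

blueprint = dr.models.Blueprint.get('project-id', 'blueprint-id')
chart = blueprint.get_chart()
print(chart.to_graphviz())   # blueprint chart as graphviz DOT
docs = blueprint.get_documents()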

class datarobot.models.BlueprintTaskDocument(title=None, task=None, description=None, parameters=None, links=None, references=None)

Document describing a task from a blueprint.

Attributes:
title : str

Title of document.

task : str

Name of the task described in document.

description : str

Task description.

parameters : list of dict(name, type, description)

Parameters that task can receive in human-readable format.

links : list of dict(name, url)

External links used in document

references : list of dict(name, url)

References used in the document. When no link is available, url equals None.

class datarobot.models.BlueprintChart(nodes, edges)

A Blueprint chart that can be used to understand data flow in blueprint.

Attributes:
nodes : list of dict (id, label)

Chart nodes, id unique in chart.

edges : list of tuple (id1, id2)

Directions of data flow between blueprint chart nodes.

classmethod get(project_id, blueprint_id)

Retrieve a blueprint chart.

Parameters:
project_id : str

The project’s id.

blueprint_id : str

Id of blueprint to retrieve chart.

Returns:
BlueprintChart

The queried blueprint chart.

to_graphviz()

Get blueprint chart in graphviz DOT format.

Returns:
unicode

String representation of chart in graphviz DOT language.

class datarobot.models.ModelBlueprintChart(nodes, edges)

A Blueprint chart that can be used to understand data flow in a model. A model blueprint chart represents a reduced repository blueprint chart containing only the elements used to build this particular model.

Attributes:
nodes : list of dict (id, label)

Chart nodes, id unique in chart.

edges : list of tuple (id1, id2)

Directions of data flow between blueprint chart nodes.

classmethod get(project_id, model_id)

Retrieve a model blueprint chart.

Parameters:
project_id : str

The project’s id.

model_id : str

Id of model to retrieve model blueprint chart.

Returns:
ModelBlueprintChart

The queried model blueprint chart.

to_graphviz()

Get blueprint chart in graphviz DOT format.

Returns:
unicode

String representation of chart in graphviz DOT language.

Calendar File

class datarobot.CalendarFile(calendar_end_date=None, calendar_start_date=None, created=None, id=None, name=None, num_event_types=None, num_events=None, project_ids=None, role=None, multiseries_id_columns=None)

Represents the data for a calendar file.

For more information about calendar files, see the calendar documentation.

Attributes:
id : str

The id of the calendar file.

calendar_start_date : str

The earliest date in the calendar.

calendar_end_date : str

The last date in the calendar.

created : str

The date this calendar was created, i.e. uploaded to DR.

name : str

The name of the calendar.

num_event_types : int

The number of different event types.

num_events : int

The number of events this calendar has.

project_ids : list of strings

A list containing the projectIds of the projects using this calendar.

multiseries_id_columns: list of str or None

A list of columns in the calendar that uniquely identify events for different series. Currently, only one column is supported. If multiseries id columns are not provided, the calendar is considered to be single series.

role : str

The access role the user has for this calendar.

classmethod create(file_path, calendar_name=None, multiseries_id_columns=None)

Creates a calendar using the given file. For information about calendar files, see the calendar documentation

The provided file must be a CSV in the format:

Date,   Event,          Series ID
<date>, <event_type>,   <series id>
<date>, <event_type>,

A header row is required, and the “Series ID” column is optional.

Once the CalendarFile has been created, pass its ID with the DatetimePartitioningSpecification when setting the target for a time series project in order to use it.

Parameters:
file_path : string

A string representing a path to a local csv file.

calendar_name : string, optional

A name to assign to the calendar. Defaults to the name of the file if not provided.

multiseries_id_columns : list of str or None

a list of the names of multiseries id columns to define which series an event belongs to. Currently only one multiseries id column is supported.

Returns:
calendar_file : CalendarFile

Instance with initialized data.

Raises:
AsyncProcessUnsuccessfulError

Raised if there was an error processing the provided calendar file.

Examples

# Creating a calendar with a specified name
cal = dr.CalendarFile.create('/home/calendars/somecalendar.csv',
                                         calendar_name='Some Calendar Name')
cal.id
>>> 5c1d4904211c0a061bc93013
cal.name
>>> Some Calendar Name

# Creating a calendar without specifying a name
cal = dr.CalendarFile.create('/home/calendars/somecalendar.csv')
cal.id
>>> 5c1d4904211c0a061bc93012
cal.name
>>> somecalendar.csv

# Creating a calendar with multiseries id columns
cal = dr.CalendarFile.create('/home/calendars/somemultiseriescalendar.csv',
                             calendar_name='Some Multiseries Calendar Name',
                             multiseries_id_columns=['series_id'])
cal.id
>>> 5da9bb21962d746f97e4daee
cal.name
>>> Some Multiseries Calendar Name
cal.multiseries_id_columns
>>> ['series_id']
classmethod get(calendar_id)

Gets the details of a calendar, given the id.

Parameters:
calendar_id : str

The identifier of the calendar.

Returns:
calendar_file : CalendarFile

The requested calendar.

Raises:
DataError

Raised if the calendar_id is invalid, i.e. the specified CalendarFile does not exist.

Examples

cal = dr.CalendarFile.get(some_calendar_id)
cal.id
>>> some_calendar_id
classmethod list(project_id=None, batch_size=None)

Gets the details of all calendars this user has view access for.

Parameters:
project_id : str, optional

If provided, will filter for calendars associated only with the specified project.

batch_size : int, optional

The number of calendars to retrieve in a single API call. If specified, the client may make multiple calls to retrieve the full list of calendars. If not specified, an appropriate default will be chosen by the server.

Returns:
calendar_list : list of CalendarFile

A list of CalendarFile objects.

Examples

calendars = dr.CalendarFile.list()
len(calendars)
>>> 10
classmethod delete(calendar_id)

Deletes the calendar specified by calendar_id.

Parameters:
calendar_id : str

The id of the calendar to delete. The requester must have OWNER access for this calendar.

Raises:
ClientError

Raised if an invalid calendar_id is provided.

Examples

# Deleting with a valid calendar_id
status_code = dr.CalendarFile.delete(some_calendar_id)
status_code
>>> 204
dr.CalendarFile.get(some_calendar_id)
>>> ClientError: Item not found
classmethod update_name(calendar_id, new_calendar_name)

Changes the name of the specified calendar to the specified name. The requester must have at least READ_WRITE permissions on the calendar.

Parameters:
calendar_id : str

The id of the calendar to update.

new_calendar_name : str

The new name to set for the specified calendar.

Returns:
status_code : int

200 for success

Raises:
ClientError

Raised if an invalid calendar_id is provided.

Examples

response = dr.CalendarFile.update_name(some_calendar_id, some_new_name)
response
>>> 200
cal = dr.CalendarFile.get(some_calendar_id)
cal.name
>>> some_new_name
classmethod share(calendar_id, access_list)

Shares the calendar with the specified users, assigning the specified roles.

Parameters:
calendar_id : str

The id of the calendar to update

access_list:

A list of dr.SharingAccess objects. Specify None for the role to delete a user’s access from the specified CalendarFile. For more information on specific access levels, see the sharing documentation.

Returns:
status_code : int

200 for success

Raises:
ClientError

Raised if unable to update permissions for a user.

AssertionError

Raised if access_list is invalid.

Examples

# assuming some_user is a valid user, share this calendar with some_user
sharing_list = [dr.SharingAccess(some_user_username,
                                 dr.enums.SHARING_ROLE.READ_WRITE)]
response = dr.CalendarFile.share(some_calendar_id, sharing_list)
response.status_code
>>> 200

# delete some_user from this calendar, assuming they have access of some kind already
delete_sharing_list = [dr.SharingAccess(some_user_username,
                                        None)]
response = dr.CalendarFile.share(some_calendar_id, delete_sharing_list)
response.status_code
>>> 200

# Attempt to add an invalid user to a calendar
invalid_sharing_list = [dr.SharingAccess(invalid_username,
                                         dr.enums.SHARING_ROLE.READ_WRITE)]
dr.CalendarFile.share(some_calendar_id, invalid_sharing_list)
>>> ClientError: Unable to update access for this calendar
classmethod get_access_list(calendar_id, batch_size=None)

Retrieve a list of users that have access to this calendar.

Parameters:
calendar_id : str

The id of the calendar to retrieve the access list for.

batch_size : int, optional

The number of access records to retrieve in a single API call. If specified, the client may make multiple calls to retrieve the full list of calendars. If not specified, an appropriate default will be chosen by the server.

Returns:
access_control_list : list of SharingAccess

A list of SharingAccess objects.

Raises:
ClientError

Raised if user does not have access to calendar or calendar does not exist.

Compliance Documentation Templates

class datarobot.models.compliance_doc_template.ComplianceDocTemplate(id, creator_id, creator_username, name, org_id=None, sections=None)

A compliance documentation template. Templates are used to customize contents of ComplianceDocumentation.

New in version v2.14.

Notes

Each section dictionary has the following schema:

  • title : title of the section
  • type : type of section. Must be one of “datarobot”, “user” or “table_of_contents”.

Each type of section has a different set of attributes, described below.

Sections of type "datarobot" represent sections owned by DataRobot. DataRobot sections have the following additional attributes:

  • content_id : The identifier of the content in this section. You can get the default template with get_default for a complete list of possible DataRobot section content ids.
  • sections : list of sub-section dicts nested under the parent section.

Sections of type "user" represent sections with user-defined content. Those sections may contain text written by the user and have the following additional fields:

  • regularText : regular text of the section, optionally separated by \n to split paragraphs.
  • highlightedText : highlighted text of the section, optionally separated by \n to split paragraphs.
  • sections : list of sub-section dicts nested under the parent section.

Sections of type "table_of_contents" represent a table of contents and have no additional attributes.

Attributes:
id : str

the id of the template

name : str

the name of the template.

creator_id : str

the id of the user who created the template

creator_username : str

username of the user who created the template

org_id : str

the id of the organization the template belongs to

sections : list of dicts

the sections of the template describing the structure of the document. Section schema is described in Notes section above.

classmethod get_default(template_type=None)

Get a default DataRobot template. This template is used for generating compliance documentation when no template is specified.

Parameters:
template_type : str or None

Type of the template. Currently supported values are “normal” and “time_series”

Returns:
template : ComplianceDocTemplate

the default template object with sections attribute populated with default sections.

classmethod create_from_json_file(name, path)

Create a template with the specified name and sections in a JSON file.

This is useful when working with sections in a JSON file. Example:

default_template = ComplianceDocTemplate.get_default()
default_template.sections_to_json_file('path/to/example.json')
# ... edit example.json in your editor
my_template = ComplianceDocTemplate.create_from_json_file(
    name='my template',
    path='path/to/example.json'
)
Parameters:
name : str

the name of the template. Must be unique for your user.

path : str

the path to find the JSON file at

Returns:
template : ComplianceDocTemplate

the created template

classmethod create(name, sections)

Create a template with the specified name and sections.

Parameters:
name : str

the name of the template. Must be unique for your user.

sections : list

list of section objects

Returns:
template : ComplianceDocTemplate

the created template
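
Example (a minimal sketch; the template name and section text are illustrative):

from datarobot.models.compliance_doc_template import ComplianceDocTemplate

sections = [
    {
        'title': 'Model Overview',
        'type': 'user',
        'regularText': 'Purpose and scope of the model.\nAssumptions and limitations.',
        'highlightedText': 'Internal use only.',
    },
]
template = ComplianceDocTemplate.create(name='my template', sections=sections)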

classmethod get(template_id)

Retrieve a specific template.

Parameters:
template_id : str

the id of the template to retrieve

Returns:
template : ComplianceDocTemplate

the retrieved template

classmethod list(name_part=None, limit=None, offset=None)

Get a paginated list of compliance documentation template objects.

Parameters:
name_part : str or None

Return only the templates with names matching specified string. The matching is case-insensitive.

limit : int

The number of records to return. The server will use a (possibly finite) default if not specified.

offset : int

The number of records to skip.

Returns:
templates : list of ComplianceDocTemplate

the list of template objects

sections_to_json_file(path, indent=2)

Save sections of the template to a json file at the specified path

Parameters:
path : str

the path to save the file to

indent : int

indentation to use in the json file.

update(name=None, sections=None)

Update the name or sections of an existing doc template.

Note that default or non-existent templates cannot be updated.

Parameters:
name : str, optional

the new name for the template

sections : list of dicts

list of sections

delete()

Delete the compliance documentation template.

Compliance Documentation

class datarobot.models.compliance_documentation.ComplianceDocumentation(project_id, model_id, template_id=None)

A compliance documentation object.

New in version v2.14.

Examples

doc = ComplianceDocumentation('project-id', 'model-id')
job = doc.generate()
job.wait_for_completion()
doc.download('example.docx')
Attributes:
project_id : str

the id of the project

model_id : str

the id of the model

template_id : str or None

optional id of the template for the generated doc. See documentation for ComplianceDocTemplate for more info.

generate()

Start a job generating model compliance documentation.

Returns:
Job

an instance of an async job

download(filepath)

Download the generated compliance documentation file and save it to the specified path. The generated file has a DOCX format.

Parameters:
filepath : str

A file path, e.g. “/path/to/save/compliance_documentation.docx”

Confusion Chart

class datarobot.models.confusion_chart.ConfusionChart(source, data, source_model_id)

Confusion Chart data for model.

Notes

ClassMetrics is a dict containing the following:

  • class_name (string) name of the class
  • actual_count (int) number of times this class is seen in the validation data
  • predicted_count (int) number of times this class has been predicted for the validation data
  • f1 (float) F1 score
  • recall (float) recall score
  • precision (float) precision score
  • was_actual_percentages (list of dict) one vs all actual percentages in format specified below.
    • other_class_name (string) the name of the other class
    • percentage (float) the percentage of the times this class was predicted when it was actually the other class (from 0 to 1)
  • was_predicted_percentages (list of dict) one vs all predicted percentages in format specified below.
    • other_class_name (string) the name of the other class
    • percentage (float) the percentage of the times this class was actually predicted (from 0 to 1)
  • confusion_matrix_one_vs_all (list of list) 2d list representing 2x2 one vs all matrix.
    • This represents the True/False Negative/Positive rates as integer for each class. The data structure looks like:
    • [ [ True Negative, False Positive ], [ False Negative, True Positive ] ]
Attributes:
source : str

Confusion Chart data source. Can be ‘validation’, ‘crossValidation’ or ‘holdout’.

raw_data : dict

All of the raw data for the Confusion Chart

confusion_matrix : list of list

The NxN confusion matrix

classes : list

The names of each of the classes

class_metrics : list of dicts

List of dicts with schema described as ClassMetrics above.

source_model_id : str

ID of the model this Confusion chart represents; in some cases, insights from the parent of a frozen model may be used
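
Example (a minimal sketch assuming chart is an already retrieved ConfusionChart instance; the keys follow the ClassMetrics schema described above):

for metrics in chart.class_metrics:
    print(metrics['class_name'], metrics['f1'], metrics['recall'], metrics['precision'])
print(chart.confusion_matrix)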

Credentials

class datarobot.models.Credential(credential_id=None, name=None, credential_type=None, creation_date=None, description=None)
classmethod list()

Returns list of available credentials.

Returns:
credentials : list of Credential instances

contains a list of available credentials.

Examples

>>> import datarobot as dr
>>> data_sources = dr.Credential.list()
>>> data_sources
[
    Credential('5e429d6ecf8a5f36c5693e03', 'my_s3_cred', 's3'),
    Credential('5e42cc4dcf8a5f3256865840', 'my_jdbc_cred', 'jdbc'),
]
classmethod get(credential_id)

Gets the Credential.

Parameters:
credential_id : str

the identifier of the credential.

Returns:
credential : Credential

the requested credential.

Examples

>>> import datarobot as dr
>>> cred = dr.Credential.get('5a8ac9ab07a57a0001be501f')
>>> cred
Credential('5e429d6ecf8a5f36c5693e03', 'my_s3_cred', 's3'),
delete()

Deletes the Credential from the store.

Parameters:
credential_id : str

the identifier of the credential.

Returns:
credential : Credential

the requested credential.

Examples

>>> import datarobot as dr
>>> cred = dr.Credential.get('5a8ac9ab07a57a0001be501f')
>>> cred.delete()
classmethod create_basic(name, user, password, description=None)

Creates the credentials.

Parameters:
name : str

the name to use for this set of credentials.

user : str

the username to store for this set of credentials.

password : str

the password to store for this set of credentials.

description : str, optional

the description to use for this set of credentials.

Returns:
credential : Credential

the created credential.

Examples

>>> import datarobot as dr
>>> cred = dr.Credential.create_basic(
...     name='my_basic_cred',
...     user='username',
...     password='password',
... )
>>> cred
Credential('5e429d6ecf8a5f36c5693e03', 'my_basic_cred', 'basic'),
classmethod create_oauth(name, token, refresh_token, description=None)

Creates the OAUTH credentials.

Parameters:
name : str

the name to use for this set of credentials.

token: str

the OAUTH token

refresh_token: str

the OAUTH refresh token

description : str, optional

the description to use for this set of credentials.

Returns:
credential : Credential

the created credential.

Examples

>>> import datarobot as dr
>>> cred = dr.Credential.create_oauth(
...     name='my_oauth_cred',
...     token='XXX',
...     refresh_token='YYY',
... )
>>> cred
Credential('5e429d6ecf8a5f36c5693e03', 'my_oauth_cred', 'oauth'),
classmethod create_s3(name, aws_access_key_id=None, aws_secret_access_key=None, aws_session_token=None, description=None)

Creates the S3 credentials.

Parameters:
name : str

the name to use for this set of credentials.

aws_access_key_id : str, optional

the AWS access key id.

aws_secret_access_key : str, optional

the AWS secret access key.

aws_session_token : str, optional

the AWS session token.

description : str, optional

the description to use for this set of credentials.

Returns:
credential : Credential

the created credential.

Examples

>>> import datarobot as dr
>>> cred = dr.Credential.create_s3(
...     name='my_s3_cred',
...     aws_access_key_id='XXX',
...     aws_secret_access_key='YYY',
...     aws_session_token='ZZZ',
... )
>>> cred
Credential('5e429d6ecf8a5f36c5693e03', 'my_s3_cred', 's3'),
classmethod create_azure(name, azure_connection_string, description=None)

Creates the Azure storage credentials.

Parameters:
name : str

the name to use for this set of credentials.

azure_connection_string : str

the Azure connection string.

description : str, optional

the description to use for this set of credentials.

Returns:
credential : Credential

the created credential.

Examples

>>> import datarobot as dr
>>> cred = dr.Credential.create_azure(
...     name='my_azure_cred',
...     azure_connection_string='XXX',
... )
>>> cred
Credential('5e429d6ecf8a5f36c5693e03', 'my_azure_cred', 'azure'),
classmethod create_gcp(name, gcp_key=None, description=None)

Creates the GCP credentials.

Parameters:
name : str

the name to use for this set of credentials.

gcp_key : str

the GCP key in json format.

description : str, optional

the description to use for this set of credentials.

Returns:
credential : Credential

the created credential.

Examples

>>> import datarobot as dr
>>> cred = dr.Credential.create_gcp(
...     name='my_gcp_cred',
...     gcp_key='XXX',
... )
>>> cred
Credential('5e429d6ecf8a5f36c5693e03', 'my_gcp_cred', 'gcp'),

Custom Models

class datarobot.models.custom_model_version.CustomModelFileItem(id, file_name, file_path, file_source, created_at=None)

A file item attached to a DataRobot custom model version.

New in version v2.21.

Attributes:
id: str

id of the file item

file_name: str

name of the file item

file_path: str

path of the file item

file_source: str

source of the file item

created_at: str, optional

ISO-8601 formatted timestamp of when the version was created

class datarobot.CustomInferenceImage(**kwargs)

An image of a custom model.

New in version v2.21.

Attributes:
id: str

image id

custom_model: dict

dict with 2 keys: id and name, where id is the ID of the custom model and name is the model name

custom_model_version: dict

dict with 2 keys: id and label, where id is the ID of the custom model version and label is the version label

execution_environment: dict

dict with 2 keys: id and name, where id is the ID of the execution environment and name is the environment name

execution_environment_version: dict

dict with 2 keys: id and label, where id is the ID of the execution environment version and label is the version label

latest_test: dict, optional

dict with 3 keys: id, status and completedAt, where id is the ID of the latest test, status is the testing status and completedAt is ISO-8601 formatted timestamp of when the testing was completed

classmethod create(custom_model_id, custom_model_version_id, environment_id, environment_version_id=None)

Create a custom model image.

New in version v2.21.

Parameters:
custom_model_id: str

the id of the custom model

custom_model_version_id: str

the id of the custom model version

environment_id: str

the id of the execution environment

environment_version_id: str, optional

the id of the execution environment version

Returns:
CustomInferenceImage

created custom model image

Raises:
datarobot.errors.ClientError

if the server responded with 4xx status

datarobot.errors.ServerError

if the server responded with 5xx status
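
Example (a minimal sketch; all IDs are hypothetical):

import datarobot as dr

image = dr.CustomInferenceImage.create(
    custom_model_id='custom-model-id',
    custom_model_version_id='custom-model-version-id',
    environment_id='execution-environment-id',
)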

classmethod list(testing_status=None, custom_model_id=None, custom_model_version_id=None, environment_id=None, environment_version_id=None)

List custom model images.

New in version v2.21.

Parameters:
testing_status: str, optional

the testing status to filter results by

custom_model_id: str, optional

the id of the custom model

custom_model_version_id: str, optional

the id of the custom model version

environment_id: str, optional

the id of the execution environment

environment_version_id: str, optional

the id of the execution environment version

Returns:
List[CustomModelImage]

a list of custom model images

Raises:
datarobot.errors.ClientError

if the server responded with 4xx status

datarobot.errors.ServerError

if the server responded with 5xx status

classmethod get(custom_model_image_id)

Get custom model image by id.

New in version v2.21.

Parameters:
custom_model_image_id: str

the id of the custom model image

Returns:
CustomInferenceImage

retrieved custom model image

Raises:
datarobot.errors.ClientError

if the server responded with 4xx status.

datarobot.errors.ServerError

if the server responded with 5xx status.

refresh()

Update custom inference image with the latest data from server.

New in version v2.21.

Raises:
datarobot.errors.ClientError

if the server responded with 4xx status

datarobot.errors.ServerError

if the server responded with 5xx status

get_feature_impact(with_metadata=False)

Get custom model feature impact.

New in version v2.21.

Parameters:
with_metadata : bool

The flag indicating if the result should include the metadata as well.

Returns:
feature_impacts : list of dict

The feature impact data. Each item is a dict with the keys ‘featureName’, ‘impactNormalized’, and ‘impactUnnormalized’, and ‘redundantWith’.

Raises:
datarobot.errors.ClientError

if the server responded with 4xx status.

datarobot.errors.ServerError

if the server responded with 5xx status.

calculate_feature_impact(max_wait=600)

Calculate custom model feature impact.

New in version v2.22.

Parameters:
max_wait: int, optional

max time to wait for feature impact calculation. If set to None, the method will return without waiting. Defaults to 10 min

Raises:
datarobot.errors.ClientError

if the server responded with 4xx status

datarobot.errors.ServerError

if the server responded with 5xx status
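
Example (a minimal sketch assuming image is an already retrieved CustomInferenceImage):

image.calculate_feature_impact(max_wait=600)
feature_impacts = image.get_feature_impact()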

class datarobot.CustomInferenceModel(*args, **kwargs)

A custom inference model.

New in version v2.21.

Attributes:
id: str

id of the custom model

name: str

name of the custom model

language: str

programming language of the custom model. Can be “python”, “r”, “java” or “other”

description: str

description of the custom model

target_type: datarobot.TARGET_TYPE

custom model target type. Can be datarobot.TARGET_TYPE.BINARY or datarobot.TARGET_TYPE.REGRESSION

latest_version: datarobot.CustomModelVersion or None

latest version of the custom model if the model has a latest version

deployments_count: int

number of deployments of the custom model

target_name: str

custom model target name

positive_class_label: str

for binary classification projects, a label of a positive class

negative_class_label: str

for binary classification projects, a label of a negative class

prediction_threshold: float

for binary classification projects, a threshold used for predictions

training_data_assignment_in_progress: bool

flag describing if training data assignment is in progress

training_dataset_id: str, optional

id of a dataset assigned to the custom model

training_dataset_version_id: str, optional

id of a dataset version assigned to the custom model

training_data_file_name: str, optional

name of assigned training data file

training_data_partition_column: str, optional

name of a partition column in a training dataset assigned to the custom model

created_by: str

username of the user who created the custom model

updated_at: str

ISO-8601 formatted timestamp of when the custom model was updated

created_at: str

ISO-8601 formatted timestamp of when the custom model was created

classmethod list(is_deployed=None, search_for=None, order_by=None)

List custom inference models available to the user.

New in version v2.21.

Parameters:
is_deployed: bool, optional

flag for filtering custom inference models. If set to True, only deployed custom inference models are returned. If set to False, only not deployed custom inference models are returned

search_for: str, optional

string for filtering custom inference models - only custom inference models that contain the string in name or description will be returned. If not specified, all custom models will be returned

order_by: str, optional

property to sort custom inference models by. Supported properties are “created” and “updated”. Prefix the attribute name with a dash to sort in descending order, e.g. order_by=’-created’. By default, the order_by parameter is None which will result in custom models being returned in order of creation time descending

Returns:
List[CustomInferenceModel]

a list of custom inference models.

Raises:
datarobot.errors.ClientError

if the server responded with 4xx status

datarobot.errors.ServerError

if the server responded with 5xx status

classmethod get(custom_model_id)

Get custom inference model by id.

New in version v2.21.

Parameters:
custom_model_id: str

id of the custom inference model

Returns:
CustomInferenceModel

retrieved custom inference model

Raises:
datarobot.errors.ClientError

if the server responded with 4xx status.

datarobot.errors.ServerError

if the server responded with 5xx status.

download_latest_version(file_path)

Download the latest custom inference model version.

New in version v2.21.

Parameters:
file_path: str

path to create a file with custom model version content

Raises:
datarobot.errors.ClientError

if the server responded with 4xx status.

datarobot.errors.ServerError

if the server responded with 5xx status.

classmethod create(name, target_type, target_name, language=None, description=None, positive_class_label=None, negative_class_label=None, prediction_threshold=None)

Create a custom inference model.

New in version v2.21.

Parameters:
name: str

name of the custom inference model

target_type: datarobot.TARGET_TYPE

target type of the custom inference model. Can be datarobot.TARGET_TYPE.BINARY or datarobot.TARGET_TYPE.REGRESSION

language: str, optional

programming language of the custom learning model

description: str, optional

description of the custom learning model

positive_class_label: str, optional

custom inference model positive class label

negative_class_label: str, optional

custom inference model negative class label

prediction_threshold: float, optional

custom inference model prediction threshold

Returns:
CustomInferenceModel

created a custom inference model

Raises:
datarobot.errors.ClientError

if the server responded with 4xx status

datarobot.errors.ServerError

if the server responded with 5xx status
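
Example (a minimal sketch; the model name and target column are illustrative):

import datarobot as dr

custom_model = dr.CustomInferenceModel.create(
    name='my custom model',
    target_type=dr.TARGET_TYPE.REGRESSION,
    target_name='claim_cost',
    language='python',
)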

classmethod copy_custom_model(custom_model_id)

Create a custom inference model by copying existing one.

New in version v2.21.

Parameters:
custom_model_id: str

id of the custom inference model to copy

Returns:
CustomInferenceModel

created a custom inference model

Raises:
datarobot.errors.ClientError

if the server responded with 4xx status

datarobot.errors.ServerError

if the server responded with 5xx status

update(name=None, language=None, description=None, target_name=None, positive_class_label=None, negative_class_label=None, prediction_threshold=None)

Update custom inference model properties.

New in version v2.21.

Parameters:
name: str, optional

new custom inference model name

language: str, optional

new custom inference model programming language

description: str, optional

new custom inference model description

target_name: str, optional

new custom inference model target name

positive_class_label: str, optional

new custom inference model positive class label

negative_class_label: str, optional

new custom inference model negative class label

prediction_threshold: float, optional

new custom inference model prediction threshold

Raises:
datarobot.errors.ClientError

if the server responded with 4xx status.

datarobot.errors.ServerError

if the server responded with 5xx status.

refresh()

Update custom inference model with the latest data from server.

New in version v2.21.

Raises:
datarobot.errors.ClientError

if the server responded with 4xx status

datarobot.errors.ServerError

if the server responded with 5xx status

delete()

Delete custom inference model.

New in version v2.21.

Raises:
datarobot.errors.ClientError

if the server responded with 4xx status

datarobot.errors.ServerError

if the server responded with 5xx status

assign_training_data(dataset_id, partition_column=None, max_wait=600)

Assign training data to the custom inference model.

New in version v2.21.

Parameters:
dataset_id: str

the id of the training dataset to be assigned

partition_column: str, optional

name of a partition column in the training dataset

max_wait: int, optional

max time to wait for a training data assignment. If set to None, the method will return without waiting. Defaults to 10 min

Raises:
datarobot.errors.ClientError

if the server responded with 4xx status

datarobot.errors.ServerError

if the server responded with 5xx status
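
Example (a minimal sketch assuming custom_model is a CustomInferenceModel; the dataset ID is hypothetical):

custom_model.assign_training_data('training-dataset-id', max_wait=600)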

class datarobot.CustomModelTest(**kwargs)

A custom model test.

New in version v2.21.

Attributes:
id: str

test id

dataset_id: str

id of a dataset used for testing

dataset_version_id: str

id of a dataset version used for testing

custom_model_image_id: str

id of a custom model image

image_type: str

the type of the image, either CUSTOM_MODEL_IMAGE_TYPE.CUSTOM_MODEL_IMAGE if the testing attempt is using a CustomModelImage as its model or CUSTOM_MODEL_IMAGE_TYPE.CUSTOM_MODEL_VERSION if the testing attempt is using a CustomModelVersion with dependency management

overall_status: str

a string representing testing status. Status can be one of:

  • ‘not_tested’: the check not run
  • ‘failed’: the check failed
  • ‘succeeded’: the check succeeded
  • ‘warning’: the check resulted in a warning, or in non-critical failure
  • ‘in_progress’: the check is in progress

detailed_status: dict

detailed testing status - maps the testing types to their status and message. The keys of the dict are one of ‘errorCheck’, ‘nullValueImputation’, ‘longRunningService’, ‘sideEffects’. The values are dict with ‘message’ and ‘status’ keys.

created_by: str

a user who created a test

completed_at: str, optional

ISO-8601 formatted timestamp of when the test has completed

created_at: str, optional

ISO-8601 formatted timestamp of when the version was created

classmethod create(custom_model_id, custom_model_version_id, dataset_id, environment_id=None, environment_version_id=None, max_wait=600)

Create and start a custom model test.

New in version v2.21.

Parameters:
custom_model_id: str

the id of the custom model

custom_model_version_id: str

the id of the custom model version

dataset_id: str

the id of the testing dataset

environment_id: str, optional

the id of the execution environment. If specified, the environment will be used as is; if the custom model version has dependencies, they will not be installed at runtime.

environment_version_id: str, optional

the id of the execution environment version

max_wait: int, optional

max time to wait for test completion. If set to None, the method will return without waiting.

Returns:
CustomModelTest

created custom model test

Raises:
datarobot.errors.ClientError

if the server responded with 4xx status

datarobot.errors.ServerError

if the server responded with 5xx status
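
Example (a minimal sketch; all IDs are hypothetical):

import datarobot as dr

test = dr.CustomModelTest.create(
    custom_model_id='custom-model-id',
    custom_model_version_id='custom-model-version-id',
    dataset_id='testing-dataset-id',
)
print(test.overall_status)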

classmethod list(custom_model_id)

List custom model tests.

New in version v2.21.

Parameters:
custom_model_id: str

the id of the custom model

Returns:
List[CustomModelTest]

a list of custom model tests

Raises:
datarobot.errors.ClientError

if the server responded with 4xx status

datarobot.errors.ServerError

if the server responded with 5xx status

classmethod get(custom_model_test_id)

Get custom model test by id.

New in version v2.21.

Parameters:
custom_model_test_id: str

the id of the custom model test

Returns:
CustomModelTest

retrieved custom model test

Raises:
datarobot.errors.ClientError

if the server responded with 4xx status.

datarobot.errors.ServerError

if the server responded with 5xx status.

get_log()

Get log of a custom model test.

New in version v2.21.

Raises:
datarobot.errors.ClientError

if the server responded with 4xx status

datarobot.errors.ServerError

if the server responded with 5xx status

get_log_tail()

Get log tail of a custom model test.

New in version v2.21.

Raises:
datarobot.errors.ClientError

if the server responded with 4xx status

datarobot.errors.ServerError

if the server responded with 5xx status

cancel()

Cancel custom model test that is in progress.

New in version v2.21.

Raises:
datarobot.errors.ClientError

if the server responded with 4xx status

datarobot.errors.ServerError

if the server responded with 5xx status

refresh()

Update custom model test with the latest data from server.

New in version v2.21.

Raises:
datarobot.errors.ClientError

if the server responded with 4xx status

datarobot.errors.ServerError

if the server responded with 5xx status

class datarobot.CustomModelVersion(**kwargs)

A version of a DataRobot custom model.

New in version v2.21.

Attributes:
id: str

id of the custom model version

custom_model_id: str

id of the custom model

version_minor: int

a minor version number of custom model version

version_major: int

a major version number of custom model version

is_frozen: bool

a flag if the custom model version is frozen

items: List[CustomModelFileItem]

a list of file items attached to the custom model version

base_environment_id: str

id of the environment to use with the model

label: str, optional

short human readable string to label the version

description: str, optional

custom model version description

created_at: str, optional

ISO-8601 formatted timestamp of when the version was created

dependencies: List[CustomDependency]

the parsed dependencies of the custom model version if the version has a valid requirements.txt file

classmethod create_clean(custom_model_id, base_environment_id, is_major_update=True, folder_path=None, files=None)

Create a custom model version without files from previous versions.

New in version v2.21.

Parameters:
custom_model_id: str

the id of the custom model

base_environment_id: str

the id of the base environment to use with the custom model version

is_major_update: bool

the flag defining whether the new custom model version will be a major (True) or a minor (False) version. Defaults to True

folder_path: str, optional

the path to a folder containing files to be uploaded. Each file in the folder is uploaded under its path relative to the folder path

files: list, optional

the list of tuples, where each tuple contains the local filesystem path and the path the file should be placed at within the model. If the list contains strings, the file basenames are used as the in-model paths. Example: [(“/home/user/Documents/myModel/file1.txt”, “file1.txt”), (“/home/user/Documents/myModel/folder/file2.txt”, “folder/file2.txt”)] or [“/home/user/Documents/myModel/file1.txt”, “/home/user/Documents/myModel/folder/file2.txt”]

Returns:
CustomModelVersion

created custom model version

Raises:
datarobot.errors.ClientError

if the server responded with 4xx status

datarobot.errors.ServerError

if the server responded with 5xx status
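
Examples

A minimal sketch of creating a fresh version from a local folder; the ids and path are placeholders:

import datarobot as dr

version = dr.CustomModelVersion.create_clean(
    custom_model_id='custom-model-id',
    base_environment_id='base-environment-id',
    folder_path='/home/user/Documents/myModel',  # files are uploaded relative to this folder
)
print(version.id, version.version_major, version.version_minor)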

classmethod create_from_previous(custom_model_id, base_environment_id, is_major_update=True, folder_path=None, files=None, files_to_delete=None)

Create a custom model version containing files from a previous version.

New in version v2.21.

Parameters:
custom_model_id: str

the id of the custom model

base_environment_id: str

the id of the base environment to use with the custom model version

is_major_update: bool, optional

the flag defining whether the new custom model version will be a major (True) or a minor (False) version. Defaults to True

folder_path: str, optional

the path to a folder containing files to be uploaded. Each file in the folder is uploaded under its path relative to the folder path

files: list, optional

the list of tuples, where each tuple contains the local filesystem path and the path the file should be placed at within the model. If the list contains strings, the file basenames are used as the in-model paths. Example: [(“/home/user/Documents/myModel/file1.txt”, “file1.txt”), (“/home/user/Documents/myModel/folder/file2.txt”, “folder/file2.txt”)] or [“/home/user/Documents/myModel/file1.txt”, “/home/user/Documents/myModel/folder/file2.txt”]

files_to_delete: list, optional

the list of file item ids to be deleted. Example: [“5ea95f7a4024030aba48e4f9”, “5ea6b5da402403181895cc51”]

Returns:
CustomModelVersion

created custom model version

Raises:
datarobot.errors.ClientError

if the server responded with 4xx status

datarobot.errors.ServerError

if the server responded with 5xx status
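
Examples

A minimal sketch of creating a new minor version that keeps the previous version's files, replaces one file, and deletes another; the ids are placeholders:

import datarobot as dr

version = dr.CustomModelVersion.create_from_previous(
    custom_model_id='custom-model-id',
    base_environment_id='base-environment-id',
    is_major_update=False,
    files=[('/home/user/Documents/myModel/custom.py', 'custom.py')],
    files_to_delete=['file-item-id-to-delete'],
)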

classmethod list(custom_model_id)

List custom model versions.

New in version v2.21.

Parameters:
custom_model_id: str

the id of the custom model

Returns:
List[CustomModelVersion]

a list of custom model versions

Raises:
datarobot.errors.ClientError

if the server responded with 4xx status

datarobot.errors.ServerError

if the server responded with 5xx status

classmethod get(custom_model_id, custom_model_version_id)

Get custom model version by id.

New in version v2.21.

Parameters:
custom_model_id: str

the id of the custom model

custom_model_version_id: str

the id of the custom model version to retrieve

Returns:
CustomModelVersion

retrieved custom model version

Raises:
datarobot.errors.ClientError

if the server responded with 4xx status.

datarobot.errors.ServerError

if the server responded with 5xx status.

download(file_path)

Download custom model version.

New in version v2.21.

Parameters:
file_path: str

path to create a file with custom model version content

Raises:
datarobot.errors.ClientError

if the server responded with 4xx status.

datarobot.errors.ServerError

if the server responded with 5xx status.

update(description)

Update custom model version properties.

New in version v2.21.

Parameters:
description: str

new custom model version description

Raises:
datarobot.errors.ClientError

if the server responded with 4xx status.

datarobot.errors.ServerError

if the server responded with 5xx status.

refresh()

Update custom model version with the latest data from server.

New in version v2.21.

Raises:
datarobot.errors.ClientError

if the server responded with 4xx status

datarobot.errors.ServerError

if the server responded with 5xx status

class datarobot.CustomModelVersionDependencyBuild(**kwargs)

Metadata about a DataRobot custom model version’s dependency build

New in version v2.22.

Attributes:
custom_model_id: str

id of the custom model

custom_model_version_id: str

id of the custom model version

build_status: str

the status of the custom model version’s dependency build

started_at: str

ISO-8601 formatted timestamp of when the build was started

completed_at: str, optional

ISO-8601 formatted timestamp of when the build has completed

classmethod get_build_info(custom_model_id, custom_model_version_id)

Retrieve information about a custom model version’s dependency build

New in version v2.22.

Parameters:
custom_model_id: str

the id of the custom model

custom_model_version_id: str

the id of the custom model version

Returns:
CustomModelVersionDependencyBuild

the dependency build information

classmethod start_build(custom_model_id, custom_model_version_id, max_wait=600)

Start the dependency build for a custom model version.

New in version v2.22.

Parameters:
custom_model_id: str

the id of the custom model

custom_model_version_id: str

the id of the custom model version

max_wait: int, optional

max time to wait for build completion. If set to None, the method will return without waiting.
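
Examples

A minimal sketch of starting a dependency build and then reading its metadata; the ids are placeholders:

import datarobot as dr

# start the build and wait up to 10 minutes for completion
dr.CustomModelVersionDependencyBuild.start_build(
    custom_model_id='custom-model-id',
    custom_model_version_id='custom-model-version-id',
    max_wait=600,
)

# retrieve the build information afterwards
build = dr.CustomModelVersionDependencyBuild.get_build_info(
    'custom-model-id', 'custom-model-version-id'
)
print(build.build_status, build.started_at, build.completed_at)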

get_log()

Get log of a custom model version dependency build.

New in version v2.22.

Raises:
datarobot.errors.ClientError

if the server responded with 4xx status

datarobot.errors.ServerError

if the server responded with 5xx status

cancel()

Cancel custom model version dependency build that is in progress.

New in version v2.22.

Raises:
datarobot.errors.ClientError

if the server responded with 4xx status

datarobot.errors.ServerError

if the server responded with 5xx status

refresh()

Update custom model version dependency build with the latest data from server.

New in version v2.22.

Raises:
datarobot.errors.ClientError

if the server responded with 4xx status

datarobot.errors.ServerError

if the server responded with 5xx status

class datarobot.ExecutionEnvironment(**kwargs)

An execution environment entity.

New in version v2.21.

Attributes:
id: str

the id of the execution environment

name: str

the name of the execution environment

description: str, optional

the description of the execution environment

programming_language: str, optional

the programming language of the execution environment. Can be “python”, “r”, “java” or “other”

is_public: bool, optional

public accessibility of the environment; visible only to admin users

created_at: str, optional

ISO-8601 formatted timestamp of when the execution environment version was created

latest_version: ExecutionEnvironmentVersion, optional

the latest version of the execution environment

classmethod create(name, description=None, programming_language=None)

Create an execution environment.

New in version v2.21.

Parameters:
name: str

execution environment name

description: str, optional

execution environment description

programming_language: str, optional

programming language of the environment to be created. Can be “python”, “r”, “java” or “other”. Default value - “other”

Returns:
ExecutionEnvironment

created execution environment

Raises:
datarobot.errors.ClientError

if the server responded with 4xx status

datarobot.errors.ServerError

if the server responded with 5xx status

classmethod list(search_for=None)

List execution environments available to the user.

New in version v2.21.

Parameters:
search_for: str, optional

a string for filtering execution environments - only execution environments whose name or description contains the string will be returned.

Returns:
List[ExecutionEnvironment]

a list of execution environments.

Raises:
datarobot.errors.ClientError

if the server responded with 4xx status

datarobot.errors.ServerError

if the server responded with 5xx status

classmethod get(execution_environment_id)

Get execution environment by its id.

New in version v2.21.

Parameters:
execution_environment_id: str

ID of the execution environment to retrieve

Returns:
ExecutionEnvironment

retrieved execution environment

Raises:
datarobot.errors.ClientError

if the server responded with 4xx status

datarobot.errors.ServerError

if the server responded with 5xx status

delete()

Delete execution environment.

New in version v2.21.

Raises:
datarobot.errors.ClientError

if the server responded with 4xx status

datarobot.errors.ServerError

if the server responded with 5xx status

update(name=None, description=None)

Update execution environment properties.

New in version v2.21.

Parameters:
name: str, optional

new execution environment name

description: str, optional

new execution environment description

Raises:
datarobot.errors.ClientError

if the server responded with 4xx status

datarobot.errors.ServerError

if the server responded with 5xx status

refresh()

Update execution environment with the latest data from server.

New in version v2.21.

Raises:
datarobot.errors.ClientError

if the server responded with 4xx status

datarobot.errors.ServerError

if the server responded with 5xx status

class datarobot.ExecutionEnvironmentVersion(**kwargs)

A version of a DataRobot execution environment.

New in version v2.21.

Attributes:
id: str

the id of the execution environment version

environment_id: str

the id of the execution environment the version belongs to

build_status: str

the status of the execution environment version build

label: str, optional

the label of the execution environment version

description: str, optional

the description of the execution environment version

created_at: str, optional

ISO-8601 formatted timestamp of when the execution environment version was created

classmethod create(execution_environment_id, docker_context_path, label=None, description=None, max_wait=600)

Create an execution environment version.

New in version v2.21.

Parameters:
execution_environment_id: str

the id of the execution environment

docker_context_path: str

the path to a docker context archive or folder

label: str, optional

short human readable string to label the version

description: str, optional

execution environment version description

max_wait: int, optional

max time to wait for a final build status (“success” or “failed”). If set to None, the method will return without waiting.

Returns:
ExecutionEnvironmentVersion

created execution environment version

Raises:
datarobot.errors.AsyncTimeoutError

if version did not reach final state during timeout seconds

datarobot.errors.ClientError

if the server responded with 4xx status

datarobot.errors.ServerError

if the server responded with 5xx status
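
Examples

A minimal sketch that creates an execution environment and builds a version from a local Docker context; the name, description, and path are placeholders:

import datarobot as dr

environment = dr.ExecutionEnvironment.create(
    name='Python 3 custom environment',
    description='Environment for custom scikit-learn models',
    programming_language='python',
)

version = dr.ExecutionEnvironmentVersion.create(
    execution_environment_id=environment.id,
    docker_context_path='/home/user/Documents/my_environment',  # folder or archive
    max_wait=600,
)
print(version.build_status)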

classmethod list(execution_environment_id, build_status=None)

List execution environment versions available to the user.

New in version v2.21.

Parameters:
execution_environment_id: str

the id of the execution environment

build_status: str, optional

build status of the execution environment version to filter by. See datarobot.enums.EXECUTION_ENVIRONMENT_VERSION_BUILD_STATUS for valid options

Returns:
List[ExecutionEnvironmentVersion]

a list of execution environment versions.

Raises:
datarobot.errors.ClientError

if the server responded with 4xx status

datarobot.errors.ServerError

if the server responded with 5xx status

classmethod get(execution_environment_id, version_id)

Get execution environment version by id.

New in version v2.21.

Parameters:
execution_environment_id: str

the id of the execution environment

version_id: str

the id of the execution environment version to retrieve

Returns:
ExecutionEnvironmentVersion

retrieved execution environment version

Raises:
datarobot.errors.ClientError

if the server responded with 4xx status.

datarobot.errors.ServerError

if the server responded with 5xx status.

download(file_path)

Download execution environment version.

New in version v2.21.

Parameters:
file_path: str

path to create a file with execution environment version content

Returns:
ExecutionEnvironmentVersion

retrieved execution environment version

Raises:
datarobot.errors.ClientError

if the server responded with 4xx status.

datarobot.errors.ServerError

if the server responded with 5xx status.

get_build_log()

Get execution environment version build log and error.

New in version v2.21.

Returns:
Tuple[str, str]

the retrieved execution environment version build log and error. If there is no build error, None is returned in its place.

Raises:
datarobot.errors.ClientError

if the server responded with 4xx status.

datarobot.errors.ServerError

if the server responded with 5xx status.

refresh()

Update execution environment version with the latest data from server.

New in version v2.21.

Raises:
datarobot.errors.ClientError

if the server responded with 4xx status

datarobot.errors.ServerError

if the server responded with 5xx status

Database Connectivity

class datarobot.DataDriver(id=None, creator=None, base_names=None, class_name=None, canonical_name=None)

A data driver

Attributes:
id : str

the id of the driver.

class_name : str

the Java class name for the driver.

canonical_name : str

the user-friendly name of the driver.

creator : str

the id of the user who created the driver.

base_names : list of str

a list of the file name(s) of the jar files.

classmethod list()

Returns list of available drivers.

Returns:
drivers : list of DataDriver instances

contains a list of available drivers.

Examples

>>> import datarobot as dr
>>> drivers = dr.DataDriver.list()
>>> drivers
[DataDriver('mysql'), DataDriver('RedShift'), DataDriver('PostgreSQL')]
classmethod get(driver_id)

Gets the driver.

Parameters:
driver_id : str

the identifier of the driver.

Returns:
driver : DataDriver

the required driver.

Examples

>>> import datarobot as dr
>>> driver = dr.DataDriver.get('5ad08a1889453d0001ea7c5c')
>>> driver
DataDriver('PostgreSQL')
classmethod create(class_name, canonical_name, files)

Creates the driver. Only available to admin users.

Parameters:
class_name : str

the Java class name for the driver.

canonical_name : str

the user-friendly name of the driver.

files : list of str

a list of the local filesystem paths of the driver’s jar file(s).

Returns:
driver : DataDriver

the created driver.

Raises:
ClientError

raised if the user has not been granted the Can manage JDBC database drivers feature

Examples

>>> import datarobot as dr
>>> driver = dr.DataDriver.create(
...     class_name='org.postgresql.Driver',
...     canonical_name='PostgreSQL',
...     files=['/tmp/postgresql-42.2.2.jar']
... )
>>> driver
DataDriver('PostgreSQL')
update(class_name=None, canonical_name=None)

Updates the driver. Only available to admin users.

Parameters:
class_name : str

the Java class name for the driver.

canonical_name : str

the user-friendly name of the driver.

Raises:
ClientError

raised if the user has not been granted the Can manage JDBC database drivers feature

Examples

>>> import datarobot as dr
>>> driver = dr.DataDriver.get('5ad08a1889453d0001ea7c5c')
>>> driver.canonical_name
'PostgreSQL'
>>> driver.update(canonical_name='postgres')
>>> driver.canonical_name
'postgres'
delete()

Removes the driver. Only available to admin users.

Raises:
ClientError

raised if the user has not been granted the Can manage JDBC database drivers feature

class datarobot.DataStore(data_store_id=None, data_store_type=None, canonical_name=None, creator=None, updated=None, params=None, role=None)

A data store. Represents a database.

Attributes:
id : str

the id of the data store.

data_store_type : str

the type of data store.

canonical_name : str

the user-friendly name of the data store.

creator : str

the id of the user who created the data store.

updated : datetime.datetime

the time of the last update

params : DataStoreParameters

a list specifying data store parameters.

classmethod list()

Returns list of available data stores.

Returns:
data_stores : list of DataStore instances

contains a list of available data stores.

Examples

>>> import datarobot as dr
>>> data_stores = dr.DataStore.list()
>>> data_stores
[DataStore('Demo'), DataStore('Airlines')]
classmethod get(data_store_id)

Gets the data store.

Parameters:
data_store_id : str

the identifier of the data store.

Returns:
data_store : DataStore

the required data store.

Examples

>>> import datarobot as dr
>>> data_store = dr.DataStore.get('5a8ac90b07a57a0001be501e')
>>> data_store
DataStore('Demo')
classmethod create(data_store_type, canonical_name, driver_id, jdbc_url)

Creates the data store.

Parameters:
data_store_type : str

the type of data store.

canonical_name : str

the user-friendly name of the data store.

driver_id : str

the identifier of the DataDriver.

jdbc_url : str

the full JDBC url, for example jdbc:postgresql://my.dbaddress.org:5432/my_db.

Returns:
data_store : DataStore

the created data store.

Examples

>>> import datarobot as dr
>>> data_store = dr.DataStore.create(
...     data_store_type='jdbc',
...     canonical_name='Demo DB',
...     driver_id='5a6af02eb15372000117c040',
...     jdbc_url='jdbc:postgresql://my.db.address.org:5432/perftest'
... )
>>> data_store
DataStore('Demo DB')
update(canonical_name=None, driver_id=None, jdbc_url=None)

Updates the data store.

Parameters:
canonical_name : str

optional, the user-friendly name of the data store.

driver_id : str

optional, the identifier of the DataDriver.

jdbc_url : str

optional, the full JDBC url, for example jdbc:postgresql://my.dbaddress.org:5432/my_db.

Examples

>>> import datarobot as dr
>>> data_store = dr.DataStore.get('5ad5d2afef5cd700014d3cae')
>>> data_store
DataStore('Demo DB')
>>> data_store.update(canonical_name='Demo DB updated')
>>> data_store
DataStore('Demo DB updated')
delete()

Removes the DataStore

test(username, password)

Tests database connection.

Parameters:
username : str

the username for database authentication.

password : str

the password for database authentication. The password is encrypted on the server side and never saved or stored

Returns:
message : dict

message with status.

Examples

>>> import datarobot as dr
>>> data_store = dr.DataStore.get('5ad5d2afef5cd700014d3cae')
>>> data_store.test(username='db_username', password='db_password')
{'message': 'Connection successful'}
schemas(username, password)

Returns list of available schemas.

Parameters:
username : str

the username for database authentication.

password : str

the password for database authentication. The password is encrypted on the server side and never saved or stored

Returns:
response : dict

dict with database name and list of str - available schemas

Examples

>>> import datarobot as dr
>>> data_store = dr.DataStore.get('5ad5d2afef5cd700014d3cae')
>>> data_store.schemas(username='db_username', password='db_password')
{'catalog': 'perftest', 'schemas': ['demo', 'information_schema', 'public']}
tables(username, password, schema=None)

Returns list of available tables in schema.

Parameters:
username : str

optional, the username for database authentication.

password : str

optional, the password for database authentication. The password is encrypted on the server side and never saved or stored

schema : str

optional, the schema name.

Returns:
response : dict

dict with catalog name and tables info

Examples

>>> import datarobot as dr
>>> data_store = dr.DataStore.get('5ad5d2afef5cd700014d3cae')
>>> data_store.tables(username='db_username', password='db_password', schema='demo')
{'tables': [{'type': 'TABLE', 'name': 'diagnosis', 'schema': 'demo'}, {'type': 'TABLE',
'name': 'kickcars', 'schema': 'demo'}, {'type': 'TABLE', 'name': 'patient',
'schema': 'demo'}, {'type': 'TABLE', 'name': 'transcript', 'schema': 'demo'}],
'catalog': 'perftest'}
classmethod from_server_data(data, keep_attrs=None)

Instantiate an object of this class using the data directly from the server, meaning that the keys may have the wrong camel casing

Parameters:
data : dict

The directly translated dict of JSON from the server. No casing fixes have taken place

keep_attrs : list

List of the dotted namespace notations for attributes to keep within the object structure even if their values are None

get_access_list()

Retrieve what users have access to this data store

New in version v2.14.

Returns:
list of SharingAccess
share(access_list)

Modify the ability of users to access this data store

New in version v2.14.

Parameters:
access_list : list of SharingAccess

the modifications to make.

Raises:
datarobot.ClientError :

if you do not have permission to share this data store, if the user you’re sharing with doesn’t exist, if the same user appears multiple times in the access_list, or if these changes would leave the data store without an owner.

Examples

Transfer access to the data store from old_user@datarobot.com to new_user@datarobot.com

import datarobot as dr

new_access = dr.SharingAccess('new_user@datarobot.com',
                              dr.enums.SHARING_ROLE.OWNER, can_share=True)
access_list = [dr.SharingAccess('old_user@datarobot.com', None), new_access]

dr.DataStore.get('my-data-store-id').share(access_list)
class datarobot.DataSource(data_source_id=None, data_source_type=None, canonical_name=None, creator=None, updated=None, params=None, role=None)

A data source. Represents a data request.

Attributes:
id : str

the id of the data source.

type : str

the type of data source.

canonical_name : str

the user-friendly name of the data source.

creator : str

the id of the user who created the data source.

updated : datetime.datetime

the time of the last update.

params : DataSourceParameters

a list specifying data source parameters.

classmethod list()

Returns list of available data sources.

Returns:
data_sources : list of DataSource instances

contains a list of available data sources.

Examples

>>> import datarobot as dr
>>> data_sources = dr.DataSource.list()
>>> data_sources
[DataSource('Diagnostics'), DataSource('Airlines 100mb'), DataSource('Airlines 10mb')]
classmethod get(data_source_id)

Gets the data source.

Parameters:
data_source_id : str

the identifier of the data source.

Returns:
data_source : DataSource

the requested data source.

Examples

>>> import datarobot as dr
>>> data_source = dr.DataSource.get('5a8ac9ab07a57a0001be501f')
>>> data_source
DataSource('Diagnostics')
classmethod create(data_source_type, canonical_name, params)

Creates the data source.

Parameters:
data_source_type : str

the type of data source.

canonical_name : str

the user-friendly name of the data source.

params : DataSourceParameters

a list specifying data source parameters.

Returns:
data_source : DataSource

the created data source.

Examples

>>> import datarobot as dr
>>> params = dr.DataSourceParameters(
...     data_store_id='5a8ac90b07a57a0001be501e',
...     query='SELECT * FROM airlines10mb WHERE "Year" >= 1995;'
... )
>>> data_source = dr.DataSource.create(
...     data_source_type='jdbc',
...     canonical_name='airlines stats after 1995',
...     params=params
... )
>>> data_source
DataSource('airlines stats after 1995')
update(canonical_name=None, params=None)

Updates the data source.

Parameters:
canonical_name : str

optional, the user-friendly name of the data source.

params : DataSourceParameters

optional, the updated data source parameters.

Examples

>>> import datarobot as dr
>>> data_source = dr.DataSource.get('5ad840cc613b480001570953')
>>> data_source
DataSource('airlines stats after 1995')
>>> params = dr.DataSourceParameters(
...     query='SELECT * FROM airlines10mb WHERE "Year" >= 1990;'
... )
>>> data_source.update(
...     canonical_name='airlines stats after 1990',
...     params=params
... )
>>> data_source
DataSource('airlines stats after 1990')
delete()

Removes the DataSource

classmethod from_server_data(data, keep_attrs=None)

Instantiate an object of this class using the data directly from the server, meaning that the keys may have the wrong camel casing

Parameters:
data : dict

The directly translated dict of JSON from the server. No casing fixes have taken place

keep_attrs : list

List of the dotted namespace notations for attributes to keep within the object structure even if their values are None

get_access_list()

Retrieve what users have access to this data source

New in version v2.14.

Returns:
list of SharingAccess
share(access_list)

Modify the ability of users to access this data source

New in version v2.14.

Parameters:
access_list : list of SharingAccess

the modifications to make.

Raises:
datarobot.ClientError :

if you do not have permission to share this data source, if the user you’re sharing with doesn’t exist, if the same user appears multiple times in the access_list, or if these changes would leave the data source without an owner

Examples

Transfer access to the data source from old_user@datarobot.com to new_user@datarobot.com

import datarobot as dr

new_access = dr.SharingAccess('new_user@datarobot.com',
                              dr.enums.SHARING_ROLE.OWNER, can_share=True)
access_list = [dr.SharingAccess('old_user@datarobot.com', None), new_access]

dr.DataSource.get('my-data-source-id').share(access_list)
create_dataset(username=None, password=None, do_snapshot=None, persist_data_after_ingestion=None, categories=None, credential_id=None, use_kerberos=None)

Create a Dataset from this data source.

New in version v2.22.

Parameters:
username: string, optional

The username for database authentication.

password: string, optional

The password (in cleartext) for database authentication. The password will be encrypted on the server side within the scope of the HTTP request and never saved or stored.

do_snapshot: bool, optional

If unset, uses the server default: True. If true, creates a snapshot dataset; if false, creates a remote dataset. Creating snapshots from non-file sources requires an additional permission, Enable Create Snapshot Data Source.

persist_data_after_ingestion: bool, optional

If unset, uses the server default: True. If true, will enforce saving all data (for download and sampling) and will allow a user to view extended data profile (which includes data statistics like min/max/median/mean, histogram, etc.). If false, will not enforce saving data. The data schema (feature names and types) still will be available. Specifying this parameter to false and doSnapshot to true will result in an error.

categories: list[string], optional

An array of strings describing the intended use of the dataset. The current supported options are “TRAINING” and “PREDICTION”.

credential_id: string, optional

The ID of the set of credentials to use instead of user and password. Note that with this change, username and password will become optional.

use_kerberos: bool, optional

If unset, uses the server default: False. If true, use kerberos authentication for database authentication.

Returns:
response: Dataset

The Dataset created from the uploaded data
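
Examples

A minimal sketch of creating a snapshot Dataset from an existing data source; the id and credentials are placeholders:

import datarobot as dr

data_source = dr.DataSource.get('data-source-id')
dataset = data_source.create_dataset(
    username='db_username',
    password='db_password',
    do_snapshot=True,
    categories=['TRAINING'],
)
print(dataset.id, dataset.name)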

class datarobot.DataSourceParameters(data_store_id=None, table=None, schema=None, partition_column=None, query=None, fetch_size=None)

Data request configuration

Attributes:
data_store_id : str

the id of the DataStore.

table : str

optional, the name of specified database table.

schema : str

optional, the name of the schema associated with the table.

partition_column : str

optional, the name of the partition column.

query : str

optional, the user specified SQL query.

fetch_size : int

optional, a user specified fetch size in the range [1, 20000]. By default a fetchSize will be assigned to balance throughput and memory usage

Datasets

class datarobot.Dataset(dataset_id, version_id, name, categories, created_at, created_by, is_data_engine_eligible, is_latest_version, is_snapshot, processing_state, data_persisted=None, size=None, row_count=None)

Represents a Dataset returned from the api/v2/datasets/ endpoints.

Attributes:
id: string

The ID of this dataset

name: string

The name of this dataset in the catalog

is_latest_version: bool

Whether this dataset version is the latest version of this dataset

version_id: string

The object ID of the catalog_version the dataset belongs to

categories: list(string)

An array of strings describing the intended use of the dataset. The supported options are “TRAINING” and “PREDICTION”.

created_at: string

The date when the dataset was created

created_by: string

Username of the user who created the dataset

is_snapshot: bool

Whether the dataset version is an immutable snapshot of data which has previously been retrieved and saved to DataRobot

data_persisted: bool, optional

If true, user is allowed to view extended data profile (which includes data statistics like min/max/median/mean, histogram, etc.) and download data. If false, download is not allowed and only the data schema (feature names and types) will be available.

is_data_engine_eligible: bool

Whether this dataset can be a data source of a data engine query.

processing_state: string

Current ingestion process state of the dataset

row_count: int, optional

The number of rows in the dataset.

size: int, optional

The size of the dataset as a CSV in bytes.

classmethod create_from_file(file_path=None, filelike=None, categories=None, read_timeout=600, max_wait=600)

A blocking call that creates a new Dataset from a file. Returns when the dataset has been successfully uploaded and processed.

Warning: This function does not clean up its open files. If you pass a filelike, you are responsible for closing it. If you pass a file_path, this will create a file object from the file_path but will not close it.

Parameters:
file_path: string, optional

The path to the file. This will create a file object pointing to that file but will not close it.

filelike: file, optional

An open and readable file object.

categories: list[string], optional

An array of strings describing the intended use of the dataset. The current supported options are “TRAINING” and “PREDICTION”.

read_timeout: int, optional

The maximum number of seconds to wait for the server to respond indicating that the initial upload is complete

max_wait : int, optional

Time in seconds after which dataset creation is considered unsuccessful

Returns:
response: Dataset

A fully armed and operational Dataset

classmethod create_from_in_memory_data(data_frame=None, records=None, categories=None)

A blocking call that creates a new Dataset from in-memory data. Returns when the dataset has been successfully uploaded and processed.

The data can be either a pandas DataFrame or a list of dictionaries with identical keys.

Parameters:
data_frame: DataFrame, optional

The data frame to upload

records: list[dict], optional

A list of dictionaries with identical keys to upload

categories: list[string], optional

An array of strings describing the intended use of the dataset. The current supported options are “TRAINING” and “PREDICTION”.

Returns:
response: Dataset

The Dataset created from the uploaded data
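
Examples

A minimal sketch of uploading in-memory data, either as a pandas DataFrame or as a list of dictionaries with identical keys; the column names are placeholders:

import datarobot as dr
import pandas as pd

# upload a pandas DataFrame
df = pd.DataFrame({'feature': [1, 2, 3], 'target': [0, 1, 0]})
dataset = dr.Dataset.create_from_in_memory_data(data_frame=df)

# or upload the same data as a list of dictionaries with identical keys
records = [{'feature': 1, 'target': 0}, {'feature': 2, 'target': 1}]
dataset = dr.Dataset.create_from_in_memory_data(records=records)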

classmethod create_from_url(url, do_snapshot=None, persist_data_after_ingestion=None, categories=None)

A blocking call that creates a new Dataset from data stored at a url. Returns when the dataset has been successfully uploaded and processed.

Parameters:
url: string

The URL to use as the source of data for the dataset being created.

do_snapshot: bool, optional

If unset, uses the server default: True. If true, creates a snapshot dataset; if false, creates a remote dataset. Creating snapshots from non-file sources requires an additional permission, Enable Create Snapshot Data Source.

persist_data_after_ingestion: bool, optional

If unset, uses the server default: True. If true, will enforce saving all data (for download and sampling) and will allow a user to view extended data profile (which includes data statistics like min/max/median/mean, histogram, etc.). If false, will not enforce saving data. The data schema (feature names and types) still will be available. Specifying this parameter to false and doSnapshot to true will result in an error.

categories: list[string], optional

An array of strings describing the intended use of the dataset. The current supported options are “TRAINING” and “PREDICTION”.

Returns:
response: Dataset

The Dataset created from the uploaded data
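
Examples

A minimal sketch of creating a snapshot dataset from a file hosted at a URL; the URL is a placeholder:

import datarobot as dr

dataset = dr.Dataset.create_from_url(
    url='https://example.com/data/training_data.csv',
    do_snapshot=True,
    categories=['TRAINING'],
)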

classmethod create_from_data_source(data_source_id, username=None, password=None, do_snapshot=None, persist_data_after_ingestion=None, categories=None, credential_id=None, use_kerberos=None)

A blocking call that creates a new Dataset from data stored at a DataSource. Returns when the dataset has been successfully uploaded and processed.

New in version v2.22.

Parameters:
data_source_id: string

The ID of the DataSource to use as the source of data.

username: string, optional

The username for database authentication.

password: string, optional

The password (in cleartext) for database authentication. The password will be encrypted on the server side within the scope of the HTTP request and never saved or stored.

do_snapshot: bool, optional

If unset, uses the server default: True. If true, creates a snapshot dataset; if false, creates a remote dataset. Creating snapshots from non-file sources requires an additional permission, Enable Create Snapshot Data Source.

persist_data_after_ingestion: bool, optional

If unset, uses the server default: True. If true, will enforce saving all data (for download and sampling) and will allow a user to view extended data profile (which includes data statistics like min/max/median/mean, histogram, etc.). If false, will not enforce saving data. The data schema (feature names and types) still will be available. Specifying this parameter to false and doSnapshot to true will result in an error.

categories: list[string], optional

An array of strings describing the intended use of the dataset. The current supported options are “TRAINING” and “PREDICTION”.

credential_id: string, optional

The ID of the set of credentials to use instead of user and password. Note that with this change, username and password will become optional.

use_kerberos: bool, optional

If unset, uses the server default: False. If true, use kerberos authentication for database authentication.

Returns:
response: Dataset

The Dataset created from the uploaded data

classmethod get(dataset_id)

Get information about a dataset.

Parameters:
dataset_id : string

the id of the dataset

Returns:
dataset : Dataset

the queried dataset

classmethod delete(dataset_id)

Soft deletes a dataset. Once deleted, you cannot get it, list it, or perform any actions with it, except for un-deleting it.

Parameters:
dataset_id: string

The id of the dataset to mark for deletion

Returns:
None
classmethod un_delete(dataset_id)

Un-deletes a previously deleted dataset. If the dataset was not deleted, nothing happens.

Parameters:
dataset_id: string

The id of the dataset to un-delete

Returns:
None
classmethod list(category=None, filter_failed=None, order_by=None)

List all datasets a user can view.

Parameters:
category: string, optional

Optional. If specified, only dataset versions that have the specified category will be included in the results. Categories identify the intended use of the dataset; supported categories are “TRAINING” and “PREDICTION”.

filter_failed: bool, optional

If unset, uses the server default: False. Whether datasets that failed during import should be excluded from the results. If True invalid datasets will be excluded.

order_by: string, optional

If unset, uses the server default: “-created”. Sorting order which will be applied to catalog list, valid options are: - “created” – ascending order by creation datetime; - “-created” – descending order by creation datetime.

Returns:
list[Dataset]

a list of datasets the user can view

classmethod iterate(offset=None, limit=None, category=None, order_by=None, filter_failed=None)

Get an iterator for the requested datasets a user can view. This lazily retrieves results. It does not get the next page from the server until the current page is exhausted.

Parameters:
offset: int, optional

If set, this many results will be skipped

limit: int, optional

Specifies the size of each page retrieved from the server. If unset, uses the server default.

category: string, optional

Optional. If specified, only dataset versions that have the specified category will be included in the results. Categories identify the intended use of the dataset; supported categories are “TRAINING” and “PREDICTION”.

filter_failed: bool, optional

If unset, uses the server default: False. Whether datasets that failed during import should be excluded from the results. If True invalid datasets will be excluded.

order_by: string, optional

If unset, uses the server default: “-created”. Sorting order which will be applied to catalog list, valid options are: - “created” – ascending order by creation datetime; - “-created” – descending order by creation datetime.

Yields:
Dataset

An iterator of the datasets the user can view
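
Examples

A minimal sketch of lazily iterating over the training datasets a user can view, newest first:

import datarobot as dr

for dataset in dr.Dataset.iterate(category='TRAINING', order_by='-created'):
    print(dataset.id, dataset.name)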

update()

Updates the Dataset attributes in place with the latest information from the server.

Returns:
None
modify(name=None, categories=None)

Modifies the Dataset name and/or categories. Updates the object in place.

Parameters:
name: string, optional

The new name of the dataset

categories: list[string], optional

A list of strings describing the intended use of the dataset. The supported options are “TRAINING” and “PREDICTION”. If any categories were previously specified for the dataset, they will be overwritten.

Returns:
None
get_details()

Gets the details for this Dataset

Returns:
DatasetDetails
get_all_features(order_by=None)

Get a list of all the features for this dataset.

Parameters:
order_by: string, optional

If unset, uses the server default: ‘name’. How the features should be ordered. Can be ‘name’ or ‘featureType’.

Returns:
list[DatasetFeature]
iterate_all_features(offset=None, limit=None, order_by=None)

Get an iterator for the requested features of a dataset. This lazily retrieves results. It does not get the next page from the server until the current page is exhausted.

Parameters:
offset: int, optional

If set, this many results will be skipped.

limit: int, optional

Specifies the size of each page retrieved from the server. If unset, uses the server default.

order_by: string, optional

If unset, uses the server default: ‘name’. How the features should be ordered. Can be ‘name’ or ‘featureType’.

Yields:
DatasetFeature
get_featurelists()

Get DatasetFeaturelists created on this Dataset

Returns:
feature_lists: list[DatasetFeaturelist]
create_featurelist(name, features)

Create a new dataset featurelist

Parameters:
name : str

the name of the modeling featurelist to create. Names must be unique within the dataset, or the server will return an error.

features : list of str

the names of the features to include in the dataset featurelist. Each feature must be a dataset feature.

Returns:
featurelist : DatasetFeaturelist

the newly created featurelist

Examples

dataset = Dataset.get('1234deadbeeffeeddead4321')
dataset_features = dataset.get_all_features()
selected_features = [feat.name for feat in dataset_features][:5]  # select first five
new_flist = dataset.create_featurelist('Simple Features', selected_features)
get_file(file_path=None, filelike=None)

Retrieves all the originally uploaded data in CSV form. Writes it to either the file or a filelike object that can write bytes.

Only one of file_path or filelike can be provided and it must be provided as a keyword argument (i.e. file_path=’path-to-write-to’). If a file-like object is provided, the user is responsible for closing it when they are done.

The user must also have permission to download data.

Parameters:
file_path: string, optional

The destination to write the file to.

filelike: file, optional

A file-like object to write to. The object must be able to write bytes. The user is responsible for closing the object

Returns:
None
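
Examples

A minimal sketch of downloading the original data as CSV; only one of file_path or filelike may be provided, and a file-like object must be closed by the caller. The dataset id and output path are placeholders:

import datarobot as dr

dataset = dr.Dataset.get('dataset-id')

# write directly to a local path
dataset.get_file(file_path='local_copy.csv')

# or write to an open binary file object, closing it yourself
with open('local_copy.csv', 'wb') as f:
    dataset.get_file(filelike=f)
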
get_projects()

Retrieves the Dataset’s projects as ProjectLocation named tuples.

Returns:
locations: list[ProjectLocation]
create_project(project_name=None, user=None, password=None, credential_id=None, use_kerberos=None)

Create a datarobot.models.Project from this dataset

Parameters:
project_name: string, optional

The name of the project to be created. If not specified, will be “Untitled Project” for database connections, otherwise the project name will be based on the file used.

user: string, optional

The username for database authentication.

password: string, optional

The password (in cleartext) for database authentication. The password will be encrypted on the server side within the scope of the HTTP request and never saved or stored.

credential_id: string, optional

The ID of the set of credentials to use instead of user and password.

use_kerberos: bool, optional

Server default is False. If true, use kerberos authentication for database authentication.

Returns:
Project
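
Examples

A minimal sketch of starting a project from a catalog dataset; the dataset id and project name are placeholders:

import datarobot as dr

dataset = dr.Dataset.get('dataset-id')
project = dataset.create_project(project_name='My Project')
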
class datarobot.DatasetDetails(dataset_id, version_id, categories, created_by, created_at, data_source_type, error, is_latest_version, is_snapshot, is_data_engine_eligible, last_modification_date, last_modifier_full_name, name, uri, data_persisted=None, data_engine_query_id=None, data_source_id=None, description=None, eda1_modification_date=None, eda1_modifier_full_name=None, feature_count=None, feature_count_by_type=None, processing_state=None, row_count=None, size=None, tags=None)

Represents a detailed view of a Dataset. The to_dataset method creates a Dataset from this details view.

Attributes:
dataset_id: string

The ID of this dataset

name: string

The name of this dataset in the catalog

is_latest_version: bool

Whether this dataset version is the latest version of this dataset

version_id: string

The object ID of the catalog_version the dataset belongs to

categories: list(string)

An array of strings describing the intended use of the dataset. The supported options are “TRAINING” and “PREDICTION”.

created_at: string

The date when the dataset was created

created_by: string

Username of the user who created the dataset

is_snapshot: bool

Whether the dataset version is an immutable snapshot of data which has previously been retrieved and saved to DataRobot

data_persisted: bool, optional

If true, user is allowed to view extended data profile (which includes data statistics like min/max/median/mean, histogram, etc.) and download data. If false, download is not allowed and only the data schema (feature names and types) will be available.

is_data_engine_eligible: bool

Whether this dataset can be a data source of a data engine query.

processing_state: string

Current ingestion process state of the dataset

row_count: int, optional

The number of rows in the dataset.

size: int, optional

The size of the dataset as a CSV in bytes.

data_engine_query_id: string, optional

ID of the source data engine query

data_source_id: string, optional

ID of the datasource used as the source of the dataset

data_source_type: string

the type of the datasource that was used as the source of the dataset

description: string, optional

the description of the dataset

eda1_modification_date: string, optional

the ISO 8601 formatted date and time when the EDA1 for the dataset was updated

eda1_modifier_full_name: string, optional

the user who was the last to update EDA1 for the dataset

error: string

details of exception raised during ingestion process, if any

feature_count: int, optional

total number of features in the dataset

feature_count_by_type: list[FeatureTypeCount]

number of features in the dataset grouped by feature type

last_modification_date: string

the ISO 8601 formatted date and time when the dataset was last modified

last_modifier_full_name: string

full name of user who was the last to modify the dataset

tags: list[string]

list of tags attached to the item

uri: string

the uri to the datasource, for example:

  • ‘file_name.csv’
  • ‘jdbc:DATA_SOURCE_GIVEN_NAME/SCHEMA.TABLE_NAME’
  • ‘jdbc:DATA_SOURCE_GIVEN_NAME/<query>’ - for query based datasources
  • ‘https://s3.amazonaws.com/datarobot_test/kickcars-sample-200.csv’
  • etc.

classmethod get(dataset_id)

Get details for a Dataset from the server

Parameters:
dataset_id: str

The id for the Dataset from which to get details

Returns:
DatasetDetails
to_dataset()

Build a Dataset object from the information in this object

Returns:
Dataset

Deployment

class datarobot.Deployment(id=None, label=None, description=None, default_prediction_server=None, model=None, capabilities=None, prediction_usage=None, permissions=None, service_health=None, model_health=None, accuracy_health=None)

A deployment created from a DataRobot model.

Attributes:
id : str

the id of the deployment

label : str

the label of the deployment

description : str

the description of the deployment

default_prediction_server : dict

information on the default prediction server of the deployment

model : dict

information on the model of the deployment

capabilities : dict

information on the capabilities of the deployment

prediction_usage : dict

information on the prediction usage of the deployment

permissions : list

(New in version v2.18) user’s permissions on the deployment

service_health : dict

information on the service health of the deployment

model_health : dict

information on the model health of the deployment

accuracy_health : dict

information on the accuracy health of the deployment

classmethod create_from_learning_model(model_id, label, description=None, default_prediction_server_id=None)

Create a deployment from a DataRobot model.

New in version v2.17.

Parameters:
model_id : str

id of the DataRobot model to deploy

label : str

a human readable label of the deployment

description : str, optional

a human readable description of the deployment

default_prediction_server_id : str, optional

an identifier of a prediction server to be used as the default prediction server

Returns:
deployment : Deployment

The created deployment

Examples

from datarobot import Project, Deployment
project = Project.get('5506fcd38bd88f5953219da0')
model = project.get_models()[0]
deployment = Deployment.create_from_learning_model(model.id, 'New Deployment')
deployment
>>> Deployment('New Deployment')
classmethod create_from_custom_model_image(custom_model_image_id, label, description=None, default_prediction_server_id=None, max_wait=600)

Create a deployment from a DataRobot custom model image.

Parameters:
custom_model_image_id : str

id of the DataRobot custom model image to deploy

label : str

a human readable label of the deployment

description : str, optional

a human readable description of the deployment

default_prediction_server_id : str, optional

an identifier of a prediction server to be used as the default prediction server

max_wait : int, optional

seconds to wait for successful resolution of a deployment creation job. Deployment supports making predictions only after a deployment creating job has successfully finished

Returns:
deployment : Deployment

The created deployment

classmethod create_from_custom_model_version(custom_model_version_id, label, description=None, default_prediction_server_id=None, max_wait=600)

Create a deployment from a DataRobot custom model version.

Parameters:
custom_model_version_id : str

id of the DataRobot custom model version to deploy The version must have a base_environment_id.

label : str

a human readable label of the deployment

description : str, optional

a human readable description of the deployment

default_prediction_server_id : str, optional

an identifier of a prediction server to be used as the default prediction server

max_wait : int, optional

seconds to wait for successful resolution of a deployment creation job. Deployment supports making predictions only after a deployment creating job has successfully finished

Returns:
deployment : Deployment

The created deployment
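
Examples

A minimal sketch of deploying a custom model version; the ids are placeholders, and the version must have a base_environment_id:

from datarobot import Deployment

deployment = Deployment.create_from_custom_model_version(
    custom_model_version_id='custom-model-version-id',
    label='Custom model deployment',
    default_prediction_server_id='prediction-server-id',
    max_wait=600,
)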

classmethod list(order_by=None, search=None, filters=None)

List all deployments a user can view.

New in version v2.17.

Parameters:
order_by : str, optional

(New in version v2.18) the order to sort the deployment list by, defaults to label

Allowed attributes to sort by are:

  • label
  • serviceHealth
  • modelHealth
  • accuracyHealth
  • recentPredictions
  • lastPredictionTimestamp

If the sort attribute is preceded by a hyphen, deployments will be sorted in descending order, otherwise in ascending order.

For health related sorting, ascending means failing, warning, passing, unknown.

search : str, optional

(New in version v2.18) case insensitive search against deployment’s label and description.

filters : datarobot.models.deployment.DeploymentListFilters, optional

(New in version v2.20) an object containing all filters that you’d like to apply to the resulting list of deployments. See DeploymentListFilters for details on usage.

Returns:
deployments : list

a list of deployments the user can view

Examples

from datarobot import Deployment
deployments = Deployment.list()
deployments
>>> [Deployment('New Deployment'), Deployment('Previous Deployment')]
from datarobot import Deployment
from datarobot.enums import DEPLOYMENT_SERVICE_HEALTH
filters = DeploymentListFilters(
    role='OWNER',
    service_health=[DEPLOYMENT_SERVICE_HEALTH.FAILING]
)
filtered_deployments = Deployment.list(filters=filters)
filtered_deployments
>>> [Deployment('Deployment I Own w/ Failing Service Health')]
classmethod get(deployment_id)

Get information about a deployment.

New in version v2.17.

Parameters:
deployment_id : str

the id of the deployment

Returns:
deployment : Deployment

the queried deployment

Examples

from datarobot import Deployment
deployment = Deployment.get(deployment_id='5c939e08962d741e34f609f0')
deployment.id
>>>'5c939e08962d741e34f609f0'
deployment.label
>>>'New Deployment'
update(label=None, description=None)

Update the label and description of this deployment.

New in version v2.19.

delete()

Delete this deployment.

New in version v2.17.

replace_model(new_model_id, reason, max_wait=600)

Replace the model used in this deployment. To confirm model replacement eligibility, use validate_replacement_model() beforehand.

New in version v2.17.

Model replacement is an asynchronous process, which means some preparatory work may be performed after the initial request is completed. This function will not return until all preparatory work is fully finished.

Predictions made against this deployment will start using the new model as soon as the initial request is completed. There will be no interruption for predictions throughout the process.

Parameters:
new_model_id : str

The id of the new model to use

reason : MODEL_REPLACEMENT_REASON

The reason for the model replacement. Must be one of ‘ACCURACY’, ‘DATA_DRIFT’, ‘ERRORS’, ‘SCHEDULED_REFRESH’, ‘SCORING_SPEED’, or ‘OTHER’. This value will be stored in the model history to keep track of why a model was replaced

max_wait : int, optional

(new in version 2.22) The maximum time to wait for model replacement job to complete before erroring

Examples

from datarobot import Deployment
deployment = Deployment.get(deployment_id='5c939e08962d741e34f609f0')
deployment.model['id'], deployment.model['type']
>>>('5c0a979859b00004ba52e431', 'Decision Tree Classifier (Gini)')

deployment.replace_model('5c0a969859b00004ba52e41b', MODEL_REPLACEMENT_REASON.ACCURACY)
deployment.model['id'], deployment.model['type']
>>>('5c0a969859b00004ba52e41b', 'Support Vector Classifier (Linear Kernel)')
validate_replacement_model(new_model_id)

Validate a model can be used as the replacement model of the deployment.

New in version v2.17.

Parameters:
new_model_id : str

the id of the new model to validate

Returns:
status : str

status of the validation, will be one of ‘passing’, ‘warning’ or ‘failing’. If the status is passing or warning, use replace_model() to perform a model replacement. If the status is failing, refer to checks for more detail on why the new model cannot be used as a replacement.

message : str

message for the validation result

checks : dict

explain why the new model can or cannot replace the deployment’s current model

get_features()

Retrieve the list of features needed to make predictions on this deployment.

Returns:
features: list

a list of feature dict

Notes

Each feature dict contains the following structure:

  • name : str, feature name
  • feature_type : str, feature type
  • importance : float, numeric measure of the relationship strength between the feature and target (independent of model or other features)
  • date_format : str or None, the date format string for how this feature was interpreted, null if not a date feature, compatible with https://docs.python.org/2/library/time.html#time.strftime.
  • known_in_advance : bool, whether the feature was selected as known in advance in a time series model, false for non-time series models.

Examples

from datarobot import Deployment
deployment = Deployment.get(deployment_id='5c939e08962d741e34f609f0')
features = deployment.get_features()
features[0]['feature_type']
>>>'Categorical'
features[0]['importance']
>>>0.133
submit_actuals(data, batch_size=10000)

Submit actuals for processing. The actuals submitted will be used to calculate accuracy metrics.

Parameters:
data: list or pandas.DataFrame

If `data` is a list, each item should be a dict-like object with the following keys and values; if `data` is a pandas.DataFrame, it should contain the following columns:

  • association_id: str, a unique identifier used with a prediction, max length 128 characters
  • actual_value: str or int or float, the actual value of a prediction; should be numeric for deployments with regression models or a string for deployments with classification models
  • was_acted_on: bool, optional, indicates if the prediction was acted on in a way that could have affected the actual outcome
  • timestamp: datetime or string in RFC3339 format, optional. If the datetime provided does not have a timezone, we assume it is UTC.

batch_size: int, optional

the max number of actuals in each request

Raises:
ValueError

if input data is not a list of dict-like objects or a pandas.DataFrame, or if input data is empty

Examples

from datarobot import Deployment, AccuracyOverTime
deployment = Deployment.get(deployment_id='5c939e08962d741e34f609f0')
data = [{
    'association_id': '439917',
    'actual_value': 'True',
    'was_acted_on': True
}]
deployment.submit_actuals(data)
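
Actuals may also be submitted as a pandas.DataFrame with the same columns (a minimal sketch; the association IDs and values are illustrative):

import pandas as pd
from datarobot import Deployment
deployment = Deployment.get(deployment_id='5c939e08962d741e34f609f0')
actuals = pd.DataFrame([
    {'association_id': '439917', 'actual_value': 'True'},
    {'association_id': '439918', 'actual_value': 'False'},
])
deployment.submit_actuals(actuals, batch_size=10000)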
get_drift_tracking_settings()

Retrieve drift tracking settings of this deployment.

New in version v2.17.

Returns:
settings : dict

Drift tracking settings of the deployment containing two nested dicts with key target_drift and feature_drift, which are further described below.

Target drift setting contains:

enabled : bool

If target drift tracking is enabled for this deployment. To create or update existing target_drift settings, see update_drift_tracking_settings()

Feature drift setting contains:

enabled : bool

If feature drift tracking is enabled for this deployment. To create or update existing feature_drift settings, see update_drift_tracking_settings()

update_drift_tracking_settings(target_drift_enabled=None, feature_drift_enabled=None, max_wait=600)

Update drift tracking settings of this deployment.

New in version v2.17.

Updating drift tracking setting is an asynchronous process, which means some preparatory work may be performed after the initial request is completed. This function will not return until all preparatory work is fully finished.

Parameters:
target_drift_enabled : bool, optional

if target drift tracking is to be turned on

feature_drift_enabled : bool, optional

if feature drift tracking is to be turned on

max_wait : int, optional

seconds to wait for successful resolution
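
For example, to turn both kinds of drift tracking on and confirm the result (a minimal sketch reusing the deployment ID from the examples above):

from datarobot import Deployment
deployment = Deployment.get(deployment_id='5c939e08962d741e34f609f0')
deployment.update_drift_tracking_settings(target_drift_enabled=True, feature_drift_enabled=True)
settings = deployment.get_drift_tracking_settings()
settings['target_drift']['enabled'], settings['feature_drift']['enabled']
>>>(True, True)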

get_association_id_settings()

Retrieve association ID setting for this deployment.

New in version v2.19.

Returns:
association_id_settings : dict in the following format:
column_names : list[string], optional

name of the columns to be used as association ID,

required_in_prediction_requests : bool, optional

whether the association ID column is required in prediction requests

update_association_id_settings(column_names=None, required_in_prediction_requests=None, max_wait=600)

Update association ID setting for this deployment.

New in version v2.19.

Parameters:
column_names : list[string], optional

name of the columns to be used as association ID, currently only support a list of one string

required_in_prediction_requests : bool, optional

whether the association ID column is required in prediction requests

max_wait : int, optional

seconds to wait for successful resolution
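
A minimal sketch; the column name 'transaction_id' is illustrative, and only a single column is currently supported:

from datarobot import Deployment
deployment = Deployment.get(deployment_id='5c939e08962d741e34f609f0')
deployment.update_association_id_settings(
    column_names=['transaction_id'],
    required_in_prediction_requests=True,
)
settings = deployment.get_association_id_settings()
settings['column_names']
>>>['transaction_id']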

get_predictions_data_collection_settings()

Retrieve predictions data collection settings of this deployment.

New in version v2.21.

Returns:
predictions_data_collection_settings : dict in the following format:
enabled : bool

If predictions data collection is enabled for this deployment. To update existing predictions_data_collection settings, see update_predictions_data_collection_settings()

update_predictions_data_collection_settings(enabled, max_wait=600)

Update predictions data collection settings of this deployment.

New in version v2.21.

Updating predictions data collection setting is an asynchronous process, which means some preparatory work may be performed after the initial request is completed. This function will not return until all preparatory work is fully finished.

Parameters:
enabled: bool

if predictions data collection is to be turned on

max_wait : int, optional

seconds to wait for successful resolution
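
For example (a minimal sketch reusing the deployment ID from earlier examples):

from datarobot import Deployment
deployment = Deployment.get(deployment_id='5c939e08962d741e34f609f0')
deployment.update_predictions_data_collection_settings(enabled=True, max_wait=600)
deployment.get_predictions_data_collection_settings()['enabled']
>>>True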

get_prediction_warning_settings()

Retrieve prediction warning settings of this deployment.

New in version v2.19.

Returns:
settings : dict in the following format:
enabled : bool

If prediction warnings are enabled for this deployment. To create or update existing prediction_warning settings, see update_prediction_warning_settings()

custom_boundaries : dict or None
If None, the default boundaries for the model are used. Otherwise, it has the following keys:
upper : float

All predictions greater than provided value are considered anomalous

lower : float

All predictions less than provided value are considered anomalous

update_prediction_warning_settings(prediction_warning_enabled, use_default_boundaries=None, lower_boundary=None, upper_boundary=None, max_wait=600)

Update prediction warning settings of this deployment.

New in version v2.19.

Parameters:
prediction_warning_enabled : bool

If prediction warnings should be turned on.

use_default_boundaries : bool, optional

If default boundaries of the model should be used for the deployment.

upper_boundary : float, optional

All predictions greater than provided value will be considered anomalous

lower_boundary : float, optional

All predictions less than provided value will be considered anomalous

max_wait : int, optional

seconds to wait for successful resolution
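
For example, to enable prediction warnings with custom boundaries (a minimal sketch; the boundary values are illustrative):

from datarobot import Deployment
deployment = Deployment.get(deployment_id='5c939e08962d741e34f609f0')
deployment.update_prediction_warning_settings(
    prediction_warning_enabled=True,
    use_default_boundaries=False,
    lower_boundary=0.0,
    upper_boundary=100000.0,
)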

get_prediction_intervals_settings()

Retrieve prediction intervals settings for this deployment.

New in version v2.19.

Returns:
dict in the following format:
enabled : bool

Whether prediction intervals are enabled for this deployment

percentiles : list[int]

List of enabled prediction intervals sizes for this deployment. Currently we only support one percentile at a time.

Notes

Note that prediction intervals are only supported for time series deployments.

update_prediction_intervals_settings(percentiles, enabled=True, max_wait=600)

Update prediction intervals settings for this deployment.

New in version v2.19.

Parameters:
percentiles : list[int]

The prediction intervals percentiles to enable for this deployment. Currently we only support setting one percentile at a time.

enabled : bool, optional (defaults to True)

Whether to enable showing prediction intervals in the results of predictions requested using this deployment.

max_wait : int, optional

seconds to wait for successful resolution

Raises:
AssertionError

If percentiles is in an invalid format

AsyncFailureError

If any of the responses from the server are unexpected

AsyncProcessUnsuccessfulError

If the prediction intervals calculation job has failed or has been cancelled.

AsyncTimeoutError

If the prediction intervals calculation job did not resolve in time

Notes

Updating prediction intervals settings is an asynchronous process, which means some preparatory work may be performed before the settings request is completed. This function will not return until all work is fully finished.

Note that prediction intervals are only supported for time series deployments.
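
For a time series deployment, a single percentile can be enabled like so (a minimal sketch; the percentile value is illustrative):

from datarobot import Deployment
deployment = Deployment.get(deployment_id='5c939e08962d741e34f609f0')
deployment.update_prediction_intervals_settings(percentiles=[80], enabled=True)
deployment.get_prediction_intervals_settings()['percentiles']
>>>[80]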

get_service_stats(model_id=None, start_time=None, end_time=None, execution_time_quantile=None, response_time_quantile=None, slow_requests_threshold=None)

Retrieve value of service stat metrics over a certain time period.

New in version v2.18.

Parameters:
model_id : str, optional

the id of the model

start_time : datetime, optional

start of the time period

end_time : datetime, optional

end of the time period

execution_time_quantile : float, optional

quantile for executionTime, defaults to 0.5

response_time_quantile : float, optional

quantile for responseTime, defaults to 0.5

slow_requests_threshold : float, optional

threshold for slowRequests, defaults to 1000

Returns:
service_stats : ServiceStats

the queried service stats metrics information
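
A minimal sketch; the time range is illustrative, and the metric keys present in ServiceStats.metrics depend on the deployment:

from datetime import datetime
from datarobot import Deployment
deployment = Deployment.get(deployment_id='5c939e08962d741e34f609f0')
stats = deployment.get_service_stats(
    start_time=datetime(2019, 8, 1),
    end_time=datetime(2019, 9, 1),
    response_time_quantile=0.95,
)
stats.model_id, sorted(stats.metrics)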

get_service_stats_over_time(metric=None, model_id=None, start_time=None, end_time=None, bucket_size=None, quantile=None, threshold=None)

Retrieve information about how a service stat metric changes over a certain time period.

New in version v2.18.

Parameters:
metric : SERVICE_STAT_METRIC, optional

the service stat metric to retrieve

model_id : str, optional

the id of the model

start_time : datetime, optional

start of the time period

end_time : datetime, optional

end of the time period

bucket_size : str, optional

time duration of a bucket, in ISO 8601 time duration format

quantile : float, optional

quantile for ‘executionTime’ or ‘responseTime’, ignored when querying other metrics

threshold : int, optional

threshold for ‘slowQueries’, ignored when querying other metrics

Returns:
service_stats_over_time : ServiceStatsOverTime

the queried service stats metric over time information

get_target_drift(model_id=None, start_time=None, end_time=None, metric=None)

Retrieve target drift information over a certain time period.

New in version v2.21.

Parameters:
model_id : str

the id of the model

start_time : datetime

start of the time period

end_time : datetime

end of the time period

metric : str

(New in version v2.22) metric used to calculate the drift score

Returns:
target_drift : TargetDrift

the queried target drift information

get_feature_drift(model_id=None, start_time=None, end_time=None, metric=None)

Retrieve drift information for deployment’s features over a certain time period.

New in version v2.21.

Parameters:
model_id : str

the id of the model

start_time : datetime

start of the time period

end_time : datetime

end of the time period

metric : str

(New in version v2.22) metric used to calculate the drift score

Returns:
feature_drift_data : [FeatureDrift]

the queried feature drift information

get_accuracy(model_id=None, start_time=None, end_time=None, start=None, end=None)

Retrieve values of accuracy metrics over a certain time period.

New in version v2.18.

Parameters:
model_id : str

the id of the model

start_time : datetime

start of the time period

end_time : datetime

end of the time period

Returns:
accuracy : Accuracy

the queried accuracy metrics information

get_accuracy_over_time(metric=None, model_id=None, start_time=None, end_time=None, bucket_size=None)

Retrieve information about how an accuracy metric changes over a certain time period.

New in version v2.18.

Parameters:
metric : ACCURACY_METRIC

the accuracy metric to retrieve

model_id : str

the id of the model

start_time : datetime

start of the time period

end_time : datetime

end of the time period

bucket_size : str

time duration of a bucket, in ISO 8601 time duration format

Returns:
accuracy_over_time : AccuracyOverTime

the queried accuracy metric over time information

class datarobot.models.deployment.DeploymentListFilters(role=None, service_health=None, model_health=None, accuracy_health=None, execution_environment_type=None, importance=None)

Construct a set of filters to pass to Deployment.list()

New in version v2.20.

Parameters:
role : str

A user role. If specified, only those deployments that the user can view and for which the user has the specified role will be returned. Allowed options are OWNER and USER.

service_health : list of str

A list of service health status values. If specified, then only deployments whose service health status is one of these will be returned. See datarobot.enums.DEPLOYMENT_SERVICE_HEALTH_STATUS for allowed values. Supports comma-separated lists.

model_health : list of str

A list of model health status values. If specified, then only deployments whose model health status is one of these will be returned. See datarobot.enums.DEPLOYMENT_MODEL_HEALTH_STATUS for allowed values. Supports comma-separated lists.

accuracy_health : list of str

A list of accuracy health status values. If specified, then only deployments whose accuracy health status is one of these will be returned. See datarobot.enums.DEPLOYMENT_ACCURACY_HEALTH_STATUS for allowed values. Supports comma-separated lists.

execution_environment_type : list of str

A list of strings representing the type of the deployments’ execution environment. If provided, then only return those deployments whose execution environment type is one of those provided. See datarobot.enums.DEPLOYMENT_EXECUTION_ENVIRONMENT_TYPE for allowed values. Supports comma-separated lists.

importance : list of str

A list of strings representing the deployments’ “importance”. If provided, then only return those deployments whose importance is one of those provided. See datarobot.enums.DEPLOYMENT_IMPORTANCE for allowed values. Supports comma-separated lists. Note that Approval Workflows must be enabled for your account to use this filter, otherwise the API will return a 403.

Examples

Multiple filters can be combined in interesting ways to return very specific subsets of deployments.

Performing AND logic

Providing multiple different parameters will result in AND logic between them. For example, the following will return all deployments that I own whose service health status is failing.

from datarobot import Deployment
from datarobot.models.deployment import DeploymentListFilters
from datarobot.enums import DEPLOYMENT_SERVICE_HEALTH
filters = DeploymentListFilters(
    role='OWNER',
    service_health=[DEPLOYMENT_SERVICE_HEALTH.FAILING]
)
deployments = Deployment.list(filters=filters)

Performing OR logic

Some filters support comma-separated lists (and will say so if they do). Providing a comma-separated list of values to a single filter performs OR logic between those values. For example, the following will return all deployments whose service health is either warning OR failing.

from datarobot import Deployment
from datarobot.models.deployment import DeploymentListFilters
from datarobot.enums import DEPLOYMENT_SERVICE_HEALTH
filters = DeploymentListFilters(
    service_health=[
        DEPLOYMENT_SERVICE_HEALTH.WARNING,
        DEPLOYMENT_SERVICE_HEALTH.FAILING,
    ]
)
deployments = Deployment.list(filters=filters)

Performing OR logic across different filter types is not supported.

Note

In all cases, you may only retrieve deployments for which you have at least the USER role. Deployments for which you are only a CONSUMER will not be returned, regardless of the filters applied.

class datarobot.models.ServiceStats(period=None, metrics=None, model_id=None)

Deployment service stats information.

Attributes:
model_id : str

the model used to retrieve service stats metrics

period : dict

the time period used to retrieve service stats metrics

metrics : dict

the service stats metrics

classmethod get(deployment_id, model_id=None, start_time=None, end_time=None, execution_time_quantile=None, response_time_quantile=None, slow_requests_threshold=None)

Retrieve value of service stat metrics over a certain time period.

New in version v2.18.

Parameters:
deployment_id : str

the id of the deployment

model_id : str, optional

the id of the model

start_time : datetime, optional

start of the time period

end_time : datetime, optional

end of the time period

execution_time_quantile : float, optional

quantile for executionTime, defaults to 0.5

response_time_quantile : float, optional

quantile for responseTime, defaults to 0.5

slow_requests_threshold : float, optional

threshold for slowRequests, defaults to 1000

Returns:
service_stats : ServiceStats

the queried service stats metrics

class datarobot.models.ServiceStatsOverTime(buckets=None, summary=None, metric=None, model_id=None)

Deployment service stats over time information.

Attributes:
model_id : str

the model used to retrieve accuracy metric

metric : str

the service stat metric being retrieved

buckets : dict

how the service stat metric changes over time

summary : dict

summary for the service stat metric

classmethod get(deployment_id, metric=None, model_id=None, start_time=None, end_time=None, bucket_size=None, quantile=None, threshold=None)

Retrieve information about how a service stat metric changes over a certain time period.

New in version v2.18.

Parameters:
deployment_id : str

the id of the deployment

metric : SERVICE_STAT_METRIC, optional

the service stat metric to retrieve

model_id : str, optional

the id of the model

start_time : datetime, optional

start of the time period

end_time : datetime, optional

end of the time period

bucket_size : str, optional

time duration of a bucket, in ISO 8601 time duration format

quantile : float, optional

quantile for ‘executionTime’ or ‘responseTime’, ignored when querying other metrics

threshold : int, optional

threshold for ‘slowQueries’, ignored when querying other metrics

Returns:
service_stats_over_time : ServiceStatsOverTime

the queried service stat over time information

bucket_values

The metric value for all time buckets, keyed by start time of the bucket.

Returns:
bucket_values: OrderedDict
class datarobot.models.TargetDrift(period=None, metric=None, model_id=None, target_name=None, drift_score=None, sample_size=None, baseline_sample_size=None)

Deployment target drift information.

Attributes:
model_id : str

the model used to retrieve target drift metric

period : dict

the time period used to retrieve target drift metric

metric : str

the data drift metric

target_name : str

name of the target

drift_score : float

target drift score

sample_size : int

count of data points for comparison

baseline_sample_size : int

count of data points for baseline

classmethod get(deployment_id, model_id=None, start_time=None, end_time=None, metric=None)

Retrieve target drift information over a certain time period.

New in version v2.21.

Parameters:
deployment_id : str

the id of the deployment

model_id : str

the id of the model

start_time : datetime

start of the time period

end_time : datetime

end of the time period

metric : str

(New in version v2.22) metric used to calculate the drift score

Returns:
target_drift : TargetDrift

the queried target drift information

Examples

from datarobot import Deployment, TargetDrift
deployment = Deployment.get(deployment_id='5c939e08962d741e34f609f0')
target_drift = TargetDrift.get(deployment.id)
target_drift.period['end']
>>>'2019-08-01 00:00:00+00:00'
target_drift.drift_score
>>>0.03423
target_drift.target_name
>>>'readmitted'
class datarobot.models.FeatureDrift(period=None, metric=None, model_id=None, name=None, drift_score=None, feature_impact=None, sample_size=None, baseline_sample_size=None)

Deployment feature drift information.

Attributes:
model_id : str

the model used to retrieve feature drift metric

period : dict

the time period used to retrieve feature drift metric

metric : str

the data drift metric

name : str

name of the feature

drift_score : float

feature drift score

sample_size : int

count of data points for comparison

baseline_sample_size : int

count of data points for baseline

classmethod list(deployment_id, model_id=None, start_time=None, end_time=None, metric=None)

Retrieve drift information for deployment’s features over a certain time period.

New in version v2.21.

Parameters:
deployment_id : str

the id of the deployment

model_id : str

the id of the model

start_time : datetime

start of the time period

end_time : datetime

end of the time period

metric : str

(New in version v2.22) metric used to calculate the drift score

Returns:
feature_drift_data : [FeatureDrift]

the queried feature drift information

Examples

from datarobot import Deployment, FeatureDrift
deployment = Deployment.get(deployment_id='5c939e08962d741e34f609f0')
feature_drift = FeatureDrift.list(deployment.id)[0]
feature_drift.period['end']
>>>'2019-08-01 00:00:00+00:00'
feature_drift.drift_score
>>>0.252
feature_drift.name
>>>'age'
class datarobot.models.Accuracy(period=None, metrics=None, model_id=None)

Deployment accuracy information.

Attributes:
model_id : str

the model used to retrieve accuracy metrics

period : dict

the time period used to retrieve accuracy metrics

metrics : dict

the accuracy metrics

classmethod get(deployment_id, model_id=None, start_time=None, end_time=None)

Retrieve values of accuracy metrics over a certain time period.

New in version v2.18.

Parameters:
deployment_id : str

the id of the deployment

model_id : str

the id of the model

start_time : datetime

start of the time period

end_time : datetime

end of the time period

Returns:
accuracy : Accuracy

the queried accuracy metrics information

Examples

from datarobot import Deployment, Accuracy
deployment = Deployment.get(deployment_id='5c939e08962d741e34f609f0')
accuracy = Accuracy.get(deployment.id)
accuracy.period['end']
>>>'2019-08-01 00:00:00+00:00'
accuracy.metrics['LogLoss']['value']
>>>0.7533
accuracy.metric_values['LogLoss']
>>>0.7533
metric_values

The value for all metrics, keyed by metric name.

Returns:
metric_values: OrderedDict
metric_baselines

The baseline value for all metrics, keyed by metric name.

Returns:
metric_baselines: OrderedDict
percent_changes

The percent change of value over baseline for all metrics, keyed by metric name.

Returns:
percent_changes: OrderedDict
class datarobot.models.AccuracyOverTime(buckets=None, summary=None, baseline=None, metric=None, model_id=None)

Deployment accuracy over time information.

Attributes:
model_id : str

the model used to retrieve accuracy metric

metric : str

the accuracy metric being retrieved

buckets : dict

how the accuracy metric changes over time

summary : dict

summary for the accuracy metric

baseline : dict

baseline for the accuracy metric

classmethod get(deployment_id, metric=None, model_id=None, start_time=None, end_time=None, bucket_size=None)

Retrieve information about how an accuracy metric changes over a certain time period.

New in version v2.18.

Parameters:
deployment_id : str

the id of the deployment

metric : ACCURACY_METRIC

the accuracy metric to retrieve

model_id : str

the id of the model

start_time : datetime

start of the time period

end_time : datetime

end of the time period

bucket_size : str

time duration of a bucket, in ISO 8601 time duration format

Returns:
accuracy_over_time : AccuracyOverTime

the queried accuracy metric over time information

Examples

from datarobot import Deployment, AccuracyOverTime
from datarobot.enums import ACCURACY_METRIC
deployment = Deployment.get(deployment_id='5c939e08962d741e34f609f0')
accuracy_over_time = AccuracyOverTime.get(deployment.id, metric=ACCURACY_METRIC.LOGLOSS)
accuracy_over_time.metric
>>>'LogLoss'
accuracy_over_time.metric_values
>>>{datetime.datetime(2019, 8, 1): 0.73, datetime.datetime(2019, 8, 2): 0.55}
classmethod get_as_dataframe(deployment_id, metrics, model_id=None, start_time=None, end_time=None, bucket_size=None)

Retrieve information about how a list of accuracy metrics change over a certain time period as pandas DataFrame.

In the returned DataFrame, the columns correspond to the metrics being retrieved; the rows are labeled with the start time of each bucket.

Parameters:
deployment_id : str

the id of the deployment

metrics : [ACCURACY_METRIC]

the accuracy metrics to retrieve

model_id : str

the id of the model

start_time : datetime

start of the time period

end_time : datetime

end of the time period

bucket_size : str

time duration of a bucket, in ISO 8601 time duration format

Returns:
accuracy_over_time: pd.DataFrame
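
A minimal sketch fetching two metrics into one DataFrame (the deployment ID is a placeholder, and LOGLOSS/AUC are assumed to be valid ACCURACY_METRIC values for the deployed model):

from datarobot import AccuracyOverTime
from datarobot.enums import ACCURACY_METRIC
df = AccuracyOverTime.get_as_dataframe(
    deployment_id='5c939e08962d741e34f609f0',
    metrics=[ACCURACY_METRIC.LOGLOSS, ACCURACY_METRIC.AUC],
    bucket_size='P7D',  # weekly buckets, ISO 8601 duration
)
df.head()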
bucket_values

The metric value for all time buckets, keyed by start time of the bucket.

Returns:
bucket_values: OrderedDict
bucket_sample_sizes

The sample size for all time buckets, keyed by start time of the bucket.

Returns:
bucket_sample_sizes: OrderedDict

External Scores and Insights

class datarobot.ExternalScores(project_id, scores, model_id=None, dataset_id=None, actual_value_column=None)

Metric scores on prediction dataset with target or actual value column in unsupervised case. Contains project metrics for supervised and special classification metrics set for unsupervised projects.

New in version v2.21.

Examples

List all scores for a dataset

import datarobot as dr
scores = dr.ExternalScores.list(project_id, dataset_id=dataset_id)
Attributes:
project_id: str

id of the project the model belongs to

model_id: str

id of the model

dataset_id: str

id of the prediction dataset with target or actual value column for unsupervised case

actual_value_column: str, optional

For unsupervised projects only. Actual value column which was used to calculate the classification metrics and insights on the prediction dataset.

scores: list of dicts in the form of {‘label’: metric_name, ‘value’: score}

Scores on the dataset.

classmethod create(project_id, model_id, dataset_id, actual_value_column=None)

Compute external dataset insights for the specified model.

Parameters:
project_id : str

id of the project the model belongs to

model_id : str

id of the model for which insights are requested

dataset_id : str

id of the dataset for which insights are requested

actual_value_column : str, optional

actual values column label, for unsupervised projects only

Returns:
job : Job

an instance of created async job
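
A minimal sketch computing and then fetching external scores (the IDs are placeholders, and waiting on the returned job assumes the standard Job.wait_for_completion helper):

import datarobot as dr
project_id = '5c939e08962d741e34f609f0'   # placeholder project id
model_id = '5c0a969859b00004ba52e41b'     # placeholder model id
dataset_id = '5e3080357b6d5a57ba54fd31'   # placeholder prediction dataset id
job = dr.ExternalScores.create(project_id, model_id, dataset_id)
job.wait_for_completion()
scores = dr.ExternalScores.get(project_id, model_id, dataset_id)
scores.scores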

classmethod list(project_id, model_id=None, dataset_id=None, offset=0, limit=100)

Fetch external scores list for the project and optionally for model and dataset.

Parameters:
project_id: str

id of the project

model_id: str, optional

if specified, only scores for this model will be retrieved

dataset_id: str, optional

if specified, only scores for this dataset will be retrieved

offset: int, optional

this many results will be skipped, default: 0

limit: int, optional

at most this many results are returned, default: 100, max 1000. To return all results, specify 0

Returns:
A list of External Scores objects
classmethod get(project_id, model_id, dataset_id)

Retrieve external scores for the project, model and dataset.

Parameters:
project_id: str

id of the project

model_id: str

id of the model for which scores will be retrieved

dataset_id: str

id of the dataset for which scores will be retrieved

Returns:
External Scores object
class datarobot.ExternalLiftChart(dataset_id, bins)

Lift chart for the model and prediction dataset with target or actual value column in unsupervised case.

New in version v2.21.

LiftChartBin is a dict containing the following:

  • actual (float) Sum of actual target values in bin
  • predicted (float) Sum of predicted target values in bin
  • bin_weight (float) The weight of the bin. For weighted projects, it is the sum of the weights of the rows in the bin. For unweighted projects, it is the number of rows in the bin.
Attributes:
dataset_id: str

id of the prediction dataset with target or actual value column for unsupervised case

bins: list of dict

List of dicts with schema described as LiftChartBin above.

classmethod list(project_id, model_id, dataset_id=None, offset=0, limit=100)

Retrieve list of the lift charts for the model.

Parameters:
project_id: str

id of the project

model_id: str

id of the model for which lift charts will be retrieved

dataset_id: str, optional

if specified, only lift chart for this dataset will be retrieved

offset: int, optional

this many results will be skipped, default: 0

limit: int, optional

at most this many results are returned, default: 100, max 1000. To return all results, specify 0

Returns:
A list of ExternalLiftChart objects
classmethod get(project_id, model_id, dataset_id)

Retrieve lift chart for the model and prediction dataset.

Parameters:
project_id: str

project id

model_id: str

model id

dataset_id: str

prediction dataset id with target or actual value column for unsupervised case

Returns:
ExternalLiftChart object
class datarobot.ExternalRocCurve(dataset_id, roc_points, negative_class_predictions, positive_class_predictions)

ROC curve data for the model and prediction dataset with target or actual value column in unsupervised case.

New in version v2.21.

Attributes:
dataset_id: str

id of the prediction dataset with target or actual value column for unsupervised case

roc_points: list of dict

List of precalculated metrics associated with thresholds for ROC curve.

negative_class_predictions: list of float

List of predictions for the negative class

positive_class_predictions: list of float

List of predictions for the positive class

classmethod list(project_id, model_id, dataset_id=None, offset=0, limit=100)

Retrieve list of the roc curves for the model.

Parameters:
project_id: str

id of the project

model_id: str

id of the model for which roc curves will be retrieved

dataset_id: str, optional

if specified, only the roc curve for this dataset will be retrieved

offset: int, optional

this many results will be skipped, default: 0

limit: int, optional

at most this many results are returned, default: 100, max 1000. To return all results, specify 0

Returns:
A list of ExternalRocCurve objects
classmethod get(project_id, model_id, dataset_id)

Retrieve ROC curve chart for the model and prediction dataset.

Parameters:
project_id: str

project id

model_id: str

model id

dataset_id: str

prediction dataset id with target or actual value column for unsupervised case

Returns:
ExternalRocCurve object

Feature

class datarobot.models.Feature(id, project_id=None, name=None, feature_type=None, importance=None, low_information=None, unique_count=None, na_count=None, date_format=None, min=None, max=None, mean=None, median=None, std_dev=None, time_series_eligible=None, time_series_eligibility_reason=None, time_step=None, time_unit=None, target_leakage=None, key_summary=None)

A feature from a project’s dataset

These are features either included in the originally uploaded dataset or added to it via feature transformations. In time series projects, these will be distinct from the ModelingFeatures created during partitioning; otherwise, they will correspond to the same features. For more information about input and modeling features, see the time series documentation.

The min, max, mean, median, and std_dev attributes provide information about the distribution of the feature in the EDA sample data. For non-numeric features or features created prior to these summary statistics becoming available, they will be None. For features where the summary statistics are available, they will be in a format compatible with the data type, i.e. date type features will have their summary statistics expressed as ISO-8601 formatted date strings.

Attributes:
id : int

the id for the feature - note that name is used to reference the feature instead of id

project_id : str

the id of the project the feature belongs to

name : str

the name of the feature

feature_type : str

the type of the feature, e.g. ‘Categorical’, ‘Text’

importance : float or None

numeric measure of the strength of relationship between the feature and target (independent of any model or other features); may be None for non-modeling features such as partition columns

low_information : bool

whether a feature is considered too uninformative for modeling (e.g. because it has too few values)

unique_count : int

number of unique values

na_count : int or None

number of missing values

date_format : str or None

For Date features, the date format string for how this feature was interpreted, compatible with https://docs.python.org/2/library/time.html#time.strftime . For other feature types, None.

min : str, int, float, or None

The minimum value of the source data in the EDA sample

max : str, int, float, or None

The maximum value of the source data in the EDA sample

mean : str, int, or float

The arithmetic mean of the source data in the EDA sample

median : str, int, float, or None

The median of the source data in the EDA sample

std_dev : str, int, float, or None

The standard deviation of the source data in the EDA sample

time_series_eligible : bool

Whether this feature can be used as the datetime partition column in a time series project.

time_series_eligibility_reason : str

Why the feature is ineligible for the datetime partition column in a time series project, or ‘suitable’ when it is eligible.

time_step : int or None

For time series eligible features, a positive integer determining the interval at which windows can be specified. If used as the datetime partition column on a time series project, the feature derivation and forecast windows must start and end at an integer multiple of this value. None for features that are not time series eligible.

time_unit : str or None

For time series eligible features, the time unit covered by a single time step, e.g. ‘HOUR’, or None for features that are not time series eligible.

target_leakage : str

Whether a feature is considered to have target leakage or not. A value of ‘SKIPPED_DETECTION’ indicates that target leakage detection was not run on the feature. ‘FALSE’ indicates no leakage, ‘MODERATE’ indicates a moderate risk of target leakage, and ‘HIGH_RISK’ indicates a high risk of target leakage

key_summary: list of dict

Statistics for the top 50 keys (truncated to 103 characters) of a Summarized Categorical column. Example:

{‘key’:’DataRobot’, ‘summary’:{‘min’:0, ‘max’:29815.0, ‘stdDev’:6498.029, ‘mean’:1490.75, ‘median’:0.0, ‘pctRows’:5.0}}

where,
key: string or None

name of the key

summary: dict

statistics of the key

  • max: maximum value of the key
  • min: minimum value of the key
  • mean: mean value of the key
  • median: median value of the key
  • stdDev: standard deviation of the key
  • pctRows: percentage occurrence of the key in the EDA sample of the feature

classmethod get(project_id, feature_name)

Retrieve a single feature

Parameters:
project_id : str

The ID of the project the feature is associated with.

feature_name : str

The name of the feature to retrieve

Returns:
feature : Feature

The queried instance
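
For example (a minimal sketch; the project ID and feature name are placeholders):

from datarobot.models import Feature
feature = Feature.get('5c939e08962d741e34f609f0', 'number_diagnoses')
feature.feature_type, feature.unique_count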

get_multiseries_properties(multiseries_id_columns, max_wait=600)

Retrieve time series properties for a potential multiseries datetime partition column

Multiseries time series projects use multiseries id columns to model multiple distinct series within a single project. This function returns the time series properties (time step and time unit) of this column if it were used as a datetime partition column with the specified multiseries id columns, running multiseries detection automatically if it has not previously been run successfully.

Parameters:
multiseries_id_columns : list of str

the name(s) of the multiseries id columns to use with this datetime partition column. Currently only one multiseries id column is supported.

max_wait : int, optional

if a multiseries detection task is run, the maximum amount of time to wait for it to complete before giving up

Returns:
properties : dict

A dict with three keys:

  • time_series_eligible : bool, whether the column can be used as a partition column
  • time_unit : str or null, the inferred time unit if used as a partition column
  • time_step : int or null, the inferred time step if used as a partition column
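
A minimal sketch of checking whether a date column could serve as a multiseries datetime partition column (the project ID and column names are placeholders):

from datarobot.models import Feature
project_id = '5c939e08962d741e34f609f0'  # placeholder project id
date_feature = Feature.get(project_id, 'sale_date')
props = date_feature.get_multiseries_properties(['store_id'], max_wait=600)
props['time_series_eligible'], props['time_unit'], props['time_step']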
get_cross_series_properties(datetime_partition_column, cross_series_group_by_columns, max_wait=600)

Retrieve cross-series properties for multiseries ID column.

This function returns the cross-series properties (eligibility as group-by column) of this column if it were used with the specified datetime partition column and the current multiseries id column, running cross-series group-by validation automatically if it has not previously been run successfully.

Parameters:
datetime_partition_column : datetime partition column
cross_series_group_by_columns : list of str

the name(s) of the columns to use with this multiseries ID column. Currently only one cross-series group-by column is supported.

max_wait : int, optional

if a multiseries detection task is run, the maximum amount of time to wait for it to complete before giving up

Returns:
properties : dict

A dict with three keys:

  • name : str, column name
  • eligibility : str, reason for column eligibility
  • isEligible : bool, is column eligible as cross-series group-by
class datarobot.models.ModelingFeature(project_id=None, name=None, feature_type=None, importance=None, low_information=None, unique_count=None, na_count=None, date_format=None, min=None, max=None, mean=None, median=None, std_dev=None, parent_feature_names=None, key_summary=None)

A feature used for modeling

In time series projects, a new set of modeling features is created after setting the partitioning options. These features are automatically derived from those in the project’s dataset and are the features used for modeling. Modeling features are only accessible once the target and partitioning options have been set. In projects that don’t use time series modeling, once the target has been set, ModelingFeatures and Features will behave the same.

For more information about input and modeling features, see the time series documentation.

As with the Feature object, the min, max, mean, median, and std_dev attributes provide information about the distribution of the feature in the EDA sample data. For non-numeric features, they will be None. For features where the summary statistics are available, they will be in a format compatible with the data type, i.e. date type features will have their summary statistics expressed as ISO-8601 formatted date strings.

Attributes:
project_id : str

the id of the project the feature belongs to

name : str

the name of the feature

feature_type : str

the type of the feature, e.g. ‘Categorical’, ‘Text’

importance : float or None

numeric measure of the strength of relationship between the feature and target (independent of any model or other features); may be None for non-modeling features such as partition columns

low_information : bool

whether a feature is considered too uninformative for modeling (e.g. because it has too few values)

unique_count : int

number of unique values

na_count : int or None

number of missing values

date_format : str or None

For Date features, the date format string for how this feature was interpreted, compatible with https://docs.python.org/2/library/time.html#time.strftime . For other feature types, None.

min : str, int, float, or None

The minimum value of the source data in the EDA sample

max : str, int, float, or None

The maximum value of the source data in the EDA sample

mean : str, int, or float

The arithmetic mean of the source data in the EDA sample

median : str, int, float, or None

The median of the source data in the EDA sample

std_dev : str, int, float, or None

The standard deviation of the source data in the EDA sample

parent_feature_names : list of str

A list of the names of input features used to derive this modeling feature. In cases where the input features and modeling features are the same, this will simply contain the feature’s name. Note that if a derived feature was used to create this modeling feature, the values here will not necessarily correspond to the features that must be supplied at prediction time.

key_summary: list of dict

Statistics for the top 50 keys (truncated to 103 characters) of a Summarized Categorical column. Example:

{‘key’:’DataRobot’, ‘summary’:{‘min’:0, ‘max’:29815.0, ‘stdDev’:6498.029, ‘mean’:1490.75, ‘median’:0.0, ‘pctRows’:5.0}}

where,
key: string or None

name of the key

summary: dict

statistics of the key

  • max: maximum value of the key
  • min: minimum value of the key
  • mean: mean value of the key
  • median: median value of the key
  • stdDev: standard deviation of the key
  • pctRows: percentage occurrence of the key in the EDA sample of the feature

classmethod get(project_id, feature_name)

Retrieve a single modeling feature

Parameters:
project_id : str

The ID of the project the feature is associated with.

feature_name : str

The name of the feature to retrieve

Returns:
feature : ModelingFeature

The requested feature

class datarobot.models.DatasetFeature(id_, dataset_id=None, dataset_version_id=None, name=None, feature_type=None, low_information=None, unique_count=None, na_count=None, date_format=None, min_=None, max_=None, mean=None, median=None, std_dev=None, time_series_eligible=None, time_series_eligibility_reason=None, time_step=None, time_unit=None, target_leakage=None, target_leakage_reason=None)

A feature from a project’s dataset

These are features either included in the originally uploaded dataset or added to it via feature transformations.

The min, max, mean, median, and std_dev attributes provide information about the distribution of the feature in the EDA sample data. For non-numeric features or features created prior to these summary statistics becoming available, they will be None. For features where the summary statistics are available, they will be in a format compatible with the data type, i.e. date type features will have their summary statistics expressed as ISO-8601 formatted date strings.

Attributes:
id : int

the id for the feature - note that name is used to reference the feature instead of id

dataset_id : str

the id of the dataset the feature belongs to

dataset_version_id : str

the id of the dataset version the feature belongs to

name : str

the name of the feature

feature_type : str, optional

the type of the feature, e.g. ‘Categorical’, ‘Text’

low_information : bool, optional

whether a feature is considered too uninformative for modeling (e.g. because it has too few values)

unique_count : int, optional

number of unique values

na_count : int, optional

number of missing values

date_format : str, optional

For Date features, the date format string for how this feature was interpreted, compatible with https://docs.python.org/2/library/time.html#time.strftime . For other feature types, None.

min : str, int, float, optional

The minimum value of the source data in the EDA sample

max : str, int, float, optional

The maximum value of the source data in the EDA sample

mean : str, int, float, optional

The arithmetic mean of the source data in the EDA sample

median : str, int, float, optional

The median of the source data in the EDA sample

std_dev : str, int, float, optional

The standard deviation of the source data in the EDA sample

time_series_eligible : bool, optional

Whether this feature can be used as the datetime partition column in a time series project.

time_series_eligibility_reason : str, optional

Why the feature is ineligible for the datetime partition column in a time series project, or ‘suitable’ when it is eligible.

time_step : int, optional

For time series eligible features, a positive integer determining the interval at which windows can be specified. If used as the datetime partition column on a time series project, the feature derivation and forecast windows must start and end at an integer multiple of this value. None for features that are not time series eligible.

time_unit : str, optional

For time series eligible features, the time unit covered by a single time step, e.g. ‘HOUR’, or None for features that are not time series eligible.

target_leakage : str, optional

Whether a feature is considered to have target leakage or not. A value of ‘SKIPPED_DETECTION’ indicates that target leakage detection was not run on the feature. ‘FALSE’ indicates no leakage, ‘MODERATE’ indicates a moderate risk of target leakage, and ‘HIGH_RISK’ indicates a high risk of target leakage

target_leakage_reason: string, optional

The descriptive text explaining the reason for target leakage, if any.

get_histogram(bin_limit=None)

Retrieve a feature histogram

Parameters:
bin_limit : int or None

Desired max number of histogram bins. If omitted, the endpoint will use 60 by default.

Returns:
featureHistogram : DatasetFeatureHistogram

The requested histogram with the desired number of bins

class datarobot.models.DatasetFeatureHistogram(plot)
classmethod get(dataset_id, feature_name, bin_limit=None, key_name=None)

Retrieve a single feature histogram

Parameters:
dataset_id : str

The ID of the Dataset the feature is associated with.

feature_name : str

The name of the feature to retrieve

bin_limit : int or None

Desired max number of histogram bins. If omitted, by default the endpoint will use 60.

key_name: string or None

(Only required for summarized categorical features) Name of the key, from the top 50 keys, for which the plot is to be retrieved

Returns:
featureHistogram : FeatureHistogram

The queried instance with plot attribute in it.

class datarobot.models.FeatureHistogram(plot)
classmethod get(project_id, feature_name, bin_limit=None, key_name=None)

Retrieve a single feature histogram

Parameters:
project_id : str

The ID of the project the feature is associated with.

feature_name : str

The name of the feature to retrieve

bin_limit : int or None

Desired max number of histogram bins. If omitted, the endpoint will use 60 by default.

key_name: string or None

(Only required for summarized categorical features) Name of the key, from the top 50 keys, for which the plot is to be retrieved

Returns:
featureHistogram : FeatureHistogram

The queried instance with plot attribute in it.
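
For example (a minimal sketch; the project ID and feature name are placeholders):

from datarobot.models import FeatureHistogram
histogram = FeatureHistogram.get('5c939e08962d741e34f609f0', 'number_diagnoses', bin_limit=30)
histogram.plot  # histogram data for the feature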

class datarobot.models.InteractionFeature(rows, source_columns, bars, bubbles)

Interaction feature data

New in version v2.21.

Attributes:
rows: int

Total number of rows

source_columns: list(str)

names of two categorical features which were combined into this one

bars: list(dict)

dictionaries representing frequencies of each independent value from the source columns

bubbles: list(dict)

dictionaries representing frequencies of each combined value in the interaction feature.

classmethod get(project_id, feature_name)

Retrieve a single Interaction feature

Parameters:
project_id : str

The id of the project the feature belongs to

feature_name : str

The name of the Interaction feature to retrieve

Returns:
feature : InteractionFeature

The queried instance

Feature Engineering

class datarobot.models.FeatureEngineeringGraph(id=None, name=None, description=None, created=None, last_modified=None, creator_full_name=None, modifier_full_name=None, creator_user_id=None, last_modified_user_id=None, number_of_projects=None, linkage_keys=None, table_definitions=None, relationships=None, time_unit=None, feature_derivation_window_start=None, feature_derivation_window_end=None, is_draft=True)

A Feature Engineering Graph for the project. A Feature Engineering Graph specifies relationships between two or more tables so that DataRobot can automatically generate features from them.

Attributes:
id : str

the id of the created feature engineering graph

name: str

name of the feature engineering graph

description: str

description of the feature engineering graph

created: datetime.datetime

creation date of the feature engineering graph

creator_user_id: str

id of the user who created the feature engineering graph

creator_full_name: str

full name of the user who created the feature engineering graph

last_modified: datetime.datetime

last modification date of the feature engineering graph

last_modified_user_id: str

id of the user who last modified the feature engineering graph

modifier_full_name: str

full name of the user who last modified the feature engineering graph

number_of_projects: int

number of projects that are used in the feature engineering graph

linkage_keys: list of str

a list of strings specifying the names of the columns that link the feature engineering graph with the primary table.

table_definitions: list

each element is a table_definition for a table.

relationships: list

each element is a relationship between two tables

time_unit: str, or None

time unit of the feature derivation window. Supported values are MILLISECOND, SECOND, MINUTE, HOUR, DAY, WEEK, MONTH, QUARTER, YEAR. If present, the feature engineering Graph will perform time-aware joins.

feature_derivation_window_start: int, or None

how many time_units of each table’s primary temporal key into the past relative to the datetimePartitionColumn the feature derivation window should begin. Will be a negative integer. If present, the feature engineering graph will perform time-aware joins.

feature_derivation_window_end: int, or None

how many timeUnits of each table’s record primary temporal key into the past relative to the datetimePartitionColumn the feature derivation window should end. Will be a non-positive integer, if present. If present, the feature engineering Graph will perform time-aware joins.

is_draft: bool (default=True)

a draft (is_draft=True) feature engineering graph can be updated, while a non-draft (is_draft=False) feature engineering graph is immutable

The `table_definitions` structure is
identifier: str

alias of the table (used directly as part of the generated feature names)

catalog_id: str, or None

identifier of the catalog item

catalog_version_id: str

identifier of the catalog item version

feature_list_id: str, or None

identifier of the feature list. This decides which columns in the table are used for feature generation

primary_temporal_key: str, or None

name of the column indicating time of record creation

snapshot_policy: str

policy to use when creating a project or making predictions. Must be one of the following values:

  • ‘specified’: Use specific snapshot specified by catalogVersionId
  • ‘latest’: Use latest snapshot from the same catalog item
  • ‘dynamic’: Get data from the source (only applicable for JDBC datasets)

feature_lists: list

list of feature list info

data_source: dict

data source info if the table is from data source

is_deleted: bool or None

whether the table is deleted or not

The `relationship` structure is
table1_identifier: str or None

identifier of the first table in this relationship. This is specified in the identifier field of the table_definition structure. If None, then the relationship is with the primary dataset.

table2_identifier: str

identifier of the second table in this relationship. This is specified in the identifier field of table_definition schema.

table1_keys: list of str (max length: 10 min length: 1)

column(s) from the first table which are used to join to the second table

table2_keys: list of str (max length: 10 min length: 1)

column(s) from the second table that are used to join to the first table

The `feature list info` structure is
id : str

the id of the featurelist

name : str

the name of the featurelist

features : list of str

the names of all the Features in the featurelist

project_id : str

the project the featurelist belongs to

creation_date : datetime.datetime

when the featurelist was created

user_created : bool

whether the featurelist was created by a user or by DataRobot automation

created_by: str

the name of user who created it

description : str

the description of the featurelist. Can be updated by the user and may be supplied by default for DataRobot-created featurelists.

dataset_id: str

dataset which is associated with the feature list

dataset_version_id: str or None

version of the dataset which is associated with feature list. Only relevant for Informative features

The `data source info` structure is
data_store_id: str

the id of the data store.

data_store_name : str

the user-friendly name of the data store.

url : str

the url used to connect to the data store.

dbtable : str

the name of table from the data store.

schema: str

schema definition of the table from the data store

classmethod create(name, description, table_definitions, relationships, time_unit=None, feature_derivation_window_start=None, feature_derivation_window_end=None, is_draft=True)

Create a feature engineering graph.

Parameters:
name : str

the name of the feature engineering graph

description : str

the description of the feature engineering graph

table_definitions: list of dict

each element is a TableDefinition for a table. The TableDefinition schema is

identifier: str

alias of the table (used directly as part of the generated feature names)

catalog_id: str, or None

identifier of the catalog item

catalog_version_id: str

identifier of the catalog item version

feature_list_id: str, or None

identifier of the feature list. This decides which columns in the table are used for feature generation

primary_temporal_key: str, or None

name of the column indicating time of record creation

snapshot_policy: str

policy to use when creating a project or making predictions. Must be one of the following values:

  • ‘specified’: Use specific snapshot specified by catalogVersionId
  • ‘latest’: Use latest snapshot from the same catalog item
  • ‘dynamic’: Get data from the source (only applicable for JDBC datasets)

relationships: list of dict

each element is a Relationship between two tables The Relationship schema is

table1_identifier: str or None

identifier of the first table in this relationship. This is specified in the identifier field of the table_definition structure. If None, then the relationship is with the primary dataset.

table2_identifier: str

identifier of the second table in this relationship. This is specified in the identifier field of table_definition schema.

table1_keys: list of str (max length: 10 min length: 1)

column(s) from the first table which are used to join to the second table

table2_keys: list of str (max length: 10 min length: 1)

column(s) from the second table that are used to join to the first table

time_unit: str, or None

time unit of the feature derivation window. Supported values are MILLISECOND, SECOND, MINUTE, HOUR, DAY, WEEK, MONTH, QUARTER, YEAR. If present, the feature engineering Graph will perform time-aware joins.

feature_derivation_window_start: int, or None

how many time_units of each table’s primary temporal key into the past relative to the datetimePartitionColumn the feature derivation window should begin. Will be a negative integer. If present, the feature engineering Graph will perform time-aware joins.

feature_derivation_window_end: int, or None

how many timeUnits of each table’s record primary temporal key into the past relative to the datetimePartitionColumn the feature derivation window should end. Will be a non-positive integer, if present. If present, the feature engineering Graph will perform time-aware joins.

is_draft: bool (default=True)

a draft (is_draft=True) feature engineering graph can be updated, while a non-draft (is_draft=False) feature engineering graph is immutable

Returns:
feature_engineering_graphs: FeatureEngineeringGraph

the created feature engineering graph
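
A minimal sketch creating a graph with one secondary table joined to the primary dataset (all IDs and column names are placeholders; the dict keys follow the TableDefinition and Relationship schemas described above):

from datarobot.models import FeatureEngineeringGraph

table_definitions = [{
    'identifier': 'transactions',                      # alias used in generated feature names
    'catalog_id': '5ec4aec1f072bc028e3471ae',          # placeholder AI Catalog item id
    'catalog_version_id': '5ec4aec2f072bc028e3471b1',  # placeholder catalog version id
    'primary_temporal_key': 'transaction_date',
    'snapshot_policy': 'latest',
}]
relationships = [{
    'table1_identifier': None,           # None means the primary dataset
    'table2_identifier': 'transactions',
    'table1_keys': ['customer_id'],
    'table2_keys': ['customer_id'],
}]
graph = FeatureEngineeringGraph.create(
    name='customer transactions',
    description='join transactions to the primary table',
    table_definitions=table_definitions,
    relationships=relationships,
    time_unit='DAY',
    feature_derivation_window_start=-30,
    feature_derivation_window_end=0,
)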

replace(id, name, description, table_definitions, relationships, time_unit=None, feature_derivation_window_start=None, feature_derivation_window_end=None, is_draft=True)

Replace a feature engineering graph.

Parameters:
id : str

the id of the created feature engineering graph

name : str

the name of the feature engineering graph

description : str

the description of the feature engineering graph

table_definitions: list of dict

each element is a TableDefinition for a table. The TableDefinition schema is

identifier: str

alias of the table (used directly as part of the generated feature names)

catalog_id: str, or None

identifier of the catalog item

catalog_version_id: str

identifier of the catalog item version

feature_list_id: str, or None

identifier of the feature list. This decides which columns in the table are used for feature generation

primary_temporal_key: str, or None

name of the column indicating time of record creation

snapshot_policy: str

policy to use when creating a project or making predictions. Must be one of the following values:

  • ‘specified’: Use specific snapshot specified by catalogVersionId
  • ‘latest’: Use latest snapshot from the same catalog item
  • ‘dynamic’: Get data from the source (only applicable for JDBC datasets)

relationships: list of dict

each element is a Relationship between two tables The Relationship schema is

table1_identifier: str or None

identifier of the first table in this relationship. This is specified in the identifier field of the table_definition structure. If None, then the relationship is with the primary dataset.

table2_identifier: str

identifier of the second table in this relationship. This is specified in the identifier field of table_definition schema.

table1_keys: list of str (max length: 10 min length: 1)

column(s) from the first table which are used to join to the second table

table2_keys: list of str (max length: 10 min length: 1)

column(s) from the second table that are used to join to the first table

time_unit: str, or None

time unit of the feature derivation window. Supported values are MILLISECOND, SECOND, MINUTE, HOUR, DAY, WEEK, MONTH, QUARTER, YEAR. If present, the feature engineering Graph will perform time-aware joins.

feature_derivation_window_start: int, or None

how many time_units of each table’s primary temporal key into the past relative to the datetimePartitionColumn the feature derivation window should begin. Will be a negative integer. If present, the feature engineering Graph will perform time-aware joins.

feature_derivation_window_end: int, or None

how many timeUnits of each table’s record primary temporal key into the past relative to the datetimePartitionColumn the feature derivation window should end. Will be a non-positive integer, if present. If present, the feature engineering Graph will perform time-aware joins.

is_draft: bool (default=True)

a draft (is_draft=True) feature engineering graph can be updated, while a non-draft (is_draft=False) feature engineering graph is immutable

Returns:
feature_engineering_graphs: FeatureEngineeringGraph

the updated feature engineering graph

update(name, description)

Update the Feature engineering graph name and description.

Parameters:
name : str

the name of the feature engineering graph

description : str

the description of the feature engineering graph

classmethod get(feature_engineering_graph_id)

Retrieve a single feature engineering graph

Parameters:
feature_engineering_graph_id : str

The ID of the feature engineering graph to retrieve.

Returns:
feature_engineering_graph : FeatureEngineeringGraph

The requested feature engineering graph

classmethod list(project_id=None, secondary_dataset_id=None, include_drafts=None)

Returns list of feature engineering graphs.

Parameters:
project_id: str, optional

The id of a project used to filter the feature engineering graph list, returning only those feature engineering graphs related to that project. If not specified, all feature engineering graphs are returned irrespective of project.

secondary_dataset_id: str, optional

ID of a dataset used to filter to feature engineering graphs that use the dataset as a secondary dataset. If not specified, all feature engineering graphs are returned without filtering on secondary dataset id.

include_drafts: bool (default=False)

whether to include draft feature engineering graphs. If True, both draft (mutable) and non-draft (immutable) feature engineering graphs are returned.

Returns:
feature_engineering_graphs : list of FeatureEngineeringGraph instances

a list of available feature engineering graphs.
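
Examples

A minimal sketch; the top-level import path for FeatureEngineeringGraph and the project id are illustrative assumptions.

import datarobot as dr

# List the feature engineering graphs related to one project, including drafts
graphs = dr.FeatureEngineeringGraph.list(project_id='5c88d37d962d7417a8f5bc65',
                                         include_drafts=True)
for graph in graphs:
    print(graph)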

delete()

Delete the Feature Engineering Graph

share(access_list)

Modify the ability of users to access this feature engineering graph

Parameters:
access_list : list of SharingAccess

the modifications to make.

Raises:
datarobot.ClientError :

if you do not have permission to share this feature engineering graph or if the user you’re sharing with doesn’t exist

get_access_list()

Retrieve what users have access to this feature engineering graph

Returns:
list of SharingAccess
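
Examples

A minimal sketch of sharing a graph and inspecting access; the import path, graph id, user email, and role string are illustrative assumptions.

import datarobot as dr

graph = dr.FeatureEngineeringGraph.get('5c88d37d962d7417a8f5bc65')
# The role value shown here is illustrative; use a role supported by your installation
access = dr.SharingAccess(username='colleague@example.com', role='READ_ONLY')
graph.share([access])
print(graph.get_access_list())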

Feature List

class datarobot.DatasetFeaturelist(id=None, name=None, features=None, dataset_id=None, dataset_version_id=None, creation_date=None, created_by=None, user_created=None, description=None)

A set of features attached to a dataset in the AI Catalog

Attributes:
id : str

the id of the dataset featurelist

dataset_id : str

the id of the dataset the featurelist belongs to

dataset_version_id: str, optional

the version id of the dataset this featurelist belongs to

name : str

the name of the dataset featurelist

features : list of str

a list of the names of features included in this dataset featurelist

creation_date : datetime.datetime

when the featurelist was created

created_by : str

the user name of the user who created this featurelist

user_created : bool

whether the featurelist was created by a user or by DataRobot automation

description : basestring, optional

the description of the featurelist. Only present on DataRobot-created featurelists.

classmethod get(dataset_id, featurelist_id)

Retrieve a dataset featurelist

Parameters:
dataset_id : str

the id of the dataset the featurelist belongs to

featurelist_id : str

the id of the dataset featurelist to retrieve

Returns:
featurelist : DatasetFeaturelist

the specified featurelist

delete()

Delete a dataset featurelist

Featurelists configured into the dataset as a default featurelist cannot be deleted.

update(name=None)

Update the name of an existing featurelist

Note that only user-created featurelists can be renamed, and that names must not conflict with names used by other featurelists.

Parameters:
name : str, optional

the new name for the featurelist
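
Examples

A minimal sketch tying these methods together; the dataset and featurelist ids are placeholders.

import datarobot as dr

flist = dr.DatasetFeaturelist.get('5c88d37d962d7417a8f5bc65',
                                  '5c88d37d962d7417a8f5bc66')
print(flist.name, flist.features)
# Only user-created featurelists can be renamed
flist.update(name='renamed featurelist')
flist.delete()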

class datarobot.models.Featurelist(id=None, name=None, features=None, project_id=None, created=None, is_user_created=None, num_models=None, description=None)

A set of features used in modeling

Attributes:
id : str

the id of the featurelist

name : str

the name of the featurelist

features : list of str

the names of all the Features in the featurelist

project_id : str

the project the featurelist belongs to

created : datetime.datetime

(New in version v2.13) when the featurelist was created

is_user_created : bool

(New in version v2.13) whether the featurelist was created by a user or by DataRobot automation

num_models : int

(New in version v2.13) the number of models currently using this featurelist. A model is considered to use a featurelist if it is used to train the model or as a monotonic constraint featurelist, or if the model is a blender with at least one component model using the featurelist.

description : basestring

(New in version v2.13) the description of the featurelist. Can be updated by the user and may be supplied by default for DataRobot-created featurelists.

classmethod get(project_id, featurelist_id)

Retrieve a known feature list

Parameters:
project_id : str

The id of the project the featurelist is associated with

featurelist_id : str

The ID of the featurelist to retrieve

Returns:
featurelist : Featurelist

The queried instance

delete(dry_run=False, delete_dependencies=False)

Delete a featurelist, and any models and jobs using it

All models using a featurelist, whether as the training featurelist or as a monotonic constraint featurelist, will also be deleted when the deletion is executed and any queued or running jobs using it will be cancelled. Similarly, predictions made on these models will also be deleted. All the entities that are to be deleted with a featurelist are described as “dependencies” of it. To preview the results of deleting a featurelist, call delete with dry_run=True

When deleting a featurelist with dependencies, users must specify delete_dependencies=True to confirm they want to delete the featurelist and all its dependencies. Without that option, only featurelists with no dependencies may be successfully deleted and others will error.

Featurelists configured into the project as a default featurelist or as a default monotonic constraint featurelist cannot be deleted.

Featurelists used in a model deployment cannot be deleted until the model deployment is deleted.

Parameters:
dry_run : bool, optional

specify True to preview the result of deleting the featurelist, instead of actually deleting it.

delete_dependencies : bool, optional

specify True to successfully delete featurelists with dependencies; if left False by default, featurelists without dependencies can be successfully deleted and those with dependencies will error upon attempting to delete them.

Returns:
result : dict
A dictionary describing the result of deleting the featurelist, with the following keys
  • dry_run : bool, whether the deletion was a dry run or an actual deletion
  • can_delete : bool, whether the featurelist can actually be deleted
  • deletion_blocked_reason : str, why the featurelist can’t be deleted (if it can’t)
  • num_affected_models : int, the number of models using this featurelist
  • num_affected_jobs : int, the number of jobs using this featurelist
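
Examples

A minimal sketch using a dry run to preview the deletion before committing to it; the ids are placeholders.

import datarobot as dr

flist = dr.models.Featurelist.get('5c88d37d962d7417a8f5bc65',
                                  '5c88d37d962d7417a8f5bc66')
result = flist.delete(dry_run=True)
if result['can_delete']:
    flist.delete(delete_dependencies=True)
else:
    print(result['deletion_blocked_reason'])
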
update(name=None, description=None)

Update the name or description of an existing featurelist

Note that only user-created featurelists can be renamed, and that names must not conflict with names used by other featurelists.

Parameters:
name : str, optional

the new name for the featurelist

description : str, optional

the new description for the featurelist

class datarobot.models.ModelingFeaturelist(id=None, name=None, features=None, project_id=None, created=None, is_user_created=None, num_models=None, description=None)

A set of features that can be used to build a model

In time series projects, a new set of modeling features is created after setting the partitioning options. These features are automatically derived from those in the project’s dataset and are the features used for modeling. Modeling features are only accessible once the target and partitioning options have been set. In projects that don’t use time series modeling, once the target has been set, ModelingFeaturelists and Featurelists will behave the same.

For more information about input and modeling features, see the time series documentation.

Attributes:
id : str

the id of the modeling featurelist

project_id : str

the id of the project the modeling featurelist belongs to

name : str

the name of the modeling featurelist

features : list of str

a list of the names of features included in this modeling featurelist

created : datetime.datetime

(New in version v2.13) when the featurelist was created

is_user_created : bool

(New in version v2.13) whether the featurelist was created by a user or by DataRobot automation

num_models : int

(New in version v2.13) the number of models currently using this featurelist. A model is considered to use a featurelist if it is used to train the model or as a monotonic constraint featurelist, or if the model is a blender with at least one component model using the featurelist.

description : basestring

(New in version v2.13) the description of the featurelist. Can be updated by the user and may be supplied by default for DataRobot-created featurelists.

classmethod get(project_id, featurelist_id)

Retrieve a modeling featurelist

Modeling featurelists can only be retrieved once the target and partitioning options have been set.

Parameters:
project_id : str

the id of the project the modeling featurelist belongs to

featurelist_id : str

the id of the modeling featurelist to retrieve

Returns:
featurelist : ModelingFeaturelist

the specified featurelist

delete(dry_run=False, delete_dependencies=False)

Delete a featurelist, and any models and jobs using it

All models using a featurelist, whether as the training featurelist or as a monotonic constraint featurelist, will also be deleted when the deletion is executed and any queued or running jobs using it will be cancelled. Similarly, predictions made on these models will also be deleted. All the entities that are to be deleted with a featurelist are described as “dependencies” of it. To preview the results of deleting a featurelist, call delete with dry_run=True

When deleting a featurelist with dependencies, users must specify delete_dependencies=True to confirm they want to delete the featurelist and all its dependencies. Without that option, only featurelists with no dependencies may be successfully deleted and others will error.

Featurelists configured into the project as a default featurelist or as a default monotonic constraint featurelist cannot be deleted.

Featurelists used in a model deployment cannot be deleted until the model deployment is deleted.

Parameters:
dry_run : bool, optional

specify True to preview the result of deleting the featurelist, instead of actually deleting it.

delete_dependencies : bool, optional

specify True to successfully delete featurelists with dependencies; if left False by default, featurelists without dependencies can be successfully deleted and those with dependencies will error upon attempting to delete them.

Returns:
result : dict
A dictionary describing the result of deleting the featurelist, with the following keys
  • dry_run : bool, whether the deletion was a dry run or an actual deletion
  • can_delete : bool, whether the featurelist can actually be deleted
  • deletion_blocked_reason : str, why the featurelist can’t be deleted (if it can’t)
  • num_affected_models : int, the number of models using this featurelist
  • num_affected_jobs : int, the number of jobs using this featurelist
update(name=None, description=None)

Update the name or description of an existing featurelist

Note that only user-created featurelists can be renamed, and that names must not conflict with names used by other featurelists.

Parameters:
name : str, optional

the new name for the featurelist

description : str, optional

the new description for the featurelist

Job

class datarobot.models.Job(data, completed_resource_url=None)

Tracks asynchronous work being done within a project

Attributes:
id : int

the id of the job

project_id : str

the id of the project the job belongs to

status : str

the status of the job - will be one of datarobot.enums.QUEUE_STATUS

job_type : str

what kind of work the job is doing - will be one of datarobot.enums.JOB_TYPE

is_blocked : bool

if true, the job is blocked (cannot be executed) until its dependencies are resolved

classmethod get(project_id, job_id)

Fetches one job.

Parameters:
project_id : str

The identifier of the project in which the job resides

job_id : str

The job id

Returns:
job : Job

The job

Raises:
AsyncFailureError

Querying this resource gave a status code other than 200 or 303

cancel()

Cancel this job. If this job has not finished running, it will be removed and canceled.

get_result(params=None)
Parameters:
params : dict or None

Query parameters to be added to request to get results.

For featureEffects and featureFit, the source param is required to define the source; otherwise the default is `training`.
Returns:
result : object
Return type depends on the job type:
  • for model jobs, a Model is returned
  • for predict jobs, a pandas.DataFrame (with predictions) is returned
  • for featureImpact jobs, a list of dicts by default (see with_metadata parameter of the FeatureImpactJob class and its get() method).
  • for primeRulesets jobs, a list of Rulesets
  • for primeModel jobs, a PrimeModel
  • for primeDownloadValidation jobs, a PrimeFile
  • for reasonCodesInitialization jobs, a ReasonCodesInitialization
  • for reasonCodes jobs, a ReasonCodes
  • for predictionExplanationInitialization jobs, a PredictionExplanationsInitialization
  • for predictionExplanations jobs, a PredictionExplanations
  • for featureEffects, a FeatureEffects
  • for featureFit, a FeatureFit
Raises:
JobNotFinished

If the job is not finished, the result is not available.

AsyncProcessUnsuccessfulError

If the job errored or was aborted

get_result_when_complete(max_wait=600, params=None)
Parameters:
max_wait : int, optional

How long to wait for the job to finish.

params : dict, optional

Query parameters to be added to request.

Returns:
result: object

Return type is the same as would be returned by Job.get_result.

Raises:
AsyncTimeoutError

If the job does not finish in time

AsyncProcessUnsuccessfulError

If the job errored or was aborted

refresh()

Update this object with the latest job data from the server.

wait_for_completion(max_wait=600)

Waits for job to complete.

Parameters:
max_wait : int, optional

How long to wait for the job to finish.
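
Examples

A minimal sketch of polling a generic job; the project and job ids are placeholders.

import datarobot as dr

job = dr.models.Job.get('5c88d37d962d7417a8f5bc65', '42')
if job.is_blocked:
    print('job is waiting on its dependencies')
result = job.get_result_when_complete(max_wait=600)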

class datarobot.models.TrainingPredictionsJob(data, model_id, data_subset, **kwargs)
classmethod get(project_id, job_id, model_id=None, data_subset=None)

Fetches one training predictions job.

The resulting TrainingPredictions object will be annotated with model_id and data_subset.

Parameters:
project_id : str

The identifier of the project in which the job resides

job_id : str

The job id

model_id : str

The identifier of the model used for computing training predictions

data_subset : dr.enums.DATA_SUBSET, optional

Data subset used for computing training predictions

Returns:
job : TrainingPredictionsJob

The job

refresh()

Update this object with the latest job data from the server.

cancel()

Cancel this job. If this job has not finished running, it will be removed and canceled.

get_result(params=None)
Parameters:
params : dict or None

Query parameters to be added to request to get results.

For featureEffects and featureFit, the source param is required to define the source; otherwise the default is `training`.
Returns:
result : object
Return type depends on the job type:
  • for model jobs, a Model is returned
  • for predict jobs, a pandas.DataFrame (with predictions) is returned
  • for featureImpact jobs, a list of dicts by default (see with_metadata parameter of the FeatureImpactJob class and its get() method).
  • for primeRulesets jobs, a list of Rulesets
  • for primeModel jobs, a PrimeModel
  • for primeDownloadValidation jobs, a PrimeFile
  • for reasonCodesInitialization jobs, a ReasonCodesInitialization
  • for reasonCodes jobs, a ReasonCodes
  • for predictionExplanationInitialization jobs, a PredictionExplanationsInitialization
  • for predictionExplanations jobs, a PredictionExplanations
  • for featureEffects, a FeatureEffects
  • for featureFit, a FeatureFit
Raises:
JobNotFinished

If the job is not finished, the result is not available.

AsyncProcessUnsuccessfulError

If the job errored or was aborted

get_result_when_complete(max_wait=600, params=None)
Parameters:
max_wait : int, optional

How long to wait for the job to finish.

params : dict, optional

Query parameters to be added to request.

Returns:
result: object

Return type is the same as would be returned by Job.get_result.

Raises:
AsyncTimeoutError

If the job does not finish in time

AsyncProcessUnsuccessfulError

If the job errored or was aborted

wait_for_completion(max_wait=600)

Waits for job to complete.

Parameters:
max_wait : int, optional

How long to wait for the job to finish.

class datarobot.models.ShapMatrixJob(data, model_id, dataset_id, **kwargs)
classmethod get(project_id, job_id, model_id=None, dataset_id=None)

Fetches one SHAP matrix job.

Parameters:
project_id : str

The identifier of the project in which the job resides

job_id : str

The job identifier

model_id : str

The identifier of the model used for computing prediction explanations

dataset_id : str

The identifier of the dataset against which prediction explanations should be computed

Returns:
job : ShapMatrixJob

The job

Raises:
AsyncFailureError

Querying this resource gave a status code other than 200 or 303

refresh()

Update this object with the latest job data from the server.

cancel()

Cancel this job. If this job has not finished running, it will be removed and canceled.

get_result(params=None)
Parameters:
params : dict or None

Query parameters to be added to request to get results.

For featureEffects and featureFit, the source param is required to define the source; otherwise the default is `training`.
Returns:
result : object
Return type depends on the job type:
  • for model jobs, a Model is returned
  • for predict jobs, a pandas.DataFrame (with predictions) is returned
  • for featureImpact jobs, a list of dicts by default (see with_metadata parameter of the FeatureImpactJob class and its get() method).
  • for primeRulesets jobs, a list of Rulesets
  • for primeModel jobs, a PrimeModel
  • for primeDownloadValidation jobs, a PrimeFile
  • for reasonCodesInitialization jobs, a ReasonCodesInitialization
  • for reasonCodes jobs, a ReasonCodes
  • for predictionExplanationInitialization jobs, a PredictionExplanationsInitialization
  • for predictionExplanations jobs, a PredictionExplanations
  • for featureEffects, a FeatureEffects
  • for featureFit, a FeatureFit
Raises:
JobNotFinished

If the job is not finished, the result is not available.

AsyncProcessUnsuccessfulError

If the job errored or was aborted

get_result_when_complete(max_wait=600, params=None)
Parameters:
max_wait : int, optional

How long to wait for the job to finish.

params : dict, optional

Query parameters to be added to request.

Returns:
result: object

Return type is the same as would be returned by Job.get_result.

Raises:
AsyncTimeoutError

If the job does not finish in time

AsyncProcessUnsuccessfulError

If the job errored or was aborted

wait_for_completion(max_wait=600)

Waits for job to complete.

Parameters:
max_wait : int, optional

How long to wait for the job to finish.

class datarobot.models.FeatureImpactJob(data, completed_resource_url=None, with_metadata=False)

Custom Feature Impact job to handle different return value structures.

The original implementation returned just the data; the new one also includes some metadata.

In general, we aim to keep the number of Job classes low by using the job_type attribute to control any specific formatting; however, in this case, where we needed to support a new representation with the _same_ job_type, customizing the behavior of _make_result_from_location allowed us to achieve this without complicating the _make_result_from_json method.

classmethod get(project_id, job_id, with_metadata=False)

Fetches one job.

Parameters:
project_id : str

The identifier of the project in which the job resides

job_id : str

The job id

with_metadata : bool

To make this job return the metadata (i.e. the full object of the completed resource) set the with_metadata flag to True.

Returns:
job : Job

The job

Raises:
AsyncFailureError

Querying this resource gave a status code other than 200 or 303
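
Examples

A minimal sketch of fetching a feature impact job with metadata; the ids are placeholders, and the metadata keys are assumed to mirror Model.get_feature_impact(with_metadata=True).

import datarobot as dr

job = dr.models.FeatureImpactJob.get('5c88d37d962d7417a8f5bc65', '42',
                                     with_metadata=True)
feature_impact = job.get_result_when_complete()
# With metadata, the result is a dict; see Model.get_feature_impact for the schema
print(feature_impact['ranRedundancyDetection'])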

cancel()

Cancel this job. If this job has not finished running, it will be removed and canceled.

get_result(params=None)
Parameters:
params : dict or None

Query parameters to be added to request to get results.

For featureEffects and featureFit, the source param is required to define the source; otherwise the default is `training`.
Returns:
result : object
Return type depends on the job type:
  • for model jobs, a Model is returned
  • for predict jobs, a pandas.DataFrame (with predictions) is returned
  • for featureImpact jobs, a list of dicts by default (see with_metadata parameter of the FeatureImpactJob class and its get() method).
  • for primeRulesets jobs, a list of Rulesets
  • for primeModel jobs, a PrimeModel
  • for primeDownloadValidation jobs, a PrimeFile
  • for reasonCodesInitialization jobs, a ReasonCodesInitialization
  • for reasonCodes jobs, a ReasonCodes
  • for predictionExplanationInitialization jobs, a PredictionExplanationsInitialization
  • for predictionExplanations jobs, a PredictionExplanations
  • for featureEffects, a FeatureEffects
  • for featureFit, a FeatureFit
Raises:
JobNotFinished

If the job is not finished, the result is not available.

AsyncProcessUnsuccessfulError

If the job errored or was aborted

get_result_when_complete(max_wait=600, params=None)
Parameters:
max_wait : int, optional

How long to wait for the job to finish.

params : dict, optional

Query parameters to be added to request.

Returns:
result: object

Return type is the same as would be returned by Job.get_result.

Raises:
AsyncTimeoutError

If the job does not finish in time

AsyncProcessUnsuccessfulError

If the job errored or was aborted

refresh()

Update this object with the latest job data from the server.

wait_for_completion(max_wait=600)

Waits for job to complete.

Parameters:
max_wait : int, optional

How long to wait for the job to finish.

Lift Chart

class datarobot.models.lift_chart.LiftChart(source, bins, source_model_id, target_class)

Lift chart data for model.

Notes

LiftChartBin is a dict containing the following:

  • actual (float) Sum of actual target values in bin
  • predicted (float) Sum of predicted target values in bin
  • bin_weight (float) The weight of the bin. For weighted projects, it is the sum of the weights of the rows in the bin. For unweighted projects, it is the number of rows in the bin.
Attributes:
source : str

Lift chart data source. Can be ‘validation’, ‘crossValidation’ or ‘holdout’.

bins : list of dict

List of dicts with schema described as LiftChartBin above.

source_model_id : str

ID of the model this lift chart represents; in some cases, insights from the parent of a frozen model may be used

target_class : str, optional

For multiclass lift - target class for this lift chart data.

Missing Values Report

class datarobot.models.missing_report.MissingValuesReport(missing_values_report)

Missing values report for model, contains list of reports per feature sorted by missing count in descending order.

Notes

Report per feature contains:

  • feature : feature name.
  • type : feature type – ‘Numeric’ or ‘Categorical’.
  • missing_count : missing values count in training data.
  • missing_percentage : missing values percentage in training data.
  • tasks : list of information per each task, which was applied to feature.

task information contains:

  • id : the number of the task in the blueprint diagram.
  • name : task name.
  • descriptions : human readable aggregated information about how the task handles missing values. The following descriptions may be present: what value is imputed for missing values, whether the feature being missing is treated as a feature by the task, whether missing values are treated as infrequent values, whether infrequent values are treated as missing values, and whether missing values are ignored.
classmethod get(project_id, model_id)

Retrieve a missing report.

Parameters:
project_id : str

The project’s id.

model_id : str

The model’s id.

Returns:
MissingValuesReport

The queried missing report.
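
Examples

A minimal sketch; the ids are placeholders, and iterating the report over its per-feature entries is an assumption.

import datarobot as dr

report = dr.models.missing_report.MissingValuesReport.get(
    '5c88d37d962d7417a8f5bc65', '5c88d37d962d7417a8f5bc66')
# Each entry is assumed to follow the per-feature schema described above
for feature_report in report:
    print(feature_report)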

Models

Model

class datarobot.models.Model(id=None, processes=None, featurelist_name=None, featurelist_id=None, project_id=None, sample_pct=None, training_row_count=None, training_duration=None, training_start_date=None, training_end_date=None, model_type=None, model_category=None, is_frozen=None, blueprint_id=None, metrics=None, project=None, data=None, monotonic_increasing_featurelist_id=None, monotonic_decreasing_featurelist_id=None, supports_monotonic_constraints=None, is_starred=None, prediction_threshold=None, prediction_threshold_read_only=None, model_number=None, parent_model_id=None, use_project_settings=None)

A model trained on a project’s dataset capable of making predictions

All durations are specified with a duration string such as those returned by the partitioning_methods.construct_duration_string helper method. Please see datetime partitioned project documentation for more information on duration strings.

Attributes:
id : str

the id of the model

project_id : str

the id of the project the model belongs to

processes : list of str

the processes used by the model

featurelist_name : str

the name of the featurelist used by the model

featurelist_id : str

the id of the featurelist used by the model

sample_pct : float or None

the percentage of the project dataset used in training the model. If the project uses datetime partitioning, the sample_pct will be None. See training_row_count, training_duration, and training_start_date and training_end_date instead.

training_row_count : int or None

the number of rows of the project dataset used in training the model. In a datetime partitioned project, if specified, defines the number of rows used to train the model and evaluate backtest scores; if unspecified, either training_duration or training_start_date and training_end_date was used to determine that instead.

training_duration : str or None

only present for models in datetime partitioned projects. If specified, a duration string specifying the duration spanned by the data used to train the model and evaluate backtest scores.

training_start_date : datetime or None

only present for frozen models in datetime partitioned projects. If specified, the start date of the data used to train the model.

training_end_date : datetime or None

only present for frozen models in datetime partitioned projects. If specified, the end date of the data used to train the model.

model_type : str

what model this is, e.g. ‘Nystroem Kernel SVM Regressor’

model_category : str

what kind of model this is - ‘prime’ for DataRobot Prime models, ‘blend’ for blender models, and ‘model’ for other models

is_frozen : bool

whether this model is a frozen model

blueprint_id : str

the id of the blueprint used in this model

metrics : dict

a mapping from each metric to the model’s scores for that metric

monotonic_increasing_featurelist_id : str

optional, the id of the featurelist that defines the set of features with a monotonically increasing relationship to the target. If None, no such constraints are enforced.

monotonic_decreasing_featurelist_id : str

optional, the id of the featurelist that defines the set of features with a monotonically decreasing relationship to the target. If None, no such constraints are enforced.

supports_monotonic_constraints : bool

optional, whether this model supports enforcing monotonic constraints

is_starred : bool

whether this model is marked as starred

prediction_threshold : float

for binary classification projects, the threshold used for predictions

prediction_threshold_read_only : bool

indicates whether modification of the prediction threshold is forbidden. Threshold modification is forbidden once a model has had a deployment created or predictions made via the dedicated prediction API.

model_number : integer

model number assigned to a model

parent_model_id : str or None

(New in version v2.20) the id of the model that tuning parameters are derived from

use_project_settings : bool or None

(New in version v2.20) Only present for models in datetime-partitioned projects. If True, indicates that the custom backtest partitioning settings specified by the user were used to train the model and evaluate backtest scores.

classmethod get(project, model_id)

Retrieve a specific model.

Parameters:
project : str

The project’s id.

model_id : str

The model_id of the leaderboard item to retrieve.

Returns:
model : Model

The queried instance.

Raises:
ValueError

the passed project parameter value is of an unsupported type
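
Examples

A minimal sketch; the project and model ids are placeholders.

import datarobot as dr

model = dr.models.Model.get('5c88d37d962d7417a8f5bc65',
                            '5c88d37d962d7417a8f5bc66')
print(model.model_type, model.metrics)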

classmethod fetch_resource_data(url, join_endpoint=True)

(Deprecated.) Used to acquire model data directly from its url.

Consider using get instead, as this is a convenience function used for development of datarobot

Parameters:
url : str

The resource we are acquiring

join_endpoint : boolean, optional

Whether the client’s endpoint should be joined to the URL before sending the request. Location headers are returned as absolute locations, so will _not_ need the endpoint

Returns:
model_data : dict

The queried model’s data

get_features_used()

Query the server to determine which features were used.

Note that the data returned by this method is possibly different than the names of the features in the featurelist used by this model. This method will return the raw features that must be supplied in order for predictions to be generated on a new set of data. The featurelist, in contrast, would also include the names of derived features.

Returns:
features : list of str

The names of the features used in the model.

get_supported_capabilities()

Retrieves a summary of the capabilities supported by a model.

New in version v2.14.

Returns:
supportsBlending: bool

whether the model supports blending

supportsMonotonicConstraints: bool

whether the model supports monotonic constraints

hasWordCloud: bool

whether the model has word cloud data available

eligibleForPrime: bool

whether the model is eligible for Prime

hasParameters: bool

whether the model has parameters that can be retrieved

supportsCodeGeneration: bool

(New in version v2.18) whether the model supports code generation

supportsShap: bool
(New in version v2.18) True if the model supports the SHAP package, i.e. Shapley-based feature importance

supportsEarlyStopping: bool

(New in version v2.22) True if this is an early stopping tree-based model and number of trained iterations can be retrieved.

get_num_iterations_trained()

Retrieves the number of estimators trained by early-stopping tree-based models

New in version v2.22.

Returns:
projectId: str

id of project containing the model

modelId: str

id of the model

data: array

list of numEstimatorsItem objects, one for each modeling stage.

numEstimatorsItem will be of the form:
stage: str

indicates the modeling stage (for multi-stage models); None for single-stage models

numIterations: int

the number of estimators or iterations trained by the model

delete()

Delete a model from the project’s leaderboard.

Returns:
url : str

Permanent static hyperlink to this model on the leaderboard.

open_model_browser()

Opens model at project leaderboard in web browser.

Note: If text-mode browsers are used, the calling process will block until the user exits the browser.

train(sample_pct=None, featurelist_id=None, scoring_type=None, training_row_count=None, monotonic_increasing_featurelist_id=<object object>, monotonic_decreasing_featurelist_id=<object object>)

Train the blueprint used in model on a particular featurelist or amount of data.

This method creates a new training job for worker and appends it to the end of the queue for this project. After the job has finished you can get the newly trained model by retrieving it from the project leaderboard, or by retrieving the result of the job.

Either sample_pct or training_row_count can be used to specify the amount of data to use, but not both. If neither are specified, a default of the maximum amount of data that can safely be used to train any blueprint without going into the validation data will be selected.

In smart-sampled projects, sample_pct and training_row_count are assumed to be in terms of rows of the minority class.

Note

For datetime partitioned projects, see train_datetime instead.

Parameters:
sample_pct : float, optional

The amount of data to use for training, as a percentage of the project dataset from 0 to 100.

featurelist_id : str, optional

The identifier of the featurelist to use. If not defined, the featurelist of this model is used.

scoring_type : str, optional

Either SCORING_TYPE.validation or SCORING_TYPE.cross_validation. SCORING_TYPE.validation is available for every partitioning type, and indicates that the default model validation should be used for the project. If the project uses a form of cross-validation partitioning, SCORING_TYPE.cross_validation can also be used to indicate that all of the available training/validation combinations should be used to evaluate the model.

training_row_count : int, optional

The number of rows to use to train the requested model.

monotonic_increasing_featurelist_id : str

(new in version 2.11) optional, the id of the featurelist that defines the set of features with a monotonically increasing relationship to the target. Passing None disables increasing monotonicity constraint. Default (dr.enums.MONOTONICITY_FEATURELIST_DEFAULT) is the one specified by the blueprint.

monotonic_decreasing_featurelist_id : str

(new in version 2.11) optional, the id of the featurelist that defines the set of features with a monotonically decreasing relationship to the target. Passing None disables decreasing monotonicity constraint. Default (dr.enums.MONOTONICITY_FEATURELIST_DEFAULT) is the one specified by the blueprint.

Returns:
model_job_id : str

id of created job, can be used as parameter to ModelJob.get method or wait_for_async_model_creation function

Examples

project = Project.get('p-id')
model = Model.get('p-id', 'l-id')
model_job_id = model.train(training_row_count=project.max_train_rows)
train_datetime(featurelist_id=None, training_row_count=None, training_duration=None, time_window_sample_pct=None, monotonic_increasing_featurelist_id=<object object>, monotonic_decreasing_featurelist_id=<object object>, use_project_settings=False)

Train this model on a different featurelist or amount of data

Requires that this model is part of a datetime partitioned project; otherwise, an error will occur.

All durations should be specified with a duration string such as those returned by the partitioning_methods.construct_duration_string helper method. Please see datetime partitioned project documentation for more information on duration strings.

Parameters:
featurelist_id : str, optional

the featurelist to use to train the model. If not specified, the featurelist of this model is used.

training_row_count : int, optional

the number of rows of data that should be used to train the model. If specified, neither training_duration nor use_project_settings may be specified.

training_duration : str, optional

a duration string specifying what time range the data used to train the model should span. If specified, neither training_row_count nor use_project_settings may be specified.

use_project_settings : bool, optional

(New in version v2.20) defaults to False. If True, indicates that the custom backtest partitioning settings specified by the user will be used to train the model and evaluate backtest scores. If specified, neither training_row_count nor training_duration may be specified.

time_window_sample_pct : int, optional

may only be specified when the requested model is a time window (e.g. duration or start and end dates). An integer between 1 and 99 indicating the percentage to sample by within the window. The points kept are determined by a random uniform sample. If specified, training_duration must also be specified; otherwise, an error will occur.

monotonic_increasing_featurelist_id : str, optional

(New in version v2.18) optional, the id of the featurelist that defines the set of features with a monotonically increasing relationship to the target. Passing None disables increasing monotonicity constraint. Default (dr.enums.MONOTONICITY_FEATURELIST_DEFAULT) is the one specified by the blueprint.

monotonic_decreasing_featurelist_id : str, optional

(New in version v2.18) optional, the id of the featurelist that defines the set of features with a monotonically decreasing relationship to the target. Passing None disables decreasing monotonicity constraint. Default (dr.enums.MONOTONICITY_FEATURELIST_DEFAULT) is the one specified by the blueprint.

Returns:
job : ModelJob

the created job to build the model

retrain(sample_pct=None, featurelist_id=None, training_row_count=None)

Submit a job to the queue to retrain this model on a different sample size or featurelist.

Parameters:
sample_pct: float, optional

The sample size as a percentage (1 to 100) to use in training. If this parameter is used, training_row_count should not be given.

featurelist_id : str, optional

The featurelist id

training_row_count : int, optional

The number of rows used to train the model. If this parameter is used, sample_pct should not be given.

Returns:
job : ModelJob

The created job that is retraining the model

request_predictions(dataset_id, include_prediction_intervals=None, prediction_intervals_size=None, forecast_point=None, predictions_start_date=None, predictions_end_date=None, actual_value_column=None, explanation_algorithm=None, max_explanations=None)

Request predictions against a previously uploaded dataset

Parameters:
dataset_id : string

The dataset to make predictions against (as uploaded from Project.upload_dataset)

include_prediction_intervals : bool, optional

(New in v2.16) For time series projects only. Specifies whether prediction intervals should be calculated for this request. Defaults to True if prediction_intervals_size is specified, otherwise defaults to False.

prediction_intervals_size : int, optional

(New in v2.16) For time series projects only. Represents the percentile to use for the size of the prediction intervals. Defaults to 80 if include_prediction_intervals is True. Prediction intervals size must be between 1 and 100 (inclusive).

forecast_point : datetime.datetime or None, optional

(New in version v2.20) For time series projects only. This is the default point relative to which predictions will be generated, based on the forecast window of the project. See the time series prediction documentation for more information.

predictions_start_date : datetime.datetime or None, optional

(New in version v2.20) For time series projects only. The start date for bulk predictions. Note that this parameter is for generating historical predictions using the training data. This parameter should be provided in conjunction with predictions_end_date. Can’t be provided with the forecast_point parameter.

predictions_end_date : datetime.datetime or None, optional

(New in version v2.20) For time series projects only. The end date for bulk predictions, exclusive. Note that this parameter is for generating historical predictions using the training data. This parameter should be provided in conjunction with predictions_start_date. Can’t be provided with the forecast_point parameter.

actual_value_column : string, optional

(New in version v2.21) For time series unsupervised projects only. Actual value column can be used to calculate the classification metrics and insights on the prediction dataset. Can’t be provided with the forecast_point parameter.

explanation_algorithm : string, optional

(New in version v2.21) If set to ‘shap’, the response will include prediction explanations based on the SHAP explainer (SHapley Additive exPlanations). Defaults to null (no prediction explanations).

max_explanations : int, optional

(New in version v2.21) Specifies the maximum number of explanation values that should be returned for each row, ordered by absolute value, greatest to least. If null, no limit. In the case of ‘shap’: if the number of features is greater than the limit, the sum of the remaining values will also be returned as shapRemainingTotal. Defaults to null. Cannot be set if explanation_algorithm is omitted.

Returns:
job : PredictJob

The job computing the predictions
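
Examples

A minimal sketch of scoring an uploaded dataset; the file path and ids are placeholders, and the id attribute on the uploaded dataset follows the Project.upload_dataset reference above.

import datarobot as dr

project = dr.Project.get('5c88d37d962d7417a8f5bc65')
model = dr.models.Model.get(project.id, '5c88d37d962d7417a8f5bc66')
dataset = project.upload_dataset('./data_to_predict.csv')
predict_job = model.request_predictions(dataset.id)
predictions = predict_job.get_result_when_complete()  # a pandas.DataFrame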

get_feature_impact(with_metadata=False)

Retrieve the computed Feature Impact results, a measure of the relevance of each feature in the model.

Feature Impact is computed for each column by creating new data with that column randomly permuted (but the others left unchanged), and seeing how the error metric score for the predictions is affected. The ‘impactUnnormalized’ is how much worse the error metric score is when making predictions on this modified data. The ‘impactNormalized’ is normalized so that the largest value is 1. In both cases, larger values indicate more important features.

If a feature is a redundant feature, i.e. once other features are considered it doesn’t contribute much in addition, the ‘redundantWith’ value is the name of feature that has the highest correlation with this feature. Note that redundancy detection is only available for jobs run after the addition of this feature. When retrieving data that predates this functionality, a NoRedundancyImpactAvailable warning will be used.

Elsewhere this technique is sometimes called ‘Permutation Importance’.

Requires that Feature Impact has already been computed with request_feature_impact.

Parameters:
with_metadata : bool

The flag indicating if the result should include the metadata as well.

Returns:
list or dict

The feature impact data response depends on the with_metadata parameter. The response is either a dict with metadata and a list with actual data or just a list with that data.

Each list item is a dict with the keys featureName, impactNormalized, impactUnnormalized, redundantWith, and count.

For dict response available keys are:

  • featureImpacts - Feature Impact data as a dictionary. Each item is a dict with
    keys: featureName, impactNormalized, and impactUnnormalized, and redundantWith.
  • shapBased - A boolean that indicates whether Feature Impact was calculated using
    Shapley values.
  • ranRedundancyDetection - A boolean that indicates whether redundant feature
    identification was run while calculating this Feature Impact.
  • rowCount - An integer or None that indicates the number of rows that was used to
    calculate Feature Impact. For the Feature Impact calculated with the default logic, without specifying the rowCount, we return None here.
  • count - An integer with the number of features under the featureImpacts.
Raises:
ClientError (404)

If the feature impacts have not been computed.

get_multiclass_feature_impact()

For multiclass it’s possible to calculate feature impact separately for each target class. The method for calculation is exactly the same, calculated in one-vs-all style for each target class.

Requires that Feature Impact has already been computed with request_feature_impact.

Returns:
feature_impacts : list of dict

The feature impact data. Each item is a dict with the keys ‘featureImpacts’ (list), ‘class’ (str). Each item in ‘featureImpacts’ is a dict with the keys ‘featureName’, ‘impactNormalized’, and ‘impactUnnormalized’, and ‘redundantWith’.

Raises:
ClientError (404)

If the multiclass feature impacts have not been computed.

request_feature_impact(row_count=None, with_metadata=False)

Request feature impacts to be computed for the model.

See get_feature_impact for more information on the result of the job.

Parameters:
row_count : int

The sample size (specified in rows) to use for Feature Impact computation. This is not supported for unsupervised, multi-class (that has a separate method) and time series projects.

Returns:
job : Job

A Job representing the feature impact computation. To get the completed feature impact data, use job.get_result or job.get_result_when_complete.

Raises:
JobAlreadyRequested (422)

If the feature impacts have already been requested.

request_external_test(dataset_id, actual_value_column=None)

Request external test to compute scores and insights on an external test dataset

Parameters:
dataset_id : string

The dataset to make predictions against (as uploaded from Project.upload_dataset)

actual_value_column : string, optional

(New in version v2.21) For time series unsupervised projects only. Actual value column can be used to calculate the classification metrics and insights on the prediction dataset. Can’t be provided with the forecast_point parameter.

Returns:
job : Job

a Job representing external dataset insights computation

get_or_request_feature_impact(max_wait=600, **kwargs)

Retrieve feature impact for the model, requesting a job if it hasn’t been run previously

Parameters:
max_wait : int, optional

The maximum time to wait for a requested feature impact job to complete before erroring

**kwargs

Arbitrary keyword arguments passed to request_feature_impact.

Returns:
feature_impacts : list or dict

The feature impact data. See get_feature_impact for the exact schema.
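
Examples

A minimal sketch; the ids are placeholders.

import datarobot as dr

model = dr.models.Model.get('5c88d37d962d7417a8f5bc65',
                            '5c88d37d962d7417a8f5bc66')
# Requests the job if needed, waits for it, and returns the impact data
feature_impact = model.get_or_request_feature_impact(max_wait=600)
for item in feature_impact:
    print(item['featureName'], item['impactNormalized'])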

get_feature_effect_metadata()
Retrieve Feature Effect metadata. Response contains status and available model sources.
  • Feature Effect of training is always available (except for old projects that support only Feature Effect for validation).
  • When a model is trained into validation or holdout without stacked predictions (i.e. no out-of-sample predictions in validation or holdout), Feature Effect is not available for validation or holdout.
  • Feature Effect for holdout is not available when there is no holdout configured for the project.
source is the expected parameter for retrieving Feature Effect; one of the provided sources should be used.
Returns:
feature_effect_metadata: FeatureEffectMetadata
get_feature_fit_metadata()
Retrieve Feature Fit metadata. Response contains status and available model sources.
  • Feature Fit of training is always available (except for old projects that support only Feature Fit for validation).
  • When a model is trained into validation or holdout without stacked predictions (i.e. no out-of-sample predictions in validation or holdout), Feature Fit is not available for validation or holdout.
  • Feature Fit for holdout is not available when there is no holdout configured for the project.
source is the expected parameter for retrieving Feature Fit; one of the provided sources should be used.
Returns:
feature_effect_metadata: FeatureFitMetadata
request_feature_effect(row_count=None)

Request feature effects to be computed for the model.

See get_feature_effect for more information on the result of the job.

Parameters:
row_count : int

(New in version v2.21) The sample size to use for Feature Effect computation. Minimum is 10 rows. Maximum is 100000 rows or the training sample size of the model, whichever is less.

Returns:
job : Job

A Job representing the feature effect computation. To get the completed feature effect data, use job.get_result or job.get_result_when_complete.

Raises:
JobAlreadyRequested (422)

If the feature effects have already been requested.

get_feature_effect(source)

Retrieve Feature Effects for the model.

Feature Effects provides partial dependence and predicted vs actual values for top-500 features ordered by feature impact score.

The partial dependence shows marginal effect of a feature on the target variable after accounting for the average effects of all other predictive features. It indicates how, holding all other variables except the feature of interest as they were, the value of this feature affects your prediction.

Requires that Feature Effects has already been computed with request_feature_effect.

See get_feature_effect_metadata for retrieving information on the available sources.

Parameters:
source : string

The source Feature Effects are retrieved for.

Returns:
feature_effects : FeatureEffects

The feature effects data.

Raises:
ClientError (404)

If the feature effects have not been computed or source is not a valid value.

get_or_request_feature_effect(source, max_wait=600, row_count=None)

Retrieve feature effect for the model, requesting a job if it hasn’t been run previously

See get_feature_effect_metadata for retrieving information of source.

Parameters:
max_wait : int, optional

The maximum time to wait for a requested feature effect job to complete before erroring

row_count : int, optional

(New in version v2.21) The sample size to use for Feature Effect computation. Minimum is 10 rows. Maximum is 100000 rows or the training sample size of the model, whichever is less.

source : string

The source Feature Effects are retrieved for.

Returns:
feature_effects : FeatureEffects

The feature effects data.
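
Examples

A minimal sketch of checking the available sources before requesting Feature Effects; the ids are placeholders, and the sources attribute name on the metadata object is an assumption.

import datarobot as dr

model = dr.models.Model.get('5c88d37d962d7417a8f5bc65',
                            '5c88d37d962d7417a8f5bc66')
metadata = model.get_feature_effect_metadata()
source = metadata.sources[0]  # attribute name assumed; pick an available source
feature_effects = model.get_or_request_feature_effect(source, max_wait=600)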

request_feature_fit()

Request feature fit to be computed for the model.

See get_feature_fit for more information on the result of the job.

Returns:
job : Job

A Job representing the feature fit computation. To get the completed feature fit data, use job.get_result or job.get_result_when_complete.

Raises:
JobAlreadyRequested (422)

If the feature fit has already been requested.

get_feature_fit(source)

Retrieve Feature Fit for the model.

Feature Fit provides partial dependence and predicted vs actual values for top-500 features ordered by feature importance score.

The partial dependence shows marginal effect of a feature on the target variable after accounting for the average effects of all other predictive features. It indicates how, holding all other variables except the feature of interest as they were, the value of this feature affects your prediction.

Requires that Feature Fit has already been computed with request_feature_fit.

See get_feature_fit_metadata for retrieving information on the available sources.

Parameters:
source : string

The source Feature Fit is retrieved for. One of the values in FeatureFitMetadata.sources.

Returns:
feature_fit : FeatureFit

The feature fit data.

Raises:
ClientError (404)

If the feature fit has not been computed or source is not a valid value.

get_or_request_feature_fit(source, max_wait=600)

Retrieve feature fit for the model, requesting a job if it hasn’t been run previously

See get_feature_fit_metadata for retrieving information of source.

Parameters:
max_wait : int, optional

The maximum time to wait for a requested feature fit job to complete before erroring

source : string

The source Feature Fit is retrieved for. One of the values in FeatureFitMetadata.sources.

Returns:
feature_fit : FeatureFit

The feature fit data.

get_prime_eligibility()

Check if this model can be approximated with DataRobot Prime

Returns:
prime_eligibility : dict

a dict indicating whether a model can be approximated with DataRobot Prime (key can_make_prime) and why it may be ineligible (key message)

request_approximation()

Request an approximation of this model using DataRobot Prime

This will create several rulesets that could be used to approximate this model. After comparing their scores and rule counts, the code used in the approximation can be downloaded and run locally.

Returns:
job : Job

the job generating the rulesets

get_rulesets()

List the rulesets approximating this model generated by DataRobot Prime

If this model hasn’t been approximated yet, will return an empty list. Note that these are rulesets approximating this model, not rulesets used to construct this model.

Returns:
rulesets : list of Ruleset
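
Examples

A minimal sketch of the DataRobot Prime approximation workflow; the ids are placeholders.

import datarobot as dr

model = dr.models.Model.get('5c88d37d962d7417a8f5bc65',
                            '5c88d37d962d7417a8f5bc66')
eligibility = model.get_prime_eligibility()
if eligibility['can_make_prime']:
    job = model.request_approximation()
    job.wait_for_completion()
    rulesets = model.get_rulesets()
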
download_export(filepath)

Download an exportable model file for use in an on-premise DataRobot standalone prediction environment.

This function can only be used if model export is enabled, and will only be useful if you have an on-premise environment in which to import it.

Parameters:
filepath : str

The path at which to save the exported model file.

request_transferable_export(prediction_intervals_size=None)

Request generation of an exportable model file for use in an on-premise DataRobot standalone prediction environment.

This function can only be used if model export is enabled, and will only be useful if you have an on-premise environment in which to import it.

This function does not download the exported file. Use download_export for that.

Parameters:
prediction_intervals_size : int, optional

(New in v2.19) For time series projects only. Represents the percentile to use for the size of the prediction intervals. Prediction intervals size must be between 1 and 100 (inclusive).

Examples

model = datarobot.Model.get('p-id', 'l-id')
job = model.request_transferable_export()
job.wait_for_completion()
model.download_export('my_exported_model.drmodel')

# Client must be configured to use standalone prediction server for import:
datarobot.Client(token='my-token-at-standalone-server',
                 endpoint='standalone-server-url/api/v2')

imported_model = datarobot.ImportedModel.create('my_exported_model.drmodel')
request_frozen_model(sample_pct=None, training_row_count=None)

Train a new frozen model with parameters from this model

Note

This method only works if project the model belongs to is not datetime partitioned. If it is, use request_frozen_datetime_model instead.

Frozen models use the same tuning parameters as their parent model instead of independently optimizing them to allow efficiently retraining models on larger amounts of the training data.

Parameters:
sample_pct : float

optional, the percentage of the dataset to use with the model. If not provided, will use the value from this model.

training_row_count : int

(New in version v2.9) optional, the integer number of rows of the dataset to use with the model. Only one of sample_pct and training_row_count should be specified.

Returns:
model_job : ModelJob

the modeling job training a frozen model

request_frozen_datetime_model(training_row_count=None, training_duration=None, training_start_date=None, training_end_date=None, time_window_sample_pct=None)

Train a new frozen model with parameters from this model

Requires that this model belongs to a datetime partitioned project. If it does not, an error will occur when submitting the job.

Frozen models use the same tuning parameters as their parent model instead of independently optimizing them to allow efficiently retraining models on larger amounts of the training data.

In addition to training_row_count and training_duration, frozen datetime models may be trained on an exact date range. Only one of training_row_count, training_duration, or training_start_date and training_end_date should be specified.

Models specified using training_start_date and training_end_date are the only ones that can be trained into the holdout data (once the holdout is unlocked).

All durations should be specified with a duration string such as those returned by the partitioning_methods.construct_duration_string helper method. Please see datetime partitioned project documentation for more information on duration strings.

Parameters:
training_row_count : int, optional

the number of rows of data that should be used to train the model. If specified, training_duration may not be specified.

training_duration : str, optional

a duration string specifying what time range the data used to train the model should span. If specified, training_row_count may not be specified.

training_start_date : datetime.datetime, optional

the start date of the data to train to model on. Only rows occurring at or after this datetime will be used. If training_start_date is specified, training_end_date must also be specified.

training_end_date : datetime.datetime, optional

the end date of the data to train the model on. Only rows occurring strictly before this datetime will be used. If training_end_date is specified, training_start_date must also be specified.

time_window_sample_pct : int, optional

may only be specified when the requested model is a time window (e.g. duration or start and end dates). An integer between 1 and 99 indicating the percentage to sample by within the window. The points kept are determined by a random uniform sample. If specified, training_duration must also be specified; otherwise, an error will occur.

Returns:
model_job : ModelJob

the modeling job training a frozen model
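
A minimal sketch of training a frozen model on an exact date range in a datetime partitioned project (ids and dates are placeholders):

from datetime import datetime

import datarobot as dr

model = dr.Model.get('p-id', 'l-id')
model_job = model.request_frozen_datetime_model(
    training_start_date=datetime(2018, 1, 1),
    training_end_date=datetime(2019, 1, 1),
)
frozen_model = model_job.get_result_when_complete()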

get_parameters()

Retrieve model parameters.

Returns:
ModelParameters

Model parameters for this model.

get_lift_chart(source, fallback_to_parent_insights=False)

Retrieve model lift chart for the specified source.

Parameters:
source : str

Lift chart data source. Check datarobot.enums.CHART_DATA_SOURCE for possible values.

fallback_to_parent_insights : bool

(New in version v2.14) Optional, if True, this will return lift chart data for this model’s parent if the lift chart is not available for this model and the model has a defined parent model. If omitted or False, or there is no parent model, will not attempt to return insight data from this model’s parent.

Returns:
LiftChart

Model lift chart data

Raises:
ClientError

If the insight is not available for this model
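
For example, a sketch of fetching the validation lift chart, falling back to the parent model’s insight when necessary (ids are placeholders; the source and bins attributes are assumed from the LiftChart documentation):

import datarobot as dr
from datarobot.enums import CHART_DATA_SOURCE

model = dr.Model.get('p-id', 'l-id')
lift_chart = model.get_lift_chart(CHART_DATA_SOURCE.VALIDATION,
                                  fallback_to_parent_insights=True)
print(lift_chart.source, len(lift_chart.bins))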

get_multiclass_lift_chart(source, fallback_to_parent_insights=False)

Retrieve model lift chart for the specified source.

Parameters:
source : str

Lift chart data source. Check datarobot.enums.CHART_DATA_SOURCE for possible values.

fallback_to_parent_insights : bool

Optional, if True, this will return lift chart data for this model’s parent if the lift chart is not available for this model and the model has a defined parent model. If omitted or False, or there is no parent model, will not attempt to return insight data from this model’s parent.

Returns:
list of LiftChart

Model lift chart data for each saved target class

Raises:
ClientError

If the insight is not available for this model

get_all_lift_charts(fallback_to_parent_insights=False)

Retrieve a list of all lift charts available for the model.

Parameters:
fallback_to_parent_insights : bool

(New in version v2.14) Optional, if True, this will return lift chart data for this model’s parent for any source that is not available for this model and if this model has a defined parent model. If omitted or False, or this model has no parent, this will not attempt to retrieve any data from this model’s parent.

Returns:
list of LiftChart

Data for all available model lift charts.

get_residuals_chart(source, fallback_to_parent_insights=False)

Retrieve model residuals chart for the specified source.

Parameters:
source : str

Residuals chart data source. Check datarobot.enums.CHART_DATA_SOURCE for possible values.

fallback_to_parent_insights : bool

Optional, if True, this will return residuals chart data for this model’s parent if the residuals chart is not available for this model and the model has a defined parent model. If omitted or False, or there is no parent model, will not attempt to return residuals data from this model’s parent.

Returns:
ResidualsChart

Model residuals chart data

Raises:
ClientError

If the insight is not available for this model

get_all_residuals_charts(fallback_to_parent_insights=False)

Retrieve a list of all residuals charts available for the model.

Parameters:
fallback_to_parent_insights : bool

Optional, if True, this will return residuals chart data for this model’s parent for any source that is not available for this model and if this model has a defined parent model. If omitted or False, or this model has no parent, this will not attempt to retrieve any data from this model’s parent.

Returns:
list of ResidualsChart

Data for all available model residuals charts.

get_pareto_front()

Retrieve the Pareto Front for a Eureqa model.

This method is only supported for Eureqa models.

Returns:
ParetoFront

Model ParetoFront data

get_confusion_chart(source, fallback_to_parent_insights=False)

Retrieve model’s confusion chart for the specified source.

Parameters:
source : str

Confusion chart source. Check datarobot.enums.CHART_DATA_SOURCE for possible values.

fallback_to_parent_insights : bool

(New in version v2.14) Optional, if True, this will return confusion chart data for this model’s parent if the confusion chart is not available for this model and the model has a defined parent model. If omitted or False, or there is no parent model, will not attempt to return insight data from this model’s parent.

Returns:
ConfusionChart

Model ConfusionChart data

Raises:
ClientError

If the insight is not available for this model

get_all_confusion_charts(fallback_to_parent_insights=False)

Retrieve a list of all confusion charts available for the model.

Parameters:
fallback_to_parent_insights : bool

(New in version v2.14) Optional, if True, this will return confusion chart data for this model’s parent for any source that is not available for this model and if this model has a defined parent model. If omitted or False, or this model has no parent, this will not attempt to retrieve any data from this model’s parent.

Returns:
list of ConfusionChart

Data for all available confusion charts for model.

get_roc_curve(source, fallback_to_parent_insights=False)

Retrieve model ROC curve for the specified source.

Parameters:
source : str

ROC curve data source. Check datarobot.enums.CHART_DATA_SOURCE for possible values.

fallback_to_parent_insights : bool

(New in version v2.14) Optional, if True, this will return ROC curve data for this model’s parent if the ROC curve is not available for this model and the model has a defined parent model. If omitted or False, or there is no parent model, will not attempt to return data from this model’s parent.

Returns:
RocCurve

Model ROC curve data

Raises:
ClientError

If the insight is not available for this model
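
A sketch of retrieving the validation ROC curve and picking the threshold that maximizes true positive rate minus false positive rate (the roc_points field and its keys are assumed from the RocCurve documentation):

import datarobot as dr
from datarobot.enums import CHART_DATA_SOURCE

model = dr.Model.get('p-id', 'l-id')
roc = model.get_roc_curve(CHART_DATA_SOURCE.VALIDATION)
best = max(roc.roc_points,
           key=lambda pt: pt['true_positive_rate'] - pt['false_positive_rate'])
print(best['threshold'])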

get_all_roc_curves(fallback_to_parent_insights=False)

Retrieve a list of all ROC curves available for the model.

Parameters:
fallback_to_parent_insights : bool

(New in version v2.14) Optional, if True, this will return ROC curve data for this model’s parent for any source that is not available for this model and if this model has a defined parent model. If omitted or False, or this model has no parent, this will not attempt to retrieve any data from this model’s parent.

Returns:
list of RocCurve

Data for all available model ROC curves.

get_word_cloud(exclude_stop_words=False)

Retrieve word cloud data for the model.

Parameters:
exclude_stop_words : bool, optional

Set to True if you want stopwords filtered out of the response.

Returns:
WordCloud

Word cloud data for the model.

download_scoring_code(file_name, source_code=False)

Download scoring code JAR.

Parameters:
file_name : str

File path where scoring code will be saved.

source_code : bool, optional

Set to True to download source code archive. It will not be executable.
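
For example, a minimal sketch (assuming Scoring Code is available for this model):

import datarobot as dr

model = dr.Model.get('p-id', 'l-id')
# Compiled, executable JAR
model.download_scoring_code('model.jar')
# Non-executable source code archive
model.download_scoring_code('model_source.jar', source_code=True)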

get_model_blueprint_documents()

Get documentation for tasks used in this model.

Returns:
list of BlueprintTaskDocument

All documents available for the model.

get_model_blueprint_chart()

Retrieve a model blueprint chart that can be used to understand data flow in blueprint.

Returns:
ModelBlueprintChart

The queried model blueprint chart.

get_missing_report_info()

Retrieve a model’s missing data report on training data, which can be used to understand how missing values are treated in the model. The report consists of missing values reports for the numeric and categorical features that took part in modelling.

Returns:
An iterable of MissingReportPerFeature

The queried model missing report, sorted by missing count (DESCENDING order).

get_frozen_child_models()

Retrieves the ids for all the models that are frozen from this model

Returns:
A list of Models

request_training_predictions(data_subset, explanation_algorithm=None, max_explanations=None)

Start a job to build training predictions

Parameters:
data_subset : str

data set definition to build predictions on. Choices are:

  • dr.enums.DATA_SUBSET.ALL or string all for all data available. Not valid for
    models in datetime partitioned projects
  • dr.enums.DATA_SUBSET.VALIDATION_AND_HOLDOUT or string validationAndHoldout for
    all data except training set. Not valid for models in datetime partitioned projects
  • dr.enums.DATA_SUBSET.HOLDOUT or string holdout for holdout data set only
  • dr.enums.DATA_SUBSET.ALL_BACKTESTS or string allBacktests for downloading
    the predictions for all backtest validation folds. Requires the model to have successfully scored all backtests. Datetime partitioned projects only.
explanation_algorithm : dr.enums.EXPLANATIONS_ALGORITHM

(New in v2.21) Optional. If set to dr.enums.EXPLANATIONS_ALGORITHM.SHAP, the response will include prediction explanations based on the SHAP explainer (SHapley Additive exPlanations). Defaults to None (no prediction explanations).

max_explanations : int

(New in v2.21) Optional. Specifies the maximum number of explanation values that should be returned for each row, ordered by absolute value, greatest to least. In the case of dr.enums.EXPLANATIONS_ALGORITHM.SHAP: If not set, explanations are returned for all features. If the number of features is greater than the max_explanations, the sum of remaining values will also be returned as shap_remaining_total. Max 100. Defaults to null for datasets narrower than 100 columns, defaults to 100 for datasets wider than 100 columns. Is ignored if explanation_algorithm is not set.

Returns:
Job

an instance of created async job
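
A sketch of computing holdout training predictions with SHAP explanations; the job result is expected to be a TrainingPredictions object, and the get_all_as_dataframe call is assumed from that class:

import datarobot as dr

model = dr.Model.get('p-id', 'l-id')
job = model.request_training_predictions(
    dr.enums.DATA_SUBSET.HOLDOUT,
    explanation_algorithm=dr.enums.EXPLANATIONS_ALGORITHM.SHAP,
    max_explanations=5,
)
training_predictions = job.get_result_when_complete()
df = training_predictions.get_all_as_dataframe()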

cross_validate()

Run Cross Validation on this model.

Note

To perform Cross Validation on a new model with new parameters, use train instead.

Returns:
ModelJob

The created job to build the model

get_cross_validation_scores(partition=None, metric=None)

Returns a dictionary keyed by metric showing cross validation scores per partition.

Cross Validation should already have been performed using cross_validate or train.

Note

Models that computed cross validation before this feature was added will need to be deleted and retrained before this method can be used.

Parameters:
partition : float

optional, the id of the partition (1, 2, 3.0, 4.0, etc.) to filter results by. It can be a whole-number positive integer or a float value.

metric: unicode

optional, the name of the metric to filter the resulting cross validation scores by

Returns:
cross_validation_scores: dict

A dictionary keyed by metric showing cross validation scores per partition.
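
A sketch of running cross validation and then reading the per-partition scores for a single metric (the metric name is a placeholder):

import datarobot as dr

model = dr.Model.get('p-id', 'l-id')
cv_job = model.cross_validate()
cv_job.wait_for_completion()
scores = model.get_cross_validation_scores(metric='AUC')
print(scores)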

advanced_tune(params, description=None)

Generate a new model with the specified advanced-tuning parameters

As of v2.17, all models other than blenders, open source, prime, scaleout, baseline and user-created support Advanced Tuning.

Parameters:
params : dict

Mapping of parameter ID to parameter value. The list of valid parameter IDs for a model can be found by calling get_advanced_tuning_parameters(). This endpoint does not need to include values for all parameters. If a parameter is omitted, its current_value will be used.

description : unicode

Human-readable string describing the newly advanced-tuned model

Returns:
ModelJob

The created job to build the model
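
A sketch of tuning a single parameter by id; the parameter name filtered for below is purely illustrative, and the key names follow the get_advanced_tuning_parameters() schema described further down:

import datarobot as dr

model = dr.Model.get('p-id', 'l-id')
tuning = model.get_advanced_tuning_parameters()
param = next(p for p in tuning['tuningParameters']
             if p['parameterName'] == 'learning_rate')  # hypothetical parameter name
model_job = model.advanced_tune({param['parameterId']: 0.05},
                                description='lower learning rate')
tuned_model = model_job.get_result_when_complete()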

get_advanced_tuning_parameters()

Get the advanced-tuning parameters available for this model.

As of v2.17, all models other than blenders, open source, prime, scaleout, baseline and user-created support Advanced Tuning.

Returns:
dict

A dictionary describing the advanced-tuning parameters for the current model. There are two top-level keys, tuningDescription and tuningParameters.

tuningDescription is an optional value. If not None, it is the user-specified description of this set of tuning parameters.

tuningParameters is a list of dicts, each with the following keys:

  • parameterName : (unicode) name of the parameter (unique per task, see below)
  • parameterId : (unicode) opaque ID string uniquely identifying parameter
  • defaultValue : (*) default value of the parameter for the blueprint
  • currentValue : (*) value of the parameter that was used for this model
  • taskName : (unicode) name of the task that this parameter belongs to
  • constraints: (dict) see the notes below

Notes

The type of defaultValue and currentValue is defined by the constraints structure. It will be a string or numeric Python type.

constraints is a dict with at least one, possibly more, of the following keys. The presence of a key indicates that the parameter may take on the specified type. (If a key is absent, this means that the parameter may not take on the specified type.) If a key on constraints is present, its value will be a dict containing all of the fields described below for that key.

"constraints": {
    "select": {
        "values": [<list(basestring or number) : possible values>]
    },
    "ascii": {},
    "unicode": {},
    "int": {
        "min": <int : minimum valid value>,
        "max": <int : maximum valid value>,
        "supports_grid_search": <bool : True if Grid Search may be
                                        requested for this param>
    },
    "float": {
        "min": <float : minimum valid value>,
        "max": <float : maximum valid value>,
        "supports_grid_search": <bool : True if Grid Search may be
                                        requested for this param>
    },
    "intList": {
        "length": {
        "min_length": <int : minimum valid length>,
        "max_length": <int : maximum valid length>
        "min_val": <int : minimum valid value>,
        "max_val": <int : maximum valid value>
        "supports_grid_search": <bool : True if Grid Search may be
                                        requested for this param>
    },
    "floatList": {
        "min_length": <int : minimum valid length>,
        "max_length": <int : maximum valid length>
        "min_val": <float : minimum valid value>,
        "max_val": <float : maximum valid value>
        "supports_grid_search": <bool : True if Grid Search may be
                                        requested for this param>
    }
}

The keys have meaning as follows:

  • select: Rather than specifying a specific data type, if present, it indicates that the parameter is permitted to take on any of the specified values. Listed values may be of any string or real (non-complex) numeric type.
  • ascii: The parameter may be a unicode object that encodes simple ASCII characters. (A-Z, a-z, 0-9, whitespace, and certain common symbols.) In addition to listed constraints, ASCII keys currently may not contain either newlines or semicolons.
  • unicode: The parameter may be any Python unicode object.
  • int: The value may be an object of type int within the specified range (inclusive). Please note that the value will be passed around using the JSON format, and some JSON parsers have undefined behavior with integers outside of the range [-(2**53)+1, (2**53)-1].
  • float: The value may be an object of type float within the specified range (inclusive).
  • intList, floatList: The value may be a list of int or float objects, respectively, following constraints as specified respectively by the int and float types (above).

Many parameters only specify one key under constraints. If a parameter specifies multiple keys, the parameter may take on any value permitted by any key.

start_advanced_tuning_session()

Start an Advanced Tuning session. Returns an object that helps set up arguments for an Advanced Tuning model execution.

As of v2.17, all models other than blenders, open source, prime, scaleout, baseline and user-created support Advanced Tuning.

Returns:
AdvancedTuningSession

Session for setting up and running Advanced Tuning on a model
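
A sketch of the session-based workflow, which avoids handling parameter ids directly (the parameter name is hypothetical, and the set_parameter and run calls are assumed from the AdvancedTuningSession documentation):

import datarobot as dr

model = dr.Model.get('p-id', 'l-id')
session = model.start_advanced_tuning_session()
session.set_parameter(value=8, parameter_name='max_depth')  # hypothetical parameter
model_job = session.run()
tuned_model = model_job.get_result_when_complete()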

star_model()

Mark the model as starred

Model stars propagate to the web application and the API, and can be used to filter when listing models.

unstar_model()

Unmark the model as starred

Model stars propagate to the web application and the API, and can be used to filter when listing models.

set_prediction_threshold(threshold)

Set a custom prediction threshold for the model

May not be used once prediction_threshold_read_only is True for this model.

Parameters:
threshold : float

only used for binary classification projects. The threshold to use when deciding between the positive and negative classes when making predictions. Should be between 0.0 and 1.0 (inclusive).
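
For example, a minimal sketch for a binary classification model (ids are placeholders):

import datarobot as dr

model = dr.Model.get('p-id', 'l-id')
if not model.prediction_threshold_read_only:
    model.set_prediction_threshold(0.42)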

PrimeModel

class datarobot.models.PrimeModel(id=None, processes=None, featurelist_name=None, featurelist_id=None, project_id=None, sample_pct=None, training_row_count=None, training_duration=None, training_start_date=None, training_end_date=None, model_type=None, model_category=None, is_frozen=None, blueprint_id=None, metrics=None, parent_model_id=None, ruleset_id=None, rule_count=None, score=None, monotonic_increasing_featurelist_id=None, monotonic_decreasing_featurelist_id=None, supports_monotonic_constraints=None, is_starred=None, prediction_threshold=None, prediction_threshold_read_only=None, model_number=None)

A DataRobot Prime model approximating a parent model with downloadable code

All durations are specified with a duration string such as those returned by the partitioning_methods.construct_duration_string helper method. Please see datetime partitioned project documentation for more information on duration strings.

Attributes:
id : str

the id of the model

project_id : str

the id of the project the model belongs to

processes : list of str

the processes used by the model

featurelist_name : str

the name of the featurelist used by the model

featurelist_id : str

the id of the featurelist used by the model

sample_pct : float

the percentage of the project dataset used in training the model

training_row_count : int or None

the number of rows of the project dataset used in training the model. In a datetime partitioned project, if specified, defines the number of rows used to train the model and evaluate backtest scores; if unspecified, either training_duration or training_start_date and training_end_date was used to determine that instead.

training_duration : str or None

only present for models in datetime partitioned projects. If specified, a duration string specifying the duration spanned by the data used to train the model and evaluate backtest scores.

training_start_date : datetime or None

only present for frozen models in datetime partitioned projects. If specified, the start date of the data used to train the model.

training_end_date : datetime or None

only present for frozen models in datetime partitioned projects. If specified, the end date of the data used to train the model.

model_type : str

what model this is, e.g. ‘DataRobot Prime’

model_category : str

what kind of model this is - always ‘prime’ for DataRobot Prime models

is_frozen : bool

whether this model is a frozen model

blueprint_id : str

the id of the blueprint used in this model

metrics : dict

a mapping from each metric to the model’s scores for that metric

ruleset : Ruleset

the ruleset used in the Prime model

parent_model_id : str

the id of the model that this Prime model approximates

monotonic_increasing_featurelist_id : str

optional, the id of the featurelist that defines the set of features with a monotonically increasing relationship to the target. If None, no such constraints are enforced.

monotonic_decreasing_featurelist_id : str

optional, the id of the featurelist that defines the set of features with a monotonically decreasing relationship to the target. If None, no such constraints are enforced.

supports_monotonic_constraints : bool

optional, whether this model supports enforcing monotonic constraints

is_starred : bool

whether this model is marked as starred

prediction_threshold : float

for binary classification projects, the threshold used for predictions

prediction_threshold_read_only : bool

indicates whether modification of the prediction threshold is forbidden. Threshold modification is forbidden once a model has had a deployment created or predictions made via the dedicated prediction API.

classmethod get(project_id, model_id)

Retrieve a specific prime model.

Parameters:
project_id : str

The id of the project the prime model belongs to

model_id : str

The model_id of the prime model to retrieve.

Returns:
model : PrimeModel

The queried instance.

request_download_validation(language)

Prep and validate the downloadable code for the ruleset associated with this model

Parameters:
language : str

the language the code should be downloaded in - see datarobot.enums.PRIME_LANGUAGE for available languages

Returns:
job : Job

A job tracking the code preparation and validation
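
A sketch of validating and downloading Python code for the ruleset; the job result is assumed to be a PrimeFile with a download method (ids are placeholders):

import datarobot as dr

prime_model = dr.models.PrimeModel.get('p-id', 'prime-model-id')
job = prime_model.request_download_validation(dr.enums.PRIME_LANGUAGE.PYTHON)
prime_file = job.get_result_when_complete()
prime_file.download('prime_model.py')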

advanced_tune(params, description=None)

Generate a new model with the specified advanced-tuning parameters

As of v2.17, all models other than blenders, open source, prime, scaleout, baseline and user-created support Advanced Tuning.

Parameters:
params : dict

Mapping of parameter ID to parameter value. The list of valid parameter IDs for a model can be found by calling get_advanced_tuning_parameters(). This endpoint does not need to include values for all parameters. If a parameter is omitted, its current_value will be used.

description : unicode

Human-readable string describing the newly advanced-tuned model

Returns:
ModelJob

The created job to build the model

cross_validate()

Run Cross Validation on this model.

Note

To perform Cross Validation on a new model with new parameters, use train instead.

Returns:
ModelJob

The created job to build the model

delete()

Delete a model from the project’s leaderboard.

download_export(filepath)

Download an exportable model file for use in an on-premise DataRobot standalone prediction environment.

This function can only be used if model export is enabled, and will only be useful if you have an on-premise environment in which to import it.

Parameters:
filepath : str

The path at which to save the exported model file.

download_scoring_code(file_name, source_code=False)

Download scoring code JAR.

Parameters:
file_name : str

File path where scoring code will be saved.

source_code : bool, optional

Set to True to download source code archive. It will not be executable.

classmethod fetch_resource_data(url, join_endpoint=True)

(Deprecated.) Used to acquire model data directly from its url.

Consider using get instead, as this is a convenience function used for development of datarobot

Parameters:
url : str

The resource we are acquiring

join_endpoint : boolean, optional

Whether the client’s endpoint should be joined to the URL before sending the request. Location headers are returned as absolute locations, so will _not_ need the endpoint

Returns:
model_data : dict

The queried model’s data

get_advanced_tuning_parameters()

Get the advanced-tuning parameters available for this model.

As of v2.17, all models other than blenders, open source, prime, scaleout, baseline and user-created support Advanced Tuning.

Returns:
dict

A dictionary describing the advanced-tuning parameters for the current model. There are two top-level keys, tuningDescription and tuningParameters.

tuningDescription is an optional value. If not None, it is the user-specified description of this set of tuning parameters.

tuningParameters is a list of dicts, each with the following keys:

  • parameterName : (unicode) name of the parameter (unique per task, see below)
  • parameterId : (unicode) opaque ID string uniquely identifying parameter
  • defaultValue : (*) default value of the parameter for the blueprint
  • currentValue : (*) value of the parameter that was used for this model
  • taskName : (unicode) name of the task that this parameter belongs to
  • constraints: (dict) see the notes below

Notes

The type of defaultValue and currentValue is defined by the constraints structure. It will be a string or numeric Python type.

constraints is a dict with at least one, possibly more, of the following keys. The presence of a key indicates that the parameter may take on the specified type. (If a key is absent, this means that the parameter may not take on the specified type.) If a key on constraints is present, its value will be a dict containing all of the fields described below for that key.

"constraints": {
    "select": {
        "values": [<list(basestring or number) : possible values>]
    },
    "ascii": {},
    "unicode": {},
    "int": {
        "min": <int : minimum valid value>,
        "max": <int : maximum valid value>,
        "supports_grid_search": <bool : True if Grid Search may be
                                        requested for this param>
    },
    "float": {
        "min": <float : minimum valid value>,
        "max": <float : maximum valid value>,
        "supports_grid_search": <bool : True if Grid Search may be
                                        requested for this param>
    },
    "intList": {
        "length": {
        "min_length": <int : minimum valid length>,
        "max_length": <int : maximum valid length>
        "min_val": <int : minimum valid value>,
        "max_val": <int : maximum valid value>
        "supports_grid_search": <bool : True if Grid Search may be
                                        requested for this param>
    },
    "floatList": {
        "min_length": <int : minimum valid length>,
        "max_length": <int : maximum valid length>
        "min_val": <float : minimum valid value>,
        "max_val": <float : maximum valid value>
        "supports_grid_search": <bool : True if Grid Search may be
                                        requested for this param>
    }
}

The keys have meaning as follows:

  • select: Rather than specifying a specific data type, if present, it indicates that the parameter is permitted to take on any of the specified values. Listed values may be of any string or real (non-complex) numeric type.
  • ascii: The parameter may be a unicode object that encodes simple ASCII characters. (A-Z, a-z, 0-9, whitespace, and certain common symbols.) In addition to listed constraints, ASCII keys currently may not contain either newlines or semicolons.
  • unicode: The parameter may be any Python unicode object.
  • int: The value may be an object of type int within the specified range (inclusive). Please note that the value will be passed around using the JSON format, and some JSON parsers have undefined behavior with integers outside of the range [-(2**53)+1, (2**53)-1].
  • float: The value may be an object of type float within the specified range (inclusive).
  • intList, floatList: The value may be a list of int or float objects, respectively, following constraints as specified respectively by the int and float types (above).

Many parameters only specify one key under constraints. If a parameter specifies multiple keys, the parameter may take on any value permitted by any key.

get_all_confusion_charts(fallback_to_parent_insights=False)

Retrieve a list of all confusion charts available for the model.

Parameters:
fallback_to_parent_insights : bool

(New in version v2.14) Optional, if True, this will return confusion chart data for this model’s parent for any source that is not available for this model and if this model has a defined parent model. If omitted or False, or this model has no parent, this will not attempt to retrieve any data from this model’s parent.

Returns:
list of ConfusionChart

Data for all available confusion charts for model.

get_all_lift_charts(fallback_to_parent_insights=False)

Retrieve a list of all lift charts available for the model.

Parameters:
fallback_to_parent_insights : bool

(New in version v2.14) Optional, if True, this will return lift chart data for this model’s parent for any source that is not available for this model and if this model has a defined parent model. If omitted or False, or this model has no parent, this will not attempt to retrieve any data from this model’s parent.

Returns:
list of LiftChart

Data for all available model lift charts.

get_all_residuals_charts(fallback_to_parent_insights=False)

Retrieve a list of all residuals charts available for the model.

Parameters:
fallback_to_parent_insights : bool

Optional, if True, this will return residuals chart data for this model’s parent for any source that is not available for this model and if this model has a defined parent model. If omitted or False, or this model has no parent, this will not attempt to retrieve any data from this model’s parent.

Returns:
list of ResidualsChart

Data for all available model residuals charts.

get_all_roc_curves(fallback_to_parent_insights=False)

Retrieve a list of all ROC curves available for the model.

Parameters:
fallback_to_parent_insights : bool

(New in version v2.14) Optional, if True, this will return ROC curve data for this model’s parent for any source that is not available for this model and if this model has a defined parent model. If omitted or False, or this model has no parent, this will not attempt to retrieve any data from this model’s parent.

Returns:
list of RocCurve

Data for all available model ROC curves.

get_confusion_chart(source, fallback_to_parent_insights=False)

Retrieve model’s confusion chart for the specified source.

Parameters:
source : str

Confusion chart source. Check datarobot.enums.CHART_DATA_SOURCE for possible values.

fallback_to_parent_insights : bool

(New in version v2.14) Optional, if True, this will return confusion chart data for this model’s parent if the confusion chart is not available for this model and the model has a defined parent model. If omitted or False, or there is no parent model, will not attempt to return insight data from this model’s parent.

Returns:
ConfusionChart

Model ConfusionChart data

Raises:
ClientError

If the insight is not available for this model

get_cross_validation_scores(partition=None, metric=None)

Returns a dictionary keyed by metric showing cross validation scores per partition.

Cross Validation should already have been performed using cross_validate or train.

Note

Models that computed cross validation before this feature was added will need to be deleted and retrained before this method can be used.

Parameters:
partition : float

optional, the id of the partition (1, 2, 3.0, 4.0, etc.) to filter results by. It can be a whole-number positive integer or a float value.

metric: unicode

optional, the name of the metric to filter the resulting cross validation scores by

Returns:
cross_validation_scores: dict

A dictionary keyed by metric showing cross validation scores per partition.

get_feature_effect(source)

Retrieve Feature Effects for the model.

Feature Effects provides partial dependence and predicted vs actual values for top-500 features ordered by feature impact score.

The partial dependence shows marginal effect of a feature on the target variable after accounting for the average effects of all other predictive features. It indicates how, holding all other variables except the feature of interest as they were, the value of this feature affects your prediction.

Requires that Feature Effects has already been computed with request_feature_effect.

See get_feature_effect_metadata for retrieving information on the available sources.

Parameters:
source : string

The source Feature Effects are retrieved for.

Returns:
feature_effects : FeatureEffects

The feature effects data.

Raises:
ClientError (404)

If the feature effects have not been computed or source is not a valid value.

get_feature_effect_metadata()

Retrieve Feature Effect metadata. The response contains the status and available model sources.

  • Feature Effect for training is always available (except for old projects, which support only Feature Effect for validation).
  • When a model is trained into validation or holdout without stacked predictions (i.e. no out-of-sample predictions in validation or holdout), Feature Effect is not available for validation or holdout.
  • Feature Effect for holdout is not available when there is no holdout configured for the project.

source is the expected parameter for retrieving Feature Effect. One of the provided sources shall be used.

Returns:
feature_effect_metadata: FeatureEffectMetadata

get_feature_fit(source)

Retrieve Feature Fit for the model.

Feature Fit provides partial dependence and predicted vs actual values for top-500 features ordered by feature importance score.

The partial dependence shows marginal effect of a feature on the target variable after accounting for the average effects of all other predictive features. It indicates how, holding all other variables except the feature of interest as they were, the value of this feature affects your prediction.

Requires that Feature Fit has already been computed with request_feature_fit.

See get_feature_fit_metadata for retrieving information on the available sources.

Parameters:
source : string

The source Feature Fit is retrieved for. One value of FeatureFitMetadata.sources.

Returns:
feature_fit : FeatureFit

The feature fit data.

Raises:
ClientError (404)

If the feature fit has not been computed or source is not a valid value.

get_feature_fit_metadata()

Retrieve Feature Fit metadata. The response contains the status and available model sources.

  • Feature Fit for training is always available (except for old projects, which support only Feature Fit for validation).
  • When a model is trained into validation or holdout without stacked predictions (i.e. no out-of-sample predictions in validation or holdout), Feature Fit is not available for validation or holdout.
  • Feature Fit for holdout is not available when there is no holdout configured for the project.

source is the expected parameter for retrieving Feature Fit. One of the provided sources shall be used.

Returns:
feature_fit_metadata: FeatureFitMetadata

get_feature_impact(with_metadata=False)

Retrieve the computed Feature Impact results, a measure of the relevance of each feature in the model.

Feature Impact is computed for each column by creating new data with that column randomly permuted (but the others left unchanged), and seeing how the error metric score for the predictions is affected. The ‘impactUnnormalized’ is how much worse the error metric score is when making predictions on this modified data. The ‘impactNormalized’ is normalized so that the largest value is 1. In both cases, larger values indicate more important features.

If a feature is a redundant feature, i.e. once other features are considered it doesn’t contribute much in addition, the ‘redundantWith’ value is the name of the feature that has the highest correlation with this feature. Note that redundancy detection is only available for jobs run after the addition of this feature. When retrieving data that predates this functionality, a NoRedundancyImpactAvailable warning will be issued.

Elsewhere this technique is sometimes called ‘Permutation Importance’.

Requires that Feature Impact has already been computed with request_feature_impact.

Parameters:
with_metadata : bool

The flag indicating if the result should include the metadata as well.

Returns:
list or dict

The feature impact data response depends on the with_metadata parameter. The response is either a dict with metadata and a list with actual data or just a list with that data.

Each list item is a dict with the keys featureName, impactNormalized, impactUnnormalized, redundantWith, and count.

For dict response available keys are:

  • featureImpacts - Feature Impact data as a dictionary. Each item is a dict with
    keys: featureName, impactNormalized, and impactUnnormalized, and redundantWith.
  • shapBased - A boolean that indicates whether Feature Impact was calculated using
    Shapley values.
  • ranRedundancyDetection - A boolean that indicates whether redundant feature
    identification was run while calculating this Feature Impact.
  • rowCount - An integer or None that indicates the number of rows that was used to
    calculate Feature Impact. For the Feature Impact calculated with the default logic, without specifying the rowCount, we return None here.
  • count - An integer with the number of features under the featureImpacts.
Raises:
ClientError (404)

If the feature impacts have not been computed.

get_features_used()

Query the server to determine which features were used.

Note that the data returned by this method is possibly different than the names of the features in the featurelist used by this model. This method will return the raw features that must be supplied in order for predictions to be generated on a new set of data. The featurelist, in contrast, would also include the names of derived features.

Returns:
features : list of str

The names of the features used in the model.

get_frozen_child_models()

Retrieves the ids for all the models that are frozen from this model

Returns:
A list of Models

get_leaderboard_ui_permalink()

Returns:
url : str

Permanent static hyperlink to this model at leaderboard.

get_lift_chart(source, fallback_to_parent_insights=False)

Retrieve model lift chart for the specified source.

Parameters:
source : str

Lift chart data source. Check datarobot.enums.CHART_DATA_SOURCE for possible values.

fallback_to_parent_insights : bool

(New in version v2.14) Optional, if True, this will return lift chart data for this model’s parent if the lift chart is not available for this model and the model has a defined parent model. If omitted or False, or there is no parent model, will not attempt to return insight data from this model’s parent.

Returns:
LiftChart

Model lift chart data

Raises:
ClientError

If the insight is not available for this model

get_missing_report_info()

Retrieve a model’s missing data report on training data, which can be used to understand how missing values are treated in the model. The report consists of missing values reports for the numeric and categorical features that took part in modelling.

Returns:
An iterable of MissingReportPerFeature

The queried model missing report, sorted by missing count (DESCENDING order).

get_model_blueprint_chart()

Retrieve a model blueprint chart that can be used to understand data flow in blueprint.

Returns:
ModelBlueprintChart

The queried model blueprint chart.

get_model_blueprint_documents()

Get documentation for tasks used in this model.

Returns:
list of BlueprintTaskDocument

All documents available for the model.

get_multiclass_feature_impact()

For multiclass it’s possible to calculate feature impact separately for each target class. The method for calculation is exactly the same, calculated in one-vs-all style for each target class.

Requires that Feature Impact has already been computed with request_feature_impact.

Returns:
feature_impacts : list of dict

The feature impact data. Each item is a dict with the keys ‘featureImpacts’ (list), ‘class’ (str). Each item in ‘featureImpacts’ is a dict with the keys ‘featureName’, ‘impactNormalized’, and ‘impactUnnormalized’, and ‘redundantWith’.

Raises:
ClientError (404)

If the multiclass feature impacts have not been computed.

get_multiclass_lift_chart(source, fallback_to_parent_insights=False)

Retrieve model lift chart for the specified source.

Parameters:
source : str

Lift chart data source. Check datarobot.enums.CHART_DATA_SOURCE for possible values.

fallback_to_parent_insights : bool

Optional, if True, this will return lift chart data for this model’s parent if the lift chart is not available for this model and the model has a defined parent model. If omitted or False, or there is no parent model, will not attempt to return insight data from this model’s parent.

Returns:
list of LiftChart

Model lift chart data for each saved target class

Raises:
ClientError

If the insight is not available for this model

get_num_iterations_trained()

Retrieves the number of estimators trained by early-stopping tree-based models

New in version v2.22.

Returns:
projectId: str

id of project containing the model

modelId: str

id of the model

data: array

list of numEstimatorsItem objects, one for each modeling stage.

numEstimatorsItem will be of the form:
stage: str

indicates the modeling stage (for multi-stage models); None for single-stage models

numIterations: int

the number of estimators or iterations trained by the model

get_or_request_feature_effect(source, max_wait=600, row_count=None)

Retrieve feature effect for the model, requesting a job if it hasn’t been run previously

See get_feature_effect_metadata for retrieving information of source.

Parameters:
max_wait : int, optional

The maximum time to wait for a requested feature effect job to complete before erroring

row_count : int, optional

(New in version v2.21) The sample size to use for Feature Effects computation. Minimum is 10 rows. Maximum is 100000 rows or the training sample size of the model, whichever is less.

source : string

The source Feature Effects are retrieved for.

Returns:
feature_effects : FeatureEffects

The feature effects data.
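
A sketch of fetching Feature Effects for the validation source, computing them first if needed (the feature_effects attribute is assumed from the FeatureEffects documentation; ids are placeholders):

import datarobot as dr

model = dr.models.PrimeModel.get('p-id', 'l-id')
fe = model.get_or_request_feature_effect(source='validation')
print(fe.source, len(fe.feature_effects))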

get_or_request_feature_fit(source, max_wait=600)

Retrieve feature fit for the model, requesting a job if it hasn’t been run previously

See get_feature_fit_metadata for retrieving information of source.

Parameters:
max_wait : int, optional

The maximum time to wait for a requested feature fit job to complete before erroring

source : string

The source Feature Fit is retrieved for. One value of FeatureFitMetadata.sources.

Returns:
feature_fit : FeatureFit

The feature fit data.

get_or_request_feature_impact(max_wait=600, **kwargs)

Retrieve feature impact for the model, requesting a job if it hasn’t been run previously

Parameters:
max_wait : int, optional

The maximum time to wait for a requested feature impact job to complete before erroring

**kwargs

Arbitrary keyword arguments passed to request_feature_impact.

Returns:
feature_impacts : list or dict

The feature impact data. See get_feature_impact for the exact schema.
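
A sketch of fetching Feature Impact (computing it first if needed) and listing the five most impactful features; the key names follow the get_feature_impact schema above:

import datarobot as dr

model = dr.models.PrimeModel.get('p-id', 'l-id')
impacts = model.get_or_request_feature_impact()
for item in sorted(impacts, key=lambda i: i['impactNormalized'], reverse=True)[:5]:
    print(item['featureName'], item['impactNormalized'])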

get_parameters()

Retrieve model parameters.

Returns:
ModelParameters

Model parameters for this model.

get_pareto_front()

Retrieve the Pareto Front for a Eureqa model.

This method is only supported for Eureqa models.

Returns:
ParetoFront

Model ParetoFront data

get_prime_eligibility()

Check if this model can be approximated with DataRobot Prime

Returns:
prime_eligibility : dict

a dict indicating whether a model can be approximated with DataRobot Prime (key can_make_prime) and why it may be ineligible (key message)

get_residuals_chart(source, fallback_to_parent_insights=False)

Retrieve model residuals chart for the specified source.

Parameters:
source : str

Residuals chart data source. Check datarobot.enums.CHART_DATA_SOURCE for possible values.

fallback_to_parent_insights : bool

Optional, if True, this will return residuals chart data for this model’s parent if the residuals chart is not available for this model and the model has a defined parent model. If omitted or False, or there is no parent model, will not attempt to return residuals data from this model’s parent.

Returns:
ResidualsChart

Model residuals chart data

Raises:
ClientError

If the insight is not available for this model

get_roc_curve(source, fallback_to_parent_insights=False)

Retrieve model ROC curve for the specified source.

Parameters:
source : str

ROC curve data source. Check datarobot.enums.CHART_DATA_SOURCE for possible values.

fallback_to_parent_insights : bool

(New in version v2.14) Optional, if True, this will return ROC curve data for this model’s parent if the ROC curve is not available for this model and the model has a defined parent model. If omitted or False, or there is no parent model, will not attempt to return data from this model’s parent.

Returns:
RocCurve

Model ROC curve data

Raises:
ClientError

If the insight is not available for this model

get_rulesets()

List the rulesets approximating this model generated by DataRobot Prime

If this model hasn’t been approximated yet, will return an empty list. Note that these are rulesets approximating this model, not rulesets used to construct this model.

Returns:
rulesets : list of Ruleset
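
A sketch of listing the rulesets that approximate this model (the ruleset attributes shown are assumed from the Ruleset documentation; ids are placeholders):

import datarobot as dr

prime_model = dr.models.PrimeModel.get('p-id', 'prime-model-id')
for ruleset in prime_model.get_rulesets():
    print(ruleset.ruleset_id, ruleset.rule_count, ruleset.score)
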
get_supported_capabilities()

Retrieves a summary of the capabilities supported by a model.

New in version v2.14.

Returns:
supportsBlending: bool

whether the model supports blending

supportsMonotonicConstraints: bool

whether the model supports monotonic constraints

hasWordCloud: bool

whether the model has word cloud data available

eligibleForPrime: bool

whether the model is eligible for Prime

hasParameters: bool

whether the model has parameters that can be retrieved

supportsCodeGeneration: bool

(New in version v2.18) whether the model supports code generation

supportsShap: bool

(New in version v2.18) True if the model supports the Shapley package, i.e. Shapley-based feature importance.

supportsEarlyStopping: bool

(New in version v2.22) True if this is an early stopping tree-based model and number of trained iterations can be retrieved.

get_word_cloud(exclude_stop_words=False)

Retrieve word cloud data for the model.

Parameters:
exclude_stop_words : bool, optional

Set to True if you want stopwords filtered out of the response.

Returns:
WordCloud

Word cloud data for the model.

open_model_browser()

Opens model at project leaderboard in web browser.

Note: If text-mode browsers are used, the calling process will block until the user exits the browser.

request_external_test(dataset_id, actual_value_column=None)

Request external test to compute scores and insights on an external test dataset

Parameters:
dataset_id : string

The dataset to make predictions against (as uploaded from Project.upload_dataset)

actual_value_column : string, optional

(New in version v2.21) For time series unsupervised projects only. Actual value column can be used to calculate the classification metrics and insights on the prediction dataset. Can’t be provided with the forecast_point parameter.

Returns:
job : Job

a Job representing external dataset insights computation

request_feature_effect(row_count=None)

Request feature effects to be computed for the model.

See get_feature_effect for more information on the result of the job.

Parameters:
row_count : int

(New in version v2.21) The sample size to use for Feature Effects computation. Minimum is 10 rows. Maximum is 100000 rows or the training sample size of the model, whichever is less.

Returns:
job : Job

A Job representing the feature effect computation. To get the completed feature effect data, use job.get_result or job.get_result_when_complete.

Raises:
JobAlreadyRequested (422)

If the feature effect has already been requested.

request_feature_fit()

Request feature fit to be computed for the model.

See get_feature_fit for more information on the result of the job.

Returns:
job : Job

A Job representing the feature fit computation. To get the completed feature fit data, use job.get_result or job.get_result_when_complete.

Raises:
JobAlreadyRequested (422)

If the feature fit has already been requested.

request_feature_impact(row_count=None, with_metadata=False)

Request feature impacts to be computed for the model.

See get_feature_impact for more information on the result of the job.

Parameters:
row_count : int

The sample size (specified in rows) to use for Feature Impact computation. This is not supported for unsupervised, multiclass (which has a separate method), and time series projects.

Returns:
job : Job

A Job representing the feature impact computation. To get the completed feature impact data, use job.get_result or job.get_result_when_complete.

Raises:
JobAlreadyRequested (422)

If the feature impacts have already been requested.

request_predictions(dataset_id, include_prediction_intervals=None, prediction_intervals_size=None, forecast_point=None, predictions_start_date=None, predictions_end_date=None, actual_value_column=None, explanation_algorithm=None, max_explanations=None)

Request predictions against a previously uploaded dataset

Parameters:
dataset_id : string

The dataset to make predictions against (as uploaded from Project.upload_dataset)

include_prediction_intervals : bool, optional

(New in v2.16) For time series projects only. Specifies whether prediction intervals should be calculated for this request. Defaults to True if prediction_intervals_size is specified, otherwise defaults to False.

prediction_intervals_size : int, optional

(New in v2.16) For time series projects only. Represents the percentile to use for the size of the prediction intervals. Defaults to 80 if include_prediction_intervals is True. Prediction intervals size must be between 1 and 100 (inclusive).

forecast_point : datetime.datetime or None, optional

(New in version v2.20) For time series projects only. This is the default point relative to which predictions will be generated, based on the forecast window of the project. See the time series prediction documentation for more information.

predictions_start_date : datetime.datetime or None, optional

(New in version v2.20) For time series projects only. The start date for bulk predictions. Note that this parameter is for generating historical predictions using the training data. This parameter should be provided in conjunction with predictions_end_date. Can’t be provided with the forecast_point parameter.

predictions_end_date : datetime.datetime or None, optional

(New in version v2.20) For time series projects only. The end date for bulk predictions, exclusive. Note that this parameter is for generating historical predictions using the training data. This parameter should be provided in conjunction with predictions_start_date. Can’t be provided with the forecast_point parameter.

actual_value_column : string, optional

(New in version v2.21) For time series unsupervised projects only. Actual value column can be used to calculate the classification metrics and insights on the prediction dataset. Can’t be provided with the forecast_point parameter.

explanation_algorithm : string, optional

(New in version v2.21) If set to ‘shap’, the response will include prediction explanations based on the SHAP explainer (SHapley Additive exPlanations). Defaults to null (no prediction explanations).

max_explanations : int, optional

(New in version v2.21) Specifies the maximum number of explanation values that should be returned for each row, ordered by absolute value, greatest to least. If null, no limit. In the case of ‘shap’: if the number of features is greater than the limit, the sum of remaining values will also be returned as shapRemainingTotal. Defaults to null. Cannot be set if explanation_algorithm is omitted.

Returns:
job : PredictJob

The job computing the predictions
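
A sketch of scoring a previously uploaded dataset; the PredictJob result is a DataFrame of predictions (ids and the file path are placeholders):

import datarobot as dr

project = dr.Project.get('p-id')
model = dr.models.PrimeModel.get('p-id', 'l-id')
dataset = project.upload_dataset('data_to_predict.csv')
predict_job = model.request_predictions(dataset.id)
predictions = predict_job.get_result_when_complete()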

request_training_predictions(data_subset, explanation_algorithm=None, max_explanations=None)

Start a job to build training predictions

Parameters:
data_subset : str

data set definition to build predictions on. Choices are:

  • dr.enums.DATA_SUBSET.ALL or string all for all data available. Not valid for
    models in datetime partitioned projects
  • dr.enums.DATA_SUBSET.VALIDATION_AND_HOLDOUT or string validationAndHoldout for
    all data except training set. Not valid for models in datetime partitioned projects
  • dr.enums.DATA_SUBSET.HOLDOUT or string holdout for holdout data set only
  • dr.enums.DATA_SUBSET.ALL_BACKTESTS or string allBacktests for downloading
    the predictions for all backtest validation folds. Requires the model to have successfully scored all backtests. Datetime partitioned projects only.
explanation_algorithm : dr.enums.EXPLANATIONS_ALGORITHM

(New in v2.21) Optional. If set to dr.enums.EXPLANATIONS_ALGORITHM.SHAP, the response will include prediction explanations based on the SHAP explainer (SHapley Additive exPlanations). Defaults to None (no prediction explanations).

max_explanations : int

(New in v2.21) Optional. Specifies the maximum number of explanation values that should be returned for each row, ordered by absolute value, greatest to least. In the case of dr.enums.EXPLANATIONS_ALGORITHM.SHAP: If not set, explanations are returned for all features. If the number of features is greater than the max_explanations, the sum of remaining values will also be returned as shap_remaining_total. Max 100. Defaults to null for datasets narrower than 100 columns, defaults to 100 for datasets wider than 100 columns. Is ignored if explanation_algorithm is not set.

Returns:
Job

an instance of created async job

request_transferable_export(prediction_intervals_size=None)

Request generation of an exportable model file for use in an on-premise DataRobot standalone prediction environment.

This function can only be used if model export is enabled, and will only be useful if you have an on-premise environment in which to import it.

This function does not download the exported file. Use download_export for that.

Parameters:
prediction_intervals_size : int, optional

(New in v2.19) For time series projects only. Represents the percentile to use for the size of the prediction intervals. Prediction intervals size must be between 1 and 100 (inclusive).

Examples

model = datarobot.Model.get('p-id', 'l-id')
job = model.request_transferable_export()
job.wait_for_completion()
model.download_export('my_exported_model.drmodel')

# Client must be configured to use standalone prediction server for import:
datarobot.Client(token='my-token-at-standalone-server',
                 endpoint='standalone-server-url/api/v2')

imported_model = datarobot.ImportedModel.create('my_exported_model.drmodel')

retrain(sample_pct=None, featurelist_id=None, training_row_count=None)

Submit a job to the queue to retrain this model on a different sample size or featurelist.

Parameters:
sample_pct: float, optional

The sample size in percent (1 to 100) to use in training. If this parameter is used then training_row_count should not be given.

featurelist_id : str, optional

The featurelist id

training_row_count : int, optional

The number of rows used to train the model. If this parameter is used then sample_pct should not be given.

Returns:
job : ModelJob

The created job that is retraining the model
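
A sketch of retraining on a different featurelist and sample size (ids are placeholders):

import datarobot as dr

model = dr.models.PrimeModel.get('p-id', 'l-id')
model_job = model.retrain(featurelist_id='featurelist-id', sample_pct=64)
retrained_model = model_job.get_result_when_complete()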

set_prediction_threshold(threshold)

Set a custom prediction threshold for the model

May not be used once prediction_threshold_read_only is True for this model.

Parameters:
threshold : float

only used for binary classification projects. The threshold to use when deciding between the positive and negative classes when making predictions. Should be between 0.0 and 1.0 (inclusive).

star_model()

Mark the model as starred

Model stars propagate to the web application and the API, and can be used to filter when listing models.

start_advanced_tuning_session()

Start an Advanced Tuning session. Returns an object that helps set up arguments for an Advanced Tuning model execution.

As of v2.17, all models other than blenders, open source, prime, scaleout, baseline and user-created support Advanced Tuning.

Returns:
AdvancedTuningSession

Session for setting up and running Advanced Tuning on a model

unstar_model()

Unmark the model as starred

Model stars propagate to the web application and the API, and can be used to filter when listing models.

BlenderModel

class datarobot.models.BlenderModel(id=None, processes=None, featurelist_name=None, featurelist_id=None, project_id=None, sample_pct=None, training_row_count=None, training_duration=None, training_start_date=None, training_end_date=None, model_type=None, model_category=None, is_frozen=None, blueprint_id=None, metrics=None, model_ids=None, blender_method=None, monotonic_increasing_featurelist_id=None, monotonic_decreasing_featurelist_id=None, supports_monotonic_constraints=None, is_starred=None, prediction_threshold=None, prediction_threshold_read_only=None, model_number=None, parent_model_id=None)

Blender model that combines prediction results from other models.

All durations are specified with a duration string such as those returned by the partitioning_methods.construct_duration_string helper method. Please see datetime partitioned project documentation for more information on duration strings.

Attributes:
id : str

the id of the model

project_id : str

the id of the project the model belongs to

processes : list of str

the processes used by the model

featurelist_name : str

the name of the featurelist used by the model

featurelist_id : str

the id of the featurelist used by the model

sample_pct : float

the percentage of the project dataset used in training the model

training_row_count : int or None

the number of rows of the project dataset used in training the model. In a datetime partitioned project, if specified, defines the number of rows used to train the model and evaluate backtest scores; if unspecified, either training_duration or training_start_date and training_end_date was used to determine that instead.

training_duration : str or None

only present for models in datetime partitioned projects. If specified, a duration string specifying the duration spanned by the data used to train the model and evaluate backtest scores.

training_start_date : datetime or None

only present for frozen models in datetime partitioned projects. If specified, the start date of the data used to train the model.

training_end_date : datetime or None

only present for frozen models in datetime partitioned projects. If specified, the end date of the data used to train the model.

model_type : str

what model this is, e.g. ‘AVG Blender’

model_category : str

what kind of model this is - always ‘blend’ for blender models

is_frozen : bool

whether this model is a frozen model

blueprint_id : str

the id of the blueprint used in this model

metrics : dict

a mapping from each metric to the model’s scores for that metric

model_ids : list of str

List of model ids used in blender

blender_method : str

Method used to blend results from underlying models

monotonic_increasing_featurelist_id : str

optional, the id of the featurelist that defines the set of features with a monotonically increasing relationship to the target. If None, no such constraints are enforced.

monotonic_decreasing_featurelist_id : str

optional, the id of the featurelist that defines the set of features with a monotonically decreasing relationship to the target. If None, no such constraints are enforced.

supports_monotonic_constraints : bool

optional, whether this model supports enforcing monotonic constraints

is_starred : bool

whether this model is marked as starred

prediction_threshold : float

for binary classification projects, the threshold used for predictions

prediction_threshold_read_only : bool

indicates whether modification of the prediction threshold is forbidden. Threshold modification is forbidden once a model has had a deployment created or predictions made via the dedicated prediction API.

model_number : integer

model number assigned to a model

parent_model_id : str or None

(New in version v2.20) the id of the model that tuning parameters are derived from

classmethod get(project_id, model_id)

Retrieve a specific blender.

Parameters:
project_id : str

The project’s id.

model_id : str

The model_id of the leaderboard item to retrieve.

Returns:
model : BlenderModel

The queried instance.

advanced_tune(params, description=None)

Generate a new model with the specified advanced-tuning parameters

As of v2.17, all models other than blenders, open source, prime, scaleout, baseline and user-created support Advanced Tuning.

Parameters:
params : dict

Mapping of parameter ID to parameter value. The list of valid parameter IDs for a model can be found by calling get_advanced_tuning_parameters(). This endpoint does not need to include values for all parameters. If a parameter is omitted, its current_value will be used.

description : unicode

Human-readable string describing the newly advanced-tuned model

Returns:
ModelJob

The created job to build the model
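
Example (a hedged sketch; picking the first listed parameter and reusing its current value is purely illustrative):

params = model.get_advanced_tuning_parameters()['tuningParameters']
first = params[0]
job = model.advanced_tune({first['parameterId']: first['currentValue']},
                          description='retuned copy')
tuned_model = job.get_result_when_complete()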

cross_validate()

Run Cross Validation on this model.

Note

To perform Cross Validation on a new model with new parameters, use train instead.

Returns:
ModelJob

The created job to build the model

delete()

Delete a model from the project’s leaderboard.

download_export(filepath)

Download an exportable model file for use in an on-premise DataRobot standalone prediction environment.

This function can only be used if model export is enabled, and will only be useful if you have an on-premise environment in which to import it.

Parameters:
filepath : str

The path at which to save the exported model file.

download_scoring_code(file_name, source_code=False)

Download scoring code JAR.

Parameters:
file_name : str

File path where scoring code will be saved.

source_code : bool, optional

Set to True to download source code archive. It will not be executable.

classmethod fetch_resource_data(url, join_endpoint=True)

(Deprecated.) Used to acquire model data directly from its url.

Consider using get instead, as this is a convenience function used for development of datarobot

Parameters:
url : str

The resource we are acquiring

join_endpoint : boolean, optional

Whether the client’s endpoint should be joined to the URL before sending the request. Location headers are returned as absolute locations, so will _not_ need the endpoint

Returns:
model_data : dict

The queried model’s data

get_advanced_tuning_parameters()

Get the advanced-tuning parameters available for this model.

As of v2.17, all models other than blenders, open source, prime, scaleout, baseline and user-created support Advanced Tuning.

Returns:
dict

A dictionary describing the advanced-tuning parameters for the current model. There are two top-level keys, tuningDescription and tuningParameters.

tuningDescription is an optional value. If not None, it indicates the user-specified description of this set of tuning parameters.

tuningParameters is a list of dicts, each of which has the following keys:

  • parameterName : (unicode) name of the parameter (unique per task, see below)
  • parameterId : (unicode) opaque ID string uniquely identifying parameter
  • defaultValue : (*) default value of the parameter for the blueprint
  • currentValue : (*) value of the parameter that was used for this model
  • taskName : (unicode) name of the task that this parameter belongs to
  • constraints: (dict) see the notes below

Notes

The type of defaultValue and currentValue is defined by the constraints structure. It will be a string or numeric Python type.

constraints is a dict with at least one, possibly more, of the following keys. The presence of a key indicates that the parameter may take on the specified type. (If a key is absent, this means that the parameter may not take on the specified type.) If a key on constraints is present, its value will be a dict containing all of the fields described below for that key.

"constraints": {
    "select": {
        "values": [<list(basestring or number) : possible values>]
    },
    "ascii": {},
    "unicode": {},
    "int": {
        "min": <int : minimum valid value>,
        "max": <int : maximum valid value>,
        "supports_grid_search": <bool : True if Grid Search may be
                                        requested for this param>
    },
    "float": {
        "min": <float : minimum valid value>,
        "max": <float : maximum valid value>,
        "supports_grid_search": <bool : True if Grid Search may be
                                        requested for this param>
    },
    "intList": {
        "length": {
        "min_length": <int : minimum valid length>,
        "max_length": <int : maximum valid length>
        "min_val": <int : minimum valid value>,
        "max_val": <int : maximum valid value>
        "supports_grid_search": <bool : True if Grid Search may be
                                        requested for this param>
    },
    "floatList": {
        "min_length": <int : minimum valid length>,
        "max_length": <int : maximum valid length>
        "min_val": <float : minimum valid value>,
        "max_val": <float : maximum valid value>
        "supports_grid_search": <bool : True if Grid Search may be
                                        requested for this param>
    }
}

The keys have meaning as follows:

  • select: Rather than specifying a specific data type, if present, it indicates that the parameter is permitted to take on any of the specified values. Listed values may be of any string or real (non-complex) numeric type.
  • ascii: The parameter may be a unicode object that encodes simple ASCII characters. (A-Z, a-z, 0-9, whitespace, and certain common symbols.) In addition to listed constraints, ASCII keys currently may not contain either newlines or semicolons.
  • unicode: The parameter may be any Python unicode object.
  • int: The value may be an object of type int within the specified range (inclusive). Please note that the value will be passed around using the JSON format, and some JSON parsers have undefined behavior with integers outside of the range [-(2**53)+1, (2**53)-1].
  • float: The value may be an object of type float within the specified range (inclusive).
  • intList, floatList: The value may be a list of int or float objects, respectively, following constraints as specified respectively by the int and float types (above).

Many parameters only specify one key under constraints. If a parameter specifies multiple keys, the parameter may take on any value permitted by any key.
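
Example (a minimal sketch that lists each tunable parameter together with the constraint types it accepts):

info = model.get_advanced_tuning_parameters()
for param in info['tuningParameters']:
    print(param['taskName'], param['parameterName'],
          sorted(param['constraints']), param['currentValue'])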

get_all_confusion_charts(fallback_to_parent_insights=False)

Retrieve a list of all confusion charts available for the model.

Parameters:
fallback_to_parent_insights : bool

(New in version v2.14) Optional, if True, this will return confusion chart data for this model’s parent for any source that is not available for this model and if this model has a defined parent model. If omitted or False, or this model has no parent, this will not attempt to retrieve any data from this model’s parent.

Returns:
list of ConfusionChart

Data for all available confusion charts for model.

get_all_lift_charts(fallback_to_parent_insights=False)

Retrieve a list of all lift charts available for the model.

Parameters:
fallback_to_parent_insights : bool

(New in version v2.14) Optional, if True, this will return lift chart data for this model’s parent for any source that is not available for this model and if this model has a defined parent model. If omitted or False, or this model has no parent, this will not attempt to retrieve any data from this model’s parent.

Returns:
list of LiftChart

Data for all available model lift charts.

get_all_residuals_charts(fallback_to_parent_insights=False)

Retrieve a list of all residuals charts available for the model.

Parameters:
fallback_to_parent_insights : bool

Optional, if True, this will return residuals chart data for this model’s parent for any source that is not available for this model and if this model has a defined parent model. If omitted or False, or this model has no parent, this will not attempt to retrieve any data from this model’s parent.

Returns:
list of ResidualsChart

Data for all available model residuals charts.

get_all_roc_curves(fallback_to_parent_insights=False)

Retrieve a list of all ROC curves available for the model.

Parameters:
fallback_to_parent_insights : bool

(New in version v2.14) Optional, if True, this will return ROC curve data for this model’s parent for any source that is not available for this model and if this model has a defined parent model. If omitted or False, or this model has no parent, this will not attempt to retrieve any data from this model’s parent.

Returns:
list of RocCurve

Data for all available model ROC curves.

get_confusion_chart(source, fallback_to_parent_insights=False)

Retrieve model’s confusion chart for the specified source.

Parameters:
source : str

Confusion chart source. Check datarobot.enums.CHART_DATA_SOURCE for possible values.

fallback_to_parent_insights : bool

(New in version v2.14) Optional, if True, this will return confusion chart data for this model’s parent if the confusion chart is not available for this model and the model has a defined parent model. If omitted or False, or there is no parent model, this will not attempt to return insight data from this model’s parent.

Returns:
ConfusionChart

Model ConfusionChart data

Raises:
ClientError

If the insight is not available for this model

get_cross_validation_scores(partition=None, metric=None)

Returns a dictionary keyed by metric showing cross validation scores per partition.

Cross Validation should already have been performed using cross_validate or train.

Note

Models that computed cross validation before this feature was added will need to be deleted and retrained before this method can be used.

Parameters:
partition : float

optional, the id of the partition (1, 2, 3.0, 4.0, etc.) to filter results by; can be a positive whole-number integer or float value.

metric: unicode

optional, the name of the metric to filter the resulting cross validation scores by

Returns:
cross_validation_scores: dict

A dictionary keyed by metric showing cross validation scores per partition.
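
Example (a minimal, hedged sketch; 'RMSE' stands in for whatever metric the project actually uses):

cv_job = model.cross_validate()
cv_job.wait_for_completion()
scores = model.get_cross_validation_scores(metric='RMSE')
print(scores)  # dict keyed by metric, with a score per partition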

get_feature_effect(source)

Retrieve Feature Effects for the model.

Feature Effects provides partial dependence and predicted vs actual values for top-500 features ordered by feature impact score.

The partial dependence shows marginal effect of a feature on the target variable after accounting for the average effects of all other predictive features. It indicates how, holding all other variables except the feature of interest as they were, the value of this feature affects your prediction.

Requires that Feature Effects has already been computed with request_feature_effect.

See get_feature_effect_metadata for retrieving information on the available sources.

Parameters:
source : string

The source Feature Effects are retrieved for.

Returns:
feature_effects : FeatureEffects

The feature effects data.

Raises:
ClientError (404)

If the feature effects have not been computed or the source is not a valid value.

get_feature_effect_metadata()

Retrieve Feature Effect metadata. Response contains status and available model sources.

  • Feature Effect of training is always available (except for old projects that support only Feature Effect for validation).
  • When a model is trained into validation or holdout without stacked predictions (e.g. no out-of-sample predictions in validation or holdout), Feature Effect is not available for validation or holdout.
  • Feature Effect for holdout is not available when there is no holdout configured for the project.

source is the expected parameter for retrieving Feature Effect. One of the provided sources should be used.

Returns:
feature_effect_metadata: FeatureEffectMetadata

get_feature_fit(source)

Retrieve Feature Fit for the model.

Feature Fit provides partial dependence and predicted vs actual values for top-500 features ordered by feature importance score.

The partial dependence shows marginal effect of a feature on the target variable after accounting for the average effects of all other predictive features. It indicates how, holding all other variables except the feature of interest as they were, the value of this feature affects your prediction.

Requires that Feature Fit has already been computed with request_feature_fit.

See get_feature_fit_metadata for retrieving information on the available sources.

Parameters:
source : string

The source Feature Fit is retrieved for. One value of [FeatureFitMetadata.sources].

Returns:
feature_fit : FeatureFit

The feature fit data.

Raises:
ClientError (404)

If the feature fit has not been computed or the source is not a valid value.

get_feature_fit_metadata()

Retrieve Feature Fit metadata. Response contains status and available model sources.

  • Feature Fit of training is always available (except for old projects that support only Feature Fit for validation).
  • When a model is trained into validation or holdout without stacked predictions (e.g. no out-of-sample predictions in validation or holdout), Feature Fit is not available for validation or holdout.
  • Feature Fit for holdout is not available when there is no holdout configured for the project.

source is the expected parameter for retrieving Feature Fit. One of the provided sources should be used.

Returns:
feature_fit_metadata: FeatureFitMetadata

get_feature_impact(with_metadata=False)

Retrieve the computed Feature Impact results, a measure of the relevance of each feature in the model.

Feature Impact is computed for each column by creating new data with that column randomly permuted (but the others left unchanged), and seeing how the error metric score for the predictions is affected. The ‘impactUnnormalized’ is how much worse the error metric score is when making predictions on this modified data. The ‘impactNormalized’ is normalized so that the largest value is 1. In both cases, larger values indicate more important features.

If a feature is a redundant feature, i.e. once other features are considered it doesn’t contribute much in addition, the ‘redundantWith’ value is the name of the feature that has the highest correlation with this feature. Note that redundancy detection is only available for jobs run after the addition of this feature. When retrieving data that predates this functionality, a NoRedundancyImpactAvailable warning will be used.

Elsewhere this technique is sometimes called ‘Permutation Importance’.

Requires that Feature Impact has already been computed with request_feature_impact.

Parameters:
with_metadata : bool

The flag indicating if the result should include the metadata as well.

Returns:
list or dict

The feature impact data response depends on the with_metadata parameter. The response is either a dict with metadata and a list with actual data or just a list with that data.

Each list item is a dict with the keys featureName, impactNormalized, impactUnnormalized, redundantWith, and count.

For dict response available keys are:

  • featureImpacts - Feature Impact data as a dictionary. Each item is a dict with
    keys: featureName, impactNormalized, and impactUnnormalized, and redundantWith.
  • shapBased - A boolean that indicates whether Feature Impact was calculated using
    Shapley values.
  • ranRedundancyDetection - A boolean that indicates whether redundant feature
    identification was run while calculating this Feature Impact.
  • rowCount - An integer or None that indicates the number of rows that were used to
    calculate Feature Impact. For Feature Impact calculated with the default logic, without specifying rowCount, None is returned here.
  • count - An integer with the number of features under the featureImpacts.
Raises:
ClientError (404)

If the feature impacts have not been computed.

get_features_used()

Query the server to determine which features were used.

Note that the data returned by this method is possibly different than the names of the features in the featurelist used by this model. This method will return the raw features that must be supplied in order for predictions to be generated on a new set of data. The featurelist, in contrast, would also include the names of derived features.

Returns:
features : list of str

The names of the features used in the model.

get_frozen_child_models()

Retrieves the ids for all the models that are frozen from this model.

Returns:
A list of Models

get_leaderboard_ui_permalink()

Returns:
url : str

Permanent static hyperlink to this model at leaderboard.

get_lift_chart(source, fallback_to_parent_insights=False)

Retrieve model lift chart for the specified source.

Parameters:
source : str

Lift chart data source. Check datarobot.enums.CHART_DATA_SOURCE for possible values.

fallback_to_parent_insights : bool

(New in version v2.14) Optional, if True, this will return lift chart data for this model’s parent if the lift chart is not available for this model and the model has a defined parent model. If omitted or False, or there is no parent model, will not attempt to return insight data from this model’s parent.

Returns:
LiftChart

Model lift chart data

Raises:
ClientError

If the insight is not available for this model

get_missing_report_info()

Retrieve the model’s missing data report on training data, which can be used to understand missing values treatment in the model. The report consists of missing values reports for numeric and categorical features that took part in modeling.

Returns:
An iterable of MissingReportPerFeature

The queried model missing report, sorted by missing count (DESCENDING order).

get_model_blueprint_chart()

Retrieve a model blueprint chart that can be used to understand data flow in blueprint.

Returns:
ModelBlueprintChart

The queried model blueprint chart.

get_model_blueprint_documents()

Get documentation for tasks used in this model.

Returns:
list of BlueprintTaskDocument

All documents available for the model.

get_multiclass_feature_impact()

For multiclass it’s possible to calculate feature impact separately for each target class. The method for calculation is exactly the same, calculated in one-vs-all style for each target class.

Requires that Feature Impact has already been computed with request_feature_impact.

Returns:
feature_impacts : list of dict

The feature impact data. Each item is a dict with the keys ‘featureImpacts’ (list), ‘class’ (str). Each item in ‘featureImpacts’ is a dict with the keys ‘featureName’, ‘impactNormalized’, and ‘impactUnnormalized’, and ‘redundantWith’.

Raises:
ClientError (404)

If the multiclass feature impacts have not been computed.

get_multiclass_lift_chart(source, fallback_to_parent_insights=False)

Retrieve model lift chart for the specified source.

Parameters:
source : str

Lift chart data source. Check datarobot.enums.CHART_DATA_SOURCE for possible values.

fallback_to_parent_insights : bool

Optional, if True, this will return lift chart data for this model’s parent if the lift chart is not available for this model and the model has a defined parent model. If omitted or False, or there is no parent model, will not attempt to return insight data from this model’s parent.

Returns:
list of LiftChart

Model lift chart data for each saved target class

Raises:
ClientError

If the insight is not available for this model

get_num_iterations_trained()

Retrieves the number of estimators trained by early-stopping tree-based models

(New in version v2.22)

Returns:
projectId: str

id of project containing the model

modelId: str

id of the model

data: array

list of numEstimatorsItem objects, one for each modeling stage.

numEstimatorsItem will be of the form:
stage: str

indicates the modeling stage (for multi-stage models); None for single-stage models

numIterations: int

the number of estimators or iterations trained by the model

get_or_request_feature_effect(source, max_wait=600, row_count=None)

Retrieve feature effect for the model, requesting a job if it hasn’t been run previously

See get_feature_effect_metadata for retrieving information of source.

Parameters:
max_wait : int, optional

The maximum time to wait for a requested feature effect job to complete before erroring

row_count : int, optional

(New in version v2.21) The sample size to use for Feature Effects computation. Minimum is 10 rows. Maximum is 100000 rows or the training sample size of the model, whichever is less.

source : string

The source Feature Effects are retrieved for.

Returns:
feature_effects : FeatureEffects

The feature effects data.
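
Example (a hedged sketch; it assumes the returned metadata object exposes the list of available sources referenced above):

meta = model.get_feature_effect_metadata()
source = meta.sources[0]  # assumed attribute holding the available sources
feature_effects = model.get_or_request_feature_effect(source=source, max_wait=600)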

get_or_request_feature_fit(source, max_wait=600)

Retrieve feature fit for the model, requesting a job if it hasn’t been run previously

See get_feature_fit_metadata for retrieving information of source.

Parameters:
max_wait : int, optional

The maximum time to wait for a requested feature fit job to complete before erroring

source : string

The source Feature Fit is retrieved for. One value of [FeatureFitMetadata.sources].

Returns:
feature_fit : FeatureFit

The feature fit data.

get_or_request_feature_impact(max_wait=600, **kwargs)

Retrieve feature impact for the model, requesting a job if it hasn’t been run previously

Parameters:
max_wait : int, optional

The maximum time to wait for a requested feature impact job to complete before erroring

**kwargs

Arbitrary keyword arguments passed to request_feature_impact.

Returns:
feature_impacts : list or dict

The feature impact data. See get_feature_impact for the exact schema.
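
Example (a minimal sketch; with_metadata is left at its default, so the result is a plain list):

impact = model.get_or_request_feature_impact(max_wait=600)
for item in impact:
    print(item['featureName'], item['impactNormalized'], item['redundantWith'])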

get_parameters()

Retrieve model parameters.

Returns:
ModelParameters

Model parameters for this model.

get_pareto_front()

Retrieve the Pareto Front for a Eureqa model.

This method is only supported for Eureqa models.

Returns:
ParetoFront

Model ParetoFront data

get_prime_eligibility()

Check if this model can be approximated with DataRobot Prime

Returns:
prime_eligibility : dict

a dict indicating whether a model can be approximated with DataRobot Prime (key can_make_prime) and why it may be ineligible (key message)

get_residuals_chart(source, fallback_to_parent_insights=False)

Retrieve model residuals chart for the specified source.

Parameters:
source : str

Residuals chart data source. Check datarobot.enums.CHART_DATA_SOURCE for possible values.

fallback_to_parent_insights : bool

Optional, if True, this will return residuals chart data for this model’s parent if the residuals chart is not available for this model and the model has a defined parent model. If omitted or False, or there is no parent model, will not attempt to return residuals data from this model’s parent.

Returns:
ResidualsChart

Model residuals chart data

Raises:
ClientError

If the insight is not available for this model

get_roc_curve(source, fallback_to_parent_insights=False)

Retrieve model ROC curve for the specified source.

Parameters:
source : str

ROC curve data source. Check datarobot.enums.CHART_DATA_SOURCE for possible values.

fallback_to_parent_insights : bool

(New in version v2.14) Optional, if True, this will return ROC curve data for this model’s parent if the ROC curve is not available for this model and the model has a defined parent model. If omitted or False, or there is no parent model, will not attempt to return data from this model’s parent.

Returns:
RocCurve

Model ROC curve data

Raises:
ClientError

If the insight is not available for this model
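
Example (a hedged sketch; the VALIDATION member of datarobot.enums.CHART_DATA_SOURCE is assumed to name the validation source):

import datarobot as dr

roc = model.get_roc_curve(dr.enums.CHART_DATA_SOURCE.VALIDATION)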

get_rulesets()

List the rulesets approximating this model generated by DataRobot Prime

If this model hasn’t been approximated yet, will return an empty list. Note that these are rulesets approximating this model, not rulesets used to construct this model.

Returns:
rulesets : list of Ruleset
get_supported_capabilities()

Retrieves a summary of the capabilities supported by a model.

New in version v2.14.

Returns:
supportsBlending: bool

whether the model supports blending

supportsMonotonicConstraints: bool

whether the model supports monotonic constraints

hasWordCloud: bool

whether the model has word cloud data available

eligibleForPrime: bool

whether the model is eligible for Prime

hasParameters: bool

whether the model has parameters that can be retrieved

supportsCodeGeneration: bool

(New in version v2.18) whether the model supports code generation

supportsShap: bool

(New in version v2.18) True if the model supports the Shapley package, i.e. Shapley-based feature importance.

supportsEarlyStopping: bool

(New in version v2.22) True if this is an early stopping tree-based model and number of trained iterations can be retrieved.

get_word_cloud(exclude_stop_words=False)

Retrieve a word cloud data for the model.

Parameters:
exclude_stop_words : bool, optional

Set to True if you want stopwords filtered out of response.

Returns:
WordCloud

Word cloud data for the model.

open_model_browser()

Opens model at project leaderboard in web browser.

Note: If text-mode browsers are used, the calling process will block until the user exits the browser.

request_approximation()

Request an approximation of this model using DataRobot Prime

This will create several rulesets that could be used to approximate this model. After comparing their scores and rule counts, the code used in the approximation can be downloaded and run locally.

Returns:
job : Job

the job generating the rulesets

request_external_test(dataset_id, actual_value_column=None)

Request external test to compute scores and insights on an external test dataset

Parameters:
dataset_id : string

The dataset to make predictions against (as uploaded from Project.upload_dataset)

actual_value_column : string, optional

(New in version v2.21) For time series unsupervised projects only. Actual value column can be used to calculate the classification metrics and insights on the prediction dataset. Can’t be provided with the forecast_point parameter.

Returns:
job : Job

a Job representing external dataset insights computation

request_feature_effect(row_count=None)

Request feature effects to be computed for the model.

See get_feature_effect for more information on the result of the job.

Parameters:
row_count : int

(New in version v2.21) The sample size to use for Feature Effects computation. Minimum is 10 rows. Maximum is 100000 rows or the training sample size of the model, whichever is less.

Returns:
job : Job

A Job representing the feature effect computation. To get the completed feature effect data, use job.get_result or job.get_result_when_complete.

Raises:
JobAlreadyRequested (422)

If the feature effect has already been requested.

request_feature_fit()

Request feature fit to be computed for the model.

See get_feature_fit for more information on the result of the job.

Returns:
job : Job

A Job representing the feature fit computation. To get the completed feature fit data, use job.get_result or job.get_result_when_complete.

Raises:
JobAlreadyRequested (422)

If the feature fit has already been requested.

request_feature_impact(row_count=None, with_metadata=False)

Request feature impacts to be computed for the model.

See get_feature_impact for more information on the result of the job.

Parameters:
row_count : int

The sample size (specified in rows) to use for Feature Impact computation. This is not supported for unsupervised, multiclass (which has a separate method), and time series projects.

Returns:
job : Job

A Job representing the feature impact computation. To get the completed feature impact data, use job.get_result or job.get_result_when_complete.

Raises:
JobAlreadyRequested (422)

If the feature impacts have already been requested.

request_frozen_datetime_model(training_row_count=None, training_duration=None, training_start_date=None, training_end_date=None, time_window_sample_pct=None)

Train a new frozen model with parameters from this model

Requires that this model belongs to a datetime partitioned project. If it does not, an error will occur when submitting the job.

Frozen models use the same tuning parameters as their parent model instead of independently optimizing them to allow efficiently retraining models on larger amounts of the training data.

In addition to training_row_count and training_duration, frozen datetime models may be trained on an exact date range. Only one of training_row_count, training_duration, or training_start_date and training_end_date should be specified.

Models specified using training_start_date and training_end_date are the only ones that can be trained into the holdout data (once the holdout is unlocked).

All durations should be specified with a duration string such as those returned by the partitioning_methods.construct_duration_string helper method. Please see datetime partitioned project documentation for more information on duration strings.

Parameters:
training_row_count : int, optional

the number of rows of data that should be used to train the model. If specified, training_duration may not be specified.

training_duration : str, optional

a duration string specifying what time range the data used to train the model should span. If specified, training_row_count may not be specified.

training_start_date : datetime.datetime, optional

the start date of the data to train the model on. Only rows occurring at or after this datetime will be used. If training_start_date is specified, training_end_date must also be specified.

training_end_date : datetime.datetime, optional

the end date of the data to train the model on. Only rows occurring strictly before this datetime will be used. If training_end_date is specified, training_start_date must also be specified.

time_window_sample_pct : int, optional

may only be specified when the requested model is a time window (e.g. duration or start and end dates). An integer between 1 and 99 indicating the percentage to sample by within the window. The points kept are determined by a random uniform sample. If specified, either training_duration or training_start_date and training_end_date must also be specified; otherwise an error will occur.

Returns:
model_job : ModelJob

the modeling job training a frozen model
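
Example (a hedged sketch; the import path for the duration helper is an assumption based on the reference above):

from datarobot.helpers.partitioning_methods import construct_duration_string

duration = construct_duration_string(years=1)
model_job = model.request_frozen_datetime_model(training_duration=duration)
frozen_model = model_job.get_result_when_complete()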

request_frozen_model(sample_pct=None, training_row_count=None)

Train a new frozen model with parameters from this model

Note

This method only works if the project the model belongs to is not datetime partitioned. If it is, use request_frozen_datetime_model instead.

Frozen models use the same tuning parameters as their parent model instead of independently optimizing them to allow efficiently retraining models on larger amounts of the training data.

Parameters:
sample_pct : float

optional, the percentage of the dataset to use with the model. If not provided, will use the value from this model.

training_row_count : int

(New in version v2.9) optional, the integer number of rows of the dataset to use with the model. Only one of sample_pct and training_row_count should be specified.

Returns:
model_job : ModelJob

the modeling job training a frozen model
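
Example (a minimal sketch; the sample percentage is illustrative):

model_job = model.request_frozen_model(sample_pct=64)
frozen_model = model_job.get_result_when_complete()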

request_predictions(dataset_id, include_prediction_intervals=None, prediction_intervals_size=None, forecast_point=None, predictions_start_date=None, predictions_end_date=None, actual_value_column=None, explanation_algorithm=None, max_explanations=None)

Request predictions against a previously uploaded dataset

Parameters:
dataset_id : string

The dataset to make predictions against (as uploaded from Project.upload_dataset)

include_prediction_intervals : bool, optional

(New in v2.16) For time series projects only. Specifies whether prediction intervals should be calculated for this request. Defaults to True if prediction_intervals_size is specified, otherwise defaults to False.

prediction_intervals_size : int, optional

(New in v2.16) For time series projects only. Represents the percentile to use for the size of the prediction intervals. Defaults to 80 if include_prediction_intervals is True. Prediction intervals size must be between 1 and 100 (inclusive).

forecast_point : datetime.datetime or None, optional

(New in version v2.20) For time series projects only. This is the default point relative to which predictions will be generated, based on the forecast window of the project. See the time series prediction documentation for more information.

predictions_start_date : datetime.datetime or None, optional

(New in version v2.20) For time series projects only. The start date for bulk predictions. Note that this parameter is for generating historical predictions using the training data. This parameter should be provided in conjunction with predictions_end_date. Can’t be provided with the forecast_point parameter.

predictions_end_date : datetime.datetime or None, optional

(New in version v2.20) For time series projects only. The end date for bulk predictions, exclusive. Note that this parameter is for generating historical predictions using the training data. This parameter should be provided in conjunction with predictions_start_date. Can’t be provided with the forecast_point parameter.

actual_value_column : string, optional

(New in version v2.21) For time series unsupervised projects only. Actual value column can be used to calculate the classification metrics and insights on the prediction dataset. Can’t be provided with the forecast_point parameter.

explanation_algorithm: (New in version v2.21) optional; If set to ‘shap’, the

response will include prediction explanations based on the SHAP explainer (SHapley Additive exPlanations). Defaults to null (no prediction explanations).

max_explanations: (New in version v2.21) optional; specifies the maximum number of

explanation values that should be returned for each row, ordered by absolute value, greatest to least. If null, no limit. In the case of ‘shap’: if the number of features is greater than the limit, the sum of remaining values will also be returned as shapRemainingTotal. Defaults to null. Cannot be set if explanation_algorithm is omitted.

Returns:
job : PredictJob

The job computing the predictions
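
Example (a hedged sketch; the file name is a placeholder and the project object is assumed to have been retrieved already):

dataset = project.upload_dataset('to_score.csv')
predict_job = model.request_predictions(dataset.id)
predictions = predict_job.get_result_when_complete()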

request_training_predictions(data_subset, explanation_algorithm=None, max_explanations=None)

Start a job to build training predictions

Parameters:
data_subset : str

data set definition to build predictions on. Choices are:

  • dr.enums.DATA_SUBSET.ALL or string all for all data available. Not valid for
    models in datetime partitioned projects
  • dr.enums.DATA_SUBSET.VALIDATION_AND_HOLDOUT or string validationAndHoldout for
    all data except training set. Not valid for models in datetime partitioned projects
  • dr.enums.DATA_SUBSET.HOLDOUT or string holdout for holdout data set only
  • dr.enums.DATA_SUBSET.ALL_BACKTESTS or string allBacktests for downloading
    the predictions for all backtest validation folds. Requires the model to have successfully scored all backtests. Datetime partitioned projects only.
explanation_algorithm : dr.enums.EXPLANATIONS_ALGORITHM

(New in v2.21) Optional. If set to dr.enums.EXPLANATIONS_ALGORITHM.SHAP, the response will include prediction explanations based on the SHAP explainer (SHapley Additive exPlanations). Defaults to None (no prediction explanations).

max_explanations : int

(New in v2.21) Optional. Specifies the maximum number of explanation values that should be returned for each row, ordered by absolute value, greatest to least. In the case of dr.enums.EXPLANATIONS_ALGORITHM.SHAP: If not set, explanations are returned for all features. If the number of features is greater than the max_explanations, the sum of remaining values will also be returned as shap_remaining_total. Max 100. Defaults to null for datasets narrower than 100 columns, defaults to 100 for datasets wider than 100 columns. Is ignored if explanation_algorithm is not set.

Returns:
Job

an instance of created async job
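
Example (a minimal sketch requesting training predictions on the holdout subset):

import datarobot as dr

job = model.request_training_predictions(dr.enums.DATA_SUBSET.HOLDOUT)
training_predictions = job.get_result_when_complete()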

request_transferable_export(prediction_intervals_size=None)

Request generation of an exportable model file for use in an on-premise DataRobot standalone prediction environment.

This function can only be used if model export is enabled, and will only be useful if you have an on-premise environment in which to import it.

This function does not download the exported file. Use download_export for that.

Parameters:
prediction_intervals_size : int, optional

(New in v2.19) For time series projects only. Represents the percentile to use for the size of the prediction intervals. Prediction intervals size must be between 1 and 100 (inclusive).

Examples

import datarobot

model = datarobot.Model.get('p-id', 'l-id')
job = model.request_transferable_export()
job.wait_for_completion()
model.download_export('my_exported_model.drmodel')

# Client must be configured to use standalone prediction server for import:
datarobot.Client(token='my-token-at-standalone-server',
                 endpoint='standalone-server-url/api/v2')

imported_model = datarobot.ImportedModel.create('my_exported_model.drmodel')

retrain(sample_pct=None, featurelist_id=None, training_row_count=None)

Submit a job to the queue to train a blender model.

Parameters:
sample_pct : float, optional

The sample size in percent (1 to 100) to use in training. If this parameter is used then training_row_count should not be given.

featurelist_id : str, optional

The featurelist id

training_row_count : int, optional

The number of rows to use to train the model. If this parameter is used then sample_pct should not be given.

Returns:
job : ModelJob

The created job that is retraining the model

set_prediction_threshold(threshold)

Set a custom prediction threshold for the model

May not be used once prediction_threshold_read_only is True for this model.

Parameters:
threshold : float

only used for binary classification projects. The threshold used when deciding between the positive and negative classes when making predictions. Should be between 0.0 and 1.0 (inclusive).

star_model()

Mark the model as starred

Model stars propagate to the web application and the API, and can be used to filter when listing models.

start_advanced_tuning_session()

Start an Advanced Tuning session. Returns an object that helps set up arguments for an Advanced Tuning model execution.

As of v2.17, all models other than blenders, open source, prime, scaleout, baseline and user-created support Advanced Tuning.

Returns:
AdvancedTuningSession

Session for setting up and running Advanced Tuning on a model

train(sample_pct=None, featurelist_id=None, scoring_type=None, training_row_count=None, monotonic_increasing_featurelist_id=<object object>, monotonic_decreasing_featurelist_id=<object object>)

Train the blueprint used in model on a particular featurelist or amount of data.

This method creates a new training job for worker and appends it to the end of the queue for this project. After the job has finished you can get the newly trained model by retrieving it from the project leaderboard, or by retrieving the result of the job.

Either sample_pct or training_row_count can be used to specify the amount of data to use, but not both. If neither are specified, a default of the maximum amount of data that can safely be used to train any blueprint without going into the validation data will be selected.

In smart-sampled projects, sample_pct and training_row_count are assumed to be in terms of rows of the minority class.

Note

For datetime partitioned projects, see train_datetime instead.

Parameters:
sample_pct : float, optional

The amount of data to use for training, as a percentage of the project dataset from 0 to 100.

featurelist_id : str, optional

The identifier of the featurelist to use. If not defined, the featurelist of this model is used.

scoring_type : str, optional

Either SCORING_TYPE.validation or SCORING_TYPE.cross_validation. SCORING_TYPE.validation is available for every partitioning type, and indicates that the default model validation should be used for the project. If the project uses a form of cross-validation partitioning, SCORING_TYPE.cross_validation can also be used to indicate that all of the available training/validation combinations should be used to evaluate the model.

training_row_count : int, optional

The number of rows to use to train the requested model.

monotonic_increasing_featurelist_id : str

(new in version 2.11) optional, the id of the featurelist that defines the set of features with a monotonically increasing relationship to the target. Passing None disables increasing monotonicity constraint. Default (dr.enums.MONOTONICITY_FEATURELIST_DEFAULT) is the one specified by the blueprint.

monotonic_decreasing_featurelist_id : str

(new in version 2.11) optional, the id of the featurelist that defines the set of features with a monotonically decreasing relationship to the target. Passing None disables decreasing monotonicity constraint. Default (dr.enums.MONOTONICITY_FEATURELIST_DEFAULT) is the one specified by the blueprint.

Returns:
model_job_id : str

id of created job, can be used as parameter to ModelJob.get method or wait_for_async_model_creation function

Examples

from datarobot import Model, Project

project = Project.get('p-id')
model = Model.get('p-id', 'l-id')
model_job_id = model.train(training_row_count=project.max_train_rows)

train_datetime(featurelist_id=None, training_row_count=None, training_duration=None, time_window_sample_pct=None, monotonic_increasing_featurelist_id=<object object>, monotonic_decreasing_featurelist_id=<object object>, use_project_settings=False)

Train this model on a different featurelist or amount of data

Requires that this model is part of a datetime partitioned project; otherwise, an error will occur.

All durations should be specified with a duration string such as those returned by the partitioning_methods.construct_duration_string helper method. Please see datetime partitioned project documentation for more information on duration strings.

Parameters:
featurelist_id : str, optional

the featurelist to use to train the model. If not specified, the featurelist of this model is used.

training_row_count : int, optional

the number of rows of data that should be used to train the model. If specified, neither training_duration nor use_project_settings may be specified.

training_duration : str, optional

a duration string specifying what time range the data used to train the model should span. If specified, neither training_row_count nor use_project_settings may be specified.

use_project_settings : bool, optional

(New in version v2.20) defaults to False. If True, indicates that the custom backtest partitioning settings specified by the user will be used to train the model and evaluate backtest scores. If specified, neither training_row_count nor training_duration may be specified.

time_window_sample_pct : int, optional

may only be specified when the requested model is a time window (e.g. duration or start and end dates). An integer between 1 and 99 indicating the percentage to sample by within the window. The points kept are determined by a random uniform sample. If specified, training_duration must also be specified; otherwise an error will occur.

monotonic_increasing_featurelist_id : str, optional

(New in version v2.18) optional, the id of the featurelist that defines the set of features with a monotonically increasing relationship to the target. Passing None disables increasing monotonicity constraint. Default (dr.enums.MONOTONICITY_FEATURELIST_DEFAULT) is the one specified by the blueprint.

monotonic_decreasing_featurelist_id : str, optional

(New in version v2.18) optional, the id of the featurelist that defines the set of features with a monotonically decreasing relationship to the target. Passing None disables decreasing monotonicity constraint. Default (dr.enums.MONOTONICITY_FEATURELIST_DEFAULT) is the one specified by the blueprint.

Returns:
job : ModelJob

the created job to build the model

unstar_model()

Unmark the model as starred

Model stars propagate to the web application and the API, and can be used to filter when listing models.

DatetimeModel

class datarobot.models.DatetimeModel(id=None, processes=None, featurelist_name=None, featurelist_id=None, project_id=None, sample_pct=None, training_row_count=None, training_duration=None, training_start_date=None, training_end_date=None, time_window_sample_pct=None, model_type=None, model_category=None, is_frozen=None, blueprint_id=None, metrics=None, training_info=None, holdout_score=None, holdout_status=None, data_selection_method=None, backtests=None, monotonic_increasing_featurelist_id=None, monotonic_decreasing_featurelist_id=None, supports_monotonic_constraints=None, is_starred=None, prediction_threshold=None, prediction_threshold_read_only=None, effective_feature_derivation_window_start=None, effective_feature_derivation_window_end=None, forecast_window_start=None, forecast_window_end=None, windows_basis_unit=None, model_number=None, parent_model_id=None, use_project_settings=None)

A model from a datetime partitioned project

All durations are specified with a duration string such as those returned by the partitioning_methods.construct_duration_string helper method. Please see datetime partitioned project documentation for more information on duration strings.

Note that only one of training_row_count, training_duration, and training_start_date and training_end_date will be specified, depending on the data_selection_method of the model. Whichever method was selected determines the amount of data used to train on when making predictions and scoring the backtests and the holdout.

Attributes:
id : str

the id of the model

project_id : str

the id of the project the model belongs to

processes : list of str

the processes used by the model

featurelist_name : str

the name of the featurelist used by the model

featurelist_id : str

the id of the featurelist used by the model

sample_pct : float

the percentage of the project dataset used in training the model

training_row_count : int or None

If specified, an int specifying the number of rows used to train the model and evaluate backtest scores.

training_duration : str or None

If specified, a duration string specifying the duration spanned by the data used to train the model and evaluate backtest scores.

training_start_date : datetime or None

only present for frozen models in datetime partitioned projects. If specified, the start date of the data used to train the model.

training_end_date : datetime or None

only present for frozen models in datetime partitioned projects. If specified, the end date of the data used to train the model.

time_window_sample_pct : int or None

An integer between 1 and 99 indicating the percentage of sampling within the training window. The points kept are determined by a random uniform sample. If not specified, no sampling was done.

model_type : str

what model this is, e.g. ‘Nystroem Kernel SVM Regressor’

model_category : str

what kind of model this is - ‘prime’ for DataRobot Prime models, ‘blend’ for blender models, and ‘model’ for other models

is_frozen : bool

whether this model is a frozen model

blueprint_id : str

the id of the blueprint used in this model

metrics : dict

a mapping from each metric to the model’s scores for that metric. The keys in metrics are the different metrics used to evaluate the model, and the values are the results. The dictionaries inside of metrics will be as described here: ‘validation’, the score for a single backtest; ‘crossValidation’, always None; ‘backtesting’, the average score for all backtests if all are available and computed, or None otherwise; ‘backtestingScores’, a list of scores for all backtests where the score is None if that backtest does not have a score available; and ‘holdout’, the score for the holdout or None if the holdout is locked or the score is unavailable.

backtests : list of dict

describes what data was used to fit each backtest, the score for the project metric, and why the backtest score is unavailable if it is not provided.

data_selection_method : str

which of training_row_count, training_duration, or training_start_date and training_end_date were used to determine the data used to fit the model. One of ‘rowCount’, ‘duration’, or ‘selectedDateRange’.

training_info : dict

describes which data was used to train on when scoring the holdout and making predictions. training_info will have the following keys: holdout_training_start_date, holdout_training_duration, holdout_training_row_count, holdout_training_end_date, prediction_training_start_date, prediction_training_duration, prediction_training_row_count, prediction_training_end_date. Start and end dates will be datetimes, durations will be duration strings, and rows will be integers.

holdout_score : float or None

the score against the holdout, if available and the holdout is unlocked, according to the project metric.

holdout_status : string or None

the status of the holdout score, e.g. “COMPLETED”, “HOLDOUT_BOUNDARIES_EXCEEDED”. Unavailable if the holdout fold was disabled in the partitioning configuration.

monotonic_increasing_featurelist_id : str

optional, the id of the featurelist that defines the set of features with a monotonically increasing relationship to the target. If None, no such constraints are enforced.

monotonic_decreasing_featurelist_id : str

optional, the id of the featurelist that defines the set of features with a monotonically decreasing relationship to the target. If None, no such constraints are enforced.

supports_monotonic_constraints : bool

optional, whether this model supports enforcing monotonic constraints

is_starred : bool

whether this model is marked as starred

prediction_threshold : float

for binary classification projects, the threshold used for predictions

prediction_threshold_read_only : bool

indicates whether modification of the prediction threshold is forbidden. Threshold modification is forbidden once a model has had a deployment created or predictions made via the dedicated prediction API.

effective_feature_derivation_window_start : int or None

(New in v2.16) For time series projects only. How many units of the windows_basis_unit into the past relative to the forecast point the user needs to provide history for at prediction time. This can differ from the feature_derivation_window_start set on the project due to the differencing method and period selected, or if the model is a time series native model such as ARIMA. Will be a negative integer in time series projects and None otherwise.

effective_feature_derivation_window_end : int or None

(New in v2.16) For time series projects only. How many units of the windows_basis_unit into the past relative to the forecast point the feature derivation window should end. Will be a non-positive integer in time series projects and None otherwise.

forecast_window_start : int or None

(New in v2.16) For time series projects only. How many units of the windows_basis_unit into the future relative to the forecast point the forecast window should start. Note that this field will be the same as what is shown in the project settings. Will be a non-negative integer in time series projects and None otherwise.

forecast_window_end : int or None

(New in v2.16) For time series projects only. How many units of the windows_basis_unit into the future relative to the forecast point the forecast window should end. Note that this field will be the same as what is shown in the project settings. Will be a non-negative integer in time series projects and None otherwise.

windows_basis_unit : str or None

(New in v2.16) For time series projects only. Indicates which unit is the basis for the feature derivation window and the forecast window. Note that this field will be the same as what is shown in the project settings. In time series projects, will be either the detected time unit or “ROW”, and None otherwise.

model_number : integer

model number assigned to a model

parent_model_id : str or None

(New in version v2.20) the id of the model that tuning parameters are derived from

use_project_settings : bool or None

(New in version v2.20) If True, indicates that the custom backtest partitioning settings specified by the user were used to train the model and evaluate backtest scores.

classmethod get(project, model_id)

Retrieve a specific datetime model

If the project does not use datetime partitioning, a ClientError will occur.

Parameters:
project : str

the id of the project the model belongs to

model_id : str

the id of the model to retrieve

Returns:
model : DatetimeModel

the model

score_backtests()

Compute the scores for all available backtests

Some backtests may be unavailable if the model is trained into their validation data.

Returns:
job : Job

a job tracking the backtest computation. When it is complete, all available backtests will have scores computed.
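
Example (a hedged sketch; datetime_model is assumed to be a previously retrieved DatetimeModel, and re-fetching it after the job finishes to refresh its metrics is an assumption):

from datarobot.models import DatetimeModel

job = datetime_model.score_backtests()
job.wait_for_completion()
datetime_model = DatetimeModel.get(datetime_model.project_id, datetime_model.id)
print(datetime_model.metrics)  # includes per-backtest scores once computed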

cross_validate()

Inherited from Model - DatetimeModels cannot request Cross Validation.

Use score_backtests instead.

get_cross_validation_scores(partition=None, metric=None)

Inherited from Model - DatetimeModels cannot request Cross Validation scores.

Use backtests instead.

request_training_predictions(data_subset)

Start a job to build training predictions

Parameters:
data_subset : str

data set definition to build predictions on. Choices are:

  • dr.enums.DATA_SUBSET.HOLDOUT for holdout data set only
  • dr.enums.DATA_SUBSET.ALL_BACKTESTS for downloading the predictions for all
    backtest validation folds. Requires the model to have successfully scored all backtests.
Returns:
Job

an instance of created async job

get_series_accuracy_as_dataframe(offset=0, limit=100, metric=None, multiseries_value=None, order_by=None, reverse=False)

Retrieve the Series Accuracy for the specified model as a pandas.DataFrame.

Parameters:
offset : int, optional

The number of results to skip. Defaults to 0 if not specified.

limit : int, optional

The maximum number of results to return. Defaults to 100 if not specified.

metric : str, optional

The name of the metric to retrieve scores for. If omitted, the default project metric will be used.

multiseries_value : str, optional

If specified, only the series containing the given value in one of the series ID columns will be returned.

order_by : str, optional

Used for sorting the series. Attribute must be one of datarobot.enums.SERIES_ACCURACY_ORDER_BY.

reverse : bool, optional

Used for sorting the series. If True, will sort the series in descending order by the attribute specified by order_by.

Returns:
data

A pandas.DataFrame with the Series Accuracy for the specified model.

download_series_accuracy_as_csv(filename, encoding='utf-8', offset=0, limit=100, metric=None, multiseries_value=None, order_by=None, reverse=False)

Save the Series Accuracy for the specified model into a csv file.

Parameters:
filename : str or file object

The path or file object to save the data to.

encoding : str, optional

A string representing the encoding to use in the output csv file. Defaults to ‘utf-8’.

offset : int, optional

The number of results to skip. Defaults to 0 if not specified.

limit : int, optional

The maximum number of results to return. Defaults to 100 if not specified.

metric : str, optional

The name of the metric to retrieve scores for. If omitted, the default project metric will be used.

multiseries_value : str, optional

If specified, only the series containing the given value in one of the series ID columns will be returned.

order_by : str, optional

Used for sorting the series. Attribute must be one of datarobot.enums.SERIES_ACCURACY_ORDER_BY.

reverse : bool, optional

Used for sorting the series. If True, will sort the series in descending order by the attribute specified by order_by.

compute_series_accuracy()

Compute the Series Accuracy for this model

Returns:
Job

an instance of the created async job
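
A sketch tying compute_series_accuracy to the retrieval methods above: compute the Series Accuracy, wait for the job, then save the results to a csv file. The filename and metric are placeholders.

job = model.compute_series_accuracy()
job.wait_for_completion()

model.download_series_accuracy_as_csv('series_accuracy.csv', metric='RMSE')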

retrain(time_window_sample_pct=None, featurelist_id=None, training_row_count=None, training_duration=None, training_start_date=None, training_end_date=None)

Submit a job to the queue to retrain this model with the specified training settings.

All durations should be specified with a duration string such as those returned by the partitioning_methods.construct_duration_string helper method. Please see datetime partitioned project documentation for more information on duration strings.

Parameters:
featurelist_id : str, optional

The featurelist id

training_row_count : int, optional

The number of rows to use to train the model. If this parameter is used, then sample_pct should not be given.

time_window_sample_pct : int, optional

An int between 1 and 99 indicating the percentage of sampling within the time window. The points kept are determined by a random uniform sample. If specified, training_row_count must not be specified and training_duration or training_start_date and training_end_date must be specified.

training_duration : str, optional

A duration string representing the training duration for the submitted model. If specified then training_row_count must not be specified.

training_start_date : str, optional

A datetime string representing the start date of the data to use for training this model. If specified, training_end_date must also be specified. The value must be before the training_end_date value.

training_end_date : str, optional

A datetime string representing the end date of the data to use for training this model. If specified, training_start_date must also be specified. The value must be after the training_start_date value.

Returns:
job : ModelJob

The created job that is retraining the model
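
A hedged sketch of retraining on a six-month window using the construct_duration_string helper mentioned above; the import path follows the partitioning documentation and the duration is illustrative.

from datarobot.helpers.partitioning_methods import construct_duration_string

# retrain the model on the most recent six months of training data
duration = construct_duration_string(months=6)
job = model.retrain(training_duration=duration)
retrained_model = job.get_result_when_complete()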

get_feature_effect_metadata()

Retrieve Feature Effect metadata for each backtest. Response contains status and available sources for each backtest of the model.

  • Each backtest is available for training and validation
  • If holdout is configured for the project it has holdout as backtestIndex. It has training and holdout sources available.

Start/stop models contain a single response item with startstop value for backtestIndex.

  • Feature Effect for training is always available (except for old projects that support only Feature Effect for validation).
  • When a model is trained into validation or holdout without stacked predictions (i.e. no out-of-sample predictions in validation or holdout), Feature Effect is not available for validation or holdout.
  • Feature Effect for holdout is not available when there is no holdout configured for the project.

source is a required parameter for retrieving Feature Effect; one of the provided sources must be used.

backtestIndex is a required parameter for submitting a compute request and retrieving Feature Effect; one of the provided backtest indexes must be used.

Returns:
feature_effect_metadata : FeatureEffectMetadataDatetime

get_feature_fit_metadata()

Retrieve Feature Fit metadata for each backtest. Response contains status and available sources for each backtest of the model.

  • Each backtest is available for training and validation
  • If holdout is configured for the project it has holdout as backtestIndex. It has training and holdout sources available.

Start/stop models contain a single response item with startstop value for backtestIndex.

  • Feature Fit for training is always available (except for old projects that support only Feature Fit for validation).
  • When a model is trained into validation or holdout without stacked predictions (i.e. no out-of-sample predictions in validation or holdout), Feature Fit is not available for validation or holdout.
  • Feature Fit for holdout is not available when there is no holdout configured for the project.

source is a required parameter for retrieving Feature Fit; one of the provided sources must be used.

backtestIndex is a required parameter for submitting a compute request and retrieving Feature Fit; one of the provided backtest indexes must be used.

Returns:
feature_fit_metadata : FeatureFitMetadataDatetime

request_feature_effect(backtest_index)

Request feature effects to be computed for the model.

See get_feature_effect for more information on the result of the job.

See get_feature_effect_metadata for retrieving information of backtest_index.

Parameters:
backtest_index: string, FeatureEffectMetadataDatetime.backtest_index.

The backtest index to retrieve Feature Effects for.

Returns:
job : Job

A Job representing the feature effect computation. To get the completed feature effect data, use job.get_result or job.get_result_when_complete.

Raises:
JobAlreadyRequested (422)

If the feature effects have already been requested.

get_feature_effect(source, backtest_index)

Retrieve Feature Effects for the model.

Feature Effects provides partial dependence and predicted vs actual values for the top 500 features, ordered by feature impact score.

The partial dependence shows marginal effect of a feature on the target variable after accounting for the average effects of all other predictive features. It indicates how, holding all other variables except the feature of interest as they were, the value of this feature affects your prediction.

Requires that Feature Effects has already been computed with request_feature_effect.

See get_feature_effect_metadata for retrieving information of source, backtest_index.

Parameters:
source: string

The source to retrieve Feature Effects for. One of the values in FeatureEffectMetadataDatetime.sources; use get_feature_effect_metadata to retrieve the available sources for Feature Effects.

backtest_index: string, FeatureEffectMetadataDatetime.backtest_index.

The backtest index to retrieve Feature Effects for.

Returns:
feature_effects: FeatureEffects

The feature effects data.

Raises:
ClientError (404)

If the feature effects have not been computed or the source is not a valid value.

get_or_request_feature_effect(source, backtest_index, max_wait=600)

Retrieve feature effect for the model, requesting a job if it hasn’t been run previously

See get_feature_effect_metadata for retrieving information of source, backtest_index.

Parameters:
max_wait : int, optional

The maximum time, in seconds, to wait for a requested feature effect job to complete before erroring

source : string

The source to retrieve Feature Effects for. One of the values in FeatureEffectMetadataDatetime.sources; use get_feature_effect_metadata to retrieve the available sources for Feature Effects.

backtest_index: string, FeatureEffectMetadataDatetime.backtest_index.

The backtest index to retrieve Feature Effects for.

Returns:
feature_effects : FeatureEffects

The feature effects data.
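
A sketch of the Feature Effects workflow on a datetime model: read the metadata to find the valid backtest indexes and sources, then request or fetch the computed values. The 'training' source and '0' backtest index below are placeholders and should be taken from the metadata for your model.

fe_meta = model.get_feature_effect_metadata()

# choose a backtest index and source reported as available in fe_meta;
# the literal values here are placeholders
feature_effects = model.get_or_request_feature_effect(
    source='training', backtest_index='0', max_wait=600
)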

request_feature_fit(backtest_index)

Request feature fit to be computed for the model.

See get_feature_fit for more information on the result of the job.

See get_feature_fit_metadata for retrieving information of backtest_index.

Parameters:
backtest_index: string, FeatureFitMetadataDatetime.backtest_index.

The backtest index to retrieve Feature Fit for.

Returns:
job : Job

A Job representing the feature fit computation. To get the completed feature fit data, use job.get_result or job.get_result_when_complete.

Raises:
JobAlreadyRequested (422)

If the feature fit has already been requested.

get_feature_fit(source, backtest_index)

Retrieve Feature Fit for the model.

Feature Fit provides partial dependence and predicted vs actual values for the top 500 features, ordered by feature impact score.

The partial dependence shows marginal effect of a feature on the target variable after accounting for the average effects of all other predictive features. It indicates how, holding all other variables except the feature of interest as they were, the value of this feature affects your prediction.

Requires that Feature Fit has already been computed with request_feature_fit.

See get_feature_fit_metadata for retrieving information of source, backtest_index.

Parameters:
source: string

The source to retrieve Feature Fit for. One of the values in FeatureFitMetadataDatetime.sources; use get_feature_fit_metadata to retrieve the available sources for Feature Fit.

backtest_index: string, FeatureFitMetadataDatetime.backtest_index.

The backtest index to retrieve Feature Fit for.

Returns:
feature_fit: FeatureFit

The feature fit data.

Raises:
ClientError (404)

If the feature fit has not been computed or the source is not a valid value.

get_or_request_feature_fit(source, backtest_index, max_wait=600)

Retrieve feature fit for the model, requesting a job if it hasn’t been run previously

See get_feature_fit_metadata for retrieving information of source, backtest_index.

Parameters:
max_wait : int, optional

The maximum time, in seconds, to wait for a requested feature fit job to complete before erroring

source : string

The source to retrieve Feature Fit for. One of the values in FeatureFitMetadataDatetime.sources; use get_feature_fit_metadata to retrieve the available sources for Feature Fit.

backtest_index: string, FeatureFitMetadataDatetime.backtest_index.

The backtest index to retrieve Feature Fit for.

Returns:
feature_fit : FeatureFit

The feature fit data.

calculate_prediction_intervals(prediction_intervals_size)

Calculate prediction intervals for this DatetimeModel for the specified size.

New in version v2.19.

Parameters:
prediction_intervals_size : int

The prediction intervals size to calculate for this model. See the prediction intervals documentation for more information.

Returns:
job : Job

a Job tracking the prediction intervals computation

get_calculated_prediction_intervals(offset=None, limit=None)

Retrieve a list of already-calculated prediction intervals for this model

New in version v2.19.

Parameters:
offset : int, optional

If provided, this many results will be skipped

limit : int, optional

If provided, at most this many results will be returned. If not provided, at most 100 results will be returned.

Returns:
list[int]

A descending-ordered list of already-calculated prediction interval sizes
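
For example, to calculate 80% prediction intervals and then list every interval size already computed for the model (the size is illustrative):

job = model.calculate_prediction_intervals(80)
job.wait_for_completion()

# a descending-ordered list of sizes, e.g. [80]
sizes = model.get_calculated_prediction_intervals()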

advanced_tune(params, description=None)

Generate a new model with the specified advanced-tuning parameters

As of v2.17, all models other than blenders, open source, prime, scaleout, baseline and user-created support Advanced Tuning.

Parameters:
params : dict

Mapping of parameter ID to parameter value. The list of valid parameter IDs for a model can be found by calling get_advanced_tuning_parameters(). This endpoint does not need to include values for all parameters. If a parameter is omitted, its current_value will be used.

description : unicode

Human-readable string describing the newly advanced-tuned model

Returns:
ModelJob

The created job to build the model
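
A hedged sketch of advanced tuning; it assumes the parameter listing uses the key names documented under get_advanced_tuning_parameters below, and the parameter name and new value are placeholders.

tuning = model.get_advanced_tuning_parameters()

# look up the opaque id of the parameter to override (the name is a placeholder)
param_id = next(
    p['parameterId']
    for p in tuning['tuningParameters']
    if p['parameterName'] == 'learning_rate'
)

job = model.advanced_tune({param_id: 0.05}, description='lower learning rate')
tuned_model = job.get_result_when_complete()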

delete()

Delete a model from the project’s leaderboard.

download_export(filepath)

Download an exportable model file for use in an on-premise DataRobot standalone prediction environment.

This function can only be used if model export is enabled, and will only be useful if you have an on-premise environment in which to import it.

Parameters:
filepath : str

The path at which to save the exported model file.

download_scoring_code(file_name, source_code=False)

Download scoring code JAR.

Parameters:
file_name : str

File path where scoring code will be saved.

source_code : bool, optional

Set to True to download source code archive. It will not be executable.
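
For example, to save an exportable model file and a scoring code JAR to local paths (the filenames are placeholders, and each call requires the corresponding feature to be enabled for the project):

model.download_export('datetime_model.drmodel')
model.download_scoring_code('datetime_model_scoring.jar')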

classmethod fetch_resource_data(url, join_endpoint=True)

(Deprecated.) Used to acquire model data directly from its url.

Consider using get instead, as this is a convenience function used for development of datarobot.

Parameters:
url : str

The resource we are acquiring

join_endpoint : boolean, optional

Whether the client’s endpoint should be joined to the URL before sending the request. Location headers are returned as absolute locations, so they will not need the endpoint.

Returns:
model_data : dict

The queried model’s data

get_advanced_tuning_parameters()

Get the advanced-tuning parameters available for this model.

As of v2.17, all models other than blenders, open source, prime, scaleout, baseline and user-created support Advanced Tuning.

Returns:
dict

A dictionary describing the advanced-tuning parameters for the current model. There are two top-level keys, tuningDescription and tuningParameters.

tuningDescription is an optional value. If not None, it indicates the user-specified description of this set of tuning parameters.

tuningParameters is a list of dicts, each of which has the following keys:

  • parameterName : (unicode) name of the parameter (unique per task, see below)
  • parameterId : (unicode) opaque ID string uniquely identifying parameter
  • defaultValue : (*) default value of the parameter for the blueprint
  • currentValue : (*) value of the parameter that was used for this model
  • taskName : (unicode) name of the task that this parameter belongs to
  • constraints: (dict) see the notes below

Notes

The type of defaultValue and currentValue is defined by the constraints structure. It will be a string or numeric Python type.

constraints is a dict with at least one, possibly more, of the following keys. The presence of a key indicates that the parameter may take on the specified type. (If a key is absent, this means that the parameter may not take on the specified type.) If a key on constraints is present, its value will be a dict containing all of the fields described below for that key.

"constraints": {
    "select": {
        "values": [<list(basestring or number) : possible values>]
    },
    "ascii": {},
    "unicode": {},
    "int": {
        "min": <int : minimum valid value>,
        "max": <int : maximum valid value>,
        "supports_grid_search": <bool : True if Grid Search may be
                                        requested for this param>
    },
    "float": {
        "min": <float : minimum valid value>,
        "max": <float : maximum valid value>,
        "supports_grid_search": <bool : True if Grid Search may be
                                        requested for this param>
    },
    "intList": {
        "length": {
        "min_length": <int : minimum valid length>,
        "max_length": <int : maximum valid length>
        "min_val": <int : minimum valid value>,
        "max_val": <int : maximum valid value>
        "supports_grid_search": <bool : True if Grid Search may be
                                        requested for this param>
    },
    "floatList": {
        "min_length": <int : minimum valid length>,
        "max_length": <int : maximum valid length>
        "min_val": <float : minimum valid value>,
        "max_val": <float : maximum valid value>
        "supports_grid_search": <bool : True if Grid Search may be
                                        requested for this param>
    }
}

The keys have meaning as follows:

  • select: Rather than specifying a specific data type, if present, it indicates that the parameter is permitted to take on any of the specified values. Listed values may be of any string or real (non-complex) numeric type.
  • ascii: The parameter may be a unicode object that encodes simple ASCII characters. (A-Z, a-z, 0-9, whitespace, and certain common symbols.) In addition to listed constraints, ASCII keys currently may not contain either newlines or semicolons.
  • unicode: The parameter may be any Python unicode object.
  • int: The value may be an object of type int within the specified range (inclusive). Please note that the value will be passed around using the JSON format, and some JSON parsers have undefined behavior with integers outside of the range [-(2**53)+1, (2**53)-1].
  • float: The value may be an object of type float within the specified range (inclusive).
  • intList, floatList: The value may be a list of int or float objects, respectively, following constraints as specified respectively by the int and float types (above).

Many parameters only specify one key under constraints. If a parameter specifies multiple keys, the parameter may take on any value permitted by any key.
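
As an illustration of reading the constraints structure above, the following hedged sketch prints, for each tuning parameter, the select values or numeric range it accepts; it assumes the key names shown above.

params = model.get_advanced_tuning_parameters()['tuningParameters']
for param in params:
    constraints = param['constraints']
    if 'select' in constraints:
        print(param['parameterName'], 'accepts', constraints['select']['values'])
    elif 'int' in constraints:
        bounds = constraints['int']
        print(param['parameterName'], 'accepts ints in',
              (bounds['min'], bounds['max']))
    elif 'float' in constraints:
        bounds = constraints['float']
        print(param['parameterName'], 'accepts floats in',
              (bounds['min'], bounds['max']))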

get_all_confusion_charts(fallback_to_parent_insights=False)

Retrieve a list of all confusion charts available for the model.

Parameters:
fallback_to_parent_insights : bool

(New in version v2.14) Optional, if True, this will return confusion chart data for this model’s parent for any source that is not available for this model and if this model has a defined parent model. If omitted or False, or this model has no parent, this will not attempt to retrieve any data from this model’s parent.

Returns:
list of ConfusionChart

Data for all available confusion charts for model.

get_all_lift_charts(fallback_to_parent_insights=False)

Retrieve a list of all lift charts available for the model.

Parameters:
fallback_to_parent_insights : bool

(New in version v2.14) Optional, if True, this will return lift chart data for this model’s parent for any source that is not available for this model and if this model has a defined parent model. If omitted or False, or this model has no parent, this will not attempt to retrieve any data from this model’s parent.

Returns:
list of LiftChart

Data for all available model lift charts.

get_all_residuals_charts(fallback_to_parent_insights=False)

Retrieve a list of all residuals charts available for the model.

Parameters:
fallback_to_parent_insights : bool

Optional, if True, this will return residuals chart data for this model’s parent for any source that is not available for this model and if this model has a defined parent model. If omitted or False, or this model has no parent, this will not attempt to retrieve any data from this model’s parent.

Returns:
list of ResidualsChart

Data for all available model residuals charts.

get_all_roc_curves(fallback_to_parent_insights=False)

Retrieve a list of all ROC curves available for the model.

Parameters:
fallback_to_parent_insights : bool

(New in version v2.14) Optional, if True, this will return ROC curve data for this model’s parent for any source that is not available for this model and if this model has a defined parent model. If omitted or False, or this model has no parent, this will not attempt to retrieve any data from this model’s parent.

Returns:
list of RocCurve

Data for all available model ROC curves.
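
For example, to list the data sources for which ROC curves and lift charts have been computed (the source attribute on the returned chart objects is assumed here; see the RocCurve and LiftChart documentation):

roc_curves = model.get_all_roc_curves()
lift_charts = model.get_all_lift_charts()

print([roc.source for roc in roc_curves])
print([lift.source for lift in lift_charts])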

get_confusion_chart(source, fallback_to_parent_insights=False)

Retrieve model’s confusion chart for the specified source.

Parameters:
source : str

Confusion chart source. Check datarobot.enums.CHART_DATA_SOURCE for possible values.

fallback_to_parent_insights : bool

(New in version v2.14) Optional, if True, this will return confusion chart data for this model’s parent if the confusion chart is not available for this model and this model has a defined parent model. If omitted or False, or there is no parent model, this will not attempt to return insight data from this model’s parent.

Returns:
ConfusionChart

Model ConfusionChart data

Raises:
ClientError

If the insight is not available for this model

get_feature_impact(with_metadata=False)

Retrieve the computed Feature Impact results, a measure of the relevance of each feature in the model.

Feature Impact is computed for each column by creating new data with that column randomly permuted (but the others left unchanged), and seeing how the error metric score for the predictions is affected. The ‘impactUnnormalized’ is how much worse the error metric score is when making predictions on this modified data. The ‘impactNormalized’ is normalized so that the largest value is 1. In both cases, larger values indicate more important features.

If a feature is a redundant feature, i.e. once other features are co