API Reference

Advanced Options

class datarobot.helpers.AdvancedOptions(weights=None, response_cap=None, blueprint_threshold=None, seed=None, smart_downsampled=False, majority_downsampling_rate=None, offset=None, exposure=None, accuracy_optimized_mb=None, scaleout_modeling_mode=None, events_count=None, monotonic_increasing_featurelist_id=None, monotonic_decreasing_featurelist_id=None, only_include_monotonic_blueprints=None)

Used when setting the target of a project to specify advanced options for the modeling process.

Parameters:
weights : string, optional

The name of a column indicating the weight of each row

response_cap : float in [0.5, 1), optional

Quantile of the response distribution to use for response capping.

blueprint_threshold : int, optional

Number of hours models are permitted to run before being excluded from later autopilot stages. Minimum 1.

seed : int

a seed to use for randomization

smart_downsampled : bool

whether to use smart downsampling to throw away excess rows of the majority class. Only applicable to classification and zero-boosted regression projects.

majority_downsampling_rate : float

the percentage between 0 and 100 of the majority rows that should be kept. Specify only if using smart downsampling. May not cause the majority class to become smaller than the minority class.

offset : list of str, optional

(New in version v2.6) the list of the names of the columns containing the offset of each row

exposure : string, optional

(New in version v2.6) the name of a column containing the exposure of each row

accuracy_optimized_mb : bool, optional

(New in version v2.6) Include additional, longer-running models that will be run by the autopilot and available to run manually.

scaleout_modeling_mode : string, optional

(New in version v2.8) Specifies the behavior of Scaleout models for the project. This is one of datarobot.enums.SCALEOUT_MODELING_MODE. If datarobot.enums.SCALEOUT_MODELING_MODE.DISABLED, no models will run during autopilot or show in the list of available blueprints. Scaleout models must be disabled for some partitioning settings, including projects using datetime partitioning or projects using offset or exposure columns. If datarobot.enums.SCALEOUT_MODELING_MODE.REPOSITORY_ONLY, scaleout models will be in the list of available blueprints but not run during autopilot. If datarobot.enums.SCALEOUT_MODELING_MODE.AUTOPILOT, scaleout models will run during autopilot and be in the list of available blueprints. Scaleout models are only supported in the Hadoop environment with the corresponding user permission set.

events_count : string, optional

(New in version v2.8) the name of a column specifying events count.

monotonic_increasing_featurelist_id : string, optional

(New in version v2.11) the id of the featurelist that defines the set of features with a monotonically increasing relationship to the target. If None, no such constraints are enforced. When specified, this will set a default for the project that can be overridden at model submission time if desired.

monotonic_decreasing_featurelist_id : string, optional

(New in version v2.11) the id of the featurelist that defines the set of features with a monotonically decreasing relationship to the target. If None, no such constraints are enforced. When specified, this will set a default for the project that can be overridden at model submission time if desired.

only_include_monotonic_blueprints : bool, optional

(New in version v2.11) when true, only blueprints that support enforcing monotonic constraints will be available in the project or selected for the autopilot.

Examples

import datarobot as dr
advanced_options = dr.AdvancedOptions(
    weights='weights_column',
    offset=['offset_column'],
    exposure='exposure_column',
    response_cap=0.7,
    blueprint_threshold=2,
    smart_downsampled=True, majority_downsampling_rate=75.0)

Blueprint

class datarobot.models.Blueprint(id=None, processes=None, model_type=None, project_id=None, blueprint_category=None, monotonic_increasing_featurelist_id=None, monotonic_decreasing_featurelist_id=None, supports_monotonic_constraints=None)

A Blueprint which can be used to fit models

Attributes:
id : str

the id of the blueprint

processes : list of str

the processes used by the blueprint

model_type : str

the model produced by the blueprint

project_id : str

the project the blueprint belongs to

blueprint_category : str

(New in version v2.6) Describes the category of the blueprint and the kind of model it produces.

classmethod get(project_id, blueprint_id)

Retrieve a blueprint.

Parameters:
project_id : str

The project’s id.

blueprint_id : str

Id of blueprint to retrieve.

Returns:
blueprint : Blueprint

The queried blueprint.
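
Examples

A minimal sketch of retrieving a blueprint from a project's repository; the project id and the reported model_type are placeholders:

import datarobot as dr

project = dr.Project.get('5506fcd38bd88f5953219da0')
blueprint_id = project.get_blueprints()[0].id
blueprint = dr.models.Blueprint.get(project.id, blueprint_id)
blueprint.model_type
>>> 'Gradient Boosted Trees Classifier'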

get_chart()

Retrieve a chart.

Returns:
BlueprintChart

The current blueprint chart.

get_documents()

Get documentation for tasks used in the blueprint.

Returns:
list of BlueprintTaskDocument

All documents available for blueprint.

class datarobot.models.BlueprintTaskDocument(title=None, task=None, description=None, parameters=None, links=None, references=None)

Document describing a task from a blueprint.

Attributes:
title : str

Title of document.

task : str

Name of the task described in document.

description : str

Task description.

parameters : list of dict(name, type, description)

Parameters that task can receive in human-readable format.

links : list of dict(name, url)

External links used in document

references : list of dict(name, url)

References used in the document. When no link is available, url equals None.

class datarobot.models.BlueprintChart(nodes, edges)

A Blueprint chart that can be used to understand data flow in blueprint.

Attributes:
nodes : list of dict (id, label)

Chart nodes, id unique in chart.

edges : list of tuple (id1, id2)

Directions of data flow between blueprint chart nodes.

classmethod get(project_id, blueprint_id)

Retrieve a blueprint chart.

Parameters:
project_id : str

The project’s id.

blueprint_id : str

Id of blueprint to retrieve chart.

Returns:
BlueprintChart

The queried blueprint chart.

to_graphviz()

Get blueprint chart in graphviz DOT format.

Returns:
unicode

String representation of chart in graphviz DOT language.
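
Examples

A short sketch of rendering a blueprint chart as DOT; the ids are placeholders and the output is abbreviated:

import datarobot as dr

chart = dr.models.BlueprintChart.get('5506fcd38bd88f5953219da0', '5506fcd98bd88f1641a720a3')
print(chart.to_graphviz())
>>> digraph "Blueprint Chart" { ... }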

class datarobot.models.ModelBlueprintChart(nodes, edges)

A Blueprint chart that can be used to understand data flow in a model. A model blueprint chart represents the reduced repository blueprint chart, containing only the elements used to build this particular model.

Attributes:
nodes : list of dict (id, label)

Chart nodes, id unique in chart.

edges : list of tuple (id1, id2)

Directions of data flow between blueprint chart nodes.

classmethod get(project_id, model_id)

Retrieve a model blueprint chart.

Parameters:
project_id : str

The project’s id.

model_id : str

Id of model to retrieve model blueprint chart.

Returns:
ModelBlueprintChart

The queried model blueprint chart.

to_graphviz()

Get blueprint chart in graphviz DOT format.

Returns:
unicode

String representation of chart in graphviz DOT language.

Calendar File

class datarobot.CalendarFile(calendar_end_date=None, calendar_start_date=None, created=None, id=None, name=None, num_event_types=None, num_events=None, project_ids=None, role=None)

Represents the data for a calendar file

Attributes:
id : str

The id of the calendar file.

calendar_start_date : str

The earliest date in the calendar.

calendar_end_date : str

The last date in the calendar.

created : str

The date this calendar was created, i.e. uploaded to DataRobot.

name : str

The name of the calendar.

num_event_types : int

The number of different event types.

num_events : int

The number of events this calendar has.

project_ids : list of strings

A list containing the projectIds of the projects using this calendar.

role : str

The access role the user has for this calendar.

classmethod create(file_path, calendar_name=None)

Creates a calendar using the given file. The provided file must be a CSV in the format:

Date,   Event
<date>, <event_type>
<date>, <event_type>

A header row is required.

Parameters:
file_path : string

A string representing a path to a local csv file.

calendar_name : string, optional

A name to assign to the calendar. Defaults to the name of the file if not provided.

Returns:
calendar_file : CalendarFile

Instance with initialized data.

Raises:
AsyncProcessUnsuccessfulError

Raised if there was an error processing the provided calendar file.

Examples

# Creating a calendar with a specified name
cal = dr.CalendarFile.create('/home/calendars/somecalendar.csv',
                                         calendar_name='Some Calendar Name')
cal.id
>>> 5c1d4904211c0a061bc93013
cal.name
>>> Some Calendar Name

# Creating a calendar without specifying a name
cal = dr.CalendarFile.create('/home/calendars/somecalendar.csv')
cal.id
>>> 5c1d4904211c0a061bc93012
cal.name
>>> somecalendar.csv
classmethod get(calendar_id)

Gets the details of a calendar, given the id.

Parameters:
calendar_id : str

The identifier of the calendar.

Returns:
calendar_file : CalendarFile

The requested calendar.

Raises:
DataError

Raised if the calendar_id is invalid, i.e. the specified CalendarFile does not exist.

Examples

cal = dr.CalendarFile.get(some_calendar_id)
cal.id
>>> some_calendar_id
classmethod list(project_id=None, batch_size=None)

Gets the details of all calendars this user has view access for.

Parameters:
project_id : str, optional

If provided, will filter for calendars associated only with the specified project.

batch_size : int, optional

The number of calendars to retrieve in a single API call. If specified, the client may make multiple calls to retrieve the full list of calendars. If not specified, an appropriate default will be chosen by the server.

Returns:
calendar_list : list of CalendarFile

A list of CalendarFile objects.

Examples

calendars = dr.CalendarFile.list()
len(calendars)
>>> 10
classmethod delete(calendar_id)

Deletes the calendar specified by calendar_id.

Parameters:
calendar_id : str

The id of the calendar to delete. The requester must have OWNER access for this calendar.

Raises:
ClientError

Raised if an invalid calendar_id is provided.

Examples

# Deleting with a valid calendar_id
status_code = dr.CalendarFile.delete(some_calendar_id)
status_code
>>> 204
dr.CalendarFile.get(some_calendar_id)
>>> ClientError: Item not found
classmethod update_name(calendar_id, new_calendar_name)

Changes the name of the specified calendar to the specified name. The requester must have at least READ_WRITE permissions on the calendar.

Parameters:
calendar_id : str

The id of the calendar to update.

new_calendar_name : str

The new name to set for the specified calendar.

Returns:
status_code : int

200 for success

Raises:
ClientError

Raised if an invalid calendar_id is provided.

Examples

response = dr.CalendarFile.update_name(some_calendar_id, some_new_name)
response
>>> 200
cal = dr.CalendarFile.get(some_calendar_id)
cal.name
>>> some_new_name
classmethod share(calendar_id, access_list)

Shares the calendar with the specified users, assigning the specified roles.

Parameters:
calendar_id : str

The id of the calendar to update

access_list : list of SharingAccess

A list of dr.SharingAccess objects. Specify None for the role to delete a user’s access from the specified CalendarFile. For more information on specific access levels, see the sharing documentation.

Returns:
status_code : int

200 for success

Raises:
ClientError

Raised if unable to update permissions for a user.

AssertionError

Raised if access_list is invalid.

Examples

# assuming some_user is a valid user, share this calendar with some_user
sharing_list = [dr.SharingAccess(some_user_username,
                                 dr.enums.SHARING_ROLE.READ_WRITE)]
response = dr.CalendarFile.share(some_calendar_id, sharing_list)
response.status_code
>>> 200

# delete some_user from this calendar, assuming they have access of some kind already
delete_sharing_list = [dr.SharingAccess(some_user_username,
                                        None)]
response = dr.CalendarFile.share(some_calendar_id, delete_sharing_list)
response.status_code
>>> 200

# Attempt to add an invalid user to a calendar
invalid_sharing_list = [dr.SharingAccess(invalid_username,
                                         dr.enums.SHARING_ROLE.READ_WRITE)]
dr.CalendarFile.share(some_calendar_id, invalid_sharing_list)
>>> ClientError: Unable to update access for this calendar
classmethod get_access_list(calendar_id, batch_size=None)

Retrieve a list of users that have access to this calendar.

Parameters:
calendar_id : str

The id of the calendar to retrieve the access list for.

batch_size : int, optional

The number of access records to retrieve in a single API call. If specified, the client may make multiple calls to retrieve the full list of access records. If not specified, an appropriate default will be chosen by the server.

Returns:
access_control_list : list of SharingAccess

A list of SharingAccess objects.

Raises:
ClientError

Raised if user does not have access to calendar or calendar does not exist.
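
Examples

A minimal sketch of inspecting who can access a calendar; the username shown is a placeholder:

access_list = dr.CalendarFile.get_access_list(some_calendar_id)
[(access.username, access.role) for access in access_list]
>>> [('calendar.owner@example.com', 'OWNER')]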

Compliance Documentation Templates

class datarobot.models.compliance_doc_template.ComplianceDocTemplate(id, creator_id, creator_username, name, org_id=None, sections=None)

A compliance documentation template. Templates are used to customize contents of ComplianceDocumentation.

New in version v2.14.

Notes

Each section dictionary has the following schema:

  • title : title of the section
  • type : type of section. Must be one of “datarobot”, “user” or “table_of_contents”.

Each type of section has a different set of attributes, described below.

Sections of type "datarobot" represent sections owned by DataRobot. DataRobot sections have the following additional attributes:

  • content_id : The identifier of the content in this section. You can get the default template with get_default for a complete list of possible DataRobot section content ids.
  • sections : list of sub-section dicts nested under the parent section.

Sections of type "user" represent sections with user-defined content. These sections may contain user-generated text and have the following additional fields:

  • regularText : regular text of the section, optionally separated by \n to split paragraphs.
  • highlightedText : highlighted text of the section, optionally separated by \n to split paragraphs.
  • sections : list of sub-section dicts nested under the parent section.

Sections of type "table_of_contents" represent a table of contents and have no additional attributes.
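
For illustration, a sections list combining the three section types might look like the following sketch; the content_id value is a placeholder, and real content ids can be found via get_default:

sections = [
    {'type': 'table_of_contents', 'title': 'Table of Contents'},
    {'type': 'datarobot', 'title': 'Model Overview',
     'content_id': 'example_content_id', 'sections': []},
    {'type': 'user', 'title': 'Business Context',
     'regularText': 'First paragraph.\nSecond paragraph.',
     'highlightedText': 'Key assumption to review.',
     'sections': []},
]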

Attributes:
id : str

the id of the template

name : str

the name of the template.

creator_id : str

the id of the user who created the template

creator_username : str

username of the user who created the template

org_id : str

the id of the organization the template belongs to

sections : list of dicts

the sections of the template describing the structure of the document. Section schema is described in Notes section above.

classmethod get_default(template_type=None)

Get a default DataRobot template. This template is used for generating compliance documentation when no template is specified.

Parameters:
template_type : str or None

Type of the template. Currently supported values are “normal” and “time_series”

Returns:
template : ComplianceDocTemplate

the default template object with sections attribute populated with default sections.
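
Examples

A minimal sketch of fetching the default templates and inspecting their section titles:

default_template = ComplianceDocTemplate.get_default()
ts_template = ComplianceDocTemplate.get_default(template_type='time_series')
[section['title'] for section in default_template.sections]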

classmethod create_from_json_file(name, path)

Create a template with the specified name and sections in a JSON file.

This is useful when working with sections in a JSON file. Example:

default_template = ComplianceDocTemplate.get_default()
default_template.sections_to_json_file('path/to/example.json')
# ... edit example.json in your editor
my_template = ComplianceDocTemplate.create_from_json_file(
    name='my template',
    path='path/to/example.json'
)
Parameters:
name : str

the name of the template. Must be unique for your user.

path : str

the path to find the JSON file at

Returns:
template : ComplianceDocTemplate

the created template

classmethod create(name, sections)

Create a template with the specified name and sections.

Parameters:
name : str

the name of the template. Must be unique for your user.

sections : list

list of section objects

Returns:
template : ComplianceDocTemplate

the created template
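
Examples

A minimal sketch of creating a template from user-defined sections; the section content is purely illustrative:

sections = [{'type': 'user',
             'title': 'Overview',
             'regularText': 'This section is filled in by the model author.',
             'highlightedText': '',
             'sections': []}]
template = ComplianceDocTemplate.create(name='my custom template', sections=sections)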

classmethod get(template_id)

Retrieve a specific template.

Parameters:
template_id : str

the id of the template to retrieve

Returns:
template : ComplianceDocTemplate

the retrieved template

classmethod list(name_part=None, limit=None, offset=None)

Get a paginated list of compliance documentation template objects.

Parameters:
name_part : str or None

Return only templates with names matching the specified string. The matching is case-insensitive.

limit : int

The number of records to return. The server will use a (possibly finite) default if not specified.

offset : int

The number of records to skip.

Returns:
templates : list of ComplianceDocTemplate

the list of template objects

sections_to_json_file(path, indent=2)

Save sections of the template to a json file at the specified path

Parameters:
path : str

the path to save the file to

indent : int

indentation to use in the json file.

update(name=None, sections=None)

Update the name or sections of an existing doc template.

Note that default or non-existent templates cannot be updated.

Parameters:
name : str, optional

the new name for the template

sections : list of dicts

list of sections

delete()

Delete the compliance documentation template.

Compliance Documentation

class datarobot.models.compliance_documentation.ComplianceDocumentation(project_id, model_id, template_id=None)

A compliance documentation object.

New in version v2.14.

Examples

doc = ComplianceDocumentation('project-id', 'model-id')
job = doc.generate()
job.wait_for_completion()
doc.download('example.docx')
Attributes:
project_id : str

the id of the project

model_id : str

the id of the model

template_id : str or None

optional id of the template for the generated doc. See documentation for ComplianceDocTemplate for more info.

generate()

Start a job generating model compliance documentation.

Returns:
Job

an instance of an async job

download(filepath)

Download the generated compliance documentation file and save it to the specified path. The generated file has a DOCX format.

Parameters:
filepath : str

A file path, e.g. “/path/to/save/compliance_documentation.docx”

Confusion Chart

class datarobot.models.confusion_chart.ConfusionChart(source, data, source_model_id)

Confusion chart data for a model.

Notes

ClassMetrics is a dict containing the following:

  • class_name (string) name of the class
  • actual_count (int) number of times this class is seen in the validation data
  • predicted_count (int) number of times this class has been predicted for the validation data
  • f1 (float) F1 score
  • recall (float) recall score
  • precision (float) precision score
  • was_actual_percentages (list of dict) one vs all actual percentages in format specified below.
    • other_class_name (string) the name of the other class
    • percentage (float) the percentage of the time the other class was predicted when this class was the actual class (from 0 to 1)
  • was_predicted_percentages (list of dict) one vs all predicted percentages in format specified below.
    • other_class_name (string) the name of the other class
    • percentage (float) the percentage of the time the other class was the actual class when this class was predicted (from 0 to 1)
  • confusion_matrix_one_vs_all (list of list) 2d list representing 2x2 one vs all matrix.
    • This represents the true/false negative/positive counts as integers for each class. The data structure looks like:
    • [ [ True Negative, False Positive ], [ False Negative, True Positive ] ]
Attributes:
source : str

Confusion Chart data source. Can be ‘validation’, ‘crossValidation’ or ‘holdout’.

raw_data : dict

All of the raw data for the Confusion Chart

confusion_matrix : list of list

The NxN confusion matrix

classes : list

The names of each of the classes

class_metrics : list of dicts

List of dicts with schema described as ClassMetrics above.

source_model_id : str

ID of the model this Confusion chart represents; in some cases, insights from the parent of a frozen model may be used
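
Examples

A minimal sketch of reading confusion chart data; it assumes a multiclass model object exposing get_confusion_chart, and the class names and metric values are illustrative:

chart = model.get_confusion_chart('validation')  # model is an existing multiclass Model
chart.classes
>>> ['setosa', 'versicolor', 'virginica']
chart.class_metrics[0]['f1']
>>> 0.97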

Database Connectivity

class datarobot.DataDriver(id=None, creator=None, base_names=None, class_name=None, canonical_name=None)

A data driver

Attributes:
id : str

the id of the driver.

class_name : str

the Java class name for the driver.

canonical_name : str

the user-friendly name of the driver.

creator : str

the id of the user who created the driver.

base_names : list of str

a list of the file name(s) of the jar files.

classmethod list()

Returns a list of available drivers.

Returns:
drivers : list of DataDriver instances

contains a list of available drivers.

Examples

>>> import datarobot as dr
>>> drivers = dr.DataDriver.list()
>>> drivers
[DataDriver('mysql'), DataDriver('RedShift'), DataDriver('PostgreSQL')]
classmethod get(driver_id)

Gets the driver.

Parameters:
driver_id : str

the identifier of the driver.

Returns:
driver : DataDriver

the requested driver.

Examples

>>> import datarobot as dr
>>> driver = dr.DataDriver.get('5ad08a1889453d0001ea7c5c')
>>> driver
DataDriver('PostgreSQL')
classmethod create(class_name, canonical_name, files)

Creates the driver. Only available to admin users.

Parameters:
class_name : str

the Java class name for the driver.

canonical_name : str

the user-friendly name of the driver.

files : list of str

a list of the file paths on the local file system for the driver's jar file(s).

Returns:
driver : DataDriver

the created driver.

Raises:
ClientError

raised if the user has not been granted the Can manage JDBC database drivers feature

Examples

>>> import datarobot as dr
>>> driver = dr.DataDriver.create(
...     class_name='org.postgresql.Driver',
...     canonical_name='PostgreSQL',
...     files=['/tmp/postgresql-42.2.2.jar']
... )
>>> driver
DataDriver('PostgreSQL')
update(class_name=None, canonical_name=None)

Updates the driver. Only available to admin users.

Parameters:
class_name : str

the Java class name for the driver.

canonical_name : str

the user-friendly name of the driver.

Raises:
ClientError

raised if the user has not been granted the Can manage JDBC database drivers feature

Examples

>>> import datarobot as dr
>>> driver = dr.DataDriver.get('5ad08a1889453d0001ea7c5c')
>>> driver.canonical_name
'PostgreSQL'
>>> driver.update(canonical_name='postgres')
>>> driver.canonical_name
'postgres'
delete()

Removes the driver. Only available to admin users.

Raises:
ClientError

raised if the user has not been granted the Can manage JDBC database drivers feature

class datarobot.DataStore(data_store_id=None, data_store_type=None, canonical_name=None, creator=None, updated=None, params=None, role=None)

A data store, representing a database connection.

Attributes:
id : str

the id of the data store.

data_store_type : str

the type of data store.

canonical_name : str

the user-friendly name of the data store.

creator : str

the id of the user who created the data store.

updated : datetime.datetime

the time of the last update

params : DataStoreParameters

an object specifying the data store parameters.

classmethod list()

Returns a list of available data stores.

Returns:
data_stores : list of DataStore instances

contains a list of available data stores.

Examples

>>> import datarobot as dr
>>> data_stores = dr.DataStore.list()
>>> data_stores
[DataStore('Demo'), DataStore('Airlines')]
classmethod get(data_store_id)

Gets the data store.

Parameters:
data_store_id : str

the identifier of the data store.

Returns:
data_store : DataStore

the requested data store.

Examples

>>> import datarobot as dr
>>> data_store = dr.DataStore.get('5a8ac90b07a57a0001be501e')
>>> data_store
DataStore('Demo')
classmethod create(data_store_type, canonical_name, driver_id, jdbc_url)

Creates the data store.

Parameters:
data_store_type : str

the type of data store.

canonical_name : str

the user-friendly name of the data store.

driver_id : str

the identifier of the DataDriver.

jdbc_url : str

the full JDBC url, for example jdbc:postgresql://my.dbaddress.org:5432/my_db.

Returns:
data_store : DataStore

the created data store.

Examples

>>> import datarobot as dr
>>> data_store = dr.DataStore.create(
...     data_store_type='jdbc',
...     canonical_name='Demo DB',
...     driver_id='5a6af02eb15372000117c040',
...     jdbc_url='jdbc:postgresql://my.db.address.org:5432/perftest'
... )
>>> data_store
DataStore('Demo DB')
update(canonical_name=None, driver_id=None, jdbc_url=None)

Updates the data store.

Parameters:
canonical_name : str

optional, the user-friendly name of the data store.

driver_id : str

optional, the identifier of the DataDriver.

jdbc_url : str

optional, the full JDBC url, for example jdbc:postgresql://my.dbaddress.org:5432/my_db.

Examples

>>> import datarobot as dr
>>> data_store = dr.DataStore.get('5ad5d2afef5cd700014d3cae')
>>> data_store
DataStore('Demo DB')
>>> data_store.update(canonical_name='Demo DB updated')
>>> data_store
DataStore('Demo DB updated')
delete()

Removes the DataStore

test(username, password)

Tests database connection.

Parameters:
username : str

the username for database authentication.

password : str

the password for database authentication. The password is encrypted on the server side and never saved or stored.

Returns:
message : dict

message with status.

Examples

>>> import datarobot as dr
>>> data_store = dr.DataStore.get('5ad5d2afef5cd700014d3cae')
>>> data_store.test(username='db_username', password='db_password')
{'message': 'Connection successful'}
schemas(username, password)

Returns a list of available schemas.

Parameters:
username : str

the username for database authentication.

password : str

the password for database authentication. The password is encrypted on the server side and never saved or stored.

Returns:
response : dict

dict with database name and list of str - available schemas

Examples

>>> import datarobot as dr
>>> data_store = dr.DataStore.get('5ad5d2afef5cd700014d3cae')
>>> data_store.schemas(username='db_username', password='db_password')
{'catalog': 'perftest', 'schemas': ['demo', 'information_schema', 'public']}
tables(username, password, schema=None)

Returns a list of the available tables in a schema.

Parameters:
username : str

optional, the username for database authentication.

password : str

optional, the password for database authentication. The password is encrypted on the server side and never saved or stored.

schema : str

optional, the schema name.

Returns:
response : dict

dict with catalog name and tables info

Examples

>>> import datarobot as dr
>>> data_store = dr.DataStore.get('5ad5d2afef5cd700014d3cae')
>>> data_store.tables(username='db_username', password='db_password', schema='demo')
{'tables': [{'type': 'TABLE', 'name': 'diagnosis', 'schema': 'demo'}, {'type': 'TABLE',
'name': 'kickcars', 'schema': 'demo'}, {'type': 'TABLE', 'name': 'patient',
'schema': 'demo'}, {'type': 'TABLE', 'name': 'transcript', 'schema': 'demo'}],
'catalog': 'perftest'}
classmethod from_server_data(data, keep_attrs=None)

Instantiate an object of this class using the data directly from the server, meaning that the keys may have the wrong camel casing

Parameters:
data : dict

The directly translated dict of JSON from the server. No casing fixes have taken place

keep_attrs : list

List of the dotted namespace notations for attributes to keep within the object structure even if their values are None

get_access_list()

Retrieve what users have access to this data store

New in version v2.14.

Returns:
list of SharingAccess
share(access_list)

Modify the ability of users to access this data store

New in version v2.14.

Parameters:
access_list : list of SharingAccess

the modifications to make.

Raises:
datarobot.ClientError :

if you do not have permission to share this data store, if the user you’re sharing with doesn’t exist, if the same user appears multiple times in the access_list, or if these changes would leave the data store without an owner.

Examples

Transfer access to the data store from old_user@datarobot.com to new_user@datarobot.com

import datarobot as dr

new_access = dr.SharingAccess('new_user@datarobot.com',
                              dr.enums.SHARING_ROLE.OWNER, can_share=True)
access_list = [dr.SharingAccess('old_user@datarobot.com', None), new_access]

dr.DataStore.get('my-data-store-id').share(access_list)
class datarobot.DataSource(data_source_id=None, data_source_type=None, canonical_name=None, creator=None, updated=None, params=None, role=None)

A data source, representing a data request against a data store.

Attributes:
data_source_id : str

the id of the data source.

data_source_type : str

the type of data source.

canonical_name : str

the user-friendly name of the data source.

creator : str

the id of the user who created the data source.

updated : datetime.datetime

the time of the last update.

params : DataSourceParameters

an object specifying the data source parameters.

classmethod list()

Returns a list of available data sources.

Returns:
data_sources : list of DataSource instances

contains a list of available data sources.

Examples

>>> import datarobot as dr
>>> data_sources = dr.DataSource.list()
>>> data_sources
[DataSource('Diagnostics'), DataSource('Airlines 100mb'), DataSource('Airlines 10mb')]
classmethod get(data_source_id)

Gets the data source.

Parameters:
data_source_id : str

the identifier of the data source.

Returns:
data_source : DataSource

the requested data source.

Examples

>>> import datarobot as dr
>>> data_source = dr.DataSource.get('5a8ac9ab07a57a0001be501f')
>>> data_source
DataSource('Diagnostics')
classmethod create(data_source_type, canonical_name, params)

Creates the data source.

Parameters:
data_source_type : str

the type of data source.

canonical_name : str

the user-friendly name of the data source.

params : DataSourceParameters

an object specifying the data source parameters.

Returns:
data_source : DataSource

the created data source.

Examples

>>> import datarobot as dr
>>> params = dr.DataSourceParameters(
...     data_store_id='5a8ac90b07a57a0001be501e',
...     query='SELECT * FROM airlines10mb WHERE "Year" >= 1995;'
... )
>>> data_source = dr.DataSource.create(
...     data_source_type='jdbc',
...     canonical_name='airlines stats after 1995',
...     params=params
... )
>>> data_source
DataSource('airlines stats after 1995')
update(canonical_name=None, params=None)

Updates the data source.

Parameters:
canonical_name : str

optional, the user-friendly name of the data source.

params : DataSourceParameters

optional, the updated data source parameters.

Examples

>>> import datarobot as dr
>>> data_source = dr.DataSource.get('5ad840cc613b480001570953')
>>> data_source
DataSource('airlines stats after 1995')
>>> params = dr.DataSourceParameters(
...     query='SELECT * FROM airlines10mb WHERE "Year" >= 1990;'
... )
>>> data_source.update(
...     canonical_name='airlines stats after 1990',
...     params=params
... )
>>> data_source
DataSource('airlines stats after 1990')
delete()

Removes the DataSource

classmethod from_server_data(data, keep_attrs=None)

Instantiate an object of this class using the data directly from the server, meaning that the keys may have the wrong camel casing

Parameters:
data : dict

The directly translated dict of JSON from the server. No casing fixes have taken place

keep_attrs : list

List of the dotted namespace notations for attributes to keep within the object structure even if their values are None

get_access_list()

Retrieve what users have access to this data source

New in version v2.14.

Returns:
list of SharingAccess
share(access_list)

Modify the ability of users to access this data source

New in version v2.14.

Parameters:
access_list : list of SharingAccess

the modifications to make.

Raises:
datarobot.ClientError :

if you do not have permission to share this data source, if the user you’re sharing with doesn’t exist, if the same user appears multiple times in the access_list, or if these changes would leave the data source without an owner

Examples

Transfer access to the data source from old_user@datarobot.com to new_user@datarobot.com

import datarobot as dr

new_access = dr.SharingAccess('new_user@datarobot.com',
                              dr.enums.SHARING_ROLE.OWNER, can_share=True)
access_list = [dr.SharingAccess('old_user@datarobot.com', None), new_access]

dr.DataSource.get('my-data-source-id').share(access_list)
class datarobot.DataSourceParameters(data_store_id=None, table=None, schema=None, partition_column=None, query=None, fetch_size=None)

Data request configuration

Attributes:
data_store_id : str

the id of the DataStore.

table : str

optional, the name of specified database table.

schema : str

optional, the name of the schema associated with the table.

partition_column : str

optional, the name of the partition column.

query : str

optional, the user specified SQL query.

fetch_size : int

optional, a user specified fetch size in the range [1, 20000]. By default a fetchSize will be assigned to balance throughput and memory usage
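
Examples

A minimal sketch of building table-based parameters for a data source; the data store id, schema, and table names reuse the placeholders from the examples above:

>>> import datarobot as dr
>>> params = dr.DataSourceParameters(
...     data_store_id='5a8ac90b07a57a0001be501e',
...     schema='demo',
...     table='kickcars',
...     fetch_size=1000
... )
>>> data_source = dr.DataSource.create(
...     data_source_type='jdbc',
...     canonical_name='kickcars table',
...     params=params
... )
>>> data_source
DataSource('kickcars table')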

Deployment

class datarobot.Deployment(id=None, label=None, description=None, default_prediction_server=None, model=None, capabilities=None, prediction_usage=None, service_health=None, model_health=None, accuracy_health=None)

A deployment created from a DataRobot model.

Attributes:
id : str

the id of the deployment

label : str

the label of the deployment

description : str

the description of the deployment

default_prediction_server : dict

information on the default prediction server of the deployment

model : dict

information on the model of the deployment

capabilities : dict

information on the capabilities of the deployment

prediction_usage : dict

information on the prediction usage of the deployment

service_health : dict

information on the service health of the deployment

model_health : dict

information on the model health of the deployment

accuracy_health : dict

information on the accuracy health of the deployment

classmethod create_from_learning_model(model_id, label, description=None, default_prediction_server_id=None)

Create a deployment from a DataRobot model.

New in version v2.17.

Parameters:
model_id : str

id of the DataRobot model to deploy

label : str

a human readable label of the deployment

description : str, optional

a human readable description of the deployment

default_prediction_server_id : str

an identifier of a prediction server to be used as the default prediction server

Returns:
deployment : Deployment

The created deployment

Examples

from datarobot import Project, Deployment
project = Project.get('5506fcd38bd88f5953219da0')
model = project.get_models()[0]
deployment = Deployment.create_from_learning_model(model.id, 'New Deployment')
deployment
>>> Deployment('New Deployment')
classmethod list()

List all deployments a user can view.

New in version v2.17.

Returns:
deployments : list

a list of deployments the user can view

Examples

from datarobot import Deployment
deployments = Deployment.list()
deployments
>>> [Deployment('New Deployment'), Deployment('Previous Deployment')]
classmethod get(deployment_id)

Get information about a deployment.

New in version v2.17.

Parameters:
deployment_id : str

the id of the deployment

Returns:
deployment : Deployment

the queried deployment

Examples

from datarobot import Deployment
deployment = Deployment.get(deployment_id='5c939e08962d741e34f609f0')
deployment.id
>>>'5c939e08962d741e34f609f0'
deployment.label
>>>'New Deployment'
delete()

Delete this deployment.

New in version v2.17.

replace_model(new_model_id, reason)

Replace the model used in this deployment. To confirm model replacement eligibility, use validate_replacement_model() beforehand.

New in version v2.17.

Model replacement is an asynchronous process, which means some preparatory work may be performed after the initial request is completed. This function will not return until all preparatory work is fully finished.

Predictions made against this deployment will start using the new model as soon as the initial request is completed. There will be no interruption for predictions throughout the process.

Parameters:
new_model_id : str

The id of the new model to use

reason : MODEL_REPLACEMENT_REASON

The reason for the model replacement. Must be one of ‘ACCURACY’, ‘DATA_DRIFT’, ‘ERRORS’, ‘SCHEDULED_REFRESH’, ‘SCORING_SPEED’, or ‘OTHER’. This value will be stored in the model history to keep track of why a model was replaced

Examples

from datarobot import Deployment
from datarobot.enums import MODEL_REPLACEMENT_REASON
deployment = Deployment.get(deployment_id='5c939e08962d741e34f609f0')
deployment.model['id'], deployment.model['type']
>>>('5c0a979859b00004ba52e431', 'Decision Tree Classifier (Gini)')

deployment.replace_model('5c0a969859b00004ba52e41b', MODEL_REPLACEMENT_REASON.ACCURACY)
deployment.model['id'], deployment.model['type']
>>>('5c0a969859b00004ba52e41b', 'Support Vector Classifier (Linear Kernel)')
validate_replacement_model(new_model_id)

Validate a model can be used as the replacement model of the deployment.

New in version v2.17.

Parameters:
new_model_id : str

the id of the new model to validate

Returns:
status : str

status of the validation, will be one of ‘passing’, ‘warning’ or ‘failing’. If the status is passing or warning, use replace_model() to perform a model replacement. If the status is failing, refer to checks for more detail on why the new model cannot be used as a replacement.

message : str

message for the validation result

checks : dict

explain why the new model can or cannot replace the deployment’s current model
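
Examples

A short sketch based on the return values documented above; the ids reuse the placeholders from the replace_model example, and the reported status is illustrative:

from datarobot import Deployment
deployment = Deployment.get(deployment_id='5c939e08962d741e34f609f0')
status, message, checks = deployment.validate_replacement_model('5c0a969859b00004ba52e41b')
status
>>> 'passing'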

get_drift_tracking_settings()

Retrieve drift tracking settings of this deployment.

New in version v2.17.

Returns:
settings : dict

Drift tracking settings of the deployment, containing two nested dicts with keys target_drift and feature_drift, which are further described below.

Target drift setting contains:

enabled : bool

If target drift tracking is enabled for this deployment. To create or update existing target_drift settings, see update_drift_tracking_settings()

Feature drift setting contains:

enabled : bool

If feature drift tracking is enabled for this deployment. To create or update existing feature_drift settings, see update_drift_tracking_settings()

update_drift_tracking_settings(target_drift_enabled=None, feature_drift_enabled=None)

Update drift tracking settings of this deployment.

New in version v2.17.

Updating drift tracking setting is an asynchronous process, which means some preparatory work may be performed after the initial request is completed. This function will not return until all preparatory work is fully finished.

Parameters:
target_drift_enabled : bool, optional

if target drift tracking is to be turned on

feature_drift_enabled : bool, optional

if feature drift tracking is to be turned on
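
Examples

A minimal sketch of enabling drift tracking and reading the settings back; the deployment id is a placeholder and the returned dict is abbreviated to the documented keys:

from datarobot import Deployment
deployment = Deployment.get(deployment_id='5c939e08962d741e34f609f0')
deployment.update_drift_tracking_settings(target_drift_enabled=True, feature_drift_enabled=True)
deployment.get_drift_tracking_settings()
>>> {'target_drift': {'enabled': True}, 'feature_drift': {'enabled': True}}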

Feature

class datarobot.models.Feature(id, project_id=None, name=None, feature_type=None, importance=None, low_information=None, unique_count=None, na_count=None, date_format=None, min=None, max=None, mean=None, median=None, std_dev=None, time_series_eligible=None, time_series_eligibility_reason=None, time_step=None, time_unit=None, target_leakage=None)

A feature from a project’s dataset

These are features either included in the originally uploaded dataset or added to it via feature transformations. In time series projects, these will be distinct from the ModelingFeatures created during partitioning; otherwise, they will correspond to the same features. For more information about input and modeling features, see the time series documentation.

The min, max, mean, median, and std_dev attributes provide information about the distribution of the feature in the EDA sample data. For non-numeric features or features created prior to these summary statistics becoming available, they will be None. For features where the summary statistics are available, they will be in a format compatible with the data type, i.e. date type features will have their summary statistics expressed as ISO-8601 formatted date strings.

Attributes:
id : int

the id for the feature - note that name is used to reference the feature instead of id

project_id : str

the id of the project the feature belongs to

name : str

the name of the feature

feature_type : str

the type of the feature, e.g. ‘Categorical’, ‘Text’

importance : float or None

numeric measure of the strength of relationship between the feature and target (independent of any model or other features); may be None for non-modeling features such as partition columns

low_information : bool

whether a feature is considered too uninformative for modeling (e.g. because it has too few values)

unique_count : int

number of unique values

na_count : int or None

number of missing values

date_format : str or None

For Date features, the date format string for how this feature was interpreted, compatible with https://docs.python.org/2/library/time.html#time.strftime . For other feature types, None.

min : str, int, float, or None

The minimum value of the source data in the EDA sample

max : str, int, float, or None

The maximum value of the source data in the EDA sample

mean : str, int, float, or None

The arithmetic mean of the source data in the EDA sample

median : str, int, float, or None

The median of the source data in the EDA sample

std_dev : str, int, float, or None

The standard deviation of the source data in the EDA sample

time_series_eligible : bool

Whether this feature can be used as the datetime partition column in a time series project.

time_series_eligibility_reason : str

Why the feature is ineligible for the datetime partition column in a time series project, or ‘suitable’ when it is eligible.

time_step : int or None

For time series eligible features, a positive integer determining the interval at which windows can be specified. If used as the datetime partition column on a time series project, the feature derivation and forecast windows must start and end at an integer multiple of this value. None for features that are not time series eligible.

time_unit : str or None

For time series eligible features, the time unit covered by a single time step, e.g. ‘HOUR’, or None for features that are not time series eligible.

target_leakage : str

Whether a feature is considered to have target leakage or not. A value of ‘SKIPPED_DETECTION’ indicates that target leakage detection was not run on the feature. ‘FALSE’ indicates no leakage, ‘MODERATE’ indicates a moderate risk of target leakage, and ‘HIGH_RISK’ indicates a high risk of target leakage

classmethod get(project_id, feature_name)

Retrieve a single feature

Parameters:
project_id : str

The ID of the project the feature is associated with.

feature_name : str

The name of the feature to retrieve

Returns:
feature : Feature

The queried instance
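
Examples

A minimal sketch of retrieving a feature by name; the project id, feature name, and reported values are placeholders:

import datarobot as dr

feature = dr.models.Feature.get('5506fcd38bd88f5953219da0', 'number_diagnoses')
feature.feature_type, feature.importance
>>> ('Numeric', 0.27)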

get_multiseries_properties(multiseries_id_columns, max_wait=600)

Retrieve time series properties for a potential multiseries datetime partition column

Multiseries time series projects use multiseries id columns to model multiple distinct series within a single project. This function returns the time series properties (time step and time unit) of this column if it were used as a datetime partition column with the specified multiseries id columns, running multiseries detection automatically if it had not previously been successfully run.

Parameters:
multiseries_id_columns : list of str

the name(s) of the multiseries id columns to use with this datetime partition column. Currently only one multiseries id column is supported.

max_wait : int, optional

if a multiseries detection task is run, the maximum amount of time to wait for it to complete before giving up

Returns:
properties : dict

A dict with three keys:

  • time_series_eligible : bool, whether the column can be used as a partition column
  • time_unit : str or null, the inferred time unit if used as a partition column
  • time_step : int or null, the inferred time step if used as a partition column
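
Examples

A short sketch, assuming the project's dataset contains a datetime column named 'timestamp' and a series id column named 'series_id'; the returned values are illustrative:

feature = dr.models.Feature.get(project_id, 'timestamp')
feature.get_multiseries_properties(['series_id'])
>>> {'time_series_eligible': True, 'time_unit': 'DAY', 'time_step': 1}
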
get_cross_series_properties(datetime_partition_column, cross_series_group_by_columns, max_wait=600)

Retrieve cross-series properties for multiseries ID column.

This function returns the cross-series properties (eligibility as group-by column) of this column if it were used with the specified datetime partition column and with the current multiseries id column, running cross-series group-by validation automatically if it had not previously been successfully run.

Parameters:
datetime_partition_column : str

the name of the datetime partition column to use with this multiseries ID column.
cross_series_group_by_columns : list of str

the name(s) of the columns to use with this multiseries ID column. Currently only one cross-series group-by column is supported.

max_wait : int, optional

if a multiseries detection task is run, the maximum amount of time to wait for it to complete before giving up

Returns:
properties : dict

A dict with three keys:

  • name : str, column name
  • eligibility : str, reason for column eligibility
  • isEligible : bool, is column eligible as cross-series group-by
class datarobot.models.ModelingFeature(project_id=None, name=None, feature_type=None, importance=None, low_information=None, unique_count=None, na_count=None, date_format=None, min=None, max=None, mean=None, median=None, std_dev=None, parent_feature_names=None)

A feature used for modeling

In time series projects, a new set of modeling features is created after setting the partitioning options. These features are automatically derived from those in the project’s dataset and are the features used for modeling. Modeling features are only accessible once the target and partitioning options have been set. In projects that don’t use time series modeling, once the target has been set, ModelingFeatures and Features will behave the same.

For more information about input and modeling features, see the time series documentation.

As with the Feature object, the min, max, mean, median, and std_dev attributes provide information about the distribution of the feature in the EDA sample data. For non-numeric features, they will be None. For features where the summary statistics are available, they will be in a format compatible with the data type, i.e. date type features will have their summary statistics expressed as ISO-8601 formatted date strings.

Attributes:
project_id : str

the id of the project the feature belongs to

name : str

the name of the feature

feature_type : str

the type of the feature, e.g. ‘Categorical’, ‘Text’

importance : float or None

numeric measure of the strength of relationship between the feature and target (independent of any model or other features); may be None for non-modeling features such as partition columns

low_information : bool

whether a feature is considered too uninformative for modeling (e.g. because it has too few values)

unique_count : int

number of unique values

na_count : int or None

number of missing values

date_format : str or None

For Date features, the date format string for how this feature was interpreted, compatible with https://docs.python.org/2/library/time.html#time.strftime . For other feature types, None.

min : str, int, float, or None

The minimum value of the source data in the EDA sample

max : str, int, float, or None

The maximum value of the source data in the EDA sample

mean : str, int, float, or None

The arithmetic mean of the source data in the EDA sample

median : str, int, float, or None

The median of the source data in the EDA sample

std_dev : str, int, float, or None

The standard deviation of the source data in the EDA sample

parent_feature_names : list of str

A list of the names of input features used to derive this modeling feature. In cases where the input features and modeling features are the same, this will simply contain the feature’s name. Note that if a derived feature was used to create this modeling feature, the values here will not necessarily correspond to the features that must be supplied at prediction time.

classmethod get(project_id, feature_name)

Retrieve a single modeling feature

Parameters:
project_id : str

The ID of the project the feature is associated with.

feature_name : str

The name of the feature to retrieve

Returns:
feature : ModelingFeature

The requested feature

class datarobot.models.FeatureHistogram(plot)

A histogram plot data for a specific feature

New in version v2.14.

A histogram is a popular way to visualize the distribution of feature values. Here, a histogram is represented as an ordered collection of bins. For categorical features, every bin represents exactly one of the feature values and the count in that bin is the number of occurrences of that value. For numeric features, every bin represents a range of values (low end inclusive, high end exclusive) and the count in the bin is the total number of occurrences of all values in this range. In addition, each bin may contain a target feature average for values in that bin (see the target description below).

Notes

HistogramBin contains:

  • label : (str) for categorical features, the value of the feature; for numeric features, the low end of the bin range, so that the difference between two consecutive bin labels is the length of the bin
  • count : (int or float) the number of values in this bin's range. If the project uses weights, the value is equal to the sum of the weights of all feature values in the bin's range
  • target : (float or None) the average of the target feature values for the bin. Present only for informative features if the project target has already been selected and AIM processing has finished. For multiclass projects the value is always null.
Attributes:
plot : list

a list of dictionaries with a schema described as HistogramBin

classmethod get(project_id, feature_name, bin_limit=None)

Retrieve a single feature histogram

Parameters:
project_id : str

The ID of the project the feature is associated with.

feature_name : str

The name of the feature to retrieve

bin_limit : int or None

The desired maximum number of histogram bins. If omitted, the endpoint will use a default of 60.

Returns:
featureHistogram : FeatureHistogram

The queried instance with plot attribute in it.
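
Examples

A minimal sketch of fetching a histogram and looking at its first bin; project_id refers to an existing project, and the feature name and bin values are placeholders:

histogram = dr.models.FeatureHistogram.get(project_id, 'number_diagnoses', bin_limit=10)
histogram.plot[0]
>>> {'label': '0.0', 'count': 120, 'target': 0.31}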

Feature List

class datarobot.models.Featurelist(id=None, name=None, features=None, project_id=None, created=None, is_user_created=None, num_models=None, description=None)

A set of features used in modeling

Attributes:
id : str

the id of the featurelist

name : str

the name of the featurelist

features : list of str

the names of all the Features in the featurelist

project_id : str

the project the featurelist belongs to

created : datetime.datetime

(New in version v2.13) when the featurelist was created

is_user_created : bool

(New in version v2.13) whether the featurelist was created by a user or by DataRobot automation

num_models : int

(New in version v2.13) the number of models currently using this featurelist. A model is considered to use a featurelist if it is used to train the model or as a monotonic constraint featurelist, or if the model is a blender with at least one component model using the featurelist.

description : basestring

(New in version v2.13) the description of the featurelist. Can be updated by the user and may be supplied by default for DataRobot-created featurelists.

classmethod get(project_id, featurelist_id)

Retrieve a known feature list

Parameters:
project_id : str

The id of the project the featurelist is associated with

featurelist_id : str

The ID of the featurelist to retrieve

Returns:
featurelist : Featurelist

The queried instance
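
Examples

A minimal sketch of retrieving a featurelist; project_id and featurelist_id refer to an existing project and featurelist, and the reported values are illustrative:

import datarobot as dr

featurelist = dr.models.Featurelist.get(project_id, featurelist_id)
featurelist.name, len(featurelist.features)
>>> ('Informative Features', 42)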

delete(dry_run=False, delete_dependencies=False)

Delete a featurelist, and any models and jobs using it

All models using a featurelist, whether as the training featurelist or as a monotonic constraint featurelist, will also be deleted when the deletion is executed and any queued or running jobs using it will be cancelled. Similarly, predictions made on these models will also be deleted. All the entities that are to be deleted with a featurelist are described as “dependencies” of it. To preview the results of deleting a featurelist, call delete with dry_run=True

When deleting a featurelist with dependencies, users must specify delete_dependencies=True to confirm they want to delete the featurelist and all its dependencies. Without that option, only featurelists with no dependencies may be successfully deleted and others will error.

Featurelists configured into the project as a default featurelist or as a default monotonic constraint featurelist cannot be deleted.

Featurelists used in a model deployment cannot be deleted until the model deployment is deleted.

Parameters:
dry_run : bool, optional

specify True to preview the result of deleting the featurelist, instead of actually deleting it.

delete_dependencies : bool, optional

specify True to successfully delete featurelists with dependencies; if left False by default, featurelists without dependencies can be successfully deleted and those with dependencies will error upon attempting to delete them.

Returns:
result : dict
A dictionary describing the result of deleting the featurelist, with the following keys
  • dry_run : bool, whether the deletion was a dry run or an actual deletion
  • can_delete : bool, whether the featurelist can actually be deleted
  • deletion_blocked_reason : str, why the featurelist can’t be deleted (if it can’t)
  • num_affected_models : int, the number of models using this featurelist
  • num_affected_jobs : int, the number of jobs using this featurelist
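
Examples

A short sketch of previewing a deletion before executing it; the returned values are illustrative:

result = featurelist.delete(dry_run=True)
result
>>> {'dry_run': True, 'can_delete': True, 'deletion_blocked_reason': None, 'num_affected_models': 2, 'num_affected_jobs': 0}
featurelist.delete(delete_dependencies=True)
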
update(name=None, description=None)

Update the name or description of an existing featurelist

Note that only user-created featurelists can be renamed, and that names must not conflict with names used by other featurelists.

Parameters:
name : str, optional

the new name for the featurelist

description : str, optional

the new description for the featurelist

class datarobot.models.ModelingFeaturelist(id=None, name=None, features=None, project_id=None, created=None, is_user_created=None, num_models=None, description=None)

A set of features that can be used to build a model

In time series projects, a new set of modeling features is created after setting the partitioning options. These features are automatically derived from those in the project’s dataset and are the features used for modeling. Modeling features are only accessible once the target and partitioning options have been set. In projects that don’t use time series modeling, once the target has been set, ModelingFeaturelists and Featurelists will behave the same.

For more information about input and modeling features, see the time series documentation.

Attributes:
id : str

the id of the modeling featurelist

project_id : str

the id of the project the modeling featurelist belongs to

name : str

the name of the modeling featurelist

features : list of str

a list of the names of features included in this modeling featurelist

created : datetime.datetime

(New in version v2.13) when the featurelist was created

is_user_created : bool

(New in version v2.13) whether the featurelist was created by a user or by DataRobot automation

num_models : int

(New in version v2.13) the number of models currently using this featurelist. A model is considered to use a featurelist if it is used to train the model or as a monotonic constraint featurelist, or if the model is a blender with at least one component model using the featurelist.

description : basestring

(New in version v2.13) the description of the featurelist. Can be updated by the user and may be supplied by default for DataRobot-created featurelists.

classmethod get(project_id, featurelist_id)

Retrieve a modeling featurelist

Modeling featurelists can only be retrieved once the target and partitioning options have been set.

Parameters:
project_id : str

the id of the project the modeling featurelist belongs to

featurelist_id : str

the id of the modeling featurelist to retrieve

Returns:
featurelist : ModelingFeaturelist

the specified featurelist

delete(dry_run=False, delete_dependencies=False)

Delete a featurelist, and any models and jobs using it

All models using a featurelist, whether as the training featurelist or as a monotonic constraint featurelist, will also be deleted when the deletion is executed and any queued or running jobs using it will be cancelled. Similarly, predictions made on these models will also be deleted. All the entities that are to be deleted with a featurelist are described as “dependencies” of it. To preview the results of deleting a featurelist, call delete with dry_run=True

When deleting a featurelist with dependencies, users must specify delete_dependencies=True to confirm they want to delete the featurelist and all its dependencies. Without that option, only featurelists with no dependencies may be successfully deleted and others will error.

Featurelists configured into the project as a default featurelist or as a default monotonic constraint featurelist cannot be deleted.

Featurelists used in a model deployment cannot be deleted until the model deployment is deleted.

Parameters:
dry_run : bool, optional

specify True to preview the result of deleting the featurelist, instead of actually deleting it.

delete_dependencies : bool, optional

specify True to delete featurelists that have dependencies; if left False (the default), only featurelists with no dependencies can be deleted, and attempting to delete one with dependencies will error.

Returns:
result : dict
A dictionary describing the result of deleting the featurelist, with the following keys
  • dry_run : bool, whether the deletion was a dry run or an actual deletion
  • can_delete : bool, whether the featurelist can actually be deleted
  • deletion_blocked_reason : str, why the featurelist can’t be deleted (if it can’t)
  • num_affected_models : int, the number of models using this featurelist
  • num_affected_jobs : int, the number of jobs using this featurelist
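
For example, the result of a deletion can be previewed with a dry run before committing to it. A minimal sketch, assuming an existing project and modeling featurelist; the ids shown are placeholders.

from datarobot.models import ModelingFeaturelist

flist = ModelingFeaturelist.get('p-id', 'fl-id')
preview = flist.delete(dry_run=True)
if preview['can_delete']:
    # an actual deletion of a featurelist with dependencies must confirm that
    # the dependent models, jobs, and predictions will be deleted too
    flist.delete(delete_dependencies=True)
else:
    print(preview['deletion_blocked_reason'])
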
update(name=None, description=None)

Update the name or description of an existing featurelist

Note that only user-created featurelists can be renamed, and that names must not conflict with names used by other featurelists.

Parameters:
name : str, optional

the new name for the featurelist

description : str, optional

the new description for the featurelist

Job

class datarobot.models.Job(data, completed_resource_url=None)

Tracks asynchronous work being done within a project

Attributes:
id : int

the id of the job

project_id : str

the id of the project the job belongs to

status : str

the status of the job - will be one of datarobot.enums.QUEUE_STATUS

job_type : str

what kind of work the job is doing - will be one of datarobot.enums.JOB_TYPE

is_blocked : bool

if true, the job is blocked (cannot be executed) until its dependencies are resolved

classmethod get(project_id, job_id)

Fetches one job.

Parameters:
project_id : str

The identifier of the project in which the job resides

job_id : str

The job id

Returns:
job : Job

The job

Raises:
AsyncFailureError

Querying this resource gave a status code other than 200 or 303

cancel()

Cancel this job. If this job has not finished running, it will be removed and canceled.

get_result()
Returns:
result : object
Return type depends on the job type:
  • for model jobs, a Model is returned
  • for predict jobs, a pandas.DataFrame (with predictions) is returned
  • for featureImpact jobs, a list of dicts (see Model.get_feature_impact for more detail)
  • for primeRulesets jobs, a list of Rulesets
  • for primeModel jobs, a PrimeModel
  • for primeDownloadValidation jobs, a PrimeFile
  • for reasonCodesInitialization jobs, a ReasonCodesInitialization
  • for reasonCodes jobs, a ReasonCodes
  • for predictionExplanationInitialization jobs, a PredictionExplanationsInitialization
  • for predictionExplanations jobs, a PredictionExplanations
Raises:
JobNotFinished

If the job is not finished, the result is not available.

AsyncProcessUnsuccessfulError

If the job errored or was aborted

get_result_when_complete(max_wait=600)
Parameters:
max_wait : int, optional

How long to wait for the job to finish.

Returns:
result: object

Return type is the same as would be returned by Job.get_result.

Raises:
AsyncTimeoutError

If the job does not finish in time

AsyncProcessUnsuccessfulError

If the job errored or was aborted

refresh()

Update this object with the latest job data from the server.

wait_for_completion(max_wait=600)

Waits for job to complete.

Parameters:
max_wait : int, optional

How long to wait for the job to finish.
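
As an illustration, a job can be inspected and then waited on until its result is available; the ids below are placeholders.

from datarobot.models import Job

job = Job.get('p-id', 'job-id')
print(job.status, job.job_type)
# block until the job finishes (or raise AsyncTimeoutError after max_wait seconds)
result = job.get_result_when_complete(max_wait=600)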

class datarobot.models.TrainingPredictionsJob(data, model_id, data_subset, **kwargs)
classmethod get(project_id, job_id, model_id=None, data_subset=None)

Fetches one training predictions job.

The resulting TrainingPredictions object will be annotated with model_id and data_subset.

Parameters:
project_id : str

The identifier of the project in which the job resides

job_id : str

The job id

model_id : str

The identifier of the model used for computing training predictions

data_subset : dr.enums.DATA_SUBSET, optional

Data subset used for computing training predictions

Returns:
job : TrainingPredictionsJob

The job

refresh()

Update this object with the latest job data from the server.

cancel()

Cancel this job. If this job has not finished running, it will be removed and canceled.

get_result()
Returns:
result : object
Return type depends on the job type:
  • for model jobs, a Model is returned
  • for predict jobs, a pandas.DataFrame (with predictions) is returned
  • for featureImpact jobs, a list of dicts (see Model.get_feature_impact for more detail)
  • for primeRulesets jobs, a list of Rulesets
  • for primeModel jobs, a PrimeModel
  • for primeDownloadValidation jobs, a PrimeFile
  • for reasonCodesInitialization jobs, a ReasonCodesInitialization
  • for reasonCodes jobs, a ReasonCodes
  • for predictionExplanationInitialization jobs, a PredictionExplanationsInitialization
  • for predictionExplanations jobs, a PredictionExplanations
Raises:
JobNotFinished

If the job is not finished, the result is not available.

AsyncProcessUnsuccessfulError

If the job errored or was aborted

get_result_when_complete(max_wait=600)
Parameters:
max_wait : int, optional

How long to wait for the job to finish.

Returns:
result: object

Return type is the same as would be returned by Job.get_result.

Raises:
AsyncTimeoutError

If the job does not finish in time

AsyncProcessUnsuccessfulError

If the job errored or was aborted

wait_for_completion(max_wait=600)

Waits for job to complete.

Parameters:
max_wait : int, optional

How long to wait for the job to finish.

Lift Chart

class datarobot.models.lift_chart.LiftChart(source, bins, source_model_id)

Lift chart data for model.

Notes

LiftChartBin is a dict containing the following:

  • actual (float) Sum of actual target values in bin
  • predicted (float) Sum of predicted target values in bin
  • bin_weight (float) The weight of the bin. For weighted projects, it is the sum of the weights of the rows in the bin. For unweighted projects, it is the number of rows in the bin.
Attributes:
source : str

Lift chart data source. Can be ‘validation’, ‘crossValidation’ or ‘holdout’.

bins : list of dict

List of dicts with schema described as LiftChartBin above.

source_model_id : str

ID of the model this lift chart represents; in some cases, insights from the parent of a frozen model may be used
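
Lift chart data is usually obtained through a model rather than constructed directly. A minimal sketch, assuming a finished model; the ids are placeholders.

from datarobot.models import Model

model = Model.get('p-id', 'l-id')
lift_chart = model.get_lift_chart('validation')
for bin_data in lift_chart.bins:
    print(bin_data['actual'], bin_data['predicted'], bin_data['bin_weight'])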

Missing Values Report

class datarobot.models.missing_report.MissingValuesReport(missing_values_report)

Missing values report for model, contains list of reports per feature sorted by missing count in descending order.

Notes

Report per feature contains:

  • feature : feature name.
  • type : feature type – ‘Numeric’ or ‘Categorical’.
  • missing_count : missing values count in training data.
  • missing_percentage : missing values percentage in training data.
  • tasks : list of information per each task, which was applied to feature.

task information contains:

  • id : a number of task in the blueprint diagram.
  • name : task name.
  • descriptions : human readable aggregated information about how the task handles missing values. The following descriptions may be present: what value is imputed for missing values, whether the feature being missing is treated as a feature by the task, whether missing values are treated as infrequent values, whether infrequent values are treated as missing values, and whether missing values are ignored.
classmethod get(project_id, model_id)

Retrieve a missing report.

Parameters:
project_id : str

The project’s id.

model_id : str

The model’s id.

Returns:
MissingValuesReport

The queried missing report.
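
A short sketch of retrieving the report; it assumes the returned object can be iterated over the per-feature entries described in the notes above, and the ids are placeholders.

from datarobot.models.missing_report import MissingValuesReport

report = MissingValuesReport.get('p-id', 'l-id')
for per_feature in report:
    # each entry carries the feature name, type, missing count/percentage, and task details
    print(per_feature)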

Models

Model

class datarobot.models.Model(id=None, processes=None, featurelist_name=None, featurelist_id=None, project_id=None, sample_pct=None, training_row_count=None, training_duration=None, training_start_date=None, training_end_date=None, model_type=None, model_category=None, is_frozen=None, blueprint_id=None, metrics=None, project=None, data=None, monotonic_increasing_featurelist_id=None, monotonic_decreasing_featurelist_id=None, supports_monotonic_constraints=None, is_starred=None, prediction_threshold=None, prediction_threshold_read_only=None)

A model trained on a project’s dataset capable of making predictions

Attributes:
id : str

the id of the model

project_id : str

the id of the project the model belongs to

processes : list of str

the processes used by the model

featurelist_name : str

the name of the featurelist used by the model

featurelist_id : str

the id of the featurelist used by the model

sample_pct : float or None

the percentage of the project dataset used in training the model. If the project uses datetime partitioning, the sample_pct will be None. See training_row_count, training_duration, and training_start_date and training_end_date instead.

training_row_count : int or None

the number of rows of the project dataset used in training the model. In a datetime partitioned project, if specified, defines the number of rows used to train the model and evaluate backtest scores; if unspecified, either training_duration or training_start_date and training_end_date was used to determine that instead.

training_duration : str or None

only present for models in datetime partitioned projects. If specified, a duration string specifying the duration spanned by the data used to train the model and evaluate backtest scores.

training_start_date : datetime or None

only present for frozen models in datetime partitioned projects. If specified, the start date of the data used to train the model.

training_end_date : datetime or None

only present for frozen models in datetime partitioned projects. If specified, the end date of the data used to train the model.

model_type : str

what model this is, e.g. ‘Nystroem Kernel SVM Regressor’

model_category : str

what kind of model this is - ‘prime’ for DataRobot Prime models, ‘blend’ for blender models, and ‘model’ for other models

is_frozen : bool

whether this model is a frozen model

blueprint_id : str

the id of the blueprint used in this model

metrics : dict

a mapping from each metric to the model’s scores for that metric

monotonic_increasing_featurelist_id : str

optional, the id of the featurelist that defines the set of features with a monotonically increasing relationship to the target. If None, no such constraints are enforced.

monotonic_decreasing_featurelist_id : str

optional, the id of the featurelist that defines the set of features with a monotonically decreasing relationship to the target. If None, no such constraints are enforced.

supports_monotonic_constraints : bool

optional, whether this model supports enforcing monotonic constraints

is_starred : bool

whether this model is marked as starred

prediction_threshold : float

for binary classification projects, the threshold used for predictions

prediction_threshold_read_only : bool

indicates whether modification of the prediction threshold is forbidden. Threshold modification is forbidden once a model has had a deployment created or predictions made via the dedicated prediction API.

classmethod get(project, model_id)

Retrieve a specific model.

Parameters:
project : str

The project’s id.

model_id : str

The model_id of the leaderboard item to retrieve.

Returns:
model : Model

The queried instance.

Raises:
ValueError

the passed project parameter value is of an unsupported type

classmethod fetch_resource_data(url, join_endpoint=True)

(Deprecated.) Used to acquire model data directly from its url.

Consider using get instead, as this is a convenience function used for development of datarobot

Parameters:
url : string

The resource we are acquiring

join_endpoint : boolean, optional

Whether the client’s endpoint should be joined to the URL before sending the request. Location headers are returned as absolute locations, so they will not need the endpoint.

Returns:
model_data : dict

The queried model’s data

get_features_used()

Query the server to determine which features were used.

Note that the data returned by this method is possibly different than the names of the features in the featurelist used by this model. This method will return the raw features that must be supplied in order for predictions to be generated on a new set of data. The featurelist, in contrast, would also include the names of derived features.

Returns:
features : list of str

The names of the features used in the model.

get_supported_capabilities()

Retrieves a summary of the capabilities supported by a model.

New in version v2.14.

Returns:
supportsBlending: bool

whether the model supports blending

supportsMonotonicConstraints: bool

whether the model supports monotonic constraints

hasWordCloud: bool

whether the model has word cloud data available

eligibleForPrime: bool

whether the model is eligible for Prime

hasParameters: bool

whether the model has parameters that can be retrieved
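
A brief sketch combining model retrieval with a capabilities check; it assumes the capabilities are returned as a dict keyed by the field names listed above, and the ids are placeholders.

from datarobot.models import Model

model = Model.get('p-id', 'l-id')
capabilities = model.get_supported_capabilities()
if capabilities['hasWordCloud']:
    word_cloud = model.get_word_cloud()
print(model.get_features_used())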

delete()

Delete a model from the project’s leaderboard.

get_leaderboard_ui_permalink()
Returns:
url : str

Permanent static hyperlink to this model at leaderboard.

open_model_browser()

Opens model at project leaderboard in web browser.

Note: If text-mode browsers are used, the calling process will block until the user exits the browser.

train(sample_pct=None, featurelist_id=None, scoring_type=None, training_row_count=None, monotonic_increasing_featurelist_id=<object object>, monotonic_decreasing_featurelist_id=<object object>)

Train the blueprint used in model on a particular featurelist or amount of data.

This method creates a new training job and appends it to the end of the queue for this project. After the job has finished you can get the newly trained model by retrieving it from the project leaderboard, or by retrieving the result of the job.

Either sample_pct or training_row_count can be used to specify the amount of data to use, but not both. If neither is specified, the default is the maximum amount of data that can safely be used to train any blueprint without going into the validation data.

In smart-sampled projects, sample_pct and training_row_count are assumed to be in terms of rows of the minority class.

Note

For datetime partitioned projects, use train_datetime instead.

Parameters:
sample_pct : float, optional

The amount of data to use for training, as a percentage of the project dataset from 0 to 100.

featurelist_id : str, optional

The identifier of the featurelist to use. If not defined, the featurelist of this model is used.

scoring_type : str, optional

Either SCORING_TYPE.validation or SCORING_TYPE.cross_validation. SCORING_TYPE.validation is available for every partitioning type, and indicates that the default model validation should be used for the project. If the project uses a form of cross-validation partitioning, SCORING_TYPE.cross_validation can also be used to indicate that all of the available training/validation combinations should be used to evaluate the model.

training_row_count : int, optional

The number of rows to use to train the requested model.

monotonic_increasing_featurelist_id : str

(new in version 2.11) optional, the id of the featurelist that defines the set of features with a monotonically increasing relationship to the target. Passing None disables increasing monotonicity constraint. Default (dr.enums.MONOTONICITY_FEATURELIST_DEFAULT) is the one specified by the blueprint.

monotonic_decreasing_featurelist_id : str

(new in version 2.11) optional, the id of the featurelist that defines the set of features with a monotonically decreasing relationship to the target. Passing None disables decreasing monotonicity constraint. Default (dr.enums.MONOTONICITY_FEATURELIST_DEFAULT) is the one specified by the blueprint.

Returns:
model_job_id : str

id of created job, can be used as parameter to ModelJob.get method or wait_for_async_model_creation function

Examples

project = Project.get('p-id')
model = Model.get('p-id', 'l-id')
model_job_id = model.train(training_row_count=project.max_train_rows)
train_datetime(featurelist_id=None, training_row_count=None, training_duration=None, time_window_sample_pct=None)

Train this model on a different featurelist or amount of data

Requires that this model is part of a datetime partitioned project; otherwise, an error will occur.

Parameters:
featurelist_id : str, optional

the featurelist to use to train the model. If not specified, the featurelist of this model is used.

training_row_count : int, optional

the number of rows of data that should be used to train the model. If specified, training_duration may not be specified.

training_duration : str, optional

a duration string specifying what time range the data used to train the model should span. If specified, training_row_count may not be specified.

time_window_sample_pct : int, optional

may only be specified when the requested model is a time window (e.g. duration or start and end dates). An integer between 1 and 99 indicating the percentage to sample by within the window. The points kept are determined by a random uniform sample.

Returns:
job : ModelJob

the created job to build the model
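
For example, a model in a datetime partitioned project can be retrained on a different training duration; the duration string below is only a placeholder.

from datarobot.models import Model

model = Model.get('p-id', 'l-id')
# retrain the same blueprint on a three-month window of data
model_job = model.train_datetime(training_duration='P3M')
new_model = model_job.get_result_when_complete()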

request_predictions(dataset_id, include_prediction_intervals=None, prediction_intervals_size=None)

Request predictions against a previously uploaded dataset

Parameters:
dataset_id : string

The dataset to make predictions against (as uploaded from Project.upload_dataset)

include_prediction_intervals : bool, optional

(New in v2.16) For time series projects only. Specifies whether prediction intervals should be calculated for this request. Defaults to True if prediction_intervals_size is specified, otherwise defaults to False.

prediction_intervals_size : int, optional

(New in v2.16) For time series projects only. Represents the percentile to use for the size of the prediction intervals. Defaults to 80 if include_prediction_intervals is True. Prediction intervals size must be between 1 and 100 (inclusive).

Returns:
job : PredictJob

The job computing the predictions
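
A minimal end-to-end sketch, assuming a local scoring file and an existing project; the ids and file path are placeholders.

from datarobot.models import Model, Project

project = Project.get('p-id')
dataset = project.upload_dataset('./data_to_predict.csv')
model = Model.get('p-id', 'l-id')
predict_job = model.request_predictions(dataset.id)
predictions = predict_job.get_result_when_complete()  # a pandas.DataFrame of predictions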

get_feature_impact()

Retrieve the computed Feature Impact results, a measure of the relevance of each feature in the model.

Feature Impact is computed for each column by creating new data with that column randomly permuted (but the others left unchanged), and seeing how the error metric score for the predictions is affected. The ‘impactUnnormalized’ is how much worse the error metric score is when making predictions on this modified data. The ‘impactNormalized’ is normalized so that the largest value is 1. In both cases, larger values indicate more important features.

If a feature is redundant, i.e. it contributes little additional information once other features are considered, the ‘redundantWith’ value is the name of the feature that has the highest correlation with it. Note that redundancy detection is only available for jobs run after the addition of this feature. When retrieving data that predates this functionality, a NoRedundancyImpactAvailable warning will be used.

Elsewhere this technique is sometimes called ‘Permutation Importance’.

Requires that Feature Impact has already been computed with request_feature_impact.

Returns:
feature_impacts : list of dict

The feature impact data. Each item is a dict with the keys ‘featureName’, ‘impactNormalized’, and ‘impactUnnormalized’, and ‘redundantWith’.

Raises:
ClientError (404)

If the feature impacts have not been computed.

request_feature_impact()

Request feature impacts to be computed for the model.

See get_feature_impact for more information on the result of the job.

Returns:
job : Job

A Job representing the feature impact computation. To get the completed feature impact data, use job.get_result or job.get_result_when_complete.

Raises:
JobAlreadyRequested (422)

If the feature impacts have already been requested.

get_or_request_feature_impact(max_wait=600)

Retrieve feature impact for the model, requesting a job if it hasn’t been run previously

Parameters:
max_wait : int, optional

The maximum time to wait for a requested feature impact job to complete before erroring

Returns:
feature_impacts : list of dict

The feature impact data. See get_feature_impact for the exact schema.
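
The typical workflow is to request the computation and then fetch the result, or to use the combined helper. A sketch, with placeholder ids:

from datarobot.errors import JobAlreadyRequested
from datarobot.models import Model

model = Model.get('p-id', 'l-id')
try:
    impact_job = model.request_feature_impact()
    feature_impacts = impact_job.get_result_when_complete()
except JobAlreadyRequested:
    feature_impacts = model.get_feature_impact()

# equivalently, the combined helper handles both cases:
feature_impacts = model.get_or_request_feature_impact(max_wait=600)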

get_prime_eligibility()

Check if this model can be approximated with DataRobot Prime

Returns:
prime_eligibility : dict

a dict indicating whether a model can be approximated with DataRobot Prime (key can_make_prime) and why it may be ineligible (key message)

request_approximation()

Request an approximation of this model using DataRobot Prime

This will create several rulesets that could be used to approximate this model. After comparing their scores and rule counts, the code used in the approximation can be downloaded and run locally.

Returns:
job : Job

the job generating the rulesets

get_rulesets()

List the rulesets approximating this model generated by DataRobot Prime

If this model hasn’t been approximated yet, will return an empty list. Note that these are rulesets approximating this model, not rulesets used to construct this model.

Returns:
rulesets : list of Ruleset
download_export(filepath)

Download an exportable model file for use in an on-premise DataRobot standalone prediction environment.

This function can only be used if model export is enabled, and will only be useful if you have an on-premise environment in which to import it.

Parameters:
filepath : str

The path at which to save the exported model file.

request_transferable_export()

Request generation of an exportable model file for use in an on-premise DataRobot standalone prediction environment.

This function can only be used if model export is enabled, and will only be useful if you have an on-premise environment in which to import it.

This function does not download the exported file. Use download_export for that.

Examples

model = datarobot.Model.get('p-id', 'l-id')
job = model.request_transferable_export()
job.wait_for_completion()
model.download_export('my_exported_model.drmodel')

# Client must be configured to use standalone prediction server for import:
datarobot.Client(token='my-token-at-standalone-server',
                 endpoint='standalone-server-url/api/v2')

imported_model = datarobot.ImportedModel.create('my_exported_model.drmodel')
request_frozen_model(sample_pct=None, training_row_count=None)

Train a new frozen model with parameters from this model

Note

This method only works if the project the model belongs to is not datetime partitioned. If it is, use request_frozen_datetime_model instead.

Frozen models use the same tuning parameters as their parent model instead of independently optimizing them to allow efficiently retraining models on larger amounts of the training data.

Parameters:
sample_pct : float

optional, the percentage of the dataset to use with the model. If not provided, will use the value from this model.

training_row_count : int

(New in version v2.9) optional, the integer number of rows of the dataset to use with the model. Only one of sample_pct and training_row_count should be specified.

Returns:
model_job : ModelJob

the modeling job training a frozen model

request_frozen_datetime_model(training_row_count=None, training_duration=None, training_start_date=None, training_end_date=None, time_window_sample_pct=None)

Train a new frozen model with parameters from this model

Requires that this model belongs to a datetime partitioned project. If it does not, an error will occur when submitting the job.

Frozen models use the same tuning parameters as their parent model instead of independently optimizing them to allow efficiently retraining models on larger amounts of the training data.

In addition to training_row_count and training_duration, frozen datetime models may be trained on an exact date range. Only one of training_row_count, training_duration, or training_start_date and training_end_date should be specified.

Models specified using training_start_date and training_end_date are the only ones that can be trained into the holdout data (once the holdout is unlocked).

Parameters:
training_row_count : int, optional

the number of rows of data that should be used to train the model. If specified, training_duration may not be specified.

training_duration : str, optional

a duration string specifying what time range the data used to train the model should span. If specified, training_row_count may not be specified.

training_start_date : datetime.datetime, optional

the start date of the data to train to model on. Only rows occurring at or after this datetime will be used. If training_start_date is specified, training_end_date must also be specified.

training_end_date : datetime.datetime, optional

the end date of the data to train the model on. Only rows occurring strictly before this datetime will be used. If training_end_date is specified, training_start_date must also be specified.

time_window_sample_pct : int, optional

may only be specified when the requested model is a time window (e.g. duration or start and end dates). An integer between 1 and 99 indicating the percentage to sample by within the window. The points kept are determined by a random uniform sample.

Returns:
model_job : ModelJob

the modeling job training a frozen model
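
For instance, a model trained on a sample can be refit as a frozen model on more rows while reusing its tuning parameters; the ids and row count below are placeholders.

from datarobot.models import Model

model = Model.get('p-id', 'l-id')
model_job = model.request_frozen_model(training_row_count=50000)
frozen_model = model_job.get_result_when_complete()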

get_parameters()

Retrieve model parameters.

Returns:
ModelParameters

Model parameters for this model.

get_lift_chart(source, fallback_to_parent_insights=False)

Retrieve model lift chart for the specified source.

Parameters:
source : str

Lift chart data source. Check datarobot.enums.CHART_DATA_SOURCE for possible values.

fallback_to_parent_insights : bool

(New in version v2.14) Optional, if True, this will return lift chart data for this model’s parent if the lift chart is not available for this model and the model has a defined parent model. If omitted or False, or there is no parent model, will not attempt to return insight data from this model’s parent.

Returns:
LiftChart

Model lift chart data

Raises:
ClientError

If the insight is not available for this model

get_all_lift_charts(fallback_to_parent_insights=False)

Retrieve a list of all lift charts available for the model.

Parameters:
fallback_to_parent_insights : bool

(New in version v2.14) Optional, if True, this will return lift chart data for this model’s parent for any source that is not available for this model and if this model has a defined parent model. If omitted or False, or this model has no parent, this will not attempt to retrieve any data from this model’s parent.

Returns:
list of LiftChart

Data for all available model lift charts.

get_pareto_front()

Retrieve the Pareto Front for a Eureqa model.

This method is only supported for Eureqa models.

Returns:
ParetoFront

Model ParetoFront data

get_confusion_chart(source, fallback_to_parent_insights=False)

Retrieve model’s confusion chart for the specified source.

Parameters:
source : str

Confusion chart source. Check datarobot.enums.CHART_DATA_SOURCE for possible values.

fallback_to_parent_insights : bool

(New in version v2.14) Optional, if True, this will return confusion chart data for this model’s parent if the confusion chart is not available for this model and the defined parent model. If omitted or False, or there is no parent model, will not attempt to return insight data from this model’s parent.

Returns:
ConfusionChart

Model ConfusionChart data

Raises:
ClientError

If the insight is not available for this model

get_all_confusion_charts(fallback_to_parent_insights=False)

Retrieve a list of all confusion charts available for the model.

Parameters:
fallback_to_parent_insights : bool

(New in version v2.14) Optional, if True, this will return confusion chart data for this model’s parent for any source that is not available for this model and if this has a defined parent model. If omitted or False, or this model has no parent, this will not attempt to retrieve any data from this model’s parent.

Returns:
list of ConfusionChart

Data for all available confusion charts for model.

get_roc_curve(source, fallback_to_parent_insights=False)

Retrieve model ROC curve for the specified source.

Parameters:
source : str

ROC curve data source. Check datarobot.enums.CHART_DATA_SOURCE for possible values.

fallback_to_parent_insights : bool

(New in version v2.14) Optional, if True, this will return ROC curve data for this model’s parent if the ROC curve is not available for this model and the model has a defined parent model. If omitted or False, or there is no parent model, will not attempt to return data from this model’s parent.

Returns:
RocCurve

Model ROC curve data

Raises:
ClientError

If the insight is not available for this model

get_all_roc_curves(fallback_to_parent_insights=False)

Retrieve a list of all ROC curves available for the model.

Parameters:
fallback_to_parent_insights : bool

(New in version v2.14) Optional, if True, this will return ROC curve data for this model’s parent for any source that is not available for this model and if this model has a defined parent model. If omitted or False, or this model has no parent, this will not attempt to retrieve any data from this model’s parent.

Returns:
list of RocCurve

Data for all available model ROC curves.

get_word_cloud(exclude_stop_words=False)

Retrieve word cloud data for the model.

Parameters:
exclude_stop_words : bool, optional

Set to True if you want stop words filtered out of the response.

Returns:
WordCloud

Word cloud data for the model.

download_scoring_code(file_name, source_code=False)

Download scoring code JAR.

Parameters:
file_name : str

File path where scoring code will be saved.

source_code : bool, optional

Set to True to download source code archive. It will not be executable.
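
A short sketch, assuming scoring code export is available for the model; the file names are placeholders.

from datarobot.models import Model

model = Model.get('p-id', 'l-id')
model.download_scoring_code('model_scoring_code.jar')
# source archive (not executable), e.g. for review
model.download_scoring_code('model_scoring_code_source.jar', source_code=True)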

get_model_blueprint_documents()

Get documentation for tasks used in this model.

Returns:
list of BlueprintTaskDocument

All documents available for the model.

get_model_blueprint_chart()

Retrieve a model blueprint chart that can be used to understand data flow in blueprint.

Returns:
ModelBlueprintChart

The queried model blueprint chart.

get_missing_report_info()

Retrieve a missing data report for the model, computed on training data, which can be used to understand how missing values were treated in the model. The report consists of missing values reports for the numeric and categorical features that took part in modeling.

Returns:
An iterable of MissingReportPerFeature

The queried model missing report, sorted by missing count (DESCENDING order).

request_training_predictions(data_subset)

Start a job to build training predictions

Parameters:
data_subset : str

data set definition to build predictions on. Choices are:

  • dr.enums.DATA_SUBSET.ALL or string all for all data available. Not valid for
    models in datetime partitioned projects
  • dr.enums.DATA_SUBSET.VALIDATION_AND_HOLDOUT or string validationAndHoldout for
    all data except training set. Not valid for models in datetime partitioned projects
  • dr.enums.DATA_SUBSET.HOLDOUT or string holdout for holdout data set only
  • dr.enums.DATA_SUBSET.ALL_BACKTESTS or string allBacktests for downloading
    the predictions for all backtest validation folds. Requires the model to have successfully scored all backtests. Datetime partitioned projects only.
Returns:
Job

an instance of the created async job
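
A sketch of requesting holdout training predictions and retrieving them once the job completes; the ids are placeholders and the enum usage mirrors the choices above.

from datarobot.enums import DATA_SUBSET
from datarobot.models import Model

model = Model.get('p-id', 'l-id')
job = model.request_training_predictions(DATA_SUBSET.HOLDOUT)
training_predictions = job.get_result_when_complete()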

cross_validate()

Run Cross Validation on this model.

Note

To perform Cross Validation on a new model with new parameters, use train instead.

Returns:
ModelJob

The created job to build the model

get_cross_validation_scores(partition=None, metric=None)

Returns a dictionary keyed by metric showing cross validation scores per partition.

Cross Validation should already have been performed using cross_validate or train.

Note

Models that computed cross validation before this feature was added will need to be deleted and retrained before this method can be used.

Parameters:
partition : float

optional, the id of the partition (1, 2, 3.0, 4.0, etc.) to filter results by. Can be a positive whole number or a float value.

metric: unicode

optional, the name of the metric to filter the resulting cross validation scores by

Returns:
cross_validation_scores: dict

A dictionary keyed by metric showing cross validation scores per partition.
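
A sketch of running cross validation and reading back the per-partition scores for one metric; the ids and metric name are placeholders.

from datarobot.models import Model

model = Model.get('p-id', 'l-id')
cv_job = model.cross_validate()
cv_job.wait_for_completion()
scores = model.get_cross_validation_scores(metric='RMSE')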

advanced_tune(params, description=None)

Generate a new model with the specified advanced-tuning parameters

As of v2.17, all models other than blenders, open source, prime, scaleout, baseline and user-created support Advanced Tuning.

Parameters:
params : dict

Mapping of parameter ID to parameter value. The list of valid parameter IDs for a model can be found by calling get_advanced_tuning_parameters(). This endpoint does not need to include values for all parameters. If a parameter is omitted, its current_value will be used.

description : unicode

Human-readable string describing the newly advanced-tuned model

Returns:
ModelJob

The created job to build the model

get_advanced_tuning_parameters()

Get the advanced-tuning parameters available for this model.

As of v2.17, all models other than blenders, open source, prime, scaleout, baseline and user-created support Advanced Tuning.

Returns:
dict

A dictionary describing the advanced-tuning parameters for the current model. There are two top-level keys, tuningDescription and tuningParameters.

tuningDescription is an optional value. If not None, it indicates the user-specified description of this set of tuning parameters.

tuningParameters is a list of dicts, each of which has the following keys

  • parameterName : (unicode) name of the parameter (unique per task, see below)
  • parameterId : (unicode) opaque ID string uniquely identifying parameter
  • defaultValue : (*) default value of the parameter for the blueprint
  • currentValue : (*) value of the parameter that was used for this model
  • taskName : (unicode) name of the task that this parameter belongs to
  • constraints: (dict) see the notes below

Notes

The type of defaultValue and currentValue is defined by the constraints structure. It will be a string or numeric Python type.

constraints is a dict with at least one, possibly more, of the following keys. The presence of a key indicates that the parameter may take on the specified type. (If a key is absent, this means that the parameter may not take on the specified type.) If a key on constraints is present, its value will be a dict containing all of the fields described below for that key.

"constraints": {
    "select": {
        "values": [<list(basestring or number) : possible values>]
    },
    "ascii": {},
    "unicode": {},
    "int": {
        "min": <int : minimum valid value>,
        "max": <int : maximum valid value>,
        "supports_grid_search": <bool : True if Grid Search may be
                                        requested for this param>
    },
    "float": {
        "min": <float : minimum valid value>,
        "max": <float : maximum valid value>,
        "supports_grid_search": <bool : True if Grid Search may be
                                        requested for this param>
    },
    "intList": {
        "length": {
        "min_length": <int : minimum valid length>,
        "max_length": <int : maximum valid length>
        "min_val": <int : minimum valid value>,
        "max_val": <int : maximum valid value>
        "supports_grid_search": <bool : True if Grid Search may be
                                        requested for this param>
    },
    "floatList": {
        "min_length": <int : minimum valid length>,
        "max_length": <int : maximum valid length>
        "min_val": <float : minimum valid value>,
        "max_val": <float : maximum valid value>
        "supports_grid_search": <bool : True if Grid Search may be
                                        requested for this param>
    }
}

The keys have meaning as follows:

  • select: Rather than specifying a specific data type, if present, it indicates that the parameter is permitted to take on any of the specified values. Listed values may be of any string or real (non-complex) numeric type.
  • ascii: The parameter may be a unicode object that encodes simple ASCII characters. (A-Z, a-z, 0-9, whitespace, and certain common symbols.) In addition to listed constraints, ASCII keys currently may not contain either newlines or semicolons.
  • unicode: The parameter may be any Python unicode object.
  • int: The value may be an object of type int within the specified range (inclusive). Please note that the value will be passed around using the JSON format, and some JSON parsers have undefined behavior with integers outside of the range [-(2**53)+1, (2**53)-1].
  • float: The value may be an object of type float within the specified range (inclusive).
  • intList, floatList: The value may be a list of int or float objects, respectively, following constraints as specified respectively by the int and float types (above).

Many parameters only specify one key under constraints. If a parameter specifies multiple keys, the parameter may take on any value permitted by any key.
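
As an illustration, the available parameters can be listed and one overridden by its ID; the parameter chosen below is arbitrary and the ids are placeholders.

from datarobot.models import Model

model = Model.get('p-id', 'l-id')
tuning_info = model.get_advanced_tuning_parameters()
first_param = tuning_info['tuningParameters'][0]
job = model.advanced_tune(
    {first_param['parameterId']: first_param['currentValue']},  # trivially re-use the current value
    description='advanced tuning example',
)
tuned_model = job.get_result_when_complete()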

start_advanced_tuning_session()

Start an Advanced Tuning session. Returns an object that helps set up arguments for an Advanced Tuning model execution.

As of v2.17, all models other than blenders, open source, prime, scaleout, baseline and user-created support Advanced Tuning.

Returns:
AdvancedTuningSession

Session for setting up and running Advanced Tuning on a model

star_model()

Mark the model as starred

Model stars propagate to the web application and the API, and can be used to filter when listing models.

unstar_model()

Unmark the model as starred

Model stars propagate to the web application and the API, and can be used to filter when listing models.

set_prediction_threshold(threshold)

Set a custom prediction threshold for the model

May not be used once prediction_threshold_read_only is True for this model.

Parameters:
threshold : float

only used for binary classification projects. The threshold to use when deciding between the positive and negative classes when making predictions. Should be between 0.0 and 1.0 (inclusive).
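
A short sketch; the threshold value is a placeholder and only applies to binary classification projects.

from datarobot.models import Model

model = Model.get('p-id', 'l-id')
model.star_model()
if not model.prediction_threshold_read_only:
    model.set_prediction_threshold(0.42)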

PrimeModel

class datarobot.models.PrimeModel(id=None, processes=None, featurelist_name=None, featurelist_id=None, project_id=None, sample_pct=None, training_row_count=None, training_duration=None, training_start_date=None, training_end_date=None, model_type=None, model_category=None, is_frozen=None, blueprint_id=None, metrics=None, parent_model_id=None, ruleset_id=None, rule_count=None, score=None, monotonic_increasing_featurelist_id=None, monotonic_decreasing_featurelist_id=None, supports_monotonic_constraints=None, is_starred=None, prediction_threshold=None, prediction_threshold_read_only=None)

A DataRobot Prime model approximating a parent model with downloadable code

Attributes:
id : str

the id of the model

project_id : str

the id of the project the model belongs to

processes : list of str

the processes used by the model

featurelist_name : str

the name of the featurelist used by the model

featurelist_id : str

the id of the featurelist used by the model

sample_pct : float

the percentage of the project dataset used in training the model

training_row_count : int or None

the number of rows of the project dataset used in training the model. In a datetime partitioned project, if specified, defines the number of rows used to train the model and evaluate backtest scores; if unspecified, either training_duration or training_start_date and training_end_date was used to determine that instead.

training_duration : str or None

only present for models in datetime partitioned projects. If specified, a duration string specifying the duration spanned by the data used to train the model and evaluate backtest scores.

training_start_date : datetime or None

only present for frozen models in datetime partitioned projects. If specified, the start date of the data used to train the model.

training_end_date : datetime or None

only present for frozen models in datetime partitioned projects. If specified, the end date of the data used to train the model.

model_type : str

what model this is, e.g. ‘DataRobot Prime’

model_category : str

what kind of model this is - always ‘prime’ for DataRobot Prime models

is_frozen : bool

whether this model is a frozen model

blueprint_id : str

the id of the blueprint used in this model

metrics : dict

a mapping from each metric to the model’s scores for that metric

ruleset : Ruleset

the ruleset used in the Prime model

parent_model_id : str

the id of the model that this Prime model approximates

monotonic_increasing_featurelist_id : str

optional, the id of the featurelist that defines the set of features with a monotonically increasing relationship to the target. If None, no such constraints are enforced.

monotonic_decreasing_featurelist_id : str

optional, the id of the featurelist that defines the set of features with a monotonically decreasing relationship to the target. If None, no such constraints are enforced.

supports_monotonic_constraints : bool

optional, whether this model supports enforcing monotonic constraints

is_starred : bool

whether this model is marked as starred

prediction_threshold : float

for binary classification projects, the threshold used for predictions

prediction_threshold_read_only : bool

indicates whether modification of the prediction threshold is forbidden. Threshold modification is forbidden once a model has had a deployment created or predictions made via the dedicated prediction API.

classmethod get(project_id, model_id)

Retrieve a specific prime model.

Parameters:
project_id : str

The id of the project the prime model belongs to

model_id : str

The model_id of the prime model to retrieve.

Returns:
model : PrimeModel

The queried instance.

request_download_validation(language)

Prep and validate the downloadable code for the ruleset associated with this model

Parameters:
language : str

the language the code should be downloaded in - see datarobot.enums.PRIME_LANGUAGE for available languages

Returns:
job : Job

A job tracking the code preparation and validation
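
A sketch of validating and downloading code for a Prime model; the ids and file name are placeholders, and it assumes the resulting PrimeFile provides a download helper for saving the code locally.

from datarobot.enums import PRIME_LANGUAGE
from datarobot.models import PrimeModel

prime_model = PrimeModel.get('p-id', 'prime-l-id')
validation_job = prime_model.request_download_validation(PRIME_LANGUAGE.PYTHON)
prime_file = validation_job.get_result_when_complete()  # a PrimeFile
prime_file.download('prime_model.py')  # assumed PrimeFile helper for saving the code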

advanced_tune(params, description=None)

Generate a new model with the specified advanced-tuning parameters

As of v2.17, all models other than blenders, open source, prime, scaleout, baseline and user-created support Advanced Tuning.

Parameters:
params : dict

Mapping of parameter ID to parameter value. The list of valid parameter IDs for a model can be found by calling get_advanced_tuning_parameters(). This endpoint does not need to include values for all parameters. If a parameter is omitted, its current_value will be used.

description : unicode

Human-readable string describing the newly advanced-tuned model

Returns:
ModelJob

The created job to build the model

cross_validate()

Run Cross Validation on this model.

Note

To perform Cross Validation on a new model with new parameters, use train instead.

Returns:
ModelJob

The created job to build the model

delete()

Delete a model from the project’s leaderboard.

download_export(filepath)

Download an exportable model file for use in an on-premise DataRobot standalone prediction environment.

This function can only be used if model export is enabled, and will only be useful if you have an on-premise environment in which to import it.

Parameters:
filepath : str

The path at which to save the exported model file.

download_scoring_code(file_name, source_code=False)

Download scoring code JAR.

Parameters:
file_name : str

File path where scoring code will be saved.

source_code : bool, optional

Set to True to download source code archive. It will not be executable.

classmethod fetch_resource_data(url, join_endpoint=True)

(Deprecated.) Used to acquire model data directly from its url.

Consider using get instead, as this is a convenience function used for development of datarobot

Parameters:
url : string

The resource we are acquiring

join_endpoint : boolean, optional

Whether the client’s endpoint should be joined to the URL before sending the request. Location headers are returned as absolute locations, so they will not need the endpoint.

Returns:
model_data : dict

The queried model’s data

get_advanced_tuning_parameters()

Get the advanced-tuning parameters available for this model.

As of v2.17, all models other than blenders, open source, prime, scaleout, baseline and user-created support Advanced Tuning.

Returns:
dict

A dictionary describing the advanced-tuning parameters for the current model. There are two top-level keys, tuningDescription and tuningParameters.

tuningDescription is an optional value. If not None, it indicates the user-specified description of this set of tuning parameters.

tuningParameters is a list of dicts, each of which has the following keys

  • parameterName : (unicode) name of the parameter (unique per task, see below)
  • parameterId : (unicode) opaque ID string uniquely identifying parameter
  • defaultValue : (*) default value of the parameter for the blueprint
  • currentValue : (*) value of the parameter that was used for this model
  • taskName : (unicode) name of the task that this parameter belongs to
  • constraints: (dict) see the notes below

Notes

The type of defaultValue and currentValue is defined by the constraints structure. It will be a string or numeric Python type.

constraints is a dict with at least one, possibly more, of the following keys. The presence of a key indicates that the parameter may take on the specified type. (If a key is absent, this means that the parameter may not take on the specified type.) If a key on constraints is present, its value will be a dict containing all of the fields described below for that key.

"constraints": {
    "select": {
        "values": [<list(basestring or number) : possible values>]
    },
    "ascii": {},
    "unicode": {},
    "int": {
        "min": <int : minimum valid value>,
        "max": <int : maximum valid value>,
        "supports_grid_search": <bool : True if Grid Search may be
                                        requested for this param>
    },
    "float": {
        "min": <float : minimum valid value>,
        "max": <float : maximum valid value>,
        "supports_grid_search": <bool : True if Grid Search may be
                                        requested for this param>
    },
    "intList": {
        "length": {
        "min_length": <int : minimum valid length>,
        "max_length": <int : maximum valid length>
        "min_val": <int : minimum valid value>,
        "max_val": <int : maximum valid value>
        "supports_grid_search": <bool : True if Grid Search may be
                                        requested for this param>
    },
    "floatList": {
        "min_length": <int : minimum valid length>,
        "max_length": <int : maximum valid length>
        "min_val": <float : minimum valid value>,
        "max_val": <float : maximum valid value>
        "supports_grid_search": <bool : True if Grid Search may be
                                        requested for this param>
    }
}

The keys have meaning as follows:

  • select: Rather than specifying a specific data type, if present, it indicates that the parameter is permitted to take on any of the specified values. Listed values may be of any string or real (non-complex) numeric type.
  • ascii: The parameter may be a unicode object that encodes simple ASCII characters. (A-Z, a-z, 0-9, whitespace, and certain common symbols.) In addition to listed constraints, ASCII keys currently may not contain either newlines or semicolons.
  • unicode: The parameter may be any Python unicode object.
  • int: The value may be an object of type int within the specified range (inclusive). Please note that the value will be passed around using the JSON format, and some JSON parsers have undefined behavior with integers outside of the range [-(2**53)+1, (2**53)-1].
  • float: The value may be an object of type float within the specified range (inclusive).
  • intList, floatList: The value may be a list of int or float objects, respectively, following constraints as specified respectively by the int and float types (above).

Many parameters only specify one key under constraints. If a parameter specifies multiple keys, the parameter may take on any value permitted by any key.

get_all_confusion_charts(fallback_to_parent_insights=False)

Retrieve a list of all confusion charts available for the model.

Parameters:
fallback_to_parent_insights : bool

(New in version v2.14) Optional, if True, this will return confusion chart data for this model’s parent for any source that is not available for this model and if this has a defined parent model. If omitted or False, or this model has no parent, this will not attempt to retrieve any data from this model’s parent.

Returns:
list of ConfusionChart

Data for all available confusion charts for model.

get_all_lift_charts(fallback_to_parent_insights=False)

Retrieve a list of all lift charts available for the model.

Parameters:
fallback_to_parent_insights : bool

(New in version v2.14) Optional, if True, this will return lift chart data for this model’s parent for any source that is not available for this model and if this model has a defined parent model. If omitted or False, or this model has no parent, this will not attempt to retrieve any data from this model’s parent.

Returns:
list of LiftChart

Data for all available model lift charts.

get_all_roc_curves(fallback_to_parent_insights=False)

Retrieve a list of all ROC curves available for the model.

Parameters:
fallback_to_parent_insights : bool

(New in version v2.14) Optional, if True, this will return ROC curve data for this model’s parent for any source that is not available for this model and if this model has a defined parent model. If omitted or False, or this model has no parent, this will not attempt to retrieve any data from this model’s parent.

Returns:
list of RocCurve

Data for all available model ROC curves.

get_confusion_chart(source, fallback_to_parent_insights=False)

Retrieve model’s confusion chart for the specified source.

Parameters:
source : str

Confusion chart source. Check datarobot.enums.CHART_DATA_SOURCE for possible values.

fallback_to_parent_insights : bool

(New in version v2.14) Optional, if True, this will return confusion chart data for this model’s parent if the confusion chart is not available for this model and the defined parent model. If omitted or False, or there is no parent model, will not attempt to return insight data from this model’s parent.

Returns:
ConfusionChart

Model ConfusionChart data

Raises:
ClientError

If the insight is not available for this model

get_cross_validation_scores(partition=None, metric=None)

Returns a dictionary keyed by metric showing cross validation scores per partition.

Cross Validation should already have been performed using cross_validate or train.

Note

Models that computed cross validation before this feature was added will need to be deleted and retrained before this method can be used.

Parameters:
partition : float

optional, the id of the partition (1, 2, 3.0, 4.0, etc.) to filter results by. Can be a positive whole number or a float value.

metric: unicode

optional, the name of the metric to filter the resulting cross validation scores by

Returns:
cross_validation_scores: dict

A dictionary keyed by metric showing cross validation scores per partition.

get_feature_impact()

Retrieve the computed Feature Impact results, a measure of the relevance of each feature in the model.

Feature Impact is computed for each column by creating new data with that column randomly permuted (but the others left unchanged), and seeing how the error metric score for the predictions is affected. The ‘impactUnnormalized’ is how much worse the error metric score is when making predictions on this modified data. The ‘impactNormalized’ is normalized so that the largest value is 1. In both cases, larger values indicate more important features.

If a feature is redundant, i.e. it contributes little additional information once other features are considered, the ‘redundantWith’ value is the name of the feature that has the highest correlation with it. Note that redundancy detection is only available for jobs run after the addition of this feature. When retrieving data that predates this functionality, a NoRedundancyImpactAvailable warning will be used.

Elsewhere this technique is sometimes called ‘Permutation Importance’.

Requires that Feature Impact has already been computed with request_feature_impact.

Returns:
feature_impacts : list of dict

The feature impact data. Each item is a dict with the keys ‘featureName’, ‘impactNormalized’, and ‘impactUnnormalized’, and ‘redundantWith’.

Raises:
ClientError (404)

If the feature impacts have not been computed.

get_features_used()

Query the server to determine which features were used.

Note that the data returned by this method is possibly different than the names of the features in the featurelist used by this model. This method will return the raw features that must be supplied in order for predictions to be generated on a new set of data. The featurelist, in contrast, would also include the names of derived features.

Returns:
features : list of str

The names of the features used in the model.

get_leaderboard_ui_permalink()
Returns:
url : str

Permanent static hyperlink to this model at leaderboard.

get_lift_chart(source, fallback_to_parent_insights=False)

Retrieve model lift chart for the specified source.

Parameters:
source : str

Lift chart data source. Check datarobot.enums.CHART_DATA_SOURCE for possible values.

fallback_to_parent_insights : bool

(New in version v2.14) Optional, if True, this will return lift chart data for this model’s parent if the lift chart is not available for this model and the model has a defined parent model. If omitted or False, or there is no parent model, will not attempt to return insight data from this model’s parent.

Returns:
LiftChart

Model lift chart data

Raises:
ClientError

If the insight is not available for this model

get_missing_report_info()

Retrieve a missing data report for the model, computed on training data, which can be used to understand how missing values were treated in the model. The report consists of missing values reports for the numeric and categorical features that took part in modeling.

Returns:
An iterable of MissingReportPerFeature

The queried model missing report, sorted by missing count (DESCENDING order).

get_model_blueprint_chart()

Retrieve a model blueprint chart that can be used to understand data flow in blueprint.

Returns:
ModelBlueprintChart

The queried model blueprint chart.

get_model_blueprint_documents()

Get documentation for tasks used in this model.

Returns:
list of BlueprintTaskDocument

All documents available for the model.

get_or_request_feature_impact(max_wait=600)

Retrieve feature impact for the model, requesting a job if it hasn’t been run previously

Parameters:
max_wait : int, optional

The maximum time to wait for a requested feature impact job to complete before erroring

Returns:
feature_impacts : list of dict

The feature impact data. See get_feature_impact for the exact schema.

get_parameters()

Retrieve model parameters.

Returns:
ModelParameters

Model parameters for this model.

get_pareto_front()

Retrieve the Pareto Front for a Eureqa model.

This method is only supported for Eureqa models.

Returns:
ParetoFront

Model ParetoFront data

get_prime_eligibility()

Check if this model can be approximated with DataRobot Prime

Returns:
prime_eligibility : dict

a dict indicating whether a model can be approximated with DataRobot Prime (key can_make_prime) and why it may be ineligible (key message)

get_roc_curve(source, fallback_to_parent_insights=False)

Retrieve model ROC curve for the specified source.

Parameters:
source : str

ROC curve data source. Check datarobot.enums.CHART_DATA_SOURCE for possible values.

fallback_to_parent_insights : bool

(New in version v2.14) Optional, if True, this will return ROC curve data for this model’s parent if the ROC curve is not available for this model and the model has a defined parent model. If omitted or False, or there is no parent model, will not attempt to return data from this model’s parent.

Returns:
RocCurve

Model ROC curve data

Raises:
ClientError

If the insight is not available for this model
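
A minimal sketch mirroring the lift chart example above; ClientError is assumed to be importable from datarobot.errors:

import datarobot as dr
from datarobot.errors import ClientError

model = dr.Model.get('p-id', 'l-id')
try:
    roc_curve = model.get_roc_curve(dr.enums.CHART_DATA_SOURCE.VALIDATION)
except ClientError:
    roc_curve = None  # the insight has not been computed for this model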

get_rulesets()

List the rulesets approximating this model generated by DataRobot Prime

If this model hasn’t been approximated yet, will return an empty list. Note that these are rulesets approximating this model, not rulesets used to construct this model.

Returns:
rulesets : list of Ruleset

get_supported_capabilities()

Retrieves a summary of the capabilities supported by a model.

New in version v2.14.

Returns:
supportsBlending: bool

whether the model supports blending

supportsMonotonicConstraints: bool

whether the model supports monotonic constraints

hasWordCloud: bool

whether the model has word cloud data available

eligibleForPrime: bool

whether the model is eligible for Prime

hasParameters: bool

whether the model has parameters that can be retrieved

get_word_cloud(exclude_stop_words=False)

Retrieve word cloud data for the model.

Parameters:
exclude_stop_words : bool, optional

Set to True if you want stop words filtered out of the response.

Returns:
WordCloud

Word cloud data for the model.
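
For example (only meaningful for models trained on text features; ids are placeholders):

import datarobot as dr

model = dr.Model.get('p-id', 'l-id')
word_cloud = model.get_word_cloud(exclude_stop_words=True)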

open_model_browser()

Opens model at project leaderboard in web browser.

Note: If text-mode browsers are used, the calling process will block until the user exits the browser.

request_feature_impact()

Request feature impacts to be computed for the model.

See get_feature_impact for more information on the result of the job.

Returns:
job : Job

A Job representing the feature impact computation. To get the completed feature impact data, use job.get_result or job.get_result_when_complete.

Raises:
JobAlreadyRequested (422)

If the feature impacts have already been requested.
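
A sketch of the request-and-wait pattern (ids are placeholders):

import datarobot as dr

model = dr.Model.get('p-id', 'l-id')
# Raises JobAlreadyRequested (422) if a feature impact job was already queued
job = model.request_feature_impact()
feature_impacts = job.get_result_when_complete()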

request_predictions(dataset_id, include_prediction_intervals=None, prediction_intervals_size=None)

Request predictions against a previously uploaded dataset

Parameters:
dataset_id : string

The dataset to make predictions against (as uploaded from Project.upload_dataset)

include_prediction_intervals : bool, optional

(New in v2.16) For time series projects only. Specifies whether prediction intervals should be calculated for this request. Defaults to True if prediction_intervals_size is specified, otherwise defaults to False.

prediction_intervals_size : int, optional

(New in v2.16) For time series projects only. Represents the percentile to use for the size of the prediction intervals. Defaults to 80 if include_prediction_intervals is True. Prediction intervals size must be between 1 and 100 (inclusive).

Returns:
job : PredictJob

The job computing the predictions
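
A sketch of the full prediction flow, assuming a local CSV of new rows (the file path and ids are placeholders, and dataset.id is assumed to expose the uploaded dataset’s id):

import datarobot as dr

project = dr.Project.get('p-id')
model = dr.Model.get('p-id', 'l-id')
dataset = project.upload_dataset('./rows_to_predict.csv')
predict_job = model.request_predictions(dataset.id)
predictions = predict_job.get_result_when_complete()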

request_training_predictions(data_subset)

Start a job to build training predictions

Parameters:
data_subset : str

data set definition to build predictions on. Choices are:

  • dr.enums.DATA_SUBSET.ALL or string all for all data available. Not valid for
    models in datetime partitioned projects
  • dr.enums.DATA_SUBSET.VALIDATION_AND_HOLDOUT or string validationAndHoldout for
    all data except training set. Not valid for models in datetime partitioned projects
  • dr.enums.DATA_SUBSET.HOLDOUT or string holdout for holdout data set only
  • dr.enums.DATA_SUBSET.ALL_BACKTESTS or string allBacktests for downloading
    the predictions for all backtest validation folds. Requires the model to have successfully scored all backtests. Datetime partitioned projects only.
Returns:
Job

an instance of the created async job
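
For example, building holdout predictions and waiting for the result (ids are placeholders):

import datarobot as dr

model = dr.Model.get('p-id', 'l-id')
job = model.request_training_predictions(dr.enums.DATA_SUBSET.HOLDOUT)
training_predictions = job.get_result_when_complete()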

request_transferable_export()

Request generation of an exportable model file for use in an on-premise DataRobot standalone prediction environment.

This function can only be used if model export is enabled, and will only be useful if you have an on-premise environment in which to import it.

This function does not download the exported file. Use download_export for that.

Examples

model = datarobot.Model.get('p-id', 'l-id')
job = model.request_transferable_export()
job.wait_for_completion()
model.download_export('my_exported_model.drmodel')

# Client must be configured to use standalone prediction server for import:
datarobot.Client(token='my-token-at-standalone-server',
                 endpoint='standalone-server-url/api/v2')

imported_model = datarobot.ImportedModel.create('my_exported_model.drmodel')

set_prediction_threshold(threshold)

Set a custom prediction threshold for the model

May not be used once prediction_threshold_read_only is True for this model.

Parameters:
threshold : float

Only used for binary classification projects. The threshold to use when deciding between the positive and negative classes when making predictions. Should be between 0.0 and 1.0 (inclusive).
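
For example, guarding on the read-only flag described above (ids are placeholders):

import datarobot as dr

model = dr.Model.get('p-id', 'l-id')
if not model.prediction_threshold_read_only:
    model.set_prediction_threshold(0.65)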

star_model()

Mark the model as starred

Model stars propagate to the web application and the API, and can be used to filter when listing models.

start_advanced_tuning_session()

Start an Advanced Tuning session. Returns an object that helps set up arguments for an Advanced Tuning model execution.

As of v2.17, all models other than blenders, open source, prime, scaleout, baseline and user-created support Advanced Tuning.

Returns:
AdvancedTuningSession

Session for setting up and running Advanced Tuning on a model

unstar_model()

Unmark the model as starred

Model stars propagate to the web application and the API, and can be used to filter when listing models.

BlenderModel

class datarobot.models.BlenderModel(id=None, processes=None, featurelist_name=None, featurelist_id=None, project_id=None, sample_pct=None, training_row_count=None, training_duration=None, training_start_date=None, training_end_date=None, model_type=None, model_category=None, is_frozen=None, blueprint_id=None, metrics=None, model_ids=None, blender_method=None, monotonic_increasing_featurelist_id=None, monotonic_decreasing_featurelist_id=None, supports_monotonic_constraints=None, is_starred=None, prediction_threshold=None, prediction_threshold_read_only=None)

Blender model that combines prediction results from other models.

Attributes:
id : str

the id of the model

project_id : str

the id of the project the model belongs to

processes : list of str

the processes used by the model

featurelist_name : str

the name of the featurelist used by the model

featurelist_id : str

the id of the featurelist used by the model

sample_pct : float

the percentage of the project dataset used in training the model

training_row_count : int or None

the number of rows of the project dataset used in training the model. In a datetime partitioned project, if specified, defines the number of rows used to train the model and evaluate backtest scores; if unspecified, either training_duration or training_start_date and training_end_date was used to determine that instead.

training_duration : str or None

only present for models in datetime partitioned projects. If specified, a duration string specifying the duration spanned by the data used to train the model and evaluate backtest scores.

training_start_date : datetime or None

only present for frozen models in datetime partitioned projects. If specified, the start date of the data used to train the model.

training_end_date : datetime or None

only present for frozen models in datetime partitioned projects. If specified, the end date of the data used to train the model.

model_type : str

what model this is, e.g. ‘AVG Blender’

model_category : str

what kind of model this is - always ‘blend’ for blender models

is_frozen : bool

whether this model is a frozen model

blueprint_id : str

the id of the blueprint used in this model

metrics : dict

a mapping from each metric to the model’s scores for that metric

model_ids : list of str

List of model ids used in blender

blender_method : str

Method used to blend results from underlying models

monotonic_increasing_featurelist_id : str

optional, the id of the featurelist that defines the set of features with a monotonically increasing relationship to the target. If None, no such constraints are enforced.

monotonic_decreasing_featurelist_id : str

optional, the id of the featurelist that defines the set of features with a monotonically decreasing relationship to the target. If None, no such constraints are enforced.

supports_monotonic_constraints : bool

optional, whether this model supports enforcing monotonic constraints

is_starred : bool

whether this model is marked as starred

prediction_threshold : float

for binary classification projects, the threshold used for predictions

prediction_threshold_read_only : bool

indicates whether modification of the prediction threshold is forbidden. Threshold modification is forbidden once a model has had a deployment created or predictions made via the dedicated prediction API.

classmethod get(project_id, model_id)

Retrieve a specific blender.

Parameters:
project_id : str

The project’s id.

model_id : str

The model_id of the leaderboard item to retrieve.

Returns:
model : BlenderModel

The queried instance.

advanced_tune(params, description=None)

Generate a new model with the specified advanced-tuning parameters

As of v2.17, all models other than blenders, open source, prime, scaleout, baseline and user-created support Advanced Tuning.

Parameters:
params : dict

Mapping of parameter ID to parameter value. The list of valid parameter IDs for a model can be found by calling get_advanced_tuning_parameters(). This endpoint does not need to include values for all parameters. If a parameter is omitted, its current_value will be used.

description : unicode

Human-readable string describing the newly advanced-tuned model

Returns:
ModelJob

The created job to build the model

cross_validate()

Run Cross Validation on this model.

Note

To perform Cross Validation on a new model with new parameters, use train instead.

Returns:
ModelJob

The created job to build the model

delete()

Delete a model from the project’s leaderboard.

download_export(filepath)

Download an exportable model file for use in an on-premise DataRobot standalone prediction environment.

This function can only be used if model export is enabled, and will only be useful if you have an on-premise environment in which to import it.

Parameters:
filepath : str

The path at which to save the exported model file.

download_scoring_code(file_name, source_code=False)

Download scoring code JAR.

Parameters:
file_name : str

File path where scoring code will be saved.

source_code : bool, optional

Set to True to download source code archive. It will not be executable.

classmethod fetch_resource_data(url, join_endpoint=True)

(Deprecated.) Used to acquire model data directly from its url.

Consider using get instead, as this is a convenience function used for development of datarobot

Parameters:
url : string

The resource we are acquiring

join_endpoint : boolean, optional

Whether the client’s endpoint should be joined to the URL before sending the request. Location headers are returned as absolute locations, so they will not need the endpoint.

Returns:
model_data : dict

The queried model’s data

get_advanced_tuning_parameters()

Get the advanced-tuning parameters available for this model.

As of v2.17, all models other than blenders, open source, prime, scaleout, baseline and user-created support Advanced Tuning.

Returns:
dict

A dictionary describing the advanced-tuning parameters for the current model. There are two top-level keys, tuningDescription and tuningParameters.

tuningDescription is an optional value. If not None, it indicates the user-specified description of this set of tuning parameters.

tuningParameters is a list of dicts, each of which has the following keys

  • parameterName : (unicode) name of the parameter (unique per task, see below)
  • parameterId : (unicode) opaque ID string uniquely identifying parameter
  • defaultValue : (*) default value of the parameter for the blueprint
  • currentValue : (*) value of the parameter that was used for this model
  • taskName : (unicode) name of the task that this parameter belongs to
  • constraints: (dict) see the notes below

Notes

The type of defaultValue and currentValue is defined by the constraints structure. It will be a string or numeric Python type.

constraints is a dict with at least one, possibly more, of the following keys. The presence of a key indicates that the parameter may take on the specified type. (If a key is absent, this means that the parameter may not take on the specified type.) If a key on constraints is present, its value will be a dict containing all of the fields described below for that key.

"constraints": {
    "select": {
        "values": [<list(basestring or number) : possible values>]
    },
    "ascii": {},
    "unicode": {},
    "int": {
        "min": <int : minimum valid value>,
        "max": <int : maximum valid value>,
        "supports_grid_search": <bool : True if Grid Search may be
                                        requested for this param>
    },
    "float": {
        "min": <float : minimum valid value>,
        "max": <float : maximum valid value>,
        "supports_grid_search": <bool : True if Grid Search may be
                                        requested for this param>
    },
    "intList": {
        "length": {
        "min_length": <int : minimum valid length>,
        "max_length": <int : maximum valid length>
        "min_val": <int : minimum valid value>,
        "max_val": <int : maximum valid value>
        "supports_grid_search": <bool : True if Grid Search may be
                                        requested for this param>
    },
    "floatList": {
        "min_length": <int : minimum valid length>,
        "max_length": <int : maximum valid length>
        "min_val": <float : minimum valid value>,
        "max_val": <float : maximum valid value>
        "supports_grid_search": <bool : True if Grid Search may be
                                        requested for this param>
    }
}

The keys have meaning as follows:

  • select: Rather than specifying a specific data type, if present, it indicates that the parameter is permitted to take on any of the specified values. Listed values may be of any string or real (non-complex) numeric type.
  • ascii: The parameter may be a unicode object that encodes simple ASCII characters. (A-Z, a-z, 0-9, whitespace, and certain common symbols.) In addition to listed constraints, ASCII keys currently may not contain either newlines or semicolons.
  • unicode: The parameter may be any Python unicode object.
  • int: The value may be an object of type int within the specified range (inclusive). Please note that the value will be passed around using the JSON format, and some JSON parsers have undefined behavior with integers outside of the range [-(2**53)+1, (2**53)-1].
  • float: The value may be an object of type float within the specified range (inclusive).
  • intList, floatList: The value may be a list of int or float objects, respectively, following constraints as specified respectively by the int and float types (above).

Many parameters only specify one key under constraints. If a parameter specifies multiple keys, the parameter may take on any value permitted by any key.

get_all_confusion_charts(fallback_to_parent_insights=False)

Retrieve a list of all confusion charts available for the model.

Parameters:
fallback_to_parent_insights : bool

(New in version v2.14) Optional, if True, this will return confusion chart data for this model’s parent for any source that is not available for this model and if this model has a defined parent model. If omitted or False, or this model has no parent, this will not attempt to retrieve any data from this model’s parent.

Returns:
list of ConfusionChart

Data for all available confusion charts for model.

get_all_lift_charts(fallback_to_parent_insights=False)

Retrieve a list of all lift charts available for the model.

Parameters:
fallback_to_parent_insights : bool

(New in version v2.14) Optional, if True, this will return lift chart data for this model’s parent for any source that is not available for this model and if this model has a defined parent model. If omitted or False, or this model has no parent, this will not attempt to retrieve any data from this model’s parent.

Returns:
list of LiftChart

Data for all available model lift charts.

get_all_roc_curves(fallback_to_parent_insights=False)

Retrieve a list of all ROC curves available for the model.

Parameters:
fallback_to_parent_insights : bool

(New in version v2.14) Optional, if True, this will return ROC curve data for this model’s parent for any source that is not available for this model and if this model has a defined parent model. If omitted or False, or this model has no parent, this will not attempt to retrieve any data from this model’s parent.

Returns:
list of RocCurve

Data for all available model ROC curves.

get_confusion_chart(source, fallback_to_parent_insights=False)

Retrieve model’s confusion chart for the specified source.

Parameters:
source : str

Confusion chart source. Check datarobot.enums.CHART_DATA_SOURCE for possible values.

fallback_to_parent_insights : bool

(New in version v2.14) Optional, if True, this will return confusion chart data for this model’s parent if the confusion chart is not available for this model and the model has a defined parent model. If omitted or False, or there is no parent model, will not attempt to return insight data from this model’s parent.

Returns:
ConfusionChart

Model ConfusionChart data

Raises:
ClientError

If the insight is not available for this model

get_cross_validation_scores(partition=None, metric=None)

Returns a dictionary keyed by metric showing cross validation scores per partition.

Cross Validation should already have been performed using cross_validate or train.

Note

Models that computed cross validation before this feature was added will need to be deleted and retrained before this method can be used.

Parameters:
partition : float

optional, the id of the partition to filter results by (e.g. 1, 2, 3.0, 4.0). May be given as a positive whole-number integer or as a float value.

metric: unicode

optional, the name of the metric to filter the resulting cross validation scores by

Returns:
cross_validation_scores: dict

A dictionary keyed by metric showing cross validation scores per partition.
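
A sketch of running cross validation and then reading the scores back, assuming the project uses cross-validation partitioning (‘RMSE’ stands in for whatever metric the project actually uses; ids are placeholders):

import datarobot as dr

model = dr.BlenderModel.get('p-id', 'l-id')
cv_job = model.cross_validate()
cv_job.wait_for_completion()
scores = model.get_cross_validation_scores(metric='RMSE')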

get_feature_impact()

Retrieve the computed Feature Impact results, a measure of the relevance of each feature in the model.

Feature Impact is computed for each column by creating new data with that column randomly permuted (but the others left unchanged), and seeing how the error metric score for the predictions is affected. The ‘impactUnnormalized’ is how much worse the error metric score is when making predictions on this modified data. The ‘impactNormalized’ is normalized so that the largest value is 1. In both cases, larger values indicate more important features.

If a feature is redundant, i.e. it contributes little additional information once other features are considered, the ‘redundantWith’ value is the name of the feature that has the highest correlation with this feature. Note that redundancy detection is only available for jobs run after the addition of this feature. When retrieving data that predates this functionality, a NoRedundancyImpactAvailable warning will be issued.

Elsewhere this technique is sometimes called ‘Permutation Importance’.

Requires that Feature Impact has already been computed with request_feature_impact.

Returns:
feature_impacts : list of dict

The feature impact data. Each item is a dict with the keys ‘featureName’, ‘impactNormalized’, ‘impactUnnormalized’, and ‘redundantWith’.

Raises:
ClientError (404)

If the feature impacts have not been computed.

get_features_used()

Query the server to determine which features were used.

Note that the data returned by this method is possibly different than the names of the features in the featurelist used by this model. This method will return the raw features that must be supplied in order for predictions to be generated on a new set of data. The featurelist, in contrast, would also include the names of derived features.

Returns:
features : list of str

The names of the features used in the model.

get_leaderboard_ui_permalink()

Returns:
url : str

Permanent static hyperlink to this model at leaderboard.

get_lift_chart(source, fallback_to_parent_insights=False)

Retrieve model lift chart for the specified source.

Parameters:
source : str

Lift chart data source. Check datarobot.enums.CHART_DATA_SOURCE for possible values.

fallback_to_parent_insights : bool

(New in version v2.14) Optional, if True, this will return lift chart data for this model’s parent if the lift chart is not available for this model and the model has a defined parent model. If omitted or False, or there is no parent model, will not attempt to return insight data from this model’s parent.

Returns:
LiftChart

Model lift chart data

Raises:
ClientError

If the insight is not available for this model

get_missing_report_info()

Retrieve the missing data report for this model’s training data, which can be used to understand how missing values were treated during modeling. The report consists of missing value reports for the numeric and categorical features that took part in modeling.

Returns:
An iterable of MissingReportPerFeature

The queried model missing report, sorted by missing count (DESCENDING order).

get_model_blueprint_chart()

Retrieve a model blueprint chart that can be used to understand data flow in blueprint.

Returns:
ModelBlueprintChart

The queried model blueprint chart.

get_model_blueprint_documents()

Get documentation for tasks used in this model.

Returns:
list of BlueprintTaskDocument

All documents available for the model.

get_or_request_feature_impact(max_wait=600)

Retrieve feature impact for the model, requesting a job if it hasn’t been run previously

Parameters:
max_wait : int, optional

The maximum time, in seconds, to wait for a requested feature impact job to complete before erroring

Returns:
feature_impacts : list of dict

The feature impact data. See get_feature_impact for the exact schema.

get_parameters()

Retrieve model parameters.

Returns:
ModelParameters

Model parameters for this model.

get_pareto_front()

Retrieve the Pareto Front for a Eureqa model.

This method is only supported for Eureqa models.

Returns:
ParetoFront

Model ParetoFront data

get_prime_eligibility()

Check if this model can be approximated with DataRobot Prime

Returns:
prime_eligibility : dict

a dict indicating whether a model can be approximated with DataRobot Prime (key can_make_prime) and why it may be ineligible (key message)

get_roc_curve(source, fallback_to_parent_insights=False)

Retrieve model ROC curve for the specified source.

Parameters:
source : str

ROC curve data source. Check datarobot.enums.CHART_DATA_SOURCE for possible values.

fallback_to_parent_insights : bool

(New in version v2.14) Optional, if True, this will return ROC curve data for this model’s parent if the ROC curve is not available for this model and the model has a defined parent model. If omitted or False, or there is no parent model, will not attempt to return data from this model’s parent.

Returns:
RocCurve

Model ROC curve data

Raises:
ClientError

If the insight is not available for this model

get_rulesets()

List the rulesets approximating this model generated by DataRobot Prime

If this model hasn’t been approximated yet, will return an empty list. Note that these are rulesets approximating this model, not rulesets used to construct this model.

Returns:
rulesets : list of Ruleset

get_supported_capabilities()

Retrieves a summary of the capabilities supported by a model.

New in version v2.14.

Returns:
supportsBlending: bool

whether the model supports blending

supportsMonotonicConstraints: bool

whether the model supports monotonic constraints

hasWordCloud: bool

whether the model has word cloud data available

eligibleForPrime: bool

whether the model is eligible for Prime

hasParameters: bool

whether the model has parameters that can be retrieved

get_word_cloud(exclude_stop_words=False)

Retrieve word cloud data for the model.

Parameters:
exclude_stop_words : bool, optional

Set to True if you want stop words filtered out of the response.

Returns:
WordCloud

Word cloud data for the model.

open_model_browser()

Opens model at project leaderboard in web browser.

Note: If text-mode browsers are used, the calling process will block until the user exits the browser.

request_approximation()

Request an approximation of this model using DataRobot Prime

This will create several rulesets that could be used to approximate this model. After comparing their scores and rule counts, the code used in the approximation can be downloaded and run locally.

Returns:
job : Job

the job generating the rulesets
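
A sketch combining the eligibility check, the approximation request, and retrieval of the resulting rulesets (ids are placeholders):

import datarobot as dr

model = dr.BlenderModel.get('p-id', 'l-id')
if model.get_prime_eligibility()['can_make_prime']:
    job = model.request_approximation()
    job.wait_for_completion()
    rulesets = model.get_rulesets()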

request_feature_impact()

Request feature impacts to be computed for the model.

See get_feature_impact for more information on the result of the job.

Returns:
job : Job

A Job representing the feature impact computation. To get the completed feature impact data, use job.get_result or job.get_result_when_complete.

Raises:
JobAlreadyRequested (422)

If the feature impacts have already been requested.

request_frozen_datetime_model(training_row_count=None, training_duration=None, training_start_date=None, training_end_date=None, time_window_sample_pct=None)

Train a new frozen model with parameters from this model

Requires that this model belongs to a datetime partitioned project. If it does not, an error will occur when submitting the job.

Frozen models use the same tuning parameters as their parent model instead of independently optimizing them to allow efficiently retraining models on larger amounts of the training data.

In addition to training_row_count and training_duration, frozen datetime models may be trained on an exact date range. Only one of training_row_count, training_duration, or training_start_date and training_end_date should be specified.

Models specified using training_start_date and training_end_date are the only ones that can be trained into the holdout data (once the holdout is unlocked).

Parameters:
training_row_count : int, optional

the number of rows of data that should be used to train the model. If specified, training_duration may not be specified.

training_duration : str, optional

a duration string specifying what time range the data used to train the model should span. If specified, training_row_count may not be specified.

training_start_date : datetime.datetime, optional

the start date of the data to train the model on. Only rows occurring at or after this datetime will be used. If training_start_date is specified, training_end_date must also be specified.

training_end_date : datetime.datetime, optional

the end date of the data to train the model on. Only rows occurring strictly before this datetime will be used. If training_end_date is specified, training_start_date must also be specified.

time_window_sample_pct : int, optional

may only be specified when the requested model is a time window (e.g. duration or start and end dates). An integer between 1 and 99 indicating the percentage to sample by within the window. The points kept are determined by a random uniform sample.

Returns:
model_job : ModelJob

the modeling job training a frozen model
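
For example, retraining a frozen model on the most recent six months of data (‘P6M’ is an example duration string; the appropriate value depends on the project, and ids are placeholders):

import datarobot as dr

model = dr.BlenderModel.get('p-id', 'l-id')
model_job = model.request_frozen_datetime_model(training_duration='P6M')
frozen_model = model_job.get_result_when_complete()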

request_frozen_model(sample_pct=None, training_row_count=None)

Train a new frozen model with parameters from this model

Note

This method only works if the project the model belongs to is not datetime partitioned. If it is, use request_frozen_datetime_model instead.

Frozen models use the same tuning parameters as their parent model instead of independently optimizing them to allow efficiently retraining models on larger amounts of the training data.

Parameters:
sample_pct : float

optional, the percentage of the dataset to use with the model. If not provided, will use the value from this model.

training_row_count : int

(New in version v2.9) optional, the integer number of rows of the dataset to use with the model. Only one of sample_pct and training_row_count should be specified.

Returns:
model_job : ModelJob

the modeling job training a frozen model

request_predictions(dataset_id, include_prediction_intervals=None, prediction_intervals_size=None)

Request predictions against a previously uploaded dataset

Parameters:
dataset_id : string

The dataset to make predictions against (as uploaded from Project.upload_dataset)

include_prediction_intervals : bool, optional

(New in v2.16) For time series projects only. Specifies whether prediction intervals should be calculated for this request. Defaults to True if prediction_intervals_size is specified, otherwise defaults to False.

prediction_intervals_size : int, optional

(New in v2.16) For time series projects only. Represents the percentile to use for the size of the prediction intervals. Defaults to 80 if include_prediction_intervals is True. Prediction intervals size must be between 1 and 100 (inclusive).

Returns:
job : PredictJob

The job computing the predictions

request_training_predictions(data_subset)

Start a job to build training predictions

Parameters:
data_subset : str

data set definition to build predictions on. Choices are:

  • dr.enums.DATA_SUBSET.ALL or string all for all data available. Not valid for
    models in datetime partitioned projects
  • dr.enums.DATA_SUBSET.VALIDATION_AND_HOLDOUT or string validationAndHoldout for
    all data except training set. Not valid for models in datetime partitioned projects
  • dr.enums.DATA_SUBSET.HOLDOUT or string holdout for holdout data set only
  • dr.enums.DATA_SUBSET.ALL_BACKTESTS or string allBacktests for downloading
    the predictions for all backtest validation folds. Requires the model to have successfully scored all backtests. Datetime partitioned projects only.
Returns:
Job

an instance of the created async job

request_transferable_export()

Request generation of an exportable model file for use in an on-premise DataRobot standalone prediction environment.

This function can only be used if model export is enabled, and will only be useful if you have an on-premise environment in which to import it.

This function does not download the exported file. Use download_export for that.

Examples

model = datarobot.Model.get('p-id', 'l-id')
job = model.request_transferable_export()
job.wait_for_completion()
model.download_export('my_exported_model.drmodel')

# Client must be configured to use standalone prediction server for import:
datarobot.Client(token='my-token-at-standalone-server',
                 endpoint='standalone-server-url/api/v2')

imported_model = datarobot.ImportedModel.create('my_exported_model.drmodel')

set_prediction_threshold(threshold)

Set a custom prediction threshold for the model

May not be used once prediction_threshold_read_only is True for this model.

Parameters:
threshold : float

Only used for binary classification projects. The threshold to use when deciding between the positive and negative classes when making predictions. Should be between 0.0 and 1.0 (inclusive).

star_model()

Mark the model as starred

Model stars propagate to the web application and the API, and can be used to filter when listing models.

start_advanced_tuning_session()

Start an Advanced Tuning session. Returns an object that helps set up arguments for an Advanced Tuning model execution.

As of v2.17, all models other than blenders, open source, prime, scaleout, baseline and user-created support Advanced Tuning.

Returns:
AdvancedTuningSession

Session for setting up and running Advanced Tuning on a model

train(sample_pct=None, featurelist_id=None, scoring_type=None, training_row_count=None, monotonic_increasing_featurelist_id=<object object>, monotonic_decreasing_featurelist_id=<object object>)

Train the blueprint used in model on a particular featurelist or amount of data.

This method creates a new training job for worker and appends it to the end of the queue for this project. After the job has finished you can get the newly trained model by retrieving it from the project leaderboard, or by retrieving the result of the job.

Either sample_pct or training_row_count can be used to specify the amount of data to use, but not both. If neither are specified, a default of the maximum amount of data that can safely be used to train any blueprint without going into the validation data will be selected.

In smart-sampled projects, sample_pct and training_row_count are assumed to be in terms of rows of the minority class.

Note

For datetime partitioned projects, use train_datetime instead.

Parameters:
sample_pct : float, optional

The amount of data to use for training, as a percentage of the project dataset from 0 to 100.

featurelist_id : str, optional

The identifier of the featurelist to use. If not defined, the featurelist of this model is used.

scoring_type : str, optional

Either SCORING_TYPE.validation or SCORING_TYPE.cross_validation. SCORING_TYPE.validation is available for every partitioning type, and indicates that the default model validation should be used for the project. If the project uses a form of cross-validation partitioning, SCORING_TYPE.cross_validation can also be used to indicate that all of the available training/validation combinations should be used to evaluate the model.

training_row_count : int, optional

The number of rows to use to train the requested model.

monotonic_increasing_featurelist_id : str

(new in version 2.11) optional, the id of the featurelist that defines the set of features with a monotonically increasing relationship to the target. Passing None disables increasing monotonicity constraint. Default (dr.enums.MONOTONICITY_FEATURELIST_DEFAULT) is the one specified by the blueprint.

monotonic_decreasing_featurelist_id : str

(new in version 2.11) optional, the id of the featurelist that defines the set of features with a monotonically decreasing relationship to the target. Passing None disables decreasing monotonicity constraint. Default (dr.enums.MONOTONICITY_FEATURELIST_DEFAULT) is the one specified by the blueprint.

Returns:
model_job_id : str

id of created job, can be used as parameter to ModelJob.get method or wait_for_async_model_creation function

Examples

project = Project.get('p-id')
model = Model.get('p-id', 'l-id')
model_job_id = model.train(training_row_count=project.max_train_rows)

train_datetime(featurelist_id=None, training_row_count=None, training_duration=None, time_window_sample_pct=None)

Train this model on a different featurelist or amount of data

Requires that this model is part of a datetime partitioned project; otherwise, an error will occur.

Parameters:
featurelist_id : str, optional

the featurelist to use to train the model. If not specified, the featurelist of this model is used.

training_row_count : int, optional

the number of rows of data that should be used to train the model. If specified, training_duration may not be specified.

training_duration : str, optional

a duration string specifying what time range the data used to train the model should span. If specified, training_row_count may not be specified.

time_window_sample_pct : int, optional

may only be specified when the requested model is a time window (e.g. duration or start and end dates). An integer between 1 and 99 indicating the percentage to sample by within the window. The points kept are determined by a random uniform sample.

Returns:
job : ModelJob

the created job to build the model
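
A sketch of retraining on a different featurelist with within-window sampling (the featurelist id, duration, and other ids below are placeholders):

import datarobot as dr

model = dr.BlenderModel.get('p-id', 'l-id')
job = model.train_datetime(
    featurelist_id='featurelist-id',
    training_duration='P3M',
    time_window_sample_pct=50,
)
new_model = job.get_result_when_complete()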

unstar_model()

Unmark the model as starred

Model stars propagate to the web application and the API, and can be used to filter when listing models.

DatetimeModel

class datarobot.models.DatetimeModel(id=None, processes=None, featurelist_name=None, featurelist_id=None, project_id=None, sample_pct=None, training_row_count=None, training_duration=None, training_start_date=None, training_end_date=None, time_window_sample_pct=None, model_type=None, model_category=None, is_frozen=None, blueprint_id=None, metrics=None, training_info=None, holdout_score=None, holdout_status=None, data_selection_method=None, backtests=None, monotonic_increasing_featurelist_id=None, monotonic_decreasing_featurelist_id=None, supports_monotonic_constraints=None, is_starred=None, prediction_threshold=None, prediction_threshold_read_only=None, effective_feature_derivation_window_start=None, effective_feature_derivation_window_end=None, forecast_window_start=None, forecast_window_end=None, windows_basis_unit=None)

A model from a datetime partitioned project

Only one of training_row_count, training_duration, and training_start_date and training_end_date will be specified, depending on the data_selection_method of the model. Whichever method was selected determines the amount of data used to train on when making predictions and scoring the backtests and the holdout.

Attributes:
id : str

the id of the model

project_id : str

the id of the project the model belongs to

processes : list of str

the processes used by the model

featurelist_name : str

the name of the featurelist used by the model

featurelist_id : str

the id of the featurelist used by the model

sample_pct : float

the percentage of the project dataset used in training the model

training_row_count : int or None

If specified, an int specifying the number of rows used to train the model and evaluate backtest scores.

training_duration : str or None

If specified, a duration string specifying the duration spanned by the data used to train the model and evaluate backtest scores.

training_start_date : datetime or None

only present for frozen models in datetime partitioned projects. If specified, the start date of the data used to train the model.

training_end_date : datetime or None

only present for frozen models in datetime partitioned projects. If specified, the end date of the data used to train the model.

time_window_sample_pct : int or None

An integer between 1 and 99 indicating the percentage of sampling within the training window. The points kept are determined by a random uniform sample. If not specified, no sampling was done.

model_type : str

what model this is, e.g. ‘Nystroem Kernel SVM Regressor’

model_category : str

what kind of model this is - ‘prime’ for DataRobot Prime models, ‘blend’ for blender models, and ‘model’ for other models

is_frozen : bool

whether this model is a frozen model

blueprint_id : str

the id of the blueprint used in this model

metrics : dict

a mapping from each metric to the model’s scores for that metric. The keys in metrics are the different metrics used to evaluate the model, and the values are the results. The dictionaries inside of metrics will be as described here: ‘validation’, the score for a single backtest; ‘crossValidation’, always None; ‘backtesting’, the average score for all backtests if all are available and computed, or None otherwise; ‘backtestingScores’, a list of scores for all backtests where the score is None if that backtest does not have a score available; and ‘holdout’, the score for the holdout or None if the holdout is locked or the score is unavailable.

backtests : list of dict

describes what data was used to fit each backtest, the score for the project metric, and why the backtest score is unavailable if it is not provided.

data_selection_method : str

which of training_row_count, training_duration, or training_start_date and training_end_date were used to determine the data used to fit the model. One of ‘rowCount’, ‘duration’, or ‘selectedDateRange’.

training_info : dict

describes which data was used to train on when scoring the holdout and making predictions. training_info will have the following keys: holdout_training_start_date, holdout_training_duration, holdout_training_row_count, holdout_training_end_date, prediction_training_start_date, prediction_training_duration, prediction_training_row_count, prediction_training_end_date. Start and end dates will be datetimes, durations will be duration strings, and rows will be integers.

holdout_score : float or None

the score against the holdout, if available and the holdout is unlocked, according to the project metric.

holdout_status : string or None

the status of the holdout score, e.g. “COMPLETED”, “HOLDOUT_BOUNDARIES_EXCEEDED”. Unavailable if the holdout fold was disabled in the partitioning configuration.

monotonic_increasing_featurelist_id : str

optional, the id of the featurelist that defines the set of features with a monotonically increasing relationship to the target. If None, no such constraints are enforced.

monotonic_decreasing_featurelist_id : str

optional, the id of the featurelist that defines the set of features with a monotonically decreasing relationship to the target. If None, no such constraints are enforced.

supports_monotonic_constraints : bool

optional, whether this model supports enforcing monotonic constraints

is_starred : bool

whether this model is marked as starred

prediction_threshold : float

for binary classification projects, the threshold used for predictions

prediction_threshold_read_only : bool

indicates whether modification of the prediction threshold is forbidden. Threshold modification is forbidden once a model has had a deployment created or predictions made via the dedicated prediction API.

effective_feature_derivation_window_start : int or None

(New in v2.16) For time series projects only. How many timeUnits into the past relative to the forecast point the user needs to provide history for at prediction time. This can differ from the feature_derivation_window_start set on the project due to the differencing method and period selected, or if the model is a time series native model such as ARIMA. Will be a negative integer in time series projects and None otherwise.

effective_feature_derivation_window_end : int or None

(New in v2.16) For time series projects only. How many timeUnits into the past relative to the forecast point the feature derivation window should end. Will be a non-positive integer in time series projects and None otherwise.

forecast_window_start : int or None

(New in v2.16) For time series projects only. How many timeUnits into the future relative to the forecast point the forecast window should start. Note that this field will be the same as what is shown in the project settings. Will be a non-negative integer in time series projects and None otherwise.

forecast_window_end : int or None

(New in v2.16) For time series projects only. How many timeUnits into the future relative to the forecast point the forecast window should end. Note that this field will be the same as what is shown in the project settings. Will be a non-negative integer in time series projects and None otherwise.

windows_basis_unit : str or None

(New in v2.16) For time series projects only. Indicates which unit is the basis for the feature derivation window and the forecast window. Note that this field will be the same as what is shown in the project settings. In time series projects, will be either the detected time unit or “ROW”, and None otherwise.

classmethod get(project, model_id)

Retrieve a specific datetime model

If the project does not use datetime partitioning, a ClientError will occur.

Parameters:
project : str

the id of the project the model belongs to

model_id : str

the id of the model to retrieve

Returns:
model : DatetimeModel

the model

score_backtests()

Compute the scores for all available backtests

Some backtests may be unavailable if the model is trained into their validation data.

Returns:
job : Job

a job tracking the backtest computation. When it is complete, all available backtests will have scores computed.
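
For example, computing all available backtests and then reading the averaged score from a refreshed model object (‘RMSE’ stands in for the project metric; ids are placeholders):

import datarobot as dr

model = dr.DatetimeModel.get('p-id', 'l-id')
job = model.score_backtests()
job.wait_for_completion()
# Re-fetch so the metrics reflect the newly scored backtests
model = dr.DatetimeModel.get('p-id', 'l-id')
print(model.metrics['RMSE']['backtesting'])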

cross_validate()

Inherited from Model - DatetimeModels cannot request Cross Validation; use score_backtests instead.

get_cross_validation_scores(partition=None, metric=None)

Inherited from Model - DatetimeModels cannot request Cross Validation scores; use backtests instead.

request_training_predictions(data_subset)

Start a job to build training predictions

Parameters:
data_subset : str

data set definition to build predictions on. Choices are:

  • dr.enums.DATA_SUBSET.HOLDOUT for holdout data set only
  • dr.enums.DATA_SUBSET.ALL_BACKTESTS for downloading the predictions for all
    backtest validation folds. Requires the model to have successfully scored all backtests.
Returns:
Job

an instance of the created async job

get_series_accuracy_as_dataframe()

Retrieve the Series Accuracy for the specified model as a pandas.DataFrame.

Returns:
data

A pandas.DataFrame with the Series Accuracy for the specified model.

download_series_accuracy_as_csv(filename, encoding='utf-8')

Save the Series Accuracy for the specified model into a csv file.

Parameters:
filename : str or file object

The path or file object to save the data to.

encoding : str, optional

A string representing the encoding to use in the output csv file. Defaults to ‘utf-8’.

compute_series_accuracy()

Compute the Series Accuracy for this model

Returns:
Job

an instance of the created async job
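
A sketch tying the three Series Accuracy methods together, assuming a multiseries time series project (ids and the output filename are placeholders):

import datarobot as dr

model = dr.DatetimeModel.get('p-id', 'l-id')
job = model.compute_series_accuracy()
job.wait_for_completion()
series_accuracy = model.get_series_accuracy_as_dataframe()
model.download_series_accuracy_as_csv('series_accuracy.csv')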

advanced_tune(params, description=None)

Generate a new model with the specified advanced-tuning parameters

As of v2.17, all models other than blenders, open source, prime, scaleout, baseline and user-created support Advanced Tuning.

Parameters:
params : dict

Mapping of parameter ID to parameter value. The list of valid parameter IDs for a model can be found by calling get_advanced_tuning_parameters(). This endpoint does not need to include values for all parameters. If a parameter is omitted, its current_value will be used.

description : unicode

Human-readable string describing the newly advanced-tuned model

Returns:
ModelJob

The created job to build the model
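
A sketch of the tuning flow: inspect the available parameters, then submit an override for a single parameter id (the chosen parameter id and value are placeholders; omitted parameters keep their current values):

import datarobot as dr

model = dr.DatetimeModel.get('p-id', 'l-id')
tuning = model.get_advanced_tuning_parameters()
for param in tuning['tuningParameters']:
    print(param['parameterId'], param['parameterName'], param['currentValue'])
model_job = model.advanced_tune(
    {'parameter-id-to-change': 123}, description='tuned via API'
)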

delete()

Delete a model from the project’s leaderboard.

download_export(filepath)

Download an exportable model file for use in an on-premise DataRobot standalone prediction environment.

This function can only be used if model export is enabled, and will only be useful if you have an on-premise environment in which to import it.

Parameters:
filepath : str

The path at which to save the exported model file.

download_scoring_code(file_name, source_code=False)

Download scoring code JAR.

Parameters:
file_name : str

File path where scoring code will be saved.

source_code : bool, optional

Set to True to download source code archive. It will not be executable.

classmethod fetch_resource_data(url, join_endpoint=True)

(Deprecated.) Used to acquire model data directly from its url.

Consider using get instead, as this is a convenience function used for development of datarobot

Parameters:
url : string

The resource we are acquiring

join_endpoint : boolean, optional

Whether the client’s endpoint should be joined to the URL before sending the request. Location headers are returned as absolute locations, so they will not need the endpoint.

Returns:
model_data : dict

The queried model’s data

get_advanced_tuning_parameters()

Get the advanced-tuning parameters available for this model.

As of v2.17, all models other than blenders, open source, prime, scaleout, baseline and user-created support Advanced Tuning.

Returns:
dict

A dictionary describing the advanced-tuning parameters for the current model. There are two top-level keys, tuningDescription and tuningParameters.

tuningDescription is an optional value. If not None, it indicates the user-specified description of this set of tuning parameters.

tuningParameters is a list of dicts, each of which has the following keys

  • parameterName : (unicode) name of the parameter (unique per task, see below)
  • parameterId : (unicode) opaque ID string uniquely identifying parameter
  • defaultValue : (*) default value of the parameter for the blueprint
  • currentValue : (*) value of the parameter that was used for this model
  • taskName : (unicode) name of the task that this parameter belongs to
  • constraints: (dict) see the notes below

Notes

The type of defaultValue and currentValue is defined by the constraints structure. It will be a string or numeric Python type.

constraints is a dict with at least one, possibly more, of the following keys. The presence of a key indicates that the parameter may take on the specified type. (If a key is absent, this means that the parameter may not take on the specified type.) If a key on constraints is present, its value will be a dict containing all of the fields described below for that key.

"constraints": {
    "select": {
        "values": [<list(basestring or number) : possible values>]
    },
    "ascii": {},
    "unicode": {},
    "int": {
        "min": <int : minimum valid value>,
        "max": <int : maximum valid value>,
        "supports_grid_search": <bool : True if Grid Search may be
                                        requested for this param>
    },
    "float": {
        "min": <float : minimum valid value>,
        "max": <float : maximum valid value>,
        "supports_grid_search": <bool : True if Grid Search may be
                                        requested for this param>
    },
    "intList": {
        "length": {
        "min_length": <int : minimum valid length>,
        "max_length": <int : maximum valid length>
        "min_val": <int : minimum valid value>,
        "max_val": <int : maximum valid value>
        "supports_grid_search": <bool : True if Grid Search may be
                                        requested for this param>
    },
    "floatList": {
        "min_length": <int : minimum valid length>,
        "max_length": <int : maximum valid length>
        "min_val": <float : minimum valid value>,
        "max_val": <float : maximum valid value>
        "supports_grid_search": <bool : True if Grid Search may be
                                        requested for this param>
    }
}

The keys have meaning as follows:

  • select: Rather than specifying a specific data type, if present, it indicates that the parameter is permitted to take on any of the specified values. Listed values may be of any string or real (non-complex) numeric type.
  • ascii: The parameter may be a unicode object that encodes simple ASCII characters. (A-Z, a-z, 0-9, whitespace, and certain common symbols.) In addition to listed constraints, ASCII keys currently may not contain either newlines or semicolons.
  • unicode: The parameter may be any Python unicode object.
  • int: The value may be an object of type int within the specified range (inclusive). Please note that the value will be passed around using the JSON format, and some JSON parsers have undefined behavior with integers outside of the range [-(2**53)+1, (2**53)-1].
  • float: The value may be an object of type float within the specified range (inclusive).
  • intList, floatList: The value may be a list of int or float objects, respectively, following constraints as specified respectively by the int and float types (above).

Many parameters only specify one key under constraints. If a parameter specifies multiple keys, the parameter may take on any value permitted by any key.

get_all_confusion_charts(fallback_to_parent_insights=False)

Retrieve a list of all confusion charts available for the model.

Parameters:
fallback_to_parent_insights : bool

(New in version v2.14) Optional, if True, this will return confusion chart data for this model’s parent for any source that is not available for this model and if this model has a defined parent model. If omitted or False, or this model has no parent, this will not attempt to retrieve any data from this model’s parent.

Returns:
list of ConfusionChart

Data for all available confusion charts for model.

get_all_lift_charts(fallback_to_parent_insights=False)

Retrieve a list of all lift charts available for the model.

Parameters:
fallback_to_parent_insights : bool

(New in version v2.14) Optional, if True, this will return lift chart data for this model’s parent for any source that is not available for this model and if this model has a defined parent model. If omitted or False, or this model has no parent, this will not attempt to retrieve any data from this model’s parent.

Returns:
list of LiftChart

Data for all available model lift charts.

get_all_roc_curves(fallback_to_parent_insights=False)

Retrieve a list of all ROC curves available for the model.

Parameters:
fallback_to_parent_insights : bool

(New in version v2.14) Optional, if True, this will return ROC curve data for this model’s parent for any source that is not available for this model and if this model has a defined parent model. If omitted or False, or this model has no parent, this will not attempt to retrieve any data from this model’s parent.

Returns:
list of RocCurve

Data for all available model ROC curves.

get_confusion_chart(source, fallback_to_parent_insights=False)

Retrieve model’s confusion chart for the specified source.

Parameters:
source : str

Confusion chart source. Check datarobot.enums.CHART_DATA_SOURCE for possible values.

fallback_to_parent_insights : bool

(New in version v2.14) Optional, if True, this will return confusion chart data for this model’s parent if the confusion chart is not available for this model and the model has a defined parent model. If omitted or False, or there is no parent model, will not attempt to return insight data from this model’s parent.

Returns:
ConfusionChart

Model ConfusionChart data

Raises:
ClientError

If the insight is not available for this model

get_feature_impact()

Retrieve the computed Feature Impact results, a measure of the relevance of each feature in the model.

Feature Impact is computed for each column by creating new data with that column randomly permuted (but the others left unchanged), and seeing how the error metric score for the predictions is affected. The ‘impactUnnormalized’ is how much worse the error metric score is when making predictions on this modified data. The ‘impactNormalized’ is normalized so that the largest value is 1. In both cases, larger values indicate more important features.

If a feature is redundant, i.e. it contributes little additional information once other features are considered, the ‘redundantWith’ value is the name of the feature that has the highest correlation with this feature. Note that redundancy detection is only available for jobs run after the addition of this feature. When retrieving data that predates this functionality, a NoRedundancyImpactAvailable warning will be issued.

Elsewhere this technique is sometimes called ‘Permutation Importance’.

Requires that Feature Impact has already been computed with request_feature_impact.

Returns:
feature_impacts : list of dict

The feature impact data. Each item is a dict with the keys ‘featureName’, ‘impactNormalized’, ‘impactUnnormalized’, and ‘redundantWith’.

Raises:
ClientError (404)

If the feature impacts have not been computed.

get_features_used()

Query the server to determine which features were used.

Note that the data returned by this method is possibly different than the names of the features in the featurelist used by this model. This method will return the raw features that must be supplied in order for predictions to be generated on a new set of data. The featurelist, in contrast, would also include the names of derived features.

Returns:
features : list of str

The names of the features used in the model.

get_leaderboard_ui_permalink()

Returns:
url : str

Permanent static hyperlink to this model at leaderboard.

get_lift_chart(source, fallback_to_parent_insights=False)

Retrieve model lift chart for the specified source.

Parameters:
source : str

Lift chart data source. Check datarobot.enums.CHART_DATA_SOURCE for possible values.

fallback_to_parent_insights : bool

(New in version v2.14) Optional, if True, this will return lift chart data for this model’s parent if the lift chart is not available for this model and the model has a defined parent model. If omitted or False, or there is no parent model, will not attempt to return insight data from this model’s parent.

Returns:
LiftChart

Model lift chart data

Raises:
ClientError

If the insight is not available for this model

get_missing_report_info()

Retrieve the missing data report for this model’s training data, which can be used to understand how missing values were treated during modeling. The report consists of missing value reports for the numeric and categorical features that took part in modeling.

Returns:
An iterable of MissingReportPerFeature

The queried model missing report, sorted by missing count (DESCENDING order).

get_model_blueprint_chart()

Retrieve a model blueprint chart that can be used to understand data flow in blueprint.

Returns:
ModelBlueprintChart

The queried model blueprint chart.

get_model_blueprint_documents()

Get documentation for tasks used in this model.

Returns:
list of BlueprintTaskDocument

All documents available for the model.

get_or_request_feature_impact(max_wait=600)

Retrieve feature impact for the model, requesting a job if it hasn’t been run previously

Parameters:
max_wait : int, optional

The maximum time to wait for a requested feature impact job to complete before erroring

Returns:
feature_impacts : list of dict

The feature impact data. See get_feature_impact for the exact schema.

get_parameters()

Retrieve model parameters.

Returns:
ModelParameters

Model parameters for this model.

get_pareto_front()

Retrieve the Pareto Front for a Eureqa model.

This method is only supported for Eureqa models.

Returns:
ParetoFront

Model ParetoFront data

get_prime_eligibility()

Check if this model can be approximated with DataRobot Prime

Returns:
prime_eligibility : dict

a dict indicating whether a model can be approximated with DataRobot Prime (key can_make_prime) and why it may be ineligible (key message)

get_roc_curve(source, fallback_to_parent_insights=False)

Retrieve model ROC curve for the specified source.

Parameters:
source : str

ROC curve data source. Check datarobot.enums.CHART_DATA_SOURCE for possible values.

fallback_to_parent_insights : bool

(New in version v2.14) Optional, if True, this will return ROC curve data for this model’s parent if the ROC curve is not available for this model and the model has a defined parent model. If omitted or False, or there is no parent model, will not attempt to return data from this model’s parent.

Returns:
RocCurve

Model ROC curve data

Raises:
ClientError

If the insight is not available for this model
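
Example (a minimal sketch for a binary classification model; the ids are placeholders, CHART_DATA_SOURCE.VALIDATION is assumed to be one of the available source values, and ClientError is assumed to be importable from datarobot.errors):

import datarobot as dr
from datarobot.enums import CHART_DATA_SOURCE

model = dr.Model.get('project-id', 'model-id')
try:
    roc = model.get_roc_curve(CHART_DATA_SOURCE.VALIDATION)
except dr.errors.ClientError:
    # fall back to the parent model's insight, if this model has a parent
    roc = model.get_roc_curve(CHART_DATA_SOURCE.VALIDATION,
                              fallback_to_parent_insights=True)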

get_rulesets()

List the rulesets approximating this model generated by DataRobot Prime

If this model hasn’t been approximated yet, will return an empty list. Note that these are rulesets approximating this model, not rulesets used to construct this model.

Returns:
rulesets : list of Ruleset

get_supported_capabilities()

Retrieves a summary of the capabilities supported by a model.

New in version v2.14.

Returns:
supportsBlending: bool

whether the model supports blending

supportsMonotonicConstraints: bool

whether the model supports monotonic constraints

hasWordCloud: bool

whether the model has word cloud data available

eligibleForPrime: bool

whether the model is eligible for Prime

hasParameters: bool

whether the model has parameters that can be retrieved

get_word_cloud(exclude_stop_words=False)

Retrieve word cloud data for the model.

Parameters:
exclude_stop_words : bool, optional

Set to True if you want stopwords filtered out of the response.

Returns:
WordCloud

Word cloud data for the model.

open_model_browser()

Opens model at project leaderboard in web browser.

Note: If text-mode browsers are used, the calling process will block until the user exits the browser.

request_approximation()

Request an approximation of this model using DataRobot Prime

This will create several rulesets that could be used to approximate this model. After comparing their scores and rule counts, the code used in the approximation can be downloaded and run locally.

Returns:
job : Job

the job generating the rulesets
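
Example (a minimal sketch of the approximation workflow; the ids are placeholders):

import datarobot as dr

model = dr.Model.get('project-id', 'model-id')
eligibility = model.get_prime_eligibility()
if eligibility['can_make_prime']:
    job = model.request_approximation()
    job.wait_for_completion()
    # rulesets approximating this model, now available for comparison
    for ruleset in model.get_rulesets():
        print(ruleset)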

request_feature_impact()

Request feature impacts to be computed for the model.

See get_feature_impact for more information on the result of the job.

Returns:
job : Job

A Job representing the feature impact computation. To get the completed feature impact data, use job.get_result or job.get_result_when_complete.

Raises:
JobAlreadyRequested (422)

If the feature impacts have already been requested.

request_frozen_datetime_model(training_row_count=None, training_duration=None, training_start_date=None, training_end_date=None, time_window_sample_pct=None)

Train a new frozen model with parameters from this model

Requires that this model belongs to a datetime partitioned project. If it does not, an error will occur when submitting the job.

Frozen models use the same tuning parameters as their parent model instead of independently optimizing them to allow efficiently retraining models on larger amounts of the training data.

In addition to training_row_count and training_duration, frozen datetime models may be trained on an exact date range. Only one of training_row_count, training_duration, or training_start_date and training_end_date should be specified.

Models specified using training_start_date and training_end_date are the only ones that can be trained into the holdout data (once the holdout is unlocked).

Parameters:
training_row_count : int, optional

the number of rows of data that should be used to train the model. If specified, training_duration may not be specified.

training_duration : str, optional

a duration string specifying what time range the data used to train the model should span. If specified, training_row_count may not be specified.

training_start_date : datetime.datetime, optional

the start date of the data to train the model on. Only rows occurring at or after this datetime will be used. If training_start_date is specified, training_end_date must also be specified.

training_end_date : datetime.datetime, optional

the end date of the data to train the model on. Only rows occurring strictly before this datetime will be used. If training_end_date is specified, training_start_date must also be specified.

time_window_sample_pct : int, optional

may only be specified when the requested model is a time window (e.g. duration or start and end dates). An integer between 1 and 99 indicating the percentage to sample by within the window. The points kept are determined by a random uniform sample.

Returns:
model_job : ModelJob

the modeling job training a frozen model
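
Example (a minimal sketch for a datetime partitioned project; the ids and dates are placeholders):

from datetime import datetime

import datarobot as dr

model = dr.Model.get('project-id', 'model-id')
model_job = model.request_frozen_datetime_model(
    training_start_date=datetime(2015, 1, 1),
    training_end_date=datetime(2017, 1, 1))
# the new frozen model, once the job completes
frozen_model = model_job.get_result_when_complete()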

request_predictions(dataset_id, include_prediction_intervals=None, prediction_intervals_size=None)

Request predictions against a previously uploaded dataset

Parameters:
dataset_id : string

The dataset to make predictions against (as uploaded from Project.upload_dataset)

include_prediction_intervals : bool, optional

(New in v2.16) For time series projects only. Specifies whether prediction intervals should be calculated for this request. Defaults to True if prediction_intervals_size is specified, otherwise defaults to False.

prediction_intervals_size : int, optional

(New in v2.16) For time series projects only. Represents the percentile to use for the size of the prediction intervals. Defaults to 80 if include_prediction_intervals is True. Prediction intervals size must be between 1 and 100 (inclusive).

Returns:
job : PredictJob

The job computing the predictions
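
Example (a minimal sketch; the ids and file name are placeholders, and the object returned by Project.upload_dataset is assumed to expose its id via the id attribute):

import datarobot as dr

project = dr.Project.get('project-id')
model = dr.Model.get('project-id', 'model-id')
dataset = project.upload_dataset('to_predict.csv')

predict_job = model.request_predictions(dataset.id)
# a pandas.DataFrame of predictions, once the job completes
predictions = predict_job.get_result_when_complete()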

request_transferable_export()

Request generation of an exportable model file for use in an on-premise DataRobot standalone prediction environment.

This function can only be used if model export is enabled, and will only be useful if you have an on-premise environment in which to import it.

This function does not download the exported file. Use download_export for that.

Examples

model = datarobot.Model.get('p-id', 'l-id')
job = model.request_transferable_export()
job.wait_for_completion()
model.download_export('my_exported_model.drmodel')

# Client must be configured to use standalone prediction server for import:
datarobot.Client(token='my-token-at-standalone-server',
                 endpoint='standalone-server-url/api/v2')

imported_model = datarobot.ImportedModel.create('my_exported_model.drmodel')

set_prediction_threshold(threshold)

Set a custom prediction threshold for the model

May not be used once prediction_threshold_read_only is True for this model.

Parameters:
threshold : float

only used for binary classification projects. The threshold to use when deciding between the positive and negative classes when making predictions. Should be between 0.0 and 1.0 (inclusive).
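
Example (a minimal sketch for a binary classification model; the ids and threshold value are placeholders):

import datarobot as dr

model = dr.Model.get('project-id', 'model-id')
if not model.prediction_threshold_read_only:
    model.set_prediction_threshold(0.4)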

star_model()

Mark the model as starred

Model stars propagate to the web application and the API, and can be used to filter when listing models.

start_advanced_tuning_session()

Start an Advanced Tuning session. Returns an object that helps set up arguments for an Advanced Tuning model execution.

As of v2.17, all models other than blenders, open source, prime, scaleout, baseline and user-created support Advanced Tuning.

Returns:
AdvancedTuningSession

Session for setting up and running Advanced Tuning on a model

train_datetime(featurelist_id=None, training_row_count=None, training_duration=None, time_window_sample_pct=None)

Train this model on a different featurelist or amount of data

Requires that this model is part of a datetime partitioned project; otherwise, an error will occur.

Parameters:
featurelist_id : str, optional

the featurelist to use to train the model. If not specified, the featurelist of this model is used.

training_row_count : int, optional

the number of rows of data that should be used to train the model. If specified, training_duration may not be specified.

training_duration : str, optional

a duration string specifying what time range the data used to train the model should span. If specified, training_row_count may not be specified.

time_window_sample_pct : int, optional

may only be specified when the requested model is a time window (e.g. duration or start and end dates). An integer between 1 and 99 indicating the percentage to sample by within the window. The points kept are determined by a random uniform sample.

Returns:
job : ModelJob

the created job to build the model
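
Example (a minimal sketch for a datetime partitioned project; the ids are placeholders, and the duration string is assumed to follow the ISO 8601 duration format, e.g. 'P6M' for six months):

import datarobot as dr

model = dr.Model.get('project-id', 'model-id')
job = model.train_datetime(training_duration='P6M')
new_model = job.get_result_when_complete()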

unstar_model()

Unmark the model as starred

Model stars propagate to the web application and the API, and can be used to filter when listing models.

Frozen Model

class datarobot.models.FrozenModel(id=None, processes=None, featurelist_name=None, featurelist_id=None, project_id=None, sample_pct=None, training_row_count=None, training_duration=None, training_start_date=None, training_end_date=None, model_type=None, model_category=None, is_frozen=None, blueprint_id=None, metrics=None, parent_model_id=None, monotonic_increasing_featurelist_id=None, monotonic_decreasing_featurelist_id=None, supports_monotonic_constraints=None, is_starred=None, prediction_threshold=None, prediction_threshold_read_only=None)

A model tuned with parameters which are derived from another model

Attributes:
id : str

the id of the model

project_id : str

the id of the project the model belongs to

processes : list of str

the processes used by the model

featurelist_name : str

the name of the featurelist used by the model

featurelist_id : str

the id of the featurelist used by the model

sample_pct : float

the percentage of the project dataset used in training the model

training_row_count : int or None

the number of rows of the project dataset used in training the model. In a datetime partitioned project, if specified, defines the number of rows used to train the model and evaluate backtest scores; if unspecified, either training_duration or training_start_date and training_end_date was used to determine that instead.

training_duration : str or None

only present for models in datetime partitioned projects. If specified, a duration string specifying the duration spanned by the data used to train the model and evaluate backtest scores.

training_start_date : datetime or None

only present for frozen models in datetime partitioned projects. If specified, the start date of the data used to train the model.

training_end_date : datetime or None

only present for frozen models in datetime partitioned projects. If specified, the end date of the data used to train the model.

model_type : str

what model this is, e.g. ‘Nystroem Kernel SVM Regressor’

model_category : str

what kind of model this is - ‘prime’ for DataRobot Prime models, ‘blend’ for blender models, and ‘model’ for other models

is_frozen : bool

whether this model is a frozen model

parent_model_id : str

the id of the model that tuning parameters are derived from

blueprint_id : str

the id of the blueprint used in this model

metrics : dict

a mapping from each metric to the model’s scores for that metric

monotonic_increasing_featurelist_id : str

optional, the id of the featurelist that defines the set of features with a monotonically increasing relationship to the target. If None, no such constraints are enforced.

monotonic_decreasing_featurelist_id : str

optional, the id of the featurelist that defines the set of features with a monotonically decreasing relationship to the target. If None, no such constraints are enforced.

supports_monotonic_constraints : bool

optional, whether this model supports enforcing monotonic constraints

is_starred : bool

whether this model is marked as starred

prediction_threshold : float

for binary classification projects, the threshold used for predictions

prediction_threshold_read_only : bool

indicates whether modification of the prediction threshold is forbidden. Threshold modification is forbidden once a model has had a deployment created or predictions made via the dedicated prediction API.

classmethod get(project_id, model_id)

Retrieve a specific frozen model.

Parameters:
project_id : str

The project’s id.

model_id : str

The model_id of the leaderboard item to retrieve.

Returns:
model : FrozenModel

The queried instance.
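
Example (a minimal sketch; the ids are placeholders):

import datarobot as dr

frozen = dr.models.FrozenModel.get('project-id', 'model-id')
print(frozen.parent_model_id, frozen.training_start_date, frozen.training_end_date)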

Imported Model

Note

Imported Models are used in Stand Alone Scoring Engines. If you are not an administrator of such an engine, they are not relevant to you.

class datarobot.models.ImportedModel(id, imported_at=None, model_id=None, target=None, featurelist_name=None, dataset_name=None, model_name=None, project_id=None, version=None, note=None, origin_url=None, imported_by_username=None, project_name=None, created_by_username=None, created_by_id=None, imported_by_id=None, display_name=None)

Represents an imported model available for making predictions. These are only relevant for administrators of on-premise Stand Alone Scoring Engines.

ImportedModels are trained in one DataRobot application, exported as a .drmodel file, and then imported for use in a Stand Alone Scoring Engine.

Attributes:
id : str

id of the import

model_name : str

model type describing the model generated by DataRobot

display_name : str

manually specified human-readable name of the imported model

note : str

manually added note about this imported model

imported_at : datetime

the time the model was imported

imported_by_username : str

username of the user who imported the model

imported_by_id : str

id of the user who imported the model

origin_url : str

URL of the application the model was exported from

model_id : str

original id of the model prior to export

featurelist_name : str

name of the featurelist used to train the model

project_id : str

id of the project the model belonged to prior to export

project_name : str

name of the project the model belonged to prior to export

target : str

the target of the project the model belonged to prior to export

version : float

project version of the project the model belonged to

dataset_name : str

filename of the dataset used to create the project the model belonged to

created_by_username : str

username of the user who created the model prior to export

created_by_id : str

id of the user who created the model prior to export

classmethod create(path)

Import a previously exported model for predictions.

Parameters:
path : str

The path to the exported model file

classmethod get(import_id)

Retrieve imported model info

Parameters:
import_id : str

The ID of the imported model.

Returns:
imported_model : ImportedModel

The ImportedModel instance

classmethod list(limit=None, offset=None)

List the imported models.

Parameters:
limit : int

The number of records to return. The server will use a (possibly finite) default if not specified.

offset : int

The number of records to skip.

Returns:
imported_models : list[ImportedModel]

update(display_name=None, note=None)

Update the display name or note for an imported model. The ImportedModel object is updated in place.

Parameters:
display_name : str

The new display name.

note : str

The new note.

delete()

Delete this imported model.
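
Example (a minimal sketch combining list, get, and update; the import id and new names are placeholders, and the client is assumed to be configured against a Stand Alone Scoring Engine):

import datarobot as dr

for imported in dr.ImportedModel.list(limit=10):
    print(imported.id, imported.display_name, imported.imported_at)

imported = dr.ImportedModel.get('import-id')
imported.update(display_name='Churn model v2', note='Promoted to production')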

RatingTableModel

class datarobot.models.RatingTableModel(id=None, processes=None, featurelist_name=None, featurelist_id=None, project_id=None, sample_pct=None, training_row_count=None, training_duration=None, training_start_date=None, training_end_date=None, model_type=None, model_category=None, is_frozen=None, blueprint_id=None, metrics=None, rating_table_id=None, monotonic_increasing_featurelist_id=None, monotonic_decreasing_featurelist_id=None, supports_monotonic_constraints=None, is_starred=None, prediction_threshold=None, prediction_threshold_read_only=None)

A model that has a rating table.

Attributes:
id : str

the id of the model

project_id : str

the id of the project the model belongs to

processes : list of str

the processes used by the model

featurelist_name : str

the name of the featurelist used by the model

featurelist_id : str

the id of the featurelist used by the model

sample_pct : float or None

the percentage of the project dataset used in training the model. If the project uses datetime partitioning, the sample_pct will be None. See training_row_count, training_duration, and training_start_date and training_end_date instead.

training_row_count : int or None

the number of rows of the project dataset used in training the model. In a datetime partitioned project, if specified, defines the number of rows used to train the model and evaluate backtest scores; if unspecified, either training_duration or training_start_date and training_end_date was used to determine that instead.

training_duration : str or None

only present for models in datetime partitioned projects. If specified, a duration string specifying the duration spanned by the data used to train the model and evaluate backtest scores.

training_start_date : datetime or None

only present for frozen models in datetime partitioned projects. If specified, the start date of the data used to train the model.

training_end_date : datetime or None

only present for frozen models in datetime partitioned projects. If specified, the end date of the data used to train the model.

model_type : str

what model this is, e.g. ‘Nystroem Kernel SVM Regressor’

model_category : str

what kind of model this is - ‘prime’ for DataRobot Prime models, ‘blend’ for blender models, and ‘model’ for other models

is_frozen : bool

whether this model is a frozen model

blueprint_id : str

the id of the blueprint used in this model

metrics : dict

a mapping from each metric to the model’s scores for that metric

rating_table_id : str

the id of the rating table that belongs to this model

monotonic_increasing_featurelist_id : str

optional, the id of the featurelist that defines the set of features with a monotonically increasing relationship to the target. If None, no such constraints are enforced.

monotonic_decreasing_featurelist_id : str

optional, the id of the featurelist that defines the set of features with a monotonically decreasing relationship to the target. If None, no such constraints are enforced.

supports_monotonic_constraints : bool

optional, whether this model supports enforcing monotonic constraints

is_starred : bool

whether this model is marked as starred

prediction_threshold : float

for binary classification projects, the threshold used for predictions

prediction_threshold_read_only : bool

indicates whether modification of the prediction threshold is forbidden. Threshold modification is forbidden once a model has had a deployment created or predictions made via the dedicated prediction API.

classmethod get(project_id, model_id)

Retrieve a specific rating table model

If the project does not have a rating table, a ClientError will occur.

Parameters:
project_id : str

the id of the project the model belongs to

model_id : str

the id of the model to retrieve

Returns:
model : RatingTableModel

the model

classmethod create_from_rating_table(project_id, rating_table_id)

Creates a new model from a validated rating table record. The RatingTable must not be associated with an existing model.

Parameters:
project_id : str

the id of the project the rating table belongs to

rating_table_id : str

the id of the rating table to create this model from

Returns:
job: Job

an instance of created async job

Raises:
ClientError (422)

Raised when creating a model from a RatingTable that failed validation

JobAlreadyRequested

Raised when creating a model from a RatingTable that is already associated with a RatingTableModel
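
Example (a minimal sketch, assuming a validated rating table id is already available; the ids are placeholders):

import datarobot as dr

job = dr.models.RatingTableModel.create_from_rating_table(
    'project-id', 'rating-table-id')
# wait for the asynchronous model creation job to finish
job.wait_for_completion()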

advanced_tune(params, description=None)

Generate a new model with the specified advanced-tuning parameters

As of v2.17, all models other than blenders, open source, prime, scaleout, baseline and user-created support Advanced Tuning.

Parameters:
params : dict

Mapping of parameter ID to parameter value. The list of valid parameter IDs for a model can be found by calling get_advanced_tuning_parameters(). The request does not need to include values for all parameters. If a parameter is omitted, its current_value will be used.

description : unicode

Human-readable string describing the newly advanced-tuned model

Returns:
ModelJob

The created job to build the model

cross_validate()

Run Cross Validation on this model.

Note

To perform Cross Validation on a new model with new parameters, use train instead.

Returns:
ModelJob

The created job to build the model

delete()

Delete a model from the project’s leaderboard.

download_export(filepath)

Download an exportable model file for use in an on-premise DataRobot standalone prediction environment.

This function can only be used if model export is enabled, and will only be useful if you have an on-premise environment in which to import it.

Parameters:
filepath : str

The path at which to save the exported model file.

download_scoring_code(file_name, source_code=False)

Download scoring code JAR.

Parameters:
file_name : str

File path where scoring code will be saved.

source_code : bool, optional

Set to True to download source code archive. It will not be executable.

classmethod fetch_resource_data(url, join_endpoint=True)

(Deprecated.) Used to acquire model data directly from its url.

Consider using get instead, as this is a convenience function used for development of the datarobot package.

Parameters:
url : string

The resource we are acquiring

join_endpoint : boolean, optional

Whether the client’s endpoint should be joined to the URL before sending the request. Location headers are returned as absolute locations, so they will not need the endpoint

Returns:
model_data : dict

The queried model’s data

get_advanced_tuning_parameters()

Get the advanced-tuning parameters available for this model.

As of v2.17, all models other than blenders, open source, prime, scaleout, baseline and user-created support Advanced Tuning.

Returns:
dict

A dictionary describing the advanced-tuning parameters for the current model. There are two top-level keys, tuningDescription and tuningParameters.

tuningDescription is an optional value. If not None, it indicates the user-specified description of this set of tuning parameters.

tuningParameters is a list of dicts, each of which has the following keys

  • parameterName : (unicode) name of the parameter (unique per task, see below)
  • parameterId : (unicode) opaque ID string uniquely identifying parameter
  • defaultValue : (*) default value of the parameter for the blueprint
  • currentValue : (*) value of the parameter that was used for this model
  • taskName : (unicode) name of the task that this parameter belongs to
  • constraints: (dict) see the notes below

Notes

The type of defaultValue and currentValue is defined by the constraints structure. It will be a string or numeric Python type.

constraints is a dict with at least one, possibly more, of the following keys. The presence of a key indicates that the parameter may take on the specified type. (If a key is absent, this means that the parameter may not take on the specified type.) If a key on constraints is present, its value will be a dict containing all of the fields described below for that key.

"constraints": {
    "select": {
        "values": [<list(basestring or number) : possible values>]
    },
    "ascii": {},
    "unicode": {},
    "int": {
        "min": <int : minimum valid value>,
        "max": <int : maximum valid value>,
        "supports_grid_search": <bool : True if Grid Search may be
                                        requested for this param>
    },
    "float": {
        "min": <float : minimum valid value>,
        "max": <float : maximum valid value>,
        "supports_grid_search": <bool : True if Grid Search may be
                                        requested for this param>
    },
    "intList": {
        "length": {
        "min_length": <int : minimum valid length>,
        "max_length": <int : maximum valid length>
        "min_val": <int : minimum valid value>,
        "max_val": <int : maximum valid value>
        "supports_grid_search": <bool : True if Grid Search may be
                                        requested for this param>
    },
    "floatList": {
        "min_length": <int : minimum valid length>,
        "max_length": <int : maximum valid length>
        "min_val": <float : minimum valid value>,
        "max_val": <float : maximum valid value>
        "supports_grid_search": <bool : True if Grid Search may be
                                        requested for this param>
    }
}

The keys have meaning as follows:

  • select: Rather than specifying a specific data type, if present, it indicates that the parameter is permitted to take on any of the specified values. Listed values may be of any string or real (non-complex) numeric type.
  • ascii: The parameter may be a unicode object that encodes simple ASCII characters. (A-Z, a-z, 0-9, whitespace, and certain common symbols.) In addition to listed constraints, ASCII keys currently may not contain either newlines or semicolons.
  • unicode: The parameter may be any Python unicode object.
  • int: The value may be an object of type int within the specified range (inclusive). Please note that the value will be passed around using the JSON format, and some JSON parsers have undefined behavior with integers outside of the range [-(2**53)+1, (2**53)-1].
  • float: The value may be an object of type float within the specified range (inclusive).
  • intList, floatList: The value may be a list of int or float objects, respectively, following constraints as specified respectively by the int and float types (above).

Many parameters only specify one key under constraints. If a parameter specifies multiple keys, the parameter may take on any value permitted by any key.
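
Example (a minimal sketch that inspects the structure described above; the ids are placeholders):

import datarobot as dr

model = dr.Model.get('project-id', 'model-id')
tuning_info = model.get_advanced_tuning_parameters()

for param in tuning_info['tuningParameters']:
    print(param['taskName'], param['parameterName'],
          'current:', param['currentValue'],
          'allowed types:', sorted(param['constraints'].keys()))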

get_all_confusion_charts(fallback_to_parent_insights=False)

Retrieve a list of all confusion charts available for the model.

Parameters:
fallback_to_parent_insights : bool

(New in version v2.14) Optional, if True, this will return confusion chart data for this model’s parent for any source that is not available for this model and if this model has a defined parent model. If omitted or False, or this model has no parent, this will not attempt to retrieve any data from this model’s parent.

Returns:
list of ConfusionChart

Data for all available confusion charts for model.

get_all_lift_charts(fallback_to_parent_insights=False)

Retrieve a list of all lift charts available for the model.

Parameters:
fallback_to_parent_insights : bool

(New in version v2.14) Optional, if True, this will return lift chart data for this model’s parent for any source that is not available for this model and if this model has a defined parent model. If omitted or False, or this model has no parent, this will not attempt to retrieve any data from this model’s parent.

Returns:
list of LiftChart

Data for all available model lift charts.

get_all_roc_curves(fallback_to_parent_insights=False)

Retrieve a list of all ROC curves available for the model.

Parameters:
fallback_to_parent_insights : bool

(New in version v2.14) Optional, if True, this will return ROC curve data for this model’s parent for any source that is not available for this model and if this model has a defined parent model. If omitted or False, or this model has no parent, this will not attempt to retrieve any data from this model’s parent.

Returns:
list of RocCurve

Data for all available model ROC curves.

get_confusion_chart(source, fallback_to_parent_insights=False)

Retrieve model’s confusion chart for the specified source.

Parameters:
source : str

Confusion chart source. Check datarobot.enums.CHART_DATA_SOURCE for possible values.

fallback_to_parent_insights : bool

(New in version v2.14) Optional, if True, this will return confusion chart data for this model’s parent if the confusion chart is not available for this model and the model has a defined parent model. If omitted or False, or there is no parent model, will not attempt to return insight data from this model’s parent.

Returns:
ConfusionChart

Model ConfusionChart data

Raises:
ClientError

If the insight is not available for this model

get_cross_validation_scores(partition=None, metric=None)

Returns a dictionary keyed by metric showing cross validation scores per partition.

Cross Validation should already have been performed using cross_validate or train.

Note

Models that computed cross validation before this feature was added will need to be deleted and retrained before this method can be used.

Parameters:
partition : float

optional, the id of the partition (1, 2, 3.0, 4.0, etc.) to filter results by. May be a positive whole number or a float value.

metric: unicode

optional, the name of the metric to filter the resulting cross validation scores by

Returns:
cross_validation_scores: dict

A dictionary keyed by metric showing cross validation scores per partition.
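
Example (a minimal sketch, assuming cross validation has already been run for the model; the ids and metric name are placeholders):

import datarobot as dr

model = dr.Model.get('project-id', 'model-id')
scores = model.get_cross_validation_scores(metric='RMSE')
# a dict keyed by metric, with a score for each cross validation partition
print(scores)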

get_feature_impact()

Retrieve the computed Feature Impact results, a measure of the relevance of each feature in the model.

Feature Impact is computed for each column by creating new data with that column randomly permuted (but the others left unchanged), and seeing how the error metric score for the predictions is affected. The ‘impactUnnormalized’ is how much worse the error metric score is when making predictions on this modified data. The ‘impactNormalized’ is normalized so that the largest value is 1. In both cases, larger values indicate more important features.

If a feature is a redundant feature, i.e. once other features are considered it does not contribute much additional information, the ‘redundantWith’ value is the name of the feature that has the highest correlation with this feature. Note that redundancy detection is only available for jobs run after the addition of this feature. When retrieving data that predates this functionality, a NoRedundancyImpactAvailable warning will be issued.

Elsewhere this technique is sometimes called ‘Permutation Importance’.

Requires that Feature Impact has already been computed with request_feature_impact.

Returns:
feature_impacts : list of dict

The feature impact data. Each item is a dict with the keys ‘featureName’, ‘impactNormalized’, ‘impactUnnormalized’, and ‘redundantWith’.

Raises:
ClientError (404)

If the feature impacts have not been computed.

get_features_used()

Query the server to determine which features were used.

Note that the data returned by this method may differ from the names of the features in the featurelist used by this model. This method returns the raw features that must be supplied in order for predictions to be generated on a new set of data. The featurelist, in contrast, would also include the names of derived features.

Returns:
features : list of str

The names of the features used in the model.

get_leaderboard_ui_permalink()

Gets the permanent static hyperlink to this model at leaderboard.

Returns:
url : str

Permanent static hyperlink to this model at leaderboard.

get_lift_chart(source, fallback_to_parent_insights=False)

Retrieve model lift chart for the specified source.

Parameters:
source : str

Lift chart data source. Check datarobot.enums.CHART_DATA_SOURCE for possible values.

fallback_to_parent_insights : bool

(New in version v2.14) Optional, if True, this will return lift chart data for this model’s parent if the lift chart is not available for this model and the model has a defined parent model. If omitted or False, or there is no parent model, will not attempt to return insight data from this model’s parent.

Returns:
LiftChart

Model lift chart data

Raises:
ClientError

If the insight is not available for this model

get_missing_report_info()

Retrieve a missing data report on the model’s training data, which can be used to understand how missing values were treated in the model. The report consists of missing values reports for the numeric and categorical features that took part in modelling.

Returns:
An iterable of MissingReportPerFeature

The queried model missing report, sorted by missing count (DESCENDING order).

get_model_blueprint_chart()

Retrieve a model blueprint chart that can be used to understand data flow in blueprint.

Returns:
ModelBlueprintChart

The queried model blueprint chart.

get_model_blueprint_documents()

Get documentation for tasks used in this model.

Returns:
list of BlueprintTaskDocument

All documents available for the model.

get_or_request_feature_impact(max_wait=600)

Retrieve feature impact for the model, requesting a job if it hasn’t been run previously

Parameters:
max_wait : int, optional

The maximum time to wait for a requested feature impact job to complete before erroring

Returns:
feature_impacts : list of dict

The feature impact data. See get_feature_impact for the exact schema.

get_parameters()

Retrieve model parameters.

Returns:
ModelParameters

Model parameters for this model.

get_pareto_front()

Retrieve the Pareto Front for a Eureqa model.

This method is only supported for Eureqa models.

Returns:
ParetoFront

Model ParetoFront data

get_prime_eligibility()

Check if this model can be approximated with DataRobot Prime

Returns:
prime_eligibility : dict

a dict indicating whether a model can be approximated with DataRobot Prime (key can_make_prime) and why it may be ineligible (key message)

get_roc_curve(source, fallback_to_parent_insights=False)

Retrieve model ROC curve for the specified source.

Parameters:
source : str

ROC curve data source. Check datarobot.enums.CHART_DATA_SOURCE for possible values.

fallback_to_parent_insights : bool

(New in version v2.14) Optional, if True, this will return ROC curve data for this model’s parent if the ROC curve is not available for this model and the model has a defined parent model. If omitted or False, or there is no parent model, will not attempt to return data from this model’s parent.

Returns:
RocCurve

Model ROC curve data

Raises:
ClientError

If the insight is not available for this model

get_rulesets()

List the rulesets approximating this model generated by DataRobot Prime

If this model hasn’t been approximated yet, will return an empty list. Note that these are rulesets approximating this model, not rulesets used to construct this model.

Returns:
rulesets : list of Ruleset

get_supported_capabilities()

Retrieves a summary of the capabilities supported by a model.

New in version v2.14.

Returns:
supportsBlending: bool

whether the model supports blending

supportsMonotonicConstraints: bool

whether the model supports monotonic constraints

hasWordCloud: bool

whether the model has word cloud data available

eligibleForPrime: bool

whether the model is eligible for Prime

hasParameters: bool

whether the model has parameters that can be retrieved

get_word_cloud(exclude_stop_words=False)

Retrieve word cloud data for the model.

Parameters:
exclude_stop_words : bool, optional

Set to True if you want stopwords filtered out of the response.

Returns:
WordCloud

Word cloud data for the model.

open_model_browser()

Opens model at project leaderboard in web browser.

Note: If text-mode browsers are used, the calling process will block until the user exits the browser.

request_approximation()

Request an approximation of this model using DataRobot Prime

This will create several rulesets that could be used to approximate this model. After comparing their scores and rule counts, the code used in the approximation can be downloaded and run locally.

Returns:
job : Job

the job generating the rulesets

request_feature_impact()

Request feature impacts to be computed for the model.

See get_feature_impact for more information on the result of the job.

Returns:
job : Job

A Job representing the feature impact computation. To get the completed feature impact data, use job.get_result or job.get_result_when_complete.

Raises:
JobAlreadyRequested (422)

If the feature impacts have already been requested.

request_frozen_datetime_model(training_row_count=None, training_duration=None, training_start_date=None, training_end_date=None, time_window_sample_pct=None)

Train a new frozen model with parameters from this model

Requires that this model belongs to a datetime partitioned project. If it does not, an error will occur when submitting the job.

Frozen models use the same tuning parameters as their parent model instead of independently optimizing them to allow efficiently retraining models on larger amounts of the training data.

In addition to training_row_count and training_duration, frozen datetime models may be trained on an exact date range. Only one of training_row_count, training_duration, or training_start_date and training_end_date should be specified.

Models specified using training_start_date and training_end_date are the only ones that can be trained into the holdout data (once the holdout is unlocked).

Parameters:
training_row_count : int, optional

the number of rows of data that should be used to train the model. If specified, training_duration may not be specified.

training_duration : str, optional

a duration string specifying what time range the data used to train the model should span. If specified, training_row_count may not be specified.

training_start_date : datetime.datetime, optional

the start date of the data to train the model on. Only rows occurring at or after this datetime will be used. If training_start_date is specified, training_end_date must also be specified.

training_end_date : datetime.datetime, optional

the end date of the data to train the model on. Only rows occurring strictly before this datetime will be used. If training_end_date is specified, training_start_date must also be specified.

time_window_sample_pct : int, optional

may only be specified when the requested model is a time window (e.g. duration or start and end dates). An integer between 1 and 99 indicating the percentage to sample by within the window. The points kept are determined by a random uniform sample.

Returns:
model_job : ModelJob

the modeling job training a frozen model

request_frozen_model(sample_pct=None, training_row_count=None)

Train a new frozen model with parameters from this model

Note

This method only works if the project the model belongs to is not datetime partitioned. If it is, use request_frozen_datetime_model instead.

Frozen models use the same tuning parameters as their parent model instead of independently optimizing them to allow efficiently retraining models on larger amounts of the training data.

Parameters:
sample_pct : float

optional, the percentage of the dataset to use with the model. If not provided, will use the value from this model.

training_row_count : int

(New in version v2.9) optional, the integer number of rows of the dataset to use with the model. Only one of sample_pct and training_row_count should be specified.

Returns:
model_job : ModelJob

the modeling job training a frozen model

request_predictions(dataset_id, include_prediction_intervals=None, prediction_intervals_size=None)

Request predictions against a previously uploaded dataset

Parameters:
dataset_id : string

The dataset to make predictions against (as uploaded from Project.upload_dataset)

include_prediction_intervals : bool, optional

(New in v2.16) For time series projects only. Specifies whether prediction intervals should be calculated for this request. Defaults to True if prediction_intervals_size is specified, otherwise defaults to False.

prediction_intervals_size : int, optional

(New in v2.16) For time series projects only. Represents the percentile to use for the size of the prediction intervals. Defaults to 80 if include_prediction_intervals is True. Prediction intervals size must be between 1 and 100 (inclusive).

Returns:
job : PredictJob

The job computing the predictions

request_training_predictions(data_subset)

Start a job to build training predictions

Parameters:
data_subset : str

data set definition to build predictions on. Choices are:

  • dr.enums.DATA_SUBSET.ALL or string all for all data available. Not valid for
    models in datetime partitioned projects
  • dr.enums.DATA_SUBSET.VALIDATION_AND_HOLDOUT or string validationAndHoldout for
    all data except training set. Not valid for models in datetime partitioned projects
  • dr.enums.DATA_SUBSET.HOLDOUT or string holdout for holdout data set only
  • dr.enums.DATA_SUBSET.ALL_BACKTESTS or string allBacktests for downloading
    the predictions for all backtest validation folds. Requires the model to have successfully scored all backtests. Datetime partitioned projects only.

Returns:
Job

an instance of created async job
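
Example (a minimal sketch requesting training predictions on the holdout; the ids are placeholders):

import datarobot as dr
from datarobot.enums import DATA_SUBSET

model = dr.Model.get('project-id', 'model-id')
job = model.request_training_predictions(DATA_SUBSET.HOLDOUT)
# wait for the asynchronous job to finish before retrieving the predictions
job.wait_for_completion()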

request_transferable_export()

Request generation of an exportable model file for use in an on-premise DataRobot standalone prediction environment.

This function can only be used if model export is enabled, and will only be useful if you have an on-premise environment in which to import it.

This function does not download the exported file. Use download_export for that.

Examples

model = datarobot.Model.get('p-id', 'l-id')
job = model.request_transferable_export()
job.wait_for_completion()
model.download_export('my_exported_model.drmodel')

# Client must be configured to use standalone prediction server for import:
datarobot.Client(token='my-token-at-standalone-server',
                 endpoint='standalone-server-url/api/v2')

imported_model = datarobot.ImportedModel.create('my_exported_model.drmodel')

set_prediction_threshold(threshold)

Set a custom prediction threshold for the model

May not be used once prediction_threshold_read_only is True for this model.

Parameters:
threshold : float

only used for binary classification projects. The threshold to use when deciding between the positive and negative classes when making predictions. Should be between 0.0 and 1.0 (inclusive).

star_model()

Mark the model as starred

Model stars propagate to the web application and the API, and can be used to filter when listing models.

start_advanced_tuning_session()

Start an Advanced Tuning session. Returns an object that helps set up arguments for an Advanced Tuning model execution.

As of v2.17, all models other than blenders, open source, prime, scaleout, baseline and user-created support Advanced Tuning.

Returns:
AdvancedTuningSession

Session for setting up and running Advanced Tuning on a model

train(sample_pct=None, featurelist_id=None, scoring_type=None, training_row_count=None, monotonic_increasing_featurelist_id=<object object>, monotonic_decreasing_featurelist_id=<object object>)

Train the blueprint used in model on a particular featurelist or amount of data.

This method creates a new training job for worker and appends it to the end of the queue for this project. After the job has finished you can get the newly trained model by retrieving it from the project leaderboard, or by retrieving the result of the job.

Either sample_pct or training_row_count can be used to specify the amount of data to use, but not both. If neither are specified, a default of the maximum amount of data that can safely be used to train any blueprint without going into the validation data will be selected.

In smart-sampled projects, sample_pct and training_row_count are assumed to be in terms of rows of the minority class.

Note

For datetime partitioned projects, use train_datetime instead.

Parameters:
sample_pct : float, optional

The amount of data to use for training, as a percentage of the project dataset from 0 to 100.

featurelist_id : str, optional

The identifier of the featurelist to use. If not defined, the featurelist of this model is used.

scoring_type : str, optional

Either SCORING_TYPE.validation or SCORING_TYPE.cross_validation. SCORING_TYPE.validation is available for every partitioning type, and indicates that the default model validation should be used for the project. If the project uses a form of cross-validation partitioning, SCORING_TYPE.cross_validation can also be used to indicate that all of the available training/validation combinations should be used to evaluate the model.

training_row_count : int, optional

The number of rows to use to train the requested model.

monotonic_increasing_featurelist_id : str

(new in version 2.11) optional, the id of the featurelist that defines the set of features with a monotonically increasing relationship to the target. Passing None disables increasing monotonicity constraint. Default (dr.enums.MONOTONICITY_FEATURELIST_DEFAULT) is the one specified by the blueprint.

monotonic_decreasing_featurelist_id : str

(new in version 2.11) optional, the id of the featurelist that defines the set of features with a monotonically decreasing relationship to the target. Passing None disables decreasing monotonicity constraint. Default (dr.enums.MONOTONICITY_FEATURELIST_DEFAULT) is the one specified by the blueprint.

Returns:
model_job_id : str

id of created job, can be used as parameter to ModelJob.get method or wait_for_async_model_creation function

Examples

project = Project.get('p-id')
model = Model.get('p-id', 'l-id')
model_job_id = model.train(training_row_count=project.max_train_rows)

train_datetime(featurelist_id=None, training_row_count=None, training_duration=None, time_window_sample_pct=None)

Train this model on a different featurelist or amount of data

Requires that this model is part of a datetime partitioned project; otherwise, an error will occur.

Parameters:
featurelist_id : str, optional

the featurelist to use to train the model. If not specified, the featurelist of this model is used.

training_row_count : int, optional

the number of rows of data that should be used to train the model. If specified, training_duration may not be specified.

training_duration : str, optional

a duration string specifying what time range the data used to train the model should span. If specified, training_row_count may not be specified.

time_window_sample_pct : int, optional

may only be specified when the requested model is a time window (e.g. duration or start and end dates). An integer between 1 and 99 indicating the percentage to sample by within the window. The points kept are determined by a random uniform sample.

Returns:
job : ModelJob

the created job to build the model

unstar_model()

Unmark the model as starred

Model stars propagate to the web application and the API, and can be used to filter when listing models.

Advanced Tuning

class datarobot.models.advanced_tuning.AdvancedTuningSession(model)

A session enabling users to configure and run advanced tuning for a model.

Every model contains a set of one or more tasks. Every task contains a set of zero or more parameters. This class allows tuning the values of each parameter on each task of a model, before running that model.

This session is client-side only and is not persistent. Only the final model, constructed when run is called, is persisted on the DataRobot server.

Attributes:
description : basestring

Description for the new advanced-tuned model. Defaults to the same description as the base model.

get_task_names()

Get the list of task names that are available for this model

Returns:
list(basestring)

List of task names

get_parameter_names(task_name)

Get the list of parameter names available for a specific task

Returns:
list(basestring)

List of parameter names

set_parameter(value, task_name=None, parameter_name=None, parameter_id=None)

Set the value of a parameter to be used

The caller must supply enough of the optional arguments to this function to uniquely identify the parameter that is being set. For example, a less-common parameter name such as ‘building_block__complementary_error_function’ might only be used once (if at all) by a single task in a model, in which case it may be sufficient to simply specify ‘parameter_name’. But a more-common name such as ‘random_seed’ might be used by several of the model’s tasks, and it may be necessary to also specify ‘task_name’ to clarify which task’s random seed is to be set. This function only affects client-side state; it will not check that the new parameter value(s) are valid.

Parameters:
task_name : basestring

Name of the task whose parameter needs to be set

parameter_name : basestring

Name of the parameter to set

parameter_id : basestring

ID of the parameter to set

value : int, float, list, or basestring

New value for the parameter, with legal values determined by the parameter being set

Raises:
NoParametersFoundException

if no matching parameters are found.

NonUniqueParametersException

if multiple parameters matched the specified filtering criteria

get_parameters()

Returns the set of parameters available to this model

The returned parameters have one additional key, “value”, reflecting any new values that have been set in this AdvancedTuningSession. When the session is run, “value” will be used, or if it is unset, “current_value”.

Returns:
parameters : dict

“Parameters” dictionary, same as returned by Model.get_advanced_tuning_parameters.

An additional field is added per parameter to the ‘tuningParameters’ list in the dictionary:
value : int, float, list, or basestring

The current value of the parameter. None if none has been specified.

run()

Submit this model for Advanced Tuning.

Returns:
datarobot.models.modeljob.ModelJob

The created job to build the model
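
Example (a minimal sketch of an end-to-end session; the ids, task name, parameter name, and value are placeholders and should be looked up via get_task_names and get_parameter_names):

import datarobot as dr

model = dr.Model.get('project-id', 'model-id')
tune = model.start_advanced_tuning_session()
tune.description = 'Example advanced-tuned model'

print(tune.get_task_names())
tune.set_parameter(task_name='Example Task',
                   parameter_name='example_parameter',
                   value=0.05)

model_job = tune.run()
tuned_model = model_job.get_result_when_complete()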

ModelJob

datarobot.models.modeljob.wait_for_async_model_creation(project_id, model_job_id, max_wait=600)

Given a Project id and a ModelJob id, poll the status of the process responsible for model creation until the model is created.

Parameters:
project_id : str

The identifier of the project

model_job_id : str

The identifier of the ModelJob

max_wait : int, optional

Time in seconds after which model creation is considered unsuccessful

Returns:
model : Model

Newly created model

Raises:
AsyncModelCreationError

Raised if status of fetched ModelJob object is error

AsyncTimeoutError

Raised if the model was not created within the time specified by the max_wait parameter
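
Example (a minimal sketch combining Model.train with this helper; the ids and row count are placeholders):

import datarobot as dr
from datarobot.models.modeljob import wait_for_async_model_creation

model = dr.Model.get('project-id', 'model-id')
model_job_id = model.train(training_row_count=1000)
new_model = wait_for_async_model_creation('project-id', model_job_id, max_wait=600)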

class datarobot.models.ModelJob(data, completed_resource_url=None)

Tracks asynchronous work being done within a project

Attributes:
id : int

the id of the job

project_id : str

the id of the project the job belongs to

status : str

the status of the job - will be one of datarobot.enums.QUEUE_STATUS

job_type : str

what kind of work the job is doing - will be ‘model’ for modeling jobs

is_blocked : bool

if true, the job is blocked (cannot be executed) until its dependencies are resolved

sample_pct : float

the percentage of the project’s dataset used in this modeling job

model_type : str

the model this job builds (e.g. ‘Nystroem Kernel SVM Regressor’)

processes : list of str

the processes used by the model

featurelist_id : str

the id of the featurelist used in this modeling job

blueprint : Blueprint

the blueprint used in this modeling job

classmethod from_job(job)

Transforms a generic Job into a ModelJob

Parameters:
job: Job

A generic job representing a ModelJob

Returns:
model_job: ModelJob

A fully populated ModelJob with all the details of the job

Raises:
ValueError:

If the generic Job was not a model job, e.g. job_type != JOB_TYPE.MODEL

classmethod get(project_id, model_job_id)

Fetches one ModelJob. If the job finished, raises PendingJobFinished exception.

Parameters:
project_id : str

The identifier of the project the model belongs to

model_job_id : str

The identifier of the model_job

Returns:
model_job : ModelJob

The pending ModelJob

Raises:
PendingJobFinished

If the job being queried already finished, and the server is re-routing to the finished model.

AsyncFailureError

Querying this resource gave a status code other than 200 or 303

classmethod get_model(project_id, model_job_id)

Fetches a finished model from the job used to create it.

Parameters:
project_id : str

The identifier of the project the model belongs to

model_job_id : str

The identifier of the model_job

Returns:
model : Model

The finished model

Raises:
JobNotFinished

If the job has not finished yet

AsyncFailureError

Querying the model_job in question gave a status code other than 200 or 303

cancel()

Cancel this job. If this job has not finished running, it will be removed and canceled.

get_result()

Returns:
result : object

Return type depends on the job type:
  • for model jobs, a Model is returned
  • for predict jobs, a pandas.DataFrame (with predictions) is returned
  • for featureImpact jobs, a list of dicts (see Model.get_feature_impact for more detail)
  • for primeRulesets jobs, a list of Rulesets
  • for primeModel jobs, a PrimeModel
  • for primeDownloadValidation jobs, a PrimeFile
  • for reasonCodesInitialization jobs, a ReasonCodesInitialization
  • for reasonCodes jobs, a ReasonCodes
  • for predictionExplanationInitialization jobs, a PredictionExplanationsInitialization
  • for predictionExplanations jobs, a PredictionExplanations

Raises:
JobNotFinished

If the job is not finished, the result is not available.

AsyncProcessUnsuccessfulError

If the job errored or was aborted

get_result_when_complete(max_wait=600)

Parameters:
max_wait : int, optional

How long to wait for the job to finish.

Returns:
result: object

Return type is the same as would be returned by Job.get_result.

Raises:
AsyncTimeoutError

If the job does not finish in time

AsyncProcessUnsuccessfulError

If the job errored or was aborted

refresh()

Update this object with the latest job data from the server.

wait_for_completion(max_wait=600)

Waits for job to complete.

Parameters:
max_wait : int, optional

How long to wait for the job to finish.

Pareto Front

class datarobot.models.pareto_front.ParetoFront(project_id, error_metric, hyperparameters, target_type, solutions)

Pareto front data for a Eureqa model.

The pareto front reflects the tradeoffs between error and complexity for a particular model. The solutions reflect possible Eureqa models at different levels of complexity. By default, only one solution will have a corresponding model, but models can be created for each solution.

Attributes:
project_id : str

the ID of the project the model belongs to

error_metric : str

Eureqa error-metric identifier used to compute error metrics for this search. Note that Eureqa error metrics do NOT correspond 1:1 with DataRobot error metrics – the available metrics are not the same, and are computed from a subset of the training data rather than from the validation data.

hyperparameters : dict

Hyperparameters used by this run of the Eureqa blueprint

target_type : str

Indicating what kind of modeling is being done in this project, either ‘Regression’, ‘Binary’ (Binary classification), or ‘Multiclass’ (Multiclass classification).

solutions : list(Solution)

Solutions that Eureqa has found to model this data. Some solutions will have greater accuracy. Others will have slightly less accuracy but will use simpler expressions.

class datarobot.models.pareto_front.Solution(eureqa_solution_id, complexity, error, expression, expression_annotated, best_model, project_id)

Eureqa Solution.

A solution represents a possible Eureqa model; however, not all solutions have models associated with them. A solution must have a model created before it can be used to make predictions, etc.

Attributes:
eureqa_solution_id: str

ID of this Solution

complexity: int

Complexity score for this solution. Complexity score is a function of the mathematical operators used in the current solution. The Complexity calculation can be tuned via model hyperparameters.

error: float

Error for the current solution, as computed by Eureqa using the ‘error_metric’ error metric.

expression: str

Eureqa model equation string.

expression_annotated: str

Eureqa model equation string with variable names tagged for easy identification.

best_model: bool

True, if the model is determined to be the best

create_model()

Add this solution to the leaderboard, if it is not already present.

Partitioning

class datarobot.RandomCV(holdout_pct, reps, seed=0)

A partition in which observations are randomly assigned to cross-validation groups and the holdout set.

Parameters:
holdout_pct : int

the desired percentage of dataset to assign to holdout set

reps : int

number of cross validation folds to use

seed : int

a seed to use for randomization

class datarobot.StratifiedCV(holdout_pct, reps, seed=0)

A partition in which observations are randomly assigned to cross-validation groups and the holdout set, preserving in each group the same ratio of positive to negative cases as in the original data.

Parameters:
holdout_pct : int

the desired percentage of dataset to assign to holdout set

reps : int

number of cross validation folds to use

seed : int

a seed to use for randomization

class datarobot.GroupCV(holdout_pct, reps, partition_key_cols, seed=0)

A partition in which one column is specified, and rows sharing a common value for that column are guaranteed to stay together in the partitioning into cross-validation groups and the holdout set.

Parameters:
holdout_pct : int

the desired percentage of dataset to assign to holdout set

reps : int

number of cross validation folds to use

partition_key_cols : list

a list containing a single string, where the string is the name of the column whose values should remain together in partitioning

seed : int

a seed to use for randomization

class datarobot.UserCV(user_partition_col, cv_holdout_level, seed=0)

A partition where the cross-validation folds and the holdout set are specified by the user.

Parameters:
user_partition_col : string

the name of the column containing the partition assignments

cv_holdout_level

the value of the partition column indicating a row is part of the holdout set

seed : int

a seed to use for randomization

class datarobot.RandomTVH(holdout_pct, validation_pct, seed=0)

Specifies a partitioning method in which rows are randomly assigned to training, validation, and holdout.

Parameters:
holdout_pct : int

the desired percentage of dataset to assign to holdout set

validation_pct : int

the desired percentage of dataset to assign to validation set

seed : int

a seed to use for randomization

class datarobot.UserTVH(user_partition_col, training_level, validation_level, holdout_level, seed=0)

Specifies a partitioning method in which rows are assigned by the user to training, validation, and holdout sets.

Parameters:
user_partition_col : string

the name of the column containing the partition assignments

training_level

the value of the partition column indicating a row is part of the training set

validation_level

the value of the partition column indicating a row is part of the validation set

holdout_level

the value of the partition column indicating a row is part of the holdout set (use None if you want no holdout set)

seed : int

a seed to use for randomization

class datarobot.StratifiedTVH(holdout_pct, validation_pct, seed=0)

A partition in which observations are randomly assigned to train, validation, and holdout sets, preserving in each group the same ratio of positive to negative cases as in the original data.

Parameters:
holdout_pct : int

the desired percentage of dataset to assign to holdout set

validation_pct : int

the desired percentage of dataset to assign to validation set

seed : int

a seed to use for randomization

class datarobot.GroupTVH(holdout_pct, validation_pct, partition_key_cols, seed=0)

A partition in which one column is specified, and rows sharing a common value for that column are guaranteed to stay together in the partitioning into the training, validation, and holdout sets.

Parameters:
holdout_pct : int

the desired percentage of dataset to assign to holdout set

validation_pct : int

the desired percentage of dataset to assign to validation set

partition_key_cols : list

a list containing a single string, where the string is the name of the column whose values should remain together in partitioning

seed : int

a seed to use for randomization
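
Examples

The partitioning classes above are passed to Project.set_target via the partitioning_method parameter; a minimal sketch follows, with a hypothetical project id and target column.

import datarobot as dr

# Five cross-validation folds with a 20% holdout set
partition = dr.RandomCV(holdout_pct=20, reps=5, seed=0)

project = dr.Project.get('5996c3e7c6c0e7a4d9f0c7a1')  # hypothetical project id
project.set_target(target='is_churn', partitioning_method=partition)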

class datarobot.DatetimePartitioningSpecification(datetime_partition_column, autopilot_data_selection_method=None, validation_duration=None, holdout_start_date=None, holdout_duration=None, disable_holdout=None, gap_duration=None, number_of_backtests=None, backtests=None, use_time_series=False, default_to_known_in_advance=False, default_to_do_not_derive=False, feature_derivation_window_start=None, feature_derivation_window_end=None, feature_settings=None, forecast_window_start=None, forecast_window_end=None, windows_basis_unit=None, treat_as_exponential=None, differencing_method=None, periodicities=None, multiseries_id_columns=None, target=None, use_cross_series_features=None, aggregation_type=None, cross_series_group_by_columns=None, calendar_id=None)

Uniquely defines a DatetimePartitioning for some project

Includes only the attributes of DatetimePartitioning that are directly controllable by users, not those determined by the DataRobot application based on the project dataset and the user-controlled settings.

This is the specification that should be passed to Project.set_target via the partitioning_method parameter. To see the full partitioning based on the project dataset, use DatetimePartitioning.generate.

All durations should be specified with a duration string such as those returned by the partitioning_methods.construct_duration_string helper method.

Attributes:
datetime_partition_column : str

the name of the column whose values as dates are used to assign a row to a particular partition

autopilot_data_selection_method : str

one of datarobot.enums.DATETIME_AUTOPILOT_DATA_SELECTION_METHOD. Whether models created by the autopilot should use “rowCount” or “duration” as their data_selection_method.

validation_duration : str or None

the default validation_duration for the backtests

holdout_start_date : datetime.datetime or None

The start date of holdout scoring data. If holdout_start_date is specified, holdout_duration must also be specified. If disable_holdout is set to True, neither holdout_duration nor holdout_start_date may be specified.

holdout_duration : str or None

The duration of the holdout scoring data. If holdout_duration is specified, holdout_start_date must also be specified. If disable_holdout is set to True, neither holdout_duration nor holdout_start_date may be specified.

disable_holdout : bool or None

(New in version v2.8) Whether to suppress allocating a holdout fold. If set to True, holdout_start_date and holdout_duration must not be specified.

gap_duration : str or None

The duration of the gap between training and holdout scoring data

number_of_backtests : int or None

the number of backtests to use

backtests : list of BacktestSpecification

the exact specification of backtests to use. The indexes of the specified backtests should range from 0 to number_of_backtests - 1. If any backtest is left unspecified, a default configuration will be chosen.

use_time_series : bool

(New in version v2.8) Whether to create a time series project (if True) or an OTV project which uses datetime partitioning (if False). The default behaviour is to create an OTV project.

default_to_known_in_advance : bool

(New in version v2.11) Optional, for time series projects only. Sets whether all features default to being treated as known in advance. Known in advance features are expected to be known for dates in the future when making predictions, e.g., “is this a holiday?”. The default is false, all features are not known in advance. Individual features can be set to a value different than the default using the featureSettings parameter.

default_to_do_not_derive : bool

(New in v2.17) Optional, for time series projects only. Sets whether all features default to being treated as do-not-derive features, excluding them from feature derivation. Individual features can be set to a value different than the default by using the featureSettings parameter.

feature_derivation_window_start : int or None

(New in version v2.8) Only used for time series projects. Offset into the past to define how far back relative to the forecast point the feature derivation window should start. Expressed in terms of the time_unit of the datetime_partition_column and should be negative or zero.

feature_derivation_window_end : int or None

(New in version v2.8) Only used for time series projects. Offset into the past to define how far back relative to the forecast point the feature derivation window should end. Expressed in terms of the time_unit of the datetime_partition_column, and should be a positive value.

feature_settings : list of FeatureSettings objects

(New in version v2.9) Optional, a list specifying per feature settings, can be left unspecified.

forecast_window_start : int or None

(New in version v2.8) Only used for time series projects. Offset into the future to define how far forward relative to the forecast point the forecast window should start. Expressed in terms of the time_unit of the datetime_partition_column.

forecast_window_end : int or None

(New in version v2.8) Only used for time series projects. Offset into the future to define how far forward relative to the forecast point the forecast window should end. Expressed in terms of the time_unit of the datetime_partition_column.

windows_basis_unit : string, optional

(New in version v2.14) Only used for time series projects. Indicates which unit is a basis for feature derivation window and forecast window. Valid options are detected time unit (one of the datarobot.enums.TIME_UNITS) or “ROW”. If omitted, the default value is detected time unit.

treat_as_exponential : string, optional

(New in version v2.9) defaults to “auto”. Used to specify whether to treat data as exponential trend and apply transformations like log-transform. Use values from the datarobot.enums.TREAT_AS_EXPONENTIAL enum.

differencing_method : string, optional

(New in version v2.9) defaults to “auto”. Used to specify which differencing method to apply in case the data is stationary. Use values from the datarobot.enums.DIFFERENCING_METHOD enum.

periodicities : list of Periodicity, optional

(New in version v2.9) a list of datarobot.Periodicity. Periodicity units should be ‘ROW’ if windows_basis_unit is ‘ROW’.

multiseries_id_columns : list of str or null

(New in version v2.11) a list of the names of multiseries id columns to define series within the training data. Currently only one multiseries id column is supported.

use_cross_series_features : bool

(New in version v2.14) Whether to use cross series features.

aggregation_type : str, optional

(New in version v2.14) The aggregation type to apply when creating cross series features. Optional, must be one of “total” or “average”.

cross_series_group_by_columns : list of str, optional

(New in version v2.15) List of columns (currently of length 1). Optional setting that indicates how to further split series into related groups. For example, if every series is sales of an individual product, the series group-by could be the product category with values like “men’s clothing”, “sports equipment”, etc.. Must be used with multiseries and useCrossSeriesFeatures enabled.

calendar_id : str, optional

(New in version v2.15) The id of the CalendarFile to use with this project.

collect_payload()

Set up the dict that should be sent to the server when setting the target.

Returns:
partitioning_spec : dict

prep_payload(project_id, max_wait=600)

Run any necessary validation and prep of the payload, including async operations

Mainly used for the datetime partitioning spec but implemented in general for consistency
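
Examples

A minimal sketch of a time series specification; the column names and project id are hypothetical, and the window values simply illustrate a 30-day feature derivation window with a 1 to 7 day forecast window.

import datarobot as dr

spec = dr.DatetimePartitioningSpecification(
    datetime_partition_column='date',
    use_time_series=True,
    feature_derivation_window_start=-30,
    feature_derivation_window_end=0,
    forecast_window_start=1,
    forecast_window_end=7,
    feature_settings=[dr.FeatureSettings('holiday', known_in_advance=True)],
)

project = dr.Project.get('5996c3e7c6c0e7a4d9f0c7a1')  # hypothetical project id
project.set_target(target='sales', partitioning_method=spec)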

class datarobot.BacktestSpecification(index, gap_duration, validation_start_date, validation_duration)

Uniquely defines a Backtest used in a DatetimePartitioning

Includes only the attributes of a backtest directly controllable by users. The other attributes are assigned by the DataRobot application based on the project dataset and the user-controlled settings.

All durations should be specified with a duration string such as those returned by the partitioning_methods.construct_duration_string helper method.

Attributes:
index : int

the index of the backtest to update

gap_duration : str

the desired duration of the gap between training and validation scoring data for the backtest

validation_start_date : datetime.datetime

the desired start date of the validation scoring data for this backtest

validation_duration : str

the desired duration of the validation scoring data for this backtest

class datarobot.FeatureSettings(feature_name, known_in_advance=None, do_not_derive=None)

Per feature settings

Attributes:
feature_name : string

name of the feature

known_in_advance : bool

(New in version v2.11) Optional, for time series projects only. Sets whether the feature is known in advance, i.e., values for future dates are known at prediction time. If not specified, the feature uses the value from the default_to_known_in_advance flag.

do_not_derive : bool

(New in v2.17) Optional, for time series projects only. Sets whether the feature is excluded from feature derivation. If not specified, the feature uses the value from the default_to_do_not_derive flag.

class datarobot.Periodicity(time_steps, time_unit)

Periodicity configuration

Parameters:
time_steps : int

Time step value

time_unit : string

Time step unit, valid options are values from datarobot.enums.TIME_UNITS

Examples

import datarobot as dr
periodicities = [
    dr.Periodicity(time_steps=10, time_unit=dr.enums.TIME_UNITS.HOUR),
    dr.Periodicity(time_steps=600, time_unit=dr.enums.TIME_UNITS.MINUTE)]
spec = dr.DatetimePartitioningSpecification(
    # ...
    periodicities=periodicities
)
class datarobot.DatetimePartitioning(project_id=None, datetime_partition_column=None, date_format=None, autopilot_data_selection_method=None, validation_duration=None, available_training_start_date=None, available_training_duration=None, available_training_row_count=None, available_training_end_date=None, primary_training_start_date=None, primary_training_duration=None, primary_training_row_count=None, primary_training_end_date=None, gap_start_date=None, gap_duration=None, gap_row_count=None, gap_end_date=None, holdout_start_date=None, holdout_duration=None, holdout_row_count=None, holdout_end_date=None, number_of_backtests=None, backtests=None, total_row_count=None, use_time_series=False, default_to_known_in_advance=False, default_to_do_not_derive=False, feature_derivation_window_start=None, feature_derivation_window_end=None, feature_settings=None, forecast_window_start=None, forecast_window_end=None, windows_basis_unit=None, treat_as_exponential=None, differencing_method=None, periodicities=None, multiseries_id_columns=None, number_of_known_in_advance_features=0, number_of_do_not_derive_features=0, use_cross_series_features=None, aggregation_type=None, cross_series_group_by_columns=None, calendar_id=None, calendar_name=None)

Full partitioning of a project for datetime partitioning

Includes both the attributes specified by the user, as well as those determined by the DataRobot application based on the project dataset. In order to use a partitioning to set the target, call to_specification and pass the resulting DatetimePartitioningSpecification to Project.set_target.

The available training data corresponds to all the data available for training, while the primary training data corresponds to the data that can be used to train while ensuring that all backtests are available. If a model is trained with more data than is available in the primary training data, then all backtests may not have scores available.

Attributes:
project_id : str

the id of the project this partitioning applies to

datetime_partition_column : str

the name of the column whose values as dates are used to assign a row to a particular partition

date_format : str

the format (e.g. “%Y-%m-%d %H:%M:%S”) by which the partition column was interpreted (compatible with strftime [https://docs.python.org/2/library/time.html#time.strftime] )

autopilot_data_selection_method : str

one of datarobot.enums.DATETIME_AUTOPILOT_DATA_SELECTION_METHOD. Whether models created by the autopilot use “rowCount” or “duration” as their data_selection_method.

validation_duration : str or None

the validation duration specified when initializing the partitioning - not directly significant if the backtests have been modified, but used as the default validation_duration for the backtests. Can be absent if it’s a time series project with an irregular primary date/time feature.

available_training_start_date : datetime.datetime

The start date of the available training data for scoring the holdout

available_training_duration : str

The duration of the available training data for scoring the holdout

available_training_row_count : int or None

The number of rows in the available training data for scoring the holdout. Only available when retrieving the partitioning after setting the target.

available_training_end_date : datetime.datetime

The end date of the available training data for scoring the holdout

primary_training_start_date : datetime.datetime or None

The start date of primary training data for scoring the holdout. Unavailable when the holdout fold is disabled.

primary_training_duration : str

The duration of the primary training data for scoring the holdout

primary_training_row_count : int or None

The number of rows in the primary training data for scoring the holdout. Only available when retrieving the partitioning after setting the target.

primary_training_end_date : datetime.datetime or None

The end date of the primary training data for scoring the holdout. Unavailable when the holdout fold is disabled.

gap_start_date : datetime.datetime or None

The start date of the gap between training and holdout scoring data. Unavailable when the holdout fold is disabled.

gap_duration : str

The duration of the gap between training and holdout scoring data

gap_row_count : int or None

The number of rows in the gap between training and holdout scoring data. Only available when retrieving the partitioning after setting the target.

gap_end_date : datetime.datetime or None

The end date of the gap between training and holdout scoring data. Unavailable when the holdout fold is disabled.

holdout_start_date : datetime.datetime or None

The start date of holdout scoring data. Unavailable when the holdout fold is disabled.

holdout_duration : str

The duration of the holdout scoring data

holdout_row_count : int or None

The number of rows in the holdout scoring data. Only available when retrieving the partitioning after setting the target.

holdout_end_date : datetime.datetime or None

The end date of the holdout scoring data. Unavailable when the holdout fold is disabled.

number_of_backtests : int

the number of backtests used

backtests : list of partitioning_methods.Backtest

the configured Backtests

total_row_count : int

the number of rows in the project dataset. Only available when retrieving the partitioning after setting the target.

use_time_series : bool

(New in version v2.8) Whether to create a time series project (if True) or an OTV project which uses datetime partitioning (if False). The default behaviour is to create an OTV project.

default_to_known_in_advance : bool

(New in version v2.11) Optional, for time series projects only. Sets whether all features default to being treated as known in advance. Known in advance features are expected to be known for dates in the future when making predictions, e.g., “is this a holiday?”. The default is false, all features are not known in advance. Individual features can be set to a value different than the default using the featureSettings parameter.

default_to_do_not_derive : bool

(New in v2.17) Optional, for time series projects only. Sets whether all features default to being treated as do-not-derive features, excluding them from feature derivation. The default is false, all features are used for feature derivation. Individual features can be set to a value different than the default by using the featureSettings parameter.

feature_derivation_window_start : int or None

(New in version v2.8) Only used for time series projects. Offset into the past to define how far back relative to the forecast point the feature derivation window should start. Expressed in terms of the time_unit of the datetime_partition_column.

feature_derivation_window_end : int or None

(New in version v2.8) Only used for time series projects. Offset into the past to define how far back relative to the forecast point the feature derivation window should end. Expressed in terms of the time_unit of the datetime_partition_column.

feature_settings : list of FeatureSettings

(New in version v2.9) Optional, a list specifying per feature settings, can be left unspecified.

forecast_window_start : int or None

(New in version v2.8) Only used for time series projects. Offset into the future to define how far forward relative to the forecast point the forecast window should start. Expressed in terms of the time_unit of the datetime_partition_column.

forecast_window_end : int or None

(New in version v2.8) Only used for time series projects. Offset into the future to define how far forward relative to the forecast point the forecast window should end. Expressed in terms of the time_unit of the datetime_partition_column.

windows_basis_unit : string, optional

(New in version v2.14) Only used for time series projects. Indicates which unit is a basis for feature derivation window and forecast window. Valid options are detected time unit (one of the datarobot.enums.TIME_UNITS) or “ROW”. If omitted, the default value is detected time unit.

treat_as_exponential : string, optional

(New in version v2.9) defaults to “auto”. Used to specify whether to treat data as exponential trend and apply transformations like log-transform. Use values from datarobot.enums.TREAT_AS_EXPONENTIAL enum.

differencing_method : string, optional

(New in version v2.9) defaults to “auto”. Used to specify which differencing method to apply in case the data is stationary. Use values from the datarobot.enums.DIFFERENCING_METHOD enum.

periodicities : list of Periodicity, optional

(New in version v2.9) a list of datarobot.Periodicity. Periodicity units should be ‘ROW’ if windows_basis_unit is ‘ROW’.

multiseries_id_columns : list of str or null

(New in version v2.11) a list of the names of multiseries id columns to define series within the training data. Currently only one multiseries id column is supported.

number_of_known_in_advance_features : int

(New in version v2.14) Number of features that are marked as known in advance.

number_of_do_not_derive_features : int

(New in v2.17) Number of features that are excluded from derivation.

use_cross_series_features : bool

(New in version v2.14) Whether to use cross series features.

aggregation_type : str, optional

(New in version v2.14) The aggregation type to apply when creating cross series features. Optional, must be one of “total” or “average”.

cross_series_group_by_columns : list of str, optional

(New in version v2.15) List of columns (currently of length 1). Optional setting that indicates how to further split series into related groups. For example, if every series is sales of an individual product, the series group-by could be the product category with values like “men’s clothing”, “sports equipment”, etc.. Must be used with multiseries and useCrossSeriesFeatures enabled.

calendar_id : str, optional

(New in version v2.15) The id of the CalendarFile to use with this project. Only available for time series projects.

calendar_name : str, optional

(New in version v2.17) The name of the CalendarFile used with this project. Only available for time series projects.

classmethod generate(project_id, spec, max_wait=600)

Preview the full partitioning determined by a DatetimePartitioningSpecification

Based on the project dataset and the partitioning specification, inspect the full partitioning that would be used if the same specification were passed into Project.set_target.

Parameters:
project_id : str

the id of the project

spec : DatetimePartitioningSpec

the desired partitioning

max_wait : int, optional

For some settings (e.g. generating a partitioning preview for a multiseries project for the first time), an asynchronous task must be run to analyze the dataset. max_wait governs the maximum time (in seconds) to wait before giving up. In all non-multiseries projects, this is unused.

Returns:
DatetimePartitioning :

the full generated partitioning

classmethod get(project_id)

Retrieve the DatetimePartitioning from a project

Only available if the project has already set the target as a datetime project.

Parameters:
project_id : str

the id of the project to retrieve partitioning for

Returns:
DatetimePartitioning :

the full partitioning for the project

classmethod feature_log_list(project_id, offset=None, limit=None)

Retrieve the feature derivation log content and log length for a time series project.

The Time Series Feature Log provides details about the feature generation process for a time series project. It includes information about which features are generated and their priority, as well as the detected properties of the time series data such as whether the series is stationary, and periodicities detected.

This route is only supported for time series projects that have finished partitioning.

The feature derivation log will include information about:

  • Detected stationarity of the series:
    e.g. ‘Series detected as non-stationary’
  • Detected presence of multiplicative trend in the series:
    e.g. ‘Multiplicative trend detected’
  • Detected periodicities in the series:
    e.g. ‘Detected periodicities: 7 day’
  • Maximum number of features to be generated:
    e.g. ‘Maximum number of features to be generated is 1440’
  • Window sizes used in rolling statistics / lag extractors
    e.g. ‘The window sizes chosen to be: 2 months
    (because the time step is 1 month and Feature Derivation Window is 2 months)’
  • Features that are specified as known-in-advance
    e.g. ‘Variables treated as apriori: holiday’
  • Details about why certain variables are transformed in the input data
    e.g. ‘Generating variable “y (log)” from “y” because multiplicative trend
    is detected’
  • Details about features generated as timeseries features, and their priority
    e.g. ‘Generating feature “date (actual)” from “date” (priority: 1)’
Parameters:
project_id : str

project id to retrieve a feature derivation log for.

offset : int

optional, default is 0; this many results will be skipped.

limit : int

optional, defaults to 100; at most this many results are returned. To specify no limit, use 0. The default may change without notice.

classmethod feature_log_retrieve(project_id)

Retrieve the feature derivation log content and log length for a time series project.

The Time Series Feature Log provides details about the feature generation process for a time series project. It includes information about which features are generated and their priority, as well as the detected properties of the time series data such as whether the series is stationary, and periodicities detected.

This route is only supported for time series projects that have finished partitioning.

The feature derivation log will include information about:

  • Detected stationarity of the series:
    e.g. ‘Series detected as non-stationary’
  • Detected presence of multiplicative trend in the series:
    e.g. ‘Multiplicative trend detected’
  • Detected periodicities in the series:
    e.g. ‘Detected periodicities: 7 day’
  • Maximum number of features to be generated:
    e.g. ‘Maximum number of features to be generated is 1440’
  • Window sizes used in rolling statistics / lag extractors
    e.g. ‘The window sizes chosen to be: 2 months
    (because the time step is 1 month and Feature Derivation Window is 2 months)’
  • Features that are specified as known-in-advance
    e.g. ‘Variables treated as apriori: holiday’
  • Details about why certain variables are transformed in the input data
    e.g. ‘Generating variable “y (log)” from “y” because multiplicative trend
    is detected’
  • Details about features generated as timeseries features, and their priority
    e.g. ‘Generating feature “date (actual)” from “date” (priority: 1)’
Parameters:
project_id : str

project id to retrieve a feature derivation log for.

to_specification()

Render the DatetimePartitioning as a DatetimePartitioningSpecification

The resulting specification can be used when setting the target, and contains only the attributes directly controllable by users.

Returns:
DatetimePartitioningSpecification:

the specification for this partitioning

to_dataframe()

Render the partitioning settings as a dataframe for convenience of display

Excludes project_id, datetime_partition_column, date_format, autopilot_data_selection_method, validation_duration, and number_of_backtests, as well as the row count information, if present.

Also excludes the time series specific parameters for use_time_series, default_to_known_in_advance, default_to_do_not_derive, and defining the feature derivation and forecast windows.
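
Examples

A minimal sketch of previewing a partitioning before committing to it; the project id, column name, and target are hypothetical.

import datarobot as dr

project = dr.Project.get('5996c3e7c6c0e7a4d9f0c7a1')  # hypothetical project id
spec = dr.DatetimePartitioningSpecification(
    datetime_partition_column='date', number_of_backtests=3)

# Preview the full partitioning DataRobot would derive from this specification
partitioning = dr.DatetimePartitioning.generate(project.id, spec)
print(partitioning.to_dataframe())

# Once satisfied, use the equivalent specification when setting the target
project.set_target(target='sales', partitioning_method=partitioning.to_specification())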

class datarobot.helpers.partitioning_methods.Backtest(index=None, available_training_start_date=None, available_training_duration=None, available_training_row_count=None, available_training_end_date=None, primary_training_start_date=None, primary_training_duration=None, primary_training_row_count=None, primary_training_end_date=None, gap_start_date=None, gap_duration=None, gap_row_count=None, gap_end_date=None, validation_start_date=None, validation_duration=None, validation_row_count=None, validation_end_date=None, total_row_count=None)

A backtest used to evaluate models trained in a datetime partitioned project

When setting up a datetime partitioning project, backtests are specified by a BacktestSpecification.

The available training data corresponds to all the data available for training, while the primary training data corresponds to the data that can be used to train while ensuring that all backtests are available. If a model is trained with more data than is available in the primary training data, then all backtests may not have scores available.

Attributes:
index : int

the index of the backtest

available_training_start_date : datetime.datetime

the start date of the available training data for this backtest

available_training_duration : str

the duration of available training data for this backtest

available_training_row_count : int or None

the number of rows of available training data for this backtest. Only available when retrieving from a project where the target is set.

available_training_end_date : datetime.datetime

the end date of the available training data for this backtest

primary_training_start_date : datetime.datetime

the start date of the primary training data for this backtest

primary_training_duration : str

the duration of the primary training data for this backtest

primary_training_row_count : int or None

the number of rows of primary training data for this backtest. Only available when retrieving from a project where the target is set.

primary_training_end_date : datetime.datetime

the end date of the primary training data for this backtest

gap_start_date : datetime.datetime

the start date of the gap between training and validation scoring data for this backtest

gap_duration : str

the duration of the gap between training and validation scoring data for this backtest

gap_row_count : int or None

the number of rows in the gap between training and validation scoring data for this backtest. Only available when retrieving from a project where the target is set.

gap_end_date : datetime.datetime

the end date of the gap between training and validation scoring data for this backtest

validation_start_date : datetime.datetime

the start date of the validation scoring data for this backtest

validation_duration : str

the duration of the validation scoring data for this backtest

validation_row_count : int or None

the number of rows of validation scoring data for this backtest. Only available when retrieving from a project where the target is set.

validation_end_date : datetime.datetime

the end date of the validation scoring data for this backtest

total_row_count : int or None

the number of rows in this backtest. Only available when retrieving from a project where the target is set.

to_specification()

Render this backtest as a BacktestSpecification

A BacktestSpecification includes only the attributes users can directly control, not those indirectly determined by the project dataset.

Returns:
BacktestSpecification

the specification for this backtest

to_dataframe()

Render this backtest as a dataframe for convenience of display

Returns:
backtest_partitioning : pandas.Dataframe

the backtest attributes, formatted into a dataframe

datarobot.helpers.partitioning_methods.construct_duration_string(years=0, months=0, days=0, hours=0, minutes=0, seconds=0)

Construct a valid string representing a duration in accordance with ISO8601

A duration of six months, 3 days, and 12 hours could be represented as P6M3DT12H.

Parameters:
years : int

the number of years in the duration

months : int

the number of months in the duration

days : int

the number of days in the duration

hours : int

the number of hours in the duration

minutes : int

the number of minutes in the duration

seconds : int

the number of seconds in the duration

Returns:
duration_string: str

The duration string, specified compatibly with ISO8601
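
Examples

For instance, the six months, 3 days, and 12 hours duration mentioned above can be built as follows; a minimal sketch.

from datarobot.helpers.partitioning_methods import construct_duration_string

duration = construct_duration_string(months=6, days=3, hours=12)
print(duration)  # 'P6M3DT12H'

# Duration strings built this way can be used for fields such as validation_duration
# or holdout_duration in a DatetimePartitioningSpecification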

PredictJob

datarobot.models.predict_job.wait_for_async_predictions(project_id, predict_job_id, max_wait=600)

Given a project id and a PredictJob id, poll the status of the process responsible for generating predictions until it finishes.

Parameters:
project_id : str

The identifier of the project

predict_job_id : str

The identifier of the PredictJob

max_wait : int, optional

Time in seconds after which predictions creation is considered unsuccessful

Returns:
predictions : pandas.DataFrame

Generated predictions.

Raises:
AsyncPredictionsGenerationError

Raised if status of fetched PredictJob object is error

AsyncTimeoutError

Predictions weren’t generated within the time specified by the max_wait parameter

class datarobot.models.PredictJob(data, completed_resource_url=None)

Tracks asynchronous work being done within a project

Attributes:
id : int

the id of the job

project_id : str

the id of the project the job belongs to

status : str

the status of the job - will be one of datarobot.enums.QUEUE_STATUS

job_type : str

what kind of work the job is doing - will be ‘predict’ for predict jobs

is_blocked : bool

if true, the job is blocked (cannot be executed) until its dependencies are resolved

message : str

a message about the state of the job, typically explaining why an error occurred

classmethod from_job(job)

Transforms a generic Job into a PredictJob

Parameters:
job: Job

A generic job representing a PredictJob

Returns:
predict_job: PredictJob

A fully populated PredictJob with all the details of the job

Raises:
ValueError:

If the generic Job was not a predict job, e.g. job_type != JOB_TYPE.PREDICT

classmethod create(model, sourcedata)

Note

Deprecated in v2.3 in favor of Project.upload_dataset and Model.request_predictions. That workflow allows you to reuse the same dataset for predictions from multiple models within one project.

Starts predictions generation for provided data using previously created model.

Parameters:
model : Model

Model to use for predictions generation

sourcedata : str, file or pandas.DataFrame

Data to be used for predictions. If this parameter is a str, it can be either a path to a local file or raw file content. If using a file on disk, the filename must consist of ASCII characters only. The file must be a CSV, and cannot be compressed

Returns:
predict_job_id : str

id of created job, can be used as parameter to PredictJob.get or PredictJob.get_predictions methods or wait_for_async_predictions function

Raises:
InputNotUnderstoodError

If the parameter for sourcedata didn’t resolve into known data types

Examples

model = Model.get('p-id', 'l-id')
predict_job = PredictJob.create(model, './data_to_predict.csv')
classmethod get(project_id, predict_job_id)

Fetches one PredictJob. If the job finished, raises PendingJobFinished exception.

Parameters:
project_id : str

The identifier of the project containing the model used to start the predictions

predict_job_id : str

The identifier of the predict_job

Returns:
predict_job : PredictJob

The pending PredictJob

Raises:
PendingJobFinished

If the job being queried already finished, and the server is re-routing to the finished predictions.

AsyncFailureError

Querying this resource gave a status code other than 200 or 303

classmethod get_predictions(project_id, predict_job_id, class_prefix='class_')

Fetches finished predictions from the job used to generate them.

Note

The prediction API for classifications now returns an additional prediction_values dictionary that is converted into a series of class_prefixed columns in the final dataframe. For example, <label> = 1.0 is converted to ‘class_1.0’. If you are on an older version of the client (prior to v2.8), you must update to v2.8 to correctly pivot this data.

Parameters:
project_id : str

The identifier of the project containing the model used to generate the predictions

predict_job_id : str

The identifier of the predict_job

class_prefix : str

The prefix to append to labels in the final dataframe (e.g., apple -> class_apple)

Returns:
predictions : pandas.DataFrame

Generated predictions

Raises:
JobNotFinished

If the job has not finished yet

AsyncFailureError

Querying the predict_job in question gave a status code other than 200 or 303

cancel()

Cancel this job. If this job has not finished running, it will be removed and canceled.

get_result()
Returns:
result : object
Return type depends on the job type:
  • for model jobs, a Model is returned
  • for predict jobs, a pandas.DataFrame (with predictions) is returned
  • for featureImpact jobs, a list of dicts (see Model.get_feature_impact for more detail)
  • for primeRulesets jobs, a list of Rulesets
  • for primeModel jobs, a PrimeModel
  • for primeDownloadValidation jobs, a PrimeFile
  • for reasonCodesInitialization jobs, a ReasonCodesInitialization
  • for reasonCodes jobs, a ReasonCodes
  • for predictionExplanationInitialization jobs, a PredictionExplanationsInitialization
  • for predictionExplanations jobs, a PredictionExplanations
Raises:
JobNotFinished

If the job is not finished, the result is not available.

AsyncProcessUnsuccessfulError

If the job errored or was aborted

get_result_when_complete(max_wait=600)
Parameters:
max_wait : int, optional

How long to wait for the job to finish.

Returns:
result: object

Return type is the same as would be returned by Job.get_result.

Raises:
AsyncTimeoutError

If the job does not finish in time

AsyncProcessUnsuccessfulError

If the job errored or was aborted

refresh()

Update this object with the latest job data from the server.

wait_for_completion(max_wait=600)

Waits for job to complete.

Parameters:
max_wait : int, optional

How long to wait for the job to finish.
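
Examples

A minimal sketch of the recommended workflow mentioned above (Project.upload_dataset followed by Model.request_predictions); the ids and file path are hypothetical.

import datarobot as dr

project = dr.Project.get('5996c3e7c6c0e7a4d9f0c7a1')          # hypothetical project id
model = dr.Model.get(project.id, '5996c3e7c6c0e7a4d9f0c7a2')  # hypothetical model id

# Upload the dataset once, then request predictions from any model in the project
dataset = project.upload_dataset('./data_to_predict.csv')
predict_job = model.request_predictions(dataset.id)
predictions = predict_job.get_result_when_complete(max_wait=600)
print(predictions.head())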

Prediction Dataset

class datarobot.models.PredictionDataset(project_id, id, name, created, num_rows, num_columns, forecast_point=None, predictions_start_date=None, predictions_end_date=None, relax_known_in_advance_features_check=None, data_quality_warnings=None)

A dataset uploaded to make predictions

Typically created via project.upload_dataset

Attributes:
id : str

the id of the dataset

project_id : str

the id of the project the dataset belongs to

created : str

the time the dataset was created

name : str

the name of the dataset

num_rows : int

the number of rows in the dataset

num_columns : int

the number of columns in the dataset

forecast_point : datetime.datetime or None

Only specified in time series projects. The point relative to which predictions will be generated, based on the forecast window of the project. See the time series documentation for more information.

predictions_start_date : datetime.datetime or None, optional

Only specified in time series projects. The start date for bulk predictions. This parameter should be provided in conjunction with predictions_end_date. Can’t be provided with forecastPoint parameter.

predictions_end_date : datetime.datetime or None, optional

Only specified in time series projects. The end date for bulk predictions. This parameter should be provided in conjunction with predictions_start_date. Can’t be provided with forecastPoint parameter.

relax_known_in_advance_features_check : bool, optional

(New in version v2.15) For Time Series projects only. If True, missing values in the known in advance features are allowed in the forecast window at the prediction time. If omitted or False, missing values are not allowed.

data_quality_warnings : dict, optional

(New in version 2.15) A dictionary that contains available warnings about potential problems in this prediction dataset. Empty if no warnings.

classmethod get(project_id, dataset_id)

Retrieve information about a dataset uploaded for predictions

Parameters:
project_id:

the id of the project to query

dataset_id:

the id of the dataset to retrieve

Returns:
dataset: PredictionDataset

A dataset uploaded to make predictions

delete()

Delete a dataset uploaded for predictions

Will also delete predictions made using this dataset and cancel any predict jobs using this dataset.
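
Examples

A minimal sketch of retrieving and removing an uploaded prediction dataset; both ids are hypothetical.

from datarobot.models import PredictionDataset

dataset = PredictionDataset.get('5996c3e7c6c0e7a4d9f0c7a1',   # hypothetical project id
                                '5996c3e7c6c0e7a4d9f0c7a3')   # hypothetical dataset id
print(dataset.name, dataset.num_rows, dataset.num_columns)

# Deleting the dataset also removes predictions made with it and cancels pending predict jobs
dataset.delete()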

Prediction Explanations

class datarobot.PredictionExplanationsInitialization(project_id, model_id, prediction_explanations_sample=None)

Represents a prediction explanations initialization of a model.

Attributes:
project_id : str

id of the project the model belongs to

model_id : str

id of the model the prediction explanations initialization is for

prediction_explanations_sample : list of dict

a small sample of prediction explanations that could be generated for the model

classmethod get(project_id, model_id)

Retrieve the prediction explanations initialization for a model.

Prediction explanations initializations are a prerequisite for computing prediction explanations, and include a sample of what the computed prediction explanations for a prediction dataset would look like.

Parameters:
project_id : str

id of the project the model belongs to

model_id : str

id of the model the prediction explanations initialization is for

Returns:
prediction_explanations_initialization : PredictionExplanationsInitialization

The queried instance.

Raises:
ClientError (404)

If the project or model does not exist or the initialization has not been computed.

classmethod create(project_id, model_id)

Create a prediction explanations initialization for the specified model.

Parameters:
project_id : str

id of the project the model belongs to

model_id : str

id of the model for which initialization is requested

Returns:
job : Job

an instance of created async job

delete()

Delete this prediction explanations initialization.

class datarobot.PredictionExplanations(id, project_id, model_id, dataset_id, max_explanations, num_columns, finish_time, prediction_explanations_location, threshold_low=None, threshold_high=None)

Represents prediction explanations metadata and provides access to computation results.

Examples

prediction_explanations = dr.PredictionExplanations.get(project_id, explanations_id)
for row in prediction_explanations.get_rows():
    print(row)  # row is an instance of PredictionExplanationsRow
Attributes:
id : str

id of the record and prediction explanations computation result

project_id : str

id of the project the model belongs to

model_id : str

id of the model the prediction explanations are for

dataset_id : str

id of the prediction dataset prediction explanations were computed for

max_explanations : int

maximum number of prediction explanations to supply per row of the dataset

threshold_low : float

the lower threshold, below which a prediction must score in order for prediction explanations to be computed for a row in the dataset

threshold_high : float

the high threshold, above which a prediction must score in order for prediction explanations to be computed for a row in the dataset

num_columns : int

the number of columns prediction explanations were computed for

finish_time : float

timestamp referencing when computation for these prediction explanations finished

prediction_explanations_location : str

where to retrieve the prediction explanations

classmethod get(project_id, prediction_explanations_id)

Retrieve a specific prediction explanations.

Parameters:
project_id : str

id of the project the explanations belong to

prediction_explanations_id : str

id of the prediction explanations

Returns:
prediction_explanations : PredictionExplanations

The queried instance.

classmethod create(project_id, model_id, dataset_id, max_explanations=None, threshold_low=None, threshold_high=None)

Create prediction explanations for the specified dataset.

In order to create PredictionExplanations for a particular model and dataset, you must first:

  • Compute feature impact for the model via datarobot.Model.get_feature_impact()
  • Compute a PredictionExplanationsInitialization for the model via datarobot.PredictionExplanationsInitialization.create(project_id, model_id)
  • Compute predictions for the model and dataset via datarobot.Model.request_predictions(dataset_id)

threshold_high and threshold_low are optional filters applied to speed up computation. When at least one is specified, only the selected outlier rows will have prediction explanations computed. Rows are considered to be outliers if their predicted value (in the case of regression projects) or probability of being the positive class (in the case of classification projects) is less than threshold_low or greater than threshold_high. If neither is specified, prediction explanations will be computed for all rows.

Parameters:
project_id : str

id of the project the model belongs to

model_id : str

id of the model for which prediction explanations are requested

dataset_id : str

id of the prediction dataset for which prediction explanations are requested

threshold_low : float, optional

the lower threshold, below which a prediction must score in order for prediction explanations to be computed for a row in the dataset. If neither threshold_high nor threshold_low is specified, prediction explanations will be computed for all rows.

threshold_high : float, optional

the high threshold, above which a prediction must score in order for prediction explanations to be computed. If neither threshold_high nor threshold_low is specified, prediction explanations will be computed for all rows.

max_explanations : int, optional

the maximum number of prediction explanations to supply per row of the dataset, default: 3.

Returns:
job: Job

an instance of created async job
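
Examples

A minimal sketch of the sequence of prerequisites described above; project_id, model_id, and dataset_id are assumed to have been obtained earlier, and feature impact is computed here via request_feature_impact.

import datarobot as dr

model = dr.Model.get(project_id, model_id)

# 1. Compute feature impact for the model
model.request_feature_impact().wait_for_completion()

# 2. Compute a prediction explanations initialization for the model
dr.PredictionExplanationsInitialization.create(project_id, model_id).wait_for_completion()

# 3. Compute predictions for the model and the prediction dataset
model.request_predictions(dataset_id).wait_for_completion()

# Now prediction explanations can be created
pe_job = dr.PredictionExplanations.create(
    project_id, model_id, dataset_id, max_explanations=5)
prediction_explanations = pe_job.get_result_when_complete()
df = prediction_explanations.get_all_as_dataframe()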

classmethod list(project_id, model_id=None, limit=None, offset=None)

List of prediction explanations for a specified project.

Parameters:
project_id : str

id of the project to list prediction explanations for

model_id : str, optional

if specified, only prediction explanations computed for this model will be returned

limit : int or None

at most this many results are returned, default: no limit

offset : int or None

this many results will be skipped, default: 0

Returns:
prediction_explanations : list[PredictionExplanations]
get_rows(batch_size=None, exclude_adjusted_predictions=True)

Retrieve prediction explanations rows.

Parameters:
batch_size : int or None, optional

maximum number of prediction explanations rows to retrieve per request

exclude_adjusted_predictions : bool

Optional, defaults to True. Set to False to include adjusted predictions, which will differ from the predictions on some projects, e.g. those with an exposure column specified.

Yields:
prediction_explanations_row : PredictionExplanationsRow

Represents prediction explanations computed for a prediction row.

get_all_as_dataframe(exclude_adjusted_predictions=True)

Retrieve all prediction explanations rows and return them as a pandas.DataFrame.

Returned dataframe has the following structure:

  • row_id : row id from prediction dataset
  • prediction : the output of the model for this row
  • adjusted_prediction : adjusted prediction values (only appears for projects that utilize prediction adjustments, e.g. projects with an exposure column)
  • class_0_label : a class level from the target (only appears for classification projects)
  • class_0_probability : the probability that the target is this class (only appears for classification projects)
  • class_1_label : a class level from the target (only appears for classification projects)
  • class_1_probability : the probability that the target is this class (only appears for classification projects)
  • explanation_0_feature : the name of the feature contributing to the prediction for this explanation
  • explanation_0_feature_value : the value the feature took on
  • explanation_0_label : the output being driven by this explanation. For regression projects, this is the name of the target feature. For classification projects, this is the class label whose probability increasing would correspond to a positive strength.
  • explanation_0_qualitative_strength : a human-readable description of how strongly the feature affected the prediction (e.g. ‘+++’, ‘--’, ‘+’) for this explanation
  • explanation_0_strength : the amount this feature’s value affected the prediction
  • explanation_N_feature : the name of the feature contributing to the prediction for this explanation
  • explanation_N_feature_value : the value the feature took on
  • explanation_N_label : the output being driven by this explanation. For regression projects, this is the name of the target feature. For classification projects, this is the class label whose probability increasing would correspond to a positive strength.
  • explanation_N_qualitative_strength : a human-readable description of how strongly the feature affected the prediction (e.g. ‘+++’, ‘--’, ‘+’) for this explanation
  • explanation_N_strength : the amount this feature’s value affected the prediction

For classification projects, the server does not guarantee any ordering on the prediction values; however, within this function we sort the values so that class_X corresponds to the same class from row to row.

Parameters:
exclude_adjusted_predictions : bool

Optional, defaults to True. Set this to False to include adjusted prediction values in the returned dataframe.

Returns:
dataframe: pandas.DataFrame
download_to_csv(filename, encoding='utf-8', exclude_adjusted_predictions=True)

Save prediction explanations rows into CSV file.

Parameters:
filename : str or file object

path or file object to save prediction explanations rows

encoding : string, optional

A string representing the encoding to use in the output file, defaults to ‘utf-8’

exclude_adjusted_predictions : bool

Optional, defaults to True. Set to False to include adjusted predictions, which will differ from the predictions on some projects, e.g. those with an exposure column specified.

get_prediction_explanations_page(limit=None, offset=None, exclude_adjusted_predictions=True)

Get prediction explanations.

If you don’t want to use the generator interface, you can access paginated prediction explanations directly.

Parameters:
limit : int or None

the number of records to return, the server will use a (possibly finite) default if not specified

offset : int or None

the number of records to skip, default 0

exclude_adjusted_predictions : bool

Optional, defaults to True. Set to False to include adjusted predictions, which will differ from the predictions on some projects, e.g. those with an exposure column specified.

Returns:
prediction_explanations : PredictionExplanationsPage
delete()

Delete these prediction explanations.

class datarobot.models.prediction_explanations.PredictionExplanationsRow(row_id, prediction, prediction_values, prediction_explanations=None, adjusted_prediction=None, adjusted_prediction_values=None)

Represents prediction explanations computed for a prediction row.

Notes

PredictionValue contains:

  • label : describes what this model output corresponds to. For regression projects, it is the name of the target feature. For classification projects, it is a level from the target feature.
  • value : the output of the prediction. For regression projects, it is the predicted value of the target. For classification projects, it is the predicted probability the row belongs to the class identified by the label.

PredictionExplanation contains:

  • label : describes what output was driven by this explanation. For regression projects, it is the name of the target feature. For classification projects, it is the class whose probability increasing would correspond to a positive strength of this prediction explanation.
  • feature : the name of the feature contributing to the prediction
  • feature_value : the value the feature took on for this row
  • strength : the amount this feature’s value affected the prediction
  • qualitative_strength : a human-readable description of how strongly the feature affected the prediction (e.g. ‘+++’, ‘--’, ‘+’)
Attributes:
row_id : int

which row this PredictionExplanationsRow describes

prediction : float

the output of the model for this row

adjusted_prediction : float or None

adjusted prediction value for projects that provide this information, None otherwise

prediction_values : list

an array of dictionaries with a schema described as PredictionValue

adjusted_prediction_values : list

same as prediction_values but for adjusted predictions

prediction_explanations : list

an array of dictionaries with a schema described as PredictionExplanation

class datarobot.models.prediction_explanations.PredictionExplanationsPage(id, count=None, previous=None, next=None, data=None, prediction_explanations_record_location=None, adjustment_method=None)

Represents a batch of prediction explanations received by one request.

Attributes:
id : str

id of the prediction explanations computation result

data : list[dict]

list of raw prediction explanations; each row corresponds to a row of the prediction dataset

count : int

total number of rows computed

previous_page : str

where to retrieve previous page of prediction explanations, None if current page is the first

next_page : str

where to retrieve next page of prediction explanations, None if current page is the last

prediction_explanations_record_location : str

where to retrieve the prediction explanations metadata

adjustment_method : str

Adjustment method that was applied to predictions, or ‘N/A’ if no adjustments were done.

classmethod get(project_id, prediction_explanations_id, limit=None, offset=0, exclude_adjusted_predictions=True)

Retrieve prediction explanations.

Parameters:
project_id : str

id of the project the model belongs to

prediction_explanations_id : str

id of the prediction explanations

limit : int or None

the number of records to return; the server will use a (possibly finite) default if not specified

offset : int or None

the number of records to skip, default 0

exclude_adjusted_predictions : bool

Optional, defaults to True. Set to False to include adjusted predictions, which will differ from the predictions on some projects, e.g. those with an exposure column specified.

Returns:
prediction_explanations : PredictionExplanationsPage

The queried instance.

Predictions

class datarobot.models.Predictions(project_id, prediction_id, model_id=None, dataset_id=None, includes_prediction_intervals=None, prediction_intervals_size=None)

Represents predictions metadata and provides access to prediction results.

Examples

List all predictions for a project

import datarobot as dr

# Fetch all predictions for a project
all_predictions = dr.Predictions.list(project_id)

# Inspect all calculated predictions
for predictions in all_predictions:
    print(predictions)  # repr includes project_id, model_id, and dataset_id

Retrieve predictions by id

import datarobot as dr

# Getting predictions by id
predictions = dr.Predictions.get(project_id, prediction_id)

# Dump actual predictions
df = predictions.get_all_as_dataframe()
print(df)
Attributes:
project_id : str

id of the project the model belongs to

model_id : str

id of the model

prediction_id : str

id of generated predictions

includes_prediction_intervals : bool, optional

(New in v2.16) For time series projects only. Indicates if prediction intervals will be part of the response. Defaults to False.

prediction_intervals_size : int, optional

(New in v2.16) For time series projects only. Indicates the percentile used for prediction intervals calculation. Will be present only if includes_prediction_intervals is True.

classmethod list(project_id, model_id=None, dataset_id=None)

Fetch all the computed predictions metadata for a project.

Parameters:
project_id : str

id of the project

model_id : str, optional

if specified, only predictions metadata for this model will be retrieved

dataset_id : str, optional

if specified, only predictions metadata for this dataset will be retrieved

Returns:
A list of Predictions objects
classmethod get(project_id, prediction_id)

Retrieve the specific predictions metadata

Parameters:
project_id : str

id of the project the model belongs to

prediction_id : str

id of the prediction set

Returns:
Predictions object representing the specified predictions
get_all_as_dataframe(class_prefix='class_')

Retrieve all prediction rows and return them as a pandas.DataFrame.

Parameters:
class_prefix : str, optional

The prefix to prepend to class labels in the final dataframe. Default is class_ (e.g., apple -> class_apple)

Returns:
dataframe: pandas.DataFrame
download_to_csv(filename, encoding='utf-8')

Save prediction rows into CSV file.

Parameters:
filename : str or file object

path or file object to save prediction rows

encoding : string, optional

A string representing the encoding to use in the output file, defaults to ‘utf-8’
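
As a hedged example, the snippet below saves all prediction rows for an existing prediction set to disk; project_id and prediction_id are placeholders.

import datarobot as dr

predictions = dr.Predictions.get(project_id, prediction_id)
predictions.download_to_csv('predictions.csv')  # writes every prediction row as CSV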

PredictionServer

class datarobot.PredictionServer(id=None, url=None, datarobot_key=None)

A prediction server can be used to make predictions

Attributes:
id : str

the id of the prediction server

url : str

the url of the prediction server

datarobot_key : str

the datarobot-key header used in requests to this prediction server

classmethod list()

Returns a list of prediction servers a user can use to make predictions.

New in version v2.17.

Returns:
prediction_servers : list of PredictionServer instances

Contains a list of prediction servers that can be used to make predictions.

Examples

prediction_servers = PredictionServer.list()
prediction_servers
>>> [PredictionServer('https://example.com')]

Ruleset

class datarobot.models.Ruleset(project_id=None, parent_model_id=None, model_id=None, ruleset_id=None, rule_count=None, score=None)

Represents an approximation of a model with DataRobot Prime

Attributes:
id : str

the id of the ruleset

rule_count : int

the number of rules used to approximate the model

score : float

the validation score of the approximation

project_id : str

the project the approximation belongs to

parent_model_id : str

the model being approximated

model_id : str or None

the model using this ruleset (if it exists). Will be None if no such model has been trained.

request_model()

Request training for a model using this ruleset

Training a model using a ruleset is a necessary prerequisite for being able to download the code for a ruleset.

Returns:
job: Job

the job fitting the new Prime model
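
A minimal sketch of requesting a Prime model from a ruleset; `model` is assumed to be a Model whose Prime rulesets have already been computed, and waiting on the returned job uses the standard Job helpers.

# assuming `model` is a datarobot.models.Model with Prime rulesets computed
rulesets = model.get_rulesets()
ruleset = rulesets[0]
job = ruleset.request_model()
prime_model = job.get_result_when_complete()  # the Prime model trained from this ruleset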

PrimeFile

class datarobot.models.PrimeFile(id=None, project_id=None, parent_model_id=None, model_id=None, ruleset_id=None, language=None, is_valid=None)

Represents an executable file available for download of the code for a DataRobot Prime model

Attributes:
id : str

the id of the PrimeFile

project_id : str

the id of the project this PrimeFile belongs to

parent_model_id : str

the model being approximated by this PrimeFile

model_id : str

the prime model this file represents

ruleset_id : int

the ruleset being used in this PrimeFile

language : str

the language of the code in this file - see enums.LANGUAGE for possibilities

is_valid : bool

whether the code passed basic validation

download(filepath)

Download the code and save it to a file

Parameters:
filepath: string

the location to save the file to
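
A minimal sketch of downloading Prime model code, assuming the project already has validated Prime files; the output filename is a placeholder.

prime_files = project.get_prime_files()
valid_files = [f for f in prime_files if f.is_valid]  # keep only files that passed validation
valid_files[0].download('prime_model_code.py')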

Project

class datarobot.models.Project(id=None, project_name=None, mode=None, target=None, target_type=None, holdout_unlocked=None, metric=None, stage=None, partition=None, positive_class=None, created=None, advanced_options=None, recommender=None, max_train_pct=None, max_train_rows=None, scaleout_max_train_pct=None, scaleout_max_train_rows=None, file_name=None)

A project built from a particular training dataset

Attributes:
id : str

the id of the project

project_name : str

the name of the project

mode : int

the autopilot mode currently selected for the project - 0 for full autopilot, 1 for semi-automatic, and 2 for manual

target : str

the name of the selected target feature

target_type : str

Indicates what kind of modeling is being done in this project. Options are: ‘Regression’, ‘Binary’ (binary classification), ‘Multiclass’ (multiclass classification)

holdout_unlocked : bool

whether the holdout has been unlocked

metric : str

the selected project metric (e.g. LogLoss)

stage : str

the stage the project has reached - one of datarobot.enums.PROJECT_STAGE

partition : dict

information about the selected partitioning options

positive_class : str

for binary classification projects, the selected positive class; otherwise, None

created : datetime

the time the project was created

advanced_options : dict

information on the advanced options that were selected for the project settings, e.g. a weights column or a cap of the runtime of models that can advance autopilot stages

recommender : dict

information on the recommender settings of the project (i.e. whether it is a recommender project, or the id columns)

max_train_pct : float

the maximum percentage of the project dataset that can be used without going into the validation data or being too large to submit any blueprint for training

max_train_rows : int

the maximum number of rows that can be trained on without going into the validation data or being too large to submit any blueprint for training

scaleout_max_train_pct : float

the maximum percentage of the project dataset that can be used to successfully train a scaleout model without going into the validation data. May exceed max_train_pct, in which case only scaleout models can be trained up to this point.

scaleout_max_train_rows : int

the maximum number of rows that can be used to successfully train a scaleout model without going into the validation data. May exceed max_train_rows, in which case only scaleout models can be trained up to this point.

file_name : str

the name of the file uploaded for the project dataset

classmethod get(project_id)

Gets information about a project.

Parameters:
project_id : str

The identifier of the project you want to load.

Returns:
project : Project

The queried project

Examples

import datarobot as dr
p = dr.Project.get(project_id='54e639a18bd88f08078ca831')
p.id
>>>'54e639a18bd88f08078ca831'
p.project_name
>>>'Some project name'
classmethod create(sourcedata, project_name='Untitled Project', max_wait=600, read_timeout=600, dataset_filename=None)

Creates a project with provided data.

Project creation is an asynchronous process: after the initial request, the client keeps polling the status of the async process responsible for project creation until it finishes. For SDK users this only means that this method might raise exceptions related to its asynchronous nature.

Parameters:
sourcedata : basestring, file or pandas.DataFrame

Dataset to use for the project. If a string, it can be either a path to a local file, a URL to a publicly available file, or raw file content. If using a file, the filename must consist of ASCII characters only.

project_name : str, unicode, optional

The name to assign to the empty project.

max_wait : int, optional

Time in seconds after which project creation is considered unsuccessful

read_timeout: int

The maximum number of seconds to wait for the server to respond indicating that the initial upload is complete

dataset_filename : string or None, optional

(New in version v2.14) File name to use for dataset. Ignored for url and file path sources.

Returns:
project : Project

Instance with initialized data.

Raises:
InputNotUnderstoodError

Raised if sourcedata isn’t one of supported types.

AsyncFailureError

Polling for status of async process resulted in response with unsupported status code. Beginning in version 2.1, this will be ProjectAsyncFailureError, a subclass of AsyncFailureError

AsyncProcessUnsuccessfulError

Raised if project creation was unsuccessful

AsyncTimeoutError

Raised if project creation took more time than specified by the max_wait parameter

Examples

p = Project.create('/home/datasets/somedataset.csv',
                   project_name="New API project")
p.id
>>> '5921731dkqshda8yd28h'
p.project_name
>>> 'New API project'
classmethod encrypted_string(plaintext)

Sends a string to DataRobot to be encrypted

This is used for passwords that DataRobot uses to access external data sources

Parameters:
plaintext : str

The string to encrypt

Returns:
ciphertext : str

The encrypted string
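
A short sketch; the plaintext password below is a placeholder.

import datarobot as dr

ciphertext = dr.Project.encrypted_string('my-database-password')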

classmethod create_from_hdfs(url, port=None, project_name=None, max_wait=600)

Create a project from a datasource on a WebHDFS server.

Parameters:
url : str

The location of the WebHDFS file, both server and full path. Per the DataRobot specification, must begin with hdfs://, e.g. hdfs:///tmp/10kDiabetes.csv

port : int, optional

The port to use. If not specified, will default to the server default (50070)

project_name : str, optional

A name to give to the project

max_wait : int

The maximum number of seconds to wait before giving up.

Returns:
Project

Examples

p = Project.create_from_hdfs('hdfs:///tmp/somedataset.csv',
                             project_name="New API project")
p.id
>>> '5921731dkqshda8yd28h'
p.project_name
>>> 'New API project'
classmethod create_from_data_source(data_source_id, username, password, project_name=None, max_wait=600)

Create a project from a data source, identified by data_source_id.

Parameters:
data_source_id : str

the identifier of the data source.

username : str

the username for database authentication.

password : str

the password for database authentication. The password is encrypted at server side and never saved / stored.

project_name : str, optional

optional, a name to give to the project.

max_wait : int

optional, the maximum number of seconds to wait before giving up.

Returns:
Project
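
A hedged sketch; the data source id and credentials below are placeholders.

import datarobot as dr

project = dr.Project.create_from_data_source(
    data_source_id='5e30cf74d29acf5a8cb60a12',  # hypothetical data source id
    username='db_user',
    password='db_password',
    project_name='Project from data source')
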
classmethod from_async(async_location, max_wait=600)

Given a temporary async status location, poll for no more than max_wait seconds until the async process (project creation or setting the target, for example) finishes successfully, then return the ready project

Parameters:
async_location : str

The URL for the temporary async status resource. This is returned as a header in the response to a request that initiates an async process

max_wait : int

The maximum number of seconds to wait before giving up.

Returns:
project : Project

The project, now ready

Raises:
ProjectAsyncFailureError

If the server returned an unexpected response while polling for the asynchronous operation to resolve

AsyncProcessUnsuccessfulError

If the final result of the asynchronous operation was a failure

AsyncTimeoutError

If the asynchronous operation did not resolve within the time specified
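
A minimal sketch, assuming async_location holds the temporary status URL returned (as a Location header) by a request that started an asynchronous operation.

import datarobot as dr

project = dr.Project.from_async(async_location, max_wait=600)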

classmethod start(sourcedata, target, project_name='Untitled Project', worker_count=None, metric=None, autopilot_on=True, blueprint_threshold=None, response_cap=None, partitioning_method=None, positive_class=None, target_type=None)

Chain together project creation, file upload, and target selection.

Parameters:
sourcedata : str or pandas.DataFrame

The path to the file to upload. Can be either a path to a local file or a publicly accessible URL. If the source is a DataFrame, it will be serialized to a temporary buffer. If using a file, the filename must consist of ASCII characters only.

target : str

The name of the target column in the uploaded file.

project_name : str

The project name.

Returns:
project : Project

The newly created and initialized project.

Other Parameters:
 
worker_count : int, optional

The number of workers that you want to allocate to this project.

metric : str, optional

The name of metric to use.

autopilot_on : boolean, default True

Whether or not to begin modeling automatically.

blueprint_threshold : int, optional

Number of hours the model is permitted to run. Minimum 1

response_cap : float, optional

Quantile of the response distribution to use for response capping. Must be in the range 0.5 .. 1.0

partitioning_method : PartitioningMethod object, optional

An instance of a PartitioningMethod object specifying the partitioning to use.

positive_class : str, float, or int; optional

Specifies a level of the target column that should be treated as the positive class for binary classification. May only be specified for binary classification targets.

target_type : str, optional

Override the automatically selected target_type. An example usage would be setting target_type=‘Multiclass’ when you want to perform a multiclass classification task on a numeric column that has a low cardinality. You can use the TARGET_TYPE enum.

Raises:
AsyncFailureError

Polling for status of async process resulted in response with unsupported status code

AsyncProcessUnsuccessfulError

Raised if project creation or target setting was unsuccessful

AsyncTimeoutError

Raised if project creation or target setting timed out

Examples

Project.start("./tests/fixtures/file.csv",
              "a_target",
              project_name="test_name",
              worker_count=4,
              metric="a_metric")
classmethod list(search_params=None)

Returns the projects associated with this account.

Parameters:
search_params : dict, optional.

If not None, the returned projects are filtered by lookup. Currently you can query projects by:

  • project_name
Returns:
projects : list of Project instances

Contains a list of projects associated with this user account.

Raises:
TypeError

Raised if search_params parameter is provided, but is not of supported type.

Examples

List all projects

p_list = Project.list()
p_list
>>> [Project('Project One'), Project('Two')]

Search for projects by name

Project.list(search_params={'project_name': 'red'})
>>> [Project('Predtime'), Project('Fred Project')]
refresh()

Fetches the latest state of the project, and updates this object with that information. This is an inplace update, not a new object.

Returns:
self : Project

the now-updated project

delete()

Removes this project from your account.

set_target(target, mode='auto', metric=None, quickrun=None, worker_count=None, positive_class=None, partitioning_method=None, featurelist_id=None, advanced_options=None, max_wait=600, target_type=None)

Set target variable of an existing project that has a file uploaded to it.

Target setting is an asynchronous process: after the initial request, the client keeps polling the status of the async process responsible for target setting until it finishes. For SDK users this only means that this method might raise exceptions related to its asynchronous nature.

Parameters:
target : str

Name of variable.

mode : str, optional

You can use AUTOPILOT_MODE enum to choose between

  • AUTOPILOT_MODE.FULL_AUTO
  • AUTOPILOT_MODE.MANUAL
  • AUTOPILOT_MODE.QUICK

If unspecified, FULL_AUTO is used

metric : str, optional

Name of the metric to use for evaluating models. You can query the metrics available for the target by way of Project.get_metrics. If none is specified, then the default recommended by DataRobot is used.

quickrun : bool, optional

Deprecated - pass AUTOPILOT_MODE.QUICK as mode instead. Sets whether project should be run in quick run mode. This setting causes DataRobot to recommend a more limited set of models in order to get a base set of models and insights more quickly.

worker_count : int, optional

The number of concurrent workers to request for this project. If None, then the default is used. (New in version v2.14) Setting this to -1 will request the maximum number available to your account.

partitioning_method : PartitioningMethod object, optional

An instance of a PartitioningMethod object specifying the partitioning to use.

positive_class : str, float, or int; optional

Specifies a level of the target column that should be treated as the positive class for binary classification. May only be specified for binary classification targets.

featurelist_id : str, optional

Specifies which feature list to use.

advanced_options : AdvancedOptions, optional

Used to set advanced options of project creation.

max_wait : int, optional

Time in seconds after which target setting is considered unsuccessful.

target_type : str, optional

Override the automatically selected target_type. An example usage would be setting target_type=‘Multiclass’ when you want to perform a multiclass classification task on a numeric column that has a low cardinality. You can use the TARGET_TYPE enum.

Returns:
project : Project

The instance with updated attributes.

Raises:
AsyncFailureError

Polling for status of async process resulted in response with unsupported status code

AsyncProcessUnsuccessfulError

Raised if target setting was unsuccessful

AsyncTimeoutError

Raised if target setting took more time than specified by the max_wait parameter

TypeError

Raised if advanced_options, partitioning_method or target_type is provided, but is not of supported type

See also

datarobot.models.Project.start
combines project creation, file upload, and target selection
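
A hedged sketch of setting the target with advanced options; the column names are placeholders, and AdvancedOptions is assumed to be exported at the package top level.

import datarobot as dr
from datarobot.enums import AUTOPILOT_MODE

opts = dr.AdvancedOptions(weights='row_weight')
project = dr.Project.get(project_id)
project.set_target(
    target='my_target',
    mode=AUTOPILOT_MODE.QUICK,
    worker_count=-1,        # request the maximum number of workers available (v2.14+)
    advanced_options=opts)
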
get_models(order_by=None, search_params=None, with_metric=None)

List all completed, successful models in the leaderboard for the given project.

Parameters:
order_by : str or list of strings, optional

If not None, the returned models are ordered by this attribute. If None, the default return is the order of default project metric.

Allowed attributes to sort by are:

  • metric
  • sample_pct

If the sort attribute is preceded by a hyphen, models will be sorted in descending order, otherwise in ascending order.

Multiple sort attributes can be included as a comma-delimited string or in a list, e.g. order_by='sample_pct,-metric' or order_by=['sample_pct', '-metric']

Sorting by metric orders models by their validation score on the project metric.

search_params : dict, optional.

If not None, the returned models are filtered by lookup. Currently you can query models by:

  • name
  • sample_pct
  • is_starred
with_metric : str, optional.

If not None, the returned models will only have scores for this metric. Otherwise all the metrics are returned.

Returns:
models : a list of Model instances.

All of the models that have been trained in this project.

Raises:
TypeError

Raised if order_by or search_params parameter is provided, but is not of supported type.

Examples

Project.get('pid').get_models(order_by=['-sample_pct',
                              'metric'])

# Getting models that contain "Ridge" in name
# and with sample_pct more than 64
Project.get('pid').get_models(
    search_params={
        'sample_pct__gt': 64,
        'name': "Ridge"
    })

# Filtering models based on 'starred' flag:
Project.get('pid').get_models(search_params={'is_starred': True})
get_datetime_models()

List all models in the project as DatetimeModels

Requires the project to be datetime partitioned. If it is not, a ClientError will occur.

Returns:
models : list of DatetimeModel

the datetime models

get_prime_models()

List all DataRobot Prime models for the project. Prime models were created to approximate a parent model, and have downloadable code.

Returns:
models : list of PrimeModel
get_prime_files(parent_model_id=None, model_id=None)

List all downloadable code files from DataRobot Prime for the project

Parameters:
parent_model_id : str, optional

Filter for only those prime files approximating this parent model

model_id : str, optional

Filter for only those prime files with code for this prime model

Returns:
files: list of PrimeFile
get_datasets()

List all the datasets that have been uploaded for predictions

Returns:
datasets : list of PredictionDataset instances
upload_dataset(sourcedata, max_wait=600, read_timeout=600, forecast_point=None, predictions_start_date=None, predictions_end_date=None, dataset_filename=None, relax_known_in_advance_features_check=None)

Upload a new dataset to make predictions against

Parameters:
sourcedata : str, file or pandas.DataFrame

Data to be used for predictions. If a string, it can be either a path to a local file, a URL to a publicly available file, or raw file content. If using a file on disk, the filename must consist of ASCII characters only.

max_wait : int, optional

The maximum number of seconds to wait for the uploaded dataset to be processed before raising an error.

read_timeout : int, optional

The maximum number of seconds to wait for the server to respond indicating that the initial upload is complete

forecast_point : datetime.datetime or None, optional

(New in version v2.8) May only be specified for Time Series projects, otherwise the upload will be rejected. The time in the dataset relative to which predictions should be generated in a Time Series project. See the Time Series documentation for more information. If not provided, will default to using the latest forecast point in the dataset.

predictions_start_date : datetime.datetime or None, optional

(New in version v2.11) May only be specified for time series projects. The start date for bulk predictions. This parameter should be provided in conjunction with predictions_end_date. Cannot be provided with the forecast_point parameter.

predictions_end_date : datetime.datetime or None, optional

(New in version v2.11) May only be specified for time series projects. The end date for bulk predictions. This parameter should be provided in conjunction with predictions_start_date. Cannot be provided with the forecast_point parameter.

dataset_filename : string or None, optional

(New in version v2.14) File name to use for the dataset. Ignored for url and file path sources.

relax_known_in_advance_features_check : bool, optional

(New in version v2.15) For Time Series projects only. If True, missing values in the known in advance features are allowed in the forecast window at the prediction time. If omitted or False, missing values are not allowed.

Returns:
dataset : PredictionDataset

The newly uploaded dataset.

Raises:
InputNotUnderstoodError

Raised if sourcedata isn’t one of supported types.

AsyncFailureError

Raised if polling for the status of an async process resulted in a response with an unsupported status code.

AsyncProcessUnsuccessfulError

Raised if project creation was unsuccessful (i.e. the server reported an error in uploading the dataset).

AsyncTimeoutError

Raised if processing the uploaded dataset took more time than specified by the max_wait parameter.

ValueError

Raised if forecast_point or predictions_start_date and predictions_end_date are provided, but are not of the supported type.
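
A hedged sketch of uploading a prediction dataset and scoring it with an existing model; the file path and model choice are placeholders, and Model.request_predictions is assumed to be available as documented elsewhere in this reference.

dataset = project.upload_dataset('./data/to_predict.csv')
model = project.get_models()[0]
predict_job = model.request_predictions(dataset.id)
predictions = predict_job.get_result_when_complete()  # wait for and fetch the prediction results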

upload_dataset_from_data_source(data_source_id, username, password, max_wait=600, forecast_point=None, relax_known_in_advance_features_check=None)

Upload a new dataset from a data source to make predictions against

Parameters:
data_source_id : str

The identifier of the data source.

username : str

The username for database authentication.

password : str

The password for database authentication. The password is encrypted at server side and never saved / stored.

max_wait : int, optional

Optional, the maximum number of seconds to wait before giving up.

forecast_point : datetime.datetime or None, optional

(New in version v2.8) May only be specified for Time Series projects, otherwise the upload will be rejected. The time in the dataset relative to which predictions should be generated in a Time Series project. See the Time Series documentation for more information. If not provided, will default to using the latest forecast point in the dataset.

relax_known_in_advance_features_check : bool, optional

(New in version v2.15) For Time Series projects only. If True, missing values in the known in advance features are allowed in the forecast window at the prediction time. If omitted or False, missing values are not allowed.

Returns:
dataset : PredictionDataset

the newly uploaded dataset

get_blueprints()

List all blueprints recommended for a project.

Returns:
menu : list of Blueprint instances

All the blueprints recommended by DataRobot for a project

get_features()

List all features for this project

Returns:
list of Feature

all features for this project

get_modeling_features(batch_size=None)

List all modeling features for this project

Only available once the target and partitioning settings have been set. For more information on the distinction between input and modeling features, see the time series documentation.

Parameters:
batch_size : int, optional

The number of features to retrieve in a single API call. If specified, the client may make multiple calls to retrieve the full list of features. If not specified, an appropriate default will be chosen by the server.

Returns:
list of ModelingFeature

All modeling features in this project

get_featurelists()

List all featurelists created for this project

Returns:
list of Featurelist

all featurelists created for this project

get_associations(assoc_type, metric)

Get pairwise feature association statistics for a project’s informative features

New in version v2.17.

Parameters:
assoc_type : string

the type of association, must be either ‘association’ or ‘correlation’

metric : string

the specified association metric, must be one of ‘mutualInfo’, ‘cramersV’, ‘spearman’, ‘pearson’ or ‘tau’

Returns:
association_data : dict

this data has 2 keys: features and strengths

features : list

a list of dictionaries for each feature with the following keys:

  • alphabetic_sort_index : int
    a number representing the alphabetical order of this feature compared to the other features in this dataset
  • feature : string
    the name of the feature
  • importance_sort_index: int
    a number ranking the importance of this feature compared to the other features in this dataset
  • strength_sort_index: int
    a number ranking the strength of this feature compared to the other features in this dataset
strengths : list

a list of dictionaries for pairwise strength data with the following keys:

  • feature1 : string
    the name of the first feature
  • feature2 : string
    the name of the second feature
  • statistic : float
    feature association statistics for feature1 and feature2
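
A minimal sketch of inspecting pairwise association strengths; the argument values are taken from the lists above.

assoc = project.get_associations(assoc_type='association', metric='mutualInfo')
for pair in assoc['strengths']:
    print(pair['feature1'], pair['feature2'], pair['statistic'])
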
get_association_matrix_details(feature1, feature2)

Get a sample of the actual values used to measure the association between a pair of features

New in version v2.17.

Parameters:
feature1 : str

the name of the first feature of interest

feature2 : str

the name of the second feature of interest

Returns:
dict

this data has 3 keys: features, types and values

features : list

the name of feature1 and feature2

types : list

the type of feature1 and feature2, will be ‘C’ for categorical and ‘N’ for numeric.

values : list

a list of data triplets for pairwise plotting, e.g. {“values”: [[460.0, 428.5, 0.001], [1679.3, 259.0, 0.001], …]}. The first entry of each triplet is a value of feature1, the second is a value of feature2, and the third is the relative frequency of the pair of datapoints in the sample.
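
A companion sketch for pulling the plotting sample for one pair of features; ‘feature_a’ and ‘feature_b’ are placeholder feature names.

details = project.get_association_matrix_details('feature_a', 'feature_b')
print(details['features'], details['types'])
for value1, value2, relative_frequency in details['values']:
    print(value1, value2, relative_frequency)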

get_modeling_featurelists(batch_size=None)

List all modeling featurelists created for this project

Modeling featurelists can only be created after the target and partitioning options have been set for a project. In time series projects, these are the featurelists that can be used for modeling; in other projects, they behave the same as regular featurelists.

See the time series documentation for more information.

Parameters:
batch_size : int, optional

The number of featurelists to retrieve in a single API call. If specified, the client may make multiple calls to retrieve the full list of featurelists. If not specified, an appropriate default will be chosen by the server.

Returns:
list of ModelingFeaturelist

all modeling featurelists in this project

create_type_transform_feature(name, parent_name, variable_type, replacement=None, date_extraction=None, max_wait=600)

Create a new feature by transforming the type of an existing feature in the project

Note that only the following transformations are supported:

  1. Text to categorical or numeric
  2. Categorical to text or numeric
  3. Numeric to categorical
  4. Date to categorical or numeric

Note

Special considerations when casting numeric to categorical

There are two parameters which can be used for variableType to convert numeric data to categorical levels. These differ in the assumptions they make about the input data, and are very important when considering the data that will be used to make predictions. The assumptions that each makes are:

  • categorical : The data in the column is all integral, and there are no missing values. If either of these conditions does not hold in the training set, the transformation will be rejected. During predictions, if any of the values in the parent column are missing, the predictions will error.
  • categoricalInt : (New in v2.6) All of the data in the column should be considered categorical in its string form when cast to an int by truncation. For example, the value 3 will be cast as the string 3 and the value 3.14 will also be cast as the string 3. Further, the value -3.6 will become the string -3. Missing values will still be recognized as missing.

For convenience these are represented in the enum VARIABLE_TYPE_TRANSFORM with the names CATEGORICAL and CATEGORICAL_INT

Parameters:
name : str

The name to give to the new feature

parent_name : str

The name of the feature to transform

variable_type : str

The type the new column should have. See the values within datarobot.enums.VARIABLE_TYPE_TRANSFORM

replacement : str or float, optional

The value that missing or unconvertible data should have

date_extraction : str, optional

Must be specified when parent_name is a date column (and left None otherwise). Specifies which value from a date should be extracted. See the list of values in datarobot.enums.DATE_EXTRACTION

max_wait : int, optional

The maximum amount of time to wait for DataRobot to finish processing the new column. This process can take more time with more data to process. If this operation times out, an AsyncTimeoutError will occur; DataRobot will continue processing, and the new column may still be successfully constructed.

Returns:
Feature

The data of the new Feature

Raises:
AsyncFailureError

If any of the responses from the server are unexpected

AsyncProcessUnsuccessfulError

If the job being waited for has failed or has been cancelled

AsyncTimeoutError

If the resource did not resolve in time
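
A hedged sketch of a date-to-categorical transformation; the column names are placeholders, and the enum members are assumed to exist in your client version.

from datarobot.enums import VARIABLE_TYPE_TRANSFORM, DATE_EXTRACTION

new_feature = project.create_type_transform_feature(
    name='purchase_date (month)',
    parent_name='purchase_date',
    variable_type=VARIABLE_TYPE_TRANSFORM.CATEGORICAL,
    date_extraction=DATE_EXTRACTION.MONTH)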

create_featurelist(name, features)

Creates a new featurelist

Parameters:
name : str

The name to give to this new featurelist. Names must be unique, so an error will be returned from the server if this name has already been used in this project.

features : list of str

The names of the features. Each feature must exist in the project already.

Returns:
Featurelist

newly created featurelist

Raises:
DuplicateFeaturesError

Raised if features variable contains duplicate features

Examples

project = Project.get('5223deadbeefdeadbeef0101')
flists = project.get_featurelists()

# Create a new featurelist using a subset of features from an
# existing featurelist
flist = flists[0]
features = flist.features[::2]  # Half of the features

new_flist = project.create_featurelist(name='Feature Subset',
                                       features=features)
create_modeling_featurelist(name, features)

Create a new modeling featurelist

Modeling featurelists can only be created after the target and partitioning options have been set for a project. In time series projects, these are the featurelists that can be used for modeling; in other projects, they behave the same as regular featurelists.

See the time series documentation for more information.

Parameters:
name : str

the name of the modeling featurelist to create. Names must be unique within the project, or the server will return an error.

features : list of str

the names of the features to include in the modeling featurelist. Each feature must be a modeling feature.

Returns:
featurelist : ModelingFeaturelist

the newly created featurelist

Examples

project = Project.get('1234deadbeeffeeddead4321')
modeling_features = project.get_modeling_features()
selected_features = [feat.name for feat in modeling_features][:5]  # select first five
new_flist = project.create_modeling_featurelist('Model This', selected_features)
get_metrics(feature_name)

Get the metrics recommended for modeling on the given feature.

Parameters:
feature_name : str

The name of the feature to query regarding which metrics are recommended for modeling.

Returns:
names : list of str

The names of the recommended metrics.

get_status()

Query the server for project status.

Returns:
status : dict

Contains:

  • autopilot_done : a boolean.
  • stage : a short string indicating which stage the project is in.
  • stage_description : a description of what stage means.

Examples

{"autopilot_done": False,
 "stage": "modeling",
 "stage_description": "Ready for modeling"}
pause_autopilot()

Pause autopilot, which stops processing the next jobs in the queue.

Returns:
paused : boolean

Whether the command was acknowledged

unpause_autopilot()

Unpause autopilot, which restarts processing the next jobs in the queue.

Returns:
unpaused : boolean

Whether the command was acknowledged.

start_autopilot(featurelist_id)

Starts autopilot on provided featurelist.

Only one autopilot can be running at a time, so any autopilot already running on a different featurelist will be halted: modeling jobs already in the queue will not be affected, but the halted autopilot will not add new jobs to the queue.

Parameters:
featurelist_id : str

Identifier of featurelist that should be used for autopilot

Raises:
AppPlatformError

Raised if autopilot is currently running on or has already finished running on the provided featurelist. Also raised if project’s target was not selected.
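
A minimal sketch of starting autopilot on a custom featurelist; the feature names are placeholders.

featurelist = project.create_featurelist('Reduced features', ['feature_a', 'feature_b'])
project.start_autopilot(featurelist.id)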

train(trainable, sample_pct=None, featurelist_id=None, source_project_id=None, scoring_type=None, training_row_count=None, monotonic_increasing_featurelist_id=<object object>, monotonic_decreasing_featurelist_id=<object object>)

Submit a job to the queue to train a model.

Either sample_pct or training_row_count can be used to specify the amount of data to use, but not both. If neither is specified, a default of the maximum amount of data that can safely be used to train any blueprint without going into the validation data will be selected.

In smart-sampled projects, sample_pct and training_row_count are assumed to be in terms of rows of the minority class.

Note

If the project uses datetime partitioning, use train_datetime instead

Parameters:
trainable : str or Blueprint

For str, this is assumed to be a blueprint_id. If no source_project_id is provided, the project_id will be assumed to be the project that this instance represents.

Otherwise, for a Blueprint, it contains the blueprint_id and source_project_id that we want to use. featurelist_id will assume the default for this project if not provided, and sample_pct will default to using the maximum training value allowed for this project’s partition setup. source_project_id will be ignored if a Blueprint instance is used for this parameter

sample_pct : float, optional

The amount of data to use for training, as a percentage of the project dataset from 0 to 100.

featurelist_id : str, optional

The identifier of the featurelist to use. If not defined, the default for this project is used.

source_project_id : str, optional

Which project created this blueprint_id. If None, it defaults to looking in this project. Note that you must have read permissions in this project.

scoring_type : str, optional

Either SCORING_TYPE.validation or SCORING_TYPE.cross_validation. SCORING_TYPE.validation is available for every partitioning type, and indicates that the default model validation should be used for the project. If the project uses a form of cross-validation partitioning, SCORING_TYPE.cross_validation can also be used to indicate that all of the available training/validation combinations should be used to evaluate the model.

training_row_count : int, optional

The number of rows to use to train the requested model.

monotonic_increasing_featurelist_id : str, optional

(new in version 2.11) the id of the featurelist that defines the set of features with a monotonically increasing relationship to the target. Passing None disables increasing monotonicity constraint. Default (dr.enums.MONOTONICITY_FEATURELIST_DEFAULT) is the one specified by the blueprint.

monotonic_decreasing_featurelist_id : str, optional

(new in version 2.11) the id of the featurelist that defines the set of features with a monotonically decreasing relationship to the target. Passing None disables decreasing monotonicity constraint. Default (dr.enums.MONOTONICITY_FEATURELIST_DEFAULT) is the one specified by the blueprint.

Returns:
model_job_id : str

id of created job, can be used as parameter to ModelJob.get method or wait_for_async_model_creation function

Examples

Use a Blueprint instance:

blueprint = project.get_blueprints()[0]
model_job_id = project.train(blueprint, training_row_count=project.max_train_rows)

Use a blueprint_id, which is a string. In the first case, it is assumed that the blueprint was created by this project. If you are using a blueprint used by another project, you will need to pass the id of that other project as well.

blueprint_id = 'e1c7fc29ba2e612a72272324b8a842af'
project.train(blueprint_id, training_row_count=project.max_train_rows)

another_project.train(blueprint_id, source_project_id=project.id)

You can also easily use this interface to train a new model using the data from an existing model:

model = project.get_models()[0]
model_job_id = project.train(model.blueprint.id,
                             sample_pct=100)
train_datetime(blueprint_id, featurelist_id=None, training_row_count=None, training_duration=None, source_project_id=None)

Create a new model in a datetime partitioned project

If the project is not datetime partitioned, an error will occur.

Parameters:
blueprint_id : str

the blueprint to use to train the model

featurelist_id : str, optional

the featurelist to use to train the model. If not specified, the project default will be used.

training_row_count : int, optional

the number of rows of data that should be used to train the model. If specified, training_duration may not be specified.

training_duration : str, optional

a duration string specifying what time range the data used to train the model should span. If specified, training_row_count may not be specified.

source_project_id : str, optional

the id of the project this blueprint comes from, if not this project. If left unspecified, the blueprint must belong to this project.

Returns:
job : ModelJob

the created job to build the model
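
A hedged sketch of training in a datetime partitioned project; 'P3M' is assumed to be an acceptable duration string (three months).

blueprint = project.get_blueprints()[0]
model_job = project.train_datetime(blueprint.id, training_duration='P3M')
datetime_model = model_job.get_result_when_complete()  # the trained DatetimeModel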

blend(model_ids, blender_method)

Submit a job for creating blender model. Upon success, the new job will be added to the end of the queue.

Parameters:
model_ids : list of str

List of model ids that will be used to create blender. These models should have completed validation stage without errors, and can’t be blenders, DataRobot Prime or scaleout models.

blender_method : str

Chosen blend method, one from datarobot.enums.BLENDER_METHOD

Returns:
model_job : ModelJob

New ModelJob instance for the blender creation job in queue.

See also

datarobot.models.Project.check_blendable
to confirm if models can be blended
check_blendable(model_ids, blender_method)

Check if the specified models can be successfully blended

Parameters:
model_ids : list of str

List of model ids that will be used to create blender. These models should have completed validation stage without errors, and can’t be blenders, DataRobot Prime or scaleout models.

blender_method : str

Chosen blend method, one from datarobot.enums.BLENDER_METHOD

Returns:
EligibilityResult
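
A minimal sketch that checks blendability before submitting a blender job; BLENDER_METHOD.AVERAGE is assumed to be available in your client version.

from datarobot.enums import BLENDER_METHOD

model_ids = [m.id for m in project.get_models()[:3]]
eligibility = project.check_blendable(model_ids, BLENDER_METHOD.AVERAGE)
if eligibility.supported:
    blend_job = project.blend(model_ids, BLENDER_METHOD.AVERAGE)
else:
    print(eligibility.reason)
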
get_all_jobs(status=None)

Get a list of jobs

This will give Jobs representing any type of job, including modeling or predict jobs.

Parameters:
status : QUEUE_STATUS enum, optional

If called with QUEUE_STATUS.INPROGRESS, will return the jobs that are currently running.

If called with QUEUE_STATUS.QUEUE, will return the jobs that are waiting to be run.

If called with QUEUE_STATUS.ERROR, will return the jobs that have errored.

If no value is provided, will return all jobs currently running or waiting to be run.

Returns:
jobs : list

Each is an instance of Job

get_blenders()

Get a list of blender models.

Returns:
list of BlenderModel

list of all blender models in project.

get_frozen_models()

Get a list of frozen models

Returns:
list of FrozenModel

list of all frozen models in project.

get_model_jobs(status=None)

Get a list of modeling jobs

Parameters:
status : QUEUE_STATUS enum, optional

If called with QUEUE_STATUS.INPROGRESS, will return the modeling jobs that are currently running.

If called with QUEUE_STATUS.QUEUE, will return the modeling jobs that are waiting to be run.

If called with QUEUE_STATUS.ERROR, will return the modeling jobs that have errored.

If no value is provided, will return all modeling jobs currently running or waiting to be run.

Returns:
jobs : list

Each is an instance of ModelJob

get_predict_jobs(status=None)

Get a list of prediction jobs

Parameters:
status : QUEUE_STATUS enum, optional

If called with QUEUE_STATUS.INPROGRESS, will return the prediction jobs that are currently running.

If called with QUEUE_STATUS.QUEUE, will return the prediction jobs that are waiting to be run.

If called with QUEUE_STATUS.ERROR, will return the prediction jobs that have errored.

If called without a status, will return all prediction jobs currently running or waiting to be run.

Returns:
jobs : list

Each is an instance of PredictJob

wait_for_autopilot(check_interval=20.0, timeout=86400, verbosity=1)

Blocks until autopilot is finished. This will raise an exception if the autopilot mode is changed from AUTOPILOT_MODE.FULL_AUTO.

It makes API calls to sync the project state with the server and to look at which jobs are enqueued.

Parameters:
check_interval : float or int

The maximum time (in seconds) to wait between checks for whether autopilot is finished

timeout : float or int or None

After this long (in seconds), we give up. If None, never timeout.

verbosity:

This should be VERBOSITY_LEVEL.SILENT or VERBOSITY_LEVEL.VERBOSE. For VERBOSITY_LEVEL.SILENT, nothing will be displayed about progress. For VERBOSITY_LEVEL.VERBOSE, the number of jobs in progress or queued is shown. Note that new jobs are added to the queue along the way.

Raises:
AsyncTimeoutError

If autopilot does not finish in the amount of time specified

RuntimeError

If a condition is detected that indicates that autopilot will not complete on its own
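
A hedged sketch of blocking until autopilot finishes and then inspecting the leaderboard.

from datarobot.enums import VERBOSITY_LEVEL

project.wait_for_autopilot(check_interval=30.0, verbosity=VERBOSITY_LEVEL.SILENT)
models = project.get_models()  # leaderboard models, ordered by the project metric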

rename(project_name)

Update the name of the project.

Parameters:
project_name : str

The new name

unlock_holdout()

Unlock the holdout for this project.

This will cause subsequent queries of the models of this project to contain the metric values for the holdout set, if it exists.

Take care, as this cannot be undone. Remember that best practice is to select a model before analyzing the model performance on the holdout set

set_worker_count(worker_count)

Sets the number of workers allocated to this project.

Note that this value is limited to the number allowed by your account. Lowering the number will not stop currently running jobs, but will cause the queue to wait for the appropriate number of jobs to finish before attempting to run more jobs.

Parameters:
worker_count : int

The number of concurrent workers to request from the pool of workers. (New in version v2.14) Setting this to -1 will update the number of workers to the maximum available to your account.

get_leaderboard_url()

Make a leaderboard URL from the project id.

Returns:
url : str

Permanent static hyperlink to a project leaderboard.

open_leaderboard_browser()

Opens project leaderboard in web browser.

Note: If text-mode browsers are used, the calling process will block until the user exits the browser.

get_rating_table_models()

Get a list of models with a rating table

Returns:
list of RatingTableModel

list of all models with a rating table in project.

get_rating_tables()

Get a list of rating tables

Returns:
list of RatingTable

list of rating tables in project.

get_access_list()

Retrieve users who have access to this project and their access levels

New in version v2.15.

Returns:
list of SharingAccess
share(access_list)

Modify the ability of users to access this project

New in version v2.15.

Parameters:
access_list : list of SharingAccess

the modifications to make.

Raises:
datarobot.ClientError :

if you do not have permission to share this project, if the user you’re sharing with doesn’t exist, if the same user appears multiple times in the access_list, or if these changes would leave the project without an owner

Examples

Transfer access to the project from old_user@datarobot.com to new_user@datarobot.com

import datarobot as dr

new_access = dr.SharingAccess('new_user@datarobot.com',
                              dr.enums.SHARING_ROLE.OWNER, can_share=True)
access_list = [dr.SharingAccess('old_user@datarobot.com', None), new_access]

dr.Project.get('my-project-id').share(access_list)
batch_features_type_transform(parent_names, variable_type, prefix=None, suffix=None, max_wait=600)

Create new features by transforming the type of existing ones.

New in version v2.17.

Note

The following transformations are only supported in batch mode:

  1. Text to categorical or numeric
  2. Categorical to text or numeric
  3. Numeric to categorical

See create_type_transform_feature for special considerations when casting numeric to categorical. Date to categorical or numeric transformations are not currently supported for batch mode but can be performed individually using create_type_transform_feature.

Parameters:
parent_names : list

The list of variable names to be transformed.

variable_type : str

The type new columns should have. Can be one of ‘CATEGORICAL’, ‘CATEGORICAL_INT’, ‘NUMERIC’, and ‘TEXT’ - supported values can be found in datarobot.enums.VARIABLE_TYPE_TRANSFORM.

prefix : str, optional

Note

Either prefix, suffix, or both must be provided.

The string that will preface all feature names. At least one of prefix and suffix must be specified.

suffix : str, optional

Note

Either prefix, suffix, or both must be provided.

The string that will be appended at the end to all feature names. At least one of prefix and suffix must be specified.

max_wait : int, optional

The maximum amount of time to wait for DataRobot to finish processing the new columns. This process can take more time with more data to process. If this operation times out, an AsyncTimeoutError will occur; DataRobot will continue processing, and the new columns may still be successfully constructed.

Returns:
list of Features

all features for this project after transformation.

Raises:
TypeError:

If parent_names is not a list.

ValueError

If value of variable_type is not from datarobot.enums.VARIABLE_TYPE_TRANSFORM.

AsyncFailureError

If any of the responses from the server are unexpected.

AsyncProcessUnsuccessfulError

If the job being waited for has failed or has been cancelled.

AsyncTimeoutError

If the resource did not resolve in time.
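
A hedged sketch of a batch transformation; the column names and prefix are placeholders.

from datarobot.enums import VARIABLE_TYPE_TRANSFORM

new_features = project.batch_features_type_transform(
    parent_names=['column_a', 'column_b'],
    variable_type=VARIABLE_TYPE_TRANSFORM.CATEGORICAL_INT,
    prefix='cat_')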

class datarobot.helpers.eligibility_result.EligibilityResult(supported, reason='', context='')

Represents whether a particular operation is supported

For instance, a function to check whether a set of models can be blended can return an EligibilityResult specifying whether or not blending is supported and why it may not be supported.

Attributes:
supported : bool

whether the operation this result represents is supported

reason : str

why the operation is or is not supported

context : str

what operation isn’t supported

Feature Association

class datarobot.models.feature_association.FeatureAssociation(metric=None, assoc_type=None)

Feature association statistics for a project.

Attributes:
type : str

Either ‘association’ or ‘correlation’ the class of the pairwise stats

metric : str

the metric of either class of pairwise stats ‘spearman’, ‘pearson’, etc for correlation, ‘mutualInfo’, ‘cramersV’ for association

Feature Association Matrix Details

class datarobot.models.feature_association.FeatureAssociationMatrixDetails(feature1=None, feature2=None)

Plotting details for a pair of passed features present in the feature association matrix

Attributes:
feature1 : str

Feature name for the first feature of interest

feature2 : str

Feature name for the second feature of interest

Rating Table

class datarobot.models.RatingTable(id, rating_table_name, original_filename, project_id, parent_model_id, model_id=None, model_job_id=None, validation_job_id=None, validation_error=None)

Interface to modify and download rating tables.

Attributes:
id : str

The id of the rating table.

project_id : str

The id of the project this rating table belongs to.

rating_table_name : str

The name of the rating table.

original_filename : str

The name of the file used to create the rating table.

parent_model_id : str

The model id of the model the rating table was validated against.

model_id : str

The model id of the model that was created from the rating table. Can be None if a model has not been created from the rating table.

model_job_id : str

The id of the job to create a model from this rating table. Can be None if a model has not been created from the rating table.

validation_job_id : str

The id of the created job to validate the rating table. Can be None if the rating table has not been validated.

validation_error : str

Contains a description of any errors caused during validation.

classmethod get(project_id, rating_table_id)

Retrieve a single rating table

Parameters:
project_id : str

The ID of the project the rating table is associated with.

rating_table_id : str

The ID of the rating table

Returns:
rating_table : RatingTable

The queried instance

classmethod create(project_id, parent_model_id, filename, rating_table_name='Uploaded Rating Table')

Uploads and validates a new rating table CSV

Parameters:
project_id : str

id of the project the rating table belongs to

parent_model_id : str

id of the model against which this rating table should be validated

filename : str

The path of the CSV file containing the modified rating table.

rating_table_name : str, optional

A human friendly name for the new rating table. The string may be truncated and a suffix may be added to maintain unique names of all rating tables.

Returns:
job: Job

an instance of created async job

Raises:
InputNotUnderstoodError

Raised if filename isn’t one of supported types.

ClientError (400)

Raised if parent_model_id is invalid.
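
A hedged sketch of uploading a modified rating table and building a model from it; the ids and file path are placeholders, and the upload job's result is assumed to resolve to the validated RatingTable.

import datarobot as dr

upload_job = dr.RatingTable.create(project_id, parent_model_id,
                                   './modified_rating_table.csv',
                                   rating_table_name='Modified Rating Table')
rating_table = upload_job.get_result_when_complete()  # assumed to return the RatingTable
model_job = rating_table.create_model()  # submit a job to train a model from the table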

download(filepath)

Download a csv file containing the contents of this rating table

Parameters:
filepath : str

The path at which to save the rating table file.

rename(rating_table_name)

Renames a rating table to a different name.

Parameters:
rating_table_name : str

The new name to rename the rating table to.

create_model()

Creates a new model from this rating table record. This rating table must not already be associated with a model and must be valid.

Returns:
job: Job

an instance of created async job

Raises:
ClientError (422)

Raised if creating model from a RatingTable that failed validation

JobAlreadyRequested

Raised if creating model from a RatingTable that is already associated with a RatingTableModel

Reason Codes (Deprecated)

This interface is considered deprecated. Please use PredictionExplanations instead.

class datarobot.ReasonCodesInitialization(project_id, model_id, reason_codes_sample=None)

Represents a reason codes initialization of a model.

Attributes:
project_id : str

id of the project the model belongs to

model_id : str

id of the model reason codes initialization is for

reason_codes_sample : list of dict

a small sample of reason codes that could be generated for the model

classmethod get(project_id, model_id)

Retrieve the reason codes initialization for a model.

Reason codes initializations are a prerequisite for computing reason codes, and include a sample of what the computed reason codes for a prediction dataset would look like.

Parameters:
project_id : str

id of the project the model belongs to

model_id : str

id of the model reason codes initialization is for

Returns:
reason_codes_initialization : ReasonCodesInitialization

The queried instance.

Raises:
ClientError (404)

If the project or model does not exist or the initialization has not been computed.

classmethod create(project_id, model_id)

Create a reason codes initialization for the specified model.

Parameters:
project_id : str

id of the project the model belongs to

model_id : str

id of the model for which initialization is requested

Returns:
job : Job

an instance of created async job

delete()

Delete this reason codes initialization.

class datarobot.ReasonCodes(id, project_id, model_id, dataset_id, max_codes, num_columns, finish_time, reason_codes_location, threshold_low=None, threshold_high=None)

Represents reason codes metadata and provides access to computation results.

Examples

reason_codes = dr.ReasonCodes.get(project_id, reason_codes_id)
for row in reason_codes.get_rows():
    print(row)  # row is an instance of ReasonCodesRow
Attributes:
id : str

id of the record and reason codes computation result

project_id : str

id of the project the model belongs to

model_id : str

id of the model reason codes initialization is for

dataset_id : str

id of the prediction dataset reason codes were computed for

max_codes : int

maximum number of reason codes to supply per row of the dataset

threshold_low : float

the lower threshold, below which a prediction must score in order for reason codes to be computed for a row in the dataset

threshold_high : float

the high threshold, above which a prediction must score in order for reason codes to be computed for a row in the dataset

num_columns : int

the number of columns reason codes were computed for

finish_time : float

timestamp referencing when computation for these reason codes finished

reason_codes_location : str

where to retrieve the reason codes

classmethod get(project_id, reason_codes_id)

Retrieve a specific set of reason codes.

Parameters:
project_id : str

id of the project the model belongs to

reason_codes_id : str

id of the reason codes

Returns:
reason_codes : ReasonCodes

The queried instance.

classmethod create(project_id, model_id, dataset_id, max_codes=None, threshold_low=None, threshold_high=None)

Create reason codes for the specified dataset.

In order to create ReasonCodesPage for a particular model and dataset, you must first:

  • Compute feature impact for the model via datarobot.Model.get_feature_impact()
  • Compute a ReasonCodesInitialization for the model via datarobot.ReasonCodesInitialization.create(project_id, model_id)
  • Compute predictions for the model and dataset via datarobot.Model.request_predictions(dataset_id)

threshold_high and threshold_low are optional filters applied to speed up computation. When at least one is specified, only the selected outlier rows will have reason codes computed. Rows are considered to be outliers if their predicted value (in case of regression projects) or probability of being the positive class (in case of classification projects) is less than threshold_low or greater than threshold_high. If neither is specified, reason codes will be computed for all rows.

Parameters:
project_id : str

id of the project the model belongs to

model_id : str

id of the model for which reason codes are requested

dataset_id : str

id of the prediction dataset for which reason codes are requested

threshold_low : float, optional

the lower threshold, below which a prediction must score in order for reason codes to be computed for a row in the dataset. If neither threshold_high nor threshold_low is specified, reason codes will be computed for all rows.

threshold_high : float, optional

the upper threshold, above which a prediction must score in order for reason codes to be computed. If neither threshold_high nor threshold_low is specified, reason codes will be computed for all rows.

max_codes : int, optional

the maximum number of reason codes to supply per row of the dataset, default: 3.

Returns:
job : Job

an instance of the created async job

classmethod list(project_id, model_id=None, limit=None, offset=None)

List reason codes for the specified project.

Parameters:
project_id : str

id of the project to list reason codes for

model_id : str, optional

if specified, only reason codes computed for this model will be returned

limit : int or None

at most this many results are returned, default: no limit

offset : int or None

this many results will be skipped, default: 0

Returns:
reason_codes : list[ReasonCodes]
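
For example (a minimal sketch), all reason codes computed for a particular model can be listed and inspected:

import datarobot as dr

# List reason codes computed for a specific model
for rc in dr.ReasonCodes.list(project_id, model_id=model_id):
    print(rc.id, rc.dataset_id, rc.max_codes)
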
get_rows(batch_size=None, exclude_adjusted_predictions=True)

Retrieve reason codes rows.

Parameters:
batch_size : int

maximum number of reason codes rows to retrieve per request

exclude_adjusted_predictions : bool

Optional, defaults to True. Set to False to include adjusted predictions, which will differ from the predictions on some projects, e.g. those with an exposure column specified.

Yields:
reason_codes_row : ReasonCodesRow

Represents reason codes computed for a prediction row.

get_all_as_dataframe(exclude_adjusted_predictions=True)

Retrieve all reason codes rows and return them as a pandas.DataFrame.

The returned dataframe has the following structure:

  • row_id : row id from prediction dataset
  • prediction : the output of the model for this row
  • adjusted_prediction : adjusted prediction values (only appears for projects that utilize prediction adjustments, e.g. projects with an exposure column)
  • class_0_label : a class level from the target (only appears for classification projects)
  • class_0_probability : the probability that the target is this class (only appears for classification projects)
  • class_1_label : a class level from the target (only appears for classification projects)
  • class_1_probability : the probability that the target is this class (only appears for classification projects)
  • reason_0_feature : the name of the feature contributing to the prediction for this reason
  • reason_0_feature_value : the value the feature took on
  • reason_0_label : the output being driven by this reason. For regression projects, this is the name of the target feature. For classification projects, this is the class label whose probability increasing would correspond to a positive strength.
  • reason_0_qualitative_strength : a human-readable description of how strongly the feature affected the prediction (e.g. ‘+++’, ‘--’, ‘+’) for this reason
  • reason_0_strength : the amount this feature’s value affected the prediction
  • reason_N_feature : the name of the feature contributing to the prediction for this reason
  • reason_N_feature_value : the value the feature took on
  • reason_N_label : the output being driven by this reason. For regression projects, this is the name of the target feature. For classification projects, this is the class label whose probability increasing would correspond to a positive strength.
  • reason_N_qualitative_strength : a human-readable description of how strongly the feature affected the prediction (e.g. ‘+++’, ‘--’, ‘+’) for this reason
  • reason_N_strength : the amount this feature’s value affected the prediction
Parameters:
exclude_adjusted_predictions : bool

Optional, defaults to True. Set this to False to include adjusted prediction values in the returned dataframe.

Returns:
dataframe: pandas.DataFrame
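
A short usage sketch (assuming reason codes have already been computed and pandas is installed):

import datarobot as dr

reason_codes = dr.ReasonCodes.get(project_id, reason_codes_id)
df = reason_codes.get_all_as_dataframe()

# Inspect the strongest reason reported for each row
print(df[['row_id', 'prediction', 'reason_0_feature', 'reason_0_strength']].head())
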
download_to_csv(filename, encoding='utf-8', exclude_adjusted_predictions=True)

Save reason codes rows into a CSV file.

Parameters:
filename : str or file object

path or file object to save reason codes rows

encoding : string, optional

A string representing the encoding to use in the output file, defaults to ‘utf-8’

exclude_adjusted_predictions : bool

Optional, defaults to True. Set to False to include adjusted predictions, which will differ from the predictions on some projects, e.g. those with an exposure column specified.
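
For example (a minimal sketch):

import datarobot as dr

reason_codes = dr.ReasonCodes.get(project_id, reason_codes_id)

# Save all reason codes rows, including adjusted predictions, to a CSV file
reason_codes.download_to_csv('reason_codes.csv', exclude_adjusted_predictions=False)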

get_reason_codes_page(limit=None, offset=None, exclude_adjusted_predictions=True)

Get reason codes.

If you don’t want to use the generator interface, you can access paginated reason codes directly.

Parameters:
limit : int or None

the number of records to return, the server will use a (possibly finite) default if not specified

offset : int or None

the number of records to skip, default 0

exclude_adjusted_predictions : bool

Optional, defaults to True. Set to False to include adjusted predictions, which will differ from the predictions on some projects, e.g. those with an exposure column specified.

Returns:
reason_codes : ReasonCodesPage
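
For example (a minimal sketch; the page size is an arbitrary choice):

import datarobot as dr

reason_codes = dr.ReasonCodes.get(project_id, reason_codes_id)

# Fetch the first 100 reason codes rows without using the generator interface
page = reason_codes.get_reason_codes_page(limit=100, offset=0)
for record in page.data:
    print(record)
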
delete()

Delete these reason codes.

class datarobot.models.reason_codes.ReasonCodesRow(row_id, prediction, prediction_values, reason_codes=None, adjusted_prediction=None, adjusted_prediction_values=None)

Represents reason codes computed for a prediction row.

Notes

PredictionValue contains:

  • label : describes what this model output corresponds to. For regression projects, it is the name of the target feature. For classification projects, it is a level from the target feature.
  • value : the output of the prediction. For regression projects, it is the predicted value of the target. For classification projects, it is the predicted probability the row belongs to the class identified by the label.

ReasonCode contains:

  • label : describes what output was driven by this reason code. For regression projects, it is the name of the target feature. For classification projects, it is the class whose probability increasing would correspond to a positive strength of this reason code.
  • feature : the name of the feature contributing to the prediction
  • feature_value : the value the feature took on for this row
  • strength : the amount this feature’s value affected the prediction
  • qualitative_strength : a human-readable description of how strongly the feature affected the prediction (e.g. ‘+++’, ‘--’, ‘+’)
Attributes:
row_id : int

which row this ReasonCodesRow describes

prediction : float

the output of the model for this row

adjusted_prediction : float or None

adjusted prediction value for projects that provide this information, None otherwise

prediction_values : list

an array of dictionaries with a schema described as PredictionValue

adjusted_prediction_values : list

same as prediction_values but for adjusted predictions

reason_codes : list

an array of dictionaries with a schema described as ReasonCode
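
For illustration, the rows yielded by ReasonCodes.get_rows() can be unpacked like this (a minimal sketch; the dictionary keys are assumed to follow the snake_case schema described above):

import datarobot as dr

reason_codes = dr.ReasonCodes.get(project_id, reason_codes_id)
for row in reason_codes.get_rows():
    for code in row.reason_codes:
        # Keys assumed per the ReasonCode schema above
        print(row.row_id, code['feature'], code['feature_value'], code['strength'])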

class datarobot.models.reason_codes.ReasonCodesPage(id, count=None, previous=None, next=None, data=None, reason_codes_record_location=None, adjustment_method=None)

Represents a batch of reason codes received by one request.

Attributes:
id : str

id of the reason codes computation result

data : list[dict]

list of raw reason codes, each row corresponds to a row of the prediction dataset

count : int

total number of rows computed

previous_page : str

where to retrieve previous page of reason codes, None if current page is the first

next_page : str

where to retrieve next page of reason codes, None if current page is the last

reason_codes_record_location : str

where to retrieve the reason codes metadata

adjustment_method : str

Adjustment method that was applied to predictions, or ‘N/A’ if no adjustments were done.

classmethod get(project_id, reason_codes_id, limit=None, offset=0, exclude_adjusted_predictions=True)

Retrieve reason codes.

Parameters:
project_id : str

id of the project the model belongs to

reason_codes_id : str

id of the reason codes

limit : int or None

the number of records to return, the server will use a (possibly finite) default if not specified

offset : int or None

the number of records to skip, default 0

exclude_adjusted_predictions : bool

Optional, defaults to True. Set to False to include adjusted predictions, which will differ from the predictions on some projects, e.g. those with an exposure column specified.

Returns:
reason_codes : ReasonCodesPage

The queried instance.

ROC Curve

class datarobot.models.roc_curve.RocCurve(source, roc_points, negative_class_predictions, positive_class_predictions, source_model_id)

ROC curve data for a model.

Attributes:
source : str

ROC curve data source. Can be ‘validation’, ‘crossValidation’ or ‘holdout’.

roc_points : list of dict

List of precalculated metrics associated with thresholds for ROC curve.

negative_class_predictions : list of float

List of example predictions for the negative class

positive_class_predictions : list of float

List of example predictions for the positive class

source_model_id : str

ID of the model this ROC curve represents; in some cases, insights from the parent of a frozen model may be used

estimate_threshold(threshold)

Return metrics estimation for given threshold.

Parameters:
threshold : float from [0, 1] interval

Threshold we want an estimation for

Returns:
dict

Dictionary of estimated metrics in the form of {metric_name: metric_value}. Metrics are ‘accuracy’, ‘f1_score’, ‘false_negative_score’, ‘true_negative_score’, ‘true_negative_rate’, ‘matthews_correlation_coefficient’, ‘true_positive_score’, ‘positive_predictive_value’, ‘false_positive_score’, ‘false_positive_rate’, ‘negative_predictive_value’, ‘true_positive_rate’.

Raises:
ValueError

The given threshold isn’t in the [0, 1] interval

get_best_f1_threshold()

Return the value of the threshold that corresponds to the maximum F1 score. This is the threshold that will be preselected in DataRobot when you open the “ROC curve” tab.

Returns:
float

Threshold with the best F1 score.
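
A short usage sketch follows; it assumes the curve is retrieved via Model.get_roc_curve() with one of the sources listed above:

import datarobot as dr

model = dr.Model.get(project_id, model_id)
roc = model.get_roc_curve('validation')

# Estimate metrics at a manually chosen threshold
print(roc.estimate_threshold(0.5)['f1_score'])

# Threshold preselected on the ROC curve tab (maximizes F1)
print(roc.get_best_f1_threshold())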

SharingAccess

class datarobot.SharingAccess(username, role, can_share=None, user_id=None)

Represents metadata about whom an entity (e.g. a data store) has been shared with

New in version v2.14.

Currently DataStores, DataSources, Projects (new in version v2.15) and CalendarFiles (new in version 2.15) can be shared.

This class can represent either access that has already been granted, or be used to grant access to additional users.

Attributes:
username : str

a particular user

role : str or None

if a string, represents a particular level of access and should be one of datarobot.enums.SHARING_ROLE. For more information on the specific access levels, see the sharing documentation. If None, can be passed to a share function to revoke access for a specific user.

can_share : bool or None

if a bool, indicates whether this user is permitted to further share. When False, the user has access to the entity but can only revoke their own access; they cannot modify any other user’s access role. When True, the user can share with any other user at an access role up to their own. May be None if the SharingAccess was not retrieved from the DataRobot server but is intended to be passed into a share function; this is equivalent to passing True.

user_id : str

the id of the user
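
For example, a data store could be shared with an additional user (a minimal sketch; it assumes the entity’s share() method accepts a list of SharingAccess objects, as described in the sharing documentation, and that READ_ONLY is one of the roles in datarobot.enums.SHARING_ROLE):

import datarobot as dr

data_store = dr.DataStore.get(data_store_id)

# Grant access that cannot be shared further (READ_ONLY role assumed available)
access = dr.SharingAccess('jane.doe@example.com',
                          dr.enums.SHARING_ROLE.READ_ONLY,
                          can_share=False)
data_store.share([access])

# Passing role=None revokes that user's access
data_store.share([dr.SharingAccess('jane.doe@example.com', None)])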

Training Predictions

class datarobot.models.training_predictions.TrainingPredictionsIterator(client, path, limit=None)

Lazily fetches training predictions from the DataRobot API in chunks of the specified size and iterates over the rows of each response as named tuples. Each row represents a training prediction computed for a row of the dataset. Each named tuple has the structure described under Attributes below.

Notes

Each PredictionValue dict contains these keys:

label
describes what this model output corresponds to. For regression projects, it is the name of the target feature. For classification and multiclass projects, it is a label from the target feature.
value
the output of the prediction. For regression projects, it is the predicted value of the target. For classification and multiclass projects, it is the predicted probability that the row belongs to the class identified by the label.

Examples

import datarobot as dr

# Fetch existing training predictions by their id
training_predictions = dr.TrainingPredictions.get(project_id, prediction_id)

# Iterate over predictions
for row in training_predictions.iterate_rows():
    print(row.row_id, row.prediction)
Attributes:
row_id : int

id of the record in the original dataset for which the training prediction is calculated

partition_id : str or float

id of the data partition that the row belongs to

prediction : float

the model’s prediction for this data row

prediction_values : list of dictionaries

an array of dictionaries with a schema described as PredictionValue

timestamp : str or None

(New in version v2.11) an ISO string representing the time of the prediction in a time series project; may be None for non-time series projects

forecast_point : str or None

(New in version v2.11) an ISO string representing the point in time used as a basis to generate the predictions in a time series project; may be None for non-time series projects

forecast_distance : str or None

(New in version v2.11) how many time steps are between the forecast point and the timestamp in a time series project; None for non-time series projects

series_id : str or None

(New in version v2.11) the id of the series in a multiseries project; may be NaN for single series projects; None for non-time series projects

class datarobot.models.training_predictions.TrainingPredictions(project_id, prediction_id, model_id=None, data_subset=None)

Represents training predictions metadata and provides access to prediction results.

Examples

Compute training predictions for a model on the whole dataset

import datarobot as dr

# Request calculation of training predictions
training_predictions_job = model.request_training_predictions(dr.enums.DATA_SUBSET.ALL)
training_predictions = training_predictions_job.get_result_when_complete()
print('Training predictions {} are ready'.format(training_predictions.prediction_id))

# Iterate over actual predictions
for row in training_predictions.iterate_rows():
    print(row.row_id, row.partition_id, row.prediction)

List all training predictions for a project

import datarobot as dr

# Fetch all training predictions for a project
all_training_predictions = dr.TrainingPredictions.list(project_id)

# Inspect all calculated training predictions
for training_predictions in all_training_predictions:
    print(
        'Prediction {} is made for data subset "{}"'.format(
            training_predictions.prediction_id,
            training_predictions.data_subset,
        )
    )

Retrieve training predictions by id

import datarobot as dr

# Getting training predictions by id
training_predictions = dr