API Reference
Advanced Options
-
class
datarobot.helpers.
AdvancedOptions
(weights=None, response_cap=None, blueprint_threshold=None, seed=None, smart_downsampled=False, majority_downsampling_rate=None, offset=None, exposure=None, accuracy_optimized_mb=None, scaleout_modeling_mode=None, events_count=None, monotonic_increasing_featurelist_id=None, monotonic_decreasing_featurelist_id=None, only_include_monotonic_blueprints=None, allowed_pairwise_interaction_groups=None, blend_best_models=None, scoring_code_only=None, prepare_model_for_deployment=None, min_secondary_validation_model_count=None)¶ Used when setting the target of a project to set advanced options of modeling process.
Parameters: - weights : string, optional
The name of a column indicating the weight of each row
- response_cap : float in [0.5, 1), optional
Quantile of the response distribution to use for response capping.
- blueprint_threshold : int, optional
Number of hours models are permitted to run before being excluded from later autopilot stages. Minimum 1.
- seed : int
a seed to use for randomization
- smart_downsampled : bool
whether to use smart downsampling to throw away excess rows of the majority class. Only applicable to classification and zero-boosted regression projects.
- majority_downsampling_rate : float
the percentage between 0 and 100 of the majority rows that should be kept. Specify only if using smart downsampling. May not cause the majority class to become smaller than the minority class.
- offset : list of str, optional
(New in version v2.6) the list of the names of the columns containing the offset of each row
- exposure : string, optional
(New in version v2.6) the name of a column containing the exposure of each row
- accuracy_optimized_mb : bool, optional
(New in version v2.6) Include additional, longer-running models that will be run by the autopilot and available to run manually.
- scaleout_modeling_mode : string, optional
(New in version v2.8) Specifies the behavior of Scaleout models for the project. This is one of datarobot.enums.SCALEOUT_MODELING_MODE. If datarobot.enums.SCALEOUT_MODELING_MODE.DISABLED, no scaleout models will run during autopilot or show in the list of available blueprints. Scaleout models must be disabled for some partitioning settings, including projects using datetime partitioning or projects using offset or exposure columns. If datarobot.enums.SCALEOUT_MODELING_MODE.REPOSITORY_ONLY, scaleout models will be in the list of available blueprints but not run during autopilot. If datarobot.enums.SCALEOUT_MODELING_MODE.AUTOPILOT, scaleout models will run during autopilot and be in the list of available blueprints. Scaleout models are only supported in the Hadoop environment with the corresponding user permission set.
- events_count : string, optional
(New in version v2.8) the name of a column specifying events count.
- monotonic_increasing_featurelist_id : string, optional
(new in version 2.11) the id of the featurelist that defines the set of features with a monotonically increasing relationship to the target. If None, no such constraints are enforced. When specified, this will set a default for the project that can be overridden at model submission time if desired.
- monotonic_decreasing_featurelist_id : string, optional
(new in version 2.11) the id of the featurelist that defines the set of features with a monotonically decreasing relationship to the target. If None, no such constraints are enforced. When specified, this will set a default for the project that can be overridden at model submission time if desired.
- only_include_monotonic_blueprints : bool, optional
(new in version 2.11) when true, only blueprints that support enforcing monotonic constraints will be available in the project or selected for the autopilot.
- allowed_pairwise_interaction_groups : list of tuple, optional
(New in version v2.19) For GAM models - specify groups of columns for which pairwise interactions will be allowed. E.g. if set to [(A, B, C), (C, D)] then GAM models will allow interactions between columns AxB, BxC, AxC, CxD. All others (AxD, BxD) will not be considered.
- blend_best_models: bool, optional
(New in version v2.19) blend best models during Autopilot run
- scoring_code_only: bool, optional
(New in version v2.19) Keep only models that can be converted to scorable java code during Autopilot run.
- prepare_model_for_deployment: bool, optional
(New in version v2.19) Prepare model for deployment during Autopilot run. The preparation includes creating reduced feature list models, retraining best model on higher sample size, computing insights and assigning “RECOMMENDED FOR DEPLOYMENT” label.
- min_secondary_validation_model_count: int, optional
(New in version v2.19) Compute “All backtest” scores (datetime models) or cross validation scores for the specified number of highest ranking models on the Leaderboard, if over the Autopilot default.
Examples
import datarobot as dr

advanced_options = dr.AdvancedOptions(
    weights='weights_column',
    offset=['offset_column'],
    exposure='exposure_column',
    response_cap=0.7,
    blueprint_threshold=2,
    smart_downsampled=True,
    majority_downsampling_rate=75.0)
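The description of allowed_pairwise_interaction_groups above can be illustrated with a small sketch that expands groups into the pairs they permit. The helper name allowed_pairs is hypothetical and is not part of the datarobot package; it only mirrors the rule stated in the parameter description.

```python
from itertools import combinations


def allowed_pairs(groups):
    """Expand interaction groups into the set of allowed column pairs."""
    pairs = set()
    for group in groups:
        # Every unordered pair of distinct columns inside one group is allowed.
        for left, right in combinations(sorted(set(group)), 2):
            pairs.add((left, right))
    return pairs


groups = [("A", "B", "C"), ("C", "D")]
pairs = allowed_pairs(groups)
print(sorted(pairs))
```

Pairs that never share a group, such as A with D or B with D, are absent from the result, which matches the "will not be considered" wording above.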
Batch Predictions
-
class
datarobot.models.
BatchPredictionJob
(data, completed_resource_url=None)¶ A Batch Prediction Job is used to score large data sets on prediction servers using the Batch Prediction API.
Attributes: - id : str
the id of the job
-
classmethod
score
(deployment, intake_settings=None, output_settings=None, csv_settings=None, num_concurrent=None, passthrough_columns=None, passthrough_columns_set=None, max_explanations=None, threshold_high=None, threshold_low=None, prediction_warning_enabled=None, include_prediction_status=False, skip_drift_tracking=False, prediction_instance=None, abort_on_error=True, column_names_remapping=None, include_probabilities=True, include_probabilities_classes=None)¶ Create new batch prediction job, upload the scoring dataset and return a batch prediction job.
The default intake and output options are both local_file which requires the caller to pass the file parameter and either download the results using the download() method afterwards or pass a path to a file where the scored data will be downloaded to afterwards.
Returns: - BatchPredictionJob
Instance of BatchPredictionJob
Attributes: - deployment : Deployment or string ID
Deployment which will be used for scoring.
- intake_settings : dict (optional)
A dict configuring where data is coming from. Supported options:
- type : string, either local_file, s3, dataset or jdbc
To score from a local file, add this parameter to the settings:
- file : file-like object, string path to file or a pandas.DataFrame of scoring data
To score from S3, add the next parameters to the settings:
- url : string, the URL to score (e.g.: s3://bucket/key)
- credential_id : string (optional)
To score from JDBC, add the next parameters to the settings:
- data_store_id : string, the ID of the external data store connected to the JDBC data source (see Database Connectivity).
- query : string (optional if table and schema is specified), a self-supplied SELECT statement of the data set you wish to predict.
- table : string (optional if query is specified), the name of specified database table.
- schema : string (optional if query is specified), the name of specified database schema.
- fetch_size : int (optional), Changing the fetchSize can be used to balance throughput and memory usage.
- credential_id : string (optional) the ID of the credentials holding information about a user with read-access to the JDBC data source (see Credentials).
- output_settings : dict (optional)
A dict configuring how scored data is to be saved. Supported options:
- type : string, either local_file, s3 or jdbc
To save scored data to a local file, add this parameter to the settings:
- path : string (optional), path to save the scored data as CSV. If a path is not specified, you must download the scored data yourself with job.download(). If a path is specified, the call will block until the job is done. If there are no other jobs currently processing for the targeted prediction instance, uploading, scoring, and downloading will happen in parallel without waiting for a full job to complete. Otherwise, it will still block, but start downloading the scored data as soon as it starts generating data. This is the fastest method to get predictions.
To save scored data to S3, add the next parameters to the settings:
- url : string, the URL for storing the results (e.g.: s3://bucket/key)
- credential_id : string (optional)
To save scored data to JDBC, add the next parameters to the settings:
- data_store_id : string, the ID of the external data store connected to the JDBC data source (see Database Connectivity).
- table : string, the name of specified database table.
- schema : string (optional), the name of specified database schema.
- statement_type : string, the type of insertion statement to create, one of datarobot.enums.AVAILABLE_STATEMENT_TYPES.
- update_columns : list(string) (optional), a list of strings containing those column names to be updated in case statement_type is set to a value related to update or upsert.
- where_columns : list(string) (optional), a list of strings containing those column names to be selected in case statement_type is set to a value related to insert or update.
- credential_id : string, the ID of the credentials holding information about a user with write-access to the JDBC data source (see Credentials).
- csv_settings : dict (optional)
CSV intake and output settings. Supported options:
- delimiter : string (optional, default ,), fields are delimited by this character. Use the string tab to denote TSV (TAB separated values). Must be either a one-character string or the string tab.
- quotechar : string (optional, default "), fields containing the delimiter must be quoted using this character.
- encoding : string (optional, default utf-8), encoding for the CSV files. For example (but not limited to): shift_jis, latin_1 or mskanji.
- num_concurrent : int (optional)
Number of concurrent chunks to score simultaneously. Defaults to the available number of cores of the deployment. Lower it to leave resources for real-time scoring.
- passthrough_columns : list[string] (optional)
Keep these columns from the scoring dataset in the scored dataset. This is useful for correlating predictions with source data.
- passthrough_columns_set : string (optional)
To pass through every column from the scoring dataset, set this to all. Takes precedence over passthrough_columns if set.
- max_explanations : int (optional)
Compute prediction explanations for this number of features.
- threshold_high : float (optional)
Only compute prediction explanations for predictions above this threshold. Can be combined with threshold_low.
- threshold_low : float (optional)
Only compute prediction explanations for predictions below this threshold. Can be combined with threshold_high.
- prediction_warning_enabled : boolean (optional)
Add prediction warnings to the scored data. Currently only supported for regression models.
- include_prediction_status : boolean (optional)
Include the prediction_status column in the output, defaults to False.
- skip_drift_tracking : boolean (optional)
Skips drift tracking on any predictions made from this job. This is useful when running non-production workloads to not affect drift tracking and cause unnecessary alerts. Defaults to False.
- prediction_instance : dict (optional)
Defaults to instance specified by deployment or system configuration. Supported options:
- hostName : string
- sslEnabled : boolean (optional, default true). Set to false to run prediction requests from the batch prediction job without SSL.
- datarobotKey : string (optional), if running a job against a prediction instance in the Managed AI Cloud, you must provide the organization level DataRobot-Key
- apiKey : string (optional), by default, prediction requests will use the API key of the user that created the job. This allows you to make requests on behalf of other users.
- abort_on_error : boolean (optional)
Default behaviour is to abort the job if too many rows fail scoring. This will free up resources for other jobs that may score successfully. Set to false to unconditionally score every row no matter how many errors are encountered. Defaults to True.
- column_names_remapping : dict (optional)
Mapping with column renaming for output table. Defaults to {}.
- include_probabilities : boolean (optional)
Flag that enables returning of all probability columns. Defaults to True.
- include_probabilities_classes : list (optional)
List the subset of classes if a user doesn’t want all the classes. Defaults to [].
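As a rough sketch of how the settings dicts described above might be assembled, the helpers below build a JDBC intake_settings dict and a csv_settings dict using only the keys listed in this section. The function names, the example IDs, and the reading that at least one of query or table must be given are assumptions made for illustration; they are not part of the datarobot package.

```python
def jdbc_intake_settings(data_store_id, query=None, table=None, schema=None,
                         credential_id=None, fetch_size=None):
    """Build an intake_settings dict for JDBC scoring (hypothetical helper)."""
    # Assumption: at least one of query or table has to be supplied.
    if query is None and table is None:
        raise ValueError("provide either query or table")
    settings = {"type": "jdbc", "data_store_id": data_store_id}
    optional = {
        "query": query,
        "table": table,
        "schema": schema,
        "credential_id": credential_id,
        "fetch_size": fetch_size,
    }
    # Only include the optional keys that were actually supplied.
    settings.update({key: value for key, value in optional.items()
                     if value is not None})
    return settings


def csv_settings(delimiter=",", quotechar='"', encoding="utf-8"):
    """Build a csv_settings dict, checking the documented delimiter rule."""
    if delimiter != "tab" and len(delimiter) != 1:
        raise ValueError("delimiter must be one character or the string 'tab'")
    return {"delimiter": delimiter, "quotechar": quotechar, "encoding": encoding}


# Placeholder identifiers, not real resources.
intake = jdbc_intake_settings("my-data-store-id", table="scoring_rows",
                              schema="public")
output = {"type": "s3", "url": "s3://bucket/key"}
csv_opts = csv_settings(delimiter="tab")

# With a configured client, these dicts would be passed along the lines of:
# job = datarobot.models.BatchPredictionJob.score(
#     deployment, intake_settings=intake, output_settings=output,
#     csv_settings=csv_opts)
```

Building the dicts separately from the score call keeps the validation testable without a connection to a prediction server.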
-
classmethod
score_to_file
(deployment, intake_path, output_path, **kwargs)¶ Create new batch prediction job, upload the scoring dataset and download the scored CSV file concurrently.
Will block until the entire file is scored.
Refer to the score method for details on the other kwargs parameters.
Returns: - BatchPredictionJob
Instance of BatchPredictionJob
Attributes: - deployment : Deployment or string ID
Deployment which will be used for scoring.
- intake_path : file-like object/string path to file/pandas.DataFrame
Scoring data
- output_path : str
Filename to save the result under
-
classmethod
score_s3
(deployment, source_url, destination_url, credential=None, **kwargs)¶ Create new batch prediction job, with a scoring dataset from S3 and writing the result back to S3.
This returns immediately after the job has been created. You must poll for job completion using get_status() or wait_for_completion().
Refer to the score method for details on the other kwargs parameters.
Returns: - BatchPredictionJob
Instance of BatchPredictionJob
Attributes: - deployment : Deployment or string ID
Deployment which will be used for scoring.
- source_url : string
The URL for the prediction dataset (e.g.: s3://bucket/key)
- destination_url : string
The URL for the scored dataset (e.g.: s3://bucket/key)
- credential : string or Credential (optional)
The AWS Credential object or credential id
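Because score_s3 returns before the job finishes, callers need some form of polling. The following is a minimal sketch of such a loop written against a stand-in object rather than a real BatchPredictionJob. It assumes the dict returned by get_status() carries a "status" key and that COMPLETED and ABORTED are terminal values; the FakeJob class and the INITIALIZING and RUNNING values are hypothetical.

```python
import time

# Assumption: these values mark a finished job in the status dict.
TERMINAL_STATUSES = {"COMPLETED", "ABORTED"}


def poll_until_done(get_status, interval=1.0, timeout=60.0, sleep=time.sleep):
    """Call get_status() until it reports a terminal status or time runs out."""
    deadline = time.monotonic() + timeout
    while True:
        status = get_status()
        if status.get("status") in TERMINAL_STATUSES:
            return status
        if time.monotonic() >= deadline:
            raise TimeoutError("job did not finish in time")
        sleep(interval)


class FakeJob:
    """Stand-in for a job object; replays a fixed sequence of statuses."""

    def __init__(self, states):
        self._states = iter(states)

    def get_status(self):
        return {"status": next(self._states)}


job = FakeJob(["INITIALIZING", "RUNNING", "COMPLETED"])
final = poll_until_done(job.get_status, interval=0.0)
print(final["status"])
```

In practice wait_for_completion(), mentioned above, covers the same need; the loop is only meant to show what polling get_status() involves.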
-
classmethod
get
(batch_prediction_job_id)¶ Get batch prediction job
Returns: - BatchPredictionJob
Instance of BatchPredictionJob
Attributes: - batch_prediction_job_id: str
ID of batch prediction job
-
download
(fileobj)¶ Downloads the CSV result of a prediction job
Attributes: - fileobj: file-like object
Write CSV data to this file-like object
-
delete
()¶ Cancel this job. If this job has not finished running, it will be removed and canceled.
-
get_status
()¶ Get status of batch prediction job
Returns: - BatchPredictionJob status data
Dict with job status
-
classmethod
list_by_status
(statuses=None)¶ Get jobs collection for specific set of statuses
Returns: - BatchPredictionJob statuses
List of job status dicts with the specified statuses
Attributes: - statuses
List of statuses to filter jobs ([ABORTED|COMPLETED…]). If statuses is not provided, returns all jobs for the user.
Blueprint
-
class
datarobot.models.
Blueprint
(id=None, processes=None, model_type=None, project_id=None, blueprint_category=None, monotonic_increasing_featurelist_id=None, monotonic_decreasing_featurelist_id=None, supports_monotonic_constraints=None, recommended_featurelist_id=None)¶ A Blueprint which can be used to fit models
Attributes: - id : str
the id of the blueprint
- processes : list of str
the processes used by the blueprint
- model_type : str
the model produced by the blueprint
- project_id : str
the project the blueprint belongs to
- blueprint_category : str
(New in version v2.6) Describes the category of the blueprint and the kind of model it produces.
- recommended_featurelist_id: str or null
(New in v2.18) The ID of the feature list recommended for this blueprint. If this field is not present, then there is no recommended feature list.
-
classmethod
get
(project_id, blueprint_id)¶ Retrieve a blueprint.
Parameters: - project_id : str
The project’s id.
- blueprint_id : str
Id of blueprint to retrieve.
Returns: - blueprint : Blueprint
The queried blueprint.
-
get_chart
()¶ Retrieve a chart.
Returns: - BlueprintChart
The current blueprint chart.
-
get_documents
()¶ Get documentation for tasks used in the blueprint.
Returns: - list of BlueprintTaskDocument
All documents available for blueprint.
-
class
datarobot.models.
BlueprintTaskDocument
(title=None, task=None, description=None, parameters=None, links=None, references=None)¶ Document describing a task from a blueprint.
Attributes: - title : str
Title of document.
- task : str
Name of the task described in document.
- description : str
Task description.
- parameters : list of dict(name, type, description)
Parameters that task can receive in human-readable format.
- links : list of dict(name, url)
External links used in document
- references : list of dict(name, url)
References used in document. When no link available url equals None.
-
class
datarobot.models.
BlueprintChart
(nodes, edges)¶ A Blueprint chart that can be used to understand data flow in blueprint.
Attributes: - nodes : list of dict (id, label)
Chart nodes, id unique in chart.
- edges : list of tuple (id1, id2)
Directions of data flow between blueprint chart nodes.
-
classmethod
get
(project_id, blueprint_id)¶ Retrieve a blueprint chart.
Parameters: - project_id : str
The project’s id.
- blueprint_id : str
Id of blueprint to retrieve chart.
Returns: - BlueprintChart
The queried blueprint chart.
-
to_graphviz
()¶ Get blueprint chart in graphviz DOT format.
Returns: - unicode
String representation of chart in graphviz DOT language.
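To make the relationship between nodes, edges and the DOT output more concrete, here is a hedged sketch of one way such a chart could be rendered. It is not the library's implementation of to_graphviz; the function name chart_to_dot, the graph name and the sample node labels are assumptions for illustration.

```python
def chart_to_dot(nodes, edges):
    """Render nodes (dicts with id and label) and edges (id pairs) as DOT."""
    lines = ["digraph blueprint {"]
    for node in nodes:
        # Escape double quotes so labels stay valid DOT strings.
        label = str(node["label"]).replace('"', '\\"')
        lines.append('  "{}" [label="{}"];'.format(node["id"], label))
    for source, target in edges:
        # Each edge points in the direction of data flow.
        lines.append('  "{}" -> "{}";'.format(source, target))
    lines.append("}")
    return "\n".join(lines)


nodes = [{"id": "0", "label": "Data"}, {"id": "1", "label": "Model"}]
edges = [("0", "1")]
dot = chart_to_dot(nodes, edges)
print(dot)
```

The resulting string can be handed to any Graphviz tool that reads DOT.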
-
class
datarobot.models.
ModelBlueprintChart
(nodes, edges)¶ A Blueprint chart that can be used to understand data flow in a model. A model blueprint chart represents a reduced repository blueprint chart with only the elements that are used to build this particular model.
Attributes: - nodes : list of dict (id, label)
Chart nodes, id unique in chart.
- edges : list of tuple (id1, id2)
Directions of data flow between blueprint chart nodes.
-
classmethod
get
(project_id, model_id)¶ Retrieve a model blueprint chart.
Parameters: - project_id : str
The project’s id.
- model_id : str
Id of model to retrieve model blueprint chart.
Returns: - ModelBlueprintChart
The queried model blueprint chart.
-
to_graphviz
()¶ Get blueprint chart in graphviz DOT format.
Returns: - unicode
String representation of chart in graphviz DOT language.
Calendar File
-
class
datarobot.
CalendarFile
(calendar_end_date=None, calendar_start_date=None, created=None, id=None, name=None, num_event_types=None, num_events=None, project_ids=None, role=None, multiseries_id_columns=None)¶ Represents the data for a calendar file.
For more information about calendar files, see the calendar documentation.
Attributes: - id : str
The id of the calendar file.
- calendar_start_date : str
The earliest date in the calendar.
- calendar_end_date : str
The last date in the calendar.
- created : str
The date this calendar was created, i.e. uploaded to DR.
- name : str
The name of the calendar.
- num_event_types : int
The number of different event types.
- num_events : int
The number of events this calendar has.
- project_ids : list of strings
A list containing the projectIds of the projects using this calendar.
- multiseries_id_columns: list of str or None
A list of columns in calendar which uniquely identify events for different series. Currently, only one column is supported. If multiseries id columns are not provided, calendar is considered to be single series.
- role : str
The access role the user has for this calendar.
-
classmethod
create
(file_path, calendar_name=None, multiseries_id_columns=None)¶ Creates a calendar using the given file. For information about calendar files, see the calendar documentation
The provided file must be a CSV in the format:
Date, Event, Series ID
<date>, <event_type>, <series id>
<date>, <event_type>,
A header row is required, and the “Series ID” column is optional.
Once the CalendarFile has been created, pass its ID with the DatetimePartitioningSpecification when setting the target for a time series project in order to use it.
Parameters: - file_path : string
A string representing a path to a local csv file.
- calendar_name : string, optional
A name to assign to the calendar. Defaults to the name of the file if not provided.
- multiseries_id_columns : list of str or None
a list of the names of multiseries id columns to define which series an event belongs to. Currently only one multiseries id column is supported.
Returns: - calendar_file : CalendarFile
Instance with initialized data.
Raises: - AsyncProcessUnsuccessfulError
Raised if there was an error processing the provided calendar file.
Examples
# Creating a calendar with a specified name
cal = dr.CalendarFile.create('/home/calendars/somecalendar.csv',
                             calendar_name='Some Calendar Name')
cal.id
>>> 5c1d4904211c0a061bc93013
cal.name
>>> Some Calendar Name

# Creating a calendar without specifying a name
cal = dr.CalendarFile.create('/home/calendars/somecalendar.csv')
cal.id
>>> 5c1d4904211c0a061bc93012
cal.name
>>> somecalendar.csv

# Creating a calendar with multiseries id columns
cal = dr.CalendarFile.create('/home/calendars/somemultiseriescalendar.csv',
                             calendar_name='Some Multiseries Calendar Name',
                             multiseries_id_columns=['series_id'])
cal.id
>>> 5da9bb21962d746f97e4daee
cal.name
>>> Some Multiseries Calendar Name
cal.multiseries_id_columns
>>> ['series_id']
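The CSV layout that create expects can be produced with the standard csv module. The sketch below writes a header row followed by event rows, with the optional "Series ID" column. The helper name write_calendar_csv, the ISO date strings and the sample events are assumptions for illustration; the accepted date formats are not specified in this section.

```python
import csv
import io


def write_calendar_csv(fileobj, events, with_series=False):
    """Write events as a calendar CSV with a header row."""
    writer = csv.writer(fileobj, lineterminator="\n")
    header = ["Date", "Event"] + (["Series ID"] if with_series else [])
    writer.writerow(header)
    for event in events:
        row = [event["date"], event["event"]]
        if with_series:
            # An empty series id is written as a trailing empty field.
            row.append(event.get("series_id", ""))
        writer.writerow(row)


events = [
    {"date": "2020-01-01", "event": "New Year", "series_id": "store_1"},
    {"date": "2020-12-25", "event": "Christmas"},
]
buffer = io.StringIO()
write_calendar_csv(buffer, events, with_series=True)
print(buffer.getvalue())
```

Writing to a real file instead of a StringIO gives a path that could then be passed as file_path.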
-
classmethod
get
(calendar_id)¶ Gets the details of a calendar, given the id.
Parameters: - calendar_id : str
The identifier of the calendar.
Returns: - calendar_file : CalendarFile
The requested calendar.
Raises: - DataError
Raised if the calendar_id is invalid, i.e. the specified CalendarFile does not exist.
Examples
cal = dr.CalendarFile.get(some_calendar_id)
cal.id
>>> some_calendar_id
-
classmethod
list
(project_id=None, batch_size=None)¶ Gets the details of all calendars this user has view access for.
Parameters: - project_id : str, optional
If provided, will filter for calendars associated only with the specified project.
- batch_size : int, optional
The number of calendars to retrieve in a single API call. If specified, the client may make multiple calls to retrieve the full list of calendars. If not specified, an appropriate default will be chosen by the server.
Returns: - calendar_list : list of CalendarFile
A list of CalendarFile objects.
Examples
calendars = dr.CalendarFile.list()
len(calendars)
>>> 10
-
classmethod
delete
(calendar_id)¶ Deletes the calendar specified by calendar_id.
Parameters: - calendar_id : str
The id of the calendar to delete. The requester must have OWNER access for this calendar.
Raises: - ClientError
Raised if an invalid calendar_id is provided.
Examples
# Deleting with a valid calendar_id
status_code = dr.CalendarFile.delete(some_calendar_id)
status_code
>>> 204
dr.CalendarFile.get(some_calendar_id)
>>> ClientError: Item not found
-
classmethod
update_name
(calendar_id, new_calendar_name)¶ Changes the name of the specified calendar to the specified name. The requester must have at least READ_WRITE permissions on the calendar.
Parameters: - calendar_id : str
The id of the calendar to update.
- new_calendar_name : str
The new name to set for the specified calendar.
Returns: - status_code : int
200 for success
Raises: - ClientError
Raised if an invalid calendar_id is provided.
Examples
response = dr.CalendarFile.update_name(some_calendar_id, some_new_name)
response
>>> 200
cal = dr.CalendarFile.get(some_calendar_id)
cal.name
>>> some_new_name
-
classmethod
share
(calendar_id, access_list)¶ Shares the calendar with the specified users, assigning the specified roles.
Parameters: - calendar_id : str
The id of the calendar to update
- access_list:
A list of dr.SharingAccess objects. Specify None for the role to delete a user’s access from the specified CalendarFile. For more information on specific access levels, see the sharing documentation.
Returns: - status_code : int
200 for success
Raises: - ClientError
Raised if unable to update permissions for a user.
- AssertionError
Raised if access_list is invalid.
Examples
# assuming some_user is a valid user, share this calendar with some_user
sharing_list = [dr.SharingAccess(some_user_username, dr.enums.SHARING_ROLE.READ_WRITE)]
response = dr.CalendarFile.share(some_calendar_id, sharing_list)
response.status_code
>>> 200

# delete some_user from this calendar, assuming they have access of some kind already
delete_sharing_list = [dr.SharingAccess(some_user_username, None)]
response = dr.CalendarFile.share(some_calendar_id, delete_sharing_list)
response.status_code
>>> 200

# Attempt to add an invalid user to a calendar
invalid_sharing_list = [dr.SharingAccess(invalid_username, dr.enums.SHARING_ROLE.READ_WRITE)]
dr.CalendarFile.share(some_calendar_id, invalid_sharing_list)
>>> ClientError: Unable to update access for this calendar
-
classmethod
get_access_list
(calendar_id, batch_size=None)¶ Retrieve a list of users that have access to this calendar.
Parameters: - calendar_id : str
The id of the calendar to retrieve the access list for.
- batch_size : int, optional
The number of access records to retrieve in a single API call. If specified, the client may make multiple calls to retrieve the full list of access records. If not specified, an appropriate default will be chosen by the server.
Returns: - access_control_list : list of SharingAccess
A list of SharingAccess objects.
Raises: - ClientError
Raised if user does not have access to calendar or calendar does not exist.
Compliance Documentation Templates
-
class
datarobot.models.compliance_doc_template.
ComplianceDocTemplate
(id, creator_id, creator_username, name, org_id=None, sections=None)¶ A compliance documentation template. Templates are used to customize contents of ComplianceDocumentation.
New in version v2.14.
Notes
Each section dictionary has the following schema:
- title : title of the section
- type : type of section. Must be one of “datarobot”, “user” or “table_of_contents”.
Each type of section has a different set of attributes described below.
Sections of type "datarobot" represent a section owned by DataRobot. DataRobot sections have the following additional attributes:
- content_id : The identifier of the content in this section. You can get the default template with get_default for a complete list of possible DataRobot section content ids.
- sections : list of sub-section dicts nested under the parent section.
Sections of type "user" represent a section with user-defined content. Those sections may contain text generated by the user and have the following additional fields:
- regularText : regular text of the section, optionally separated by \n to split paragraphs.
- highlightedText : highlighted text of the section, optionally separated by \n to split paragraphs.
- sections : list of sub-section dicts nested under the parent section.
A section of type "table_of_contents" represents a table of contents and has no additional attributes.
Attributes: - id : str
the id of the template
- name : str
the name of the template.
- creator_id : str
the id of the user who created the template
- creator_username : str
username of the user who created the template
- org_id : str
the id of the organization the template belongs to
- sections : list of dicts
the sections of the template describing the structure of the document. Section schema is described in Notes section above.
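The section schema in the Notes can be checked mechanically before a template is created. The sketch below validates a list of section dicts against that schema. The validate_section helper, the rule that a "datarobot" section must carry a content_id, and the EXAMPLE_CONTENT_ID placeholder are assumptions for illustration; real content ids come from the default template.

```python
SECTION_TYPES = {"datarobot", "user", "table_of_contents"}


def validate_section(section):
    """Check one section dict, and its sub-sections, against the schema."""
    if "title" not in section:
        raise ValueError("section needs a title")
    kind = section.get("type")
    if kind not in SECTION_TYPES:
        raise ValueError("unknown section type: {!r}".format(kind))
    # Assumption: DataRobot-owned sections always name their content.
    if kind == "datarobot" and "content_id" not in section:
        raise ValueError("datarobot section needs a content_id")
    for child in section.get("sections", []):
        validate_section(child)
    return True


sections = [
    {"title": "Contents", "type": "table_of_contents"},
    {
        "title": "Overview",
        "type": "user",
        "regularText": "First paragraph.\nSecond paragraph.",
        "sections": [
            # EXAMPLE_CONTENT_ID is a placeholder, not a real content id.
            {"title": "Details", "type": "datarobot",
             "content_id": "EXAMPLE_CONTENT_ID"},
        ],
    },
]
all_valid = all(validate_section(section) for section in sections)
print(all_valid)
```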
-
classmethod
get_default
(template_type=None)¶ Get a default DataRobot template. This template is used for generating compliance documentation when no template is specified.
Parameters: - template_type : str or None
Type of the template. Currently supported values are “normal” and “time_series”
Returns: - template : ComplianceDocTemplate
the default template object with sections attribute populated with default sections.
-
classmethod
create_from_json_file
(name, path)¶ Create a template with the specified name and sections in a JSON file.
This is useful when working with sections in a JSON file. Example:
default_template = ComplianceDocTemplate.get_default()
default_template.sections_to_json_file('path/to/example.json')
# ... edit example.json in your editor
my_template = ComplianceDocTemplate.create_from_json_file(
    name='my template',
    path='path/to/example.json'
)
Parameters: - name : str
the name of the template. Must be unique for your user.
- path : str
the path to find the JSON file at
Returns: - template : ComplianceDocTemplate
the created template
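The "edit example.json" step in the workflow above can also be scripted. The following sketch appends a user section to a sections file and writes it back. It assumes the JSON file holds a top-level list of section dicts, which is not stated in this section; the append_user_section helper and the sample file contents are hypothetical.

```python
import json
import os
import tempfile


def append_user_section(path, title, text, indent=2):
    """Append a "user" section to the list of sections stored at path."""
    with open(path) as handle:
        sections = json.load(handle)
    sections.append({"title": title, "type": "user", "regularText": text})
    with open(path, "w") as handle:
        json.dump(sections, handle, indent=indent)
    return sections


# Stand-in for a file produced by sections_to_json_file.
path = os.path.join(tempfile.mkdtemp(), "example.json")
with open(path, "w") as handle:
    json.dump([{"title": "Contents", "type": "table_of_contents"}], handle)

updated = append_user_section(path, "Reviewer Notes",
                              "First paragraph.\nSecond paragraph.")
with open(path) as handle:
    reloaded = json.load(handle)
print(len(reloaded))
```

The edited file would then be the path argument to create_from_json_file.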
-
classmethod
create
(name, sections)¶ Create a template with the specified name and sections.
Parameters: - name : str
the name of the template. Must be unique for your user.
- sections : list
list of section objects
Returns: - template : ComplianceDocTemplate
the created template
-
classmethod
get
(template_id)¶ Retrieve a specific template.
Parameters: - template_id : str
the id of the template to retrieve
Returns: - template : ComplianceDocTemplate
the retrieved template
-
classmethod
list
(name_part=None, limit=None, offset=None)¶ Get a paginated list of compliance documentation template objects.
Parameters: - name_part : str or None
Return only the templates with names matching specified string. The matching is case-insensitive.
- limit : int
The number of records to return. The server will use a (possibly finite) default if not specified.
- offset : int
The number of records to skip.
Returns: - templates : list of ComplianceDocTemplate
the list of template objects
-
sections_to_json_file
(path, indent=2)¶ Save sections of the template to a json file at the specified path
Parameters: - path : str
the path to save the file to
- indent : int
indentation to use in the json file.
-
update
(name=None, sections=None)¶ Update the name or sections of an existing doc template.
Note that default or non-existent templates cannot be updated.
Parameters: - name : str, optional
the new name for the template
- sections : list of dicts
list of sections
-
delete
()¶ Delete the compliance documentation template.
Compliance Documentation
-
class
datarobot.models.compliance_documentation.
ComplianceDocumentation
(project_id, model_id, template_id=None)¶ A compliance documentation object.
New in version v2.14.
Examples
doc = ComplianceDocumentation('project-id', 'model-id')
job = doc.generate()
job.wait_for_completion()
doc.download('example.docx')
Attributes: - project_id : str
the id of the project
- model_id : str
the id of the model
- template_id : str or None
optional id of the template for the generated doc. See documentation for ComplianceDocTemplate for more info.
-
generate
()¶ Start a job generating model compliance documentation.
Returns: - Job
an instance of an async job
-
download
(filepath)¶ Download the generated compliance documentation file and save it to the specified path. The generated file has a DOCX format.
Parameters: - filepath : str
A file path, e.g. “/path/to/save/compliance_documentation.docx”
Confusion Chart
-
class
datarobot.models.confusion_chart.
ConfusionChart
(source, data, source_model_id)¶ Confusion Chart data for model.
Notes
ClassMetrics is a dict containing the following:
- class_name (string) name of the class
- actual_count (int) number of times this class is seen in the validation data
- predicted_count (int) number of times this class has been predicted for the validation data
- f1 (float) F1 score
- recall (float) recall score
- precision (float) precision score
- was_actual_percentages (list of dict) one vs all actual percentages in format specified below.
  - other_class_name (string) the name of the other class
  - percentage (float) the percentage of the times this class was predicted when it was actually class (from 0 to 1)
- was_predicted_percentages (list of dict) one vs all predicted percentages in format specified below.
  - other_class_name (string) the name of the other class
  - percentage (float) the percentage of the times this class was actual predicted (from 0 to 1)
- confusion_matrix_one_vs_all (list of list) 2d list representing 2x2 one vs all matrix. This represents the True/False Negative/Positive rates as integers for each class. The data structure looks like:
[ [ True Negative, False Positive ], [ False Negative, True Positive ] ]
Attributes: - source : str
Confusion Chart data source. Can be ‘validation’, ‘crossValidation’ or ‘holdout’.
- raw_data : dict
All of the raw data for the Confusion Chart
- confusion_matrix : list of list
The NxN confusion matrix
- classes : list
The names of each of the classes
- class_metrics : list of dicts
List of dicts with schema described as
ClassMetrics
above.
- source_model_id : str
ID of the model this Confusion chart represents; in some cases, insights from the parent of a frozen model may be used
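To make the ClassMetrics fields concrete, here is a minimal sketch (not the client's implementation) of how a 2x2 one vs all matrix and the precision, recall, and F1 values for one class could be derived from an NxN confusion matrix. It assumes rows are actual classes and columns are predicted classes, which this reference does not state; the function names `one_vs_all` and `class_scores` are hypothetical.

```python
def one_vs_all(confusion_matrix, index):
    """Collapse an NxN matrix to [[TN, FP], [FN, TP]] for one class.

    Assumption: confusion_matrix[actual][predicted] holds row counts.
    """
    total = sum(sum(row) for row in confusion_matrix)
    true_pos = confusion_matrix[index][index]
    false_neg = sum(confusion_matrix[index]) - true_pos
    false_pos = sum(row[index] for row in confusion_matrix) - true_pos
    true_neg = total - true_pos - false_neg - false_pos
    return [[true_neg, false_pos], [false_neg, true_pos]]


def class_scores(matrix_2x2):
    """Compute precision, recall and F1 from a one vs all matrix."""
    (_true_neg, false_pos), (false_neg, true_pos) = matrix_2x2
    predicted = true_pos + false_pos
    actual = true_pos + false_neg
    precision = true_pos / predicted if predicted else 0.0
    recall = true_pos / actual if actual else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return {"precision": precision, "recall": recall, "f1": f1}


# Made-up 3-class matrix used only for illustration.
matrix = [
    [5, 1, 0],
    [2, 6, 1],
    [0, 1, 4],
]
first_class = one_vs_all(matrix, 0)
scores = class_scores(first_class)
```

If the real matrix is oriented the other way (rows predicted, columns actual), the false positive and false negative cells swap, and so do precision and recall.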
Credentials¶
-
class
datarobot.models.
Credential
(credential_id=None, name=None, credential_type=None, creation_date=None, description=None)¶ -
classmethod
list
()¶ Returns list of available credentials.
Returns: - credentials : list of Credential instances
contains a list of available credentials.
Examples
>>> import datarobot as dr
>>> credentials = dr.Credential.list()
>>> credentials
[
    Credential('5e429d6ecf8a5f36c5693e03', 'my_s3_cred', 's3'),
    Credential('5e42cc4dcf8a5f3256865840', 'my_jdbc_cred', 'jdbc'),
]
-
classmethod
get
(credential_id)¶ Gets the Credential.
Parameters: - credential_id : str
the identifier of the credential.
Returns: - credential : Credential
the requested credential.
Examples
>>> import datarobot as dr
>>> cred = dr.Credential.get('5e429d6ecf8a5f36c5693e03')
>>> cred
Credential('5e429d6ecf8a5f36c5693e03', 'my_s3_cred', 's3')
-
delete
()¶ Deletes the Credential from the store.
Parameters: - credential_id : str
the identifier of the credential.
Returns: - credential : Credential
the requested credential.
Examples
>>> import datarobot as dr
>>> cred = dr.Credential.get('5a8ac9ab07a57a0001be501f')
>>> cred.delete()
-
classmethod
create_basic
(name, user, password, description=None)¶ Creates the credentials.
Parameters: - name : str
the name to use for this set of credentials.
- user : str
the username to store for this set of credentials.
- password : str
the password to store for this set of credentials.
- description : str
the description to use for this set of credentials.
Returns: - credential : Credential
the created credential.
Examples
>>> import datarobot as dr
>>> cred = dr.Credential.create_basic(
...     name='my_basic_cred',
...     user='username',
...     password='password',
... )
>>> cred
Credential('5e429d6ecf8a5f36c5693e03', 'my_basic_cred', 'basic')
-
classmethod
create_oauth
(name, token, refresh_token, description=None)¶ Creates the OAUTH credentials.
Parameters: - name : str
the name to use for this set of credentials.
- token : str
the OAUTH token.
- refresh_token : str
the OAUTH refresh token.
- description : str
the description to use for this set of credentials.
Returns: - credential : Credential
the created credential.
Examples
>>> import datarobot as dr
>>> cred = dr.Credential.create_oauth(
...     name='my_oauth_cred',
...     token='XXX',
...     refresh_token='YYY',
... )
>>> cred
Credential('5e429d6ecf8a5f36c5693e03', 'my_oauth_cred', 'oauth')
-
classmethod
create_s3
(name, aws_access_key_id=None, aws_secret_access_key=None, aws_session_token=None, description=None)¶ Creates the S3 credentials.
Parameters: - name : str
the name to use for this set of credentials.
- aws_access_key_id : str, optional
the AWS access key id.
- aws_secret_access_key : str, optional
the AWS secret access key.
- aws_session_token : str, optional
the AWS session token.
- description : str
the description to use for this set of credentials.
Returns: - credential : Credential
the created credential.
Examples
>>> import datarobot as dr
>>> cred = dr.Credential.create_s3(
...     name='my_s3_cred',
...     aws_access_key_id='XXX',
...     aws_secret_access_key='YYY',
...     aws_session_token='ZZZ',
... )
>>> cred
Credential('5e429d6ecf8a5f36c5693e03', 'my_s3_cred', 's3')
Database Connectivity¶
-
class
datarobot.
DataDriver
(id=None, creator=None, base_names=None, class_name=None, canonical_name=None)¶ A data driver
Attributes: - id : str
the id of the driver.
- class_name : str
the Java class name for the driver.
- canonical_name : str
the user-friendly name of the driver.
- creator : str
the id of the user who created the driver.
- base_names : list of str
a list of the file name(s) of the jar files.
-
classmethod
list
()¶ Returns list of available drivers.
Returns: - drivers : list of DataDriver instances
contains a list of available drivers.
Examples
>>> import datarobot as dr
>>> drivers = dr.DataDriver.list()
>>> drivers
[DataDriver('mysql'), DataDriver('RedShift'), DataDriver('PostgreSQL')]
-
classmethod
get
(driver_id)¶ Gets the driver.
Parameters: - driver_id : str
the identifier of the driver.
Returns: - driver : DataDriver
the required driver.
Examples
>>> import datarobot as dr
>>> driver = dr.DataDriver.get('5ad08a1889453d0001ea7c5c')
>>> driver
DataDriver('PostgreSQL')
-
classmethod
create
(class_name, canonical_name, files)¶ Creates the driver. Only available to admin users.
Parameters: - class_name : str
the Java class name for the driver.
- canonical_name : str
the user-friendly name of the driver.
- files : list of str
a list of local file system paths to the jar file(s) for the driver.
Returns: - driver : DataDriver
the created driver.
Raises: - ClientError
raised if the user has not been granted the “Can manage JDBC database drivers” feature
Examples
>>> import datarobot as dr
>>> driver = dr.DataDriver.create(
...     class_name='org.postgresql.Driver',
...     canonical_name='PostgreSQL',
...     files=['/tmp/postgresql-42.2.2.jar']
... )
>>> driver
DataDriver('PostgreSQL')
-
update
(class_name=None, canonical_name=None)¶ Updates the driver. Only available to admin users.
Parameters: - class_name : str
the Java class name for the driver.
- canonical_name : str
the user-friendly name of the driver.
Raises: - ClientError
raised if the user has not been granted the “Can manage JDBC database drivers” feature
Examples
>>> import datarobot as dr
>>> driver = dr.DataDriver.get('5ad08a1889453d0001ea7c5c')
>>> driver.canonical_name
'PostgreSQL'
>>> driver.update(canonical_name='postgres')
>>> driver.canonical_name
'postgres'
-
delete
()¶ Removes the driver. Only available to admin users.
Raises: - ClientError
raised if the user has not been granted the “Can manage JDBC database drivers” feature
-
class
datarobot.
DataStore
(data_store_id=None, data_store_type=None, canonical_name=None, creator=None, updated=None, params=None, role=None)¶ A data store. Represents a database.
Attributes: - id : str
the id of the data store.
- data_store_type : str
the type of data store.
- canonical_name : str
the user-friendly name of the data store.
- creator : str
the id of the user who created the data store.
- updated : datetime.datetime
the time of the last update
- params : DataStoreParameters
a list specifying data store parameters.
-
classmethod
list
()¶ Returns list of available data stores.
Returns: - data_stores : list of DataStore instances
contains a list of available data stores.
Examples
>>> import datarobot as dr
>>> data_stores = dr.DataStore.list()
>>> data_stores
[DataStore('Demo'), DataStore('Airlines')]
-
classmethod
get
(data_store_id)¶ Gets the data store.
Parameters: - data_store_id : str
the identifier of the data store.
Returns: - data_store : DataStore
the required data store.
Examples
>>> import datarobot as dr
>>> data_store = dr.DataStore.get('5a8ac90b07a57a0001be501e')
>>> data_store
DataStore('Demo')
-
classmethod
create
(data_store_type, canonical_name, driver_id, jdbc_url)¶ Creates the data store.
Parameters: - data_store_type : str
the type of data store.
- canonical_name : str
the user-friendly name of the data store.
- driver_id : str
the identifier of the DataDriver.
- jdbc_url : str
the full JDBC url, for example jdbc:postgresql://my.dbaddress.org:5432/my_db.
Returns: - data_store : DataStore
the created data store.
Examples
>>> import datarobot as dr
>>> data_store = dr.DataStore.create(
...     data_store_type='jdbc',
...     canonical_name='Demo DB',
...     driver_id='5a6af02eb15372000117c040',
...     jdbc_url='jdbc:postgresql://my.db.address.org:5432/perftest'
... )
>>> data_store
DataStore('Demo DB')
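The jdbc_url argument is an ordinary string, so it can be assembled from its parts before calling create. The helper below is a hypothetical sketch, not part of the client. It assumes the `jdbc:<subprotocol>://<host>:<port>/<database>` shape of the PostgreSQL example; other drivers may use a different URL layout.

```python
def build_jdbc_url(subprotocol, host, port, database):
    """Assemble a JDBC URL in the host:port/database style.

    Assumption: the driver accepts the same layout as the PostgreSQL example.
    """
    return "jdbc:{}://{}:{}/{}".format(subprotocol, host, port, database)


url = build_jdbc_url("postgresql", "my.dbaddress.org", 5432, "my_db")
```

The resulting string would then be passed as `jdbc_url=url`.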
-
update
(canonical_name=None, driver_id=None, jdbc_url=None)¶ Updates the data store.
Parameters: - canonical_name : str
optional, the user-friendly name of the data store.
- driver_id : str
optional, the identifier of the DataDriver.
- jdbc_url : str
optional, the full JDBC url, for example jdbc:postgresql://my.dbaddress.org:5432/my_db.
Examples
>>> import datarobot as dr
>>> data_store = dr.DataStore.get('5ad5d2afef5cd700014d3cae')
>>> data_store
DataStore('Demo DB')
>>> data_store.update(canonical_name='Demo DB updated')
>>> data_store
DataStore('Demo DB updated')
-
delete
()¶ Removes the DataStore
-
test
(username, password)¶ Tests database connection.
Parameters: - username : str
the username for database authentication.
- password : str
the password for database authentication. The password is encrypted on the server side and never saved or stored.
Returns: - message : dict
message with status.
Examples
>>> import datarobot as dr
>>> data_store = dr.DataStore.get('5ad5d2afef5cd700014d3cae')
>>> data_store.test(username='db_username', password='db_password')
{'message': 'Connection successful'}
-
schemas
(username, password)¶ Returns list of available schemas.
Parameters: - username : str
the username for database authentication.
- password : str
the password for database authentication. The password is encrypted on the server side and never saved or stored.
Returns: - response : dict
dict with database name and list of str - available schemas
Examples
>>> import datarobot as dr
>>> data_store = dr.DataStore.get('5ad5d2afef5cd700014d3cae')
>>> data_store.schemas(username='db_username', password='db_password')
{'catalog': 'perftest', 'schemas': ['demo', 'information_schema', 'public']}
-
tables
(username, password, schema=None)¶ Returns list of available tables in schema.
Parameters: - username : str
optional, the username for database authentication.
- password : str
optional, the password for database authentication. The password is encrypted on the server side and never saved or stored.
- schema : str
optional, the schema name.
Returns: - response : dict
dict with catalog name and tables info
Examples
>>> import datarobot as dr
>>> data_store = dr.DataStore.get('5ad5d2afef5cd700014d3cae')
>>> data_store.tables(username='db_username', password='db_password', schema='demo')
{'tables': [{'type': 'TABLE', 'name': 'diagnosis', 'schema': 'demo'},
            {'type': 'TABLE', 'name': 'kickcars', 'schema': 'demo'},
            {'type': 'TABLE', 'name': 'patient', 'schema': 'demo'},
            {'type': 'TABLE', 'name': 'transcript', 'schema': 'demo'}],
 'catalog': 'perftest'}
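The dict returned by tables can be reduced to a plain list of table names. The following sketch works on the response shown in the example for this method; `table_names` is a hypothetical helper, and it assumes every entry carries the `name` and `schema` keys seen there.

```python
def table_names(response, schema=None):
    """Return table names from a tables() response, optionally for one schema."""
    return [
        table["name"]
        for table in response["tables"]
        if schema is None or table["schema"] == schema
    ]


# Response copied from the tables() example for this method.
response = {
    "tables": [
        {"type": "TABLE", "name": "diagnosis", "schema": "demo"},
        {"type": "TABLE", "name": "kickcars", "schema": "demo"},
        {"type": "TABLE", "name": "patient", "schema": "demo"},
        {"type": "TABLE", "name": "transcript", "schema": "demo"},
    ],
    "catalog": "perftest",
}
names = table_names(response, schema="demo")
```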
-
classmethod
from_server_data
(data, keep_attrs=None)¶ Instantiate an object of this class using the data directly from the server, meaning that the keys may have the wrong camel casing
Parameters: - data : dict
The directly translated dict of JSON from the server. No casing fixes have taken place
- keep_attrs : list
List of the dotted namespace notations for attributes to keep within the object structure even if their values are None
-
get_access_list
()¶ Retrieve what users have access to this data store
New in version v2.14.
Returns: - list of SharingAccess
-
share
(access_list)¶ Modify the ability of users to access this data store
New in version v2.14.
Parameters: - access_list : list of
SharingAccess
the modifications to make.
Raises: - datarobot.ClientError :
if you do not have permission to share this data store, if the user you’re sharing with doesn’t exist, if the same user appears multiple times in the access_list, or if these changes would leave the data store without an owner.
Examples
Transfer access to the data store from old_user@datarobot.com to new_user@datarobot.com
import datarobot as dr
new_access = dr.SharingAccess('new_user@datarobot.com', dr.enums.SHARING_ROLE.OWNER, can_share=True)
access_list = [dr.SharingAccess('old_user@datarobot.com', None), new_access]
dr.DataStore.get('my-data-store-id').share(access_list)
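Two of the conditions listed under Raises for share (a user appearing more than once, and changes that leave no owner) can be checked locally before sending the request. The sketch below is illustrative only and is not how the server validates the list. It models each entry as a hypothetical `(username, role)` tuple rather than a SharingAccess object, treats a role of None as removing access, and assumes the owner role is the string "OWNER".

```python
from collections import Counter


def find_duplicate_users(access_list):
    """Return usernames that appear more than once in the access list."""
    counts = Counter(username for username, _role in access_list)
    return sorted(user for user, count in counts.items() if count > 1)


def leaves_an_owner(current_roles, access_list):
    """Apply the changes to a copy of the current roles and look for an owner.

    Assumption: a role of None removes the user's access entirely.
    """
    roles = dict(current_roles)
    for username, role in access_list:
        if role is None:
            roles.pop(username, None)
        else:
            roles[username] = role
    return any(role == "OWNER" for role in roles.values())


current = {"old_user@datarobot.com": "OWNER"}
transfer = [
    ("old_user@datarobot.com", None),
    ("new_user@datarobot.com", "OWNER"),
]
duplicates = find_duplicate_users(transfer)
still_owned = leaves_an_owner(current, transfer)
```

The server remains the authority; a local check like this only catches the obvious mistakes early.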
-
class
datarobot.
DataSource
(data_source_id=None, data_source_type=None, canonical_name=None, creator=None, updated=None, params=None, role=None)¶ A data source. Represents a data request.
Attributes: - data_source_id : str
the id of the data source.
- data_source_type : str
the type of data source.
- canonical_name : str
the user-friendly name of the data source.
- creator : str
the id of the user who created the data source.
- updated : datetime.datetime
the time of the last update.
- params : DataSourceParameters
a list specifying data source parameters.
-
classmethod
list
()¶ Returns list of available data sources.
Returns: - data_sources : list of DataSource instances
contains a list of available data sources.
Examples
>>> import datarobot as dr
>>> data_sources = dr.DataSource.list()
>>> data_sources
[DataSource('Diagnostics'), DataSource('Airlines 100mb'), DataSource('Airlines 10mb')]
-
classmethod
get
(data_source_id)¶ Gets the data source.
Parameters: - data_source_id : str
the identifier of the data source.
Returns: - data_source : DataSource
the requested data source.
Examples
>>> import datarobot as dr
>>> data_source = dr.DataSource.get('5a8ac9ab07a57a0001be501f')
>>> data_source
DataSource('Diagnostics')
-
classmethod
create
(data_source_type, canonical_name, params)¶ Creates the data source.
Parameters: - data_source_type : str
the type of data source.
- canonical_name : str
the user-friendly name of the data source.
- params : DataSourceParameters
a list specifying data source parameters.
Returns: - data_source : DataSource
the created data source.
Examples
>>> import datarobot as dr
>>> params = dr.DataSourceParameters(
...     data_store_id='5a8ac90b07a57a0001be501e',
...     query='SELECT * FROM airlines10mb WHERE "Year" >= 1995;'
... )
>>> data_source = dr.DataSource.create(
...     data_source_type='jdbc',
...     canonical_name='airlines stats after 1995',
...     params=params
... )
>>> data_source
DataSource('airlines stats after 1995')
-
update
(canonical_name=None, params=None)¶ Updates the data source.
Parameters: - canonical_name : str
optional, the user-friendly name of the data source.
- params : DataSourceParameters
optional, the new parameters of the data source.
Examples
>>> import datarobot as dr
>>> data_source = dr.DataSource.get('5ad840cc613b480001570953')
>>> data_source
DataSource('airlines stats after 1995')
>>> params = dr.DataSourceParameters(
...     query='SELECT * FROM airlines10mb WHERE "Year" >= 1990;'
... )
>>> data_source.update(
...     canonical_name='airlines stats after 1990',
...     params=params
... )
>>> data_source
DataSource('airlines stats after 1990')
-
delete
()¶ Removes the DataSource
-
classmethod
from_server_data
(data, keep_attrs=None)¶ Instantiate an object of this class using the data directly from the server, meaning that the keys may have the wrong camel casing
Parameters: - data : dict
The directly translated dict of JSON from the server. No casing fixes have taken place
- keep_attrs : list
List of the dotted namespace notations for attributes to keep within the object structure even if their values are None
-
get_access_list
()¶ Retrieve what users have access to this data source
New in version v2.14.
Returns: - list of SharingAccess
-
share
(access_list)¶ Modify the ability of users to access this data source
New in version v2.14.
Parameters: - access_list : list of
SharingAccess
the modifications to make.
Raises: - datarobot.ClientError :
if you do not have permission to share this data source, if the user you’re sharing with doesn’t exist, if the same user appears multiple times in the access_list, or if these changes would leave the data source without an owner.
Examples
Transfer access to the data source from old_user@datarobot.com to new_user@datarobot.com
import datarobot as dr
new_access = dr.SharingAccess('new_user@datarobot.com', dr.enums.SHARING_ROLE.OWNER, can_share=True)
access_list = [dr.SharingAccess('old_user@datarobot.com', None), new_access]
dr.DataSource.get('my-data-source-id').share(access_list)
-
class
datarobot.
DataSourceParameters
(data_store_id=None, table=None, schema=None, partition_column=None, query=None, fetch_size=None)¶ Data request configuration
Attributes: - data_store_id : str
the id of the DataStore.
- table : str
optional, the name of specified database table.
- schema : str
optional, the name of the schema associated with the table.
- partition_column : str
optional, the name of the partition column.
- query : str
optional, the user specified SQL query.
- fetch_size : int
optional, a user specified fetch size in the range [1, 20000]. By default a fetchSize will be assigned to balance throughput and memory usage
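Because fetch_size is only valid in the range [1, 20000], it can be worth rejecting bad values before building the parameters. This is a minimal sketch of such a check; `validate_fetch_size` is a hypothetical helper, and whether the client or the server performs its own validation is not stated in this reference.

```python
def validate_fetch_size(fetch_size):
    """Return fetch_size unchanged if it is None or an int in [1, 20000]."""
    if fetch_size is None:
        # None leaves the choice to the default described above.
        return None
    if isinstance(fetch_size, bool) or not isinstance(fetch_size, int):
        raise TypeError("fetch_size must be an int or None")
    if not 1 <= fetch_size <= 20000:
        raise ValueError("fetch_size must be in the range [1, 20000]")
    return fetch_size


accepted = validate_fetch_size(5000)
try:
    validate_fetch_size(0)
except ValueError as error:
    rejection = str(error)
```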
Datasets¶
-
class
datarobot.
Dataset
(dataset_id, version_id, name, categories, created_at, created_by, is_data_engine_eligible, is_latest_version, is_snapshot, processing_state, data_persisted=None, size=None, row_count=None)¶ Represents a Dataset returned from the api/v2/datasets/ endpoints.
Attributes: - id: string
The ID of this dataset
- name: string
The name of this dataset in the catalog
- is_latest_version: bool
Whether this dataset version is the latest version of this dataset
- version_id: string
The object ID of the catalog_version the dataset belongs to
- categories: list(string)
An array of strings describing the intended use of the dataset. The supported options are “TRAINING” and “PREDICTION”.
- created_at: string
The date when the dataset was created
- created_by: string
Username of the user who created the dataset
- is_snapshot: bool
Whether the dataset version is an immutable snapshot of data which has previously been retrieved and saved to DataRobot
- data_persisted: bool, optional
If true, user is allowed to view extended data profile (which includes data statistics like min/max/median/mean, histogram, etc.) and download data. If false, download is not allowed and only the data schema (feature names and types) will be available.
- is_data_engine_eligible: bool
Whether this dataset can be a data source of a data engine query.
- processing_state: string
Current ingestion process state of the dataset
- row_count: int, optional
The number of rows in the dataset.
- size: int, optional
The size of the dataset as a CSV in bytes.
-
classmethod
create_from_file
(file_path=None, filelike=None, categories=None)¶ A blocking call that creates a new Dataset from a file. Returns when the dataset has been successfully uploaded and processed.
Warning: This function does not clean up its open files. If you pass a filelike, you are responsible for closing it. If you pass a file_path, this will create a file object from the file_path but will not close it.
Parameters: - file_path: string, optional
The path to the file. This will create a file object pointing to that file but will not close it.
- filelike: file, optional
An open and readable file object.
- categories: list[string], optional
An array of strings describing the intended use of the dataset. The current supported options are “TRAINING” and “PREDICTION”.
Returns: - response: Dataset
A fully armed and operational Dataset
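Given the warning about open files, one way to guarantee cleanup is to open the file yourself in a `with` block and pass it as filelike. The sketch below shows the pattern with a stand-in uploader so that it runs without a server. `upload_and_close` and `fake_create_from_file` are hypothetical names, and opening the file in binary mode is an assumption.

```python
import os
import tempfile


def upload_and_close(create_from_file, path, categories=None):
    """Open path, hand the handle to the uploader, and always close it.

    create_from_file is any callable accepting filelike= and categories=,
    standing in here for Dataset.create_from_file.
    """
    with open(path, "rb") as handle:
        return create_from_file(filelike=handle, categories=categories)


# Stand-in uploader used only to demonstrate the helper without a server.
seen = {}


def fake_create_from_file(filelike=None, categories=None):
    seen["handle"] = filelike
    seen["size"] = len(filelike.read())
    return "uploaded"


with tempfile.NamedTemporaryFile("wb", suffix=".csv", delete=False) as tmp:
    tmp.write(b"a,b\n1,2\n")
result = upload_and_close(fake_create_from_file, tmp.name, ["TRAINING"])
os.remove(tmp.name)
```

With the real client, the first argument would be `Dataset.create_from_file`; the handle is closed when the `with` block exits, whether or not the upload raises.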
-
classmethod
create_from_in_memory_data
(data_frame=None, records=None, categories=None)¶ A blocking call that creates a new Dataset from in-memory data. Returns when the dataset has been successfully uploaded and processed.
The data can be either a pandas DataFrame or a list of dictionaries with identical keys.
Parameters: - data_frame: DataFrame, optional
The data frame to upload
- records: list[dict], optional
A list of dictionaries with identical keys to upload
- categories: list[string], optional
An array of strings describing the intended use of the dataset. The current supported options are “TRAINING” and “PREDICTION”.
Returns: - response: Dataset
The Dataset created from the uploaded data
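Since records must be dictionaries with identical keys, a quick local check can catch a malformed row before the upload starts. This is a sketch under that single stated requirement; `check_identical_keys` is a hypothetical helper and says nothing about how the client itself handles mismatched records.

```python
def check_identical_keys(records):
    """Return the shared key set, or raise if any record's keys differ."""
    if not records:
        return set()
    expected = set(records[0])
    for position, record in enumerate(records):
        if set(record) != expected:
            raise ValueError(
                "record {} has keys {}, expected {}".format(
                    position, sorted(record), sorted(expected)
                )
            )
    return expected


good = [{"id": 1, "score": 0.5}, {"id": 2, "score": 0.7}]
keys = check_identical_keys(good)

bad = [{"id": 1, "score": 0.5}, {"id": 2}]
try:
    check_identical_keys(bad)
    mismatch_found = False
except ValueError:
    mismatch_found = True
```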
-
classmethod
create_from_url
(url, do_snapshot=None, persist_data_after_ingestion=None, categories=None)¶ A blocking call that creates a new Dataset from data stored at a url. Returns when the dataset has been successfully uploaded and processed.
Parameters: - url: string
The URL to use as the source of data for the dataset being created.
- do_snapshot: bool, optional
If unset, uses the server default: True. If true, creates a snapshot dataset; if false, creates a remote dataset. Creating snapshots from non-file sources requires an additional permission, Enable Create Snapshot Data Source.
- persist_data_after_ingestion: bool, optional
If unset, uses the server default: True. If true, will enforce saving all data (for download and sampling) and will allow a user to view extended data profile (which includes data statistics like min/max/median/mean, histogram, etc.). If false, will not enforce saving data. The data schema (feature names and types) still will be available. Setting this parameter to false and do_snapshot to true will result in an error.
- categories: list[string], optional
An array of strings describing the intended use of the dataset. The current supported options are “TRAINING” and “PREDICTION”.
Returns: - response: Dataset
The Dataset created from the uploaded data
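The one invalid combination of do_snapshot and persist_data_after_ingestion can be expressed as a small pre-flight check. The sketch below is hypothetical and not part of the client. It resolves unset values to the documented server default of True, and it assumes that an unset do_snapshot counts as true for the purpose of the error, which this reference does not confirm.

```python
def check_url_flags(do_snapshot=None, persist_data_after_ingestion=None):
    """Resolve both flags to booleans and reject the documented bad pairing."""
    # Assumption: None means the server default, documented as True for both.
    snapshot = True if do_snapshot is None else do_snapshot
    persist = (
        True if persist_data_after_ingestion is None
        else persist_data_after_ingestion
    )
    if snapshot and not persist:
        raise ValueError(
            "persist_data_after_ingestion=False cannot be combined "
            "with do_snapshot=True"
        )
    return snapshot, persist


defaults = check_url_flags()
remote = check_url_flags(do_snapshot=False, persist_data_after_ingestion=False)
try:
    check_url_flags(do_snapshot=True, persist_data_after_ingestion=False)
    rejected = False
except ValueError:
    rejected = True
```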
-
classmethod
get
(dataset_id)¶ Get information about a dataset.
Parameters: - dataset_id : string
the id of the dataset
Returns: - dataset : Dataset
the queried dataset
-
classmethod
delete
(dataset_id)¶ Soft deletes a dataset. You cannot get it or list it or do actions with it, except for un-deleting it.
Parameters: - dataset_id: string
The id of the dataset to mark for deletion
Returns: - None
-
classmethod
un_delete
(dataset_id)¶ Un-deletes a previously deleted dataset. If the dataset was not deleted, nothing happens.
Parameters: - dataset_id: string
The id of the dataset to un-delete
Returns: - None
-
classmethod
list
(category=None, filter_failed=None, order_by=None)¶ List all datasets a user can view.
Parameters: - category: string, optional
If specified, only dataset versions that have the specified category will be included in the results. Categories identify the intended use of the dataset; supported categories are “TRAINING” and “PREDICTION”.
- filter_failed: bool, optional
If unset, uses the server default: False. Whether datasets that failed during import should be excluded from the results. If True invalid datasets will be excluded.
- order_by: string, optional
If unset, uses the server default: ‘-created’. Sorting order which will be applied to catalog list. Valid options are:
- “created” – ascending order by creation datetime;
- “-created” – descending order by creation datetime.
Returns: - datasets: list[Dataset]
a list of datasets the user can view
-
update
()¶ Updates the Dataset attributes in place with the latest information from the server.
Returns: - None
-
modify
(name=None, categories=None)¶ Modifies the Dataset name and/or categories. Updates the object in place.
Parameters: - name: string, optional
The new name of the dataset
- categories: list[string], optional
A list of strings describing the intended use of the dataset. The supported options are “TRAINING” and “PREDICTION”. If any categories were previously specified for the dataset, they will be overwritten.
Returns: - None
-
get_details
()¶ Gets the details for this Dataset
Returns: - DatasetDetails
-
get_all_features
(order_by=None)¶ Parameters: - order_by: string, optional
If unset, uses the server default: ‘name’. How the features should be ordered. Can be ‘name’ or ‘featureType’.
Returns: - feature_list: list[DatasetFeature]
-
get_featurelists
()¶ Get DatasetFeaturelists created on this Dataset
Returns: - feature_lists: list[DatasetFeaturelist]
-
create_featurelist
(name, features)¶ Create a new dataset featurelist
Parameters: - name : str
the name of the dataset featurelist to create. Names must be unique within the dataset, or the server will return an error.
- features : list of str
the names of the features to include in the dataset featurelist. Each feature must be a dataset feature.
Returns: - featurelist : DatasetFeaturelist
the newly created featurelist
Examples
dataset = Dataset.get('1234deadbeeffeeddead4321')
dataset_features = dataset.get_all_features()
selected_features = [feat.name for feat in dataset_features][:5]  # select first five
new_flist = dataset.create_featurelist('Simple Features', selected_features)
-
get_file
(file_path=None, filelike=None)¶ Retrieves all the originally uploaded data in CSV form. Writes it to either the file or a filelike object that can write bytes.
Only one of file_path or filelike can be provided and it must be provided as a keyword argument (i.e. file_path='path-to-write-to'). If a file-like object is provided, the user is responsible for closing it when they are done.
The user must also have permission to download data.
Parameters: - file_path: string, optional
The destination to write the file to.
- filelike: file, optional
A file-like object to write to. The object must be able to write bytes. The user is responsible for closing the object
Returns: - None
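The calling convention for get_file (exactly one destination, passed by keyword, with a file-like object left open for the caller) can be illustrated without a server. The function below is a hypothetical stand-in that writes bytes it is given; it is not the client's implementation, and treating "neither argument supplied" as an error is an assumption.

```python
import io


def write_csv_bytes(data, *, file_path=None, filelike=None):
    """Write data to exactly one destination, given as a keyword argument."""
    if (file_path is None) == (filelike is None):
        raise ValueError("provide exactly one of file_path or filelike")
    if file_path is not None:
        # A path we opened ourselves is also ours to close.
        with open(file_path, "wb") as handle:
            handle.write(data)
    else:
        # The caller owns the file-like object and closes it later.
        filelike.write(data)


buffer = io.BytesIO()
write_csv_bytes(b"a,b\n1,2\n", filelike=buffer)
contents = buffer.getvalue()
still_open = not buffer.closed

try:
    write_csv_bytes(b"a,b\n", file_path="out.csv", filelike=buffer)
    both_rejected = False
except ValueError:
    both_rejected = True
```

An in-memory `io.BytesIO` is convenient when the CSV should be parsed straight away rather than saved to disk.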
-
get_projects
()¶ Retrieves the Dataset’s projects as ProjectLocation named tuples.
Returns: - locations: list[ProjectLocation]
-
create_project
(project_name=None, user=None, password=None, credential_id=None, use_kerberos=None)¶ Create a
datarobot.models.Project
from this dataset
Parameters: - project_name: string, optional
The name of the project to be created. If not specified, will be “Untitled Project” for database connections, otherwise the project name will be based on the file used.
- user: string, optional
The username for database authentication.
- password: string, optional
The password (in cleartext) for database authentication. The password will be encrypted on the server side in scope of HTTP request and never saved or stored
- credential_id: string, optional
The ID of the set of credentials to use instead of user and password.
- use_kerberos: bool, optional
Server default is False. If true, use kerberos authentication for database authentication.
Returns: - Project
-
class
datarobot.
DatasetDetails
(dataset_id, version_id, categories, created_by, created_at, data_source_type, error, is_latest_version, is_snapshot, is_data_engine_eligible, last_modification_date, last_modifier_full_name, name, uri, data_persisted=None, data_engine_query_id=None, data_source_id=None, description=None, eda1_modification_date=None, eda1_modifier_full_name=None, feature_count=None, feature_count_by_type=None, processing_state=None, row_count=None, size=None, tags=None)¶ Represents a detailed view of a Dataset. The to_dataset method creates a Dataset from this details view.
Attributes: - dataset_id: string
The ID of this dataset
- name: string
The name of this dataset in the catalog
- is_latest_version: bool
Whether this dataset version is the latest version of this dataset
- version_id: string
The object ID of the catalog_version the dataset belongs to
- categories: list(string)
An array of strings describing the intended use of the dataset. The supported options are “TRAINING” and “PREDICTION”.
- created_at: string
The date when the dataset was created
- created_by: string
Username of the user who created the dataset
- is_snapshot: bool
Whether the dataset version is an immutable snapshot of data which has previously been retrieved and saved to DataRobot
- data_persisted: bool, optional
If true, user is allowed to view extended data profile (which includes data statistics like min/max/median/mean, histogram, etc.) and download data. If false, download is not allowed and only the data schema (feature names and types) will be available.
- is_data_engine_eligible: bool
Whether this dataset can be a data source of a data engine query.
- processing_state: string
Current ingestion process state of the dataset
- row_count: int, optional
The number of rows in the dataset.
- size: int, optional
The size of the dataset as a CSV in bytes.
- data_engine_query_id: string, optional
ID of the source data engine query
- data_source_id: string, optional
ID of the datasource used as the source of the dataset
- data_source_type: string
the type of the datasource that was used as the source of the dataset
- description: string, optional
the description of the dataset
- eda1_modification_date: string, optional
the ISO 8601 formatted date and time when the EDA1 for the dataset was updated
- eda1_modifier_full_name: string, optional
the user who was the last to update EDA1 for the dataset
- error: string
details of exception raised during ingestion process, if any
- feature_count: int, optional
total number of features in the dataset
- feature_count_by_type: list[FeatureTypeCount]
number of features in the dataset grouped by feature type
- last_modification_date: string
the ISO 8601 formatted date and time when the dataset was last modified
- last_modifier_full_name: string
full name of user who was the last to modify the dataset
- tags: list[string]
list of tags attached to the item
- uri: string
the uri to datasource like:
- ‘file_name.csv’
- ‘jdbc:DATA_SOURCE_GIVEN_NAME/SCHEMA.TABLE_NAME’
- ‘jdbc:DATA_SOURCE_GIVEN_NAME/<query>’ - for query based datasources
- ‘https://s3.amazonaws.com/datarobot_test/kickcars-sample-200.csv’
- etc.
-
classmethod
get
(dataset_id)¶ Get details for a Dataset from the server
Parameters: - dataset_id: str
The id for the Dataset from which to get details
Returns: - DatasetDetails
-
to_dataset
()¶ Build a Dataset object from the information in this object
Returns: - Dataset
Deployment¶
-
class
datarobot.
Deployment
(id=None, label=None, description=None, default_prediction_server=None, model=None, capabilities=None, prediction_usage=None, permissions=None, service_health=None, model_health=None, accuracy_health=None)¶ A deployment created from a DataRobot model.
Attributes: - id : str
the id of the deployment
- label : str
the label of the deployment
- description : str
the description of the deployment
- default_prediction_server : dict
information on the default prediction server of the deployment
- model : dict
information on the model of the deployment
- capabilities : dict
information on the capabilities of the deployment
- prediction_usage : dict
information on the prediction usage of the deployment
- permissions : list
(New in version v2.18) user’s permissions on the deployment
- service_health : dict
information on the service health of the deployment
- model_health : dict
information on the model health of the deployment
- accuracy_health : dict
information on the accuracy health of the deployment
-
classmethod
create_from_learning_model
(model_id, label, description=None, default_prediction_server_id=None)¶ Create a deployment from a DataRobot model.
New in version v2.17.
Parameters: - model_id : str
id of the DataRobot model to deploy
- label : str
a human readable label of the deployment
- description : str, optional
a human readable description of the deployment
- default_prediction_server_id : str
an identifier of a prediction server to be used as the default prediction server
Returns: - deployment : Deployment
The created deployment
Examples
from datarobot import Project, Deployment
project = Project.get('5506fcd38bd88f5953219da0')
model = project.get_models()[0]
deployment = Deployment.create_from_learning_model(model.id, 'New Deployment')
deployment
>>> Deployment('New Deployment')
-
classmethod
list
(order_by=None, search=None, filters=None)¶ List all deployments a user can view.
New in version v2.17.
Parameters: - order_by : str, optional
(New in version v2.18) the order to sort the deployment list by, defaults to label
Allowed attributes to sort by are:
label
serviceHealth
modelHealth
accuracyHealth
recentPredictions
lastPredictionTimestamp
If the sort attribute is preceded by a hyphen, deployments will be sorted in descending order, otherwise in ascending order.
For health related sorting, ascending means failing, warning, passing, unknown.
- search : str, optional
(New in version v2.18) case insensitive search against deployment’s label and description.
- filters : datarobot.models.deployment.DeploymentListFilters, optional
(New in version v2.20) an object containing all filters that you’d like to apply to the resulting list of deployments. See
DeploymentListFilters
for details on usage.
Returns: - deployments : list
a list of deployments the user can view
Examples
from datarobot import Deployment
deployments = Deployment.list()
deployments
>>> [Deployment('New Deployment'), Deployment('Previous Deployment')]

from datarobot import Deployment
from datarobot.models.deployment import DeploymentListFilters
from datarobot.enums import DEPLOYMENT_SERVICE_HEALTH_STATUS
filters = DeploymentListFilters(
    role='OWNER',
    service_health=[DEPLOYMENT_SERVICE_HEALTH_STATUS.FAILING]
)
filtered_deployments = Deployment.list(filters=filters)
filtered_deployments
>>> [Deployment('Deployment I Own w/ Failing Service Health')]
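The `order_by` convention described above can be sketched in plain Python. The helper names (`parse_order_by`, `HEALTH_RANK`, `sort_by_health`) are hypothetical and not part of the datarobot client; the sketch only mirrors the documented rules that a leading hyphen means descending order and that ascending health order is failing, warning, passing, unknown.

```python
# Hypothetical helpers illustrating the documented ``order_by`` semantics.
# The real sorting is done by the service when ``Deployment.list`` is called.

# Ascending order for health attributes, as documented above.
HEALTH_RANK = {"failing": 0, "warning": 1, "passing": 2, "unknown": 3}


def parse_order_by(order_by):
    """Split an ``order_by`` string into (attribute, descending)."""
    if order_by.startswith("-"):
        return order_by[1:], True
    return order_by, False


def sort_by_health(statuses, order_by):
    """Sort plain status strings the way a health ``order_by`` would."""
    _, descending = parse_order_by(order_by)
    return sorted(statuses, key=HEALTH_RANK.__getitem__, reverse=descending)


print(parse_order_by("-serviceHealth"))
print(sort_by_health(["passing", "unknown", "failing"], "serviceHealth"))
```

With the client itself, the corresponding request would be `Deployment.list(order_by='-serviceHealth')`.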
-
classmethod
get
(deployment_id)¶ Get information about a deployment.
New in version v2.17.
Parameters: - deployment_id : str
the id of the deployment
Returns: - deployment : Deployment
the queried deployment
Examples
from datarobot import Deployment
deployment = Deployment.get(deployment_id='5c939e08962d741e34f609f0')
deployment.id
>>>'5c939e08962d741e34f609f0'
deployment.label
>>>'New Deployment'
-
update
(label=None, description=None)¶ Update the label and description of this deployment.
New in version v2.19.
-
delete
()¶ Delete this deployment.
New in version v2.17.
-
replace_model
(new_model_id, reason)¶ Replace the model used in this deployment. To confirm model replacement eligibility, use
validate_replacement_model()
beforehand.
New in version v2.17.
Model replacement is an asynchronous process, which means some preparatory work may be performed after the initial request is completed. This function will not return until all preparatory work is fully finished.
Predictions made against this deployment will start using the new model as soon as the initial request is completed. There will be no interruption for predictions throughout the process.
Parameters: - new_model_id : str
The id of the new model to use
- reason : MODEL_REPLACEMENT_REASON
The reason for the model replacement. Must be one of ‘ACCURACY’, ‘DATA_DRIFT’, ‘ERRORS’, ‘SCHEDULED_REFRESH’, ‘SCORING_SPEED’, or ‘OTHER’. This value will be stored in the model history to keep track of why a model was replaced
Examples
from datarobot import Deployment
from datarobot.enums import MODEL_REPLACEMENT_REASON
deployment = Deployment.get(deployment_id='5c939e08962d741e34f609f0')
deployment.model['id'], deployment.model['type']
>>>('5c0a979859b00004ba52e431', 'Decision Tree Classifier (Gini)')
deployment.replace_model('5c0a969859b00004ba52e41b', MODEL_REPLACEMENT_REASON.ACCURACY)
deployment.model['id'], deployment.model['type']
>>>('5c0a969859b00004ba52e41b', 'Support Vector Classifier (Linear Kernel)')
-
validate_replacement_model
(new_model_id)¶ Validate a model can be used as the replacement model of the deployment.
New in version v2.17.
Parameters: - new_model_id : str
the id of the new model to validate
Returns: - status : str
status of the validation, will be one of ‘passing’, ‘warning’ or ‘failing’. If the status is passing or warning, use replace_model() to perform a model replacement. If the status is failing, refer to checks for more detail on why the new model cannot be used as a replacement.
- message : str
message for the validation result
- checks : dict
explain why the new model can or cannot replace the deployment’s current model
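The validate-then-replace flow described above can be sketched as follows. The function name `replace_if_valid` is hypothetical, and the sketch assumes the three documented return values (status, message, checks) arrive as a tuple.

```python
def replace_if_valid(deployment, new_model_id, reason):
    """Replace the deployment's model only when validation does not fail.

    ``deployment`` is expected to offer the documented
    ``validate_replacement_model`` and ``replace_model`` methods;
    ``reason`` is one of the documented MODEL_REPLACEMENT_REASON values.
    Returns True when the replacement was requested, False otherwise.
    """
    # Assumption: the three documented return values come back as a tuple.
    status, message, checks = deployment.validate_replacement_model(new_model_id)
    if status == "failing":
        # ``checks`` holds the detail on why the model cannot be used.
        print("Replacement rejected: {}".format(message))
        return False
    deployment.replace_model(new_model_id, reason)
    return True
```

A caller would pass a `Deployment` instance, for example `replace_if_valid(deployment, '5c0a969859b00004ba52e41b', MODEL_REPLACEMENT_REASON.ACCURACY)`.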
-
get_features
()¶ Retrieve the list of features needed to make predictions on this deployment.
Returns: - features : list
a list of feature dicts
Notes
Each feature dict has the following structure:
- name : str, feature name
- feature_type : str, feature type
- importance : float, numeric measure of the relationship strength between the feature and target (independent of model or other features)
- date_format : str or None, the date format string for how this feature was interpreted, None if not a date feature, compatible with https://docs.python.org/2/library/time.html#time.strftime
- known_in_advance : bool, whether the feature was selected as known in advance in a time-series model, False for non-time-series models.
Examples
from datarobot import Deployment
deployment = Deployment.get(deployment_id='5c939e08962d741e34f609f0')
features = deployment.get_features()
features[0]['feature_type']
>>>'Categorical'
features[0]['importance']
>>>0.133
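One use of the feature list is checking a prediction row before sending it. The sketch below is a minimal illustration; `missing_features` is a hypothetical helper and the sample data is made up, shaped like the feature dicts described in the Notes.

```python
def missing_features(features, row):
    """Return names of required features absent from a prediction row.

    ``features`` is a list of feature dicts as returned by ``get_features``;
    ``row`` is a dict mapping column names to values.
    """
    return [f["name"] for f in features if f["name"] not in row]


# Made-up sample data in the documented shape.
features = [
    {"name": "age", "feature_type": "Numeric", "importance": 0.133},
    {"name": "plan", "feature_type": "Categorical", "importance": 0.05},
]
print(missing_features(features, {"age": 42}))  # ['plan']
```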
-
submit_actuals
(data)¶ Submit actuals for processing. The actuals submitted will be used to calculate accuracy metrics.
Parameters: - data : list or pandas.DataFrame
If `data` is a list, each item should be a dict-like object with the following keys and values; if `data` is a pandas.DataFrame, it should contain the following columns:
- association_id : str, a unique identifier used with a prediction, max length 128 characters
- actual_value : str or int or float, the actual value of a prediction; should be numeric for deployments with regression models or a string for deployments with classification models
- was_acted_on : bool, optional, indicates if the prediction was acted on in a way that could have affected the actual outcome
- timestamp : datetime or string in RFC3339 format. If the datetime provided does not have a timezone, it is assumed to be UTC.
Raises: - ValueError
if input data is not a list of dict-like objects or a pandas.DataFrame, or if input data is empty
Examples
from datarobot import Deployment
deployment = Deployment.get(deployment_id='5c939e08962d741e34f609f0')
data = [{
    'association_id': '439917',
    'actual_value': 'True',
    'was_acted_on': True
}]
deployment.submit_actuals(data)
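A small builder can keep each item within the documented constraints before calling `submit_actuals`. This is a sketch only: `build_actual` and `MAX_ASSOCIATION_ID_LENGTH` are hypothetical names, and attaching UTC to naive datetimes simply makes the documented assumption explicit on the client side.

```python
from datetime import datetime, timezone

MAX_ASSOCIATION_ID_LENGTH = 128  # documented maximum length


def build_actual(association_id, actual_value, was_acted_on=None, timestamp=None):
    """Build one dict in the shape documented for ``submit_actuals``."""
    association_id = str(association_id)
    if len(association_id) > MAX_ASSOCIATION_ID_LENGTH:
        raise ValueError("association_id exceeds 128 characters")
    item = {"association_id": association_id, "actual_value": actual_value}
    if was_acted_on is not None:
        item["was_acted_on"] = was_acted_on
    if timestamp is not None:
        if timestamp.tzinfo is None:
            # Make the documented UTC assumption explicit.
            timestamp = timestamp.replace(tzinfo=timezone.utc)
        item["timestamp"] = timestamp.isoformat()
    return item


data = [build_actual(439917, "True", was_acted_on=True,
                     timestamp=datetime(2019, 8, 1, 12, 0))]
print(data)
```

The resulting list can then be passed as `deployment.submit_actuals(data)`.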
-
get_drift_tracking_settings
()¶ Retrieve drift tracking settings of this deployment.
New in version v2.17.
Returns: - settings : dict
Drift tracking settings of the deployment containing two nested dicts with keys target_drift and feature_drift, which are further described below.
Target drift setting contains:
- enabled : bool
If target drift tracking is enabled for this deployment. To create or update existing ‘target_drift’ settings, see update_drift_tracking_settings()
Feature drift setting contains:
- enabled : bool
If feature drift tracking is enabled for this deployment. To create or update existing ‘feature_drift’ settings, see update_drift_tracking_settings()
-
update_drift_tracking_settings
(target_drift_enabled=None, feature_drift_enabled=None, max_wait=600)¶ Update drift tracking settings of this deployment.
New in version v2.17.
Updating drift tracking setting is an asynchronous process, which means some preparatory work may be performed after the initial request is completed. This function will not return until all preparatory work is fully finished.
Parameters: - target_drift_enabled : bool, optional
if target drift tracking is to be turned on
- feature_drift_enabled : bool, optional
if feature drift tracking is to be turned on
- max_wait : int, optional
seconds to wait for successful resolution
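Reading the settings first avoids an unnecessary asynchronous update. The following is a minimal sketch; `enable_drift_tracking` is a hypothetical helper that relies only on the two documented methods and the documented shape of the settings dict.

```python
def enable_drift_tracking(deployment, max_wait=600):
    """Turn on target and feature drift tracking if either is off.

    Returns True if an update was sent, False if nothing needed changing.
    """
    settings = deployment.get_drift_tracking_settings()
    target_on = settings["target_drift"]["enabled"]
    feature_on = settings["feature_drift"]["enabled"]
    if target_on and feature_on:
        return False
    # Blocks until the asynchronous update resolves or max_wait elapses.
    deployment.update_drift_tracking_settings(
        target_drift_enabled=True,
        feature_drift_enabled=True,
        max_wait=max_wait,
    )
    return True
```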
-
get_association_id_settings
()¶ Retrieve association ID setting for this deployment.
New in version v2.19.
Returns: - association_id_settings : dict in the following format:
- column_names : list[string], optional
names of the columns to be used as association ID
- required_in_prediction_requests : bool, optional
whether the association ID column is required in prediction requests
-
update_association_id_settings
(column_names=None, required_in_prediction_requests=None, max_wait=600)¶ Update association ID setting for this deployment.
New in version v2.19.
Parameters: - column_names : list[string], optional
names of the columns to be used as association ID; currently only a list of one string is supported
- required_in_prediction_requests : bool, optional
whether the association ID column is required in prediction requests
- max_wait : int, optional
seconds to wait for successful resolution
-
get_prediction_warning_settings
()¶ Retrieve prediction warning settings of this deployment.
New in version v2.19.
Returns: - settings : dict in the following format:
- enabled : bool
If prediction warnings are enabled for this deployment. To create or update existing ‘prediction_warning’ settings, see
update_prediction_warning_settings()
- custom_boundaries : dict or None
If None, default boundaries for the model are used. Otherwise, has the following keys:
- upper : float
All predictions greater than provided value are considered anomalous
- lower : float
All predictions less than provided value are considered anomalous
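The boundary rule above can be expressed as a short check. `is_anomalous` is a hypothetical helper, not a client method; it returns None when `custom_boundaries` is None because the model's default boundaries are not part of the settings dict.

```python
def is_anomalous(prediction, custom_boundaries):
    """Apply documented ``custom_boundaries`` to a single prediction.

    Returns True or False when custom boundaries are set, and None when
    they are not (the model defaults are not available in this dict).
    """
    if custom_boundaries is None:
        return None
    return (prediction > custom_boundaries["upper"]
            or prediction < custom_boundaries["lower"])


boundaries = {"upper": 100.0, "lower": 0.0}
print(is_anomalous(120.0, boundaries))  # True
print(is_anomalous(50.0, boundaries))   # False
```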
-
update_prediction_warning_settings
(prediction_warning_enabled, use_default_boundaries=None, lower_boundary=None, upper_boundary=None, max_wait=600)¶ Update prediction warning settings of this deployment.
New in version v2.19.
Parameters: - prediction_warning_enabled : bool
If prediction warnings should be turned on.
- use_default_boundaries : bool, optional
If default boundaries of the model should be used for the deployment.
- upper_boundary : float, optional
All predictions greater than provided value will be considered anomalous
- lower_boundary : float, optional
All predictions less than provided value will be considered anomalous
- max_wait : int, optional
seconds to wait for successful resolution
-
get_prediction_intervals_settings
()¶ Retrieve prediction intervals settings for this deployment.
New in version v2.19.
Returns: - dict in the following format:
- enabled : bool
Whether prediction intervals are enabled for this deployment
- percentiles : list[int]
List of enabled prediction intervals sizes for this deployment. Currently we only support one percentile at a time.
Notes
Note that prediction intervals are only supported for time series deployments.
-
update_prediction_intervals_settings
(percentiles, enabled=True, max_wait=600)¶ Update prediction intervals settings for this deployment.
New in version v2.19.
Parameters: - percentiles : list[int]
The prediction intervals percentiles to enable for this deployment. Currently we only support setting one percentile at a time.
- enabled : bool, optional (defaults to True)
Whether to enable showing prediction intervals in the results of predictions requested using this deployment.
- max_wait : int, optional
seconds to wait for successful resolution
Raises: - AssertionError
If percentiles is in an invalid format
- AsyncFailureError
If any of the responses from the server are unexpected
- AsyncProcessUnsuccessfulError
If the prediction intervals calculation job has failed or has been cancelled.
- AsyncTimeoutError
If the prediction intervals calculation job did not resolve in time
Notes
Updating prediction intervals settings is an asynchronous process, which means some preparatory work may be performed before the settings request is completed. This function will not return until all work is fully finished.
Note that prediction intervals are only supported for time series deployments.
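Since only one percentile is supported at a time, a thin wrapper can make that explicit. This is a sketch under stated assumptions: `enable_prediction_interval` is a hypothetical name, and the 1 to 100 range check is an assumption of the sketch, not a documented limit.

```python
def enable_prediction_interval(deployment, percentile, max_wait=600):
    """Enable a single prediction interval and return the stored settings."""
    # Assumption of this sketch: a percentile is an int between 1 and 100.
    if not isinstance(percentile, int) or not 1 <= percentile <= 100:
        raise ValueError("percentile must be an int between 1 and 100")
    deployment.update_prediction_intervals_settings(
        percentiles=[percentile],  # only one percentile at a time
        enabled=True,
        max_wait=max_wait,
    )
    return deployment.get_prediction_intervals_settings()
```

The documented async errors (for example AsyncTimeoutError) would propagate to the caller unchanged.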
-
get_service_stats
(model_id=None, start_time=None, end_time=None, execution_time_quantile=None, response_time_quantile=None, slow_requests_threshold=None)¶ Retrieve value of service stat metrics over a certain time period.
New in version v2.18.
Parameters: - model_id : str, optional
the id of the model
- start_time : datetime, optional
start of the time period
- end_time : datetime, optional
end of the time period
- execution_time_quantile : float, optional
quantile for executionTime, defaults to 0.5
- response_time_quantile : float, optional
quantile for responseTime, defaults to 0.5
- slow_requests_threshold : float, optional
threshold for slowRequests, defaults to 1000
Returns: - service_stats : ServiceStats
the queried service stats metrics information
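A typical call queries a recent window with a higher quantile than the 0.5 default. The helper below is a hypothetical sketch; aligning the period to the top of the hour is an assumption made here, not something this reference states.

```python
from datetime import datetime, timedelta, timezone


def recent_service_stats(deployment, days=7, quantile=0.95, now=None):
    """Fetch service stats for the ``days`` preceding ``now``.

    ``deployment`` is expected to offer the documented ``get_service_stats``.
    """
    if now is None:
        now = datetime.now(timezone.utc)
    # Assumption of this sketch: align the period to the top of the hour.
    end_time = now.replace(minute=0, second=0, microsecond=0)
    start_time = end_time - timedelta(days=days)
    return deployment.get_service_stats(
        start_time=start_time,
        end_time=end_time,
        execution_time_quantile=quantile,
        response_time_quantile=quantile,
    )
```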
-
get_service_stats_over_time
(metric=None, model_id=None, start_time=None, end_time=None, bucket_size=None, quantile=None, threshold=None)¶ Retrieve information about how a service stat metric changes over a certain time period.
New in version v2.18.
Parameters: - metric : SERVICE_STAT_METRIC, optional
the service stat metric to retrieve
- model_id : str, optional
the id of the model
- start_time : datetime, optional
start of the time period
- end_time : datetime, optional
end of the time period
- bucket_size : str, optional
time duration of a bucket, in ISO 8601 time duration format
- quantile : float, optional
quantile for ‘executionTime’ or ‘responseTime’, ignored when querying other metrics
- threshold : int, optional
threshold for ‘slowRequests’, ignored when querying other metrics
Returns: - service_stats_over_time : ServiceStatsOverTime
the queried service stats metric over time information
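The `bucket_size` parameter takes an ISO 8601 duration string such as `P1D` or `PT1H`. The hypothetical helper below converts a `timedelta` into that notation; which bucket sizes the service accepts is not covered here.

```python
from datetime import timedelta


def to_iso8601_duration(delta):
    """Format a positive timedelta of whole seconds as an ISO 8601 duration."""
    total = int(delta.total_seconds())
    if total <= 0:
        raise ValueError("duration must be positive")
    days, rest = divmod(total, 86400)
    hours, rest = divmod(rest, 3600)
    minutes, seconds = divmod(rest, 60)
    date_part = "{}D".format(days) if days else ""
    time_part = "".join(
        "{}{}".format(value, unit)
        for value, unit in ((hours, "H"), (minutes, "M"), (seconds, "S"))
        if value
    )
    return "P" + date_part + ("T" + time_part if time_part else "")


print(to_iso8601_duration(timedelta(days=1)))   # P1D
print(to_iso8601_duration(timedelta(hours=6)))  # PT6H
```

The result can be passed directly, for example `deployment.get_service_stats_over_time(bucket_size=to_iso8601_duration(timedelta(days=1)))`.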
-
get_accuracy
(model_id=None, start_time=None, end_time=None, start=None, end=None)¶ Retrieve values of accuracy metrics over a certain time period.
New in version v2.18.
Parameters: - model_id : str
the id of the model
- start_time : datetime
start of the time period
- end_time : datetime
end of the time period
Returns: - accuracy : Accuracy
the queried accuracy metrics information
-
get_accuracy_over_time
(metric=None, model_id=None, start_time=None, end_time=None, bucket_size=None)¶ Retrieve information about how an accuracy metric changes over a certain time period.
New in version v2.18.
Parameters: - metric : ACCURACY_METRIC
the accuracy metric to retrieve
- model_id : str
the id of the model
- start_time : datetime
start of the time period
- end_time : datetime
end of the time period
- bucket_size : str
time duration of a bucket, in ISO 8601 time duration format
Returns: - accuracy_over_time : AccuracyOverTime
the queried accuracy metric over time information
-
class
datarobot.models.deployment.
DeploymentListFilters
(role=None, service_health=None, model_health=None, accuracy_health=None, execution_environment_type=None, materiality=None)¶ Construct a set of filters to pass to
Deployment.list()
New in version v2.20.
Parameters: - role : str
A user role. If specified, then take those deployments that the user can view, then filter them down to those that the user has the specified role for, and return only them. Allowed options are OWNER and USER.
- service_health : list of str
A list of service health status values. If specified, then only deployments whose service health status is one of these will be returned. See datarobot.enums.DEPLOYMENT_SERVICE_HEALTH_STATUS for allowed values. Supports comma-separated lists.
- model_health : list of str
A list of model health status values. If specified, then only deployments whose model health status is one of these will be returned. See datarobot.enums.DEPLOYMENT_MODEL_HEALTH_STATUS for allowed values. Supports comma-separated lists.
- accuracy_health : list of str
A list of accuracy health status values. If specified, then only deployments whose accuracy health status is one of these will be returned. See datarobot.enums.DEPLOYMENT_ACCURACY_HEALTH_STATUS for allowed values. Supports comma-separated lists.
- execution_environment_type : list of str
A list of strings representing the type of the deployments’ execution environment. If provided, then only return those deployments whose execution environment type is one of those provided. See datarobot.enums.DEPLOYMENT_EXECUTION_ENVIRONMENT_TYPE for allowed values. Supports comma-separated lists.
- materiality : list of str
A list of strings representing the deployments’ “materiality” (also known as “importance” in the UI). If provided, then only return those deployments whose materiality is one of those provided. See datarobot.enums.DEPLOYMENT_MATERIALITY for allowed values. Supports comma-separated lists. Note that Approval Workflows must be enabled for your account to use this filter, otherwise the API will return a 403.
Examples
Multiple filters can be combined in interesting ways to return very specific subsets of deployments.
Performing AND logic
Providing multiple different parameters will result in AND logic between them. For example, the following will return all deployments that I own whose service health status is failing.
from datarobot import Deployment
from datarobot.models.deployment import DeploymentListFilters
from datarobot.enums import DEPLOYMENT_SERVICE_HEALTH_STATUS
filters = DeploymentListFilters(
    role='OWNER',
    service_health=[DEPLOYMENT_SERVICE_HEALTH_STATUS.FAILING]
)
deployments = Deployment.list(filters=filters)
Performing OR logic
Some filters support comma-separated lists (and will say so if they do). Providing a comma-separated list of values to a single filter performs OR logic between those values. For example, the following will return all deployments whose service health is either warning OR failing.
from datarobot import Deployment
from datarobot.models.deployment import DeploymentListFilters
from datarobot.enums import DEPLOYMENT_SERVICE_HEALTH_STATUS
filters = DeploymentListFilters(
    service_health=[
        DEPLOYMENT_SERVICE_HEALTH_STATUS.WARNING,
        DEPLOYMENT_SERVICE_HEALTH_STATUS.FAILING,
    ]
)
deployments = Deployment.list(filters=filters)
Performing OR logic across different filter types is not supported.
Note
In all cases, you may only retrieve deployments for which you have at least the USER role. Deployments for which you are only a CONSUMER will not be returned, regardless of the filters applied.
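The AND/OR rules above can be modeled locally to reason about which deployments a filter set would select. This is a plain-Python sketch with made-up records; `matches` is a hypothetical helper, and the actual filtering happens in the service when `Deployment.list` is called.

```python
def matches(deployment, filters):
    """Return True when a deployment record passes every filter.

    ``deployment`` maps attribute names to single values; ``filters`` maps
    attribute names to lists of allowed values. Filters set to None are
    ignored, mirroring the None defaults of DeploymentListFilters.
    """
    for name, allowed in filters.items():
        if allowed is None:
            continue
        if deployment.get(name) not in allowed:  # OR within one filter
            return False                         # AND across filters
    return True


# Made-up records for illustration only.
records = [
    {"label": "A", "role": "OWNER", "service_health": "failing"},
    {"label": "B", "role": "OWNER", "service_health": "passing"},
    {"label": "C", "role": "USER", "service_health": "warning"},
]
filters = {"role": ["OWNER"], "service_health": ["warning", "failing"]}
print([r["label"] for r in records if matches(r, filters)])  # ['A']
```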
-
class
datarobot.models.
ServiceStats
(period=None, metrics=None, model_id=None)¶ Deployment service stats information.
Attributes: - model_id : str
the model used to retrieve service stats metrics
- period : dict
the time period used to retrieve service stats metrics
- metrics : dict
the service stats metrics
-
classmethod
get
(deployment_id, model_id=None, start_time=None, end_time=None, execution_time_quantile=None, response_time_quantile=None, slow_requests_threshold=None)¶ Retrieve value of service stat metrics over a certain time period.
New in version v2.18.
Parameters: - deployment_id : str
the id of the deployment
- model_id : str, optional
the id of the model
- start_time : datetime, optional
start of the time period
- end_time : datetime, optional
end of the time period
- execution_time_quantile : float, optional
quantile for executionTime, defaults to 0.5
- response_time_quantile : float, optional
quantile for responseTime, defaults to 0.5
- slow_requests_threshold : float, optional
threshold for slowRequests, defaults to 1000
Returns: - service_stats : ServiceStats
the queried service stats metrics
-
class
datarobot.models.
ServiceStatsOverTime
(buckets=None, summary=None, metric=None, model_id=None)¶ Deployment service stats over time information.
Attributes: - model_id : str
the model used to retrieve the service stat metric
- metric : str
the service stat metric being retrieved
- buckets : dict
how the service stat metric changes over time
- summary : dict
summary for the service stat metric
-
classmethod
get
(deployment_id, metric=None, model_id=None, start_time=None, end_time=None, bucket_size=None, quantile=None, threshold=None)¶ Retrieve information about how a service stat metric changes over a certain time period.
New in version v2.18.
Parameters: - deployment_id : str
the id of the deployment
- metric : SERVICE_STAT_METRIC, optional
the service stat metric to retrieve
- model_id : str, optional
the id of the model
- start_time : datetime, optional
start of the time period
- end_time : datetime, optional
end of the time period
- bucket_size : str, optional
time duration of a bucket, in ISO 8601 time duration format
- quantile : float, optional
quantile for ‘executionTime’ or ‘responseTime’, ignored when querying other metrics
- threshold : int, optional
threshold for ‘slowRequests’, ignored when querying other metrics
Returns: - service_stats_over_time : ServiceStatsOverTime
the queried service stat over time information
-
bucket_values
¶ The metric value for all time buckets, keyed by start time of the bucket.
Returns: - bucket_values: OrderedDict
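Because `bucket_values` is keyed by bucket start time, ordinary dict tools apply. The sketch below finds the busiest bucket in made-up data; `peak_bucket` is a hypothetical helper, and treating a None value as "no data for this bucket" is an assumption.

```python
from collections import OrderedDict
from datetime import datetime


def peak_bucket(bucket_values):
    """Return (start_time, value) of the bucket with the largest value.

    Buckets whose value is None (assumed to mean "no data") are skipped.
    Returns None when no bucket has a value.
    """
    populated = [(start, value) for start, value in bucket_values.items()
                 if value is not None]
    if not populated:
        return None
    return max(populated, key=lambda item: item[1])


# Made-up values shaped like the documented OrderedDict.
values = OrderedDict([
    (datetime(2019, 8, 1), 120),
    (datetime(2019, 8, 2), None),
    (datetime(2019, 8, 3), 340),
])
print(peak_bucket(values))
```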
-
class
datarobot.models.
Accuracy
(period=None, metrics=None, model_id=None)¶ Deployment accuracy information.
Attributes: - model_id : str
the model used to retrieve accuracy metrics
- period : dict
the time period used to retrieve accuracy metrics
- metrics : dict
the accuracy metrics
-
classmethod
get
(deployment_id, model_id=None, start_time=None, end_time=None)¶ Retrieve values of accuracy metrics over a certain time period.
New in version v2.18.
Parameters: - deployment_id : str
the id of the deployment
- model_id : str
the id of the model
- start_time : datetime
start of the time period
- end_time : datetime
end of the time period
Returns: - accuracy : Accuracy
the queried accuracy metrics information
Examples
from datarobot import Deployment
from datarobot.models import Accuracy
deployment = Deployment.get(deployment_id='5c939e08962d741e34f609f0')
accuracy = Accuracy.get(deployment.id)
accuracy.period['end']
>>>'2019-08-01 00:00:00+00:00'
accuracy.metrics['LogLoss']['value']
>>>0.7533
accuracy.metric_values['LogLoss']
>>>0.7533
-
metric_values
¶ The value for all metrics, keyed by metric name.
Returns: - metric_values: OrderedDict
-
metric_baselines
¶ The baseline value for all metrics, keyed by metric name.
Returns: - metric_baselines: OrderedDict
-
percent_changes
¶ The percent change of value over baseline for all metrics, keyed by metric name.
Returns: - percent_changes: OrderedDict
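The `percent_changes` mapping lends itself to simple alerting. The following sketch flags metrics that moved by more than a threshold in either direction, since whether an increase is good or bad depends on the metric. `drifted_metrics` is a hypothetical helper and the sample values are made up; None entries are assumed to mean that no change could be computed.

```python
def drifted_metrics(percent_changes, threshold):
    """Return metric names whose absolute percent change exceeds ``threshold``."""
    return [
        name
        for name, change in percent_changes.items()
        if change is not None and abs(change) > threshold
    ]


# Made-up values shaped like ``Accuracy.percent_changes``.
changes = {"LogLoss": 12.5, "AUC": -1.0, "RMSE": None}
print(drifted_metrics(changes, threshold=5))  # ['LogLoss']
```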
-
class
datarobot.models.
AccuracyOverTime
(buckets=None, summary=None, baseline=None, metric=None, model_id=None)¶ Deployment accuracy over time information.
Attributes: - model_id : str
the model used to retrieve accuracy metric
- metric : str
the accuracy metric being retrieved
- buckets : dict
how the accuracy metric changes over time
- summary : dict
summary for the accuracy metric
- baseline : dict
baseline for the accuracy metric
-
classmethod
get
(deployment_id, metric=None, model_id=None, start_time=None, end_time=None, bucket_size=None)¶ Retrieve information about how an accuracy metric changes over a certain time period.
New in version v2.18.
Parameters: - deployment_id : str
the id of the deployment
- metric : ACCURACY_METRIC
the accuracy metric to retrieve
- model_id : str
the id of the model
- start_time : datetime
start of the time period
- end_time : datetime
end of the time period
- bucket_size : str
time duration of a bucket, in ISO 8601 time duration format
Returns: - accuracy_over_time : AccuracyOverTime
the queried accuracy metric over time information
Examples
from datarobot import Deployment
from datarobot.models import AccuracyOverTime
from datarobot.enums import ACCURACY_METRIC
deployment = Deployment.get(deployment_id='5c939e08962d741e34f609f0')
accuracy_over_time = AccuracyOverTime.get(deployment.id, metric=ACCURACY_METRIC.LOGLOSS)
accuracy_over_time.metric
>>>'LogLoss'
accuracy_over_time.bucket_values
>>>{datetime.datetime(2019, 8, 1): 0.73, datetime.datetime(2019, 8, 2): 0.55}
-
classmethod
get_as_dataframe
(deployment_id, metrics, model_id=None, start_time=None, end_time=None, bucket_size=None)¶ Retrieve information about how a list of accuracy metrics changes over a certain time period as a pandas DataFrame.
In the returned DataFrame, the columns correspond to the metrics being retrieved; the rows are labeled with the start time of each bucket.
Parameters: - deployment_id : str
the id of the deployment
- metrics : [ACCURACY_METRIC]
the accuracy metrics to retrieve
- model_id : str
the id of the model
- start_time : datetime
start of the time period
- end_time : datetime
end of the time period
- bucket_size : str
time duration of a bucket, in ISO 8601 time duration format
Returns: - accuracy_over_time: pd.DataFrame
-
bucket_values
¶ The metric value for all time buckets, keyed by start time of the bucket.
Returns: - bucket_values: OrderedDict
-
bucket_sample_sizes
¶ The sample size for all time buckets, keyed by start time of the bucket.
Returns: - bucket_sample_sizes: OrderedDict
Feature¶
-
class
datarobot.models.
Feature
(id, project_id=None, name=None, feature_type=None, importance=None, low_information=None, unique_count=None, na_count=None, date_format=None, min=None, max=None, mean=None, median=None, std_dev=None, time_series_eligible=None, time_series_eligibility_reason=None, time_step=None, time_unit=None, target_leakage=None, key_summary=None)¶ A feature from a project’s dataset
These are features either included in the originally uploaded dataset or added to it via feature transformations. In time series projects, these will be distinct from the ModelingFeatures created during partitioning; otherwise, they will correspond to the same features. For more information about input and modeling features, see the time series documentation.
The min, max, mean, median, and std_dev attributes provide information about the distribution of the feature in the EDA sample data. For non-numeric features or features created prior to these summary statistics becoming available, they will be None. For features where the summary statistics are available, they will be in a format compatible with the data type, i.e. date type features will have their summary statistics expressed as ISO-8601 formatted date strings.
Attributes: - id : int
the id for the feature - note that name is used to reference the feature instead of id
- project_id : str
the id of the project the feature belongs to
- name : str
the name of the feature
- feature_type : str
the type of the feature, e.g. ‘Categorical’, ‘Text’
- importance : float or None
numeric measure of the strength of relationship between the feature and target (independent of any model or other features); may be None for non-modeling features such as partition columns
- low_information : bool
whether a feature is considered too uninformative for modeling (e.g. because it has too few values)
- unique_count : int
number of unique values
- na_count : int or None
number of missing values
- date_format : str or None
For Date features, the date format string for how this feature was interpreted, compatible with https://docs.python.org/2/library/time.html#time.strftime . For other feature types, None.
- min : str, int, float, or None
The minimum value of the source data in the EDA sample
- max : str, int, float, or None
The maximum value of the source data in the EDA sample
- mean : str, int, float, or None
The arithmetic mean of the source data in the EDA sample
- median : str, int, float, or None
The median of the source data in the EDA sample
- std_dev : str, int, float, or None
The standard deviation of the source data in the EDA sample
- time_series_eligible : bool
Whether this feature can be used as the datetime partition column in a time series project.
- time_series_eligibility_reason : str
Why the feature is ineligible for the datetime partition column in a time series project, or ‘suitable’ when it is eligible.
- time_step : int or None
For time series eligible features, a positive integer determining the interval at which windows can be specified. If used as the datetime partition column on a time series project, the feature derivation and forecast windows must start and end at an integer multiple of this value. None for features that are not time series eligible.
- time_unit : str or None
For time series eligible features, the time unit covered by a single time step, e.g. ‘HOUR’, or None for features that are not time series eligible.
- target_leakage : str
Whether a feature is considered to have target leakage or not. A value of ‘SKIPPED_DETECTION’ indicates that target leakage detection was not run on the feature. ‘FALSE’ indicates no leakage, ‘MODERATE’ indicates a moderate risk of target leakage, and ‘HIGH_RISK’ indicates a high risk of target leakage
- key_summary: list of dict
Statistics for top 50 keys (truncated to 103 characters) of a Summarized Categorical column. Example:
{'key': 'DataRobot', 'summary': {'min': 0, 'max': 29815.0, 'stdDev': 6498.029, 'mean': 1490.75, 'median': 0.0, 'pctRows': 5.0}}
- where,
- key: string or None
name of the key
- summary: dict
statistics of the key
max: maximum value of the key. min: minimum value of the key. mean: mean value of the key. median: median value of the key. stdDev: standard deviation of the key. pctRows: percentage occurrence of key in the EDA sample of the feature.
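The `target_leakage` attribute can be used to screen features before building a feature list. In this sketch, `leaky_feature_names` and `RISKY_LEAKAGE` are hypothetical names, and `SimpleNamespace` objects stand in for `Feature` instances so the example runs without a project.

```python
from types import SimpleNamespace

# Documented values that indicate some risk of target leakage.
RISKY_LEAKAGE = {"MODERATE", "HIGH_RISK"}


def leaky_feature_names(features):
    """Return names of features flagged with moderate or high leakage risk."""
    return [f.name for f in features if f.target_leakage in RISKY_LEAKAGE]


# Stand-ins for Feature objects; only the two attributes used are set.
features = [
    SimpleNamespace(name="age", target_leakage="FALSE"),
    SimpleNamespace(name="refund_issued", target_leakage="HIGH_RISK"),
    SimpleNamespace(name="row_id", target_leakage="SKIPPED_DETECTION"),
]
print(leaky_feature_names(features))  # ['refund_issued']
```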
-
classmethod
get
(project_id, feature_name)¶ Retrieve a single feature
Parameters: - project_id : str
The ID of the project the feature is associated with.
- feature_name : str
The name of the feature to retrieve
Returns: - feature : Feature
The queried instance
-
get_multiseries_properties
(multiseries_id_columns, max_wait=600)¶ Retrieve time series properties for a potential multiseries datetime partition column
Multiseries time series projects use multiseries id columns to model multiple distinct series within a single project. This function returns the time series properties (time step and time unit) of this column if it were used as a datetime partition column with the specified multiseries id columns, running multiseries detection automatically if it has not previously been run successfully.
Parameters: - multiseries_id_columns : list of str
the name(s) of the multiseries id columns to use with this datetime partition column. Currently only one multiseries id column is supported.
- max_wait : int, optional
if a multiseries detection task is run, the maximum amount of time to wait for it to complete before giving up
Returns: - properties : dict
A dict with three keys:
- time_series_eligible : bool, whether the column can be used as a partition column
- time_unit : str or null, the inferred time unit if used as a partition column
- time_step : int or null, the inferred time step if used as a partition column
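The returned dict can be turned into a short readable summary. `partition_column_summary` is a hypothetical helper; it relies only on the documented method, the three documented keys, and the feature's `name` attribute.

```python
def partition_column_summary(feature, multiseries_id_column, max_wait=600):
    """Describe a feature as a potential multiseries datetime partition column."""
    # Only one multiseries id column is currently supported.
    props = feature.get_multiseries_properties(
        [multiseries_id_column], max_wait=max_wait
    )
    if not props["time_series_eligible"]:
        return "{} cannot be used with {}".format(feature.name, multiseries_id_column)
    return "{}: one step is {} {}".format(
        feature.name, props["time_step"], props["time_unit"]
    )
```

With the client, `feature` would come from `Feature.get(project_id, feature_name)`.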
-
get_cross_series_properties
(datetime_partition_column, cross_series_group_by_columns, max_wait=600)¶ Retrieve cross-series properties for multiseries ID column.
This function returns the cross-series properties (eligibility as a group-by column) of this column if it were used with the specified datetime partition column and the current multiseries id column, running cross-series group-by validation automatically if it has not previously been run successfully.
Parameters: - datetime_partition_column : datetime partition column
- cross_series_group_by_columns : list of str
the name(s) of the columns to use with this multiseries ID column. Currently only one cross-series group-by column is supported.
- max_wait : int, optional
if a multiseries detection task is run, the maximum amount of time to wait for it to complete before giving up
Returns: - properties : dict
A dict with three keys:
- name : str, column name
- eligibility : str, reason for column eligibility
- isEligible : bool, is column eligible as cross-series group-by
-
class
datarobot.models.
ModelingFeature
(project_id=None, name=None, feature_type=None, importance=None, low_information=None, unique_count=None, na_count=None, date_format=None, min=None, max=None, mean=None, median=None, std_dev=None, parent_feature_names=None, key_summary=None)¶ A feature used for modeling
In time series projects, a new set of modeling features is created after setting the partitioning options. These features are automatically derived from those in the project’s dataset and are the features used for modeling. Modeling features are only accessible once the target and partitioning options have been set. In projects that don’t use time series modeling, once the target has been set, ModelingFeatures and Features will behave the same.
For more information about input and modeling features, see the time series documentation.
As with the Feature object, the min, max, mean, median, and std_dev attributes provide information about the distribution of the feature in the EDA sample data. For non-numeric features, they will be None. For features where the summary statistics are available, they will be in a format compatible with the data type, i.e. date type features will have their summary statistics expressed as ISO-8601 formatted date strings.
Attributes: - project_id : str
the id of the project the feature belongs to
- name : str
the name of the feature
- feature_type : str
the type of the feature, e.g. ‘Categorical’, ‘Text’
- importance : float or None
numeric measure of the strength of relationship between the feature and target (independent of any model or other features); may be None for non-modeling features such as partition columns
- low_information : bool
whether a feature is considered too uninformative for modeling (e.g. because it has too few values)
- unique_count : int
number of unique values
- na_count : int or None
number of missing values
- date_format : str or None
For Date features, the date format string for how this feature was interpreted, compatible with https://docs.python.org/2/library/time.html#time.strftime . For other feature types, None.
- min : str, int, float, or None
The minimum value of the source data in the EDA sample
- max : str, int, float, or None
The maximum value of the source data in the EDA sample
- mean : str, int, float, or None
The arithmetic mean of the source data in the EDA sample
- median : str, int, float, or None
The median of the source data in the EDA sample
- std_dev : str, int, float, or None
The standard deviation of the source data in the EDA sample
- parent_feature_names : list of str
A list of the names of input features used to derive this modeling feature. In cases where the input features and modeling features are the same, this will simply contain the feature’s name. Note that if a derived feature was used to create this modeling feature, the values here will not necessarily correspond to the features that must be supplied at prediction time.
- key_summary: list of dict
Statistics for the top 50 keys (truncated to 103 characters) of a Summarized Categorical column. Example:
{'key': 'DataRobot', 'summary': {'min': 0, 'max': 29815.0, 'stdDev': 6498.029, 'mean': 1490.75, 'median': 0.0, 'pctRows': 5.0}}
- where,
- key: string or None
name of the key
- summary: dict
statistics of the key
max: maximum value of the key. min: minimum value of the key. mean: mean value of the key. median: median value of the key. stdDev: standard deviation of the key. pctRows: percentage occurrence of key in the EDA sample of the feature.
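The key_summary structure described above is a plain list of dicts, so it can be inspected with ordinary Python. The following is a minimal sketch, assuming entries shaped like the example; the sample data and both helper names are hypothetical and not part of the client library.

```python
from typing import Dict, List, Optional

# Hypothetical sample shaped like the key_summary entries described above.
SAMPLE_KEY_SUMMARY = [
    {"key": "DataRobot", "summary": {"min": 0, "max": 29815.0, "stdDev": 6498.029,
                                      "mean": 1490.75, "median": 0.0, "pctRows": 5.0}},
    {"key": "Other", "summary": {"min": 0, "max": 10.0, "stdDev": 2.5,
                                  "mean": 1.0, "median": 0.0, "pctRows": 42.0}},
]


def most_common_key(key_summary: List[Dict]) -> Optional[str]:
    """Return the key that occurs in the largest share of EDA sample rows."""
    if not key_summary:
        return None
    best = max(key_summary, key=lambda entry: entry["summary"]["pctRows"])
    return best["key"]


def keys_with_mean_above(key_summary: List[Dict], threshold: float) -> List[str]:
    """Return keys whose mean value exceeds the threshold, in input order."""
    return [entry["key"] for entry in key_summary
            if entry["summary"]["mean"] > threshold]
```

In practice the list would come from the key_summary attribute of a retrieved feature rather than from a hard-coded sample.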
-
classmethod
get
(project_id, feature_name)¶ Retrieve a single modeling feature
Parameters: - project_id : str
The ID of the project the feature is associated with.
- feature_name : str
The name of the feature to retrieve
Returns: - feature : ModelingFeature
The requested feature
-
class
datarobot.models.
DatasetFeature
(id_, dataset_id=None, dataset_version_id=None, name=None, feature_type=None, low_information=None, unique_count=None, na_count=None, date_format=None, min_=None, max_=None, mean=None, median=None, std_dev=None, time_series_eligible=None, time_series_eligibility_reason=None, time_step=None, time_unit=None, target_leakage=None, target_leakage_reason=None)¶ A feature from a project’s dataset
These are features either included in the originally uploaded dataset or added to it via feature transformations.
The min, max, mean, median, and std_dev attributes provide information about the distribution of the feature in the EDA sample data. For non-numeric features or features created prior to these summary statistics becoming available, they will be None. For features where the summary statistics are available, they will be in a format compatible with the data type, i.e. date type features will have their summary statistics expressed as ISO-8601 formatted date strings.
Attributes: - id : int
the id for the feature - note that name is used to reference the feature instead of id
- dataset_id : str
the id of the dataset the feature belongs to
- dataset_version_id : str
the id of the dataset version the feature belongs to
- name : str
the name of the feature
- feature_type : str
the type of the feature, e.g. ‘Categorical’, ‘Text’
- low_information : bool
whether a feature is considered too uninformative for modeling (e.g. because it has too few values)
- unique_count : int
number of unique values
- na_count : int or None
number of missing values
- date_format : str or None
For Date features, the date format string for how this feature was interpreted, compatible with https://docs.python.org/2/library/time.html#time.strftime . For other feature types, None.
- min : str, int, float, or None
The minimum value of the source data in the EDA sample
- max : str, int, float, or None
The maximum value of the source data in the EDA sample
- mean : str, int, float, or None
The arithmetic mean of the source data in the EDA sample
- median : str, int, float, or None
The median of the source data in the EDA sample
- std_dev : str, int, float, or None
The standard deviation of the source data in the EDA sample
- time_series_eligible : bool
Whether this feature can be used as the datetime partition column in a time series project.
- time_series_eligibility_reason : str
Why the feature is ineligible for the datetime partition column in a time series project, or ‘suitable’ when it is eligible.
- time_step : int or None
For time series eligible features, a positive integer determining the interval at which windows can be specified. If used as the datetime partition column on a time series project, the feature derivation and forecast windows must start and end at an integer multiple of this value. None for features that are not time series eligible.
- time_unit : str or None
For time series eligible features, the time unit covered by a single time step, e.g. ‘HOUR’, or None for features that are not time series eligible.
- target_leakage : str
Whether a feature is considered to have target leakage or not. A value of ‘SKIPPED_DETECTION’ indicates that target leakage detection was not run on the feature. ‘FALSE’ indicates no leakage, ‘MODERATE’ indicates a moderate risk of target leakage, and ‘HIGH_RISK’ indicates a high risk of target leakage
- target_leakage_reason: string, optional
The descriptive text explaining the reason for target leakage, if any.
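Two of the attributes above lend themselves to simple checks: time_step constrains where windows may start and end, and target_leakage flags risky features. The sketch below illustrates both under stated assumptions; the helper names and the SimpleNamespace stand-ins are hypothetical, and real code would pass DatasetFeature objects instead.

```python
from types import SimpleNamespace

# Leakage values that indicate some risk, per the target_leakage description.
RISKY_LEAKAGE_LEVELS = {"MODERATE", "HIGH_RISK"}


def window_fits_time_step(time_step, window_start, window_end):
    """Check that both window bounds are integer multiples of time_step."""
    if time_step is None:
        # time_step is None for features that are not time series eligible.
        return False
    if time_step <= 0:
        raise ValueError("time_step must be a positive integer")
    return window_start % time_step == 0 and window_end % time_step == 0


def features_with_leakage(features):
    """Return names of features flagged with moderate or high leakage risk."""
    return [f.name for f in features if f.target_leakage in RISKY_LEAKAGE_LEVELS]


# Hypothetical stand-ins for DatasetFeature objects.
EXAMPLE_FEATURES = [
    SimpleNamespace(name="order_date", target_leakage="FALSE"),
    SimpleNamespace(name="refund_issued", target_leakage="HIGH_RISK"),
    SimpleNamespace(name="notes", target_leakage="SKIPPED_DETECTION"),
]
```

For example, with a time_step of 7, a window from -14 to 0 fits, while a window from -10 to 0 does not.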
-
get_histogram
(bin_limit=None)¶ Retrieve a feature histogram
Parameters: - bin_limit : int or None
Desired max number of histogram bins. If omitted, the endpoint uses 60 by default.
Returns: - featureHistogram : DatasetFeatureHistogram
The requested histogram with the desired number of bins
-
class
datarobot.models.
DatasetFeatureHistogram
(plot)¶ -
classmethod
get
(dataset_id, feature_name, bin_limit=None, key_name=None)¶ Retrieve a single feature histogram
Parameters: - dataset_id : str
The ID of the Dataset the feature is associated with.
- feature_name : str
The name of the feature to retrieve
- bin_limit : int or None
Desired max number of histogram bins. If omitted, the endpoint uses 60 by default.
- key_name: string or None
(Only required for summarized categorical features) Name of one of the top 50 keys for which the plot is to be retrieved
Returns: - featureHistogram : DatasetFeatureHistogram
The queried instance with plot attribute in it.
-
class
datarobot.models.
FeatureHistogram
(plot)¶ -
classmethod
get
(project_id, feature_name, bin_limit=None, key_name=None)¶ Retrieve a single feature histogram
Parameters: - project_id : str
The ID of the project the feature is associated with.
- feature_name : str
The name of the feature to retrieve
- bin_limit : int or None
Desired max number of histogram bins. If omitted, the endpoint uses 60 by default.
- key_name: string or None
(Only required for summarized categorical features) Name of one of the top 50 keys for which the plot is to be retrieved
Returns: - featureHistogram : FeatureHistogram
The queried instance with plot attribute in it.
Feature Engineering¶
-
class
datarobot.models.
FeatureEngineeringGraph
(id=None, name=None, description=None, created=None, last_modified=None, creator_full_name=None, modifier_full_name=None, creator_user_id=None, last_modified_user_id=None, number_of_projects=None, linkage_keys=None, table_definitions=None, relationships=None, time_unit=None, feature_derivation_window_start=None, feature_derivation_window_end=None, is_draft=True)¶ A Feature Engineering Graph for the project. A Feature Engineering Graph is a graph that specifies relationships between two or more tables so that features can be generated automatically from them.
Attributes: - id : str
the id of the created feature engineering graph
- name: str
name of the feature engineering graph
- description: str
description of the feature engineering graph
- created: datetime.datetime
creation date of the feature engineering graph
- creator_user_id: str
id of the user who created the feature engineering graph
- creator_full_name: str
full name of the user who created the feature engineering graph
- last_modified: datetime.datetime
last modification date of the feature engineering graph
- last_modified_user_id: str
id of the user who last modified the feature engineering graph
- modifier_full_name: str
full name of the user who last modified the feature engineering graph
- number_of_projects: int
number of projects that use the feature engineering graph
- linkage_keys: list of str
a list of strings specifying the name of the columns that link the feature engineering graph with the primary table.
- table_definitions: list
each element is a table_definition for a table.
- relationships: list
each element is a relationship between two tables
- time_unit: str, or None
time unit of the feature derivation window. Supported values are MILLISECOND, SECOND, MINUTE, HOUR, DAY, WEEK, MONTH, QUARTER, YEAR. If present, the feature engineering Graph will perform time-aware joins.
- feature_derivation_window_start: int, or None
how many time_units of each table’s primary temporal key into the past relative to the datetimePartitionColumn the feature derivation window should begin. Will be a negative integer, if present. If present, the feature engineering graph will perform time-aware joins.
- feature_derivation_window_end: int, or None
how many time_units of each table’s primary temporal key into the past relative to the datetimePartitionColumn the feature derivation window should end. Will be a non-positive integer, if present. If present, the feature engineering graph will perform time-aware joins.
- is_draft: bool (default=True)
a draft (is_draft=True) feature engineering graph can be updated, while a non-draft(is_draft=False) feature engineering graph is immutable
- The `table_definitions` structure is
- identifier: str
alias of the table (used directly as part of the generated feature names)
- catalog_id: str, or None
identifier of the catalog item
- catalog_version_id: str
identifier of the catalog item version
- feature_list_id: str, or None
identifier of the feature list. This decides which columns in the table are used for feature generation
- primary_temporal_key: str, or None
name of the column indicating time of record creation
- snapshot_policy: str
policy to use when creating a project or making predictions. Must be one of the following values: ‘specified’ (use the specific snapshot specified by catalogVersionId), ‘latest’ (use the latest snapshot from the same catalog item), or ‘dynamic’ (get data from the source; only applicable for JDBC datasets)
- feature_lists: list
list of feature list info
- data_source: dict
data source info if the table is from data source
- is_deleted: bool or None
whether the table is deleted or not
- The `relationship` structure is
- table1_identifier: str or None
identifier of the first table in this relationship. This is specified in the identifier field of the table_definition structure. If None, then the relationship is with the primary dataset.
- table2_identifier: str
identifier of the second table in this relationship. This is specified in the identifier field of table_definition schema.
- table1_keys: list of str (max length: 10 min length: 1)
column(s) from the first table which are used to join to the second table
- table2_keys: list of str (max length: 10 min length: 1)
column(s) from the second table that are used to join to the first table
- The `feature list info` structure is
- id : str
the id of the featurelist
- name : str
the name of the featurelist
- features : list of str
the names of all the Features in the featurelist
- dataset_id : str
the project the featurelist belongs to
- creation_date : datetime.datetime
when the featurelist was created
- user_created : bool
whether the featurelist was created by a user or by DataRobot automation
- created_by: str
the name of user who created it
- description : str
the description of the featurelist. Can be updated by the user and may be supplied by default for DataRobot-created featurelists.
- dataset_id: str
dataset which is associated with the feature list
- dataset_version_id: str or None
version of the dataset which is associated with feature list. Only relevant for Informative features
- The `data source info` structure is
- data_store_id: str
the id of the data store.
- data_store_name : str
the user-friendly name of the data store.
- url : str
the url used to connect to the data store.
- dbtable : str
the name of table from the data store.
- schema: str
schema definition of the table from the data store
-
classmethod
create
(name, description, table_definitions, relationships, time_unit=None, feature_derivation_window_start=None, feature_derivation_window_end=None, is_draft=True)¶ Create a feature engineering graph.
Parameters: - name : str
the name of the feature engineering graph
- description : str
the description of the feature engineering graph
- table_definitions: list of dict
each element is a TableDefinition for a table. The TableDefinition schema is
- identifier: str
alias of the table (used directly as part of the generated feature names)
- catalog_id: str, or None
identifier of the catalog item
- catalog_version_id: str
identifier of the catalog item version
- feature_list_id: str, or None
identifier of the feature list. This decides which columns in the table are used for feature generation
- primary_temporal_key: str, or None
name of the column indicating time of record creation
- snapshot_policy: str
policy to use when creating a project or making predictions. Must be one of the following values: ‘specified’ (use the specific snapshot specified by catalogVersionId), ‘latest’ (use the latest snapshot from the same catalog item), or ‘dynamic’ (get data from the source; only applicable for JDBC datasets)
- relationships: list of dict
each element is a Relationship between two tables. The Relationship schema is
- table1_identifier: str or None
identifier of the first table in this relationship. This is specified in the identifier field of the table_definition structure. If None, then the relationship is with the primary dataset.
- table2_identifier: str
identifier of the second table in this relationship. This is specified in the identifier field of table_definition schema.
- table1_keys: list of str (max length: 10 min length: 1)
column(s) from the first table which are used to join to the second table
- table2_keys: list of str (max length: 10 min length: 1)
column(s) from the second table that are used to join to the first table
- time_unit: str, or None
time unit of the feature derivation window. Supported values are MILLISECOND, SECOND, MINUTE, HOUR, DAY, WEEK, MONTH, QUARTER, YEAR. If present, the feature engineering Graph will perform time-aware joins.
- feature_derivation_window_start: int, or None
how many time_units of each table’s primary temporal key into the past relative to the datetimePartitionColumn the feature derivation window should begin. Will be a negative integer, if present. If present, the feature engineering graph will perform time-aware joins.
- feature_derivation_window_end: int, or None
how many time_units of each table’s primary temporal key into the past relative to the datetimePartitionColumn the feature derivation window should end. Will be a non-positive integer, if present. If present, the feature engineering graph will perform time-aware joins.
- is_draft: bool (default=True)
a draft (is_draft=True) feature engineering graph can be updated, while a non-draft(is_draft=False) feature engineering graph is immutable
Returns: - feature_engineering_graphs: FeatureEngineeringGraph
the created feature engineering graph
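Because table_definitions and relationships are plain lists of dicts, it can help to check them locally before calling create. The sketch below encodes only the constraints stated above (snapshot policy values, 1 to 10 join keys, identifier references, and the sign of the window bounds). The function name, the sample identifiers, and the column names are hypothetical, and the server may enforce further rules not covered here.

```python
# Values taken from the snapshot_policy and key-length constraints described above.
VALID_SNAPSHOT_POLICIES = {"specified", "latest", "dynamic"}
MIN_JOIN_KEYS, MAX_JOIN_KEYS = 1, 10

# Hypothetical identifiers and column names, for illustration only.
TABLE_DEFINITIONS = [
    {
        "identifier": "transactions",
        "catalog_id": "catalog-id-placeholder",
        "catalog_version_id": "catalog-version-id-placeholder",
        "primary_temporal_key": "purchase_date",
        "snapshot_policy": "latest",
    },
]
RELATIONSHIPS = [
    {
        "table1_identifier": None,  # None means the primary dataset
        "table2_identifier": "transactions",
        "table1_keys": ["customer_id"],
        "table2_keys": ["customer_id"],
    },
]


def find_graph_problems(table_definitions, relationships,
                        feature_derivation_window_start=None,
                        feature_derivation_window_end=None):
    """Return human-readable problems; an empty list means none were found."""
    problems = []
    identifiers = {table["identifier"] for table in table_definitions}
    for table in table_definitions:
        if table.get("snapshot_policy") not in VALID_SNAPSHOT_POLICIES:
            problems.append("unknown snapshot_policy for %s" % table["identifier"])
    for index, rel in enumerate(relationships):
        first = rel.get("table1_identifier")
        if first is not None and first not in identifiers:
            problems.append("relationship %d: unknown table1_identifier" % index)
        if rel.get("table2_identifier") not in identifiers:
            problems.append("relationship %d: unknown table2_identifier" % index)
        for field in ("table1_keys", "table2_keys"):
            count = len(rel.get(field) or [])
            if not MIN_JOIN_KEYS <= count <= MAX_JOIN_KEYS:
                problems.append("relationship %d: %s needs 1 to 10 columns" % (index, field))
    start = feature_derivation_window_start
    end = feature_derivation_window_end
    if start is not None and start >= 0:
        problems.append("feature_derivation_window_start must be negative")
    if end is not None and end > 0:
        problems.append("feature_derivation_window_end must be non-positive")
    return problems
```

If the returned list is empty, the same two lists could then be passed as the table_definitions and relationships arguments of FeatureEngineeringGraph.create, together with a name and description.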
-
replace
(id, name, description, table_definitions, relationships, time_unit=None, feature_derivation_window_start=None, feature_derivation_window_end=None, is_draft=True)¶ Replace a feature engineering graph.
Parameters: - id : str
the id of the feature engineering graph to replace
- name : str
the name of the feature engineering graph
- description : str
the description of the feature engineering graph
- table_definitions: list of dict
each element is a TableDefinition for a table. The TableDefinition schema is
- identifier: str
alias of the table (used directly as part of the generated feature names)
- catalog_id: str, or None
identifier of the catalog item
- catalog_version_id: str
identifier of the catalog item version
- feature_list_id: str, or None
identifier of the feature list. This decides which columns in the table are used for feature generation
- primary_temporal_key: str, or None
name of the column indicating time of record creation
- snapshot_policy: str
policy to use when creating a project or making predictions. Must be one of the following values: ‘specified’ (use the specific snapshot specified by catalogVersionId), ‘latest’ (use the latest snapshot from the same catalog item), or ‘dynamic’ (get data from the source; only applicable for JDBC datasets)
- relationships: list of dict
each element is a Relationship between two tables. The Relationship schema is
- table1_identifier: str or None
identifier of the first table in this relationship. This is specified in the identifier field of the table_definition structure. If None, then the relationship is with the primary dataset.
- table2_identifier: str
identifier of the second table in this relationship. This is specified in the identifier field of table_definition schema.
- table1_keys: list of str (max length: 10 min length: 1)
column(s) from the first table which are used to join to the second table
- table2_keys: list of str (max length: 10 min length: 1)
column(s) from the second table that are used to join to the first table
- time_unit: str, or None
time unit of the feature derivation window. Supported values are MILLISECOND, SECOND, MINUTE, HOUR, DAY, WEEK, MONTH, QUARTER, YEAR. If present, the feature engineering Graph will perform time-aware joins.
- feature_derivation_window_start: int, or None
how many time_units of each table’s primary temporal key into the past relative to the datetimePartitionColumn the feature derivation window should begin. Will be a negative integer, if present. If present, the feature engineering graph will perform time-aware joins.
- feature_derivation_window_end: int, or None
how many time_units of each table’s primary temporal key into the past relative to the datetimePartitionColumn the feature derivation window should end. Will be a non-positive integer, if present. If present, the feature engineering graph will perform time-aware joins.
- is_draft: bool (default=True)
a draft (is_draft=True) feature engineering graph can be updated, while a non-draft(is_draft=False) feature engineering graph is immutable
Returns: - feature_engineering_graphs: FeatureEngineeringGraph
the updated feature engineering graph
-
update
(name, description)¶ Update the Feature engineering graph name and description.
Parameters: - name : str
the name of the feature engineering graph
- description : str
the description of the feature engineering graph
-
classmethod
get
(feature_engineering_graph_id)¶ Retrieve a single feature engineering graph
Parameters: - feature_engineering_graph_id : str
The ID of the feature engineering graph to retrieve.
Returns: - feature_engineering_graph : FeatureEngineeringGraph
The requested feature engineering graph
-
classmethod
list
(project_id=None, secondary_dataset_id=None, include_drafts=None)¶ Returns list of feature engineering graphs.
Parameters: - project_id: str, optional
The ID of the project to filter by. If specified, only feature engineering graphs related to this project are returned. If not specified, all feature engineering graphs are returned irrespective of project.
- secondary_dataset_id: str, optional
The ID of the dataset to filter by. If specified, only feature engineering graphs that use this dataset as a secondary dataset are returned. If not specified, feature engineering graphs are returned without filtering on secondary dataset ID.
- include_drafts: bool (default=False)
Whether to include draft feature engineering graphs. If True, both draft (mutable) and non-draft (immutable) feature engineering graphs are returned.
Returns: - feature_engineering_graphs : list of FeatureEngineeringGraph instances
a list of available feature engineering graphs.
-
delete
()¶ Delete the Feature Engineering Graph
-
share
(access_list)¶ Modify the ability of users to access this feature engineering graph
Parameters: - access_list : list of
SharingAccess
the modifications to make.
Raises: - datarobot.ClientError :
if you do not have permission to share this feature engineering graph or if the user you’re sharing with doesn’t exist
-
get_access_list
()¶ Retrieve what users have access to this feature engineering graph
Returns: - list of SharingAccess
Feature List¶
-
class
datarobot.
DatasetFeaturelist
(id=None, name=None, features=None, dataset_id=None, dataset_version_id=None, creation_date=None, created_by=None, user_created=None, description=None)¶ A set of features attached to a dataset in the AI Catalog
Attributes: - id : str
the id of the dataset featurelist
- dataset_id : str
the id of the dataset the featurelist belongs to
- dataset_version_id: str, optional
the version id of the dataset this featurelist belongs to
- name : str
the name of the dataset featurelist
- features : list of str
a list of the names of features included in this dataset featurelist
- creation_date : datetime.datetime
when the featurelist was created
- created_by : str
the user name of the user who created this featurelist
- user_created : bool
whether the featurelist was created by a user or by DataRobot automation
- description : basestring, optional
the description of the featurelist. Only present on DataRobot-created featurelists.
-
classmethod
get
(dataset_id, featurelist_id)¶ Retrieve a dataset featurelist
Parameters: - dataset_id : str
the id of the dataset the featurelist belongs to
- featurelist_id : str
the id of the dataset featurelist to retrieve
Returns: - featurelist : DatasetFeaturelist
the specified featurelist
-
delete
()¶ Delete a dataset featurelist
Featurelists configured into the dataset as a default featurelist cannot be deleted.
-
update
(name=None)¶ Update the name of an existing featurelist
Note that only user-created featurelists can be renamed, and that names must not conflict with names used by other featurelists.
Parameters: - name : str, optional
the new name for the featurelist
-
class
datarobot.models.
Featurelist
(id=None, name=None, features=None, project_id=None, created=None, is_user_created=None, num_models=None, description=None)¶ A set of features used in modeling
Attributes: - id : str
the id of the featurelist
- name : str
the name of the featurelist
- features : list of str
the names of all the Features in the featurelist
- project_id : str
the project the featurelist belongs to
- created : datetime.datetime
(New in version v2.13) when the featurelist was created
- is_user_created : bool
(New in version v2.13) whether the featurelist was created by a user or by DataRobot automation
- num_models : int
(New in version v2.13) the number of models currently using this featurelist. A model is considered to use a featurelist if it is used to train the model or as a monotonic constraint featurelist, or if the model is a blender with at least one component model using the featurelist.
- description : basestring
(New in version v2.13) the description of the featurelist. Can be updated by the user and may be supplied by default for DataRobot-created featurelists.
-
classmethod
get
(project_id, featurelist_id)¶ Retrieve a known feature list
Parameters: - project_id : str
The id of the project the featurelist is associated with
- featurelist_id : str
The ID of the featurelist to retrieve
Returns: - featurelist : Featurelist
The queried instance
-
delete
(dry_run=False, delete_dependencies=False)¶ Delete a featurelist, and any models and jobs using it
All models using a featurelist, whether as the training featurelist or as a monotonic constraint featurelist, will also be deleted when the deletion is executed and any queued or running jobs using it will be cancelled. Similarly, predictions made on these models will also be deleted. All the entities that are to be deleted with a featurelist are described as “dependencies” of it. To preview the results of deleting a featurelist, call delete with dry_run=True
When deleting a featurelist with dependencies, users must specify delete_dependencies=True to confirm they want to delete the featurelist and all its dependencies. Without that option, only featurelists with no dependencies may be successfully deleted and others will error.
Featurelists configured into the project as a default featurelist or as a default monotonic constraint featurelist cannot be deleted.
Featurelists used in a model deployment cannot be deleted until the model deployment is deleted.
Parameters: - dry_run : bool, optional
specify True to preview the result of deleting the featurelist, instead of actually deleting it.
- delete_dependencies : bool, optional
specify True to delete a featurelist together with its dependencies; if left at the default of False, only featurelists without dependencies are deleted, and attempting to delete one with dependencies raises an error.
Returns: - result : dict
- A dictionary describing the result of deleting the featurelist, with the following keys
- dry_run : bool, whether the deletion was a dry run or an actual deletion
- can_delete : bool, whether the featurelist can actually be deleted
- deletion_blocked_reason : str, why the featurelist can’t be deleted (if it can’t)
- num_affected_models : int, the number of models using this featurelist
- num_affected_jobs : int, the number of jobs using this featurelist
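The dry-run option and the result keys above suggest a preview-then-delete pattern. The following is a minimal sketch of that pattern, assuming only the documented delete(dry_run, delete_dependencies) signature and result keys. The helper name and the StubFeaturelist class are hypothetical; the stub exists only so the example runs without a server, and real code would pass a Featurelist instance.

```python
def delete_featurelist_safely(featurelist, allow_dependencies=False):
    """Preview a deletion, then delete only if the preview permits it.

    Returns the result dict of the last delete() call that was made.
    """
    preview = featurelist.delete(dry_run=True)
    if not preview["can_delete"]:
        return preview  # see preview["deletion_blocked_reason"]
    has_dependencies = (preview["num_affected_models"] > 0
                        or preview["num_affected_jobs"] > 0)
    if has_dependencies and not allow_dependencies:
        return preview  # caller did not agree to delete models and jobs
    return featurelist.delete(delete_dependencies=has_dependencies)


class StubFeaturelist:
    """Hypothetical stand-in that mimics the documented delete() result."""

    def __init__(self, num_models=0, num_jobs=0, blocked_reason=None):
        self.num_models = num_models
        self.num_jobs = num_jobs
        self.blocked_reason = blocked_reason
        self.deleted = False

    def delete(self, dry_run=False, delete_dependencies=False):
        has_dependencies = self.num_models > 0 or self.num_jobs > 0
        can_delete = self.blocked_reason is None
        if not dry_run:
            if not can_delete or (has_dependencies and not delete_dependencies):
                raise RuntimeError("deletion refused")
            self.deleted = True
        return {
            "dry_run": dry_run,
            "can_delete": can_delete,
            "deletion_blocked_reason": self.blocked_reason,
            "num_affected_models": self.num_models,
            "num_affected_jobs": self.num_jobs,
        }
```

Returning the preview when nothing is deleted lets the caller report how many models and jobs would have been affected before opting in with allow_dependencies=True.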
-
update
(name=None, description=None)¶ Update the name or description of an existing featurelist
Note that only user-created featurelists can be renamed, and that names must not conflict with names used by other featurelists.
Parameters: - name : str, optional
the new name for the featurelist
- description : str, optional
the new description for the featurelist
-
class
datarobot.models.
ModelingFeaturelist
(id=None, name=None, features=None, project_id=None, created=None, is_user_created=None, num_models=None, description=None)¶ A set of features that can be used to build a model
In time series projects, a new set of modeling features is created after setting the partitioning options. These features are automatically derived from those in the project’s dataset and are the features used for modeling. Modeling features are only accessible once the target and partitioning options have been set. In projects that don’t use time series modeling, once the target has been set, ModelingFeaturelists and Featurelists will behave the same.
For more information about input and modeling features, see the time series documentation.
Attributes: - id : str
the id of the modeling featurelist
- project_id : str
the id of the project the modeling featurelist belongs to
- name : str
the name of the modeling featurelist
- features : list of str
a list of the names of features included in this modeling featurelist
- created : datetime.datetime
(New in version v2.13) when the featurelist was created
- is_user_created : bool
(New in version v2.13) whether the featurelist was created by a user or by DataRobot automation
- num_models : int
(New in version v2.13) the number of models currently using this featurelist. A model is considered to use a featurelist if it is used to train the model or as a monotonic constraint featurelist, or if the model is a blender with at least one component model using the featurelist.
- description : basestring
(New in version v2.13) the description of the featurelist. Can be updated by the user and may be supplied by default for DataRobot-created featurelists.
-
classmethod
get
(project_id, featurelist_id)¶ Retrieve a modeling featurelist
Modeling featurelists can only be retrieved once the target and partitioning options have been set.
Parameters: - project_id : str
the id of the project the modeling featurelist belongs to
- featurelist_id : str
the id of the modeling featurelist to retrieve
Returns: - featurelist : ModelingFeaturelist
the specified featurelist
-
delete
(dry_run=False, delete_dependencies=False)¶ Delete a featurelist, and any models and jobs using it
All models using a featurelist, whether as the training featurelist or as a monotonic constraint featurelist, will also be deleted when the deletion is executed and any queued or running jobs using it will be cancelled. Similarly, predictions made on these models will also be deleted. All the entities that are to be deleted with a featurelist are described as “dependencies” of it. To preview the results of deleting a featurelist, call delete with dry_run=True
When deleting a featurelist with dependencies, users must specify delete_dependencies=True to confirm they want to delete the featurelist and all its dependencies. Without that option, only featurelists with no dependencies may be successfully deleted and others will error.
Featurelists configured into the project as a default featurelist or as a default monotonic constraint featurelist cannot be deleted.
Featurelists used in a model deployment cannot be deleted until the model deployment is deleted.
Parameters: - dry_run : bool, optional
specify True to preview the result of deleting the featurelist, instead of actually deleting it.
- delete_dependencies : bool, optional
specify True to delete a featurelist together with its dependencies; if left at the default of False, only featurelists without dependencies are deleted, and attempting to delete one with dependencies raises an error.
Returns: - result : dict
- A dictionary describing the result of deleting the featurelist, with the following keys
- dry_run : bool, whether the deletion was a dry run or an actual deletion
- can_delete : bool, whether the featurelist can actually be deleted
- deletion_blocked_reason : str, why the featurelist can’t be deleted (if it can’t)
- num_affected_models : int, the number of models using this featurelist
- num_affected_jobs : int, the number of jobs using this featurelist
-
update
(name=None, description=None)¶ Update the name or description of an existing featurelist
Note that only user-created featurelists can be renamed, and that names must not conflict with names used by other featurelists.
Parameters: - name : str, optional
the new name for the featurelist
- description : str, optional
the new description for the featurelist
Job¶
-
class
datarobot.models.
Job
(data, completed_resource_url=None)¶ Tracks asynchronous work being done within a project
Attributes: - id : int
the id of the job
- project_id : str
the id of the project the job belongs to
- status : str
the status of the job - will be one of
datarobot.enums.QUEUE_STATUS
- job_type : str
what kind of work the job is doing - will be one of
datarobot.enums.JOB_TYPE
- is_blocked : bool
if true, the job is blocked (cannot be executed) until its dependencies are resolved
-
classmethod
get
(project_id, job_id)¶ Fetches one job.
Parameters: - project_id : str
The identifier of the project in which the job resides
- job_id : str
The job id
Returns: - job : Job
The job
Raises: - AsyncFailureError
Querying this resource gave a status code other than 200 or 303
-
cancel
()¶ Cancel this job. If this job has not finished running, it will be removed and canceled.
-
get_result
(params=None)¶ Parameters: - params : dict, optional
Query parameters to be added to the request to get results. For featureEffects and featureFit jobs, the source param defines the source; if it is omitted, the default is `training`.
Returns: - result : object
- Return type depends on the job type:
- for model jobs, a Model is returned
- for predict jobs, a pandas.DataFrame (with predictions) is returned
- for featureImpact jobs, a list of dicts (see Model.get_feature_impact for more detail)
- for primeRulesets jobs, a list of Rulesets
- for primeModel jobs, a PrimeModel
- for primeDownloadValidation jobs, a PrimeFile
- for reasonCodesInitialization jobs, a ReasonCodesInitialization
- for reasonCodes jobs, a ReasonCodes
- for predictionExplanationInitialization jobs, a PredictionExplanationsInitialization
- for predictionExplanations jobs, a PredictionExplanations
- for featureEffects, a FeatureEffects
- for featureFit, a FeatureFit
Raises: - JobNotFinished
If the job is not finished, the result is not available.
- AsyncProcessUnsuccessfulError
If the job errored or was aborted
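The mapping above can be kept as a small lookup table when writing code that handles several job types. This is a sketch: the helper names are hypothetical, and the keys assume that `job_type` carries exactly the strings used in the list above.

```python
# Result type returned by Job.get_result for each job type, as listed above.
# Assumption: job.job_type uses these exact strings.
RESULT_TYPE_BY_JOB_TYPE = {
    "model": "Model",
    "predict": "pandas.DataFrame",
    "featureImpact": "list of dict",
    "primeRulesets": "list of Ruleset",
    "primeModel": "PrimeModel",
    "primeDownloadValidation": "PrimeFile",
    "reasonCodesInitialization": "ReasonCodesInitialization",
    "reasonCodes": "ReasonCodes",
    "predictionExplanationInitialization": "PredictionExplanationsInitialization",
    "predictionExplanations": "PredictionExplanations",
    "featureEffects": "FeatureEffects",
    "featureFit": "FeatureFit",
}


def expected_result_type(job_type):
    """Name of the type get_result should return, or 'unknown'."""
    return RESULT_TYPE_BY_JOB_TYPE.get(job_type, "unknown")


def uses_source_param(job_type):
    """Only featureEffects and featureFit results take a `source` query param."""
    return job_type in ("featureEffects", "featureFit")
```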
-
get_result_when_complete
(max_wait=600, params=None)¶ Parameters: - max_wait : int, optional
How long to wait for the job to finish.
- params : dict, optional
Query parameters to be added to request.
Returns: - result: object
Return type is the same as would be returned by Job.get_result.
Raises: - AsyncTimeoutError
If the job does not finish in time
- AsyncProcessUnsuccessfulError
If the job errored or was aborted
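A minimal sketch of handling the two documented exceptions separately. The function name `wait_for_result` and the returned status labels are hypothetical; the fallback classes exist only so the sketch runs where the client is not installed.

```python
try:
    from datarobot.errors import AsyncProcessUnsuccessfulError, AsyncTimeoutError
except ImportError:
    # Stand-ins so the sketch runs without the client installed
    class AsyncTimeoutError(Exception):
        pass

    class AsyncProcessUnsuccessfulError(Exception):
        pass


def wait_for_result(job, max_wait=600):
    """Return (status, result), where status is 'ok', 'timeout' or 'failed'."""
    try:
        return "ok", job.get_result_when_complete(max_wait=max_wait)
    except AsyncTimeoutError:
        # The job did not finish in time; it may still complete later.
        return "timeout", None
    except AsyncProcessUnsuccessfulError:
        # The job errored or was aborted; waiting longer will not help.
        return "failed", None
```

A caller might retry on `'timeout'` with a larger `max_wait` and report `'failed'` to the user.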
-
refresh
()¶ Update this object with the latest job data from the server.
-
wait_for_completion
(max_wait=600)¶ Waits for job to complete.
Parameters: - max_wait : int, optional
How long to wait for the job to finish.
-
class
datarobot.models.
TrainingPredictionsJob
(data, model_id, data_subset, **kwargs)¶ -
classmethod
get
(project_id, job_id, model_id=None, data_subset=None)¶ Fetches one training predictions job.
The resulting TrainingPredictions object will be annotated with model_id and data_subset.
Parameters: - project_id : str
The identifier of the project in which the job resides
- job_id : str
The job id
- model_id : str
The identifier of the model used for computing training predictions
- data_subset : dr.enums.DATA_SUBSET, optional
Data subset used for computing training predictions
Returns: - job : TrainingPredictionsJob
The job
-
refresh
()¶ Update this object with the latest job data from the server.
-
cancel
()¶ Cancel this job. If this job has not finished running, it will be removed and canceled.
-
get_result
(params=None)¶ Parameters: - params : dict, optional
Query parameters to be added to the request to get results. For featureEffects and featureFit jobs, the source param defines the source; if it is not given, the default is `training`.
Returns: - result : object
- Return type depends on the job type:
- for model jobs, a Model is returned
- for predict jobs, a pandas.DataFrame (with predictions) is returned
- for featureImpact jobs, a list of dicts (see Model.get_feature_impact for more detail)
- for primeRulesets jobs, a list of Rulesets
- for primeModel jobs, a PrimeModel
- for primeDownloadValidation jobs, a PrimeFile
- for reasonCodesInitialization jobs, a ReasonCodesInitialization
- for reasonCodes jobs, a ReasonCodes
- for predictionExplanationInitialization jobs, a PredictionExplanationsInitialization
- for predictionExplanations jobs, a PredictionExplanations
- for featureEffects, a FeatureEffects
- for featureFit, a FeatureFit
Raises: - JobNotFinished
If the job is not finished, the result is not available.
- AsyncProcessUnsuccessfulError
If the job errored or was aborted
-
get_result_when_complete
(max_wait=600, params=None)¶ Parameters: - max_wait : int, optional
How long to wait for the job to finish.
- params : dict, optional
Query parameters to be added to request.
Returns: - result: object
Return type is the same as would be returned by Job.get_result.
Raises: - AsyncTimeoutError
If the job does not finish in time
- AsyncProcessUnsuccessfulError
If the job errored or was aborted
-
wait_for_completion
(max_wait=600)¶ Waits for job to complete.
Parameters: - max_wait : int, optional
How long to wait for the job to finish.
-
classmethod
Lift Chart¶
-
class
datarobot.models.lift_chart.
LiftChart
(source, bins, source_model_id, target_class)¶ Lift chart data for model.
Notes
LiftChartBin is a dict containing the following:
- actual : (float) Sum of actual target values in bin
- predicted : (float) Sum of predicted target values in bin
- bin_weight : (float) The weight of the bin. For weighted projects, it is the sum of the weights of the rows in the bin. For unweighted projects, it is the number of rows in the bin.
Attributes: - source : str
Lift chart data source. Can be ‘validation’, ‘crossValidation’ or ‘holdout’.
- bins : list of dict
List of dicts with schema described as LiftChartBin above.
- source_model_id : str
ID of the model this lift chart represents; in some cases, insights from the parent of a frozen model may be used
- target_class : str, optional
For multiclass lift - target class for this lift chart data.
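As a sketch, the per-bin fields can be aggregated over a whole chart. It takes the LiftChartBin definitions above at face value; the helper name `lift_chart_totals` and the sample bins are hypothetical.

```python
def lift_chart_totals(bins):
    """Sum each LiftChartBin field over all bins of a lift chart."""
    return {
        "actual": sum(b["actual"] for b in bins),
        "predicted": sum(b["predicted"] for b in bins),
        "bin_weight": sum(b["bin_weight"] for b in bins),
    }


# Hand-written sample bins, not real chart data
sample_bins = [
    {"actual": 3.0, "predicted": 2.5, "bin_weight": 10.0},
    {"actual": 7.0, "predicted": 7.5, "bin_weight": 10.0},
]
print(lift_chart_totals(sample_bins))
```

With a real chart, `bins` would be the `bins` attribute of a LiftChart instance.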
Missing Values Report¶
-
class
datarobot.models.missing_report.
MissingValuesReport
(missing_values_report)¶ Missing values report for model, contains list of reports per feature sorted by missing count in descending order.
Notes
Report per feature contains:
- feature : feature name.
- type : feature type, either ‘Numeric’ or ‘Categorical’.
- missing_count : missing values count in training data.
- missing_percentage : missing values percentage in training data.
- tasks : list of information for each task that was applied to the feature.
task information contains:
- id : the number of the task in the blueprint diagram.
- name : task name.
- descriptions : human readable aggregated information about how the task handles missing values. The following descriptions may be present: what value is imputed for missing values, whether the feature being missing is treated as a feature by the task, whether missing values are treated as infrequent values, whether infrequent values are treated as missing values, and whether missing values are ignored.
-
classmethod
get
(project_id, model_id)¶ Retrieve a missing report.
Parameters: - project_id : str
The project’s id.
- model_id : str
The model’s id.
Returns: - MissingValuesReport
The queried missing report.
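The per-feature fields described above can be filtered with a few lines of code. This is a sketch under assumptions: the helper names are hypothetical, the threshold is in whatever unit `missing_percentage` uses, and because the text does not say whether per-feature entries are dicts or objects, the accessor handles both.

```python
def _field(record, name):
    """Read a field from either a dict-like or an attribute-style record."""
    return record[name] if isinstance(record, dict) else getattr(record, name)


def features_missing_at_least(report, threshold):
    """Names of features whose missing_percentage is >= threshold."""
    return [
        _field(entry, "feature")
        for entry in report
        if _field(entry, "missing_percentage") >= threshold
    ]


# Hand-written sample entries, not real report data
sample_report = [
    {"feature": "age", "missing_count": 40, "missing_percentage": 0.4},
    {"feature": "income", "missing_count": 5, "missing_percentage": 0.05},
]
print(features_missing_at_least(sample_report, 0.1))
```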
Models¶
Model¶
-
class
datarobot.models.
Model
(id=None, processes=None, featurelist_name=None, featurelist_id=None, project_id=None, sample_pct=None, training_row_count=None, training_duration=None, training_start_date=None, training_end_date=None, model_type=None, model_category=None, is_frozen=None, blueprint_id=None, metrics=None, project=None, data=None, monotonic_increasing_featurelist_id=None, monotonic_decreasing_featurelist_id=None, supports_monotonic_constraints=None, is_starred=None, prediction_threshold=None, prediction_threshold_read_only=None, model_number=None, parent_model_id=None, use_project_settings=None)¶ A model trained on a project’s dataset capable of making predictions
Attributes: - id : str
the id of the model
- project_id : str
the id of the project the model belongs to
- processes : list of str
the processes used by the model
- featurelist_name : str
the name of the featurelist used by the model
- featurelist_id : str
the id of the featurelist used by the model
- sample_pct : float or None
the percentage of the project dataset used in training the model. If the project uses datetime partitioning, the sample_pct will be None. See training_row_count, training_duration, and training_start_date and training_end_date instead.
- training_row_count : int or None
the number of rows of the project dataset used in training the model. In a datetime partitioned project, if specified, defines the number of rows used to train the model and evaluate backtest scores; if unspecified, either training_duration or training_start_date and training_end_date was used to determine that instead.
- training_duration : str or None
only present for models in datetime partitioned projects. If specified, a duration string specifying the duration spanned by the data used to train the model and evaluate backtest scores.
- training_start_date : datetime or None
only present for frozen models in datetime partitioned projects. If specified, the start date of the data used to train the model.
- training_end_date : datetime or None
only present for frozen models in datetime partitioned projects. If specified, the end date of the data used to train the model.
- model_type : str
what model this is, e.g. ‘Nystroem Kernel SVM Regressor’
- model_category : str
what kind of model this is - ‘prime’ for DataRobot Prime models, ‘blend’ for blender models, and ‘model’ for other models
- is_frozen : bool
whether this model is a frozen model
- blueprint_id : str
the id of the blueprint used in this model
- metrics : dict
a mapping from each metric to the model’s scores for that metric
- monotonic_increasing_featurelist_id : str
optional, the id of the featurelist that defines the set of features with a monotonically increasing relationship to the target. If None, no such constraints are enforced.
- monotonic_decreasing_featurelist_id : str
optional, the id of the featurelist that defines the set of features with a monotonically decreasing relationship to the target. If None, no such constraints are enforced.
- supports_monotonic_constraints : bool
optional, whether this model supports enforcing monotonic constraints
- is_starred : bool
whether this model is marked as starred
- prediction_threshold : float
for binary classification projects, the threshold used for predictions
- prediction_threshold_read_only : bool
indicates whether modification of the prediction threshold is forbidden. Threshold modification is forbidden once a model has had a deployment created or predictions made via the dedicated prediction API.
- model_number : integer
model number assigned to a model
- parent_model_id : str or None
(New in version v2.20) the id of the model that tuning parameters are derived from
- use_project_settings : bool or None
(New in version v2.20) Only present for models in datetime-partitioned projects. If True, indicates that the custom backtest partitioning settings specified by the user were used to train the model and evaluate backtest scores.
-
classmethod
get
(project, model_id)¶ Retrieve a specific model.
Parameters: - project : str
The project’s id.
- model_id : str
The model_id of the leaderboard item to retrieve.
Returns: - model : Model
The queried instance.
Raises: - ValueError
The passed project parameter value is of an unsupported type.
-
classmethod
fetch_resource_data
(url, join_endpoint=True)¶ (Deprecated.) Used to acquire model data directly from its url.
Consider using get instead, as this is a convenience function used for development of datarobot
Parameters: - url : str
The resource we are acquiring
- join_endpoint : boolean, optional
Whether the client’s endpoint should be joined to the URL before sending the request. Location headers are returned as absolute locations, so will _not_ need the endpoint
Returns: - model_data : dict
The queried model’s data
-
get_features_used
()¶ Query the server to determine which features were used.
Note that the data returned by this method is possibly different than the names of the features in the featurelist used by this model. This method will return the raw features that must be supplied in order for predictions to be generated on a new set of data. The featurelist, in contrast, would also include the names of derived features.
Returns: - features : list of str
The names of the features used in the model.
-
get_supported_capabilities
()¶ Retrieves a summary of the capabilities supported by a model.
New in version v2.14.
Returns: - supportsBlending: bool
whether the model supports blending
- supportsMonotonicConstraints: bool
whether the model supports monotonic constraints
- hasWordCloud: bool
whether the model has word cloud data available
- eligibleForPrime: bool
whether the model is eligible for Prime
- hasParameters: bool
whether the model has parameters that can be retrieved
- supportsCodeGeneration: bool
(New in version v2.18) whether the model supports code generation
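For illustration, assuming the summary comes back as a dict keyed by the names listed above, the supported capabilities can be listed like this. The helper name and the sample values are hypothetical.

```python
def enabled_capabilities(capabilities):
    """Sorted names of the capabilities whose value is True."""
    return sorted(name for name, value in capabilities.items() if value is True)


# Hand-written sample, not real server output
sample_capabilities = {
    "supportsBlending": True,
    "supportsMonotonicConstraints": False,
    "hasWordCloud": False,
    "eligibleForPrime": True,
    "hasParameters": True,
    "supportsCodeGeneration": False,
}
print(enabled_capabilities(sample_capabilities))
```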
-
delete
()¶ Delete a model from the project’s leaderboard.
-
get_leaderboard_ui_permalink
()¶ Returns: - url : str
Permanent static hyperlink to this model at leaderboard.
-
open_model_browser
()¶ Opens model at project leaderboard in web browser.
Note: If text-mode browsers are used, the calling process will block until the user exits the browser.
-
train
(sample_pct=None, featurelist_id=None, scoring_type=None, training_row_count=None, monotonic_increasing_featurelist_id=<object object>, monotonic_decreasing_featurelist_id=<object object>)¶ Train the blueprint used in model on a particular featurelist or amount of data.
This method creates a new training job for worker and appends it to the end of the queue for this project. After the job has finished you can get the newly trained model by retrieving it from the project leaderboard, or by retrieving the result of the job.
Either sample_pct or training_row_count can be used to specify the amount of data to use, but not both. If neither is specified, a default of the maximum amount of data that can safely be used to train any blueprint without going into the validation data will be selected.
In smart-sampled projects, sample_pct and training_row_count are assumed to be in terms of rows of the minority class.
Note
For datetime partitioned projects, see train_datetime instead.
Parameters: - sample_pct : float, optional
The amount of data to use for training, as a percentage of the project dataset from 0 to 100.
- featurelist_id : str, optional
The identifier of the featurelist to use. If not defined, the featurelist of this model is used.
- scoring_type : str, optional
Either SCORING_TYPE.validation or SCORING_TYPE.cross_validation. SCORING_TYPE.validation is available for every partitioning type, and indicates that the default model validation should be used for the project. If the project uses a form of cross-validation partitioning, SCORING_TYPE.cross_validation can also be used to indicate that all of the available training/validation combinations should be used to evaluate the model.
- training_row_count : int, optional
The number of rows to use to train the requested model.
- monotonic_increasing_featurelist_id : str
(new in version 2.11) optional, the id of the featurelist that defines the set of features with a monotonically increasing relationship to the target. Passing None disables increasing monotonicity constraint. Default (dr.enums.MONOTONICITY_FEATURELIST_DEFAULT) is the one specified by the blueprint.
- monotonic_decreasing_featurelist_id : str
(new in version 2.11) optional, the id of the featurelist that defines the set of features with a monotonically decreasing relationship to the target. Passing None disables decreasing monotonicity constraint. Default (dr.enums.MONOTONICITY_FEATURELIST_DEFAULT) is the one specified by the blueprint.
Returns: - model_job_id : str
id of created job, can be used as parameter to ModelJob.get method or wait_for_async_model_creation function
Examples
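from datarobot import Model, Project

project = Project.get('p-id')
model = Model.get('p-id', 'l-id')
# Retrain the same blueprint on the project's maximum training row count
model_job_id = model.train(training_row_count=project.max_train_rows)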
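The rule that sample_pct and training_row_count cannot be combined can be checked before submitting a job. A minimal sketch; the helper name `training_size_kwargs` is hypothetical and the client may report such errors differently.

```python
def training_size_kwargs(sample_pct=None, training_row_count=None):
    """Build the size-related keyword arguments for Model.train."""
    if sample_pct is not None and training_row_count is not None:
        raise ValueError("specify sample_pct or training_row_count, not both")
    if sample_pct is not None:
        # sample_pct is a percentage of the project dataset, from 0 to 100
        if not 0 <= sample_pct <= 100:
            raise ValueError("sample_pct must be between 0 and 100")
        return {"sample_pct": sample_pct}
    if training_row_count is not None:
        return {"training_row_count": training_row_count}
    # Neither given: let the server pick its default amount of data
    return {}


# Intended use (requires a configured client and a real model):
#     model_job_id = model.train(**training_size_kwargs(sample_pct=64))
print(training_size_kwargs(sample_pct=64))
```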
-
train_datetime
(featurelist_id=None, training_row_count=None, training_duration=None, time_window_sample_pct=None, monotonic_increasing_featurelist_id=<object object>, monotonic_decreasing_featurelist_id=<object object>, use_project_settings=False)¶ Train this model on a different featurelist or amount of data
Requires that this model is part of a datetime partitioned project; otherwise, an error will occur.
Parameters: - featurelist_id : str, optional
the featurelist to use to train the model. If not specified, the featurelist of this model is used.
- training_row_count : int, optional
the number of rows of data that should be used to train the model. If specified, neither training_duration nor use_project_settings may be specified.
- training_duration : str, optional
a duration string specifying what time range the data used to train the model should span. If specified, neither training_row_count nor use_project_settings may be specified.
- use_project_settings : bool, optional
(New in version v2.20) defaults to False. If True, indicates that the custom backtest partitioning settings specified by the user will be used to train the model and evaluate backtest scores. If specified, neither training_row_count nor training_duration may be specified.
- time_window_sample_pct : int, optional
may only be specified when the requested model is a time window (e.g. duration or start and end dates). An integer between 1 and 99 indicating the percentage to sample by within the window. The points kept are determined by a random uniform sample. If specified, training_duration must also be specified; otherwise an error will occur.
- monotonic_increasing_featurelist_id : str, optional
(New in version v2.18) optional, the id of the featurelist that defines the set of features with a monotonically increasing relationship to the target. Passing None disables increasing monotonicity constraint. Default (dr.enums.MONOTONICITY_FEATURELIST_DEFAULT) is the one specified by the blueprint.
- monotonic_decreasing_featurelist_id : str, optional
(New in version v2.18) optional, the id of the featurelist that defines the set of features with a monotonically decreasing relationship to the target. Passing None disables decreasing monotonicity constraint. Default (dr.enums.MONOTONICITY_FEATURELIST_DEFAULT) is the one specified by the blueprint.
Returns: - job : ModelJob
the created job to build the model
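The constraints between these parameters can be summarized as a client-side check. This is a sketch of the rules as stated above, not the library's own validation; the function name is hypothetical and `"P1Y"` is only a sample duration string.

```python
def check_train_datetime_args(
    training_row_count=None,
    training_duration=None,
    use_project_settings=False,
    time_window_sample_pct=None,
):
    """Return the name of the chosen sizing option, or None if none was given."""
    options = (
        ("training_row_count", training_row_count is not None),
        ("training_duration", training_duration is not None),
        ("use_project_settings", bool(use_project_settings)),
    )
    chosen = [name for name, given in options if given]
    if len(chosen) > 1:
        raise ValueError("mutually exclusive: " + ", ".join(chosen))
    if time_window_sample_pct is not None:
        if not 1 <= time_window_sample_pct <= 99:
            raise ValueError("time_window_sample_pct must be between 1 and 99")
        if training_duration is None:
            raise ValueError("time_window_sample_pct requires training_duration")
    return chosen[0] if chosen else None


print(check_train_datetime_args(training_duration="P1Y", time_window_sample_pct=50))
```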
-
retrain
(sample_pct=None, featurelist_id=None, training_row_count=None)¶ Submit a job to the queue to train a blender model.
Parameters: - sample_pct: str, optional
The sample size as a percentage (1 to 100) to use in training. If this parameter is used then training_row_count should not be given.
- featurelist_id : str, optional
The featurelist id
- training_row_count : str, optional
The number of rows to train the model. If this parameter is used then sample_pct should not be given.
Returns: - job : ModelJob
The created job that is retraining the model
-
request_predictions
(dataset_id, include_prediction_intervals=None, prediction_intervals_size=None, forecast_point=None, predictions_start_date=None, predictions_end_date=None)¶ Request predictions against a previously uploaded dataset
Parameters: - dataset_id : string
The dataset to make predictions against (as uploaded from Project.upload_dataset)
- include_prediction_intervals : bool, optional
(New in v2.16) For time series projects only. Specifies whether prediction intervals should be calculated for this request. Defaults to True if prediction_intervals_size is specified, otherwise defaults to False.
- prediction_intervals_size : int, optional
(New in v2.16) For time series projects only. Represents the percentile to use for the size of the prediction intervals. Defaults to 80 if include_prediction_intervals is True. Prediction intervals size must be between 1 and 100 (inclusive).
- forecast_point : datetime.datetime or None, optional
(New in version v2.20) For time series projects only. This is the default point relative to which predictions will be generated, based on the forecast window of the project. See the time series prediction documentation for more information.
- predictions_start_date : datetime.datetime or None, optional
(New in version v2.20) For time series projects only. The start date for bulk predictions. Note that this parameter is for generating historical predictions using the training data. This parameter should be provided in conjunction with predictions_end_date. Can’t be provided with the forecast_point parameter.
- predictions_end_date : datetime.datetime or None, optional
(New in version v2.20) For time series projects only. The end date for bulk predictions, exclusive. Note that this parameter is for generating historical predictions using the training data. This parameter should be provided in conjunction with predictions_start_date. Can’t be provided with the forecast_point parameter.
Returns: - job : PredictJob
The job computing the predictions
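The defaulting rules for the prediction interval parameters and the date constraints can be written out as follows. This is a sketch of the behaviour described above, using hypothetical helper names; the actual defaults are applied by the service, not by this code.

```python
def resolve_interval_settings(include_prediction_intervals=None,
                              prediction_intervals_size=None):
    """Return the effective (include, size) pair under the documented defaults."""
    if prediction_intervals_size is not None:
        if not 1 <= prediction_intervals_size <= 100:
            raise ValueError("prediction_intervals_size must be between 1 and 100")
        # A given size implies intervals unless the caller said otherwise
        include = True if include_prediction_intervals is None else include_prediction_intervals
        return include, prediction_intervals_size
    include = bool(include_prediction_intervals)
    # The size defaults to 80 only when intervals are requested
    return include, (80 if include else None)


def check_prediction_dates(forecast_point=None, predictions_start_date=None,
                           predictions_end_date=None):
    """Raise ValueError if the date arguments contradict the rules above."""
    if (predictions_start_date is None) != (predictions_end_date is None):
        raise ValueError("predictions_start_date and predictions_end_date go together")
    if forecast_point is not None and predictions_start_date is not None:
        raise ValueError("forecast_point cannot be combined with a date range")
```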
-
get_feature_impact
()¶ Retrieve the computed Feature Impact results, a measure of the relevance of each feature in the model.
Feature Impact is computed for each column by creating new data with that column randomly permuted (but the others left unchanged), and seeing how the error metric score for the predictions is affected. The ‘impactUnnormalized’ is how much worse the error metric score is when making predictions on this modified data. The ‘impactNormalized’ is normalized so that the largest value is 1. In both cases, larger values indicate more important features.
If a feature is a redundant feature, i.e. once other features are considered it doesn’t contribute much in addition, the ‘redundantWith’ value is the name of the feature that has the highest correlation with this feature. Note that redundancy detection is only available for jobs run after the addition of this feature. When retrieving data that predates this functionality, a NoRedundancyImpactAvailable warning will be issued.
Elsewhere this technique is sometimes called ‘Permutation Importance’.
Requires that Feature Impact has already been computed with request_feature_impact.
Returns: - feature_impacts : list of dict
The feature impact data. Each item is a dict with the keys ‘featureName’, ‘impactNormalized’, ‘impactUnnormalized’, and ‘redundantWith’.
Raises: - ClientError (404)
If the feature impacts have not been computed.
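The permutation technique described above can be sketched in a few lines of plain Python. This toy version is not DataRobot's implementation: the model, the data, the use of mean squared error, and the function names are all assumptions made for illustration. Only the output keys mirror the ones listed above.

```python
import random


def mean_squared_error(actual, predicted):
    return sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual)


def permutation_impact(predict, rows, targets, feature_names, seed=0):
    """Score each feature by how much the error grows when it is shuffled."""
    rng = random.Random(seed)
    baseline = mean_squared_error(targets, [predict(r) for r in rows])
    raw = []
    for col in range(len(feature_names)):
        shuffled = [r[col] for r in rows]
        rng.shuffle(shuffled)  # permute one column, leave the others unchanged
        permuted = [r[:col] + (v,) + r[col + 1:] for r, v in zip(rows, shuffled)]
        error = mean_squared_error(targets, [predict(r) for r in permuted])
        raw.append(error - baseline)  # how much worse the score became
    largest = max(raw)
    return [
        {
            "featureName": name,
            "impactUnnormalized": value,
            # scaled so that the largest impact is 1
            "impactNormalized": value / largest if largest > 0 else 0.0,
        }
        for name, value in zip(feature_names, raw)
    ]


# Toy data: the target depends on x0 only, so x1 should have no impact
rows = [(float(i), float(i % 3)) for i in range(20)]
targets = [2.0 * r[0] for r in rows]
impacts = permutation_impact(lambda r: 2.0 * r[0], rows, targets, ["x0", "x1"])
```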
-
get_multiclass_feature_impact
()¶ For multiclass it’s possible to calculate feature impact separately for each target class. The method for calculation is exactly the same, calculated in one-vs-all style for each target class.
Requires that Feature Impact has already been computed with request_feature_impact.
Returns: - feature_impacts : list of dict
The feature impact data. Each item is a dict with the keys ‘featureImpacts’ (list), ‘class’ (str). Each item in ‘featureImpacts’ is a dict with the keys ‘featureName’, ‘impactNormalized’, ‘impactUnnormalized’, and ‘redundantWith’.
Raises: - ClientError (404)
If the multiclass feature impacts have not been computed.
-
request_feature_impact
()¶ Request feature impacts to be computed for the model.
See get_feature_impact for more information on the result of the job.
Returns: - job : Job
A Job representing the feature impact computation. To get the completed feature impact data, use job.get_result or job.get_result_when_complete.
Raises: - JobAlreadyRequested (422)
If the feature impacts have already been requested.
-
get_or_request_feature_impact
(max_wait=600)¶ Retrieve feature impact for the model, requesting a job if it hasn’t been run previously
Parameters: - max_wait : int, optional
The maximum time to wait for a requested feature impact job to complete before erroring
Returns: - feature_impacts : list of dict
The feature impact data. See
get_feature_impact
for the exact schema.
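A similar effect can be obtained by combining the methods documented above. This is a sketch of the pattern, not the library's implementation of get_or_request_feature_impact; the function name is hypothetical, and the fallback ClientError class exists only so the sketch runs where the client is not installed.

```python
try:
    from datarobot.errors import ClientError
except ImportError:
    # Stand-in so the sketch runs without the client installed
    class ClientError(Exception):
        def __init__(self, exc_message, status_code):
            super().__init__(exc_message)
            self.status_code = status_code


def feature_impact_with_fallback(model, max_wait=600):
    """Return feature impact, requesting the computation if it is missing."""
    try:
        return model.get_feature_impact()
    except ClientError as exc:
        # A 404 means feature impact has not been computed yet
        if getattr(exc, "status_code", None) != 404:
            raise
    job = model.request_feature_impact()
    return job.get_result_when_complete(max_wait=max_wait)
```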
-
get_feature_effect_metadata
()¶ Retrieve Feature Effect metadata. Response contains status and available model sources.
- Feature Effect of training is always available (except for the old project which supports only Feature Effect for validation).
- When a model is trained into validation or holdout without stacked prediction (e.g. no out-of-sample prediction in validation or holdout), Feature Effect is not available for validation or holdout.
- Feature Effect for holdout is not available when there is no holdout configured for the project.
source is the expected parameter to retrieve Feature Effect. One of the provided sources shall be used.
Returns: - feature_effect_metadata: FeatureEffectMetadata
-
get_feature_fit_metadata
()¶ Retrieve Feature Fit metadata. Response contains status and available model sources.
- Feature Fit of training is always available (except for the old project which supports only Feature Fit for validation).
- When a model is trained into validation or holdout without stacked prediction (e.g. no out-of-sample prediction in validation or holdout), Feature Fit is not available for validation or holdout.
- Feature Fit for holdout is not available when there is no holdout configured for the project.
source is the expected parameter to retrieve Feature Fit. One of the provided sources shall be used.
Returns: - feature_effect_metadata: FeatureFitMetadata
-
request_feature_effect
()¶ Request feature effects to be computed for the model.
See get_feature_effect for more information on the result of the job.
Returns: - job : Job
A Job representing the feature effect computation. To get the completed feature effect data, use job.get_result or job.get_result_when_complete.
Raises: - JobAlreadyRequested (422)
If the feature effects have already been requested.
-
get_feature_effect
(source)¶ Retrieve Feature Effects for the model.
Feature Effects provides partial dependence and predicted vs actual values for top-500 features ordered by feature impact score.
The partial dependence shows marginal effect of a feature on the target variable after accounting for the average effects of all other predictive features. It indicates how, holding all other variables except the feature of interest as they were, the value of this feature affects your prediction.
Requires that Feature Effects has already been computed with request_feature_effect.
See get_feature_effect_metadata for information on the available sources.
Parameters: - source : string
The source Feature Effects are retrieved for.
Returns: - feature_effects : FeatureEffects
The feature effects data.
Raises: - ClientError (404)
If the feature effects have not been computed or source is not a valid value.
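The idea behind partial dependence, as described above, can be shown on a toy model: fix the feature of interest to each grid value in every row, leave the other features as they were, and average the predictions. This sketch is not how DataRobot computes Feature Effects; the function name, model, and data are assumptions for illustration.

```python
def partial_dependence(predict, rows, feature_index, grid):
    """Average prediction for each grid value of one feature."""
    curve = []
    for value in grid:
        # Overwrite only the feature of interest; keep the rest of each row
        modified = [
            row[:feature_index] + (value,) + row[feature_index + 1:]
            for row in rows
        ]
        curve.append(sum(predict(r) for r in modified) / len(modified))
    return curve


# Toy model and data: prediction = 3 * x0 + x1
toy_rows = [(0.0, 1.0), (0.0, 3.0)]
curve = partial_dependence(lambda r: 3.0 * r[0] + r[1], toy_rows, 0, [0.0, 1.0, 2.0])
print(curve)  # [2.0, 5.0, 8.0]
```

The slope of 3 between grid points reflects the coefficient of x0 in the toy model, while the average of x1 (here 2.0) sets the offset.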
-
get_or_request_feature_effect
(source, max_wait=600)¶ Retrieve feature effect for the model, requesting a job if it hasn’t been run previously
See get_feature_effect_metadata for information on the available sources.
Parameters: - max_wait : int, optional
The maximum time to wait for a requested feature effect job to complete before erroring
- source : string
The source Feature Effects are retrieved for.
Returns: - feature_effects : FeatureEffects
The feature effects data.
-
request_feature_fit
()¶ Request feature fit to be computed for the model.
See get_feature_fit for more information on the result of the job.
Returns: - job : Job
A Job representing the feature fit computation. To get the completed feature fit data, use job.get_result or job.get_result_when_complete.
Raises: - JobAlreadyRequested (422)
If the feature fit has already been requested.
-
get_feature_fit
(source)¶ Retrieve Feature Fit for the model.
Feature Fit provides partial dependence and predicted vs actual values for top-500 features ordered by feature importance score.
The partial dependence shows marginal effect of a feature on the target variable after accounting for the average effects of all other predictive features. It indicates how, holding all other variables except the feature of interest as they were, the value of this feature affects your prediction.
Requires that Feature Fit has already been computed with request_feature_fit.
See get_feature_fit_metadata for information on the available sources.
Parameters: - source : string
The source Feature Fit is retrieved for. One value of [FeatureFitMetadata.sources].
Returns: - feature_fit : FeatureFit
The feature fit data.
Raises: - ClientError (404)
If the feature fit has not been computed or source is not a valid value.
-
get_or_request_feature_fit
(source, max_wait=600)¶ Retrieve feature fit for the model, requesting a job if it hasn’t been run previously
See get_feature_fit_metadata for information on the available sources.
Parameters: - max_wait : int, optional
The maximum time to wait for a requested feature fit job to complete before erroring
- source : string
The source Feature Fit are retrieved for. One value of [FeatureFitMetadata.sources].
Returns: - feature_effects : FeatureFit
The feature fit data.
-
get_prime_eligibility
()¶ Check if this model can be approximated with DataRobot Prime
Returns: - prime_eligibility : dict
a dict indicating whether a model can be approximated with DataRobot Prime (key can_make_prime) and why it may be ineligible (key message)
-
request_approximation
()¶ Request an approximation of this model using DataRobot Prime
This will create several rulesets that could be used to approximate this model. After comparing their scores and rule counts, the code used in the approximation can be downloaded and run locally.
Returns: - job : Job
the job generating the rulesets
-
get_rulesets
()¶ List the rulesets approximating this model generated by DataRobot Prime
If this model hasn’t been approximated yet, will return an empty list. Note that these are rulesets approximating this model, not rulesets used to construct this model.
Returns: - rulesets : list of Ruleset
-
download_export
(filepath)¶ Download an exportable model file for use in an on-premise DataRobot standalone prediction environment.
This function can only be used if model export is enabled, and will only be useful if you have an on-premise environment in which to import it.
Parameters: - filepath : str
The path at which to save the exported model file.
-
request_transferable_export
(prediction_intervals_size=None)¶ Request generation of an exportable model file for use in an on-premise DataRobot standalone prediction environment.
This function can only be used if model export is enabled, and will only be useful if you have an on-premise environment in which to import it.
This function does not download the exported file. Use download_export for that.
Parameters: - prediction_intervals_size : int, optional
(New in v2.19) For time series projects only. Represents the percentile to use for the size of the prediction intervals. Prediction intervals size must be between 1 and 100 (inclusive).
Examples
import datarobot

model = datarobot.Model.get('p-id', 'l-id')
job = model.request_transferable_export()
job.wait_for_completion()
model.download_export('my_exported_model.drmodel')

# Client must be configured to use standalone prediction server for import:
datarobot.Client(token='my-token-at-standalone-server', endpoint='standalone-server-url/api/v2')
imported_model = datarobot.ImportedModel.create('my_exported_model.drmodel')
-
request_frozen_model
(sample_pct=None, training_row_count=None)¶ Train a new frozen model with parameters from this model
Note
This method only works if the project the model belongs to is not datetime partitioned. If it is, use request_frozen_datetime_model instead.
Frozen models use the same tuning parameters as their parent model instead of independently optimizing them to allow efficiently retraining models on larger amounts of the training data.
Parameters: - sample_pct : float
optional, the percentage of the dataset to use with the model. If not provided, will use the value from this model.
- training_row_count : int
(New in version v2.9) optional, the integer number of rows of the dataset to use with the model. Only one of sample_pct and training_row_count should be specified.
Returns: - model_job : ModelJob
the modeling job training a frozen model
-
request_frozen_datetime_model
(training_row_count=None, training_duration=None, training_start_date=None, training_end_date=None, time_window_sample_pct=None)¶ Train a new frozen model with parameters from this model
Requires that this model belongs to a datetime partitioned project. If it does not, an error will occur when submitting the job.
Frozen models use the same tuning parameters as their parent model instead of independently optimizing them to allow efficiently retraining models on larger amounts of the training data.
In addition to training_row_count and training_duration, frozen datetime models may be trained on an exact date range. Only one of training_row_count, training_duration, or training_start_date and training_end_date should be specified.
Models specified using training_start_date and training_end_date are the only ones that can be trained into the holdout data (once the holdout is unlocked).
Parameters: - training_row_count : int, optional
the number of rows of data that should be used to train the model. If specified, training_duration may not be specified.
- training_duration : str, optional
a duration string specifying what time range the data used to train the model should span. If specified, training_row_count may not be specified.
- training_start_date : datetime.datetime, optional
the start date of the data to train the model on. Only rows occurring at or after this datetime will be used. If training_start_date is specified, training_end_date must also be specified.
- training_end_date : datetime.datetime, optional
the end date of the data to train the model on. Only rows occurring strictly before this datetime will be used. If training_end_date is specified, training_start_date must also be specified.
- time_window_sample_pct : int, optional
may only be specified when the requested model is a time window (e.g. duration or start and end dates). An integer between 1 and 99 indicating the percentage to sample by within the window. The points kept are determined by a random uniform sample. If specified, a time window (for example training_duration) must also be specified; otherwise an error will occur.
Returns: - model_job : ModelJob
the modeling job training a frozen model
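The argument rules above can be checked locally before a job is submitted. The following is a minimal sketch: frozen_datetime_kwargs is a hypothetical helper, not part of the client, and it encodes only the rules stated in this section.

```python
def frozen_datetime_kwargs(training_row_count=None, training_duration=None,
                           training_start_date=None, training_end_date=None,
                           time_window_sample_pct=None):
    """Hypothetical helper: validate the documented rules and return kwargs to pass on."""
    # Start and end dates must be given together.
    if (training_start_date is None) != (training_end_date is None):
        raise ValueError("training_start_date and training_end_date go together")

    has_dates = training_start_date is not None
    chosen = [training_row_count is not None, training_duration is not None, has_dates]
    if sum(chosen) > 1:
        raise ValueError("specify only one of row count, duration, or a date range")

    if time_window_sample_pct is not None:
        # Sampling applies only to a time window (duration or date range).
        if not (training_duration is not None or has_dates):
            raise ValueError("time_window_sample_pct requires a time window")
        if not 1 <= time_window_sample_pct <= 99:
            raise ValueError("time_window_sample_pct must be between 1 and 99")

    candidates = {
        "training_row_count": training_row_count,
        "training_duration": training_duration,
        "training_start_date": training_start_date,
        "training_end_date": training_end_date,
        "time_window_sample_pct": time_window_sample_pct,
    }
    # Drop unset arguments so the method's own defaults apply.
    return {name: value for name, value in candidates.items() if value is not None}
```

The result could then be forwarded as model.request_frozen_datetime_model(**kwargs). The server remains the authority on what is accepted; this check only catches the combinations this section rules out.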
-
get_parameters
()¶ Retrieve model parameters.
Returns: - ModelParameters
Model parameters for this model.
-
get_lift_chart
(source, fallback_to_parent_insights=False)¶ Retrieve model lift chart for the specified source.
Parameters: - source : str
Lift chart data source. Check datarobot.enums.CHART_DATA_SOURCE for possible values.
- fallback_to_parent_insights : bool
(New in version v2.14) Optional, if True, this will return lift chart data for this model’s parent if the lift chart is not available for this model and the model has a defined parent model. If omitted or False, or there is no parent model, will not attempt to return insight data from this model’s parent.
Returns: - LiftChart
Model lift chart data
Raises: - ClientError
If the insight is not available for this model
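Because get_lift_chart raises when the insight is missing, callers may want to treat that case as "no chart". A minimal sketch follows; lift_chart_or_none and the _OfflineModel stand-in are hypothetical names used only so the example runs without a server. With the real client, the exception type to pass would be the client's ClientError, and the source strings used here ("validation", "holdout") are assumed to match datarobot.enums.CHART_DATA_SOURCE.

```python
def lift_chart_or_none(model, source, error_type=Exception):
    """Return the lift chart for `source`, or None if the insight is unavailable."""
    try:
        # Ask for the parent model's insight when this model has none.
        return model.get_lift_chart(source, fallback_to_parent_insights=True)
    except error_type:
        return None


class _OfflineModel:
    """Stand-in for a real Model so the sketch runs offline (not part of the client)."""

    def get_lift_chart(self, source, fallback_to_parent_insights=False):
        if source != "validation":
            raise LookupError("insight not available")
        return {"source": source}


chart = lift_chart_or_none(_OfflineModel(), "validation", error_type=LookupError)
missing = lift_chart_or_none(_OfflineModel(), "holdout", error_type=LookupError)
```

Passing the exception type as an argument keeps the helper from swallowing unrelated errors such as network failures.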
-
get_multiclass_lift_chart
(source, fallback_to_parent_insights=False)¶ Retrieve model lift chart for the specified source.
Parameters: - source : str
Lift chart data source. Check datarobot.enums.CHART_DATA_SOURCE for possible values.
- fallback_to_parent_insights : bool
Optional, if True, this will return lift chart data for this model’s parent if the lift chart is not available for this model and the model has a defined parent model. If omitted or False, or there is no parent model, will not attempt to return insight data from this model’s parent.
Returns: - list of LiftChart
Model lift chart data for each saved target class
Raises: - ClientError
If the insight is not available for this model
-
get_all_lift_charts
(fallback_to_parent_insights=False)¶ Retrieve a list of all lift charts available for the model.
Parameters: - fallback_to_parent_insights : bool
(New in version v2.14) Optional, if True, this will return lift chart data for this model’s parent for any source that is not available for this model and if this model has a defined parent model. If omitted or False, or this model has no parent, this will not attempt to retrieve any data from this model’s parent.
Returns: - list of LiftChart
Data for all available model lift charts.
-
get_residuals_chart
(source, fallback_to_parent_insights=False)¶ Retrieve model residuals chart for the specified source.
Parameters: - source : str
Residuals chart data source. Check datarobot.enums.CHART_DATA_SOURCE for possible values.
- fallback_to_parent_insights : bool
Optional, if True, this will return residuals chart data for this model’s parent if the residuals chart is not available for this model and the model has a defined parent model. If omitted or False, or there is no parent model, will not attempt to return residuals data from this model’s parent.
Returns: - ResidualsChart
Model residuals chart data
Raises: - ClientError
If the insight is not available for this model
-
get_all_residuals_charts
(fallback_to_parent_insights=False)¶ Retrieve a list of all residuals charts available for the model.
Parameters: - fallback_to_parent_insights : bool
Optional, if True, this will return residuals chart data for this model’s parent for any source that is not available for this model and if this model has a defined parent model. If omitted or False, or this model has no parent, this will not attempt to retrieve any data from this model’s parent.
Returns: - list of ResidualsChart
Data for all available model residuals charts.
-
get_pareto_front
()¶ Retrieve the Pareto Front for a Eureqa model.
This method is only supported for Eureqa models.
Returns: - ParetoFront
Model ParetoFront data
-
get_confusion_chart
(source, fallback_to_parent_insights=False)¶ Retrieve model’s confusion chart for the specified source.
Parameters: - source : str
Confusion chart source. Check datarobot.enums.CHART_DATA_SOURCE for possible values.
- fallback_to_parent_insights : bool
(New in version v2.14) Optional, if True, this will return confusion chart data for this model’s parent if the confusion chart is not available for this model and the model has a defined parent model. If omitted or False, or there is no parent model, will not attempt to return insight data from this model’s parent.
Returns: - ConfusionChart
Model ConfusionChart data
Raises: - ClientError
If the insight is not available for this model
-
get_all_confusion_charts
(fallback_to_parent_insights=False)¶ Retrieve a list of all confusion charts available for the model.
Parameters: - fallback_to_parent_insights : bool
(New in version v2.14) Optional, if True, this will return confusion chart data for this model’s parent for any source that is not available for this model and if this model has a defined parent model. If omitted or False, or this model has no parent, this will not attempt to retrieve any data from this model’s parent.
Returns: - list of ConfusionChart
Data for all available confusion charts for model.
-
get_roc_curve
(source, fallback_to_parent_insights=False)¶ Retrieve model ROC curve for the specified source.
Parameters: - source : str
ROC curve data source. Check datarobot.enums.CHART_DATA_SOURCE for possible values.
- fallback_to_parent_insights : bool
(New in version v2.14) Optional, if True, this will return ROC curve data for this model’s parent if the ROC curve is not available for this model and the model has a defined parent model. If omitted or False, or there is no parent model, will not attempt to return data from this model’s parent.
Returns: - RocCurve
Model ROC curve data
Raises: - ClientError
If the insight is not available for this model
-
get_all_roc_curves
(fallback_to_parent_insights=False)¶ Retrieve a list of all ROC curves available for the model.
Parameters: - fallback_to_parent_insights : bool
(New in version v2.14) Optional, if True, this will return ROC curve data for this model’s parent for any source that is not available for this model and if this model has a defined parent model. If omitted or False, or this model has no parent, this will not attempt to retrieve any data from this model’s parent.
Returns: - list of RocCurve
Data for all available model ROC curves.
-
get_word_cloud
(exclude_stop_words=False)¶ Retrieve word cloud data for the model.
Parameters: - exclude_stop_words : bool, optional
Set to True if you want stopwords filtered out of response.
Returns: - WordCloud
Word cloud data for the model.
-
download_scoring_code
(file_name, source_code=False)¶ Download scoring code JAR.
Parameters: - file_name : str
File path where scoring code will be saved.
- source_code : bool, optional
Set to True to download source code archive. It will not be executable.
-
get_model_blueprint_documents
()¶ Get documentation for tasks used in this model.
Returns: - list of BlueprintTaskDocument
All documents available for the model.
-
get_model_blueprint_chart
()¶ Retrieve a model blueprint chart that can be used to understand data flow in blueprint.
Returns: - ModelBlueprintChart
The queried model blueprint chart.
-
get_missing_report_info
()¶ Retrieve a model missing data report on training data that can be used to understand missing values treatment in a model. Report consists of missing values reports for features which took part in modelling and are numeric or categorical.
Returns: - An iterable of MissingReportPerFeature
The queried model missing report, sorted by missing count (DESCENDING order).
-
get_frozen_child_models
()¶ Retrieves the ids for all the models that are frozen from this model
Returns: - A list of Models
-
request_training_predictions
(data_subset)¶ Start a job to build training predictions
Parameters: - data_subset : str
data set definition to build predictions on. Choices are:
- dr.enums.DATA_SUBSET.ALL or string all for all data available. Not valid for models in datetime partitioned projects
- dr.enums.DATA_SUBSET.VALIDATION_AND_HOLDOUT or string validationAndHoldout for all data except training set. Not valid for models in datetime partitioned projects
- dr.enums.DATA_SUBSET.HOLDOUT or string holdout for holdout data set only
- dr.enums.DATA_SUBSET.ALL_BACKTESTS or string allBacktests for downloading the predictions for all backtest validation folds. Requires the model to have successfully scored all backtests. Datetime partitioned projects only.
Returns: - Job
an instance of created async job
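The restrictions on data_subset can be summarized in code. A minimal sketch, where allowed_data_subsets and check_data_subset are hypothetical helpers that encode only the choices listed above, using their string forms:

```python
def allowed_data_subsets(datetime_partitioned):
    """List the data_subset strings this section permits for the project type."""
    if datetime_partitioned:
        # "all" and "validationAndHoldout" are not valid for datetime partitioning.
        return ["holdout", "allBacktests"]
    # "allBacktests" applies to datetime partitioned projects only.
    return ["all", "validationAndHoldout", "holdout"]


def check_data_subset(data_subset, datetime_partitioned):
    """Raise ValueError if the subset is ruled out for this kind of project."""
    if data_subset not in allowed_data_subsets(datetime_partitioned):
        raise ValueError("data_subset %r is not valid for this project" % data_subset)
    return data_subset
```

A checked value can then be passed to model.request_training_predictions(data_subset).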
-
cross_validate
()¶ Run Cross Validation on this model.
Note
To perform Cross Validation on a new model with new parameters, use train instead.
Returns: - ModelJob
The created job to build the model
-
get_cross_validation_scores
(partition=None, metric=None)¶ Returns a dictionary keyed by metric showing cross validation scores per partition.
Cross Validation should already have been performed using cross_validate or train.
Note
Models that computed cross validation before this feature was added will need to be deleted and retrained before this method can be used.
Parameters: - partition : float
optional, the id of the partition (1, 2, 3.0, 4.0, etc.) to filter results by. Can be a positive whole number, given as an integer or a float value.
- metric: unicode
optional, the name of the metric to filter the resulting cross validation scores by
Returns: - cross_validation_scores: dict
A dictionary keyed by metric showing cross validation scores per partition.
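Once the scores are retrieved, a per-metric average is a common next step. The sketch below assumes the dictionary maps each metric name to a {partition: score} mapping; the exact shape (including any wrapper key) is not spelled out in this section, so inspect a real response before relying on it. mean_cv_scores and the sample values are hypothetical.

```python
def mean_cv_scores(cross_validation_scores):
    """Average the per-partition scores for each metric, skipping missing values."""
    means = {}
    for metric, by_partition in cross_validation_scores.items():
        values = [score for score in by_partition.values() if score is not None]
        if values:
            means[metric] = sum(values) / len(values)
    return means


# Made-up data in the assumed {metric: {partition: score}} shape.
sample_scores = {
    "AUC": {"0.0": 0.5, "1.0": 0.75, "2.0": 1.0},
    "LogLoss": {"0.0": 0.25, "1.0": 0.75, "2.0": None},
}
means = mean_cv_scores(sample_scores)
```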
-
advanced_tune
(params, description=None)¶ Generate a new model with the specified advanced-tuning parameters
As of v2.17, all models other than blenders, open source, prime, scaleout, baseline and user-created support Advanced Tuning.
Parameters: - params : dict
Mapping of parameter ID to parameter value. The list of valid parameter IDs for a model can be found by calling get_advanced_tuning_parameters(). This endpoint does not need to include values for all parameters. If a parameter is omitted, its current_value will be used.
- description : unicode
Human-readable string describing the newly advanced-tuned model
Returns: - ModelJob
The created job to build the model
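Since params is keyed by opaque parameter IDs, it can help to look those IDs up by name first. A minimal sketch, assuming the tuningParameters entries returned by get_advanced_tuning_parameters() carry taskName, parameterName and parameterId keys; build_tuning_params and the sample task, names and IDs are hypothetical.

```python
def build_tuning_params(tuning_parameters, overrides):
    """Translate {(taskName, parameterName): value} into {parameterId: value}."""
    ids = {
        (param["taskName"], param["parameterName"]): param["parameterId"]
        for param in tuning_parameters
    }
    params = {}
    for key, value in overrides.items():
        if key not in ids:
            raise KeyError("unknown tuning parameter: %r" % (key,))
        params[ids[key]] = value
    return params


# Made-up entries shaped like the "tuningParameters" list.
sample_parameters = [
    {"taskName": "Example Classifier", "parameterName": "learning_rate",
     "parameterId": "id-001", "defaultValue": 0.1, "currentValue": 0.1},
    {"taskName": "Example Classifier", "parameterName": "max_depth",
     "parameterId": "id-002", "defaultValue": 3, "currentValue": 5},
]
params = build_tuning_params(
    sample_parameters, {("Example Classifier", "learning_rate"): 0.05}
)
```

The resulting dictionary is what model.advanced_tune(params, description=...) expects. Parameters left out of overrides are omitted, so their current values are kept. Keying the lookup on the (taskName, parameterName) pair reflects that parameter names are unique only per task.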
-
get_advanced_tuning_parameters
()¶ Get the advanced-tuning parameters available for this model.
As of v2.17, all models other than blenders, open source, prime, scaleout, baseline and user-created support Advanced Tuning.
Returns: - dict
A dictionary describing the advanced-tuning parameters for the current model. There are two top-level keys, tuningDescription and tuningParameters.
tuningDescription is an optional value. If not None, it is the user-specified description of this set of tuning parameters.
tuningParameters is a list of dicts, each with the following keys
- parameterName : (unicode) name of the parameter (unique per task, see below)
- parameterId : (unicode) opaque ID string uniquely identifying parameter
- defaultValue : (*) default value of the parameter for the blueprint
- currentValue : (*) value of the parameter that was used for this model
- taskName : (unicode) name of the task that this parameter belongs to
- constraints: (dict) see the notes below
Notes
The type of defaultValue and currentValue is defined by the constraints structure. It will be a string or numeric Python type.
constraints is a dict with at least one, possibly more, of the following keys. The presence of a key indicates that the parameter may take on the specified type. (If a key is absent, this means that the parameter may not take on the specified type.) If a key on constraints is present, its value will be a dict containing all of the fields described below for that key.
"constraints": {
    "select": {
        "values": [<list(basestring or number) : possible values>]
    },
    "ascii": {},
    "unicode": {},
    "int": {
        "min": <int : minimum valid value>,
        "max": <int : maximum valid value>,
        "supports_grid_search": <bool : True if Grid Search may be requested for this param>
    },
    "float": {
        "min": <float : minimum valid value>,
        "max": <float : maximum valid value>,
        "supports_grid_search": <bool : True if Grid Search may be requested for this param>
    },
    "intList": {
        "min_length": <int : minimum valid length>,
        "max_length": <int : maximum valid length>,
        "min_val": <int : minimum valid value>,
        "max_val": <int : maximum valid value>,
        "supports_grid_search": <bool : True if Grid Search may be requested for this param>
    },
    "floatList": {
        "min_length": <int : minimum valid length>,
        "max_length": <int : maximum valid length>,
        "min_val": <float : minimum valid value>,
        "max_val": <float : maximum valid value>,
        "supports_grid_search": <bool : True if Grid Search may be requested for this param>
    }
}
The keys have meaning as follows:
- select: Rather than specifying a specific data type, if present, it indicates that the parameter is permitted to take on any of the specified values. Listed values may be of any string or real (non-complex) numeric type.
- ascii: The parameter may be a unicode object that encodes simple ASCII characters. (A-Z, a-z, 0-9, whitespace, and certain common symbols.) In addition to listed constraints, ASCII keys currently may not contain either newlines or semicolons.
- unicode: The parameter may be any Python unicode object.
- int: The value may be an object of type int within the specified range (inclusive). Please note that the value will be passed around using the JSON format, and some JSON parsers have undefined behavior with integers outside of the range [-(2**53)+1, (2**53)-1].
- float: The value may be an object of type float within the specified range (inclusive).
- intList, floatList: The value may be a list of int or float objects, respectively, following constraints as specified respectively by the int and float types (above).
Many parameters only specify one key under constraints. If a parameter specifies multiple keys, the parameter may take on any value permitted by any key.
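The "any value permitted by any key" rule can be checked client-side before calling advanced_tune. A minimal sketch: value_allowed is a hypothetical helper, and its ascii check is only an approximation (any ASCII string without newlines or semicolons), since the exact set of permitted symbols is not listed here.

```python
def value_allowed(value, constraints):
    """Return True if `value` satisfies at least one key of a constraints dict."""

    def is_number(candidate, kind):
        # bool is a subclass of int in Python, so exclude it explicitly.
        return isinstance(candidate, kind) and not isinstance(candidate, bool)

    def in_range(candidate, spec):
        return spec["min"] <= candidate <= spec["max"]

    def list_ok(candidate, spec, kind):
        return (
            isinstance(candidate, list)
            and spec["min_length"] <= len(candidate) <= spec["max_length"]
            and all(
                is_number(item, kind) and spec["min_val"] <= item <= spec["max_val"]
                for item in candidate
            )
        )

    checks = {
        "select": lambda spec: value in spec["values"],
        "ascii": lambda spec: (
            isinstance(value, str) and value.isascii()
            and "\n" not in value and ";" not in value
        ),
        "unicode": lambda spec: isinstance(value, str),
        "int": lambda spec: is_number(value, int) and in_range(value, spec),
        "float": lambda spec: is_number(value, float) and in_range(value, spec),
        "intList": lambda spec: list_ok(value, spec, int),
        "floatList": lambda spec: list_ok(value, spec, float),
    }
    # A value is acceptable if any one of the present keys permits it.
    return any(checks[key](spec) for key, spec in constraints.items() if key in checks)
```

For example, a parameter whose constraints hold both select and int keys accepts either one of the listed values or an integer inside the range.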
-
start_advanced_tuning_session
()¶ Start an Advanced Tuning session. Returns an object that helps set up arguments for an Advanced Tuning model execution.
As of v2.17, all models other than blenders, open source, prime, scaleout, baseline and user-created support Advanced Tuning.
Returns: - AdvancedTuningSession
Session for setting up and running Advanced Tuning on a model
-
star_model
()¶ Mark the model as starred
Model stars propagate to the web application and the API, and can be used to filter when listing models.
-
unstar_model
()¶ Unmark the model as starred
Model stars propagate to the web application and the API, and can be used to filter when listing models.
-
set_prediction_threshold
(threshold)¶ Set a custom prediction threshold for the model
May not be used once
prediction_threshold_read_only
is True for this model.
Parameters: - threshold : float
only used for binary classification projects. The threshold to use when deciding between the positive and negative classes when making predictions. Should be between 0.0 and 1.0 (inclusive).
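A caller can guard against both documented failure conditions (an out-of-range value and a read-only threshold) before making the request. A minimal sketch; set_threshold_if_allowed and the _OfflineModel stand-in are hypothetical and exist only so the example runs without a server.

```python
def set_threshold_if_allowed(model, threshold):
    """Set a custom threshold unless it is out of range or the model is locked."""
    if not 0.0 <= threshold <= 1.0:
        raise ValueError("threshold must be between 0.0 and 1.0 (inclusive)")
    if model.prediction_threshold_read_only:
        return False  # the threshold may no longer be changed for this model
    model.set_prediction_threshold(threshold)
    return True


class _OfflineModel:
    """Stand-in for a real Model so the sketch runs offline (not part of the client)."""

    def __init__(self, read_only):
        self.prediction_threshold_read_only = read_only
        self.prediction_threshold = 0.5

    def set_prediction_threshold(self, threshold):
        self.prediction_threshold = threshold


editable = _OfflineModel(read_only=False)
locked = _OfflineModel(read_only=True)
changed = set_threshold_if_allowed(editable, 0.3)
refused = set_threshold_if_allowed(locked, 0.3)
```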
PrimeModel¶
-
class
datarobot.models.
PrimeModel
(id=None, processes=None, featurelist_name=None, featurelist_id=None, project_id=None, sample_pct=None, training_row_count=None, training_duration=None, training_start_date=None, training_end_date=None, model_type=None, model_category=None, is_frozen=None, blueprint_id=None, metrics=None, parent_model_id=None, ruleset_id=None, rule_count=None, score=None, monotonic_increasing_featurelist_id=None, monotonic_decreasing_featurelist_id=None, supports_monotonic_constraints=None, is_starred=None, prediction_threshold=None, prediction_threshold_read_only=None, model_number=None)¶ A DataRobot Prime model approximating a parent model with downloadable code
Attributes: - id : str
the id of the model
- project_id : str
the id of the project the model belongs to
- processes : list of str
the processes used by the model
- featurelist_name : str
the name of the featurelist used by the model
- featurelist_id : str
the id of the featurelist used by the model
- sample_pct : float
the percentage of the project dataset used in training the model
- training_row_count : int or None
the number of rows of the project dataset used in training the model. In a datetime partitioned project, if specified, defines the number of rows used to train the model and evaluate backtest scores; if unspecified, either training_duration or training_start_date and training_end_date was used to determine that instead.
- training_duration : str or None
only present for models in datetime partitioned projects. If specified, a duration string specifying the duration spanned by the data used to train the model and evaluate backtest scores.
- training_start_date : datetime or None
only present for frozen models in datetime partitioned projects. If specified, the start date of the data used to train the model.
- training_end_date : datetime or None
only present for frozen models in datetime partitioned projects. If specified, the end date of the data used to train the model.
- model_type : str
what model this is, e.g. ‘DataRobot Prime’
- model_category : str
what kind of model this is - always ‘prime’ for DataRobot Prime models
- is_frozen : bool
whether this model is a frozen model
- blueprint_id : str
the id of the blueprint used in this model
- metrics : dict
a mapping from each metric to the model’s scores for that metric
- ruleset : Ruleset
the ruleset used in the Prime model
- parent_model_id : str
the id of the model that this Prime model approximates
- monotonic_increasing_featurelist_id : str
optional, the id of the featurelist that defines the set of features with a monotonically increasing relationship to the target. If None, no such constraints are enforced.
- monotonic_decreasing_featurelist_id : str
optional, the id of the featurelist that defines the set of features with a monotonically decreasing relationship to the target. If None, no such constraints are enforced.
- supports_monotonic_constraints : bool
optional, whether this model supports enforcing monotonic constraints
- is_starred : bool
whether this model is marked as starred
- prediction_threshold : float
for binary classification projects, the threshold used for predictions
- prediction_threshold_read_only : bool
indicates whether modification of the prediction threshold is forbidden. Threshold modification is forbidden once a model has had a deployment created or predictions made via the dedicated prediction API.
-
classmethod
get
(project_id, model_id)¶ Retrieve a specific prime model.
Parameters: - project_id : str
The id of the project the prime model belongs to
- model_id : str
The
model_id
of the prime model to retrieve.
Returns: - model : PrimeModel
The queried instance.
-
request_download_validation
(language)¶ Prep and validate the downloadable code for the ruleset associated with this model
Parameters: - language : str
the language the code should be downloaded in - see
datarobot.enums.PRIME_LANGUAGE
for available languages
Returns: - job : Job
A job tracking the code preparation and validation
-
advanced_tune
(params, description=None)¶ Generate a new model with the specified advanced-tuning parameters
As of v2.17, all models other than blenders, open source, prime, scaleout, baseline and user-created support Advanced Tuning.
Parameters: - params : dict
Mapping of parameter ID to parameter value. The list of valid parameter IDs for a model can be found by calling get_advanced_tuning_parameters(). This endpoint does not need to include values for all parameters. If a parameter is omitted, its current_value will be used.
- description : unicode
Human-readable string describing the newly advanced-tuned model
Returns: - ModelJob
The created job to build the model
-
cross_validate
()¶ Run Cross Validation on this model.
Note
To perform Cross Validation on a new model with new parameters, use train instead.
Returns: - ModelJob
The created job to build the model
-
delete
()¶ Delete a model from the project’s leaderboard.
-
download_export
(filepath)¶ Download an exportable model file for use in an on-premise DataRobot standalone prediction environment.
This function can only be used if model export is enabled, and will only be useful if you have an on-premise environment in which to import it.
Parameters: - filepath : str
The path at which to save the exported model file.
-
download_scoring_code
(file_name, source_code=False)¶ Download scoring code JAR.
Parameters: - file_name : str
File path where scoring code will be saved.
- source_code : bool, optional
Set to True to download source code archive. It will not be executable.
-
classmethod
fetch_resource_data
(url, join_endpoint=True)¶ (Deprecated.) Used to acquire model data directly from its url.
Consider using get instead, as this is a convenience function used for development of datarobot
Parameters: - url : str
The resource we are acquiring
- join_endpoint : boolean, optional
Whether the client’s endpoint should be joined to the URL before sending the request. Location headers are returned as absolute locations, so will _not_ need the endpoint
Returns: - model_data : dict
The queried model’s data
-
get_advanced_tuning_parameters
()¶ Get the advanced-tuning parameters available for this model.
As of v2.17, all models other than blenders, open source, prime, scaleout, baseline and user-created support Advanced Tuning.
Returns: - dict
A dictionary describing the advanced-tuning parameters for the current model. There are two top-level keys, tuningDescription and tuningParameters.
tuningDescription is an optional value. If not None, it is the user-specified description of this set of tuning parameters.
tuningParameters is a list of dicts, each with the following keys
- parameterName : (unicode) name of the parameter (unique per task, see below)
- parameterId : (unicode) opaque ID string uniquely identifying parameter
- defaultValue : (*) default value of the parameter for the blueprint
- currentValue : (*) value of the parameter that was used for this model
- taskName : (unicode) name of the task that this parameter belongs to
- constraints: (dict) see the notes below
Notes
The type of defaultValue and currentValue is defined by the constraints structure. It will be a string or numeric Python type.
constraints is a dict with at least one, possibly more, of the following keys. The presence of a key indicates that the parameter may take on the specified type. (If a key is absent, this means that the parameter may not take on the specified type.) If a key on constraints is present, its value will be a dict containing all of the fields described below for that key.
"constraints": {
    "select": {
        "values": [<list(basestring or number) : possible values>]
    },
    "ascii": {},
    "unicode": {},
    "int": {
        "min": <int : minimum valid value>,
        "max": <int : maximum valid value>,
        "supports_grid_search": <bool : True if Grid Search may be requested for this param>
    },
    "float": {
        "min": <float : minimum valid value>,
        "max": <float : maximum valid value>,
        "supports_grid_search": <bool : True if Grid Search may be requested for this param>
    },
    "intList": {
        "min_length": <int : minimum valid length>,
        "max_length": <int : maximum valid length>,
        "min_val": <int : minimum valid value>,
        "max_val": <int : maximum valid value>,
        "supports_grid_search": <bool : True if Grid Search may be requested for this param>
    },
    "floatList": {
        "min_length": <int : minimum valid length>,
        "max_length": <int : maximum valid length>,
        "min_val": <float : minimum valid value>,
        "max_val": <float : maximum valid value>,
        "supports_grid_search": <bool : True if Grid Search may be requested for this param>
    }
}
The keys have meaning as follows:
- select: Rather than specifying a specific data type, if present, it indicates that the parameter is permitted to take on any of the specified values. Listed values may be of any string or real (non-complex) numeric type.
- ascii: The parameter may be a unicode object that encodes simple ASCII characters. (A-Z, a-z, 0-9, whitespace, and certain common symbols.) In addition to listed constraints, ASCII keys currently may not contain either newlines or semicolons.
- unicode: The parameter may be any Python unicode object.
- int: The value may be an object of type int within the specified range (inclusive). Please note that the value will be passed around using the JSON format, and some JSON parsers have undefined behavior with integers outside of the range [-(2**53)+1, (2**53)-1].
- float: The value may be an object of type float within the specified range (inclusive).
- intList, floatList: The value may be a list of int or float objects, respectively, following constraints as specified respectively by the int and float types (above).
Many parameters only specify one key under constraints. If a parameter specifies multiple keys, the parameter may take on any value permitted by any key.
-
get_all_confusion_charts
(fallback_to_parent_insights=False)¶ Retrieve a list of all confusion charts available for the model.
Parameters: - fallback_to_parent_insights : bool
(New in version v2.14) Optional, if True, this will return confusion chart data for this model’s parent for any source that is not available for this model and if this model has a defined parent model. If omitted or False, or this model has no parent, this will not attempt to retrieve any data from this model’s parent.
Returns: - list of ConfusionChart
Data for all available confusion charts for model.
-
get_all_lift_charts
(fallback_to_parent_insights=False)¶ Retrieve a list of all lift charts available for the model.
Parameters: - fallback_to_parent_insights : bool
(New in version v2.14) Optional, if True, this will return lift chart data for this model’s parent for any source that is not available for this model and if this model has a defined parent model. If omitted or False, or this model has no parent, this will not attempt to retrieve any data from this model’s parent.
Returns: - list of LiftChart
Data for all available model lift charts.
-
get_all_residuals_charts
(fallback_to_parent_insights=False)¶ Retrieve a list of all residuals charts available for the model.
Parameters: - fallback_to_parent_insights : bool
Optional, if True, this will return residuals chart data for this model’s parent for any source that is not available for this model and if this model has a defined parent model. If omitted or False, or this model has no parent, this will not attempt to retrieve any data from this model’s parent.
Returns: - list of ResidualsChart
Data for all available model residuals charts.
-
get_all_roc_curves
(fallback_to_parent_insights=False)¶ Retrieve a list of all ROC curves available for the model.
Parameters: - fallback_to_parent_insights : bool
(New in version v2.14) Optional, if True, this will return ROC curve data for this model’s parent for any source that is not available for this model and if this model has a defined parent model. If omitted or False, or this model has no parent, this will not attempt to retrieve any data from this model’s parent.
Returns: - list of RocCurve
Data for all available model ROC curves.
-
get_confusion_chart
(source, fallback_to_parent_insights=False)¶ Retrieve model’s confusion chart for the specified source.
Parameters: - source : str
Confusion chart source. Check datarobot.enums.CHART_DATA_SOURCE for possible values.
- fallback_to_parent_insights : bool
(New in version v2.14) Optional, if True, this will return confusion chart data for this model’s parent if the confusion chart is not available for this model and the model has a defined parent model. If omitted or False, or there is no parent model, will not attempt to return insight data from this model’s parent.
Returns: - ConfusionChart
Model ConfusionChart data
Raises: - ClientError
If the insight is not available for this model
-
get_cross_validation_scores
(partition=None, metric=None)¶ Returns a dictionary keyed by metric showing cross validation scores per partition.
Cross Validation should already have been performed using cross_validate or train.
Note
Models that computed cross validation before this feature was added will need to be deleted and retrained before this method can be used.
Parameters: - partition : float
optional, the id of the partition (1, 2, 3.0, 4.0, etc.) to filter results by. Can be a positive whole number, given as an integer or a float value.
- metric: unicode
optional, the name of the metric to filter the resulting cross validation scores by
Returns: - cross_validation_scores: dict
A dictionary keyed by metric showing cross validation scores per partition.
-
get_feature_effect
(source)¶ Retrieve Feature Effects for the model.
Feature Effects provides partial dependence and predicted vs actual values for top-500 features ordered by feature impact score.
The partial dependence shows marginal effect of a feature on the target variable after accounting for the average effects of all other predictive features. It indicates how, holding all other variables except the feature of interest as they were, the value of this feature affects your prediction.
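The idea of partial dependence can be illustrated with a small, generic computation. This is a textbook-style sketch, not DataRobot's implementation: partial_dependence, toy_predict and the sample rows are all hypothetical, and predict stands for any function that maps one row to a prediction.

```python
def partial_dependence(predict, rows, feature_index, grid):
    """Average prediction when one feature is forced to each grid value."""
    curve = []
    for value in grid:
        # Overwrite only the feature of interest; other columns stay as observed.
        modified = [
            row[:feature_index] + [value] + row[feature_index + 1:] for row in rows
        ]
        curve.append(sum(predict(row) for row in modified) / len(modified))
    return curve


def toy_predict(row):
    """Toy model: the prediction depends linearly on both features."""
    return 2.0 * row[0] + row[1]


rows = [[0.0, 1.0], [1.0, 3.0], [2.0, 5.0]]
curve = partial_dependence(toy_predict, rows, feature_index=0, grid=[0.0, 1.0])
```

Each point of the curve averages over the observed values of the other features, which is what "accounting for the average effects of all other predictive features" refers to.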
Requires that Feature Effects has already been computed with request_feature_effect.
See get_feature_effect_metadata for retrieving information about the available sources.
Parameters: - source : string
The source Feature Effects are retrieved for.
Returns: - feature_effects : FeatureEffects
The feature effects data.
Raises: - ClientError (404)
If the feature effects have not been computed or source is not a valid value.
-
get_feature_effect_metadata
()¶ Retrieve Feature Effect metadata. Response contains status and available model sources.
- Feature Effect of training is always available (except for old projects, which support only Feature Effect for validation).
- When a model is trained into validation or holdout without stacked predictions (i.e. no out-of-sample predictions in validation or holdout), Feature Effect is not available for validation or holdout.
- Feature Effect for holdout is not available when there is no holdout configured for the project.
source is the expected parameter to retrieve Feature Effect. One of the provided sources shall be used.
Returns: - feature_effect_metadata: FeatureEffectMetadata
-
get_feature_fit
(source)¶ Retrieve Feature Fit for the model.
Feature Fit provides partial dependence and predicted vs actual values for top-500 features ordered by feature importance score.
The partial dependence shows marginal effect of a feature on the target variable after accounting for the average effects of all other predictive features. It indicates how, holding all other variables except the feature of interest as they were, the value of this feature affects your prediction.
Requires that Feature Fit has already been computed with
request_feature_fit
.
See get_feature_fit_metadata for retrieving information about the available sources.
Parameters: - source : string
The source Feature Fit is retrieved for. One value of [FeatureFitMetadata.sources].
Returns: - feature_fit : FeatureFit
The feature fit data.
Raises: - ClientError (404)
If the feature fit has not been computed or source is not a valid value.
-
get_feature_fit_metadata
()¶ Retrieve Feature Fit metadata. The response contains the status and the available model sources.
- Feature Fit for training is always available (except for old projects, which support only Feature Fit for validation).
- When a model is trained into validation or holdout without stacked predictions (e.g. no out-of-sample predictions in validation or holdout), Feature Fit is not available for validation or holdout.
- Feature Fit for holdout is not available when no holdout is configured for the project.
source is a required parameter for retrieving Feature Fit. Use one of the sources listed in the metadata.
Returns: - feature_fit_metadata : FeatureFitMetadata
-
get_feature_impact
()¶ Retrieve the computed Feature Impact results, a measure of the relevance of each feature in the model.
Feature Impact is computed for each column by creating new data with that column randomly permuted (but the others left unchanged), and seeing how the error metric score for the predictions is affected. The ‘impactUnnormalized’ is how much worse the error metric score is when making predictions on this modified data. The ‘impactNormalized’ is normalized so that the largest value is 1. In both cases, larger values indicate more important features.
If a feature is redundant, i.e. once other features are considered it doesn’t contribute much in addition, the ‘redundantWith’ value is the name of the feature that has the highest correlation with this feature. Note that redundancy detection is only available for jobs run after the addition of this feature. When retrieving data that predates this functionality, a NoRedundancyImpactAvailable warning will be issued.
Elsewhere this technique is sometimes called ‘Permutation Importance’.
Requires that Feature Impact has already been computed with
request_feature_impact
.
Returns: - feature_impacts : list of dict
The feature impact data. Each item is a dict with the keys ‘featureName’, ‘impactNormalized’, ‘impactUnnormalized’, and ‘redundantWith’.
Raises: - ClientError (404)
If the feature impacts have not been computed.
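The permutation procedure described above can be sketched in a few lines of pure Python. This is an illustration of the general technique, not DataRobot's implementation: the `permutation_impact` helper, the toy model, the choice of mean squared error as the metric, and the sample data are all assumptions. The record keys mirror the ones documented above, except that ‘redundantWith’ is omitted.

```python
import random


def permutation_impact(predict, rows, targets, features, seed=0):
    """Return feature impact records sorted from most to least important."""
    rng = random.Random(seed)

    def mse(data):
        return sum((predict(r) - t) ** 2 for r, t in zip(data, targets)) / len(data)

    baseline = mse(rows)
    records = []
    for name in features:
        column = [row[name] for row in rows]
        rng.shuffle(column)  # permute this column only
        permuted = [dict(row, **{name: v}) for row, v in zip(rows, column)]
        # How much worse the error metric gets on the modified data.
        records.append({"featureName": name,
                        "impactUnnormalized": mse(permuted) - baseline})

    largest = max(r["impactUnnormalized"] for r in records)
    for r in records:
        # Scale so the largest impact is 1 (guard against an all-zero case).
        r["impactNormalized"] = r["impactUnnormalized"] / largest if largest else 0.0
    return sorted(records, key=lambda r: r["impactNormalized"], reverse=True)


# Hypothetical model: "noise" is ignored, so permuting it changes nothing.
def predict(row):
    return 10.0 * row["x1"] + row["x2"]


rows = [{"x1": float(i), "x2": float(i % 2), "noise": float(i * 7 % 5)}
        for i in range(20)]
targets = [predict(r) for r in rows]
impacts = permutation_impact(predict, rows, targets, ["x1", "x2", "noise"])
```

A feature the model never uses ends up with an unnormalized impact of zero, while the feature whose permutation hurts the score most is scaled to 1.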
-
get_features_used
()¶ Query the server to determine which features were used.
Note that the data returned by this method is possibly different than the names of the features in the featurelist used by this model. This method will return the raw features that must be supplied in order for predictions to be generated on a new set of data. The featurelist, in contrast, would also include the names of derived features.
Returns: - features : list of str
The names of the features used in the model.
-
get_frozen_child_models
()¶ Retrieve all the models that are frozen from this model.
Returns: - A list of Models
-
get_leaderboard_ui_permalink
()¶ Returns: - url : str
Permanent static hyperlink to this model on the leaderboard.
-
get_lift_chart
(source, fallback_to_parent_insights=False)¶ Retrieve model lift chart for the specified source.
Parameters: - source : str
Lift chart data source. Check datarobot.enums.CHART_DATA_SOURCE for possible values.
- fallback_to_parent_insights : bool
(New in version v2.14) Optional, if True, this will return lift chart data for this model’s parent if the lift chart is not available for this model and the model has a defined parent model. If omitted or False, or there is no parent model, will not attempt to return insight data from this model’s parent.
Returns: - LiftChart
Model lift chart data
Raises: - ClientError
If the insight is not available for this model
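The fallback behavior controlled by fallback_to_parent_insights can be pictured with the following sketch. It uses plain dictionaries in place of the server, and the `lift_chart_with_fallback` helper and the `InsightUnavailable` exception are hypothetical stand-ins rather than client APIs.

```python
class InsightUnavailable(Exception):
    """Hypothetical stand-in for the client error raised for a missing insight."""


def lift_chart_with_fallback(charts_by_model, parent_of, model_id, source,
                             fallback_to_parent_insights=False):
    """Return the model's chart, or its parent's chart when fallback is enabled."""
    chart = charts_by_model.get(model_id, {}).get(source)
    if chart is not None:
        return chart
    parent_id = parent_of.get(model_id)
    if fallback_to_parent_insights and parent_id is not None:
        parent_chart = charts_by_model.get(parent_id, {}).get(source)
        if parent_chart is not None:
            return parent_chart
    raise InsightUnavailable("no lift chart for source %r" % (source,))


# Hypothetical data: only the parent model has a validation lift chart.
charts = {"parent": {"validation": "parent-validation-chart"}, "child": {}}
parents = {"child": "parent"}
chart = lift_chart_with_fallback(charts, parents, "child", "validation",
                                 fallback_to_parent_insights=True)
```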
-
get_missing_report_info
()¶ Retrieve a model missing data report on training data that can be used to understand missing values treatment in a model. Report consists of missing values reports for features which took part in modelling and are numeric or categorical.
Returns: - An iterable of MissingReportPerFeature
The queried model missing report, sorted by missing count (DESCENDING order).
-
get_model_blueprint_chart
()¶ Retrieve a model blueprint chart that can be used to understand data flow in blueprint.
Returns: - ModelBlueprintChart
The queried model blueprint chart.
-
get_model_blueprint_documents
()¶ Get documentation for tasks used in this model.
Returns: - list of BlueprintTaskDocument
All documents available for the model.
-
get_multiclass_feature_impact
()¶ For multiclass it’s possible to calculate feature impact separately for each target class. The method for calculation is exactly the same, calculated in one-vs-all style for each target class.
Requires that Feature Impact has already been computed with
request_feature_impact
.
Returns: - feature_impacts : list of dict
The feature impact data. Each item is a dict with the keys ‘featureImpacts’ (list) and ‘class’ (str). Each item in ‘featureImpacts’ is a dict with the keys ‘featureName’, ‘impactNormalized’, ‘impactUnnormalized’, and ‘redundantWith’.
Raises: - ClientError (404)
If the multiclass feature impacts have not been computed.
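The nested structure returned for multiclass projects is easier to work with once it is flattened. The sketch below finds the highest-impact feature for each target class. The `top_feature_per_class` helper and the sample values are hypothetical, and only the key names come from the description above.

```python
def top_feature_per_class(multiclass_impacts):
    """Map each target class to the name of its highest-impact feature."""
    result = {}
    for entry in multiclass_impacts:
        best = max(entry["featureImpacts"], key=lambda f: f["impactNormalized"])
        result[entry["class"]] = best["featureName"]
    return result


# Hypothetical data shaped like the documented return value.
sample = [
    {"class": "setosa", "featureImpacts": [
        {"featureName": "petal_length", "impactNormalized": 1.0,
         "impactUnnormalized": 0.42, "redundantWith": None},
        {"featureName": "sepal_width", "impactNormalized": 0.3,
         "impactUnnormalized": 0.126, "redundantWith": None},
    ]},
    {"class": "virginica", "featureImpacts": [
        {"featureName": "petal_length", "impactNormalized": 0.8,
         "impactUnnormalized": 0.2, "redundantWith": None},
        {"featureName": "sepal_width", "impactNormalized": 1.0,
         "impactUnnormalized": 0.25, "redundantWith": None},
    ]},
]

top = top_feature_per_class(sample)
```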
-
get_multiclass_lift_chart
(source, fallback_to_parent_insights=False)¶ Retrieve model lift chart data for each target class for the specified source.
Parameters: - source : str
Lift chart data source. Check datarobot.enums.CHART_DATA_SOURCE for possible values.
- fallback_to_parent_insights : bool
Optional, if True, this will return lift chart data for this model’s parent if the lift chart is not available for this model and the model has a defined parent model. If omitted or False, or there is no parent model, will not attempt to return insight data from this model’s parent.
Returns: - list of LiftChart
Model lift chart data for each saved target class
Raises: - ClientError
If the insight is not available for this model
-
get_or_request_feature_effect
(source, max_wait=600)¶ Retrieve feature effect for the model, requesting a job if it hasn’t been run previously
See
get_feature_effect_metadata
for information about the available sources.
Parameters: - max_wait : int, optional
The maximum time to wait for a requested feature effect job to complete before erroring
- source : string
The source to retrieve Feature Effects for.
Returns: - feature_effects : FeatureEffects
The feature effects data.
-
get_or_request_feature_fit
(source, max_wait=600)¶ Retrieve feature fit for the model, requesting a job if it hasn’t been run previously
See
get_feature_fit_metadata
for information about the available sources.
Parameters: - max_wait : int, optional
The maximum time to wait for a requested feature fit job to complete before erroring
- source : string
The source to retrieve Feature Fit for. One of the values in FeatureFitMetadata.sources.
Returns: - feature_fit : FeatureFit
The feature fit data.
-
get_or_request_feature_impact
(max_wait=600)¶ Retrieve feature impact for the model, requesting a job if it hasn’t been run previously
Parameters: - max_wait : int, optional
The maximum time to wait for a requested feature impact job to complete before erroring
Returns: - feature_impacts : list of dict
The feature impact data. See
get_feature_impact
for the exact schema.
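The "get or request" behavior amounts to trying the retrieval first and submitting a job only when nothing has been computed yet. The sketch below shows that pattern with stub objects so it runs without a server. `StubModel`, `StubJob`, `NotComputedError`, and `get_or_request` are hypothetical and do not reflect how the client implements this method.

```python
class NotComputedError(Exception):
    """Hypothetical stand-in for the client's 404 error."""


class StubJob:
    def __init__(self, result):
        self._result = result

    def get_result_when_complete(self, max_wait=600):
        # A real job would poll until it finishes or max_wait seconds pass.
        return self._result


class StubModel:
    def __init__(self):
        self._impact = None
        self.requests = 0

    def get_feature_impact(self):
        if self._impact is None:
            raise NotComputedError("feature impact has not been computed")
        return self._impact

    def request_feature_impact(self):
        self.requests += 1
        self._impact = [{"featureName": "x1", "impactNormalized": 1.0}]
        return StubJob(self._impact)


def get_or_request(model, max_wait=600):
    """Return existing results, requesting a computation only if needed."""
    try:
        return model.get_feature_impact()
    except NotComputedError:
        job = model.request_feature_impact()
        return job.get_result_when_complete(max_wait=max_wait)


model = StubModel()
first = get_or_request(model)
second = get_or_request(model)  # served from the existing results
```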
-
get_parameters
()¶ Retrieve model parameters.
Returns: - ModelParameters
Model parameters for this model.
-
get_pareto_front
()¶ Retrieve the Pareto Front for a Eureqa model.
This method is only supported for Eureqa models.
Returns: - ParetoFront
Model ParetoFront data
-
get_prime_eligibility
()¶ Check if this model can be approximated with DataRobot Prime
Returns: - prime_eligibility : dict
a dict indicating whether a model can be approximated with DataRobot Prime (key can_make_prime) and why it may be ineligible (key message)
-
get_residuals_chart
(source, fallback_to_parent_insights=False)¶ Retrieve model residuals chart for the specified source.
Parameters: - source : str
Residuals chart data source. Check datarobot.enums.CHART_DATA_SOURCE for possible values.
- fallback_to_parent_insights : bool
Optional, if True, this will return residuals chart data for this model’s parent if the residuals chart is not available for this model and the model has a defined parent model. If omitted or False, or there is no parent model, will not attempt to return residuals data from this model’s parent.
Returns: - ResidualsChart
Model residuals chart data
Raises: - ClientError
If the insight is not available for this model
-
get_roc_curve
(source, fallback_to_parent_insights=False)¶ Retrieve model ROC curve for the specified source.
Parameters: - source : str
ROC curve data source. Check datarobot.enums.CHART_DATA_SOURCE for possible values.
- fallback_to_parent_insights : bool
(New in version v2.14) Optional, if True, this will return ROC curve data for this model’s parent if the ROC curve is not available for this model and the model has a defined parent model. If omitted or False, or there is no parent model, will not attempt to return data from this model’s parent.
Returns: - RocCurve
Model ROC curve data
Raises: - ClientError
If the insight is not available for this model
-
get_rulesets
()¶ List the rulesets approximating this model generated by DataRobot Prime
If this model hasn’t been approximated yet, will return an empty list. Note that these are rulesets approximating this model, not rulesets used to construct this model.
Returns: - rulesets : list of Ruleset
-
get_supported_capabilities
()¶ Retrieves a summary of the capabilities supported by a model.
New in version v2.14.
Returns: - supportsBlending: bool
whether the model supports blending
- supportsMonotonicConstraints: bool
whether the model supports monotonic constraints
- hasWordCloud: bool
whether the model has word cloud data available
- eligibleForPrime: bool
whether the model is eligible for Prime
- hasParameters: bool
whether the model has parameters that can be retrieved
- supportsCodeGeneration: bool
(New in version v2.18) whether the model supports code generation
-
get_word_cloud
(exclude_stop_words=False)¶ Retrieve a word cloud data for the model.
Parameters: - exclude_stop_words : bool, optional
Set to True if you want stopwords filtered out of response.
Returns: - WordCloud
Word cloud data for the model.
-
open_model_browser
()¶ Opens model at project leaderboard in web browser.
Note: If text-mode browsers are used, the calling process will block until the user exits the browser.
-
request_feature_effect
()¶ Request feature effects to be computed for the model.
See
get_feature_effect
for more information on the result of the job.
Returns: - job : Job
A Job representing the feature effect computation. To get the completed feature effect data, use job.get_result or job.get_result_when_complete.
Raises: - JobAlreadyRequested (422)
If the feature effects have already been requested.
-
request_feature_fit
()¶ Request feature fit to be computed for the model.
See
get_feature_fit
for more information on the result of the job.
Returns: - job : Job
A Job representing the feature fit computation. To get the completed feature fit data, use job.get_result or job.get_result_when_complete.
Raises: - JobAlreadyRequested (422)
If the feature fit has already been requested.
-
request_feature_impact
()¶ Request feature impacts to be computed for the model.
See
get_feature_impact
for more information on the result of the job.
Returns: - job : Job
A Job representing the feature impact computation. To get the completed feature impact data, use job.get_result or job.get_result_when_complete.
Raises: - JobAlreadyRequested (422)
If the feature impacts have already been requested.
-
request_predictions
(dataset_id, include_prediction_intervals=None, prediction_intervals_size=None, forecast_point=None, predictions_start_date=None, predictions_end_date=None)¶ Request predictions against a previously uploaded dataset
Parameters: - dataset_id : string
The dataset to make predictions against (as uploaded from Project.upload_dataset)
- include_prediction_intervals : bool, optional
(New in v2.16) For time series projects only. Specifies whether prediction intervals should be calculated for this request. Defaults to True if prediction_intervals_size is specified, otherwise defaults to False.
- prediction_intervals_size : int, optional
(New in v2.16) For time series projects only. Represents the percentile to use for the size of the prediction intervals. Defaults to 80 if include_prediction_intervals is True. Prediction intervals size must be between 1 and 100 (inclusive).
- forecast_point : datetime.datetime or None, optional
(New in version v2.20) For time series projects only. This is the default point relative to which predictions will be generated, based on the forecast window of the project. See the time series prediction documentation for more information.
- predictions_start_date : datetime.datetime or None, optional
(New in version v2.20) For time series projects only. The start date for bulk predictions. Note that this parameter is for generating historical predictions using the training data. This parameter should be provided in conjunction with
predictions_end_date
. Can’t be provided with the forecast_point parameter.
- predictions_end_date : datetime.datetime or None, optional
(New in version v2.20) For time series projects only. The end date for bulk predictions, exclusive. Note that this parameter is for generating historical predictions using the training data. This parameter should be provided in conjunction with
predictions_start_date
. Can’t be provided with the forecast_point parameter.
Returns: - job : PredictJob
The job computing the predictions
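The parameter rules above (defaults for prediction intervals, and which date arguments may be combined) can be summarized in a small validation helper. This is a sketch of the documented rules only. The `check_prediction_args` function is hypothetical, and the client or server may validate differently.

```python
from datetime import datetime


def check_prediction_args(include_prediction_intervals=None,
                          prediction_intervals_size=None,
                          forecast_point=None,
                          predictions_start_date=None,
                          predictions_end_date=None):
    """Validate argument combinations and fill in the documented defaults."""
    if (predictions_start_date is None) != (predictions_end_date is None):
        raise ValueError("predictions_start_date and predictions_end_date "
                         "must be provided together")
    if forecast_point is not None and predictions_start_date is not None:
        raise ValueError("forecast_point cannot be combined with a "
                         "predictions date range")
    if include_prediction_intervals is None:
        # Defaults to True only when a size was given.
        include_prediction_intervals = prediction_intervals_size is not None
    if include_prediction_intervals and prediction_intervals_size is None:
        prediction_intervals_size = 80
    if (prediction_intervals_size is not None
            and not 1 <= prediction_intervals_size <= 100):
        raise ValueError("prediction_intervals_size must be between 1 and 100")
    return include_prediction_intervals, prediction_intervals_size


defaults = check_prediction_args()
with_size = check_prediction_args(prediction_intervals_size=90)
with_flag = check_prediction_args(include_prediction_intervals=True)
```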
-
request_training_predictions
(data_subset)¶ Start a job to build training predictions
Parameters: - data_subset : str
data set definition to build predictions on. Choices are:
- dr.enums.DATA_SUBSET.ALL or string all for all data available. Not valid for models in datetime partitioned projects.
- dr.enums.DATA_SUBSET.VALIDATION_AND_HOLDOUT or string validationAndHoldout for all data except the training set. Not valid for models in datetime partitioned projects.
- dr.enums.DATA_SUBSET.HOLDOUT or string holdout for the holdout data set only.
- dr.enums.DATA_SUBSET.ALL_BACKTESTS or string allBacktests for downloading the predictions for all backtest validation folds. Requires the model to have successfully scored all backtests. Datetime partitioned projects only.
Returns: - Job
an instance of the created async job
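Which data_subset values are valid depends on whether the project is datetime partitioned. The sketch below encodes the choices listed above using their string forms. The `check_data_subset` helper is hypothetical and is not part of the client.

```python
VALID_SUBSETS = ("all", "validationAndHoldout", "holdout", "allBacktests")


def check_data_subset(data_subset, datetime_partitioned):
    """Return data_subset if it is valid for the kind of project, else raise."""
    if data_subset not in VALID_SUBSETS:
        raise ValueError("unknown data_subset: %r" % (data_subset,))
    if datetime_partitioned and data_subset in ("all", "validationAndHoldout"):
        raise ValueError("%r is not valid for datetime partitioned projects"
                         % (data_subset,))
    if not datetime_partitioned and data_subset == "allBacktests":
        raise ValueError("'allBacktests' requires a datetime partitioned project")
    return data_subset


subset = check_data_subset("holdout", datetime_partitioned=True)
```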
-
request_transferable_export
(prediction_intervals_size=None)¶ Request generation of an exportable model file for use in an on-premise DataRobot standalone prediction environment.
This function can only be used if model export is enabled, and will only be useful if you have an on-premise environment in which to import it.
This function does not download the exported file. Use download_export for that.
Parameters: - prediction_intervals_size : int, optional
(New in v2.19) For time series projects only. Represents the percentile to use for the size of the prediction intervals. Prediction intervals size must be between 1 and 100 (inclusive).
Examples
    model = datarobot.Model.get('p-id', 'l-id')
    job = model.request_transferable_export()
    job.wait_for_completion()
    model.download_export('my_exported_model.drmodel')

    # Client must be configured to use standalone prediction server for import:
    datarobot.Client(token='my-token-at-standalone-server', endpoint='standalone-server-url/api/v2')
    imported_model = datarobot.ImportedModel.create('my_exported_model.drmodel')
-
retrain
(sample_pct=None, featurelist_id=None, training_row_count=None)¶ Submit a job to the queue to train a blender model.
Parameters: - sample_pct : float, optional
The sample size as a percentage (1 to 100) to use in training. If this parameter is used, training_row_count should not be given.
- featurelist_id : str, optional
The featurelist id
- training_row_count : int, optional
The number of rows to use to train the model. If this parameter is used, sample_pct should not be given.
Returns: - job : ModelJob
The created job that is retraining the model
-
set_prediction_threshold
(threshold)¶ Set a custom prediction threshold for the model
May not be used once
prediction_threshold_read_only
is True for this model.
Parameters: - threshold : float
Only used for binary classification projects. The threshold to use when deciding between the positive and negative classes when making predictions. Should be between 0.0 and 1.0 (inclusive).
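The role of the threshold can be shown with a short sketch. The `label_for` helper is hypothetical, and treating a probability exactly equal to the threshold as positive is an assumption made here, not documented behavior.

```python
def label_for(probability, threshold, positive_label=1, negative_label=0):
    """Turn a positive-class probability into a class label."""
    if not 0.0 <= threshold <= 1.0:
        raise ValueError("threshold must be between 0.0 and 1.0 (inclusive)")
    # Assumption: a probability equal to the threshold counts as positive.
    return positive_label if probability >= threshold else negative_label


labels = [label_for(p, threshold=0.5) for p in (0.2, 0.5, 0.9)]
```

Raising the threshold makes the positive class harder to predict: with a threshold of 0.95, a probability of 0.9 would be labeled negative.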
-
star_model
()¶ Mark the model as starred
Model stars propagate to the web application and the API, and can be used to filter when listing models.
-
start_advanced_tuning_session
()¶ Start an Advanced Tuning session. Returns an object that helps set up arguments for an Advanced Tuning model execution.
As of v2.17, all models other than blenders, open source, prime, scaleout, baseline and user-created support Advanced Tuning.
Returns: - AdvancedTuningSession
Session for setting up and running Advanced Tuning on a model
-
unstar_model
()¶ Unmark the model as starred
Model stars propagate to the web application and the API, and can be used to filter when listing models.
BlenderModel¶
-
class
datarobot.models.
BlenderModel
(id=None, processes=None, featurelist_name=None, featurelist_id=None, project_id=None, sample_pct=None, training_row_count=None, training_duration=None, training_start_date=None, training_end_date=None, model_type=None, model_category=None, is_frozen=None, blueprint_id=None, metrics=None, model_ids=None, blender_method=None, monotonic_increasing_featurelist_id=None, monotonic_decreasing_featurelist_id=None, supports_monotonic_constraints=None, is_starred=None, prediction_threshold=None, prediction_threshold_read_only=None, model_number=None, parent_model_id=None)¶ Blender model that combines prediction results from other models.
Attributes: - id : str
the id of the model
- project_id : str
the id of the project the model belongs to
- processes : list of str
the processes used by the model
- featurelist_name : str
the name of the featurelist used by the model
- featurelist_id : str
the id of the featurelist used by the model
- sample_pct : float
the percentage of the project dataset used in training the model
- training_row_count : int or None
the number of rows of the project dataset used in training the model. In a datetime partitioned project, if specified, defines the number of rows used to train the model and evaluate backtest scores; if unspecified, either training_duration or training_start_date and training_end_date was used to determine that instead.
- training_duration : str or None
only present for models in datetime partitioned projects. If specified, a duration string specifying the duration spanned by the data used to train the model and evaluate backtest scores.
- training_start_date : datetime or None
only present for frozen models in datetime partitioned projects. If specified, the start date of the data used to train the model.
- training_end_date : datetime or None
only present for frozen models in datetime partitioned projects. If specified, the end date of the data used to train the model.
- model_type : str
what model this is, e.g. ‘DataRobot Prime’
- model_category : str
what kind of model this is - always ‘blend’ for blender models
- is_frozen : bool
whether this model is a frozen model
- blueprint_id : str
the id of the blueprint used in this model
- metrics : dict
a mapping from each metric to the model’s scores for that metric
- model_ids : list of str
List of model ids used in blender
- blender_method : str
Method used to blend results from underlying models
- monotonic_increasing_featurelist_id : str
optional, the id of the featurelist that defines the set of features with a monotonically increasing relationship to the target. If None, no such constraints are enforced.
- monotonic_decreasing_featurelist_id : str
optional, the id of the featurelist that defines the set of features with a monotonically decreasing relationship to the target. If None, no such constraints are enforced.
- supports_monotonic_constraints : bool
optional, whether this model supports enforcing monotonic constraints
- is_starred : bool
whether this model is marked as starred
- prediction_threshold : float
for binary classification projects, the threshold used for predictions
- prediction_threshold_read_only : bool
indicates whether modification of the prediction threshold is forbidden. Threshold modification is forbidden once a model has had a deployment created or predictions made via the dedicated prediction API.
- model_number : integer
model number assigned to a model
- parent_model_id : str or None
(New in version v2.20) the id of the model that tuning parameters are derived from
-
classmethod
get
(project_id, model_id)¶ Retrieve a specific blender.
Parameters: - project_id : str
The project’s id.
- model_id : str
The
model_id
of the leaderboard item to retrieve.
Returns: - model : BlenderModel
The queried instance.
-
advanced_tune
(params, description=None)¶ Generate a new model with the specified advanced-tuning parameters
As of v2.17, all models other than blenders, open source, prime, scaleout, baseline and user-created support Advanced Tuning.
Parameters: - params : dict
Mapping of parameter ID to parameter value. The list of valid parameter IDs for a model can be found by calling get_advanced_tuning_parameters(). This endpoint does not need to include values for all parameters. If a parameter is omitted, its current_value will be used.
- description : unicode
Human-readable string describing the newly advanced-tuned model
Returns: - ModelJob
The created job to build the model
-
cross_validate
()¶ Run Cross Validation on this model.
Note
To perform Cross Validation on a new model with new parameters, use
train
instead.
Returns: - ModelJob
The created job to build the model
-
delete
()¶ Delete a model from the project’s leaderboard.
-
download_export
(filepath)¶ Download an exportable model file for use in an on-premise DataRobot standalone prediction environment.
This function can only be used if model export is enabled, and will only be useful if you have an on-premise environment in which to import it.
Parameters: - filepath : str
The path at which to save the exported model file.
-
download_scoring_code
(file_name, source_code=False)¶ Download scoring code JAR.
Parameters: - file_name : str
File path where scoring code will be saved.
- source_code : bool, optional
Set to True to download source code archive. It will not be executable.
-
classmethod
fetch_resource_data
(url, join_endpoint=True)¶ (Deprecated.) Used to acquire model data directly from its url.
Consider using get instead, as this is a convenience function used for development of datarobot
Parameters: - url : str
The resource we are acquiring
- join_endpoint : boolean, optional
Whether the client’s endpoint should be joined to the URL before sending the request. Location headers are returned as absolute locations, so will _not_ need the endpoint
Returns: - model_data : dict
The queried model’s data
-
get_advanced_tuning_parameters
()¶ Get the advanced-tuning parameters available for this model.
As of v2.17, all models other than blenders, open source, prime, scaleout, baseline and user-created support Advanced Tuning.
Returns: - dict
A dictionary describing the advanced-tuning parameters for the current model. There are two top-level keys, tuningDescription and tuningParameters.
tuningDescription is an optional value. If not None, it is the user-specified description of this set of tuning parameters.
tuningParameters is a list of dicts, each of which has the following keys
- parameterName : (unicode) name of the parameter (unique per task, see below)
- parameterId : (unicode) opaque ID string uniquely identifying parameter
- defaultValue : (*) default value of the parameter for the blueprint
- currentValue : (*) value of the parameter that was used for this model
- taskName : (unicode) name of the task that this parameter belongs to
- constraints: (dict) see the notes below
Notes
The type of defaultValue and currentValue is defined by the constraints structure. It will be a string or numeric Python type.
constraints is a dict with at least one, possibly more, of the following keys. The presence of a key indicates that the parameter may take on the specified type. (If a key is absent, this means that the parameter may not take on the specified type.) If a key on constraints is present, its value will be a dict containing all of the fields described below for that key.
    "constraints": {
        "select": {
            "values": [<list(basestring or number) : possible values>]
        },
        "ascii": {},
        "unicode": {},
        "int": {
            "min": <int : minimum valid value>,
            "max": <int : maximum valid value>,
            "supports_grid_search": <bool : True if Grid Search may be requested for this param>
        },
        "float": {
            "min": <float : minimum valid value>,
            "max": <float : maximum valid value>,
            "supports_grid_search": <bool : True if Grid Search may be requested for this param>
        },
        "intList": {
            "min_length": <int : minimum valid length>,
            "max_length": <int : maximum valid length>,
            "min_val": <int : minimum valid value>,
            "max_val": <int : maximum valid value>,
            "supports_grid_search": <bool : True if Grid Search may be requested for this param>
        },
        "floatList": {
            "min_length": <int : minimum valid length>,
            "max_length": <int : maximum valid length>,
            "min_val": <float : minimum valid value>,
            "max_val": <float : maximum valid value>,
            "supports_grid_search": <bool : True if Grid Search may be requested for this param>
        }
    }
The keys have meaning as follows:
- select: Rather than specifying a specific data type, if present, it indicates that the parameter is permitted to take on any of the specified values. Listed values may be of any string or real (non-complex) numeric type.
- ascii: The parameter may be a unicode object that encodes simple ASCII characters. (A-Z, a-z, 0-9, whitespace, and certain common symbols.) In addition to listed constraints, ASCII keys currently may not contain either newlines or semicolons.
- unicode: The parameter may be any Python unicode object.
- int: The value may be an object of type int within the specified range (inclusive). Please note that the value will be passed around using the JSON format, and some JSON parsers have undefined behavior with integers outside of the range [-(2**53)+1, (2**53)-1].
- float: The value may be an object of type float within the specified range (inclusive).
- intList, floatList: The value may be a list of int or float objects, respectively, following constraints as specified respectively by the int and float types (above).
Many parameters only specify one key under constraints. If a parameter specifies multiple keys, the parameter may take on any value permitted by any key.
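The rule that a value is acceptable when any one constraint key permits it can be sketched as a client-side check. The `is_allowed` helper below is hypothetical and only approximates the rules described above. In particular, its ASCII check accepts any ASCII string without newlines or semicolons, which is broader than the character set listed.

```python
def _scalar_ok(value, kind, low, high):
    """Check the type and inclusive range for an 'int' or 'float' constraint."""
    if kind == "int":
        # bool is a subclass of int in Python, so exclude it explicitly.
        right_type = isinstance(value, int) and not isinstance(value, bool)
    else:
        right_type = isinstance(value, float)
    return right_type and low <= value <= high


def is_allowed(value, constraints):
    """Return True if `value` satisfies at least one key in `constraints`."""
    for key, spec in constraints.items():
        if key == "select" and value in spec["values"]:
            return True
        if key == "unicode" and isinstance(value, str):
            return True
        if key == "ascii" and isinstance(value, str):
            if value.isascii() and "\n" not in value and ";" not in value:
                return True
        if key in ("int", "float") and _scalar_ok(value, key, spec["min"], spec["max"]):
            return True
        if key in ("intList", "floatList") and isinstance(value, list):
            kind = "int" if key == "intList" else "float"
            length_ok = spec["min_length"] <= len(value) <= spec["max_length"]
            if length_ok and all(
                _scalar_ok(v, kind, spec["min_val"], spec["max_val"]) for v in value
            ):
                return True
    return False


# Hypothetical constraints for a parameter that accepts 1..10 or "auto".
constraints = {
    "int": {"min": 1, "max": 10, "supports_grid_search": False},
    "select": {"values": ["auto"]},
}
checks = [is_allowed(v, constraints) for v in (5, "auto", 11, 2.5)]
```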
-
get_all_confusion_charts
(fallback_to_parent_insights=False)¶ Retrieve a list of all confusion charts available for the model.
Parameters: - fallback_to_parent_insights : bool
(New in version v2.14) Optional, if True, this will return confusion chart data for this model’s parent for any source that is not available for this model and if this model has a defined parent model. If omitted or False, or this model has no parent, this will not attempt to retrieve any data from this model’s parent.
Returns: - list of ConfusionChart
Data for all available confusion charts for model.
-
get_all_lift_charts
(fallback_to_parent_insights=False)¶ Retrieve a list of all lift charts available for the model.
Parameters: - fallback_to_parent_insights : bool
(New in version v2.14) Optional, if True, this will return lift chart data for this model’s parent for any source that is not available for this model and if this model has a defined parent model. If omitted or False, or this model has no parent, this will not attempt to retrieve any data from this model’s parent.
Returns: - list of LiftChart
Data for all available model lift charts.
-
get_all_residuals_charts
(fallback_to_parent_insights=False)¶ Retrieve a list of all residuals charts available for the model.
Parameters: - fallback_to_parent_insights : bool
Optional, if True, this will return residuals chart data for this model’s parent for any source that is not available for this model and if this model has a defined parent model. If omitted or False, or this model has no parent, this will not attempt to retrieve any data from this model’s parent.
Returns: - list of ResidualsChart
Data for all available model residuals charts.
-
get_all_roc_curves
(fallback_to_parent_insights=False)¶ Retrieve a list of all ROC curves available for the model.
Parameters: - fallback_to_parent_insights : bool
(New in version v2.14) Optional, if True, this will return ROC curve data for this model’s parent for any source that is not available for this model and if this model has a defined parent model. If omitted or False, or this model has no parent, this will not attempt to retrieve any data from this model’s parent.
Returns: - list of RocCurve
Data for all available model ROC curves.
-
get_confusion_chart
(source, fallback_to_parent_insights=False)¶ Retrieve model’s confusion chart for the specified source.
Parameters: - source : str
Confusion chart source. Check datarobot.enums.CHART_DATA_SOURCE for possible values.
- fallback_to_parent_insights : bool
(New in version v2.14) Optional, if True, this will return confusion chart data for this model’s parent if the confusion chart is not available for this model and the model has a defined parent model. If omitted or False, or there is no parent model, will not attempt to return insight data from this model’s parent.
Returns: - ConfusionChart
Model ConfusionChart data
Raises: - ClientError
If the insight is not available for this model
-
get_cross_validation_scores
(partition=None, metric=None)¶ Returns a dictionary keyed by metric showing cross validation scores per partition.
Cross Validation should already have been performed using
cross_validate
or train.
Note
Models that computed cross validation before this feature was added will need to be deleted and retrained before this method can be used.
Parameters: - partition : float
optional, the id of the partition (1, 2, 3.0, 4.0, etc.) to filter results by. Can be a positive whole number given as an integer or a float.
- metric: unicode
optional, the name of the metric to filter the resulting cross validation scores by
Returns: - cross_validation_scores: dict
A dictionary keyed by metric showing cross validation scores per partition.
-
get_feature_effect
(source)¶ Retrieve Feature Effects for the model.
Feature Effects provides partial dependence and predicted vs actual values for top-500 features ordered by feature impact score.
The partial dependence shows the marginal effect of a feature on the target variable after accounting for the average effects of all other predictive features. It indicates how, holding all other variables except the feature of interest as they were, the value of this feature affects your prediction.
Requires that Feature Effects has already been computed with
request_feature_effect
. See
get_feature_effect_metadata
for information about the available sources.
Parameters: - source : string
The source to retrieve Feature Effects for.
Returns: - feature_effects : FeatureEffects
The feature effects data.
Raises: - ClientError (404)
If the feature effects have not been computed or source is not a valid value.
-
get_feature_effect_metadata
()¶ Retrieve Feature Effect metadata. The response contains the status and the available model sources.
- Feature Effect for training is always available (except for old projects, which support only Feature Effect for validation).
- When a model is trained into validation or holdout without stacked predictions (e.g. no out-of-sample predictions in validation or holdout), Feature Effect is not available for validation or holdout.
- Feature Effect for holdout is not available when no holdout is configured for the project.
source is a required parameter for retrieving Feature Effect. Use one of the sources listed in the metadata.
Returns: - feature_effect_metadata : FeatureEffectMetadata
-
get_feature_fit
(source)¶ Retrieve Feature Fit for the model.
Feature Fit provides partial dependence and predicted vs actual values for the top 500 features, ordered by feature importance score.
The partial dependence shows the marginal effect of a feature on the target variable after accounting for the average effects of all other predictive features. It indicates how, holding all other variables except the feature of interest as they were, the value of this feature affects your prediction.
Requires that Feature Fit has already been computed with request_feature_fit.
See get_feature_fit_metadata for information on the available sources.
Parameters: - source : string
The source Feature Fit is retrieved for. One of the values in [FeatureFitMetadata.sources].
Returns: - feature_fit : FeatureFit
The feature fit data.
Raises: - ClientError (404)
If the feature fit has not been computed or source is not a valid value.
-
get_feature_fit_metadata
()¶ - Retrieve Feature Fit metadata. Response contains status and available model sources.
- Feature Fit of training is always available (except for old projects, which support only Feature Fit for validation).
- When a model is trained into validation or holdout without stacked predictions (i.e. no out-of-sample predictions in validation or holdout), Feature Fit is not available for validation or holdout.
- Feature Fit for holdout is not available when there is no holdout configured for the project.
source is a required parameter when retrieving Feature Fit; use one of the sources provided in the metadata.
Returns: - feature_fit_metadata: FeatureFitMetadata
-
get_feature_impact
()¶ Retrieve the computed Feature Impact results, a measure of the relevance of each feature in the model.
Feature Impact is computed for each column by creating new data with that column randomly permuted (but the others left unchanged), and seeing how the error metric score for the predictions is affected. The ‘impactUnnormalized’ is how much worse the error metric score is when making predictions on this modified data. The ‘impactNormalized’ is normalized so that the largest value is 1. In both cases, larger values indicate more important features.
If a feature is redundant, i.e. once other features are considered it doesn’t contribute much in addition, the ‘redundantWith’ value is the name of the feature that has the highest correlation with this feature. Note that redundancy detection is only available for jobs run after this functionality was added. When retrieving data that predates this functionality, a NoRedundancyImpactAvailable warning will be issued.
Elsewhere this technique is sometimes called ‘Permutation Importance’.
Requires that Feature Impact has already been computed with request_feature_impact.
Returns: - feature_impacts : list of dict
The feature impact data. Each item is a dict with the keys ‘featureName’, ‘impactNormalized’, ‘impactUnnormalized’, and ‘redundantWith’.
Raises: - ClientError (404)
If the feature impacts have not been computed.
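The permutation procedure described for Feature Impact can be sketched in a few lines of plain Python. This is only an illustration under stated assumptions, not DataRobot's implementation: the helper names, the mean squared error metric, the toy model, and the data are hypothetical, and redundancy detection is left out. The output keys mirror the ‘featureName’, ‘impactUnnormalized’, and ‘impactNormalized’ keys documented above.

```python
import random


def mean_squared_error(predict, rows, targets):
    return sum((predict(r) - t) ** 2 for r, t in zip(rows, targets)) / len(rows)


def permutation_impact(predict, rows, targets, features, seed=0):
    """Illustrative permutation importance; not the DataRobot implementation."""
    rng = random.Random(seed)
    baseline = mean_squared_error(predict, rows, targets)
    results = []
    for name in features:
        # Permute one column, leaving every other column unchanged.
        shuffled = [row[name] for row in rows]
        rng.shuffle(shuffled)
        permuted = [dict(row, **{name: v}) for row, v in zip(rows, shuffled)]
        # How much worse the error metric is on the modified data.
        worse_by = mean_squared_error(predict, permuted, targets) - baseline
        results.append({"featureName": name, "impactUnnormalized": worse_by})
    # Normalize so that the largest impact is 1.
    largest = max(item["impactUnnormalized"] for item in results)
    for item in results:
        item["impactNormalized"] = (
            item["impactUnnormalized"] / largest if largest > 0 else 0.0
        )
    return sorted(results, key=lambda item: item["impactNormalized"], reverse=True)


# Toy data: the target depends only on "x"; "noise" is ignored by the model.
rows = [{"x": i, "noise": (i * 7) % 5} for i in range(20)]
targets = [3 * row["x"] for row in rows]
impacts = permutation_impact(lambda row: 3 * row["x"], rows, targets, ["x", "noise"])
```

Because the toy model never reads `noise`, permuting that column leaves the error unchanged and its impact is zero, while `x` receives the normalized impact of 1.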
-
get_features_used
()¶ Query the server to determine which features were used.
Note that the data returned by this method is possibly different than the names of the features in the featurelist used by this model. This method will return the raw features that must be supplied in order for predictions to be generated on a new set of data. The featurelist, in contrast, would also include the names of derived features.
Returns: - features : list of str
The names of the features used in the model.
-
get_frozen_child_models
()¶ Retrieves the ids for all the models that are frozen from this model
Returns: - A list of Models
-
get_leaderboard_ui_permalink
()¶ Returns: - url : str
Permanent static hyperlink to this model on the leaderboard.
-
get_lift_chart
(source, fallback_to_parent_insights=False)¶ Retrieve model lift chart for the specified source.
Parameters: - source : str
Lift chart data source. Check datarobot.enums.CHART_DATA_SOURCE for possible values.
- fallback_to_parent_insights : bool
(New in version v2.14) Optional, if True, this will return lift chart data for this model’s parent if the lift chart is not available for this model and the model has a defined parent model. If omitted or False, or there is no parent model, will not attempt to return insight data from this model’s parent.
Returns: - LiftChart
Model lift chart data
Raises: - ClientError
If the insight is not available for this model
-
get_missing_report_info
()¶ Retrieve a model missing data report on training data that can be used to understand missing values treatment in a model. Report consists of missing values reports for features which took part in modelling and are numeric or categorical.
Returns: - An iterable of MissingReportPerFeature
The queried model missing report, sorted by missing count (DESCENDING order).
-
get_model_blueprint_chart
()¶ Retrieve a model blueprint chart that can be used to understand data flow in blueprint.
Returns: - ModelBlueprintChart
The queried model blueprint chart.
-
get_model_blueprint_documents
()¶ Get documentation for tasks used in this model.
Returns: - list of BlueprintTaskDocument
All documents available for the model.
-
get_multiclass_feature_impact
()¶ For multiclass it’s possible to calculate feature impact separately for each target class. The method for calculation is exactly the same, calculated in one-vs-all style for each target class.
Requires that Feature Impact has already been computed with request_feature_impact.
Returns: - feature_impacts : list of dict
The feature impact data. Each item is a dict with the keys ‘featureImpacts’ (list) and ‘class’ (str). Each item in ‘featureImpacts’ is a dict with the keys ‘featureName’, ‘impactNormalized’, ‘impactUnnormalized’, and ‘redundantWith’.
Raises: - ClientError (404)
If the multiclass feature impacts have not been computed.
-
get_multiclass_lift_chart
(source, fallback_to_parent_insights=False)¶ Retrieve model lift chart for the specified source.
Parameters: - source : str
Lift chart data source. Check datarobot.enums.CHART_DATA_SOURCE for possible values.
- fallback_to_parent_insights : bool
Optional, if True, this will return lift chart data for this model’s parent if the lift chart is not available for this model and the model has a defined parent model. If omitted or False, or there is no parent model, will not attempt to return insight data from this model’s parent.
Returns: - list of LiftChart
Model lift chart data for each saved target class
Raises: - ClientError
If the insight is not available for this model
-
get_or_request_feature_effect
(source, max_wait=600)¶ Retrieve feature effect for the model, requesting a job if it hasn’t been run previously
See get_feature_effect_metadata for information on the available sources.
Parameters: - max_wait : int, optional
The maximum time to wait for a requested feature effect job to complete before erroring
- source : string
The source Feature Effects are retrieved for.
Returns: - feature_effects : FeatureEffects
The feature effects data.
-
get_or_request_feature_fit
(source, max_wait=600)¶ Retrieve feature fit for the model, requesting a job if it hasn’t been run previously
See get_feature_fit_metadata for information on the available sources.
Parameters: - max_wait : int, optional
The maximum time to wait for a requested feature fit job to complete before erroring
- source : string
The source Feature Fit is retrieved for. One of the values in [FeatureFitMetadata.sources].
Returns: - feature_fit : FeatureFit
The feature fit data.
-
get_or_request_feature_impact
(max_wait=600)¶ Retrieve feature impact for the model, requesting a job if it hasn’t been run previously
Parameters: - max_wait : int, optional
The maximum time to wait for a requested feature impact job to complete before erroring
Returns: - feature_impacts : list of dict
The feature impact data. See get_feature_impact for the exact schema.
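The "retrieve, requesting a job if it hasn’t been run previously" behaviour of the get_or_request_* methods can be sketched generically. This is an assumption about the general pattern, not the client's actual code: `NotComputedError`, `FakeJob`, and `FakeModel` are hypothetical stand-ins for the 404 ClientError, the Job, and the Model, and the max_wait timeout is omitted for brevity.

```python
class NotComputedError(Exception):
    """Hypothetical stand-in for the 404 error raised when nothing is computed."""


def get_or_request(get, request, not_computed=NotComputedError):
    """Return the existing result, or start a job and wait for its result."""
    try:
        return get()
    except not_computed:
        job = request()
        return job.get_result_when_complete()


class FakeJob:
    """Hypothetical job that is already complete."""

    def __init__(self, result):
        self._result = result

    def get_result_when_complete(self):
        return self._result


class FakeModel:
    """Hypothetical model that counts how many jobs were requested."""

    def __init__(self):
        self._impact = None
        self.requests = 0

    def get_feature_impact(self):
        if self._impact is None:
            raise NotComputedError("feature impact has not been computed")
        return self._impact

    def request_feature_impact(self):
        self.requests += 1
        self._impact = [{"featureName": "x", "impactNormalized": 1.0}]
        return FakeJob(self._impact)


model = FakeModel()
first = get_or_request(model.get_feature_impact, model.request_feature_impact)
second = get_or_request(model.get_feature_impact, model.request_feature_impact)
```

The second call finds the result already computed, so only one job is ever requested.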
-
get_parameters
()¶ Retrieve model parameters.
Returns: - ModelParameters
Model parameters for this model.
-
get_pareto_front
()¶ Retrieve the Pareto Front for a Eureqa model.
This method is only supported for Eureqa models.
Returns: - ParetoFront
Model ParetoFront data
-
get_prime_eligibility
()¶ Check if this model can be approximated with DataRobot Prime
Returns: - prime_eligibility : dict
a dict indicating whether a model can be approximated with DataRobot Prime (key can_make_prime) and why it may be ineligible (key message)
-
get_residuals_chart
(source, fallback_to_parent_insights=False)¶ Retrieve model residuals chart for the specified source.
Parameters: - source : str
Residuals chart data source. Check datarobot.enums.CHART_DATA_SOURCE for possible values.
- fallback_to_parent_insights : bool
Optional, if True, this will return residuals chart data for this model’s parent if the residuals chart is not available for this model and the model has a defined parent model. If omitted or False, or there is no parent model, will not attempt to return residuals data from this model’s parent.
Returns: - ResidualsChart
Model residuals chart data
Raises: - ClientError
If the insight is not available for this model
-
get_roc_curve
(source, fallback_to_parent_insights=False)¶ Retrieve model ROC curve for the specified source.
Parameters: - source : str
ROC curve data source. Check datarobot.enums.CHART_DATA_SOURCE for possible values.
- fallback_to_parent_insights : bool
(New in version v2.14) Optional, if True, this will return ROC curve data for this model’s parent if the ROC curve is not available for this model and the model has a defined parent model. If omitted or False, or there is no parent model, will not attempt to return data from this model’s parent.
Returns: - RocCurve
Model ROC curve data
Raises: - ClientError
If the insight is not available for this model
-
get_rulesets
()¶ List the rulesets approximating this model generated by DataRobot Prime
If this model hasn’t been approximated yet, will return an empty list. Note that these are rulesets approximating this model, not rulesets used to construct this model.
Returns: - rulesets : list of Ruleset
-
get_supported_capabilities
()¶ Retrieves a summary of the capabilities supported by a model.
New in version v2.14.
Returns: - supportsBlending: bool
whether the model supports blending
- supportsMonotonicConstraints: bool
whether the model supports monotonic constraints
- hasWordCloud: bool
whether the model has word cloud data available
- eligibleForPrime: bool
whether the model is eligible for Prime
- hasParameters: bool
whether the model has parameters that can be retrieved
- supportsCodeGeneration: bool
(New in version v2.18) whether the model supports code generation
-
get_word_cloud
(exclude_stop_words=False)¶ Retrieve word cloud data for the model.
Parameters: - exclude_stop_words : bool, optional
Set to True if you want stopwords filtered out of response.
Returns: - WordCloud
Word cloud data for the model.
-
open_model_browser
()¶ Opens model at project leaderboard in web browser.
Note: If text-mode browsers are used, the calling process will block until the user exits the browser.
-
request_approximation
()¶ Request an approximation of this model using DataRobot Prime
This will create several rulesets that could be used to approximate this model. After comparing their scores and rule counts, the code used in the approximation can be downloaded and run locally.
Returns: - job : Job
the job generating the rulesets
-
request_feature_effect
()¶ Request feature effects to be computed for the model.
See get_feature_effect for more information on the result of the job.
Returns: - job : Job
A Job representing the feature effect computation. To get the completed feature effect data, use job.get_result or job.get_result_when_complete.
Raises: - JobAlreadyRequested (422)
If the feature effects have already been requested.
-
request_feature_fit
()¶ Request feature fit to be computed for the model.
See get_feature_fit for more information on the result of the job.
Returns: - job : Job
A Job representing the feature fit computation. To get the completed feature fit data, use job.get_result or job.get_result_when_complete.
Raises: - JobAlreadyRequested (422)
If the feature fit has already been requested.
-
request_feature_impact
()¶ Request feature impacts to be computed for the model.
See get_feature_impact for more information on the result of the job.
Returns: - job : Job
A Job representing the feature impact computation. To get the completed feature impact data, use job.get_result or job.get_result_when_complete.
Raises: - JobAlreadyRequested (422)
If the feature impacts have already been requested.
-
request_frozen_datetime_model
(training_row_count=None, training_duration=None, training_start_date=None, training_end_date=None, time_window_sample_pct=None)¶ Train a new frozen model with parameters from this model
Requires that this model belongs to a datetime partitioned project. If it does not, an error will occur when submitting the job.
Frozen models use the same tuning parameters as their parent model instead of independently optimizing them to allow efficiently retraining models on larger amounts of the training data.
In addition to training_row_count and training_duration, frozen datetime models may be trained on an exact date range. Only one of training_row_count, training_duration, or training_start_date and training_end_date should be specified.
Models specified using training_start_date and training_end_date are the only ones that can be trained into the holdout data (once the holdout is unlocked).
Parameters: - training_row_count : int, optional
the number of rows of data that should be used to train the model. If specified, training_duration may not be specified.
- training_duration : str, optional
a duration string specifying what time range the data used to train the model should span. If specified, training_row_count may not be specified.
- training_start_date : datetime.datetime, optional
the start date of the data to train the model on. Only rows occurring at or after this datetime will be used. If training_start_date is specified, training_end_date must also be specified.
- training_end_date : datetime.datetime, optional
the end date of the data to train the model on. Only rows occurring strictly before this datetime will be used. If training_end_date is specified, training_start_date must also be specified.
- time_window_sample_pct : int, optional
may only be specified when the requested model is a time window (e.g. duration or start and end dates). An integer between 1 and 99 indicating the percentage to sample by within the window. The points kept are determined by a random uniform sample. If specified, a time window (training_duration, or training_start_date and training_end_date) must also be specified; otherwise, an error will occur.
Returns: - model_job : ModelJob
the modeling job training a frozen model
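The argument rules for request_frozen_datetime_model (only one of a row count, a duration, or a start/end date pair, with both dates given together) can be checked before submitting a job. The helper below is a hypothetical client-side sketch: `check_training_window` and its return labels are not part of the DataRobot client, and the server remains the authority on what is accepted.

```python
from datetime import datetime


def check_training_window(training_row_count=None, training_duration=None,
                          training_start_date=None, training_end_date=None):
    """Return which data selection was requested, or raise ValueError.

    Hypothetical helper; the returned labels are illustrative only.
    """
    # The two dates are only meaningful as a pair.
    if (training_start_date is None) != (training_end_date is None):
        raise ValueError(
            "training_start_date and training_end_date must be given together"
        )
    chosen = []
    if training_row_count is not None:
        chosen.append("row_count")
    if training_duration is not None:
        chosen.append("duration")
    if training_start_date is not None:
        chosen.append("date_range")
    if len(chosen) > 1:
        raise ValueError("specify only one of: " + ", ".join(chosen))
    return chosen[0] if chosen else None


selection = check_training_window(
    training_start_date=datetime(2019, 1, 1),
    training_end_date=datetime(2019, 7, 1),
)
```

Running the check locally gives a clearer error message than waiting for the job submission to fail.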
-
request_frozen_model
(sample_pct=None, training_row_count=None)¶ Train a new frozen model with parameters from this model
Note
This method only works if the project the model belongs to is not datetime partitioned. If it is, use request_frozen_datetime_model instead.
Frozen models use the same tuning parameters as their parent model instead of independently optimizing them to allow efficiently retraining models on larger amounts of the training data.
Parameters: - sample_pct : float
optional, the percentage of the dataset to use with the model. If not provided, will use the value from this model.
- training_row_count : int
(New in version v2.9) optional, the integer number of rows of the dataset to use with the model. Only one of sample_pct and training_row_count should be specified.
Returns: - model_job : ModelJob
the modeling job training a frozen model
-
request_predictions
(dataset_id, include_prediction_intervals=None, prediction_intervals_size=None, forecast_point=None, predictions_start_date=None, predictions_end_date=None)¶ Request predictions against a previously uploaded dataset
Parameters: - dataset_id : string
The dataset to make predictions against (as uploaded from Project.upload_dataset)
- include_prediction_intervals : bool, optional
(New in v2.16) For time series projects only. Specifies whether prediction intervals should be calculated for this request. Defaults to True if prediction_intervals_size is specified, otherwise defaults to False.
- prediction_intervals_size : int, optional
(New in v2.16) For time series projects only. Represents the percentile to use for the size of the prediction intervals. Defaults to 80 if include_prediction_intervals is True. Prediction intervals size must be between 1 and 100 (inclusive).
- forecast_point : datetime.datetime or None, optional
(New in version v2.20) For time series projects only. This is the default point relative to which predictions will be generated, based on the forecast window of the project. See the time series prediction documentation for more information.
- predictions_start_date : datetime.datetime or None, optional
(New in version v2.20) For time series projects only. The start date for bulk predictions. Note that this parameter is for generating historical predictions using the training data. This parameter should be provided in conjunction with predictions_end_date. Can’t be provided with the forecast_point parameter.
- predictions_end_date : datetime.datetime or None, optional
(New in version v2.20) For time series projects only. The end date for bulk predictions, exclusive. Note that this parameter is for generating historical predictions using the training data. This parameter should be provided in conjunction with predictions_start_date. Can’t be provided with the forecast_point parameter.
Returns: - job : PredictJob
The job computing the predictions
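The documented defaults for the two prediction interval parameters of request_predictions interact, which a small sketch makes explicit. `resolve_prediction_intervals` is a hypothetical helper written only from the parameter descriptions above; it is not how the client or server resolves these values internally.

```python
def resolve_prediction_intervals(include_prediction_intervals=None,
                                 prediction_intervals_size=None):
    """Apply the documented defaults; hypothetical helper, not client code."""
    if prediction_intervals_size is not None:
        if not 1 <= prediction_intervals_size <= 100:
            raise ValueError("prediction_intervals_size must be between 1 and 100")
    # Defaults to True if a size is specified, otherwise False.
    if include_prediction_intervals is None:
        include_prediction_intervals = prediction_intervals_size is not None
    # The size defaults to 80 when intervals are included.
    if include_prediction_intervals and prediction_intervals_size is None:
        prediction_intervals_size = 80
    return include_prediction_intervals, prediction_intervals_size


defaults = resolve_prediction_intervals()
size_only = resolve_prediction_intervals(prediction_intervals_size=90)
flag_only = resolve_prediction_intervals(include_prediction_intervals=True)
```

The source does not say what happens when include_prediction_intervals is False and a size is also given, so the sketch passes that combination through unchanged.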
-
request_training_predictions
(data_subset)¶ Start a job to build training predictions
Parameters: - data_subset : str
data set definition to build predictions on. Choices are:
- dr.enums.DATA_SUBSET.ALL or string all for all data available. Not valid for models in datetime partitioned projects
- dr.enums.DATA_SUBSET.VALIDATION_AND_HOLDOUT or string validationAndHoldout for all data except training set. Not valid for models in datetime partitioned projects
- dr.enums.DATA_SUBSET.HOLDOUT or string holdout for holdout data set only
- dr.enums.DATA_SUBSET.ALL_BACKTESTS or string allBacktests for downloading the predictions for all backtest validation folds. Requires the model to have successfully scored all backtests. Datetime partitioned projects only.
Returns: - Job
an instance of created async job
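Which data_subset strings apply depends on whether the project is datetime partitioned. The lookup below restates the list above as code; `allowed_data_subsets` and the set names are hypothetical and exist only for illustration, while the string values are the ones documented for this parameter.

```python
# String values taken from the data_subset choices documented above.
ALL_SUBSETS = {"all", "validationAndHoldout", "holdout", "allBacktests"}
NOT_FOR_DATETIME = {"all", "validationAndHoldout"}
DATETIME_ONLY = {"allBacktests"}


def allowed_data_subsets(datetime_partitioned):
    """Return the data_subset strings valid for the given project type."""
    if datetime_partitioned:
        allowed = ALL_SUBSETS - NOT_FOR_DATETIME
    else:
        allowed = ALL_SUBSETS - DATETIME_ONLY
    return sorted(allowed)


datetime_choices = allowed_data_subsets(datetime_partitioned=True)
other_choices = allowed_data_subsets(datetime_partitioned=False)
```

Only holdout is valid for both kinds of project.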
-
request_transferable_export
(prediction_intervals_size=None)¶ Request generation of an exportable model file for use in an on-premise DataRobot standalone prediction environment.
This function can only be used if model export is enabled, and will only be useful if you have an on-premise environment in which to import it.
This function does not download the exported file. Use download_export for that.
Parameters: - prediction_intervals_size : int, optional
(New in v2.19) For time series projects only. Represents the percentile to use for the size of the prediction intervals. Prediction intervals size must be between 1 and 100 (inclusive).
Examples
model = datarobot.Model.get('p-id', 'l-id')
job = model.request_transferable_export()
job.wait_for_completion()
model.download_export('my_exported_model.drmodel')
# Client must be configured to use standalone prediction server for import:
datarobot.Client(token='my-token-at-standalone-server', endpoint='standalone-server-url/api/v2')
imported_model = datarobot.ImportedModel.create('my_exported_model.drmodel')
-
retrain
(sample_pct=None, featurelist_id=None, training_row_count=None)¶ Submit a job to the queue to train a blender model.
Parameters: - sample_pct: str, optional
The sample size in percents (1 to 100) to use in training. If this parameter is used then training_row_count should not be given.
- featurelist_id : str, optional
The featurelist id
- training_row_count : str, optional
The number of rows to train the model. If this parameter is used then sample_pct should not be given.
Returns: - job : ModelJob
The created job that is retraining the model
-
set_prediction_threshold
(threshold)¶ Set a custom prediction threshold for the model
May not be used once prediction_threshold_read_only is True for this model.
Parameters: - threshold : float
only used for binary classification projects. The threshold to use when deciding between the positive and negative classes when making predictions. Should be between 0.0 and 1.0 (inclusive).
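To make the role of the prediction threshold concrete, here is a small sketch of turning positive-class probabilities into class decisions. It is an illustration only: `classify` is a hypothetical helper, and treating a probability exactly equal to the threshold as positive is an assumption, since the source does not say how ties are handled.

```python
def classify(probabilities, threshold=0.5):
    """Label each positive-class probability as positive (True) or negative.

    Hypothetical helper; the tie rule (>=) is an assumption.
    """
    if not 0.0 <= threshold <= 1.0:
        raise ValueError("threshold should be between 0.0 and 1.0 (inclusive)")
    return [p >= threshold for p in probabilities]


probabilities = [0.2, 0.5, 0.9]
default_labels = classify(probabilities)
strict_labels = classify(probabilities, threshold=0.8)
```

Raising the threshold moves borderline rows from the positive class to the negative class without changing the underlying probabilities.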
-
star_model
()¶ Mark the model as starred
Model stars propagate to the web application and the API, and can be used to filter when listing models.
-
start_advanced_tuning_session
()¶ Start an Advanced Tuning session. Returns an object that helps set up arguments for an Advanced Tuning model execution.
As of v2.17, all models other than blenders, open source, prime, scaleout, baseline and user-created support Advanced Tuning.
Returns: - AdvancedTuningSession
Session for setting up and running Advanced Tuning on a model
-
train
(sample_pct=None, featurelist_id=None, scoring_type=None, training_row_count=None, monotonic_increasing_featurelist_id=<object object>, monotonic_decreasing_featurelist_id=<object object>)¶ Train the blueprint used in model on a particular featurelist or amount of data.
This method creates a new training job for a worker and appends it to the end of the queue for this project. After the job has finished you can get the newly trained model by retrieving it from the project leaderboard, or by retrieving the result of the job.
Either sample_pct or training_row_count can be used to specify the amount of data to use, but not both. If neither are specified, a default of the maximum amount of data that can safely be used to train any blueprint without going into the validation data will be selected.
In smart-sampled projects, sample_pct and training_row_count are assumed to be in terms of rows of the minority class.
Note
For datetime partitioned projects, see train_datetime instead.
Parameters: - sample_pct : float, optional
The amount of data to use for training, as a percentage of the project dataset from 0 to 100.
- featurelist_id : str, optional
The identifier of the featurelist to use. If not defined, the featurelist of this model is used.
- scoring_type : str, optional
Either SCORING_TYPE.validation or SCORING_TYPE.cross_validation. SCORING_TYPE.validation is available for every partitioning type, and indicates that the default model validation should be used for the project. If the project uses a form of cross-validation partitioning, SCORING_TYPE.cross_validation can also be used to indicate that all of the available training/validation combinations should be used to evaluate the model.
- training_row_count : int, optional
The number of rows to use to train the requested model.
- monotonic_increasing_featurelist_id : str
(new in version 2.11) optional, the id of the featurelist that defines the set of features with a monotonically increasing relationship to the target. Passing None disables increasing monotonicity constraint. Default (dr.enums.MONOTONICITY_FEATURELIST_DEFAULT) is the one specified by the blueprint.
- monotonic_decreasing_featurelist_id : str
(new in version 2.11) optional, the id of the featurelist that defines the set of features with a monotonically decreasing relationship to the target. Passing None disables decreasing monotonicity constraint. Default (dr.enums.MONOTONICITY_FEATURELIST_DEFAULT) is the one specified by the blueprint.
Returns: - model_job_id : str
id of created job, can be used as parameter to the ModelJob.get method or the wait_for_async_model_creation function
Examples
project = Project.get('p-id')
model = Model.get('p-id', 'l-id')
model_job_id = model.train(training_row_count=project.max_train_rows)
-
train_datetime
(featurelist_id=None, training_row_count=None, training_duration=None, time_window_sample_pct=None, monotonic_increasing_featurelist_id=<object object>, monotonic_decreasing_featurelist_id=<object object>, use_project_settings=False)¶ Train this model on a different featurelist or amount of data
Requires that this model is part of a datetime partitioned project; otherwise, an error will occur.
Parameters: - featurelist_id : str, optional
the featurelist to use to train the model. If not specified, the featurelist of this model is used.
- training_row_count : int, optional
the number of rows of data that should be used to train the model. If specified, neither training_duration nor use_project_settings may be specified.
- training_duration : str, optional
a duration string specifying what time range the data used to train the model should span. If specified, neither training_row_count nor use_project_settings may be specified.
- use_project_settings : bool, optional
(New in version v2.20) defaults to False. If True, indicates that the custom backtest partitioning settings specified by the user will be used to train the model and evaluate backtest scores. If specified, neither training_row_count nor training_duration may be specified.
- time_window_sample_pct : int, optional
may only be specified when the requested model is a time window (e.g. duration or start and end dates). An integer between 1 and 99 indicating the percentage to sample by within the window. The points kept are determined by a random uniform sample. If specified, training_duration must also be specified; otherwise, an error will occur.
- monotonic_increasing_featurelist_id : str, optional
(New in version v2.18) optional, the id of the featurelist that defines the set of features with a monotonically increasing relationship to the target. Passing None disables increasing monotonicity constraint. Default (dr.enums.MONOTONICITY_FEATURELIST_DEFAULT) is the one specified by the blueprint.
- monotonic_decreasing_featurelist_id : str, optional
(New in version v2.18) optional, the id of the featurelist that defines the set of features with a monotonically decreasing relationship to the target. Passing None disables decreasing monotonicity constraint. Default (dr.enums.MONOTONICITY_FEATURELIST_DEFAULT) is the one specified by the blueprint.
Returns: - job : ModelJob
the created job to build the model
-
unstar_model
()¶ Unmark the model as starred
Model stars propagate to the web application and the API, and can be used to filter when listing models.
DatetimeModel¶
-
class
datarobot.models.
DatetimeModel
(id=None, processes=None, featurelist_name=None, featurelist_id=None, project_id=None, sample_pct=None, training_row_count=None, training_duration=None, training_start_date=None, training_end_date=None, time_window_sample_pct=None, model_type=None, model_category=None, is_frozen=None, blueprint_id=None, metrics=None, training_info=None, holdout_score=None, holdout_status=None, data_selection_method=None, backtests=None, monotonic_increasing_featurelist_id=None, monotonic_decreasing_featurelist_id=None, supports_monotonic_constraints=None, is_starred=None, prediction_threshold=None, prediction_threshold_read_only=None, effective_feature_derivation_window_start=None, effective_feature_derivation_window_end=None, forecast_window_start=None, forecast_window_end=None, windows_basis_unit=None, model_number=None, parent_model_id=None, use_project_settings=None)¶ A model from a datetime partitioned project
Only one of training_row_count, training_duration, and training_start_date and training_end_date will be specified, depending on the data_selection_method of the model. Whichever method was selected determines the amount of data used to train on when making predictions and scoring the backtests and the holdout.
Attributes: - id : str
the id of the model
- project_id : str
the id of the project the model belongs to
- processes : list of str
the processes used by the model
- featurelist_name : str
the name of the featurelist used by the model
- featurelist_id : str
the id of the featurelist used by the model
- sample_pct : float
the percentage of the project dataset used in training the model
- training_row_count : int or None
If specified, an int specifying the number of rows used to train the model and evaluate backtest scores.
- training_duration : str or None
If specified, a duration string specifying the duration spanned by the data used to train the model and evaluate backtest scores.
- training_start_date : datetime or None
only present for frozen models in datetime partitioned projects. If specified, the start date of the data used to train the model.
- training_end_date : datetime or None
only present for frozen models in datetime partitioned projects. If specified, the end date of the data used to train the model.
- time_window_sample_pct : int or None
An integer between 1 and 99 indicating the percentage of sampling within the training window. The points kept are determined by a random uniform sample. If not specified, no sampling was done.
- model_type : str
what model this is, e.g. ‘Nystroem Kernel SVM Regressor’
- model_category : str
what kind of model this is - ‘prime’ for DataRobot Prime models, ‘blend’ for blender models, and ‘model’ for other models
- is_frozen : bool
whether this model is a frozen model
- blueprint_id : str
the id of the blueprint used in this model
- metrics : dict
a mapping from each metric to the model’s scores for that metric. The keys in metrics are the different metrics used to evaluate the model, and the values are the results. The dictionaries inside of metrics will be as described here: ‘validation’, the score for a single backtest; ‘crossValidation’, always None; ‘backtesting’, the average score for all backtests if all are available and computed, or None otherwise; ‘backtestingScores’, a list of scores for all backtests where the score is None if that backtest does not have a score available; and ‘holdout’, the score for the holdout or None if the holdout is locked or the score is unavailable.
- backtests : list of dict
describes what data was used to fit each backtest, the score for the project metric, and why the backtest score is unavailable if it is not provided.
- data_selection_method : str
which of training_row_count, training_duration, or training_start_date and training_end_date were used to determine the data used to fit the model. One of ‘rowCount’, ‘duration’, or ‘selectedDateRange’.
- training_info : dict
describes which data was used to train on when scoring the holdout and making predictions. training_info will have the following keys: holdout_training_start_date, holdout_training_duration, holdout_training_row_count, holdout_training_end_date, prediction_training_start_date, prediction_training_duration, prediction_training_row_count, prediction_training_end_date. Start and end dates will be datetimes, durations will be duration strings, and rows will be integers.
- holdout_score : float or None
the score against the holdout, if available and the holdout is unlocked, according to the project metric.
- holdout_status : string or None
the status of the holdout score, e.g. “COMPLETED”, “HOLDOUT_BOUNDARIES_EXCEEDED”. Unavailable if the holdout fold was disabled in the partitioning configuration.
- monotonic_increasing_featurelist_id : str
optional, the id of the featurelist that defines the set of features with a monotonically increasing relationship to the target. If None, no such constraints are enforced.
- monotonic_decreasing_featurelist_id : str
optional, the id of the featurelist that defines the set of features with a monotonically decreasing relationship to the target. If None, no such constraints are enforced.
- supports_monotonic_constraints : bool
optional, whether this model supports enforcing monotonic constraints
- is_starred : bool
whether this model is marked as starred
- prediction_threshold : float
for binary classification projects, the threshold used for predictions
- prediction_threshold_read_only : bool
indicates whether modification of the prediction threshold is forbidden. Threshold modification is forbidden once a model has had a deployment created or predictions made via the dedicated prediction API.
- effective_feature_derivation_window_start : int or None
(New in v2.16) For time series projects only. How many units of the windows_basis_unit into the past relative to the forecast point the user needs to provide history for at prediction time. This can differ from the feature_derivation_window_start set on the project due to the differencing method and period selected, or if the model is a time series native model such as ARIMA. Will be a negative integer in time series projects and None otherwise.
- effective_feature_derivation_window_end : int or None
(New in v2.16) For time series projects only. How many units of the windows_basis_unit into the past relative to the forecast point the feature derivation window should end. Will be a non-positive integer in time series projects and None otherwise.
- forecast_window_start : int or None
(New in v2.16) For time series projects only. How many units of the windows_basis_unit into the future relative to the forecast point the forecast window should start. Note that this field will be the same as what is shown in the project settings. Will be a non-negative integer in time series projects and None otherwise.
- forecast_window_end : int or None
(New in v2.16) For time series projects only. How many units of the windows_basis_unit into the future relative to the forecast point the forecast window should end. Note that this field will be the same as what is shown in the project settings. Will be a non-negative integer in time series projects and None otherwise.
- windows_basis_unit : str or None
(New in v2.16) For time series projects only. Indicates which unit is the basis for the feature derivation window and the forecast window. Note that this field will be the same as what is shown in the project settings. In time series projects, will be either the detected time unit or “ROW”, and None otherwise.
- model_number : integer
model number assigned to a model
- parent_model_id : str or None
(New in version v2.20) the id of the model that tuning parameters are derived from
- use_project_settings : bool or None
(New in version v2.20) If True, indicates that the custom backtest partitioning settings specified by the user were used to train the model and evaluate backtest scores.
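As a rough illustration of the metrics structure described in the attribute list above, the sketch below picks the first non-None score for one metric from a plain dict shaped like the documented mapping. The helper name best_available_score, the sample values, and the preference order (backtest average first, then the single-backtest validation score) are assumptions made for this example, not library behavior.

```python
def best_available_score(metrics, metric_name):
    """Return (partition, score) for the first non-None score, or None."""
    scores = metrics[metric_name]
    # Prefer the average over all backtests, then the single-backtest score.
    for partition in ("backtesting", "validation"):
        value = scores.get(partition)
        if value is not None:
            return partition, value
    return None


# Hypothetical values shaped like the documented structure.
example_metrics = {
    "RMSE": {
        "validation": 12.5,
        "crossValidation": None,  # always None for datetime models
        "backtesting": None,      # not every backtest has been scored
        "backtestingScores": [12.5, None, None],
        "holdout": None,          # holdout locked or score unavailable
    }
}

print(best_available_score(example_metrics, "RMSE"))
```

With a real model, the same helper would be given model.metrics and the name of the metric of interest.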
-
classmethod
get
(project, model_id)¶ Retrieve a specific datetime model
If the project does not use datetime partitioning, a ClientError will occur.
Parameters: - project : str
the id of the project the model belongs to
- model_id : str
the id of the model to retrieve
Returns: - model : DatetimeModel
the model
-
score_backtests
()¶ Compute the scores for all available backtests
Some backtests may be unavailable if the model is trained into their validation data.
Returns: - job : Job
a job tracking the backtest computation. When it is complete, all available backtests will have scores computed.
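A minimal sketch of how score_backtests might be combined with the metrics attribute: scoring is requested only when the averaged ‘backtesting’ score for a metric is still None. The helper name is hypothetical, and the sketch assumes the returned Job exposes wait_for_completion(max_wait=...). Note that ‘backtesting’ can also stay None when some backtests are unavailable, so a None value does not guarantee that scoring will fill it in.

```python
def score_backtests_if_needed(model, metric_name, max_wait=600):
    """Request backtest scoring unless the backtest average already exists.

    Returns the finished job, or None when nothing was requested.
    """
    if model.metrics[metric_name]["backtesting"] is not None:
        return None  # all backtests are already scored for this metric
    job = model.score_backtests()
    # Assumed Job method: blocks until the job finishes or max_wait elapses.
    job.wait_for_completion(max_wait=max_wait)
    return job
```

The model object passed in is not refreshed by this helper; retrieving it again with DatetimeModel.get(project, model_id) is the documented way to obtain a fresh copy.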
-
cross_validate
()¶ Inherited from Model - DatetimeModels cannot request Cross Validation.
Use score_backtests instead.
-
get_cross_validation_scores
(partition=None, metric=None)¶ Inherited from Model - DatetimeModels cannot request Cross Validation scores.
Use backtests instead.
-
request_training_predictions
(data_subset)¶ Start a job to build training predictions
Parameters: - data_subset : str
data set definition to build predictions on. Choices are:
- dr.enums.DATA_SUBSET.HOLDOUT for holdout data set only
- dr.enums.DATA_SUBSET.ALL_BACKTESTS for downloading the predictions for all backtest validation folds. Requires the model to have successfully scored all backtests.
Returns: - Job
an instance of the created async job
-
get_series_accuracy_as_dataframe
(offset=0, limit=100, metric=None, multiseries_value=None, order_by=None, reverse=False)¶ Retrieve the Series Accuracy for the specified model as a pandas.DataFrame.
Parameters: - offset : int, optional
The number of results to skip. Defaults to 0 if not specified.
- limit : int, optional
The maximum number of results to return. Defaults to 100 if not specified.
- metric : str, optional
The name of the metric to retrieve scores for. If omitted, the default project metric will be used.
- multiseries_value : str, optional
If specified, only the series containing the given value in one of the series ID columns will be returned.
- order_by : str, optional
Used for sorting the series. Attribute must be one of datarobot.enums.SERIES_ACCURACY_ORDER_BY.
- reverse : bool, optional
Used for sorting the series. If True, will sort the series in descending order by the attribute specified by order_by.
Returns: - data
A pandas.DataFrame with the Series Accuracy for the specified model.
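Because offset and limit page through the results, collecting every series takes a loop. The sketch below is one way to do that; the helper name is hypothetical, and it assumes each returned DataFrame holds one row per series so that len(page) counts the results on that page.

```python
def fetch_series_accuracy_pages(model, page_size=100, **filters):
    """Collect every page of series accuracy by advancing the offset."""
    pages = []
    offset = 0
    while True:
        page = model.get_series_accuracy_as_dataframe(
            offset=offset, limit=page_size, **filters
        )
        if len(page) == 0:
            break  # nothing left to fetch
        pages.append(page)
        if len(page) < page_size:
            break  # a short page is the last page
        offset += page_size
    return pages
```

When at least one page comes back, pandas.concat(pages, ignore_index=True) would join the pages into a single frame. Extra keyword arguments such as metric or order_by are passed through unchanged.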
-
download_series_accuracy_as_csv
(filename, encoding='utf-8', offset=0, limit=100, metric=None, multiseries_value=None, order_by=None, reverse=False)¶ Save the Series Accuracy for the specified model into a csv file.
Parameters: - filename : str or file object
The path or file object to save the data to.
- encoding : str, optional
A string representing the encoding to use in the output csv file. Defaults to ‘utf-8’.
- offset : int, optional
The number of results to skip. Defaults to 0 if not specified.
- limit : int, optional
The maximum number of results to return. Defaults to 100 if not specified.
- metric : str, optional
The name of the metric to retrieve scores for. If omitted, the default project metric will be used.
- multiseries_value : str, optional
If specified, only the series containing the given value in one of the series ID columns will be returned.
- order_by : str, optional
Used for sorting the series. Attribute must be one of datarobot.enums.SERIES_ACCURACY_ORDER_BY.
- reverse : bool, optional
Used for sorting the series. If True, will sort the series in descending order by the attribute specified by order_by.
-
compute_series_accuracy
()¶ Compute the Series Accuracy for this model
Returns: - Job
an instance of the created async job
-
retrain
(time_window_sample_pct=None, featurelist_id=None, training_row_count=None, training_duration=None, training_start_date=None, training_end_date=None)¶ Submit a job to the queue to retrain this model.
Parameters: - featurelist_id : str, optional
The featurelist id
- training_row_count : str, optional
The number of rows to train the model. If this parameter is used then sample_pct should not be given.
- time_window_sample_pct : int, optional
An int between 1 and 99 indicating the percentage of sampling within the time window. The points kept are determined by a random uniform sample. If specified, training_row_count must not be specified and training_duration or training_start_date and training_end_date must be specified.
- training_duration : str, optional
A duration string representing the training duration for the submitted model. If specified then training_row_count must not be specified.
- training_start_date : str, optional
A datetime string representing the start date of the data to use for training this model. If specified, training_end_date must also be specified. The value must be before the training_end_date value.
- training_end_date : str, optional
A datetime string representing the end date of the data to use for training this model. If specified, training_start_date must also be specified. The value must be after the training_start_date value.
Returns: - job : ModelJob
The created job that is retraining the model
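The parameter descriptions above imply several combinations that cannot be used together. The following sketch checks them locally before a call to retrain; it is a hypothetical helper written for illustration, not part of the client, and the server remains the authority on what is accepted. It does not compare the two dates, since they are passed as strings.

```python
def check_retrain_arguments(
    time_window_sample_pct=None,
    training_row_count=None,
    training_duration=None,
    training_start_date=None,
    training_end_date=None,
):
    """Return a list of problems; an empty list means none were found."""
    problems = []
    dates_given = [d is not None for d in (training_start_date, training_end_date)]
    if any(dates_given) and not all(dates_given):
        problems.append("training_start_date and training_end_date must be given together")
    if training_duration is not None and training_row_count is not None:
        problems.append("training_duration cannot be combined with training_row_count")
    if time_window_sample_pct is not None:
        if not 1 <= time_window_sample_pct <= 99:
            problems.append("time_window_sample_pct must be between 1 and 99")
        if training_row_count is not None:
            problems.append("time_window_sample_pct cannot be combined with training_row_count")
        if training_duration is None and not any(dates_given):
            problems.append("time_window_sample_pct requires a duration or a start/end date pair")
    return problems


# "P30D" is a made-up duration string used only for this example.
print(check_retrain_arguments(training_duration="P30D", time_window_sample_pct=50))
```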
-
get_feature_effect_metadata
()¶ Retrieve Feature Effect metadata for each backtest. Response contains status and available sources for each backtest of the model.
- Each backtest is available for training and validation.
- If holdout is configured for the project, it has holdout as backtestIndex, with training and holdout sources available.
Start/stop models contain a single response item with startstop as the value of backtestIndex.
- Feature Effect for training is always available (except for older projects, which support Feature Effect for validation only).
- When a model is trained into validation or holdout without stacked predictions (i.e. no out-of-sample predictions in validation or holdout), Feature Effect is not available for validation or holdout.
- Feature Effect for holdout is not available when no holdout is configured for the project.
source is an expected parameter when retrieving Feature Effect; use one of the provided sources.
backtestIndex is an expected parameter when submitting a compute request and when retrieving Feature Effect; use one of the provided backtest indexes.
Returns: - feature_effect_metadata: FeatureEffectMetadataDatetime
-
get_feature_fit_metadata
()¶ Retrieve Feature Fit metadata for each backtest. Response contains status and available sources for each backtest of the model.
- Each backtest is available for training and validation.
- If holdout is configured for the project, it has holdout as backtestIndex, with training and holdout sources available.
Start/stop models contain a single response item with startstop as the value of backtestIndex.
- Feature Fit for training is always available (except for older projects, which support Feature Fit for validation only).
- When a model is trained into validation or holdout without stacked predictions (i.e. no out-of-sample predictions in validation or holdout), Feature Fit is not available for validation or holdout.
- Feature Fit for holdout is not available when no holdout is configured for the project.
source is an expected parameter when retrieving Feature Fit; use one of the provided sources.
backtestIndex is an expected parameter when submitting a compute request and when retrieving Feature Fit; use one of the provided backtest indexes.
Returns: - feature_fit_metadata: FeatureFitMetadataDatetime
-
request_feature_effect
(backtest_index)¶ Request feature effects to be computed for the model.
See get_feature_effect for more information on the result of the job.
See get_feature_effect_metadata for information on the available backtest_index values.
Parameters: - backtest_index: string, FeatureEffectMetadataDatetime.backtest_index.
The backtest index to retrieve Feature Effects for.
Returns: - job : Job
A Job representing the feature effect computation. To get the completed feature effect data, use job.get_result or job.get_result_when_complete.
Raises: - JobAlreadyRequested (422)
If the feature effect has already been requested.
-
get_feature_effect
(source, backtest_index)¶ Retrieve Feature Effects for the model.
Feature Effects provides partial dependence and predicted vs actual values for top-500 features ordered by feature impact score.
The partial dependence shows marginal effect of a feature on the target variable after accounting for the average effects of all other predictive features. It indicates how, holding all other variables except the feature of interest as they were, the value of this feature affects your prediction.
Requires that Feature Effects has already been computed with request_feature_effect.
See get_feature_effect_metadata for information on the available source and backtest_index values.
Parameters: - source: string
The source Feature Effects are retrieved for. One value of [FeatureEffectMetadataDatetime.sources]. Use get_feature_effect_metadata to retrieve the available sources for feature effect.
- backtest_index: string, FeatureEffectMetadataDatetime.backtest_index.
The backtest index to retrieve Feature Effects for.
Returns: - feature_effects: FeatureEffects
The feature effects data.
Raises: - ClientError (404)
If the feature effects have not been computed or source is not a valid value.
-
get_or_request_feature_effect
(source, backtest_index, max_wait=600)¶ Retrieve feature effect for the model, requesting a job if it hasn’t been run previously
See get_feature_effect_metadata for information on the available source and backtest_index values.
Parameters: - max_wait : int, optional
The maximum time to wait for a requested feature effect job to complete before erroring
- source : string
The source Feature Effects are retrieved for. One value of [FeatureEffectMetadataDatetime.sources]. Use get_feature_effect_metadata to retrieve the available sources for feature effect.
- backtest_index: string, FeatureEffectMetadataDatetime.backtest_index.
The backtest index to retrieve Feature Effects for.
Returns: - feature_effects : FeatureEffects
The feature effects data.
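The metadata and retrieval calls are meant to be used together: the metadata says which source and backtest_index values are valid, and those values are then passed to get_or_request_feature_effect. A minimal sketch follows. It assumes the metadata object exposes an iterable data attribute whose entries carry backtest_index and sources; that layout is an assumption of this example and is not confirmed by the text above. The helper name is hypothetical.

```python
def feature_effects_by_backtest(model, source="validation", max_wait=600):
    """Map each backtest index offering `source` to its Feature Effects."""
    metadata = model.get_feature_effect_metadata()
    results = {}
    # ASSUMPTION: metadata.data is a list of per-backtest entries.
    for entry in metadata.data:
        if source not in entry.sources:
            continue  # e.g. the holdout entry has no validation source
        results[entry.backtest_index] = model.get_or_request_feature_effect(
            source=source,
            backtest_index=entry.backtest_index,
            max_wait=max_wait,
        )
    return results
```

Requests run one after another, so the total wait can approach max_wait multiplied by the number of backtests.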
-
request_feature_fit
(backtest_index)¶ Request feature fit to be computed for the model.
See get_feature_fit for more information on the result of the job.
See get_feature_fit_metadata for information on the available backtest_index values.
Parameters: - backtest_index: string, FeatureFitMetadataDatetime.backtest_index.
The backtest index to retrieve Feature Fit for.
Returns: - job : Job
A Job representing the feature fit computation. To get the completed feature fit data, use job.get_result or job.get_result_when_complete.
Raises: - JobAlreadyRequested (422)
If the feature fit has already been requested.
-
get_feature_fit
(source, backtest_index)¶ Retrieve Feature Fit for the model.
Feature Fit provides partial dependence and predicted vs actual values for top-500 features ordered by feature impact score.
The partial dependence shows marginal effect of a feature on the target variable after accounting for the average effects of all other predictive features. It indicates how, holding all other variables except the feature of interest as they were, the value of this feature affects your prediction.
Requires that Feature Fit has already been computed with request_feature_fit.
See get_feature_fit_metadata for information on the available source and backtest_index values.
Parameters: - source: string
The source Feature Fit is retrieved for. One value of [FeatureFitMetadataDatetime.sources]. Use get_feature_fit_metadata to retrieve the available sources for feature fit.
- backtest_index: string, FeatureFitMetadataDatetime.backtest_index.
The backtest index to retrieve Feature Fit for.
Returns: - feature_fit: FeatureFit
The feature fit data.
Raises: - ClientError (404)
If the feature fit has not been computed or source is not a valid value.
-
get_or_request_feature_fit
(source, backtest_index, max_wait=600)¶ Retrieve feature fit for the model, requesting a job if it hasn’t been run previously
See get_feature_fit_metadata for information on the available source and backtest_index values.
Parameters: - max_wait : int, optional
The maximum time to wait for a requested feature fit job to complete before erroring
- source : string
The source Feature Fit is retrieved for. One value of [FeatureFitMetadataDatetime.sources]. Use get_feature_fit_metadata to retrieve the available sources for feature fit.
- backtest_index: string, FeatureFitMetadataDatetime.backtest_index.
The backtest index to retrieve Feature Fit for.
Returns: - feature_fit : FeatureFit
The feature fit data.
-
calculate_prediction_intervals
(prediction_intervals_size)¶ Calculate prediction intervals for this DatetimeModel for the specified size.
New in version v2.19.
Parameters: - prediction_intervals_size : int
The prediction intervals size to calculate for this model. See the prediction intervals documentation for more information.
Returns: - job : Job
a Job tracking the prediction intervals computation
-
get_calculated_prediction_intervals
(offset=None, limit=None)¶ Retrieve a list of already-calculated prediction intervals for this model
New in version v2.19.
Parameters: - offset : int, optional
If provided, this many results will be skipped
- limit : int, optional
If provided, at most this many results will be returned. If not provided, will return at most 100 results.
Returns: - list[int]
A descending-ordered list of already-calculated prediction interval sizes
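The two prediction interval methods pair naturally: list what has already been calculated, then request only the missing sizes. The sketch below shows that pattern with a hypothetical helper name. It relies on the default listing, which returns at most 100 results.

```python
def request_missing_prediction_intervals(model, sizes):
    """Start a calculation job for each size not yet calculated.

    Returns a dict mapping each newly requested size to its job.
    """
    already_calculated = set(model.get_calculated_prediction_intervals())
    return {
        size: model.calculate_prediction_intervals(size)
        for size in sizes
        if size not in already_calculated
    }
```

An empty dict means every requested size was already available.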
-
advanced_tune
(params, description=None)¶ Generate a new model with the specified advanced-tuning parameters
As of v2.17, all models other than blenders, open source, prime, scaleout, baseline and user-created support Advanced Tuning.
Parameters: - params : dict
Mapping of parameter ID to parameter value. The list of valid parameter IDs for a model can be found by calling get_advanced_tuning_parameters(). This endpoint does not need to include values for all parameters. If a parameter is omitted, its current_value will be used.
- description : unicode
Human-readable string describing the newly advanced-tuned model
Returns: - ModelJob
The created job to build the model
-
delete
()¶ Delete a model from the project’s leaderboard.
-
download_export
(filepath)¶ Download an exportable model file for use in an on-premise DataRobot standalone prediction environment.
This function can only be used if model export is enabled, and will only be useful if you have an on-premise environment in which to import it.
Parameters: - filepath : str
The path at which to save the exported model file.
-
download_scoring_code
(file_name, source_code=False)¶ Download scoring code JAR.
Parameters: - file_name : str
File path where scoring code will be saved.
- source_code : bool, optional
Set to True to download source code archive. It will not be executable.
-
classmethod
fetch_resource_data
(url, join_endpoint=True)¶ (Deprecated.) Used to acquire model data directly from its url.
Consider using get instead, as this is a convenience function used for development of datarobot
Parameters: - url : str
The resource we are acquiring
- join_endpoint : boolean, optional
Whether the client’s endpoint should be joined to the URL before sending the request. Location headers are returned as absolute locations, so will _not_ need the endpoint
Returns: - model_data : dict
The queried model’s data
-
get_advanced_tuning_parameters
()¶ Get the advanced-tuning parameters available for this model.
As of v2.17, all models other than blenders, open source, prime, scaleout, baseline and user-created support Advanced Tuning.
Returns: - dict
A dictionary describing the advanced-tuning parameters for the current model. There are two top-level keys, tuningDescription and tuningParameters.
tuningDescription is an optional value. If not None, it is the user-specified description of this set of tuning parameters.
tuningParameters is a list of dicts, each with the following keys:
- parameterName : (unicode) name of the parameter (unique per task, see below)
- parameterId : (unicode) opaque ID string uniquely identifying parameter
- defaultValue : (*) default value of the parameter for the blueprint
- currentValue : (*) value of the parameter that was used for this model
- taskName : (unicode) name of the task that this parameter belongs to
- constraints: (dict) see the notes below
Notes
The type of defaultValue and currentValue is defined by the constraints structure. It will be a string or numeric Python type.
constraints is a dict with at least one, possibly more, of the following keys. The presence of a key indicates that the parameter may take on the specified type. (If a key is absent, this means that the parameter may not take on the specified type.) If a key on constraints is present, its value will be a dict containing all of the fields described below for that key.
"constraints": {
    "select": {
        "values": [<list(basestring or number) : possible values>]
    },
    "ascii": {},
    "unicode": {},
    "int": {
        "min": <int : minimum valid value>,
        "max": <int : maximum valid value>,
        "supports_grid_search": <bool : True if Grid Search may be requested for this param>
    },
    "float": {
        "min": <float : minimum valid value>,
        "max": <float : maximum valid value>,
        "supports_grid_search": <bool : True if Grid Search may be requested for this param>
    },
    "intList": {
        "min_length": <int : minimum valid length>,
        "max_length": <int : maximum valid length>,
        "min_val": <int : minimum valid value>,
        "max_val": <int : maximum valid value>,
        "supports_grid_search": <bool : True if Grid Search may be requested for this param>
    },
    "floatList": {
        "min_length": <int : minimum valid length>,
        "max_length": <int : maximum valid length>,
        "min_val": <float : minimum valid value>,
        "max_val": <float : maximum valid value>,
        "supports_grid_search": <bool : True if Grid Search may be requested for this param>
    }
}
The keys have meaning as follows:
- select: Rather than specifying a specific data type, if present, it indicates that the parameter is permitted to take on any of the specified values. Listed values may be of any string or real (non-complex) numeric type.
- ascii: The parameter may be a unicode object that encodes simple ASCII characters. (A-Z, a-z, 0-9, whitespace, and certain common symbols.) In addition to listed constraints, ASCII keys currently may not contain either newlines or semicolons.
- unicode: The parameter may be any Python unicode object.
- int: The value may be an object of type int within the specified range (inclusive). Please note that the value will be passed around using the JSON format, and some JSON parsers have undefined behavior with integers outside of the range [-(2**53)+1, (2**53)-1].
- float: The value may be an object of type float within the specified range (inclusive).
- intList, floatList: The value may be a list of int or float objects, respectively, following constraints as specified respectively by the int and float types (above).
Many parameters only specify one key under constraints. If a parameter specifies multiple keys, the parameter may take on any value permitted by any key.
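To make the "any value permitted by any key" rule concrete, here is a sketch of a local check against a constraints dict. It is a hypothetical helper, not part of the client, and it simplifies in three ways that are assumptions of this example: the intList and floatList keys are not handled, an int is not accepted for a float-only constraint, and the ASCII check uses str.isascii(), which is looser than the character set described above.

```python
def _in_range(value, bounds):
    # Both ends of the documented range are inclusive.
    return bounds["min"] <= value <= bounds["max"]


def value_satisfies_constraints(value, constraints):
    """Return True if at least one constraint key permits the value."""
    is_bool = isinstance(value, bool)
    if "select" in constraints and value in constraints["select"]["values"]:
        return True
    if "int" in constraints and isinstance(value, int) and not is_bool:
        if _in_range(value, constraints["int"]):
            return True
    if "float" in constraints and isinstance(value, float):
        if _in_range(value, constraints["float"]):
            return True
    if isinstance(value, str):
        if "unicode" in constraints:
            return True
        if "ascii" in constraints and value.isascii():
            # ASCII values may not contain newlines or semicolons.
            return "\n" not in value and ";" not in value
    return False


# Made-up constraints: either the literal "auto" or an int from 1 to 10.
example = {
    "select": {"values": ["auto"]},
    "int": {"min": 1, "max": 10, "supports_grid_search": False},
}
print(value_satisfies_constraints(5, example))
print(value_satisfies_constraints(11, example))
```

Values that pass such a check could then be placed in the params dict for advanced_tune, keyed by parameterId.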
-
get_all_confusion_charts
(fallback_to_parent_insights=False)¶ Retrieve a list of all confusion charts available for the model.
Parameters: - fallback_to_parent_insights : bool
(New in version v2.14) Optional, if True, this will return confusion chart data for this model’s parent for any source that is not available for this model and if this has a defined parent model. If omitted or False, or this model has no parent, this will not attempt to retrieve any data from this model’s parent.
Returns: - list of ConfusionChart
Data for all available confusion charts for model.
-
get_all_lift_charts
(fallback_to_parent_insights=False)¶ Retrieve a list of all lift charts available for the model.
Parameters: - fallback_to_parent_insights : bool
(New in version v2.14) Optional, if True, this will return lift chart data for this model’s parent for any source that is not available for this model and if this model has a defined parent model. If omitted or False, or this model has no parent, this will not attempt to retrieve any data from this model’s parent.
Returns: - list of LiftChart
Data for all available model lift charts.
-
get_all_residuals_charts
(fallback_to_parent_insights=False)¶ Retrieve a list of all residuals charts available for the model.
Parameters: - fallback_to_parent_insights : bool
Optional, if True, this will return residuals chart data for this model’s parent for any source that is not available for this model and if this model has a defined parent model. If omitted or False, or this model has no parent, this will not attempt to retrieve any data from this model’s parent.
Returns: - list of ResidualsChart
Data for all available model residuals charts.
-
get_all_roc_curves
(fallback_to_parent_insights=False)¶ Retrieve a list of all ROC curves available for the model.
Parameters: - fallback_to_parent_insights : bool
(New in version v2.14) Optional, if True, this will return ROC curve data for this model’s parent for any source that is not available for this model and if this model has a defined parent model. If omitted or False, or this model has no parent, this will not attempt to retrieve any data from this model’s parent.
Returns: - list of RocCurve
Data for all available model ROC curves.
-
get_confusion_chart
(source, fallback_to_parent_insights=False)¶ Retrieve model’s confusion chart for the specified source.
Parameters: - source : str
Confusion chart source. Check datarobot.enums.CHART_DATA_SOURCE for possible values.
- fallback_to_parent_insights : bool
(New in version v2.14) Optional, if True, this will return confusion chart data for this model’s parent if the confusion chart is not available for this model and the model has a defined parent model. If omitted or False, or there is no parent model, will not attempt to return insight data from this model’s parent.
Returns: - ConfusionChart
Model ConfusionChart data
Raises: - ClientError
If the insight is not available for this model
-
get_feature_impact
()¶ Retrieve the computed Feature Impact results, a measure of the relevance of each feature in the model.
Feature Impact is computed for each column by creating new data with that column randomly permuted (but the others left unchanged), and seeing how the error metric score for the predictions is affected. The ‘impactUnnormalized’ is how much worse the error metric score is when making predictions on this modified data. The ‘impactNormalized’ is normalized so that the largest value is 1. In both cases, larger values indicate more important features.
If a feature is a redundant feature, i.e. once other features are considered it doesn’t contribute much in addition, the ‘redundantWith’ value is the name of the feature that has the highest correlation with this feature. Note that redundancy detection is only available for jobs run after the addition of this feature. When retrieving data that predates this functionality, a NoRedundancyImpactAvailable warning will be used.
Elsewhere this technique is sometimes called ‘Permutation Importance’.
Requires that Feature Impact has already been computed with request_feature_impact.
Returns: - feature_impacts : list of dict
The feature impact data. Each item is a dict with the keys ‘featureName’, ‘impactNormalized’, ‘impactUnnormalized’, and ‘redundantWith’.
Raises: - ClientError (404)
If the feature impacts have not been computed.
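Since the result is a plain list of dicts, ranking features needs no client calls. The sketch below returns the highest-impact feature names and can skip features flagged as redundant. The helper name and sample values are made up, and it assumes ‘redundantWith’ is None (or empty) for features that are not redundant.

```python
def top_features(feature_impacts, n=3, skip_redundant=True):
    """Return up to n feature names ordered by normalized impact."""
    rows = [
        row for row in feature_impacts
        if not (skip_redundant and row.get("redundantWith"))
    ]
    rows.sort(key=lambda row: row["impactNormalized"], reverse=True)
    return [row["featureName"] for row in rows[:n]]


# Hypothetical data using the documented keys.
impacts = [
    {"featureName": "store", "impactNormalized": 0.4,
     "impactUnnormalized": 1.7, "redundantWith": None},
    {"featureName": "price", "impactNormalized": 1.0,
     "impactUnnormalized": 4.2, "redundantWith": None},
    {"featureName": "price_usd", "impactNormalized": 0.9,
     "impactUnnormalized": 3.8, "redundantWith": "price"},
]

print(top_features(impacts, n=2))
```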
-
get_features_used
()¶ Query the server to determine which features were used.
Note that the data returned by this method is possibly different than the names of the features in the featurelist used by this model. This method will return the raw features that must be supplied in order for predictions to be generated on a new set of data. The featurelist, in contrast, would also include the names of derived features.
Returns: - features : list of str
The names of the features used in the model.
-
get_frozen_child_models
()¶ Retrieves the ids for all the models that are frozen from this model
Returns: - A list of Models
-
get_leaderboard_ui_permalink
()¶ Returns: - url : str
Permanent static hyperlink to this model at leaderboard.
-
get_lift_chart
(source, fallback_to_parent_insights=False)¶ Retrieve model lift chart for the specified source.
Parameters: - source : str
Lift chart data source. Check datarobot.enums.CHART_DATA_SOURCE for possible values.
- fallback_to_parent_insights : bool
(New in version v2.14) Optional, if True, this will return lift chart data for this model’s parent if the lift chart is not available for this model and the model has a defined parent model. If omitted or False, or there is no parent model, will not attempt to return insight data from this model’s parent.
Returns: - LiftChart
Model lift chart data
Raises: - ClientError
If the insight is not available for this model
-
get_missing_report_info
()¶ Retrieve a model missing data report on training data that can be used to understand missing values treatment in a model. Report consists of missing values reports for features which took part in modelling and are numeric or categorical.
Returns: - An iterable of MissingReportPerFeature
The queried model missing report, sorted by missing count (DESCENDING order).
-
get_model_blueprint_chart
()¶ Retrieve a model blueprint chart that can be used to understand data flow in blueprint.
Returns: - ModelBlueprintChart
The queried model blueprint chart.
-
get_model_blueprint_documents
()¶ Get documentation for tasks used in this model.
Returns: - list of BlueprintTaskDocument
All documents available for the model.
-
get_multiclass_feature_impact
()¶ For multiclass it’s possible to calculate feature impact separately for each target class. The method for calculation is exactly the same, calculated in one-vs-all style for each target class.
Requires that Feature Impact has already been computed with request_feature_impact.
Returns: - feature_impacts : list of dict
The feature impact data. Each item is a dict with the keys ‘featureImpacts’ (list), ‘class’ (str). Each item in ‘featureImpacts’ is a dict with the keys ‘featureName’, ‘impactNormalized’, ‘impactUnnormalized’, and ‘redundantWith’.
Raises: - ClientError (404)
If the multiclass feature impacts have not been computed.
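The nested multiclass result can be reduced to one headline feature per class. This is a small sketch over plain data shaped as described above; the helper name and the sample values are invented for illustration.

```python
def strongest_feature_per_class(multiclass_impacts):
    """Map each target class to its highest-impact feature name."""
    strongest = {}
    for item in multiclass_impacts:
        impacts = item["featureImpacts"]
        if not impacts:
            continue  # nothing to rank for this class
        best = max(impacts, key=lambda row: row["impactNormalized"])
        strongest[item["class"]] = best["featureName"]
    return strongest


# Hypothetical data using the documented keys.
sample = [
    {"class": "cat", "featureImpacts": [
        {"featureName": "weight", "impactNormalized": 1.0,
         "impactUnnormalized": 2.0, "redundantWith": None},
        {"featureName": "age", "impactNormalized": 0.3,
         "impactUnnormalized": 0.6, "redundantWith": None},
    ]},
    {"class": "dog", "featureImpacts": [
        {"featureName": "weight", "impactNormalized": 0.5,
         "impactUnnormalized": 1.1, "redundantWith": None},
        {"featureName": "age", "impactNormalized": 1.0,
         "impactUnnormalized": 2.2, "redundantWith": None},
    ]},
]

print(strongest_feature_per_class(sample))
```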
-
get_multiclass_lift_chart
(source, fallback_to_parent_insights=False)¶ Retrieve model lift chart for the specified source.
Parameters: - source : str
Lift chart data source. Check datarobot.enums.CHART_DATA_SOURCE for possible values.
- fallback_to_parent_insights : bool
Optional, if True, this will return lift chart data for this model’s parent if the lift chart is not available for this model and the model has a defined parent model. If omitted or False, or there is no parent model, will not attempt to return insight data from this model’s parent.
Returns: - list of LiftChart
Model lift chart data for each saved target class
Raises: - ClientError
If the insight is not available for this model
-
get_or_request_feature_impact
(max_wait=600)¶ Retrieve feature impact for the model, requesting a job if it hasn’t been run previously
Parameters: - max_wait : int, optional
The maximum time to wait for a requested feature impact job to complete before erroring
Returns: - feature_impacts : list of dict
The feature impact data. See get_feature_impact for the exact schema.
-
get_parameters
()¶ Retrieve model parameters.
Returns: - ModelParameters
Model parameters for this model.
-
get_pareto_front
()¶ Retrieve the Pareto Front for a Eureqa model.
This method is only supported for Eureqa models.
Returns: - ParetoFront
Model ParetoFront data
-
get_prime_eligibility
()¶ Check if this model can be approximated with DataRobot Prime
Returns: - prime_eligibility : dict
a dict indicating whether a model can be approximated with DataRobot Prime (key can_make_prime) and why it may be ineligible (key message)
-
get_residuals_chart
(source, fallback_to_parent_insights=False)¶ Retrieve model residuals chart for the specified source.
Parameters: - source : str
Residuals chart data source. Check datarobot.enums.CHART_DATA_SOURCE for possible values.
- fallback_to_parent_insights : bool
Optional, if True, this will return residuals chart data for this model’s parent if the residuals chart is not available for this model and the model has a defined parent model. If omitted or False, or there is no parent model, will not attempt to return residuals data from this model’s parent.
Returns: - ResidualsChart
Model residuals chart data
Raises: - ClientError
If the insight is not available for this model
-
get_roc_curve
(source, fallback_to_parent_insights=False)¶ Retrieve model ROC curve for the specified source.
Parameters: - source : str
ROC curve data source. Check datarobot.enums.CHART_DATA_SOURCE for possible values.
- fallback_to_parent_insights : bool
(New in version v2.14) Optional, if True, this will return ROC curve data for this model’s parent if the ROC curve is not available for this model and the model has a defined parent model. If omitted or False, or there is no parent model, will not attempt to return data from this model’s parent.
Returns: - RocCurve
Model ROC curve data
Raises: - ClientError
If the insight is not available for this model
-
get_rulesets
()¶ List the rulesets approximating this model generated by DataRobot Prime
If this model hasn’t been approximated yet, will return an empty list. Note that these are rulesets approximating this model, not rulesets used to construct this model.
Returns: - rulesets : list of Ruleset
-
get_supported_capabilities
()¶ Retrieves a summary of the capabilities supported by a model.
New in version v2.14.
Returns: - supportsBlending: bool
whether the model supports blending
- supportsMonotonicConstraints: bool
whether the model supports monotonic constraints
- hasWordCloud: bool
whether the model has word cloud data available
- eligibleForPrime: bool
whether the model is eligible for Prime
- hasParameters: bool
whether the model has parameters that can be retrieved
- supportsCodeGeneration: bool
(New in version v2.18) whether the model supports code generation
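The capability flags listed above can be checked before requesting an insight or a Prime approximation. The sketch below assumes the return value behaves like a dict keyed by the documented names; `missing_capabilities` is a hypothetical helper and the example dict is hand-written, not real output.

```python
def missing_capabilities(capabilities, required):
    """Return the required capability keys that are absent or False."""
    return [key for key in required if not capabilities.get(key, False)]


# Hand-written dict shaped like the documented return value (an assumption).
example = {
    "supportsBlending": True,
    "supportsMonotonicConstraints": False,
    "hasWordCloud": False,
    "eligibleForPrime": True,
    "hasParameters": True,
    "supportsCodeGeneration": True,
}
print(missing_capabilities(example, ["eligibleForPrime", "hasWordCloud"]))  # ['hasWordCloud']
```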
-
get_word_cloud
(exclude_stop_words=False)¶ Retrieve word cloud data for the model.
Parameters: - exclude_stop_words : bool, optional
Set to True if you want stopwords filtered out of response.
Returns: - WordCloud
Word cloud data for the model.
-
open_model_browser
()¶ Opens model at project leaderboard in web browser.
Note: If text-mode browsers are used, the calling process will block until the user exits the browser.
-
request_approximation
()¶ Request an approximation of this model using DataRobot Prime
This will create several rulesets that could be used to approximate this model. After comparing their scores and rule counts, the code used in the approximation can be downloaded and run locally.
Returns: - job : Job
the job generating the rulesets
-
request_feature_impact
()¶ Request feature impacts to be computed for the model.
See get_feature_impact for more information on the result of the job.
Returns: - job : Job
A Job representing the feature impact computation. To get the completed feature impact data, use job.get_result or job.get_result_when_complete.
Raises: - JobAlreadyRequested (422)
If the feature impacts have already been requested.
-
request_frozen_datetime_model
(training_row_count=None, training_duration=None, training_start_date=None, training_end_date=None, time_window_sample_pct=None)¶ Train a new frozen model with parameters from this model
Requires that this model belongs to a datetime partitioned project. If it does not, an error will occur when submitting the job.
Frozen models use the same tuning parameters as their parent model instead of independently optimizing them to allow efficiently retraining models on larger amounts of the training data.
In addition to training_row_count and training_duration, frozen datetime models may be trained on an exact date range. Only one of training_row_count, training_duration, or training_start_date and training_end_date should be specified.
Models specified using training_start_date and training_end_date are the only ones that can be trained into the holdout data (once the holdout is unlocked).
Parameters: - training_row_count : int, optional
the number of rows of data that should be used to train the model. If specified, training_duration may not be specified.
- training_duration : str, optional
a duration string specifying what time range the data used to train the model should span. If specified, training_row_count may not be specified.
- training_start_date : datetime.datetime, optional
the start date of the data to train the model on. Only rows occurring at or after this datetime will be used. If training_start_date is specified, training_end_date must also be specified.
- training_end_date : datetime.datetime, optional
the end date of the data to train the model on. Only rows occurring strictly before this datetime will be used. If training_end_date is specified, training_start_date must also be specified.
- time_window_sample_pct : int, optional
may only be specified when the requested model is a time window (e.g. duration or start and end dates). An integer between 1 and 99 indicating the percentage to sample by within the window. The points kept are determined by a random uniform sample. If specified, training_duration must also be specified; otherwise, an error will occur.
Returns: - model_job : ModelJob
the modeling job training a frozen model
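The argument rules described above can be checked locally before a job is submitted. This is a sketch under stated assumptions: `frozen_training_problems` is a hypothetical helper, not part of the client, and it reads the time_window_sample_pct rule as "requires a time window (duration, or start and end dates)". The server remains the authority on what is accepted.

```python
from datetime import datetime


def frozen_training_problems(training_row_count=None, training_duration=None,
                             training_start_date=None, training_end_date=None,
                             time_window_sample_pct=None):
    """Return a list of rule violations (empty when the arguments look valid)."""
    problems = []
    has_start = training_start_date is not None
    has_end = training_end_date is not None
    if has_start != has_end:
        problems.append("training_start_date and training_end_date must be given together")
    # Count how many of the three ways of sizing the training data are in use.
    modes = [training_row_count is not None, training_duration is not None, has_start or has_end]
    if sum(modes) > 1:
        problems.append("specify only one of row count, duration, or start/end dates")
    if time_window_sample_pct is not None:
        if not 1 <= time_window_sample_pct <= 99:
            problems.append("time_window_sample_pct must be between 1 and 99")
        # Sampling is only meaningful within a time window.
        if training_duration is None and not (has_start and has_end):
            problems.append("time_window_sample_pct requires a time window")
    return problems


ok = frozen_training_problems(
    training_start_date=datetime(2019, 1, 1),
    training_end_date=datetime(2019, 7, 1),
    time_window_sample_pct=50,
)
print(ok)  # []
```

Returning a list of problems rather than raising on the first one lets a caller report every issue at once.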
-
request_predictions
(dataset_id, include_prediction_intervals=None, prediction_intervals_size=None, forecast_point=None, predictions_start_date=None, predictions_end_date=None)¶ Request predictions against a previously uploaded dataset
Parameters: - dataset_id : string
The dataset to make predictions against (as uploaded from Project.upload_dataset)
- include_prediction_intervals : bool, optional
(New in v2.16) For time series projects only. Specifies whether prediction intervals should be calculated for this request. Defaults to True if prediction_intervals_size is specified, otherwise defaults to False.
- prediction_intervals_size : int, optional
(New in v2.16) For time series projects only. Represents the percentile to use for the size of the prediction intervals. Defaults to 80 if include_prediction_intervals is True. Prediction intervals size must be between 1 and 100 (inclusive).
- forecast_point : datetime.datetime or None, optional
(New in version v2.20) For time series projects only. This is the default point relative to which predictions will be generated, based on the forecast window of the project. See the time series prediction documentation for more information.
- predictions_start_date : datetime.datetime or None, optional
(New in version v2.20) For time series projects only. The start date for bulk predictions. Note that this parameter is for generating historical predictions using the training data. This parameter should be provided in conjunction with predictions_end_date. Can’t be provided with the forecast_point parameter.
- predictions_end_date : datetime.datetime or None, optional
(New in version v2.20) For time series projects only. The end date for bulk predictions, exclusive. Note that this parameter is for generating historical predictions using the training data. This parameter should be provided in conjunction with predictions_start_date. Can’t be provided with the forecast_point parameter.
Returns: - job : PredictJob
The job computing the predictions
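The defaults and exclusions described for the time series parameters above can be restated as plain Python. This sketch only mirrors the documented rules; both functions are hypothetical helpers, and the actual defaults are applied by DataRobot, not by this code.

```python
from datetime import datetime


def resolve_intervals(include_prediction_intervals=None, prediction_intervals_size=None):
    """Return (include, size) after applying the documented defaults."""
    include = include_prediction_intervals
    size = prediction_intervals_size
    if include is None:
        include = size is not None  # defaults to True only when a size is given
    if include and size is None:
        size = 80  # documented default size
    if size is not None and not 1 <= size <= 100:
        raise ValueError("prediction_intervals_size must be between 1 and 100 (inclusive)")
    return include, size


def date_argument_problems(forecast_point=None, predictions_start_date=None,
                           predictions_end_date=None):
    """Return a list of violations of the documented date-argument rules."""
    problems = []
    has_start = predictions_start_date is not None
    has_end = predictions_end_date is not None
    if has_start != has_end:
        problems.append("predictions_start_date and predictions_end_date go together")
    if forecast_point is not None and (has_start or has_end):
        problems.append("forecast_point cannot be combined with a predictions date range")
    return problems


print(resolve_intervals(prediction_intervals_size=90))  # (True, 90)
print(date_argument_problems(forecast_point=datetime(2020, 1, 1)))  # []
```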
-
request_transferable_export
(prediction_intervals_size=None)¶ Request generation of an exportable model file for use in an on-premise DataRobot standalone prediction environment.
This function can only be used if model export is enabled, and will only be useful if you have an on-premise environment in which to import it.
This function does not download the exported file. Use download_export for that.
Parameters: - prediction_intervals_size : int, optional
(New in v2.19) For time series projects only. Represents the percentile to use for the size of the prediction intervals. Prediction intervals size must be between 1 and 100 (inclusive).
Examples
import datarobot

model = datarobot.Model.get('p-id', 'l-id')
job = model.request_transferable_export()
job.wait_for_completion()
model.download_export('my_exported_model.drmodel')

# Client must be configured to use standalone prediction server for import:
datarobot.Client(token='my-token-at-standalone-server', endpoint='standalone-server-url/api/v2')

imported_model = datarobot.ImportedModel.create('my_exported_model.drmodel')
-
set_prediction_threshold
(threshold)¶ Set a custom prediction threshold for the model
May not be used once prediction_threshold_read_only is True for this model.
Parameters: - threshold : float
only used for binary classification projects. The threshold to use when deciding between the positive and negative classes when making predictions. Should be between 0.0 and 1.0 (inclusive).
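Because the threshold cannot be changed once prediction_threshold_read_only is True, it can help to check that flag and the documented range before calling the setter. The sketch below is illustrative only: `try_set_threshold` is a hypothetical wrapper, and `StandInModel` is a stand-in that exposes just the two members used here so the snippet runs offline.

```python
class StandInModel:
    """Hypothetical stand-in exposing only the members used below."""

    def __init__(self, read_only):
        self.prediction_threshold_read_only = read_only
        self.prediction_threshold = 0.5

    def set_prediction_threshold(self, threshold):
        self.prediction_threshold = threshold


def try_set_threshold(model, threshold):
    """Set the threshold if allowed; return True when the setter was called."""
    if not 0.0 <= threshold <= 1.0:
        raise ValueError("threshold should be between 0.0 and 1.0 (inclusive)")
    if model.prediction_threshold_read_only:
        return False  # threshold is locked for this model
    model.set_prediction_threshold(threshold)
    return True


editable = StandInModel(read_only=False)
locked = StandInModel(read_only=True)
print(try_set_threshold(editable, 0.3), try_set_threshold(locked, 0.3))  # True False
```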
-
star_model
()¶ Mark the model as starred
Model stars propagate to the web application and the API, and can be used to filter when listing models.
-
start_advanced_tuning_session
()¶ Start an Advanced Tuning session. Returns an object that helps set up arguments for an Advanced Tuning model execution.
As of v2.17, all models other than blenders, open source, prime, scaleout, baseline and user-created support Advanced Tuning.
Returns: - AdvancedTuningSession
Session for setting up and running Advanced Tuning on a model
-
train_datetime
(featurelist_id=None, training_row_count=None, training_duration=None, time_window_sample_pct=None, monotonic_increasing_featurelist_id=<object object>, monotonic_decreasing_featurelist_id=<object object>, use_project_settings=False)¶ Train this model on a different featurelist or amount of data
Requires that this model is part of a datetime partitioned project; otherwise, an error will occur.
Parameters: - featurelist_id : str, optional
the featurelist to use to train the model. If not specified, the featurelist of this model is used.
- training_row_count : int, optional
the number of rows of data that should be used to train the model. If specified, neither training_duration nor use_project_settings may be specified.
- training_duration : str, optional
a duration string specifying what time range the data used to train the model should span. If specified, neither training_row_count nor use_project_settings may be specified.
- use_project_settings : bool, optional
(New in version v2.20) defaults to False. If True, indicates that the custom backtest partitioning settings specified by the user will be used to train the model and evaluate backtest scores. If specified, neither training_row_count nor training_duration may be specified.
- time_window_sample_pct : int, optional
may only be specified when the requested model is a time window (e.g. duration or start and end dates). An integer between 1 and 99 indicating the percentage to sample by within the window. The points kept are determined by a random uniform sample. If specified, training_duration must also be specified; otherwise, an error will occur.
- monotonic_increasing_featurelist_id : str, optional
(New in version v2.18) optional, the id of the featurelist that defines the set of features with a monotonically increasing relationship to the target. Passing None disables increasing monotonicity constraint. Default (dr.enums.MONOTONICITY_FEATURELIST_DEFAULT) is the one specified by the blueprint.
- monotonic_decreasing_featurelist_id : str, optional
(New in version v2.18) optional, the id of the featurelist that defines the set of features with a monotonically decreasing relationship to the target. Passing None disables decreasing monotonicity constraint. Default (dr.enums.MONOTONICITY_FEATURELIST_DEFAULT) is the one specified by the blueprint.
Returns: - job : ModelJob
the created job to build the model
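The `<object object>` defaults in the train_datetime signature appear to be a sentinel: a unique object that lets the method tell "argument not passed" (use the blueprint's setting) apart from an explicit None (disable the constraint). The following sketch illustrates that idiom in isolation. It is not the client's implementation; `_DEFAULT` and `monotonic_payload` are hypothetical names.

```python
# Stand-in for dr.enums.MONOTONICITY_FEATURELIST_DEFAULT (an assumption).
_DEFAULT = object()


def monotonic_payload(monotonic_increasing_featurelist_id=_DEFAULT,
                      monotonic_decreasing_featurelist_id=_DEFAULT):
    """Build a dict containing only the arguments the caller actually passed."""
    payload = {}
    # Identity comparison against the sentinel keeps None as a meaningful value.
    if monotonic_increasing_featurelist_id is not _DEFAULT:
        payload["monotonic_increasing_featurelist_id"] = monotonic_increasing_featurelist_id
    if monotonic_decreasing_featurelist_id is not _DEFAULT:
        payload["monotonic_decreasing_featurelist_id"] = monotonic_decreasing_featurelist_id
    return payload


print(monotonic_payload())  # {}
print(monotonic_payload(monotonic_increasing_featurelist_id=None))
```

With a plain None default the two cases would be indistinguishable, which is why a dedicated sentinel object is used.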
-
unstar_model
()¶ Unmark the model as starred
Model stars propagate to the web application and the API, and can be used to filter when listing models.
Frozen Model¶
-
class
datarobot.models.
FrozenModel
(id=None, processes=None, featurelist_name=None, featurelist_id=None, project_id=None, sample_pct=None, training_row_count=None, training_duration=None, training_start_date=None, training_end_date=None, model_type=None, model_category=None, is_frozen=None, blueprint_id=None, metrics=None, parent_model_id=None, monotonic_increasing_featurelist_id=None, monotonic_decreasing_featurelist_id=None, supports_monotonic_constraints=None, is_starred=None, prediction_threshold=None, prediction_threshold_read_only=None, model_number=None)¶ A model tuned with parameters which are derived from another model
Attributes: - id : str
the id of the model
- project_id : str
the id of the project the model belongs to
- processes : list of str
the processes used by the model
- featurelist_name : str
the name of the featurelist used by the model
- featurelist_id : str
the id of the featurelist used by the model
- sample_pct : float
the percentage of the project dataset used in training the model
- training_row_count : int or None
the number of rows of the project dataset used in training the model. In a datetime partitioned project, if specified, defines the number of rows used to train the model and evaluate backtest scores; if unspecified, either training_duration or training_start_date and training_end_date was used to determine that instead.
- training_duration : str or None
only present for models in datetime partitioned projects. If specified, a duration string specifying the duration spanned by the data used to train the model and evaluate backtest scores.
- training_start_date : datetime or None
only present for frozen models in datetime partitioned projects. If specified, the start date of the data used to train the model.
- training_end_date : datetime or None
only present for frozen models in datetime partitioned projects. If specified, the end date of the data used to train the model.
- model_type : str
what model this is, e.g. ‘Nystroem Kernel SVM Regressor’
- model_category : str
what kind of model this is - ‘prime’ for DataRobot Prime models, ‘blend’ for blender models, and ‘model’ for other models
- is_frozen : bool
whether this model is a frozen model
- parent_model_id : str
the id of the model that tuning parameters are derived from
- blueprint_id : str
the id of the blueprint used in this model
- metrics : dict
a mapping from each metric to the model’s scores for that metric
- monotonic_increasing_featurelist_id : str
optional, the id of the featurelist that defines the set of features with a monotonically increasing relationship to the target. If None, no such constraints are enforced.
- monotonic_decreasing_featurelist_id : str
optional, the id of the featurelist that defines the set of features with a monotonically decreasing relationship to the target. If None, no such constraints are enforced.
- supports_monotonic_constraints : bool
optional, whether this model supports enforcing monotonic constraints
- is_starred : bool
whether this model is marked as starred
- prediction_threshold : float
for binary classification projects, the threshold used for predictions
- prediction_threshold_read_only : bool
indicates whether modification of the prediction threshold is forbidden. Threshold modification is forbidden once a model has had a deployment created or predictions made via the dedicated prediction API.
- model_number : integer
model number assigned to a model
-
classmethod
get
(project_id, model_id)¶ Retrieve a specific frozen model.
Parameters: - project_id : str
The project’s id.
- model_id : str
The model_id of the leaderboard item to retrieve.
Returns: - model : FrozenModel
The queried instance.
Imported Model¶
Note
Imported Models are used in Stand Alone Scoring Engines. If you are not an administrator of such an engine, they are not relevant to you.
-
class
datarobot.models.
ImportedModel
(id, imported_at=None, model_id=None, target=None, featurelist_name=None, dataset_name=None, model_name=None, project_id=None, version=None, note=None, origin_url=None, imported_by_username=None, project_name=None, created_by_username=None, created_by_id=None, imported_by_id=None, display_name=None)¶ Represents an imported model available for making predictions. These are only relevant for administrators of on-premise Stand Alone Scoring Engines.
ImportedModels are trained in one DataRobot application, exported as a .drmodel file, and then imported for use in a Stand Alone Scoring Engine.
Attributes: - id : str
id of the import
- model_name : str
model type describing the model generated by DataRobot
- display_name : str
manually specified human-readable name of the imported model
- note : str
manually added note about this imported model
- imported_at : datetime
the time the model was imported
- imported_by_username : str
username of the user who imported the model
- imported_by_id : str
id of the user who imported the model
- origin_url : str
URL of the application the model was exported from
- model_id : str
original id of the model prior to export
- featurelist_name : str
name of the featurelist used to train the model
- project_id : str
id of the project the model belonged to prior to export
- project_name : str
name of the project the model belonged to prior to export
- target : str
the target of the project the model belonged to prior to export
- version : float
project version of the project the model belonged to
- dataset_name : str
filename of the dataset used to create the project the model belonged to
- created_by_username : str
username of the user who created the model prior to export
- created_by_id : str
id of the user who created the model prior to export
-
classmethod
create
(path)¶ Import a previously exported model for predictions.
Parameters: - path : str
The path to the exported model file
-
classmethod
get
(import_id)¶ Retrieve imported model info
Parameters: - import_id : str
The ID of the imported model.
Returns: - imported_model : ImportedModel
The ImportedModel instance
-
classmethod
list
(limit=None, offset=None)¶ List the imported models.
Parameters: - limit : int
The number of records to return. The server will use a (possibly finite) default if not specified.
- offset : int
The number of records to skip.
Returns: - imported_models : list[ImportedModel]
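Since list takes limit and offset, a full listing can be collected page by page. The generator below is a sketch that works with any function accepting limit and offset keyword arguments; `iter_all` is a hypothetical helper, and it stops at the first empty page rather than relying on a particular server-side page size.

```python
def iter_all(list_page, page_size=100):
    """Yield every record from a limit/offset style listing function."""
    offset = 0
    while True:
        page = list_page(limit=page_size, offset=offset)
        if not page:
            return  # an empty page means there is nothing left
        for item in page:
            yield item
        offset += len(page)


# Offline demonstration with a list standing in for the server-side records.
records = list(range(7))
print(list(iter_all(lambda limit, offset: records[offset:offset + limit], page_size=3)))
```

In real use the first argument would presumably be datarobot.ImportedModel.list.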
-
update
(display_name=None, note=None)¶ Update the display name or note for an imported model. The ImportedModel object is updated in place.
Parameters: - display_name : str
The new display name.
- note : str
The new note.
-
delete
()¶ Delete this imported model.
RatingTableModel¶
-
class
datarobot.models.
RatingTableModel
(id=None, processes=None, featurelist_name=None, featurelist_id=None, project_id=None, sample_pct=None, training_row_count=None, training_duration=None, training_start_date=None, training_end_date=None, model_type=None, model_category=None, is_frozen=None, blueprint_id=None, metrics=None, rating_table_id=None, monotonic_increasing_featurelist_id=None, monotonic_decreasing_featurelist_id=None, supports_monotonic_constraints=None, is_starred=None, prediction_threshold=None, prediction_threshold_read_only=None, model_number=None)¶ A model that has a rating table.
Attributes: - id : str
the id of the model
- project_id : str
the id of the project the model belongs to
- processes : list of str
the processes used by the model
- featurelist_name : str
the name of the featurelist used by the model
- featurelist_id : str
the id of the featurelist used by the model
- sample_pct : float or None
the percentage of the project dataset used in training the model. If the project uses datetime partitioning, the sample_pct will be None. See training_row_count, training_duration, and training_start_date and training_end_date instead.
- training_row_count : int or None
the number of rows of the project dataset used in training the model. In a datetime partitioned project, if specified, defines the number of rows used to train the model and evaluate backtest scores; if unspecified, either training_duration or training_start_date and training_end_date was used to determine that instead.
- training_duration : str or None
only present for models in datetime partitioned projects. If specified, a duration string specifying the duration spanned by the data used to train the model and evaluate backtest scores.
- training_start_date : datetime or None
only present for frozen models in datetime partitioned projects. If specified, the start date of the data used to train the model.
- training_end_date : datetime or None
only present for frozen models in datetime partitioned projects. If specified, the end date of the data used to train the model.
- model_type : str
what model this is, e.g. ‘Nystroem Kernel SVM Regressor’
- model_category : str
what kind of model this is - ‘prime’ for DataRobot Prime models, ‘blend’ for blender models, and ‘model’ for other models
- is_frozen : bool
whether this model is a frozen model
- blueprint_id : str
the id of the blueprint used in this model
- metrics : dict
a mapping from each metric to the model’s scores for that metric
- rating_table_id : str
the id of the rating table that belongs to this model
- monotonic_increasing_featurelist_id : str
optional, the id of the featurelist that defines the set of features with a monotonically increasing relationship to the target. If None, no such constraints are enforced.
- monotonic_decreasing_featurelist_id : str
optional, the id of the featurelist that defines the set of features with a monotonically decreasing relationship to the target. If None, no such constraints are enforced.
- supports_monotonic_constraints : bool
optional, whether this model supports enforcing monotonic constraints
- is_starred : bool
whether this model is marked as starred
- prediction_threshold : float
for binary classification projects, the threshold used for predictions
- prediction_threshold_read_only : bool
indicates whether modification of the prediction threshold is forbidden. Threshold modification is forbidden once a model has had a deployment created or predictions made via the dedicated prediction API.
- model_number : integer
model number assigned to a model
-
classmethod
get
(project_id, model_id)¶ Retrieve a specific rating table model
If the project does not have a rating table, a ClientError will occur.
Parameters: - project_id : str
the id of the project the model belongs to
- model_id : str
the id of the model to retrieve
Returns: - model : RatingTableModel
the model
-
classmethod
create_from_rating_table
(project_id, rating_table_id)¶ Creates a new model from a validated rating table record. The RatingTable must not be associated with an existing model.
Parameters: - project_id : str
the id of the project the rating table belongs to
- rating_table_id : str
the id of the rating table to create this model from
Returns: - job: Job
an instance of created async job
Raises: - ClientError (422)
Raised if creating model from a RatingTable that failed validation
- JobAlreadyRequested
Raised if creating model from a RatingTable that is already associated with a RatingTableModel
-
advanced_tune
(params, description=None)¶ Generate a new model with the specified advanced-tuning parameters
As of v2.17, all models other than blenders, open source, prime, scaleout, baseline and user-created support Advanced Tuning.
Parameters: - params : dict
Mapping of parameter ID to parameter value. The list of valid parameter IDs for a model can be found by calling get_advanced_tuning_parameters(). This endpoint does not need to include values for all parameters. If a parameter is omitted, its current_value will be used.
- description : unicode
Human-readable string describing the newly advanced-tuned model
Returns: - ModelJob
The created job to build the model
-
cross_validate
()¶ Run Cross Validation on this model.
Note
To perform Cross Validation on a new model with new parameters, use train instead.
Returns: - ModelJob
The created job to build the model
-
delete
()¶ Delete a model from the project’s leaderboard.
-
download_export
(filepath)¶ Download an exportable model file for use in an on-premise DataRobot standalone prediction environment.
This function can only be used if model export is enabled, and will only be useful if you have an on-premise environment in which to import it.
Parameters: - filepath : str
The path at which to save the exported model file.
-
download_scoring_code
(file_name, source_code=False)¶ Download scoring code JAR.
Parameters: - file_name : str
File path where scoring code will be saved.
- source_code : bool, optional
Set to True to download source code archive. It will not be executable.
-
classmethod
fetch_resource_data
(url, join_endpoint=True)¶ (Deprecated.) Used to acquire model data directly from its url.
Consider using get instead, as this is a convenience function used for development of datarobot.
Parameters: - url : str
The resource we are acquiring
- join_endpoint : boolean, optional
Whether the client’s endpoint should be joined to the URL before sending the request. Location headers are returned as absolute locations, so will _not_ need the endpoint
Returns: - model_data : dict
The queried model’s data
-
get_advanced_tuning_parameters
()¶ Get the advanced-tuning parameters available for this model.
As of v2.17, all models other than blenders, open source, prime, scaleout, baseline and user-created support Advanced Tuning.
Returns: - dict
A dictionary describing the advanced-tuning parameters for the current model. There are two top-level keys, tuningDescription and tuningParameters.
tuningDescription is an optional value. If not None, it is the user-specified description of this set of tuning parameters.
tuningParameters is a list of dicts, each of which has the following keys:
- parameterName : (unicode) name of the parameter (unique per task, see below)
- parameterId : (unicode) opaque ID string uniquely identifying parameter
- defaultValue : (*) default value of the parameter for the blueprint
- currentValue : (*) value of the parameter that was used for this model
- taskName : (unicode) name of the task that this parameter belongs to
- constraints: (dict) see the notes below
Notes
The type of defaultValue and currentValue is defined by the constraints structure. It will be a string or numeric Python type.
constraints is a dict with at least one, possibly more, of the following keys. The presence of a key indicates that the parameter may take on the specified type. (If a key is absent, this means that the parameter may not take on the specified type.) If a key on constraints is present, its value will be a dict containing all of the fields described below for that key.
"constraints": {
    "select": {
        "values": [<list(basestring or number) : possible values>]
    },
    "ascii": {},
    "unicode": {},
    "int": {
        "min": <int : minimum valid value>,
        "max": <int : maximum valid value>,
        "supports_grid_search": <bool : True if Grid Search may be requested for this param>
    },
    "float": {
        "min": <float : minimum valid value>,
        "max": <float : maximum valid value>,
        "supports_grid_search": <bool : True if Grid Search may be requested for this param>
    },
    "intList": {
        "min_length": <int : minimum valid length>,
        "max_length": <int : maximum valid length>,
        "min_val": <int : minimum valid value>,
        "max_val": <int : maximum valid value>,
        "supports_grid_search": <bool : True if Grid Search may be requested for this param>
    },
    "floatList": {
        "min_length": <int : minimum valid length>,
        "max_length": <int : maximum valid length>,
        "min_val": <float : minimum valid value>,
        "max_val": <float : maximum valid value>,
        "supports_grid_search": <bool : True if Grid Search may be requested for this param>
    }
}
The keys have meaning as follows:
- select: Rather than specifying a specific data type, if present, it indicates that the parameter is permitted to take on any of the specified values. Listed values may be of any string or real (non-complex) numeric type.
- ascii: The parameter may be a unicode object that encodes simple ASCII characters. (A-Z, a-z, 0-9, whitespace, and certain common symbols.) In addition to listed constraints, ASCII keys currently may not contain either newlines or semicolons.
- unicode: The parameter may be any Python unicode object.
- int: The value may be an object of type int within the specified range (inclusive). Please note that the value will be passed around using the JSON format, and some JSON parsers have undefined behavior with integers outside of the range [-(2**53)+1, (2**53)-1].
- float: The value may be an object of type float within the specified range (inclusive).
- intList, floatList: The value may be a list of int or float objects, respectively, following constraints as specified respectively by the int and float types (above).
Many parameters only specify one key under constraints. If a parameter specifies multiple keys, the parameter may take on any value permitted by any key.
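The "any value permitted by any key" rule can be sketched as a small local check to run before calling advanced_tune. This is an illustration under assumptions, not the validation DataRobot performs: `value_allowed` is a hypothetical helper, the field names follow the constraints structure shown above, and the ascii check is only an approximation of the rule described there.

```python
def value_allowed(value, constraints):
    """Return True if value is permitted by at least one key in constraints."""

    def is_kind(v, kind):
        # bool is a subclass of int in Python, so exclude it explicitly.
        return isinstance(v, kind) and not isinstance(v, bool)

    def in_range(v, spec, low, high):
        # A missing bound is treated as "no limit" by comparing v with itself.
        return spec.get(low, v) <= v <= spec.get(high, v)

    if "select" in constraints and value in constraints["select"].get("values", []):
        return True
    for key, kind in (("int", int), ("float", float)):
        if key in constraints and is_kind(value, kind) and in_range(value, constraints[key], "min", "max"):
            return True
    if isinstance(value, str):
        if "unicode" in constraints:
            return True
        # Approximation: ASCII characters only, no newlines or semicolons.
        if "ascii" in constraints and value.isascii() and "\n" not in value and ";" not in value:
            return True
    if isinstance(value, list):
        for key, kind in (("intList", int), ("floatList", float)):
            spec = constraints.get(key)
            if spec is None:
                continue
            length_ok = spec.get("min_length", 0) <= len(value) <= spec.get("max_length", len(value))
            if length_ok and all(is_kind(v, kind) and in_range(v, spec, "min_val", "max_val") for v in value):
                return True
    return False


# A parameter that accepts either an integer in [1, 10] or the literal "auto".
mixed = {"int": {"min": 1, "max": 10}, "select": {"values": ["auto"]}}
print(value_allowed(5, mixed), value_allowed("auto", mixed), value_allowed(11, mixed))
```

The float branch follows the text strictly and accepts only float objects; a caller who wants to pass an int for a float parameter would need to convert it first.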
-
get_all_confusion_charts
(fallback_to_parent_insights=False)¶ Retrieve a list of all confusion charts available for the model.
Parameters: - fallback_to_parent_insights : bool
(New in version v2.14) Optional, if True, this will return confusion chart data for this model’s parent for any source that is not available for this model and if this model has a defined parent model. If omitted or False, or this model has no parent, this will not attempt to retrieve any data from this model’s parent.
Returns: - list of ConfusionChart
Data for all available confusion charts for model.
-
get_all_lift_charts
(fallback_to_parent_insights=False)¶ Retrieve a list of all lift charts available for the model.
Parameters: - fallback_to_parent_insights : bool
(New in version v2.14) Optional, if True, this will return lift chart data for this model’s parent for any source that is not available for this model and if this model has a defined parent model. If omitted or False, or this model has no parent, this will not attempt to retrieve any data from this model’s parent.
Returns: - list of LiftChart
Data for all available model lift charts.
-
get_all_residuals_charts
(fallback_to_parent_insights=False)¶ Retrieve a list of all residuals charts available for the model.
Parameters: - fallback_to_parent_insights : bool
Optional, if True, this will return residuals chart data for this model’s parent for any source that is not available for this model and if this model has a defined parent model. If omitted or False, or this model has no parent, this will not attempt to retrieve any data from this model’s parent.
Returns: - list of ResidualsChart
Data for all available model residuals charts.
-
get_all_roc_curves
(fallback_to_parent_insights=False)¶ Retrieve a list of all ROC curves available for the model.
Parameters: - fallback_to_parent_insights : bool
(New in version v2.14) Optional, if True, this will return ROC curve data for this model’s parent for any source that is not available for this model and if this model has a defined parent model. If omitted or False, or this model has no parent, this will not attempt to retrieve any data from this model’s parent.
Returns: - list of RocCurve
Data for all available model ROC curves.
-
get_confusion_chart
(source, fallback_to_parent_insights=False)¶ Retrieve model’s confusion chart for the specified source.
Parameters: - source : str
Confusion chart source. Check datarobot.enums.CHART_DATA_SOURCE for possible values.
- fallback_to_parent_insights : bool
(New in version v2.14) Optional, if True, this will return confusion chart data for this model’s parent if the confusion chart is not available for this model and the model has a defined parent model. If omitted or False, or there is no parent model, will not attempt to return insight data from this model’s parent.
Returns: - ConfusionChart
Model ConfusionChart data
Raises: - ClientError
If the insight is not available for this model
-
get_cross_validation_scores
(partition=None, metric=None)¶ Returns a dictionary keyed by metric showing cross validation scores per partition.
Cross Validation should already have been performed using cross_validate or train.
Note
Models that computed cross validation before this feature was added will need to be deleted and retrained before this method can be used.
Parameters: - partition : float
optional, the id of the partition (1, 2, 3.0, 4.0, etc.) to filter results by. Can be a positive whole number given as an integer or a float.
- metric: unicode
optional, the name of the metric to filter the resulting cross validation scores by
Returns: - cross_validation_scores: dict
A dictionary keyed by metric showing cross validation scores per partition.
-
get_feature_effect
(source)¶ Retrieve Feature Effects for the model.
Feature Effects provides partial dependence and predicted vs actual values for top-500 features ordered by feature impact score.
The partial dependence shows marginal effect of a feature on the target variable after accounting for the average effects of all other predictive features. It indicates how, holding all other variables except the feature of interest as they were, the value of this feature affects your prediction.
Requires that Feature Effects has already been computed with request_feature_effect.
See get_feature_effect_metadata for retrieving information about the available sources.
Parameters: - source : string
The source Feature Effects are retrieved for.
Returns: - feature_effects : FeatureEffects
The feature effects data.
Raises: - ClientError (404)
If the feature effects have not been computed or source is not a valid value.
-
get_feature_effect_metadata
()¶ - Retrieve Feature Effect metadata. The response contains the status and the available model sources.
- Feature Effect for training is always available (except for older projects, which support only Feature Effect for validation).
- When a model is trained into validation or holdout without stacked predictions (e.g. no out-of-sample predictions in validation or holdout), Feature Effect is not available for validation or holdout.
- Feature Effect for holdout is not available when there is no holdout configured for the project.
source is the expected parameter to retrieve Feature Effect. One of the provided sources shall be used.
Returns: - feature_effect_metadata: FeatureEffectMetadata
-
get_feature_fit
(source)¶ Retrieve Feature Fit for the model.
Feature Fit provides partial dependence and predicted vs actual values for top-500 features ordered by feature importance score.
The partial dependence shows marginal effect of a feature on the target variable after accounting for the average effects of all other predictive features. It indicates how, holding all other variables except the feature of interest as they were, the value of this feature affects your prediction.
Requires that Feature Fit has already been computed with request_feature_fit.
See get_feature_fit_metadata for retrieving information about the available sources.
Parameters: - source : string
The source Feature Fit is retrieved for. One value of [FeatureFitMetadata.sources].
Returns: - feature_fit : FeatureFit
The feature fit data.
Raises: - ClientError (404)
If the feature fit has not been computed or source is not a valid value.
-
get_feature_fit_metadata
()¶ - Retrieve Feature Fit metadata. The response contains the status and the available model sources.
- Feature Fit for training is always available (except for older projects, which support only Feature Fit for validation).
- When a model is trained into validation or holdout without stacked predictions (e.g. no out-of-sample predictions in validation or holdout), Feature Fit is not available for validation or holdout.
- Feature Fit for holdout is not available when there is no holdout configured for the project.
source is the expected parameter to retrieve Feature Fit. One of the provided sources shall be used.
Returns: - feature_effect_metadata: FeatureFitMetadata
-
get_feature_impact
()¶ Retrieve the computed Feature Impact results, a measure of the relevance of each feature in the model.
Feature Impact is computed for each column by creating new data with that column randomly permuted (but the others left unchanged), and seeing how the error metric score for the predictions is affected. The ‘impactUnnormalized’ is how much worse the error metric score is when making predictions on this modified data. The ‘impactNormalized’ is normalized so that the largest value is 1. In both cases, larger values indicate more important features.
If a feature is a redundant feature, i.e. once other features are considered it doesn’t contribute much in addition, the ‘redundantWith’ value is the name of the feature that has the highest correlation with this feature. Note that redundancy detection is only available for jobs run after the addition of this feature. When retrieving data that predates this functionality, a NoRedundancyImpactAvailable warning will be used.
Elsewhere this technique is sometimes called ‘Permutation Importance’.
Requires that Feature Impact has already been computed with request_feature_impact.
Returns: - feature_impacts : list of dict
The feature impact data. Each item is a dict with the keys ‘featureName’, ‘impactNormalized’, ‘impactUnnormalized’, and ‘redundantWith’.
Raises: - ClientError (404)
If the feature impacts have not been computed.
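The permutation technique described for get_feature_impact can be illustrated with a small standalone sketch. This is not DataRobot's implementation: the function name permutation_impact, the row layout (a list of dicts), and the predict and error_metric callables are assumptions made for illustration. Only the output keys mirror the ones documented here, and redundancy detection is left out.

```python
import random


def permutation_impact(rows, target, predict, error_metric, seed=0):
    """Estimate per-feature impact by permuting one column at a time.

    rows: list of dicts mapping feature name -> value (hypothetical layout)
    target: list of actual target values, aligned with rows
    predict: callable taking a list of row dicts and returning predictions
    error_metric: callable(actuals, predictions) -> float, lower is better
    """
    rng = random.Random(seed)
    baseline = error_metric(target, predict(rows))
    results = []
    for name in rows[0]:
        # Shuffle only this column; every other column is left unchanged.
        shuffled = [row[name] for row in rows]
        rng.shuffle(shuffled)
        permuted = [dict(row, **{name: value}) for row, value in zip(rows, shuffled)]
        # How much worse the error metric gets on the modified data.
        worsening = error_metric(target, predict(permuted)) - baseline
        results.append({"featureName": name, "impactUnnormalized": worsening})
    largest = max(item["impactUnnormalized"] for item in results)
    for item in results:
        # Scale so the largest impact is 1 (guarding against division by zero).
        item["impactNormalized"] = (
            item["impactUnnormalized"] / largest if largest > 0 else 0.0
        )
    return sorted(results, key=lambda item: item["impactNormalized"], reverse=True)
```

A feature the model ignores leaves the predictions unchanged when permuted, so its unnormalized impact in this sketch is zero.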
-
get_features_used
()¶ Query the server to determine which features were used.
Note that the data returned by this method is possibly different than the names of the features in the featurelist used by this model. This method will return the raw features that must be supplied in order for predictions to be generated on a new set of data. The featurelist, in contrast, would also include the names of derived features.
Returns: - features : list of str
The names of the features used in the model.
-
get_frozen_child_models
()¶ Retrieves the ids for all the models that are frozen from this model
Returns: - A list of Models
-
get_leaderboard_ui_permalink
()¶ Returns: - url : str
Permanent static hyperlink to this model at leaderboard.
-
get_lift_chart
(source, fallback_to_parent_insights=False)¶ Retrieve model lift chart for the specified source.
Parameters: - source : str
Lift chart data source. Check datarobot.enums.CHART_DATA_SOURCE for possible values.
- fallback_to_parent_insights : bool
(New in version v2.14) Optional, if True, this will return lift chart data for this model’s parent if the lift chart is not available for this model and the model has a defined parent model. If omitted or False, or there is no parent model, will not attempt to return insight data from this model’s parent.
Returns: - LiftChart
Model lift chart data
Raises: - ClientError
If the insight is not available for this model
-
get_missing_report_info
()¶ Retrieve a model missing data report on training data that can be used to understand missing values treatment in a model. Report consists of missing values reports for features which took part in modelling and are numeric or categorical.
Returns: - An iterable of MissingReportPerFeature
The queried model missing report, sorted by missing count (DESCENDING order).
-
get_model_blueprint_chart
()¶ Retrieve a model blueprint chart that can be used to understand data flow in blueprint.
Returns: - ModelBlueprintChart
The queried model blueprint chart.
-
get_model_blueprint_documents
()¶ Get documentation for tasks used in this model.
Returns: - list of BlueprintTaskDocument
All documents available for the model.
-
get_multiclass_feature_impact
()¶ For multiclass it’s possible to calculate feature impact separately for each target class. The method for calculation is exactly the same, calculated in one-vs-all style for each target class.
Requires that Feature Impact has already been computed with request_feature_impact.
Returns: - feature_impacts : list of dict
The feature impact data. Each item is a dict with the keys ‘featureImpacts’ (list) and ‘class’ (str). Each item in ‘featureImpacts’ is a dict with the keys ‘featureName’, ‘impactNormalized’, ‘impactUnnormalized’, and ‘redundantWith’.
Raises: - ClientError (404)
If the multiclass feature impacts have not been computed.
-
get_multiclass_lift_chart
(source, fallback_to_parent_insights=False)¶ Retrieve model lift chart for the specified source.
Parameters: - source : str
Lift chart data source. Check datarobot.enums.CHART_DATA_SOURCE for possible values.
- fallback_to_parent_insights : bool
Optional, if True, this will return lift chart data for this model’s parent if the lift chart is not available for this model and the model has a defined parent model. If omitted or False, or there is no parent model, will not attempt to return insight data from this model’s parent.
Returns: - list of LiftChart
Model lift chart data for each saved target class
Raises: - ClientError
If the insight is not available for this model
-
get_or_request_feature_effect
(source, max_wait=600)¶ Retrieve feature effect for the model, requesting a job if it hasn’t been run previously
See get_feature_effect_metadata for retrieving information about the available sources.
Parameters: - max_wait : int, optional
The maximum time to wait for a requested feature effect job to complete before erroring
- source : string
The source Feature Effects are retrieved for.
Returns: - feature_effects : FeatureEffects
The feature effects data.
-
get_or_request_feature_fit
(source, max_wait=600)¶ Retrieve feature fit for the model, requesting a job if it hasn’t been run previously
See get_feature_fit_metadata for retrieving information about the available sources.
Parameters: - max_wait : int, optional
The maximum time to wait for a requested feature fit job to complete before erroring
- source : string
The source Feature Fit is retrieved for. One value of [FeatureFitMetadata.sources].
Returns: - feature_effects : FeatureFit
The feature fit data.
-
get_or_request_feature_impact
(max_wait=600)¶ Retrieve feature impact for the model, requesting a job if it hasn’t been run previously
Parameters: - max_wait : int, optional
The maximum time to wait for a requested feature impact job to complete before erroring
Returns: - feature_impacts : list of dict
The feature impact data. See
get_feature_impact
for the exact schema.
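The "get or request" behaviour shared by get_or_request_feature_impact, get_or_request_feature_effect and get_or_request_feature_fit can be pictured roughly as follows. This is a sketch of the pattern only, not the client's actual code: the helper name get_or_request and its fetch, request and not_ready_error arguments are hypothetical, and the job is assumed to expose get_result_when_complete accepting a max_wait keyword.

```python
def get_or_request(fetch, request, not_ready_error, max_wait=600):
    """Return an existing result, or start a job and wait for its result.

    fetch: zero-argument callable, e.g. model.get_feature_impact
    request: zero-argument callable returning a job,
        e.g. model.request_feature_impact
    not_ready_error: exception type raised by fetch when nothing
        has been computed yet
    """
    try:
        return fetch()
    except not_ready_error:
        # Nothing has been computed yet: submit the job and block until done.
        job = request()
        return job.get_result_when_complete(max_wait=max_wait)
```

With a real model this might be called as get_or_request(model.get_feature_impact, model.request_feature_impact, datarobot.errors.ClientError). A more careful version would probably inspect the error's status code and fall back only on 404.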
-
get_parameters
()¶ Retrieve model parameters.
Returns: - ModelParameters
Model parameters for this model.
-
get_pareto_front
()¶ Retrieve the Pareto Front for a Eureqa model.
This method is only supported for Eureqa models.
Returns: - ParetoFront
Model ParetoFront data
-
get_prime_eligibility
()¶ Check if this model can be approximated with DataRobot Prime
Returns: - prime_eligibility : dict
a dict indicating whether a model can be approximated with DataRobot Prime (key can_make_prime) and why it may be ineligible (key message)
-
get_residuals_chart
(source, fallback_to_parent_insights=False)¶ Retrieve model residuals chart for the specified source.
Parameters: - source : str
Residuals chart data source. Check datarobot.enums.CHART_DATA_SOURCE for possible values.
- fallback_to_parent_insights : bool
Optional, if True, this will return residuals chart data for this model’s parent if the residuals chart is not available for this model and the model has a defined parent model. If omitted or False, or there is no parent model, will not attempt to return residuals data from this model’s parent.
Returns: - ResidualsChart
Model residuals chart data
Raises: - ClientError
If the insight is not available for this model
-
get_roc_curve
(source, fallback_to_parent_insights=False)¶ Retrieve model ROC curve for the specified source.
Parameters: - source : str
ROC curve data source. Check datarobot.enums.CHART_DATA_SOURCE for possible values.
- fallback_to_parent_insights : bool
(New in version v2.14) Optional, if True, this will return ROC curve data for this model’s parent if the ROC curve is not available for this model and the model has a defined parent model. If omitted or False, or there is no parent model, will not attempt to return data from this model’s parent.
Returns: - RocCurve
Model ROC curve data
Raises: - ClientError
If the insight is not available for this model
-
get_rulesets
()¶ List the rulesets approximating this model generated by DataRobot Prime
If this model hasn’t been approximated yet, will return an empty list. Note that these are rulesets approximating this model, not rulesets used to construct this model.
Returns: - rulesets : list of Ruleset
-
get_supported_capabilities
()¶ Retrieves a summary of the capabilities supported by a model.
New in version v2.14.
Returns: - supportsBlending: bool
whether the model supports blending
- supportsMonotonicConstraints: bool
whether the model supports monotonic constraints
- hasWordCloud: bool
whether the model has word cloud data available
- eligibleForPrime: bool
whether the model is eligible for Prime
- hasParameters: bool
whether the model has parameters that can be retrieved
- supportsCodeGeneration: bool
(New in version v2.18) whether the model supports code generation
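One way to use the capability summary is to check a flag before asking for the matching insight. The sketch below assumes that get_supported_capabilities returns a dict keyed by the names listed above; the helper name word_cloud_if_available is hypothetical.

```python
def word_cloud_if_available(model):
    """Return word cloud data, or None if the model reports none is available."""
    # Assumption: the capabilities are returned as a dict with the keys
    # documented for get_supported_capabilities.
    capabilities = model.get_supported_capabilities()
    if not capabilities.get("hasWordCloud", False):
        return None
    return model.get_word_cloud(exclude_stop_words=True)
```

Checking the flag first avoids requesting an insight that the model does not report as available.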
-
get_word_cloud
(exclude_stop_words=False)¶ Retrieve a word cloud data for the model.
Parameters: - exclude_stop_words : bool, optional
Set to True if you want stopwords filtered out of response.
Returns: - WordCloud
Word cloud data for the model.
-
open_model_browser
()¶ Opens model at project leaderboard in web browser.
Note: If text-mode browsers are used, the calling process will block until the user exits the browser.
-
request_approximation
()¶ Request an approximation of this model using DataRobot Prime
This will create several rulesets that could be used to approximate this model. After comparing their scores and rule counts, the code used in the approximation can be downloaded and run locally.
Returns: - job : Job
the job generating the rulesets
-
request_feature_effect
()¶ Request feature effects to be computed for the model.
See get_feature_effect for more information on the result of the job.
Returns: - job : Job
A Job representing the feature effect computation. To get the completed feature effect data, use job.get_result or job.get_result_when_complete.
Raises: - JobAlreadyRequested (422)
If the feature effects have already been requested.
-
request_feature_fit
()¶ Request feature fit to be computed for the model.
See get_feature_fit for more information on the result of the job.
Returns: - job : Job
A Job representing the feature fit computation. To get the completed feature fit data, use job.get_result or job.get_result_when_complete.
Raises: - JobAlreadyRequested (422)
If the feature fit has already been requested.
-
request_feature_impact
()¶ Request feature impacts to be computed for the model.
See get_feature_impact for more information on the result of the job.
Returns: - job : Job
A Job representing the feature impact computation. To get the completed feature impact data, use job.get_result or job.get_result_when_complete.
Raises: - JobAlreadyRequested (422)
If the feature impacts have already been requested.
-
request_frozen_datetime_model
(training_row_count=None, training_duration=None, training_start_date=None, training_end_date=None, time_window_sample_pct=None)¶ Train a new frozen model with parameters from this model
Requires that this model belongs to a datetime partitioned project. If it does not, an error will occur when submitting the job.
Frozen models use the same tuning parameters as their parent model instead of independently optimizing them to allow efficiently retraining models on larger amounts of the training data.
In addition to training_row_count and training_duration, frozen datetime models may be trained on an exact date range. Only one of training_row_count, training_duration, or training_start_date and training_end_date should be specified.
Models specified using training_start_date and training_end_date are the only ones that can be trained into the holdout data (once the holdout is unlocked).
Parameters: - training_row_count : int, optional
the number of rows of data that should be used to train the model. If specified, training_duration may not be specified.
- training_duration : str, optional
a duration string specifying what time range the data used to train the model should span. If specified, training_row_count may not be specified.
- training_start_date : datetime.datetime, optional
the start date of the data to train the model on. Only rows occurring at or after this datetime will be used. If training_start_date is specified, training_end_date must also be specified.
- training_end_date : datetime.datetime, optional
the end date of the data to train the model on. Only rows occurring strictly before this datetime will be used. If training_end_date is specified, training_start_date must also be specified.
- time_window_sample_pct : int, optional
may only be specified when the requested model is a time window (e.g. duration or start and end dates). An integer between 1 and 99 indicating the percentage to sample by within the window. The points kept are determined by a random uniform sample. If specified, a time window must also be specified; otherwise, an error will occur.
Returns: - model_job : ModelJob
the modeling job training a frozen model
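The argument rules for request_frozen_datetime_model can be summarised as a client-side check. The sketch below is illustrative only: check_frozen_datetime_args and rows_in_window are hypothetical helpers, not part of the client, and the server remains the authority on what is accepted. The second helper shows the half-open date range described above (start inclusive, end exclusive).

```python
from datetime import datetime


def check_frozen_datetime_args(training_row_count=None, training_duration=None,
                               training_start_date=None, training_end_date=None,
                               time_window_sample_pct=None):
    """Raise ValueError if the arguments break the documented constraints."""
    # The start and end dates must be given together.
    if (training_start_date is None) != (training_end_date is None):
        raise ValueError("training_start_date and training_end_date go together")
    uses_dates = training_start_date is not None
    chosen = [training_row_count is not None, training_duration is not None, uses_dates]
    # Only one way of specifying the training data may be used.
    if sum(chosen) > 1:
        raise ValueError("specify only one of row count, duration, or date range")
    if time_window_sample_pct is not None:
        # Sampling only applies to time windows (duration or date range).
        if training_duration is None and not uses_dates:
            raise ValueError("time_window_sample_pct requires a time window")
        if not 1 <= time_window_sample_pct <= 99:
            raise ValueError("time_window_sample_pct must be between 1 and 99")


def rows_in_window(timestamps, start, end):
    """Keep timestamps at or after start and strictly before end."""
    return [ts for ts in timestamps if start <= ts < end]
```

Running such a check before submitting the job surfaces conflicting arguments without a round trip to the server.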
-
request_frozen_model
(sample_pct=None, training_row_count=None)¶ Train a new frozen model with parameters from this model
Note
This method only works if the project the model belongs to is not datetime partitioned. If it is, use request_frozen_datetime_model instead.
Frozen models use the same tuning parameters as their parent model instead of independently optimizing them, which allows efficiently retraining models on larger amounts of the training data.
Parameters: - sample_pct : float
optional, the percentage of the dataset to use with the model. If not provided, will use the value from this model.
- training_row_count : int
(New in version v2.9) optional, the integer number of rows of the dataset to use with the model. Only one of sample_pct and training_row_count should be specified.
Returns: - model_job : ModelJob
the modeling job training a frozen model
-
request_predictions
(dataset_id, include_prediction_intervals=None, prediction_intervals_size=None, forecast_point=None, predictions_start_date=None, predictions_end_date=None)¶ Request predictions against a previously uploaded dataset
Parameters: - dataset_id : string
The dataset to make predictions against (as uploaded from Project.upload_dataset)
- include_prediction_intervals : bool, optional
(New in v2.16) For time series projects only. Specifies whether prediction intervals should be calculated for this request. Defaults to True if prediction_intervals_size is specified, otherwise defaults to False.
- prediction_intervals_size : int, optional
(New in v2.16) For time series projects only. Represents the percentile to use for the size of the prediction intervals. Defaults to 80 if include_prediction_intervals is True. Prediction intervals size must be between 1 and 100 (inclusive).
- forecast_point : datetime.datetime or None, optional
(New in version v2.20) For time series projects only. This is the default point relative to which predictions will be generated, based on the forecast window of the project. See the time series prediction documentation for more information.
- predictions_start_date : datetime.datetime or None, optional
(New in version v2.20) For time series projects only. The start date for bulk predictions. Note that this parameter is for generating historical predictions using the training data. This parameter should be provided in conjunction with predictions_end_date. Can’t be provided with the forecast_point parameter.
- predictions_end_date : datetime.datetime or None, optional
(New in version v2.20) For time series projects only. The end date for bulk predictions, exclusive. Note that this parameter is for generating historical predictions using the training data. This parameter should be provided in conjunction with predictions_start_date. Can’t be provided with the forecast_point parameter.
Returns: - job : PredictJob
The job computing the predictions
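The defaults and restrictions listed for the request_predictions parameters interact with one another, so a small sketch may help. The function resolve_prediction_options below is hypothetical and only encodes the rules as stated in the parameter descriptions; it is not how the client or server actually resolves them.

```python
def resolve_prediction_options(include_prediction_intervals=None,
                               prediction_intervals_size=None,
                               forecast_point=None,
                               predictions_start_date=None,
                               predictions_end_date=None):
    """Apply the documented defaults and checks; return the resolved options."""
    # The bulk prediction dates are meant to be provided together.
    if (predictions_start_date is None) != (predictions_end_date is None):
        raise ValueError("predictions_start_date and predictions_end_date go together")
    # A forecast point cannot be combined with a bulk date range.
    if forecast_point is not None and predictions_start_date is not None:
        raise ValueError("forecast_point cannot be combined with a bulk date range")
    if include_prediction_intervals is None:
        # Defaults to True only when a size was specified.
        include_prediction_intervals = prediction_intervals_size is not None
    if include_prediction_intervals and prediction_intervals_size is None:
        prediction_intervals_size = 80
    if prediction_intervals_size is not None and not 1 <= prediction_intervals_size <= 100:
        raise ValueError("prediction_intervals_size must be between 1 and 100")
    return {
        "include_prediction_intervals": include_prediction_intervals,
        "prediction_intervals_size": prediction_intervals_size,
        "forecast_point": forecast_point,
        "predictions_start_date": predictions_start_date,
        "predictions_end_date": predictions_end_date,
    }
```

For example, passing only prediction_intervals_size=90 resolves include_prediction_intervals to True, while passing nothing leaves intervals disabled.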
-
request_training_predictions
(data_subset)¶ Start a job to build training predictions
Parameters: - data_subset : str
data set definition to build predictions on. Choices are:
- dr.enums.DATA_SUBSET.ALL or string all for all data available. Not valid for models in datetime partitioned projects
- dr.enums.DATA_SUBSET.VALIDATION_AND_HOLDOUT or string validationAndHoldout for all data except training set. Not valid for models in datetime partitioned projects
- dr.enums.DATA_SUBSET.HOLDOUT or string holdout for holdout data set only
- dr.enums.DATA_SUBSET.ALL_BACKTESTS or string allBacktests for downloading the predictions for all backtest validation folds. Requires the model to have successfully scored all backtests. Datetime partitioned projects only.
Returns: - Job
an instance of created async job
-
request_transferable_export
(prediction_intervals_size=None)¶ Request generation of an exportable model file for use in an on-premise DataRobot standalone prediction environment.
This function can only be used if model export is enabled, and will only be useful if you have an on-premise environment in which to import it.
This function does not download the exported file. Use download_export for that.
Parameters: - prediction_intervals_size : int, optional
(New in v2.19) For time series projects only. Represents the percentile to use for the size of the prediction intervals. Prediction intervals size must be between 1 and 100 (inclusive).
Examples
import datarobot
model = datarobot.Model.get('p-id', 'l-id')
job = model.request_transferable_export()
job.wait_for_completion()
model.download_export('my_exported_model.drmodel')
# Client must be configured to use standalone prediction server for import:
datarobot.Client(token='my-token-at-standalone-server', endpoint='standalone-server-url/api/v2')
imported_model = datarobot.ImportedModel.create('my_exported_model.drmodel')
-
retrain
(sample_pct=None, featurelist_id=None, training_row_count=None)¶ Submit a job to the queue to train a blender model.
Parameters: - sample_pct: str, optional
The sample size in percents (1 to 100) to use in training. If this parameter is used then training_row_count should not be given.
- featurelist_id : str, optional
The featurelist id
- training_row_count : str, optional
The number of rows to train the model. If this parameter is used then sample_pct should not be given.
Returns: - job : ModelJob
The created job that is retraining the model
-
set_prediction_threshold
(threshold)¶ Set a custom prediction threshold for the model
May not be used once prediction_threshold_read_only is True for this model.
Parameters: - threshold : float
only used for binary classification projects. The threshold to use when deciding between the positive and negative classes when making predictions. Should be between 0.0 and 1.0 (inclusive).
-
star_model
()¶ Mark the model as starred
Model stars propagate to the web application and the API, and can be used to filter when listing models.
-
start_advanced_tuning_session
()¶ Start an Advanced Tuning session. Returns an object that helps set up arguments for an Advanced Tuning model execution.
As of v2.17, all models other than blenders, open source, prime, scaleout, baseline and user-created support Advanced Tuning.
Returns: - AdvancedTuningSession
Session for setting up and running Advanced Tuning on a model
-
train
(sample_pct=None, featurelist_id=None, scoring_type=None, training_row_count=None, monotonic_increasing_featurelist_id=<object object>, monotonic_decreasing_featurelist_id=<object object>)¶ Train the blueprint used in model on a particular featurelist or amount of data.
This method creates a new training job for worker and appends it to the end of the queue for this project. After the job has finished you can get the newly trained model by retrieving it from the project leaderboard, or by retrieving the result of the job.
Either sample_pct or training_row_count can be used to specify the amount of data to use, but not both. If neither are specified, a default of the maximum amount of data that can safely be used to train any blueprint without going into the validation data will be selected.
In smart-sampled projects, sample_pct and training_row_count are assumed to be in terms of rows of the minority class.
Note
For datetime partitioned projects, see train_datetime instead.
Parameters: - sample_pct : float, optional
The amount of data to use for training, as a percentage of the project dataset from 0 to 100.
- featurelist_id : str, optional
The identifier of the featurelist to use. If not defined, the featurelist of this model is used.
- scoring_type : str, optional
Either SCORING_TYPE.validation or SCORING_TYPE.cross_validation. SCORING_TYPE.validation is available for every partitioning type, and indicates that the default model validation should be used for the project. If the project uses a form of cross-validation partitioning, SCORING_TYPE.cross_validation can also be used to indicate that all of the available training/validation combinations should be used to evaluate the model.
- training_row_count : int, optional
The number of rows to use to train the requested model.
- monotonic_increasing_featurelist_id : str
(new in version 2.11) optional, the id of the featurelist that defines the set of features with a monotonically increasing relationship to the target. Passing None disables increasing monotonicity constraint. Default (dr.enums.MONOTONICITY_FEATURELIST_DEFAULT) is the one specified by the blueprint.
- monotonic_decreasing_featurelist_id : str
(new in version 2.11) optional, the id of the featurelist that defines the set of features with a monotonically decreasing relationship to the target. Passing None disables decreasing monotonicity constraint. Default (dr.enums.MONOTONICITY_FEATURELIST_DEFAULT) is the one specified by the blueprint.
Returns: - model_job_id : str
id of created job, can be used as parameter to ModelJob.get method or wait_for_async_model_creation function
Examples
from datarobot import Model, Project
project = Project.get('p-id')
model = Model.get('p-id', 'l-id')
model_job_id = model.train(training_row_count=project.max_train_rows)
-
train_datetime
(featurelist_id=None, training_row_count=None, training_duration=None, time_window_sample_pct=None, monotonic_increasing_featurelist_id=<object object>, monotonic_decreasing_featurelist_id=<object object>, use_project_settings=False)¶ Train this model on a different featurelist or amount of data
Requires that this model is part of a datetime partitioned project; otherwise, an error will occur.
Parameters: - featurelist_id : str, optional
the featurelist to use to train the model. If not specified, the featurelist of this model is used.
- training_row_count : int, optional
the number of rows of data that should be used to train the model. If specified, neither training_duration nor use_project_settings may be specified.
- training_duration : str, optional
a duration string specifying what time range the data used to train the model should span. If specified, neither training_row_count nor use_project_settings may be specified.
- use_project_settings : bool, optional
(New in version v2.20) defaults to False. If True, indicates that the custom backtest partitioning settings specified by the user will be used to train the model and evaluate backtest scores. If specified, neither training_row_count nor training_duration may be specified.
- time_window_sample_pct : int, optional
may only be specified when the requested model is a time window (e.g. duration or start and end dates). An integer between 1 and 99 indicating the percentage to sample by within the window. The points kept are determined by a random uniform sample. If specified, training_duration must also be specified; otherwise, an error will occur.
- monotonic_increasing_featurelist_id : str, optional
(New in version v2.18) optional, the id of the featurelist that defines the set of features with a monotonically increasing relationship to the target. Passing None disables increasing monotonicity constraint. Default (dr.enums.MONOTONICITY_FEATURELIST_DEFAULT) is the one specified by the blueprint.
- monotonic_decreasing_featurelist_id : str, optional
(New in version v2.18) optional, the id of the featurelist that defines the set of features with a monotonically decreasing relationship to the target. Passing None disables decreasing monotonicity constraint. Default (dr.enums.MONOTONICITY_FEATURELIST_DEFAULT) is the one specified by the blueprint.
Returns: - job : ModelJob
the created job to build the model
-
unstar_model
()¶ Unmark the model as starred
Model stars propagate to the web application and the API, and can be used to filter when listing models.
Advanced Tuning¶
-
class
datarobot.models.advanced_tuning.
AdvancedTuningSession
(model)¶ A session enabling users to configure and run advanced tuning for a model.
Every model contains a set of one or more tasks. Every task contains a set of zero or more parameters. This class allows tuning the values of each parameter on each task of a model, before running that model.
This session is client-side only and is not persistent. Only the final model, constructed when run is called, is persisted on the DataRobot server.
Attributes: - description : basestring
Description for the new advanced-tuned model. Defaults to the same description as the base model.
-
get_task_names
()¶ Get the list of task names that are available for this model
Returns: - list(basestring)
List of task names
-
get_parameter_names
(task_name)¶ Get the list of parameter names available for a specific task
Returns: - list(basestring)
List of parameter names
-
set_parameter
(value, task_name=None, parameter_name=None, parameter_id=None)¶ Set the value of a parameter to be used
The caller must supply enough of the optional arguments to this function to uniquely identify the parameter that is being set. For example, a less-common parameter name such as ‘building_block__complementary_error_function’ might only be used once (if at all) by a single task in a model, in which case it may be sufficient to simply specify ‘parameter_name’. But a more-common name such as ‘random_seed’ might be used by several of the model’s tasks, and it may be necessary to also specify ‘task_name’ to clarify which task’s random seed is to be set. This function only affects client-side state. It will not check that the new parameter value(s) are valid.
Parameters: - task_name : basestring
Name of the task whose parameter needs to be set
- parameter_name : basestring
Name of the parameter to set
- parameter_id : basestring
ID of the parameter to set
- value : int, float, list, or basestring
New value for the parameter, with legal values determined by the parameter being set
Raises: - NoParametersFoundException
if no matching parameters are found.
- NonUniqueParametersException
if multiple parameters matched the specified filtering criteria
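The matching rule behind set_parameter (every filter that is supplied must match, and exactly one parameter must remain) can be sketched as below. The exception classes, the find_parameter helper and the dictionary keys task_name, parameter_name and parameter_id are stand-ins chosen for illustration; the client's internal representation may differ.

```python
class NoParametersFound(Exception):
    """Stand-in for NoParametersFoundException."""


class NonUniqueParameters(Exception):
    """Stand-in for NonUniqueParametersException."""


def find_parameter(parameters, task_name=None, parameter_name=None, parameter_id=None):
    """Return the single parameter matching every filter that was supplied."""
    filters = {
        "task_name": task_name,
        "parameter_name": parameter_name,
        "parameter_id": parameter_id,
    }
    matches = [
        param for param in parameters
        # A filter left as None places no restriction on the parameter.
        if all(wanted is None or param[key] == wanted for key, wanted in filters.items())
    ]
    if not matches:
        raise NoParametersFound("no parameter matched the given criteria")
    if len(matches) > 1:
        raise NonUniqueParameters("criteria matched %d parameters" % len(matches))
    return matches[0]
```

In this sketch, a name such as random_seed that is shared by two tasks raises NonUniqueParameters until task_name is added to narrow the match.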
-
get_parameters
()¶ Returns the set of parameters available to this model
The returned parameters have one additional key, “value”, reflecting any new values that have been set in this AdvancedTuningSession. When the session is run, “value” will be used, or if it is unset, “current_value”.
Returns: - parameters : dict
“Parameters” dictionary, same as specified on Model.get_advanced_tuning_params.
- An additional field is added per parameter to the ‘tuningParameters’ list in the dictionary:
- value : int, float, list, or basestring
The current value of the parameter. None if none has been specified.
-
run
()¶ Submit this model for Advanced Tuning.
Returns: - datarobot.models.modeljob.ModelJob
The created job to build the model
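Putting the session methods together, a tuning run might look like the following sketch. It uses only the methods documented above (start_advanced_tuning_session, get_task_names, get_parameter_names, set_parameter, run); the helper name run_tuned_model and the choice to set the value on every task exposing the parameter are illustrative assumptions.

```python
def run_tuned_model(model, parameter_name, value, description=None):
    """Set one parameter on every task that exposes it, then submit the model."""
    session = model.start_advanced_tuning_session()
    if description is not None:
        session.description = description
    # Collect the tasks that actually have a parameter with this name.
    tasks = [
        task for task in session.get_task_names()
        if parameter_name in session.get_parameter_names(task)
    ]
    if not tasks:
        raise ValueError("no task exposes parameter %r" % parameter_name)
    for task in tasks:
        # Passing task_name as well narrows the match to this task's parameter.
        session.set_parameter(value, task_name=task, parameter_name=parameter_name)
    # Only at this point is anything sent to the server.
    return session.run()
```

Because the session is client-side only, nothing is persisted if the function raises before run is called.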
ModelJob¶
-
datarobot.models.modeljob.
wait_for_async_model_creation
(project_id, model_job_id, max_wait=600)¶ Given a Project id and a ModelJob id, poll the status of the process responsible for model creation until the model is created.
Parameters: - project_id : str
The identifier of the project
- model_job_id : str
The identifier of the ModelJob
- max_wait : int, optional
Time in seconds after which model creation is considered unsuccessful
Returns: - model : Model
Newly created model
Raises: - AsyncModelCreationError
Raised if status of fetched ModelJob object is error
- AsyncTimeoutError
Model wasn’t created in time, specified by max_wait parameter
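The polling behaviour of wait_for_async_model_creation can be approximated by a generic loop like the one below. This is a sketch under assumptions, not the library's code: poll_until, its interval, clock and sleep arguments are hypothetical, and the built-in TimeoutError stands in for AsyncTimeoutError.

```python
import time


def poll_until(check, max_wait=600, interval=1.0,
               clock=time.monotonic, sleep=time.sleep):
    """Call check() until it returns a non-None result or max_wait elapses.

    check: zero-argument callable returning the finished object, or None
        while the work is still in progress
    """
    deadline = clock() + max_wait
    while True:
        result = check()
        if result is not None:
            return result
        if clock() >= deadline:
            # Stand-in for AsyncTimeoutError in this sketch.
            raise TimeoutError("not finished within %s seconds" % max_wait)
        sleep(interval)
```

Injecting clock and sleep keeps the loop easy to exercise without real waiting.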
-
class
datarobot.models.
ModelJob
(data, completed_resource_url=None)¶ Tracks asynchronous work being done within a project
Attributes: - id : int
the id of the job
- project_id : str
the id of the project the job belongs to
- status : str
the status of the job - will be one of
datarobot.enums.QUEUE_STATUS
- job_type : str
what kind of work the job is doing - will be ‘model’ for modeling jobs
- is_blocked : bool
if true, the job is blocked (cannot be executed) until its dependencies are resolved
- sample_pct : float
the percentage of the project’s dataset used in this modeling job
- model_type : str
the model this job builds (e.g. ‘Nystroem Kernel SVM Regressor’)
- processes : list of str
the processes used by the model
- featurelist_id : str
the id of the featurelist used in this modeling job
- blueprint : Blueprint
the blueprint used in this modeling job
-
classmethod
from_job
(job)¶ Transforms a generic Job into a ModelJob
Parameters: - job: Job
A generic job representing a ModelJob
Returns: - model_job: ModelJob
A fully populated ModelJob with all the details of the job
Raises: - ValueError:
If the generic Job was not a model job, e.g. job_type != JOB_TYPE.MODEL
-
classmethod
get
(project_id, model_job_id)¶ Fetches one ModelJob. If the job has already finished, raises a PendingJobFinished exception.
Parameters: - project_id : str
The identifier of the project the model belongs to
- model_job_id : str
The identifier of the model_job
Returns: - model_job : ModelJob
The pending ModelJob
Raises: - PendingJobFinished
If the job being queried already finished, and the server is re-routing to the finished model.
- AsyncFailureError
Querying this resource gave a status code other than 200 or 303
-
classmethod
get_model
(project_id, model_job_id)¶ Fetches a finished model from the job used to create it.
Parameters: - project_id : str
The identifier of the project the model belongs to
- model_job_id : str
The identifier of the model_job
Returns: - model : Model
The finished model
Raises: - JobNotFinished
If the job has not finished yet
- AsyncFailureError
Querying the model_job in question gave a status code other than 200 or 303
-
cancel
()¶ Cancel this job. If this job has not finished running, it will be removed and canceled.
-
get_result
(params=None)¶ Parameters: - params : dict, optional
Query parameters to be added to the request to get results. For featureEffects and featureFit, the source param is required to define the source; otherwise the default is `training`.
Returns: - result : object
- Return type depends on the job type:
- for model jobs, a Model is returned
- for predict jobs, a pandas.DataFrame (with predictions) is returned
- for featureImpact jobs, a list of dicts (see Model.get_feature_impact for more detail)
- for primeRulesets jobs, a list of Rulesets
- for primeModel jobs, a PrimeModel
- for primeDownloadValidation jobs, a PrimeFile
- for reasonCodesInitialization jobs, a ReasonCodesInitialization
- for reasonCodes jobs, a ReasonCodes
- for predictionExplanationInitialization jobs, a PredictionExplanationsInitialization
- for predictionExplanations jobs, a PredictionExplanations
- for featureEffects, a FeatureEffects
- for featureFit, a FeatureFit
Raises: - JobNotFinished
If the job is not finished, the result is not available.
- AsyncProcessUnsuccessfulError
If the job errored or was aborted
-
get_result_when_complete
(max_wait=600, params=None)¶ Parameters: - max_wait : int, optional
How long to wait for the job to finish.
- params : dict, optional
Query parameters to be added to request.
Returns: - result: object
Return type is the same as would be returned by Job.get_result.
Raises: - AsyncTimeoutError
If the job does not finish in time
- AsyncProcessUnsuccessfulError
If the job errored or was aborted
-
refresh
()¶ Update this object with the latest job data from the server.
-
wait_for_completion
(max_wait=600)¶ Waits for job to complete.
Parameters: - max_wait : int, optional
How long to wait for the job to finish.
Pareto Front¶
-
class
datarobot.models.pareto_front.
ParetoFront
(project_id, error_metric, hyperparameters, target_type, solutions)¶ Pareto front data for a Eureqa model.
The Pareto front reflects the tradeoffs between error and complexity for a particular model. The solutions reflect possible Eureqa models at different levels of complexity. By default, only one solution will have a corresponding model, but models can be created for each solution.
Attributes: - project_id : str
the ID of the project the model belongs to
- error_metric : str
Eureqa error-metric identifier used to compute error metrics for this search. Note that Eureqa error metrics do NOT correspond 1:1 with DataRobot error metrics – the available metrics are not the same, and are computed from a subset of the training data rather than from the validation data.
- hyperparameters : dict
Hyperparameters used by this run of the Eureqa blueprint
- target_type : str
Indicating what kind of modeling is being done in this project, either ‘Regression’, ‘Binary’ (Binary classification), or ‘Multiclass’ (Multiclass classification).
- solutions : list(Solution)
Solutions that Eureqa has found to model this data. Some solutions will have greater accuracy. Others will have slightly less accuracy but will use simpler expressions.
-
class
datarobot.models.pareto_front.
Solution
(eureqa_solution_id, complexity, error, expression, expression_annotated, best_model, project_id)¶ Eureqa Solution.
A solution represents a possible Eureqa model; however not all solutions have models associated with them. It must have a model created before it can be used to make predictions, etc.
Attributes: - eureqa_solution_id: str
ID of this Solution
- complexity: int
Complexity score for this solution. Complexity score is a function of the mathematical operators used in the current solution. The Complexity calculation can be tuned via model hyperparameters.
- error: float
Error for the current solution, as computed by Eureqa using the ‘error_metric’ error metric.
- expression: str
Eureqa model equation string.
- expression_annotated: str
Eureqa model equation string with variable names tagged for easy identification.
- best_model: bool
True, if the model is determined to be the best
-
create_model
()¶ Add this solution to the leaderboard, if it is not already present.
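To make the error/complexity tradeoff described for ParetoFront concrete, here is a small sketch that filters (complexity, error) pairs down to the non-dominated ones. It is a generic illustration, not how Eureqa or the client computes solutions; the pareto_front function and the sample values are invented for this example.

```python
def pareto_front(solutions):
    """Return the (complexity, error) pairs that no other pair dominates.

    One pair dominates another when it is no worse on both axes and
    strictly better on at least one (lower is better for both).
    """
    front = []
    for c, e in solutions:
        dominated = any(
            (c2 <= c and e2 <= e) and (c2 < c or e2 < e)
            for c2, e2 in solutions
        )
        if not dominated:
            front.append((c, e))
    return sorted(set(front))


# Hypothetical (complexity, error) pairs.
candidates = [(1, 0.90), (3, 0.40), (4, 0.45), (7, 0.20), (9, 0.20)]
print(pareto_front(candidates))  # [(1, 0.9), (3, 0.4), (7, 0.2)]
```

The pair (4, 0.45) drops out because (3, 0.40) is both simpler and more accurate, and (9, 0.20) drops out because (7, 0.20) reaches the same error with less complexity.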
Partitioning¶
-
class
datarobot.
RandomCV
(holdout_pct, reps, seed=0)¶ A partition in which observations are randomly assigned to cross-validation groups and the holdout set.
Parameters: - holdout_pct : int
the desired percentage of dataset to assign to holdout set
- reps : int
number of cross validation folds to use
- seed : int
a seed to use for randomization
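As a rough illustration of what random assignment to a holdout set and cross-validation folds means, the sketch below shuffles row indices with a seed, sets aside holdout_pct percent of them, and deals the rest into reps folds. The random_cv_assign function is hypothetical and only mirrors the parameter names above; the DataRobot application performs the real partitioning and may do so differently.

```python
import random


def random_cv_assign(n_rows, holdout_pct, reps, seed=0):
    """Map each row index to "holdout" or to a fold number in range(reps)."""
    rng = random.Random(seed)
    order = list(range(n_rows))
    rng.shuffle(order)

    n_holdout = n_rows * holdout_pct // 100
    assignment = {}
    for idx in order[:n_holdout]:
        assignment[idx] = "holdout"
    # Deal the remaining rows round-robin into the folds.
    for position, idx in enumerate(order[n_holdout:]):
        assignment[idx] = position % reps
    return assignment


assignment = random_cv_assign(n_rows=100, holdout_pct=20, reps=5, seed=0)
print(sum(1 for v in assignment.values() if v == "holdout"))  # 20
```

Using the same seed reproduces the same assignment, which is the purpose of the seed parameter.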
-
class
datarobot.
StratifiedCV
(holdout_pct, reps, seed=0)¶ A partition in which observations are randomly assigned to cross-validation groups and the holdout set, preserving in each group the same ratio of positive to negative cases as in the original data.
Parameters: - holdout_pct : int
the desired percentage of dataset to assign to holdout set
- reps : int
number of cross validation folds to use
- seed : int
a seed to use for randomization
-
class
datarobot.
GroupCV
(holdout_pct, reps, partition_key_cols, seed=0)¶ A partition in which one column is specified, and rows sharing a common value for that column are guaranteed to stay together in the partitioning into cross-validation groups and the holdout set.
Parameters: - holdout_pct : int
the desired percentage of dataset to assign to holdout set
- reps : int
number of cross validation folds to use
- partition_key_cols : list
a list containing a single string, where the string is the name of the column whose values should remain together in partitioning
- seed : int
a seed to use for randomization
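The guarantee that rows sharing a value in the key column stay together can be sketched as follows. This is an illustrative stand-in only: group_cv_assign is a hypothetical function, it ignores the holdout set for brevity, and it assumes the group values are sortable (for example strings).

```python
import random


def group_cv_assign(group_values, reps, seed=0):
    """Return one fold number per row so that equal group values share a fold."""
    rng = random.Random(seed)
    groups = sorted(set(group_values))
    rng.shuffle(groups)
    # Folds are assigned per group, never per row.
    fold_of_group = {g: i % reps for i, g in enumerate(groups)}
    return [fold_of_group[g] for g in group_values]


# Hypothetical key column: one customer id per row.
customers = ["a", "b", "a", "c", "b", "d"]
folds = group_cv_assign(customers, reps=2, seed=1)
print(list(zip(customers, folds)))
```

Because the fold is chosen once per distinct value, both rows for customer "a" always land in the same fold.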
-
class
datarobot.
UserCV
(user_partition_col, cv_holdout_level, seed=0)¶ A partition where the cross-validation folds and the holdout set are specified by the user.
Parameters: - user_partition_col : string
the name of the column containing the partition assignments
- cv_holdout_level
the value of the partition column indicating a row is part of the holdout set
- seed : int
a seed to use for randomization
-
class
datarobot.
RandomTVH
(holdout_pct, validation_pct, seed=0)¶ Specifies a partitioning method in which rows are randomly assigned to training, validation, and holdout.
Parameters: - holdout_pct : int
the desired percentage of dataset to assign to holdout set
- validation_pct : int
the desired percentage of dataset to assign to validation set
- seed : int
a seed to use for randomization
-
class
datarobot.
UserTVH
(user_partition_col, training_level, validation_level, holdout_level, seed=0)¶ Specifies a partitioning method in which rows are assigned by the user to training, validation, and holdout sets.
Parameters: - user_partition_col : string
the name of the column containing the partition assignments
- training_level
the value of the partition column indicating a row is part of the training set
- validation_level
the value of the partition column indicating a row is part of the validation set
- holdout_level
the value of the partition column indicating a row is part of the holdout set (use None if you want no holdout set)
- seed : int
a seed to use for randomization
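A user-defined partition column maps each row to a set by its value. The sketch below shows that mapping on plain dictionaries; split_by_user_partition, the column name "part", and the level values are hypothetical and chosen only for illustration.

```python
def split_by_user_partition(rows, user_partition_col, training_level,
                            validation_level, holdout_level=None):
    """Group row dicts into training, validation and holdout lists."""
    sets = {"training": [], "validation": [], "holdout": []}
    level_to_set = {training_level: "training", validation_level: "validation"}
    if holdout_level is not None:
        level_to_set[holdout_level] = "holdout"

    for row in rows:
        level = row[user_partition_col]
        if level not in level_to_set:
            raise ValueError("unexpected partition value: %r" % (level,))
        sets[level_to_set[level]].append(row)
    return sets


rows = [
    {"id": 1, "part": "T"},
    {"id": 2, "part": "V"},
    {"id": 3, "part": "H"},
    {"id": 4, "part": "T"},
]
result = split_by_user_partition(rows, "part", "T", "V", "H")
print({name: [r["id"] for r in members] for name, members in result.items()})
```

With holdout_level left as None, the holdout list stays empty, matching the "no holdout set" case described above.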
-
class
datarobot.
StratifiedTVH
(holdout_pct, validation_pct, seed=0)¶ A partition in which observations are randomly assigned to train, validation, and holdout sets, preserving in each group the same ratio of positive to negative cases as in the original data.
Parameters: - holdout_pct : int
the desired percentage of dataset to assign to holdout set
- validation_pct : int
the desired percentage of dataset to assign to validation set
- seed : int
a seed to use for randomization
-
class
datarobot.
GroupTVH
(holdout_pct, validation_pct, partition_key_cols, seed=0)¶ A partition in which one column is specified, and rows sharing a common value for that column are guaranteed to stay together in the partitioning into the training, validation, and holdout sets.
Parameters: - holdout_pct : int
the desired percentage of dataset to assign to holdout set
- validation_pct : int
the desired percentage of dataset to assign to validation set
- partition_key_cols : list
a list containing a single string, where the string is the name of the column whose values should remain together in partitioning
- seed : int
a seed to use for randomization
-
class
datarobot.
DatetimePartitioningSpecification
(datetime_partition_column, autopilot_data_selection_method=None, validation_duration=None, holdout_start_date=None, holdout_duration=None, disable_holdout=None, gap_duration=None, number_of_backtests=None, backtests=None, use_time_series=False, default_to_known_in_advance=False, default_to_do_not_derive=False, feature_derivation_window_start=None, feature_derivation_window_end=None, feature_settings=None, forecast_window_start=None, forecast_window_end=None, windows_basis_unit=None, treat_as_exponential=None, differencing_method=None, periodicities=None, multiseries_id_columns=None, use_cross_series_features=None, aggregation_type=None, cross_series_group_by_columns=None, calendar_id=None, holdout_end_date=None, unsupervised_mode=False)¶ Uniquely defines a DatetimePartitioning for some project
Includes only the attributes of DatetimePartitioning that are directly controllable by users, not those determined by the DataRobot application based on the project dataset and the user-controlled settings.
This is the specification that should be passed to Project.set_target via the partitioning_method parameter. To see the full partitioning based on the project dataset, use DatetimePartitioning.generate.
All durations should be specified with a duration string such as those returned by the partitioning_methods.construct_duration_string helper method.
Note that either (holdout_start_date, holdout_duration) or (holdout_start_date, holdout_end_date) can be used to specify holdout partitioning settings.
Attributes: - datetime_partition_column : str
the name of the column whose values as dates are used to assign a row to a particular partition
- autopilot_data_selection_method : str
one of datarobot.enums.DATETIME_AUTOPILOT_DATA_SELECTION_METHOD. Whether models created by the autopilot should use “rowCount” or “duration” as their data_selection_method.
- validation_duration : str or None
the default validation_duration for the backtests
- holdout_start_date : datetime.datetime or None
The start date of holdout scoring data. If holdout_start_date is specified, either holdout_duration or holdout_end_date must also be specified. If disable_holdout is set to True, holdout_start_date, holdout_duration, and holdout_end_date may not be specified.
- holdout_duration : str or None
The duration of the holdout scoring data. If holdout_duration is specified, holdout_start_date must also be specified. If disable_holdout is set to True, holdout_duration, holdout_start_date, and holdout_end_date may not be specified.
- holdout_end_date : datetime.datetime or None
The end date of holdout scoring data. If holdout_end_date is specified, holdout_start_date must also be specified. If disable_holdout is set to True, holdout_end_date, holdout_start_date, and holdout_duration may not be specified.
- disable_holdout : bool or None
(New in version v2.8) Whether to suppress allocating a holdout fold. If set to True, holdout_start_date, holdout_duration, and holdout_end_date may not be specified.
- gap_duration : str or None
The duration of the gap between training and holdout scoring data
- number_of_backtests : int or None
the number of backtests to use
- backtests : list of
BacktestSpecification
the exact specification of backtests to use. The indexes of the specified backtests should range from 0 to number_of_backtests - 1. If any backtest is left unspecified, a default configuration will be chosen.
- use_time_series : bool
(New in version v2.8) Whether to create a time series project (if True) or an OTV project which uses datetime partitioning (if False). The default behaviour is to create an OTV project.
- default_to_known_in_advance : bool
(New in version v2.11) Optional, default False. Used for time series projects only. Sets whether all features default to being treated as known in advance. Known in advance features are expected to be known for dates in the future when making predictions, e.g., “is this a holiday?”. Individual features can be set to a value different than the default using the feature_settings parameter.
- default_to_do_not_derive : bool
(New in v2.17) Optional, default False. Used for time series projects only. Sets whether all features default to being treated as do-not-derive features, excluding them from feature derivation. Individual features can be set to a value different than the default by using the feature_settings parameter.
- feature_derivation_window_start : int or None
(New in version v2.8) Only used for time series projects. Offset into the past to define how far back relative to the forecast point the feature derivation window should start. Expressed in terms of the windows_basis_unit and should be negative or zero.
- feature_derivation_window_end : int or None
(New in version v2.8) Only used for time series projects. Offset into the past to define how far back relative to the forecast point the feature derivation window should end. Expressed in terms of the windows_basis_unit and should be a positive value.
- feature_settings : list of FeatureSettings
(New in version v2.9) Optional, a list specifying per feature settings, can be left unspecified.
- forecast_window_start : int or None
(New in version v2.8) Only used for time series projects. Offset into the future to define how far forward relative to the forecast point the forecast window should start. Expressed in terms of the windows_basis_unit.
- forecast_window_end : int or None
(New in version v2.8) Only used for time series projects. Offset into the future to define how far forward relative to the forecast point the forecast window should end. Expressed in terms of the windows_basis_unit.
- windows_basis_unit : string, optional
(New in version v2.14) Only used for time series projects. Indicates which unit is the basis for the feature derivation window and the forecast window. Valid options are the detected time unit (one of the datarobot.enums.TIME_UNITS) or “ROW”. If omitted, the default value is the detected time unit.
- treat_as_exponential : string, optional
(New in version v2.9) defaults to “auto”. Used to specify whether to treat data as exponential trend and apply transformations like log-transform. Use values from the datarobot.enums.TREAT_AS_EXPONENTIAL enum.
- differencing_method : string, optional
(New in version v2.9) defaults to “auto”. Used to specify which differencing method to apply if the data is stationary. Use values from the datarobot.enums.DIFFERENCING_METHOD enum.
- periodicities : list of Periodicity, optional
(New in version v2.9) a list of datarobot.Periodicity. Periodicity units should be “ROW” if the windows_basis_unit is “ROW”.
- multiseries_id_columns : list of str or null
(New in version v2.11) a list of the names of multiseries id columns to define series within the training data. Currently only one multiseries id column is supported.
- use_cross_series_features : bool
(New in version v2.14) Whether to use cross series features.
- aggregation_type : str, optional
(New in version v2.14) The aggregation type to apply when creating cross series features. Optional, must be one of “total” or “average”.
- cross_series_group_by_columns : list of str, optional
(New in version v2.15) List of columns (currently of length 1). Optional setting that indicates how to further split series into related groups. For example, if every series is sales of an individual product, the series group-by could be the product category with values like “men’s clothing”, “sports equipment”, etc. Can only be used in a multiseries project with use_cross_series_features set to True.
- calendar_id : str, optional
(New in version v2.15) The id of the CalendarFile to use with this project.
- unsupervised_mode: bool, optional
(New in version v2.20) defaults to False, indicates whether partitioning should be constructed for the unsupervised project.
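The constraints stated for the holdout attributes above can be summarised as a small pre-flight check. This is a sketch of the documented rules only, written as a hypothetical helper named holdout_setting_problems; it is not part of the client, and the server may apply further validation.

```python
def holdout_setting_problems(holdout_start_date=None, holdout_duration=None,
                             holdout_end_date=None, disable_holdout=None):
    """Return a list of human-readable rule violations (empty if none)."""
    given = {
        "holdout_start_date": holdout_start_date,
        "holdout_duration": holdout_duration,
        "holdout_end_date": holdout_end_date,
    }
    specified = [name for name, value in given.items() if value is not None]
    problems = []

    if disable_holdout:
        # With the holdout disabled, none of the three may be specified.
        if specified:
            problems.append(
                "disable_holdout=True excludes: " + ", ".join(specified))
        return problems

    if (holdout_start_date is not None
            and holdout_duration is None and holdout_end_date is None):
        problems.append(
            "holdout_start_date needs holdout_duration or holdout_end_date")
    if holdout_start_date is None:
        for name in ("holdout_duration", "holdout_end_date"):
            if given[name] is not None:
                problems.append(name + " needs holdout_start_date")
    return problems


print(holdout_setting_problems(holdout_duration="P1Y0M0DT0H0M0S"))
```

Running the check before building a specification surfaces an inconsistent combination locally instead of at request time.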
-
collect_payload
()¶ Set up the dict that should be sent to the server when setting the target
Returns: - partitioning_spec : dict
-
prep_payload
(project_id, max_wait=600)¶ Run any necessary validation and prep of the payload, including async operations
Mainly used for the datetime partitioning spec but implemented in general for consistency
-
class
datarobot.
BacktestSpecification
(index, gap_duration=None, validation_start_date=None, validation_duration=None, validation_end_date=None, primary_training_start_date=None, primary_training_end_date=None)¶ Uniquely defines a Backtest used in a DatetimePartitioning
Includes only the attributes of a backtest directly controllable by users. The other attributes are assigned by the DataRobot application based on the project dataset and the user-controlled settings.
There are two ways to specify an individual backtest:
Option 1: Use index, gap_duration, validation_start_date, and validation_duration. All durations should be specified with a duration string such as those returned by the partitioning_methods.construct_duration_string helper method.
    from datetime import datetime

    import datarobot as dr

    partitioning_spec = dr.DatetimePartitioningSpecification(
        backtests=[
            # modify the first backtest using option 1
            dr.BacktestSpecification(
                index=0,
                gap_duration=dr.partitioning_methods.construct_duration_string(),
                validation_start_date=datetime(year=2010, month=1, day=1),
                validation_duration=dr.partitioning_methods.construct_duration_string(years=1),
            )
        ],
        # other partitioning settings...
    )
Option 2 (New in version v2.20): Use index, primary_training_start_date, primary_training_end_date, validation_start_date, and validation_end_date. In this case, note that setting primary_training_end_date and validation_start_date to the same timestamp will result in no gap being created.
    from datetime import datetime

    import datarobot as dr

    partitioning_spec = dr.DatetimePartitioningSpecification(
        backtests=[
            # modify the first backtest using option 2
            dr.BacktestSpecification(
                index=0,
                primary_training_start_date=datetime(year=2005, month=1, day=1),
                primary_training_end_date=datetime(year=2010, month=1, day=1),
                validation_start_date=datetime(year=2010, month=1, day=1),
                validation_end_date=datetime(year=2011, month=1, day=1),
            )
        ],
        # other partitioning settings...
    )
Attributes: - index : int
the index of the backtest to update
- gap_duration : str
the desired duration of the gap between training and validation scoring data for the backtest
- validation_start_date : datetime.datetime
the desired start date of the validation scoring data for this backtest
- validation_duration : str
the desired duration of the validation scoring data for this backtest
- validation_end_date : datetime.datetime
the desired end date of the validation scoring data for this backtest
- primary_training_start_date : datetime.datetime
the desired start date of the training partition for this backtest
- primary_training_end_date : datetime.datetime
the desired end date of the training partition for this backtest
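The relationship between the two options can be checked with plain datetime arithmetic: the gap is the time between primary_training_end_date and validation_start_date, and the validation duration is the time between the validation start and end dates. The duration_string function below is a hypothetical stand-in for the helper mentioned above; its exact output format is an assumption made for this sketch and should be verified against the installed client.

```python
from datetime import datetime


def duration_string(years=0, months=0, days=0, hours=0, minutes=0, seconds=0):
    """Format an ISO 8601 style duration (assumed format, for illustration)."""
    return "P{}Y{}M{}DT{}H{}M{}S".format(
        years, months, days, hours, minutes, seconds)


# Dates taken from the option 2 example.
primary_training_end = datetime(year=2010, month=1, day=1)
validation_start = datetime(year=2010, month=1, day=1)
validation_end = datetime(year=2011, month=1, day=1)

gap = validation_start - primary_training_end
validation_length = validation_end - validation_start

print(duration_string(days=gap.days))                # P0Y0M0DT0H0M0S
print(duration_string(days=validation_length.days))  # P0Y0M365DT0H0M0S
```

Identical timestamps for the end of training and the start of validation give a zero-length gap, which is the "no gap" case noted under option 2.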
-
class
datarobot.
FeatureSettings
(feature_name, known_in_advance=None, do_not_derive=None)¶ Per feature settings
Attributes: - feature_name : string
name of the feature
- known_in_advance : bool
(New in version v2.11) Optional, for time series projects only. Sets whether the feature is known in advance, i.e., values for future dates are known at prediction time. If not specified, the feature uses the value from the default_to_known_in_advance flag.
- do_not_derive : bool
(New in v2.17) Optional, for time series projects only. Sets whether the feature is excluded from feature derivation. If not specified, the feature uses the value from the default_to_do_not_derive flag.
-
class
datarobot.
Periodicity
(time_steps, time_unit)¶ Periodicity configuration
Parameters: - time_steps : int
Time step value
- time_unit : string
Time step unit, valid options are values from datarobot.enums.TIME_UNITS
Examples
    import datarobot as dr

    periodicities = [
        dr.Periodicity(time_steps=10, time_unit=dr.enums.TIME_UNITS.HOUR),
        dr.Periodicity(time_steps=600, time_unit=dr.enums.TIME_UNITS.MINUTE),
    ]
    spec = dr.DatetimePartitioningSpecification(
        # ...
        periodicities=periodicities
    )
-
class
datarobot.
DatetimePartitioning
(project_id=None, datetime_partition_column=None, date_format=None, autopilot_data_selection_method=None, validation_duration=None, available_training_start_date=None, available_training_duration=None, available_training_row_count=None, available_training_end_date=None, primary_training_start_date=None, primary_training_duration=None, primary_training_row_count=None, primary_training_end_date=None, gap_start_date=None, gap_duration=None, gap_row_count=None, gap_end_date=None, holdout_start_date=None, holdout_duration=None, holdout_row_count=None, holdout_end_date=None, number_of_backtests=None, backtests=None, total_row_count=None, use_time_series=False, default_to_known_in_advance=False, default_to_do_not_derive=False, feature_derivation_window_start=None, feature_derivation_window_end=None, feature_settings=None, forecast_window_start=None, forecast_window_end=None, windows_basis_unit=None, treat_as_exponential=None, differencing_method=None, periodicities=None, multiseries_id_columns=None, number_of_known_in_advance_features=0, number_of_do_not_derive_features=0, use_cross_series_features=None, aggregation_type=None, cross_series_group_by_columns=None, calendar_id=None, calendar_name=None)¶ Full partitioning of a project for datetime partitioning.
To instantiate, use DatetimePartitioning.get(project_id).
Includes both the attributes specified by the user, as well as those determined by the DataRobot application based on the project dataset. In order to use a partitioning to set the target, call to_specification and pass the resulting DatetimePartitioningSpecification to Project.set_target via the partitioning_method parameter.
The available training data corresponds to all the data available for training, while the primary training data corresponds to the data that can be used to train while ensuring that all backtests are available. If a model is trained with more data than is available in the primary training data, then all backtests may not have scores available.
Attributes: - project_id : str
the id of the project this partitioning applies to
- datetime_partition_column : str
the name of the column whose values as dates are used to assign a row to a particular partition
- date_format : str
the format (e.g. “%Y-%m-%d %H:%M:%S”) by which the partition column was interpreted (compatible with strftime)
- autopilot_data_selection_method : str
one of datarobot.enums.DATETIME_AUTOPILOT_DATA_SELECTION_METHOD. Whether models created by the autopilot use “rowCount” or “duration” as their data_selection_method.
- validation_duration : str or None
the validation duration specified when initializing the partitioning - not directly significant if the backtests have been modified, but used as the default validation_duration for the backtests. Can be absent if this is a time series project with an irregular primary date/time feature.
- available_training_start_date : datetime.datetime
The start date of the available training data for scoring the holdout
- available_training_duration : str
The duration of the available training data for scoring the holdout
- available_training_row_count : int or None
The number of rows in the available training data for scoring the holdout. Only available when retrieving the partitioning after setting the target.
- available_training_end_date : datetime.datetime
The end date of the available training data for scoring the holdout
- primary_training_start_date : datetime.datetime or None
The start date of primary training data for scoring the holdout. Unavailable when the holdout fold is disabled.
- primary_training_duration : str
The duration of the primary training data for scoring the holdout
- primary_training_row_count : int or None
The number of rows in the primary training data for scoring the holdout. Only available when retrieving the partitioning after setting the target.
- primary_training_end_date : datetime.datetime or None
The end date of the primary training data for scoring the holdout. Unavailable when the holdout fold is disabled.
- gap_start_date : datetime.datetime or None
The start date of the gap between training and holdout scoring data. Unavailable when the holdout fold is disabled.
- gap_duration : str
The duration of the gap between training and holdout scoring data
- gap_row_count : int or None
The number of rows in the gap between training and holdout scoring data. Only available when retrieving the partitioning after setting the target.
- gap_end_date : datetime.datetime or None
The end date of the gap between training and holdout scoring data. Unavailable when the holdout fold is disabled.
- holdout_start_date : datetime.datetime or None
The start date of holdout scoring data. Unavailable when the holdout fold is disabled.
- holdout_duration : str
The duration of the holdout scoring data
- holdout_row_count : int or None
The number of rows in the holdout scoring data. Only available when retrieving the partitioning after setting the target.
- holdout_end_date : datetime.datetime or None
The end date of the holdout scoring data. Unavailable when the holdout fold is disabled.
- number_of_backtests : int
the number of backtests used.
- backtests : list of
Backtest
the configured backtests.
- total_row_count : int
the number of rows in the project dataset. Only available when retrieving the partitioning after setting the target.
- use_time_series : bool
(New in version v2.8) Whether to create a time series project (if True) or an OTV project which uses datetime partitioning (if False). The default behaviour is to create an OTV project.
- default_to_known_in_advance : bool
(New in version v2.11) Optional, default False. Used for time series projects only. Sets whether all features default to being treated as known in advance. Known in advance features are expected to be known for dates in the future when making predictions, e.g., “is this a holiday?”. Individual features can be set to a value different from the default using the feature_settings parameter.
- default_to_do_not_derive : bool
(New in v2.17) Optional, default False. Used for time series projects only. Sets whether all features default to being treated as do-not-derive features, excluding them from feature derivation. Individual features can be set to a value different from the default by using the feature_settings parameter.
- feature_derivation_window_start : int or None
(New in version v2.8) Only used for time series projects. Offset into the past to define how far back relative to the forecast point the feature derivation window should start. Expressed in terms of the windows_basis_unit.
- feature_derivation_window_end : int or None
(New in version v2.8) Only used for time series projects. Offset into the past to define how far back relative to the forecast point the feature derivation window should end. Expressed in terms of the windows_basis_unit.
- feature_settings : list of FeatureSettings
(New in version v2.9) Optional, a list specifying per feature settings, can be left unspecified.
- forecast_window_start : int or None
(New in version v2.8) Only used for time series projects. Offset into the future to define how far forward relative to the forecast point the forecast window should start. Expressed in terms of the windows_basis_unit.
- forecast_window_end : int or None
(New in version v2.8) Only used for time series projects. Offset into the future to define how far forward relative to the forecast point the forecast window should end. Expressed in terms of the windows_basis_unit.
- windows_basis_unit : string, optional
(New in version v2.14) Only used for time series projects. Indicates which unit is the basis for the feature derivation window and the forecast window. Valid options are the detected time unit (one of the datarobot.enums.TIME_UNITS) or “ROW”. If omitted, the default value is the detected time unit.
- treat_as_exponential : string, optional
(New in version v2.9) defaults to “auto”. Used to specify whether to treat data as exponential trend and apply transformations like log-transform. Use values from the datarobot.enums.TREAT_AS_EXPONENTIAL enum.
- differencing_method : string, optional
(New in version v2.9) defaults to “auto”. Used to specify which differencing method to apply if the data is stationary. Use values from the datarobot.enums.DIFFERENCING_METHOD enum.
- periodicities : list of Periodicity, optional
(New in version v2.9) a list of datarobot.Periodicity. Periodicity units should be “ROW” if the windows_basis_unit is “ROW”.
- multiseries_id_columns : list of str or null
(New in version v2.11) a list of the names of multiseries id columns to define series within the training data. Currently only one multiseries id column is supported.
- number_of_known_in_advance_features : int
(New in version v2.14) Number of features that are marked as known in advance.
- number_of_do_not_derive_features : int
(New in v2.17) Number of features that are excluded from derivation.
- use_cross_series_features : bool
(New in version v2.14) Whether to use cross series features.
- aggregation_type : str, optional
(New in version v2.14) The aggregation type to apply when creating cross series features. Optional, must be one of “total” or “average”.
- cross_series_group_by_columns : list of str, optional
(New in version v2.15) List of columns (currently of length 1). Optional setting that indicates how to further split series into related groups. For example, if every series is sales of an individual product, the series group-by could be the product category with values like “men’s clothing”, “sports equipment”, etc. Can only be used in a multiseries project with use_cross_series_features set to True.
- calendar_id : str, optional
(New in version v2.15) Only available for time series projects. The id of the CalendarFile to use with this project.
- calendar_name : str, optional
(New in version v2.17) Only available for time series projects. The name of the CalendarFile used with this project.
-
classmethod
generate
(project_id, spec, max_wait=600)¶ Preview the full partitioning determined by a DatetimePartitioningSpecification
Based on the project dataset and the partitioning specification, inspect the full partitioning that would be used if the same specification were passed into Project.set_target.
Parameters: - project_id : str
the id of the project
- spec : DatetimePartitioningSpecification
the desired partitioning
- max_wait : int, optional
For some settings (e.g. generating a partitioning preview for a multiseries project for the first time), an asynchronous task must be run to analyze the dataset. max_wait governs the maximum time (in seconds) to wait before giving up. In all non-multiseries projects, this is unused.
Returns: - DatetimePartitioning :
the full generated partitioning
-
classmethod
get
(project_id)¶ Retrieve the DatetimePartitioning from a project
Only available if the project has already set the target as a datetime project.
Parameters: - project_id : str
the id of the project to retrieve partitioning for
Returns: - DatetimePartitioning : the full partitioning for the project
-
classmethod
feature_log_list
(project_id, offset=None, limit=None)¶ Retrieve the feature derivation log content and log length for a time series project.
The Time Series Feature Log provides details about the feature generation process for a time series project. It includes information about which features are generated and their priority, as well as the detected properties of the time series data such as whether the series is stationary, and periodicities detected.
This route is only supported for time series projects that have finished partitioning.
The feature derivation log will include information about:
- Detected stationarity of the series: e.g. ‘Series detected as non-stationary’
- Detected presence of multiplicative trend in the series: e.g. ‘Multiplicative trend detected’
- Detected periodicities in the series: e.g. ‘Detected periodicities: 7 day’
- Maximum number of features to be generated: e.g. ‘Maximum number of feature to be generated is 1440’
- Window sizes used in rolling statistics / lag extractors: e.g. ‘The window sizes chosen to be: 2 months (because the time step is 1 month and Feature Derivation Window is 2 months)’
- Features that are specified as known-in-advance: e.g. ‘Variables treated as apriori: holiday’
- Details about why certain variables are transformed in the input data: e.g. ‘Generating variable “y (log)” from “y” because multiplicative trend is detected’
- Details about features generated as time series features, and their priority: e.g. ‘Generating feature “date (actual)” from “date” (priority: 1)’
Parameters: - project_id : str
project id to retrieve a feature derivation log for.
- offset : int
optional, defaults to 0; this many results will be skipped.
- limit : int
optional, defaults to 100; at most this many results are returned. To specify no limit, use 0. The default may change without notice.
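The offset and limit semantics described above can be illustrated on a local list. This is a minimal sketch, not the client's implementation: `page_log_lines` is a hypothetical helper, and the real paging happens on the server.

```python
def page_log_lines(lines, offset=0, limit=100):
    """Mimic the documented offset/limit behaviour on an in-memory list.

    Hypothetical helper for illustration only.
    """
    if offset < 0 or limit < 0:
        raise ValueError("offset and limit must be non-negative")
    if limit == 0:
        # Per the parameter description, 0 means "no limit".
        return lines[offset:]
    return lines[offset:offset + limit]


log = ["line {}".format(i) for i in range(250)]
first_page = page_log_lines(log)                 # default limit of 100
rest = page_log_lines(log, offset=100, limit=0)  # skip 100, then no limit
```

Because the default limit may change without notice, passing an explicit limit is the safer choice when a fixed page size matters.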
-
classmethod
feature_log_retrieve
(project_id)¶ Retrieve the feature derivation log content and log length for a time series project.
The Time Series Feature Log provides details about the feature generation process for a time series project. It includes information about which features are generated and their priority, as well as the detected properties of the time series data such as whether the series is stationary, and periodicities detected.
This route is only supported for time series projects that have finished partitioning.
The feature derivation log will include information about:
- Detected stationarity of the series: e.g. ‘Series detected as non-stationary’
- Detected presence of multiplicative trend in the series: e.g. ‘Multiplicative trend detected’
- Detected periodicities in the series: e.g. ‘Detected periodicities: 7 day’
- Maximum number of features to be generated: e.g. ‘Maximum number of feature to be generated is 1440’
- Window sizes used in rolling statistics / lag extractors: e.g. ‘The window sizes chosen to be: 2 months (because the time step is 1 month and Feature Derivation Window is 2 months)’
- Features that are specified as known-in-advance: e.g. ‘Variables treated as apriori: holiday’
- Details about why certain variables are transformed in the input data: e.g. ‘Generating variable “y (log)” from “y” because multiplicative trend is detected’
- Details about features generated as time series features, and their priority: e.g. ‘Generating feature “date (actual)” from “date” (priority: 1)’
Parameters: - project_id : str
project id to retrieve a feature derivation log for.
-
to_specification
(use_holdout_start_end_format=False, use_backtest_start_end_format=False)¶ Render the DatetimePartitioning as a DatetimePartitioningSpecification.
The resulting specification can be used when setting the target, and contains only the attributes directly controllable by users.
Parameters: - use_holdout_start_end_format : bool, optional
Defaults to False. If True, will use holdout_end_date when configuring the holdout partition. If False, will use holdout_duration instead.
- use_backtest_start_end_format : bool, optional
Defaults to False. If False, will use a duration-based approach for specifying backtests (gap_duration, validation_start_date, and validation_duration). If True, will use a start/end date approach for specifying backtests (primary_training_start_date, primary_training_end_date, validation_start_date, validation_end_date).
Returns: - DatetimePartitioningSpecification
the specification for this partitioning
-
to_dataframe
()¶ Render the partitioning settings as a dataframe for convenience of display
Excludes project_id, datetime_partition_column, date_format, autopilot_data_selection_method, validation_duration, and number_of_backtests, as well as the row count information, if present.
Also excludes the time series specific parameters for use_time_series, default_to_known_in_advance, default_to_do_not_derive, and defining the feature derivation and forecast windows.
-
class
datarobot.helpers.partitioning_methods.
Backtest
(index=None, available_training_start_date=None, available_training_duration=None, available_training_row_count=None, available_training_end_date=None, primary_training_start_date=None, primary_training_duration=None, primary_training_row_count=None, primary_training_end_date=None, gap_start_date=None, gap_duration=None, gap_row_count=None, gap_end_date=None, validation_start_date=None, validation_duration=None, validation_row_count=None, validation_end_date=None, total_row_count=None)¶ A backtest used to evaluate models trained in a datetime partitioned project
When setting up a datetime partitioning project, backtests are specified by a BacktestSpecification.
The available training data corresponds to all the data available for training, while the primary training data corresponds to the data that can be used to train while ensuring that all backtests are available. If a model is trained with more data than is available in the primary training data, then all backtests may not have scores available.
Attributes: - index : int
the index of the backtest
- available_training_start_date : datetime.datetime
the start date of the available training data for this backtest
- available_training_duration : str
the duration of available training data for this backtest
- available_training_row_count : int or None
the number of rows of available training data for this backtest. Only available when retrieving from a project where the target is set.
- available_training_end_date : datetime.datetime
the end date of the available training data for this backtest
- primary_training_start_date : datetime.datetime
the start date of the primary training data for this backtest
- primary_training_duration : str
the duration of the primary training data for this backtest
- primary_training_row_count : int or None
the number of rows of primary training data for this backtest. Only available when retrieving from a project where the target is set.
- primary_training_end_date : datetime.datetime
the end date of the primary training data for this backtest
- gap_start_date : datetime.datetime
the start date of the gap between training and validation scoring data for this backtest
- gap_duration : str
the duration of the gap between training and validation scoring data for this backtest
- gap_row_count : int or None
the number of rows in the gap between training and validation scoring data for this backtest. Only available when retrieving from a project where the target is set.
- gap_end_date : datetime.datetime
the end date of the gap between training and validation scoring data for this backtest
- validation_start_date : datetime.datetime
the start date of the validation scoring data for this backtest
- validation_duration : str
the duration of the validation scoring data for this backtest
- validation_row_count : int or None
the number of rows of validation scoring data for this backtest. Only available when retrieving from a project where the target is set.
- validation_end_date : datetime.datetime
the end date of the validation scoring data for this backtest
- total_row_count : int or None
the number of rows in this backtest. Only available when retrieving from a project where the target is set.
-
to_specification
(use_start_end_format=False)¶ Render this backtest as a BacktestSpecification.
The resulting specification includes only the attributes users can directly control, not those indirectly determined by the project dataset.
Parameters: - use_start_end_format : bool
Defaults to False. If False, will use a duration-based approach for specifying backtests (gap_duration, validation_start_date, and validation_duration). If True, will use a start/end date approach for specifying backtests (primary_training_start_date, primary_training_end_date, validation_start_date, validation_end_date).
Returns: - BacktestSpecification
the specification for this backtest
-
to_dataframe
()¶ Render this backtest as a dataframe for convenience of display
Returns: - backtest_partitioning : pandas.DataFrame
the backtest attributes, formatted into a dataframe
-
datarobot.helpers.partitioning_methods.
construct_duration_string
(years=0, months=0, days=0, hours=0, minutes=0, seconds=0)¶ Construct a valid string representing a duration in accordance with ISO8601
A duration of six months, 3 days, and 12 hours could be represented as P6M3DT12H.
Parameters: - years : int
the number of years in the duration
- months : int
the number of months in the duration
- days : int
the number of days in the duration
- hours : int
the number of hours in the duration
- minutes : int
the number of minutes in the duration
- seconds : int
the number of seconds in the duration
Returns: - duration_string: str
The duration string, specified compatibly with ISO8601
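The ISO 8601 layout behind the P6M3DT12H example can be sketched in a few lines. This is an illustrative stand-in, not the library's implementation: `build_iso8601_duration` is a hypothetical name, and omitting zero-valued components is an assumption (the library function may render them explicitly).

```python
def build_iso8601_duration(years=0, months=0, days=0,
                           hours=0, minutes=0, seconds=0):
    """Hypothetical stand-in that shows the ISO 8601 duration layout."""
    components = (years, months, days, hours, minutes, seconds)
    if any(value < 0 for value in components):
        raise ValueError("duration components must be non-negative")

    # Date designators come first, directly after the leading "P".
    date_part = "".join(
        "{}{}".format(value, unit)
        for value, unit in ((years, "Y"), (months, "M"), (days, "D"))
        if value
    )
    # Time designators follow a "T" separator, which is what lets
    # "M" mean months before the T and minutes after it.
    time_part = "".join(
        "{}{}".format(value, unit)
        for value, unit in ((hours, "H"), (minutes, "M"), (seconds, "S"))
        if value
    )
    if not date_part and not time_part:
        return "PT0S"  # a zero-length duration still needs one component
    return "P" + date_part + ("T" + time_part if time_part else "")


example = build_iso8601_duration(months=6, days=3, hours=12)
```

Here `example` is the string P6M3DT12H, matching the duration of six months, 3 days, and 12 hours described above.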
PredictJob¶
-
datarobot.models.predict_job.
wait_for_async_predictions
(project_id, predict_job_id, max_wait=600)¶ Given a Project id and a PredictJob id, poll the status of the process responsible for generating predictions until it has finished
Parameters: - project_id : str
The identifier of the project
- predict_job_id : str
The identifier of the PredictJob
- max_wait : int, optional
Time in seconds after which predictions creation is considered unsuccessful
Returns: - predictions : pandas.DataFrame
Generated predictions.
Raises: - AsyncPredictionsGenerationError
Raised if the status of the fetched PredictJob object is error
- AsyncTimeoutError
Raised if predictions weren’t generated within the time specified by the max_wait parameter
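The poll-until-finished behaviour with a max_wait budget can be sketched without a server. Everything here is hypothetical: `poll_until_done`, `PollTimeoutError`, and the status strings are placeholders for illustration and are not the client's actual names or values.

```python
import time


class PollTimeoutError(Exception):
    """Hypothetical stand-in for a timeout error such as AsyncTimeoutError."""


def poll_until_done(get_status, max_wait=600, interval=1.0,
                    sleep=time.sleep, clock=time.monotonic):
    """Call get_status() repeatedly until it reports completion.

    Raises RuntimeError on an error status and PollTimeoutError once
    max_wait seconds have elapsed without completion.
    """
    deadline = clock() + max_wait
    while True:
        status = get_status()
        if status == "error":
            raise RuntimeError("job reported an error status")
        if status == "completed":
            return status
        if clock() >= deadline:
            raise PollTimeoutError(
                "job did not finish within {} seconds".format(max_wait)
            )
        sleep(interval)


# Simulated job that completes on the third status check.
statuses = iter(["queue", "inprogress", "completed"])
result = poll_until_done(lambda: next(statuses), max_wait=5,
                         interval=0, sleep=lambda _: None)
```

Injecting `sleep` and `clock` keeps the loop testable without real delays.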
-
class
datarobot.models.
PredictJob
(data, completed_resource_url=None)¶ Tracks asynchronous work being done within a project
Attributes: - id : int
the id of the job
- project_id : str
the id of the project the job belongs to
- status : str
the status of the job - will be one of datarobot.enums.QUEUE_STATUS
- job_type : str
what kind of work the job is doing - will be ‘predict’ for predict jobs
- is_blocked : bool
if true, the job is blocked (cannot be executed) until its dependencies are resolved
- message : str
a message about the state of the job, typically explaining why an error occurred
-
classmethod
from_job
(job)¶ Transforms a generic Job into a PredictJob
Parameters: - job: Job
A generic job representing a PredictJob
Returns: - predict_job: PredictJob
A fully populated PredictJob with all the details of the job
Raises: - ValueError:
If the generic Job was not a predict job, e.g. job_type != JOB_TYPE.PREDICT
-
classmethod
create
(model, sourcedata)¶ Note
Deprecated in v2.3 in favor of Project.upload_dataset and Model.request_predictions. That workflow allows you to reuse the same dataset for predictions from multiple models within one project.
Starts predictions generation for provided data using previously created model.
Parameters: - model : Model
Model to use for predictions generation
- sourcedata : str, file or pandas.DataFrame
Data to be used for predictions. If this parameter is a str, it can be either a path to a local file or raw file content. If using a file on disk, the filename must consist of ASCII characters only. The file must be a CSV, and cannot be compressed
Returns: - predict_job_id : str
id of created job, can be used as parameter to PredictJob.get or PredictJob.get_predictions methods or wait_for_async_predictions function
Raises: - InputNotUnderstoodError
If the parameter for sourcedata didn’t resolve into known data types
Examples
model = Model.get('p-id', 'l-id')
predict_job = PredictJob.create(model, './data_to_predict.csv')
-
classmethod
get
(project_id, predict_job_id)¶ Fetches one PredictJob. If the job has already finished, raises a PendingJobFinished exception.
Parameters: - project_id : str
The identifier of the project containing the model on which the prediction was started
- predict_job_id : str
The identifier of the predict_job
Returns: - predict_job : PredictJob
The pending PredictJob
Raises: - PendingJobFinished
If the job being queried already finished, and the server is re-routing to the finished predictions.
- AsyncFailureError
Querying this resource gave a status code other than 200 or 303
-
classmethod
get_predictions
(project_id, predict_job_id, class_prefix='class_')¶ Fetches finished predictions from the job used to generate them.
Note
The prediction API for classifications now returns an additional prediction_values dictionary that is converted into a series of class_prefixed columns in the final dataframe. For example, <label> = 1.0 is converted to ‘class_1.0’. If you are on an older version of the client (prior to v2.8), you must update to v2.8 to correctly pivot this data.
Parameters: - project_id : str
The identifier of the project containing the model used to generate the predictions
- predict_job_id : str
The identifier of the predict_job
- class_prefix : str
The prefix to prepend to labels in the final dataframe (e.g., apple -> class_apple)
Returns: - predictions : pandas.DataFrame
Generated predictions
Raises: - JobNotFinished
If the job has not finished yet
- AsyncFailureError
Querying the predict_job in question gave a status code other than 200 or 303
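The conversion of prediction values into class-prefixed columns, as described in the note above, can be sketched as follows. This is an assumption-laden illustration: `pivot_prediction_values` is a hypothetical helper, and the label/value dictionary shape is assumed rather than taken from the client's code.

```python
def pivot_prediction_values(prediction_values, class_prefix="class_"):
    """Turn a list of {label, value} dicts into prefixed column names.

    Hypothetical helper; the input shape is an assumption made for
    illustration.
    """
    return {
        "{}{}".format(class_prefix, item["label"]): item["value"]
        for item in prediction_values
    }


row_values = [
    {"label": 1.0, "value": 0.8},
    {"label": 0.0, "value": 0.2},
]
columns = pivot_prediction_values(row_values)
fruit = pivot_prediction_values([{"label": "apple", "value": 0.6}])
```

With the default prefix, the label 1.0 becomes the column class_1.0 and the label apple becomes class_apple.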
-
cancel
()¶ Cancel this job. If this job has not finished running, it will be removed and canceled.
-
get_result
(params=None)¶ Parameters: - params : dict, optional
Query parameters to be added to request to get results. For featureEffects and featureFit, source param is required to define source, otherwise the default is `training`
Returns: - result : object
Return type depends on the job type:
- for model jobs, a Model is returned
- for predict jobs, a pandas.DataFrame (with predictions) is returned
- for featureImpact jobs, a list of dicts (see Model.get_feature_impact for more detail)
- for primeRulesets jobs, a list of Rulesets
- for primeModel jobs, a PrimeModel
- for primeDownloadValidation jobs, a PrimeFile
- for reasonCodesInitialization jobs, a ReasonCodesInitialization
- for reasonCodes jobs, a ReasonCodes
- for predictionExplanationInitialization jobs, a PredictionExplanationsInitialization
- for predictionExplanations jobs, a PredictionExplanations
- for featureEffects, a FeatureEffects
- for featureFit, a FeatureFit
Raises: - JobNotFinished
If the job is not finished, the result is not available.
- AsyncProcessUnsuccessfulError
If the job errored or was aborted
-
get_result_when_complete
(max_wait=600, params=None)¶ Parameters: - max_wait : int, optional
How long to wait for the job to finish.
- params : dict, optional
Query parameters to be added to request.
Returns: - result: object
Return type is the same as would be returned by Job.get_result.
Raises: - AsyncTimeoutError
If the job does not finish in time
- AsyncProcessUnsuccessfulError
If the job errored or was aborted
-
refresh
()¶ Update this object with the latest job data from the server.
-
wait_for_completion
(max_wait=600)¶ Waits for job to complete.
Parameters: - max_wait : int, optional
How long to wait for the job to finish.
Prediction Dataset¶
-
class
datarobot.models.
PredictionDataset
(project_id, id, name, created, num_rows, num_columns, forecast_point=None, predictions_start_date=None, predictions_end_date=None, relax_known_in_advance_features_check=None, data_quality_warnings=None, forecast_point_range=None, data_start_date=None, data_end_date=None, max_forecast_date=None)¶ A dataset uploaded to make predictions
Typically created via project.upload_dataset
Attributes: - id : str
the id of the dataset
- project_id : str
the id of the project the dataset belongs to
- created : str
the time the dataset was created
- name : str
the name of the dataset
- num_rows : int
the number of rows in the dataset
- num_columns : int
the number of columns in the dataset
- forecast_point : datetime.datetime or None
For time series projects only. This is the default point relative to which predictions will be generated, based on the forecast window of the project. See the time series predictions documentation for more information.
- predictions_start_date : datetime.datetime or None, optional
For time series projects only. The start date for bulk predictions. Note that this parameter is for generating historical predictions using the training data. This parameter should be provided in conjunction with predictions_end_date. Can’t be provided with the forecast_point parameter.
- predictions_end_date : datetime.datetime or None, optional
For time series projects only. The end date for bulk predictions, exclusive. Note that this parameter is for generating historical predictions using the training data. This parameter should be provided in conjunction with predictions_start_date. Can’t be provided with the forecast_point parameter.
- relax_known_in_advance_features_check : bool, optional
(New in version v2.15) For time series projects only. If True, missing values in the known in advance features are allowed in the forecast window at the prediction time. If omitted or False, missing values are not allowed.
- data_quality_warnings : dict, optional
(New in version v2.15) A dictionary that contains available warnings about potential problems in this prediction dataset. Empty if no warnings.
- forecast_point_range : list[datetime.datetime] or None, optional
(New in version v2.20) For time series projects only. Specifies the range of dates available for use as a forecast point.
- data_start_date : datetime.datetime or None, optional
(New in version v2.20) For time series projects only. The minimum primary date of this prediction dataset.
- data_end_date : datetime.datetime or None, optional
(New in version v2.20) For time series projects only. The maximum primary date of this prediction dataset.
- max_forecast_date : datetime.datetime or None, optional
(New in version v2.20) For time series projects only. The maximum forecast date of this prediction dataset.
-
classmethod
get
(project_id, dataset_id)¶ Retrieve information about a dataset uploaded for predictions
Parameters: - project_id:
the id of the project to query
- dataset_id:
the id of the dataset to retrieve
Returns: - dataset: PredictionDataset
A dataset uploaded to make predictions
-
delete
()¶ Delete a dataset uploaded for predictions
Will also delete predictions made using this dataset and cancel any predict jobs using this dataset.
Prediction Explanations¶
-
class
datarobot.
PredictionExplanationsInitialization
(project_id, model_id, prediction_explanations_sample=None)¶ Represents a prediction explanations initialization of a model.
Attributes: - project_id : str
id of the project the model belongs to
- model_id : str
id of the model the prediction explanations initialization is for
- prediction_explanations_sample : list of dict
a small sample of prediction explanations that could be generated for the model
-
classmethod
get
(project_id, model_id)¶ Retrieve the prediction explanations initialization for a model.
Prediction explanations initializations are a prerequisite for computing prediction explanations, and include a sample of what the computed prediction explanations for a prediction dataset would look like.
Parameters: - project_id : str
id of the project the model belongs to
- model_id : str
id of the model the prediction explanations initialization is for
Returns: - prediction_explanations_initialization : PredictionExplanationsInitialization
The queried instance.
Raises: - ClientError (404)
If the project or model does not exist or the initialization has not been computed.
-
classmethod
create
(project_id, model_id)¶ Create a prediction explanations initialization for the specified model.
Parameters: - project_id : str
id of the project the model belongs to
- model_id : str
id of the model for which initialization is requested
Returns: - job : Job
an instance of created async job
-
delete
()¶ Delete this prediction explanations initialization.
-
class
datarobot.
PredictionExplanations
(id, project_id, model_id, dataset_id, max_explanations, num_columns, finish_time, prediction_explanations_location, threshold_low=None, threshold_high=None)¶ Represents prediction explanations metadata and provides access to computation results.
Examples
prediction_explanations = dr.PredictionExplanations.get(project_id, explanations_id)
for row in prediction_explanations.get_rows():
    print(row)  # row is an instance of PredictionExplanationsRow
Attributes: - id : str
id of the record and prediction explanations computation result
- project_id : str
id of the project the model belongs to
- model_id : str
id of the model the prediction explanations are for
- dataset_id : str
id of the prediction dataset prediction explanations were computed for
- max_explanations : int
maximum number of prediction explanations to supply per row of the dataset
- threshold_low : float
the lower threshold, below which a prediction must score in order for prediction explanations to be computed for a row in the dataset
- threshold_high : float
the high threshold, above which a prediction must score in order for prediction explanations to be computed for a row in the dataset
- num_columns : int
the number of columns prediction explanations were computed for
- finish_time : float
timestamp referencing when computation for these prediction explanations finished
- prediction_explanations_location : str
where to retrieve the prediction explanations
-
classmethod
get
(project_id, prediction_explanations_id)¶ Retrieve a specific set of prediction explanations.
Parameters: - project_id : str
id of the project the explanations belong to
- prediction_explanations_id : str
id of the prediction explanations
Returns: - prediction_explanations : PredictionExplanations
The queried instance.
-
classmethod
create
(project_id, model_id, dataset_id, max_explanations=None, threshold_low=None, threshold_high=None)¶ Create prediction explanations for the specified dataset.
In order to create PredictionExplanations for a particular model and dataset, you must first:
- Compute feature impact for the model via datarobot.Model.get_feature_impact()
- Compute a PredictionExplanationsInitialization for the model via datarobot.PredictionExplanationsInitialization.create(project_id, model_id)
- Compute predictions for the model and dataset via datarobot.Model.request_predictions(dataset_id)
threshold_high and threshold_low are optional filters applied to speed up computation. When at least one is specified, only the selected outlier rows will have prediction explanations computed. Rows are considered to be outliers if their predicted value (in case of regression projects) or probability of being the positive class (in case of classification projects) is less than threshold_low or greater than threshold_high. If neither is specified, prediction explanations will be computed for all rows.
Parameters: - project_id : str
id of the project the model belongs to
- model_id : str
id of the model for which prediction explanations are requested
- dataset_id : str
id of the prediction dataset for which prediction explanations are requested
- threshold_low : float, optional
the lower threshold, below which a prediction must score in order for prediction explanations to be computed for a row in the dataset. If neither threshold_high nor threshold_low is specified, prediction explanations will be computed for all rows.
- threshold_high : float, optional
the high threshold, above which a prediction must score in order for prediction explanations to be computed. If neither threshold_high nor threshold_low is specified, prediction explanations will be computed for all rows.
- max_explanations : int, optional
the maximum number of prediction explanations to supply per row of the dataset, default: 3.
Returns: - job: Job
an instance of created async job
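The threshold rule described for this method can be expressed as a small local filter. This is a sketch of the documented selection logic only: `select_rows_for_explanations` is a hypothetical helper, and the actual filtering is performed by the server.

```python
def select_rows_for_explanations(predictions, threshold_low=None,
                                 threshold_high=None):
    """Return indices of rows that would receive explanations.

    Hypothetical helper mirroring the documented rule: with no
    thresholds every row qualifies; otherwise only rows scoring below
    threshold_low or above threshold_high do.
    """
    if threshold_low is None and threshold_high is None:
        return list(range(len(predictions)))
    selected = []
    for index, score in enumerate(predictions):
        below = threshold_low is not None and score < threshold_low
        above = threshold_high is not None and score > threshold_high
        if below or above:
            selected.append(index)
    return selected


scores = [0.05, 0.4, 0.6, 0.97]
outliers = select_rows_for_explanations(scores, threshold_low=0.1,
                                        threshold_high=0.9)
everything = select_rows_for_explanations(scores)
```

With both thresholds set, only the first and last rows qualify; with neither set, all four rows do.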
-
classmethod
list
(project_id, model_id=None, limit=None, offset=None)¶ List of prediction explanations for a specified project.
Parameters: - project_id : str
id of the project to list prediction explanations for
- model_id : str, optional
if specified, only prediction explanations computed for this model will be returned
- limit : int or None
at most this many results are returned, default: no limit
- offset : int or None
this many results will be skipped, default: 0
Returns: - prediction_explanations : list[PredictionExplanations]
-
get_rows
(batch_size=None, exclude_adjusted_predictions=True)¶ Retrieve prediction explanations rows.
Parameters: - batch_size : int or None, optional
maximum number of prediction explanations rows to retrieve per request
- exclude_adjusted_predictions : bool
Optional, defaults to True. Set to False to include adjusted predictions, which will differ from the predictions on some projects, e.g. those with an exposure column specified.
Yields: - prediction_explanations_row : PredictionExplanationsRow
Represents prediction explanations computed for a prediction row.
-
get_all_as_dataframe
(exclude_adjusted_predictions=True)¶ Retrieve all prediction explanations rows and return them as a pandas.DataFrame.
Returned dataframe has the following structure:
- row_id : row id from prediction dataset
- prediction : the output of the model for this row
- adjusted_prediction : adjusted prediction values (only appears for projects that utilize prediction adjustments, e.g. projects with an exposure column)
- class_0_label : a class level from the target (only appears for classification projects)
- class_0_probability : the probability that the target is this class (only appears for classification projects)
- class_1_label : a class level from the target (only appears for classification projects)
- class_1_probability : the probability that the target is this class (only appears for classification projects)
- explanation_0_feature : the name of the feature contributing to the prediction for this explanation
- explanation_0_feature_value : the value the feature took on
- explanation_0_label : the output being driven by this explanation. For regression projects, this is the name of the target feature. For classification projects, this is the class label whose probability increasing would correspond to a positive strength.
- explanation_0_qualitative_strength : a human-readable description of how strongly the feature affected the prediction (e.g. ‘+++’, ‘--’, ‘+’) for this explanation
- explanation_0_strength : the amount this feature’s value affected the prediction
- …
- explanation_N_feature : the name of the feature contributing to the prediction for this explanation
- explanation_N_feature_value : the value the feature took on
- explanation_N_label : the output being driven by this explanation. For regression projects, this is the name of the target feature. For classification projects, this is the class label whose probability increasing would correspond to a positive strength.
- explanation_N_qualitative_strength : a human-readable description of how strongly the feature affected the prediction (e.g. ‘+++’, ‘--’, ‘+’) for this explanation
- explanation_N_strength : the amount this feature’s value affected the prediction
For classification projects, the server does not guarantee any ordering on the prediction values, however within this function we sort the values so that class_X corresponds to the same class from row to row.
Parameters: - exclude_adjusted_predictions : bool
Optional, defaults to True. Set this to False to include adjusted prediction values in the returned dataframe.
Returns: - dataframe: pandas.DataFrame
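The column layout listed above can be illustrated by flattening a single row into a dictionary. This is a rough sketch under stated assumptions: `flatten_explanations_row` is a hypothetical helper, the input is a plain dict shaped like a classification row, and sorting by label is only one way to keep class_X stable from row to row.

```python
EXPLANATION_KEYS = ("feature", "feature_value", "label",
                    "qualitative_strength", "strength")


def flatten_explanations_row(row):
    """Flatten one classification row into the documented column names.

    Hypothetical helper; the dict-based input is an assumption.
    """
    flat = {"row_id": row["row_id"], "prediction": row["prediction"]}
    # Sort by label so class_0, class_1, ... refer to the same classes
    # in every row, regardless of the order the values arrived in.
    values = sorted(row["prediction_values"], key=lambda v: str(v["label"]))
    for i, value in enumerate(values):
        flat["class_{}_label".format(i)] = value["label"]
        flat["class_{}_probability".format(i)] = value["value"]
    for i, explanation in enumerate(row["prediction_explanations"]):
        for key in EXPLANATION_KEYS:
            flat["explanation_{}_{}".format(i, key)] = explanation[key]
    return flat


sample_row = {
    "row_id": 7,
    "prediction": "yes",
    "prediction_values": [
        {"label": "yes", "value": 0.7},
        {"label": "no", "value": 0.3},
    ],
    "prediction_explanations": [
        {"label": "yes", "feature": "age", "feature_value": 42,
         "strength": 0.5, "qualitative_strength": "++"},
    ],
}
flat_row = flatten_explanations_row(sample_row)
```

A list of such dictionaries could then be passed to pandas.DataFrame to obtain one row per prediction.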
-
download_to_csv
(filename, encoding='utf-8', exclude_adjusted_predictions=True)¶ Save prediction explanations rows into CSV file.
Parameters: - filename : str or file object
path or file object to save prediction explanations rows
- encoding : string, optional
A string representing the encoding to use in the output file, defaults to ‘utf-8’
- exclude_adjusted_predictions : bool
Optional, defaults to True. Set to False to include adjusted predictions, which will differ from the predictions on some projects, e.g. those with an exposure column specified.
-
get_prediction_explanations_page
(limit=None, offset=None, exclude_adjusted_predictions=True)¶ Get prediction explanations.
If you don’t want to use a generator interface, you can access paginated prediction explanations directly.
Parameters: - limit : int or None
the number of records to return, the server will use a (possibly finite) default if not specified
- offset : int or None
the number of records to skip, default 0
- exclude_adjusted_predictions : bool
Optional, defaults to True. Set to False to include adjusted predictions, which will differ from the predictions on some projects, e.g. those with an exposure column specified.
Returns: - prediction_explanations : PredictionExplanationsPage
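Walking through pages manually with limit and offset might look like the loop below. This is a sketch only: `iterate_all_rows` and `fake_fetch_page` are hypothetical, and pages are modeled as plain dicts with "data" and "next" keys rather than PredictionExplanationsPage objects.

```python
def iterate_all_rows(fetch_page, page_size=2):
    """Yield every row by requesting successive pages.

    fetch_page is any callable accepting limit and offset keyword
    arguments and returning a dict with "data" and "next" keys
    (an assumed shape for this illustration).
    """
    offset = 0
    while True:
        page = fetch_page(limit=page_size, offset=offset)
        for row in page["data"]:
            yield row
        # Stop when there is no further page or the page came back empty.
        if page["next"] is None or not page["data"]:
            break
        offset += len(page["data"])


ROWS = [{"row_id": i} for i in range(5)]


def fake_fetch_page(limit, offset):
    """In-memory stand-in for a paginated request."""
    chunk = ROWS[offset:offset + limit]
    has_more = offset + limit < len(ROWS)
    return {"data": chunk, "next": "more" if has_more else None}


all_rows = list(iterate_all_rows(fake_fetch_page))
```

Advancing the offset by the number of rows actually received, rather than by the requested limit, keeps the loop correct if a page comes back shorter than requested.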
-
delete
()¶ Delete these prediction explanations.
-
class
datarobot.models.prediction_explanations.
PredictionExplanationsRow
(row_id, prediction, prediction_values, prediction_explanations=None, adjusted_prediction=None, adjusted_prediction_values=None)¶ Represents prediction explanations computed for a prediction row.
Notes
PredictionValue contains:
- label : describes what this model output corresponds to. For regression projects, it is the name of the target feature. For classification projects, it is a level from the target feature.
- value : the output of the prediction. For regression projects, it is the predicted value of the target. For classification projects, it is the predicted probability the row belongs to the class identified by the label.
PredictionExplanation contains:
- label : describes what output was driven by this explanation. For regression projects, it is the name of the target feature. For classification projects, it is the class whose probability increasing would correspond to a positive strength of this prediction explanation.
- feature : the name of the feature contributing to the prediction
- feature_value : the value the feature took on for this row
- strength : the amount this feature’s value affected the prediction
- qualitative_strength : a human-readable description of how strongly the feature affected the prediction (e.g. ‘+++’, ‘--’, ‘+’)
Attributes: - row_id : int
which row this PredictionExplanationsRow describes
- prediction : float
the output of the model for this row
- adjusted_prediction : float or None
adjusted prediction value for projects that provide this information, None otherwise
- prediction_values : list
an array of dictionaries with a schema described as PredictionValue
- adjusted_prediction_values : list
same as prediction_values but for adjusted predictions
- prediction_explanations : list
an array of dictionaries with a schema described as PredictionExplanation
-
class
datarobot.models.prediction_explanations.
PredictionExplanationsPage
(id, count=None, previous=None, next=None, data=None, prediction_explanations_record_location=None, adjustment_method=None)¶ Represents a batch of prediction explanations received by one request.
Attributes: - id : str
id of the prediction explanations computation result
- data : list[dict]
list of raw prediction explanations; each row corresponds to a row of the prediction dataset
- count : int
total number of rows computed
- previous_page : str
where to retrieve previous page of prediction explanations, None if current page is the first
- next_page : str
where to retrieve next page of prediction explanations, None if current page is the last
- prediction_explanations_record_location : str
where to retrieve the prediction explanations metadata
- adjustment_method : str
Adjustment method that was applied to predictions, or ‘N/A’ if no adjustments were done.
-
classmethod
get
(project_id, prediction_explanations_id, limit=None, offset=0, exclude_adjusted_predictions=True)¶ Retrieve prediction explanations.
Parameters: - project_id : str
id of the project the model belongs to
- prediction_explanations_id : str
id of the prediction explanations
- limit : int or None
the number of records to return; the server will use a (possibly finite) default if not specified
- offset : int or None
the number of records to skip, default 0
- exclude_adjusted_predictions : bool
Optional, defaults to True. Set to False to include adjusted predictions, which will differ from the predictions on some projects, e.g. those with an exposure column specified.
Returns: - prediction_explanations : PredictionExplanationsPage
The queried instance.
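The limit and offset parameters allow the result to be read page by page. The sketch below shows one possible loop. It takes the page-fetching function as an argument (`fetch_page` is a hypothetical name) and uses a stand-in so that it runs without a DataRobot connection. It relies only on the data and next_page attributes listed above.

```python
from types import SimpleNamespace


def iter_rows(fetch_page, page_size=100):
    """Yield every row, requesting one page at a time."""
    offset = 0
    while True:
        page = fetch_page(limit=page_size, offset=offset)
        rows = page.data or []
        for row in rows:
            yield row
        # next_page is None when the current page is the last one.
        if page.next_page is None or not rows:
            break
        offset += len(rows)


# Stand-in for PredictionExplanationsPage.get, so the sketch runs offline.
def fake_fetch(limit, offset, _rows=tuple(range(5))):
    chunk = list(_rows[offset:offset + limit])
    has_more = offset + limit < len(_rows)
    return SimpleNamespace(data=chunk, next_page="next-url" if has_more else None)


all_rows = list(iter_rows(fake_fetch, page_size=2))
print(all_rows)
```

With the real client, `fetch_page` could be built with `functools.partial(PredictionExplanationsPage.get, project_id, prediction_explanations_id)`; this wiring is an untested assumption based on the signature above.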
Predictions¶
-
class
datarobot.models.
Predictions
(project_id, prediction_id, model_id=None, dataset_id=None, includes_prediction_intervals=None, prediction_intervals_size=None, forecast_point=None, predictions_start_date=None, predictions_end_date=None)¶ Represents predictions metadata and provides access to prediction results.
Examples
List all predictions for a project
import datarobot as dr

# Fetch all predictions for a project
all_predictions = dr.Predictions.list(project_id)

# Inspect all calculated predictions
for predictions in all_predictions:
    print(predictions)  # repr includes project_id, model_id, and dataset_id
Retrieve predictions by id
import datarobot as dr

# Getting predictions by id
predictions = dr.Predictions.get(project_id, prediction_id)

# Dump actual predictions
df = predictions.get_all_as_dataframe()
print(df)
Attributes: - project_id : str
id of the project the model belongs to
- model_id : str
id of the model
- prediction_id : str
id of generated predictions
- includes_prediction_intervals : bool, optional
(New in v2.16) For time series projects only. Indicates if prediction intervals will be part of the response. Defaults to False.
- prediction_intervals_size : int, optional
(New in v2.16) For time series projects only. Indicates the percentile used for prediction intervals calculation. Will be present only if includes_prediction_intervals is True.
- forecast_point : datetime.datetime, optional
(New in v2.20) For time series projects only. This is the default point relative to which predictions will be generated, based on the forecast window of the project. See the time series prediction documentation for more information.
- predictions_start_date : datetime.datetime or None, optional
(New in v2.20) For time series projects only. The start date for bulk predictions. Note that this parameter is for generating historical predictions using the training data. This parameter should be provided in conjunction with predictions_end_date. Can’t be provided with the forecast_point parameter.
- predictions_end_date : datetime.datetime or None, optional
(New in v2.20) For time series projects only. The end date for bulk predictions, exclusive. Note that this parameter is for generating historical predictions using the training data. This parameter should be provided in conjunction with predictions_start_date. Can’t be provided with the forecast_point parameter.
-
classmethod
list
(project_id, model_id=None, dataset_id=None)¶ Fetch all the computed predictions metadata for a project.
Parameters: - project_id : str
id of the project
- model_id : str, optional
if specified, only predictions metadata for this model will be retrieved
- dataset_id : str, optional
if specified, only predictions metadata for this dataset will be retrieved
Returns: - A list of Predictions (datarobot.models.Predictions) objects
-
classmethod
get
(project_id, prediction_id)¶ Retrieve the specific predictions metadata
Parameters: - project_id : str
id of the project the model belongs to
- prediction_id : str
id of the prediction set
Returns: - Predictions (datarobot.models.Predictions) object representing the specified predictions
-
get_all_as_dataframe
(class_prefix='class_')¶ Retrieve all prediction rows and return them as a pandas.DataFrame.
Parameters: - class_prefix : str, optional
The prefix to prepend to labels in the final dataframe. Default is class_ (e.g., apple -> class_apple)
Returns: - dataframe: pandas.DataFrame
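Because class_prefix is added in front of each class label, the original labels can be recovered from the column names. The sketch below illustrates this on a plain list of column names. The helper `strip_class_prefix` and the columns `row_id` and `prediction` are assumptions made for the example, not documented output.

```python
def strip_class_prefix(columns, class_prefix="class_"):
    """Map prefixed column names back to the original class labels."""
    return {
        col: col[len(class_prefix):]
        for col in columns
        if col.startswith(class_prefix)
    }


# Hypothetical column names of a classification prediction dataframe.
columns = ["row_id", "prediction", "class_apple", "class_pear"]
labels = strip_class_prefix(columns)
print(labels)
```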
-
download_to_csv
(filename, encoding='utf-8')¶ Save prediction rows into CSV file.
Parameters: - filename : str or file object
path or file object to save prediction rows
- encoding : string, optional
A string representing the encoding to use in the output file, defaults to ‘utf-8’
PredictionServer¶
-
class
datarobot.
PredictionServer
(id=None, url=None, datarobot_key=None)¶ A prediction server can be used to make predictions
Attributes: - id : str
the id of the prediction server
- url : str
the url of the prediction server
- datarobot_key : str
the datarobot-key header used in requests to this prediction server
-
classmethod
list
()¶ Returns a list of prediction servers a user can use to make predictions.
New in version v2.17.
Returns: - prediction_servers : list of PredictionServer instances
Contains a list of prediction servers that can be used to make predictions.
Examples
prediction_servers = PredictionServer.list()
prediction_servers
>>> [PredictionServer('https://example.com')]
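The datarobot_key attribute is described above as the datarobot-key header used in requests to the server. The sketch below shows one way to merge it into a set of request headers. The helper name `with_datarobot_key` is hypothetical, and a stand-in object replaces a real PredictionServer so the example runs offline.

```python
from types import SimpleNamespace


def with_datarobot_key(server, headers=None):
    """Return a copy of `headers` with the datarobot-key header added."""
    merged = dict(headers or {})
    # datarobot_key defaults to None in the constructor, so guard against it.
    if server.datarobot_key:
        merged["datarobot-key"] = server.datarobot_key
    return merged


# Stand-in for a PredictionServer instance (values are made up).
server = SimpleNamespace(id="abc", url="https://example.com", datarobot_key="secret")
headers = with_datarobot_key(server, {"Content-Type": "application/json"})
print(headers)
```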
Ruleset¶
-
class
datarobot.models.
Ruleset
(project_id=None, parent_model_id=None, model_id=None, ruleset_id=None, rule_count=None, score=None)¶ Represents an approximation of a model with DataRobot Prime
Attributes: - id : str
the id of the ruleset
- rule_count : int
the number of rules used to approximate the model
- score : float
the validation score of the approximation
- project_id : str
the project the approximation belongs to
- parent_model_id : str
the model being approximated
- model_id : str or None
the model using this ruleset (if it exists). Will be None if no such model has been trained.
-
request_model
()¶ Request training for a model using this ruleset
Training a model using a ruleset is a necessary prerequisite for being able to download the code for a ruleset.
Returns: - job: Job
the job fitting the new Prime model
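Since model_id is None until a model has been trained for a ruleset, the attributes above are enough to decide which ruleset to train. The sketch below picks the largest untrained ruleset within a rule budget. The selection rule and the helper name `pick_untrained` are illustrative choices, not part of the client.

```python
from types import SimpleNamespace


def pick_untrained(rulesets, max_rules):
    """Return the largest untrained ruleset within a rule budget, or None."""
    candidates = [
        r for r in rulesets
        if r.model_id is None and r.rule_count <= max_rules
    ]
    if not candidates:
        return None
    return max(candidates, key=lambda r: r.rule_count)


# Stand-ins for Ruleset objects, so the sketch runs without the client.
rulesets = [
    SimpleNamespace(id=1, rule_count=5, model_id=None),
    SimpleNamespace(id=2, rule_count=20, model_id="abc"),
    SimpleNamespace(id=3, rule_count=12, model_id=None),
    SimpleNamespace(id=4, rule_count=90, model_id=None),
]
chosen = pick_untrained(rulesets, max_rules=25)
# With a real Ruleset, chosen.request_model() would return the training Job.
print(chosen.id)
```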
PrimeFile¶
-
class
datarobot.models.
PrimeFile
(id=None, project_id=None, parent_model_id=None, model_id=None, ruleset_id=None, language=None, is_valid=None)¶ Represents an executable file available for download of the code for a DataRobot Prime model
Attributes: - id : str
the id of the PrimeFile
- project_id : str
the id of the project this PrimeFile belongs to
- parent_model_id : str
the model being approximated by this PrimeFile
- model_id : str
the prime model this file represents
- ruleset_id : int
the ruleset being used in this PrimeFile
- language : str
the language of the code in this file - see enums.LANGUAGE for possibilities
- is_valid : bool
whether the code passed basic validation
-
download
(filepath)¶ Download the code and save it to a file
Parameters: - filepath: string
the location to save the file to
Project¶
-
class
datarobot.models.
Project
(id=None, project_name=None, mode=None, target=None, target_type=None, holdout_unlocked=None, metric=None, stage=None, partition=None, positive_class=None, created=None, advanced_options=None, recommender=None, max_train_pct=None, max_train_rows=None, scaleout_max_train_pct=None, scaleout_max_train_rows=None, file_name=None, feature_engineering_graphs=None, credentials=None, feature_engineering_prediction_point=None, unsupervised_mode=None, use_feature_discovery=None)¶ A project built from a particular training dataset
Attributes: - id : str
the id of the project
- project_name : str
the name of the project
- mode : int
the autopilot mode currently selected for the project - 0 for full autopilot, 1 for semi-automatic, and 2 for manual
- target : str
the name of the selected target feature
- target_type : str
Indicates what kind of modeling is being done in this project. Options are: ‘Regression’, ‘Binary’ (Binary classification), ‘Multiclass’ (Multiclass classification)
- holdout_unlocked : bool
whether the holdout has been unlocked
- metric : str
the selected project metric (e.g. LogLoss)
- stage : str
the stage the project has reached - one of
datarobot.enums.PROJECT_STAGE
- partition : dict
information about the selected partitioning options
- positive_class : str
for binary classification projects, the selected positive class; otherwise, None
- created : datetime
the time the project was created
- advanced_options : dict
information on the advanced options that were selected for the project settings, e.g. a weights column or a cap of the runtime of models that can advance autopilot stages
- recommender : dict
information on the recommender settings of the project (i.e. whether it is a recommender project, or the id columns)
- max_train_pct : float
the maximum percentage of the project dataset that can be used without going into the validation data or being too large to submit any blueprint for training
- max_train_rows : int
the maximum number of rows that can be trained on without going into the validation data or being too large to submit any blueprint for training
- scaleout_max_train_pct : float
the maximum percentage of the project dataset that can be used to successfully train a scaleout model without going into the validation data. May exceed max_train_pct, in which case only scaleout models can be trained up to this point.
- scaleout_max_train_rows : int
the maximum number of rows that can be used to successfully train a scaleout model without going into the validation data. May exceed max_train_rows, in which case only scaleout models can be trained up to this point.
- file_name : str
the name of the file uploaded for the project dataset
- feature_engineering_graphs: list, optional
information about feature engineering graph such as id of the graph and linkage_keys used to connect relationships in the graph.
- credentials : list, optional
a list of credentials for the feature engineering graphs.
- feature_engineering_prediction_point : str, optional
additional aim parameter
- unsupervised_mode : bool, optional
(New in version v2.20) defaults to False, indicates whether this is an unsupervised project.
-
classmethod
get
(project_id)¶ Gets information about a project.
Parameters: - project_id : str
The identifier of the project you want to load.
Returns: - project : Project
The queried project
Examples
import datarobot as dr
p = dr.Project.get(project_id='54e639a18bd88f08078ca831')
p.id
>>>'54e639a18bd88f08078ca831'
p.project_name
>>>'Some project name'
-
classmethod
create
(sourcedata, project_name='Untitled Project', max_wait=600, read_timeout=600, dataset_filename=None)¶ Creates a project with provided data.
Project creation is an asynchronous process: after the initial request, the client keeps polling the status of the async process responsible for project creation until it finishes. For SDK users, this only means that this method might raise exceptions related to its async nature.
Parameters: - sourcedata : basestring, file, pathlib.Path or pandas.DataFrame
Dataset to use for the project. If a string, it can be either a path to a local file, a URL to a publicly available file, or raw file content. If using a file, the filename must consist of ASCII characters only.
- project_name : str, unicode, optional
The name to assign to the empty project.
- max_wait : int, optional
Time in seconds after which project creation is considered unsuccessful
- read_timeout: int
The maximum number of seconds to wait for the server to respond indicating that the initial upload is complete
- dataset_filename : string or None, optional
(New in version v2.14) File name to use for dataset. Ignored for url and file path sources.
Returns: - project : Project
Instance with initialized data.
Raises: - InputNotUnderstoodError
Raised if sourcedata isn’t one of the supported types.
- AsyncFailureError
Polling for status of async process resulted in response with unsupported status code. Beginning in version 2.1, this will be ProjectAsyncFailureError, a subclass of AsyncFailureError
- AsyncProcessUnsuccessfulError
Raised if project creation was unsuccessful
- AsyncTimeoutError
Raised if project creation took more time than specified by the max_wait parameter
Examples
p = Project.create('/home/datasets/somedataset.csv', project_name="New API project")
p.id
>>> '5921731dkqshda8yd28h'
p.project_name
>>> 'New API project'
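The ASCII-only filename requirement stated for sourcedata can be checked locally before uploading. The following is a minimal sketch. The helper name `has_ascii_filename` is hypothetical, and it inspects only the final path component, which is an assumption about how the restriction is applied.

```python
import os


def has_ascii_filename(path):
    """Return True if the file name part of `path` is ASCII-only."""
    name = os.path.basename(path)
    try:
        name.encode("ascii")
    except UnicodeEncodeError:
        return False
    return True


print(has_ascii_filename("/home/datasets/somedataset.csv"))
print(has_ascii_filename("/home/datasets/donn\u00e9es.csv"))
```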
-
classmethod
encrypted_string
(plaintext)¶ Sends a string to DataRobot to be encrypted
This is used for passwords that DataRobot uses to access external data sources
Parameters: - plaintext : str
The string to encrypt
Returns: - ciphertext : str
The encrypted string
-
classmethod
create_from_hdfs
(url, port=None, project_name=None, max_wait=600)¶ Create a project from a datasource on a WebHDFS server.
Parameters: - url : str
The location of the WebHDFS file, both server and full path. Per the DataRobot specification, must begin with hdfs://, e.g. hdfs:///tmp/10kDiabetes.csv
- port : int, optional
The port to use. If not specified, will default to the server default (50070)
- project_name : str, optional
A name to give to the project
- max_wait : int
The maximum number of seconds to wait before giving up.
Returns: - Project
Examples
p = Project.create_from_hdfs('hdfs:///tmp/somedataset.csv', project_name="New API project")
p.id
>>> '5921731dkqshda8yd28h'
p.project_name
>>> 'New API project'
-
classmethod
create_from_data_source
(data_source_id, username, password, project_name=None, max_wait=600)¶ Create a project from a data source. Either data_source or data_source_id should be specified.
Parameters: - data_source_id : str
the identifier of the data source.
- username : str
the username for database authentication.
- password : str
the password for database authentication. The password is encrypted at server side and never saved / stored.
- project_name : str, optional
optional, a name to give to the project.
- max_wait : int
optional, the maximum number of seconds to wait before giving up.
Returns: - Project
-
classmethod
create_from_dataset
(dataset_id, dataset_version_id=None, project_name=None, user=None, password=None, credential_id=None, use_kerberos=None)¶ Create a Project from a
datarobot.Dataset
Parameters: - dataset_id: string
The ID of the dataset entry to use for the project’s Dataset
- dataset_version_id: string, optional
The ID of the dataset version to use for the project dataset. If not specified - uses latest version associated with dataset_id
- project_name: string, optional
The name of the project to be created. If not specified, will be “Untitled Project” for database connections, otherwise the project name will be based on the file used.
- user: string, optional
The username for database authentication.
- password: string, optional
The password (in cleartext) for database authentication. The password will be encrypted on the server side within the scope of the HTTP request and never saved or stored.
- credential_id: string, optional
The ID of the set of credentials to use instead of user and password.
- use_kerberos: bool, optional
Server default is False. If true, use kerberos authentication for database authentication.
Returns: - Project
-
classmethod
from_async
(async_location, max_wait=600)¶ Given a temporary async status location, poll for no more than max_wait seconds until the async process (project creation or setting the target, for example) finishes successfully, then return the ready project.
Parameters: - async_location : str
The URL for the temporary async status resource. This is returned as a header in the response to a request that initiates an async process
- max_wait : int
The maximum number of seconds to wait before giving up.
Returns: - project : Project
The project, now ready
Raises: - ProjectAsyncFailureError
If the server returned an unexpected response while polling for the asynchronous operation to resolve
- AsyncProcessUnsuccessfulError
If the final result of the asynchronous operation was a failure
- AsyncTimeoutError
If the asynchronous operation did not resolve within the time specified
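The max_wait behaviour described above amounts to polling with a deadline. The sketch below illustrates that pattern in isolation. It is not the client's implementation: `wait_until_done`, `PollTimeout`, and the fake status function are hypothetical names, and the polling interval is an assumption.

```python
import time


class PollTimeout(Exception):
    """Hypothetical stand-in for AsyncTimeoutError."""


def wait_until_done(check_status, max_wait=600, interval=1.0,
                    clock=time.monotonic, sleep=time.sleep):
    """Call check_status() until it returns a result or max_wait elapses."""
    deadline = clock() + max_wait
    while True:
        result = check_status()
        if result is not None:
            return result
        if clock() >= deadline:
            raise PollTimeout("not finished after {} seconds".format(max_wait))
        sleep(interval)


# Offline demo: the fake status becomes ready on the third poll.
attempts = []


def fake_status():
    attempts.append(1)
    return "ready" if len(attempts) >= 3 else None


outcome = wait_until_done(fake_status, max_wait=5, interval=0,
                          sleep=lambda seconds: None)
print(outcome, len(attempts))
```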
-
classmethod
start
(sourcedata, target=None, project_name='Untitled Project', worker_count=None, metric=None, autopilot_on=True, blueprint_threshold=None, response_cap=None, partitioning_method=None, positive_class=None, target_type=None, unsupervised_mode=False, blend_best_models=None, prepare_model_for_deployment=None, scoring_code_only=None, min_secondary_validation_model_count=None)¶ Chain together project creation, file upload, and target selection.
Note
While this function provides a simple means to get started, it does not expose all possible parameters. For advanced usage, using create and set_target directly is recommended.
Parameters: - sourcedata : str or pandas.DataFrame
The path to the file to upload. Can be either a path to a local file or a publicly accessible URL (starting with http://, https://, file://, or s3://). If the source is a DataFrame, it will be serialized to a temporary buffer. If using a file, the filename must consist of ASCII characters only.
- target : str, optional
The name of the target column in the uploaded file. Should not be provided if unsupervised_mode is True.
- project_name : str
The project name.
Returns: - project : Project
The newly created and initialized project.
Other Parameters: - worker_count : int, optional
The number of workers that you want to allocate to this project.
- metric : str, optional
The name of metric to use.
- autopilot_on : boolean, default True
Whether or not to begin modeling automatically.
- blueprint_threshold : int, optional
Number of hours the model is permitted to run. Minimum 1
- response_cap : float, optional
Quantile of the response distribution to use for response capping. Must be in the range 0.5 to 1.0.
- partitioning_method : PartitioningMethod object, optional
Should be a PartitioningMethod object.
- positive_class : str, float, or int; optional
Specifies a level of the target column that should be treated as the positive class for binary classification. May only be specified for binary classification targets.
- target_type : str, optional
Override the automatically selected target_type. An example usage would be setting target_type='Multiclass' when you want to perform a multiclass classification task on a numeric column that has low cardinality. You can use the TARGET_TYPE enum.
- unsupervised_mode : boolean, default False
Specifies whether to create an unsupervised project.
- blend_best_models: bool, optional
blend best models during Autopilot run
- scoring_code_only: bool, optional
Keep only models that can be converted to scorable java code during Autopilot run.
- prepare_model_for_deployment: bool, optional
Prepare model for deployment during Autopilot run. The preparation includes creating reduced feature list models, retraining best model on higher sample size, computing insights and assigning “RECOMMENDED FOR DEPLOYMENT” label.
- min_secondary_validation_model_count: int, optional
Compute “All backtest” scores (datetime models) or cross validation scores for the specified number of highest ranking models on the Leaderboard, if over the Autopilot default.
Raises: - AsyncFailureError
Polling for status of async process resulted in response with unsupported status code
- AsyncProcessUnsuccessfulError
Raised if project creation or target setting was unsuccessful
- AsyncTimeoutError
Raised if project creation or target setting timed out
Examples
Project.start("./tests/fixtures/file.csv", "a_target", project_name="test_name", worker_count=4, metric="a_metric")
This is an example of using a URL to specify the datasource:
Project.start("https://example.com/data/file.csv", "a_target", project_name="test_name", worker_count=4, metric="a_metric")
-
classmethod
list
(search_params=None)¶ Returns the projects associated with this account.
Parameters: - search_params : dict, optional.
If not None, the returned projects are filtered by lookup. Currently you can query projects by:
project_name
Returns: - projects : list of Project instances
Contains a list of projects associated with this user account.
Raises: - TypeError
Raised if the search_params parameter is provided, but is not of a supported type.
Examples
List all projects
p_list = Project.list()
p_list
>>> [Project('Project One'), Project('Two')]
Search for projects by name
Project.list(search_params={'project_name': 'red'})
>>> [Project('Predtime'), Project('Fred Project')]
-
refresh
()¶ Fetches the latest state of the project, and updates this object with that information. This is an inplace update, not a new object.
Returns: - self : Project
the now-updated project
-
delete
()¶ Removes this project from your account.
-
set_target
(target=None, mode='auto', metric=None, quickrun=None, worker_count=None, positive_class=None, partitioning_method=None, featurelist_id=None, advanced_options=None, max_wait=600, target_type=None, feature_engineering_graphs=None, credentials=None, feature_engineering_prediction_point=None, unsupervised_mode=False)¶ Set target variable of an existing project and begin the autopilot process (unless manual mode is specified).
Target setting is an asynchronous process: after the initial request, the client keeps polling the status of the async process responsible for target setting until it finishes. For SDK users, this only means that this method might raise exceptions related to its async nature.
When execution returns to the caller, the autopilot process will already have commenced (again, unless manual mode is specified).
Parameters: - target : str, optional
The name of the target column in the uploaded file. Should not be provided if unsupervised_mode is True.
- mode : str, optional
You can use the AUTOPILOT_MODE enum to choose between
AUTOPILOT_MODE.FULL_AUTO
AUTOPILOT_MODE.MANUAL
AUTOPILOT_MODE.QUICK
If unspecified, FULL_AUTO is used. If the MANUAL value is used, the model creation process will need to be started by executing the start_autopilot function with the desired featurelist. It will start immediately otherwise.
- metric : str, optional
Name of the metric to use for evaluating models. You can query the metrics available for the target by way of Project.get_metrics. If none is specified, then the default recommended by DataRobot is used.
- quickrun : bool, optional
Deprecated - pass AUTOPILOT_MODE.QUICK as mode instead. Sets whether the project should be run in quick run mode. This setting causes DataRobot to recommend a more limited set of models in order to get a base set of models and insights more quickly.
- worker_count : int, optional
The number of concurrent workers to request for this project. If None, then the default is used. (New in version v2.14) Setting this to -1 will request the maximum number available to your account.
- partitioning_method : PartitioningMethod object, optional
Should be a PartitioningMethod object.
- positive_class : str, float, or int; optional
Specifies a level of the target column that should be treated as the positive class for binary classification. May only be specified for binary classification targets.
- featurelist_id : str, optional
Specifies which feature list to use.
- advanced_options : AdvancedOptions, optional
Used to set advanced options of project creation.
- max_wait : int, optional
Time in seconds after which target setting is considered unsuccessful.
- target_type : str, optional
Override the automatically selected target_type. An example usage would be setting target_type='Multiclass' when you want to perform a multiclass classification task on a numeric column that has low cardinality. You can use the TARGET_TYPE enum.
- feature_engineering_graphs: list, optional
information about the feature engineering graphs, such as the id of the graph and the linkage_keys used to connect relationships in the graph.
- credentials: list, optional
a list of credentials for the feature engineering graphs.
- feature_engineering_prediction_point : str, optional
additional aim parameter.
- unsupervised_mode : boolean, default False
(New in version v2.20) Specifies whether to create an unsupervised project. If True, target may not be provided.
Returns: - project : Project
The instance with updated attributes.
Raises: - AsyncFailureError
Polling for status of async process resulted in response with unsupported status code
- AsyncProcessUnsuccessfulError
Raised if target setting was unsuccessful
- AsyncTimeoutError
Raised if target setting took more time than specified by the max_wait parameter
- TypeError
Raised if advanced_options, partitioning_method or target_type is provided, but is not of a supported type
See also
datarobot.models.Project.start
- combines project creation, file upload, and target selection. Provides fewer options, but is useful for getting started quickly.
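Several AdvancedOptions values documented earlier in this reference have stated ranges, so a quick local check before calling set_target can catch mistakes early. The sketch below is one way to do that. The helper `check_advanced_options` is hypothetical, it covers only three of the options, and treating 0 and 100 as valid downsampling rates is an assumption.

```python
def check_advanced_options(response_cap=None, blueprint_threshold=None,
                           majority_downsampling_rate=None,
                           smart_downsampled=False):
    """Return a list of problems found in the given option values."""
    problems = []
    # response_cap is documented as a float in [0.5, 1).
    if response_cap is not None and not 0.5 <= response_cap < 1:
        problems.append("response_cap must be in [0.5, 1)")
    # blueprint_threshold is a number of hours with a minimum of 1.
    if blueprint_threshold is not None and blueprint_threshold < 1:
        problems.append("blueprint_threshold must be at least 1")
    if majority_downsampling_rate is not None:
        # The rate should be specified only when smart downsampling is used.
        if not smart_downsampled:
            problems.append(
                "majority_downsampling_rate requires smart_downsampled=True")
        if not 0 <= majority_downsampling_rate <= 100:
            problems.append(
                "majority_downsampling_rate must be between 0 and 100")
    return problems


print(check_advanced_options(response_cap=0.9, blueprint_threshold=3))
print(check_advanced_options(response_cap=1.0))
```

If the list is empty, the same values could then be passed to AdvancedOptions and on to set_target through its advanced_options parameter.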
-
get_models
(order_by=None, search_params=None, with_metric=None)¶ List all completed, successful models in the leaderboard for the given project.
Parameters: - order_by : str or list of strings, optional
If not None, the returned models are ordered by this attribute. If None, the default return is the order of default project metric.
Allowed attributes to sort by are:
metric
sample_pct
If the sort attribute is preceded by a hyphen, models will be sorted in descending order, otherwise in ascending order.
Multiple sort attributes can be included as a comma-delimited string or in a list, e.g. order_by='sample_pct,-metric' or order_by=['sample_pct', '-metric']
Using metric to sort by will result in models being sorted according to their validation score by how well they did according to the project metric.
- search_params : dict, optional.
If not None, the returned models are filtered by lookup. Currently you can query models by:
name
sample_pct
is_starred
- with_metric : str, optional.
If not None, the returned models will only have scores for this metric. Otherwise all the metrics are returned.
Returns: - models : a list of Model instances.
All of the models that have been trained in this project.
Raises: - TypeError
Raised if the order_by or search_params parameter is provided, but is not of a supported type.
Examples
Project.get('pid').get_models(order_by=['-sample_pct', 'metric'])

# Getting models that contain "Ridge" in name
# and with sample_pct more than 64
Project.get('pid').get_models(
    search_params={
        'sample_pct__gt': 64,
        'name': "Ridge"
    })

# Filtering models based on 'starred' flag:
Project.get('pid').get_models(search_params={'is_starred': True})
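The order_by convention described above (hyphen prefix for descending order, several attributes as a comma-delimited string or a list) can be illustrated locally. The sketch below applies the same convention to a list of plain dictionaries. It is a client-side illustration with hypothetical helper names and made-up records, not how the server sorts models.

```python
def parse_order_by(order_by):
    """Turn 'a,-b' or ['a', '-b'] into [(attribute, descending), ...]."""
    if isinstance(order_by, str):
        order_by = order_by.split(",")
    parsed = []
    for item in order_by:
        item = item.strip()
        parsed.append((item.lstrip("-"), item.startswith("-")))
    return parsed


def sort_records(records, order_by):
    """Sort dict records by several keys, each ascending or descending."""
    result = list(records)
    # Sort by the least significant key first; list.sort is stable, so
    # earlier passes are preserved among equal keys in later passes.
    for attribute, descending in reversed(parse_order_by(order_by)):
        result.sort(key=lambda record: record[attribute], reverse=descending)
    return result


# Made-up records standing in for models.
records = [
    {"name": "a", "sample_pct": 64, "metric": 0.3},
    {"name": "b", "sample_pct": 80, "metric": 0.2},
    {"name": "c", "sample_pct": 64, "metric": 0.1},
    {"name": "d", "sample_pct": 64, "metric": 0.5},
]
ordered = [r["name"] for r in sort_records(records, "sample_pct,-metric")]
print(ordered)
```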
-
get_datetime_models
()¶ List all models in the project as DatetimeModels
Requires the project to be datetime partitioned. If it is not, a ClientError will occur.
Returns: - models : list of DatetimeModel
the datetime models
-
get_prime_models
()¶ List all DataRobot Prime models for the project. Prime models were created to approximate a parent model, and have downloadable code.
Returns: - models : list of PrimeModel
-
get_prime_files
(parent_model_id=None, model_id=None)¶ List all downloadable code files from DataRobot Prime for the project
Parameters: - parent_model_id : str, optional
Filter for only those prime files approximating this parent model
- model_id : str, optional
Filter for only those prime files with code for this prime model
Returns: - files: list of PrimeFile
-
get_datasets
()¶ List all the datasets that have been uploaded for predictions
Returns: - datasets : list of PredictionDataset instances
-
upload_dataset
(sourcedata, max_wait=600, read_timeout=600, forecast_point=None, predictions_start_date=None, predictions_end_date=None, dataset_filename=None, relax_known_in_advance_features_check=None, credentials=None)¶ Upload a new dataset to make predictions against
Parameters: - sourcedata : str, file or pandas.DataFrame
Data to be used for predictions. If string, can be either a path to a local file, a publicly accessible URL (starting with http://, https://, file://, or s3://), or raw file content. If using a file on disk, the filename must consist of ASCII characters only.
- max_wait : int, optional
The maximum number of seconds to wait for the uploaded dataset to be processed before raising an error.
- read_timeout : int, optional
The maximum number of seconds to wait for the server to respond indicating that the initial upload is complete
- forecast_point : datetime.datetime or None, optional
(New in version v2.8) May only be specified for time series projects, otherwise the upload will be rejected. The time in the dataset relative to which predictions should be generated in a time series project. See the Time Series documentation for more information. If not provided, will default to using the latest forecast point in the dataset.
- predictions_start_date : datetime.datetime or None, optional
(New in version v2.11) May only be specified for time series projects. The start date for bulk predictions. Note that this parameter is for generating historical predictions using the training data. This parameter should be provided in conjunction with predictions_end_date. Cannot be provided with the forecast_point parameter.
- predictions_end_date : datetime.datetime or None, optional
(New in version v2.11) May only be specified for time series projects. The end date for bulk predictions, exclusive. Note that this parameter is for generating historical predictions using the training data. This parameter should be provided in conjunction with predictions_start_date. Cannot be provided with the forecast_point parameter.
- dataset_filename : string or None, optional
(New in version v2.14) File name to use for the dataset. Ignored for url and file path sources.
- relax_known_in_advance_features_check : bool, optional
(New in version v2.15) For time series projects only. If True, missing values in the known in advance features are allowed in the forecast window at the prediction time. If omitted or False, missing values are not allowed.
- credentials: list, optional
a list of credentials for the feature engineering graphs used in Feature Discovery projects
Returns: - dataset : PredictionDataset
The newly uploaded dataset.
Raises: - InputNotUnderstoodError
Raised if sourcedata isn’t one of the supported types.
- AsyncFailureError
Raised if polling for the status of an async process resulted in a response with an unsupported status code.
- AsyncProcessUnsuccessfulError
Raised if the dataset upload was unsuccessful (i.e. the server reported an error in uploading the dataset).
- AsyncTimeoutError
Raised if processing the uploaded dataset took more time than specified by the max_wait parameter.
- ValueError
Raised if forecast_point or predictions_start_date and predictions_end_date are provided, but are not of the supported type.
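The constraints on forecast_point, predictions_start_date, and predictions_end_date described above can be checked before an upload is attempted. The following is a minimal sketch of such a check. The helper `check_prediction_window` is hypothetical and does not replace the validation performed by the client or the server.

```python
from datetime import datetime


def check_prediction_window(forecast_point=None, predictions_start_date=None,
                            predictions_end_date=None):
    """Raise ValueError if the combination of arguments is inconsistent."""
    for value in (forecast_point, predictions_start_date, predictions_end_date):
        if value is not None and not isinstance(value, datetime):
            raise ValueError("expected datetime.datetime, got {!r}".format(value))
    # The start and end dates should be provided in conjunction.
    if (predictions_start_date is None) != (predictions_end_date is None):
        raise ValueError(
            "provide predictions_start_date and predictions_end_date together")
    # A date range cannot be combined with forecast_point.
    if forecast_point is not None and predictions_start_date is not None:
        raise ValueError(
            "forecast_point cannot be combined with a predictions date range")
    return True


print(check_prediction_window(forecast_point=datetime(2020, 1, 1)))
```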
-
upload_dataset_from_data_source
(data_source_id, username, password, max_wait=600, forecast_point=None, relax_known_in_advance_features_check=None, credentials=None, predictions_start_date=None, predictions_end_date=None)¶ Upload a new dataset from a data source to make predictions against
Parameters: - data_source_id : str
The identifier of the data source.
- username : str
The username for database authentication.
- password : str
The password for database authentication. The password is encrypted at server side and never saved / stored.
- max_wait : int, optional
Optional, the maximum number of seconds to wait before giving up.
- forecast_point : datetime.datetime or None, optional
(New in version v2.8) For time series projects only. This is the default point relative to which predictions will be generated, based on the forecast window of the project. See the time series prediction documentation for more information.
- relax_known_in_advance_features_check : bool, optional
(New in version v2.15) For time series projects only. If True, missing values in the known in advance features are allowed in the forecast window at the prediction time. If omitted or False, missing values are not allowed.
- credentials : list, optional
A list of credentials for the feature engineering graphs used in Feature Discovery projects.
- predictions_start_date : datetime.datetime or None, optional
(New in version v2.20) For time series projects only. The start date for bulk predictions. Note that this parameter is for generating historical predictions using the training data. This parameter should be provided in conjunction with predictions_end_date. Can’t be provided with the forecast_point parameter.
- predictions_end_date : datetime.datetime or None, optional
(New in version v2.20) For time series projects only. The end date for bulk predictions, exclusive. Note that this parameter is for generating historical predictions using the training data. This parameter should be provided in conjunction with predictions_start_date. Can’t be provided with the forecast_point parameter.
Returns: - dataset : PredictionDataset
the newly uploaded dataset
-
get_blueprints
()¶ List all blueprints recommended for a project.
Returns: - menu : list of Blueprint instances
All the blueprints recommended by DataRobot for a project
-
get_features
()¶ List all features for this project
Returns: - list of Feature
all features for this project
-
get_modeling_features
(batch_size=None)¶ List all modeling features for this project
Only available once the target and partitioning settings have been set. For more information on the distinction between input and modeling features, see the time series documentation.
Parameters: - batch_size : int, optional
The number of features to retrieve in a single API call. If specified, the client may make multiple calls to retrieve the full list of features. If not specified, an appropriate default will be chosen by the server.
Returns: - list of ModelingFeature
All modeling features in this project
-
get_featurelists
()¶ List all featurelists created for this project
Returns: - list of Featurelist
all featurelists created for this project
-
get_associations
(assoc_type, metric, featurelist_id=None)¶ Get the association statistics and metadata for a project’s informative features
New in version v2.17.
Parameters: - assoc_type : string or None
the type of association, must be either ‘association’ or ‘correlation’
- metric : string or None
the specified association metric, belongs under either association or correlation umbrella
- featurelist_id : string or None
the desired featurelist for which to get association statistics (New in version v2.19)
Returns: - association_data : dict
pairwise metric strength data, clustering data, and ordering data for Feature Association Matrix visualization
-
get_association_featurelists
()¶ List featurelists and get feature association status for each
New in version v2.19.
Returns: - feature_lists : dict
dict with ‘featurelists’ as key, with list of featurelists as values
-
get_association_matrix_details
(feature1, feature2)¶ Get a sample of the actual values used to measure the association between a pair of features
New in version v2.17.
Parameters: - feature1 : str
Feature name for the first feature of interest
- feature2 : str
Feature name for the second feature of interest
Returns: - dict
This data has 3 keys: features, values, and types
- values : list
a list of triplet lists, e.g. {“values”: [[460.0, 428.5, 0.001], [1679.3, 259.0, 0.001], …]}. The first entry of each list is a value of feature1, the second entry is a value of feature2, and the third is the relative frequency of the pair of datapoints in the sample.
- features : list of the passed features, [feature1, feature2]
- types : list of the passed features’ types inferred by DataRobot, e.g. [‘N’, ‘N’]
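As a rough illustration of the return shape described above, the sketch below splits the values triplets into three parallel lists, which is often a convenient form for plotting. The sample dict is hand-written to match the documented keys and is not real API output; split_association_pairs is a hypothetical helper, not part of the client.

```python
def split_association_pairs(details):
    """Split documented [feature1_value, feature2_value, frequency] triplets
    into three parallel lists."""
    first, second, freqs = [], [], []
    for value1, value2, freq in details["values"]:
        first.append(value1)
        second.append(value2)
        freqs.append(freq)
    return first, second, freqs


# Hand-written sample shaped like the documented return value (not real output).
sample_details = {
    "features": ["feature_a", "feature_b"],
    "types": ["N", "N"],
    "values": [[460.0, 428.5, 0.001], [1679.3, 259.0, 0.001]],
}

first, second, freqs = split_association_pairs(sample_details)
print(first, second, freqs)
```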
-
get_modeling_featurelists
(batch_size=None)¶ List all modeling featurelists created for this project
Modeling featurelists can only be created after the target and partitioning options have been set for a project. In time series projects, these are the featurelists that can be used for modeling; in other projects, they behave the same as regular featurelists.
See the time series documentation for more information.
Parameters: - batch_size : int, optional
The number of featurelists to retrieve in a single API call. If specified, the client may make multiple calls to retrieve the full list of featurelists. If not specified, an appropriate default will be chosen by the server.
Returns: - list of ModelingFeaturelist
all modeling featurelists in this project
-
create_type_transform_feature
(name, parent_name, variable_type, replacement=None, date_extraction=None, max_wait=600)¶ Create a new feature by transforming the type of an existing feature in the project
Note that only the following transformations are supported:
- Text to categorical or numeric
- Categorical to text or numeric
- Numeric to categorical
- Date to categorical or numeric
Note
Special considerations when casting numeric to categorical
There are two values which can be used for variableType to convert numeric data to categorical levels. These differ in the assumptions they make about the input data, and are very important when considering the data that will be used to make predictions. The assumptions that each makes are:
- categorical : The data in the column is all integral, and there are no missing values. If either of these conditions does not hold in the training set, the transformation will be rejected. During predictions, if any of the values in the parent column are missing, the predictions will error.
- categoricalInt : (New in v2.6) All of the data in the column should be considered categorical in its string form when cast to an int by truncation. For example, the value 3 will be cast as the string 3 and the value 3.14 will also be cast as the string 3. Further, the value -3.6 will become the string -3. Missing values will still be recognized as missing.
For convenience these are represented in the enum VARIABLE_TYPE_TRANSFORM with the names CATEGORICAL and CATEGORICAL_INT.
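The truncation rule described in the note above can be mimicked locally to preview which category labels a numeric column would produce. This is only a sketch of the documented behaviour, not the server's implementation; categorical_int_label is a hypothetical helper, and treating None and NaN as "missing" is an assumption.

```python
import math


def categorical_int_label(value):
    """Mimic the documented categoricalInt cast: truncate toward zero,
    then use the string form. Missing values stay missing (None)."""
    if value is None or (isinstance(value, float) and math.isnan(value)):
        return None  # assumption: None/NaN stand in for a missing value
    return str(int(value))  # int() truncates toward zero, so -3.6 becomes -3


for raw in (3, 3.14, -3.6, None):
    print(repr(raw), "->", repr(categorical_int_label(raw)))
```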
Parameters: - name : str
The name to give to the new feature
- parent_name : str
The name of the feature to transform
- variable_type : str
The type the new column should have. See the values within
datarobot.enums.VARIABLE_TYPE_TRANSFORM
- replacement : str or float, optional
The value that missing or unconvertible data should have
- date_extraction : str, optional
Must be specified when parent_name is a date column (and left None otherwise). Specifies which value from a date should be extracted. See the list of values in
datarobot.enums.DATE_EXTRACTION
- max_wait : int, optional
The maximum amount of time to wait for DataRobot to finish processing the new column. This process can take more time with more data to process. If this operation times out, an AsyncTimeoutError will occur. DataRobot continues the processing and the new column may successfully be constructed.
Returns: - Feature
The data of the new Feature
Raises: - AsyncFailureError
If any of the responses from the server are unexpected
- AsyncProcessUnsuccessfulError
If the job being waited for has failed or has been cancelled
- AsyncTimeoutError
If the resource did not resolve in time
-
create_featurelist
(name, features)¶ Creates a new featurelist
Parameters: - name : str
The name to give to this new featurelist. Names must be unique, so an error will be returned from the server if this name has already been used in this project.
- features : list of str
The names of the features. Each feature must exist in the project already.
Returns: - Featurelist
newly created featurelist
Raises: - DuplicateFeaturesError
Raised if features variable contains duplicate features
Examples
project = Project.get('5223deadbeefdeadbeef0101')
flists = project.get_featurelists()
# Create a new featurelist using a subset of features from an
# existing featurelist
flist = flists[0]
features = flist.features[::2]  # Half of the features
new_flist = project.create_featurelist(name='Feature Subset', features=features)
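Because create_featurelist raises DuplicateFeaturesError when the features argument contains duplicates, it can help to de-duplicate the list before making the call. The following is a minimal sketch in plain Python; unique_in_order is a hypothetical helper and the feature names are purely illustrative.

```python
def unique_in_order(names):
    """Drop repeated feature names while keeping first-seen order."""
    # dict preserves insertion order, so this keeps the first occurrence of each name
    return list(dict.fromkeys(names))


features = ["age", "income", "age", "zip_code", "income"]  # illustrative names
deduplicated = unique_in_order(features)
print(deduplicated)
```

The de-duplicated list could then be passed as the features argument.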
-
create_modeling_featurelist
(name, features)¶ Create a new modeling featurelist
Modeling featurelists can only be created after the target and partitioning options have been set for a project. In time series projects, these are the featurelists that can be used for modeling; in other projects, they behave the same as regular featurelists.
See the time series documentation for more information.
Parameters: - name : str
the name of the modeling featurelist to create. Names must be unique within the project, or the server will return an error.
- features : list of str
the names of the features to include in the modeling featurelist. Each feature must be a modeling feature.
Returns: - featurelist : ModelingFeaturelist
the newly created featurelist
Examples
project = Project.get('1234deadbeeffeeddead4321')
modeling_features = project.get_modeling_features()
selected_features = [feat.name for feat in modeling_features][:5]  # select first five
new_flist = project.create_modeling_featurelist('Model This', selected_features)
-
get_metrics
(feature_name)¶ Get the metrics recommended for modeling on the given feature.
Parameters: - feature_name : str
The name of the feature to query regarding which metrics are recommended for modeling.
Returns: - feature_name: str
The name of the feature that was looked up
- available_metrics: list of str
An array of strings representing the appropriate metrics. If the feature cannot be selected as the target, then this array will be empty.
- metric_details: list of dict
The list of metricDetails objects
- metric_name: str
Name of the metric
- supports_timeseries: boolean
This metric is valid for timeseries
- supports_multiclass: boolean
This metric is valid for multiclass classification
- supports_binary: boolean
This metric is valid for binary classification
- supports_regression: boolean
This metric is valid for regression
- ascending: boolean
Should the metric be sorted in ascending order
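One way to use the returned structure is to filter metric_details by one of the documented supports_* flags. The sketch below assumes the result is a plain dict with the keys listed above; the sample data is hand-written (the flag values are made up for illustration) and metrics_supporting is a hypothetical helper.

```python
def metrics_supporting(result, kind):
    """Return the names of metrics whose supports_<kind> flag is true."""
    flag = "supports_" + kind
    return [
        detail["metric_name"]
        for detail in result["metric_details"]
        if detail.get(flag)
    ]


# Hand-written sample shaped like the documented return value (not real output).
sample_result = {
    "feature_name": "target_column",
    "available_metrics": ["LogLoss", "RMSE"],
    "metric_details": [
        {"metric_name": "LogLoss", "supports_timeseries": False,
         "supports_multiclass": True, "supports_binary": True,
         "supports_regression": False, "ascending": True},
        {"metric_name": "RMSE", "supports_timeseries": True,
         "supports_multiclass": False, "supports_binary": False,
         "supports_regression": True, "ascending": True},
    ],
}

print(metrics_supporting(sample_result, "binary"))
print(metrics_supporting(sample_result, "regression"))
```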
-
get_status
()¶ Query the server for project status.
Returns: - status : dict
Contains:
- autopilot_done : a boolean.
- stage : a short string indicating which stage the project is in.
- stage_description : a description of what stage means.
Examples
{"autopilot_done": False, "stage": "modeling", "stage_description": "Ready for modeling"}
-
pause_autopilot
()¶ Pause autopilot, which stops processing the next jobs in the queue.
Returns: - paused : boolean
Whether the command was acknowledged
-
unpause_autopilot
()¶ Unpause autopilot, which restarts processing the next jobs in the queue.
Returns: - unpaused : boolean
Whether the command was acknowledged.
-
start_autopilot
(featurelist_id)¶ Starts autopilot on provided featurelist, halting the current autopilot run. Will raise an error if autopilot has already started on this featurelist (whether via start_autopilot or set_target).
Only one autopilot can be running at a time, so any ongoing autopilot on a different featurelist will be halted: modeling jobs already in the queue are not affected, but the halted autopilot will not add new jobs to the queue.
Parameters: - featurelist_id : str
Identifier of featurelist that should be used for autopilot
Raises: - AppPlatformError
Raised if autopilot is currently running on or has already finished running on the provided featurelist. Also raised if project’s target was not selected.
-
train
(trainable, sample_pct=None, featurelist_id=None, source_project_id=None, scoring_type=None, training_row_count=None, monotonic_increasing_featurelist_id=<object object>, monotonic_decreasing_featurelist_id=<object object>)¶ Submit a job to the queue to train a model.
Either sample_pct or training_row_count can be used to specify the amount of data to use, but not both. If neither is specified, the default is the maximum amount of data that can safely be used to train any blueprint without going into the validation data.
In smart-sampled projects, sample_pct and training_row_count are assumed to be in terms of rows of the minority class.
Note
If the project uses datetime partitioning, use Project.train_datetime instead.
Parameters: - trainable : str or Blueprint
For str, this is assumed to be a blueprint_id. If no source_project_id is provided, the project_id will be assumed to be the project that this instance represents.
Otherwise, for a Blueprint, it contains the blueprint_id and source_project_id that we want to use. featurelist_id will assume the default for this project if not provided, and sample_pct will default to using the maximum training value allowed for this project’s partition setup. source_project_id will be ignored if a Blueprint instance is used for this parameter.
- sample_pct : float, optional
The amount of data to use for training, as a percentage of the project dataset from 0 to 100.
- featurelist_id : str, optional
The identifier of the featurelist to use. If not defined, the default for this project is used.
- source_project_id : str, optional
Which project created this blueprint_id. If None, it defaults to looking in this project. Note that you must have read permissions in this project.
- scoring_type : str, optional
Either SCORING_TYPE.validation or SCORING_TYPE.cross_validation. SCORING_TYPE.validation is available for every partitioning type, and indicates that the default model validation should be used for the project. If the project uses a form of cross-validation partitioning, SCORING_TYPE.cross_validation can also be used to indicate that all of the available training/validation combinations should be used to evaluate the model.
- training_row_count : int, optional
The number of rows to use to train the requested model.
- monotonic_increasing_featurelist_id : str, optional
(new in version 2.11) the id of the featurelist that defines the set of features with a monotonically increasing relationship to the target. Passing None disables increasing monotonicity constraint. Default (dr.enums.MONOTONICITY_FEATURELIST_DEFAULT) is the one specified by the blueprint.
- monotonic_decreasing_featurelist_id : str, optional
(new in version 2.11) the id of the featurelist that defines the set of features with a monotonically decreasing relationship to the target. Passing None disables decreasing monotonicity constraint. Default (dr.enums.MONOTONICITY_FEATURELIST_DEFAULT) is the one specified by the blueprint.
Returns: - model_job_id : str
id of created job, can be used as parameter to ModelJob.get method or wait_for_async_model_creation function
Examples
Use a Blueprint instance:
blueprint = project.get_blueprints()[0]
model_job_id = project.train(blueprint, training_row_count=project.max_train_rows)
Use a blueprint_id, which is a string. In the first case, it is assumed that the blueprint was created by this project. If you are using a blueprint used by another project, you will need to pass the id of that other project as well.
blueprint_id = 'e1c7fc29ba2e612a72272324b8a842af'
project.train(blueprint_id, training_row_count=project.max_train_rows)
another_project.train(blueprint_id, source_project_id=project.id)
You can also easily use this interface to train a new model reusing the blueprint of an existing model:
model = project.get_models()[0]
model_job_id = project.train(model.blueprint_id, sample_pct=100)
-
train_datetime
(blueprint_id, featurelist_id=None, training_row_count=None, training_duration=None, source_project_id=None, monotonic_increasing_featurelist_id=<object object>, monotonic_decreasing_featurelist_id=<object object>, use_project_settings=False)¶ Create a new model in a datetime partitioned project
If the project is not datetime partitioned, an error will occur.
Parameters: - blueprint_id : str
the blueprint to use to train the model
- featurelist_id : str, optional
the featurelist to use to train the model. If not specified, the project default will be used.
- training_row_count : int, optional
the number of rows of data that should be used to train the model. If specified, neither training_duration nor use_project_settings may be specified.
- training_duration : str, optional
a duration string specifying what time range the data used to train the model should span. If specified, neither training_row_count nor use_project_settings may be specified.
- use_project_settings : bool, optional
(New in version v2.20) defaults to False. If True, indicates that the custom backtest partitioning settings specified by the user will be used to train the model and evaluate backtest scores. If specified, neither training_row_count nor training_duration may be specified.
- source_project_id : str, optional
the id of the project this blueprint comes from, if not this project. If left unspecified, the blueprint must belong to this project.
- monotonic_increasing_featurelist_id : str, optional
(New in version v2.18) the id of the featurelist that defines the set of features with a monotonically increasing relationship to the target. Passing None disables increasing monotonicity constraint. Default (dr.enums.MONOTONICITY_FEATURELIST_DEFAULT) is the one specified by the blueprint.
- monotonic_decreasing_featurelist_id : str, optional
(New in version v2.18) the id of the featurelist that defines the set of features with a monotonically decreasing relationship to the target. Passing None disables decreasing monotonicity constraint. Default (dr.enums.MONOTONICITY_FEATURELIST_DEFAULT) is the one specified by the blueprint.
Returns: - job : ModelJob
the created job to build the model
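The parameter descriptions above say that training_row_count, training_duration, and use_project_settings are mutually exclusive. A client-side pre-check along the following lines can surface a conflict before any request is sent. This is a sketch only: pick_training_size_option is a hypothetical helper, the library may validate differently, and the duration string shown is just an illustrative value.

```python
def pick_training_size_option(training_row_count=None, training_duration=None,
                              use_project_settings=False):
    """Return the name of the single option in use, or None if none is set.

    Raises ValueError when more than one of the three options is given.
    """
    given = []
    if training_row_count is not None:
        given.append("training_row_count")
    if training_duration is not None:
        given.append("training_duration")
    if use_project_settings:
        given.append("use_project_settings")
    if len(given) > 1:
        raise ValueError("Specify at most one of: " + ", ".join(given))
    return given[0] if given else None


print(pick_training_size_option(training_duration="P6M"))  # illustrative duration string
```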
-
blend
(model_ids, blender_method)¶ Submit a job for creating blender model. Upon success, the new job will be added to the end of the queue.
Parameters: - model_ids : list of str
List of model ids that will be used to create blender. These models should have completed validation stage without errors, and can’t be blenders, DataRobot Prime or scaleout models.
- blender_method : str
Chosen blend method, one from datarobot.enums.BLENDER_METHOD. If this is a time series project, only methods in datarobot.enums.TS_BLENDER_METHOD are allowed.
Returns: - model_job : ModelJob
New ModelJob instance for the blender creation job in queue.
See also
datarobot.models.Project.check_blendable - to confirm if models can be blended
-
check_blendable
(model_ids, blender_method)¶ Check if the specified models can be successfully blended
Parameters: - model_ids : list of str
List of model ids that will be used to create blender. These models should have completed validation stage without errors, and can’t be blenders, DataRobot Prime or scaleout models.
- blender_method : str
Chosen blend method, one from datarobot.enums.BLENDER_METHOD. If this is a time series project, only methods in datarobot.enums.TS_BLENDER_METHOD are allowed.
Returns: - EligibilityResult
-
get_all_jobs
(status=None)¶ Get a list of jobs
This will give Jobs representing any type of job, including modeling or predict jobs.
Parameters: - status : QUEUE_STATUS enum, optional
If called with QUEUE_STATUS.INPROGRESS, will return the jobs that are currently running.
If called with QUEUE_STATUS.QUEUE, will return the jobs that are waiting to be run.
If called with QUEUE_STATUS.ERROR, will return the jobs that have errored.
If no value is provided, will return all jobs currently running or waiting to be run.
Returns: - jobs : list
Each is an instance of Job
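The status filtering rules above (a specific status returns only matching jobs, while no status returns jobs that are running or queued) can be sketched in plain Python as follows. The string constants are stand-ins chosen for this example; the real values live in datarobot.enums.QUEUE_STATUS, and the job dicts are hypothetical placeholders for Job instances.

```python
# Stand-in constants for this sketch; not imported from the client library.
INPROGRESS, QUEUE, ERROR = "inprogress", "queue", "error"


def select_jobs(jobs, status=None):
    """Apply the documented filtering rule to a list of job records."""
    if status is None:
        # No status given: jobs currently running or waiting to be run.
        return [job for job in jobs if job["status"] in (INPROGRESS, QUEUE)]
    return [job for job in jobs if job["status"] == status]


jobs = [
    {"id": 1, "status": INPROGRESS},
    {"id": 2, "status": QUEUE},
    {"id": 3, "status": ERROR},
]

print([job["id"] for job in select_jobs(jobs)])
print([job["id"] for job in select_jobs(jobs, ERROR)])
```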
-
get_blenders
()¶ Get a list of blender models.
Returns: - list of BlenderModel
list of all blender models in project.
-
get_frozen_models
()¶ Get a list of frozen models
Returns: - list of FrozenModel
list of all frozen models in project.
-
get_model_jobs
(status=None)¶ Get a list of modeling jobs
Parameters: - status : QUEUE_STATUS enum, optional
If called with QUEUE_STATUS.INPROGRESS, will return the modeling jobs that are currently running.
If called with QUEUE_STATUS.QUEUE, will return the modeling jobs that are waiting to be run.
If called with QUEUE_STATUS.ERROR, will return the modeling jobs that have errored.
If no value is provided, will return all modeling jobs currently running or waiting to be run.
Returns: - jobs : list
Each is an instance of ModelJob
-
get_predict_jobs
(status=None)¶ Get a list of prediction jobs
Parameters: - status : QUEUE_STATUS enum, optional
If called with QUEUE_STATUS.INPROGRESS, will return the prediction jobs that are currently running.
If called with QUEUE_STATUS.QUEUE, will return the prediction jobs that are waiting to be run.
If called with QUEUE_STATUS.ERROR, will return the prediction jobs that have errored.
If called without a status, will return all prediction jobs currently running or waiting to be run.
Returns: - jobs : list
Each is an instance of PredictJob
-
wait_for_autopilot
(check_interval=20.0, timeout=86400, verbosity=1)¶ Blocks until autopilot is finished. This will raise an exception if the autopilot mode is changed from AUTOPILOT_MODE.FULL_AUTO.
It makes API calls to sync the project state with the server and to look at which jobs are enqueued.
Parameters: - check_interval : float or int
The maximum time (in seconds) to wait between checks for whether autopilot is finished
- timeout : float or int or None
After this long (in seconds), we give up. If None, never timeout.
- verbosity:
This should be VERBOSITY_LEVEL.SILENT or VERBOSITY_LEVEL.VERBOSE. For VERBOSITY_LEVEL.SILENT, nothing will be displayed about progress. For VERBOSITY_LEVEL.VERBOSE, the number of jobs in progress or queued is shown. Note that new jobs are added to the queue along the way.
Raises: - AsyncTimeoutError
If autopilot does not finish in the amount of time specified
- RuntimeError
If a condition is detected that indicates that autopilot will not complete on its own
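The polling behaviour described above (check, wait up to check_interval seconds, give up after timeout) follows a common pattern that can be sketched independently of the client. In the sketch below, wait_until_done and its get_status, sleep, and clock arguments are hypothetical; get_status is assumed to return a dict with the autopilot_done key documented under get_status, and the built-in TimeoutError stands in for AsyncTimeoutError.

```python
import time


def wait_until_done(get_status, check_interval=20.0, timeout=86400,
                    sleep=time.sleep, clock=time.monotonic):
    """Poll get_status() until it reports autopilot_done.

    Raises TimeoutError once `timeout` seconds have elapsed; a timeout of
    None means wait indefinitely.
    """
    start = clock()
    while True:
        status = get_status()
        if status["autopilot_done"]:
            return status
        if timeout is not None and clock() - start >= timeout:
            raise TimeoutError("autopilot did not finish in time")
        sleep(check_interval)


# Demonstration with canned statuses and a no-op sleep, so nothing really waits.
canned = iter([{"autopilot_done": False}, {"autopilot_done": True}])
final_status = wait_until_done(lambda: next(canned), check_interval=5,
                               timeout=None, sleep=lambda seconds: None)
print(final_status)
```

Injecting sleep and clock keeps the loop testable without real delays.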
-
rename
(project_name)¶ Update the name of the project.
Parameters: - project_name : str
The new name
-
unlock_holdout
()¶ Unlock the holdout for this project.
This will cause subsequent queries of the models of this project to contain the metric values for the holdout set, if it exists.
Take care, as this cannot be undone. Best practice is to select a model before analyzing model performance on the holdout set.
-
set_worker_count
(worker_count)¶ Sets the number of workers allocated to this project.
Note that this value is limited to the number allowed by your account. Lowering the number will not stop currently running jobs, but will cause the queue to wait for the appropriate number of jobs to finish before attempting to run more jobs.
Parameters: - worker_count : int
The number of concurrent workers to request from the pool of workers. (New in version v2.14) Setting this to -1 will update the number of workers to the maximum available to your account.
-
get_leaderboard_ui_permalink
()¶ Returns: - url : str
Permanent static hyperlink to a project leaderboard.
-
open_leaderboard_browser
()¶ Opens project leaderboard in web browser.
Note: If text-mode browsers are used, the calling process will block until the user exits the browser.
-
get_rating_table_models
()¶ Get a list of models with a rating table
Returns: - list of RatingTableModel
list of all models with a rating table in project.
-
get_rating_tables
()¶ Get a list of rating tables
Returns: - list of RatingTable
list of rating tables in project.
-
get_access_list
()¶ Retrieve users who have access to this project and their access levels
New in version v2.15.
Returns: - list of SharingAccess
-
share
(access_list)¶ Modify the ability of users to access this project
New in version v2.15.
Parameters: - access_list : list of SharingAccess
the modifications to make.
Raises: - datarobot.ClientError :
if you do not have permission to share this project, if the user you’re sharing with doesn’t exist, if the same user appears multiple times in the access_list, or if these changes would leave the project without an owner
Examples
Transfer access to the project from old_user@datarobot.com to new_user@datarobot.com
import datarobot as dr
new_access = dr.SharingAccess('new_user@datarobot.com', dr.enums.SHARING_ROLE.OWNER, can_share=True)
access_list = [dr.SharingAccess('old_user@datarobot.com', None), new_access]
dr.Project.get('my-project-id').share(access_list)
-
batch_features_type_transform
(parent_names, variable_type, prefix=None, suffix=None, max_wait=600)¶ Create new features by transforming the type of existing ones.
New in version v2.17.
Note
Only the following transformations are supported in batch mode:
- Text to categorical or numeric
- Categorical to text or numeric
- Numeric to categorical
See the note under create_type_transform_feature for special considerations when casting numeric to categorical. Date to categorical or numeric transformations are not currently supported for batch mode but can be performed individually using create_type_transform_feature.
Parameters: - parent_names : list
The list of variable names to be transformed.
- variable_type : str
The type new columns should have. Can be one of ‘CATEGORICAL’, ‘CATEGORICAL_INT’, ‘NUMERIC’, and ‘TEXT’ - supported values can be found in datarobot.enums.VARIABLE_TYPE_TRANSFORM.
- prefix : str, optional
The string that will preface all feature names. At least one of prefix and suffix must be specified.
- suffix : str, optional
The string that will be appended at the end to all feature names. At least one of prefix and suffix must be specified.
- max_wait : int, optional
The maximum amount of time to wait for DataRobot to finish processing the new column. This process can take more time with more data to process. If this operation times out, an AsyncTimeoutError will occur. DataRobot continues the processing and the new column may successfully be constructed.
Returns: - list of Features
all features for this project after transformation.
Raises: - TypeError
If parent_names is not a list.
- ValueError
If value of variable_type is not from datarobot.enums.VARIABLE_TYPE_TRANSFORM.
- AsyncFailureError
If any of the responses from the server are unexpected.
- AsyncProcessUnsuccessfulError
If the job being waited for has failed or has been cancelled.
- AsyncTimeoutError
If the resource did not resolve in time.
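To preview the names a batch transform would likely produce, the prefix and suffix rules can be sketched locally. This assumes new names are formed as prefix + parent name + suffix, which follows from the parameter descriptions but is not confirmed here; transformed_names is a hypothetical helper, and raising ValueError when neither prefix nor suffix is given is an assumption about the error type.

```python
def transformed_names(parent_names, prefix=None, suffix=None):
    """Build the assumed names of the transformed features."""
    if not isinstance(parent_names, list):
        raise TypeError("parent_names must be a list")
    if not prefix and not suffix:
        # Assumption: the error type for this case is not documented.
        raise ValueError("At least one of prefix and suffix must be specified")
    return [(prefix or "") + name + (suffix or "") for name in parent_names]


print(transformed_names(["age", "income"], suffix="_cat"))
print(transformed_names(["age"], prefix="cat_", suffix="_v2"))
```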
-
clone_project
(new_project_name=None, max_wait=600)¶ Create a fresh (post-EDA1) copy of this project that is ready for setting targets and modeling options.
Parameters: - new_project_name : str, optional
The desired name of the new project. If omitted, the API will default to ‘Copy of <original project>’
- max_wait : int, optional
Time in seconds after which project creation is considered unsuccessful
-
class
datarobot.helpers.eligibility_result.
EligibilityResult
(supported, reason='', context='')¶ Represents whether a particular operation is supported
For instance, a function to check whether a set of models can be blended can return an EligibilityResult specifying whether or not blending is supported and why it may not be supported.
Attributes: - supported : bool
whether the operation this result represents is supported
- reason : str
why the operation is or is not supported
- context : str
what operation isn’t supported
Feature Association¶
-
class
datarobot.models.feature_association.
FeatureAssociation
(metric=None, assoc_type=None, featurelistId=None)¶ Feature association statistics for a project.
Attributes: - type : str
Either ‘association’ or ‘correlation’; the class of the pairwise stats
- metric : str
the metric of either class of pairwise stats: ‘spearman’, ‘pearson’, etc. for correlation; ‘mutualInfo’, ‘cramersV’ for association
Feature Association Matrix Details¶
-
class
datarobot.models.feature_association.
FeatureAssociationMatrixDetails
(feature1=None, feature2=None)¶ Plotting details for a pair of passed features present in the feature association matrix
Attributes: - feature1 : str
Feature name for the first feature of interest
- feature2 : str
Feature name for the second feature of interest
Feature Association Featurelists¶
-
class
datarobot.models.feature_association.
FeatureAssociationFeaturelists
¶ Get project featurelists and see if they have association statistics
Rating Table¶
-
class
datarobot.models.
RatingTable
(id, rating_table_name, original_filename, project_id, parent_model_id, model_id=None, model_job_id=None, validation_job_id=None, validation_error=None)¶ Interface to modify and download rating tables.
Attributes: - id : str
The id of the rating table.
- project_id : str
The id of the project this rating table belongs to.
- rating_table_name : str
The name of the rating table.
- original_filename : str
The name of the file used to create the rating table.
- parent_model_id : str
The model id of the model the rating table was validated against.
- model_id : str
The model id of the model that was created from the rating table. Can be None if a model has not been created from the rating table.
- model_job_id : str
The id of the job to create a model from this rating table. Can be None if a model has not been created from the rating table.
- validation_job_id : str
The id of the created job to validate the rating table. Can be None if the rating table has not been validated.
- validation_error : str
Contains a description of any errors caused during validation.
-
classmethod
get
(project_id, rating_table_id)¶ Retrieve a single rating table
Parameters: - project_id : str
The ID of the project the rating table is associated with.
- rating_table_id : str
The ID of the rating table
Returns: - rating_table : RatingTable
The queried instance
-
classmethod
create
(project_id, parent_model_id, filename, rating_table_name='Uploaded Rating Table')¶ Uploads and validates a new rating table CSV
Parameters: - project_id : str
id of the project the rating table belongs to
- parent_model_id : str
id of the model against which this rating table should be validated
- filename : str
The path of the CSV file containing the modified rating table.
- rating_table_name : str, optional
A human friendly name for the new rating table. The string may be truncated and a suffix may be added to maintain unique names of all rating tables.
Returns: - job: Job
an instance of created async job
Raises: - InputNotUnderstoodError
Raised if filename isn’t one of supported types.
- ClientError (400)
Raised if parent_model_id is invalid.
-
download
(filepath)¶ Download a csv file containing the contents of this rating table
Parameters: - filepath : str
The path at which to save the rating table file.
-
rename
(rating_table_name)¶ Renames a rating table to a different name.
Parameters: - rating_table_name : str
The new name to rename the rating table to.
-
create_model
()¶ Creates a new model from this rating table record. This rating table must not already be associated with a model and must be valid.
Returns: - job: Job
an instance of created async job
Raises: - ClientError (422)
Raised if creating model from a RatingTable that failed validation
- JobAlreadyRequested
Raised if creating model from a RatingTable that is already associated with a RatingTableModel
Reason Codes (Deprecated)¶
This interface is considered deprecated. Please use PredictionExplanations instead.
-
class
datarobot.
ReasonCodesInitialization
(project_id, model_id, reason_codes_sample=None)¶ Represents a reason codes initialization of a model.
Attributes: - project_id : str
id of the project the model belongs to
- model_id : str
id of the model reason codes initialization is for
- reason_codes_sample : list of dict
a small sample of reason codes that could be generated for the model
-
classmethod
get
(project_id, model_id)¶ Retrieve the reason codes initialization for a model.
Reason codes initializations are a prerequisite for computing reason codes, and include a sample of what the computed reason codes for a prediction dataset would look like.
Parameters: - project_id : str
id of the project the model belongs to
- model_id : str
id of the model reason codes initialization is for
Returns: - reason_codes_initialization : ReasonCodesInitialization
The queried instance.
Raises: - ClientError (404)
If the project or model does not exist or the initialization has not been computed.
-
classmethod
create
(project_id, model_id)¶ Create a reason codes initialization for the specified model.
Parameters: - project_id : str
id of the project the model belongs to
- model_id : str
id of the model for which initialization is requested
Returns: - job : Job
an instance of created async job
-
delete
()¶ Delete this reason codes initialization.
-
class
datarobot.
ReasonCodes
(id, project_id, model_id, dataset_id, max_codes, num_columns, finish_time, reason_codes_location, threshold_low=None, threshold_high=None)¶ Represents reason codes metadata and provides access to computation results.
Examples
reason_codes = dr.ReasonCodes.get(project_id, reason_codes_id)
for row in reason_codes.get_rows():
    print(row)  # row is an instance of ReasonCodesRow
Attributes: - id : str
id of the record and reason codes computation result
- project_id : str
id of the project the model belongs to
- model_id : str
id of the model reason codes initialization is for
- dataset_id : str
id of the prediction dataset reason codes were computed for
- max_codes : int
maximum number of reason codes to supply per row of the dataset
- threshold_low : float
the lower threshold, below which a prediction must score in order for reason codes to be computed for a row in the dataset
- threshold_high : float
the high threshold, above which a prediction must score in order for reason codes to be computed for a row in the dataset
- num_columns : int
the number of columns reason codes were computed for
- finish_time : float
timestamp referencing when computation for these reason codes finished
- reason_codes_location : str
where to retrieve the reason codes
-
classmethod
get
(project_id, reason_codes_id)¶ Retrieve a specific reason codes record.
Parameters: - project_id : str
id of the project the model belongs to
- reason_codes_id : str
id of the reason codes
Returns: - reason_codes : ReasonCodes
The queried instance.
-
classmethod
create
(project_id, model_id, dataset_id, max_codes=None, threshold_low=None, threshold_high=None)¶ Create reason codes for the specified dataset.
In order to create ReasonCodesPage for a particular model and dataset, you must first:
- Compute feature impact for the model via
datarobot.Model.get_feature_impact()
- Compute a ReasonCodesInitialization for the model via
datarobot.ReasonCodesInitialization.create(project_id, model_id)
- Compute predictions for the model and dataset via
datarobot.Model.request_predictions(dataset_id)
threshold_high and threshold_low are optional filters applied to speed up computation. When at least one is specified, only the selected outlier rows will have reason codes computed. Rows are considered to be outliers if their predicted value (in case of regression projects) or probability of being the positive class (in case of classification projects) is less than threshold_low or greater than threshold_high. If neither is specified, reason codes will be computed for all rows.
Parameters: - project_id : str
id of the project the model belongs to
- model_id : str
id of the model for which reason codes are requested
- dataset_id : str
id of the prediction dataset for which reason codes are requested
- threshold_low : float, optional
the lower threshold, below which a prediction must score in order for reason codes to be computed for a row in the dataset. If neither threshold_high nor threshold_low is specified, reason codes will be computed for all rows.
- threshold_high : float, optional
the high threshold, above which a prediction must score in order for reason codes to be computed. If neither threshold_high nor threshold_low is specified, reason codes will be computed for all rows.
- max_codes : int, optional
the maximum number of reason codes to supply per row of the dataset, default: 3.
Returns: - job: Job
an instance of created async job
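The outlier rule for threshold_low and threshold_high can be illustrated with a short sketch. This mirrors the description above, not the server-side implementation; the function name row_gets_reason_codes is hypothetical, and the handling of a single threshold is an interpretation of "when at least one is specified".

```python
def row_gets_reason_codes(prediction, threshold_low=None, threshold_high=None):
    """Return True if a row with this prediction would have reason codes computed.

    prediction is the predicted value (regression) or the probability of the
    positive class (classification).
    """
    # With no thresholds, every row is selected.
    if threshold_low is None and threshold_high is None:
        return True
    # Otherwise only outliers are selected: strictly below the low threshold
    # or strictly above the high threshold.
    below = threshold_low is not None and prediction < threshold_low
    above = threshold_high is not None and prediction > threshold_high
    return below or above


predictions = [0.05, 0.40, 0.95]
selected = [p for p in predictions
            if row_gets_reason_codes(p, threshold_low=0.1, threshold_high=0.9)]
```

Here only the first and last predictions fall outside the [0.1, 0.9] band, so only those rows are selected.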
-
classmethod
list
(project_id, model_id=None, limit=None, offset=None)¶ List of reason codes for a specified project.
Parameters: - project_id : str
id of the project to list reason codes for
- model_id : str, optional
if specified, only reason codes computed for this model will be returned
- limit : int or None
at most this many results are returned, default: no limit
- offset : int or None
this many results will be skipped, default: 0
Returns: - reason_codes : list[ReasonCodes]
-
get_rows
(batch_size=None, exclude_adjusted_predictions=True)¶ Retrieve reason codes rows.
Parameters: - batch_size : int
maximum number of reason codes rows to retrieve per request
- exclude_adjusted_predictions : bool
Optional, defaults to True. Set to False to include adjusted predictions, which will differ from the predictions on some projects, e.g. those with an exposure column specified.
Yields: - reason_codes_row : ReasonCodesRow
Represents reason codes computed for a prediction row.
-
get_all_as_dataframe
(exclude_adjusted_predictions=True)¶ Retrieve all reason codes rows and return them as a pandas.DataFrame.
Returned dataframe has the following structure:
- row_id : row id from prediction dataset
- prediction : the output of the model for this row
- adjusted_prediction : adjusted prediction values (only appears for projects that utilize prediction adjustments, e.g. projects with an exposure column)
- class_0_label : a class level from the target (only appears for classification projects)
- class_0_probability : the probability that the target is this class (only appears for classification projects)
- class_1_label : a class level from the target (only appears for classification projects)
- class_1_probability : the probability that the target is this class (only appears for classification projects)
- reason_0_feature : the name of the feature contributing to the prediction for this reason
- reason_0_feature_value : the value the feature took on
- reason_0_label : the output being driven by this reason. For regression projects, this is the name of the target feature. For classification projects, this is the class label whose probability increasing would correspond to a positive strength.
- reason_0_qualitative_strength : a human-readable description of how strongly the feature affected the prediction (e.g. ‘+++’, ‘--’, ‘+’) for this reason
- reason_0_strength : the amount this feature’s value affected the prediction
- …
- reason_N_feature : the name of the feature contributing to the prediction for this reason
- reason_N_feature_value : the value the feature took on
- reason_N_label : the output being driven by this reason. For regression projects, this is the name of the target feature. For classification projects, this is the class label whose probability increasing would correspond to a positive strength.
- reason_N_qualitative_strength : a human-readable description of how strongly the feature affected the prediction (e.g. ‘+++’, ‘--’, ‘+’) for this reason
- reason_N_strength : the amount this feature’s value affected the prediction
Parameters: - exclude_adjusted_predictions : bool
Optional, defaults to True. Set this to False to include adjusted prediction values in the returned dataframe.
Returns: - dataframe: pandas.DataFrame
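The column layout listed above can be illustrated by flattening one row into a flat record. This is only a sketch of the naming scheme, not the library's implementation; the function name flatten_reason_codes_row and the sample values are hypothetical, and the input dict is assumed to follow the PredictionValue and ReasonCode schemas described for ReasonCodesRow.

```python
def flatten_reason_codes_row(row, is_classification=False):
    """Flatten one reason codes row (a dict) into dataframe-style columns."""
    flat = {"row_id": row["row_id"], "prediction": row["prediction"]}
    # class_<i>_* columns only appear for classification projects.
    if is_classification:
        for i, value in enumerate(row["prediction_values"]):
            flat["class_{}_label".format(i)] = value["label"]
            flat["class_{}_probability".format(i)] = value["value"]
    # One group of reason_<i>_* columns per reason code.
    for i, reason in enumerate(row["reason_codes"]):
        prefix = "reason_{}_".format(i)
        for key in ("feature", "feature_value", "label",
                    "qualitative_strength", "strength"):
            flat[prefix + key] = reason[key]
    return flat


# Hypothetical sample row for a binary classification project.
sample_row = {
    "row_id": 7,
    "prediction": 0.9,
    "prediction_values": [{"label": "yes", "value": 0.9},
                          {"label": "no", "value": 0.1}],
    "reason_codes": [{"label": "yes", "feature": "age", "feature_value": 42,
                      "strength": 0.31, "qualitative_strength": "++"}],
}
flat_row = flatten_reason_codes_row(sample_row, is_classification=True)
```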
-
download_to_csv
(filename, encoding='utf-8', exclude_adjusted_predictions=True)¶ Save reason codes rows into CSV file.
Parameters: - filename : str or file object
path or file object to save reason codes rows
- encoding : string, optional
A string representing the encoding to use in the output file, defaults to ‘utf-8’
- exclude_adjusted_predictions : bool
Optional, defaults to True. Set to False to include adjusted predictions, which will differ from the predictions on some projects, e.g. those with an exposure column specified.
-
get_reason_codes_page
(limit=None, offset=None, exclude_adjusted_predictions=True)¶ Get reason codes.
If you don’t want to use a generator interface, you can access paginated reason codes directly.
Parameters: - limit : int or None
the number of records to return, the server will use a (possibly finite) default if not specified
- offset : int or None
the number of records to skip, default 0
- exclude_adjusted_predictions : bool
Optional, defaults to True. Set to False to include adjusted predictions, which will differ from the predictions on some projects, e.g. those with an exposure column specified.
Returns: - reason_codes : ReasonCodesPage
-
delete
()¶ Delete this reason codes record.
-
class
datarobot.models.reason_codes.
ReasonCodesRow
(row_id, prediction, prediction_values, reason_codes=None, adjusted_prediction=None, adjusted_prediction_values=None)¶ Represents reason codes computed for a prediction row.
Notes
PredictionValue contains:
- label : describes what this model output corresponds to. For regression projects, it is the name of the target feature. For classification projects, it is a level from the target feature.
- value : the output of the prediction. For regression projects, it is the predicted value of the target. For classification projects, it is the predicted probability the row belongs to the class identified by the label.
ReasonCode contains:
- label : describes what output was driven by this reason code. For regression projects, it is the name of the target feature. For classification projects, it is the class whose probability increasing would correspond to a positive strength of this reason code.
- feature : the name of the feature contributing to the prediction
- feature_value : the value the feature took on for this row
- strength : the amount this feature’s value affected the prediction
- qualitative_strength : a human-readable description of how strongly the feature affected the prediction (e.g. ‘+++’, ‘--’, ‘+’)
Attributes: - row_id : int
which row this ReasonCodesRow describes
- prediction : float
the output of the model for this row
- adjusted_prediction : float or None
adjusted prediction value for projects that provide this information, None otherwise
- prediction_values : list
an array of dictionaries with a schema described as
PredictionValue
- adjusted_prediction_values : list
same as prediction_values but for adjusted predictions
- reason_codes : list
an array of dictionaries with a schema described as
ReasonCode
-
class
datarobot.models.reason_codes.
ReasonCodesPage
(id, count=None, previous=None, next=None, data=None, reason_codes_record_location=None, adjustment_method=None)¶ Represents batch of reason codes received by one request.
Attributes: - id : str
id of the reason codes computation result
- data : list[dict]
list of raw reason codes, each row corresponds to a row of the prediction dataset
- count : int
total number of rows computed
- previous_page : str
where to retrieve previous page of reason codes, None if current page is the first
- next_page : str
where to retrieve next page of reason codes, None if current page is the last
- reason_codes_record_location : str
where to retrieve the reason codes metadata
- adjustment_method : str
Adjustment method that was applied to predictions, or ‘N/A’ if no adjustments were done.
-
classmethod
get
(project_id, reason_codes_id, limit=None, offset=0, exclude_adjusted_predictions=True)¶ Retrieve reason codes.
Parameters: - project_id : str
id of the project the model belongs to
- reason_codes_id : str
id of the reason codes
- limit : int or None
the number of records to return, the server will use a (possibly finite) default if not specified
- offset : int or None
the number of records to skip, default 0
- exclude_adjusted_predictions : bool
Optional, defaults to True. Set to False to include adjusted predictions, which will differ from the predictions on some projects, e.g. those with an exposure column specified.
Returns: - reason_codes : ReasonCodesPage
The queried instance.
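Walking through all pages with limit and offset can be sketched as below. This is an illustration under assumptions: get_page is a hypothetical callable (for example a thin wrapper around get_reason_codes_page) that returns an object with the data and next_page attributes described above, and the Page tuple and fake_get_page are stand-ins so the sketch runs offline.

```python
from collections import namedtuple

# Stand-in for ReasonCodesPage carrying only the two attributes the loop needs.
Page = namedtuple("Page", ["data", "next_page"])


def iter_all_rows(get_page, page_size=100):
    """Yield every raw row by requesting pages until next_page is None."""
    offset = 0
    while True:
        page = get_page(limit=page_size, offset=offset)
        for row in page.data:
            yield row
        # Stop on the last page, or on an empty page to avoid looping forever.
        if page.next_page is None or not page.data:
            break
        offset += len(page.data)


# Hypothetical in-memory source with five rows.
_ROWS = [0, 1, 2, 3, 4]


def fake_get_page(limit, offset):
    chunk = _ROWS[offset:offset + limit]
    has_more = offset + limit < len(_ROWS)
    return Page(data=chunk, next_page="next" if has_more else None)


all_rows = list(iter_all_rows(fake_get_page, page_size=2))
```

The get_rows generator documented above already covers the common case; a manual loop like this is mainly useful when each page needs separate handling.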
Recommended Models¶
-
class
datarobot.models.
ModelRecommendation
(project_id, model_id, recommendation_type)¶ A collection of information about a recommended model for a project.
Attributes: - project_id : str
the id of the project the model belongs to
- model_id : str
the id of the recommended model
- recommendation_type : str
the type of model recommendation
-
classmethod
get
(project_id, recommendation_type=None)¶ Retrieves the default recommendation, or the one specified by recommendation_type.
Parameters: - project_id : str
The project’s id.
- recommendation_type : enums.RECOMMENDED_MODEL_TYPE
The type of recommendation to get. If None, returns the default recommendation.
Returns: - recommended_model : ModelRecommendation
-
classmethod
get_all
(project_id)¶ Retrieves all of the current recommended models for the project.
Parameters: - project_id : str
The project’s id.
Returns: - recommended_models : list of ModelRecommendation
-
classmethod
get_recommendation
(recommended_models, recommendation_type)¶ Returns the model in the given list with the requested type.
Parameters: - recommended_models : list of ModelRecommendation
- recommendation_type : enums.RECOMMENDED_MODEL_TYPE
the type of model to extract from the recommended_models list
Returns: - recommended_model : ModelRecommendation or None if no model with the requested type exists
-
get_model
()¶ Returns the Model associated with this ModelRecommendation.
Returns: - recommended_model : Model
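The behaviour documented for get_recommendation (return the entry with the requested type, or None) can be sketched in plain Python. This is an illustration, not the library's code; the Recommendation tuple, the function name pick_recommendation and the type strings are hypothetical placeholders, and real code should use the members of enums.RECOMMENDED_MODEL_TYPE.

```python
from collections import namedtuple

# Stand-in carrying the three documented ModelRecommendation attributes.
Recommendation = namedtuple(
    "Recommendation", ["project_id", "model_id", "recommendation_type"])


def pick_recommendation(recommended_models, recommendation_type):
    """Return the first recommendation of the requested type, or None."""
    for recommendation in recommended_models:
        if recommendation.recommendation_type == recommendation_type:
            return recommendation
    return None


# Placeholder type names, not the real enum values.
candidates = [
    Recommendation("project-id", "model-a", "TYPE_A"),
    Recommendation("project-id", "model-b", "TYPE_B"),
]
chosen = pick_recommendation(candidates, "TYPE_B")
missing = pick_recommendation(candidates, "TYPE_C")
```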
ROC Curve¶
-
class
datarobot.models.roc_curve.
RocCurve
(source, roc_points, negative_class_predictions, positive_class_predictions, source_model_id)¶ ROC curve data for model.
Attributes: - source : str
ROC curve data source. Can be ‘validation’, ‘crossValidation’ or ‘holdout’.
- roc_points : list of dict
List of precalculated metrics associated with thresholds for ROC curve.
- negative_class_predictions : list of float
List of predictions from example for negative class
- positive_class_predictions : list of float
List of predictions from example for positive class
- source_model_id : str
ID of the model this ROC curve represents; in some cases, insights from the parent of a frozen model may be used
-
estimate_threshold
(threshold)¶ Return metrics estimation for given threshold.
Parameters: - threshold : float from [0, 1] interval
Threshold we want estimation for
Returns: - dict
Dictionary of estimated metrics in form of {metric_name: metric_value}. Metrics are ‘accuracy’, ‘f1_score’, ‘false_negative_score’, ‘true_negative_score’, ‘true_negative_rate’, ‘matthews_correlation_coefficient’, ‘true_positive_score’, ‘positive_predictive_value’, ‘false_positive_score’, ‘false_positive_rate’, ‘negative_predictive_value’, ‘true_positive_rate’.
Raises: - ValueError
Given threshold isn’t from [0, 1] interval
-
get_best_f1_threshold
()¶ Return the threshold value that corresponds to the maximum F1 score. This is the threshold that is preselected in DataRobot when you open the “ROC curve” tab.
Returns: - float
Threshold with best F1 score.
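How a best-F1 threshold can be picked from a list of ROC points is sketched below. This is not the library's implementation: it assumes each point is a dict with a threshold key alongside the metric names listed for estimate_threshold, and the function names and sample counts are hypothetical. F1 is computed with the standard formula 2TP / (2TP + FP + FN).

```python
def f1_from_counts(true_positives, false_positives, false_negatives):
    """Standard F1 score from confusion-matrix counts."""
    denominator = 2 * true_positives + false_positives + false_negatives
    if denominator == 0:
        return 0.0
    return 2.0 * true_positives / denominator


def best_f1_threshold(roc_points):
    """Return the threshold of the point with the highest f1_score."""
    best_point = max(roc_points, key=lambda point: point["f1_score"])
    return best_point["threshold"]


# Hypothetical (threshold, TP, FP, FN) samples.
samples = [(0.3, 45, 30, 5), (0.5, 40, 10, 10), (0.7, 30, 5, 20)]
points = [
    {
        "threshold": threshold,
        "true_positive_score": tp,
        "false_positive_score": fp,
        "false_negative_score": fn,
        "f1_score": f1_from_counts(tp, fp, fn),
    }
    for threshold, tp, fp, fn in samples
]
best_threshold = best_f1_threshold(points)
```

With these counts the F1 scores are 0.72, 0.8 and roughly 0.706, so the threshold 0.5 is selected.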
SharingAccess¶
-
class
datarobot.
SharingAccess
(username, role, can_share=None, user_id=None)¶ Represents metadata about whom an entity (e.g. a data store) has been shared with
New in version v2.14.
Currently DataStores, DataSources, Projects (new in version v2.15) and CalendarFiles (new in version 2.15) can be shared.
This class can represent either access that has already been granted, or be used to grant access to additional users.
Attributes: - username : str
a particular user
- role : str or None
if a string, represents a particular level of access and should be one of datarobot.enums.SHARING_ROLE. For more information on the specific access levels, see the sharing documentation. If None, can be passed to a share function to revoke access for a specific user.
- can_share : bool or None
if a bool, indicates whether this user is permitted to further share. When False, the user has access to the entity, but can only revoke their own access and cannot modify any user’s access role. When True, the user can share with any other user at an access role up to their own. May be None if the SharingAccess was not retrieved from the DataRobot server but intended to be passed into a share function; this will be equivalent to passing True.
- user_id : str
the id of the user
Training Predictions¶
-
class
datarobot.models.training_predictions.
TrainingPredictionsIterator
(client, path, limit=None)¶ Lazily fetches training predictions from DataRobot API in chunks of specified size and then iterates rows from responses as named tuples. Each row represents a training prediction computed for a dataset’s row. Each named tuple has the following structure:
Notes
Each PredictionValue dict contains these keys:
- label : describes what this model output corresponds to. For regression projects, it is the name of the target feature. For classification and multiclass projects, it is a label from the target feature.
- value : the output of the prediction. For regression projects, it is the predicted value of the target. For classification and multiclass projects, it is the predicted probability that the row belongs to the class identified by the label.
Examples
import datarobot as dr

# Fetch existing training predictions by their id
training_predictions = dr.TrainingPredictions.get(project_id, prediction_id)

# Iterate over predictions
for row in training_predictions.iterate_rows():
    print(row.row_id, row.prediction)
Attributes: - row_id : int
id of the record in original dataset for which training prediction is calculated
- partition_id : str or float
id of the data partition that the row belongs to
- prediction : float
the model’s prediction for this data row
- prediction_values : list of dictionaries
an array of dictionaries with a schema described as
PredictionValue
- timestamp : str or None
(New in version v2.11) an ISO string representing the time of the prediction in time series project; may be None for non-time series projects
- forecast_point : str or None
(New in version v2.11) an ISO string representing the point in time used as a basis to generate the predictions in time series project; may be None for non-time series projects
- forecast_distance : str or None
(New in version v2.11) how many time steps are between the forecast point and the timestamp in time series project; None for non-time series projects
- series_id : str or None
(New in version v2.11) the id of the series in a multiseries project; may be NaN for single series projects; None for non-time series projects
-
class
datarobot.models.training_predictions.
TrainingPredictions
(project_id, prediction_id, model_id=None, data_subset=None)¶ Represents training predictions metadata and provides access to prediction results.
Examples
Compute training predictions for a model on the whole dataset
import datarobot as dr

# Request calculation of training predictions
training_predictions_job = model.request_training_predictions(dr.enums.DATA_SUBSET.ALL)
training_predictions = training_predictions_job.get_result_when_complete()
print('Training predictions {} are ready'.format(training_predictions.prediction_id))

# Iterate over actual predictions
for row in training_predictions.iterate_rows():
    print(row.row_id, row.partition_id, row.prediction)
List all training predictions for a project
import datarobot as dr

# Fetch all training predictions for a project
all_training_predictions = dr.TrainingPredictions.list(project_id)

# Inspect all calculated training predictions
for training_predictions in all_training_predictions:
    print(
        'Prediction {} is made for data subset "{}"'.format(
            training_predictions.prediction_id,
            training_predictions.data_subset,
        )
    )
Retrieve training predictions by id
import datarobot as dr

# Getting training predictions by id
training_predictions = dr.TrainingPredictions.get(project_id, prediction_id)

# Iterate over actual predictions
for row in training_predictions.iterate_rows():
    print(row.row_id, row.partition_id, row.prediction)
Attributes: - project_id : str
id of the project the model belongs to
- model_id : str
id of the model
- prediction_id : str
id of generated predictions
- data_subset : datarobot.enums.DATA_SUBSET
data set definition used to build predictions. Choices are:
- datarobot.enums.DATA_SUBSET.ALL
- for all data available. Not valid for models in datetime partitioned projects.
- datarobot.enums.DATA_SUBSET.VALIDATION_AND_HOLDOUT
- for all data except training set. Not valid for models in datetime partitioned projects.
- datarobot.enums.DATA_SUBSET.HOLDOUT
- for holdout data set only.
- datarobot.enums.DATA_SUBSET.ALL_BACKTESTS
- for downloading the predictions for all backtest validation folds. Requires the model to have successfully scored all backtests. Datetime partitioned projects only.
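The validity rules for the data_subset choices above can be summarised in a small helper. This is a sketch of the documented restrictions only: the function name valid_data_subsets is hypothetical, the strings are the enum member names rather than the enum values, and HOLDOUT is assumed to be valid for both project kinds because no restriction is listed for it.

```python
def valid_data_subsets(is_datetime_partitioned):
    """Return the DATA_SUBSET member names usable for a project kind."""
    # HOLDOUT carries no documented restriction (assumed valid everywhere).
    subsets = ["HOLDOUT"]
    if is_datetime_partitioned:
        # Backtest predictions exist only for datetime partitioned projects.
        subsets.append("ALL_BACKTESTS")
    else:
        # Both of these are documented as not valid for datetime partitioning.
        subsets.extend(["ALL", "VALIDATION_AND_HOLDOUT"])
    return subsets


datetime_choices = valid_data_subsets(True)
regular_choices = valid_data_subsets(False)
```

ALL_BACKTESTS additionally requires the model to have successfully scored all backtests, which this helper does not check.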
-
classmethod
list
(project_id)¶ Fetch all the computed training predictions for a project.
Parameters: - project_id : str
id of the project
Returns: - A list of TrainingPredictions objects
-
classmethod
get
(project_id, prediction_id)¶ Retrieve training predictions on a specified data set.
Parameters: - project_id : str
id of the project the model belongs to
- prediction_id : str
id of the prediction set
Returns: - TrainingPredictions object which is ready to operate with specified predictions
-
iterate_rows
(batch_size=None)¶ Retrieve training prediction rows as an iterator.
Parameters: - batch_size : int, optional
maximum number of training prediction rows to fetch per request
Returns: - iterator :
TrainingPredictionsIterator
an iterator which yields named tuples representing training prediction rows
-
get_all_as_dataframe
(class_prefix='class_', serializer='json')¶ Retrieve all training prediction rows and return them as a pandas.DataFrame.
Returned dataframe has the following structure:
- row_id : row id from the original dataset
- prediction : the model’s prediction for this row
- class_<label> : the probability that the target is this class (only appears for classification and multiclass projects)
- timestamp : the time of the prediction (only appears for out of time validation or time series projects)
- forecast_point : the point in time used as a basis to generate the predictions (only appears for time series projects)
- forecast_distance : how many time steps are between timestamp and forecast_point (only appears for time series projects)
- series_id : the id of the series in a multiseries project or None for a single series project (only appears for time series projects)
Parameters: - class_prefix : str, optional
The prefix to prepend to labels in the final dataframe. Default is class_ (e.g., apple -> class_apple)
- serializer : str, optional
Serializer to use for the download. Options: json (default) or csv.
Returns: - dataframe: pandas.DataFrame
-
download_to_csv
(filename, encoding='utf-8', serializer='json')¶ Save training prediction rows into CSV file.
Parameters: - filename : str or file object
path or file object to save training prediction rows
- encoding : string, optional
A string representing the encoding to use in the output file, defaults to ‘utf-8’
- serializer : str, optional
Serializer to use for the download. Options: json (default) or csv.
Word Cloud¶
-
class
datarobot.models.word_cloud.
WordCloud
(ngrams)¶ Word cloud data for the model.
Notes
WordCloudNgram is a dict containing the following:
- ngram (str) Word or ngram value.
- coefficient (float) Value from [-1.0, 1.0] range, describes effect of this ngram on the target. A large negative value means a strong effect toward the negative class in classification and a smaller target value in regression models. A large positive value means a strong effect toward the positive class and a bigger target value respectively.
- count (int) Number of rows in the training sample where this ngram appears.
- frequency (float) Value from (0.0, 1.0] range, relative frequency of given ngram to most frequent ngram.
- is_stopword (bool) True for ngrams that DataRobot evaluates as stopwords.
- class (str or None) For classification - values of the target class for corresponding word or ngram. For regression - None.
Attributes: - ngrams : list of dicts
List of dicts with schema described as
WordCloudNgram
above.
-
most_frequent
(top_n=5)¶ Return most frequent ngrams in the word cloud.
Parameters: - top_n : int
Number of ngrams to return
Returns: - list of dict
Up to top_n most frequent ngrams in the word cloud. If top_n is bigger than the total number of ngrams in the word cloud, all ngrams are returned, sorted by frequency in descending order.
-
most_important
(top_n=5)¶ Return most important ngrams in the word cloud.
Parameters: - top_n : int
Number of ngrams to return
Returns: - list of dict
Up to top_n most important ngrams in the word cloud. If top_n is bigger than the total number of ngrams in the word cloud, all ngrams are returned, sorted by absolute coefficient value in descending order.
-
ngrams_per_class
()¶ Split ngrams per target class values. Useful for multiclass models.
Returns: - dict
Dictionary in the format of (class label) -> (list of ngrams for that class)
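The three WordCloud helpers can be approximated on a plain list of WordCloudNgram dicts, which may help when working with raw ngram data. This sketch follows the orderings described above (frequency, absolute coefficient, grouping by class) and is not the library's implementation; the sample ngrams are hypothetical.

```python
from collections import defaultdict


def most_frequent(ngrams, top_n=5):
    """Up to top_n ngrams, sorted by frequency in descending order."""
    return sorted(ngrams, key=lambda n: n["frequency"], reverse=True)[:top_n]


def most_important(ngrams, top_n=5):
    """Up to top_n ngrams, sorted by absolute coefficient, descending."""
    return sorted(ngrams, key=lambda n: abs(n["coefficient"]), reverse=True)[:top_n]


def ngrams_per_class(ngrams):
    """Group ngrams into {class label: [ngrams for that class]}."""
    grouped = defaultdict(list)
    for ngram in ngrams:
        grouped[ngram["class"]].append(ngram)
    return dict(grouped)


# Hypothetical sample following the WordCloudNgram schema.
sample_ngrams = [
    {"ngram": "refund", "coefficient": -0.8, "count": 40,
     "frequency": 0.4, "is_stopword": False, "class": "neg"},
    {"ngram": "the", "coefficient": 0.05, "count": 100,
     "frequency": 1.0, "is_stopword": True, "class": "pos"},
    {"ngram": "great", "coefficient": 0.6, "count": 70,
     "frequency": 0.7, "is_stopword": False, "class": "pos"},
]
frequent_words = [n["ngram"] for n in most_frequent(sample_ngrams, top_n=2)]
important_words = [n["ngram"] for n in most_important(sample_ngrams, top_n=2)]
words_by_class = {label: [n["ngram"] for n in items]
                  for label, items in ngrams_per_class(sample_ngrams).items()}
```

Sorting by absolute coefficient puts the strongly negative "refund" ahead of "great", while the stopword "the" ranks first by frequency only.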
Safer¶
-
class
datarobot.models.
SecondaryDatasetConfigurations
(id=None, project_id=None, config=None)¶ Create secondary dataset configurations for a given project
New in version v2.20.
Attributes: - id : str
id of this secondary dataset configuration
- project_id : str
id of the associated project.
- config: list of DatasetConfiguration
list of secondary dataset configurations
-
classmethod
create
(project_id, dataset_configurations)¶ Create secondary dataset configurations
New in version v2.20.
Parameters: - project_id : str
id of the associated project.
- dataset_configurations: list of DatasetConfiguration
list of dataset configurations
Returns: - an instance of SecondaryDatasetConfigurations
Raises: - ClientError
raised if incorrect configuration parameters are provided