API Reference¶

API Object¶

class datarobot.models.api_object.APIObject¶

classmethod from_data(data)¶

Instantiate an object of this class using a dict.

Parameters

datadict: Correctly snake_cased keys and their values.

Return type: TypeVar(T, bound= APIObject)

classmethod from_server_data(data, keep_attrs=None)¶

Instantiate an object of this class using the data directly from the server, meaning that the keys may have the wrong camel casing

Parameters

datadict: The directly translated dict of JSON from the server. No casing fixes have taken place
keep_attrsiterable: List, set or tuple of the dotted namespace notations for attributes to keep within the object structure even if their values are None

Return type: TypeVar(T, bound= APIObject)

Advanced Options¶

class datarobot.helpers.AdvancedOptions(weights=None, response_cap=None, blueprint_threshold=None, seed=None, smart_downsampled=None, majority_downsampling_rate=None, offset=None, exposure=None, accuracy_optimized_mb=None, scaleout_modeling_mode=None, events_count=None, monotonic_increasing_featurelist_id=None, monotonic_decreasing_featurelist_id=None, only_include_monotonic_blueprints=None, allowed_pairwise_interaction_groups=None, blend_best_models=None, scoring_code_only=None, prepare_model_for_deployment=None, consider_blenders_in_recommendation=None, min_secondary_validation_model_count=None, shap_only_mode=None, autopilot_data_sampling_method=None, run_leakage_removed_feature_list=None, autopilot_with_feature_discovery=False, feature_discovery_supervised_feature_reduction=None, exponentially_weighted_moving_alpha=None, external_time_series_baseline_dataset_id=None, use_supervised_feature_reduction=True, primary_location_column=None, protected_features=None, preferable_target_value=None, fairness_metrics_set=None, fairness_threshold=None, bias_mitigation_feature_name=None, bias_mitigation_technique=None, include_bias_mitigation_feature_as_predictor_variable=None, default_monotonic_increasing_featurelist_id=None, default_monotonic_decreasing_featurelist_id=None, model_group_id=None, model_regime_id=None, model_baselines=None)¶

Used when setting the target of a project to set advanced options of modeling process.

Parameters

weightsstring, optional: The name of a column indicating the weight of each row
response_capbool or float in [0.5, 1), optional: Defaults to none here, but server defaults to False. If specified, it is the quantile of the response distribution to use for response capping.
blueprint_thresholdint, optional: Number of hours models are permitted to run before being excluded from later autopilot stages Minimum 1
seedint, optional: a seed to use for randomization
smart_downsampledbool, optional: whether to use smart downsampling to throw away excess rows of the majority class. Only applicable to classification and zero-boosted regression projects.
majority_downsampling_ratefloat, optional: the percentage between 0 and 100 of the majority rows that should be kept. Specify only if using smart downsampling. May not cause the majority class to become smaller than the minority class.
offsetlist of str, optional: (New in version v2.6) the list of the names of the columns containing the offset of each row
exposurestring, optional: (New in version v2.6) the name of a column containing the exposure of each row
accuracy_optimized_mbbool, optional: (New in version v2.6) Include additional, longer-running models that will be run by the autopilot and available to run manually.
scaleout_modeling_modestring, optional: (Deprecated in 2.28. Will be removed in 2.30) DataRobot no longer supports scaleout models. Please remove any usage of this parameter as it will be removed from the API soon.
events_countstring, optional: (New in version v2.8) the name of a column specifying events count.
monotonic_increasing_featurelist_idstring, optional: (new in version 2.11) the id of the featurelist that defines the set of features with a monotonically increasing relationship to the target. If None, no such constraints are enforced. When specified, this will set a default for the project that can be overriden at model submission time if desired.
monotonic_decreasing_featurelist_idstring, optional: (new in version 2.11) the id of the featurelist that defines the set of features with a monotonically decreasing relationship to the target. If None, no such constraints are enforced. When specified, this will set a default for the project that can be overriden at model submission time if desired.
only_include_monotonic_blueprintsbool, optional: (new in version 2.11) when true, only blueprints that support enforcing monotonic constraints will be available in the project or selected for the autopilot.
allowed_pairwise_interaction_groupslist of tuple, optional: (New in version v2.19) For GA2M models - specify groups of columns for which pairwise interactions will be allowed. E.g. if set to [(A, B, C), (C, D)] then GA2M models will allow interactions between columns AxB, BxC, AxC, CxD. All others (AxD, BxD) will not be considered.
blend_best_models: bool, optional: (New in version v2.19) blend best models during Autopilot run.
scoring_code_only: bool, optional: (New in version v2.19) Keep only models that can be converted to scorable java code during Autopilot run
shap_only_mode: bool, optional: (New in version v2.21) Keep only models that support SHAP values during Autopilot run. Use SHAP-based insights wherever possible. Defaults to False.
prepare_model_for_deployment: bool, optional: (New in version v2.19) Prepare model for deployment during Autopilot run. The preparation includes creating reduced feature list models, retraining best model on higher sample size, computing insights and assigning “RECOMMENDED FOR DEPLOYMENT” label.
consider_blenders_in_recommendation: bool, optional: (New in version 2.22.0) Include blenders when selecting a model to prepare for deployment in an Autopilot Run. Defaults to False.
min_secondary_validation_model_count: int, optional: (New in version v2.19) Compute “All backtest” scores (datetime models) or cross validation scores for the specified number of the highest ranking models on the Leaderboard, if over the Autopilot default.
autopilot_data_sampling_method: str, optional: (New in version v2.23) one of datarobot.enums.DATETIME_AUTOPILOT_DATA_SAMPLING_METHOD. Applicable for OTV projects only, defines if autopilot uses “random” or “latest” sampling when iteratively building models on various training samples. Defaults to “random” for duration-based projects and to “latest” for row-based projects.
run_leakage_removed_feature_list: bool, optional: (New in version v2.23) Run Autopilot on Leakage Removed feature list (if exists).
autopilot_with_feature_discovery: bool, default ``False``, optional: (New in version v2.23) If true, autopilot will run on a feature list that includes features found via search for interactions.
feature_discovery_supervised_feature_reduction: bool, optional: (New in version v2.23) Run supervised feature reduction for feature discovery projects.
exponentially_weighted_moving_alpha: float, optional: (New in version v2.26) defaults to None, value between 0 and 1 (inclusive), indicates alpha parameter used in exponentially weighted moving average within feature derivation window.
external_time_series_baseline_dataset_id: str, optional: (New in version v2.26) If provided, will generate metrics scaled by external model predictions metric for time series projects. The external predictions catalog must be validated before autopilot starts, see Project.validate_external_time_series_baseline and external baseline predictions documentation for further explanation.
use_supervised_feature_reduction: bool, default ``True` optional: Time Series only. When true, during feature generation DataRobot runs a supervised algorithm to retain only qualifying features. Setting to false can severely impact autopilot duration, especially for datasets with many features.
primary_location_column: str, optional.: The name of primary location column.
protected_features: list of str, optional.: (New in version v2.24) A list of project features to mark as protected for Bias and Fairness testing calculations. Max number of protected features allowed is 10.
preferable_target_value: str, optional.: (New in version v2.24) A target value that should be treated as a favorable outcome for the prediction. For example, if we want to check gender discrimination for giving a loan and our target is named is_bad, then the positive outcome for the prediction would be No, which means that the loan is good and that’s what we treat as a favorable result for the loaner.
fairness_metrics_set: str, optional.: (New in version v2.24) Metric to use for calculating fairness. Can be one of proportionalParity, equalParity, predictionBalance, trueFavorableAndUnfavorableRateParity or favorableAndUnfavorablePredictiveValueParity. Used and required only if Bias & Fairness in AutoML feature is enabled.
fairness_threshold: str, optional.: (New in version v2.24) Threshold value for the fairness metric. Can be in a range of [0.0, 1.0]. If the relative (i.e. normalized) fairness score is below the threshold, then the user will see a visual indication on the
bias_mitigation_feature_namestr, optional: The feature from protected features that will be used in a bias mitigation task to mitigate bias
bias_mitigation_techniquestr, optional: One of datarobot.enums.BiasMitigationTechnique Options: - ‘preprocessingReweighing’ - ‘postProcessingRejectionOptionBasedClassification’ The technique by which we’ll mitigate bias, which will inform which bias mitigation task we insert into blueprints
include_bias_mitigation_feature_as_predictor_variablebool, optional: Whether we should also use the mitigation feature as in input to the modeler just like any other categorical used for training, i.e. do we want the model to “train on” this feature in addition to using it for bias mitigation
default_monotonic_increasing_featurelist_idstr, optional: Returned from server on Project GET request - not able to be updated by user
default_monotonic_decreasing_featurelist_idstr, optional: Returned from server on Project GET request - not able to be updated by user
model_group_id: Optional[str] = None,: (New in version v3.3) The name of a column containing the model group ID for each row.
model_regime_id: Optional[str] = None,: (New in version v3.3) The name of a column containing the model regime ID for each row.
model_baselines: Optional[List[str]] = None,: (New in version v3.3) The list of the names of the columns containing the model baselines for each row.

Examples

import datarobot as dr
advanced_options = dr.AdvancedOptions(
    weights='weights_column',
    offset=['offset_column'],
    exposure='exposure_column',
    response_cap=0.7,
    blueprint_threshold=2,
    smart_downsampled=True, majority_downsampling_rate=75.0)

get(_AdvancedOptions__key, _AdvancedOptions__default=None)¶

Return the value for key if key is in the dictionary, else default.

Return type: Optional[Any]

pop(_AdvancedOptions__key)¶

If key is not found, d is returned if given, otherwise KeyError is raised

Return type: Optional[Any]

update_individual_options(**kwargs)¶

Update individual attributes of an instance of AdvancedOptions.

Return type: None

Anomaly Assessment¶

class datarobot.models.anomaly_assessment.AnomalyAssessmentRecord(status, status_details, start_date, end_date, prediction_threshold, preview_location, delete_location, latest_explanations_location, **record_kwargs)¶

Object which keeps metadata about anomaly assessment insight for the particular subset, backtest and series and the links to proceed to get the anomaly assessment data.

New in version v2.25.

Notes

Record contains:

record_id : the ID of the record.
project_id : the project ID of the record.
model_id : the model ID of the record.
backtest : the backtest of the record.
source : the source of the record.
series_id : the series id of the record for the multiseries projects.
status : the status of the insight.
status_details : the explanation of the status.
start_date : the ISO-formatted timestamp of the first prediction in the subset. Will be None if status is not AnomalyAssessmentStatus.COMPLETED.
end_date : the ISO-formatted timestamp of the last prediction in the subset. Will be None if status is not AnomalyAssessmentStatus.COMPLETED.
prediction_threshold : the threshold, all rows with anomaly scores greater or equal to it have shap explanations computed. Will be None if status is not AnomalyAssessmentStatus.COMPLETED.
preview_location : URL to retrieve predictions preview for the subset. Will be None if status is not AnomalyAssessmentStatus.COMPLETED.
latest_explanations_location : the URL to retrieve the latest predictions with the shap explanations. Will be None if status is not AnomalyAssessmentStatus.COMPLETED.
delete_location : the URL to delete anomaly assessment record and relevant insight data.

Attributes

record_id: str: The ID of the record.
project_id: str: The ID of the project record belongs to.
model_id: str: The ID of the model record belongs to.
backtest: int or “holdout”: The backtest of the record.
source: “training” or “validation”: The source of the record
series_id: str or None: The series id of the record for the multiseries projects. Defined only for the multiseries projects.
status: str: The status of the insight. One of datarobot.enums.AnomalyAssessmentStatus
status_details: str: The explanation of the status.
start_date: str or None: See start_date info in Notes for more details.
end_date: str or None: See end_date info in Notes for more details.
prediction_threshold: float or None: See prediction_threshold info in Notes for more details.
preview_location: str or None: See preview_location info in Notes for more details.
latest_explanations_location: str or None: See latest_explanations_location info in Notes for more details.
delete_location: str: The URL to delete anomaly assessment record and relevant insight data.

classmethod list(project_id, model_id, backtest=None, source=None, series_id=None, limit=100, offset=0, with_data_only=False)¶

Retrieve the list of the anomaly assessment records for the project and model. Output can be filtered and limited.

Parameters

project_id: str: The ID of the project record belongs to.
model_id: str: The ID of the model record belongs to.
backtest: int or “holdout”: The backtest to filter records by.
source: “training” or “validation”: The source to filter records by.
series_id: str, optional: The series id to filter records by. Can be specified for multiseries projects.
limit: int, optional: 100 by default. At most this many results are returned.
offset: int, optional: This many results will be skipped.
with_data_only: bool, False by default: Filter by status == AnomalyAssessmentStatus.COMPLETED. If True, records with no data or not supported will be omitted.

Returns

AnomalyAssessmentRecord: The anomaly assessment record.

Return type: List[AnomalyAssessmentRecord]

classmethod compute(project_id, model_id, backtest, source, series_id=None)¶

Request anomaly assessment insight computation on the specified subset.

Parameters

project_id: str: The ID of the project to compute insight for.
model_id: str: The ID of the model to compute insight for.
backtest: int or “holdout”: The backtest to compute insight for.
source: “training” or “validation”: The source to compute insight for.
series_id: str, optional: The series id to compute insight for. Required for multiseries projects.

Returns

AnomalyAssessmentRecord: The anomaly assessment record.

Return type: AnomalyAssessmentRecord

delete()¶

Delete anomaly assessment record with preview and explanations.

Return type: None

get_predictions_preview()¶

Retrieve aggregated predictions statistics for the anomaly assessment record.

Returns

AnomalyAssessmentPredictionsPreview

Return type: AnomalyAssessmentPredictionsPreview

get_latest_explanations()¶

Retrieve latest predictions along with shap explanations for the most anomalous records.

Returns

AnomalyAssessmentExplanations

Return type: AnomalyAssessmentExplanations

get_explanations(start_date=None, end_date=None, points_count=None)¶

Retrieve predictions along with shap explanations for the most anomalous records in the specified date range/for defined number of points. Two out of three parameters: start_date, end_date or points_count must be specified.

Parameters

start_date: str, optional: The start of the date range to get explanations in. Example: 2020-01-01T00:00:00.000000Z
end_date: str, optional: The end of the date range to get explanations in. Example: 2020-10-01T00:00:00.000000Z
points_count: int, optional: The number of the rows to return.

Returns

AnomalyAssessmentExplanations

Return type: AnomalyAssessmentExplanations

get_explanations_data_in_regions(regions, prediction_threshold=0.0)¶

Get predictions along with explanations for the specified regions, sorted by predictions in descending order.

Parameters

regions: list of preview_bins: For each region explanations will be retrieved and merged.
prediction_threshold: float, optional: If specified, only points with score greater or equal to the threshold will be returned.

Returns

dict in a form of {‘explanations’: explanations, ‘shap_base_value’: shap_base_value}

Return type: RegionExplanationsData

class datarobot.models.anomaly_assessment.AnomalyAssessmentExplanations(shap_base_value, data, start_date, end_date, count, **record_kwargs)¶

Object which keeps predictions along with shap explanations for the most anomalous records in the specified date range/for defined number of points.

New in version v2.25.

Notes

AnomalyAssessmentExplanations contains:

record_id : the id of the corresponding anomaly assessment record.
project_id : the project ID of the corresponding anomaly assessment record.
model_id : the model ID of the corresponding anomaly assessment record.
backtest : the backtest of the corresponding anomaly assessment record.
source : the source of the corresponding anomaly assessment record.
series_id : the series id of the corresponding anomaly assessment record for the multiseries projects.
start_date : the ISO-formatted first timestamp in the response. Will be None of there is no data in the specified range.
end_date : the ISO-formatted last timestamp in the response. Will be None of there is no data in the specified range.
count : The number of points in the response.
shap_base_value : the shap base value.
data : list of DataPoint objects in the specified date range.

DataPoint contains:

shap_explanation : None or an array of up to 10 ShapleyFeatureContribution objects. Only rows with the highest anomaly scores have Shapley explanations calculated. Value is None if prediction is lower than prediction_threshold.

timestamp (str) : ISO-formatted timestamp for the row.

prediction (float) : The output of the model for this row.

ShapleyFeatureContribution contains:

feature_value (str) : the feature value for this row. First 50 characters are returned.

strength (float) : the shap value for this feature and row.

feature (str) : the feature name.

Attributes

record_id: str: The ID of the record.
project_id: str: The ID of the project record belongs to.
model_id: str: The ID of the model record belongs to.
backtest: int or “holdout”: The backtest of the record.
source: “training” or “validation”: The source of the record.
series_id: str or None: The series id of the record for the multiseries projects. Defined only for the multiseries projects.
start_date: str or None: The ISO-formatted datetime of the first row in the data.
end_date: str or None: The ISO-formatted datetime of the last row in the data.
data: array of `data_point` objects or None: See data info in Notes for more details.
shap_base_value: float: Shap base value.
count: int: The number of points in the data.

classmethod get(project_id, record_id, start_date=None, end_date=None, points_count=None)¶

Retrieve predictions along with shap explanations for the most anomalous records in the specified date range/for defined number of points. Two out of three parameters: start_date, end_date or points_count must be specified.

Parameters

project_id: str: The ID of the project.
record_id: str: The ID of the anomaly assessment record.
start_date: str, optional: The start of the date range to get explanations in. Example: 2020-01-01T00:00:00.000000Z
end_date: str, optional: The end of the date range to get explanations in. Example: 2020-10-01T00:00:00.000000Z
points_count: int, optional: The number of the rows to return.

Returns

AnomalyAssessmentExplanations

Return type: AnomalyAssessmentExplanations

class datarobot.models.anomaly_assessment.AnomalyAssessmentPredictionsPreview(start_date, end_date, preview_bins, **record_kwargs)¶

Aggregated predictions over time for the corresponding anomaly assessment record. Intended to find the bins with highest anomaly scores.

New in version v2.25.

Notes

AnomalyAssessmentPredictionsPreview contains:

record_id : the id of the corresponding anomaly assessment record.
project_id : the project ID of the corresponding anomaly assessment record.
model_id : the model ID of the corresponding anomaly assessment record.
backtest : the backtest of the corresponding anomaly assessment record.
source : the source of the corresponding anomaly assessment record.
series_id : the series id of the corresponding anomaly assessment record for the multiseries projects.
start_date : the ISO-formatted timestamp of the first prediction in the subset.
end_date : the ISO-formatted timestamp of the last prediction in the subset.
preview_bins : list of PreviewBin objects. The aggregated predictions for the subset. Bins boundaries may differ from actual start/end dates because this is an aggregation.

PreviewBin contains:

start_date (str) : the ISO-formatted datetime of the start of the bin.
end_date (str) : the ISO-formatted datetime of the end of the bin.
avg_predicted (float or None) : the average prediction of the model in the bin. None if there are no entries in the bin.
max_predicted (float or None) : the maximum prediction of the model in the bin. None if there are no entries in the bin.
frequency (int) : the number of the rows in the bin.

Attributes

record_id: str: The ID of the record.
project_id: str: The ID of the project record belongs to.
model_id: str: The ID of the model record belongs to.
backtest: int or “holdout”: The backtest of the record.
source: “training” or “validation”: The source of the record
series_id: str or None: The series id of the record for the multiseries projects. Defined only for the multiseries projects.
start_date: str: the ISO-formatted timestamp of the first prediction in the subset.
end_date: str: the ISO-formatted timestamp of the last prediction in the subset.
preview_bins: list of preview_bin objects.: The aggregated predictions for the subset. See more info in Notes.

classmethod get(project_id, record_id)¶

Retrieve aggregated predictions over time.

Parameters

project_id: str: The ID of the project.
record_id: str: The ID of the anomaly assessment record.

Returns

AnomalyAssessmentPredictionsPreview

Return type: AnomalyAssessmentPredictionsPreview

find_anomalous_regions(max_prediction_threshold=0.0)¶

Sort preview bins by max_predicted value and select those with max predicted value: greater or equal to max prediction threshold. Sort the result by max predicted value in descending order.

Parameters

max_prediction_threshold: float, optional: Return bins with maximum anomaly score greater or equal to max_prediction_threshold.

Returns

preview_bins: list of preview_bin: Filtered and sorted preview bins

Return type: List[AnomalyAssessmentPreviewBin]

Application¶

class datarobot.Application(id, application_type_id, user_id, model_deployment_id, name, created_by, created_at, updated_at, datasets, cloud_provider, deployment_ids, pool_used, permissions, has_custom_logo, org_id, deployment_status_id=None, description=None, related_entities=None, application_template_type=None, deployment_name=None, deactivation_status_id=None, created_first_name=None, creator_last_name=None, creator_userhash=None, deployments=None)¶

An entity associated with a DataRobot Application.

Attributes

idstr: The ID of the created application.
application_type_idstr: The ID of the type of the application.
user_idstr: The ID of the user which created the application.
model_deployment_idstr: The ID of the associated model deployment.
deactivation_status_idstr or None: The ID of the status object to track the asynchronous app deactivation process status. Will be None if the app was never deactivated.
namestr: The name of the application.
created_bystr: The username of the user created the application.
created_atstr: The timestamp when the application was created.
updated_atstr: The timestamp when the application was updated.
datasetsList[str]: The list of datasets IDs associated with the application.
creator_first_nameOptional[str]: Application creator first name. Optional.
creator_last_nameOptional[str]: Application creator last name. Optional.
creator_userhashOptional[str]: Application creator userhash. Optional.
deployment_status_idstr: The ID of the status object to track the asynchronous deployment process status.
descriptionstr: A description of the application.
cloud_providerstr: The host of this application.
deploymentsOptional[List[ApplicationDeployment]]: A list of deployment details. Optional.
deployment_idsList[str]: A list of deployment IDs for this app.
deployment_nameOptional[str]: Name of the deployment. Optional.
application_template_typeOptional[str]: Application template type, purpose. Optional.
pool_usedbool: Whether the pool where used for last app deployment.
permissionsList[str]: The list of permitted actions, which the authenticated user can perform on this application. Permissions should be ApplicationPermission options.
has_custom_logobool: Whether the app has a custom logo.
related_entitiesOptional[ApplcationRelatedEntity]: IDs of entities, related to app for easy search.
org_idstr: ID of the app’s organization.

classmethod list(offset=None, limit=None, use_cases=None)¶

Retrieve a list of user applications.

Parameters

offsetOptional[int]: Optional. Retrieve applications in a list after this number.
limitOptional[int]: Optional. Retrieve only this number of applications.
use_cases: Optional[Union[UseCase, List[UseCase], str, List[str]]]: Optional. Filter available Applications by a specific Use Case or Use Cases. Accepts either the entity or the ID. If set to [None], the method filters the project’s applications by those not linked to a UseCase.

Returns

applicationsList[Application]: The requested list of user applications.

Return type: List[Application]

classmethod get(application_id)¶

Retrieve a single application.

Parameters

application_idstr: The ID of the application to retrieve.

Returns

applicationApplication: The requested application.

Return type: Application

Batch Predictions¶

class datarobot.models.BatchPredictionJob(data, completed_resource_url=None)¶

A Batch Prediction Job is used to score large data sets on prediction servers using the Batch Prediction API.

Attributes

idstr: the id of the job

classmethod score(deployment, intake_settings=None, output_settings=None, csv_settings=None, timeseries_settings=None, num_concurrent=None, chunk_size=None, passthrough_columns=None, passthrough_columns_set=None, max_explanations=None, max_ngram_explanations=None, threshold_high=None, threshold_low=None, prediction_warning_enabled=None, include_prediction_status=False, skip_drift_tracking=False, prediction_instance=None, abort_on_error=True, column_names_remapping=None, include_probabilities=True, include_probabilities_classes=None, download_timeout=120, download_read_timeout=660, upload_read_timeout=600, explanations_mode=None)¶

Create new batch prediction job, upload the scoring dataset and return a batch prediction job.

The default intake and output options are both localFile which requires the caller to pass the file parameter and either download the results using the download() method afterwards or pass a path to a file where the scored data will be downloaded to afterwards.

Returns

BatchPredictionJob: Instance of BatchPredictionJob

Attributes

deploymentDeployment or string ID

Deployment which will be used for scoring.

intake_settingsdict (optional)

A dict configuring how data is coming from. Supported options:

type : string, either localFile, s3, azure, gcp, dataset, jdbc snowflake, synapse or bigquery

Note that to pass a dataset, you not only need to specify the type parameter as dataset, but you must also set the dataset parameter as a dr.Dataset object.

To score from a local file, add the this parameter to the settings:

file : file-like object, string path to file or a pandas.DataFrame of scoring data

To score from S3, add the next parameters to the settings:

url : string, the URL to score (e.g.: s3://bucket/key)

credential_id : string (optional)

endpoint_url : string (optional), any non-default endpoint URL for S3 access (omit to use the default)

To score from JDBC, add the next parameters to the settings:

data_store_id : string, the ID of the external data store connected to the JDBC data source (see Database Connectivity).

query : string (optional if table, schema and/or catalog is specified), a self-supplied SELECT statement of the data set you wish to predict.

table : string (optional if query is specified), the name of specified database table.

schema : string (optional if query is specified), the name of specified database schema.

catalog : string (optional if query is specified), (new in v2.22) the name of specified database catalog.

fetch_size : int (optional), Changing the fetchSize can be used to balance throughput and memory usage.

credential_id : string (optional) the ID of the credentials holding information about a user with read-access to the JDBC data source (see Credentials).

output_settingsdict (optional)

A dict configuring how scored data is to be saved. Supported options:

type : string, either localFile, s3, azure, gcp, jdbc, snowflake, synapse or bigquery

To save scored data to a local file, add this parameters to the settings:

path : string (optional), path to save the scored data as CSV. If a path is not specified, you must download the scored data yourself with job.download(). If a path is specified, the call will block until the job is done. if there are no other jobs currently processing for the targeted prediction instance, uploading, scoring, downloading will happen in parallel without waiting for a full job to complete. Otherwise, it will still block, but start downloading the scored data as soon as it starts generating data. This is the fastest method to get predictions.

To save scored data to S3, add the next parameters to the settings:

url : string, the URL for storing the results (e.g.: s3://bucket/key)

credential_id : string (optional)

endpoint_url : string (optional), any non-default endpoint URL for S3 access (omit to use the default)

To save scored data to JDBC, add the next parameters to the settings:

data_store_id : string, the ID of the external data store connected to the JDBC data source (see Database Connectivity).

table : string, the name of specified database table.

schema : string (optional), the name of specified database schema.

catalog : string (optional), (new in v2.22) the name of specified database catalog.

statement_type : string, the type of insertion statement to create, one of datarobot.enums.AVAILABLE_STATEMENT_TYPES.

update_columns : list(string) (optional), a list of strings containing those column names to be updated in case statement_type is set to a value related to update or upsert.

where_columns : list(string) (optional), a list of strings containing those column names to be selected in case statement_type is set to a value related to insert or update.

credential_id : string, the ID of the credentials holding information about a user with write-access to the JDBC data source (see Credentials).

create_table_if_not_exists : bool (optional), If no existing table is detected, attempt to create it before writing data with the strategy defined in the statementType parameter.

csv_settingsdict (optional)

CSV intake and output settings. Supported options:

delimiter : string (optional, default ,), fields are delimited by this character. Use the string tab to denote TSV (TAB separated values). Must be either a one-character string or the string tab.
quotechar : string (optional, default “), fields containing the delimiter must be quoted using this character.
encoding : string (optional, default utf-8), encoding for the CSV files. For example (but not limited to): shift_jis, latin_1 or mskanji.

timeseries_settingsdict (optional)

Configuration for time-series scoring. Supported options:

type : string, must be forecast or historical (default if not passed is forecast). forecast mode makes predictions using forecast_point or rows in the dataset without target. historical enables bulk prediction mode which calculates predictions for all possible forecast points and forecast distances in the dataset within predictions_start_date/predictions_end_date range.
forecast_point : datetime (optional), forecast point for the dataset, used for the forecast predictions, by default value will be inferred from the dataset. May be passed if timeseries_settings.type=forecast.
predictions_start_date : datetime (optional), used for historical predictions in order to override date from which predictions should be calculated. By default value will be inferred automatically from the dataset. May be passed if timeseries_settings.type=historical.
predictions_end_date : datetime (optional), used for historical predictions in order to override date from which predictions should be calculated. By default value will be inferred automatically from the dataset. May be passed if timeseries_settings.type=historical.
relax_known_in_advance_features_check : bool, (default False). If True, missing values in the known in advance features are allowed in the forecast window at the prediction time. If omitted or False, missing values are not allowed.

num_concurrentint (optional)

Number of concurrent chunks to score simultaneously. Defaults to the available number of cores of the deployment. Lower it to leave resources for real-time scoring.

chunk_sizestring or int (optional)

Which strategy should be used to determine the chunk size. Can be either a named strategy or a fixed size in bytes. - auto: use fixed or dynamic based on flipper - fixed: use 1MB for explanations, 5MB for regular requests - dynamic: use dynamic chunk sizes - int: use this many bytes per chunk

passthrough_columnslist[string] (optional)

Keep these columns from the scoring dataset in the scored dataset. This is useful for correlating predictions with source data.

passthrough_columns_setstring (optional)

To pass through every column from the scoring dataset, set this to all. Takes precedence over passthrough_columns if set.

max_explanationsint (optional)

Compute prediction explanations for this amount of features.

max_ngram_explanationsint or str (optional)

Compute text explanations for this amount of ngrams. Set to all to return all ngram explanations, or set to a positive integer value to limit the amount of ngram explanations returned. By default no ngram explanations will be computed and returned.

threshold_highfloat (optional)

Only compute prediction explanations for predictions above this threshold. Can be combined with threshold_low.

threshold_lowfloat (optional)

Only compute prediction explanations for predictions below this threshold. Can be combined with threshold_high.

explanations_modePredictionExplanationsMode, optional

Mode of prediction explanations calculation for multiclass and clustering models, if not specified - server default is to explain only the predicted class, identical to passing TopPredictionsMode(1).

prediction_warning_enabledboolean (optional)

Add prediction warnings to the scored data. Currently only supported for regression models.

include_prediction_statusboolean (optional)

Include the prediction_status column in the output, defaults to False.

skip_drift_trackingboolean (optional)

Skips drift tracking on any predictions made from this job. This is useful when running non-production workloads to not affect drift tracking and cause unnecessary alerts. Defaults to False.

prediction_instancedict (optional)

Defaults to instance specified by deployment or system configuration. Supported options:

hostName : string

sslEnabled : boolean (optional, default true). Set to false to run prediction requests from the batch prediction job without SSL.

datarobotKey : string (optional), if running a job against a prediction instance in the Managed AI Cloud, you must provide the organization level DataRobot-Key

apiKey : string (optional), by default, prediction requests will use the API key of the user that created the job. This allows you to make requests on behalf of other users.

abort_on_errorboolean (optional)

Default behavior is to abort the job if too many rows fail scoring. This will free up resources for other jobs that may score successfully. Set to false to unconditionally score every row no matter how many errors are encountered. Defaults to True.

column_names_remappingdict (optional)

Mapping with column renaming for output table. Defaults to {}.

include_probabilitiesboolean (optional)

Flag that enables returning of all probability columns. Defaults to True.

include_probabilities_classeslist (optional)

List the subset of classes if a user doesn’t want all the classes. Defaults to [].

download_timeoutint (optional)

New in version 2.22.

If using localFile output, wait this many seconds for the download to become available. See download().

download_read_timeoutint (optional, default 660)

New in version 2.22.

If using localFile output, wait this many seconds for the server to respond between chunks.

upload_read_timeout: int (optional, default 600)

New in version 2.28.

If using localFile intake, wait this many seconds for the server to respond after whole dataset upload.

Return type: BatchPredictionJob

classmethod apply_time_series_data_prep_and_score(deployment, intake_settings, timeseries_settings, **kwargs)¶

Prepare the dataset with time series data prep, create new batch prediction job, upload the scoring dataset, and return a batch prediction job.

The supported intake_settings are of type localFile or dataset.

For timeseries_settings of type forecast the forecast_point must be specified.

Refer to the datarobot.models.BatchPredictionJob.score() method for details on the other kwargs parameters.

New in version v3.1.

Returns

BatchPredictionJob: Instance of BatchPredictionJob

Raises

InvalidUsageError: If the deployment does not support time series data prep. If the intake type is not supported for time series data prep.

Attributes

deploymentDeployment

Deployment which will be used for scoring.

intake_settingsdict

A dict configuring where data is coming from. Supported options:

type : string, either localFile, dataset

Note that to pass a dataset, you not only need to specify the type parameter as dataset, but you must also set the dataset parameter as a Dataset object.

To score from a local file, add this parameter to the settings:

file : file-like object, string path to file or a pandas.DataFrame of scoring data.

timeseries_settingsdict

Configuration for time-series scoring. Supported options:

type : string, must be forecast or historical (default if not passed is forecast). forecast mode makes predictions using forecast_point. historical enables bulk prediction mode which calculates predictions for all possible forecast points and forecast distances in the dataset within predictions_start_date/predictions_end_date range.
forecast_point : datetime (optional), forecast point for the dataset, used for the forecast predictions. Must be passed if timeseries_settings.type=forecast.
predictions_start_date : datetime (optional), used for historical predictions in order to override date from which predictions should be calculated. By default value will be inferred automatically from the dataset. May be passed if timeseries_settings.type=historical.
predictions_end_date : datetime (optional), used for historical predictions in order to override date from which predictions should be calculated. By default value will be inferred automatically from the dataset. May be passed if timeseries_settings.type=historical.
relax_known_in_advance_features_check : bool, (default False). If True, missing values in the known in advance features are allowed in the forecast window at the prediction time. If omitted or False, missing values are not allowed.

Return type: BatchPredictionJob

classmethod score_to_file(deployment, intake_path, output_path, **kwargs)¶

Create new batch prediction job, upload the scoring dataset and download the scored CSV file concurrently.

Will block until the entire file is scored.

Refer to the datarobot.models.BatchPredictionJob.score() method for details on the other kwargs parameters.

Returns

BatchPredictionJob: Instance of BatchPredictionJob

Attributes

deploymentDeployment or string ID: Deployment which will be used for scoring.
intake_pathfile-like object/string path to file/pandas.DataFrame: Scoring data
output_pathstr: Filename to save the result under

classmethod apply_time_series_data_prep_and_score_to_file(deployment, intake_path, output_path, timeseries_settings, **kwargs)¶

Prepare the input dataset with time series data prep. Then, create a new batch prediction job using the prepared AI catalog item as input and concurrently download the scored CSV file.

The function call will return when the entire file is scored.

For timeseries_settings of type forecast the forecast_point must be specified.

Refer to the datarobot.models.BatchPredictionJob.score() method for details on the other kwargs parameters.

New in version v3.1.

Returns

BatchPredictionJob: Instance of BatchPredictionJob.

Raises

InvalidUsageError: If the deployment does not support time series data prep.

Attributes

deploymentDeployment

The deployment which will be used for scoring.

intake_pathfile-like object/string path to file/pandas.DataFrame

The scoring data.

output_pathstr

The filename under which you save the result.

timeseries_settingsdict

Configuration for time-series scoring. Supported options:

type : string, must be forecast or historical (default if not passed is forecast). forecast mode makes predictions using forecast_point. historical enables bulk prediction mode which calculates predictions for all possible forecast points and forecast distances in the dataset within predictions_start_date/predictions_end_date range.
forecast_point : datetime (optional), forecast point for the dataset, used for the forecast predictions. Must be passed if timeseries_settings.type=forecast.
predictions_start_date : datetime (optional), used for historical predictions in order to override date from which predictions should be calculated. By default value will be inferred automatically from the dataset. May be passed if timeseries_settings.type=historical.
predictions_end_date : datetime (optional), used for historical predictions in order to override date from which predictions should be calculated. By default value will be inferred automatically from the dataset. May be passed if timeseries_settings.type=historical.
relax_known_in_advance_features_check : bool, (default False). If True, missing values in the known in advance features are allowed in the forecast window at the prediction time. If omitted or False, missing values are not allowed.

Return type: BatchPredictionJob

classmethod score_s3(deployment, source_url, destination_url, credential=None, endpoint_url=None, **kwargs)¶

Create new batch prediction job, with a scoring dataset from S3 and writing the result back to S3.

This returns immediately after the job has been created. You must poll for job completion using get_status() or wait_for_completion() (see datarobot.models.Job)

Refer to the datarobot.models.BatchPredictionJob.score() method for details on the other kwargs parameters.

Returns

BatchPredictionJob: Instance of BatchPredictionJob

Attributes

deploymentDeployment or string ID: Deployment which will be used for scoring.
source_urlstring: The URL for the prediction dataset (e.g.: s3://bucket/key)
destination_urlstring: The URL for the scored dataset (e.g.: s3://bucket/key)
credentialstring or Credential (optional): The AWS Credential object or credential id
endpoint_urlstring (optional): Any non-default endpoint URL for S3 access (omit to use the default)

classmethod score_azure(deployment, source_url, destination_url, credential=None, **kwargs)¶

Create new batch prediction job, with a scoring dataset from Azure blob storage and writing the result back to Azure blob storage.

This returns immediately after the job has been created. You must poll for job completion using get_status() or wait_for_completion() (see datarobot.models.Job).

Refer to the datarobot.models.BatchPredictionJob.score() method for details on the other kwargs parameters.

Returns

BatchPredictionJob: Instance of BatchPredictionJob

Attributes

deploymentDeployment or string ID: Deployment which will be used for scoring.
source_urlstring: The URL for the prediction dataset (e.g.: https://storage_account.blob.endpoint/container/blob_name)
destination_urlstring: The URL for the scored dataset (e.g.: https://storage_account.blob.endpoint/container/blob_name)
credentialstring or Credential (optional): The Azure Credential object or credential id

classmethod score_gcp(deployment, source_url, destination_url, credential=None, **kwargs)¶

Create new batch prediction job, with a scoring dataset from Google Cloud Storage and writing the result back to one.

This returns immediately after the job has been created. You must poll for job completion using get_status() or wait_for_completion() (see datarobot.models.Job).

Refer to the datarobot.models.BatchPredictionJob.score() method for details on the other kwargs parameters.

Returns

BatchPredictionJob: Instance of BatchPredictionJob

Attributes

deploymentDeployment or string ID: Deployment which will be used for scoring.
source_urlstring: The URL for the prediction dataset (e.g.: http(s)://storage.googleapis.com/[bucket]/[object])
destination_urlstring: The URL for the scored dataset (e.g.: http(s)://storage.googleapis.com/[bucket]/[object])
credentialstring or Credential (optional): The GCP Credential object or credential id

classmethod score_from_existing(batch_prediction_job_id)¶

Create a new batch prediction job based on the settings from a previously created one

Returns

BatchPredictionJob: Instance of BatchPredictionJob

Attributes

batch_prediction_job_id: str: ID of the previous batch prediction job

Return type: BatchPredictionJob

classmethod score_pandas(deployment, df, read_timeout=660, **kwargs)¶

Run a batch prediction job, with a scoring dataset from a pandas dataframe. The output from the prediction will be joined to the passed DataFrame and returned.

Use columnNamesRemapping to drop or rename columns in the output

This method blocks until the job has completed or raises an exception on errors.

Refer to the datarobot.models.BatchPredictionJob.score() method for details on the other kwargs parameters.

Returns

BatchPredictionJob: Instance of BatchPredictonJob
pandas.DataFrame: The original dataframe merged with the predictions

Attributes

deploymentDeployment or string ID: Deployment which will be used for scoring.
dfpandas.DataFrame: The dataframe to score

Return type: Tuple[BatchPredictionJob, DataFrame]

classmethod score_with_leaderboard_model(model, intake_settings=None, output_settings=None, csv_settings=None, timeseries_settings=None, passthrough_columns=None, passthrough_columns_set=None, max_explanations=None, max_ngram_explanations=None, threshold_high=None, threshold_low=None, prediction_warning_enabled=None, include_prediction_status=False, abort_on_error=True, column_names_remapping=None, include_probabilities=True, include_probabilities_classes=None, download_timeout=120, download_read_timeout=660, upload_read_timeout=600, explanations_mode=None)¶

Creates a new batch prediction job for a Leaderboard model by uploading the scoring dataset. Returns a batch prediction job.

The default intake and output options are both localFile, which requires the caller to pass the file parameter and either download the results using the download() method afterwards or pass a path to a file where the scored data will be downloaded to.

Returns

BatchPredictionJob: Instance of BatchPredictionJob

Attributes

modelModel or DatetimeModel or string ID

Model which will be used for scoring.

intake_settingsdict (optional)

A dict configuring how data is coming from. Supported options:

type : string, either localFile, dataset, or dss.

Note that to pass a dataset, you not only need to specify the type parameter as dataset, but you must also set the dataset parameter as a dr.Dataset object.

To score from a local file, add the this parameter to the settings:

file : file-like object, string path to file or a pandas.DataFrame of scoring data.

To score subset of training data, use dss intake type and specify following parameters:

project_id : project to fetch training data from. Access to project is required.

partition : subset of training data to score, one of datarobot.enums.TrainingDataSubsets.

output_settingsdict (optional)

A dict configuring how scored data is to be saved. Supported options:

type : string, localFile

To save scored data to a local file, add this parameters to the settings:

path : string (optional) The path to save the scored data as a CSV file. If a path is not specified, you must download the scored data yourself with job.download(). If a path is specified, the call is blocked until the job is done. If there are no other jobs currently processing for the targeted prediction instance, uploading, scoring, and downloading will happen in parallel without waiting for a full job to complete. Otherwise, it will still block, but start downloading the scored data as soon as it starts generating data. This is the fastest method to get predictions.

csv_settingsdict (optional)

CSV intake and output settings. Supported options:

delimiter : string (optional, default ,), fields are delimited by this character. Use the string tab to denote TSV (TAB separated values). Must be either a one-character string or the string tab.
quotechar : string (optional, default “), fields containing the delimiter must be quoted using this character.
encoding : string (optional, default utf-8), encoding for the CSV files. For example (but not limited to): shift_jis, latin_1 or mskanji.

timeseries_settingsdict (optional)

Configuration for time-series scoring. Supported options:

type : string, must be forecast, historical (default if not passed is forecast), or training. forecast mode makes predictions using forecast_point or rows in the dataset without target. historical enables bulk prediction mode which calculates predictions for all possible forecast points and forecast distances in the dataset within predictions_start_date/predictions_end_date range. training mode is a special case for predictions on subsets of training data. Note, that it must be used in conjunction with dss intake type only.
forecast_point : datetime (optional), forecast point for the dataset, used for the forecast predictions, by default value will be inferred from the dataset. May be passed if timeseries_settings.type=forecast.
predictions_start_date : datetime (optional), used for historical predictions in order to override date from which predictions should be calculated. By default value will be inferred automatically from the dataset. May be passed if timeseries_settings.type=historical.
predictions_end_date : datetime (optional), used for historical predictions in order to override date from which predictions should be calculated. By default value will be inferred automatically from the dataset. May be passed if timeseries_settings.type=historical.
relax_known_in_advance_features_check : bool, (default False). If True, missing values in the known in advance features are allowed in the forecast window at the prediction time. If omitted or False, missing values are not allowed.

passthrough_columnslist[string] (optional)

Keep these columns from the scoring dataset in the scored dataset. This is useful for correlating predictions with source data.

passthrough_columns_setstring (optional)

To pass through every column from the scoring dataset, set this to all. Takes precedence over passthrough_columns if set.

max_explanationsint (optional)

Compute prediction explanations for this amount of features.

max_ngram_explanationsint or str (optional)

Compute text explanations for this amount of ngrams. Set to all to return all ngram explanations, or set to a positive integer value to limit the amount of ngram explanations returned. By default no ngram explanations will be computed and returned.

threshold_highfloat (optional)

Only compute prediction explanations for predictions above this threshold. Can be combined with threshold_low.

threshold_lowfloat (optional)

Only compute prediction explanations for predictions below this threshold. Can be combined with threshold_high.

explanations_modePredictionExplanationsMode, optional

Mode of prediction explanations calculation for multiclass and clustering models, if not specified - server default is to explain only the predicted class, identical to passing TopPredictionsMode(1).

prediction_warning_enabledboolean (optional)

Add prediction warnings to the scored data. Currently only supported for regression models.

include_prediction_statusboolean (optional)

Include the prediction_status column in the output, defaults to False.

abort_on_errorboolean (optional)

Default behavior is to abort the job if too many rows fail scoring. This will free up resources for other jobs that may score successfully. Set to false to unconditionally score every row no matter how many errors are encountered. Defaults to True.

column_names_remappingdict (optional)

Mapping with column renaming for output table. Defaults to {}.

include_probabilitiesboolean (optional)

Flag that enables returning of all probability columns. Defaults to True.

include_probabilities_classeslist (optional)

List the subset of classes if you do not want all the classes. Defaults to [].

download_timeoutint (optional)

New in version 2.22.

If using localFile output, wait this many seconds for the download to become available. See download().

download_read_timeoutint (optional, default 660)

New in version 2.22.

If using localFile output, wait this many seconds for the server to respond between chunks.

upload_read_timeout: int (optional, default 600)

New in version 2.28.

If using localFile intake, wait this many seconds for the server to respond after whole dataset upload.

Return type: BatchPredictionJob

classmethod get(batch_prediction_job_id)¶

Get batch prediction job

Returns

BatchPredictionJob: Instance of BatchPredictionJob

Attributes

batch_prediction_job_id: str: ID of batch prediction job

Return type: BatchPredictionJob

download(fileobj, timeout=120, read_timeout=660)¶

Downloads the CSV result of a prediction job

Attributes

fileobj: A file-like object where the CSV prediction results will be

written to. Examples include an in-memory buffer (e.g., io.BytesIO) or a file on disk (opened for binary writing).

timeoutint (optional, default 120)

New in version 2.22.

Seconds to wait for the download to become available.

The download will not be available before the job has started processing. In case other jobs are occupying the queue, processing may not start immediately.

If the timeout is reached, the job will be aborted and RuntimeError is raised.

Set to -1 to wait infinitely.

read_timeoutint (optional, default 660)

New in version 2.22.

Seconds to wait for the server to respond between chunks.

Return type: None

delete(ignore_404_errors=False)¶

Cancel this job. If this job has not finished running, it will be removed and canceled.

Return type: None

get_status()¶

Get status of batch prediction job

Returns

BatchPredictionJob status data: Dict with job status

classmethod list_by_status(statuses=None)¶

Get jobs collection for specific set of statuses

Returns

BatchPredictionJob statuses: List of job statuses dicts with specific statuses

Attributes

statuses: List of statuses to filter jobs ([ABORTED|COMPLETED…]) if statuses is not provided, returns all jobs for user

Return type: List[BatchPredictionJob]

class datarobot.models.BatchPredictionJobDefinition(id=None, name=None, enabled=None, schedule=None, batch_prediction_job=None, created=None, updated=None, created_by=None, updated_by=None, last_failed_run_time=None, last_successful_run_time=None, last_started_job_status=None, last_scheduled_run_time=None)¶

classmethod get(batch_prediction_job_definition_id)¶

Get batch prediction job definition

Returns

BatchPredictionJobDefinition: Instance of BatchPredictionJobDefinition

Examples

>>> import datarobot as dr
>>> definition = dr.BatchPredictionJobDefinition.get('5a8ac9ab07a57a0001be501f')
>>> definition
BatchPredictionJobDefinition(60912e09fd1f04e832a575c1)

Attributes

batch_prediction_job_definition_id: str: ID of batch prediction job definition

Return type: BatchPredictionJobDefinition

classmethod list()¶

Get job all definitions

Returns

List[BatchPredictionJobDefinition]: List of job definitions the user has access to see

Examples

>>> import datarobot as dr
>>> definition = dr.BatchPredictionJobDefinition.list()
>>> definition
[
    BatchPredictionJobDefinition(60912e09fd1f04e832a575c1),
    BatchPredictionJobDefinition(6086ba053f3ef731e81af3ca)
]

Return type: List[BatchPredictionJobDefinition]

classmethod create(enabled, batch_prediction_job, name=None, schedule=None)¶

Creates a new batch prediction job definition to be run either at scheduled interval or as a manual run.

Returns

BatchPredictionJobDefinition: Instance of BatchPredictionJobDefinition

Examples

>>> import datarobot as dr
>>> job_spec = {
...    "num_concurrent": 4,
...    "deployment_id": "foobar",
...    "intake_settings": {
...        "url": "s3://foobar/123",
...        "type": "s3",
...        "format": "csv"
...    },
...    "output_settings": {
...        "url": "s3://foobar/123",
...        "type": "s3",
...        "format": "csv"
...    },
...}
>>> schedule = {
...    "day_of_week": [
...        1
...    ],
...    "month": [
...        "*"
...    ],
...    "hour": [
...        16
...    ],
...    "minute": [
...        0
...    ],
...    "day_of_month": [
...        1
...    ]
...}
>>> definition = BatchPredictionJobDefinition.create(
...    enabled=False,
...    batch_prediction_job=job_spec,
...    name="some_definition_name",
...    schedule=schedule
... )
>>> definition
BatchPredictionJobDefinition(60912e09fd1f04e832a575c1)

Attributes

enabledbool (default False)

Whether or not the definition should be active on a scheduled basis. If True, schedule is required.

batch_prediction_job: dict

The job specifications for your batch prediction job. It requires the same job input parameters as used with score(), only it will not initialize a job scoring, only store it as a definition for later use.

namestring (optional)

The name you want your job to be identified with. Must be unique across the organization’s existing jobs. If you don’t supply a name, a random one will be generated for you.

scheduledict (optional)

The schedule payload defines at what intervals the job should run, which can be combined in various ways to construct complex scheduling terms if needed. In all of the elements in the objects, you can supply either an asterisk ["*"] denoting “every” time denomination or an array of integers (e.g. [1, 2, 3]) to define a specific interval.

The schedule payload is split up in the following items:

Minute:

The minute(s) of the day that the job will run. Allowed values are either ["*"] meaning every minute of the day or [0 ... 59]

Hour: The hour(s) of the day that the job will run. Allowed values are either ["*"] meaning every hour of the day or [0 ... 23].

Day of Month: The date(s) of the month that the job will run. Allowed values are either [1 ... 31] or ["*"] for all days of the month. This field is additive with dayOfWeek, meaning the job will run both on the date(s) defined in this field and the day specified by dayOfWeek (for example, dates 1st, 2nd, 3rd, plus every Tuesday). If dayOfMonth is set to ["*"] and dayOfWeek is defined, the scheduler will trigger on every day of the month that matches dayOfWeek (for example, Tuesday the 2nd, 9th, 16th, 23rd, 30th). Invalid dates such as February 31st are ignored.

Month: The month(s) of the year that the job will run. Allowed values are either [1 ... 12] or ["*"] for all months of the year. Strings, either 3-letter abbreviations or the full name of the month, can be used interchangeably (e.g., “jan” or “october”). Months that are not compatible with dayOfMonth are ignored, for example {"dayOfMonth": [31], "month":["feb"]}

Day of Week: The day(s) of the week that the job will run. Allowed values are [0 .. 6], where (Sunday=0), or ["*"], for all days of the week. Strings, either 3-letter abbreviations or the full name of the day, can be used interchangeably (e.g., “sunday”, “Sunday”, “sun”, or “Sun”, all map to [0]. This field is additive with dayOfMonth, meaning the job will run both on the date specified by dayOfMonth and the day defined in this field.

Return type: BatchPredictionJobDefinition

update(enabled, batch_prediction_job=None, name=None, schedule=None)¶

Updates a job definition with the changed specs.

Takes the same input as create()

Returns

BatchPredictionJobDefinition: Instance of the updated BatchPredictionJobDefinition

Examples

>>> import datarobot as dr
>>> job_spec = {
...    "num_concurrent": 5,
...    "deployment_id": "foobar_new",
...    "intake_settings": {
...        "url": "s3://foobar/123",
...        "type": "s3",
...        "format": "csv"
...    },
...    "output_settings": {
...        "url": "s3://foobar/123",
...        "type": "s3",
...        "format": "csv"
...    },
...}
>>> schedule = {
...    "day_of_week": [
...        1
...    ],
...    "month": [
...        "*"
...    ],
...    "hour": [
...        "*"
...    ],
...    "minute": [
...        30, 59
...    ],
...    "day_of_month": [
...        1, 2, 6
...    ]
...}
>>> definition = BatchPredictionJobDefinition.create(
...    enabled=False,
...    batch_prediction_job=job_spec,
...    name="updated_definition_name",
...    schedule=schedule
... )
>>> definition
BatchPredictionJobDefinition(60912e09fd1f04e832a575c1)

Attributes

enabledbool (default False): Same as enabled in create().
batch_prediction_job: dict: Same as batch_prediction_job in create().
namestring (optional): Same as name in create().
scheduledict: Same as schedule in create().

Return type: BatchPredictionJobDefinition

run_on_schedule(schedule)¶

Sets the run schedule of an already created job definition.

If the job was previously not enabled, this will also set the job to enabled.

Returns

BatchPredictionJobDefinition: Instance of the updated BatchPredictionJobDefinition with the new / updated schedule.

Examples

>>> import datarobot as dr
>>> definition = dr.BatchPredictionJobDefinition.create('...')
>>> schedule = {
...    "day_of_week": [
...        1
...    ],
...    "month": [
...        "*"
...    ],
...    "hour": [
...        "*"
...    ],
...    "minute": [
...        30, 59
...    ],
...    "day_of_month": [
...        1, 2, 6
...    ]
...}
>>> definition.run_on_schedule(schedule)
BatchPredictionJobDefinition(60912e09fd1f04e832a575c1)

Attributes

scheduledict: Same as schedule in create().

Return type: BatchPredictionJobDefinition

run_once()¶

Manually submits a batch prediction job to the queue, based off of an already created job definition.

Returns

BatchPredictionJob: Instance of BatchPredictionJob

Examples

>>> import datarobot as dr
>>> definition = dr.BatchPredictionJobDefinition.create('...')
>>> job = definition.run_once()
>>> job.wait_for_completion()

Return type: BatchPredictionJob

delete()¶

Deletes the job definition and disables any future schedules of this job if any. If a scheduled job is currently running, this will not be cancelled.

Examples

>>> import datarobot as dr
>>> definition = dr.BatchPredictionJobDefinition.get('5a8ac9ab07a57a0001be501f')
>>> definition.delete()

Return type: None

Batch Monitoring¶

class datarobot.models.BatchMonitoringJob(data, completed_resource_url=None)¶

A Batch Monitoring Job is used to monitor data sets outside DataRobot app.

Attributes

idstr: the id of the job

classmethod get(project_id, job_id)¶

Get batch monitoring job

Returns

BatchMonitoringJob: Instance of BatchMonitoringJob

Attributes

job_id: str: ID of batch job

Return type: BatchMonitoringJob

download(fileobj, timeout=120, read_timeout=660)¶

Downloads the results of a monitoring job as a CSV.

Attributes

fileobj: A file-like object where the CSV monitoring results will be

written to. Examples include an in-memory buffer (e.g., io.BytesIO) or a file on disk (opened for binary writing).

timeoutint (optional, default 120)

Seconds to wait for the download to become available.

The download will not be available before the job has started processing. In case other jobs are occupying the queue, processing may not start immediately.

If the timeout is reached, the job will be aborted and RuntimeError is raised.

Set to -1 to wait infinitely.

read_timeoutint (optional, default 660)

Seconds to wait for the server to respond between chunks.

Return type: None

classmethod run(deployment, intake_settings=None, output_settings=None, csv_settings=None, num_concurrent=None, chunk_size=None, abort_on_error=True, monitoring_aggregation=None, monitoring_columns=None, monitoring_output_settings=None, download_timeout=120, download_read_timeout=660, upload_read_timeout=600)¶

Create new batch monitoring job, upload the dataset, and return a batch monitoring job.

Returns

BatchMonitoringJob: Instance of BatchMonitoringJob

Examples

>>> import datarobot as dr
>>> job_spec = {
...     "intake_settings": {
...         "type": "jdbc",
...         "data_store_id": "645043933d4fbc3215f17e34",
...         "catalog": "SANDBOX",
...         "table": "10kDiabetes_output_actuals",
...         "schema": "SCORING_CODE_UDF_SCHEMA",
...         "credential_id": "645043b61a158045f66fb329"
...     },
>>>     "monitoring_columns": {
...         "predictions_columns": [
...             {
...                 "class_name": "True",
...                 "column_name": "readmitted_True_PREDICTION"
...             },
...             {
...                 "class_name": "False",
...                 "column_name": "readmitted_False_PREDICTION"
...             }
...         ],
...         "association_id_column": "rowID",
...         "actuals_value_column": "ACTUALS"
...     }
... }
>>> deployment_id = "foobar"
>>> job = dr.BatchMonitoringJob.run(deployment_id, **job_spec)
>>> job.wait_for_completion()

Attributes

deploymentDeployment or string ID

Deployment which will be used for monitoring.

intake_settingsdict

A dict configuring how data is coming from. Supported options:

type : string, either localFile, s3, azure, gcp, dataset, jdbc snowflake, synapse or bigquery

Note that to pass a dataset, you not only need to specify the type parameter as dataset, but you must also set the dataset parameter as a dr.Dataset object.

To monitor from a local file, add this parameter to the settings:

file : A file-like object, string path to a file or a pandas.DataFrame of scoring data.

To monitor from S3, add the next parameters to the settings:

url : string, the URL to score (e.g.: s3://bucket/key).

credential_id : string (optional).

endpoint_url : string (optional), any non-default endpoint URL for S3 access (omit to use the default).

To monitor from JDBC, add the next parameters to the settings:

data_store_id : string, the ID of the external data store connected to the JDBC data source (see Database Connectivity).

query : string (optional if table, schema and/or catalog is specified), a self-supplied SELECT statement of the data set you wish to predict.

table : string (optional if query is specified), the name of specified database table.

schema : string (optional if query is specified), the name of specified database schema.

catalog : string (optional if query is specified), (new in v2.22) the name of specified database catalog.

fetch_size : int (optional), Changing the fetchSize can be used to balance throughput and memory usage.

credential_id : string (optional) the ID of the credentials holding information about a user with read-access to the JDBC data source (see Credentials).

output_settingsdict (optional)

A dict configuring how monitored data is to be saved. Supported options:

type : string, either localFile, s3, azure, gcp, jdbc, snowflake, synapse or bigquery

To save monitored data to a local file, add parameters to the settings:

path : string (optional), path to save the scored data as CSV. If a path is not specified, you must download the scored data yourself with job.download(). If a path is specified, the call will block until the job is done. if there are no other jobs currently processing for the targeted prediction instance, uploading, scoring, downloading will happen in parallel without waiting for a full job to complete. Otherwise, it will still block, but start downloading the scored data as soon as it starts generating data. This is the fastest method to get predictions.

To save monitored data to S3, add the next parameters to the settings:

url : string, the URL for storing the results (e.g.: s3://bucket/key).

credential_id : string (optional).

endpoint_url : string (optional), any non-default endpoint URL for S3 access (omit to use the default).

To save monitored data to JDBC, add the next parameters to the settings:

data_store_id : string, the ID of the external data store connected to the JDBC data source (see Database Connectivity).

table : string, the name of specified database table.

schema : string (optional), the name of specified database schema.

catalog : string (optional), (new in v2.22) the name of specified database catalog.

statement_type : string, the type of insertion statement to create, one of datarobot.enums.AVAILABLE_STATEMENT_TYPES.

update_columns : list(string) (optional), a list of strings containing those column names to be updated in case statement_type is set to a value related to update or upsert.

where_columns : list(string) (optional), a list of strings containing those column names to be selected in case statement_type is set to a value related to insert or update.

credential_id : string, the ID of the credentials holding information about a user with write-access to the JDBC data source (see Credentials).

create_table_if_not_exists : bool (optional), If no existing table is detected, attempt to create it before writing data with the strategy defined in the statementType parameter.

csv_settingsdict (optional)

CSV intake and output settings. Supported options:

delimiter : string (optional, default ,), fields are delimited by this character. Use the string tab to denote TSV (TAB separated values). Must be either a one-character string or the string tab.
quotechar : string (optional, default “), fields containing the delimiter must be quoted using this character.
encoding : string (optional, default utf-8), encoding for the CSV files. For example (but not limited to): shift_jis, latin_1 or mskanji.

num_concurrentint (optional)

Number of concurrent chunks to score simultaneously. Defaults to the available number of cores of the deployment. Lower it to leave resources for real-time scoring.

chunk_sizestring or int (optional)

Which strategy should be used to determine the chunk size. Can be either a named strategy or a fixed size in bytes. - auto: use fixed or dynamic based on flipper. - fixed: use 1MB for explanations, 5MB for regular requests. - dynamic: use dynamic chunk sizes. - int: use this many bytes per chunk.

abort_on_errorboolean (optional)

Default behavior is to abort the job if too many rows fail scoring. This will free up resources for other jobs that may score successfully. Set to false to unconditionally score every row no matter how many errors are encountered. Defaults to True.

download_timeoutint (optional)

New in version 2.22.

If using localFile output, wait this many seconds for the download to become available. See download().

download_read_timeoutint (optional, default 660)

New in version 2.22.

If using localFile output, wait this many seconds for the server to respond between chunks.

upload_read_timeout: int (optional, default 600)

New in version 2.28.

If using localFile intake, wait this many seconds for the server to respond after whole dataset upload.

Return type: BatchMonitoringJob

cancel(ignore_404_errors=False)¶

Cancel this job. If this job has not finished running, it will be removed and canceled.

Return type: None

get_status()¶

Get status of batch monitoring job

Returns

BatchMonitoringJob status data: Dict with job status

Return type: Any

class datarobot.models.BatchMonitoringJobDefinition(id=None, name=None, enabled=None, schedule=None, batch_monitoring_job=None, created=None, updated=None, created_by=None, updated_by=None, last_failed_run_time=None, last_successful_run_time=None, last_started_job_status=None, last_scheduled_run_time=None)¶

classmethod get(batch_monitoring_job_definition_id)¶

Get batch monitoring job definition

Returns

BatchMonitoringJobDefinition: Instance of BatchMonitoringJobDefinition

Examples

>>> import datarobot as dr
>>> definition = dr.BatchMonitoringJobDefinition.get('5a8ac9ab07a57a0001be501f')
>>> definition
BatchMonitoringJobDefinition(60912e09fd1f04e832a575c1)

Attributes

batch_monitoring_job_definition_id: str: ID of batch monitoring job definition

Return type: BatchMonitoringJobDefinition

classmethod list()¶

Get job all monitoring job definitions

Returns

List[BatchMonitoringJobDefinition]: List of job definitions the user has access to see

Examples

>>> import datarobot as dr
>>> definition = dr.BatchMonitoringJobDefinition.list()
>>> definition
[
    BatchMonitoringJobDefinition(60912e09fd1f04e832a575c1),
    BatchMonitoringJobDefinition(6086ba053f3ef731e81af3ca)
]

Return type: List[BatchMonitoringJobDefinition]

classmethod create(enabled, batch_monitoring_job, name=None, schedule=None)¶

Creates a new batch monitoring job definition to be run either at scheduled interval or as a manual run.

Returns

BatchMonitoringJobDefinition: Instance of BatchMonitoringJobDefinition

Examples

>>> import datarobot as dr
>>> job_spec = {
...    "num_concurrent": 4,
...    "deployment_id": "foobar",
...    "intake_settings": {
...        "url": "s3://foobar/123",
...        "type": "s3",
...        "format": "csv"
...    },
...    "output_settings": {
...        "url": "s3://foobar/123",
...        "type": "s3",
...        "format": "csv"
...    },
...}
>>> schedule = {
...    "day_of_week": [
...        1
...    ],
...    "month": [
...        "*"
...    ],
...    "hour": [
...        16
...    ],
...    "minute": [
...        0
...    ],
...    "day_of_month": [
...        1
...    ]
...}
>>> definition = BatchMonitoringJobDefinition.create(
...    enabled=False,
...    batch_monitoring_job=job_spec,
...    name="some_definition_name",
...    schedule=schedule
... )
>>> definition
BatchMonitoringJobDefinition(60912e09fd1f04e832a575c1)

Attributes

enabledbool (default False)

Whether the definition should be active on a scheduled basis. If True, schedule is required.

batch_monitoring_job: dict

The job specifications for your batch monitoring job. It requires the same job input parameters as used with BatchMonitoringJob

namestring (optional)

The name you want your job to be identified with. Must be unique across the organization’s existing jobs. If you don’t supply a name, a random one will be generated for you.

scheduledict (optional)

The schedule payload defines at what intervals the job should run, which can be combined in various ways to construct complex scheduling terms if needed. In all the elements in the objects, you can supply either an asterisk ["*"] denoting “every” time denomination or an array of integers (e.g. [1, 2, 3]) to define a specific interval.

The schedule payload is split up in the following items:

Minute:

The minute(s) of the day that the job will run. Allowed values are either ["*"] meaning every minute of the day or [0 ... 59]

Hour: The hour(s) of the day that the job will run. Allowed values are either ["*"] meaning every hour of the day or [0 ... 23].

Day of Month: The date(s) of the month that the job will run. Allowed values are either [1 ... 31] or ["*"] for all days of the month. This field is additive with dayOfWeek, meaning the job will run both on the date(s) defined in this field and the day specified by dayOfWeek (for example, dates 1st, 2nd, 3rd, plus every Tuesday). If dayOfMonth is set to ["*"] and dayOfWeek is defined, the scheduler will trigger on every day of the month that matches dayOfWeek (for example, Tuesday the 2nd, 9th, 16th, 23rd, 30th). Invalid dates such as February 31st are ignored.

Month: The month(s) of the year that the job will run. Allowed values are either [1 ... 12] or ["*"] for all months of the year. Strings, either 3-letter abbreviations or the full name of the month, can be used interchangeably (e.g., “jan” or “october”). Months that are not compatible with dayOfMonth are ignored, for example {"dayOfMonth": [31], "month":["feb"]}

Day of Week: The day(s) of the week that the job will run. Allowed values are [0 .. 6], where (Sunday=0), or ["*"], for all days of the week. Strings, either 3-letter abbreviations or the full name of the day, can be used interchangeably (e.g., “sunday”, “Sunday”, “sun”, or “Sun”, all map to [0]. This field is additive with dayOfMonth, meaning the job will run both on the date specified by dayOfMonth and the day defined in this field.

Return type: BatchMonitoringJobDefinition

update(enabled, batch_monitoring_job=None, name=None, schedule=None)¶

Updates a job definition with the changed specs.

Takes the same input as create()

Returns

BatchMonitoringJobDefinition: Instance of the updated BatchMonitoringJobDefinition

Examples

>>> import datarobot as dr
>>> job_spec = {
...    "num_concurrent": 5,
...    "deployment_id": "foobar_new",
...    "intake_settings": {
...        "url": "s3://foobar/123",
...        "type": "s3",
...        "format": "csv"
...    },
...    "output_settings": {
...        "url": "s3://foobar/123",
...        "type": "s3",
...        "format": "csv"
...    },
...}
>>> schedule = {
...    "day_of_week": [
...        1
...    ],
...    "month": [
...        "*"
...    ],
...    "hour": [
...        "*"
...    ],
...    "minute": [
...        30, 59
...    ],
...    "day_of_month": [
...        1, 2, 6
...    ]
...}
>>> definition = BatchMonitoringJobDefinition.create(
...    enabled=False,
...    batch_monitoring_job=job_spec,
...    name="updated_definition_name",
...    schedule=schedule
... )
>>> definition
BatchMonitoringJobDefinition(60912e09fd1f04e832a575c1)

Attributes

enabledbool (default False): Same as enabled in create().
batch_monitoring_job: dict: Same as batch_monitoring_job in create().
namestring (optional): Same as name in create().
scheduledict: Same as schedule in create().

Return type: BatchMonitoringJobDefinition

run_on_schedule(schedule)¶

Sets the run schedule of an already created job definition.

If the job was previously not enabled, this will also set the job to enabled.

Returns

BatchMonitoringJobDefinition: Instance of the updated BatchMonitoringJobDefinition with the new / updated schedule.

Examples

>>> import datarobot as dr
>>> definition = dr.BatchMonitoringJobDefinition.create('...')
>>> schedule = {
...    "day_of_week": [
...        1
...    ],
...    "month": [
...        "*"
...    ],
...    "hour": [
...        "*"
...    ],
...    "minute": [
...        30, 59
...    ],
...    "day_of_month": [
...        1, 2, 6
...    ]
...}
>>> definition.run_on_schedule(schedule)
BatchMonitoringJobDefinition(60912e09fd1f04e832a575c1)

Attributes

scheduledict: Same as schedule in create().

Return type: BatchMonitoringJobDefinition

run_once()¶

Manually submits a batch monitoring job to the queue, based off of an already created job definition.

Returns

BatchMonitoringJob: Instance of BatchMonitoringJob

Examples

>>> import datarobot as dr
>>> definition = dr.BatchMonitoringJobDefinition.create('...')
>>> job = definition.run_once()
>>> job.wait_for_completion()

Return type: BatchMonitoringJob

delete()¶

Deletes the job definition and disables any future schedules of this job if any. If a scheduled job is currently running, this will not be cancelled.

Examples

>>> import datarobot as dr
>>> definition = dr.BatchMonitoringJobDefinition.get('5a8ac9ab07a57a0001be501f')
>>> definition.delete()

Return type: None

Status Check Job¶

class datarobot.models.StatusCheckJob(job_id, resource_type=None)¶

Tracks asynchronous task status

Attributes

job_idstr: The ID of the status the job belongs to.

wait_for_completion(max_wait=600)¶

Waits for job to complete.

Parameters

max_waitint, optional: How long to wait for the job to finish. If the time expires, DataRobot returns the current status.

Returns

statusJobStatusResult: Returns the current status of the job.

Return type: JobStatusResult

get_status()¶

Retrieve JobStatusResult object with the latest job status data from the server.

Return type: JobStatusResult

get_result_when_complete(max_wait=600)¶

Wait for the job to complete, then attempt to convert the resulting json into an object of type self.resource_type Returns ——- A newly created resource of type self.resource_type

Return type: APIObject

class datarobot.models.JobStatusResult(status: Optional[str], status_id: Optional[str], completed_resource_url: Optional[str], message: Optional[str])¶

This class represents a result of status check for submitted async jobs.

property status¶: Alias for field number 0

property status_id¶: Alias for field number 1

property completed_resource_url¶: Alias for field number 2

property message¶: Alias for field number 3

Blueprint¶

class datarobot.models.Blueprint(id=None, processes=None, model_type=None, project_id=None, blueprint_category=None, monotonic_increasing_featurelist_id=None, monotonic_decreasing_featurelist_id=None, supports_monotonic_constraints=None, recommended_featurelist_id=None, supports_composable_ml=None, supports_incremental_learning=None)¶

A Blueprint which can be used to fit models

Attributes

idstr: the id of the blueprint
processeslist of str: the processes used by the blueprint
model_typestr: the model produced by the blueprint
project_idstr: the project the blueprint belongs to
blueprint_categorystr: (New in version v2.6) Describes the category of the blueprint and the kind of model it produces.
recommended_featurelist_id: str or null: (New in v2.18) The ID of the feature list recommended for this blueprint. If this field is not present, then there is no recommended feature list.
supports_composable_mlbool or None: (New in version v2.26) whether this blueprint is supported in the Composable ML.
supports_incremental_learningbool or None: (New in version v3.3) whether this blueprint supports incremental learning.

classmethod get(project_id, blueprint_id)¶

Retrieve a blueprint.

Parameters

project_idstr: The project’s id.
blueprint_idstr: Id of blueprint to retrieve.

Returns

blueprintBlueprint: The queried blueprint.

Return type: Blueprint

get_json()¶

Get the blueprint json representation used by this model.

Returns

BlueprintJson: Json representation of the blueprint stages.

Return type: Dict[str, Tuple[List[str], List[str], str]]

get_chart()¶

Retrieve a chart.

Returns

BlueprintChart: The current blueprint chart.

Return type: BlueprintChart

get_documents()¶

Get documentation for tasks used in the blueprint.

Returns

list of BlueprintTaskDocument: All documents available for blueprint.

Return type: List[BlueprintTaskDocument]

classmethod from_data(data)¶

Instantiate an object of this class using a dict.

Parameters

datadict: Correctly snake_cased keys and their values.

Return type: TypeVar(T, bound= APIObject)

classmethod from_server_data(data, keep_attrs=None)¶

Instantiate an object of this class using the data directly from the server, meaning that the keys may have the wrong camel casing

Parameters

datadict: The directly translated dict of JSON from the server. No casing fixes have taken place
keep_attrsiterable: List, set or tuple of the dotted namespace notations for attributes to keep within the object structure even if their values are None

Return type: TypeVar(T, bound= APIObject)

class datarobot.models.BlueprintTaskDocument(title=None, task=None, description=None, parameters=None, links=None, references=None)¶

Document describing a task from a blueprint.

Attributes

titlestr: Title of document.
taskstr: Name of the task described in document.
descriptionstr: Task description.
parameterslist of dict(name, type, description): Parameters that task can receive in human-readable format.
linkslist of dict(name, url): External links used in document
referenceslist of dict(name, url): References used in document. When no link available url equals None.

class datarobot.models.BlueprintChart(nodes, edges)¶

A Blueprint chart that can be used to understand data flow in blueprint.

Attributes

nodeslist of dict (id, label): Chart nodes, id unique in chart.
edgeslist of tuple (id1, id2): Directions of data flow between blueprint chart nodes.

classmethod get(project_id, blueprint_id)¶

Retrieve a blueprint chart.

Parameters

project_idstr: The project’s id.
blueprint_idstr: Id of blueprint to retrieve chart.

Returns

BlueprintChart: The queried blueprint chart.

Return type: BlueprintChart

to_graphviz()¶

Get blueprint chart in graphviz DOT format.

Returns

unicode: String representation of chart in graphviz DOT language.

Return type: str

class datarobot.models.ModelBlueprintChart(nodes, edges)¶

A Blueprint chart that can be used to understand data flow in model. Model blueprint chart represents reduced repository blueprint chart with only elements that used to build this particular model.

Attributes

nodeslist of dict (id, label): Chart nodes, id unique in chart.
edgeslist of tuple (id1, id2): Directions of data flow between blueprint chart nodes.

classmethod get(project_id, model_id)¶

Retrieve a model blueprint chart.

Parameters

project_idstr: The project’s id.
model_idstr: Id of model to retrieve model blueprint chart.

Returns

ModelBlueprintChart: The queried model blueprint chart.

Return type: ModelBlueprintChart

to_graphviz()¶

Get blueprint chart in graphviz DOT format.

Returns

unicode: String representation of chart in graphviz DOT language.

Return type: str

Calendar File¶

class datarobot.CalendarFile(calendar_end_date=None, calendar_start_date=None, created=None, id=None, name=None, num_event_types=None, num_events=None, project_ids=None, role=None, multiseries_id_columns=None)¶

Represents the data for a calendar file.

For more information about calendar files, see the calendar documentation.

Attributes

idstr: The id of the calendar file.
calendar_start_datestr: The earliest date in the calendar.
calendar_end_datestr: The last date in the calendar.
createdstr: The date this calendar was created, i.e. uploaded to DR.
namestr: The name of the calendar.
num_event_typesint: The number of different event types.
num_eventsint: The number of events this calendar has.
project_idslist of strings: A list containing the projectIds of the projects using this calendar.
multiseries_id_columns: list of str or None: A list of columns in calendar which uniquely identify events for different series. Currently, only one column is supported. If multiseries id columns are not provided, calendar is considered to be single series.
rolestr: The access role the user has for this calendar.

classmethod create(file_path, calendar_name=None, multiseries_id_columns=None)¶

Creates a calendar using the given file. For information about calendar files, see the calendar documentation

The provided file must be a CSV in the format:

Date,   Event,          Series ID,    Event Duration
<date>, <event_type>,   <series id>,  <event duration>
<date>, <event_type>,              ,  <event duration>

A header row is required, and the “Series ID” and “Event Duration” columns are optional.

Once the CalendarFile has been created, pass its ID with the DatetimePartitioningSpecification when setting the target for a time series project in order to use it.

Parameters

file_pathstring: A string representing a path to a local csv file.
calendar_namestring, optional: A name to assign to the calendar. Defaults to the name of the file if not provided.
multiseries_id_columnslist of str or None: A list of the names of multiseries id columns to define which series an event belongs to. Currently only one multiseries id column is supported.

Returns

calendar_fileCalendarFile: Instance with initialized data.

Raises

AsyncProcessUnsuccessfulError: Raised if there was an error processing the provided calendar file.

Examples

# Creating a calendar with a specified name
cal = dr.CalendarFile.create('/home/calendars/somecalendar.csv',
                                         calendar_name='Some Calendar Name')
cal.id
>>> 5c1d4904211c0a061bc93013
cal.name
>>> Some Calendar Name

# Creating a calendar without specifying a name
cal = dr.CalendarFile.create('/home/calendars/somecalendar.csv')
cal.id
>>> 5c1d4904211c0a061bc93012
cal.name
>>> somecalendar.csv

# Creating a calendar with multiseries id columns
cal = dr.CalendarFile.create('/home/calendars/somemultiseriescalendar.csv',
                             calendar_name='Some Multiseries Calendar Name',
                             multiseries_id_columns=['series_id'])
cal.id
>>> 5da9bb21962d746f97e4daee
cal.name
>>> Some Multiseries Calendar Name
cal.multiseries_id_columns
>>> ['series_id']

Return type: CalendarFile

classmethod create_calendar_from_dataset(dataset_id, dataset_version_id=None, calendar_name=None, multiseries_id_columns=None, delete_on_error=False)¶

Creates a calendar using the given dataset. For information about calendar files, see the calendar documentation

The provided dataset have the following format:

Date,   Event,          Series ID,    Event Duration
<date>, <event_type>,   <series id>,  <event duration>
<date>, <event_type>,              ,  <event duration>

The “Series ID” and “Event Duration” columns are optional.

Once the CalendarFile has been created, pass its ID with the DatetimePartitioningSpecification when setting the target for a time series project in order to use it.

Parameters

dataset_idstring: The identifier of the dataset from which to create the calendar.
dataset_version_idstring, optional: The identifier of the dataset version from which to create the calendar.
calendar_namestring, optional: A name to assign to the calendar. Defaults to the name of the dataset if not provided.
multiseries_id_columnslist of str, optional: A list of the names of multiseries id columns to define which series an event belongs to. Currently only one multiseries id column is supported.
delete_on_errorboolean, optional: Whether delete calendar file from Catalog if it’s not valid.

Returns

calendar_fileCalendarFile: Instance with initialized data.

Raises

AsyncProcessUnsuccessfulError: Raised if there was an error processing the provided calendar file.

Examples

# Creating a calendar from a dataset
dataset = dr.Dataset.create_from_file('/home/calendars/somecalendar.csv')
cal = dr.CalendarFile.create_calendar_from_dataset(
    dataset.id, calendar_name='Some Calendar Name'
)
cal.id
>>> 5c1d4904211c0a061bc93013
cal.name
>>> Some Calendar Name

# Creating a calendar from a new dataset version
new_dataset_version = dr.Dataset.create_version_from_file(
    dataset.id, '/home/calendars/anothercalendar.csv'
)
cal = dr.CalendarFile.create(
    new_dataset_version.id, dataset_version_id=new_dataset_version.version_id
)
cal.id
>>> 5c1d4904211c0a061bc93012
cal.name
>>> anothercalendar.csv

Return type: CalendarFile

classmethod create_calendar_from_country_code(country_code, start_date, end_date)¶

Generates a calendar based on the provided country code and dataset start date and end dates. The provided country code should be uppercase and 2-3 characters long. See CalendarFile.get_allowed_country_codes for a list of allowed country codes.

Parameters

country_codestring: The country code for the country to use for generating the calendar.
start_datedatetime.datetime: The earliest date to include in the generated calendar.
end_datedatetime.datetime: The latest date to include in the generated calendar.

Returns

calendar_fileCalendarFile: Instance with initialized data.

Return type: CalendarFile

classmethod get_allowed_country_codes(offset=None, limit=None)¶

Retrieves the list of allowed country codes that can be used for generating the preloaded calendars.

Parameters

offsetint: Optional, defaults to 0. This many results will be skipped.
limitint: Optional, defaults to 100, maximum 1000. At most this many results are returned.

Returns

list

A list dicts, each of which represents an allowed country codes. Each item has the following structure:

name : (str) The name of the country.
code : (str) The code for this country. This is the value that should be supplied to CalendarFile.create_calendar_from_country_code.

Return type: List[CountryCode]

classmethod get(calendar_id)¶

Gets the details of a calendar, given the id.

Parameters

calendar_idstr: The identifier of the calendar.

Returns

calendar_fileCalendarFile: The requested calendar.

Raises

DataError: Raised if the calendar_id is invalid, i.e. the specified CalendarFile does not exist.

Examples

cal = dr.CalendarFile.get(some_calendar_id)
cal.id
>>> some_calendar_id

Return type: CalendarFile

classmethod list(project_id=None, batch_size=None)¶

Gets the details of all calendars this user has view access for.

Parameters

project_idstr, optional: If provided, will filter for calendars associated only with the specified project.
batch_sizeint, optional: The number of calendars to retrieve in a single API call. If specified, the client may make multiple calls to retrieve the full list of calendars. If not specified, an appropriate default will be chosen by the server.

Returns

calendar_listlist of CalendarFile: A list of CalendarFile objects.

Examples

calendars = dr.CalendarFile.list()
len(calendars)
>>> 10

Return type: List[CalendarFile]

classmethod delete(calendar_id)¶

Deletes the calendar specified by calendar_id.

Parameters

calendar_idstr: The id of the calendar to delete. The requester must have OWNER access for this calendar.

Raises

ClientError: Raised if an invalid calendar_id is provided.

Examples

# Deleting with a valid calendar_id
status_code = dr.CalendarFile.delete(some_calendar_id)
status_code
>>> 204
dr.CalendarFile.get(some_calendar_id)
>>> ClientError: Item not found

Return type: None

classmethod update_name(calendar_id, new_calendar_name)¶

Changes the name of the specified calendar to the specified name. The requester must have at least READ_WRITE permissions on the calendar.

Parameters

calendar_idstr: The id of the calendar to update.
new_calendar_namestr: The new name to set for the specified calendar.

Returns

status_codeint: 200 for success

Raises

ClientError: Raised if an invalid calendar_id is provided.

Examples

response = dr.CalendarFile.update_name(some_calendar_id, some_new_name)
response
>>> 200
cal = dr.CalendarFile.get(some_calendar_id)
cal.name
>>> some_new_name

Return type: int

classmethod share(calendar_id, access_list)¶

Shares the calendar with the specified users, assigning the specified roles.

Parameters

calendar_idstr: The id of the calendar to update
access_list:: A list of dr.SharingAccess objects. Specify None for the role to delete a user’s access from the specified CalendarFile. For more information on specific access levels, see the sharing documentation.

Returns

status_codeint: 200 for success

Raises

ClientError: Raised if unable to update permissions for a user.
AssertionError: Raised if access_list is invalid.

Examples

# assuming some_user is a valid user, share this calendar with some_user
sharing_list = [dr.SharingAccess(some_user_username,
                                 dr.enums.SHARING_ROLE.READ_WRITE)]
response = dr.CalendarFile.share(some_calendar_id, sharing_list)
response.status_code
>>> 200

# delete some_user from this calendar, assuming they have access of some kind already
delete_sharing_list = [dr.SharingAccess(some_user_username,
                                        None)]
response = dr.CalendarFile.share(some_calendar_id, delete_sharing_list)
response.status_code
>>> 200

# Attempt to add an invalid user to a calendar
invalid_sharing_list = [dr.SharingAccess(invalid_username,
                                         dr.enums.SHARING_ROLE.READ_WRITE)]
dr.CalendarFile.share(some_calendar_id, invalid_sharing_list)
>>> ClientError: Unable to update access for this calendar

Return type: int

classmethod get_access_list(calendar_id, batch_size=None)¶

Retrieve a list of users that have access to this calendar.

Parameters

calendar_idstr: The id of the calendar to retrieve the access list for.
batch_sizeint, optional: The number of access records to retrieve in a single API call. If specified, the client may make multiple calls to retrieve the full list of calendars. If not specified, an appropriate default will be chosen by the server.

Returns

access_control_listlist of SharingAccess: A list of SharingAccess objects.

Raises

ClientError: Raised if user does not have access to calendar or calendar does not exist.

Return type: List[SharingAccess]

class datarobot.models.calendar_file.CountryCode() -> new empty dictionary dict(mapping) -> new dictionary initialized from a mapping object's (key, value) pairs dict(iterable) -> new dictionary initialized as if via: d = {} for k, v in iterable: d[k] = v dict(**kwargs) -> new dictionary initialized with the name=value pairs in the keyword argument list. For example: dict(one=1, two=2)¶

Automated Documentation¶

class datarobot.models.automated_documentation.AutomatedDocument(entity_id=None, document_type=None, output_format=None, locale=None, template_id=None, id=None, filepath=None, created_at=None)¶

An automated documentation object.

New in version v2.24.

Attributes

document_typestr or None: Type of automated document. You can specify: MODEL_COMPLIANCE, AUTOPILOT_SUMMARY depending on your account settings. Required for document generation.
entity_idstr or None: ID of the entity to generate the document for. It can be model ID or project ID. Required for document generation.
output_formatstr or None: Format of the generate document, either docx or html. Required for document generation.
localestr or None: Localization of the document, dependent on your account settings. Default setting is EN_US.
template_idstr or None: Template ID to use for the document outline. Defaults to standard DataRobot template. See the documentation for ComplianceDocTemplate for more information.
idstr or None: ID of the document. Required to download or delete a document.
filepathstr or None: Path to save a downloaded document to. Either include a file path and name or the file will be saved to the directory from which the script is launched.
created_atdatetime or None: Document creation timestamp.

classmethod list_available_document_types()¶

Get a list of all available document types and locales.

Returns

List of dicts

Examples

import datarobot as dr

dr.Client(token=my_token, endpoint=endpoint)
doc_types = dr.AutomatedDocument.list_available_document_types()

Return type: List[DocumentOption]

property is_model_compliance_initialized: Tuple[bool, str]¶

Check if model compliance documentation pre-processing is initialized. Model compliance documentation pre-processing must be initialized before generating documentation for a custom model.

Returns

Tuple of (boolean, string)

boolean flag is whether model compliance documentation pre-processing is initialized
string value is the initialization status

Return type: Tuple[bool, str]

initialize_model_compliance()¶

Initialize model compliance documentation pre-processing. Must be called before generating documentation for a custom model.

Returns

Tuple of (boolean, string)

boolean flag is whether model compliance documentation pre-processing is initialized
string value is the initialization status

Examples

import datarobot as dr

dr.Client(token=my_token, endpoint=endpoint)

# NOTE: entity_id is either a model id or a model package id
doc = dr.AutomatedDocument(
        document_type="MODEL_COMPLIANCE",
        entity_id="6f50cdb77cc4f8d1560c3ed5",
        output_format="docx",
        locale="EN_US")

doc.initialize_model_compliance()

Return type: Tuple[bool, str]

generate(max_wait=600)¶

Request generation of an automated document.

Required attributes to request document generation: document_type, entity_id, and output_format.

Returns

requests.models.Response

Examples

import datarobot as dr

dr.Client(token=my_token, endpoint=endpoint)

doc = dr.AutomatedDocument(
        document_type="MODEL_COMPLIANCE",
        entity_id="6f50cdb77cc4f8d1560c3ed5",
        output_format="docx",
        locale="EN_US",
        template_id="50efc9db8aff6c81a374aeec",
        filepath="/Users/username/Documents/example.docx"
        )

doc.generate()
doc.download()

Return type: Response

download()¶

Download a generated Automated Document. Document ID is required to download a file.

Returns

requests.models.Response

Examples

Generating and downloading the generated document:

import datarobot as dr

dr.Client(token=my_token, endpoint=endpoint)

doc = dr.AutomatedDocument(
        document_type="AUTOPILOT_SUMMARY",
        entity_id="6050d07d9da9053ebb002ef7",
        output_format="docx",
        filepath="/Users/username/Documents/Project_Report_1.docx"
        )

doc.generate()
doc.download()

Downloading an earlier generated document when you know the document ID:

import datarobot as dr

dr.Client(token=my_token, endpoint=endpoint)
doc = dr.AutomatedDocument(id='5e8b6a34d2426053ab9a39ed')
doc.download()

Notice that filepath was not set for this document. In this case, the file is saved to the directory from which the script was launched.

Downloading a document chosen from a list of earlier generated documents:

import datarobot as dr

dr.Client(token=my_token, endpoint=endpoint)

model_id = "6f5ed3de855962e0a72a96fe"
docs = dr.AutomatedDocument.list_generated_documents(entity_ids=[model_id])
doc = docs[0]
doc.filepath = "/Users/me/Desktop/Recommended_model_doc.docx"
doc.download()

Return type: Response

delete()¶

Delete a document using its ID.

Returns

requests.models.Response

Examples

import datarobot as dr

dr.Client(token=my_token, endpoint=endpoint)
doc = dr.AutomatedDocument(id="5e8b6a34d2426053ab9a39ed")
doc.delete()

If you don’t know the document ID, you can follow the same workflow to get the ID as in the examples for the AutomatedDocument.download method.

Return type: Response

classmethod list_generated_documents(document_types=None, entity_ids=None, output_formats=None, locales=None, offset=None, limit=None)¶

Get information about all previously generated documents available for your account. The information includes document ID and type, ID of the entity it was generated for, time of creation, and other information.

Parameters

document_typesList of str or None: Query for one or more document types.
entity_idsList of str or None: Query generated documents by one or more entity IDs.
output_formatsList of str or None: Query for one or more output formats.
localesList of str or None: Query generated documents by one or more locales.
offset: int or None: Number of items to skip. Defaults to 0 if not provided.
limit: int or None: Number of items to return, maximum number of items is 1000.

Returns

List of AutomatedDocument objects, where each object contains attributes described in
AutomatedDocument

Examples

To get a list of all generated documents:

import datarobot as dr

dr.Client(token=my_token, endpoint=endpoint)
docs = AutomatedDocument.list_generated_documents()

To get a list of all AUTOPILOT_SUMMARY documents:

import datarobot as dr

dr.Client(token=my_token, endpoint=endpoint)
docs = AutomatedDocument.list_generated_documents(document_types=["AUTOPILOT_SUMMARY"])

To get a list of 5 recently created automated documents in html format:

import datarobot as dr

dr.Client(token=my_token, endpoint=endpoint)
docs = AutomatedDocument.list_generated_documents(output_formats=["html"], limit=5)

To get a list of automated documents created for specific entities (projects or models):

import datarobot as dr

dr.Client(token=my_token, endpoint=endpoint)
docs = AutomatedDocument.list_generated_documents(
    entity_ids=["6051d3dbef875eb3be1be036",
                "6051d3e1fbe65cd7a5f6fde6",
                "6051d3e7f86c04486c2f9584"]
    )

Note, that the list of results contains AutomatedDocument objects, which means that you can execute class-related methods on them. Here’s how you can list, download, and then delete from the server all automated documents related to a certain entity:

import datarobot as dr

dr.Client(token=my_token, endpoint=endpoint)

ids = ["6051d3dbef875eb3be1be036", "5fe1d3d55cd810ebdb60c517f"]
docs = AutomatedDocument.list_generated_documents(entity_ids=ids)
for doc in docs:
    doc.download()
    doc.delete()

Return type: List[AutomatedDocument]

class datarobot.models.automated_documentation.DocumentOption() -> new empty dictionary dict(mapping) -> new dictionary initialized from a mapping object's (key, value) pairs dict(iterable) -> new dictionary initialized as if via: d = {} for k, v in iterable: d[k] = v dict(**kwargs) -> new dictionary initialized with the name=value pairs in the keyword argument list. For example: dict(one=1, two=2)¶

Class Mapping Aggregation Settings¶

For multiclass projects with a lot of unique values in target column you can specify the parameters for aggregation of rare values to improve the modeling performance and decrease the runtime and resource usage of resulting models.

class datarobot.helpers.ClassMappingAggregationSettings(max_unaggregated_class_values=None, min_class_support=None, excluded_from_aggregation=None, aggregation_class_name=None)¶

Class mapping aggregation settings. For multiclass projects allows fine control over which target values will be preserved as classes. Classes which aren’t preserved will be - aggregated into a single “catch everything else” class in case of multiclass - or will be ignored in case of multilabel. All attributes are optional, if not specified - server side defaults will be used.

Attributes

max_unaggregated_class_valuesint, optional: Maximum amount of unique values allowed before aggregation kicks in.
min_class_supportint, optional: Minimum number of instances necessary for each target value in the dataset. All values with less instances will be aggregated.
excluded_from_aggregationlist, optional: List of target values that should be guaranteed to kept as is, regardless of other settings.
aggregation_class_namestr, optional: If some of the values will be aggregated - this is the name of the aggregation class that will replace them.

Client Configuration¶

datarobot.client.Client(token=None, endpoint=None, config_path=None, connect_timeout=None, user_agent_suffix=None, ssl_verify=True, max_retries=None, token_type='Token', default_use_case=None, enable_api_consumer_tracking=None, trace_context=None)¶

Configures the global API client for the Python SDK. The client will be configured in one of the following ways, in order of priority.

From call args iff token and endpoint kwargs are specified;

From a YAML file at the path specified in the config_path kwarg;

From a YAML file at the path specified in the env var DATAROBOT_CONFIG_FILE;

From env vars, iff DATAROBOT_ENDPOINT and DATAROBOT_API_TOKEN are specified;

From a YAML file at the path $HOME/.config/datarobot/drconfig.yaml.

Note

All client configuration must be done via a single method; there is no fall back to lower priority methods.

This can also have the side effect of setting a default Use Case for client API requests.

Parameters

tokenstr, optional: API token
endpointstr, optional: Base url of API
config_pathstr, optional: Alternate location of config file
connect_timeoutint, optional: How long the client should be willing to wait before establishing a connection with the server.
user_agent_suffixstr, optional: Additional text that is appended to the User-Agent HTTP header when communicating with the DataRobot REST API. This can be useful for identifying different applications that are built on top of the DataRobot Python Client, which can aid debugging and help track usage.
ssl_verifybool or str, optional: Whether to check SSL certificate. Could be set to path with certificates of trusted certification authorities.
max_retriesint or datarobot.rest.Retry, optional: Either an integer number of times to retry connection errors, or a urllib3.util.retry.Retry object to configure retries.
token_type: str, “Token” by default: Authentication token type: Token, Bearer. “Bearer” is for DataRobot OAuth2 token, “Token” for token generated in Developer Tools.
default_use_case: str, optional: The entity ID of the default Use Case to use with any requests made by the client.
enable_api_consumer_tracking: bool, optional: Enable and disable user metrics tracking within the datarobot module. Default: False.
trace_context: str, optional: An ID or other string for identifying which code template or AI Accelerator was used to make a request.
Returns
——-: The RESTClientObject instance created.

Return type: RESTClientObject

datarobot.client.get_client()¶

Returns the global HTTP client for the Python SDK, instantiating it if necessary.

Return type: RESTClientObject

datarobot.client.set_client(client)¶

Configure the global HTTP client for the Python SDK. Returns previous instance.

Return type: Optional[RESTClientObject]

datarobot.client.client_configuration(*args, **kwargs)¶

This context manager can be used to temporarily change the global HTTP client.

In multithreaded scenarios, it is highly recommended to use a fresh manager object per thread.

DataRobot does not recommend nesting these contexts.

Parameters

argsParameters passed to datarobot.client.Client()
kwargsKeyword arguments passed to datarobot.client.Client()

Examples

from datarobot.client import client_configuration
from datarobot.models import Project

with client_configuration(token="api-key-here", endpoint="https://host-name.com"):
    Project.list()

from datarobot.client import Client, client_configuration
from datarobot.models import Project

Client()  # Interact with DataRobot using the default configuration.
Project.list()

with client_configuration(config_path="/path/to/a/drconfig.yaml"):
    # Interact with DataRobot using a different configuration.
    Project.list()

class datarobot.rest.RESTClientObject(auth, endpoint, connect_timeout=6.05, verify=True, user_agent_suffix=None, max_retries=None, authentication_type=None)¶

Parameters

connect_timeout: timeout for http request and connection
headers: headers for outgoing requests

open_in_browser()¶

Opens the DataRobot app in a web browser, or logs the URL if a browser is not available.

Return type: None

Clustering¶

class datarobot.models.ClusteringModel(id=None, processes=None, featurelist_name=None, featurelist_id=None, project_id=None, sample_pct=None, training_row_count=None, training_duration=None, training_start_date=None, training_end_date=None, model_type=None, model_category=None, is_frozen=None, is_n_clusters_dynamically_determined=None, blueprint_id=None, metrics=None, project=None, monotonic_increasing_featurelist_id=None, monotonic_decreasing_featurelist_id=None, n_clusters=None, has_empty_clusters=None, supports_monotonic_constraints=None, is_starred=None, prediction_threshold=None, prediction_threshold_read_only=None, model_number=None, parent_model_id=None, use_project_settings=None, supports_composable_ml=None)¶

ClusteringModel extends Model class. It provides provides properties and methods specific to clustering projects.

compute_insights(max_wait=600)¶

Compute and retrieve cluster insights for model. This method awaits completion of job computing cluster insights and returns results after it is finished. If computation takes longer than specified max_wait exception will be raised.

Parameters

project_id: str: Project to start creation in.
model_id: str: Project’s model to start creation in.
max_wait: int: Maximum number of seconds to wait before giving up

Returns

List of ClusterInsight

Raises

ClientError: Server rejected creation due to client error. Most likely cause is bad project_id or model_id.
AsyncFailureError: If any of the responses from the server are unexpected
AsyncProcessUnsuccessfulError: If the cluster insights computation has failed or was cancelled.
AsyncTimeoutError: If the cluster insights computation did not resolve in time

Return type: List[ClusterInsight]

property insights: List[datarobot.models.cluster_insight.ClusterInsight]¶

Return actual list of cluster insights if already computed.

Returns

List of ClusterInsight

Return type: List[ClusterInsight]

property clusters: List[datarobot.models.cluster.Cluster]¶

Return actual list of Clusters.

Returns

List of Cluster

Return type: List[Cluster]

update_cluster_names(cluster_name_mappings)¶

Change many cluster names at once based on list of name mappings.

Parameters

cluster_name_mappings: List of tuples

Cluster names mapping consisting of current cluster name and old cluster name. Example:

cluster_name_mappings = [
    ("current cluster name 1", "new cluster name 1"),
    ("current cluster name 2", "new cluster name 2")]

Returns

List of Cluster

Raises

datarobot.errors.ClientError: Server rejected update of cluster names. Possible reasons include: incorrect format of mapping, mapping introduces duplicates.

Return type: List[Cluster]

update_cluster_name(current_name, new_name)¶

Change cluster name from current_name to new_name.

Parameters

current_name: str: Current cluster name.
new_name: str: New cluster name.

Returns

List of Cluster

Raises

datarobot.errors.ClientError: Server rejected update of cluster names.

Return type: List[Cluster]

class datarobot.models.cluster.Cluster(**kwargs)¶

Representation of a single cluster.

Attributes

name: str: Current cluster name
percent: float: Percent of data contained in the cluster. This value is reported after cluster insights are computed for the model.

classmethod list(project_id, model_id)¶

Retrieve a list of clusters in the model.

Parameters

project_id: str: ID of the project that the model is part of.
model_id: str: ID of the model.

Returns

List of clusters

Return type: List[Cluster]

classmethod update_multiple_names(project_id, model_id, cluster_name_mappings)¶

Update many clusters at once based on list of name mappings.

Parameters

project_id: str

ID of the project that the model is part of.

model_id: str

ID of the model.

cluster_name_mappings: List of tuples

Cluster name mappings, consisting of current and previous names for each cluster. Example:

cluster_name_mappings = [
    ("current cluster name 1", "new cluster name 1"),
    ("current cluster name 2", "new cluster name 2")]

Returns

List of clusters

Raises

datarobot.errors.ClientError: Server rejected update of cluster names.
ValueError: Invalid cluster name mapping provided.

Return type: List[Cluster]

classmethod update_name(project_id, model_id, current_name, new_name)¶

Change cluster name from current_name to new_name

Parameters

project_id: str: ID of the project that the model is part of.
model_id: str: ID of the model.
current_name: str: Current cluster name
new_name: str: New cluster name

Returns

List of Cluster

Return type: List[Cluster]

class datarobot.models.cluster_insight.ClusterInsight(**kwargs)¶

Holds data on all insights related to feature as well as breakdown per cluster.

Parameters

feature_name: str: Name of a feature from the dataset.
feature_type: str: Type of feature.
insightsList of classes (ClusterInsight): List provides information regarding the importance of a specific feature in relation to each cluster. Results help understand how the model is grouping data and what each cluster represents.
feature_impact: float: Impact of a feature ranging from 0 to 1.

classmethod compute(project_id, model_id, max_wait=600)¶

Starts creation of cluster insights for the model and if successful, returns computed ClusterInsights. This method allows calculation to continue for a specified time and if not complete, cancels the request.

Parameters

project_id: str: ID of the project to begin creation of cluster insights for.
model_id: str: ID of the project model to begin creation of cluster insights for.
max_wait: int: Maximum number of seconds to wait canceling the request.

Returns

List[ClusterInsight]

Raises

ClientError: Server rejected creation due to client error. Most likely cause is bad project_id or model_id.
AsyncFailureError: Indicates whether any of the responses from the server are unexpected.
AsyncProcessUnsuccessfulError: Indicates whether the cluster insights computation failed or was cancelled.
AsyncTimeoutError: Indicates whether the cluster insights computation did not resolve within the specified time limit (max_wait).

Return type: List[ClusterInsight]

Compliance Documentation Templates¶

class datarobot.models.compliance_doc_template.ComplianceDocTemplate(id, creator_id, creator_username, name, org_id=None, sections=None)¶

A compliance documentation template. Templates are used to customize contents of AutomatedDocument.

New in version v2.14.

Notes

Each section dictionary has the following schema:

title : title of the section
type : type of section. Must be one of “datarobot”, “user” or “table_of_contents”.

Each type of section has a different set of attributes described bellow.

Section of type "datarobot" represent a section owned by DataRobot. DataRobot sections have the following additional attributes:

content_id : The identifier of the content in this section. You can get the default template with get_default for a complete list of possible DataRobot section content ids.
sections : list of sub-section dicts nested under the parent section.

Section of type "user" represent a section with user-defined content. Those sections may contain text generated by user and have the following additional fields:

regularText : regular text of the section, optionally separated by \n to split paragraphs.
highlightedText : highlighted text of the section, optionally separated by \n to split paragraphs.
sections : list of sub-section dicts nested under the parent section.

Section of type "table_of_contents" represent a table of contents and has no additional attributes.

Attributes

idstr: the id of the template
namestr: the name of the template.
creator_idstr: the id of the user who created the template
creator_usernamestr: username of the user who created the template
org_idstr: the id of the organization the template belongs to
sectionslist of dicts: the sections of the template describing the structure of the document. Section schema is described in Notes section above.

classmethod get_default(template_type=None)¶

Get a default DataRobot template. This template is used for generating compliance documentation when no template is specified.

Parameters

template_typestr or None: Type of the template. Currently supported values are “normal” and “time_series”

Returns

templateComplianceDocTemplate: the default template object with sections attribute populated with default sections.

Return type: ComplianceDocTemplate

classmethod create_from_json_file(name, path)¶

Create a template with the specified name and sections in a JSON file.

This is useful when working with sections in a JSON file. Example:

default_template = ComplianceDocTemplate.get_default()
default_template.sections_to_json_file('path/to/example.json')
# ... edit example.json in your editor
my_template = ComplianceDocTemplate.create_from_json_file(
    name='my template',
    path='path/to/example.json'
)

Parameters

namestr: the name of the template. Must be unique for your user.
pathstr: the path to find the JSON file at

Returns

templateComplianceDocTemplate: the created template

Return type: ComplianceDocTemplate

classmethod create(name, sections)¶

Create a template with the specified name and sections.

Parameters

namestr: the name of the template. Must be unique for your user.
sectionslist: list of section objects

Returns

templateComplianceDocTemplate: the created template

Return type: ComplianceDocTemplate

classmethod get(template_id)¶

Retrieve a specific template.

Parameters

template_idstr: the id of the template to retrieve

Returns

templateComplianceDocTemplate: the retrieved template

Return type: ComplianceDocTemplate

classmethod list(name_part=None, limit=None, offset=None)¶

Get a paginated list of compliance documentation template objects.

Parameters

name_partstr or None: Return only the templates with names matching specified string. The matching is case-insensitive.
limitint: The number of records to return. The server will use a (possibly finite) default if not specified.
offsetint: The number of records to skip.

Returns

templateslist of ComplianceDocTemplate: the list of template objects

Return type: List[ComplianceDocTemplate]

sections_to_json_file(path, indent=2)¶

Save sections of the template to a json file at the specified path

Parameters

pathstr: the path to save the file to
indentint: indentation to use in the json file.

Return type: None

update(name=None, sections=None)¶

Update the name or sections of an existing doc template.

Note that default or non-existent templates can not be updated.

Parameters

namestr, optional: the new name for the template
sectionslist of dicts: list of sections

Return type: None

delete()¶

Delete the compliance documentation template.

Return type: None

Confusion Chart¶

class datarobot.models.confusion_chart.ConfusionChart(source, data, source_model_id)¶

Confusion Chart data for model.

Notes

ClassMetrics is a dict containing the following:

class_name (string) name of the class

actual_count (int) number of times this class is seen in the validation data

predicted_count (int) number of times this class has been predicted for the validation data

f1 (float) F1 score

recall (float) recall score

precision (float) precision score

was_actual_percentages (list of dict) one vs all actual percentages in format specified below.

other_class_name (string) the name of the other class

percentage (float) the percentage of the times this class was predicted when is was actually class (from 0 to 1)

was_predicted_percentages (list of dict) one vs all predicted percentages in format specified below.

other_class_name (string) the name of the other class

percentage (float) the percentage of the times this class was actual predicted (from 0 to 1)

confusion_matrix_one_vs_all (list of list) 2d list representing 2x2 one vs all matrix.

This represents the True/False Negative/Positive rates as integer for each class. The data structure looks like:

[ [ True Negative, False Positive ], [ False Negative, True Positive ] ]

Attributes

sourcestr: Confusion Chart data source. Can be ‘validation’, ‘crossValidation’ or ‘holdout’.
raw_datadict: All of the raw data for the Confusion Chart
confusion_matrixlist of list: The NxN confusion matrix
classeslist: The names of each of the classes
class_metricslist of dicts: List of dicts with schema described as ClassMetrics above.
source_model_idstr: ID of the model this Confusion chart represents; in some cases, insights from the parent of a frozen model may be used

Credentials¶

class datarobot.models.Credential(credential_id=None, name=None, credential_type=None, creation_date=None, description=None)¶

classmethod list()¶

Returns list of available credentials.

Returns

credentialslist of Credential instances: contains a list of available credentials.

Examples

>>> import datarobot as dr
>>> data_sources = dr.Credential.list()
>>> data_sources
[
    Credential('5e429d6ecf8a5f36c5693e03', 'my_s3_cred', 's3'),
    Credential('5e42cc4dcf8a5f3256865840', 'my_jdbc_cred', 'jdbc'),
]

Return type: List[Credential]

classmethod get(credential_id)¶

Gets the Credential.

Parameters

credential_idstr: the identifier of the credential.

Returns

credentialCredential: the requested credential.

Examples

>>> import datarobot as dr
>>> cred = dr.Credential.get('5a8ac9ab07a57a0001be501f')
>>> cred
Credential('5e429d6ecf8a5f36c5693e03', 'my_s3_cred', 's3'),

Return type: Credential

delete()¶

Deletes the Credential the store.

Parameters

credential_idstr: the identifier of the credential.

Returns

credentialCredential: the requested credential.

Examples

>>> import datarobot as dr
>>> cred = dr.Credential.get('5a8ac9ab07a57a0001be501f')
>>> cred.delete()

Return type: None

classmethod create_basic(name, user, password, description=None)¶

Creates the credentials.

Parameters

namestr: the name to use for this set of credentials.
userstr: the username to store for this set of credentials.
passwordstr: the password to store for this set of credentials.
descriptionstr, optional: the description to use for this set of credentials.

Returns

credentialCredential: the created credential.

Examples

>>> import datarobot as dr
>>> cred = dr.Credential.create_basic(
...     name='my_basic_cred',
...     user='username',
...     password='password',
... )
>>> cred
Credential('5e429d6ecf8a5f36c5693e03', 'my_basic_cred', 'basic'),

Return type: Credential

classmethod create_oauth(name, token, refresh_token, description=None)¶

Creates the OAUTH credentials.

Parameters

namestr: the name to use for this set of credentials.
token: str: the OAUTH token
refresh_token: str: The OAUTH token
descriptionstr, optional: the description to use for this set of credentials.

Returns

credentialCredential: the created credential.

Examples

>>> import datarobot as dr
>>> cred = dr.Credential.create_oauth(
...     name='my_oauth_cred',
...     token='XXX',
...     refresh_token='YYY',
... )
>>> cred
Credential('5e429d6ecf8a5f36c5693e03', 'my_oauth_cred', 'oauth'),

Return type: Credential

classmethod create_s3(name, aws_access_key_id=None, aws_secret_access_key=None, aws_session_token=None, description=None)¶

Creates the S3 credentials.

Parameters

namestr: the name to use for this set of credentials.
aws_access_key_idstr, optional: the AWS access key id.
aws_secret_access_keystr, optional: the AWS secret access key.
aws_session_tokenstr, optional: the AWS session token.
descriptionstr, optional: the description to use for this set of credentials.

Returns

credentialCredential: the created credential.

Examples

>>> import datarobot as dr
>>> cred = dr.Credential.create_s3(
...     name='my_s3_cred',
...     aws_access_key_id='XXX',
...     aws_secret_access_key='YYY',
...     aws_session_token='ZZZ',
... )
>>> cred
Credential('5e429d6ecf8a5f36c5693e03', 'my_s3_cred', 's3'),

Return type: Credential

classmethod create_azure(name, azure_connection_string, description=None)¶

Creates the Azure storage credentials.

Parameters

namestr: the name to use for this set of credentials.
azure_connection_stringstr: the Azure connection string.
descriptionstr, optional: the description to use for this set of credentials.

Returns

credentialCredential: the created credential.

Examples

>>> import datarobot as dr
>>> cred = dr.Credential.create_azure(
...     name='my_azure_cred',
...     azure_connection_string='XXX',
... )
>>> cred
Credential('5e429d6ecf8a5f36c5693e03', 'my_azure_cred', 'azure'),

Return type: Credential

classmethod create_gcp(name, gcp_key=None, description=None)¶

Creates the GCP credentials.

Parameters

namestr: the name to use for this set of credentials.
gcp_keystr | dict: the GCP key in json format or parsed as dict.
descriptionstr, optional: the description to use for this set of credentials.

Returns

credentialCredential: the created credential.

Examples

>>> import datarobot as dr
>>> cred = dr.Credential.create_gcp(
...     name='my_gcp_cred',
...     gcp_key='XXX',
... )
>>> cred
Credential('5e429d6ecf8a5f36c5693e03', 'my_gcp_cred', 'gcp'),

Return type: Credential

update(name=None, description=None, **kwargs)¶

Update the credential values of an existing credential. Updates this object in place.

New in version v3.2.

Parameters

namestr: The name to use for this set of credentials.
descriptionstr, optional: The description to use for this set of credentials; if omitted, and name is not omitted, then it clears any previous description for that name.
kwargsKeyword arguments specific to the given credential_type that should be updated.

Return type: None

Prediction Environment¶

class datarobot.models.PredictionEnvironment(id, name, platform, description=None, permissions=None, is_deleted=None, supported_model_formats=None, import_meta=None, management_meta=None, health=None, is_managed_by_management_agent=None, plugin=None, datastore_id=None, credential_id=None)¶

A prediction environment entity.

New in version v3.3.0.

Attributes

id: str: The ID of the prediction environment.
name: str: The name of the prediction environment.
description: str, optional: The description of the prediction environment.
platform: str, optional: Indicates which platform is in use (AWS, GCP, DataRobot, etc.).
permissions: list, optional: A set of permissions for the prediction environment.
is_deleted: boolean, optional: The flag that shows if this prediction environment deleted.
supported_model_formats: list, optional: The list of supported model formats (datarobot, datarobotScoringCode, customModel, externalModel).
is_managed_by_management_agentboolean, optional: Determines if the prediction environment should be managed by the management agent. False by default.
datastore_idstr, optional: The ID of the data store connection configuration. Only applicable for external prediction environments managed by DataRobot.
credential_idstr, optional: The ID of the credential associated with the data connection. Only applicable for external prediction environments managed by DataRobot.

classmethod list()¶

Returns list of available extrenal prediction environments.

Returns

prediction_environmentslist of PredictionEnvironment instances: contains a list of available prediction environments.

Examples

>>> import datarobot as dr
>>> prediction_environments = dr.PredictionEnvironment.list()
>>> prediction_environments
[
    PredictionEnvironment('5e429d6ecf8a5f36c5693e03', 'demo_pe', 'aws', 'env for demo testing'),
    PredictionEnvironment('5e42cc4dcf8a5f3256865840', 'azure_pe', 'azure', 'env for azure demo testing'),
]

Return type: List[PredictionEnvironment]

classmethod get(pe_id)¶

Gets the PredictionEnvironment by id.

Parameters

pe_idstr: the identifier of the PredictionEnvironment.

Returns

prediction_environmentPredictionEnvironment: the requested prediction environment object.

Examples

>>> import datarobot as dr
>>> pe = dr.PredictionEnvironment.get('5a8ac9ab07a57a1231be501f')
>>> pe
PredictionEnvironment('5a8ac9ab07a57a1231be501f', 'my_predict_env', 'aws', 'demo env'),

Return type: PredictionEnvironment

delete()¶

Deletes the prediction environment.

Examples

>>> import datarobot as dr
>>> pe = dr.PredictionEnvironment.get('5a8ac9ab07a57a1231be501f')
>>> pe.delete()

Return type: None

classmethod create(name, platform, description=None, plugin=None, supported_model_formats=None, is_managed_by_management_agent=False, datastore=None, credential=None)¶

Create a prediction environment.

Parameters

namestr: The name of the prediction environment.
descriptionstr, optional: The description of the prediction environment.
platformstr: Indicates which platform is in use (AWS, GCP, DataRobot, etc.).
pluginstr: Optional. The plugin name to use.
supported_model_formatslist, optional: The list of supported model formats [datarobot, datarobotScoringCode, customModel, externalModel]. When not provided , the default value is inferred based on platform, (DataRobot platform: DataRobot, Custom Models; All other platforms: DataRobot, Custom Models, External Models).
is_managed_by_management_agentboolean, optional: Determines if this prediction environment should be managed by the management agent. default: False
datastoreDataStore|str, optional]: The datastore object or ID of the data store connection configuration. Only applicable for external Prediction Environments managed by DataRobot.
credentialCredential|str, optional]: The credential object or ID of the credential associated with the data connection. Only applicable for external Prediction Environments managed by DataRobot.

Returns

prediction_environmentPredictionEnvironment: the prediction environment was created

Raises

datarobot.errors.ClientError: If the server responded with 4xx status.
datarobot.errors.ServerError: If the server responded with 5xx status.

Examples

>>> import datarobot as dr
>>> pe = dr.PredictionEnvironment.create(
...     name='my_predict_env',
...     platform=PredictionEnvironmentPlatform.AWS,
...     description='demo prediction env',
... )
>>> pe
PredictionEnvironment('5e429d6ecf8a5f36c5693e99', 'my_predict_env', 'aws', 'demo prediction env'),

Return type: PredictionEnvironment

Custom Models¶

class datarobot.models.custom_model_version.CustomModelFileItem(id, file_name, file_path, file_source, created_at=None)¶

A file item attached to a DataRobot custom model version.

New in version v2.21.

Attributes

id: str: The ID of the file item.
file_name: str: The name of the file item.
file_path: str: The path of the file item.
file_source: str: The source of the file item.
created_at: str, optional: ISO-8601 formatted timestamp of when the version was created.

class datarobot.CustomInferenceModel(**kwargs)¶

A custom inference model.

New in version v2.21.

Attributes

id: str: The ID of the custom model.
name: str: The name of the custom model.
language: str: The programming language of the custom inference model. Can be “python”, “r”, “java” or “other”.
description: str: The description of the custom inference model.
target_type: datarobot.TARGET_TYPE: Target type of the custom inference model. Values: [datarobot.TARGET_TYPE.BINARY, datarobot.TARGET_TYPE.REGRESSION, datarobot.TARGET_TYPE.MULTICLASS, datarobot.TARGET_TYPE.UNSTRUCTURED, datarobot.TARGET_TYPE.ANOMALY]
target_name: str, optional: Target feature name. It is optional(ignored if provided) for datarobot.TARGET_TYPE.UNSTRUCTURED or datarobot.TARGET_TYPE.ANOMALY target type.
latest_version: datarobot.CustomModelVersion or None: The latest version of the custom model if the model has a latest version.
deployments_count: int: Number of a deployments of the custom models.
target_name: str: The custom model target name.
positive_class_label: str: For binary classification projects, a label of a positive class.
negative_class_label: str: For binary classification projects, a label of a negative class.
prediction_threshold: float: For binary classification projects, a threshold used for predictions.
training_data_assignment_in_progress: bool: Flag describing if training data assignment is in progress.
training_dataset_id: str, optional: The ID of a dataset assigned to the custom model.
training_dataset_version_id: str, optional: The ID of a dataset version assigned to the custom model.
training_data_file_name: str, optional: The name of assigned training data file.
training_data_partition_column: str, optional: The name of a partition column in a training dataset assigned to the custom model.
created_by: str: The username of a user who created the custom model.
updated_at: str: ISO-8601 formatted timestamp of when the custom model was updated
created_at: str: ISO-8601 formatted timestamp of when the custom model was created
network_egress_policy: datarobot.NETWORK_EGRESS_POLICY, optional: Determines whether the given custom model is isolated, or can access the public network. Values: [datarobot.NETWORK_EGRESS_POLICY.NONE, datarobot.NETWORK_EGRESS_POLICY.DR_API_ACCESS, datarobot.NETWORK_EGRESS_POLICY.PUBLIC]. Note: datarobot.NETWORK_EGRESS_POLICY.DR_API_ACCESS value is only supported by the SaaS (cloud) environment.
maximum_memory: int, optional: The maximum memory that might be allocated by the custom-model. If exceeded, the custom-model will be killed by k8s.
replicas: int, optional: A fixed number of replicas that will be deployed in the cluster
is_training_data_for_versions_permanently_enabled: bool, optional: Whether training data assignment on the version level is permanently enabled for the model.

classmethod list(is_deployed=None, search_for=None, order_by=None)¶

List custom inference models available to the user.

New in version v2.21.

Parameters

is_deployed: bool, optional: Flag for filtering custom inference models. If set to True, only deployed custom inference models are returned. If set to False, only not deployed custom inference models are returned.
search_for: str, optional: String for filtering custom inference models - only custom inference models that contain the string in name or description will be returned. If not specified, all custom models will be returned
order_by: str, optional: Property to sort custom inference models by. Supported properties are “created” and “updated”. Prefix the attribute name with a dash to sort in descending order, e.g. order_by=’-created’. By default, the order_by parameter is None which will result in custom models being returned in order of creation time descending.

Returns

List[CustomInferenceModel]: A list of custom inference models.

Raises

datarobot.errors.ClientError: If the server responded with 4xx status
datarobot.errors.ServerError: If the server responded with 5xx status

Return type: List[CustomInferenceModel]

classmethod get(custom_model_id)¶

Get custom inference model by id.

New in version v2.21.

Parameters

custom_model_id: str: The ID of the custom inference model.

Returns

CustomInferenceModel: Retrieved custom inference model.

Raises

datarobot.errors.ClientError: The ID the server responded with 4xx status.
datarobot.errors.ServerError: The ID the server responded with 5xx status.

Return type: CustomInferenceModel

download_latest_version(file_path)¶

Download the latest custom inference model version.

New in version v2.21.

Parameters

file_path: str: Path to create a file with custom model version content.

Raises

datarobot.errors.ClientError: If the server responded with 4xx status.
datarobot.errors.ServerError: If the server responded with 5xx status.

Return type: None

classmethod create(name, target_type, target_name=None, language=None, description=None, positive_class_label=None, negative_class_label=None, prediction_threshold=None, class_labels=None, class_labels_file=None, network_egress_policy=None, maximum_memory=None, replicas=None, is_training_data_for_versions_permanently_enabled=None)¶

Create a custom inference model.

New in version v2.21.

Parameters

name: str: Name of the custom inference model.
target_type: datarobot.TARGET_TYPE: Target type of the custom inference model. Values: [datarobot.TARGET_TYPE.BINARY, datarobot.TARGET_TYPE.REGRESSION, datarobot.TARGET_TYPE.MULTICLASS, datarobot.TARGET_TYPE.UNSTRUCTURED]
target_name: str, optional: Target feature name. It is optional(ignored if provided) for datarobot.TARGET_TYPE.UNSTRUCTURED target type.
language: str, optional: Programming language of the custom learning model.
description: str, optional: Description of the custom learning model.
positive_class_label: str, optional: Custom inference model positive class label for binary classification.
negative_class_label: str, optional: Custom inference model negative class label for binary classification.
prediction_threshold: float, optional: Custom inference model prediction threshold.
class_labels: List[str], optional: Custom inference model class labels for multiclass classification. Cannot be used with class_labels_file.
class_labels_file: str, optional: Path to file containing newline separated class labels for multiclass classification. Cannot be used with class_labels.
network_egress_policy: datarobot.NETWORK_EGRESS_POLICY, optional: Determines whether the given custom model is isolated, or can access the public network. Values: [datarobot.NETWORK_EGRESS_POLICY.NONE, datarobot.NETWORK_EGRESS_POLICY.DR_API_ACCESS, datarobot.NETWORK_EGRESS_POLICY.PUBLIC] Note: datarobot.NETWORK_EGRESS_POLICY.DR_API_ACCESS value is only supported by the SaaS (cloud) environment.
maximum_memory: int, optional: The maximum memory that might be allocated by the custom-model. If exceeded, the custom-model will be killed by k8s.
replicas: int, optional: A fixed number of replicas that will be deployed in the cluster.
is_training_data_for_versions_permanently_enabled: bool, optional: Permanently enable training data assignment on the version level for the current model, instead of training data assignment on the model level.

Returns

CustomInferenceModel: Created a custom inference model.

Raises

datarobot.errors.ClientError: If the server responded with 4xx status.
datarobot.errors.ServerError: If the server responded with 5xx status.

Return type: CustomInferenceModel

classmethod copy_custom_model(custom_model_id)¶

Create a custom inference model by copying existing one.

New in version v2.21.

Parameters

custom_model_id: str: The ID of the custom inference model to copy.

Returns

CustomInferenceModel: Created a custom inference model.

Raises

datarobot.errors.ClientError: If the server responded with 4xx status.
datarobot.errors.ServerError: If the server responded with 5xx status.

Return type: CustomInferenceModel

update(name=None, language=None, description=None, target_name=None, positive_class_label=None, negative_class_label=None, prediction_threshold=None, class_labels=None, class_labels_file=None, is_training_data_for_versions_permanently_enabled=None)¶

Update custom inference model properties.

New in version v2.21.

Parameters

name: str, optional: New custom inference model name.
language: str, optional: New custom inference model programming language.
description: str, optional: New custom inference model description.
target_name: str, optional: New custom inference model target name.
positive_class_label: str, optional: New custom inference model positive class label.
negative_class_label: str, optional: New custom inference model negative class label.
prediction_threshold: float, optional: New custom inference model prediction threshold.
class_labels: List[str], optional: custom inference model class labels for multiclass classification Cannot be used with class_labels_file
class_labels_file: str, optional: Path to file containing newline separated class labels for multiclass classification. Cannot be used with class_labels
is_training_data_for_versions_permanently_enabled: bool, optional: Permanently enable training data assignment on the version level for the current model, instead of training data assignment on the model level.

Raises

datarobot.errors.ClientError: If the server responded with 4xx status.
datarobot.errors.ServerError: If the server responded with 5xx status.

Return type: None

refresh()¶

Update custom inference model with the latest data from server.

New in version v2.21.

Raises

datarobot.errors.ClientError: If the server responded with 4xx status.
datarobot.errors.ServerError: If the server responded with 5xx status.

Return type: None

delete()¶

Delete custom inference model.

New in version v2.21.

Raises

datarobot.errors.ClientError: If the server responded with 4xx status.
datarobot.errors.ServerError: If the server responded with 5xx status.

Return type: None

assign_training_data(dataset_id, partition_column=None, max_wait=600)¶

Assign training data to the custom inference model.

New in version v2.21.

Parameters

dataset_id: str: The id of the training dataset to be assigned.
partition_column: str, optional: Name of a partition column in the training dataset.
max_wait: int, optional: Max time to wait for a training data assignment. If set to None - method will return without waiting. Defaults to 10 min.

Raises

datarobot.errors.ClientError: If the server responded with 4xx status
datarobot.errors.ServerError: If the server responded with 5xx status

Return type: None

class datarobot.CustomModelTest(**kwargs)¶

An custom model test.

New in version v2.21.

Attributes

id: str: test id
custom_model_image_id: str: id of a custom model image
image_type: str: the type of the image, either CUSTOM_MODEL_IMAGE_TYPE.CUSTOM_MODEL_IMAGE if the testing attempt is using a CustomModelImage as its model or CUSTOM_MODEL_IMAGE_TYPE.CUSTOM_MODEL_VERSION if the testing attempt is using a CustomModelVersion with dependency management
overall_status: str: a string representing testing status. Status can be - ‘not_tested’: the check not run - ‘failed’: the check failed - ‘succeeded’: the check succeeded - ‘warning’: the check resulted in a warning, or in non-critical failure - ‘in_progress’: the check is in progress
detailed_status: dict: detailed testing status - maps the testing types to their status and message. The keys of the dict are one of ‘errorCheck’, ‘nullValueImputation’, ‘longRunningService’, ‘sideEffects’. The values are dict with ‘message’ and ‘status’ keys.
created_by: str: a user who created a test
dataset_id: str, optional: id of a dataset used for testing
dataset_version_id: str, optional: id of a dataset version used for testing
completed_at: str, optional: ISO-8601 formatted timestamp of when the test has completed
created_at: str, optional: ISO-8601 formatted timestamp of when the version was created
network_egress_policy: datarobot.NETWORK_EGRESS_POLICY, optional: Determines whether the given custom model is isolated, or can access the public network. Values: [datarobot.NETWORK_EGRESS_POLICY.NONE, datarobot.NETWORK_EGRESS_POLICY.DR_API_ACCESS, datarobot.NETWORK_EGRESS_POLICY.PUBLIC]. Note: datarobot.NETWORK_EGRESS_POLICY.DR_API_ACCESS value is only supported by the SaaS (cloud) environment.
maximum_memory: int, optional: The maximum memory that might be allocated by the custom-model. If exceeded, the custom-model will be killed by k8s
replicas: int, optional: A fixed number of replicas that will be deployed in the cluster

classmethod create(custom_model_id, custom_model_version_id, dataset_id=None, max_wait=600, network_egress_policy=None, maximum_memory=None, replicas=None)¶

Create and start a custom model test.

New in version v2.21.

Parameters

custom_model_id: str: the id of the custom model
custom_model_version_id: str: the id of the custom model version
dataset_id: str, optional: The id of the testing dataset for non-unstructured custom models. Ignored and not required for unstructured models.
max_wait: int, optional: max time to wait for a test completion. If set to None - method will return without waiting.
network_egress_policy: datarobot.NETWORK_EGRESS_POLICY, optional: Determines whether the given custom model is isolated, or can access the public network. Values: [datarobot.NETWORK_EGRESS_POLICY.NONE, datarobot.NETWORK_EGRESS_POLICY.DR_API_ACCESS, datarobot.NETWORK_EGRESS_POLICY.PUBLIC]. Note: datarobot.NETWORK_EGRESS_POLICY.DR_API_ACCESS value is only supported by the SaaS (cloud) environment.
maximum_memory: int, optional: The maximum memory that might be allocated by the custom-model. If exceeded, the custom-model will be killed by k8s
replicas: int, optional: A fixed number of replicas that will be deployed in the cluster

Returns

CustomModelTest: created custom model test

Raises

datarobot.errors.ClientError: if the server responded with 4xx status
datarobot.errors.ServerError: if the server responded with 5xx status

classmethod list(custom_model_id)¶

List custom model tests.

New in version v2.21.

Parameters

custom_model_id: str: the id of the custom model

Returns

List[CustomModelTest]: a list of custom model tests

Raises

datarobot.errors.ClientError: if the server responded with 4xx status
datarobot.errors.ServerError: if the server responded with 5xx status

classmethod get(custom_model_test_id)¶

Get custom model test by id.

New in version v2.21.

Parameters

custom_model_test_id: str: the id of the custom model test

Returns

CustomModelTest: retrieved custom model test

Raises

datarobot.errors.ClientError: if the server responded with 4xx status.
datarobot.errors.ServerError: if the server responded with 5xx status.

get_log()¶

Get log of a custom model test.

New in version v2.21.

Raises

datarobot.errors.ClientError: if the server responded with 4xx status
datarobot.errors.ServerError: if the server responded with 5xx status

get_log_tail()¶

Get log tail of a custom model test.

New in version v2.21.

Raises

datarobot.errors.ClientError: if the server responded with 4xx status
datarobot.errors.ServerError: if the server responded with 5xx status

cancel()¶

Cancel custom model test that is in progress.

New in version v2.21.

Raises

datarobot.errors.ClientError: if the server responded with 4xx status
datarobot.errors.ServerError: if the server responded with 5xx status

refresh()¶

Update custom model test with the latest data from server.

New in version v2.21.

Raises

datarobot.errors.ClientError: if the server responded with 4xx status
datarobot.errors.ServerError: if the server responded with 5xx status

class datarobot.CustomModelVersion(**kwargs)¶

A version of a DataRobot custom model.

New in version v2.21.

Attributes

id: str: The ID of the custom model version.
custom_model_id: str: The ID of the custom model.
version_minor: int: A minor version number of the custom model version.
version_major: int: A major version number of the custom model version.
is_frozen: bool: A flag if the custom model version is frozen.
items: List[CustomModelFileItem]: A list of file items attached to the custom model version.
base_environment_id: str: The ID of the environment to use with the model.
base_environment_version_id: str: The ID of the environment version to use with the model.
label: str, optional: A short human readable string to label the version.
description: str, optional: The custom model version description.
created_at: str, optional: ISO-8601 formatted timestamp of when the version was created.
dependencies: List[CustomDependency]: The parsed dependencies of the custom model version if the version has a valid requirements.txt file.
network_egress_policy: datarobot.NETWORK_EGRESS_POLICY, optional: Determines whether the given custom model is isolated, or can access the public network. Values: [datarobot.NETWORK_EGRESS_POLICY.NONE, datarobot.NETWORK_EGRESS_POLICY.DR_API_ACCESS, datarobot.NETWORK_EGRESS_POLICY.PUBLIC]. Note: datarobot.NETWORK_EGRESS_POLICY.DR_API_ACCESS value is only supported by the SaaS (cloud) environment.
maximum_memory: int, optional: The maximum memory that might be allocated by the custom-model. If exceeded, the custom-model will be killed by k8s.
replicas: int, optional: A fixed number of replicas that will be deployed in the cluster.
required_metadata_values: List[RequiredMetadataValue]: Additional parameters required by the execution environment. The required keys are defined by the fieldNames in the base environment’s requiredMetadataKeys.
training_data: TrainingData, optional: The information about the training data assigned to the model version.
holdout_data: HoldoutData, optional: The information about the holdout data assigned to the model version.

classmethod from_server_data(data, keep_attrs=None)¶

Instantiate an object of this class using the data directly from the server, meaning that the keys may have the wrong camel casing

Parameters

datadict: The directly translated dict of JSON from the server. No casing fixes have taken place
keep_attrsiterable: List, set or tuple of the dotted namespace notations for attributes to keep within the object structure even if their values are None

Return type: CustomModelVersion

classmethod create_clean(custom_model_id, base_environment_id, is_major_update=True, folder_path=None, files=None, network_egress_policy=None, maximum_memory=None, replicas=None, required_metadata_values=None, training_dataset_id=None, partition_column=None, holdout_dataset_id=None, keep_training_holdout_data=None, max_wait=600)¶

Create a custom model version without files from previous versions.

Create a version with training or holdout data: If training/holdout data related parameters are provided, the training data is assigned asynchronously. In this case: * if max_wait is not None, the function returns once the job is finished. * if max_wait is None, the function returns immediately. Progress can be polled by the user (see examples).

If training data assignment fails, new version is still created, but it is not allowed to create a model package for the model version and to deploy it. To check for training data assignment error, check version.training_data.assignment_error[“message”].

New in version v2.21.

Parameters

custom_model_id: str: The ID of the custom model.
base_environment_id: str: The ID of the base environment to use with the custom model version.
is_major_update: bool: The flag defining if a custom model version will be a minor or a major version. Default to True
folder_path: str, optional: The path to a folder containing files to be uploaded. Each file in the folder is uploaded under path relative to a folder path.
files: list, optional: The list of tuples, where values in each tuple are the local filesystem path and the path the file should be placed in the model. If the list is of strings, then basenames will be used for tuples. Example: [(“/home/user/Documents/myModel/file1.txt”, “file1.txt”), (“/home/user/Documents/myModel/folder/file2.txt”, “folder/file2.txt”)] or [“/home/user/Documents/myModel/file1.txt”, “/home/user/Documents/myModel/folder/file2.txt”]
network_egress_policy: datarobot.NETWORK_EGRESS_POLICY, optional: Determines whether the given custom model is isolated, or can access the public network. Values: [datarobot.NETWORK_EGRESS_POLICY.NONE, datarobot.NETWORK_EGRESS_POLICY.DR_API_ACCESS, datarobot.NETWORK_EGRESS_POLICY.PUBLIC]. Note: datarobot.NETWORK_EGRESS_POLICY.DR_API_ACCESS value is only supported by the SaaS (cloud) environment.
maximum_memory: int, optional: The maximum memory that might be allocated by the custom-model. If exceeded, the custom-model will be killed by k8s.
replicas: int, optional: A fixed number of replicas that will be deployed in the cluster.
required_metadata_values: List[RequiredMetadataValue]: Additional parameters required by the execution environment. The required keys are defined by the fieldNames in the base environment’s requiredMetadataKeys.
training_dataset_id: str, optional: The ID of the training dataset to assign to the custom model.
partition_column: str, optional: Name of a partition column in a training dataset assigned to the custom model. Can only be assigned for structured models.
holdout_dataset_id: str, optional: The ID of the holdout dataset to assign to the custom model. Can only be assigned for unstructured models.
keep_training_holdout_data: bool, optional: If the version should inherit training and holdout data from the previous version. Defaults to True. This field is only applicable if the model has training data for versions enabled, otherwise the field value will be ignored.
max_wait: int, optional: Max time to wait for training data assignment. If set to None - method will return without waiting. Defaults to 10 minutes.

Returns

CustomModelVersion: Created custom model version.

Raises

datarobot.errors.ClientError: If the server responded with 4xx status.
datarobot.errors.ServerError: If the server responded with 5xx status.
datarobot.errors.InvalidUsageError: If wrong parameters are provided.
datarobot.errors.TrainingDataAssignmentError: If training data assignment fails.

Examples

Create a version with blocking (default max_wait=600) training data assignment:

import datarobot as dr
from datarobot.errors import TrainingDataAssignmentError

dr.Client(token=my_token, endpoint=endpoint)

try:
    version = dr.CustomModelVersion.create_clean(
        custom_model_id="6444482e5583f6ee2e572265",
        base_environment_id="642209acc563893014a41e24",
        training_dataset_id="6421f2149a4f9b1bec6ad6dd",
    )
except TrainingDataAssignmentError as e:
    print(e)

Create a version with non-blocking training data assignment:

import datarobot as dr

dr.Client(token=my_token, endpoint=endpoint)

version = dr.CustomModelVersion.create_clean(
    custom_model_id="6444482e5583f6ee2e572265",
    base_environment_id="642209acc563893014a41e24",
    training_dataset_id="6421f2149a4f9b1bec6ad6dd",
    max_wait=None,
)

while version.training_data.assignment_in_progress:
    time.sleep(10)
    version.refresh()
if version.training_data.assignment_error:
    print(version.training_data.assignment_error["message"])

Return type: CustomModelVersion

classmethod create_from_previous(custom_model_id, base_environment_id, is_major_update=True, folder_path=None, files=None, files_to_delete=None, network_egress_policy=None, maximum_memory=None, replicas=None, required_metadata_values=None, training_dataset_id=None, partition_column=None, holdout_dataset_id=None, keep_training_holdout_data=None, max_wait=600)¶

Create a custom model version containing files from a previous version.

Create a version with training/holdout data: If training/holdout data related parameters are provided, the training data is assigned asynchronously. In this case: * if max_wait is not None, function returns once job is finished. * if max_wait is None, function returns immediately, progress can be polled by the user, see examples.

If training data assignment fails, new version is still created, but it is not allowed to create a model package for the model version and to deploy it. To check for training data assignment error, check version.training_data.assignment_error[“message”].

New in version v2.21.

Parameters

custom_model_id: str: The ID of the custom model.
base_environment_id: str: The ID of the base environment to use with the custom model version.
is_major_update: bool, optional: The flag defining if a custom model version will be a minor or a major version. Defaults to True.
folder_path: str, optional: The path to a folder containing files to be uploaded. Each file in the folder is uploaded under path relative to a folder path.
files: list, optional: The list of tuples, where values in each tuple are the local filesystem path and the path the file should be placed in the model. If list is of strings, then basenames will be used for tuples Example: [(“/home/user/Documents/myModel/file1.txt”, “file1.txt”), (“/home/user/Documents/myModel/folder/file2.txt”, “folder/file2.txt”)] or [“/home/user/Documents/myModel/file1.txt”, “/home/user/Documents/myModel/folder/file2.txt”]
files_to_delete: list, optional: The list of a file items ids to be deleted. Example: [“5ea95f7a4024030aba48e4f9”, “5ea6b5da402403181895cc51”]
network_egress_policy: datarobot.NETWORK_EGRESS_POLICY, optional: Determines whether the given custom model is isolated, or can access the public network. Values: [datarobot.NETWORK_EGRESS_POLICY.NONE, datarobot.NETWORK_EGRESS_POLICY.DR_API_ACCESS, datarobot.NETWORK_EGRESS_POLICY.PUBLIC]. Note: datarobot.NETWORK_EGRESS_POLICY.DR_API_ACCESS value is only supported by the SaaS (cloud) environment.
maximum_memory: int, optional: The maximum memory that might be allocated by the custom-model. If exceeded, the custom-model will be killed by k8s
replicas: int, optional: A fixed number of replicas that will be deployed in the cluster
required_metadata_values: List[RequiredMetadataValue]: Additional parameters required by the execution environment. The required keys are defined by the fieldNames in the base environment’s requiredMetadataKeys.
training_dataset_id: str, optional: The ID of the training dataset to assign to the custom model.
partition_column: str, optional: Name of a partition column in a training dataset assigned to the custom model. Can only be assigned for structured models.
holdout_dataset_id: str, optional: The ID of the holdout dataset to assign to the custom model. Can only be assigned for unstructured models.
keep_training_holdout_data: bool, optional: If the version should inherit training and holdout data from the previous version. Defaults to True. This field is only applicable if the model has training data for versions enabled, otherwise the field value will be ignored.
max_wait: int, optional: Max time to wait for training data assignment. If set to None - method will return without waiting. Defaults to 10 minutes.

Returns

CustomModelVersion: created custom model version

Raises

datarobot.errors.ClientError: If the server responded with 4xx status.
datarobot.errors.ServerError: If the server responded with 5xx status.
datarobot.errors.InvalidUsageError: If wrong parameters are provided.
datarobot.errors.TrainingDataAssignmentError: If training data assignment fails.

Examples

Create a version with blocking (default max_wait=600) training data assignment:

import datarobot as dr
from datarobot.errors import TrainingDataAssignmentError

dr.Client(token=my_token, endpoint=endpoint)

try:
    version = dr.CustomModelVersion.create_from_previous(
        custom_model_id="6444482e5583f6ee2e572265",
        base_environment_id="642209acc563893014a41e24",
        training_dataset_id="6421f2149a4f9b1bec6ad6dd",
    )
except TrainingDataAssignmentError as e:
    print(e)

Create a version with non-blocking training data assignment:

import datarobot as dr

dr.Client(token=my_token, endpoint=endpoint)

version = dr.CustomModelVersion.create_from_previous(
    custom_model_id="6444482e5583f6ee2e572265",
    base_environment_id="642209acc563893014a41e24",
    training_dataset_id="6421f2149a4f9b1bec6ad6dd",
    max_wait=None,
)

while version.training_data.assignment_in_progress:
    time.sleep(10)
    version.refresh()
if version.training_data.assignment_error:
    print(version.training_data.assignment_error["message"])

Return type: CustomModelVersion

classmethod list(custom_model_id)¶

List custom model versions.

New in version v2.21.

Parameters

custom_model_id: str: The ID of the custom model.

Returns

List[CustomModelVersion]: A list of custom model versions.

Raises

datarobot.errors.ClientError: If the server responded with 4xx status.
datarobot.errors.ServerError: If the server responded with 5xx status.

Return type: List[CustomModelVersion]

classmethod get(custom_model_id, custom_model_version_id)¶

Get custom model version by id.

New in version v2.21.

Parameters

custom_model_id: str: The ID of the custom model.
custom_model_version_id: str: The id of the custom model version to retrieve.

Returns

CustomModelVersion: Retrieved custom model version.

Raises

datarobot.errors.ClientError: If the server responded with 4xx status.
datarobot.errors.ServerError: If the server responded with 5xx status.

Return type: CustomModelVersion

download(file_path)¶

Download custom model version.

New in version v2.21.

Parameters

file_path: str: Path to create a file with custom model version content.

Raises

datarobot.errors.ClientError: If the server responded with 4xx status.
datarobot.errors.ServerError: If the server responded with 5xx status.

Return type: None

update(description=None, required_metadata_values=None)¶

Update custom model version properties.

New in version v2.21.

Parameters

description: str, optional: New custom model version description.
required_metadata_values: List[RequiredMetadataValue], optional: Additional parameters required by the execution environment. The required keys are defined by the fieldNames in the base environment’s requiredMetadataKeys.

Raises

datarobot.errors.ClientError: If the server responded with 4xx status.
datarobot.errors.ServerError: If the server responded with 5xx status.

Return type: None

refresh()¶

Update custom model version with the latest data from server.

New in version v2.21.

Raises

datarobot.errors.ClientError: If the server responded with 4xx status.
datarobot.errors.ServerError: If the server responded with 5xx status.

Return type: None

get_feature_impact(with_metadata=False)¶

Get custom model feature impact.

New in version v2.23.

Parameters

with_metadatabool: The flag indicating if the result should include the metadata as well.

Returns

feature_impactslist of dict: The feature impact data. Each item is a dict with the keys ‘featureName’, ‘impactNormalized’, and ‘impactUnnormalized’, and ‘redundantWith’.

Raises

datarobot.errors.ClientError: If the server responded with 4xx status.
datarobot.errors.ServerError: If the server responded with 5xx status.

Return type: List[Dict[str, Any]]

calculate_feature_impact(max_wait=600)¶

Calculate custom model feature impact.

New in version v2.23.

Parameters

max_wait: int, optional: Max time to wait for feature impact calculation. If set to None - method will return without waiting. Defaults to 10 min

Raises

datarobot.errors.ClientError: if the server responded with 4xx status
datarobot.errors.ServerError: if the server responded with 5xx status

Return type: None

class datarobot.models.execution_environment.RequiredMetadataKey(**kwargs)¶

Definition of a metadata key that custom models using this environment must define

New in version v2.25.

Attributes

field_name: str: The required field key. This value will be added as an environment variable when running custom models.
display_name: str: A human readable name for the required field.

class datarobot.models.CustomModelVersionConversion(**kwargs)¶

A conversion of a DataRobot custom model version.

New in version v2.27.

Attributes

id: str: The ID of the custom model version conversion.
custom_model_version_id: str: The ID of the custom model version.
created: str: ISO-8601 timestamp of when the custom model conversion created.
main_program_item_id: str or None: The ID of the main program item.
log_message: str or None: The conversion output log message.
generated_metadata: dict or None: The dict contains two items: ‘outputDataset’ & ‘outputColumns’.
conversion_succeeded: bool: Whether the conversion succeeded or not.
conversion_in_progress: bool: Whether a given conversion is in progress or not.
should_stop: bool: Whether the user asked to stop a conversion.

classmethod run_conversion(custom_model_id, custom_model_version_id, main_program_item_id, max_wait=None)¶

Initiate a new custom model version conversion.

Parameters

custom_model_idstr: The associated custom model ID.
custom_model_version_idstr: The associated custom model version ID.
main_program_item_idstr: The selected main program item ID. This should be one of the SAS items in the associated custom model version.
max_wait: int or None: Max wait time in seconds. If None, then don’t wait.

Returns

conversion_idstr: The ID of the newly created conversion entity.

Raises

datarobot.errors.ClientError: If the server responded with 4xx status.
datarobot.errors.ServerError: If the server responded with 5xx status.

Return type: str

classmethod stop_conversion(custom_model_id, custom_model_version_id, conversion_id)¶

Stop a conversion that is in progress.

Parameters

custom_model_idstr: The ID of the associated custom model.
custom_model_version_idstr: The ID of the associated custom model version.
conversion_id: THe ID of a conversion that is in-progress.

Raises

datarobot.errors.ClientError: If the server responded with 4xx status.
datarobot.errors.ServerError: If the server responded with 5xx status.

Return type: Response

classmethod get(custom_model_id, custom_model_version_id, conversion_id)¶

Get custom model version conversion by id.

New in version v2.27.

Parameters

custom_model_id: str: The ID of the custom model.
custom_model_version_id: str: The ID of the custom model version.
conversion_id: str: The ID of the conversion to retrieve.

Returns

CustomModelVersionConversion: Retrieved custom model version conversion.

Raises

datarobot.errors.ClientError: If the server responded with 4xx status.
datarobot.errors.ServerError: If the server responded with 5xx status.

Return type: CustomModelVersionConversion

classmethod get_latest(custom_model_id, custom_model_version_id)¶

Get latest custom model version conversion for a given custom model version.

New in version v2.27.

Parameters

custom_model_id: str: The ID of the custom model.
custom_model_version_id: str: The ID of the custom model version.

Returns

CustomModelVersionConversion or None: Retrieved latest conversion for a given custom model version.

Raises

datarobot.errors.ClientError: If the server responded with 4xx status.
datarobot.errors.ServerError: If the server responded with 5xx status.

Return type: Optional[CustomModelVersionConversion]

classmethod list(custom_model_id, custom_model_version_id)¶

Get custom model version conversions list per custom model version.

New in version v2.27.

Parameters

custom_model_id: str: The ID of the custom model.
custom_model_version_id: str: The ID of the custom model version.

Returns

List[CustomModelVersionConversion]: Retrieved conversions for a given custom model version.

Raises

datarobot.errors.ClientError: If the server responded with 4xx status.
datarobot.errors.ServerError: If the server responded with 5xx status.

Return type: List[CustomModelVersionConversion]

class datarobot.CustomModelVersionDependencyBuild(**kwargs)¶

Metadata about a DataRobot custom model version’s dependency build

New in version v2.22.

Attributes

custom_model_id: str: The ID of the custom model.
custom_model_version_id: str: The ID of the custom model version.
build_status: str: The status of the custom model version’s dependency build.
started_at: str: ISO-8601 formatted timestamp of when the build was started.
completed_at: str, optional: ISO-8601 formatted timestamp of when the build has completed.

classmethod get_build_info(custom_model_id, custom_model_version_id)¶

Retrieve information about a custom model version’s dependency build

New in version v2.22.

Parameters

custom_model_id: str: The ID of the custom model.
custom_model_version_id: str: The ID of the custom model version.

Returns

CustomModelVersionDependencyBuild: The dependency build information.

Return type: CustomModelVersionDependencyBuild

classmethod start_build(custom_model_id, custom_model_version_id, max_wait=600)¶

Start the dependency build for a custom model version dependency build

New in version v2.22.

Parameters

custom_model_id: str: The ID of the custom model
custom_model_version_id: str: the ID of the custom model version
max_wait: int, optional: Max time to wait for a build completion. If set to None - method will return without waiting.

Return type: Optional[CustomModelVersionDependencyBuild]

get_log()¶

Get log of a custom model version dependency build.

New in version v2.22.

Raises

datarobot.errors.ClientError: If the server responded with 4xx status.
datarobot.errors.ServerError: If the server responded with 5xx status.

Return type: str

cancel()¶

Cancel custom model version dependency build that is in progress.

New in version v2.22.

Raises

datarobot.errors.ClientError: If the server responded with 4xx status.
datarobot.errors.ServerError: If the server responded with 5xx status.

Return type: None

refresh()¶

Update custom model version dependency build with the latest data from server.

New in version v2.22.

Raises

datarobot.errors.ClientError: If the server responded with 4xx status.
datarobot.errors.ServerError: If the server responded with 5xx status.

Return type: None

class datarobot.ExecutionEnvironment(**kwargs)¶

An execution environment entity.

New in version v2.21.

Attributes

id: str: the id of the execution environment
name: str: the name of the execution environment
description: str, optional: the description of the execution environment
programming_language: str, optional: the programming language of the execution environment. Can be “python”, “r”, “java” or “other”
is_public: bool, optional: public accessibility of environment, visible only for admin user
created_at: str, optional: ISO-8601 formatted timestamp of when the execution environment version was created
latest_version: ExecutionEnvironmentVersion, optional: the latest version of the execution environment

classmethod create(name, description=None, programming_language=None, required_metadata_keys=None)¶

Create an execution environment.

New in version v2.21.

Parameters

name: str: execution environment name
description: str, optional: execution environment description
programming_language: str, optional: programming language of the environment to be created. Can be “python”, “r”, “java” or “other”. Default value - “other”
required_metadata_keys: List[RequiredMetadataKey]: Definition of a metadata keys that custom models using this environment must define

Returns

ExecutionEnvironment: created execution environment

Raises

datarobot.errors.ClientError: if the server responded with 4xx status
datarobot.errors.ServerError: if the server responded with 5xx status

classmethod list(search_for=None)¶

List execution environments available to the user.

New in version v2.21.

Parameters

search_for: str, optional: the string for filtering execution environment - only execution environments that contain the string in name or description will be returned.

Returns

List[ExecutionEnvironment]: a list of execution environments.

Raises

datarobot.errors.ClientError: if the server responded with 4xx status
datarobot.errors.ServerError: if the server responded with 5xx status

classmethod get(execution_environment_id)¶

Get execution environment by it’s id.

New in version v2.21.

Parameters

execution_environment_id: str: ID of the execution environment to retrieve

Returns

ExecutionEnvironment: retrieved execution environment

Raises

datarobot.errors.ClientError: if the server responded with 4xx status
datarobot.errors.ServerError: if the server responded with 5xx status

delete()¶

Delete execution environment.

New in version v2.21.

Raises

datarobot.errors.ClientError: if the server responded with 4xx status
datarobot.errors.ServerError: if the server responded with 5xx status

update(name=None, description=None, required_metadata_keys=None)¶

Update execution environment properties.

New in version v2.21.

Parameters

name: str, optional: new execution environment name
description: str, optional: new execution environment description
required_metadata_keys: List[RequiredMetadataKey]: Definition of a metadata keys that custom models using this environment must define

Raises

datarobot.errors.ClientError: if the server responded with 4xx status
datarobot.errors.ServerError: if the server responded with 5xx status

refresh()¶

Update execution environment with the latest data from server.

New in version v2.21.

Raises

datarobot.errors.ClientError: if the server responded with 4xx status
datarobot.errors.ServerError: if the server responded with 5xx status

class datarobot.ExecutionEnvironmentVersion(**kwargs)¶

A version of a DataRobot execution environment.

New in version v2.21.

Attributes

id: str: the id of the execution environment version
environment_id: str: the id of the execution environment the version belongs to
build_status: str: the status of the execution environment version build
label: str, optional: the label of the execution environment version
description: str, optional: the description of the execution environment version
created_at: str, optional: ISO-8601 formatted timestamp of when the execution environment version was created
docker_context_size: int, optional: The size of the uploaded Docker context in bytes if available or None if not
docker_image_size: int, optional: The size of the built Docker image in bytes if available or None if not

classmethod create(execution_environment_id, docker_context_path, label=None, description=None, max_wait=600)¶

Create an execution environment version.

New in version v2.21.

Parameters

execution_environment_id: str: the id of the execution environment
docker_context_path: str: the path to a docker context archive or folder
label: str, optional: short human readable string to label the version
description: str, optional: execution environment version description
max_wait: int, optional: max time to wait for a final build status (“success” or “failed”). If set to None - method will return without waiting.

Returns

ExecutionEnvironmentVersion: created execution environment version

Raises

datarobot.errors.AsyncTimeoutError: if version did not reach final state during timeout seconds
datarobot.errors.ClientError: if the server responded with 4xx status
datarobot.errors.ServerError: if the server responded with 5xx status

classmethod list(execution_environment_id, build_status=None)¶

List execution environment versions available to the user.

New in version v2.21.

Parameters

execution_environment_id: str: the id of the execution environment
build_status: str, optional: build status of the execution environment version to filter by. See datarobot.enums.EXECUTION_ENVIRONMENT_VERSION_BUILD_STATUS for valid options

Returns

List[ExecutionEnvironmentVersion]: a list of execution environment versions.

Raises

datarobot.errors.ClientError: if the server responded with 4xx status
datarobot.errors.ServerError: if the server responded with 5xx status

classmethod get(execution_environment_id, version_id)¶

Get execution environment version by id.

New in version v2.21.

Parameters

execution_environment_id: str: the id of the execution environment
version_id: str: the id of the execution environment version to retrieve

Returns

ExecutionEnvironmentVersion: retrieved execution environment version

Raises

datarobot.errors.ClientError: if the server responded with 4xx status.
datarobot.errors.ServerError: if the server responded with 5xx status.

download(file_path)¶

Download execution environment version.

New in version v2.21.

Parameters

file_path: str: path to create a file with execution environment version content

Returns

ExecutionEnvironmentVersion: retrieved execution environment version

Raises

datarobot.errors.ClientError: if the server responded with 4xx status.
datarobot.errors.ServerError: if the server responded with 5xx status.

get_build_log()¶

Get execution environment version build log and error.

New in version v2.21.

Returns

Tuple[str, str]: retrieved execution environment version build log and error. If there is no build error - None is returned.

Raises

datarobot.errors.ClientError: if the server responded with 4xx status.
datarobot.errors.ServerError: if the server responded with 5xx status.

refresh()¶

Update execution environment version with the latest data from server.

New in version v2.21.

Raises

datarobot.errors.ClientError: if the server responded with 4xx status
datarobot.errors.ServerError: if the server responded with 5xx status

class datarobot.models.custom_model_version.HoldoutData(dataset_id=None, dataset_version_id=None, dataset_name=None, partition_column=None)¶

Holdout data assigned to a DataRobot custom model version.

New in version v3.2.

Attributes

dataset_id: str: The ID of the dataset.
dataset_version_id: str: The ID of the dataset version.
dataset_name: str: The name of the dataset.
partition_column: str: The name of the partitions column.

class datarobot.models.custom_model_version.TrainingData(dataset_id=None, dataset_version_id=None, dataset_name=None, assignment_in_progress=None, assignment_error=None)¶

Training data assigned to a DataRobot custom model version.

New in version v3.2.

Attributes

dataset_id: str: The ID of the dataset.
dataset_version_id: str: The ID of the dataset version.
dataset_name: str: The name of the dataset.
assignment_in_progress: bool: The status of the assignment in progress.
assignment_error: dict: The assignment error message.

Custom Tasks¶

class datarobot.CustomTask(id, target_type, latest_version, created_at, updated_at, name, description, language, created_by, calibrate_predictions=None)¶

A custom task. This can be in a partial state or a complete state. When the latest_version is None, the empty task has been initialized with some metadata. It is not yet use-able for actual training. Once the first CustomTaskVersion has been created, you can put the CustomTask in UserBlueprints to train Models in Projects

New in version v2.26.

Attributes

id: str

id of the custom task

name: str

name of the custom task

language: str

programming language of the custom task. Can be “python”, “r”, “java” or “other”

description: str

description of the custom task

target_type: datarobot.enums.CUSTOM_TASK_TARGET_TYPE

the target type of the custom task. One of:

datarobot.enums.CUSTOM_TASK_TARGET_TYPE.BINARY
datarobot.enums.CUSTOM_TASK_TARGET_TYPE.REGRESSION
datarobot.enums.CUSTOM_TASK_TARGET_TYPE.MULTICLASS
datarobot.enums.CUSTOM_TASK_TARGET_TYPE.ANOMALY
datarobot.enums.CUSTOM_TASK_TARGET_TYPE.TRANSFORM

latest_version: datarobot.CustomTaskVersion or None

latest version of the custom task if the task has a latest version. If the latest version is None, the custom task is not ready for use in user blueprints. You must create its first CustomTaskVersion before you can use the CustomTask

created_by: str

The username of the user who created the custom task.

updated_at: str

An ISO-8601 formatted timestamp of when the custom task was updated.

created_at: str

ISO-8601 formatted timestamp of when the custom task was created

calibrate_predictions: bool

whether anomaly predictions should be calibrated to be between 0 and 1 by DR. only applies to custom estimators with target type datarobot.enums.CUSTOM_TASK_TARGET_TYPE.ANOMALY

classmethod from_server_data(data, keep_attrs=None)¶

Instantiate an object of this class using the data directly from the server, meaning that the keys may have the wrong camel casing

Parameters

datadict: The directly translated dict of JSON from the server. No casing fixes have taken place
keep_attrsiterable: List, set or tuple of the dotted namespace notations for attributes to keep within the object structure even if their values are None

Return type: CustomTask

classmethod list(order_by=None, search_for=None)¶

List custom tasks available to the user.

New in version v2.26.

Parameters

search_for: str, optional: string for filtering custom tasks - only tasks that contain the string in name or description will be returned. If not specified, all custom task will be returned
order_by: str, optional: property to sort custom tasks by. Supported properties are “created” and “updated”. Prefix the attribute name with a dash to sort in descending order, e.g. order_by=’-created’. By default, the order_by parameter is None which will result in custom tasks being returned in order of creation time descending

Returns

List[CustomTask]: a list of custom tasks.

Raises

datarobot.errors.ClientError: if the server responded with 4xx status
datarobot.errors.ServerError: if the server responded with 5xx status

Return type: List[CustomTask]

classmethod get(custom_task_id)¶

Get custom task by id.

New in version v2.26.

Parameters

custom_task_id: str: id of the custom task

Returns

CustomTask: retrieved custom task

Raises

datarobot.errors.ClientError: if the server responded with 4xx status.
datarobot.errors.ServerError: if the server responded with 5xx status.

Return type: CustomTask

classmethod copy(custom_task_id)¶

Create a custom task by copying existing one.

New in version v2.26.

Parameters

custom_task_id: str: id of the custom task to copy

Returns

CustomTask

Raises

datarobot.errors.ClientError: if the server responded with 4xx status
datarobot.errors.ServerError: if the server responded with 5xx status

Return type: CustomTask

classmethod create(name, target_type, language=None, description=None, calibrate_predictions=None, **kwargs)¶

Creates only the metadata for a custom task. This task will not be use-able until you have created a CustomTaskVersion attached to this task.

New in version v2.26.

Parameters

name: str

name of the custom task

target_type: datarobot.enums.CUSTOM_TASK_TARGET_TYPE

the target typed based on the following values. Anything else will raise an error

datarobot.enums.CUSTOM_TASK_TARGET_TYPE.BINARY
datarobot.enums.CUSTOM_TASK_TARGET_TYPE.REGRESSION
datarobot.enums.CUSTOM_TASK_TARGET_TYPE.MULTICLASS
datarobot.enums.CUSTOM_TASK_TARGET_TYPE.ANOMALY
datarobot.enums.CUSTOM_TASK_TARGET_TYPE.TRANSFORM

language: str, optional

programming language of the custom task. Can be “python”, “r”, “java” or “other”

description: str, optional

description of the custom task

calibrate_predictions: bool, optional

whether anomaly predictions should be calibrated to be between 0 and 1 by DR. if None, uses default value from DR app (True). only applies to custom estimators with target type datarobot.enums.CUSTOM_TASK_TARGET_TYPE.ANOMALY

Returns

CustomTask

Raises

datarobot.errors.ClientError: if the server responded with 4xx status.
datarobot.errors.ServerError: if the server responded with 5xx status.

Return type: CustomTask

update(name=None, language=None, description=None, **kwargs)¶

Update custom task properties.

New in version v2.26.

Parameters

name: str, optional: new custom task name
language: str, optional: new custom task programming language
description: str, optional: new custom task description

Raises

datarobot.errors.ClientError: if the server responded with 4xx status.
datarobot.errors.ServerError: if the server responded with 5xx status.

Return type: None

refresh()¶

Update custom task with the latest data from server.

New in version v2.26.

Raises

datarobot.errors.ClientError: if the server responded with 4xx status
datarobot.errors.ServerError: if the server responded with 5xx status

Return type: None

delete()¶

Delete custom task.

New in version v2.26.

Raises

datarobot.errors.ClientError: if the server responded with 4xx status
datarobot.errors.ServerError: if the server responded with 5xx status

Return type: None

download_latest_version(file_path)¶

Download the latest custom task version.

New in version v2.26.

Parameters

file_path: str: the full path of the target zip file

Raises

datarobot.errors.ClientError: if the server responded with 4xx status.
datarobot.errors.ServerError: if the server responded with 5xx status.

Return type: None

get_access_list()¶

Retrieve access control settings of this custom task.

New in version v2.27.

Returns

list ofclass:SharingAccess <datarobot.SharingAccess>

Return type: List[SharingAccess]

share(access_list)¶

Update the access control settings of this custom task.

New in version v2.27.

Parameters

access_listlist of SharingAccess: A list of SharingAccess to update.

Raises

datarobot.errors.ClientError: if the server responded with 4xx status
datarobot.errors.ServerError: if the server responded with 5xx status

Examples

Transfer access to the custom task from old_user@datarobot.com to new_user@datarobot.com

import datarobot as dr

new_access = dr.SharingAccess(new_user@datarobot.com,
                              dr.enums.SHARING_ROLE.OWNER, can_share=True)
access_list = [dr.SharingAccess(old_user@datarobot.com, None), new_access]

dr.CustomTask.get('custom-task-id').share(access_list)

Return type: None

class datarobot.models.custom_task_version.CustomTaskFileItem(id, file_name, file_path, file_source, created_at=None)¶

A file item attached to a DataRobot custom task version.

New in version v2.26.

Attributes

id: str: id of the file item
file_name: str: name of the file item
file_path: str: path of the file item
file_source: str: source of the file item
created_at: str: ISO-8601 formatted timestamp of when the version was created

class datarobot.enums.CustomTaskOutgoingNetworkPolicy(value)¶: The way to set and view a CustomTaskVersions outgoing network policy.

class datarobot.CustomTaskVersion(id, custom_task_id, version_major, version_minor, label, created_at, is_frozen, items, description=None, base_environment_id=None, maximum_memory=None, base_environment_version_id=None, dependencies=None, required_metadata_values=None, arguments=None, outgoing_network_policy=None)¶

A version of a DataRobot custom task.

New in version v2.26.

Attributes

id: str: id of the custom task version
custom_task_id: str: id of the custom task
version_minor: int: a minor version number of custom task version
version_major: int: a major version number of custom task version
label: str: short human readable string to label the version
created_at: str: ISO-8601 formatted timestamp of when the version was created
is_frozen: bool: a flag if the custom task version is frozen
items: List[CustomTaskFileItem]: a list of file items attached to the custom task version
description: str, optional: custom task version description
base_environment_id: str, optional: id of the environment to use with the task
base_environment_version_id: str, optional: id of the environment version to use with the task
dependencies: List[CustomDependency]: the parsed dependencies of the custom task version if the version has a valid requirements.txt file
required_metadata_values: List[RequiredMetadataValue]: Additional parameters required by the execution environment. The required keys are defined by the fieldNames in the base environment’s requiredMetadataKeys.
arguments: List[UserBlueprintTaskArgument]: A list of custom task version arguments.
outgoing_network_policy: Optional[CustomTaskOutgoingNetworkPolicy]

classmethod from_server_data(data, keep_attrs=None)¶

Instantiate an object of this class using the data directly from the server, meaning that the keys may have the wrong camel casing

Parameters

datadict: The directly translated dict of JSON from the server. No casing fixes have taken place
keep_attrsiterable: List, set or tuple of the dotted namespace notations for attributes to keep within the object structure even if their values are None

classmethod create_clean(custom_task_id, base_environment_id, maximum_memory=None, is_major_update=True, folder_path=None, required_metadata_values=None, outgoing_network_policy=None)¶

Create a custom task version without files from previous versions.

New in version v2.26.

Parameters

custom_task_id: str: the id of the custom task
base_environment_id: str: the id of the base environment to use with the custom task version
maximum_memory: Optional[int]: A number in bytes about how much memory custom tasks’ inference containers can run with.
is_major_update: bool: If the current version is 2.3, True would set the new version at 3.0. False would set the new version at 2.4. Defaults to True.
folder_path: Optional[str]: The path to a folder containing files to be uploaded. Each file in the folder is uploaded under path relative to a folder path.
required_metadata_values: Optional[List[RequiredMetadataValue]]: Additional parameters required by the execution environment. The required keys are defined by the fieldNames in the base environment’s requiredMetadataKeys.
outgoing_network_policy: Optional[CustomTaskOutgoingNetworkPolicy]: You must enable custom task network access permissions to pass any value other than None! Specifies if you custom task version is able to make network calls. None will set the value to DataRobot’s default.

Returns

CustomTaskVersion: created custom task version

Raises

datarobot.errors.ClientError: if the server responded with 4xx status
datarobot.errors.ServerError: if the server responded with 5xx status

classmethod create_from_previous(custom_task_id, base_environment_id, maximum_memory=None, is_major_update=True, folder_path=None, files_to_delete=None, required_metadata_values=None, outgoing_network_policy=None)¶

Create a custom task version containing files from a previous version.

New in version v2.26.

Parameters

custom_task_id: str: the id of the custom task
base_environment_id: str: the id of the base environment to use with the custom task version
maximum_memory: Optional[int]: A number in bytes about how much memory custom tasks’ inference containers can run with.
is_major_update: bool: If the current version is 2.3, True would set the new version at 3.0. False would set the new version at 2.4. Defaults to True.
folder_path: Optional[str]: The path to a folder containing files to be uploaded. Each file in the folder is uploaded under path relative to a folder path.
files_to_delete: Optional[List[str]]: the list of a file items ids to be deleted Example: [“5ea95f7a4024030aba48e4f9”, “5ea6b5da402403181895cc51”]
required_metadata_values: Optional[List[RequiredMetadataValue]]: Additional parameters required by the execution environment. The required keys are defined by the fieldNames in the base environment’s requiredMetadataKeys.
outgoing_network_policy: Optional[CustomTaskOutgoingNetworkPolicy]: You must enable custom task network access permissions to pass any value other than None! Specifies if you custom task version is able to make network calls. None will get the value from the previous version if you have the proper permissions or use DataRobot’s default.

Returns

CustomTaskVersion: created custom task version

Raises

datarobot.errors.ClientError: if the server responded with 4xx status
datarobot.errors.ServerError: if the server responded with 5xx status

classmethod list(custom_task_id)¶

List custom task versions.

New in version v2.26.

Parameters

custom_task_id: str: the id of the custom task

Returns

List[CustomTaskVersion]: a list of custom task versions

Raises

datarobot.errors.ClientError: if the server responded with 4xx status
datarobot.errors.ServerError: if the server responded with 5xx status

classmethod get(custom_task_id, custom_task_version_id)¶

Get custom task version by id.

New in version v2.26.

Parameters

custom_task_id: str: the id of the custom task
custom_task_version_id: str: the id of the custom task version to retrieve

Returns

CustomTaskVersion: retrieved custom task version

Raises

datarobot.errors.ClientError: if the server responded with 4xx status.
datarobot.errors.ServerError: if the server responded with 5xx status.

download(file_path)¶

Download custom task version.

New in version v2.26.

Parameters

file_path: str: path to create a file with custom task version content

Raises

datarobot.errors.ClientError: if the server responded with 4xx status.
datarobot.errors.ServerError: if the server responded with 5xx status.

update(description=None, required_metadata_values=None)¶

Update custom task version properties.

New in version v2.26.

Parameters

description: str: new custom task version description
required_metadata_values: List[RequiredMetadataValue]: Additional parameters required by the execution environment. The required keys are defined by the fieldNames in the base environment’s requiredMetadataKeys.

Raises

datarobot.errors.ClientError: if the server responded with 4xx status.
datarobot.errors.ServerError: if the server responded with 5xx status.

refresh()¶

Update custom task version with the latest data from server.

New in version v2.26.

Raises

datarobot.errors.ClientError: if the server responded with 4xx status
datarobot.errors.ServerError: if the server responded with 5xx status

start_dependency_build()¶

Start the dependency build for a custom task version and return build status. .. versionadded:: v2.27

Returns

CustomTaskVersionDependencyBuild: DTO of custom task version dependency build.

start_dependency_build_and_wait(max_wait)¶

Start the dependency build for a custom task version and wait while pulling status. .. versionadded:: v2.27

Parameters

max_wait: int: max time to wait for a build completion

Returns

CustomTaskVersionDependencyBuild: DTO of custom task version dependency build.

Raises

datarobot.errors.ClientError: if the server responded with 4xx status
datarobot.errors.ServerError: if the server responded with 5xx status
datarobot.errors.AsyncTimeoutError: Raised if the dependency build is not finished after max_wait.

cancel_dependency_build()¶

Cancel custom task version dependency build that is in progress. .. versionadded:: v2.27

Raises

datarobot.errors.ClientError: if the server responded with 4xx status
datarobot.errors.ServerError: if the server responded with 5xx status

get_dependency_build()¶

Retrieve information about a custom task version’s dependency build. .. versionadded:: v2.27

Returns

CustomTaskVersionDependencyBuild: DTO of custom task version dependency build.

download_dependency_build_log(file_directory='.')¶

Get log of a custom task version dependency build. .. versionadded:: v2.27

Parameters

file_directory: str (optional, default is “.”): Directory path where downloaded file is to save.

Raises

datarobot.errors.ClientError: if the server responded with 4xx status
datarobot.errors.ServerError: if the server responded with 5xx status

Database Connectivity¶

class datarobot.DataDriver(id=None, creator=None, base_names=None, class_name=None, canonical_name=None)¶

A data driver

Attributes

idstr: the id of the driver.
class_namestr: the Java class name for the driver.
canonical_namestr: the user-friendly name of the driver.
creatorstr: the id of the user who created the driver.
base_nameslist of str: a list of the file name(s) of the jar files.

classmethod list()¶

Returns list of available drivers.

Returns

driverslist of DataDriver instances: contains a list of available drivers.

Examples

>>> import datarobot as dr
>>> drivers = dr.DataDriver.list()
>>> drivers
[DataDriver('mysql'), DataDriver('RedShift'), DataDriver('PostgreSQL')]

Return type: List[DataDriver]

classmethod get(driver_id)¶

Gets the driver.

Parameters

driver_idstr: the identifier of the driver.

Returns

driverDataDriver: the required driver.

Examples

>>> import datarobot as dr
>>> driver = dr.DataDriver.get('5ad08a1889453d0001ea7c5c')
>>> driver
DataDriver('PostgreSQL')

Return type: DataDriver

classmethod create(class_name, canonical_name, files)¶

Creates the driver. Only available to admin users.

Parameters

class_namestr: the Java class name for the driver.
canonical_namestr: the user-friendly name of the driver.
fileslist of str: a list of the file paths on file system file_path(s) for the driver.

Returns

driverDataDriver: the created driver.

Raises

ClientError: raised if user is not granted for Can manage JDBC database drivers feature

Examples

>>> import datarobot as dr
>>> driver = dr.DataDriver.create(
...     class_name='org.postgresql.Driver',
...     canonical_name='PostgreSQL',
...     files=['/tmp/postgresql-42.2.2.jar']
... )
>>> driver
DataDriver('PostgreSQL')

Return type: DataDriver

update(class_name=None, canonical_name=None)¶

Updates the driver. Only available to admin users.

Parameters

class_namestr: the Java class name for the driver.
canonical_namestr: the user-friendly name of the driver.

Raises

ClientError: raised if user is not granted for Can manage JDBC database drivers feature

Examples

>>> import datarobot as dr
>>> driver = dr.DataDriver.get('5ad08a1889453d0001ea7c5c')
>>> driver.canonical_name
'PostgreSQL'
>>> driver.update(canonical_name='postgres')
>>> driver.canonical_name
'postgres'

Return type: None

delete()¶

Removes the driver. Only available to admin users.

Raises

ClientError: raised if user is not granted for Can manage JDBC database drivers feature

Return type: None

class datarobot.Connector(id=None, creator_id=None, configuration_id=None, base_name=None, canonical_name=None)¶

A connector

Attributes

idstr: the id of the connector.
creator_idstr: the id of the user who created the connector.
base_namestr: the file name of the jar file.
canonical_namestr: the user-friendly name of the connector.
configuration_idstr: the id of the configuration of the connector.

classmethod list()¶

Returns list of available connectors.

Returns

connectorslist of Connector instances: contains a list of available connectors.

Examples

>>> import datarobot as dr
>>> connectors = dr.Connector.list()
>>> connectors
[Connector('ADLS Gen2 Connector'), Connector('S3 Connector')]

Return type: List[Connector]

classmethod get(connector_id)¶

Gets the connector.

Parameters

connector_idstr: the identifier of the connector.

Returns

connectorConnector: the required connector.

Examples

>>> import datarobot as dr
>>> connector = dr.Connector.get('5fe1063e1c075e0245071446')
>>> connector
Connector('ADLS Gen2 Connector')

Return type: Connector

classmethod create(file_path)¶

Creates the connector from a jar file. Only available to admin users.

Parameters

file_pathstr: the file path on file system file_path(s) for the connector.

Returns

connectorConnector: the created connector.

Raises

ClientError: raised if user is not granted for Can manage connectors feature

Examples

>>> import datarobot as dr
>>> connector = dr.Connector.create('/tmp/connector-adls-gen2.jar')
>>> connector
Connector('ADLS Gen2 Connector')

Return type: Connector

update(file_path)¶

Updates the connector with new jar file. Only available to admin users.

Parameters

file_pathstr: the file path on file system file_path(s) for the connector.

Returns

connectorConnector: the updated connector.

Raises

ClientError: raised if user is not granted for Can manage connectors feature

Examples

>>> import datarobot as dr
>>> connector = dr.Connector.get('5fe1063e1c075e0245071446')
>>> connector.base_name
'connector-adls-gen2.jar'
>>> connector.update('/tmp/connector-s3.jar')
>>> connector.base_name
'connector-s3.jar'

Return type: Connector

delete()¶

Removes the connector. Only available to admin users.

Raises

ClientError: raised if user is not granted for Can manage connectors feature

Return type: None

class datarobot.DataStore(data_store_id=None, data_store_type=None, canonical_name=None, creator=None, updated=None, params=None, role=None)¶

A data store. Represents database

Attributes

idstr: The id of the data store.
data_store_typestr: The type of data store.
canonical_namestr: The user-friendly name of the data store.
creatorstr: The id of the user who created the data store.
updateddatetime.datetime: The time of the last update
paramsDataStoreParameters: A list specifying data store parameters.
rolestr: Your access role for this data store.

classmethod list(typ=None)¶

Returns list of available data stores.

Parameters

typstr: If specified, filters by specified data store type.

Returns

data_storeslist of DataStore instances: contains a list of available data stores.

Examples

>>> import datarobot as dr
>>> data_stores = dr.DataStore.list()
>>> data_stores
[DataStore('Demo'), DataStore('Airlines')]

Return type: List[DataStore]

classmethod get(data_store_id)¶

Gets the data store.

Parameters

data_store_idstr: the identifier of the data store.

Returns

data_storeDataStore: the required data store.

Examples

>>> import datarobot as dr
>>> data_store = dr.DataStore.get('5a8ac90b07a57a0001be501e')
>>> data_store
DataStore('Demo')

Return type: DataStore

classmethod create(data_store_type, canonical_name, driver_id, jdbc_url=None, fields=None)¶

Creates the data store.

Parameters

data_store_typestr: the type of data store.
canonical_namestr: the user-friendly name of the data store.
driver_idstr: the identifier of the DataDriver.
jdbc_urlstr: Optional. The full JDBC URL (for example: jdbc:postgresql://my.dbaddress.org:5432/my_db).
fields: list: Optional. If the type is dr-database-v1, then the fields specify the configuration.

Returns

data_storeDataStore: the created data store.

Examples

>>> import datarobot as dr
>>> data_store = dr.DataStore.create(
...     data_store_type='jdbc',
...     canonical_name='Demo DB',
...     driver_id='5a6af02eb15372000117c040',
...     jdbc_url='jdbc:postgresql://my.db.address.org:5432/perftest'
... )
>>> data_store
DataStore('Demo DB')

Return type: DataStore

update(canonical_name=None, driver_id=None, jdbc_url=None, fields=None)¶

Updates the data store.

Parameters

canonical_namestr: optional, the user-friendly name of the data store.
driver_idstr: optional, the identifier of the DataDriver.
jdbc_urlstr: Optional. The full JDBC URL (for example: jdbc:postgresql://my.dbaddress.org:5432/my_db).
fields: list: Optional. If the type is dr-database-v1, then the fields specify the configuration.

Examples

>>> import datarobot as dr
>>> data_store = dr.DataStore.get('5ad5d2afef5cd700014d3cae')
>>> data_store
DataStore('Demo DB')
>>> data_store.update(canonical_name='Demo DB updated')
>>> data_store
DataStore('Demo DB updated')

Return type: None

delete()¶

Removes the DataStore

Return type: None

test(username=None, password=None, credential_id=None, use_kerberos=None, credential_data=None)¶

Tests database connection.

Changed in version v3.2: Added credential_id, use_kerberos and credential_data optional params and made username and password optional.

Parameters

usernamestr: optional, the username for database authentication.
passwordstr: optional, the password for database authentication. The password is encrypted at server side and never saved / stored
credential_idstr: optional, id of the set of credentials to use instead of username and password
use_kerberosbool: optional, whether to use Kerberos for data store authentication
credential_datadict: optional, the credentials to authenticate with the database, to use instead of user/password or credential ID

Returns

messagedict: message with status.

Examples

>>> import datarobot as dr
>>> data_store = dr.DataStore.get('5ad5d2afef5cd700014d3cae')
>>> data_store.test(username='db_username', password='db_password')
{'message': 'Connection successful'}

Return type: TestResponse

schemas(username, password)¶

Returns list of available schemas.

Parameters

usernamestr: the username for database authentication.
passwordstr: the password for database authentication. The password is encrypted at server side and never saved / stored

Returns

responsedict: dict with database name and list of str - available schemas

Examples

>>> import datarobot as dr
>>> data_store = dr.DataStore.get('5ad5d2afef5cd700014d3cae')
>>> data_store.schemas(username='db_username', password='db_password')
{'catalog': 'perftest', 'schemas': ['demo', 'information_schema', 'public']}

Return type: SchemasResponse

tables(username, password, schema=None)¶

Returns list of available tables in schema.

Parameters

usernamestr: optional, the username for database authentication.
passwordstr: optional, the password for database authentication. The password is encrypted at server side and never saved / stored
schemastr: optional, the schema name.

Returns

responsedict: dict with catalog name and tables info

Examples

>>> import datarobot as dr
>>> data_store = dr.DataStore.get('5ad5d2afef5cd700014d3cae')
>>> data_store.tables(username='db_username', password='db_password', schema='demo')
{'tables': [{'type': 'TABLE', 'name': 'diagnosis', 'schema': 'demo'}, {'type': 'TABLE',
'name': 'kickcars', 'schema': 'demo'}, {'type': 'TABLE', 'name': 'patient',
'schema': 'demo'}, {'type': 'TABLE', 'name': 'transcript', 'schema': 'demo'}],
'catalog': 'perftest'}

Return type: TablesResponse

classmethod from_server_data(data, keep_attrs=None)¶

Instantiate an object of this class using the data directly from the server, meaning that the keys may have the wrong camel casing

Parameters

datadict: The directly translated dict of JSON from the server. No casing fixes have taken place
keep_attrsiterable: List, set or tuple of the dotted namespace notations for attributes to keep within the object structure even if their values are None

Return type: DataStore

get_access_list()¶

Retrieve what users have access to this data store

New in version v2.14.

Returns

list ofclass:SharingAccess <datarobot.SharingAccess>

Return type: List[SharingAccess]

get_shared_roles()¶

Retrieve what users have access to this data store

New in version v3.2.

Returns

list ofclass:SharingRole <datarobot.models.sharing.SharingRole>

Return type: List[SharingRole]

share(access_list)¶

Modify the ability of users to access this data store

New in version v2.14.

Parameters

access_listlist of SharingRole: the modifications to make.

Raises

datarobot.ClientError: if you do not have permission to share this data store, if the user you’re sharing with doesn’t exist, if the same user appears multiple times in the access_list, or if these changes would leave the data store without an owner.

Examples

The SharingRole class is needed in order to share a Data Store with one or more users.

For example, suppose you had a list of user IDs you wanted to share this DataStore with. You could use a loop to generate a list of SharingRole objects for them, and bulk share this Data Store.

>>> import datarobot as dr
>>> from datarobot.models.sharing import SharingRole
>>> from datarobot.enums import SHARING_ROLE, SHARING_RECIPIENT_TYPE
>>>
>>> user_ids = ["60912e09fd1f04e832a575c1", "639ce542862e9b1b1bfa8f1b", "63e185e7cd3a5f8e190c6393"]
>>> sharing_roles = []
>>> for user_id in user_ids:
...     new_sharing_role = SharingRole(
...         role=SHARING_ROLE.CONSUMER,
...         share_recipient_type=SHARING_RECIPIENT_TYPE.USER,
...         id=user_id,
...         can_share=True,
...     )
...     sharing_roles.append(new_sharing_role)
>>> dr.DataStore.get('my-data-store-id').share(access_list)

Similarly, a SharingRole instance can be used to remove a user’s access if the role is set to SHARING_ROLE.NO_ROLE, like in this example:

>>> import datarobot as dr
>>> from datarobot.models.sharing import SharingRole
>>> from datarobot.enums import SHARING_ROLE, SHARING_RECIPIENT_TYPE
>>>
>>> user_to_remove = "[email protected]"
... remove_sharing_role = SharingRole(
...     role=SHARING_ROLE.NO_ROLE,
...     share_recipient_type=SHARING_RECIPIENT_TYPE.USER,
...     username=user_to_remove,
...     can_share=False,
... )
>>> dr.DataStore.get('my-data-store-id').share(roles=[remove_sharing_role])

Return type: None

class datarobot.DataSource(data_source_id=None, data_source_type=None, canonical_name=None, creator=None, updated=None, params=None, role=None)¶

A data source. Represents data request

Attributes

idstr: the id of the data source.
typestr: the type of data source.
canonical_namestr: the user-friendly name of the data source.
creatorstr: the id of the user who created the data source.
updateddatetime.datetime: the time of the last update.
paramsDataSourceParameters: a list specifying data source parameters.
rolestr or None: if a string, represents a particular level of access and should be one of datarobot.enums.SHARING_ROLE. For more information on the specific access levels, see the sharing documentation. If None, can be passed to a share function to revoke access for a specific user.

classmethod list()¶

Returns list of available data sources.

Returns

data_sourceslist of DataSource instances: contains a list of available data sources.

Examples

>>> import datarobot as dr
>>> data_sources = dr.DataSource.list()
>>> data_sources
[DataSource('Diagnostics'), DataSource('Airlines 100mb'), DataSource('Airlines 10mb')]

Return type: List[DataSource]

classmethod get(data_source_id)¶

Gets the data source.

Parameters

data_source_idstr: the identifier of the data source.

Returns

data_sourceDataSource: the requested data source.

Examples

>>> import datarobot as dr
>>> data_source = dr.DataSource.get('5a8ac9ab07a57a0001be501f')
>>> data_source
DataSource('Diagnostics')

Return type: TypeVar(TDataSource, bound= DataSource)

classmethod create(data_source_type, canonical_name, params)¶

Creates the data source.

Parameters

data_source_typestr: the type of data source.
canonical_namestr: the user-friendly name of the data source.
paramsDataSourceParameters: a list specifying data source parameters.

Returns

data_sourceDataSource: the created data source.

Examples

>>> import datarobot as dr
>>> params = dr.DataSourceParameters(
...     data_store_id='5a8ac90b07a57a0001be501e',
...     query='SELECT * FROM airlines10mb WHERE "Year" >= 1995;'
... )
>>> data_source = dr.DataSource.create(
...     data_source_type='jdbc',
...     canonical_name='airlines stats after 1995',
...     params=params
... )
>>> data_source
DataSource('airlines stats after 1995')

Return type: TypeVar(TDataSource, bound= DataSource)

update(canonical_name=None, params=None)¶

Creates the data source.

Parameters

canonical_namestr: optional, the user-friendly name of the data source.
paramsDataSourceParameters: optional, the identifier of the DataDriver.

Examples

>>> import datarobot as dr
>>> data_source = dr.DataSource.get('5ad840cc613b480001570953')
>>> data_source
DataSource('airlines stats after 1995')
>>> params = dr.DataSourceParameters(
...     query='SELECT * FROM airlines10mb WHERE "Year" >= 1990;'
... )
>>> data_source.update(
...     canonical_name='airlines stats after 1990',
...     params=params
... )
>>> data_source
DataSource('airlines stats after 1990')

Return type: None

delete()¶

Removes the DataSource

Return type: None

classmethod from_server_data(data, keep_attrs=None)¶

Instantiate an object of this class using the data directly from the server, meaning that the keys may have the wrong camel casing

Parameters

datadict: The directly translated dict of JSON from the server. No casing fixes have taken place
keep_attrsiterable: List, set or tuple of the dotted namespace notations for attributes to keep within the object structure even if their values are None

Return type: TypeVar(TDataSource, bound= DataSource)

get_access_list()¶

Retrieve what users have access to this data source

New in version v2.14.

Returns

list ofclass:SharingAccess <datarobot.SharingAccess>

Return type: List[SharingAccess]

share(access_list)¶

Modify the ability of users to access this data source

New in version v2.14.

Parameters

access_list: list ofclass:SharingAccess <datarobot.SharingAccess>: The modifications to make.

Raises

datarobot.ClientError:: If you do not have permission to share this data source, if the user you’re sharing with doesn’t exist, if the same user appears multiple times in the access_list, or if these changes would leave the data source without an owner.

Examples

Transfer access to the data source from old_user@datarobot.com to new_user@datarobot.com

from datarobot.enums import SHARING_ROLE
from datarobot.models.data_source import DataSource
from datarobot.models.sharing import SharingAccess

new_access = SharingAccess(
    "[email protected]",
    SHARING_ROLE.OWNER,
    can_share=True,
)
access_list = [
    SharingAccess("[email protected]", SHARING_ROLE.OWNER, can_share=True),
    new_access,
]

DataSource.get('my-data-source-id').share(access_list)

Return type: None

create_dataset(username=None, password=None, do_snapshot=None, persist_data_after_ingestion=None, categories=None, credential_id=None, use_kerberos=None)¶

Create a Dataset from this data source.

New in version v2.22.

Parameters

username: string, optional: The username for database authentication.
password: string, optional: The password (in cleartext) for database authentication. The password will be encrypted on the server side in scope of HTTP request and never saved or stored.
do_snapshot: bool, optional: If unset, uses the server default: True. If true, creates a snapshot dataset; if false, creates a remote dataset. Creating snapshots from non-file sources requires an additional permission, Enable Create Snapshot Data Source.
persist_data_after_ingestion: bool, optional: If unset, uses the server default: True. If true, will enforce saving all data (for download and sampling) and will allow a user to view extended data profile (which includes data statistics like min/max/median/mean, histogram, etc.). If false, will not enforce saving data. The data schema (feature names and types) still will be available. Specifying this parameter to false and doSnapshot to true will result in an error.
categories: list[string], optional: An array of strings describing the intended use of the dataset. The current supported options are “TRAINING” and “PREDICTION”.
credential_id: string, optional: The ID of the set of credentials to use instead of user and password. Note that with this change, username and password will become optional.
use_kerberos: bool, optional: If unset, uses the server default: False. If true, use kerberos authentication for database authentication.

Returns

response: Dataset: The Dataset created from the uploaded data

Return type: Dataset

class datarobot.DataSourceParameters(data_store_id=None, table=None, schema=None, partition_column=None, query=None, fetch_size=None)¶

Data request configuration

Attributes

data_store_idstr: the id of the DataStore.
tablestr: optional, the name of specified database table.
schemastr: optional, the name of the schema associated with the table.
partition_columnstr: optional, the name of the partition column.
querystr: optional, the user specified SQL query.
fetch_sizeint: optional, a user specified fetch size in the range [1, 20000]. By default a fetchSize will be assigned to balance throughput and memory usage

Datasets¶

class datarobot.models.Dataset(dataset_id, version_id, name, categories, created_at, is_data_engine_eligible, is_latest_version, is_snapshot, processing_state, created_by=None, data_persisted=None, size=None, row_count=None, recipe_id=None)¶

Represents a Dataset returned from the api/v2/datasets/ endpoints.

Attributes

id: string: The ID of this dataset
name: string: The name of this dataset in the catalog
is_latest_version: bool: Whether this dataset version is the latest version of this dataset
version_id: string: The object ID of the catalog_version the dataset belongs to
categories: list(string): An array of strings describing the intended use of the dataset. The supported options are “TRAINING” and “PREDICTION”.
created_at: string: The date when the dataset was created
created_by: string, optional: Username of the user who created the dataset
is_snapshot: bool: Whether the dataset version is an immutable snapshot of data which has previously been retrieved and saved to Data_robot
data_persisted: bool, optional: If true, user is allowed to view extended data profile (which includes data statistics like min/max/median/mean, histogram, etc.) and download data. If false, download is not allowed and only the data schema (feature names and types) will be available.
is_data_engine_eligible: bool: Whether this dataset can be a data source of a data engine query.
processing_state: string: Current ingestion process state of the dataset
row_count: int, optional: The number of rows in the dataset.
size: int, optional: The size of the dataset as a CSV in bytes.

get_uri()¶

Returns

urlstr: Permanent static hyperlink to this dataset in AI Catalog.

Return type: str

classmethod upload(source)¶

This method covers Dataset creation from local materials (file & DataFrame) and a URL.

Parameters

source: str, pd.DataFrame or file object: Pass a URL, filepath, file or DataFrame to create and return a Dataset.

Returns

response: Dataset: The Dataset created from the uploaded data source.

Raises

InvalidUsageError: If the source parameter cannot be determined to be a URL, filepath, file or DataFrame.

Examples

# Upload a local file
dataset_one = Dataset.upload("./data/examples.csv")

# Create a dataset via URL
dataset_two = Dataset.upload(
    "https://raw.githubusercontent.com/curran/data/gh-pages/dbpedia/cities/data.csv"
)

# Create dataset with a pandas Dataframe
dataset_three = Dataset.upload(my_df)

# Create dataset using a local file
with open("./data/examples.csv", "rb") as file_pointer:
    dataset_four = Dataset.create_from_file(filelike=file_pointer)

Return type: TypeVar(TDataset, bound= Dataset)

classmethod create_from_file(cls, file_path=None, filelike=None, categories=None, read_timeout=600, max_wait=600, *, use_cases=None)¶

A blocking call that creates a new Dataset from a file. Returns when the dataset has been successfully uploaded and processed.

Warning: This function does not clean up it’s open files. If you pass a filelike, you are responsible for closing it. If you pass a file_path, this will create a file object from the file_path but will not close it.

Parameters

file_path: string, optional: The path to the file. This will create a file object pointing to that file but will not close it.
filelike: file, optional: An open and readable file object.
categories: list[string], optional: An array of strings describing the intended use of the dataset. The current supported options are “TRAINING” and “PREDICTION”.
read_timeout: int, optional: The maximum number of seconds to wait for the server to respond indicating that the initial upload is complete
max_wait: int, optional: Time in seconds after which dataset creation is considered unsuccessful
use_cases: list[UseCase] | UseCase | list[string] | string, optional: A list of UseCase objects, UseCase object, list of Use Case ids or a single Use Case id to add this new Dataset to. Must be a kwarg.

Returns

response: Dataset: A fully armed and operational Dataset

Return type: TypeVar(TDataset, bound= Dataset)

classmethod create_from_in_memory_data(cls, data_frame=None, records=None, categories=None, read_timeout=600, max_wait=600, fname=None, *, use_cases=None)¶

A blocking call that creates a new Dataset from in-memory data. Returns when the dataset has been successfully uploaded and processed.

The data can be either a pandas DataFrame or a list of dictionaries with identical keys.

Parameters

data_frame: DataFrame, optional: The data frame to upload
records: list[dict], optional: A list of dictionaries with identical keys to upload
categories: list[string], optional: An array of strings describing the intended use of the dataset. The current supported options are “TRAINING” and “PREDICTION”.
read_timeout: int, optional: The maximum number of seconds to wait for the server to respond indicating that the initial upload is complete
max_wait: int, optional: Time in seconds after which dataset creation is considered unsuccessful
fname: string, optional: The file name, “data.csv” by default
use_cases: list[UseCase] | UseCase | list[string] | string, optional: A list of UseCase objects, UseCase object, list of Use Case IDs or a single Use Case ID to add this new dataset to. Must be a kwarg.

Returns

response: Dataset: The Dataset created from the uploaded data.

Raises

InvalidUsageError: If neither a DataFrame or list of records is passed.

Return type: TypeVar(TDataset, bound= Dataset)

classmethod create_from_url(cls, url, do_snapshot=None, persist_data_after_ingestion=None, categories=None, max_wait=600, *, use_cases=None)¶

A blocking call that creates a new Dataset from data stored at a url. Returns when the dataset has been successfully uploaded and processed.

Parameters

url: string: The URL to use as the source of data for the dataset being created.
do_snapshot: bool, optional: If unset, uses the server default: True. If true, creates a snapshot dataset; if false, creates a remote dataset. Creating snapshots from non-file sources may be disabled by the permission, Disable AI Catalog Snapshots.
persist_data_after_ingestion: bool, optional: If unset, uses the server default: True. If true, will enforce saving all data (for download and sampling) and will allow a user to view extended data profile (which includes data statistics like min/max/median/mean, histogram, etc.). If false, will not enforce saving data. The data schema (feature names and types) still will be available. Specifying this parameter to false and doSnapshot to true will result in an error.
categories: list[string], optional: An array of strings describing the intended use of the dataset. The current supported options are “TRAINING” and “PREDICTION”.
max_wait: int, optional: Time in seconds after which dataset creation is considered unsuccessful.
use_cases: list[UseCase] | UseCase | list[string] | string, optional: A list of UseCase objects, UseCase object, list of Use Case IDs or a single Use Case ID to add this new dataset to. Must be a kwarg.

Returns

response: Dataset: The Dataset created from the uploaded data

Return type: TypeVar(TDataset, bound= Dataset)

classmethod create_from_datastage(cls, datastage_id, categories=None, max_wait=600, *, use_cases=None)¶

A blocking call that creates a new Dataset from data stored as a DataStage. Returns when the dataset has been successfully uploaded and processed.

Parameters

datastage_id: string: The ID of the DataStage to use as the source of data for the dataset being created.
categories: list[string], optional: An array of strings describing the intended use of the dataset. The current supported options are “TRAINING” and “PREDICTION”.
max_wait: int, optional: Time in seconds after which dataset creation is considered unsuccessful.

Returns

response: Dataset: The Dataset created from the uploaded data

Return type: TypeVar(TDataset, bound= Dataset)

classmethod create_from_data_source(cls, data_source_id, username=None, password=None, do_snapshot=None, persist_data_after_ingestion=None, categories=None, credential_id=None, use_kerberos=None, credential_data=None, max_wait=600, *, use_cases=None)¶

A blocking call that creates a new Dataset from data stored at a DataSource. Returns when the dataset has been successfully uploaded and processed.

New in version v2.22.

Parameters

data_source_id: string: The ID of the DataSource to use as the source of data.
username: string, optional: The username for database authentication.
password: string, optional: The password (in cleartext) for database authentication. The password will be encrypted on the server side in scope of HTTP request and never saved or stored.
do_snapshot: bool, optional: If unset, uses the server default: True. If true, creates a snapshot dataset; if false, creates a remote dataset. Creating snapshots from non-file sources requires may be disabled by the permission, Disable AI Catalog Snapshots.
persist_data_after_ingestion: bool, optional: If unset, uses the server default: True. If true, will enforce saving all data (for download and sampling) and will allow a user to view extended data profile (which includes data statistics like min/max/median/mean, histogram, etc.). If false, will not enforce saving data. The data schema (feature names and types) still will be available. Specifying this parameter to false and doSnapshot to true will result in an error.
categories: list[string], optional: An array of strings describing the intended use of the dataset. The current supported options are “TRAINING” and “PREDICTION”.
credential_id: string, optional: The ID of the set of credentials to use instead of user and password. Note that with this change, username and password will become optional.
use_kerberos: bool, optional: If unset, uses the server default: False. If true, use kerberos authentication for database authentication.
credential_data: dict, optional: The credentials to authenticate with the database, to use instead of user/password or credential ID.
max_wait: int, optional: Time in seconds after which project creation is considered unsuccessful.
use_cases: list[UseCase] | UseCase | list[string] | string, optional: A list of UseCase objects, UseCase object, list of Use Case IDs or a single Use Case ID to add this new dataset to. Must be a kwarg.

Returns

response: Dataset: The Dataset created from the uploaded data

Return type: TypeVar(TDataset, bound= Dataset)

classmethod create_from_query_generator(cls, generator_id, dataset_id=None, dataset_version_id=None, max_wait=600, *, use_cases=None)¶

A blocking call that creates a new Dataset from the query generator. Returns when the dataset has been successfully processed. If optional parameters are not specified the query is applied to the dataset_id and dataset_version_id stored in the query generator. If specified they will override the stored dataset_id/dataset_version_id, e.g. to prep a prediction dataset.

Parameters

generator_id: str: The id of the query generator to use.
dataset_id: str, optional: The id of the dataset to apply the query to.
dataset_version_id: str, optional: The id of the dataset version to apply the query to. If not specified the latest version associated with dataset_id (if specified) is used.
max_waitint: optional, the maximum number of seconds to wait before giving up.
use_cases: list[UseCase] | UseCase | list[string] | string, optional: A list of UseCase objects, UseCase object, list of Use Case IDs or a single Use Case ID to add this new dataset to. Must be a kwarg.

Returns

response: Dataset: The Dataset created from the query generator

Return type: TypeVar(TDataset, bound= Dataset)

classmethod get(dataset_id)¶

Get information about a dataset.

Parameters

dataset_idstring: the id of the dataset

Returns

datasetDataset: the queried dataset

Return type: TypeVar(TDataset, bound= Dataset)

classmethod delete(dataset_id)¶

Soft deletes a dataset. You cannot get it or list it or do actions with it, except for un-deleting it.

Parameters

dataset_id: string: The id of the dataset to mark for deletion

Returns

None

Return type: None

classmethod un_delete(dataset_id)¶

Un-deletes a previously deleted dataset. If the dataset was not deleted, nothing happens.

Parameters

dataset_id: string: The id of the dataset to un-delete

Returns

None

Return type: None

classmethod list(category=None, filter_failed=None, order_by=None, use_cases=None)¶

List all datasets a user can view.

Parameters

category: string, optional: Optional. If specified, only dataset versions that have the specified category will be included in the results. Categories identify the intended use of the dataset; supported categories are “TRAINING” and “PREDICTION”.
filter_failed: bool, optional: If unset, uses the server default: False. Whether datasets that failed during import should be excluded from the results. If True invalid datasets will be excluded.
order_by: string, optional: If unset, uses the server default: “-created”. Sorting order which will be applied to catalog list, valid options are: - “created” – ascending order by creation datetime; - “-created” – descending order by creation datetime.
use_cases: Union[UseCase, List[UseCase], str, List[str]], optional: Filter available datasets by a specific Use Case or Cases. Accepts either the entity or the ID. If set to [None], the method filters the project’s datasets by those not linked to a UseCase.

Returns

list[Dataset]: a list of datasets the user can view

Return type: List[TypeVar(TDataset, bound= Dataset)]

classmethod iterate(offset=None, limit=None, category=None, order_by=None, filter_failed=None, use_cases=None)¶

Get an iterator for the requested datasets a user can view. This lazily retrieves results. It does not get the next page from the server until the current page is exhausted.

Parameters

offset: int, optional: If set, this many results will be skipped
limit: int, optional: Specifies the size of each page retrieved from the server. If unset, uses the server default.
category: string, optional: Optional. If specified, only dataset versions that have the specified category will be included in the results. Categories identify the intended use of the dataset; supported categories are “TRAINING” and “PREDICTION”.
filter_failed: bool, optional: If unset, uses the server default: False. Whether datasets that failed during import should be excluded from the results. If True invalid datasets will be excluded.
order_by: string, optional: If unset, uses the server default: “-created”. Sorting order which will be applied to catalog list, valid options are: - “created” – ascending order by creation datetime; - “-created” – descending order by creation datetime.
use_cases: Union[UseCase, List[UseCase], str, List[str]], optional: Filter available datasets by a specific Use Case or Cases. Accepts either the entity or the ID. If set to [None], the method filters the project’s datasets by those not linked to a UseCase.

Yields

Dataset: An iterator of the datasets the user can view.

Return type: Generator[TypeVar(TDataset, bound= Dataset), None, None]

update()¶

Updates the Dataset attributes in place with the latest information from the server.

Returns

None

Return type: None

modify(name=None, categories=None)¶

Modifies the Dataset name and/or categories. Updates the object in place.

Parameters

name: string, optional: The new name of the dataset
categories: list[string], optional: A list of strings describing the intended use of the dataset. The supported options are “TRAINING” and “PREDICTION”. If any categories were previously specified for the dataset, they will be overwritten. If omitted or None, keep previous categories. To clear them specify []

Returns

None

Return type: None

share(access_list, apply_grant_to_linked_objects=False)¶

Modify the ability of users to access this dataset

Parameters

access_list: list ofclass:SharingAccess <datarobot.SharingAccess>: The modifications to make.
apply_grant_to_linked_objects: bool: If true for any users being granted access to the dataset, grant the user read access to any linked objects such as DataSources and DataStores that may be used by this dataset. Ignored if no such objects are relevant for dataset, defaults to False.

Raises

datarobot.ClientError:: If you do not have permission to share this dataset, if the user you’re sharing with doesn’t exist, if the same user appears multiple times in the access_list, or if these changes would leave the dataset without an owner.

Examples

Transfer access to the dataset from old_user@datarobot.com to new_user@datarobot.com

from datarobot.enums import SHARING_ROLE
from datarobot.models.dataset import Dataset
from datarobot.models.sharing import SharingAccess

new_access = SharingAccess(
    "[email protected]",
    SHARING_ROLE.OWNER,
    can_share=True,
)
access_list = [
    SharingAccess(
        "[email protected]",
        SHARING_ROLE.OWNER,
        can_share=True,
        can_use_data=True,
    ),
    new_access,
]

Dataset.get('my-dataset-id').share(access_list)

Return type: None

get_details()¶

Gets the details for this Dataset

Returns

DatasetDetails

Return type: DatasetDetails

get_all_features(order_by=None)¶

Get a list of all the features for this dataset.

Parameters

order_by: string, optional: If unset, uses the server default: ‘name’. How the features should be ordered. Can be ‘name’ or ‘featureType’.

Returns

list[DatasetFeature]

Return type: List[DatasetFeature]

iterate_all_features(offset=None, limit=None, order_by=None)¶

Get an iterator for the requested features of a dataset. This lazily retrieves results. It does not get the next page from the server until the current page is exhausted.

Parameters

offset: int, optional: If set, this many results will be skipped.
limit: int, optional: Specifies the size of each page retrieved from the server. If unset, uses the server default.
order_by: string, optional: If unset, uses the server default: ‘name’. How the features should be ordered. Can be ‘name’ or ‘featureType’.

Yields

DatasetFeature

Return type: Generator[DatasetFeature, None, None]

get_featurelists()¶

Get DatasetFeaturelists created on this Dataset

Returns

feature_lists: list[DatasetFeaturelist]

Return type: List[DatasetFeaturelist]

create_featurelist(name, features)¶

Create a new dataset featurelist

Parameters

namestr: the name of the modeling featurelist to create. Names must be unique within the dataset, or the server will return an error.
featureslist of str: the names of the features to include in the dataset featurelist. Each feature must be a dataset feature.

Returns

featurelistDatasetFeaturelist: the newly created featurelist

Examples

dataset = Dataset.get('1234deadbeeffeeddead4321')
dataset_features = dataset.get_all_features()
selected_features = [feat.name for feat in dataset_features][:5]  # select first five
new_flist = dataset.create_featurelist('Simple Features', selected_features)

Return type: DatasetFeaturelist

get_file(file_path=None, filelike=None)¶

Retrieves all the originally uploaded data in CSV form. Writes it to either the file or a filelike object that can write bytes.

Only one of file_path or filelike can be provided and it must be provided as a keyword argument (i.e. file_path=’path-to-write-to’). If a file-like object is provided, the user is responsible for closing it when they are done.

The user must also have permission to download data.

Parameters

file_path: string, optional: The destination to write the file to.
filelike: file, optional: A file-like object to write to. The object must be able to write bytes. The user is responsible for closing the object

Returns

None

Return type: None

get_as_dataframe(low_memory=False)¶

Retrieves all the originally uploaded data in a pandas DataFrame.

New in version v3.0.

Parameters

low_memory: bool, optional: If True, use local files to reduce memory usage which will be slower.

Returns

pd.DataFrame

Return type: DataFrame

get_projects()¶

Retrieves the Dataset’s projects as ProjectLocation named tuples.

Returns

locations: list[ProjectLocation]

Return type: List[ProjectLocation]

create_project(project_name=None, user=None, password=None, credential_id=None, use_kerberos=None, credential_data=None, *, use_cases=None)¶

Create a datarobot.models.Project from this dataset

Parameters

project_name: string, optional: The name of the project to be created. If not specified, will be “Untitled Project” for database connections, otherwise the project name will be based on the file used.
user: string, optional: The username for database authentication.
password: string, optional: The password (in cleartext) for database authentication. The password will be encrypted on the server side in scope of HTTP request and never saved or stored
credential_id: string, optional: The ID of the set of credentials to use instead of user and password.
use_kerberos: bool, optional: Server default is False. If true, use kerberos authentication for database authentication.
credential_data: dict, optional: The credentials to authenticate with the database, to use instead of user/password or credential ID.
use_cases: list[UseCase] | UseCase | list[string] | string, optional: A list of UseCase objects, UseCase object, list of Use Case ids or a single Use Case id to add this new Dataset to. Must be a kwarg.

Returns

Project

Return type: Project

classmethod create_version_from_file(dataset_id, file_path=None, filelike=None, categories=None, read_timeout=600, max_wait=600)¶

A blocking call that creates a new Dataset version from a file. Returns when the new dataset version has been successfully uploaded and processed.

Warning: This function does not clean up it’s open files. If you pass a filelike, you are responsible for closing it. If you pass a file_path, this will create a file object from the file_path but will not close it.

New in version v2.23.

Parameters

dataset_id: string: The ID of the dataset for which new version to be created
file_path: string, optional: The path to the file. This will create a file object pointing to that file but will not close it.
filelike: file, optional: An open and readable file object.
categories: list[string], optional: An array of strings describing the intended use of the dataset. The current supported options are “TRAINING” and “PREDICTION”.
read_timeout: int, optional: The maximum number of seconds to wait for the server to respond indicating that the initial upload is complete
max_wait: int, optional: Time in seconds after which project creation is considered unsuccessful

Returns

response: Dataset: A fully armed and operational Dataset version

Return type: TypeVar(TDataset, bound= Dataset)

classmethod create_version_from_in_memory_data(dataset_id, data_frame=None, records=None, categories=None, read_timeout=600, max_wait=600)¶

A blocking call that creates a new Dataset version for a dataset from in-memory data. Returns when the dataset has been successfully uploaded and processed.

The data can be either a pandas DataFrame or a list of dictionaries with identical keys.

New in version v2.23.

Parameters

dataset_id: string: The ID of the dataset for which new version to be created
data_frame: DataFrame, optional: The data frame to upload
records: list[dict], optional: A list of dictionaries with identical keys to upload
categories: list[string], optional: An array of strings describing the intended use of the dataset. The current supported options are “TRAINING” and “PREDICTION”.
read_timeout: int, optional: The maximum number of seconds to wait for the server to respond indicating that the initial upload is complete
max_wait: int, optional: Time in seconds after which project creation is considered unsuccessful

Returns

response: Dataset: The Dataset version created from the uploaded data

Raises

InvalidUsageError: If neither a DataFrame or list of records is passed.

Return type: TypeVar(TDataset, bound= Dataset)

classmethod create_version_from_url(dataset_id, url, categories=None, max_wait=600)¶

A blocking call that creates a new Dataset from data stored at a url for a given dataset. Returns when the dataset has been successfully uploaded and processed.

New in version v2.23.

Parameters

dataset_id: string: The ID of the dataset for which new version to be created
url: string: The URL to use as the source of data for the dataset being created.
categories: list[string], optional: An array of strings describing the intended use of the dataset. The current supported options are “TRAINING” and “PREDICTION”.
max_wait: int, optional: Time in seconds after which project creation is considered unsuccessful

Returns

response: Dataset: The Dataset version created from the uploaded data

Return type: TypeVar(TDataset, bound= Dataset)

classmethod create_version_from_datastage(dataset_id, datastage_id, categories=None, max_wait=600)¶

A blocking call that creates a new Dataset from data stored as a DataStage for a given dataset. Returns when the dataset has been successfully uploaded and processed.

Parameters

dataset_id: string: The ID of the dataset for which new version to be created
datastage_id: string: The ID of the DataStage to use as the source of data for the dataset being created.
categories: list[string], optional: An array of strings describing the intended use of the dataset. The current supported options are “TRAINING” and “PREDICTION”.
max_wait: int, optional: Time in seconds after which project creation is considered unsuccessful

Returns

response: Dataset: The Dataset version created from the uploaded data

Return type: TypeVar(TDataset, bound= Dataset)

classmethod create_version_from_data_source(dataset_id, data_source_id, username=None, password=None, categories=None, credential_id=None, use_kerberos=None, credential_data=None, max_wait=600)¶

A blocking call that creates a new Dataset from data stored at a DataSource. Returns when the dataset has been successfully uploaded and processed.

New in version v2.23.

Parameters

dataset_id: string: The ID of the dataset for which new version to be created
data_source_id: string: The ID of the DataSource to use as the source of data.
username: string, optional: The username for database authentication.
password: string, optional: The password (in cleartext) for database authentication. The password will be encrypted on the server side in scope of HTTP request and never saved or stored.
categories: list[string], optional: An array of strings describing the intended use of the dataset. The current supported options are “TRAINING” and “PREDICTION”.
credential_id: string, optional: The ID of the set of credentials to use instead of user and password. Note that with this change, username and password will become optional.
use_kerberos: bool, optional: If unset, uses the server default: False. If true, use kerberos authentication for database authentication.
credential_data: dict, optional: The credentials to authenticate with the database, to use instead of user/password or credential ID.
max_wait: int, optional: Time in seconds after which project creation is considered unsuccessful

Returns

response: Dataset: The Dataset version created from the uploaded data

Return type: TypeVar(TDataset, bound= Dataset)

classmethod from_data(data)¶

Instantiate an object of this class using a dict.

Parameters

datadict: Correctly snake_cased keys and their values.

Return type: TypeVar(T, bound= APIObject)

classmethod from_server_data(data, keep_attrs=None)¶

Instantiate an object of this class using the data directly from the server, meaning that the keys may have the wrong camel casing

Parameters

datadict: The directly translated dict of JSON from the server. No casing fixes have taken place
keep_attrsiterable: List, set or tuple of the dotted namespace notations for attributes to keep within the object structure even if their values are None

Return type: TypeVar(T, bound= APIObject)

open_in_browser()¶

Opens class’ relevant web browser location. If default browser is not available the URL is logged.

Note: If text-mode browsers are used, the calling process will block until the user exits the browser.

Return type: None

class datarobot.DatasetDetails(dataset_id, version_id, categories, created_by, created_at, data_source_type, error, is_latest_version, is_snapshot, is_data_engine_eligible, last_modification_date, last_modifier_full_name, name, uri, processing_state, data_persisted=None, data_engine_query_id=None, data_source_id=None, description=None, eda1_modification_date=None, eda1_modifier_full_name=None, feature_count=None, feature_count_by_type=None, row_count=None, size=None, tags=None, recipe_id=None, is_wrangling_eligible=None)¶

Represents a detailed view of a Dataset. The to_dataset method creates a Dataset from this details view.

Attributes

dataset_id: string: The ID of this dataset
name: string: The name of this dataset in the catalog
is_latest_version: bool: Whether this dataset version is the latest version of this dataset
version_id: string: The object ID of the catalog_version the dataset belongs to
categories: list(string): An array of strings describing the intended use of the dataset. The supported options are “TRAINING” and “PREDICTION”.
created_at: string: The date when the dataset was created
created_by: string: Username of the user who created the dataset
is_snapshot: bool: Whether the dataset version is an immutable snapshot of data which has previously been retrieved and saved to Data_robot
data_persisted: bool, optional: If true, user is allowed to view extended data profile (which includes data statistics like min/max/median/mean, histogram, etc.) and download data. If false, download is not allowed and only the data schema (feature names and types) will be available.
is_data_engine_eligible: bool: Whether this dataset can be a data source of a data engine query.
processing_state: string: Current ingestion process state of the dataset
row_count: int, optional: The number of rows in the dataset.
size: int, optional: The size of the dataset as a CSV in bytes.
data_engine_query_id: string, optional: ID of the source data engine query
data_source_id: string, optional: ID of the datasource used as the source of the dataset
data_source_type: string: the type of the datasource that was used as the source of the dataset
description: string, optional: the description of the dataset
eda1_modification_date: string, optional: the ISO 8601 formatted date and time when the EDA1 for the dataset was updated
eda1_modifier_full_name: string, optional: the user who was the last to update EDA1 for the dataset
error: string: details of exception raised during ingestion process, if any
feature_count: int, optional: total number of features in the dataset
feature_count_by_type: list[FeatureTypeCount]: number of features in the dataset grouped by feature type
last_modification_date: string: the ISO 8601 formatted date and time when the dataset was last modified
last_modifier_full_name: string: full name of user who was the last to modify the dataset
tags: list[string]: list of tags attached to the item
uri: string: the uri to datasource like: - ‘file_name.csv’ - ‘jdbc:DATA_SOURCE_GIVEN_NAME/SCHEMA.TABLE_NAME’ - ‘jdbc:DATA_SOURCE_GIVEN_NAME/<query>’ - for query based datasources - ‘https://s3.amazonaws.com/datarobot_test/kickcars-sample-200.csv’ - etc.

classmethod get(dataset_id)¶

Get details for a Dataset from the server

Parameters

dataset_id: str: The id for the Dataset from which to get details

Returns

DatasetDetails

Return type: TypeVar(TDatasetDetails, bound= DatasetDetails)

to_dataset()¶

Build a Dataset object from the information in this object

Returns

Dataset

Return type: Dataset

class datarobot.models.dataset.ProjectLocation(url, id)¶

property id¶: Alias for field number 1

property url¶: Alias for field number 0

Data Engine Query Generator¶

class datarobot.DataEngineQueryGenerator(**generator_kwargs)¶

DataEngineQueryGenerator is used to set up time series data prep.

New in version v2.27.

Attributes

id: str: id of the query generator
query: str: text of the generated Spark SQL query
datasets: list(QueryGeneratorDataset): datasets associated with the query generator
generator_settings: QueryGeneratorSettings: the settings used to define the query
generator_type: str: “TimeSeries” is the only supported type

classmethod create(generator_type, datasets, generator_settings)¶

Creates a query generator entity.

New in version v2.27.

Parameters

generator_typestr: Type of data engine query generator
datasetsList[QueryGeneratorDataset]: Source datasets in the Data Engine workspace.
generator_settingsdict: Data engine generator settings of the given generator_type.

Returns

query_generatorDataEngineQueryGenerator: The created generator

Examples

import datarobot as dr
from datarobot.models.data_engine_query_generator import (
   QueryGeneratorDataset,
   QueryGeneratorSettings,
)
dataset = QueryGeneratorDataset(
   alias='My_Awesome_Dataset_csv',
   dataset_id='61093144cabd630828bca321',
   dataset_version_id=1,
)
settings = QueryGeneratorSettings(
   datetime_partition_column='date',
   time_unit='DAY',
   time_step=1,
   default_numeric_aggregation_method='sum',
   default_categorical_aggregation_method='mostFrequent',
)
g = dr.DataEngineQueryGenerator.create(
   generator_type='TimeSeries',
   datasets=[dataset],
   generator_settings=settings,
)
g.id
>>>'54e639a18bd88f08078ca831'
g.generator_type
>>>'TimeSeries'

classmethod get(generator_id)¶

Gets information about a query generator.

Parameters

generator_idstr: The identifier of the query generator you want to load.

Returns

query_generatorDataEngineQueryGenerator: The queried generator

Examples

import datarobot as dr
g = dr.DataEngineQueryGenerator.get(generator_id='54e639a18bd88f08078ca831')
g.id
>>>'54e639a18bd88f08078ca831'
g.generator_type
>>>'TimeSeries'

create_dataset(dataset_id=None, dataset_version_id=None, max_wait=600)¶

A blocking call that creates a new Dataset from the query generator. Returns when the dataset has been successfully processed. If optional parameters are not specified the query is applied to the dataset_id and dataset_version_id stored in the query generator. If specified they will override the stored dataset_id/dataset_version_id, i.e. to prep a prediction dataset.

Parameters

dataset_id: str, optional: The id of the unprepped dataset to apply the query to
dataset_version_id: str, optional: The version_id of the unprepped dataset to apply the query to

Returns

response: Dataset: The Dataset created from the query generator

prepare_prediction_dataset_from_catalog(project_id, dataset_id, dataset_version_id=None, max_wait=600, relax_known_in_advance_features_check=None)¶

Apply time series data prep to a catalog dataset and upload it to the project as a PredictionDataset.

New in version v3.1.

Parameters

project_idstr: The id of the project to which you upload the prediction dataset.
dataset_idstr: The identifier of the dataset.
dataset_version_idstr, optional: The version id of the dataset to use.
max_waitint, optional: Optional, the maximum number of seconds to wait before giving up.
relax_known_in_advance_features_checkbool, optional: For time series projects only. If True, missing values in the known in advance features are allowed in the forecast window at the prediction time. If omitted or False, missing values are not allowed.

Returns

datasetPredictionDataset: The newly uploaded dataset.

Return type: PredictionDataset

prepare_prediction_dataset(sourcedata, project_id, max_wait=600, relax_known_in_advance_features_check=None)¶

Apply time series data prep and upload the PredictionDataset to the project.

New in version v3.1.

Parameters

sourcedatastr, file or pandas.DataFrame: Data to be used for predictions. If it is a string, it can be either a path to a local file, or raw file content. If using a file on disk, the filename must consist of ASCII characters only.
project_idstr: The id of the project to which you upload the prediction dataset.
max_waitint, optional: The maximum number of seconds to wait for the uploaded dataset to be processed before raising an error.
relax_known_in_advance_features_checkbool, optional: For time series projects only. If True, missing values in the known in advance features are allowed in the forecast window at the prediction time. If omitted or False, missing values are not allowed.
Returns
——-
datasetPredictionDataset: The newly uploaded dataset.

Raises

InputNotUnderstoodError: Raised if sourcedata isn’t one of supported types.
AsyncFailureError: Raised if polling for the status of an async process resulted in a response with an unsupported status code.
AsyncProcessUnsuccessfulError: Raised if project creation was unsuccessful (i.e. the server reported an error in uploading the dataset).
AsyncTimeoutError: Raised if processing the uploaded dataset took more time than specified by the max_wait parameter.

Return type: PredictionDataset

Data Store¶

class datarobot.models.data_store.TestResponse() -> new empty dictionary dict(mapping) -> new dictionary initialized from a mapping object's (key, value) pairs dict(iterable) -> new dictionary initialized as if via: d = {} for k, v in iterable: d[k] = v dict(**kwargs) -> new dictionary initialized with the name=value pairs in the keyword argument list. For example: dict(one=1, two=2)¶

class datarobot.models.data_store.SchemasResponse() -> new empty dictionary dict(mapping) -> new dictionary initialized from a mapping object's (key, value) pairs dict(iterable) -> new dictionary initialized as if via: d = {} for k, v in iterable: d[k] = v dict(**kwargs) -> new dictionary initialized with the name=value pairs in the keyword argument list. For example: dict(one=1, two=2)¶

class datarobot.models.data_store.TablesResponse() -> new empty dictionary dict(mapping) -> new dictionary initialized from a mapping object's (key, value) pairs dict(iterable) -> new dictionary initialized as if via: d = {} for k, v in iterable: d[k] = v dict(**kwargs) -> new dictionary initialized with the name=value pairs in the keyword argument list. For example: dict(one=1, two=2)¶

Datetime Trend Plots¶

class datarobot.models.datetime_trend_plots.AccuracyOverTimePlotsMetadata(project_id, model_id, forecast_distance, resolutions, backtest_metadata, holdout_metadata, backtest_statuses, holdout_statuses)¶

Accuracy over Time metadata for datetime model.

New in version v2.25.

Notes

Backtest/holdout status is a dict containing the following:

training: string
Status backtest/holdout training. One of datarobot.enums.DATETIME_TREND_PLOTS_STATUS
validation: string
Status backtest/holdout validation. One of datarobot.enums.DATETIME_TREND_PLOTS_STATUS

Backtest/holdout metadata is a dict containing the following:

training: dict
Start and end dates for the backtest/holdout training.
validation: dict
Start and end dates for the backtest/holdout validation.

Each dict in the training and validation in backtest/holdout metadata is structured like:

start_date: datetime.datetime or None
The datetime of the start of the chart data (inclusive). None if chart data is not computed.
end_date: datetime.datetime or None
The datetime of the end of the chart data (exclusive). None if chart data is not computed.

Attributes

project_id: string: The project ID.
model_id: string: The model ID.
forecast_distance: int or None: The forecast distance for which the metadata was retrieved. None for OTV projects.
resolutions: list of string: A list of datarobot.enums.DATETIME_TREND_PLOTS_RESOLUTION, which represents available time resolutions for which plots can be retrieved.
backtest_metadata: list of dict: List of backtest metadata dicts. The list index of metadata dict is the backtest index. See backtest/holdout metadata info in Notes for more details.
holdout_metadata: dict: Holdout metadata dict. See backtest/holdout metadata info in Notes for more details.
backtest_statuses: list of dict: List of backtest statuses dict. The list index of status dict is the backtest index. See backtest/holdout status info in Notes for more details.
holdout_statuses: dict: Holdout status dict. See backtest/holdout status info in Notes for more details.

class datarobot.models.datetime_trend_plots.AccuracyOverTimePlot(project_id, model_id, start_date, end_date, resolution, bins, statistics, calendar_events)¶

Accuracy over Time plot for datetime model.

New in version v2.25.

Notes

Bin is a dict containing the following:

start_date: datetime.datetime
The datetime of the start of the bin (inclusive).
end_date: datetime.datetime
The datetime of the end of the bin (exclusive).
actual: float or None
Average actual value of the target in the bin. None if there are no entries in the bin.
predicted: float or None
Average prediction of the model in the bin. None if there are no entries in the bin.
frequency: int or None
Indicates number of values averaged in bin.

Statistics is a dict containing the following:

durbin_watson: float or None
The Durbin-Watson statistic for the chart data. Value is between 0 and 4. Durbin-Watson statistic is a test statistic used to detect the presence of autocorrelation at lag 1 in the residuals (prediction errors) from a regression analysis. More info https://wikipedia.org/wiki/Durbin%E2%80%93Watson_statistic

Calendar event is a dict containing the following:

name: string
Name of the calendar event.
date: datetime
Date of the calendar event.
series_id: string or None
The series ID for the event. If this event does not specify a series ID, then this will be None, indicating that the event applies to all series.

Attributes

project_id: string: The project ID.
model_id: string: The model ID.
resolution: string: The resolution that is used for binning. One of datarobot.enums.DATETIME_TREND_PLOTS_RESOLUTION
start_date: datetime.datetime: The datetime of the start of the chart data (inclusive).
end_date: datetime.datetime: The datetime of the end of the chart data (exclusive).
bins: list of dict: List of plot bins. See bin info in Notes for more details.
statistics: dict: Statistics for plot. See statistics info in Notes for more details.
calendar_events: list of dict: List of calendar events for the plot. See calendar events info in Notes for more details.

class datarobot.models.datetime_trend_plots.AccuracyOverTimePlotPreview(project_id, model_id, start_date, end_date, bins)¶

Accuracy over Time plot preview for datetime model.

New in version v2.25.

Notes

Bin is a dict containing the following:

start_date: datetime.datetime
The datetime of the start of the bin (inclusive).
end_date: datetime.datetime
The datetime of the end of the bin (exclusive).
actual: float or None
Average actual value of the target in the bin. None if there are no entries in the bin.
predicted: float or None
Average prediction of the model in the bin. None if there are no entries in the bin.

Attributes

project_id: string: The project ID.
model_id: string: The model ID.
start_date: datetime.datetime: The datetime of the start of the chart data (inclusive).
end_date: datetime.datetime: The datetime of the end of the chart data (exclusive).
bins: list of dict: List of plot bins. See bin info in Notes for more details.

class datarobot.models.datetime_trend_plots.ForecastVsActualPlotsMetadata(project_id, model_id, resolutions, backtest_metadata, holdout_metadata, backtest_statuses, holdout_statuses)¶

Forecast vs Actual plots metadata for datetime model.

New in version v2.25.

Notes

Backtest/holdout status is a dict containing the following:

training: dict
Dict containing each of datarobot.enums.DATETIME_TREND_PLOTS_STATUS as dict key, and list of forecast distances for particular status as dict value.
validation: dict
Dict containing each of datarobot.enums.DATETIME_TREND_PLOTS_STATUS as dict key, and list of forecast distances for particular status as dict value.

Backtest/holdout metadata is a dict containing the following:

training: dict
Start and end dates for the backtest/holdout training.
validation: dict
Start and end dates for the backtest/holdout validation.

Each dict in the training and validation in backtest/holdout metadata is structured like:

start_date: datetime.datetime or None
The datetime of the start of the chart data (inclusive). None if chart data is not computed.
end_date: datetime.datetime or None
The datetime of the end of the chart data (exclusive). None if chart data is not computed.

Attributes

project_id: string: The project ID.
model_id: string: The model ID.
resolutions: list of string: A list of datarobot.enums.DATETIME_TREND_PLOTS_RESOLUTION, which represents available time resolutions for which plots can be retrieved.
backtest_metadata: list of dict: List of backtest metadata dicts. The list index of metadata dict is the backtest index. See backtest/holdout metadata info in Notes for more details.
holdout_metadata: dict: Holdout metadata dict. See backtest/holdout metadata info in Notes for more details.
backtest_statuses: list of dict: List of backtest statuses dict. The list index of status dict is the backtest index. See backtest/holdout status info in Notes for more details.
holdout_statuses: dict: Holdout status dict. See backtest/holdout status info in Notes for more details.

class datarobot.models.datetime_trend_plots.ForecastVsActualPlot(project_id, model_id, forecast_distances, start_date, end_date, resolution, bins, calendar_events)¶

Forecast vs Actual plot for datetime model.

New in version v2.25.

Notes

Bin is a dict containing the following:

start_date: datetime.datetime
The datetime of the start of the bin (inclusive).
end_date: datetime.datetime
The datetime of the end of the bin (exclusive).
actual: float or None
Average actual value of the target in the bin. None if there are no entries in the bin.
forecasts: list of float
A list of average forecasts for the model for each forecast distance. Empty if there are no forecasts in the bin. Each index in the forecasts list maps to forecastDistances list index.
error: float or None
Average absolute residual value of the bin. None if there are no entries in the bin.
normalized_error: float or None
Normalized average absolute residual value of the bin. None if there are no entries in the bin.
frequency: int or None
Indicates number of values averaged in bin.

Calendar event is a dict containing the following:

name: string
Name of the calendar event.
date: datetime
Date of the calendar event.
series_id: string or None
The series ID for the event. If this event does not specify a series ID, then this will be None, indicating that the event applies to all series.

Attributes

project_id: string: The project ID.
model_id: string: The model ID.
forecast_distances: list of int: A list of forecast distances that were retrieved.
resolution: string: The resolution that is used for binning. One of datarobot.enums.DATETIME_TREND_PLOTS_RESOLUTION
start_date: datetime.datetime: The datetime of the start of the chart data (inclusive).
end_date: datetime.datetime: The datetime of the end of the chart data (exclusive).
bins: list of dict: List of plot bins. See bin info in Notes for more details.
calendar_events: list of dict: List of calendar events for the plot. See calendar events info in Notes for more details.

class datarobot.models.datetime_trend_plots.ForecastVsActualPlotPreview(project_id, model_id, start_date, end_date, bins)¶

Forecast vs Actual plot preview for datetime model.

New in version v2.25.

Notes

Bin is a dict containing the following:

start_date: datetime.datetime
The datetime of the start of the bin (inclusive).
end_date: datetime.datetime
The datetime of the end of the bin (exclusive).
actual: float or None
Average actual value of the target in the bin. None if there are no entries in the bin.
predicted: float or None
Average prediction of the model in the bin. None if there are no entries in the bin.

Attributes

project_id: string: The project ID.
model_id: string: The model ID.
start_date: datetime.datetime: The datetime of the start of the chart data (inclusive).
end_date: datetime.datetime: The datetime of the end of the chart data (exclusive).
bins: list of dict: List of plot bins. See bin info in Notes for more details.

class datarobot.models.datetime_trend_plots.AnomalyOverTimePlotsMetadata(project_id, model_id, resolutions, backtest_metadata, holdout_metadata, backtest_statuses, holdout_statuses)¶

Anomaly over Time metadata for datetime model.

New in version v2.25.

Notes

Backtest/holdout status is a dict containing the following:

training: string
Status backtest/holdout training. One of datarobot.enums.DATETIME_TREND_PLOTS_STATUS
validation: string
Status backtest/holdout validation. One of datarobot.enums.DATETIME_TREND_PLOTS_STATUS

Backtest/holdout metadata is a dict containing the following:

training: dict
Start and end dates for the backtest/holdout training.
validation: dict
Start and end dates for the backtest/holdout validation.

Each dict in the training and validation in backtest/holdout metadata is structured like:

start_date: datetime.datetime or None
The datetime of the start of the chart data (inclusive). None if chart data is not computed.
end_date: datetime.datetime or None
The datetime of the end of the chart data (exclusive). None if chart data is not computed.

Attributes

project_id: string: The project ID.
model_id: string: The model ID.
resolutions: list of string: A list of datarobot.enums.DATETIME_TREND_PLOTS_RESOLUTION, which represents available time resolutions for which plots can be retrieved.
backtest_metadata: list of dict: List of backtest metadata dicts. The list index of metadata dict is the backtest index. See backtest/holdout metadata info in Notes for more details.
holdout_metadata: dict: Holdout metadata dict. See backtest/holdout metadata info in Notes for more details.
backtest_statuses: list of dict: List of backtest statuses dict. The list index of status dict is the backtest index. See backtest/holdout status info in Notes for more details.
holdout_statuses: dict: Holdout status dict. See backtest/holdout status info in Notes for more details.

class datarobot.models.datetime_trend_plots.AnomalyOverTimePlot(project_id, model_id, start_date, end_date, resolution, bins, calendar_events)¶

Anomaly over Time plot for datetime model.

New in version v2.25.

Notes

Bin is a dict containing the following:

start_date: datetime.datetime
The datetime of the start of the bin (inclusive).
end_date: datetime.datetime
The datetime of the end of the bin (exclusive).
predicted: float or None
Average prediction of the model in the bin. None if there are no entries in the bin.
frequency: int or None
Indicates number of values averaged in bin.

Calendar event is a dict containing the following:

name: string
Name of the calendar event.
date: datetime
Date of the calendar event.
series_id: string or None
The series ID for the event. If this event does not specify a series ID, then this will be None, indicating that the event applies to all series.

Attributes

project_id: string: The project ID.
model_id: string: The model ID.
resolution: string: The resolution that is used for binning. One of datarobot.enums.DATETIME_TREND_PLOTS_RESOLUTION
start_date: datetime.datetime: The datetime of the start of the chart data (inclusive).
end_date: datetime.datetime: The datetime of the end of the chart data (exclusive).
bins: list of dict: List of plot bins. See bin info in Notes for more details.
calendar_events: list of dict: List of calendar events for the plot. See calendar events info in Notes for more details.

class datarobot.models.datetime_trend_plots.AnomalyOverTimePlotPreview(project_id, model_id, prediction_threshold, start_date, end_date, bins)¶

Anomaly over Time plot preview for datetime model.

New in version v2.25.

Notes

Bin is a dict containing the following:

start_date: datetime.datetime
The datetime of the start of the bin (inclusive).
end_date: datetime.datetime
The datetime of the end of the bin (exclusive).

Attributes

project_id: string: The project ID.
model_id: string: The model ID.
prediction_threshold: float: Only bins with predictions exceeding this threshold are returned in the response.
start_date: datetime.datetime: The datetime of the start of the chart data (inclusive).
end_date: datetime.datetime: The datetime of the end of the chart data (exclusive).
bins: list of dict: List of plot bins. See bin info in Notes for more details.

Deployment¶

class datarobot.models.Deployment(id, label=None, description=None, status=None, default_prediction_server=None, model=None, capabilities=None, prediction_usage=None, permissions=None, service_health=None, model_health=None, accuracy_health=None, importance=None, fairness_health=None, governance=None, owners=None, prediction_environment=None)¶

A deployment created from a DataRobot model.

Attributes

idstr

the id of the deployment

labelstr

the label of the deployment

descriptionstr

the description of the deployment

statusstr

(New in version v2.29) deployment status

default_prediction_serverdict

Information about the default prediction server for the deployment. Accepts the following values:

id: str. Prediction server ID.
url: str, optional. Prediction server URL.
datarobot-key: str. Corresponds the to the PredictionServer’s “snake_cased” datarobot_key parameter that allows you to verify and access the prediction server.

importancestr, optional

deployment importance

modeldict

information on the model of the deployment

capabilitiesdict

information on the capabilities of the deployment

prediction_usagedict

information on the prediction usage of the deployment

permissionslist

(New in version v2.18) user’s permissions on the deployment

service_healthdict

information on the service health of the deployment

model_healthdict

information on the model health of the deployment

accuracy_healthdict

information on the accuracy health of the deployment

fairness_healthdict

information on the fairness health of a deployment

governancedict

information on approval and change requests of a deployment

ownersdict

information on the owners of a deployment

prediction_environmentdict

information on the prediction environment of a deployment

classmethod create_from_learning_model(model_id, label, description=None, default_prediction_server_id=None, importance=None, prediction_threshold=None, status=None, max_wait=600)¶

Create a deployment from a DataRobot model.

New in version v2.17.

Parameters

model_idstr: id of the DataRobot model to deploy
labelstr: a human-readable label of the deployment
descriptionstr, optional: a human-readable description of the deployment
default_prediction_server_idstr, optional: an identifier of a prediction server to be used as the default prediction server
importancestr, optional: deployment importance
prediction_thresholdfloat, optional: threshold used for binary classification in predictions
statusstr, optional: deployment status
max_wait: int, optional: Seconds to wait for successful resolution of a deployment creation job. Deployment supports making predictions only after a deployment creating job has successfully finished.

Returns

deploymentDeployment: The created deployment

Examples

from datarobot import Project, Deployment
project = Project.get('5506fcd38bd88f5953219da0')
model = project.get_models()[0]
deployment = Deployment.create_from_learning_model(model.id, 'New Deployment')
deployment
>>> Deployment('New Deployment')

Return type: TypeVar(TDeployment, bound= Deployment)

classmethod create_from_leaderboard(model_id, label, description=None, default_prediction_server_id=None, importance=None, prediction_threshold=None, status=None, max_wait=600)¶

Create a deployment from a Leaderboard.

New in version v2.17.

Parameters

model_idstr: id of the Leaderboard to deploy
labelstr: a human-readable label of the deployment
descriptionstr, optional: a human-readable description of the deployment
default_prediction_server_idstr, optional: an identifier of a prediction server to be used as the default prediction server
importancestr, optional: deployment importance
prediction_thresholdfloat, optional: threshold used for binary classification in predictions
statusstr, optional: deployment status
max_waitint, optional: The amount of seconds to wait for successful resolution of a deployment creation job. Deployment supports making predictions only after a deployment creating job has successfully finished.

Returns

deploymentDeployment: The created deployment

Examples

from datarobot import Project, Deployment
project = Project.get('5506fcd38bd88f5953219da0')
model = project.get_models()[0]
deployment = Deployment.create_from_leaderboard(model.id, 'New Deployment')
deployment
>>> Deployment('New Deployment')

Return type: TypeVar(TDeployment, bound= Deployment)

classmethod create_from_custom_model_version(custom_model_version_id, label, description=None, default_prediction_server_id=None, max_wait=600, importance=None)¶

Create a deployment from a DataRobot custom model image.

Parameters

custom_model_version_idstr: The ID of the DataRobot custom model version to deploy. The version must have a base_environment_id.
labelstr: A label of the deployment.
descriptionstr, optional: A description of the deployment.
default_prediction_server_idstr: An identifier of a prediction server to be used as the default prediction server. Required for SaaS users and optional for Self-Managed users.
max_waitint, optional: Seconds to wait for successful resolution of a deployment creation job. Deployment supports making predictions only after a deployment creating job has successfully finished.
importancestr, optional: Deployment importance level.

Returns

deploymentDeployment: The created deployment

Return type: TypeVar(TDeployment, bound= Deployment)

classmethod create_from_registered_model_version(model_package_id, label, description=None, default_prediction_server_id=None, prediction_environment_id=None, importance=None, user_provided_id=None, additional_metadata=None, max_wait=600)¶

Create a deployment from a DataRobot model package.

Parameters

model_package_idstr: The ID of the DataRobot model package to deploy.
labelstr: A human readable label of the deployment.
descriptionstr, optional: A human readable description of the deployment.
default_prediction_server_idstr, optional: an identifier of a prediction server to be used as the default prediction server When working with prediction environments, default prediction server Id should not be provided
prediction_environment_idstr, optional: An identifier of a prediction environment to be used for model deployment.
importancestr, optional: Deployment importance level.
user_provided_idstr, optional: A user-provided unique ID associated with a deployment definition in a remote git repository.
additional_metadatadict, optional: ‘Key/Value pair dict, with additional metadata’
max_waitint, optional: The amount of seconds to wait for successful resolution of a deployment creation job. Deployment supports making predictions only after a deployment creating job has successfully finished.

Returns

deploymentDeployment: The created deployment

Return type: TypeVar(TDeployment, bound= Deployment)

classmethod list(order_by=None, search=None, filters=None)¶

List all deployments a user can view.

New in version v2.17.

Parameters

order_bystr, optional

(New in version v2.18) the order to sort the deployment list by, defaults to label

Allowed attributes to sort by are:

label
serviceHealth
modelHealth
accuracyHealth
recentPredictions
lastPredictionTimestamp

If the sort attribute is preceded by a hyphen, deployments will be sorted in descending order, otherwise in ascending order.

For health related sorting, ascending means failing, warning, passing, unknown.

searchstr, optional

(New in version v2.18) case insensitive search against deployment’s label and description.

filtersdatarobot.models.deployment.DeploymentListFilters, optional

(New in version v2.20) an object containing all filters that you’d like to apply to the resulting list of deployments. See DeploymentListFilters for details on usage.

Returns

deploymentslist: a list of deployments the user can view

Examples

from datarobot import Deployment
deployments = Deployment.list()
deployments
>>> [Deployment('New Deployment'), Deployment('Previous Deployment')]

from datarobot import Deployment
from datarobot.enums import DEPLOYMENT_SERVICE_HEALTH_STATUS
filters = DeploymentListFilters(
    role='OWNER',
    service_health=[DEPLOYMENT_SERVICE_HEALTH.FAILING]
)
filtered_deployments = Deployment.list(filters=filters)
filtered_deployments
>>> [Deployment('Deployment I Own w/ Failing Service Health')]

Return type: List[TypeVar(TDeployment, bound= Deployment)]

classmethod get(deployment_id)¶

Get information about a deployment.

New in version v2.17.

Parameters

deployment_idstr: the id of the deployment

Returns

deploymentDeployment: the queried deployment

Examples

from datarobot import Deployment
deployment = Deployment.get(deployment_id='5c939e08962d741e34f609f0')
deployment.id
>>>'5c939e08962d741e34f609f0'
deployment.label
>>>'New Deployment'

Return type: TypeVar(TDeployment, bound= Deployment)

predict_batch(source, passthrough_columns=None, download_timeout=None, download_read_timeout=None, upload_read_timeout=None)¶

A convenience method for making predictions with csv file or pandas DataFrame using a batch prediction job.

For advanced usage, use datarobot.models.BatchPredictionJob directly.

New in version v3.0.

Parameters

source: str, pd.DataFrame or file object: Pass a filepath, file, or DataFrame for making batch predictions.
passthrough_columnslist[string] (optional): Keep these columns from the scoring dataset in the scored dataset. This is useful for correlating predictions with source data.
download_timeout: int, optional: Wait this many seconds for the download to become available. See datarobot.models.BatchPredictionJob.score().
download_read_timeout: int, optional: Wait this many seconds for the server to respond between chunks. See datarobot.models.BatchPredictionJob.score().
upload_read_timeout: int, optional: Wait this many seconds for the server to respond after a whole dataset upload. See datarobot.models.BatchPredictionJob.score().

Returns

pd.DataFrame: Prediction results in a pandas DataFrame.

Raises

InvalidUsageError: If the source parameter cannot be determined to be a filepath, file, or DataFrame.

Examples

from datarobot.models.deployment import Deployment

deployment = Deployment.get("<MY_DEPLOYMENT_ID>")
prediction_results_as_dataframe = deployment.predict_batch(
    source="./my_local_file.csv",
)

Return type: DataFrame

get_uri()¶

Returns

urlstr: Deployment’s overview URI

Return type: str

update(label=None, description=None, importance=None)¶

Update the label and description of this deployment.

New in version v2.19.

Return type: None

delete()¶

Delete this deployment.

New in version v2.17.

Return type: None

activate(max_wait=600)¶

Activates this deployment. When succeeded, deployment status become active.

New in version v2.29.

Parameters

max_waitint, optional: The maximum time to wait for deployment activation to complete before erroring

Return type: None

deactivate(max_wait=600)¶

Deactivates this deployment. When succeeded, deployment status become inactive.

New in version v2.29.

Parameters

max_waitint, optional: The maximum time to wait for deployment deactivation to complete before erroring

Return type: None

replace_model(new_model_id, reason, max_wait=600)¶

Replace the model used in this deployment. To confirm model replacement eligibility, use: validate_replacement_model() beforehand.

New in version v2.17.

Model replacement is an asynchronous process, which means some preparatory work may be performed after the initial request is completed. This function will not return until all preparatory work is fully finished.

Predictions made against this deployment will start using the new model as soon as the request is completed. There will be no interruption for predictions throughout the process.

Parameters

new_model_idstr: The id of the new model to use. If replacing the deployment’s model with a CustomInferenceModel, a specific CustomModelVersion ID must be used.
reasonMODEL_REPLACEMENT_REASON: The reason for the model replacement. Must be one of ‘ACCURACY’, ‘DATA_DRIFT’, ‘ERRORS’, ‘SCHEDULED_REFRESH’, ‘SCORING_SPEED’, or ‘OTHER’. This value will be stored in the model history to keep track of why a model was replaced
max_waitint, optional: (new in version 2.22) The maximum time to wait for model replacement job to complete before erroring

Examples

from datarobot import Deployment
from datarobot.enums import MODEL_REPLACEMENT_REASON
deployment = Deployment.get(deployment_id='5c939e08962d741e34f609f0')
deployment.model['id'], deployment.model['type']
>>>('5c0a979859b00004ba52e431', 'Decision Tree Classifier (Gini)')

deployment.replace_model('5c0a969859b00004ba52e41b', MODEL_REPLACEMENT_REASON.ACCURACY)
deployment.model['id'], deployment.model['type']
>>>('5c0a969859b00004ba52e41b', 'Support Vector Classifier (Linear Kernel)')

Return type: None

validate_replacement_model(new_model_id)¶

Validate a model can be used as the replacement model of the deployment.

New in version v2.17.

Parameters

new_model_idstr: the id of the new model to validate

Returns

statusstr: status of the validation, will be one of ‘passing’, ‘warning’ or ‘failing’. If the status is passing or warning, use replace_model() to perform a model replacement. If the status is failing, refer to checks for more detail on why the new model cannot be used as a replacement.
messagestr: message for the validation result
checksdict: explain why the new model can or cannot replace the deployment’s current model

Return type: Tuple[str, str, Dict[str, Any]]

get_features()¶

Retrieve the list of features needed to make predictions on this deployment.

Returns

features: list: a list of feature dict

Notes

Each feature dict contains the following structure:

name : str, feature name
feature_type : str, feature type
importance : float, numeric measure of the relationship strength between the feature and target (independent of model or other features)
date_format : str or None, the date format string for how this feature was interpreted, null if not a date feature, compatible with https://docs.python.org/2/library/time.html#time.strftime.
known_in_advance : bool, whether the feature was selected as known in advance in a time series model, false for non-time series models.

Examples

from datarobot import Deployment
deployment = Deployment.get(deployment_id='5c939e08962d741e34f609f0')
features = deployment.get_features()
features[0]['feature_type']
>>>'Categorical'
features[0]['importance']
>>>0.133

Return type: List[FeatureDict]

submit_actuals(data, batch_size=10000)¶

Submit actuals for processing. The actuals submitted will be used to calculate accuracy metrics.

Parameters

data: list or pandas.DataFrame
batch_size: the max number of actuals in each request
If `data` is a list, each item should be a dict-like object with the following keys and
values; if `data` is a pandas.DataFrame, it should contain the following columns:
- association_id: str, a unique identifier used with a prediction,: max length 128 characters
- actual_value: str or int or float, the actual value of a prediction;: should be numeric for deployments with regression models or string for deployments with classification model
- was_acted_on: bool, optional, indicates if the prediction was acted on in a way that: could have affected the actual outcome
- timestamp: datetime or string in RFC3339 format, optional. If the datetime provided: does not have a timezone, we assume it is UTC.

Raises

ValueError: if input data is not a list of dict-like objects or a pandas.DataFrame if input data is empty

Examples

from datarobot import Deployment, AccuracyOverTime
deployment = Deployment.get(deployment_id='5c939e08962d741e34f609f0')
data = [{
    'association_id': '439917',
    'actual_value': 'True',
    'was_acted_on': True
}]
deployment.submit_actuals(data)

Return type: None

submit_actuals_from_catalog_async(dataset_id, actual_value_column, association_id_column, dataset_version_id=None, timestamp_column=None, was_acted_on_column=None)¶

Submit actuals from AI Catalog for processing. The actuals submitted will be used to calculate accuracy metrics.

Parameters

dataset_id: str,: The ID of the source dataset.
dataset_version_id: str, optional: The ID of the dataset version to apply the query to. If not specified, the latest version associated with dataset_id is used.
association_id_column: str,: The name of the column that contains a unique identifier used with a prediction.
actual_value_column: str,: The name of the column that contains the actual value of a prediction.
was_acted_on_column: str, optional,: The name of the column that indicates if the prediction was acted on in a way that could have affected the actual outcome.
timestamp_column: str, optional,: The name of the column that contains datetime or string in RFC3339 format.

Returns

status_check_jobStatusCheckJob: Object contains all needed logic for a periodical status check of an async job.

Raises

ValueError: if dataset_id not provided if actual_value_column not provided if association_id_column not provided

Examples

from datarobot import Deployment
deployment = Deployment.get(deployment_id='5c939e08962d741e34f609f0')
status_check_job = deployment.submit_actuals_from_catalog_async(data)

Return type: StatusCheckJob

get_predictions_by_forecast_date_settings()¶

Retrieve predictions by forecast date settings of this deployment.

New in version v2.27.

Returns

settingsdict

Predictions by forecast date settings of the deployment is a dict with the following format:

enabledbool: Is ‘’True’’ if predictions by forecast date is enabled for this deployment. To update this setting, see update_predictions_by_forecast_date_settings()
column_namestring: The column name in prediction datasets to be used as forecast date.
datetime_formatstring: The datetime format of the forecast date column in prediction datasets.

Return type: ForecastDateSettings

update_predictions_by_forecast_date_settings(enable_predictions_by_forecast_date, forecast_date_column_name=None, forecast_date_format=None, max_wait=600)¶

Update predictions by forecast date settings of this deployment.

New in version v2.27.

Updating predictions by forecast date setting is an asynchronous process, which means some preparatory work may be performed after the initial request is completed. This function will not return until all preparatory work is fully finished.

Parameters

enable_predictions_by_forecast_datebool: set to ‘’True’’ if predictions by forecast date is to be turned on or set to ‘’False’’ if predictions by forecast date is to be turned off.
forecast_date_column_name: string, optional: The column name in prediction datasets to be used as forecast date. If ‘’enable_predictions_by_forecast_date’’ is set to ‘’False’’, then the parameter will be ignored.
forecast_date_format: string, optional: The datetime format of the forecast date column in prediction datasets. If ‘’enable_predictions_by_forecast_date’’ is set to ‘’False’’, then the parameter will be ignored.
max_waitint, optional: seconds to wait for successful

Examples

# To set predictions by forecast date settings to the same default settings you see when using
# the DataRobot web application, you use your 'Deployment' object like this:
deployment.update_predictions_by_forecast_date_settings(
   enable_predictions_by_forecast_date=True,
   forecast_date_column_name="date (actual)",
   forecast_date_format="%Y-%m-%d",
)

Return type: None

get_challenger_models_settings()¶

Retrieve challenger models settings of this deployment.

New in version v2.27.

Returns

settingsdict

Challenger models settings of the deployment is a dict with the following format:

enabledbool: Is ‘’True’’ if challenger models is enabled for this deployment. To update existing ‘’challenger_models’’ settings, see update_challenger_models_settings()

Return type: ChallengerModelsSettings

update_challenger_models_settings(challenger_models_enabled, max_wait=600)¶

Update challenger models settings of this deployment.

New in version v2.27.

Updating challenger models setting is an asynchronous process, which means some preparatory work may be performed after the initial request is completed. This function will not return until all preparatory work is fully finished.

Parameters

challenger_models_enabledbool: set to ‘’True’’ if challenger models is to be turned on or set to ‘’False’’ if challenger models is to be turned off
max_waitint, optional: seconds to wait for successful resolution

Return type: None

get_segment_analysis_settings()¶

Retrieve segment analysis settings of this deployment.

New in version v2.27.

Returns

settingsdict

Segment analysis settings of the deployment containing two items with keys enabled and attributes, which are further described below.

enabledbool: Set to ‘’True’’ if segment analysis is enabled for this deployment. To update existing setting, see update_segment_analysis_settings()
attributeslist: To create or update existing segment analysis attributes, see update_segment_analysis_settings()

Return type: SegmentAnalysisSettings

update_segment_analysis_settings(segment_analysis_enabled, segment_analysis_attributes=None, max_wait=600)¶

Update segment analysis settings of this deployment.

New in version v2.27.

Updating segment analysis setting is an asynchronous process, which means some preparatory work may be performed after the initial request is completed. This function will not return until all preparatory work is fully finished.

Parameters

segment_analysis_enabledbool: set to ‘’True’’ if segment analysis is to be turned on or set to ‘’False’’ if segment analysis is to be turned off
segment_analysis_attributes: list, optional: A list of strings that gives the segment attributes selected for tracking.
max_waitint, optional: seconds to wait for successful resolution

Return type: None

get_bias_and_fairness_settings()¶

Retrieve bias and fairness settings of this deployment.

..versionadded:: v3.2.0

Returns

settingsdict in the following format:

protected_featuresList[str]: A list of features to mark as protected.
preferable_target_valuebool: A target value that should be treated as a positive outcome for the prediction.
fairness_metric_setstr: Can be one of <datarobot.enums.FairnessMetricsSet>. A set of fairness metrics to use for calculating fairness.
fairness_thresholdfloat: Threshold value of the fairness metric. Cannot be less than 0 or greater than 1.

Return type: Optional[BiasAndFairnessSettings]

update_bias_and_fairness_settings(protected_features, fairness_metric_set, fairness_threshold, preferable_target_value, max_wait=600)¶

Update bias and fairness settings of this deployment.

..versionadded:: v3.2.0

Updating bias and fairness setting is an asynchronous process, which means some preparatory work may be performed after the initial request is completed. This function will not return until all preparatory work is fully finished.

Parameters

protected_featuresList[str]: A list of features to mark as protected.
preferable_target_valuebool: A target value that should be treated as a positive outcome for the prediction.
fairness_metric_setstr: Can be one of <datarobot.enums.FairnessMetricsSet>. The fairness metric used to calculate the fairness scores.
fairness_thresholdfloat: Threshold value of the fairness metric. Cannot be less than 0 or greater than 1.
max_waitint, optional: seconds to wait for successful resolution

Return type: None

get_drift_tracking_settings()¶

Retrieve drift tracking settings of this deployment.

New in version v2.17.

Returns

settingsdict

Drift tracking settings of the deployment containing two nested dicts with key target_drift and feature_drift, which are further described below.

Target drift setting contains:

enabledbool: If target drift tracking is enabled for this deployment. To create or update existing ‘’target_drift’’ settings, see update_drift_tracking_settings()

Feature drift setting contains:

enabledbool: If feature drift tracking is enabled for this deployment. To create or update existing ‘’feature_drift’’ settings, see update_drift_tracking_settings()

Return type: DriftTrackingSettings

update_drift_tracking_settings(target_drift_enabled=None, feature_drift_enabled=None, max_wait=600)¶

Update drift tracking settings of this deployment.

New in version v2.17.

Updating drift tracking setting is an asynchronous process, which means some preparatory work may be performed after the initial request is completed. This function will not return until all preparatory work is fully finished.

Parameters

target_drift_enabledbool, optional: if target drift tracking is to be turned on
feature_drift_enabledbool, optional: if feature drift tracking is to be turned on
max_waitint, optional: seconds to wait for successful resolution

Return type: None

get_association_id_settings()¶

Retrieve association ID setting for this deployment.

New in version v2.19.

Returns

association_id_settingsdict in the following format:

column_nameslist[string], optional: name of the columns to be used as association ID,
required_in_prediction_requestsbool, optional: whether the association ID column is required in prediction requests

Return type: str

update_association_id_settings(column_names=None, required_in_prediction_requests=None, max_wait=600)¶

Update association ID setting for this deployment.

New in version v2.19.

Parameters

column_nameslist[string], optional: name of the columns to be used as association ID, currently only support a list of one string
required_in_prediction_requestsbool, optional: whether the association ID column is required in prediction requests
max_waitint, optional: seconds to wait for successful resolution

Return type: None

get_predictions_data_collection_settings()¶

Retrieve predictions data collection settings of this deployment.

New in version v2.21.

Returns

predictions_data_collection_settingsdict in the following format:

enabledbool: If predictions data collection is enabled for this deployment. To update existing ‘’predictions_data_collection’’ settings, see update_predictions_data_collection_settings()

Return type: Dict[str, bool]

update_predictions_data_collection_settings(enabled, max_wait=600)¶

Update predictions data collection settings of this deployment.

New in version v2.21.

Updating predictions data collection setting is an asynchronous process, which means some preparatory work may be performed after the initial request is completed. This function will not return until all preparatory work is fully finished.

Parameters

enabled: bool: if predictions data collection is to be turned on
max_waitint, optional: seconds to wait for successful resolution

Return type: None

get_prediction_warning_settings()¶

Retrieve prediction warning settings of this deployment.

New in version v2.19.

Returns

settingsdict in the following format:

enabledbool

If target prediction_warning is enabled for this deployment. To create or update existing ‘’prediction_warning’’ settings, see update_prediction_warning_settings()

custom_boundariesdict or None

If None default boundaries for a model are used. Otherwise has following keys:

upperfloat: All predictions greater than provided value are considered anomalous
lowerfloat: All predictions less than provided value are considered anomalous

Return type: PredictionWarningSettings

update_prediction_warning_settings(prediction_warning_enabled, use_default_boundaries=None, lower_boundary=None, upper_boundary=None, max_wait=600)¶

Update prediction warning settings of this deployment.

New in version v2.19.

Parameters

prediction_warning_enabledbool: If prediction warnings should be turned on.
use_default_boundariesbool, optional: If default boundaries of the model should be used for the deployment.
upper_boundaryfloat, optional: All predictions greater than provided value will be considered anomalous
lower_boundaryfloat, optional: All predictions less than provided value will be considered anomalous
max_waitint, optional: seconds to wait for successful resolution

Return type: None

get_prediction_intervals_settings()¶

Retrieve prediction intervals settings for this deployment.

New in version v2.19.

Returns

dict in the following format:

enabledbool: Whether prediction intervals are enabled for this deployment
percentileslist[int]: List of enabled prediction intervals’ sizes for this deployment. Currently we only support one percentile at a time.

Notes

Note that prediction intervals are only supported for time series deployments.

Return type: PredictionIntervalsSettings

update_prediction_intervals_settings(percentiles, enabled=True, max_wait=600)¶

Update prediction intervals settings for this deployment.

New in version v2.19.

Parameters

percentileslist[int]: The prediction intervals percentiles to enable for this deployment. Currently we only support setting one percentile at a time.
enabledbool, optional (defaults to True): Whether to enable showing prediction intervals in the results of predictions requested using this deployment.
max_waitint, optional: seconds to wait for successful resolution

Raises

AssertionError: If percentiles is in an invalid format
AsyncFailureError: If any of the responses from the server are unexpected
AsyncProcessUnsuccessfulError: If the prediction intervals calculation job has failed or has been cancelled.
AsyncTimeoutError: If the prediction intervals calculation job did not resolve in time

Notes

Updating prediction intervals settings is an asynchronous process, which means some preparatory work may be performed before the settings request is completed. This function will not return until all work is fully finished.

Note that prediction intervals are only supported for time series deployments.

Return type: None

get_service_stats(model_id=None, start_time=None, end_time=None, execution_time_quantile=None, response_time_quantile=None, slow_requests_threshold=None)¶

Retrieves values of many service stat metrics aggregated over a time period.

New in version v2.18.

Parameters

model_idstr, optional: the id of the model
start_timedatetime, optional: start of the time period
end_timedatetime, optional: end of the time period
execution_time_quantilefloat, optional: quantile for executionTime, defaults to 0.5
response_time_quantilefloat, optional: quantile for responseTime, defaults to 0.5
slow_requests_thresholdfloat, optional: threshold for slowRequests, defaults to 1000

Returns

service_statsServiceStats: the queried service stats metrics information

Return type: ServiceStats

get_service_stats_over_time(metric=None, model_id=None, start_time=None, end_time=None, bucket_size=None, quantile=None, threshold=None)¶

Retrieves values of a single service stat metric over a time period.

New in version v2.18.

Parameters

metricSERVICE_STAT_METRIC, optional: the service stat metric to retrieve
model_idstr, optional: the id of the model
start_timedatetime, optional: start of the time period
end_timedatetime, optional: end of the time period
bucket_sizestr, optional: time duration of a bucket, in ISO 8601 time duration format
quantilefloat, optional: quantile for ‘executionTime’ or ‘responseTime’, ignored when querying other metrics
thresholdint, optional: threshold for ‘slowQueries’, ignored when querying other metrics

Returns

service_stats_over_timeServiceStatsOverTime: the queried service stats metric over time information

Return type: ServiceStatsOverTime

get_target_drift(model_id=None, start_time=None, end_time=None, metric=None)¶

Retrieve target drift information over a certain time period.

New in version v2.21.

Parameters

model_idstr: the id of the model
start_timedatetime: start of the time period
end_timedatetime: end of the time period
metricstr: (New in version v2.22) metric used to calculate the drift score

Returns

target_driftTargetDrift: the queried target drift information

Return type: TargetDrift

get_feature_drift(model_id=None, start_time=None, end_time=None, metric=None)¶

Retrieve drift information for deployment’s features over a certain time period.

New in version v2.21.

Parameters

model_idstr: the id of the model
start_timedatetime: start of the time period
end_timedatetime: end of the time period
metricstr: (New in version v2.22) The metric used to calculate the drift score. Allowed values include psi, kl_divergence, dissimilarity, hellinger, and js_divergence.

Returns

feature_drift_data[FeatureDrift]: the queried feature drift information

Return type: List[FeatureDrift]

get_predictions_over_time(model_ids=None, start_time=None, end_time=None, bucket_size=None, target_classes=None, include_percentiles=False)¶

Retrieve stats of deployment’s prediction response over a certain time period.

New in version v3.2.

Parameters

model_idslist[str]: ID of models to retrieve prediction stats
start_timedatetime: start of the time period
end_timedatetime: end of the time period
bucket_sizeBUCKET_SIZE: time duration of each bucket
target_classeslist[str]: class names of target, only for deployments with multiclass target
include_percentilesbool: if the returned data includes percentiles, only for a deployment with a binary and regression target

Returns

predictions_over_timePredictionsOverTime: the queried predictions over time information

Examples

from datarobot import Deployment
deployment = Deployment.get(deployment_id='5c939e08962d741e34f609f0')
predictions_over_time = deployment.get_predictions_over_time()
predictions_over_time.buckets[0]['mean_predicted_value']
>>>0.3772
predictions_over_time.buckets[0]['row_count']
>>>2000

Return type: PredictionsOverTime

get_accuracy(model_id=None, start_time=None, end_time=None, start=None, end=None, target_classes=None)¶

Retrieves values of many accuracy metrics aggregated over a time period.

New in version v2.18.

Parameters

model_idstr: the id of the model
start_timedatetime: start of the time period
end_timedatetime: end of the time period
target_classeslist[str], optional: Optional list of target class strings

Returns

accuracyAccuracy: the queried accuracy metrics information

Return type: Accuracy

get_accuracy_over_time(metric=None, model_id=None, start_time=None, end_time=None, bucket_size=None, target_classes=None)¶

Retrieves values of a single accuracy metric over a time period.

New in version v2.18.

Parameters

metricACCURACY_METRIC: the accuracy metric to retrieve
model_idstr: the id of the model
start_timedatetime: start of the time period
end_timedatetime: end of the time period
bucket_sizestr: time duration of a bucket, in ISO 8601 time duration format
target_classeslist[str], optional: Optional list of target class strings

Returns

accuracy_over_timeAccuracyOverTime: the queried accuracy metric over time information

Return type: AccuracyOverTime

get_predictions_vs_actuals_over_time(model_ids=None, start_time=None, end_time=None, bucket_size=None, target_classes=None)¶

Retrieve information for deployment’s predictions vs actuals over a certain time period.

New in version v3.3.

Parameters

model_idslist[str]: The ID of models to retrieve predictions vs actuals stats for.
start_timedatetime: Start of the time period.
end_timedatetime: End of the time period.
bucket_sizeBUCKET_SIZE: Time duration of each bucket.
target_classeslist[str]: Class names of target, only for deployments with a multiclass target.

Returns

predictions_vs_actuals_over_timePredictionsVsActualsOverTime: The queried predictions vs actuals over time information.

Examples

from datarobot import Deployment
deployment = Deployment.get(deployment_id='5c939e08962d741e34f609f0')
predictions_over_time = deployment.get_predictions_vs_actuals_over_time()
predictions_over_time.buckets[0]['mean_actual_value']
>>>0.6673
predictions_over_time.buckets[0]['row_count_with_actual']
>>>500

Return type: PredictionsVsActualsOverTime

get_fairness_scores_over_time(start_time=None, end_time=None, bucket_size=None, model_id=None, protected_feature=None, fairness_metric=None)¶

Retrieves values of a single fairness score over a time period.

New in version v3.2.

Parameters

model_idstr: the id of the model
start_timedatetime: start of the time period
end_timedatetime: end of the time period
bucket_sizestr: time duration of a bucket, in ISO 8601 time duration format
protected_featurestr: name of protected feature
fairness_metricstr: A consolidation of the fairness metrics by the use case.

Returns

fairness_scores_over_timeFairnessScoresOverTime: the queried fairness score over time information

Return type: FairnessScoresOverTime

update_secondary_dataset_config(secondary_dataset_config_id, credential_ids=None)¶

Update the secondary dataset config used by Feature discovery model for a given deployment.

New in version v2.23.

Parameters

secondary_dataset_config_id: str: Id of the secondary dataset config
credential_ids: list or None: List of DatasetsCredentials used by the secondary datasets

Examples

from datarobot import Deployment
deployment = Deployment(deployment_id='5c939e08962d741e34f609f0')
config = deployment.update_secondary_dataset_config('5df109112ca582033ff44084')
config
>>> '5df109112ca582033ff44084'

Return type: str

get_secondary_dataset_config()¶

Get the secondary dataset config used by Feature discovery model for a given deployment.

New in version v2.23.

Returns

secondary_dataset_configSecondaryDatasetConfigurations: Id of the secondary dataset config

Examples

from datarobot import Deployment
deployment = Deployment(deployment_id='5c939e08962d741e34f609f0')
deployment.update_secondary_dataset_config('5df109112ca582033ff44084')
config = deployment.get_secondary_dataset_config()
config
>>> '5df109112ca582033ff44084'

Return type: str

get_prediction_results(model_id=None, start_time=None, end_time=None, actuals_present=None, offset=None, limit=None)¶

Retrieve a list of prediction results of the deployment.

New in version v2.24.

Parameters

model_idstr: the id of the model
start_timedatetime: start of the time period
end_timedatetime: end of the time period
actuals_presentbool: filters predictions results to only those who have actuals present or with missing actuals
offsetint: this many results will be skipped
limitint: at most this many results are returned

Returns

prediction_results: list[dict]: a list of prediction results

Examples

from datarobot import Deployment
deployment = Deployment.get(deployment_id='5c939e08962d741e34f609f0')
results = deployment.get_prediction_results()

Return type: List[Dict[str, Any]]

download_prediction_results(filepath, model_id=None, start_time=None, end_time=None, actuals_present=None, offset=None, limit=None)¶

Download prediction results of the deployment as a CSV file.

New in version v2.24.

Parameters

filepathstr: path of the csv file
model_idstr: the id of the model
start_timedatetime: start of the time period
end_timedatetime: end of the time period
actuals_presentbool: filters predictions results to only those who have actuals present or with missing actuals
offsetint: this many results will be skipped
limitint: at most this many results are returned

Examples

from datarobot import Deployment
deployment = Deployment.get(deployment_id='5c939e08962d741e34f609f0')
results = deployment.download_prediction_results('path_to_prediction_results.csv')

Return type: None

download_scoring_code(filepath, source_code=False, include_agent=False, include_prediction_explanations=False, include_prediction_intervals=False)¶

Retrieve scoring code of the current deployed model.

New in version v2.24.

Parameters

filepathstr: path of the scoring code file
source_codebool: whether source code or binary of the scoring code will be retrieved
include_agentbool: whether the scoring code retrieved will include tracking agent
include_prediction_explanationsbool: whether the scoring code retrieved will include prediction explanations
include_prediction_intervalsbool: whether the scoring code retrieved will support prediction intervals

Notes

When setting include_agent or include_predictions_explanations or include_prediction_intervals to True, it can take a considerably longer time to download the scoring code.

Examples

from datarobot import Deployment
deployment = Deployment.get(deployment_id='5c939e08962d741e34f609f0')
results = deployment.download_scoring_code('path_to_scoring_code.jar')

Return type: None

download_model_package_file(filepath, compute_all_ts_intervals=False)¶

Retrieve model package file (mlpkg) of the current deployed model.

New in version v3.3.

Parameters

filepathstr: The file path of the model package file.
compute_all_ts_intervalsbool: Includes all time series intervals into the built Model Package (.mlpkg) if set to True.

Examples

from datarobot import Deployment
deployment = Deployment.get(deployment_id='5c939e08962d741e34f609f0')
deployment.download_model_package_file('path_to_model_package.mlpkg')

Return type: None

delete_monitoring_data(model_id, start_time=None, end_time=None, max_wait=600)¶

Delete deployment monitoring data.

Parameters

model_idstr: id of the model to delete monitoring data
start_timedatetime, optional: start of the time period to delete monitoring data
end_timedatetime, optional: end of the time period to delete monitoring data
max_waitint, optional: seconds to wait for successful resolution

Return type: None

list_shared_roles(id=None, name=None, share_recipient_type=None, limit=100, offset=0)¶

Get a list of users, groups and organizations that have an access to this user blueprint

Parameters

id: str, Optional: Only return the access control information for a organization, group or user with this ID.
name: string, Optional: Only return the access control information for a organization, group or user with this name.
share_recipient_type: enum(‘user’, ‘group’, ‘organization’), Optional: Only returns results with the given recipient type.
limit: int (Default=0): At most this many results are returned.
offset: int (Default=0): This many results will be skipped.

Returns

list(DeploymentSharedRole)

Return type: List[DeploymentSharedRole]

update_shared_roles(roles)¶

Share a deployment with a user, group, or organization

Parameters

roles: list(or(GrantAccessControlWithUsernameValidator, GrantAccessControlWithIdValidator)): Array of GrantAccessControl objects, up to maximum 100 objects.

Return type: None

classmethod from_data(data)¶

Instantiate an object of this class using a dict.

Parameters

datadict: Correctly snake_cased keys and their values.

Return type: TypeVar(T, bound= APIObject)

classmethod from_server_data(data, keep_attrs=None)¶

Instantiate an object of this class using the data directly from the server, meaning that the keys may have the wrong camel casing

Parameters

datadict: The directly translated dict of JSON from the server. No casing fixes have taken place
keep_attrsiterable: List, set or tuple of the dotted namespace notations for attributes to keep within the object structure even if their values are None

Return type: TypeVar(T, bound= APIObject)

open_in_browser()¶

Opens class’ relevant web browser location. If default browser is not available the URL is logged.

Note: If text-mode browsers are used, the calling process will block until the user exits the browser.

Return type: None

class datarobot.models.deployment.DeploymentListFilters(role=None, service_health=None, model_health=None, accuracy_health=None, execution_environment_type=None, importance=None)¶

class datarobot.models.deployment.ServiceStats(period=None, metrics=None, model_id=None)¶

Deployment service stats information.

Attributes

model_idstr: the model used to retrieve service stats metrics
perioddict: the time period used to retrieve service stats metrics
metricsdict: the service stats metrics

classmethod get(deployment_id, model_id=None, start_time=None, end_time=None, execution_time_quantile=None, response_time_quantile=None, slow_requests_threshold=None)¶

Retrieve value of service stat metrics over a certain time period.

New in version v2.18.

Parameters

deployment_idstr: the id of the deployment
model_idstr, optional: the id of the model
start_timedatetime, optional: start of the time period
end_timedatetime, optional: end of the time period
execution_time_quantilefloat, optional: quantile for executionTime, defaults to 0.5
response_time_quantilefloat, optional: quantile for responseTime, defaults to 0.5
slow_requests_thresholdfloat, optional: threshold for slowRequests, defaults to 1000

Returns

service_statsServiceStats: the queried service stats metrics

Return type: ServiceStats

class datarobot.models.deployment.ServiceStatsOverTime(buckets=None, summary=None, metric=None, model_id=None)¶

Deployment service stats over time information.

Attributes

model_idstr: the model used to retrieve accuracy metric
metricstr: the service stat metric being retrieved
bucketsdict: how the service stat metric changes over time
summarydict: summary for the service stat metric

classmethod get(deployment_id, metric=None, model_id=None, start_time=None, end_time=None, bucket_size=None, quantile=None, threshold=None)¶

Retrieve information about how a service stat metric changes over a certain time period.

New in version v2.18.

Parameters

deployment_idstr: the id of the deployment
metricSERVICE_STAT_METRIC, optional: the service stat metric to retrieve
model_idstr, optional: the id of the model
start_timedatetime, optional: start of the time period
end_timedatetime, optional: end of the time period
bucket_sizestr, optional: time duration of a bucket, in ISO 8601 time duration format
quantilefloat, optional: quantile for ‘executionTime’ or ‘responseTime’, ignored when querying other metrics
thresholdint, optional: threshold for ‘slowQueries’, ignored when querying other metrics

Returns

service_stats_over_timeServiceStatsOverTime: the queried service stat over time information

Return type: ServiceStatsOverTime

property bucket_values: OrderedDict[str, Union[int, float, None]]¶

The metric value for all time buckets, keyed by start time of the bucket.

Returns

bucket_values: OrderedDict

class datarobot.models.deployment.TargetDrift(period=None, metric=None, model_id=None, target_name=None, drift_score=None, sample_size=None, baseline_sample_size=None)¶

Deployment target drift information.

Attributes

model_idstr: the model used to retrieve target drift metric
perioddict: the time period used to retrieve target drift metric
metricstr: the data drift metric
target_namestr: name of the target
drift_scorefloat: target drift score
sample_sizeint: count of data points for comparison
baseline_sample_sizeint: count of data points for baseline

classmethod get(deployment_id, model_id=None, start_time=None, end_time=None, metric=None)¶

Retrieve target drift information over a certain time period.

New in version v2.21.

Parameters

deployment_idstr: the id of the deployment
model_idstr: the id of the model
start_timedatetime: start of the time period
end_timedatetime: end of the time period
metricstr: (New in version v2.22) metric used to calculate the drift score

Returns

target_driftTargetDrift: the queried target drift information

Examples

from datarobot import Deployment, TargetDrift
deployment = Deployment.get(deployment_id='5c939e08962d741e34f609f0')
target_drift = TargetDrift.get(deployment.id)
target_drift.period['end']
>>>'2019-08-01 00:00:00+00:00'
target_drift.drift_score
>>>0.03423
accuracy.target_name
>>>'readmitted'

Return type: TargetDrift

class datarobot.models.deployment.FeatureDrift(period=None, metric=None, model_id=None, name=None, drift_score=None, feature_impact=None, sample_size=None, baseline_sample_size=None)¶

Deployment feature drift information.

Attributes

model_idstr: the model used to retrieve feature drift metric
perioddict: the time period used to retrieve feature drift metric
metricstr: the data drift metric
namestr: name of the feature
drift_scorefloat: feature drift score
sample_sizeint: count of data points for comparison
baseline_sample_sizeint: count of data points for baseline

classmethod list(deployment_id, model_id=None, start_time=None, end_time=None, metric=None)¶

Retrieve drift information for deployment’s features over a certain time period.

New in version v2.21.

Parameters

deployment_idstr: the id of the deployment
model_idstr: the id of the model
start_timedatetime: start of the time period
end_timedatetime: end of the time period
metricstr: (New in version v2.22) metric used to calculate the drift score

Returns

feature_drift_data[FeatureDrift]: the queried feature drift information

Examples

from datarobot import Deployment, TargetDrift
deployment = Deployment.get(deployment_id='5c939e08962d741e34f609f0')
feature_drift = FeatureDrift.list(deployment.id)[0]
feature_drift.period
>>>'2019-08-01 00:00:00+00:00'
feature_drift.drift_score
>>>0.252
feature_drift.name
>>>'age'

Return type: List[FeatureDrift]

class datarobot.models.deployment.PredictionsOverTime(baselines=None, buckets=None)¶

Deployment predictions over time information.

Attributes

baselinesList: target baseline for each model queried
bucketsList: predictions over time bucket for each model and bucket queried

classmethod get(deployment_id, model_ids=None, start_time=None, end_time=None, bucket_size=None, target_classes=None, include_percentiles=False)¶

Retrieve information for deployment’s prediction response over a certain time period.

New in version v3.2.

Parameters

deployment_idstr: the id of the deployment
model_idslist[str]: ID of models to retrieve prediction stats
start_timedatetime: start of the time period
end_timedatetime: end of the time period
bucket_sizeBUCKET_SIZE: time duration of each bucket
target_classeslist[str]: class names of target, only for deployments with multiclass target
include_percentilesbool: if the returned data includes percentiles, only for a deployment with a binary and regression target

Returns

predictions_over_timePredictionsOverTime: the queried predictions over time information

Return type: PredictionsOverTime

class datarobot.models.deployment.Accuracy(period=None, metrics=None, model_id=None)¶

Deployment accuracy information.

Attributes

model_idstr: the model used to retrieve accuracy metrics
perioddict: the time period used to retrieve accuracy metrics
metricsdict: the accuracy metrics

classmethod get(deployment_id, model_id=None, start_time=None, end_time=None, target_classes=None)¶

Retrieve values of accuracy metrics over a certain time period.

New in version v2.18.

Parameters

deployment_idstr: the id of the deployment
model_idstr: the id of the model
start_timedatetime: start of the time period
end_timedatetime: end of the time period
target_classeslist[str], optional: Optional list of target class strings

Returns

accuracyAccuracy: the queried accuracy metrics information

Examples

from datarobot import Deployment, Accuracy
deployment = Deployment.get(deployment_id='5c939e08962d741e34f609f0')
accuracy = Accuracy.get(deployment.id)
accuracy.period['end']
>>>'2019-08-01 00:00:00+00:00'
accuracy.metric['LogLoss']['value']
>>>0.7533
accuracy.metric_values['LogLoss']
>>>0.7533

Return type: Accuracy

property metric_values: Dict[str, Optional[int]]¶

The value for all metrics, keyed by metric name.

Returns

metric_values: Dict

Return type: Dict[str, Optional[int]]

property metric_baselines: Dict[str, Optional[int]]¶

The baseline value for all metrics, keyed by metric name.

Returns

metric_baselines: Dict

Return type: Dict[str, Optional[int]]

property percent_changes: Dict[str, Optional[int]]¶

The percent change of value over baseline for all metrics, keyed by metric name.

Returns

percent_changes: Dict

Return type: Dict[str, Optional[int]]

class datarobot.models.deployment.AccuracyOverTime(buckets=None, summary=None, baseline=None, metric=None, model_id=None)¶

Deployment accuracy over time information.

Attributes

model_idstr: the model used to retrieve accuracy metric
metricstr: the accuracy metric being retrieved
bucketsdict: how the accuracy metric changes over time
summarydict: summary for the accuracy metric
baselinedict: baseline for the accuracy metric

classmethod get(deployment_id, metric=None, model_id=None, start_time=None, end_time=None, bucket_size=None, target_classes=None)¶

Retrieve information about how an accuracy metric changes over a certain time period.

New in version v2.18.

Parameters

deployment_idstr: the id of the deployment
metricACCURACY_METRIC: the accuracy metric to retrieve
model_idstr: the id of the model
start_timedatetime: start of the time period
end_timedatetime: end of the time period
bucket_sizestr: time duration of a bucket, in ISO 8601 time duration format
target_classeslist[str], optional: Optional list of target class strings

Returns

accuracy_over_timeAccuracyOverTime: the queried accuracy metric over time information

Examples

from datarobot import Deployment, AccuracyOverTime
from datarobot.enums import ACCURACY_METRICS
deployment = Deployment.get(deployment_id='5c939e08962d741e34f609f0')
accuracy_over_time = AccuracyOverTime.get(deployment.id, metric=ACCURACY_METRIC.LOGLOSS)
accuracy_over_time.metric
>>>'LogLoss'
accuracy_over_time.metric_values
>>>{datetime.datetime(2019, 8, 1): 0.73, datetime.datetime(2019, 8, 2): 0.55}

Return type: AccuracyOverTime

classmethod get_as_dataframe(deployment_id, metrics=None, model_id=None, start_time=None, end_time=None, bucket_size=None)¶

Retrieve information about how a list of accuracy metrics change over a certain time period as pandas DataFrame.

In the returned DataFrame, the columns corresponds to the metrics being retrieved; the rows are labeled with the start time of each bucket.

Parameters

deployment_idstr: the id of the deployment
metrics[ACCURACY_METRIC]: the accuracy metrics to retrieve
model_idstr: the id of the model
start_timedatetime: start of the time period
end_timedatetime: end of the time period
bucket_sizestr: time duration of a bucket, in ISO 8601 time duration format

Returns

accuracy_over_time: pd.DataFrame

Return type: DataFrame

property bucket_values: Dict[datetime.datetime, int]¶

The metric value for all time buckets, keyed by start time of the bucket.

Returns

bucket_values: Dict

Return type: Dict[datetime, int]

property bucket_sample_sizes: Dict[datetime.datetime, int]¶

The sample size for all time buckets, keyed by start time of the bucket.

Returns

bucket_sample_sizes: Dict

Return type: Dict[datetime, int]

class datarobot.models.deployment.PredictionsVsActualsOverTime(summary=None, baselines=None, buckets=None)¶

Deployment predictions vs actuals over time information.

Attributes

summarydict: predictions vs actuals over time summary for all models and buckets queried
baselinesList: target baseline for each model queried
bucketsList: predictions vs actuals over time bucket for each model and bucket queried

classmethod get(deployment_id, model_ids=None, start_time=None, end_time=None, bucket_size=None, target_classes=None)¶

Retrieve information for deployment’s predictions vs actuals over a certain time period.

New in version v3.3.

Parameters

deployment_idstr: the id of the deployment
model_idslist[str]: ID of models to retrieve predictions vs actuals stats
start_timedatetime: start of the time period
end_timedatetime: end of the time period
bucket_sizeBUCKET_SIZE: time duration of each bucket
target_classeslist[str]: class names of target, only for deployments with multiclass target

Returns

predictions_vs_actuals_over_timePredictionsVsActualsOverTime: the queried predictions vs actuals over time information

Return type: PredictionsVsActualsOverTime

class datarobot.models.deployment.bias_and_fairness.FairnessScoresOverTime(summary=None, buckets=None, protected_feature=None, fairness_threshold=None, model_id=None, model_package_id=None, favorable_target_outcome=None)¶

Deployment fairness over time information.

Attributes

bucketsList: fairness over time bucket for each model and bucket queried
summarydict: summary for the fairness score
protected_featurestr: name of protected feature
fairnessThresholdfloat: threshold used to compute fairness results
modelIdstr: model id for which fairness is computed
modelPackageIdstr: model package id for which fairness is computed
favorableTargetOutcomebool: preferable class of the target

classmethod get(deployment_id, model_id=None, start_time=None, end_time=None, bucket_size=None, fairness_metric=None, protected_feature=None)¶

Retrieve information for deployment’s fairness score response over a certain time period.

New in version FUTURE.

Parameters

deployment_idstr: the id of the deployment
model_idstr: id of models to retrieve fairness score stats
start_timedatetime: start of the time period
end_timedatetime: end of the time period
protected_featurestr: name of the protected feature
fairness_metricstr: A consolidation of the fairness metrics by the use case.
bucket_sizeBUCKET_SIZE: time duration of each bucket

Returns

fairness_scores_over_timeFairnessScoresOverTime: the queried fairness score over time information

Return type: FairnessScoresOverTime

class datarobot.models.deployment.DeploymentSharedRole(id, name, role, share_recipient_type, **kwargs)¶

Parameters

share_recipient_type: enum(‘user’, ‘group’, ‘organization’): Describes the recipient type, either user, group, or organization.
role: str, one of enum(‘CONSUMER’, ‘USER’, ‘OWNER’): The role of the org/group/user on this deployment.
id: str: The ID of the recipient organization, group or user.
name: string: The name of the recipient organization, group or user.

class datarobot.models.deployment.DeploymentGrantSharedRoleWithId(id, role, share_recipient_type='user', **kwargs)¶

Parameters

share_recipient_type: enum(‘user’, ‘group’, ‘organization’): Describes the recipient type, either user, group, or organization.
role: enum(‘OWNER’, ‘USER’, ‘OBSERVER’, ‘NO_ROLE’): The role of the recipient on this entity. One of OWNER, USER, OBSERVER, NO_ROLE. If NO_ROLE is specified, any existing role for the recipient will be removed.
id: str: The ID of the recipient.

class datarobot.models.deployment.DeploymentGrantSharedRoleWithUsername(role, username, **kwargs)¶

Parameters

role: string: The role of the recipient on this entity. One of OWNER, USER, CONSUMER, NO_ROLE. If NO_ROLE is specified, any existing role for the user will be removed.
username: string: Username of the user to update the access role for.

class datarobot.models.deployment.deployment.FeatureDict() -> new empty dictionary dict(mapping) -> new dictionary initialized from a mapping object's (key, value) pairs dict(iterable) -> new dictionary initialized as if via: d = {} for k, v in iterable: d[k] = v dict(**kwargs) -> new dictionary initialized with the name=value pairs in the keyword argument list. For example: dict(one=1, two=2)¶

class datarobot.models.deployment.deployment.ForecastDateSettings() -> new empty dictionary dict(mapping) -> new dictionary initialized from a mapping object's (key, value) pairs dict(iterable) -> new dictionary initialized as if via: d = {} for k, v in iterable: d[k] = v dict(**kwargs) -> new dictionary initialized with the name=value pairs in the keyword argument list. For example: dict(one=1, two=2)¶

class datarobot.models.deployment.deployment.ChallengerModelsSettings() -> new empty dictionary dict(mapping) -> new dictionary initialized from a mapping object's (key, value) pairs dict(iterable) -> new dictionary initialized as if via: d = {} for k, v in iterable: d[k] = v dict(**kwargs) -> new dictionary initialized with the name=value pairs in the keyword argument list. For example: dict(one=1, two=2)¶

class datarobot.models.deployment.deployment.SegmentAnalysisSettings() -> new empty dictionary dict(mapping) -> new dictionary initialized from a mapping object's (key, value) pairs dict(iterable) -> new dictionary initialized as if via: d = {} for k, v in iterable: d[k] = v dict(**kwargs) -> new dictionary initialized with the name=value pairs in the keyword argument list. For example: dict(one=1, two=2)¶

class datarobot.models.deployment.deployment.BiasAndFairnessSettings() -> new empty dictionary dict(mapping) -> new dictionary initialized from a mapping object's (key, value) pairs dict(iterable) -> new dictionary initialized as if via: d = {} for k, v in iterable: d[k] = v dict(**kwargs) -> new dictionary initialized with the name=value pairs in the keyword argument list. For example: dict(one=1, two=2)¶

class datarobot.models.deployment.deployment.DriftTrackingSettings() -> new empty dictionary dict(mapping) -> new dictionary initialized from a mapping object's (key, value) pairs dict(iterable) -> new dictionary initialized as if via: d = {} for k, v in iterable: d[k] = v dict(**kwargs) -> new dictionary initialized with the name=value pairs in the keyword argument list. For example: dict(one=1, two=2)¶

class datarobot.models.deployment.deployment.PredictionWarningSettings() -> new empty dictionary dict(mapping) -> new dictionary initialized from a mapping object's (key, value) pairs dict(iterable) -> new dictionary initialized as if via: d = {} for k, v in iterable: d[k] = v dict(**kwargs) -> new dictionary initialized with the name=value pairs in the keyword argument list. For example: dict(one=1, two=2)¶

class datarobot.models.deployment.deployment.PredictionIntervalsSettings() -> new empty dictionary dict(mapping) -> new dictionary initialized from a mapping object's (key, value) pairs dict(iterable) -> new dictionary initialized as if via: d = {} for k, v in iterable: d[k] = v dict(**kwargs) -> new dictionary initialized with the name=value pairs in the keyword argument list. For example: dict(one=1, two=2)¶

External Baseline Validation¶

class datarobot.models.external_baseline_validation.ExternalBaselineValidationInfo(baseline_validation_job_id, project_id, catalog_version_id, target, datetime_partition_column, is_external_baseline_dataset_valid, multiseries_id_columns=None, holdout_start_date=None, holdout_end_date=None, backtests=None, forecast_window_start=None, forecast_window_end=None, message=None)¶

An object containing information about external time series baseline predictions validation results.

Attributes

baseline_validation_job_idstr: the identifier of the baseline validation job
project_idstr: the identifier of the project
catalog_version_idstr: the identifier of the catalog version used in the validation job
targetstr: the name of the target feature
datetime_partition_columnstr: the name of the column whose values as dates are used to assign a row to a particular partition
is_external_baseline_dataset_validbool: whether the external baseline dataset passes the validation check
multiseries_id_columnslist of str or null: a list of the names of multiseries id columns to define series within the training data. Currently only one multiseries id column is supported.
holdout_start_datestr or None: the start date of holdout scoring data
holdout_end_datestr or None: the end date of holdout scoring data
backtestslist of dicts containing validation_start_date and validation_end_date or None: the configurd backtets of the time series project
forecast_window_startint: offset into the future to define how far forward relative to the forecast point the forecast window should start.
forecast_window_endint: offset into the future to define how far forward relative to the forecast point the forecast window should end.
messagestr or None: the description of the issue with external baseline validation job

classmethod get(project_id, validation_job_id)¶

Get information about external baseline validation job

Parameters

project_idstring: the identifier of the project
validation_job_idstring: the identifier of the external baseline validation job

Returns

info: ExternalBaselineValidationInfo: information about external baseline validation job

Return type: ExternalBaselineValidationInfo

External Scores and Insights¶

class datarobot.ExternalScores(project_id, scores, model_id=None, dataset_id=None, actual_value_column=None)¶

Metric scores on prediction dataset with target or actual value column in unsupervised case. Contains project metrics for supervised and special classification metrics set for unsupervised projects.

New in version v2.21.

Examples

List all scores for a dataset

import datarobot as dr
scores = dr.Scores.list(project_id, dataset_id=dataset_id)

Attributes

project_id: str: id of the project the model belongs to
model_id: str: id of the model
dataset_id: str: id of the prediction dataset with target or actual value column for unsupervised case
actual_value_column: str, optional: For unsupervised projects only. Actual value column which was used to calculate the classification metrics and insights on the prediction dataset.
scores: list of dicts in a form of {‘label’: metric_name, ‘value’: score}: Scores on the dataset.

classmethod create(project_id, model_id, dataset_id, actual_value_column=None)¶

Compute an external dataset insights for the specified model.

Parameters

project_idstr: id of the project the model belongs to
model_idstr: id of the model for which insights is requested
dataset_idstr: id of the dataset for which insights is requested
actual_value_columnstr, optional: actual values column label, for unsupervised projects only

Returns

jobJob: an instance of created async job

Return type: Job

classmethod list(project_id, model_id=None, dataset_id=None, offset=0, limit=100)¶

Fetch external scores list for the project and optionally for model and dataset.

Parameters

project_id: str: id of the project
model_id: str, optional: if specified, only scores for this model will be retrieved
dataset_id: str, optional: if specified, only scores for this dataset will be retrieved
offset: int, optional: this many results will be skipped, default: 0
limit: int, optional: at most this many results are returned, default: 100, max 1000. To return all results, specify 0

Returns

A list ofpy:class:External Scores <datarobot.ExternalScores> objects

Return type: List[ExternalScores]

classmethod get(project_id, model_id, dataset_id)¶

Retrieve external scores for the project, model and dataset.

Parameters

project_id: str: id of the project
model_id: str: if specified, only scores for this model will be retrieved
dataset_id: str: if specified, only scores for this dataset will be retrieved

Returns

External Scores object

Return type: ExternalScores

class datarobot.ExternalLiftChart(dataset_id, bins)¶

Lift chart for the model and prediction dataset with target or actual value column in unsupervised case.

New in version v2.21.

LiftChartBin is a dict containing the following:

actual (float) Sum of actual target values in bin

predicted (float) Sum of predicted target values in bin

bin_weight (float) The weight of the bin. For weighted projects, it is the sum of the weights of the rows in the bin. For unweighted projects, it is the number of rows in the bin.

Attributes

dataset_id: str: id of the prediction dataset with target or actual value column for unsupervised case
bins: list of dict: List of dicts with schema described as LiftChartBin above.

classmethod list(project_id, model_id, dataset_id=None, offset=0, limit=100)¶

Retrieve list of the lift charts for the model.

Parameters

project_id: str: id of the project
model_id: str: if specified, only lift chart for this model will be retrieved
dataset_id: str, optional: if specified, only lift chart for this dataset will be retrieved
offset: int, optional: this many results will be skipped, default: 0
limit: int, optional: at most this many results are returned, default: 100, max 1000. To return all results, specify 0

Returns

A list ofpy:class:ExternalLiftChart <datarobot.ExternalLiftChart> objects

Return type: List[ExternalLiftChart]

classmethod get(project_id, model_id, dataset_id)¶

Retrieve lift chart for the model and prediction dataset.

Parameters

project_id: str: project id
model_id: str: model id
dataset_id: str: prediction dataset id with target or actual value column for unsupervised case

Returns

ExternalLiftChart object

Return type: ExternalLiftChart

class datarobot.ExternalRocCurve(dataset_id, roc_points, negative_class_predictions, positive_class_predictions)¶

ROC curve data for the model and prediction dataset with target or actual value column in unsupervised case.

New in version v2.21.

Attributes

dataset_id: str: id of the prediction dataset with target or actual value column for unsupervised case
roc_points: list of dict: List of precalculated metrics associated with thresholds for ROC curve.
negative_class_predictions: list of float: List of predictions from example for negative class
positive_class_predictions: list of float: List of predictions from example for positive class

classmethod list(project_id, model_id, dataset_id=None, offset=0, limit=100)¶

Retrieve list of the roc curves for the model.

Parameters

project_id: str: id of the project
model_id: str: if specified, only lift chart for this model will be retrieved
dataset_id: str, optional: if specified, only lift chart for this dataset will be retrieved
offset: int, optional: this many results will be skipped, default: 0
limit: int, optional: at most this many results are returned, default: 100, max 1000. To return all results, specify 0

Returns

A list ofpy:class:ExternalRocCurve <datarobot.ExternalRocCurve> objects

Return type: List[ExternalRocCurve]

classmethod get(project_id, model_id, dataset_id)¶

Retrieve ROC curve chart for the model and prediction dataset.

Parameters

project_id: str: project id
model_id: str: model id
dataset_id: str: prediction dataset id with target or actual value column for unsupervised case

Returns

ExternalRocCurve object

Return type: ExternalRocCurve

Feature¶

class datarobot.models.Feature(id, project_id=None, name=None, feature_type=None, importance=None, low_information=None, unique_count=None, na_count=None, date_format=None, min=None, max=None, mean=None, median=None, std_dev=None, time_series_eligible=None, time_series_eligibility_reason=None, time_step=None, time_unit=None, target_leakage=None, feature_lineage_id=None, key_summary=None, multilabel_insights=None)¶

A feature from a project’s dataset

These are features either included in the originally uploaded dataset or added to it via feature transformations. In time series projects, these will be distinct from the ModelingFeature s created during partitioning; otherwise, they will correspond to the same features. For more information about input and modeling features, see the time series documentation.

The min, max, mean, median, and std_dev attributes provide information about the distribution of the feature in the EDA sample data. For non-numeric features or features created prior to these summary statistics becoming available, they will be None. For features where the summary statistics are available, they will be in a format compatible with the data type, i.e. date type features will have their summary statistics expressed as ISO-8601 formatted date strings.

Attributes

idint

the id for the feature - note that name is used to reference the feature instead of id

project_idstr

the id of the project the feature belongs to

namestr

the name of the feature

feature_typestr

the type of the feature, e.g. ‘Categorical’, ‘Text’

importancefloat or None

numeric measure of the strength of relationship between the feature and target (independent of any model or other features); may be None for non-modeling features such as partition columns

low_informationbool

whether a feature is considered too uninformative for modeling (e.g. because it has too few values)

unique_countint

number of unique values

na_countint or None

number of missing values

date_formatstr or None

For Date features, the date format string for how this feature was interpreted, compatible with https://docs.python.org/2/library/time.html#time.strftime . For other feature types, None.

minstr, int, float, or None

The minimum value of the source data in the EDA sample

maxstr, int, float, or None

The maximum value of the source data in the EDA sample

meanstr, int, or, float

The arithmetic mean of the source data in the EDA sample

medianstr, int, float, or None

The median of the source data in the EDA sample

std_devstr, int, float, or None

The standard deviation of the source data in the EDA sample

time_series_eligiblebool

Whether this feature can be used as the datetime partition column in a time series project.

time_series_eligibility_reasonstr

Why the feature is ineligible for the datetime partition column in a time series project, or ‘suitable’ when it is eligible.

time_stepint or None

For time series eligible features, a positive integer determining the interval at which windows can be specified. If used as the datetime partition column on a time series project, the feature derivation and forecast windows must start and end at an integer multiple of this value. None for features that are not time series eligible.

time_unitstr or None

For time series eligible features, the time unit covered by a single time step, e.g. ‘HOUR’, or None for features that are not time series eligible.

target_leakagestr

Whether a feature is considered to have target leakage or not. A value of ‘SKIPPED_DETECTION’ indicates that target leakage detection was not run on the feature. ‘FALSE’ indicates no leakage, ‘MODERATE’ indicates a moderate risk of target leakage, and ‘HIGH_RISK’ indicates a high risk of target leakage

feature_lineage_idstr

id of a lineage for automatically discovered features or derived time series features.

key_summary: list of dict

Statistics for top 50 keys (truncated to 103 characters) of Summarized Categorical column example:

{‘key’:’DataRobot’, ‘summary’:{‘min’:0, ‘max’:29815.0, ‘stdDev’:6498.029, ‘mean’:1490.75, ‘median’:0.0, ‘pctRows’:5.0}}

where,

key: string or None

name of the key

summary: dict

statistics of the key

max: maximum value of the key. min: minimum value of the key. mean: mean value of the key. median: median value of the key. stdDev: standard deviation of the key. pctRows: percentage occurrence of key in the EDA sample of the feature.

multilabel_insights_keystr or None

For multicategorical columns this will contain a key for multilabel insights. The key is unique for a project, feature and EDA stage combination. This will be the key for the most recent, finished EDA stage.

classmethod get(project_id, feature_name)¶

Retrieve a single feature

Parameters

project_idstr: The ID of the project the feature is associated with.
feature_namestr: The name of the feature to retrieve

Returns

featureFeature: The queried instance

get_multiseries_properties(multiseries_id_columns, max_wait=600)¶

Retrieve time series properties for a potential multiseries datetime partition column

Multiseries time series projects use multiseries id columns to model multiple distinct series within a single project. This function returns the time series properties (time step and time unit) of this column if it were used as a datetime partition column with the specified multiseries id columns, running multiseries detection automatically if it had not previously been successfully ran.

Parameters

multiseries_id_columnslist of str: the name(s) of the multiseries id columns to use with this datetime partition column. Currently only one multiseries id column is supported.
max_waitint, optional: if a multiseries detection task is run, the maximum amount of time to wait for it to complete before giving up

Returns

propertiesdict

A dict with three keys:

time_series_eligible : bool, whether the column can be used as a partition column

time_unit : str or null, the inferred time unit if used as a partition column

time_step : int or null, the inferred time step if used as a partition column

get_cross_series_properties(datetime_partition_column, cross_series_group_by_columns, max_wait=600)¶

Retrieve cross-series properties for multiseries ID column.

This function returns the cross-series properties (eligibility as group-by column) of this column if it were used with specified datetime partition column and with current multiseries id column, running cross-series group-by validation automatically if it had not previously been successfully ran.

Parameters

datetime_partition_columndatetime partition column
cross_series_group_by_columnslist of str: the name(s) of the columns to use with this multiseries ID column. Currently only one cross-series group-by column is supported.
max_waitint, optional: if a multiseries detection task is run, the maximum amount of time to wait for it to complete before giving up

Returns

propertiesdict

A dict with three keys:

name : str, column name

eligibility : str, reason for column eligibility

isEligible : bool, is column eligible as cross-series group-by

get_multicategorical_histogram()¶

Retrieve multicategorical histogram for this feature

New in version v2.24.

Returns

datarobot.models.MulticategoricalHistogram

Raises

datarobot.errors.InvalidUsageError: if this method is called on a unsuited feature
ValueError: if no multilabel_insights_key is present for this feature

get_pairwise_correlations()¶

Retrieve pairwise label correlation for multicategorical features

New in version v2.24.

Returns

datarobot.models.PairwiseCorrelations

Raises

datarobot.errors.InvalidUsageError: if this method is called on a unsuited feature
ValueError: if no multilabel_insights_key is present for this feature

get_pairwise_joint_probabilities()¶

Retrieve pairwise label joint probabilities for multicategorical features

New in version v2.24.

Returns

datarobot.models.PairwiseJointProbabilities

Raises

datarobot.errors.InvalidUsageError: if this method is called on a unsuited feature
ValueError: if no multilabel_insights_key is present for this feature

get_pairwise_conditional_probabilities()¶

Retrieve pairwise label conditional probabilities for multicategorical features

New in version v2.24.

Returns

datarobot.models.PairwiseConditionalProbabilities

Raises

datarobot.errors.InvalidUsageError: if this method is called on a unsuited feature
ValueError: if no multilabel_insights_key is present for this feature

classmethod from_data(data)¶

Instantiate an object of this class using a dict.

Parameters

datadict: Correctly snake_cased keys and their values.

Return type: TypeVar(T, bound= APIObject)

classmethod from_server_data(data, keep_attrs=None)¶

Instantiate an object of this class using the data directly from the server, meaning that the keys may have the wrong camel casing

Parameters

datadict: The directly translated dict of JSON from the server. No casing fixes have taken place
keep_attrsiterable: List, set or tuple of the dotted namespace notations for attributes to keep within the object structure even if their values are None

Return type: TypeVar(T, bound= APIObject)

get_histogram(bin_limit=None)¶

Retrieve a feature histogram

Parameters

bin_limitint or None: Desired max number of histogram bins. If omitted, by default endpoint will use 60.

Returns

featureHistogramFeatureHistogram: The requested histogram with desired number or bins

class datarobot.models.ModelingFeature(project_id=None, name=None, feature_type=None, importance=None, low_information=None, unique_count=None, na_count=None, date_format=None, min=None, max=None, mean=None, median=None, std_dev=None, parent_feature_names=None, key_summary=None, is_restored_after_reduction=None)¶

A feature used for modeling

In time series projects, a new set of modeling features is created after setting the partitioning options. These features are automatically derived from those in the project’s dataset and are the features used for modeling. Modeling features are only accessible once the target and partitioning options have been set. In projects that don’t use time series modeling, once the target has been set, ModelingFeatures and Features will behave the same.

For more information about input and modeling features, see the time series documentation.

As with the Feature object, the min, max, `mean, median, and std_dev attributes provide information about the distribution of the feature in the EDA sample data. For non-numeric features, they will be None. For features where the summary statistics are available, they will be in a format compatible with the data type, i.e. date type features will have their summary statistics expressed as ISO-8601 formatted date strings.

Attributes

project_idstr

the id of the project the feature belongs to

namestr

the name of the feature

feature_typestr

the type of the feature, e.g. ‘Categorical’, ‘Text’

importancefloat or None

numeric measure of the strength of relationship between the feature and target (independent of any model or other features); may be None for non-modeling features such as partition columns

low_informationbool

whether a feature is considered too uninformative for modeling (e.g. because it has too few values)

unique_countint

number of unique values

na_countint or None

number of missing values

date_formatstr or None

For Date features, the date format string for how this feature was interpreted, compatible with https://docs.python.org/2/library/time.html#time.strftime . For other feature types, None.

minstr, int, float, or None

The minimum value of the source data in the EDA sample

maxstr, int, float, or None

The maximum value of the source data in the EDA sample

meanstr, int, or, float

The arithmetic mean of the source data in the EDA sample

medianstr, int, float, or None

The median of the source data in the EDA sample

std_devstr, int, float, or None

The standard deviation of the source data in the EDA sample

parent_feature_nameslist of str

A list of the names of input features used to derive this modeling feature. In cases where the input features and modeling features are the same, this will simply contain the feature’s name. Note that if a derived feature was used to create this modeling feature, the values here will not necessarily correspond to the features that must be supplied at prediction time.

key_summary: list of dict

Statistics for top 50 keys (truncated to 103 characters) of Summarized Categorical column example:

{‘key’:’DataRobot’, ‘summary’:{‘min’:0, ‘max’:29815.0, ‘stdDev’:6498.029, ‘mean’:1490.75, ‘median’:0.0, ‘pctRows’:5.0}}

where,

key: string or None

name of the key

summary: dict

statistics of the key

max: maximum value of the key. min: minimum value of the key. mean: mean value of the key. median: median value of the key. stdDev: standard deviation of the key. pctRows: percentage occurrence of key in the EDA sample of the feature.

classmethod get(project_id, feature_name)¶

Retrieve a single modeling feature

Parameters

project_idstr: The ID of the project the feature is associated with.
feature_namestr: The name of the feature to retrieve

Returns

featureModelingFeature: The requested feature

class datarobot.models.DatasetFeature(id_, dataset_id=None, dataset_version_id=None, name=None, feature_type=None, low_information=None, unique_count=None, na_count=None, date_format=None, min_=None, max_=None, mean=None, median=None, std_dev=None, time_series_eligible=None, time_series_eligibility_reason=None, time_step=None, time_unit=None, target_leakage=None, target_leakage_reason=None)¶

A feature from a project’s dataset

These are features either included in the originally uploaded dataset or added to it via feature transformations.

The min, max, mean, median, and std_dev attributes provide information about the distribution of the feature in the EDA sample data. For non-numeric features or features created prior to these summary statistics becoming available, they will be None. For features where the summary statistics are available, they will be in a format compatible with the data type, i.e. date type features will have their summary statistics expressed as ISO-8601 formatted date strings.

Attributes

idint: the id for the feature - note that name is used to reference the feature instead of id
dataset_idstr: the id of the dataset the feature belongs to
dataset_version_idstr: the id of the dataset version the feature belongs to
namestr: the name of the feature
feature_typestr, optional: the type of the feature, e.g. ‘Categorical’, ‘Text’
low_informationbool, optional: whether a feature is considered too uninformative for modeling (e.g. because it has too few values)
unique_countint, optional: number of unique values
na_countint, optional: number of missing values
date_formatstr, optional: For Date features, the date format string for how this feature was interpreted, compatible with https://docs.python.org/2/library/time.html#time.strftime . For other feature types, None.
minstr, int, float, optional: The minimum value of the source data in the EDA sample
maxstr, int, float, optional: The maximum value of the source data in the EDA sample
meanstr, int, float, optional: The arithmetic mean of the source data in the EDA sample
medianstr, int, float, optional: The median of the source data in the EDA sample
std_devstr, int, float, optional: The standard deviation of the source data in the EDA sample
time_series_eligiblebool, optional: Whether this feature can be used as the datetime partition column in a time series project.
time_series_eligibility_reasonstr, optional: Why the feature is ineligible for the datetime partition column in a time series project, or ‘suitable’ when it is eligible.
time_stepint, optional: For time series eligible features, a positive integer determining the interval at which windows can be specified. If used as the datetime partition column on a time series project, the feature derivation and forecast windows must start and end at an integer multiple of this value. None for features that are not time series eligible.
time_unitstr, optional: For time series eligible features, the time unit covered by a single time step, e.g. ‘HOUR’, or None for features that are not time series eligible.
target_leakagestr, optional: Whether a feature is considered to have target leakage or not. A value of ‘SKIPPED_DETECTION’ indicates that target leakage detection was not run on the feature. ‘FALSE’ indicates no leakage, ‘MODERATE’ indicates a moderate risk of target leakage, and ‘HIGH_RISK’ indicates a high risk of target leakage
target_leakage_reason: string, optional: The descriptive text explaining the reason for target leakage, if any.

get_histogram(bin_limit=None)¶

Retrieve a feature histogram

Parameters

bin_limitint or None: Desired max number of histogram bins. If omitted, by default endpoint will use 60.

Returns

featureHistogramDatasetFeatureHistogram: The requested histogram with desired number or bins

class datarobot.models.DatasetFeatureHistogram(plot)¶

classmethod get(dataset_id, feature_name, bin_limit=None, key_name=None)¶

Retrieve a single feature histogram

Parameters

dataset_idstr: The ID of the Dataset the feature is associated with.
feature_namestr: The name of the feature to retrieve
bin_limitint or None: Desired max number of histogram bins. If omitted, by default the endpoint will use 60.
key_name: string or None: (Only required for summarized categorical feature) Name of the top 50 keys for which plot to be retrieved

Returns

featureHistogramFeatureHistogram: The queried instance with plot attribute in it.

class datarobot.models.FeatureHistogram(plot)¶

classmethod get(project_id, feature_name, bin_limit=None, key_name=None)¶

Retrieve a single feature histogram

Parameters

project_idstr: The ID of the project the feature is associated with.
feature_namestr: The name of the feature to retrieve
bin_limitint or None: Desired max number of histogram bins. If omitted, by default endpoint will use 60.
key_name: string or None: (Only required for summarized categorical feature) Name of the top 50 keys for which plot to be retrieved

Returns

featureHistogramFeatureHistogram: The queried instance with plot attribute in it.

class datarobot.models.InteractionFeature(rows, source_columns, bars, bubbles)¶

Interaction feature data

New in version v2.21.

Attributes

rows: int: Total number of rows
source_columns: list(str): names of two categorical features which were combined into this one
bars: list(dict): dictionaries representing frequencies of each independent value from the source columns
bubbles: list(dict): dictionaries representing frequencies of each combined value in the interaction feature.

classmethod get(project_id, feature_name)¶

Retrieve a single Interaction feature

Parameters

project_idstr: The id of the project the feature belongs to
feature_namestr: The name of the Interaction feature to retrieve

Returns

featureInteractionFeature: The queried instance

class datarobot.models.MulticategoricalHistogram(feature_name, histogram)¶

Histogram for Multicategorical feature.

New in version v2.24.

Notes

HistogramValues contains:

values.[].label : string - Label name
values.[].plot : list - Histogram for label
values.[].plot.[].label_relevance : int - Label relevance value
values.[].plot.[].row_count : int - Row count where label has given relevance
values.[].plot.[].row_pct : float - Percentage of rows where label has given relevance

Attributes

feature_namestr: Name of the feature
valueslist(dict): List of Histogram values with a schema described as HistogramValues

classmethod get(multilabel_insights_key)¶

Retrieves multicategorical histogram

You might find it more convenient to use Feature.get_multicategorical_histogram instead.

Parameters

multilabel_insights_key: string: Key for multilabel insights, unique for a project, feature and EDA stage combination. The multilabel_insights_key can be retrieved via Feature.multilabel_insights_key.

Returns

MulticategoricalHistogram: The multicategorical histogram for multilabel_insights_key

to_dataframe()¶

Convenience method to get all the information from this multicategorical_histogram instance in form of a pandas.DataFrame.

Returns

pandas.DataFrame: Histogram information as a multicategorical_histogram. The dataframe will contain these columns: feature_name, label, label_relevance, row_count and row_pct

class datarobot.models.PairwiseCorrelations(*args, **kwargs)¶

Correlation of label pairs for multicategorical feature.

New in version v2.24.

Notes

CorrelationValues contain:

values.[].label_configuration : list of length 2 - Configuration of the label pair
values.[].label_configuration.[].label : str – Label name
values.[].statistic_value : float – Statistic value

Attributes

feature_namestr: Name of the feature
valueslist(dict): List of correlation values with a schema described as CorrelationValues
statistic_dataframepandas.DataFrame: Correlation values for all label pairs as a DataFrame

classmethod get(multilabel_insights_key)¶

Retrieves pairwise correlations

You might find it more convenient to use Feature.get_pairwise_correlations instead.

Parameters

multilabel_insights_key: string: Key for multilabel insights, unique for a project, feature and EDA stage combination. The multilabel_insights_key can be retrieved via Feature.multilabel_insights_key.

Returns

PairwiseCorrelations: The pairwise label correlations

as_dataframe()¶

The pairwise label correlations as a (num_labels x num_labels) DataFrame.

Returns

pandas.DataFrame: The pairwise label correlations. Index and column names allow the interpretation of the values.

class datarobot.models.PairwiseJointProbabilities(*args, **kwargs)¶

Joint probabilities of label pairs for multicategorical feature.

New in version v2.24.

Notes

ProbabilityValues contain:

values.[].label_configuration : list of length 2 - Configuration of the label pair
values.[].label_configuration.[].relevance : int – 0 for absence of the labels, 1 for the presence of labels
values.[].label_configuration.[].label : str – Label name
values.[].statistic_value : float – Statistic value

Attributes

feature_namestr

Name of the feature

valueslist(dict)

List of joint probability values with a schema described as ProbabilityValues

statistic_dataframesdict(pandas.DataFrame)

Joint Probability values as DataFrames for different relevance combinations.

E.g. The probability P(A=0,B=1) can be retrieved via: pairwise_joint_probabilities.statistic_dataframes[(0,1)].loc['A', 'B']

classmethod get(multilabel_insights_key)¶

Retrieves pairwise joint probabilities

You might find it more convenient to use Feature.get_pairwise_joint_probabilities instead.

Parameters

multilabel_insights_key: string: Key for multilabel insights, unique for a project, feature and EDA stage combination. The multilabel_insights_key can be retrieved via Feature.multilabel_insights_key.

Returns

PairwiseJointProbabilities: The pairwise joint probabilities

as_dataframe(relevance_configuration)¶

Joint probabilities of label pairs as a (num_labels x num_labels) DataFrame.

Parameters

relevance_configuration: tuple of length 2

Valid options are (0, 0), (0, 1), (1, 0) and (1, 1). Values of 0 indicate absence of labels and 1 indicates presence of labels. The first value describes the presence for the labels in axis=0 and the second value describes the presence for the labels in axis=1.

For example the matrix values for a relevance configuration of (0, 1) describe the probabilities of absent labels in the index axis and present labels in the column axis.

E.g. The probability P(A=0,B=1) can be retrieved via: pairwise_joint_probabilities.as_dataframe((0,1)).loc['A', 'B']

Returns

pandas.DataFrame: The joint probabilities for the requested relevance_configuration. Index and column names allow the interpretation of the values.

class datarobot.models.PairwiseConditionalProbabilities(*args, **kwargs)¶

Conditional probabilities of label pairs for multicategorical feature.

New in version v2.24.

Notes

ProbabilityValues contain:

values.[].label_configuration : list of length 2 - Configuration of the label pair
values.[].label_configuration.[].relevance : int – 0 for absence of the labels, 1 for the presence of labels
values.[].label_configuration.[].label : str – Label name
values.[].statistic_value : float – Statistic value

Attributes

feature_namestr

Name of the feature

valueslist(dict)

List of conditional probability values with a schema described as ProbabilityValues

statistic_dataframesdict(pandas.DataFrame)

Conditional Probability values as DataFrames for different relevance combinations. The label names in the columns are the events, on which we condition. The label names in the index are the events whose conditional probability given the indexes is in the dataframe.

E.g. The probability P(A=0|B=1) can be retrieved via: pairwise_conditional_probabilities.statistic_dataframes[(0,1)].loc['A', 'B']

classmethod get(multilabel_insights_key)¶

Retrieves pairwise conditional probabilities

You might find it more convenient to use Feature.get_pairwise_conditional_probabilities instead.

Parameters

multilabel_insights_key: string: Key for multilabel insights, unique for a project, feature and EDA stage combination. The multilabel_insights_key can be retrieved via Feature.multilabel_insights_key.

Returns

PairwiseConditionalProbabilities: The pairwise conditional probabilities

as_dataframe(relevance_configuration)¶

Conditional probabilities of label pairs as a (num_labels x num_labels) DataFrame. The label names in the columns are the events, on which we condition. The label names in the index are the events whose conditional probability given the indexes is in the dataframe.

E.g. The probability P(A=0|B=1) can be retrieved via: pairwise_conditional_probabilities.as_dataframe((0, 1)).loc['A', 'B']

Parameters

relevance_configuration: tuple of length 2

Valid options are (0, 0), (0, 1), (1, 0) and (1, 1). Values of 0 indicate absence of labels and 1 indicates presence of labels. The first value describes the presence for the labels in axis=0 and the second value describes the presence for the labels in axis=1.

For example the matrix values for a relevance configuration of (0, 1) describe the probabilities of absent labels in the index axis given the presence of labels in the column axis.

Returns

pandas.DataFrame: The conditional probabilities for the requested relevance_configuration. Index and column names allow the interpretation of the values.

Feature Association¶

class datarobot.models.FeatureAssociationMatrix(strengths=None, features=None, project_id=None)¶

Feature association statistics for a project.

Note

Projects created prior to v2.17 are not supported by this feature.

Examples

import datarobot as dr

# retrieve feature association matrix
feature_association_matrix = dr.FeatureAssociationMatrix.get(project_id)
feature_association_matrix.strengths
feature_association_matrix.features

# retrieve feature association matrix for a metric, association type or a feature list
feature_association_matrix = dr.FeatureAssociationMatrix.get(
    project_id,
    metric=enums.FEATURE_ASSOCIATION_METRIC.SPEARMAN,
    association_type=enums.FEATURE_ASSOCIATION_TYPE.CORRELATION,
    featurelist_id=featurelist_id,
)

Attributes

project_idstr: Id of the associated project.
strengthslist of dict: Pairwise statistics for the available features as structured below.
featureslist of dict: Metadata for each feature and where it goes in the matrix.

classmethod get(project_id, metric=None, association_type=None, featurelist_id=None)¶

Get feature association statistics.

Parameters

project_idstr: Id of the project that contains the requested associations.
metricenums.FEATURE_ASSOCIATION_METRIC: The name of a metric to get pairwise data for. Since ‘v2.19’ this is optional and defaults to enums.FEATURE_ASSOCIATION_METRIC.MUTUAL_INFO.
association_typeenums.FEATURE_ASSOCIATION_TYPE: The type of dependence for the data. Since ‘v2.19’ this is optional and defaults to enums.FEATURE_ASSOCIATION_TYPE.ASSOCIATION.
featurelist_idstr or None: Optional, the feature list to lookup FAM data for. By default, depending on the type of the project “Informative Features” or “Timeseries Informative Features” list will be used. (New in version v2.19)

Returns

FeatureAssociationMatrix: Feature association pairwise metric strength data, feature clustering data, and ordering data for Feature Association Matrix visualization.

Return type: FeatureAssociationMatrix

Feature Association Matrix Details¶

class datarobot.models.FeatureAssociationMatrixDetails(project_id=None, chart_type=None, values=None, features=None, types=None, featurelist_id=None)¶

Plotting details for a pair of passed features present in the feature association matrix.

Note

Projects created prior to v2.17 are not supported by this feature.

Attributes

project_idstr: Id of the project that contains the requested associations.
chart_typestr: Which type of plotting the pair of features gets in the UI. e.g. ‘HORIZONTAL_BOX’, ‘VERTICAL_BOX’, ‘SCATTER’ or ‘CONTINGENCY’
valueslist: The data triplets for pairwise plotting e.g. {“values”: [[460.0, 428.5, 0.001], [1679.3, 259.0, 0.001], …] The first entry of each list is a value of feature1, the second entry of each list is a value of feature2, and the third is the relative frequency of the pair of datapoints in the sample.
featureslist: A list of the requested features, [feature1, feature2]
typeslist: The type of feature1 and feature2. Possible values: “CATEGORICAL”, “NUMERIC”
featurelist_idstr: Id of the feature list to lookup FAM details for.

classmethod get(project_id, feature1, feature2, featurelist_id=None)¶

Get a sample of the actual values used to measure the association between a pair of features

New in version v2.17.

Parameters

project_idstr: Id of the project of interest.
feature1str: Feature name for the first feature of interest.
feature2str: Feature name for the second feature of interest.
featurelist_idstr: Optional, the feature list to lookup FAM data for. By default, depending on the type of the project “Informative Features” or “Timeseries Informative Features” list will be used.

Returns

FeatureAssociationMatrixDetails: The feature association plotting for provided pair of features.

Return type: FeatureAssociationMatrixDetails

Feature Association Featurelists¶

class datarobot.models.FeatureAssociationFeaturelists(project_id=None, featurelists=None)¶

Featurelists with feature association matrix availability flags for a project.

Attributes

project_idstr: Id of the project that contains the requested associations.
featurelistslist fo dict: The featurelists with the featurelist_id, title and the has_fam flag.

classmethod get(project_id)¶

Get featurelists with feature association status for each.

Parameters

project_idstr: Id of the project of interest.

Returns

FeatureAssociationFeaturelists: Featurelist with feature association status for each.

Return type: FeatureAssociationFeaturelists

Feature Discovery¶

Relationships Configuration¶

class datarobot.models.RelationshipsConfiguration(id, dataset_definitions=None, relationships=None, feature_discovery_mode=None, feature_discovery_settings=None)¶

A Relationships configuration specifies a set of secondary datasets as well as the relationships among them. It is used to configure Feature Discovery for a project to generate features automatically from these datasets.

Attributes

idstring

Id of the created relationships configuration

dataset_definitions: list

Each element is a dataset_definitions for a dataset.

relationships: list

Each element is a relationship between two datasets

feature_discovery_mode: str

Mode of feature discovery. Supported values are ‘default’ and ‘manual’

feature_discovery_settings: list

List of feature discovery settings used to customize the feature discovery process

The `dataset_definitions` structure is

identifier: string

Alias of the dataset (used directly as part of the generated feature names)

catalog_id: str, or None

Identifier of the catalog item

catalog_version_id: str

Identifier of the catalog item version

primary_temporal_key: string, optional

Name of the column indicating time of record creation

feature_list_id: string, optional

Identifier of the feature list. This decides which columns in the dataset are used for feature generation

snapshot_policy: str

Policy to use when creating a project or making predictions. Must be one of the following values: ‘specified’: Use specific snapshot specified by catalogVersionId ‘latest’: Use latest snapshot from the same catalog item ‘dynamic’: Get data from the source (only applicable for JDBC datasets)

feature_lists: list

List of feature list info

data_source: dict

Data source info if the dataset is from data source

data_sources: list

List of Data source details for a JDBC datasets

is_deleted: bool, optional

Whether the dataset is deleted or not

The `data source info` structured is

data_store_id: str

Id of the data store.

data_store_namestr

User-friendly name of the data store.

urlstr

Url used to connect to the data store.

dbtablestr

Name of table from the data store.

schema: str

Schema definition of the table from the data store

catalog: str

Catalog name of the data source.

The `feature list info` structure is

idstr

Id of the featurelist

namestr

Name of the featurelist

featureslist of str

Names of all the Features in the featurelist

dataset_idstr

Project the featurelist belongs to

creation_datedatetime.datetime

When the featurelist was created

user_createdbool

Whether the featurelist was created by a user or by DataRobot automation

created_by: str

Name of user who created it

descriptionstr

Description of the featurelist. Can be updated by the user and may be supplied by default for DataRobot-created featurelists.

dataset_id: str

Dataset which is associated with the feature list

dataset_version_id: str or None

Version of the dataset which is associated with feature list. Only relevant for Informative features

The `relationships` schema is

dataset1_identifier: str or None

Identifier of the first dataset in this relationship. This is specified in the identifier field of dataset_definition structure. If None, then the relationship is with the primary dataset.

dataset2_identifier: str

Identifier of the second dataset in this relationship. This is specified in the identifier field of dataset_definition schema.

dataset1_keys: list of str (max length: 10 min length: 1)

Column(s) from the first dataset which are used to join to the second dataset

dataset2_keys: list of str (max length: 10 min length: 1)

Column(s) from the second dataset that are used to join to the first dataset

time_unit: str, or None

Time unit of the feature derivation window. Supported values are MILLISECOND, SECOND, MINUTE, HOUR, DAY, WEEK, MONTH, QUARTER, YEAR. If present, the feature engineering Graph will perform time-aware joins.

feature_derivation_window_start: int, or None

How many time_units of each dataset’s primary temporal key into the past relative to the datetimePartitionColumn the feature derivation window should begin. Will be a negative integer, If present, the feature engineering Graph will perform time-aware joins.

feature_derivation_window_end: int, or None

How many timeUnits of each dataset’s record primary temporal key into the past relative to the datetimePartitionColumn the feature derivation window should end. Will be a non-positive integer, if present. If present, the feature engineering Graph will perform time-aware joins.

feature_derivation_window_time_unit: int or None

Time unit of the feature derivation window. Supported values are MILLISECOND, SECOND, MINUTE, HOUR, DAY, WEEK, MONTH, QUARTER, YEAR If present, time-aware joins will be used. Only applicable when dataset1Identifier is not provided.

feature_derivation_windows: list of dict, or None

List of feature derivation windows settings. If present, time-aware joins will be used. Only allowed when feature_derivation_window_start, feature_derivation_window_end and feature_derivation_window_time_unit are not provided.

prediction_point_rounding: int, or None

Closest value of prediction_point_rounding_time_unit to round the prediction point into the past when applying the feature derivation window. Will be a positive integer, if present.Only applicable when dataset1_identifier is not provided.

prediction_point_rounding_time_unit: str, or None

time unit of the prediction point rounding. Supported values are MILLISECOND, SECOND, MINUTE, HOUR, DAY, WEEK, MONTH, QUARTER, YEAR Only applicable when dataset1_identifier is not provided.

The `feature_derivation_windows` is a list of dictionary with schema:

start: int: How many time_units of each dataset’s primary temporal key into the past relative to the datetimePartitionColumn the feature derivation window should begin.
end: int: How many timeUnits of each dataset’s record primary temporal key into the past relative to the datetimePartitionColumn the feature derivation window should end.
unit: string: Time unit of the feature derivation window. One of datarobot.enums.AllowedTimeUnitsSAFER.

The `feature_discovery_settings` structure is:

name: str

Name of the feature discovery setting

value: bool

Value of the feature discovery setting

To see the list of possible settings, create a RelationshipConfiguration without specifying

settings and check its `feature_discovery_settings` attribute, which is a list of possible

settings with their default values.

classmethod create(dataset_definitions, relationships, feature_discovery_settings=None)¶

Create a Relationships Configuration

Parameters

dataset_definitions: list of dataset definitions: Each element is a datarobot.helpers.feature_discovery.DatasetDefinition
relationships: list of relationships: Each element is a datarobot.helpers.feature_discovery.Relationship
feature_discovery_settingslist of feature discovery settings, optional: Each element is a dictionary or a datarobot.helpers.feature_discovery.FeatureDiscoverySetting. If not provided, default settings will be used.

Returns

relationships_configuration: RelationshipsConfiguration: Created relationships configuration

Examples

dataset_definition = dr.DatasetDefinition(
    identifier='profile',
    catalog_id='5fd06b4af24c641b68e4d88f',
    catalog_version_id='5fd06b4af24c641b68e4d88f'
)
relationship = dr.Relationship(
    dataset2_identifier='profile',
    dataset1_keys=['CustomerID'],
    dataset2_keys=['CustomerID'],
    feature_derivation_window_start=-14,
    feature_derivation_window_end=-1,
    feature_derivation_window_time_unit='DAY',
    prediction_point_rounding=1,
    prediction_point_rounding_time_unit='DAY'
)
dataset_definitions = [dataset_definition]
relationships = [relationship]
relationship_config = dr.RelationshipsConfiguration.create(
    dataset_definitions=dataset_definitions,
    relationships=relationships,
    feature_discovery_settings = [
        {'name': 'enable_categorical_statistics', 'value': True},
        {'name': 'enable_numeric_skewness', 'value': True},
    ]
)
>>> relationship_config.id
'5c88a37770fc42a2fcc62759'

get()¶

Retrieve the Relationships configuration for a given id

Returns

relationships_configuration: RelationshipsConfiguration: The requested relationships configuration

Raises

ClientError: Raised if an invalid relationships config id is provided.

Examples

relationships_config = dr.RelationshipsConfiguration(valid_config_id)
result = relationships_config.get()
>>> result.id
'5c88a37770fc42a2fcc62759'

replace(dataset_definitions, relationships, feature_discovery_settings=None)¶

Update the Relationships Configuration which is not used in the feature discovery Project

Parameters

dataset_definitions: list of dataset definition: Each element is a datarobot.helpers.feature_discovery.DatasetDefinition
relationships: list of relationships: Each element is a datarobot.helpers.feature_discovery.Relationship
feature_discovery_settingslist of feature discovery settings, optional: Each element is a dictionary or a datarobot.helpers.feature_discovery.FeatureDiscoverySetting. If not provided, default settings will be used.

Returns

relationships_configuration: RelationshipsConfiguration: the updated relationships configuration

delete()¶

Delete the Relationships configuration

Raises

ClientError: Raised if an invalid relationships config id is provided.

Examples

# Deleting with a valid id
relationships_config = dr.RelationshipsConfiguration(valid_config_id)
status_code = relationships_config.delete()
status_code
>>> 204
relationships_config.get()
>>> ClientError: Relationships Configuration not found

Dataset Definition¶

class datarobot.helpers.feature_discovery.DatasetDefinition(identifier, catalog_id, catalog_version_id, snapshot_policy='latest', feature_list_id=None, primary_temporal_key=None)¶

Dataset definition for the Feature Discovery

New in version v2.25.

Examples

import datarobot as dr
dataset_definition = dr.DatasetDefinition(
    identifier='profile',
    catalog_id='5ec4aec1f072bc028e3471ae',
    catalog_version_id='5ec4aec2f072bc028e3471b1',
)

dataset_definition = dr.DatasetDefinition(
    identifier='transaction',
    catalog_id='5ec4aec1f072bc028e3471ae',
    catalog_version_id='5ec4aec2f072bc028e3471b1',
    primary_temporal_key='Date'
)

Attributes

identifier: string: Alias of the dataset (used directly as part of the generated feature names)
catalog_id: string, optional: Identifier of the catalog item
catalog_version_id: string: Identifier of the catalog item version
primary_temporal_key: string, optional: Name of the column indicating time of record creation
feature_list_id: string, optional: Identifier of the feature list. This decides which columns in the dataset are used for feature generation
snapshot_policy: string, optional: Policy to use when creating a project or making predictions. If omitted, by default endpoint will use ‘latest’. Must be one of the following values: ‘specified’: Use specific snapshot specified by catalogVersionId ‘latest’: Use latest snapshot from the same catalog item ‘dynamic’: Get data from the source (only applicable for JDBC datasets)

Relationship¶

class datarobot.helpers.feature_discovery.Relationship(dataset2_identifier, dataset1_keys, dataset2_keys, dataset1_identifier=None, feature_derivation_window_start=None, feature_derivation_window_end=None, feature_derivation_window_time_unit=None, feature_derivation_windows=None, prediction_point_rounding=None, prediction_point_rounding_time_unit=None)¶

Relationship between dataset defined in DatasetDefinition

New in version v2.25.

Examples

import datarobot as dr
relationship = dr.Relationship(
    dataset1_identifier='profile',
    dataset2_identifier='transaction',
    dataset1_keys=['CustomerID'],
    dataset2_keys=['CustomerID']
)

relationship = dr.Relationship(
    dataset2_identifier='profile',
    dataset1_keys=['CustomerID'],
    dataset2_keys=['CustomerID'],
    feature_derivation_window_start=-14,
    feature_derivation_window_end=-1,
    feature_derivation_window_time_unit='DAY',
    prediction_point_rounding=1,
    prediction_point_rounding_time_unit='DAY'
)

Attributes

dataset1_identifier: string, optional

Identifier of the first dataset in this relationship. This is specified in the identifier field of dataset_definition structure. If None, then the relationship is with the primary dataset.

dataset2_identifier: string

Identifier of the second dataset in this relationship. This is specified in the identifier field of dataset_definition schema.

dataset1_keys: list of string (max length: 10 min length: 1)

Column(s) from the first dataset which are used to join to the second dataset

dataset2_keys: list of string (max length: 10 min length: 1)

Column(s) from the second dataset that are used to join to the first dataset

feature_derivation_window_start: int, or None

How many time_units of each dataset’s primary temporal key into the past relative to the datetimePartitionColumn the feature derivation window should begin. Will be a negative integer, If present, the feature engineering Graph will perform time-aware joins.

feature_derivation_window_end: int, optional

How many timeUnits of each dataset’s record primary temporal key into the past relative to the datetimePartitionColumn the feature derivation window should end. Will be a non-positive integer, if present. If present, the feature engineering Graph will perform time-aware joins.

feature_derivation_window_time_unit: int, optional

Time unit of the feature derivation window. One of datarobot.enums.AllowedTimeUnitsSAFER If present, time-aware joins will be used. Only applicable when dataset1_identifier is not provided.

feature_derivation_windows: list of dict, or None

List of feature derivation windows settings. If present, time-aware joins will be used. Only allowed when feature_derivation_window_start, feature_derivation_window_end and feature_derivation_window_time_unit are not provided.

prediction_point_rounding: int, optional

Closest value of prediction_point_rounding_time_unit to round the prediction point into the past when applying the feature derivation window. Will be a positive integer, if present.Only applicable when dataset1_identifier is not provided.

prediction_point_rounding_time_unit: string, optional

Time unit of the prediction point rounding. One of datarobot.enums.AllowedTimeUnitsSAFER Only applicable when dataset1_identifier is not provided.

The `feature_derivation_windows` is a list of dictionary with schema:

start: int: How many time_units of each dataset’s primary temporal key into the past relative to the datetimePartitionColumn the feature derivation window should begin.
end: int: How many timeUnits of each dataset’s record primary temporal key into the past relative to the datetimePartitionColumn the feature derivation window should end.
unit: string: Time unit of the feature derivation window. One of datarobot.enums.AllowedTimeUnitsSAFER.

Feature Lineage¶

class datarobot.models.FeatureLineage(steps=None)¶

Lineage of an automatically engineered feature.

Attributes

steps: list: list of steps which were applied to build the feature.
`steps` structure is:
idint: step id starting with 0.
step_type: str: one of the data/action/json/generatedData.
name: str: name of the step.
description: str: description of the step.
parents: list[int]: references to other steps id.
is_time_aware: bool: indicator of step being time aware. Mandatory only for action and join steps. action step provides additional information about feature derivation window in the timeInfo field.
catalog_id: str: id of the catalog for a data step.
catalog_version_id: str: id of the catalog version for a data step.
group_by: list[str]: list of columns which this action step aggregated by.
columns: list: names of columns involved into the feature generation. Available only for data steps.
time_info: dict: description of the feature derivation window which was applied to this action step.
join_info: list[dict]: join step details.
`columns` structure is
data_type: str: the type of the feature, e.g. ‘Categorical’, ‘Text’
is_input: bool: indicates features which provided data to transform in this lineage.
name: str: feature name.
is_cutoff: bool: indicates a cutoff column.
`time_info` structure is:
latest: dict: end of the feature derivation window applied.
duration: dict: size of the feature derivation window applied.
`latest` and `duration` structure is:
time_unit: str: time unit name like ‘MINUTE’, ‘DAY’, ‘MONTH’ etc.
duration: int: value/size of this duration object.
`join_info` structure is:
join_type: str: kind of join, left/right.
left_table: dict: information about a dataset which was considered as left.
right_table: str: information about a dataset which was considered as right.
`left_table` and `right_table` structure is:
columns: list[str]: list of columns which datasets were joined by.
datasteps: list[int]: list of data steps id which brought the columns into the current step dataset.

classmethod get(project_id, id)¶

Retrieve a single FeatureLineage.

Parameters

project_idstr: The id of the project the feature belongs to
idstr: id of a feature lineage to retrieve

Returns

lineageFeatureLineage: The queried instance

Secondary Dataset Configurations¶

class datarobot.models.SecondaryDatasetConfigurations(id, project_id, config=None, secondary_datasets=None, name=None, creator_full_name=None, creator_user_id=None, created=None, featurelist_id=None, credential_ids=None, is_default=None, project_version=None)¶

Create secondary dataset configurations for a given project

New in version v2.20.

Attributes

idstr: Id of this secondary dataset configuration
project_idstr: Id of the associated project.
config: list of DatasetConfiguration (Deprecated in version v2.23): List of secondary dataset configurations
secondary_datasets: list of SecondaryDataset (new in v2.23): List of secondary datasets (secondaryDataset)
name: str: Verbose name of the SecondaryDatasetConfig. null if it wasn’t specified.
created: datetime.datetime: DR-formatted datetime. null for legacy (before DR 6.0) db records.
creator_user_id: str: Id of the user created this config.
creator_full_name: str: fullname or email of the user created this config.
featurelist_id: str, optional: Id of the feature list. null if it wasn’t specified.
credential_ids: list of DatasetsCredentials, optional: credentials used by the secondary datasets if the datasets used in the configuration are from datasource
is_default: bool, optional: Boolean flag if default config created during feature discovery aim
project_version: str, optional: Version of project when its created (Release version)

classmethod create(project_id, secondary_datasets, name, featurelist_id=None)¶

create secondary dataset configurations

New in version v2.20.

Parameters

project_idstr: id of the associated project.
secondary_datasets: list of SecondaryDataset (New in version v2.23): list of secondary datasets used by the configuration each element is a datarobot.helpers.feature_discovery.SecondaryDataset
name: str (New in version v2.23): Name of the secondary datasets configuration
featurelist_id: str, or None (New in version v2.23): Id of the featurelist

Returns

an instance of SecondaryDatasetConfigurations

Raises

ClientError: raised if incorrect configuration parameters are provided

Examples

profile_secondary_dataset = dr.SecondaryDataset(
    identifier='profile',
    catalog_id='5ec4aec1f072bc028e3471ae',
    catalog_version_id='5ec4aec2f072bc028e3471b1',
    snapshot_policy='latest'
)

transaction_secondary_dataset = dr.SecondaryDataset(
    identifier='transaction',
    catalog_id='5ec4aec268f0f30289a03901',
    catalog_version_id='5ec4aec268f0f30289a03900',
    snapshot_policy='latest'
)

secondary_datasets = [profile_secondary_dataset, transaction_secondary_dataset]
new_secondary_dataset_config = dr.SecondaryDatasetConfigurations.create(
    project_id=project.id,
    name='My config',
    secondary_datasets=secondary_datasets
)

>>> new_secondary_dataset_config.id
'5fd1e86c589238a4e635e93d'

Return type: SecondaryDatasetConfigurations

delete()¶

Removes the Secondary datasets configuration

New in version v2.21.

Raises

ClientError: Raised if an invalid or already deleted secondary dataset config id is provided

Examples

# Deleting with a valid secondary_dataset_config id
status_code = dr.SecondaryDatasetConfigurations.delete(some_config_id)
status_code
>>> 204

Return type: None

get()¶

Retrieve a single secondary dataset configuration for a given id

New in version v2.21.

Returns

secondary_dataset_configurationsSecondaryDatasetConfigurations: The requested secondary dataset configurations

Examples

config_id = '5fd1e86c589238a4e635e93d'
secondary_dataset_config = dr.SecondaryDatasetConfigurations(id=config_id).get()
>>> secondary_dataset_config
{
     'created': datetime.datetime(2020, 12, 9, 6, 16, 22, tzinfo=tzutc()),
     'creator_full_name': u'[email protected]',
     'creator_user_id': u'asdf4af1gf4bdsd2fba1de0a',
     'credential_ids': None,
     'featurelist_id': None,
     'id': u'5fd1e86c589238a4e635e93d',
     'is_default': True,
     'name': u'My config',
     'project_id': u'5fd06afce2456ec1e9d20457',
     'project_version': None,
     'secondary_datasets': [
            {
                'snapshot_policy': u'latest',
                'identifier': u'profile',
                'catalog_version_id': u'5fd06b4af24c641b68e4d88f',
                'catalog_id': u'5fd06b4af24c641b68e4d88e'
            },
            {
                'snapshot_policy': u'dynamic',
                'identifier': u'transaction',
                'catalog_version_id': u'5fd1e86c589238a4e635e98e',
                'catalog_id': u'5fd1e86c589238a4e635e98d'
            }
     ]
}

Return type: SecondaryDatasetConfigurations

classmethod list(project_id, featurelist_id=None, limit=None, offset=None)¶

Returns list of secondary dataset configurations.

New in version v2.23.

Parameters

project_id: str: The Id of project
featurelist_id: str, optional: Id of the feature list to filter the secondary datasets configurations

Returns

secondary_dataset_configurationslist of SecondaryDatasetConfigurations: The requested list of secondary dataset configurations for a given project

Examples

pid = '5fd06afce2456ec1e9d20457'
secondary_dataset_configs = dr.SecondaryDatasetConfigurations.list(pid)
>>> secondary_dataset_configs[0]
    {
         'created': datetime.datetime(2020, 12, 9, 6, 16, 22, tzinfo=tzutc()),
         'creator_full_name': u'[email protected]',
         'creator_user_id': u'asdf4af1gf4bdsd2fba1de0a',
         'credential_ids': None,
         'featurelist_id': None,
         'id': u'5fd1e86c589238a4e635e93d',
         'is_default': True,
         'name': u'My config',
         'project_id': u'5fd06afce2456ec1e9d20457',
         'project_version': None,
         'secondary_datasets': [
                {
                    'snapshot_policy': u'latest',
                    'identifier': u'profile',
                    'catalog_version_id': u'5fd06b4af24c641b68e4d88f',
                    'catalog_id': u'5fd06b4af24c641b68e4d88e'
                },
                {
                    'snapshot_policy': u'dynamic',
                    'identifier': u'transaction',
                    'catalog_version_id': u'5fd1e86c589238a4e635e98e',
                    'catalog_id': u'5fd1e86c589238a4e635e98d'
                }
         ]
    }

Return type: List[SecondaryDatasetConfigurations]

Secondary Dataset¶

class datarobot.helpers.feature_discovery.SecondaryDataset(identifier, catalog_id, catalog_version_id, snapshot_policy='latest')¶

A secondary dataset to be used for feature discovery

New in version v2.25.

Examples

import datarobot as dr
dataset_definition = dr.SecondaryDataset(
    identifier='profile',
    catalog_id='5ec4aec1f072bc028e3471ae',
    catalog_version_id='5ec4aec2f072bc028e3471b1',
)

Attributes

identifier: string: Alias of the dataset (used directly as part of the generated feature names)
catalog_id: string: Identifier of the catalog item
catalog_version_id: string: Identifier of the catalog item version
snapshot_policy: string, optional: Policy to use while creating a project or making predictions. If omitted, by default endpoint will use ‘latest’. Must be one of the following values: ‘specified’: Use specific snapshot specified by catalogVersionId ‘latest’: Use latest snapshot from the same catalog item ‘dynamic’: Get data from the source (only applicable for JDBC datasets)

Feature Effects¶

class datarobot.models.FeatureEffects(project_id, model_id, source, feature_effects, data_slice_id=None, backtest_index=None)¶

Feature Effects provides partial dependence and predicted vs actual values for top-500 features ordered by feature impact score.

The partial dependence shows marginal effect of a feature on the target variable after accounting for the average effects of all other predictive features. It indicates how, holding all other variables except the feature of interest as they were, the value of this feature affects your prediction.

Notes

featureEffects is a dict containing the following:

feature_name (string) Name of the feature

feature_type (string) dr.enums.FEATURE_TYPE, Feature type either numeric, categorical or datetime

feature_impact_score (float) Feature impact score

weight_label (string) optional, Weight label if configured for the project else null

partial_dependence (List) Partial dependence results

predicted_vs_actual (List) optional, Predicted versus actual results, may be omitted if there are insufficient qualified samples

partial_dependence is a dict containing the following:

is_capped (bool) Indicates whether the data for computation is capped
data (List) partial dependence results in the following format

data is a list of dict containing the following:

label (string) Contains label for categorical and numeric features as string
dependence (float) Value of partial dependence

predicted_vs_actual is a dict containing the following:

is_capped (bool) Indicates whether the data for computation is capped
data (List) pred vs actual results in the following format

data is a list of dict containing the following:

label (string) Contains label for categorical features for numeric features contains range or numeric value.
bin (List) optional, For numeric features contains labels for left and right bin limits
predicted (float) Predicted value
actual (float) Actual value. Actual value is null for unsupervised timeseries models
row_count (int or float) Number of rows for the label and bin. Type is float if weight or exposure is set for the project.

Attributes

project_id: string: The project that contains requested model
model_id: string: The model to retrieve Feature Effects for
source: string: The source to retrieve Feature Effects for
data_slice_id: string or None: The slice to retrieve Feature Effects for; if None, retrieve unsliced data
feature_effects: list: Feature Effects for every feature
backtest_index: string, required only for DatetimeModels,: The backtest index to retrieve Feature Effects for.

classmethod from_server_data(data, *args, use_insights_format=False, **kwargs)¶

Instantiate an object of this class using the data directly from the server, meaning that the keys may have the wrong camel casing.

Parameters

datadict: The directly translated dict of JSON from the server. No casing fixes have taken place
use_insights_formatbool, optional: Whether to repack the data from the format used in the GET /insights/featureEffects/ URL to the format used in the legacy URL.

class datarobot.models.FeatureEffectMetadata(status, sources)¶

Feature Effect Metadata for model, contains status and available model sources.

Notes

source is expected parameter to retrieve Feature Effect. One of provided sources shall be used.

class datarobot.models.FeatureEffectMetadataDatetime(data)¶

Feature Effect Metadata for datetime model, contains list of feature effect metadata per backtest.

Notes

feature effect metadata per backtest contains:

status : string.
backtest_index : string.
sources : list(string).

source is expected parameter to retrieve Feature Effect. One of provided sources shall be used.

backtest_index is expected parameter to submit compute request and retrieve Feature Effect. One of provided backtest indexes shall be used.

Attributes

datalist[FeatureEffectMetadataDatetimePerBacktest]: List feature effect metadata per backtest

class datarobot.models.FeatureEffectMetadataDatetimePerBacktest(ff_metadata_datetime_per_backtest)¶: Convert dictionary into feature effect metadata per backtest which contains backtest_index, status and sources.

Feature List¶

class datarobot.DatasetFeaturelist(id=None, name=None, features=None, dataset_id=None, dataset_version_id=None, creation_date=None, created_by=None, user_created=None, description=None)¶

A set of features attached to a dataset in the AI Catalog

Attributes

idstr: the id of the dataset featurelist
dataset_idstr: the id of the dataset the featurelist belongs to
dataset_version_id: str, optional: the version id of the dataset this featurelist belongs to
namestr: the name of the dataset featurelist
featureslist of str: a list of the names of features included in this dataset featurelist
creation_datedatetime.datetime: when the featurelist was created
created_bystr: the user name of the user who created this featurelist
user_createdbool: whether the featurelist was created by a user or by DataRobot automation
descriptionstr, optional: the description of the featurelist. Only present on DataRobot-created featurelists.

classmethod get(dataset_id, featurelist_id)¶

Retrieve a dataset featurelist

Parameters

dataset_idstr: the id of the dataset the featurelist belongs to
featurelist_idstr: the id of the dataset featurelist to retrieve

Returns

featurelistDatasetFeatureList: the specified featurelist

Return type: TypeVar(TDatasetFeaturelist, bound= DatasetFeaturelist)

delete()¶

Delete a dataset featurelist

Featurelists configured into the dataset as a default featurelist cannot be deleted.

Return type: None

update(name=None)¶

Update the name of an existing featurelist

Note that only user-created featurelists can be renamed, and that names must not conflict with names used by other featurelists.

Parameters

namestr, optional: the new name for the featurelist

Return type: None

class datarobot.models.Featurelist(id=None, name=None, features=None, project_id=None, created=None, is_user_created=None, num_models=None, description=None)¶

A set of features used in modeling

Attributes

idstr: the id of the featurelist
namestr: the name of the featurelist
featureslist of str: the names of all the Features in the featurelist
project_idstr: the project the featurelist belongs to
createddatetime.datetime: (New in version v2.13) when the featurelist was created
is_user_createdbool: (New in version v2.13) whether the featurelist was created by a user or by DataRobot automation
num_modelsint: (New in version v2.13) the number of models currently using this featurelist. A model is considered to use a featurelist if it is used to train the model or as a monotonic constraint featurelist, or if the model is a blender with at least one component model using the featurelist.
descriptionstr: (New in version v2.13) the description of the featurelist. Can be updated by the user and may be supplied by default for DataRobot-created featurelists.

classmethod from_data(data)¶

Overrides the parent method to ensure description is always populated

Parameters

datadict: the data from the server, having gone through processing

Return type: TypeVar(TFeaturelist, bound= Featurelist)

classmethod get(project_id, featurelist_id)¶

Retrieve a known feature list

Parameters

project_idstr: The id of the project the featurelist is associated with
featurelist_idstr: The ID of the featurelist to retrieve

Returns

featurelistFeaturelist: The queried instance

Raises

ValueError: passed project_id parameter value is of not supported type

Return type: TypeVar(TFeaturelist, bound= Featurelist)

delete(dry_run=False, delete_dependencies=False)¶

Delete a featurelist, and any models and jobs using it

All models using a featurelist, whether as the training featurelist or as a monotonic constraint featurelist, will also be deleted when the deletion is executed and any queued or running jobs using it will be cancelled. Similarly, predictions made on these models will also be deleted. All the entities that are to be deleted with a featurelist are described as “dependencies” of it. To preview the results of deleting a featurelist, call delete with dry_run=True

When deleting a featurelist with dependencies, users must specify delete_dependencies=True to confirm they want to delete the featurelist and all its dependencies. Without that option, only featurelists with no dependencies may be successfully deleted and others will error.

Featurelists configured into the project as a default featurelist or as a default monotonic constraint featurelist cannot be deleted.

Featurelists used in a model deployment cannot be deleted until the model deployment is deleted.

Parameters

dry_runbool, optional: specify True to preview the result of deleting the featurelist, instead of actually deleting it.
delete_dependenciesbool, optional: specify True to successfully delete featurelists with dependencies; if left False by default, featurelists without dependencies can be successfully deleted and those with dependencies will error upon attempting to delete them.

Returns

resultdict

A dictionary describing the result of deleting the featurelist, with the following keys

dry_run : bool, whether the deletion was a dry run or an actual deletion
can_delete : bool, whether the featurelist can actually be deleted
deletion_blocked_reason : str, why the featurelist can’t be deleted (if it can’t)
num_affected_models : int, the number of models using this featurelist
num_affected_jobs : int, the number of jobs using this featurelist

Return type: DeleteFeatureListResult

classmethod from_server_data(data, keep_attrs=None)¶

Instantiate an object of this class using the data directly from the server, meaning that the keys may have the wrong camel casing

Parameters

datadict: The directly translated dict of JSON from the server. No casing fixes have taken place
keep_attrsiterable: List, set or tuple of the dotted namespace notations for attributes to keep within the object structure even if their values are None

Return type: TypeVar(T, bound= APIObject)

update(name=None, description=None)¶

Update the name or description of an existing featurelist

Note that only user-created featurelists can be renamed, and that names must not conflict with names used by other featurelists.

Parameters

namestr, optional: the new name for the featurelist
descriptionstr, optional: the new description for the featurelist

Return type: None

class datarobot.models.ModelingFeaturelist(id=None, name=None, features=None, project_id=None, created=None, is_user_created=None, num_models=None, description=None)¶

A set of features that can be used to build a model

In time series projects, a new set of modeling features is created after setting the partitioning options. These features are automatically derived from those in the project’s dataset and are the features used for modeling. Modeling features are only accessible once the target and partitioning options have been set. In projects that don’t use time series modeling, once the target has been set, ModelingFeaturelists and Featurelists will behave the same.

For more information about input and modeling features, see the time series documentation.

Attributes

idstr: the id of the modeling featurelist
project_idstr: the id of the project the modeling featurelist belongs to
namestr: the name of the modeling featurelist
featureslist of str: a list of the names of features included in this modeling featurelist
createddatetime.datetime: (New in version v2.13) when the featurelist was created
is_user_createdbool: (New in version v2.13) whether the featurelist was created by a user or by DataRobot automation
num_modelsint: (New in version v2.13) the number of models currently using this featurelist. A model is considered to use a featurelist if it is used to train the model or as a monotonic constraint featurelist, or if the model is a blender with at least one component model using the featurelist.
descriptionstr: (New in version v2.13) the description of the featurelist. Can be updated by the user and may be supplied by default for DataRobot-created featurelists.

classmethod get(project_id, featurelist_id)¶

Retrieve a modeling featurelist

Modeling featurelists can only be retrieved once the target and partitioning options have been set.

Parameters

project_idstr: the id of the project the modeling featurelist belongs to
featurelist_idstr: the id of the modeling featurelist to retrieve

Returns

featurelistModelingFeaturelist: the specified featurelist

Return type: TypeVar(TModelingFeaturelist, bound= ModelingFeaturelist)

update(name=None, description=None)¶

Update the name or description of an existing featurelist

Note that only user-created featurelists can be renamed, and that names must not conflict with names used by other featurelists.

Parameters

namestr, optional: the new name for the featurelist
descriptionstr, optional: the new description for the featurelist

Return type: None

delete(dry_run=False, delete_dependencies=False)¶

Delete a featurelist, and any models and jobs using it

All models using a featurelist, whether as the training featurelist or as a monotonic constraint featurelist, will also be deleted when the deletion is executed and any queued or running jobs using it will be cancelled. Similarly, predictions made on these models will also be deleted. All the entities that are to be deleted with a featurelist are described as “dependencies” of it. To preview the results of deleting a featurelist, call delete with dry_run=True

When deleting a featurelist with dependencies, users must specify delete_dependencies=True to confirm they want to delete the featurelist and all its dependencies. Without that option, only featurelists with no dependencies may be successfully deleted and others will error.

Featurelists configured into the project as a default featurelist or as a default monotonic constraint featurelist cannot be deleted.

Featurelists used in a model deployment cannot be deleted until the model deployment is deleted.

Parameters

dry_runbool, optional: specify True to preview the result of deleting the featurelist, instead of actually deleting it.
delete_dependenciesbool, optional: specify True to successfully delete featurelists with dependencies; if left False by default, featurelists without dependencies can be successfully deleted and those with dependencies will error upon attempting to delete them.

Returns

resultdict

A dictionary describing the result of deleting the featurelist, with the following keys

dry_run : bool, whether the deletion was a dry run or an actual deletion
can_delete : bool, whether the featurelist can actually be deleted
deletion_blocked_reason : str, why the featurelist can’t be deleted (if it can’t)
num_affected_models : int, the number of models using this featurelist
num_affected_jobs : int, the number of jobs using this featurelist

Return type: DeleteFeatureListResult

class datarobot.models.featurelist.DeleteFeatureListResult() -> new empty dictionary dict(mapping) -> new dictionary initialized from a mapping object's (key, value) pairs dict(iterable) -> new dictionary initialized as if via: d = {} for k, v in iterable: d[k] = v dict(**kwargs) -> new dictionary initialized with the name=value pairs in the keyword argument list. For example: dict(one=1, two=2)¶

Restoring Discarded Features¶

class datarobot.models.restore_discarded_features.DiscardedFeaturesInfo(total_restore_limit, remaining_restore_limit, count, features)¶

An object containing information about time series features which were reduced during time series feature generation process. These features can be restored back to the project. They will be included into All Time Series Features and can be used to create new feature lists.

New in version v2.27.

Attributes

total_restore_limitint: The total limit indicating how many features can be restored in this project.
remaining_restore_limitint: The remaining available number of the features which can be restored in this project.
featureslist of strings: Discarded features which can be restored.
countint: Discarded features count.

classmethod restore(project_id, features_to_restore, max_wait=600)¶

Restore discarded during time series feature generation process features back to the project. After restoration features will be included into All Time Series Features.

New in version v2.27.

Parameters

project_id: string
features_to_restore: list of strings: List of the feature names to restore
max_wait: int, optional: max time to wait for features to be restored. Defaults to 10 min

Returns

status: FeatureRestorationStatus: information about features which were restored and which were not.

Return type: FeatureRestorationStatus

classmethod retrieve(project_id)¶

Retrieve the discarded features information for a given project.

New in version v2.27.

Parameters

project_id: string

Returns

info: DiscardedFeaturesInfo: information about features which were discarded during feature generation process and limits how many features can be restored.

Return type: DiscardedFeaturesInfo

class datarobot.models.restore_discarded_features.FeatureRestorationStatus(warnings, features_to_restore)¶

Status of the feature restoration process.

New in version v2.27.

Attributes

warningslist of strings: Warnings generated for those features which failed to restore
remaining_restore_limitint: The remaining available number of the features which can be restored in this project.
restored_featureslist of strings: Features which were restored

Job¶

class datarobot.models.Job(data, completed_resource_url=None)¶

Tracks asynchronous work being done within a project

Attributes

idint: the id of the job
project_idstr: the id of the project the job belongs to
statusstr: the status of the job - will be one of datarobot.enums.QUEUE_STATUS
job_typestr: what kind of work the job is doing - will be one of datarobot.enums.JOB_TYPE
is_blockedbool: if true, the job is blocked (cannot be executed) until its dependencies are resolved

classmethod get(project_id, job_id)¶

Fetches one job.

Parameters

project_idstr: The identifier of the project in which the job resides
job_idstr: The job id

Returns

jobJob: The job

Raises

AsyncFailureError: Querying this resource gave a status code other than 200 or 303

Return type: Job

cancel()¶: Cancel this job. If this job has not finished running, it will be removed and canceled.

get_result(params=None)¶

Parameters

paramsdict or None: Query parameters to be added to request to get results.
For featureEffects, source param is required to define source,
otherwise the default is `training`

Returns

resultobject

Return type depends on the job type:

for model jobs, a Model is returned
for predict jobs, a pandas.DataFrame (with predictions) is returned
for featureImpact jobs, a list of dicts by default (see with_metadata parameter of the FeatureImpactJob class and its get() method).
for primeRulesets jobs, a list of Rulesets
for primeModel jobs, a PrimeModel
for primeDownloadValidation jobs, a PrimeFile
for predictionExplanationInitialization jobs, a PredictionExplanationsInitialization
for predictionExplanations jobs, a PredictionExplanations
for featureEffects, a FeatureEffects

Raises

JobNotFinished: If the job is not finished, the result is not available.
AsyncProcessUnsuccessfulError: If the job errored or was aborted

get_result_when_complete(max_wait=600, params=None)¶

Parameters

max_waitint, optional: How long to wait for the job to finish.
paramsdict, optional: Query parameters to be added to request.

Returns

result: object: Return type is the same as would be returned by Job.get_result.

Raises

AsyncTimeoutError: If the job does not finish in time
AsyncProcessUnsuccessfulError: If the job errored or was aborted

refresh()¶: Update this object with the latest job data from the server.

wait_for_completion(max_wait=600)¶

Waits for job to complete.

Parameters

max_waitint, optional: How long to wait for the job to finish.

Return type: None

class datarobot.models.TrainingPredictionsJob(data, model_id, data_subset, **kwargs)¶

classmethod get(project_id, job_id, model_id=None, data_subset=None)¶

Fetches one training predictions job.

The resulting TrainingPredictions object will be annotated with model_id and data_subset.

Parameters

project_idstr: The identifier of the project in which the job resides
job_idstr: The job id
model_idstr: The identifier of the model used for computing training predictions
data_subsetdr.enums.DATA_SUBSET, optional: Data subset used for computing training predictions

Returns

jobTrainingPredictionsJob: The job

refresh()¶: Update this object with the latest job data from the server.

cancel()¶: Cancel this job. If this job has not finished running, it will be removed and canceled.

get_result(params=None)¶

Parameters

paramsdict or None: Query parameters to be added to request to get results.
For featureEffects, source param is required to define source,
otherwise the default is `training`

Returns

resultobject

Return type depends on the job type:

for model jobs, a Model is returned
for predict jobs, a pandas.DataFrame (with predictions) is returned
for featureImpact jobs, a list of dicts by default (see with_metadata parameter of the FeatureImpactJob class and its get() method).
for primeRulesets jobs, a list of Rulesets
for primeModel jobs, a PrimeModel
for primeDownloadValidation jobs, a PrimeFile
for predictionExplanationInitialization jobs, a PredictionExplanationsInitialization
for predictionExplanations jobs, a PredictionExplanations
for featureEffects, a FeatureEffects

Raises

JobNotFinished: If the job is not finished, the result is not available.
AsyncProcessUnsuccessfulError: If the job errored or was aborted

get_result_when_complete(max_wait=600, params=None)¶

Parameters

max_waitint, optional: How long to wait for the job to finish.
paramsdict, optional: Query parameters to be added to request.

Returns

result: object: Return type is the same as would be returned by Job.get_result.

Raises

AsyncTimeoutError: If the job does not finish in time
AsyncProcessUnsuccessfulError: If the job errored or was aborted

wait_for_completion(max_wait=600)¶

Waits for job to complete.

Parameters

max_waitint, optional: How long to wait for the job to finish.

Return type: None

class datarobot.models.ShapMatrixJob(data, model_id=None, dataset_id=None, **kwargs)¶

classmethod get(project_id, job_id, model_id=None, dataset_id=None)¶

Fetches one SHAP matrix job.

Parameters

project_idstr: The identifier of the project in which the job resides
job_idstr: The job identifier
model_idstr: The identifier of the model used for computing prediction explanations
dataset_idstr: The identifier of the dataset against which prediction explanations should be computed

Returns

jobShapMatrixJob: The job

Raises

AsyncFailureError: Querying this resource gave a status code other than 200 or 303

Return type: ShapMatrixJob

refresh()¶

Update this object with the latest job data from the server.

Return type: None

cancel()¶: Cancel this job. If this job has not finished running, it will be removed and canceled.

get_result(params=None)¶

Parameters

paramsdict or None: Query parameters to be added to request to get results.
For featureEffects, source param is required to define source,
otherwise the default is `training`

Returns

resultobject

Return type depends on the job type:

for model jobs, a Model is returned
for predict jobs, a pandas.DataFrame (with predictions) is returned
for featureImpact jobs, a list of dicts by default (see with_metadata parameter of the FeatureImpactJob class and its get() method).
for primeRulesets jobs, a list of Rulesets
for primeModel jobs, a PrimeModel
for primeDownloadValidation jobs, a PrimeFile
for predictionExplanationInitialization jobs, a PredictionExplanationsInitialization
for predictionExplanations jobs, a PredictionExplanations
for featureEffects, a FeatureEffects

Raises

JobNotFinished: If the job is not finished, the result is not available.
AsyncProcessUnsuccessfulError: If the job errored or was aborted

get_result_when_complete(max_wait=600, params=None)¶

Parameters

max_waitint, optional: How long to wait for the job to finish.
paramsdict, optional: Query parameters to be added to request.

Returns

result: object: Return type is the same as would be returned by Job.get_result.

Raises

AsyncTimeoutError: If the job does not finish in time
AsyncProcessUnsuccessfulError: If the job errored or was aborted

wait_for_completion(max_wait=600)¶

Waits for job to complete.

Parameters

max_waitint, optional: How long to wait for the job to finish.

Return type: None

class datarobot.models.FeatureImpactJob(data, completed_resource_url=None, with_metadata=False)¶

Custom Feature Impact job to handle different return value structures.

The original implementation had just the the data and the new one also includes some metadata.

In general, we aim to keep the number of Job classes low by just utilizing the job_type attribute to control any specific formatting; however in this case when we needed to support a new representation with the _same_ job_type, customizing the behavior of _make_result_from_location allowed us to achieve our ends without complicating the _make_result_from_json method.

classmethod get(project_id, job_id, with_metadata=False)¶

Fetches one job.

Parameters

project_idstr: The identifier of the project in which the job resides
job_idstr: The job id
with_metadatabool: To make this job return the metadata (i.e. the full object of the completed resource) set the with_metadata flag to True.

Returns

jobJob: The job

Raises

AsyncFailureError: Querying this resource gave a status code other than 200 or 303

cancel()¶: Cancel this job. If this job has not finished running, it will be removed and canceled.

get_result(params=None)¶

Parameters

paramsdict or None: Query parameters to be added to request to get results.
For featureEffects, source param is required to define source,
otherwise the default is `training`

Returns

resultobject

Return type depends on the job type:

for model jobs, a Model is returned
for predict jobs, a pandas.DataFrame (with predictions) is returned
for featureImpact jobs, a list of dicts by default (see with_metadata parameter of the FeatureImpactJob class and its get() method).
for primeRulesets jobs, a list of Rulesets
for primeModel jobs, a PrimeModel
for primeDownloadValidation jobs, a PrimeFile
for predictionExplanationInitialization jobs, a PredictionExplanationsInitialization
for predictionExplanations jobs, a PredictionExplanations
for featureEffects, a FeatureEffects

Raises

JobNotFinished: If the job is not finished, the result is not available.
AsyncProcessUnsuccessfulError: If the job errored or was aborted

get_result_when_complete(max_wait=600, params=None)¶

Parameters

max_waitint, optional: How long to wait for the job to finish.
paramsdict, optional: Query parameters to be added to request.

Returns

result: object: Return type is the same as would be returned by Job.get_result.

Raises

AsyncTimeoutError: If the job does not finish in time
AsyncProcessUnsuccessfulError: If the job errored or was aborted

refresh()¶: Update this object with the latest job data from the server.

wait_for_completion(max_wait=600)¶

Waits for job to complete.

Parameters

max_waitint, optional: How long to wait for the job to finish.

Return type: None

Lift Chart¶

class datarobot.models.lift_chart.LiftChart(source, bins, source_model_id, target_class, data_slice_id=None)¶

Lift chart data for model.

Notes

LiftChartBin is a dict containing the following:

actual (float) Sum of actual target values in bin

predicted (float) Sum of predicted target values in bin

bin_weight (float) The weight of the bin. For weighted projects, it is the sum of the weights of the rows in the bin. For unweighted projects, it is the number of rows in the bin.

Attributes

sourcestr: Lift chart data source. Can be ‘validation’, ‘crossValidation’ or ‘holdout’.
binslist of dict: List of dicts with schema described as LiftChartBin above.
source_model_idstr: ID of the model this lift chart represents; in some cases, insights from the parent of a frozen model may be used
target_classstr, optional: For multiclass lift - target class for this lift chart data.
data_slice_id: string or None: The slice to retrieve Lift Chart for; if None, retrieve unsliced data.

classmethod from_server_data(data, keep_attrs=None, use_insights_format=False, **kwargs)¶

Overwrite APIObject.from_server_data to handle lift chart data retrieved from either legacy URL or /insights/ new URL.

Parameters

datadict: The directly translated dict of JSON from the server. No casing fixes have taken place
use_insights_formatbool, optional: Whether to repack the data from the format used in the GET /insights/liftChart/ URL to the format used in the legacy URL.

Missing Values Report¶

class datarobot.models.missing_report.MissingValuesReport(missing_values_report)¶

Missing values report for model, contains list of reports per feature sorted by missing count in descending order.

Notes

Report per feature contains:

feature : feature name.
type : feature type – ‘Numeric’ or ‘Categorical’.
missing_count : missing values count in training data.
missing_percentage : missing values percentage in training data.
tasks : list of information per each task, which was applied to feature.

task information contains:

id : a number of task in the blueprint diagram.
name : task name.
descriptions : human readable aggregated information about how the task handles missing values. The following descriptions may be present: what value is imputed for missing values, whether the feature being missing is treated as a feature by the task, whether missing values are treated as infrequent values, whether infrequent values are treated as missing values, and whether missing values are ignored.

classmethod get(project_id, model_id)¶

Retrieve a missing report.

Parameters

project_idstr: The project’s id.
model_idstr: The model’s id.

Returns

MissingValuesReport: The queried missing report.

Return type: MissingValuesReport

Models¶

Model¶

class datarobot.models.Model(id=None, processes=None, featurelist_name=None, featurelist_id=None, project_id=None, sample_pct=None, training_row_count=None, training_duration=None, training_start_date=None, training_end_date=None, model_type=None, model_category=None, is_frozen=None, is_n_clusters_dynamically_determined=None, blueprint_id=None, metrics=None, project=None, monotonic_increasing_featurelist_id=None, monotonic_decreasing_featurelist_id=None, n_clusters=None, has_empty_clusters=None, supports_monotonic_constraints=None, is_starred=None, prediction_threshold=None, prediction_threshold_read_only=None, model_number=None, parent_model_id=None, use_project_settings=None, supports_composable_ml=None)¶

A model trained on a project’s dataset capable of making predictions

All durations are specified with a duration string such as those returned by the partitioning_methods.construct_duration_string helper method. Please see datetime partitioned project documentation for more information on duration strings.

Attributes

idstr: the id of the model
project_idstr: the id of the project the model belongs to
processeslist of str: the processes used by the model
featurelist_namestr: the name of the featurelist used by the model
featurelist_idstr: the id of the featurelist used by the model
sample_pctfloat or None: the percentage of the project dataset used in training the model. If the project uses datetime partitioning, the sample_pct will be None. See training_row_count, training_duration, and training_start_date and training_end_date instead.
training_row_countint or None: the number of rows of the project dataset used in training the model. In a datetime partitioned project, if specified, defines the number of rows used to train the model and evaluate backtest scores; if unspecified, either training_duration or training_start_date and training_end_date was used to determine that instead.
training_durationstr or None: only present for models in datetime partitioned projects. If specified, a duration string specifying the duration spanned by the data used to train the model and evaluate backtest scores.
training_start_datedatetime or None: only present for frozen models in datetime partitioned projects. If specified, the start date of the data used to train the model.
training_end_datedatetime or None: only present for frozen models in datetime partitioned projects. If specified, the end date of the data used to train the model.
model_typestr: what model this is, e.g. ‘Nystroem Kernel SVM Regressor’
model_categorystr: what kind of model this is - ‘prime’ for DataRobot Prime models, ‘blend’ for blender models, and ‘model’ for other models
is_frozenbool: whether this model is a frozen model
is_n_clusters_dynamically_determinedbool: (New in version v2.27) optional, if this model determines number of clusters dynamically
blueprint_idstr: the id of the blueprint used in this model
metricsdict: a mapping from each metric to the model’s scores for that metric
monotonic_increasing_featurelist_idstr: optional, the id of the featurelist that defines the set of features with a monotonically increasing relationship to the target. If None, no such constraints are enforced.
monotonic_decreasing_featurelist_idstr: optional, the id of the featurelist that defines the set of features with a monotonically decreasing relationship to the target. If None, no such constraints are enforced.
n_clustersint: (New in version v2.27) optional, number of data clusters discovered by model
has_empty_clusters: bool: (New in version v2.27) optional, whether clustering models produces empty clusters.
supports_monotonic_constraintsbool: optional, whether this model supports enforcing monotonic constraints
is_starredbool: whether this model marked as starred
prediction_thresholdfloat: for binary classification projects, the threshold used for predictions
prediction_threshold_read_onlybool: indicated whether modification of the prediction threshold is forbidden. Threshold modification is forbidden once a model has had a deployment created or predictions made via the dedicated prediction API.
model_numberinteger: model number assigned to a model
parent_model_idstr or None: (New in version v2.20) the id of the model that tuning parameters are derived from
use_project_settingsbool or None: (New in version v2.20) Only present for models in datetime-partitioned projects. If True, indicates that the custom backtest partitioning settings specified by the user were used to train the model and evaluate backtest scores.
supports_composable_mlbool or None: (New in version v2.26) whether this model is supported in the Composable ML.

classmethod get(project, model_id)¶

Retrieve a specific model.

Parameters

projectstr: The project’s id.
model_idstr: The model_id of the leaderboard item to retrieve.

Returns

modelModel: The queried instance.

Raises

ValueError: passed project parameter value is of not supported type

Return type: Model

get_features_used()¶

Query the server to determine which features were used.

Note that the data returned by this method is possibly different than the names of the features in the featurelist used by this model. This method will return the raw features that must be supplied in order for predictions to be generated on a new set of data. The featurelist, in contrast, would also include the names of derived features.

Returns

featureslist of str: The names of the features used in the model.

Return type: List[str]

get_supported_capabilities()¶

Retrieves a summary of the capabilities supported by a model.

New in version v2.14.

Returns

supportsBlending: bool

whether the model supports blending

supportsMonotonicConstraints: bool

whether the model supports monotonic constraints

hasWordCloud: bool

whether the model has word cloud data available

eligibleForPrime: bool

whether the model is eligible for Prime

hasParameters: bool

whether the model has parameters that can be retrieved

supportsCodeGeneration: bool

(New in version v2.18) whether the model supports code generation

supportsShap: bool

(New in version v2.18) True if the model supports Shapley package. i.e. Shapley based: feature Importance

supportsEarlyStopping: bool

(New in version v2.22) True if this is an early stopping tree-based model and number of trained iterations can be retrieved.

get_num_iterations_trained()¶

Retrieves the number of estimators trained by early-stopping tree-based models.

– versionadded:: v2.22

Returns

projectId: str: id of project containing the model
modelId: str: id of the model
data: array: list of numEstimatorsItem objects, one for each modeling stage.
numEstimatorsItem will be of the form:
stage: str: indicates the modeling stage (for multi-stage models); None of single-stage models
numIterations: int: the number of estimators or iterations trained by the model

delete()¶

Delete a model from the project’s leaderboard.

Return type: None

get_uri()¶

Returns

urlstr: Permanent static hyperlink to this model at leaderboard.

Return type: str

train(sample_pct=None, featurelist_id=None, scoring_type=None, training_row_count=None, monotonic_increasing_featurelist_id=<object object>, monotonic_decreasing_featurelist_id=<object object>)¶

Train the blueprint used in model on a particular featurelist or amount of data.

This method creates a new training job for worker and appends it to the end of the queue for this project. After the job has finished you can get the newly trained model by retrieving it from the project leaderboard, or by retrieving the result of the job.

Either sample_pct or training_row_count can be used to specify the amount of data to use, but not both. If neither are specified, a default of the maximum amount of data that can safely be used to train any blueprint without going into the validation data will be selected.

In smart-sampled projects, sample_pct and training_row_count are assumed to be in terms of rows of the minority class.

Note

For datetime partitioned projects, see train_datetime instead.

Parameters

sample_pctfloat, optional: The amount of data to use for training, as a percentage of the project dataset from 0 to 100.
featurelist_idstr, optional: The identifier of the featurelist to use. If not defined, the featurelist of this model is used.
scoring_typestr, optional: Either validation or crossValidation (also dr.SCORING_TYPE.validation or dr.SCORING_TYPE.cross_validation). validation is available for every partitioning type, and indicates that the default model validation should be used for the project. If the project uses a form of cross-validation partitioning, crossValidation can also be used to indicate that all of the available training/validation combinations should be used to evaluate the model.
training_row_countint, optional: The number of rows to use to train the requested model.
monotonic_increasing_featurelist_idstr: (new in version 2.11) optional, the id of the featurelist that defines the set of features with a monotonically increasing relationship to the target. Passing None disables increasing monotonicity constraint. Default (dr.enums.MONOTONICITY_FEATURELIST_DEFAULT) is the one specified by the blueprint.
monotonic_decreasing_featurelist_idstr: (new in version 2.11) optional, the id of the featurelist that defines the set of features with a monotonically decreasing relationship to the target. Passing None disables decreasing monotonicity constraint. Default (dr.enums.MONOTONICITY_FEATURELIST_DEFAULT) is the one specified by the blueprint.

Returns

model_job_idstr: id of created job, can be used as parameter to ModelJob.get method or wait_for_async_model_creation function

Examples

project = Project.get('project-id')
model = Model.get('project-id', 'model-id')
model_job_id = model.train(training_row_count=project.max_train_rows)

Return type: str

train_datetime(featurelist_id=None, training_row_count=None, training_duration=None, time_window_sample_pct=None, monotonic_increasing_featurelist_id=<object object>, monotonic_decreasing_featurelist_id=<object object>, use_project_settings=False, sampling_method=None, n_clusters=None)¶

Trains this model on a different featurelist or sample size.

Requires that this model is part of a datetime partitioned project; otherwise, an error will occur.

All durations should be specified with a duration string such as those returned by the partitioning_methods.construct_duration_string helper method. Please see datetime partitioned project documentation for more information on duration strings.

Parameters

featurelist_idstr, optional: the featurelist to use to train the model. If not specified, the featurelist of this model is used.
training_row_countint, optional: the number of rows of data that should be used to train the model. If specified, neither training_duration nor use_project_settings may be specified.
training_durationstr, optional: a duration string specifying what time range the data used to train the model should span. If specified, neither training_row_count nor use_project_settings may be specified.
use_project_settingsbool, optional: (New in version v2.20) defaults to False. If True, indicates that the custom backtest partitioning settings specified by the user will be used to train the model and evaluate backtest scores. If specified, neither training_row_count nor training_duration may be specified.
time_window_sample_pctint, optional: may only be specified when the requested model is a time window (e.g. duration or start and end dates). An integer between 1 and 99 indicating the percentage to sample by within the window. The points kept are determined by a random uniform sample. If specified, training_duration must be specified otherwise, the number of rows used to train the model and evaluate backtest scores and an error will occur.
sampling_methodstr, optional: (New in version v2.23) defines the way training data is selected. Can be either random or latest. In combination with training_row_count defines how rows are selected from backtest (latest by default). When training data is defined using time range (training_duration or use_project_settings) this setting changes the way time_window_sample_pct is applied (random by default). Applicable to OTV projects only.
monotonic_increasing_featurelist_idstr, optional: (New in version v2.18) optional, the id of the featurelist that defines the set of features with a monotonically increasing relationship to the target. Passing None disables increasing monotonicity constraint. Default (dr.enums.MONOTONICITY_FEATURELIST_DEFAULT) is the one specified by the blueprint.
monotonic_decreasing_featurelist_idstr, optional: (New in version v2.18) optional, the id of the featurelist that defines the set of features with a monotonically decreasing relationship to the target. Passing None disables decreasing monotonicity constraint. Default (dr.enums.MONOTONICITY_FEATURELIST_DEFAULT) is the one specified by the blueprint.
n_clusters: int, optional: (New in version 2.27) number of clusters to use in an unsupervised clustering model. This parameter is used only for unsupervised clustering models that don’t automatically determine the number of clusters.

Returns

jobModelJob: the created job to build the model

Return type: ModelJob

retrain(sample_pct=None, featurelist_id=None, training_row_count=None, n_clusters=None)¶

Submit a job to the queue to train a blender model.

Parameters

sample_pct: float, optional: The sample size in percents (1 to 100) to use in training. If this parameter is used then training_row_count should not be given.
featurelist_idstr, optional: The featurelist id
training_row_countint, optional: The number of rows used to train the model. If this parameter is used, then sample_pct should not be given.
n_clusters: int, optional: (new in version 2.27) number of clusters to use in an unsupervised clustering model. This parameter is used only for unsupervised clustering models that do not determine the number of clusters automatically.

Returns

jobModelJob: The created job that is retraining the model

Return type: ModelJob

incremental_train(data_stage_id, training_data_name=None)¶

Submit a job to the queue to perform incremental training on an existing model using additional data. The id of the additional data to use for training is specified with the data_stage_id. Optionally a name for the iteration can be supplied by the user to help identify the contents of data in the iteration.

This functionality requires the INCREMENTAL_LEARNING feature flag to be enabled.

Parameters

data_stage_id: str: The id of the data stage to use for training.
training_data_namestr, optional: The name of the iteration or data stage to indicate what the incremental learning was performed on.

Returns

jobModelJob: The created job that is retraining the model

Return type: ModelJob

request_predictions(dataset_id=None, dataset=None, dataframe=None, file_path=None, file=None, include_prediction_intervals=None, prediction_intervals_size=None, forecast_point=None, predictions_start_date=None, predictions_end_date=None, actual_value_column=None, explanation_algorithm=None, max_explanations=None, max_ngram_explanations=None)¶

Requests predictions against a previously uploaded dataset.

Parameters

dataset_idstring, optional: The ID of the dataset to make predictions against (as uploaded from Project.upload_dataset)
datasetDataset, optional: The dataset to make predictions against (as uploaded from Project.upload_dataset)
dataframepd.DataFrame, optional: (New in v3.0) The dataframe to make predictions against
file_pathstr, optional: (New in v3.0) Path to file to make predictions against
fileIOBase, optional: (New in v3.0) File to make predictions against
include_prediction_intervalsbool, optional: (New in v2.16) For time series projects only. Specifies whether prediction intervals should be calculated for this request. Defaults to True if prediction_intervals_size is specified, otherwise defaults to False.
prediction_intervals_sizeint, optional: (New in v2.16) For time series projects only. Represents the percentile to use for the size of the prediction intervals. Defaults to 80 if include_prediction_intervals is True. Prediction intervals size must be between 1 and 100 (inclusive).
forecast_pointdatetime.datetime or None, optional: (New in version v2.20) For time series projects only. This is the default point relative to which predictions will be generated, based on the forecast window of the project. See the time series prediction documentation for more information.
predictions_start_datedatetime.datetime or None, optional: (New in version v2.20) For time series projects only. The start date for bulk predictions. Note that this parameter is for generating historical predictions using the training data. This parameter should be provided in conjunction with predictions_end_date. Can’t be provided with the forecast_point parameter.
predictions_end_datedatetime.datetime or None, optional: (New in version v2.20) For time series projects only. The end date for bulk predictions, exclusive. Note that this parameter is for generating historical predictions using the training data. This parameter should be provided in conjunction with predictions_start_date. Can’t be provided with the forecast_point parameter.
actual_value_columnstring, optional: (New in version v2.21) For time series unsupervised projects only. Actual value column can be used to calculate the classification metrics and insights on the prediction dataset. Can’t be provided with the forecast_point parameter.
explanation_algorithm: (New in version v2.21) optional; If set to ‘shap’, the: response will include prediction explanations based on the SHAP explainer (SHapley Additive exPlanations). Defaults to null (no prediction explanations).
max_explanations: (New in version v2.21) int optional; specifies the maximum number of: explanation values that should be returned for each row, ordered by absolute value, greatest to least. If null, no limit. In the case of ‘shap’: if the number of features is greater than the limit, the sum of remaining values will also be returned as shapRemainingTotal. Defaults to null. Cannot be set if explanation_algorithm is omitted.
max_ngram_explanations: optional; int or str: (New in version v2.29) Specifies the maximum number of text explanation values that should be returned. If set to all, text explanations will be computed and all the ngram explanations will be returned. If set to a non zero positive integer value, text explanations will be computed and this amount of descendingly sorted ngram explanations will be returned. By default text explanation won’t be triggered to be computed.

Returns

jobPredictJob: The job computing the predictions

Return type: PredictJob

get_feature_impact(with_metadata=False, data_slice_filter=<datarobot.models.model.Sentinel object>)¶

Retrieve the computed Feature Impact results, a measure of the relevance of each feature in the model.

Feature Impact is computed for each column by creating new data with that column randomly permuted (but the others left unchanged), and seeing how the error metric score for the predictions is affected. The ‘impactUnnormalized’ is how much worse the error metric score is when making predictions on this modified data. The ‘impactNormalized’ is normalized so that the largest value is 1. In both cases, larger values indicate more important features.

If a feature is a redundant feature, i.e. once other features are considered it doesn’t contribute much in addition, the ‘redundantWith’ value is the name of feature that has the highest correlation with this feature. Note that redundancy detection is only available for jobs run after the addition of this feature. When retrieving data that predates this functionality, a NoRedundancyImpactAvailable warning will be used.

Elsewhere this technique is sometimes called ‘Permutation Importance’.

Requires that Feature Impact has already been computed with request_feature_impact.

Parameters

with_metadatabool: The flag indicating if the result should include the metadata as well.
data_slice_filterDataSlice, optional: A dataslice used to filter the return values based on the dataslice.id. By default, this function will use data_slice_filter.id == None which returns an unsliced insight. If data_slice_filter is None then get_feature_impact will raise a ValueError.

Returns

list or dict

The feature impact data response depends on the with_metadata parameter. The response is either a dict with metadata and a list with actual data or just a list with that data.

Each List item is a dict with the keys featureName, impactNormalized, and impactUnnormalized, redundantWith and count.

For dict response available keys are:

featureImpacts - Feature Impact data as a dictionary. Each item is a dict with
keys: featureName, impactNormalized, and impactUnnormalized, and redundantWith.

shapBased - A boolean that indicates whether Feature Impact was calculated using
Shapley values.

ranRedundancyDetection - A boolean that indicates whether redundant feature
identification was run while calculating this Feature Impact.

rowCount - An integer or None that indicates the number of rows that was used to
calculate Feature Impact. For the Feature Impact calculated with the default logic, without specifying the rowCount, we return None here.

count - An integer with the number of features under the featureImpacts.

Raises

ClientError (404): If the feature impacts have not been computed.
ValueError: If data_slice_filter passed as None

get_all_feature_impacts(data_slice_filter=None)¶

Retrieve a list of all feature impact results available for the model.

Parameters

data_slice_filterDataSlice, optional: A dataslice used to filter the return values based on the dataslice.id. By default, this function will use data_slice_filter.id == None which returns an unsliced insight. If data_slice_filter is None then no data_slice filtering will be applied when requesting the roc_curve.

Returns

list of dicts: Data for all available model feature impacts. Or an empty list if not data found.

Examples

model = datarobot.Model(id='model-id', project_id='project-id')

# Get feature impact insights for sliced data
data_slice = datarobot.DataSlice(id='data-slice-id')
sliced_fi = model.get_all_feature_impacts(data_slice_filter=data_slice)

# Get feature impact insights for unsliced data
data_slice = datarobot.DataSlice()
unsliced_fi = model.get_all_feature_impacts(data_slice_filter=data_slice)

# Get all feature impact insights
all_fi = model.get_all_feature_impacts()

get_multiclass_feature_impact()¶

For multiclass it’s possible to calculate feature impact separately for each target class. The method for calculation is exactly the same, calculated in one-vs-all style for each target class.

Requires that Feature Impact has already been computed with request_feature_impact.

Returns

feature_impactslist of dict: The feature impact data. Each item is a dict with the keys ‘featureImpacts’ (list), ‘class’ (str). Each item in ‘featureImpacts’ is a dict with the keys ‘featureName’, ‘impactNormalized’, and ‘impactUnnormalized’, and ‘redundantWith’.

Raises

ClientError (404): If the multiclass feature impacts have not been computed.

request_feature_impact(row_count=None, with_metadata=False, data_slice_id=None)¶

Request feature impacts to be computed for the model.

See get_feature_impact for more information on the result of the job.

Parameters

row_countint, optional: The sample size (specified in rows) to use for Feature Impact computation. This is not supported for unsupervised, multiclass (which has a separate method), and time series projects.
with_metadatabool, optional: Flag indicating whether the result should include the metadata. If true, metadata is included.
data_slice_idstr, optional: ID for the data slice used in the request. If None, request unsliced insight data.

Returns

jobJob or status_id: Job representing the Feature Impact computation. To retrieve the completed Feature Impact data, use job.get_result or job.get_result_when_complete.

Raises

JobAlreadyRequested (422): If the feature impacts have already been requested.

request_external_test(dataset_id, actual_value_column=None)¶

Request external test to compute scores and insights on an external test dataset

Parameters

dataset_idstring: The dataset to make predictions against (as uploaded from Project.upload_dataset)
actual_value_columnstring, optional: (New in version v2.21) For time series unsupervised projects only. Actual value column can be used to calculate the classification metrics and insights on the prediction dataset. Can’t be provided with the forecast_point parameter.
Returns
——-
jobJob: a Job representing external dataset insights computation

get_or_request_feature_impact(max_wait=600, **kwargs)¶

Retrieve feature impact for the model, requesting a job if it hasn’t been run previously

Parameters

max_waitint, optional: The maximum time to wait for a requested feature impact job to complete before erroring
**kwargs: Arbitrary keyword arguments passed to request_feature_impact.

Returns

feature_impactslist or dict: The feature impact data. See get_feature_impact for the exact schema.

get_feature_effect_metadata()¶

Retrieve Feature Effects metadata. Response contains status and available model sources.

Feature Effect for the training partition is always available, with the exception of older projects that only supported Feature Effect for validation.
When a model is trained into validation or holdout without stacked predictions (i.e., no out-of-sample predictions in those partitions), Feature Effects is not available for validation or holdout.
Feature Effects for holdout is not available when holdout was not unlocked for the project.

Use source to retrieve Feature Effects, selecting one of the provided sources.

Returns

feature_effect_metadata: FeatureEffectMetadata

request_feature_effect(row_count=None, data_slice_id=None)¶

Submit request to compute Feature Effects for the model.

See get_feature_effect for more information on the result of the job.

Parameters

row_countint: (New in version v2.21) The sample size to use for Feature Impact computation. Minimum is 10 rows. Maximum is 100000 rows or the training sample size of the model, whichever is less.
data_slice_idstr, optional: ID for the data slice used in the request. If None, request unsliced insight data.

Returns

jobJob: A Job representing the feature effect computation. To get the completed feature effect data, use job.get_result or job.get_result_when_complete.

Raises

JobAlreadyRequested (422): If the feature effect have already been requested.

request_feature_effects_multiclass(row_count=None, top_n_features=None, features=None)¶

Request Feature Effects computation for the multiclass model.

See get_feature_effect for more information on the result of the job.

Parameters

row_countint: The number of rows from dataset to use for Feature Impact calculation.
top_n_featuresint or None: Number of top features (ranked by feature impact) used to calculate Feature Effects.
featureslist or None: The list of features used to calculate Feature Effects.

Returns

jobJob: A Job representing Feature Effect computation. To get the completed Feature Effect data, use job.get_result or job.get_result_when_complete.

get_feature_effect(source, data_slice_id=None)¶

Retrieve Feature Effects for the model.

Feature Effects provides partial dependence and predicted vs actual values for top-500 features ordered by feature impact score.

The partial dependence shows marginal effect of a feature on the target variable after accounting for the average effects of all other predictive features. It indicates how, holding all other variables except the feature of interest as they were, the value of this feature affects your prediction.

Requires that Feature Effects has already been computed with request_feature_effect.

See get_feature_effect_metadata for retrieving information the available sources.

Parameters

sourcestring: The source Feature Effects are retrieved for.
data_slice_idstring, optional: ID for the data slice used in the request. If None, retrieve unsliced insight data.

Returns

feature_effectsFeatureEffects: The feature effects data.

Raises

ClientError (404): If the feature effects have not been computed or source is not valid value.

get_feature_effects_multiclass(source='training', class_=None)¶

Retrieve Feature Effects for the multiclass model.

Feature Effects provide partial dependence and predicted vs actual values for top-500 features ordered by feature impact score.

The partial dependence shows marginal effect of a feature on the target variable after accounting for the average effects of all other predictive features. It indicates how, holding all other variables except the feature of interest as they were, the value of this feature affects your prediction.

Requires that Feature Effects has already been computed with request_feature_effect.

See get_feature_effect_metadata for retrieving information the available sources.

Parameters

sourcestr: The source Feature Effects are retrieved for.
class_str or None: The class name Feature Effects are retrieved for.

Returns

list: The list of multiclass feature effects.

Raises

ClientError (404): If Feature Effects have not been computed or source is not valid value.

get_or_request_feature_effects_multiclass(source, top_n_features=None, features=None, row_count=None, class_=None, max_wait=600)¶

Retrieve Feature Effects for the multiclass model, requesting a job if it hasn’t been run previously.

Parameters

sourcestring: The source Feature Effects retrieve for.
class_str or None: The class name Feature Effects retrieve for.
row_countint: The number of rows from dataset to use for Feature Impact calculation.
top_n_featuresint or None: Number of top features (ranked by Feature Impact) used to calculate Feature Effects.
featureslist or None: The list of features used to calculate Feature Effects.
max_waitint, optional: The maximum time to wait for a requested Feature Effects job to complete before erroring.

Returns

feature_effectslist of FeatureEffectsMulticlass: The list of multiclass feature effects data.

get_or_request_feature_effect(source, max_wait=600, row_count=None, data_slice_id=None)¶

Retrieve Feature Effects for the model, requesting a new job if it hasn’t been run previously.

See get_feature_effect_metadata for retrieving information of source.

Parameters

sourcestring: The source Feature Effects are retrieved for.
max_waitint, optional: The maximum time to wait for a requested Feature Effect job to complete before erroring.
row_countint, optional: (New in version v2.21) The sample size to use for Feature Impact computation. Minimum is 10 rows. Maximum is 100000 rows or the training sample size of the model, whichever is less.
data_slice_idstr, optional: ID for the data slice used in the request. If None, request unsliced insight data.

Returns

feature_effectsFeatureEffects: The Feature Effects data.

get_prime_eligibility()¶

Check if this model can be approximated with DataRobot Prime

Returns

prime_eligibilitydict: a dict indicating whether a model can be approximated with DataRobot Prime (key can_make_prime) and why it may be ineligible (key message)

request_approximation()¶

Request an approximation of this model using DataRobot Prime

This will create several rulesets that could be used to approximate this model. After comparing their scores and rule counts, the code used in the approximation can be downloaded and run locally.

Returns

jobJob: the job generating the rulesets

get_rulesets()¶

List the rulesets approximating this model generated by DataRobot Prime

If this model hasn’t been approximated yet, will return an empty list. Note that these are rulesets approximating this model, not rulesets used to construct this model.

Returns

rulesetslist of Ruleset

Return type: List[Ruleset]

download_export(filepath)¶

Download an exportable model file for use in an on-premise DataRobot standalone prediction environment.

This function can only be used if model export is enabled, and will only be useful if you have an on-premise environment in which to import it.

Parameters

filepathstr: The path at which to save the exported model file.

Return type: None

request_transferable_export(prediction_intervals_size=None)¶

Request generation of an exportable model file for use in an on-premise DataRobot standalone prediction environment.

This function can only be used if model export is enabled, and will only be useful if you have an on-premise environment in which to import it.

This function does not download the exported file. Use download_export for that.

Parameters

prediction_intervals_sizeint, optional: (New in v2.19) For time series projects only. Represents the percentile to use for the size of the prediction intervals. Prediction intervals size must be between 1 and 100 (inclusive).

Returns

Job

Examples

model = datarobot.Model.get('project-id', 'model-id')
job = model.request_transferable_export()
job.wait_for_completion()
model.download_export('my_exported_model.drmodel')

# Client must be configured to use standalone prediction server for import:
datarobot.Client(token='my-token-at-standalone-server',
                 endpoint='standalone-server-url/api/v2')

imported_model = datarobot.ImportedModel.create('my_exported_model.drmodel')

Return type: Job

request_frozen_model(sample_pct=None, training_row_count=None)¶

Train a new frozen model with parameters from this model

Note

This method only works if project the model belongs to is not datetime partitioned. If it is, use request_frozen_datetime_model instead.

Frozen models use the same tuning parameters as their parent model instead of independently optimizing them to allow efficiently retraining models on larger amounts of the training data.

Parameters

sample_pctfloat: optional, the percentage of the dataset to use with the model. If not provided, will use the value from this model.
training_row_countint: (New in version v2.9) optional, the integer number of rows of the dataset to use with the model. Only one of sample_pct and training_row_count should be specified.

Returns

model_jobModelJob: the modeling job training a frozen model

Return type: ModelJob

request_frozen_datetime_model(training_row_count=None, training_duration=None, training_start_date=None, training_end_date=None, time_window_sample_pct=None, sampling_method=None)¶

Train a new frozen model with parameters from this model.

Requires that this model belongs to a datetime partitioned project. If it does not, an error will occur when submitting the job.

Frozen models use the same tuning parameters as their parent model instead of independently optimizing them to allow efficiently retraining models on larger amounts of the training data.

In addition of training_row_count and training_duration, frozen datetime models may be trained on an exact date range. Only one of training_row_count, training_duration, or training_start_date and training_end_date should be specified.

Models specified using training_start_date and training_end_date are the only ones that can be trained into the holdout data (once the holdout is unlocked).

All durations should be specified with a duration string such as those returned by the partitioning_methods.construct_duration_string helper method. Please see datetime partitioned project documentation for more information on duration strings.

Parameters

training_row_countint, optional: the number of rows of data that should be used to train the model. If specified, training_duration may not be specified.
training_durationstr, optional: a duration string specifying what time range the data used to train the model should span. If specified, training_row_count may not be specified.
training_start_datedatetime.datetime, optional: the start date of the data to train to model on. Only rows occurring at or after this datetime will be used. If training_start_date is specified, training_end_date must also be specified.
training_end_datedatetime.datetime, optional: the end date of the data to train the model on. Only rows occurring strictly before this datetime will be used. If training_end_date is specified, training_start_date must also be specified.
time_window_sample_pctint, optional: may only be specified when the requested model is a time window (e.g. duration or start and end dates). An integer between 1 and 99 indicating the percentage to sample by within the window. The points kept are determined by a random uniform sample. If specified, training_duration must be specified otherwise, the number of rows used to train the model and evaluate backtest scores and an error will occur.
sampling_methodstr, optional: (New in version v2.23) defines the way training data is selected. Can be either random or latest. In combination with training_row_count defines how rows are selected from backtest (latest by default). When training data is defined using time range (training_duration or use_project_settings) this setting changes the way time_window_sample_pct is applied (random by default). Applicable to OTV projects only.

Returns

model_jobModelJob: the modeling job training a frozen model

Return type: ModelJob

get_parameters()¶

Retrieve model parameters.

Returns

ModelParameters: Model parameters for this model.

request_lift_chart(source, data_slice_id=None)¶

Request the model Lift Chart for the specified source.

Parameters

sourcestr: Lift chart data source. Check datarobot.enums.CHART_DATA_SOURCE for possible values.
data_slice_idstring, optional: ID for the data slice used in the request. If None, request unsliced insight data.

Returns

status_check_jobStatusCheckJob: Object contains all needed logic for a periodical status check of an async job.

Return type: StatusCheckJob

get_lift_chart(source, fallback_to_parent_insights=False, data_slice_filter=<datarobot.models.model.Sentinel object>)¶

Retrieve the model Lift chart for the specified source.

Parameters

sourcestr: Lift chart data source. Check datarobot.enums.CHART_DATA_SOURCE for possible values. (New in version v2.23) For time series and OTV models, also accepts values backtest_2, backtest_3, …, up to the number of backtests in the model.
fallback_to_parent_insightsbool: (New in version v2.14) Optional, if True, this will return lift chart data for this model’s parent if the lift chart is not available for this model and the model has a defined parent model. If omitted or False, or there is no parent model, will not attempt to return insight data from this model’s parent.
data_slice_filterDataSlice, optional: A dataslice used to filter the return values based on the dataslice.id. By default this function will use data_slice_filter.id == None which returns an unsliced insight. If data_slice_filter is None then get_lift_chart will raise a ValueError.

Returns

LiftChart: Model lift chart data

Raises

ClientError: If the insight is not available for this model
ValueError: If data_slice_filter passed as None

request_roc_curve(source, data_slice_id=None)¶

Request the model Roc Curve for the specified source.

Parameters

sourcestr: Roc Curve data source. Check datarobot.enums.CHART_DATA_SOURCE for possible values.
data_slice_idstring, optional: ID for the data slice used in the request. If None, request unsliced insight data.

Returns

status_check_jobStatusCheckJob: Object contains all needed logic for a periodical status check of an async job.

Return type: StatusCheckJob

get_all_lift_charts(fallback_to_parent_insights=False, data_slice_filter=None)¶

Retrieve a list of all Lift charts available for the model.

Parameters

fallback_to_parent_insightsbool, optional: (New in version v2.14) Optional, if True, this will return lift chart data for this model’s parent for any source that is not available for this model and if this model has a defined parent model. If omitted or False, or this model has no parent, this will not attempt to retrieve any data from this model’s parent.
data_slice_filterDataSlice, optional: Filters the returned lift chart by data_slice_filter.id. If None (the default) applies no filter based on data_slice_id.

Returns

list of LiftChart: Data for all available model lift charts. Or an empty list if no data found.

Examples

model = datarobot.Model.get('project-id', 'model-id')

# Get lift chart insights for sliced data
sliced_lift_charts = model.get_all_lift_charts(data_slice_id='data-slice-id')

# Get lift chart insights for unsliced data
unsliced_lift_charts = model.get_all_lift_charts(unsliced_only=True)

# Get all lift chart insights
all_lift_charts = model.get_all_lift_charts()

get_multiclass_lift_chart(source, fallback_to_parent_insights=False)¶

Retrieve model Lift chart for the specified source.

Parameters

sourcestr: Lift chart data source. Check datarobot.enums.CHART_DATA_SOURCE for possible values.
fallback_to_parent_insightsbool: Optional, if True, this will return lift chart data for this model’s parent if the lift chart is not available for this model and the model has a defined parent model. If omitted or False, or there is no parent model, will not attempt to return insight data from this model’s parent.

Returns

list of LiftChart: Model lift chart data for each saved target class

Raises

ClientError: If the insight is not available for this model

get_all_multiclass_lift_charts(fallback_to_parent_insights=False)¶

Retrieve a list of all Lift charts available for the model.

Parameters

fallback_to_parent_insightsbool: (New in version v2.14) Optional, if True, this will return lift chart data for this model’s parent for any source that is not available for this model and if this model has a defined parent model. If omitted or False, or this model has no parent, this will not attempt to retrieve any data from this model’s parent.

Returns

list of LiftChart: Data for all available model lift charts.

get_multilabel_lift_charts(source, fallback_to_parent_insights=False)¶

Retrieve model Lift charts for the specified source.

New in version v2.24.

Parameters

sourcestr: Lift chart data source. Check datarobot.enums.CHART_DATA_SOURCE for possible values.
fallback_to_parent_insightsbool: Optional, if True, this will return lift chart data for this model’s parent if the lift chart is not available for this model and the model has a defined parent model. If omitted or False, or there is no parent model, will not attempt to return insight data from this model’s parent.

Returns

list of LiftChart: Model lift chart data for each saved target class

Raises

ClientError: If the insight is not available for this model

get_residuals_chart(source, fallback_to_parent_insights=False, data_slice_filter=<datarobot.models.model.Sentinel object>)¶

Retrieve model residuals chart for the specified source.

Parameters

sourcestr: Residuals chart data source. Check datarobot.enums.CHART_DATA_SOURCE for possible values.
fallback_to_parent_insightsbool: Optional, if True, this will return residuals chart data for this model’s parent if the residuals chart is not available for this model and the model has a defined parent model. If omitted or False, or there is no parent model, will not attempt to return residuals data from this model’s parent.
data_slice_filterDataSlice, optional: A dataslice used to filter the return values based on the dataslice.id. By default this function will use data_slice_filter.id == None which returns an unsliced insight. If data_slice_filter is None then get_residuals_chart will raise a ValueError.

Returns

ResidualsChart: Model residuals chart data

Raises

ClientError: If the insight is not available for this model
ValueError: If data_slice_filter passed as None

get_all_residuals_charts(fallback_to_parent_insights=False, data_slice_filter=None)¶

Retrieve a list of all residuals charts available for the model.

Parameters

fallback_to_parent_insightsbool: Optional, if True, this will return residuals chart data for this model’s parent for any source that is not available for this model and if this model has a defined parent model. If omitted or False, or this model has no parent, this will not attempt to retrieve any data from this model’s parent.
data_slice_filterDataSlice, optional: Filters the returned residuals charts by data_slice_filter.id. If None (the default) applies no filter based on data_slice_id.

Returns

list of ResidualsChart: Data for all available model residuals charts.

Examples

model = datarobot.Model.get('project-id', 'model-id')

# Get residuals chart insights for sliced data
sliced_residuals_charts = model.get_all_residuals_charts(data_slice_id='data-slice-id')

# Get residuals chart insights for unsliced data
unsliced_residuals_charts = model.get_all_residuals_charts(unsliced_only=True)

# Get all residuals chart insights
all_residuals_charts = model.get_all_residuals_charts()

request_residuals_chart(source, data_slice_id=None)¶

Request the model residuals chart for the specified source.

Parameters

sourcestr: Residuals chart data source. Check datarobot.enums.CHART_DATA_SOURCE for possible values.
data_slice_idstring, optional: ID for the data slice used in the request. If None, request unsliced insight data.

Returns

status_check_jobStatusCheckJob: Object contains all needed logic for a periodical status check of an async job.

Return type: StatusCheckJob

get_pareto_front()¶

Retrieve the Pareto Front for a Eureqa model.

This method is only supported for Eureqa models.

Returns

ParetoFront: Model ParetoFront data

get_confusion_chart(source, fallback_to_parent_insights=False)¶

Retrieve them model’s confusion matrix for the specified source.

Parameters

sourcestr: Confusion chart source. Check datarobot.enums.CHART_DATA_SOURCE for possible values.
fallback_to_parent_insightsbool: (New in version v2.14) Optional, if True, this will return confusion chart data for this model’s parent if the confusion chart is not available for this model and the defined parent model. If omitted or False, or there is no parent model, will not attempt to return insight data from this model’s parent.

Returns

ConfusionChart: Model ConfusionChart data

Raises

ClientError: If the insight is not available for this model

get_all_confusion_charts(fallback_to_parent_insights=False)¶

Retrieve a list of all confusion matrices available for the model.

Parameters

fallback_to_parent_insightsbool: (New in version v2.14) Optional, if True, this will return confusion chart data for this model’s parent for any source that is not available for this model and if this has a defined parent model. If omitted or False, or this model has no parent, this will not attempt to retrieve any data from this model’s parent.

Returns

list of ConfusionChart: Data for all available confusion charts for model.

get_roc_curve(source, fallback_to_parent_insights=False, data_slice_filter=<datarobot.models.model.Sentinel object>)¶

Retrieve the ROC curve for a binary model for the specified source.

Parameters

sourcestr: ROC curve data source. Check datarobot.enums.CHART_DATA_SOURCE for possible values. (New in version v2.23) For time series and OTV models, also accepts values backtest_2, backtest_3, …, up to the number of backtests in the model.
fallback_to_parent_insightsbool: (New in version v2.14) Optional, if True, this will return ROC curve data for this model’s parent if the ROC curve is not available for this model and the model has a defined parent model. If omitted or False, or there is no parent model, will not attempt to return data from this model’s parent.
data_slice_filterDataSlice, optional: A dataslice used to filter the return values based on the dataslice.id. By default this function will use data_slice_filter.id == None which returns an unsliced insight. If data_slice_filter is None then get_roc_curve will raise a ValueError.

Returns

RocCurve: Model ROC curve data

Raises

ClientError: If the insight is not available for this model
(New in version v3.0) TypeError: If the underlying project type is multilabel
ValueError: If data_slice_filter passed as None

get_all_roc_curves(fallback_to_parent_insights=False, data_slice_filter=None)¶

Retrieve a list of all ROC curves available for the model.

Parameters

fallback_to_parent_insightsbool: (New in version v2.14) Optional, if True, this will return ROC curve data for this model’s parent for any source that is not available for this model and if this model has a defined parent model. If omitted or False, or this model has no parent, this will not attempt to retrieve any data from this model’s parent.
data_slice_filterDataSlice, optional: filters the returned roc_curve by data_slice_filter.id. If None (the default) applies no filter based on data_slice_id.

Returns

list of RocCurve: Data for all available model ROC curves. Or an empty list if no RocCurves are found.

Examples

model = datarobot.Model.get('project-id', 'model-id')
ds_filter=DataSlice(id='data-slice-id')

# Get roc curve insights for sliced data
sliced_roc = model.get_all_roc_curves(data_slice_filter=ds_filter)

# Get roc curve insights for unsliced data
data_slice_filter=DataSlice(id=None)
unsliced_roc = model.get_all_roc_curves(data_slice_filter=ds_filter)

# Get all roc curve insights
all_roc_curves = model.get_all_roc_curves()

get_labelwise_roc_curves(source, fallback_to_parent_insights=False)¶

Retrieve a list of LabelwiseRocCurve instances for a multilabel model the given source and all labels.

New in version v2.24.

Parameters

sourcestr: ROC curve data source. Check datarobot.enums.CHART_DATA_SOURCE for possible values.
fallback_to_parent_insightsbool: Optional, if True, this will return ROC curve data for this model’s parent if the ROC curve is not available for this model and the model has a defined parent model. If omitted or False, or there is no parent model, will not attempt to return data from this model’s parent.

Returns

list ofclass:LabelwiseRocCurve <datarobot.models.roc_curve.LabelwiseRocCurve>: Labelwise ROC Curve instances for source and all labels

Raises

ClientError: If the insight is not available for this model
(New in version v3.0) TypeError: If the underlying project type is binary

get_word_cloud(exclude_stop_words=False)¶

Retrieve word cloud data for the model.

Parameters

exclude_stop_wordsbool, optional: Set to True if you want stopwords filtered out of response.

Returns

WordCloud: Word cloud data for the model.

download_scoring_code(file_name, source_code=False)¶

Download the Scoring Code JAR.

Parameters

file_namestr: File path where scoring code will be saved.
source_codebool, optional: Set to True to download source code archive. It will not be executable.

get_model_blueprint_json()¶

Get the blueprint json representation used by this model.

Returns

BlueprintJson: Json representation of the blueprint stages.

Return type: Dict[str, Tuple[List[str], List[str], str]]

get_model_blueprint_documents()¶

Get documentation for tasks used in this model.

Returns

list of BlueprintTaskDocument: All documents available for the model.

get_model_blueprint_chart()¶

Retrieve a diagram that can be used to understand data flow in the blueprint.

Returns

ModelBlueprintChart: The queried model blueprint chart.

get_missing_report_info()¶

Retrieve a report on missing training data that can be used to understand missing values treatment in the model. The report consists of missing values resolutions for features numeric or categorical features that were part of building the model.

Returns

An iterable of MissingReportPerFeature: The queried model missing report, sorted by missing count (DESCENDING order).

get_frozen_child_models()¶

Retrieve the IDs for all models that are frozen from this model.

Returns

A list of Models

request_training_predictions(data_subset, explanation_algorithm=None, max_explanations=None)¶

Start a job to build training predictions

Parameters

data_subsetstr

data set definition to build predictions on. Choices are:

dr.enums.DATA_SUBSET.ALL or string all for all data available. Not valid for
models in datetime partitioned projects

dr.enums.DATA_SUBSET.VALIDATION_AND_HOLDOUT or string validationAndHoldout for
all data except training set. Not valid for models in datetime partitioned projects

dr.enums.DATA_SUBSET.HOLDOUT or string holdout for holdout data set only

dr.enums.DATA_SUBSET.ALL_BACKTESTS or string allBacktests for downloading
the predictions for all backtest validation folds. Requires the model to have successfully scored all backtests. Datetime partitioned projects only.

explanation_algorithmdr.enums.EXPLANATIONS_ALGORITHM

(New in v2.21) Optional. If set to dr.enums.EXPLANATIONS_ALGORITHM.SHAP, the response will include prediction explanations based on the SHAP explainer (SHapley Additive exPlanations). Defaults to None (no prediction explanations).

max_explanationsint

(New in v2.21) Optional. Specifies the maximum number of explanation values that should be returned for each row, ordered by absolute value, greatest to least. In the case of dr.enums.EXPLANATIONS_ALGORITHM.SHAP: If not set, explanations are returned for all features. If the number of features is greater than the max_explanations, the sum of remaining values will also be returned as shap_remaining_total. Max 100. Defaults to null for datasets narrower than 100 columns, defaults to 100 for datasets wider than 100 columns. Is ignored if explanation_algorithm is not set.

Returns

Job: an instance of created async job

cross_validate()¶

Run cross validation on the model.

Note

To perform Cross Validation on a new model with new parameters, use train instead.

Returns

ModelJob: The created job to build the model

get_cross_validation_scores(partition=None, metric=None)¶

Return a dictionary, keyed by metric, showing cross validation scores per partition.

Cross Validation should already have been performed using cross_validate or train.

Note

Models that computed cross validation before this feature was added will need to be deleted and retrained before this method can be used.

Parameters

partitionfloat: optional, the id of the partition (1,2,3.0,4.0,etc…) to filter results by can be a whole number positive integer or float value. 0 corresponds to the validation partition.
metric: unicode: optional name of the metric to filter to resulting cross validation scores by

Returns

cross_validation_scores: dict: A dictionary keyed by metric showing cross validation scores per partition.

advanced_tune(params, description=None)¶

Generate a new model with the specified advanced-tuning parameters

As of v2.17, all models other than blenders, open source, prime, baseline and user-created support Advanced Tuning.

Parameters

paramsdict: Mapping of parameter ID to parameter value. The list of valid parameter IDs for a model can be found by calling get_advanced_tuning_parameters(). This endpoint does not need to include values for all parameters. If a parameter is omitted, its current_value will be used.
descriptionstr: Human-readable string describing the newly advanced-tuned model

Returns

ModelJob: The created job to build the model

Return type: ModelJob

get_advanced_tuning_parameters()¶

Get the advanced-tuning parameters available for this model.

As of v2.17, all models other than blenders, open source, prime, baseline and user-created support Advanced Tuning.

Returns

dict

A dictionary describing the advanced-tuning parameters for the current model. There are two top-level keys, tuning_description and tuning_parameters.

tuning_description an optional value. If not None, then it indicates the user-specified description of this set of tuning parameter.

tuning_parameters is a list of a dicts, each has the following keys

parameter_name : (str) name of the parameter (unique per task, see below)
parameter_id : (str) opaque ID string uniquely identifying parameter
default_value : (*) the actual value used to train the model; either the single value of the parameter specified before training, or the best value from the list of grid-searched values (based on current_value)
current_value : (*) the single value or list of values of the parameter that were grid searched. Depending on the grid search specification, could be a single fixed value (no grid search), a list of discrete values, or a range.
task_name : (str) name of the task that this parameter belongs to
constraints: (dict) see the notes below
vertex_id: (str) ID of vertex that this parameter belongs to

Notes

The type of default_value and current_value is defined by the constraints structure. It will be a string or numeric Python type.

constraints is a dict with at least one, possibly more, of the following keys. The presence of a key indicates that the parameter may take on the specified type. (If a key is absent, this means that the parameter may not take on the specified type.) If a key on constraints is present, its value will be a dict containing all of the fields described below for that key.

"constraints": {
    "select": {
        "values": [<list(basestring or number) : possible values>]
    },
    "ascii": {},
    "unicode": {},
    "int": {
        "min": <int : minimum valid value>,
        "max": <int : maximum valid value>,
        "supports_grid_search": <bool : True if Grid Search may be
                                        requested for this param>
    },
    "float": {
        "min": <float : minimum valid value>,
        "max": <float : maximum valid value>,
        "supports_grid_search": <bool : True if Grid Search may be
                                        requested for this param>
    },
    "intList": {
        "min_length": <int : minimum valid length>,
        "max_length": <int : maximum valid length>
        "min_val": <int : minimum valid value>,
        "max_val": <int : maximum valid value>
        "supports_grid_search": <bool : True if Grid Search may be
                                        requested for this param>
    },
    "floatList": {
        "min_length": <int : minimum valid length>,
        "max_length": <int : maximum valid length>
        "min_val": <float : minimum valid value>,
        "max_val": <float : maximum valid value>
        "supports_grid_search": <bool : True if Grid Search may be
                                        requested for this param>
    }
}

The keys have meaning as follows:

select: Rather than specifying a specific data type, if present, it indicates that the parameter is permitted to take on any of the specified values. Listed values may be of any string or real (non-complex) numeric type.
ascii: The parameter may be a unicode object that encodes simple ASCII characters. (A-Z, a-z, 0-9, whitespace, and certain common symbols.) In addition to listed constraints, ASCII keys currently may not contain either newlines or semicolons.
unicode: The parameter may be any Python unicode object.
int: The value may be an object of type int within the specified range (inclusive). Please note that the value will be passed around using the JSON format, and some JSON parsers have undefined behavior with integers outside of the range [-(2**53)+1, (2**53)-1].
float: The value may be an object of type float within the specified range (inclusive).
intList, floatList: The value may be a list of int or float objects, respectively, following constraints as specified respectively by the int and float types (above).

Many parameters only specify one key under constraints. If a parameter specifies multiple keys, the parameter may take on any value permitted by any key.

Return type: AdvancedTuningParamsType

start_advanced_tuning_session()¶

Start an Advanced Tuning session. Returns an object that helps set up arguments for an Advanced Tuning model execution.

As of v2.17, all models other than blenders, open source, prime, baseline and user-created support Advanced Tuning.

Returns

AdvancedTuningSession: Session for setting up and running Advanced Tuning on a model

star_model()¶

Mark the model as starred.

Model stars propagate to the web application and the API, and can be used to filter when listing models.

Return type: None

unstar_model()¶

Unmark the model as starred.

Model stars propagate to the web application and the API, and can be used to filter when listing models.

Return type: None

set_prediction_threshold(threshold)¶

Set a custom prediction threshold for the model.

May not be used once prediction_threshold_read_only is True for this model.

Parameters

thresholdfloat: only used for binary classification projects. The threshold to when deciding between the positive and negative classes when making predictions. Should be between 0.0 and 1.0 (inclusive).

download_training_artifact(file_name)¶

Retrieve trained artifact(s) from a model containing one or more custom tasks.

Artifact(s) will be downloaded to the specified local filepath.

Parameters

file_namestr: File path where trained model artifact(s) will be saved.

request_fairness_insights(fairness_metrics_set=None)¶

Request fairness insights to be computed for the model.

Parameters

fairness_metrics_setstr, optional: Can be one of <datarobot.enums.FairnessMetricsSet>. The fairness metric used to calculate the fairness scores.

Returns

status_idstr: A statusId of computation request.

get_fairness_insights(fairness_metrics_set=None, offset=0, limit=100)¶

Retrieve a list of Per Class Bias insights for the model.

Parameters

fairness_metrics_setstr, optional: Can be one of <datarobot.enums.FairnessMetricsSet>. The fairness metric used to calculate the fairness scores.
offsetint, optional: Number of items to skip.
limitint, optional: Number of items to return.

Returns

json

request_data_disparity_insights(feature, compared_class_names)¶

Request data disparity insights to be computed for the model.

Parameters

featurestr: Bias and Fairness protected feature name.
compared_class_nameslist(str): List of two classes to compare

Returns

status_idstr: A statusId of computation request.

get_data_disparity_insights(feature, class_name1, class_name2)¶

Retrieve a list of Cross Class Data Disparity insights for the model.

Parameters

featurestr: Bias and Fairness protected feature name.
class_name1str: One of the compared classes
class_name2str: Another compared class

Returns

json

request_cross_class_accuracy_scores()¶

Request data disparity insights to be computed for the model.

Returns

status_idstr: A statusId of computation request.

get_cross_class_accuracy_scores()¶

Retrieves a list of Cross Class Accuracy scores for the model.

Returns

json

classmethod from_data(data)¶

Instantiate an object of this class using a dict.

Parameters

datadict: Correctly snake_cased keys and their values.

Return type: TypeVar(T, bound= APIObject)

open_in_browser()¶

Opens class’ relevant web browser location. If default browser is not available the URL is logged.

Note: If text-mode browsers are used, the calling process will block until the user exits the browser.

Return type: None

class datarobot.models.model.AdvancedTuningParamsType() -> new empty dictionary dict(mapping) -> new dictionary initialized from a mapping object's (key, value) pairs dict(iterable) -> new dictionary initialized as if via: d = {} for k, v in iterable: d[k] = v dict(**kwargs) -> new dictionary initialized with the name=value pairs in the keyword argument list. For example: dict(one=1, two=2)¶

class datarobot.models.model.BiasMitigationFeatureInfo(messages)¶

PrimeModel¶

class datarobot.models.PrimeModel(id=None, processes=None, featurelist_name=None, featurelist_id=None, project_id=None, sample_pct=None, training_row_count=None, training_duration=None, training_start_date=None, training_end_date=None, model_type=None, model_category=None, is_frozen=None, blueprint_id=None, metrics=None, parent_model_id=None, ruleset_id=None, rule_count=None, score=None, monotonic_increasing_featurelist_id=None, monotonic_decreasing_featurelist_id=None, supports_monotonic_constraints=None, is_starred=None, prediction_threshold=None, prediction_threshold_read_only=None, model_number=None, supports_composable_ml=None)¶

Represents a DataRobot Prime model approximating a parent model with downloadable code.

All durations are specified with a duration string such as those returned by the partitioning_methods.construct_duration_string helper method. Please see datetime partitioned project documentation for more information on duration strings.

Attributes

idstr: the id of the model
project_idstr: the id of the project the model belongs to
processeslist of str: the processes used by the model
featurelist_namestr: the name of the featurelist used by the model
featurelist_idstr: the id of the featurelist used by the model
sample_pctfloat: the percentage of the project dataset used in training the model
training_row_countint or None: the number of rows of the project dataset used in training the model. In a datetime partitioned project, if specified, defines the number of rows used to train the model and evaluate backtest scores; if unspecified, either training_duration or training_start_date and training_end_date was used to determine that instead.
training_durationstr or None: only present for models in datetime partitioned projects. If specified, a duration string specifying the duration spanned by the data used to train the model and evaluate backtest scores.
training_start_datedatetime or None: only present for frozen models in datetime partitioned projects. If specified, the start date of the data used to train the model.
training_end_datedatetime or None: only present for frozen models in datetime partitioned projects. If specified, the end date of the data used to train the model.
model_typestr: what model this is, e.g. ‘DataRobot Prime’
model_categorystr: what kind of model this is - always ‘prime’ for DataRobot Prime models
is_frozenbool: whether this model is a frozen model
blueprint_idstr: the id of the blueprint used in this model
metricsdict: a mapping from each metric to the model’s scores for that metric
rulesetRuleset: the ruleset used in the Prime model
parent_model_idstr: the id of the model that this Prime model approximates
monotonic_increasing_featurelist_idstr: optional, the id of the featurelist that defines the set of features with a monotonically increasing relationship to the target. If None, no such constraints are enforced.
monotonic_decreasing_featurelist_idstr: optional, the id of the featurelist that defines the set of features with a monotonically decreasing relationship to the target. If None, no such constraints are enforced.
supports_monotonic_constraintsbool: optional, whether this model supports enforcing monotonic constraints
is_starredbool: whether this model is marked as starred
prediction_thresholdfloat: for binary classification projects, the threshold used for predictions
prediction_threshold_read_onlybool: indicated whether modification of the prediction threshold is forbidden. Threshold modification is forbidden once a model has had a deployment created or predictions made via the dedicated prediction API.
supports_composable_mlbool or None: (New in version v2.26) whether this model is supported in the Composable ML.

classmethod get(project_id, model_id)¶

Retrieve a specific prime model.

Parameters

project_idstr: The id of the project the prime model belongs to
model_idstr: The model_id of the prime model to retrieve.

Returns

modelPrimeModel: The queried instance.

request_download_validation(language)¶

Prep and validate the downloadable code for the ruleset associated with this model.

Parameters

languagestr: the language the code should be downloaded in - see datarobot.enums.PRIME_LANGUAGE for available languages

Returns

jobJob: A job tracking the code preparation and validation

advanced_tune(params, description=None)¶

Generate a new model with the specified advanced-tuning parameters

As of v2.17, all models other than blenders, open source, prime, baseline and user-created support Advanced Tuning.

Parameters

paramsdict: Mapping of parameter ID to parameter value. The list of valid parameter IDs for a model can be found by calling get_advanced_tuning_parameters(). This endpoint does not need to include values for all parameters. If a parameter is omitted, its current_value will be used.
descriptionstr: Human-readable string describing the newly advanced-tuned model

Returns

ModelJob: The created job to build the model

Return type: ModelJob

cross_validate()¶

Run cross validation on the model.

Note

To perform Cross Validation on a new model with new parameters, use train instead.

Returns

ModelJob: The created job to build the model

delete()¶

Delete a model from the project’s leaderboard.

Return type: None

download_export(filepath)¶

Download an exportable model file for use in an on-premise DataRobot standalone prediction environment.

This function can only be used if model export is enabled, and will only be useful if you have an on-premise environment in which to import it.

Parameters

filepathstr: The path at which to save the exported model file.

Return type: None

download_scoring_code(file_name, source_code=False)¶

Download the Scoring Code JAR.

Parameters

file_namestr: File path where scoring code will be saved.
source_codebool, optional: Set to True to download source code archive. It will not be executable.

download_training_artifact(file_name)¶

Retrieve trained artifact(s) from a model containing one or more custom tasks.

Artifact(s) will be downloaded to the specified local filepath.

Parameters

file_namestr: File path where trained model artifact(s) will be saved.

classmethod from_data(data)¶

Instantiate an object of this class using a dict.

Parameters

datadict: Correctly snake_cased keys and their values.

Return type: TypeVar(T, bound= APIObject)

get_advanced_tuning_parameters()¶

Get the advanced-tuning parameters available for this model.

As of v2.17, all models other than blenders, open source, prime, baseline and user-created support Advanced Tuning.

Returns

dict

A dictionary describing the advanced-tuning parameters for the current model. There are two top-level keys, tuning_description and tuning_parameters.

tuning_description an optional value. If not None, then it indicates the user-specified description of this set of tuning parameter.

tuning_parameters is a list of a dicts, each has the following keys

parameter_name : (str) name of the parameter (unique per task, see below)
parameter_id : (str) opaque ID string uniquely identifying parameter
default_value : (*) the actual value used to train the model; either the single value of the parameter specified before training, or the best value from the list of grid-searched values (based on current_value)
current_value : (*) the single value or list of values of the parameter that were grid searched. Depending on the grid search specification, could be a single fixed value (no grid search), a list of discrete values, or a range.
task_name : (str) name of the task that this parameter belongs to
constraints: (dict) see the notes below
vertex_id: (str) ID of vertex that this parameter belongs to

Notes

The type of default_value and current_value is defined by the constraints structure. It will be a string or numeric Python type.

constraints is a dict with at least one, possibly more, of the following keys. The presence of a key indicates that the parameter may take on the specified type. (If a key is absent, this means that the parameter may not take on the specified type.) If a key on constraints is present, its value will be a dict containing all of the fields described below for that key.

"constraints": {
    "select": {
        "values": [<list(basestring or number) : possible values>]
    },
    "ascii": {},
    "unicode": {},
    "int": {
        "min": <int : minimum valid value>,
        "max": <int : maximum valid value>,
        "supports_grid_search": <bool : True if Grid Search may be
                                        requested for this param>
    },
    "float": {
        "min": <float : minimum valid value>,
        "max": <float : maximum valid value>,
        "supports_grid_search": <bool : True if Grid Search may be
                                        requested for this param>
    },
    "intList": {
        "min_length": <int : minimum valid length>,
        "max_length": <int : maximum valid length>
        "min_val": <int : minimum valid value>,
        "max_val": <int : maximum valid value>
        "supports_grid_search": <bool : True if Grid Search may be
                                        requested for this param>
    },
    "floatList": {
        "min_length": <int : minimum valid length>,
        "max_length": <int : maximum valid length>
        "min_val": <float : minimum valid value>,
        "max_val": <float : maximum valid value>
        "supports_grid_search": <bool : True if Grid Search may be
                                        requested for this param>
    }
}

The keys have meaning as follows:

select: Rather than specifying a specific data type, if present, it indicates that the parameter is permitted to take on any of the specified values. Listed values may be of any string or real (non-complex) numeric type.
ascii: The parameter may be a unicode object that encodes simple ASCII characters. (A-Z, a-z, 0-9, whitespace, and certain common symbols.) In addition to listed constraints, ASCII keys currently may not contain either newlines or semicolons.
unicode: The parameter may be any Python unicode object.
int: The value may be an object of type int within the specified range (inclusive). Please note that the value will be passed around using the JSON format, and some JSON parsers have undefined behavior with integers outside of the range [-(2**53)+1, (2**53)-1].
float: The value may be an object of type float within the specified range (inclusive).
intList, floatList: The value may be a list of int or float objects, respectively, following constraints as specified respectively by the int and float types (above).

Many parameters only specify one key under constraints. If a parameter specifies multiple keys, the parameter may take on any value permitted by any key.

Return type: AdvancedTuningParamsType

get_all_confusion_charts(fallback_to_parent_insights=False)¶

Retrieve a list of all confusion matrices available for the model.

Parameters

fallback_to_parent_insightsbool: (New in version v2.14) Optional, if True, this will return confusion chart data for this model’s parent for any source that is not available for this model and if this has a defined parent model. If omitted or False, or this model has no parent, this will not attempt to retrieve any data from this model’s parent.

Returns

list of ConfusionChart: Data for all available confusion charts for model.

get_all_feature_impacts(data_slice_filter=None)¶

Retrieve a list of all feature impact results available for the model.

Parameters

data_slice_filterDataSlice, optional: A dataslice used to filter the return values based on the dataslice.id. By default, this function will use data_slice_filter.id == None which returns an unsliced insight. If data_slice_filter is None then no data_slice filtering will be applied when requesting the roc_curve.

Returns

list of dicts: Data for all available model feature impacts. Or an empty list if not data found.

Examples

model = datarobot.Model(id='model-id', project_id='project-id')

# Get feature impact insights for sliced data
data_slice = datarobot.DataSlice(id='data-slice-id')
sliced_fi = model.get_all_feature_impacts(data_slice_filter=data_slice)

# Get feature impact insights for unsliced data
data_slice = datarobot.DataSlice()
unsliced_fi = model.get_all_feature_impacts(data_slice_filter=data_slice)

# Get all feature impact insights
all_fi = model.get_all_feature_impacts()

get_all_lift_charts(fallback_to_parent_insights=False, data_slice_filter=None)¶

Retrieve a list of all Lift charts available for the model.

Parameters

fallback_to_parent_insightsbool, optional: (New in version v2.14) Optional, if True, this will return lift chart data for this model’s parent for any source that is not available for this model and if this model has a defined parent model. If omitted or False, or this model has no parent, this will not attempt to retrieve any data from this model’s parent.
data_slice_filterDataSlice, optional: Filters the returned lift chart by data_slice_filter.id. If None (the default) applies no filter based on data_slice_id.

Returns

list of LiftChart: Data for all available model lift charts. Or an empty list if no data found.

Examples

model = datarobot.Model.get('project-id', 'model-id')

# Get lift chart insights for sliced data
sliced_lift_charts = model.get_all_lift_charts(data_slice_id='data-slice-id')

# Get lift chart insights for unsliced data
unsliced_lift_charts = model.get_all_lift_charts(unsliced_only=True)

# Get all lift chart insights
all_lift_charts = model.get_all_lift_charts()

get_all_multiclass_lift_charts(fallback_to_parent_insights=False)¶

Retrieve a list of all Lift charts available for the model.

Parameters

fallback_to_parent_insightsbool: (New in version v2.14) Optional, if True, this will return lift chart data for this model’s parent for any source that is not available for this model and if this model has a defined parent model. If omitted or False, or this model has no parent, this will not attempt to retrieve any data from this model’s parent.

Returns

list of LiftChart: Data for all available model lift charts.

get_all_residuals_charts(fallback_to_parent_insights=False, data_slice_filter=None)¶

Retrieve a list of all residuals charts available for the model.

Parameters

fallback_to_parent_insightsbool: Optional, if True, this will return residuals chart data for this model’s parent for any source that is not available for this model and if this model has a defined parent model. If omitted or False, or this model has no parent, this will not attempt to retrieve any data from this model’s parent.
data_slice_filterDataSlice, optional: Filters the returned residuals charts by data_slice_filter.id. If None (the default) applies no filter based on data_slice_id.

Returns

list of ResidualsChart: Data for all available model residuals charts.

Examples

model = datarobot.Model.get('project-id', 'model-id')

# Get residuals chart insights for sliced data
sliced_residuals_charts = model.get_all_residuals_charts(data_slice_id='data-slice-id')

# Get residuals chart insights for unsliced data
unsliced_residuals_charts = model.get_all_residuals_charts(unsliced_only=True)

# Get all residuals chart insights
all_residuals_charts = model.get_all_residuals_charts()

get_all_roc_curves(fallback_to_parent_insights=False, data_slice_filter=None)¶

Retrieve a list of all ROC curves available for the model.

Parameters

fallback_to_parent_insightsbool: (New in version v2.14) Optional, if True, this will return ROC curve data for this model’s parent for any source that is not available for this model and if this model has a defined parent model. If omitted or False, or this model has no parent, this will not attempt to retrieve any data from this model’s parent.
data_slice_filterDataSlice, optional: filters the returned roc_curve by data_slice_filter.id. If None (the default) applies no filter based on data_slice_id.

Returns

list of RocCurve: Data for all available model ROC curves. Or an empty list if no RocCurves are found.

Examples

model = datarobot.Model.get('project-id', 'model-id')
ds_filter=DataSlice(id='data-slice-id')

# Get roc curve insights for sliced data
sliced_roc = model.get_all_roc_curves(data_slice_filter=ds_filter)

# Get roc curve insights for unsliced data
data_slice_filter=DataSlice(id=None)
unsliced_roc = model.get_all_roc_curves(data_slice_filter=ds_filter)

# Get all roc curve insights
all_roc_curves = model.get_all_roc_curves()

get_confusion_chart(source, fallback_to_parent_insights=False)¶

Retrieve them model’s confusion matrix for the specified source.

Parameters

sourcestr: Confusion chart source. Check datarobot.enums.CHART_DATA_SOURCE for possible values.
fallback_to_parent_insightsbool: (New in version v2.14) Optional, if True, this will return confusion chart data for this model’s parent if the confusion chart is not available for this model and the defined parent model. If omitted or False, or there is no parent model, will not attempt to return insight data from this model’s parent.

Returns

ConfusionChart: Model ConfusionChart data

Raises

ClientError: If the insight is not available for this model

get_cross_class_accuracy_scores()¶

Retrieves a list of Cross Class Accuracy scores for the model.

Returns

json

get_cross_validation_scores(partition=None, metric=None)¶

Return a dictionary, keyed by metric, showing cross validation scores per partition.

Cross Validation should already have been performed using cross_validate or train.

Note

Models that computed cross validation before this feature was added will need to be deleted and retrained before this method can be used.

Parameters

partitionfloat: optional, the id of the partition (1,2,3.0,4.0,etc…) to filter results by can be a whole number positive integer or float value. 0 corresponds to the validation partition.
metric: unicode: optional name of the metric to filter to resulting cross validation scores by

Returns

cross_validation_scores: dict: A dictionary keyed by metric showing cross validation scores per partition.

get_data_disparity_insights(feature, class_name1, class_name2)¶

Retrieve a list of Cross Class Data Disparity insights for the model.

Parameters

featurestr: Bias and Fairness protected feature name.
class_name1str: One of the compared classes
class_name2str: Another compared class

Returns

json

get_fairness_insights(fairness_metrics_set=None, offset=0, limit=100)¶

Retrieve a list of Per Class Bias insights for the model.

Parameters

fairness_metrics_setstr, optional: Can be one of <datarobot.enums.FairnessMetricsSet>. The fairness metric used to calculate the fairness scores.
offsetint, optional: Number of items to skip.
limitint, optional: Number of items to return.

Returns

json

get_feature_effect(source, data_slice_id=None)¶

Retrieve Feature Effects for the model.

Feature Effects provides partial dependence and predicted vs actual values for top-500 features ordered by feature impact score.

The partial dependence shows marginal effect of a feature on the target variable after accounting for the average effects of all other predictive features. It indicates how, holding all other variables except the feature of interest as they were, the value of this feature affects your prediction.

Requires that Feature Effects has already been computed with request_feature_effect.

See get_feature_effect_metadata for retrieving information the available sources.

Parameters

sourcestring: The source Feature Effects are retrieved for.
data_slice_idstring, optional: ID for the data slice used in the request. If None, retrieve unsliced insight data.

Returns

feature_effectsFeatureEffects: The feature effects data.

Raises

ClientError (404): If the feature effects have not been computed or source is not valid value.

get_feature_effect_metadata()¶

Retrieve Feature Effects metadata. Response contains status and available model sources.

Feature Effect for the training partition is always available, with the exception of older projects that only supported Feature Effect for validation.
When a model is trained into validation or holdout without stacked predictions (i.e., no out-of-sample predictions in those partitions), Feature Effects is not available for validation or holdout.
Feature Effects for holdout is not available when holdout was not unlocked for the project.

Use source to retrieve Feature Effects, selecting one of the provided sources.

Returns

feature_effect_metadata: FeatureEffectMetadata

get_feature_effects_multiclass(source='training', class_=None)¶

Retrieve Feature Effects for the multiclass model.

Feature Effects provide partial dependence and predicted vs actual values for top-500 features ordered by feature impact score.

The partial dependence shows marginal effect of a feature on the target variable after accounting for the average effects of all other predictive features. It indicates how, holding all other variables except the feature of interest as they were, the value of this feature affects your prediction.

Requires that Feature Effects has already been computed with request_feature_effect.

See get_feature_effect_metadata for retrieving information the available sources.

Parameters

sourcestr: The source Feature Effects are retrieved for.
class_str or None: The class name Feature Effects are retrieved for.

Returns

list: The list of multiclass feature effects.

Raises

ClientError (404): If Feature Effects have not been computed or source is not valid value.

get_feature_impact(with_metadata=False, data_slice_filter=<datarobot.models.model.Sentinel object>)¶

Retrieve the computed Feature Impact results, a measure of the relevance of each feature in the model.

Feature Impact is computed for each column by creating new data with that column randomly permuted (but the others left unchanged), and seeing how the error metric score for the predictions is affected. The ‘impactUnnormalized’ is how much worse the error metric score is when making predictions on this modified data. The ‘impactNormalized’ is normalized so that the largest value is 1. In both cases, larger values indicate more important features.

If a feature is a redundant feature, i.e. once other features are considered it doesn’t contribute much in addition, the ‘redundantWith’ value is the name of feature that has the highest correlation with this feature. Note that redundancy detection is only available for jobs run after the addition of this feature. When retrieving data that predates this functionality, a NoRedundancyImpactAvailable warning will be used.

Elsewhere this technique is sometimes called ‘Permutation Importance’.

Requires that Feature Impact has already been computed with request_feature_impact.

Parameters

with_metadatabool: The flag indicating if the result should include the metadata as well.
data_slice_filterDataSlice, optional: A dataslice used to filter the return values based on the dataslice.id. By default, this function will use data_slice_filter.id == None which returns an unsliced insight. If data_slice_filter is None then get_feature_impact will raise a ValueError.

Returns

list or dict

The feature impact data response depends on the with_metadata parameter. The response is either a dict with metadata and a list with actual data or just a list with that data.

Each List item is a dict with the keys featureName, impactNormalized, and impactUnnormalized, redundantWith and count.

For dict response available keys are:

featureImpacts - Feature Impact data as a dictionary. Each item is a dict with
keys: featureName, impactNormalized, and impactUnnormalized, and redundantWith.

shapBased - A boolean that indicates whether Feature Impact was calculated using
Shapley values.

ranRedundancyDetection - A boolean that indicates whether redundant feature
identification was run while calculating this Feature Impact.

rowCount - An integer or None that indicates the number of rows that was used to
calculate Feature Impact. For the Feature Impact calculated with the default logic, without specifying the rowCount, we return None here.

count - An integer with the number of features under the featureImpacts.

Raises

ClientError (404): If the feature impacts have not been computed.
ValueError: If data_slice_filter passed as None

get_features_used()¶

Query the server to determine which features were used.

Note that the data returned by this method is possibly different than the names of the features in the featurelist used by this model. This method will return the raw features that must be supplied in order for predictions to be generated on a new set of data. The featurelist, in contrast, would also include the names of derived features.

Returns

featureslist of str: The names of the features used in the model.

Return type: List[str]

get_frozen_child_models()¶

Retrieve the IDs for all models that are frozen from this model.

Returns

A list of Models

get_labelwise_roc_curves(source, fallback_to_parent_insights=False)¶

Retrieve a list of LabelwiseRocCurve instances for a multilabel model the given source and all labels.

New in version v2.24.

Parameters

sourcestr: ROC curve data source. Check datarobot.enums.CHART_DATA_SOURCE for possible values.
fallback_to_parent_insightsbool: Optional, if True, this will return ROC curve data for this model’s parent if the ROC curve is not available for this model and the model has a defined parent model. If omitted or False, or there is no parent model, will not attempt to return data from this model’s parent.

Returns

list ofclass:LabelwiseRocCurve <datarobot.models.roc_curve.LabelwiseRocCurve>: Labelwise ROC Curve instances for source and all labels

Raises

ClientError: If the insight is not available for this model
(New in version v3.0) TypeError: If the underlying project type is binary

get_lift_chart(source, fallback_to_parent_insights=False, data_slice_filter=<datarobot.models.model.Sentinel object>)¶

Retrieve the model Lift chart for the specified source.

Parameters

sourcestr: Lift chart data source. Check datarobot.enums.CHART_DATA_SOURCE for possible values. (New in version v2.23) For time series and OTV models, also accepts values backtest_2, backtest_3, …, up to the number of backtests in the model.
fallback_to_parent_insightsbool: (New in version v2.14) Optional, if True, this will return lift chart data for this model’s parent if the lift chart is not available for this model and the model has a defined parent model. If omitted or False, or there is no parent model, will not attempt to return insight data from this model’s parent.
data_slice_filterDataSlice, optional: A dataslice used to filter the return values based on the dataslice.id. By default this function will use data_slice_filter.id == None which returns an unsliced insight. If data_slice_filter is None then get_lift_chart will raise a ValueError.

Returns

LiftChart: Model lift chart data

Raises

ClientError: If the insight is not available for this model
ValueError: If data_slice_filter passed as None

get_missing_report_info()¶

Retrieve a report on missing training data that can be used to understand missing values treatment in the model. The report consists of missing values resolutions for features numeric or categorical features that were part of building the model.

Returns

An iterable of MissingReportPerFeature: The queried model missing report, sorted by missing count (DESCENDING order).

get_model_blueprint_chart()¶

Retrieve a diagram that can be used to understand data flow in the blueprint.

Returns

ModelBlueprintChart: The queried model blueprint chart.

get_model_blueprint_documents()¶

Get documentation for tasks used in this model.

Returns

list of BlueprintTaskDocument: All documents available for the model.

get_model_blueprint_json()¶

Get the blueprint json representation used by this model.

Returns

BlueprintJson: Json representation of the blueprint stages.

Return type: Dict[str, Tuple[List[str], List[str], str]]

get_multiclass_feature_impact()¶

For multiclass it’s possible to calculate feature impact separately for each target class. The method for calculation is exactly the same, calculated in one-vs-all style for each target class.

Requires that Feature Impact has already been computed with request_feature_impact.

Returns

feature_impactslist of dict: The feature impact data. Each item is a dict with the keys ‘featureImpacts’ (list), ‘class’ (str). Each item in ‘featureImpacts’ is a dict with the keys ‘featureName’, ‘impactNormalized’, and ‘impactUnnormalized’, and ‘redundantWith’.

Raises

ClientError (404): If the multiclass feature impacts have not been computed.

get_multiclass_lift_chart(source, fallback_to_parent_insights=False)¶

Retrieve model Lift chart for the specified source.

Parameters

sourcestr: Lift chart data source. Check datarobot.enums.CHART_DATA_SOURCE for possible values.
fallback_to_parent_insightsbool: Optional, if True, this will return lift chart data for this model’s parent if the lift chart is not available for this model and the model has a defined parent model. If omitted or False, or there is no parent model, will not attempt to return insight data from this model’s parent.

Returns

list of LiftChart: Model lift chart data for each saved target class

Raises

ClientError: If the insight is not available for this model

get_multilabel_lift_charts(source, fallback_to_parent_insights=False)¶

Retrieve model Lift charts for the specified source.

New in version v2.24.

Parameters

sourcestr: Lift chart data source. Check datarobot.enums.CHART_DATA_SOURCE for possible values.
fallback_to_parent_insightsbool: Optional, if True, this will return lift chart data for this model’s parent if the lift chart is not available for this model and the model has a defined parent model. If omitted or False, or there is no parent model, will not attempt to return insight data from this model’s parent.

Returns

list of LiftChart: Model lift chart data for each saved target class

Raises

ClientError: If the insight is not available for this model

get_num_iterations_trained()¶

Retrieves the number of estimators trained by early-stopping tree-based models.

– versionadded:: v2.22

Returns

projectId: str: id of project containing the model
modelId: str: id of the model
data: array: list of numEstimatorsItem objects, one for each modeling stage.
numEstimatorsItem will be of the form:
stage: str: indicates the modeling stage (for multi-stage models); None of single-stage models
numIterations: int: the number of estimators or iterations trained by the model

get_or_request_feature_effect(source, max_wait=600, row_count=None, data_slice_id=None)¶

Retrieve Feature Effects for the model, requesting a new job if it hasn’t been run previously.

See get_feature_effect_metadata for retrieving information of source.

Parameters

sourcestring: The source Feature Effects are retrieved for.
max_waitint, optional: The maximum time to wait for a requested Feature Effect job to complete before erroring.
row_countint, optional: (New in version v2.21) The sample size to use for Feature Impact computation. Minimum is 10 rows. Maximum is 100000 rows or the training sample size of the model, whichever is less.
data_slice_idstr, optional: ID for the data slice used in the request. If None, request unsliced insight data.

Returns

feature_effectsFeatureEffects: The Feature Effects data.

get_or_request_feature_effects_multiclass(source, top_n_features=None, features=None, row_count=None, class_=None, max_wait=600)¶

Retrieve Feature Effects for the multiclass model, requesting a job if it hasn’t been run previously.

Parameters

sourcestring: The source Feature Effects retrieve for.
class_str or None: The class name Feature Effects retrieve for.
row_countint: The number of rows from dataset to use for Feature Impact calculation.
top_n_featuresint or None: Number of top features (ranked by Feature Impact) used to calculate Feature Effects.
featureslist or None: The list of features used to calculate Feature Effects.
max_waitint, optional: The maximum time to wait for a requested Feature Effects job to complete before erroring.

Returns

feature_effectslist of FeatureEffectsMulticlass: The list of multiclass feature effects data.

get_or_request_feature_impact(max_wait=600, **kwargs)¶

Retrieve feature impact for the model, requesting a job if it hasn’t been run previously

Parameters

max_waitint, optional: The maximum time to wait for a requested feature impact job to complete before erroring
**kwargs: Arbitrary keyword arguments passed to request_feature_impact.

Returns

feature_impactslist or dict: The feature impact data. See get_feature_impact for the exact schema.

get_parameters()¶

Retrieve model parameters.

Returns

ModelParameters: Model parameters for this model.

get_pareto_front()¶

Retrieve the Pareto Front for a Eureqa model.

This method is only supported for Eureqa models.

Returns

ParetoFront: Model ParetoFront data

get_prime_eligibility()¶

Check if this model can be approximated with DataRobot Prime

Returns

prime_eligibilitydict: a dict indicating whether a model can be approximated with DataRobot Prime (key can_make_prime) and why it may be ineligible (key message)

get_residuals_chart(source, fallback_to_parent_insights=False, data_slice_filter=<datarobot.models.model.Sentinel object>)¶

Retrieve model residuals chart for the specified source.

Parameters

sourcestr: Residuals chart data source. Check datarobot.enums.CHART_DATA_SOURCE for possible values.
fallback_to_parent_insightsbool: Optional, if True, this will return residuals chart data for this model’s parent if the residuals chart is not available for this model and the model has a defined parent model. If omitted or False, or there is no parent model, will not attempt to return residuals data from this model’s parent.
data_slice_filterDataSlice, optional: A dataslice used to filter the return values based on the dataslice.id. By default this function will use data_slice_filter.id == None which returns an unsliced insight. If data_slice_filter is None then get_residuals_chart will raise a ValueError.

Returns

ResidualsChart: Model residuals chart data

Raises

ClientError: If the insight is not available for this model
ValueError: If data_slice_filter passed as None

get_roc_curve(source, fallback_to_parent_insights=False, data_slice_filter=<datarobot.models.model.Sentinel object>)¶

Retrieve the ROC curve for a binary model for the specified source.

Parameters

sourcestr: ROC curve data source. Check datarobot.enums.CHART_DATA_SOURCE for possible values. (New in version v2.23) For time series and OTV models, also accepts values backtest_2, backtest_3, …, up to the number of backtests in the model.
fallback_to_parent_insightsbool: (New in version v2.14) Optional, if True, this will return ROC curve data for this model’s parent if the ROC curve is not available for this model and the model has a defined parent model. If omitted or False, or there is no parent model, will not attempt to return data from this model’s parent.
data_slice_filterDataSlice, optional: A dataslice used to filter the return values based on the dataslice.id. By default this function will use data_slice_filter.id == None which returns an unsliced insight. If data_slice_filter is None then get_roc_curve will raise a ValueError.

Returns

RocCurve: Model ROC curve data

Raises

ClientError: If the insight is not available for this model
(New in version v3.0) TypeError: If the underlying project type is multilabel
ValueError: If data_slice_filter passed as None

get_rulesets()¶

List the rulesets approximating this model generated by DataRobot Prime

If this model hasn’t been approximated yet, will return an empty list. Note that these are rulesets approximating this model, not rulesets used to construct this model.

Returns

rulesetslist of Ruleset

Return type: List[Ruleset]

get_supported_capabilities()¶

Retrieves a summary of the capabilities supported by a model.

New in version v2.14.

Returns

supportsBlending: bool

whether the model supports blending

supportsMonotonicConstraints: bool

whether the model supports monotonic constraints

hasWordCloud: bool

whether the model has word cloud data available

eligibleForPrime: bool

whether the model is eligible for Prime

hasParameters: bool

whether the model has parameters that can be retrieved

supportsCodeGeneration: bool

(New in version v2.18) whether the model supports code generation

supportsShap: bool

(New in version v2.18) True if the model supports Shapley package. i.e. Shapley based: feature Importance

supportsEarlyStopping: bool

(New in version v2.22) True if this is an early stopping tree-based model and number of trained iterations can be retrieved.

get_uri()¶

Returns

urlstr: Permanent static hyperlink to this model at leaderboard.

Return type: str

get_word_cloud(exclude_stop_words=False)¶

Retrieve word cloud data for the model.

Parameters

exclude_stop_wordsbool, optional: Set to True if you want stopwords filtered out of response.

Returns

WordCloud: Word cloud data for the model.

incremental_train(data_stage_id, training_data_name=None)¶

Submit a job to the queue to perform incremental training on an existing model using additional data. The id of the additional data to use for training is specified with the data_stage_id. Optionally a name for the iteration can be supplied by the user to help identify the contents of data in the iteration.

This functionality requires the INCREMENTAL_LEARNING feature flag to be enabled.

Parameters

data_stage_id: str: The id of the data stage to use for training.
training_data_namestr, optional: The name of the iteration or data stage to indicate what the incremental learning was performed on.

Returns

jobModelJob: The created job that is retraining the model

Return type: ModelJob

open_in_browser()¶

Opens class’ relevant web browser location. If default browser is not available the URL is logged.

Note: If text-mode browsers are used, the calling process will block until the user exits the browser.

Return type: None

request_cross_class_accuracy_scores()¶

Request data disparity insights to be computed for the model.

Returns

status_idstr: A statusId of computation request.

request_data_disparity_insights(feature, compared_class_names)¶

Request data disparity insights to be computed for the model.

Parameters

featurestr: Bias and Fairness protected feature name.
compared_class_nameslist(str): List of two classes to compare

Returns

status_idstr: A statusId of computation request.

request_external_test(dataset_id, actual_value_column=None)¶

Request external test to compute scores and insights on an external test dataset

Parameters

dataset_idstring: The dataset to make predictions against (as uploaded from Project.upload_dataset)
actual_value_columnstring, optional: (New in version v2.21) For time series unsupervised projects only. Actual value column can be used to calculate the classification metrics and insights on the prediction dataset. Can’t be provided with the forecast_point parameter.
Returns
——-
jobJob: a Job representing external dataset insights computation

request_fairness_insights(fairness_metrics_set=None)¶

Request fairness insights to be computed for the model.

Parameters

fairness_metrics_setstr, optional: Can be one of <datarobot.enums.FairnessMetricsSet>. The fairness metric used to calculate the fairness scores.

Returns

status_idstr: A statusId of computation request.

request_feature_effect(row_count=None, data_slice_id=None)¶

Submit request to compute Feature Effects for the model.

See get_feature_effect for more information on the result of the job.

Parameters

row_countint: (New in version v2.21) The sample size to use for Feature Impact computation. Minimum is 10 rows. Maximum is 100000 rows or the training sample size of the model, whichever is less.
data_slice_idstr, optional: ID for the data slice used in the request. If None, request unsliced insight data.

Returns

jobJob: A Job representing the feature effect computation. To get the completed feature effect data, use job.get_result or job.get_result_when_complete.

Raises

JobAlreadyRequested (422): If the feature effect have already been requested.

request_feature_effects_multiclass(row_count=None, top_n_features=None, features=None)¶

Request Feature Effects computation for the multiclass model.

See get_feature_effect for more information on the result of the job.

Parameters

row_countint: The number of rows from dataset to use for Feature Impact calculation.
top_n_featuresint or None: Number of top features (ranked by feature impact) used to calculate Feature Effects.
featureslist or None: The list of features used to calculate Feature Effects.

Returns

jobJob: A Job representing Feature Effect computation. To get the completed Feature Effect data, use job.get_result or job.get_result_when_complete.

request_feature_impact(row_count=None, with_metadata=False, data_slice_id=None)¶

Request feature impacts to be computed for the model.

See get_feature_impact for more information on the result of the job.

Parameters

row_countint, optional: The sample size (specified in rows) to use for Feature Impact computation. This is not supported for unsupervised, multiclass (which has a separate method), and time series projects.
with_metadatabool, optional: Flag indicating whether the result should include the metadata. If true, metadata is included.
data_slice_idstr, optional: ID for the data slice used in the request. If None, request unsliced insight data.

Returns

jobJob or status_id: Job representing the Feature Impact computation. To retrieve the completed Feature Impact data, use job.get_result or job.get_result_when_complete.

Raises

JobAlreadyRequested (422): If the feature impacts have already been requested.

request_lift_chart(source, data_slice_id=None)¶

Request the model Lift Chart for the specified source.

Parameters

sourcestr: Lift chart data source. Check datarobot.enums.CHART_DATA_SOURCE for possible values.
data_slice_idstring, optional: ID for the data slice used in the request. If None, request unsliced insight data.

Returns

status_check_jobStatusCheckJob: Object contains all needed logic for a periodical status check of an async job.

Return type: StatusCheckJob

request_predictions(dataset_id=None, dataset=None, dataframe=None, file_path=None, file=None, include_prediction_intervals=None, prediction_intervals_size=None, forecast_point=None, predictions_start_date=None, predictions_end_date=None, actual_value_column=None, explanation_algorithm=None, max_explanations=None, max_ngram_explanations=None)¶

Requests predictions against a previously uploaded dataset.

Parameters

dataset_idstring, optional: The ID of the dataset to make predictions against (as uploaded from Project.upload_dataset)
datasetDataset, optional: The dataset to make predictions against (as uploaded from Project.upload_dataset)
dataframepd.DataFrame, optional: (New in v3.0) The dataframe to make predictions against
file_pathstr, optional: (New in v3.0) Path to file to make predictions against
fileIOBase, optional: (New in v3.0) File to make predictions against
include_prediction_intervalsbool, optional: (New in v2.16) For time series projects only. Specifies whether prediction intervals should be calculated for this request. Defaults to True if prediction_intervals_size is specified, otherwise defaults to False.
prediction_intervals_sizeint, optional: (New in v2.16) For time series projects only. Represents the percentile to use for the size of the prediction intervals. Defaults to 80 if include_prediction_intervals is True. Prediction intervals size must be between 1 and 100 (inclusive).
forecast_pointdatetime.datetime or None, optional: (New in version v2.20) For time series projects only. This is the default point relative to which predictions will be generated, based on the forecast window of the project. See the time series prediction documentation for more information.
predictions_start_datedatetime.datetime or None, optional: (New in version v2.20) For time series projects only. The start date for bulk predictions. Note that this parameter is for generating historical predictions using the training data. This parameter should be provided in conjunction with predictions_end_date. Can’t be provided with the forecast_point parameter.
predictions_end_datedatetime.datetime or None, optional: (New in version v2.20) For time series projects only. The end date for bulk predictions, exclusive. Note that this parameter is for generating historical predictions using the training data. This parameter should be provided in conjunction with predictions_start_date. Can’t be provided with the forecast_point parameter.
actual_value_columnstring, optional: (New in version v2.21) For time series unsupervised projects only. Actual value column can be used to calculate the classification metrics and insights on the prediction dataset. Can’t be provided with the forecast_point parameter.
explanation_algorithm: (New in version v2.21) optional; If set to ‘shap’, the: response will include prediction explanations based on the SHAP explainer (SHapley Additive exPlanations). Defaults to null (no prediction explanations).
max_explanations: (New in version v2.21) int optional; specifies the maximum number of: explanation values that should be returned for each row, ordered by absolute value, greatest to least. If null, no limit. In the case of ‘shap’: if the number of features is greater than the limit, the sum of remaining values will also be returned as shapRemainingTotal. Defaults to null. Cannot be set if explanation_algorithm is omitted.
max_ngram_explanations: optional; int or str: (New in version v2.29) Specifies the maximum number of text explanation values that should be returned. If set to all, text explanations will be computed and all the ngram explanations will be returned. If set to a non zero positive integer value, text explanations will be computed and this amount of descendingly sorted ngram explanations will be returned. By default text explanation won’t be triggered to be computed.

Returns

jobPredictJob: The job computing the predictions

Return type: PredictJob

request_residuals_chart(source, data_slice_id=None)¶

Request the model residuals chart for the specified source.

Parameters

sourcestr: Residuals chart data source. Check datarobot.enums.CHART_DATA_SOURCE for possible values.
data_slice_idstring, optional: ID for the data slice used in the request. If None, request unsliced insight data.

Returns

status_check_jobStatusCheckJob: Object contains all needed logic for a periodical status check of an async job.

Return type: StatusCheckJob

request_roc_curve(source, data_slice_id=None)¶

Request the model Roc Curve for the specified source.

Parameters

sourcestr: Roc Curve data source. Check datarobot.enums.CHART_DATA_SOURCE for possible values.
data_slice_idstring, optional: ID for the data slice used in the request. If None, request unsliced insight data.

Returns

status_check_jobStatusCheckJob: Object contains all needed logic for a periodical status check of an async job.

Return type: StatusCheckJob

request_training_predictions(data_subset, explanation_algorithm=None, max_explanations=None)¶

Start a job to build training predictions

Parameters

data_subsetstr

data set definition to build predictions on. Choices are:

dr.enums.DATA_SUBSET.ALL or string all for all data available. Not valid for
models in datetime partitioned projects

dr.enums.DATA_SUBSET.VALIDATION_AND_HOLDOUT or string validationAndHoldout for
all data except training set. Not valid for models in datetime partitioned projects

dr.enums.DATA_SUBSET.HOLDOUT or string holdout for holdout data set only

dr.enums.DATA_SUBSET.ALL_BACKTESTS or string allBacktests for downloading
the predictions for all backtest validation folds. Requires the model to have successfully scored all backtests. Datetime partitioned projects only.

explanation_algorithmdr.enums.EXPLANATIONS_ALGORITHM

(New in v2.21) Optional. If set to dr.enums.EXPLANATIONS_ALGORITHM.SHAP, the response will include prediction explanations based on the SHAP explainer (SHapley Additive exPlanations). Defaults to None (no prediction explanations).

max_explanationsint

(New in v2.21) Optional. Specifies the maximum number of explanation values that should be returned for each row, ordered by absolute value, greatest to least. In the case of dr.enums.EXPLANATIONS_ALGORITHM.SHAP: If not set, explanations are returned for all features. If the number of features is greater than the max_explanations, the sum of remaining values will also be returned as shap_remaining_total. Max 100. Defaults to null for datasets narrower than 100 columns, defaults to 100 for datasets wider than 100 columns. Is ignored if explanation_algorithm is not set.

Returns

Job: an instance of created async job

request_transferable_export(prediction_intervals_size=None)¶

Request generation of an exportable model file for use in an on-premise DataRobot standalone prediction environment.

This function can only be used if model export is enabled, and will only be useful if you have an on-premise environment in which to import it.

This function does not download the exported file. Use download_export for that.

Parameters

prediction_intervals_sizeint, optional: (New in v2.19) For time series projects only. Represents the percentile to use for the size of the prediction intervals. Prediction intervals size must be between 1 and 100 (inclusive).

Returns

Job

Examples

model = datarobot.Model.get('project-id', 'model-id')
job = model.request_transferable_export()
job.wait_for_completion()
model.download_export('my_exported_model.drmodel')

# Client must be configured to use standalone prediction server for import:
datarobot.Client(token='my-token-at-standalone-server',
                 endpoint='standalone-server-url/api/v2')

imported_model = datarobot.ImportedModel.create('my_exported_model.drmodel')

Return type: Job

retrain(sample_pct=None, featurelist_id=None, training_row_count=None, n_clusters=None)¶

Submit a job to the queue to train a blender model.

Parameters

sample_pct: float, optional: The sample size in percents (1 to 100) to use in training. If this parameter is used then training_row_count should not be given.
featurelist_idstr, optional: The featurelist id
training_row_countint, optional: The number of rows used to train the model. If this parameter is used, then sample_pct should not be given.
n_clusters: int, optional: (new in version 2.27) number of clusters to use in an unsupervised clustering model. This parameter is used only for unsupervised clustering models that do not determine the number of clusters automatically.

Returns

jobModelJob: The created job that is retraining the model

Return type: ModelJob

set_prediction_threshold(threshold)¶

Set a custom prediction threshold for the model.

May not be used once prediction_threshold_read_only is True for this model.

Parameters

thresholdfloat: only used for binary classification projects. The threshold to when deciding between the positive and negative classes when making predictions. Should be between 0.0 and 1.0 (inclusive).

star_model()¶

Mark the model as starred.

Model stars propagate to the web application and the API, and can be used to filter when listing models.

Return type: None

start_advanced_tuning_session()¶

Start an Advanced Tuning session. Returns an object that helps set up arguments for an Advanced Tuning model execution.

As of v2.17, all models other than blenders, open source, prime, baseline and user-created support Advanced Tuning.

Returns

AdvancedTuningSession: Session for setting up and running Advanced Tuning on a model

unstar_model()¶

Unmark the model as starred.

Model stars propagate to the web application and the API, and can be used to filter when listing models.

Return type: None

BlenderModel¶

class datarobot.models.BlenderModel(id=None, processes=None, featurelist_name=None, featurelist_id=None, project_id=None, sample_pct=None, training_row_count=None, training_duration=None, training_start_date=None, training_end_date=None, model_type=None, model_category=None, is_frozen=None, blueprint_id=None, metrics=None, model_ids=None, blender_method=None, monotonic_increasing_featurelist_id=None, monotonic_decreasing_featurelist_id=None, supports_monotonic_constraints=None, is_starred=None, prediction_threshold=None, prediction_threshold_read_only=None, model_number=None, parent_model_id=None, supports_composable_ml=None)¶

Represents blender model that combines prediction results from other models.

All durations are specified with a duration string such as those returned by the partitioning_methods.construct_duration_string helper method. Please see datetime partitioned project documentation for more information on duration strings.

Attributes

idstr: the id of the model
project_idstr: the id of the project the model belongs to
processeslist of str: the processes used by the model
featurelist_namestr: the name of the featurelist used by the model
featurelist_idstr: the id of the featurelist used by the model
sample_pctfloat: the percentage of the project dataset used in training the model
training_row_countint or None: the number of rows of the project dataset used in training the model. In a datetime partitioned project, if specified, defines the number of rows used to train the model and evaluate backtest scores; if unspecified, either training_duration or training_start_date and training_end_date was used to determine that instead.
training_durationstr or None: only present for models in datetime partitioned projects. If specified, a duration string specifying the duration spanned by the data used to train the model and evaluate backtest scores.
training_start_datedatetime or None: only present for frozen models in datetime partitioned projects. If specified, the start date of the data used to train the model.
training_end_datedatetime or None: only present for frozen models in datetime partitioned projects. If specified, the end date of the data used to train the model.
model_typestr: what model this is, e.g. ‘DataRobot Prime’
model_categorystr: what kind of model this is - always ‘prime’ for DataRobot Prime models
is_frozenbool: whether this model is a frozen model
blueprint_idstr: the id of the blueprint used in this model
metricsdict: a mapping from each metric to the model’s scores for that metric
model_idslist of str: List of model ids used in blender
blender_methodstr: Method used to blend results from underlying models
monotonic_increasing_featurelist_idstr: optional, the id of the featurelist that defines the set of features with a monotonically increasing relationship to the target. If None, no such constraints are enforced.
monotonic_decreasing_featurelist_idstr: optional, the id of the featurelist that defines the set of features with a monotonically decreasing relationship to the target. If None, no such constraints are enforced.
supports_monotonic_constraintsbool: optional, whether this model supports enforcing monotonic constraints
is_starredbool: whether this model marked as starred
prediction_thresholdfloat: for binary classification projects, the threshold used for predictions
prediction_threshold_read_onlybool: indicated whether modification of the prediction threshold is forbidden. Threshold modification is forbidden once a model has had a deployment created or predictions made via the dedicated prediction API.
model_numberinteger: model number assigned to a model
parent_model_idstr or None: (New in version v2.20) the id of the model that tuning parameters are derived from
supports_composable_mlbool or None: (New in version v2.26) whether this model is supported in the Composable ML.

classmethod get(project_id, model_id)¶

Retrieve a specific blender.

Parameters

project_idstr: The project’s id.
model_idstr: The model_id of the leaderboard item to retrieve.

Returns

modelBlenderModel: The queried instance.

advanced_tune(params, description=None)¶

Generate a new model with the specified advanced-tuning parameters

As of v2.17, all models other than blenders, open source, prime, baseline and user-created support Advanced Tuning.

Parameters

paramsdict: Mapping of parameter ID to parameter value. The list of valid parameter IDs for a model can be found by calling get_advanced_tuning_parameters(). This endpoint does not need to include values for all parameters. If a parameter is omitted, its current_value will be used.
descriptionstr: Human-readable string describing the newly advanced-tuned model

Returns

ModelJob: The created job to build the model

Return type: ModelJob

cross_validate()¶

Run cross validation on the model.

Note

To perform Cross Validation on a new model with new parameters, use train instead.

Returns

ModelJob: The created job to build the model

delete()¶

Delete a model from the project’s leaderboard.

Return type: None

download_export(filepath)¶

Download an exportable model file for use in an on-premise DataRobot standalone prediction environment.

This function can only be used if model export is enabled, and will only be useful if you have an on-premise environment in which to import it.

Parameters

filepathstr: The path at which to save the exported model file.

Return type: None

download_scoring_code(file_name, source_code=False)¶

Download the Scoring Code JAR.

Parameters

file_namestr: File path where scoring code will be saved.
source_codebool, optional: Set to True to download source code archive. It will not be executable.

download_training_artifact(file_name)¶

Retrieve trained artifact(s) from a model containing one or more custom tasks.

Artifact(s) will be downloaded to the specified local filepath.

Parameters

file_namestr: File path where trained model artifact(s) will be saved.

classmethod from_data(data)¶

Instantiate an object of this class using a dict.

Parameters

datadict: Correctly snake_cased keys and their values.

Return type: TypeVar(T, bound= APIObject)

get_advanced_tuning_parameters()¶

Get the advanced-tuning parameters available for this model.

As of v2.17, all models other than blenders, open source, prime, baseline and user-created support Advanced Tuning.

Returns

dict

A dictionary describing the advanced-tuning parameters for the current model. There are two top-level keys, tuning_description and tuning_parameters.

tuning_description an optional value. If not None, then it indicates the user-specified description of this set of tuning parameter.

tuning_parameters is a list of a dicts, each has the following keys

parameter_name : (str) name of the parameter (unique per task, see below)
parameter_id : (str) opaque ID string uniquely identifying parameter
default_value : (*) the actual value used to train the model; either the single value of the parameter specified before training, or the best value from the list of grid-searched values (based on current_value)
current_value : (*) the single value or list of values of the parameter that were grid searched. Depending on the grid search specification, could be a single fixed value (no grid search), a list of discrete values, or a range.
task_name : (str) name of the task that this parameter belongs to
constraints: (dict) see the notes below
vertex_id: (str) ID of vertex that this parameter belongs to

Notes

The type of default_value and current_value is defined by the constraints structure. It will be a string or numeric Python type.

constraints is a dict with at least one, possibly more, of the following keys. The presence of a key indicates that the parameter may take on the specified type. (If a key is absent, this means that the parameter may not take on the specified type.) If a key on constraints is present, its value will be a dict containing all of the fields described below for that key.

"constraints": {
    "select": {
        "values": [<list(basestring or number) : possible values>]
    },
    "ascii": {},
    "unicode": {},
    "int": {
        "min": <int : minimum valid value>,
        "max": <int : maximum valid value>,
        "supports_grid_search": <bool : True if Grid Search may be
                                        requested for this param>
    },
    "float": {
        "min": <float : minimum valid value>,
        "max": <float : maximum valid value>,
        "supports_grid_search": <bool : True if Grid Search may be
                                        requested for this param>
    },
    "intList": {
        "min_length": <int : minimum valid length>,
        "max_length": <int : maximum valid length>
        "min_val": <int : minimum valid value>,
        "max_val": <int : maximum valid value>
        "supports_grid_search": <bool : True if Grid Search may be
                                        requested for this param>
    },
    "floatList": {
        "min_length": <int : minimum valid length>,
        "max_length": <int : maximum valid length>
        "min_val": <float : minimum valid value>,
        "max_val": <float : maximum valid value>
        "supports_grid_search": <bool : True if Grid Search may be
                                        requested for this param>
    }
}

The keys have meaning as follows:

select: Rather than specifying a specific data type, if present, it indicates that the parameter is permitted to take on any of the specified values. Listed values may be of any string or real (non-complex) numeric type.
ascii: The parameter may be a unicode object that encodes simple ASCII characters. (A-Z, a-z, 0-9, whitespace, and certain common symbols.) In addition to listed constraints, ASCII keys currently may not contain either newlines or semicolons.
unicode: The parameter may be any Python unicode object.
int: The value may be an object of type int within the specified range (inclusive). Please note that the value will be passed around using the JSON format, and some JSON parsers have undefined behavior with integers outside of the range [-(2**53)+1, (2**53)-1].
float: The value may be an object of type float within the specified range (inclusive).
intList, floatList: The value may be a list of int or float objects, respectively, following constraints as specified respectively by the int and float types (above).

Many parameters only specify one key under constraints. If a parameter specifies multiple keys, the parameter may take on any value permitted by any key.

Return type: AdvancedTuningParamsType

get_all_confusion_charts(fallback_to_parent_insights=False)¶

Retrieve a list of all confusion matrices available for the model.

Parameters

fallback_to_parent_insightsbool: (New in version v2.14) Optional, if True, this will return confusion chart data for this model’s parent for any source that is not available for this model and if this has a defined parent model. If omitted or False, or this model has no parent, this will not attempt to retrieve any data from this model’s parent.

Returns

list of ConfusionChart: Data for all available confusion charts for model.

get_all_feature_impacts(data_slice_filter=None)¶

Retrieve a list of all feature impact results available for the model.

Parameters

data_slice_filterDataSlice, optional: A dataslice used to filter the return values based on the dataslice.id. By default, this function will use data_slice_filter.id == None which returns an unsliced insight. If data_slice_filter is None then no data_slice filtering will be applied when requesting the roc_curve.

Returns

list of dicts: Data for all available model feature impacts. Or an empty list if not data found.

Examples

model = datarobot.Model(id='model-id', project_id='project-id')

# Get feature impact insights for sliced data
data_slice = datarobot.DataSlice(id='data-slice-id')
sliced_fi = model.get_all_feature_impacts(data_slice_filter=data_slice)

# Get feature impact insights for unsliced data
data_slice = datarobot.DataSlice()
unsliced_fi = model.get_all_feature_impacts(data_slice_filter=data_slice)

# Get all feature impact insights
all_fi = model.get_all_feature_impacts()

get_all_lift_charts(fallback_to_parent_insights=False, data_slice_filter=None)¶

Retrieve a list of all Lift charts available for the model.

Parameters

fallback_to_parent_insightsbool, optional: (New in version v2.14) Optional, if True, this will return lift chart data for this model’s parent for any source that is not available for this model and if this model has a defined parent model. If omitted or False, or this model has no parent, this will not attempt to retrieve any data from this model’s parent.
data_slice_filterDataSlice, optional: Filters the returned lift chart by data_slice_filter.id. If None (the default) applies no filter based on data_slice_id.

Returns

list of LiftChart: Data for all available model lift charts. Or an empty list if no data found.

Examples

model = datarobot.Model.get('project-id', 'model-id')

# Get lift chart insights for sliced data
sliced_lift_charts = model.get_all_lift_charts(data_slice_id='data-slice-id')

# Get lift chart insights for unsliced data
unsliced_lift_charts = model.get_all_lift_charts(unsliced_only=True)

# Get all lift chart insights
all_lift_charts = model.get_all_lift_charts()

get_all_multiclass_lift_charts(fallback_to_parent_insights=False)¶

Retrieve a list of all Lift charts available for the model.

Parameters

fallback_to_parent_insightsbool: (New in version v2.14) Optional, if True, this will return lift chart data for this model’s parent for any source that is not available for this model and if this model has a defined parent model. If omitted or False, or this model has no parent, this will not attempt to retrieve any data from this model’s parent.

Returns

list of LiftChart: Data for all available model lift charts.

get_all_residuals_charts(fallback_to_parent_insights=False, data_slice_filter=None)¶

Retrieve a list of all residuals charts available for the model.

Parameters

fallback_to_parent_insightsbool: Optional, if True, this will return residuals chart data for this model’s parent for any source that is not available for this model and if this model has a defined parent model. If omitted or False, or this model has no parent, this will not attempt to retrieve any data from this model’s parent.
data_slice_filterDataSlice, optional: Filters the returned residuals charts by data_slice_filter.id. If None (the default) applies no filter based on data_slice_id.

Returns

list of ResidualsChart: Data for all available model residuals charts.

Examples

model = datarobot.Model.get('project-id', 'model-id')

# Get residuals chart insights for sliced data
sliced_residuals_charts = model.get_all_residuals_charts(data_slice_id='data-slice-id')

# Get residuals chart insights for unsliced data
unsliced_residuals_charts = model.get_all_residuals_charts(unsliced_only=True)

# Get all residuals chart insights
all_residuals_charts = model.get_all_residuals_charts()

get_all_roc_curves(fallback_to_parent_insights=False, data_slice_filter=None)¶

Retrieve a list of all ROC curves available for the model.

Parameters

fallback_to_parent_insightsbool: (New in version v2.14) Optional, if True, this will return ROC curve data for this model’s parent for any source that is not available for this model and if this model has a defined parent model. If omitted or False, or this model has no parent, this will not attempt to retrieve any data from this model’s parent.
data_slice_filterDataSlice, optional: filters the returned roc_curve by data_slice_filter.id. If None (the default) applies no filter based on data_slice_id.

Returns

list of RocCurve: Data for all available model ROC curves. Or an empty list if no RocCurves are found.

Examples

model = datarobot.Model.get('project-id', 'model-id')
ds_filter=DataSlice(id='data-slice-id')

# Get roc curve insights for sliced data
sliced_roc = model.get_all_roc_curves(data_slice_filter=ds_filter)

# Get roc curve insights for unsliced data
data_slice_filter=DataSlice(id=None)
unsliced_roc = model.get_all_roc_curves(data_slice_filter=ds_filter)

# Get all roc curve insights
all_roc_curves = model.get_all_roc_curves()

get_confusion_chart(source, fallback_to_parent_insights=False)¶

Retrieve them model’s confusion matrix for the specified source.

Parameters

sourcestr: Confusion chart source. Check datarobot.enums.CHART_DATA_SOURCE for possible values.
fallback_to_parent_insightsbool: (New in version v2.14) Optional, if True, this will return confusion chart data for this model’s parent if the confusion chart is not available for this model and the defined parent model. If omitted or False, or there is no parent model, will not attempt to return insight data from this model’s parent.

Returns

ConfusionChart: Model ConfusionChart data

Raises

ClientError: If the insight is not available for this model

get_cross_class_accuracy_scores()¶

Retrieves a list of Cross Class Accuracy scores for the model.

Returns

json

get_cross_validation_scores(partition=None, metric=None)¶

Return a dictionary, keyed by metric, showing cross validation scores per partition.

Cross Validation should already have been performed using cross_validate or train.

Note

Models that computed cross validation before this feature was added will need to be deleted and retrained before this method can be used.

Parameters

partitionfloat: optional, the id of the partition (1,2,3.0,4.0,etc…) to filter results by can be a whole number positive integer or float value. 0 corresponds to the validation partition.
metric: unicode: optional name of the metric to filter to resulting cross validation scores by

Returns

cross_validation_scores: dict: A dictionary keyed by metric showing cross validation scores per partition.

get_data_disparity_insights(feature, class_name1, class_name2)¶

Retrieve a list of Cross Class Data Disparity insights for the model.

Parameters

featurestr: Bias and Fairness protected feature name.
class_name1str: One of the compared classes
class_name2str: Another compared class

Returns

json

get_fairness_insights(fairness_metrics_set=None, offset=0, limit=100)¶

Retrieve a list of Per Class Bias insights for the model.

Parameters

fairness_metrics_setstr, optional: Can be one of <datarobot.enums.FairnessMetricsSet>. The fairness metric used to calculate the fairness scores.
offsetint, optional: Number of items to skip.
limitint, optional: Number of items to return.

Returns

json

get_feature_effect(source, data_slice_id=None)¶

Retrieve Feature Effects for the model.

Feature Effects provides partial dependence and predicted vs actual values for top-500 features ordered by feature impact score.

The partial dependence shows marginal effect of a feature on the target variable after accounting for the average effects of all other predictive features. It indicates how, holding all other variables except the feature of interest as they were, the value of this feature affects your prediction.

Requires that Feature Effects has already been computed with request_feature_effect.

See get_feature_effect_metadata for retrieving information the available sources.

Parameters

sourcestring: The source Feature Effects are retrieved for.
data_slice_idstring, optional: ID for the data slice used in the request. If None, retrieve unsliced insight data.

Returns

feature_effectsFeatureEffects: The feature effects data.

Raises

ClientError (404): If the feature effects have not been computed or source is not valid value.

get_feature_effect_metadata()¶

Retrieve Feature Effects metadata. Response contains status and available model sources.

Feature Effect for the training partition is always available, with the exception of older projects that only supported Feature Effect for validation.
When a model is trained into validation or holdout without stacked predictions (i.e., no out-of-sample predictions in those partitions), Feature Effects is not available for validation or holdout.
Feature Effects for holdout is not available when holdout was not unlocked for the project.

Use source to retrieve Feature Effects, selecting one of the provided sources.

Returns

feature_effect_metadata: FeatureEffectMetadata

get_feature_effects_multiclass(source='training', class_=None)¶

Retrieve Feature Effects for the multiclass model.

Feature Effects provide partial dependence and predicted vs actual values for top-500 features ordered by feature impact score.

The partial dependence shows marginal effect of a feature on the target variable after accounting for the average effects of all other predictive features. It indicates how, holding all other variables except the feature of interest as they were, the value of this feature affects your prediction.

Requires that Feature Effects has already been computed with request_feature_effect.

See get_feature_effect_metadata for retrieving information the available sources.

Parameters

sourcestr: The source Feature Effects are retrieved for.
class_str or None: The class name Feature Effects are retrieved for.

Returns

list: The list of multiclass feature effects.

Raises

ClientError (404): If Feature Effects have not been computed or source is not valid value.

get_feature_impact(with_metadata=False, data_slice_filter=<datarobot.models.model.Sentinel object>)¶

Retrieve the computed Feature Impact results, a measure of the relevance of each feature in the model.

Feature Impact is computed for each column by creating new data with that column randomly permuted (but the others left unchanged), and seeing how the error metric score for the predictions is affected. The ‘impactUnnormalized’ is how much worse the error metric score is when making predictions on this modified data. The ‘impactNormalized’ is normalized so that the largest value is 1. In both cases, larger values indicate more important features.

If a feature is a redundant feature, i.e. once other features are considered it doesn’t contribute much in addition, the ‘redundantWith’ value is the name of feature that has the highest correlation with this feature. Note that redundancy detection is only available for jobs run after the addition of this feature. When retrieving data that predates this functionality, a NoRedundancyImpactAvailable warning will be used.

Elsewhere this technique is sometimes called ‘Permutation Importance’.

Requires that Feature Impact has already been computed with request_feature_impact.

Parameters

with_metadatabool: The flag indicating if the result should include the metadata as well.
data_slice_filterDataSlice, optional: A dataslice used to filter the return values based on the dataslice.id. By default, this function will use data_slice_filter.id == None which returns an unsliced insight. If data_slice_filter is None then get_feature_impact will raise a ValueError.

Returns

list or dict

The feature impact data response depends on the with_metadata parameter. The response is either a dict with metadata and a list with actual data or just a list with that data.

Each List item is a dict with the keys featureName, impactNormalized, and impactUnnormalized, redundantWith and count.

For dict response available keys are:

featureImpacts - Feature Impact data as a dictionary. Each item is a dict with
keys: featureName, impactNormalized, and impactUnnormalized, and redundantWith.

shapBased - A boolean that indicates whether Feature Impact was calculated using
Shapley values.

ranRedundancyDetection - A boolean that indicates whether redundant feature
identification was run while calculating this Feature Impact.

rowCount - An integer or None that indicates the number of rows that was used to
calculate Feature Impact. For the Feature Impact calculated with the default logic, without specifying the rowCount, we return None here.

count - An integer with the number of features under the featureImpacts.

Raises

ClientError (404): If the feature impacts have not been computed.
ValueError: If data_slice_filter passed as None

get_features_used()¶

Query the server to determine which features were used.

Note that the data returned by this method is possibly different than the names of the features in the featurelist used by this model. This method will return the raw features that must be supplied in order for predictions to be generated on a new set of data. The featurelist, in contrast, would also include the names of derived features.

Returns

featureslist of str: The names of the features used in the model.

Return type: List[str]

get_frozen_child_models()¶

Retrieve the IDs for all models that are frozen from this model.

Returns

A list of Models

get_labelwise_roc_curves(source, fallback_to_parent_insights=False)¶

Retrieve a list of LabelwiseRocCurve instances for a multilabel model the given source and all labels.

New in version v2.24.

Parameters

sourcestr: ROC curve data source. Check datarobot.enums.CHART_DATA_SOURCE for possible values.
fallback_to_parent_insightsbool: Optional, if True, this will return ROC curve data for this model’s parent if the ROC curve is not available for this model and the model has a defined parent model. If omitted or False, or there is no parent model, will not attempt to return data from this model’s parent.

Returns

list ofclass:LabelwiseRocCurve <datarobot.models.roc_curve.LabelwiseRocCurve>: Labelwise ROC Curve instances for source and all labels

Raises

ClientError: If the insight is not available for this model
(New in version v3.0) TypeError: If the underlying project type is binary

get_lift_chart(source, fallback_to_parent_insights=False, data_slice_filter=<datarobot.models.model.Sentinel object>)¶

Retrieve the model Lift chart for the specified source.

Parameters

sourcestr: Lift chart data source. Check datarobot.enums.CHART_DATA_SOURCE for possible values. (New in version v2.23) For time series and OTV models, also accepts values backtest_2, backtest_3, …, up to the number of backtests in the model.
fallback_to_parent_insightsbool: (New in version v2.14) Optional, if True, this will return lift chart data for this model’s parent if the lift chart is not available for this model and the model has a defined parent model. If omitted or False, or there is no parent model, will not attempt to return insight data from this model’s parent.
data_slice_filterDataSlice, optional: A dataslice used to filter the return values based on the dataslice.id. By default this function will use data_slice_filter.id == None which returns an unsliced insight. If data_slice_filter is None then get_lift_chart will raise a ValueError.

Returns

LiftChart: Model lift chart data

Raises

ClientError: If the insight is not available for this model
ValueError: If data_slice_filter passed as None

get_missing_report_info()¶

Retrieve a report on missing training data that can be used to understand missing values treatment in the model. The report consists of missing values resolutions for features numeric or categorical features that were part of building the model.

Returns

An iterable of MissingReportPerFeature: The queried model missing report, sorted by missing count (DESCENDING order).

get_model_blueprint_chart()¶

Retrieve a diagram that can be used to understand data flow in the blueprint.

Returns

ModelBlueprintChart: The queried model blueprint chart.

get_model_blueprint_documents()¶

Get documentation for tasks used in this model.

Returns

list of BlueprintTaskDocument: All documents available for the model.

get_model_blueprint_json()¶

Get the blueprint json representation used by this model.

Returns

BlueprintJson: Json representation of the blueprint stages.

Return type: Dict[str, Tuple[List[str], List[str], str]]

get_multiclass_feature_impact()¶

For multiclass it’s possible to calculate feature impact separately for each target class. The method for calculation is exactly the same, calculated in one-vs-all style for each target class.

Requires that Feature Impact has already been computed with request_feature_impact.

Returns

feature_impactslist of dict: The feature impact data. Each item is a dict with the keys ‘featureImpacts’ (list), ‘class’ (str). Each item in ‘featureImpacts’ is a dict with the keys ‘featureName’, ‘impactNormalized’, and ‘impactUnnormalized’, and ‘redundantWith’.

Raises

ClientError (404): If the multiclass feature impacts have not been computed.

get_multiclass_lift_chart(source, fallback_to_parent_insights=False)¶

Retrieve model Lift chart for the specified source.

Parameters

sourcestr: Lift chart data source. Check datarobot.enums.CHART_DATA_SOURCE for possible values.
fallback_to_parent_insightsbool: Optional, if True, this will return lift chart data for this model’s parent if the lift chart is not available for this model and the model has a defined parent model. If omitted or False, or there is no parent model, will not attempt to return insight data from this model’s parent.

Returns

list of LiftChart: Model lift chart data for each saved target class

Raises

ClientError: If the insight is not available for this model

get_multilabel_lift_charts(source, fallback_to_parent_insights=False)¶

Retrieve model Lift charts for the specified source.

New in version v2.24.

Parameters

sourcestr: Lift chart data source. Check datarobot.enums.CHART_DATA_SOURCE for possible values.
fallback_to_parent_insightsbool: Optional, if True, this will return lift chart data for this model’s parent if the lift chart is not available for this model and the model has a defined parent model. If omitted or False, or there is no parent model, will not attempt to return insight data from this model’s parent.

Returns

list of LiftChart: Model lift chart data for each saved target class

Raises

ClientError: If the insight is not available for this model

get_num_iterations_trained()¶

Retrieves the number of estimators trained by early-stopping tree-based models.

– versionadded:: v2.22

Returns

projectId: str: id of project containing the model
modelId: str: id of the model
data: array: list of numEstimatorsItem objects, one for each modeling stage.
numEstimatorsItem will be of the form:
stage: str: indicates the modeling stage (for multi-stage models); None of single-stage models
numIterations: int: the number of estimators or iterations trained by the model

get_or_request_feature_effect(source, max_wait=600, row_count=None, data_slice_id=None)¶

Retrieve Feature Effects for the model, requesting a new job if it hasn’t been run previously.

See get_feature_effect_metadata for retrieving information of source.

Parameters

sourcestring: The source Feature Effects are retrieved for.
max_waitint, optional: The maximum time to wait for a requested Feature Effect job to complete before erroring.
row_countint, optional: (New in version v2.21) The sample size to use for Feature Impact computation. Minimum is 10 rows. Maximum is 100000 rows or the training sample size of the model, whichever is less.
data_slice_idstr, optional: ID for the data slice used in the request. If None, request unsliced insight data.

Returns

feature_effectsFeatureEffects: The Feature Effects data.

get_or_request_feature_effects_multiclass(source, top_n_features=None, features=None, row_count=None, class_=None, max_wait=600)¶

Retrieve Feature Effects for the multiclass model, requesting a job if it hasn’t been run previously.

Parameters

sourcestring: The source Feature Effects retrieve for.
class_str or None: The class name Feature Effects retrieve for.
row_countint: The number of rows from dataset to use for Feature Impact calculation.
top_n_featuresint or None: Number of top features (ranked by Feature Impact) used to calculate Feature Effects.
featureslist or None: The list of features used to calculate Feature Effects.
max_waitint, optional: The maximum time to wait for a requested Feature Effects job to complete before erroring.

Returns

feature_effectslist of FeatureEffectsMulticlass: The list of multiclass feature effects data.

get_or_request_feature_impact(max_wait=600, **kwargs)¶

Retrieve feature impact for the model, requesting a job if it hasn’t been run previously

Parameters

max_waitint, optional: The maximum time to wait for a requested feature impact job to complete before erroring
**kwargs: Arbitrary keyword arguments passed to request_feature_impact.

Returns

feature_impactslist or dict: The feature impact data. See get_feature_impact for the exact schema.

get_parameters()¶

Retrieve model parameters.

Returns

ModelParameters: Model parameters for this model.

get_pareto_front()¶

Retrieve the Pareto Front for a Eureqa model.

This method is only supported for Eureqa models.

Returns

ParetoFront: Model ParetoFront data

get_prime_eligibility()¶

Check if this model can be approximated with DataRobot Prime

Returns

prime_eligibilitydict: a dict indicating whether a model can be approximated with DataRobot Prime (key can_make_prime) and why it may be ineligible (key message)

get_residuals_chart(source, fallback_to_parent_insights=False, data_slice_filter=<datarobot.models.model.Sentinel object>)¶

Retrieve model residuals chart for the specified source.

Parameters

sourcestr: Residuals chart data source. Check datarobot.enums.CHART_DATA_SOURCE for possible values.
fallback_to_parent_insightsbool: Optional, if True, this will return residuals chart data for this model’s parent if the residuals chart is not available for this model and the model has a defined parent model. If omitted or False, or there is no parent model, will not attempt to return residuals data from this model’s parent.
data_slice_filterDataSlice, optional: A dataslice used to filter the return values based on the dataslice.id. By default this function will use data_slice_filter.id == None which returns an unsliced insight. If data_slice_filter is None then get_residuals_chart will raise a ValueError.

Returns

ResidualsChart: Model residuals chart data

Raises

ClientError: If the insight is not available for this model
ValueError: If data_slice_filter passed as None

get_roc_curve(source, fallback_to_parent_insights=False, data_slice_filter=<datarobot.models.model.Sentinel object>)¶

Retrieve the ROC curve for a binary model for the specified source.

Parameters

sourcestr: ROC curve data source. Check datarobot.enums.CHART_DATA_SOURCE for possible values. (New in version v2.23) For time series and OTV models, also accepts values backtest_2, backtest_3, …, up to the number of backtests in the model.
fallback_to_parent_insightsbool: (New in version v2.14) Optional, if True, this will return ROC curve data for this model’s parent if the ROC curve is not available for this model and the model has a defined parent model. If omitted or False, or there is no parent model, will not attempt to return data from this model’s parent.
data_slice_filterDataSlice, optional: A dataslice used to filter the return values based on the dataslice.id. By default this function will use data_slice_filter.id == None which returns an unsliced insight. If data_slice_filter is None then get_roc_curve will raise a ValueError.

Returns

RocCurve: Model ROC curve data

Raises

ClientError: If the insight is not available for this model
(New in version v3.0) TypeError: If the underlying project type is multilabel
ValueError: If data_slice_filter passed as None

get_rulesets()¶

List the rulesets approximating this model generated by DataRobot Prime

If this model hasn’t been approximated yet, will return an empty list. Note that these are rulesets approximating this model, not rulesets used to construct this model.

Returns

rulesetslist of Ruleset

Return type: List[Ruleset]

get_supported_capabilities()¶

Retrieves a summary of the capabilities supported by a model.

New in version v2.14.

Returns

supportsBlending: bool

whether the model supports blending

supportsMonotonicConstraints: bool

whether the model supports monotonic constraints

hasWordCloud: bool

whether the model has word cloud data available

eligibleForPrime: bool

whether the model is eligible for Prime

hasParameters: bool

whether the model has parameters that can be retrieved

supportsCodeGeneration: bool

(New in version v2.18) whether the model supports code generation

supportsShap: bool

(New in version v2.18) True if the model supports Shapley package. i.e. Shapley based: feature Importance

supportsEarlyStopping: bool

(New in version v2.22) True if this is an early stopping tree-based model and number of trained iterations can be retrieved.

get_uri()¶

Returns

urlstr: Permanent static hyperlink to this model at leaderboard.

Return type: str

get_word_cloud(exclude_stop_words=False)¶

Retrieve word cloud data for the model.

Parameters

exclude_stop_wordsbool, optional: Set to True if you want stopwords filtered out of response.

Returns

WordCloud: Word cloud data for the model.

incremental_train(data_stage_id, training_data_name=None)¶

Submit a job to the queue to perform incremental training on an existing model using additional data. The id of the additional data to use for training is specified with the data_stage_id. Optionally a name for the iteration can be supplied by the user to help identify the contents of data in the iteration.

This functionality requires the INCREMENTAL_LEARNING feature flag to be enabled.

Parameters

data_stage_id: str: The id of the data stage to use for training.
training_data_namestr, optional: The name of the iteration or data stage to indicate what the incremental learning was performed on.

Returns

jobModelJob: The created job that is retraining the model

Return type: ModelJob

open_in_browser()¶

Opens class’ relevant web browser location. If default browser is not available the URL is logged.

Note: If text-mode browsers are used, the calling process will block until the user exits the browser.

Return type: None

request_approximation()¶

Request an approximation of this model using DataRobot Prime

This will create several rulesets that could be used to approximate this model. After comparing their scores and rule counts, the code used in the approximation can be downloaded and run locally.

Returns

jobJob: the job generating the rulesets

request_cross_class_accuracy_scores()¶

Request data disparity insights to be computed for the model.

Returns

status_idstr: A statusId of computation request.

request_data_disparity_insights(feature, compared_class_names)¶

Request data disparity insights to be computed for the model.

Parameters

featurestr: Bias and Fairness protected feature name.
compared_class_nameslist(str): List of two classes to compare

Returns

status_idstr: A statusId of computation request.

request_external_test(dataset_id, actual_value_column=None)¶

Request external test to compute scores and insights on an external test dataset

Parameters

dataset_idstring: The dataset to make predictions against (as uploaded from Project.upload_dataset)
actual_value_columnstring, optional: (New in version v2.21) For time series unsupervised projects only. Actual value column can be used to calculate the classification metrics and insights on the prediction dataset. Can’t be provided with the forecast_point parameter.
Returns
——-
jobJob: a Job representing external dataset insights computation

request_fairness_insights(fairness_metrics_set=None)¶

Request fairness insights to be computed for the model.

Parameters

fairness_metrics_setstr, optional: Can be one of <datarobot.enums.FairnessMetricsSet>. The fairness metric used to calculate the fairness scores.

Returns

status_idstr: A statusId of computation request.

request_feature_effect(row_count=None, data_slice_id=None)¶

Submit request to compute Feature Effects for the model.

See get_feature_effect for more information on the result of the job.

Parameters

row_countint: (New in version v2.21) The sample size to use for Feature Impact computation. Minimum is 10 rows. Maximum is 100000 rows or the training sample size of the model, whichever is less.
data_slice_idstr, optional: ID for the data slice used in the request. If None, request unsliced insight data.

Returns

jobJob: A Job representing the feature effect computation. To get the completed feature effect data, use job.get_result or job.get_result_when_complete.

Raises

JobAlreadyRequested (422): If the feature effect have already been requested.

request_feature_effects_multiclass(row_count=None, top_n_features=None, features=None)¶

Request Feature Effects computation for the multiclass model.

See get_feature_effect for more information on the result of the job.

Parameters

row_countint: The number of rows from dataset to use for Feature Impact calculation.
top_n_featuresint or None: Number of top features (ranked by feature impact) used to calculate Feature Effects.
featureslist or None: The list of features used to calculate Feature Effects.

Returns

jobJob: A Job representing Feature Effect computation. To get the completed Feature Effect data, use job.get_result or job.get_result_when_complete.

request_feature_impact(row_count=None, with_metadata=False, data_slice_id=None)¶

Request feature impacts to be computed for the model.

See get_feature_impact for more information on the result of the job.

Parameters

row_countint, optional: The sample size (specified in rows) to use for Feature Impact computation. This is not supported for unsupervised, multiclass (which has a separate method), and time series projects.
with_metadatabool, optional: Flag indicating whether the result should include the metadata. If true, metadata is included.
data_slice_idstr, optional: ID for the data slice used in the request. If None, request unsliced insight data.

Returns

jobJob or status_id: Job representing the Feature Impact computation. To retrieve the completed Feature Impact data, use job.get_result or job.get_result_when_complete.

Raises

JobAlreadyRequested (422): If the feature impacts have already been requested.

request_frozen_datetime_model(training_row_count=None, training_duration=None, training_start_date=None, training_end_date=None, time_window_sample_pct=None, sampling_method=None)¶

Train a new frozen model with parameters from this model.

Requires that this model belongs to a datetime partitioned project. If it does not, an error will occur when submitting the job.

Frozen models use the same tuning parameters as their parent model instead of independently optimizing them to allow efficiently retraining models on larger amounts of the training data.

In addition of training_row_count and training_duration, frozen datetime models may be trained on an exact date range. Only one of training_row_count, training_duration, or training_start_date and training_end_date should be specified.

Models specified using training_start_date and training_end_date are the only ones that can be trained into the holdout data (once the holdout is unlocked).

All durations should be specified with a duration string such as those returned by the partitioning_methods.construct_duration_string helper method. Please see datetime partitioned project documentation for more information on duration strings.

Parameters

training_row_countint, optional: the number of rows of data that should be used to train the model. If specified, training_duration may not be specified.
training_durationstr, optional: a duration string specifying what time range the data used to train the model should span. If specified, training_row_count may not be specified.
training_start_datedatetime.datetime, optional: the start date of the data to train to model on. Only rows occurring at or after this datetime will be used. If training_start_date is specified, training_end_date must also be specified.
training_end_datedatetime.datetime, optional: the end date of the data to train the model on. Only rows occurring strictly before this datetime will be used. If training_end_date is specified, training_start_date must also be specified.
time_window_sample_pctint, optional: may only be specified when the requested model is a time window (e.g. duration or start and end dates). An integer between 1 and 99 indicating the percentage to sample by within the window. The points kept are determined by a random uniform sample. If specified, training_duration must be specified otherwise, the number of rows used to train the model and evaluate backtest scores and an error will occur.
sampling_methodstr, optional: (New in version v2.23) defines the way training data is selected. Can be either random or latest. In combination with training_row_count defines how rows are selected from backtest (latest by default). When training data is defined using time range (training_duration or use_project_settings) this setting changes the way time_window_sample_pct is applied (random by default). Applicable to OTV projects only.

Returns

model_jobModelJob: the modeling job training a frozen model

Return type: ModelJob

request_frozen_model(sample_pct=None, training_row_count=None)¶

Train a new frozen model with parameters from this model

Note

This method only works if project the model belongs to is not datetime partitioned. If it is, use request_frozen_datetime_model instead.

Frozen models use the same tuning parameters as their parent model instead of independently optimizing them to allow efficiently retraining models on larger amounts of the training data.

Parameters

sample_pctfloat: optional, the percentage of the dataset to use with the model. If not provided, will use the value from this model.
training_row_countint: (New in version v2.9) optional, the integer number of rows of the dataset to use with the model. Only one of sample_pct and training_row_count should be specified.

Returns

model_jobModelJob: the modeling job training a frozen model

Return type: ModelJob

request_lift_chart(source, data_slice_id=None)¶

Request the model Lift Chart for the specified source.

Parameters

sourcestr: Lift chart data source. Check datarobot.enums.CHART_DATA_SOURCE for possible values.
data_slice_idstring, optional: ID for the data slice used in the request. If None, request unsliced insight data.

Returns

status_check_jobStatusCheckJob: Object contains all needed logic for a periodical status check of an async job.

Return type: StatusCheckJob

request_predictions(dataset_id=None, dataset=None, dataframe=None, file_path=None, file=None, include_prediction_intervals=None, prediction_intervals_size=None, forecast_point=None, predictions_start_date=None, predictions_end_date=None, actual_value_column=None, explanation_algorithm=None, max_explanations=None, max_ngram_explanations=None)¶

Requests predictions against a previously uploaded dataset.

Parameters

dataset_idstring, optional: The ID of the dataset to make predictions against (as uploaded from Project.upload_dataset)
datasetDataset, optional: The dataset to make predictions against (as uploaded from Project.upload_dataset)
dataframepd.DataFrame, optional: (New in v3.0) The dataframe to make predictions against
file_pathstr, optional: (New in v3.0) Path to file to make predictions against
fileIOBase, optional: (New in v3.0) File to make predictions against
include_prediction_intervalsbool, optional: (New in v2.16) For time series projects only. Specifies whether prediction intervals should be calculated for this request. Defaults to True if prediction_intervals_size is specified, otherwise defaults to False.
prediction_intervals_sizeint, optional: (New in v2.16) For time series projects only. Represents the percentile to use for the size of the prediction intervals. Defaults to 80 if include_prediction_intervals is True. Prediction intervals size must be between 1 and 100 (inclusive).
forecast_pointdatetime.datetime or None, optional: (New in version v2.20) For time series projects only. This is the default point relative to which predictions will be generated, based on the forecast window of the project. See the time series prediction documentation for more information.
predictions_start_datedatetime.datetime or None, optional: (New in version v2.20) For time series projects only. The start date for bulk predictions. Note that this parameter is for generating historical predictions using the training data. This parameter should be provided in conjunction with predictions_end_date. Can’t be provided with the forecast_point parameter.
predictions_end_datedatetime.datetime or None, optional: (New in version v2.20) For time series projects only. The end date for bulk predictions, exclusive. Note that this parameter is for generating historical predictions using the training data. This parameter should be provided in conjunction with predictions_start_date. Can’t be provided with the forecast_point parameter.
actual_value_columnstring, optional: (New in version v2.21) For time series unsupervised projects only. Actual value column can be used to calculate the classification metrics and insights on the prediction dataset. Can’t be provided with the forecast_point parameter.
explanation_algorithm: (New in version v2.21) optional; If set to ‘shap’, the: response will include prediction explanations based on the SHAP explainer (SHapley Additive exPlanations). Defaults to null (no prediction explanations).
max_explanations: (New in version v2.21) int optional; specifies the maximum number of: explanation values that should be returned for each row, ordered by absolute value, greatest to least. If null, no limit. In the case of ‘shap’: if the number of features is greater than the limit, the sum of remaining values will also be returned as shapRemainingTotal. Defaults to null. Cannot be set if explanation_algorithm is omitted.
max_ngram_explanations: optional; int or str: (New in version v2.29) Specifies the maximum number of text explanation values that should be returned. If set to all, text explanations will be computed and all the ngram explanations will be returned. If set to a non zero positive integer value, text explanations will be computed and this amount of descendingly sorted ngram explanations will be returned. By default text explanation won’t be triggered to be computed.

Returns

jobPredictJob: The job computing the predictions

Return type: PredictJob

request_residuals_chart(source, data_slice_id=None)¶

Request the model residuals chart for the specified source.

Parameters

sourcestr: Residuals chart data source. Check datarobot.enums.CHART_DATA_SOURCE for possible values.
data_slice_idstring, optional: ID for the data slice used in the request. If None, request unsliced insight data.

Returns

status_check_jobStatusCheckJob: Object contains all needed logic for a periodical status check of an async job.

Return type: StatusCheckJob

request_roc_curve(source, data_slice_id=None)¶

Request the model Roc Curve for the specified source.

Parameters

sourcestr: Roc Curve data source. Check datarobot.enums.CHART_DATA_SOURCE for possible values.
data_slice_idstring, optional: ID for the data slice used in the request. If None, request unsliced insight data.

Returns

status_check_jobStatusCheckJob: Object contains all needed logic for a periodical status check of an async job.

Return type: StatusCheckJob

request_training_predictions(data_subset, explanation_algorithm=None, max_explanations=None)¶

Start a job to build training predictions

Parameters

data_subsetstr

data set definition to build predictions on. Choices are:

dr.enums.DATA_SUBSET.ALL or string all for all data available. Not valid for
models in datetime partitioned projects

dr.enums.DATA_SUBSET.VALIDATION_AND_HOLDOUT or string validationAndHoldout for
all data except training set. Not valid for models in datetime partitioned projects

dr.enums.DATA_SUBSET.HOLDOUT or string holdout for holdout data set only

dr.enums.DATA_SUBSET.ALL_BACKTESTS or string allBacktests for downloading
the predictions for all backtest validation folds. Requires the model to have successfully scored all backtests. Datetime partitioned projects only.

explanation_algorithmdr.enums.EXPLANATIONS_ALGORITHM

(New in v2.21) Optional. If set to dr.enums.EXPLANATIONS_ALGORITHM.SHAP, the response will include prediction explanations based on the SHAP explainer (SHapley Additive exPlanations). Defaults to None (no prediction explanations).

max_explanationsint

(New in v2.21) Optional. Specifies the maximum number of explanation values that should be returned for each row, ordered by absolute value, greatest to least. In the case of dr.enums.EXPLANATIONS_ALGORITHM.SHAP: If not set, explanations are returned for all features. If the number of features is greater than the max_explanations, the sum of remaining values will also be returned as shap_remaining_total. Max 100. Defaults to null for datasets narrower than 100 columns, defaults to 100 for datasets wider than 100 columns. Is ignored if explanation_algorithm is not set.

Returns

Job: an instance of created async job

request_transferable_export(prediction_intervals_size=None)¶

Request generation of an exportable model file for use in an on-premise DataRobot standalone prediction environment.

This function can only be used if model export is enabled, and will only be useful if you have an on-premise environment in which to import it.

This function does not download the exported file. Use download_export for that.

Parameters

prediction_intervals_sizeint, optional: (New in v2.19) For time series projects only. Represents the percentile to use for the size of the prediction intervals. Prediction intervals size must be between 1 and 100 (inclusive).

Returns

Job

Examples

model = datarobot.Model.get('project-id', 'model-id')
job = model.request_transferable_export()
job.wait_for_completion()
model.download_export('my_exported_model.drmodel')

# Client must be configured to use standalone prediction server for import:
datarobot.Client(token='my-token-at-standalone-server',
                 endpoint='standalone-server-url/api/v2')

imported_model = datarobot.ImportedModel.create('my_exported_model.drmodel')

Return type: Job

retrain(sample_pct=None, featurelist_id=None, training_row_count=None, n_clusters=None)¶

Submit a job to the queue to train a blender model.

Parameters

sample_pct: float, optional: The sample size in percents (1 to 100) to use in training. If this parameter is used then training_row_count should not be given.
featurelist_idstr, optional: The featurelist id
training_row_countint, optional: The number of rows used to train the model. If this parameter is used, then sample_pct should not be given.
n_clusters: int, optional: (new in version 2.27) number of clusters to use in an unsupervised clustering model. This parameter is used only for unsupervised clustering models that do not determine the number of clusters automatically.

Returns

jobModelJob: The created job that is retraining the model

Return type: ModelJob

set_prediction_threshold(threshold)¶

Set a custom prediction threshold for the model.

May not be used once prediction_threshold_read_only is True for this model.

Parameters

thresholdfloat: only used for binary classification projects. The threshold to when deciding between the positive and negative classes when making predictions. Should be between 0.0 and 1.0 (inclusive).

star_model()¶

Mark the model as starred.

Model stars propagate to the web application and the API, and can be used to filter when listing models.

Return type: None

start_advanced_tuning_session()¶

Start an Advanced Tuning session. Returns an object that helps set up arguments for an Advanced Tuning model execution.

As of v2.17, all models other than blenders, open source, prime, baseline and user-created support Advanced Tuning.

Returns

AdvancedTuningSession: Session for setting up and running Advanced Tuning on a model

train(sample_pct=None, featurelist_id=None, scoring_type=None, training_row_count=None, monotonic_increasing_featurelist_id=<object object>, monotonic_decreasing_featurelist_id=<object object>)¶

Train the blueprint used in model on a particular featurelist or amount of data.

This method creates a new training job for worker and appends it to the end of the queue for this project. After the job has finished you can get the newly trained model by retrieving it from the project leaderboard, or by retrieving the result of the job.

Either sample_pct or training_row_count can be used to specify the amount of data to use, but not both. If neither are specified, a default of the maximum amount of data that can safely be used to train any blueprint without going into the validation data will be selected.

In smart-sampled projects, sample_pct and training_row_count are assumed to be in terms of rows of the minority class.

Note

For datetime partitioned projects, see train_datetime instead.

Parameters

sample_pctfloat, optional: The amount of data to use for training, as a percentage of the project dataset from 0 to 100.
featurelist_idstr, optional: The identifier of the featurelist to use. If not defined, the featurelist of this model is used.
scoring_typestr, optional: Either validation or crossValidation (also dr.SCORING_TYPE.validation or dr.SCORING_TYPE.cross_validation). validation is available for every partitioning type, and indicates that the default model validation should be used for the project. If the project uses a form of cross-validation partitioning, crossValidation can also be used to indicate that all of the available training/validation combinations should be used to evaluate the model.
training_row_countint, optional: The number of rows to use to train the requested model.
monotonic_increasing_featurelist_idstr: (new in version 2.11) optional, the id of the featurelist that defines the set of features with a monotonically increasing relationship to the target. Passing None disables increasing monotonicity constraint. Default (dr.enums.MONOTONICITY_FEATURELIST_DEFAULT) is the one specified by the blueprint.
monotonic_decreasing_featurelist_idstr: (new in version 2.11) optional, the id of the featurelist that defines the set of features with a monotonically decreasing relationship to the target. Passing None disables decreasing monotonicity constraint. Default (dr.enums.MONOTONICITY_FEATURELIST_DEFAULT) is the one specified by the blueprint.

Returns

model_job_idstr: id of created job, can be used as parameter to ModelJob.get method or wait_for_async_model_creation function

Examples

project = Project.get('project-id')
model = Model.get('project-id', 'model-id')
model_job_id = model.train(training_row_count=project.max_train_rows)

Return type: str

train_datetime(featurelist_id=None, training_row_count=None, training_duration=None, time_window_sample_pct=None, monotonic_increasing_featurelist_id=<object object>, monotonic_decreasing_featurelist_id=<object object>, use_project_settings=False, sampling_method=None, n_clusters=None)¶

Trains this model on a different featurelist or sample size.

Requires that this model is part of a datetime partitioned project; otherwise, an error will occur.

All durations should be specified with a duration string such as those returned by the partitioning_methods.construct_duration_string helper method. Please see datetime partitioned project documentation for more information on duration strings.

Parameters

featurelist_idstr, optional: the featurelist to use to train the model. If not specified, the featurelist of this model is used.
training_row_countint, optional: the number of rows of data that should be used to train the model. If specified, neither training_duration nor use_project_settings may be specified.
training_durationstr, optional: a duration string specifying what time range the data used to train the model should span. If specified, neither training_row_count nor use_project_settings may be specified.
use_project_settingsbool, optional: (New in version v2.20) defaults to False. If True, indicates that the custom backtest partitioning settings specified by the user will be used to train the model and evaluate backtest scores. If specified, neither training_row_count nor training_duration may be specified.
time_window_sample_pctint, optional: may only be specified when the requested model is a time window (e.g. duration or start and end dates). An integer between 1 and 99 indicating the percentage to sample by within the window. The points kept are determined by a random uniform sample. If specified, training_duration must be specified otherwise, the number of rows used to train the model and evaluate backtest scores and an error will occur.
sampling_methodstr, optional: (New in version v2.23) defines the way training data is selected. Can be either random or latest. In combination with training_row_count defines how rows are selected from backtest (latest by default). When training data is defined using time range (training_duration or use_project_settings) this setting changes the way time_window_sample_pct is applied (random by default). Applicable to OTV projects only.
monotonic_increasing_featurelist_idstr, optional: (New in version v2.18) optional, the id of the featurelist that defines the set of features with a monotonically increasing relationship to the target. Passing None disables increasing monotonicity constraint. Default (dr.enums.MONOTONICITY_FEATURELIST_DEFAULT) is the one specified by the blueprint.
monotonic_decreasing_featurelist_idstr, optional: (New in version v2.18) optional, the id of the featurelist that defines the set of features with a monotonically decreasing relationship to the target. Passing None disables decreasing monotonicity constraint. Default (dr.enums.MONOTONICITY_FEATURELIST_DEFAULT) is the one specified by the blueprint.
n_clusters: int, optional: (New in version 2.27) number of clusters to use in an unsupervised clustering model. This parameter is used only for unsupervised clustering models that don’t automatically determine the number of clusters.

Returns

jobModelJob: the created job to build the model

Return type: ModelJob

unstar_model()¶

Unmark the model as starred.

Model stars propagate to the web application and the API, and can be used to filter when listing models.

Return type: None

DatetimeModel¶

class datarobot.models.DatetimeModel(id=None, processes=None, featurelist_name=None, featurelist_id=None, project_id=None, sample_pct=None, training_row_count=None, training_duration=None, training_start_date=None, training_end_date=None, time_window_sample_pct=None, sampling_method=None, model_type=None, model_category=None, is_frozen=None, blueprint_id=None, metrics=None, training_info=None, holdout_score=None, holdout_status=None, data_selection_method=None, backtests=None, monotonic_increasing_featurelist_id=None, monotonic_decreasing_featurelist_id=None, supports_monotonic_constraints=None, is_starred=None, prediction_threshold=None, prediction_threshold_read_only=None, effective_feature_derivation_window_start=None, effective_feature_derivation_window_end=None, forecast_window_start=None, forecast_window_end=None, windows_basis_unit=None, model_number=None, parent_model_id=None, use_project_settings=None, supports_composable_ml=None, n_clusters=None, is_n_clusters_dynamically_determined=None, **kwargs)¶

Represents a model from a datetime partitioned project

All durations are specified with a duration string such as those returned by the partitioning_methods.construct_duration_string helper method. Please see datetime partitioned project documentation for more information on duration strings.

Note that only one of training_row_count, training_duration, and training_start_date and training_end_date will be specified, depending on the data_selection_method of the model. Whichever method was selected determines the amount of data used to train on when making predictions and scoring the backtests and the holdout.

Attributes

idstr: the id of the model
project_idstr: the id of the project the model belongs to
processeslist of str: the processes used by the model
featurelist_namestr: the name of the featurelist used by the model
featurelist_idstr: the id of the featurelist used by the model
sample_pctfloat: the percentage of the project dataset used in training the model
training_row_countint or None: If specified, an int specifying the number of rows used to train the model and evaluate backtest scores.
training_durationstr or None: If specified, a duration string specifying the duration spanned by the data used to train the model and evaluate backtest scores.
training_start_datedatetime or None: only present for frozen models in datetime partitioned projects. If specified, the start date of the data used to train the model.
training_end_datedatetime or None: only present for frozen models in datetime partitioned projects. If specified, the end date of the data used to train the model.
time_window_sample_pctint or None: An integer between 1 and 99 indicating the percentage of sampling within the training window. The points kept are determined by a random uniform sample. If not specified, no sampling was done.
sampling_methodstr or None: (New in v2.23) indicates the way training data has been selected (either how rows have been selected within backtest or how time_window_sample_pct has been applied).
model_typestr: what model this is, e.g. ‘Nystroem Kernel SVM Regressor’
model_categorystr: what kind of model this is - ‘prime’ for DataRobot Prime models, ‘blend’ for blender models, and ‘model’ for other models
is_frozenbool: whether this model is a frozen model
blueprint_idstr: the id of the blueprint used in this model
metricsdict: a mapping from each metric to the model’s scores for that metric. The keys in metrics are the different metrics used to evaluate the model, and the values are the results. The dictionaries inside of metrics will be as described here: ‘validation’, the score for a single backtest; ‘crossValidation’, always None; ‘backtesting’, the average score for all backtests if all are available and computed, or None otherwise; ‘backtestingScores’, a list of scores for all backtests where the score is None if that backtest does not have a score available; and ‘holdout’, the score for the holdout or None if the holdout is locked or the score is unavailable.
backtestslist of dict: describes what data was used to fit each backtest, the score for the project metric, and why the backtest score is unavailable if it is not provided.
data_selection_methodstr: which of training_row_count, training_duration, or training_start_data and training_end_date were used to determine the data used to fit the model. One of ‘rowCount’, ‘duration’, or ‘selectedDateRange’.
training_infodict: describes which data was used to train on when scoring the holdout and making predictions. training_info` will have the following keys: holdout_training_start_date, holdout_training_duration, holdout_training_row_count, holdout_training_end_date, prediction_training_start_date, prediction_training_duration, prediction_training_row_count, prediction_training_end_date. Start and end dates will be datetimes, durations will be duration strings, and rows will be integers.
holdout_scorefloat or None: the score against the holdout, if available and the holdout is unlocked, according to the project metric.
holdout_statusstring or None: the status of the holdout score, e.g. “COMPLETED”, “HOLDOUT_BOUNDARIES_EXCEEDED”. Unavailable if the holdout fold was disabled in the partitioning configuration.
monotonic_increasing_featurelist_idstr: optional, the id of the featurelist that defines the set of features with a monotonically increasing relationship to the target. If None, no such constraints are enforced.
monotonic_decreasing_featurelist_idstr: optional, the id of the featurelist that defines the set of features with a monotonically decreasing relationship to the target. If None, no such constraints are enforced.
supports_monotonic_constraintsbool: optional, whether this model supports enforcing monotonic constraints
is_starredbool: whether this model marked as starred
prediction_thresholdfloat: for binary classification projects, the threshold used for predictions
prediction_threshold_read_onlybool: indicated whether modification of the prediction threshold is forbidden. Threshold modification is forbidden once a model has had a deployment created or predictions made via the dedicated prediction API.
effective_feature_derivation_window_startint or None: (New in v2.16) For time series projects only. How many units of the windows_basis_unit into the past relative to the forecast point the user needs to provide history for at prediction time. This can differ from the feature_derivation_window_start set on the project due to the differencing method and period selected, or if the model is a time series native model such as ARIMA. Will be a negative integer in time series projects and None otherwise.
effective_feature_derivation_window_endint or None: (New in v2.16) For time series projects only. How many units of the windows_basis_unit into the past relative to the forecast point the feature derivation window should end. Will be a non-positive integer in time series projects and None otherwise.
forecast_window_startint or None: (New in v2.16) For time series projects only. How many units of the windows_basis_unit into the future relative to the forecast point the forecast window should start. Note that this field will be the same as what is shown in the project settings. Will be a non-negative integer in time series projects and None otherwise.
forecast_window_endint or None: (New in v2.16) For time series projects only. How many units of the windows_basis_unit into the future relative to the forecast point the forecast window should end. Note that this field will be the same as what is shown in the project settings. Will be a non-negative integer in time series projects and None otherwise.
windows_basis_unitstr or None: (New in v2.16) For time series projects only. Indicates which unit is the basis for the feature derivation window and the forecast window. Note that this field will be the same as what is shown in the project settings. In time series projects, will be either the detected time unit or “ROW”, and None otherwise.
model_numberinteger: model number assigned to a model
parent_model_idstr or None: (New in version v2.20) the id of the model that tuning parameters are derived from
use_project_settingsbool or None: (New in version v2.20) If True, indicates that the custom backtest partitioning settings specified by the user were used to train the model and evaluate backtest scores.
supports_composable_mlbool or None: (New in version v2.26) whether this model is supported in the Composable ML.
is_n_clusters_dynamically_determinedbool, optional: (New in version 2.27) if True, indicates that model determines number of clusters automatically.
n_clustersint, optional: (New in version 2.27) Number of clusters to use in an unsupervised clustering model. This parameter is used only for unsupervised clustering models that don’t automatically determine the number of clusters.

classmethod get(project, model_id)¶

Retrieve a specific datetime model.

If the project does not use datetime partitioning, a ClientError will occur.

Parameters

projectstr: the id of the project the model belongs to
model_idstr: the id of the model to retrieve

Returns

modelDatetimeModel: the model

score_backtests()¶

Compute the scores for all available backtests.

Some backtests may be unavailable if the model is trained into their validation data.

Returns

jobJob: a job tracking the backtest computation. When it is complete, all available backtests will have scores computed.

cross_validate()¶

Inherited from the model. DatetimeModels cannot request cross validation scores; use backtests instead.

Return type: NoReturn

get_cross_validation_scores(partition=None, metric=None)¶

Inherited from Model - DatetimeModels cannot request Cross Validation scores,

Use backtests instead.

Return type: NoReturn

request_training_predictions(data_subset, *args, **kwargs)¶

Start a job that builds training predictions.

Parameters

data_subsetstr

data set definition to build predictions on. Choices are:

dr.enums.DATA_SUBSET.HOLDOUT for holdout data set only

dr.enums.DATA_SUBSET.ALL_BACKTESTS for downloading the predictions for all
backtest validation folds. Requires the model to have successfully scored all backtests.

Returns

——-

Job

an instance of created async job

get_series_accuracy_as_dataframe(offset=0, limit=100, metric=None, multiseries_value=None, order_by=None, reverse=False)¶

Retrieve series accuracy results for the specified model as a pandas.DataFrame.

Parameters

offsetint, optional: The number of results to skip. Defaults to 0 if not specified.
limitint, optional: The maximum number of results to return. Defaults to 100 if not specified.
metricstr, optional: The name of the metric to retrieve scores for. If omitted, the default project metric will be used.
multiseries_valuestr, optional: If specified, only the series containing the given value in one of the series ID columns will be returned.
order_bystr, optional: Used for sorting the series. Attribute must be one of datarobot.enums.SERIES_ACCURACY_ORDER_BY.
reversebool, optional: Used for sorting the series. If True, will sort the series in descending order by the attribute specified by order_by.

Returns

data: A pandas.DataFrame with the Series Accuracy for the specified model.

download_series_accuracy_as_csv(filename, encoding='utf-8', offset=0, limit=100, metric=None, multiseries_value=None, order_by=None, reverse=False)¶

Save series accuracy results for the specified model in a CSV file.

Parameters

filenamestr or file object: The path or file object to save the data to.
encodingstr, optional: A string representing the encoding to use in the output csv file. Defaults to ‘utf-8’.
offsetint, optional: The number of results to skip. Defaults to 0 if not specified.
limitint, optional: The maximum number of results to return. Defaults to 100 if not specified.
metricstr, optional: The name of the metric to retrieve scores for. If omitted, the default project metric will be used.
multiseries_valuestr, optional: If specified, only the series containing the given value in one of the series ID columns will be returned.
order_bystr, optional: Used for sorting the series. Attribute must be one of datarobot.enums.SERIES_ACCURACY_ORDER_BY.
reversebool, optional: Used for sorting the series. If True, will sort the series in descending order by the attribute specified by order_by.

get_series_clusters(offset=0, limit=100, order_by=None, reverse=False)¶

Retrieve a dictionary of series and the clusters assigned to each series. This is only usable for clustering projects.

Parameters

offsetint, optional: The number of results to skip. Defaults to 0 if not specified.
limitint, optional: The maximum number of results to return. Defaults to 100 if not specified.
order_bystr, optional: Used for sorting the series. Attribute must be one of datarobot.enums.SERIES_ACCURACY_ORDER_BY.
reversebool, optional: Used for sorting the series. If True, will sort the series in descending order by the attribute specified by order_by.

Returns

Dict: A dictionary of the series in the dataset with their associated cluster

Raises

ValueError: If the model type returns an unsupported insight
ClientError: If the insight is not available for this model

Return type: Dict[str, str]

compute_series_accuracy(compute_all_series=False)¶

Compute series accuracy for the model.

Parameters

compute_all_seriesbool, optional: Calculate accuracy for all series or only first 1000.

Returns

Job: an instance of the created async job

retrain(time_window_sample_pct=None, featurelist_id=None, training_row_count=None, training_duration=None, training_start_date=None, training_end_date=None, sampling_method=None, n_clusters=None)¶

Retrain an existing datetime model using a new training period for the model’s training set (with optional time window sampling) or a different feature list.

All durations should be specified with a duration string such as those returned by the partitioning_methods.construct_duration_string helper method. Please see datetime partitioned project documentation for more information on duration strings.

Parameters

featurelist_idstr, optional: The ID of the featurelist to use.
training_row_countint, optional: The number of rows to train the model on. If this parameter is used then sample_pct cannot be specified.
time_window_sample_pctint, optional: An int between 1 and 99 indicating the percentage of sampling within the time window. The points kept are determined by a random uniform sample. If specified, training_row_count must not be specified and either training_duration or training_start_date and training_end_date must be specified.
training_durationstr, optional: A duration string representing the training duration for the submitted model. If specified then training_row_count, training_start_date, and training_end_date cannot be specified.
training_start_datestr, optional: A datetime string representing the start date of the data to use for training this model. If specified, training_end_date must also be specified, and training_duration cannot be specified. The value must be before the training_end_date value.
training_end_datestr, optional: A datetime string representing the end date of the data to use for training this model. If specified, training_start_date must also be specified, and training_duration cannot be specified. The value must be after the training_start_date value.
sampling_methodstr, optional: (New in version v2.23) defines the way training data is selected. Can be either random or latest. In combination with training_row_count defines how rows are selected from backtest (latest by default). When training data is defined using time range (training_duration or use_project_settings) this setting changes the way time_window_sample_pct is applied (random by default). Applicable to OTV projects only.
n_clustersint, optional: (New in version 2.27) Number of clusters to use in an unsupervised clustering model. This parameter is used only for unsupervised clustering models that don’t automatically determine the number of clusters.

Returns

jobModelJob: The created job that is retraining the model

get_feature_effect_metadata()¶

Retrieve Feature Effect metadata for each backtest. Response contains status and available sources for each backtest of the model.

Each backtest is available for training and validation
If holdout is configured for the project it has holdout as backtestIndex. It has training and holdout sources available.

Start/stop models contain a single response item with startstop value for backtestIndex.

Feature Effect of training is always available (except for the old project which supports only Feature Effect for validation).
When a model is trained into validation or holdout without stacked prediction (e.g. no out-of-sample prediction in validation or holdout), Feature Effect is not available for validation or holdout.
Feature Effect for holdout is not available when there is no holdout configured for the project.

source is expected parameter to retrieve Feature Effect. One of provided sources shall be used.

backtestIndex is expected parameter to submit compute request and retrieve Feature Effect. One of provided backtest indexes shall be used.

Returns

feature_effect_metadata: FeatureEffectMetadataDatetime

request_feature_effect(backtest_index)¶

Request feature effects to be computed for the model.

See get_feature_effect for more information on the result of the job.

See get_feature_effect_metadata for retrieving information of backtest_index.

Parameters

backtest_index: string, FeatureEffectMetadataDatetime.backtest_index.: The backtest index to retrieve Feature Effects for.

Returns

jobJob: A Job representing the feature effect computation. To get the completed feature effect data, use job.get_result or job.get_result_when_complete.

Raises

JobAlreadyRequested (422): If the feature effect have already been requested.

get_feature_effect(source, backtest_index)¶

Retrieve Feature Effects for the model.

Feature Effects provides partial dependence and predicted vs actual values for top-500 features ordered by feature impact score.

The partial dependence shows marginal effect of a feature on the target variable after accounting for the average effects of all other predictive features. It indicates how, holding all other variables except the feature of interest as they were, the value of this feature affects your prediction.

Requires that Feature Effects has already been computed with request_feature_effect.

See get_feature_effect_metadata for retrieving information of source, backtest_index.

Parameters

source: string: The source Feature Effects are retrieved for. One value of [FeatureEffectMetadataDatetime.sources]. To retrieve the available sources for feature effect.
backtest_index: string, FeatureEffectMetadataDatetime.backtest_index.: The backtest index to retrieve Feature Effects for.

Returns

feature_effects: FeatureEffects: The feature effects data.

Raises

ClientError (404): If the feature effects have not been computed or source is not valid value.

get_or_request_feature_effect(source, backtest_index, max_wait=600)¶

Retrieve Feature Effects computations for the model, requesting a new job if it hasn’t been run previously.

See get_feature_effect_metadata for retrieving information of source, backtest_index.

Parameters

max_waitint, optional: The maximum time to wait for a requested feature effect job to complete before erroring
sourcestring: The source Feature Effects are retrieved for. One value of [FeatureEffectMetadataDatetime.sources]. To retrieve the available sources for feature effect.
backtest_index: string, FeatureEffectMetadataDatetime.backtest_index.: The backtest index to retrieve Feature Effects for.

Returns

feature_effectsFeatureEffects: The feature effects data.

request_feature_effects_multiclass(backtest_index, row_count=None, top_n_features=None, features=None)¶

Request feature effects to be computed for the multiclass datetime model.

See get_feature_effect for more information on the result of the job.

Parameters

backtest_indexstr: The backtest index to use for Feature Effects calculation.
row_countint: The number of rows from dataset to use for Feature Impact calculation.
top_n_featuresint or None: Number of top features (ranked by Feature Impact) used to calculate Feature Effects.
featureslist or None: The list of features to use to calculate Feature Effects.

Returns

jobJob: A Job representing Feature Effects computation. To get the completed Feature Effect data, use job.get_result or job.get_result_when_complete.

get_feature_effects_multiclass(backtest_index, source='training', class_=None)¶

Retrieve Feature Effects for the multiclass datetime model.

Feature Effects provides partial dependence and predicted vs actual values for top-500 features ordered by feature impact score.

The partial dependence shows marginal effect of a feature on the target variable after accounting for the average effects of all other predictive features. It indicates how, holding all other variables except the feature of interest as they were, the value of this feature affects your prediction.

Requires that Feature Effects has already been computed with request_feature_effect.

See get_feature_effect_metadata for retrieving information the available sources.

Parameters

backtest_indexstr: The backtest index to retrieve Feature Effects for.
sourcestr: The source Feature Effects are retrieved for.
class_str or None: The class name Feature Effects are retrieved for.

Returns

list: The list of multiclass Feature Effects.

Raises

ClientError (404): If the Feature Effects have not been computed or source is not valid value.

get_or_request_feature_effects_multiclass(backtest_index, source, top_n_features=None, features=None, row_count=None, class_=None, max_wait=600)¶

Retrieve Feature Effects for a datetime multiclass model, and request a job if it hasn’t been run previously.

Parameters

backtest_indexstr: The backtest index to retrieve Feature Effects for.
sourcestring: The source from which Feature Effects are retrieved.
class_str or None: The class name Feature Effects retrieve for.
row_countint: The number of rows used from the dataset for Feature Impact calculation.
top_n_featuresint or None: Number of top features (ranked by feature impact) used to calculate Feature Effects.
featureslist or None: The list of features used to calculate Feature Effects.
max_waitint, optional: The maximum time to wait for a requested feature effect job to complete before erroring.

Returns

feature_effectslist of FeatureEffectsMulticlass: The list of multiclass feature effects data.

calculate_prediction_intervals(prediction_intervals_size)¶

Calculate prediction intervals for this DatetimeModel for the specified size.

New in version v2.19.

Parameters

prediction_intervals_sizeint: The prediction interval’s size to calculate for this model. See the prediction intervals documentation for more information.

Returns

jobJob: a Job tracking the prediction intervals computation

Return type: Job

get_calculated_prediction_intervals(offset=None, limit=None)¶

Retrieve a list of already-calculated prediction intervals for this model

New in version v2.19.

Parameters

offsetint, optional: If provided, this many results will be skipped
limitint, optional: If provided, at most this many results will be returned. If not provided, will return at most 100 results.

Returns

list[int]: A descending-ordered list of already-calculated prediction interval sizes

compute_datetime_trend_plots(backtest=0, source=SOURCE_TYPE.VALIDATION, forecast_distance_start=None, forecast_distance_end=None)¶

Computes datetime trend plots (Accuracy over Time, Forecast vs Actual, Anomaly over Time) for this model

New in version v2.25.

Parameters

backtestint or string, optional: Compute plots for a specific backtest (use the backtest index starting from zero). To compute plots for holdout, use dr.enums.DATA_SUBSET.HOLDOUT
sourcestring, optional: The source of the data for the backtest/holdout. Attribute must be one of dr.enums.SOURCE_TYPE
forecast_distance_startint, optional:: The start of forecast distance range (forecast window) to compute. If not specified, the first forecast distance for this project will be used. Only for time series supervised models
forecast_distance_endint, optional:: The end of forecast distance range (forecast window) to compute. If not specified, the last forecast distance for this project will be used. Only for time series supervised models

Returns

jobJob: a Job tracking the datetime trend plots computation

Notes

Forecast distance specifies the number of time steps between the predicted point and the origin point.
For the multiseries models only first 1000 series in alphabetical order and an average plot for them will be computed.
Maximum 100 forecast distances can be requested for calculation in time series supervised projects.

get_accuracy_over_time_plots_metadata(forecast_distance=None)¶

Retrieve Accuracy over Time plots metadata for this model.

New in version v2.25.

Parameters

forecast_distanceint, optional: Forecast distance to retrieve the metadata for. If not specified, the first forecast distance for this project will be used. Only available for time series projects.

Returns

metadataAccuracyOverTimePlotsMetadata: a AccuracyOverTimePlotsMetadata representing Accuracy over Time plots metadata

get_accuracy_over_time_plot(backtest=0, source=SOURCE_TYPE.VALIDATION, forecast_distance=None, series_id=None, resolution=None, max_bin_size=None, start_date=None, end_date=None, max_wait=600)¶

Retrieve Accuracy over Time plots for this model.

New in version v2.25.

Parameters

backtestint or string, optional: Retrieve plots for a specific backtest (use the backtest index starting from zero). To retrieve plots for holdout, use dr.enums.DATA_SUBSET.HOLDOUT
sourcestring, optional: The source of the data for the backtest/holdout. Attribute must be one of dr.enums.SOURCE_TYPE
forecast_distanceint, optional: Forecast distance to retrieve the plots for. If not specified, the first forecast distance for this project will be used. Only available for time series projects.
series_idstring, optional: The name of the series to retrieve for multiseries projects. If not provided an average plot for the first 1000 series will be retrieved.
resolutionstring, optional: Specifying at which resolution the data should be binned. If not provided an optimal resolution will be used to build chart data with number of bins <= max_bin_size. One of dr.enums.DATETIME_TREND_PLOTS_RESOLUTION.
max_bin_sizeint, optional: An int between 1 and 1000, which specifies the maximum number of bins for the retrieval. Default is 500.
start_datedatetime.datetime, optional: The start of the date range to return. If not specified, start date for requested plot will be used.
end_datedatetime.datetime, optional: The end of the date range to return. If not specified, end date for requested plot will be used.
max_waitint or None, optional: The maximum time to wait for a compute job to complete before retrieving the plots. Default is dr.enums.DEFAULT_MAX_WAIT. If 0 or None, the plots would be retrieved without attempting the computation.

Returns

plotAccuracyOverTimePlot: a AccuracyOverTimePlot representing Accuracy over Time plot

Examples

import datarobot as dr
import pandas as pd
model = dr.DatetimeModel(project_id=project_id, id=model_id)
plot = model.get_accuracy_over_time_plot()
df = pd.DataFrame.from_dict(plot.bins)
figure = df.plot("start_date", ["actual", "predicted"]).get_figure()
figure.savefig("accuracy_over_time.png")

get_accuracy_over_time_plot_preview(backtest=0, source=SOURCE_TYPE.VALIDATION, forecast_distance=None, series_id=None, max_wait=600)¶

Retrieve Accuracy over Time preview plots for this model.

New in version v2.25.

Parameters

backtestint or string, optional: Retrieve plots for a specific backtest (use the backtest index starting from zero). To retrieve plots for holdout, use dr.enums.DATA_SUBSET.HOLDOUT
sourcestring, optional: The source of the data for the backtest/holdout. Attribute must be one of dr.enums.SOURCE_TYPE
forecast_distanceint, optional: Forecast distance to retrieve the plots for. If not specified, the first forecast distance for this project will be used. Only available for time series projects.
series_idstring, optional: The name of the series to retrieve for multiseries projects. If not provided an average plot for the first 1000 series will be retrieved.
max_waitint or None, optional: The maximum time to wait for a compute job to complete before retrieving the plots. Default is dr.enums.DEFAULT_MAX_WAIT. If 0 or None, the plots would be retrieved without attempting the computation.

Returns

plotAccuracyOverTimePlotPreview: a AccuracyOverTimePlotPreview representing Accuracy over Time plot preview

Examples

import datarobot as dr
import pandas as pd
model = dr.DatetimeModel(project_id=project_id, id=model_id)
plot = model.get_accuracy_over_time_plot_preview()
df = pd.DataFrame.from_dict(plot.bins)
figure = df.plot("start_date", ["actual", "predicted"]).get_figure()
figure.savefig("accuracy_over_time_preview.png")

get_forecast_vs_actual_plots_metadata()¶

Retrieve Forecast vs Actual plots metadata for this model.

New in version v2.25.

Returns

metadataForecastVsActualPlotsMetadata: a ForecastVsActualPlotsMetadata representing Forecast vs Actual plots metadata

get_forecast_vs_actual_plot(backtest=0, source=SOURCE_TYPE.VALIDATION, forecast_distance_start=None, forecast_distance_end=None, series_id=None, resolution=None, max_bin_size=None, start_date=None, end_date=None, max_wait=600)¶

Retrieve Forecast vs Actual plots for this model.

New in version v2.25.

Parameters

backtestint or string, optional: Retrieve plots for a specific backtest (use the backtest index starting from zero). To retrieve plots for holdout, use dr.enums.DATA_SUBSET.HOLDOUT
sourcestring, optional: The source of the data for the backtest/holdout. Attribute must be one of dr.enums.SOURCE_TYPE
forecast_distance_startint, optional:: The start of forecast distance range (forecast window) to retrieve. If not specified, the first forecast distance for this project will be used.
forecast_distance_endint, optional:: The end of forecast distance range (forecast window) to retrieve. If not specified, the last forecast distance for this project will be used.
series_idstring, optional: The name of the series to retrieve for multiseries projects. If not provided an average plot for the first 1000 series will be retrieved.
resolutionstring, optional: Specifying at which resolution the data should be binned. If not provided an optimal resolution will be used to build chart data with number of bins <= max_bin_size. One of dr.enums.DATETIME_TREND_PLOTS_RESOLUTION.
max_bin_sizeint, optional: An int between 1 and 1000, which specifies the maximum number of bins for the retrieval. Default is 500.
start_datedatetime.datetime, optional: The start of the date range to return. If not specified, start date for requested plot will be used.
end_datedatetime.datetime, optional: The end of the date range to return. If not specified, end date for requested plot will be used.
max_waitint or None, optional: The maximum time to wait for a compute job to complete before retrieving the plots. Default is dr.enums.DEFAULT_MAX_WAIT. If 0 or None, the plots would be retrieved without attempting the computation.

Returns

plotForecastVsActualPlot: a ForecastVsActualPlot representing Forecast vs Actual plot

Examples

import datarobot as dr
import pandas as pd
import matplotlib.pyplot as plt

model = dr.DatetimeModel(project_id=project_id, id=model_id)
plot = model.get_forecast_vs_actual_plot()
df = pd.DataFrame.from_dict(plot.bins)

# As an example, get the forecasts for the 10th point
forecast_point_index = 10
# Pad the forecasts for plotting. The forecasts length must match the df length
forecasts = [None] * forecast_point_index + df.forecasts[forecast_point_index]
forecasts = forecasts + [None] * (len(df) - len(forecasts))

plt.plot(df.start_date, df.actual, label="Actual")
plt.plot(df.start_date, forecasts, label="Forecast")
forecast_point = df.start_date[forecast_point_index]
plt.title("Forecast vs Actual (Forecast Point {})".format(forecast_point))
plt.legend()
plt.savefig("forecast_vs_actual.png")

get_forecast_vs_actual_plot_preview(backtest=0, source=SOURCE_TYPE.VALIDATION, series_id=None, max_wait=600)¶

Retrieve Forecast vs Actual preview plots for this model.

New in version v2.25.

Parameters

backtestint or string, optional: Retrieve plots for a specific backtest (use the backtest index starting from zero). To retrieve plots for holdout, use dr.enums.DATA_SUBSET.HOLDOUT
sourcestring, optional: The source of the data for the backtest/holdout. Attribute must be one of dr.enums.SOURCE_TYPE
series_idstring, optional: The name of the series to retrieve for multiseries projects. If not provided an average plot for the first 1000 series will be retrieved.
max_waitint or None, optional: The maximum time to wait for a compute job to complete before retrieving the plots. Default is dr.enums.DEFAULT_MAX_WAIT. If 0 or None, the plots would be retrieved without attempting the computation.

Returns

plotForecastVsActualPlotPreview: a ForecastVsActualPlotPreview representing Forecast vs Actual plot preview

Examples

import datarobot as dr
import pandas as pd
model = dr.DatetimeModel(project_id=project_id, id=model_id)
plot = model.get_forecast_vs_actual_plot_preview()
df = pd.DataFrame.from_dict(plot.bins)
figure = df.plot("start_date", ["actual", "predicted"]).get_figure()
figure.savefig("forecast_vs_actual_preview.png")

get_anomaly_over_time_plots_metadata()¶

Retrieve Anomaly over Time plots metadata for this model.

New in version v2.25.

Returns

metadataAnomalyOverTimePlotsMetadata: a AnomalyOverTimePlotsMetadata representing Anomaly over Time plots metadata

get_anomaly_over_time_plot(backtest=0, source=SOURCE_TYPE.VALIDATION, series_id=None, resolution=None, max_bin_size=None, start_date=None, end_date=None, max_wait=600)¶

Retrieve Anomaly over Time plots for this model.

New in version v2.25.

Parameters

backtestint or string, optional: Retrieve plots for a specific backtest (use the backtest index starting from zero). To retrieve plots for holdout, use dr.enums.DATA_SUBSET.HOLDOUT
sourcestring, optional: The source of the data for the backtest/holdout. Attribute must be one of dr.enums.SOURCE_TYPE
series_idstring, optional: The name of the series to retrieve for multiseries projects. If not provided an average plot for the first 1000 series will be retrieved.
resolutionstring, optional: Specifying at which resolution the data should be binned. If not provided an optimal resolution will be used to build chart data with number of bins <= max_bin_size. One of dr.enums.DATETIME_TREND_PLOTS_RESOLUTION.
max_bin_sizeint, optional: An int between 1 and 1000, which specifies the maximum number of bins for the retrieval. Default is 500.
start_datedatetime.datetime, optional: The start of the date range to return. If not specified, start date for requested plot will be used.
end_datedatetime.datetime, optional: The end of the date range to return. If not specified, end date for requested plot will be used.
max_waitint or None, optional: The maximum time to wait for a compute job to complete before retrieving the plots. Default is dr.enums.DEFAULT_MAX_WAIT. If 0 or None, the plots would be retrieved without attempting the computation.

Returns

plotAnomalyOverTimePlot: a AnomalyOverTimePlot representing Anomaly over Time plot

Examples

import datarobot as dr
import pandas as pd
model = dr.DatetimeModel(project_id=project_id, id=model_id)
plot = model.get_anomaly_over_time_plot()
df = pd.DataFrame.from_dict(plot.bins)
figure = df.plot("start_date", "predicted").get_figure()
figure.savefig("anomaly_over_time.png")

get_anomaly_over_time_plot_preview(prediction_threshold=0.5, backtest=0, source=SOURCE_TYPE.VALIDATION, series_id=None, max_wait=600)¶

Retrieve Anomaly over Time preview plots for this model.

New in version v2.25.

Parameters

prediction_threshold: float, optional: Only bins with predictions exceeding this threshold will be returned in the response.
backtestint or string, optional: Retrieve plots for a specific backtest (use the backtest index starting from zero). To retrieve plots for holdout, use dr.enums.DATA_SUBSET.HOLDOUT
sourcestring, optional: The source of the data for the backtest/holdout. Attribute must be one of dr.enums.SOURCE_TYPE
series_idstring, optional: The name of the series to retrieve for multiseries projects. If not provided an average plot for the first 1000 series will be retrieved.
max_waitint or None, optional: The maximum time to wait for a compute job to complete before retrieving the plots. Default is dr.enums.DEFAULT_MAX_WAIT. If 0 or None, the plots would be retrieved without attempting the computation.

Returns

plotAnomalyOverTimePlotPreview: a AnomalyOverTimePlotPreview representing Anomaly over Time plot preview

Examples

import datarobot as dr
import pandas as pd
import matplotlib.pyplot as plt

model = dr.DatetimeModel(project_id=project_id, id=model_id)
plot = model.get_anomaly_over_time_plot_preview(prediction_threshold=0.01)
df = pd.DataFrame.from_dict(plot.bins)
x = pd.date_range(
    plot.start_date, plot.end_date, freq=df.end_date[0] - df.start_date[0]
)
plt.plot(x, [0] * len(x), label="Date range")
plt.plot(df.start_date, [0] * len(df.start_date), "ro", label="Anomaly")
plt.yticks([])
plt.legend()
plt.savefig("anomaly_over_time_preview.png")

initialize_anomaly_assessment(backtest, source, series_id=None)¶

Initialize the anomaly assessment insight and calculate Shapley explanations for the most anomalous points in the subset. The insight is available for anomaly detection models in time series unsupervised projects which also support calculation of Shapley values.

Parameters

backtest: int starting with 0 or “holdout”: The backtest to compute insight for.
source: “training” or “validation”: The source to compute insight for.
series_id: string: Required for multiseries projects. The series id to compute insight for. Say if there is a series column containing cities, the example of the series name to pass would be “Boston”

Returns

AnomalyAssessmentRecord

get_anomaly_assessment_records(backtest=None, source=None, series_id=None, limit=100, offset=0, with_data_only=False)¶

Retrieve computed Anomaly Assessment records for this model. Model must be an anomaly detection model in time series unsupervised project which also supports calculation of Shapley values.

Records can be filtered by the data backtest, source and series_id. The results can be limited.

New in version v2.25.

Parameters

backtest: int starting with 0 or “holdout”: The backtest of the data to filter records by.
source: “training” or “validation”: The source of the data to filter records by.
series_id: string: The series id to filter records by.
limit: int, optional
offset: int, optional
with_data_only: bool, optional: Whether to return only records with preview and explanations available. False by default.

Returns

recordslist of AnomalyAssessmentRecord: a AnomalyAssessmentRecord representing Anomaly Assessment Record

get_feature_impact(with_metadata=False, backtest=None)¶

Retrieve the computed Feature Impact results, a measure of the relevance of each feature in the model.

Feature Impact is computed for each column by creating new data with that column randomly permuted (but the others left unchanged), and seeing how the error metric score for the predictions is affected. The ‘impactUnnormalized’ is how much worse the error metric score is when making predictions on this modified data. The ‘impactNormalized’ is normalized so that the largest value is 1. In both cases, larger values indicate more important features.

If a feature is a redundant feature, i.e. once other features are considered it doesn’t contribute much in addition, the ‘redundantWith’ value is the name of feature that has the highest correlation with this feature. Note that redundancy detection is only available for jobs run after the addition of this feature. When retrieving data that predates this functionality, a NoRedundancyImpactAvailable warning will be used.

Elsewhere this technique is sometimes called ‘Permutation Importance’.

Requires that Feature Impact has already been computed with request_feature_impact.

Parameters

with_metadatabool: The flag indicating if the result should include the metadata as well.
backtestint or string: The index of the backtest unless it is holdout then it is string ‘holdout’. This is supported only in DatetimeModels

Returns

list or dict

The feature impact data response depends on the with_metadata parameter. The response is either a dict with metadata and a list with actual data or just a list with that data.

Each List item is a dict with the keys featureName, impactNormalized, and impactUnnormalized, redundantWith and count.

For dict response available keys are:

featureImpacts - Feature Impact data as a dictionary. Each item is a dict with
keys: featureName, impactNormalized, and impactUnnormalized, and redundantWith.

shapBased - A boolean that indicates whether Feature Impact was calculated using
Shapley values.

ranRedundancyDetection - A boolean that indicates whether redundant feature
identification was run while calculating this Feature Impact.

rowCount - An integer or None that indicates the number of rows that was used to
calculate Feature Impact. For the Feature Impact calculated with the default logic, without specifying the rowCount, we return None here.

count - An integer with the number of features under the featureImpacts.

Raises

ClientError (404): If the feature impacts have not been computed.

request_feature_impact(row_count=None, with_metadata=False, backtest=None)¶

Request feature impacts to be computed for the model.

See get_feature_impact for more information on the result of the job.

Parameters

row_countint: The sample size (specified in rows) to use for Feature Impact computation. This is not supported for unsupervised, multi-class (that has a separate method) and time series projects.
backtestint or string: The index of the backtest unless it is holdout then it is string ‘holdout’. This is supported only in DatetimeModels

Returns

jobJob: A Job representing the feature impact computation. To get the completed feature impact data, use job.get_result or job.get_result_when_complete.

Raises

JobAlreadyRequested (422): If the feature impacts have already been requested.

get_or_request_feature_impact(max_wait=600, row_count=None, with_metadata=False, backtest=None)¶

Retrieve feature impact for the model, requesting a job if it hasn’t been run previously

Parameters

max_waitint, optional: The maximum time to wait for a requested feature impact job to complete before erroring
**kwargs: Arbitrary keyword arguments passed to request_feature_impact.

Returns

feature_impactslist or dict: The feature impact data. See get_feature_impact for the exact schema.

advanced_tune(params, description=None)¶

Generate a new model with the specified advanced-tuning parameters

As of v2.17, all models other than blenders, open source, prime, baseline and user-created support Advanced Tuning.

Parameters

paramsdict: Mapping of parameter ID to parameter value. The list of valid parameter IDs for a model can be found by calling get_advanced_tuning_parameters(). This endpoint does not need to include values for all parameters. If a parameter is omitted, its current_value will be used.
descriptionstr: Human-readable string describing the newly advanced-tuned model

Returns

ModelJob: The created job to build the model

Return type: ModelJob

delete()¶

Delete a model from the project’s leaderboard.

Return type: None

download_export(filepath)¶

Download an exportable model file for use in an on-premise DataRobot standalone prediction environment.

This function can only be used if model export is enabled, and will only be useful if you have an on-premise environment in which to import it.

Parameters

filepathstr: The path at which to save the exported model file.

Return type: None

download_scoring_code(file_name, source_code=False)¶

Download the Scoring Code JAR.

Parameters

file_namestr: File path where scoring code will be saved.
source_codebool, optional: Set to True to download source code archive. It will not be executable.

download_training_artifact(file_name)¶

Retrieve trained artifact(s) from a model containing one or more custom tasks.

Artifact(s) will be downloaded to the specified local filepath.

Parameters

file_namestr: File path where trained model artifact(s) will be saved.

classmethod from_data(data)¶

Instantiate an object of this class using a dict.

Parameters

datadict: Correctly snake_cased keys and their values.

Return type: TypeVar(T, bound= APIObject)

get_advanced_tuning_parameters()¶

Get the advanced-tuning parameters available for this model.

As of v2.17, all models other than blenders, open source, prime, baseline and user-created support Advanced Tuning.

Returns

dict

A dictionary describing the advanced-tuning parameters for the current model. There are two top-level keys, tuning_description and tuning_parameters.

tuning_description an optional value. If not None, then it indicates the user-specified description of this set of tuning parameter.

tuning_parameters is a list of a dicts, each has the following keys

parameter_name : (str) name of the parameter (unique per task, see below)
parameter_id : (str) opaque ID string uniquely identifying parameter
default_value : (*) the actual value used to train the model; either the single value of the parameter specified before training, or the best value from the list of grid-searched values (based on current_value)
current_value : (*) the single value or list of values of the parameter that were grid searched. Depending on the grid search specification, could be a single fixed value (no grid search), a list of discrete values, or a range.
task_name : (str) name of the task that this parameter belongs to
constraints: (dict) see the notes below
vertex_id: (str) ID of vertex that this parameter belongs to

Notes

The type of default_value and current_value is defined by the constraints structure. It will be a string or numeric Python type.

constraints is a dict with at least one, possibly more, of the following keys. The presence of a key indicates that the parameter may take on the specified type. (If a key is absent, this means that the parameter may not take on the specified type.) If a key on constraints is present, its value will be a dict containing all of the fields described below for that key.

"constraints": {
    "select": {
        "values": [<list(basestring or number) : possible values>]
    },
    "ascii": {},
    "unicode": {},
    "int": {
        "min": <int : minimum valid value>,
        "max": <int : maximum valid value>,
        "supports_grid_search": <bool : True if Grid Search may be
                                        requested for this param>
    },
    "float": {
        "min": <float : minimum valid value>,
        "max": <float : maximum valid value>,
        "supports_grid_search": <bool : True if Grid Search may be
                                        requested for this param>
    },
    "intList": {
        "min_length": <int : minimum valid length>,
        "max_length": <int : maximum valid length>
        "min_val": <int : minimum valid value>,
        "max_val": <int : maximum valid value>
        "supports_grid_search": <bool : True if Grid Search may be
                                        requested for this param>
    },
    "floatList": {
        "min_length": <int : minimum valid length>,
        "max_length": <int : maximum valid length>
        "min_val": <float : minimum valid value>,
        "max_val": <float : maximum valid value>
        "supports_grid_search": <bool : True if Grid Search may be
                                        requested for this param>
    }
}

The keys have meaning as follows:

select: Rather than specifying a specific data type, if present, it indicates that the parameter is permitted to take on any of the specified values. Listed values may be of any string or real (non-complex) numeric type.
ascii: The parameter may be a unicode object that encodes simple ASCII characters. (A-Z, a-z, 0-9, whitespace, and certain common symbols.) In addition to listed constraints, ASCII keys currently may not contain either newlines or semicolons.
unicode: The parameter may be any Python unicode object.
int: The value may be an object of type int within the specified range (inclusive). Please note that the value will be passed around using the JSON format, and some JSON parsers have undefined behavior with integers outside of the range [-(2**53)+1, (2**53)-1].
float: The value may be an object of type float within the specified range (inclusive).
intList, floatList: The value may be a list of int or float objects, respectively, following constraints as specified respectively by the int and float types (above).

Many parameters only specify one key under constraints. If a parameter specifies multiple keys, the parameter may take on any value permitted by any key.

Return type: AdvancedTuningParamsType

get_all_confusion_charts(fallback_to_parent_insights=False)¶

Retrieve a list of all confusion matrices available for the model.

Parameters

fallback_to_parent_insightsbool: (New in version v2.14) Optional, if True, this will return confusion chart data for this model’s parent for any source that is not available for this model and if this has a defined parent model. If omitted or False, or this model has no parent, this will not attempt to retrieve any data from this model’s parent.

Returns

list of ConfusionChart: Data for all available confusion charts for model.

get_all_feature_impacts(data_slice_filter=None)¶

Retrieve a list of all feature impact results available for the model.

Parameters

data_slice_filterDataSlice, optional: A dataslice used to filter the return values based on the dataslice.id. By default, this function will use data_slice_filter.id == None which returns an unsliced insight. If data_slice_filter is None then no data_slice filtering will be applied when requesting the roc_curve.

Returns

list of dicts: Data for all available model feature impacts. Or an empty list if not data found.

Examples

model = datarobot.Model(id='model-id', project_id='project-id')

# Get feature impact insights for sliced data
data_slice = datarobot.DataSlice(id='data-slice-id')
sliced_fi = model.get_all_feature_impacts(data_slice_filter=data_slice)

# Get feature impact insights for unsliced data
data_slice = datarobot.DataSlice()
unsliced_fi = model.get_all_feature_impacts(data_slice_filter=data_slice)

# Get all feature impact insights
all_fi = model.get_all_feature_impacts()

get_all_lift_charts(fallback_to_parent_insights=False, data_slice_filter=None)¶

Retrieve a list of all Lift charts available for the model.

Parameters

fallback_to_parent_insightsbool, optional: (New in version v2.14) Optional, if True, this will return lift chart data for this model’s parent for any source that is not available for this model and if this model has a defined parent model. If omitted or False, or this model has no parent, this will not attempt to retrieve any data from this model’s parent.
data_slice_filterDataSlice, optional: Filters the returned lift chart by data_slice_filter.id. If None (the default) applies no filter based on data_slice_id.

Returns

list of LiftChart: Data for all available model lift charts. Or an empty list if no data found.

Examples

model = datarobot.Model.get('project-id', 'model-id')

# Get lift chart insights for sliced data
sliced_lift_charts = model.get_all_lift_charts(data_slice_id='data-slice-id')

# Get lift chart insights for unsliced data
unsliced_lift_charts = model.get_all_lift_charts(unsliced_only=True)

# Get all lift chart insights
all_lift_charts = model.get_all_lift_charts()

get_all_multiclass_lift_charts(fallback_to_parent_insights=False)¶

Retrieve a list of all Lift charts available for the model.

Parameters

fallback_to_parent_insightsbool: (New in version v2.14) Optional, if True, this will return lift chart data for this model’s parent for any source that is not available for this model and if this model has a defined parent model. If omitted or False, or this model has no parent, this will not attempt to retrieve any data from this model’s parent.

Returns

list of LiftChart: Data for all available model lift charts.

get_all_residuals_charts(fallback_to_parent_insights=False, data_slice_filter=None)¶

Retrieve a list of all residuals charts available for the model.

Parameters

fallback_to_parent_insightsbool: Optional, if True, this will return residuals chart data for this model’s parent for any source that is not available for this model and if this model has a defined parent model. If omitted or False, or this model has no parent, this will not attempt to retrieve any data from this model’s parent.
data_slice_filterDataSlice, optional: Filters the returned residuals charts by data_slice_filter.id. If None (the default) applies no filter based on data_slice_id.

Returns

list of ResidualsChart: Data for all available model residuals charts.

Examples

model = datarobot.Model.get('project-id', 'model-id')

# Get residuals chart insights for sliced data
sliced_residuals_charts = model.get_all_residuals_charts(data_slice_id='data-slice-id')

# Get residuals chart insights for unsliced data
unsliced_residuals_charts = model.get_all_residuals_charts(unsliced_only=True)

# Get all residuals chart insights
all_residuals_charts = model.get_all_residuals_charts()

get_all_roc_curves(fallback_to_parent_insights=False, data_slice_filter=None)¶

Retrieve a list of all ROC curves available for the model.

Parameters

fallback_to_parent_insightsbool: (New in version v2.14) Optional, if True, this will return ROC curve data for this model’s parent for any source that is not available for this model and if this model has a defined parent model. If omitted or False, or this model has no parent, this will not attempt to retrieve any data from this model’s parent.
data_slice_filterDataSlice, optional: filters the returned roc_curve by data_slice_filter.id. If None (the default) applies no filter based on data_slice_id.

Returns

list of RocCurve: Data for all available model ROC curves. Or an empty list if no RocCurves are found.

Examples

model = datarobot.Model.get('project-id', 'model-id')
ds_filter=DataSlice(id='data-slice-id')

# Get roc curve insights for sliced data
sliced_roc = model.get_all_roc_curves(data_slice_filter=ds_filter)

# Get roc curve insights for unsliced data
data_slice_filter=DataSlice(id=None)
unsliced_roc = model.get_all_roc_curves(data_slice_filter=ds_filter)

# Get all roc curve insights
all_roc_curves = model.get_all_roc_curves()

get_confusion_chart(source, fallback_to_parent_insights=False)¶

Retrieve them model’s confusion matrix for the specified source.

Parameters

sourcestr: Confusion chart source. Check datarobot.enums.CHART_DATA_SOURCE for possible values.
fallback_to_parent_insightsbool: (New in version v2.14) Optional, if True, this will return confusion chart data for this model’s parent if the confusion chart is not available for this model and the defined parent model. If omitted or False, or there is no parent model, will not attempt to return insight data from this model’s parent.

Returns

ConfusionChart: Model ConfusionChart data

Raises

ClientError: If the insight is not available for this model

get_cross_class_accuracy_scores()¶

Retrieves a list of Cross Class Accuracy scores for the model.

Returns

json

get_data_disparity_insights(feature, class_name1, class_name2)¶

Retrieve a list of Cross Class Data Disparity insights for the model.

Parameters

featurestr: Bias and Fairness protected feature name.
class_name1str: One of the compared classes
class_name2str: Another compared class

Returns

json

get_fairness_insights(fairness_metrics_set=None, offset=0, limit=100)¶

Retrieve a list of Per Class Bias insights for the model.

Parameters

fairness_metrics_setstr, optional: Can be one of <datarobot.enums.FairnessMetricsSet>. The fairness metric used to calculate the fairness scores.
offsetint, optional: Number of items to skip.
limitint, optional: Number of items to return.

Returns

json

get_features_used()¶

Query the server to determine which features were used.

Note that the data returned by this method is possibly different than the names of the features in the featurelist used by this model. This method will return the raw features that must be supplied in order for predictions to be generated on a new set of data. The featurelist, in contrast, would also include the names of derived features.

Returns

featureslist of str: The names of the features used in the model.

Return type: List[str]

get_frozen_child_models()¶

Retrieve the IDs for all models that are frozen from this model.

Returns

A list of Models

get_labelwise_roc_curves(source, fallback_to_parent_insights=False)¶

Retrieve a list of LabelwiseRocCurve instances for a multilabel model the given source and all labels.

New in version v2.24.

Parameters

sourcestr: ROC curve data source. Check datarobot.enums.CHART_DATA_SOURCE for possible values.
fallback_to_parent_insightsbool: Optional, if True, this will return ROC curve data for this model’s parent if the ROC curve is not available for this model and the model has a defined parent model. If omitted or False, or there is no parent model, will not attempt to return data from this model’s parent.

Returns

list ofclass:LabelwiseRocCurve <datarobot.models.roc_curve.LabelwiseRocCurve>: Labelwise ROC Curve instances for source and all labels

Raises

ClientError: If the insight is not available for this model
(New in version v3.0) TypeError: If the underlying project type is binary

get_lift_chart(source, fallback_to_parent_insights=False, data_slice_filter=<datarobot.models.model.Sentinel object>)¶

Retrieve the model Lift chart for the specified source.

Parameters

sourcestr: Lift chart data source. Check datarobot.enums.CHART_DATA_SOURCE for possible values. (New in version v2.23) For time series and OTV models, also accepts values backtest_2, backtest_3, …, up to the number of backtests in the model.
fallback_to_parent_insightsbool: (New in version v2.14) Optional, if True, this will return lift chart data for this model’s parent if the lift chart is not available for this model and the model has a defined parent model. If omitted or False, or there is no parent model, will not attempt to return insight data from this model’s parent.
data_slice_filterDataSlice, optional: A dataslice used to filter the return values based on the dataslice.id. By default this function will use data_slice_filter.id == None which returns an unsliced insight. If data_slice_filter is None then get_lift_chart will raise a ValueError.

Returns

LiftChart: Model lift chart data

Raises

ClientError: If the insight is not available for this model
ValueError: If data_slice_filter passed as None

get_missing_report_info()¶

Retrieve a report on missing training data that can be used to understand missing values treatment in the model. The report consists of missing values resolutions for features numeric or categorical features that were part of building the model.

Returns

An iterable of MissingReportPerFeature: The queried model missing report, sorted by missing count (DESCENDING order).

get_model_blueprint_chart()¶

Retrieve a diagram that can be used to understand data flow in the blueprint.

Returns

ModelBlueprintChart: The queried model blueprint chart.

get_model_blueprint_documents()¶

Get documentation for tasks used in this model.

Returns

list of BlueprintTaskDocument: All documents available for the model.

get_model_blueprint_json()¶

Get the blueprint json representation used by this model.

Returns

BlueprintJson: Json representation of the blueprint stages.

Return type: Dict[str, Tuple[List[str], List[str], str]]

get_multiclass_feature_impact()¶

For multiclass it’s possible to calculate feature impact separately for each target class. The method for calculation is exactly the same, calculated in one-vs-all style for each target class.

Requires that Feature Impact has already been computed with request_feature_impact.

Returns

feature_impactslist of dict: The feature impact data. Each item is a dict with the keys ‘featureImpacts’ (list), ‘class’ (str). Each item in ‘featureImpacts’ is a dict with the keys ‘featureName’, ‘impactNormalized’, and ‘impactUnnormalized’, and ‘redundantWith’.

Raises

ClientError (404): If the multiclass feature impacts have not been computed.

get_multiclass_lift_chart(source, fallback_to_parent_insights=False)¶

Retrieve model Lift chart for the specified source.

Parameters

sourcestr: Lift chart data source. Check datarobot.enums.CHART_DATA_SOURCE for possible values.
fallback_to_parent_insightsbool: Optional, if True, this will return lift chart data for this model’s parent if the lift chart is not available for this model and the model has a defined parent model. If omitted or False, or there is no parent model, will not attempt to return insight data from this model’s parent.

Returns

list of LiftChart: Model lift chart data for each saved target class

Raises

ClientError: If the insight is not available for this model

get_multilabel_lift_charts(source, fallback_to_parent_insights=False)¶

Retrieve model Lift charts for the specified source.

New in version v2.24.

Parameters

sourcestr: Lift chart data source. Check datarobot.enums.CHART_DATA_SOURCE for possible values.
fallback_to_parent_insightsbool: Optional, if True, this will return lift chart data for this model’s parent if the lift chart is not available for this model and the model has a defined parent model. If omitted or False, or there is no parent model, will not attempt to return insight data from this model’s parent.

Returns

list of LiftChart: Model lift chart data for each saved target class

Raises

ClientError: If the insight is not available for this model

get_num_iterations_trained()¶

Retrieves the number of estimators trained by early-stopping tree-based models.

– versionadded:: v2.22

Returns

projectId: str: id of project containing the model
modelId: str: id of the model
data: array: list of numEstimatorsItem objects, one for each modeling stage.
numEstimatorsItem will be of the form:
stage: str: indicates the modeling stage (for multi-stage models); None of single-stage models
numIterations: int: the number of estimators or iterations trained by the model

get_parameters()¶

Retrieve model parameters.

Returns

ModelParameters: Model parameters for this model.

get_pareto_front()¶

Retrieve the Pareto Front for a Eureqa model.

This method is only supported for Eureqa models.

Returns

ParetoFront: Model ParetoFront data

get_prime_eligibility()¶

Check if this model can be approximated with DataRobot Prime

Returns

prime_eligibilitydict: a dict indicating whether a model can be approximated with DataRobot Prime (key can_make_prime) and why it may be ineligible (key message)

get_residuals_chart(source, fallback_to_parent_insights=False, data_slice_filter=<datarobot.models.model.Sentinel object>)¶

Retrieve model residuals chart for the specified source.

Parameters

sourcestr: Residuals chart data source. Check datarobot.enums.CHART_DATA_SOURCE for possible values.
fallback_to_parent_insightsbool: Optional, if True, this will return residuals chart data for this model’s parent if the residuals chart is not available for this model and the model has a defined parent model. If omitted or False, or there is no parent model, will not attempt to return residuals data from this model’s parent.
data_slice_filterDataSlice, optional: A dataslice used to filter the return values based on the dataslice.id. By default this function will use data_slice_filter.id == None which returns an unsliced insight. If data_slice_filter is None then get_residuals_chart will raise a ValueError.

Returns

ResidualsChart: Model residuals chart data

Raises

ClientError: If the insight is not available for this model
ValueError: If data_slice_filter passed as None

get_roc_curve(source, fallback_to_parent_insights=False, data_slice_filter=<datarobot.models.model.Sentinel object>)¶

Retrieve the ROC curve for a binary model for the specified source.

Parameters

sourcestr: ROC curve data source. Check datarobot.enums.CHART_DATA_SOURCE for possible values. (New in version v2.23) For time series and OTV models, also accepts values backtest_2, backtest_3, …, up to the number of backtests in the model.
fallback_to_parent_insightsbool: (New in version v2.14) Optional, if True, this will return ROC curve data for this model’s parent if the ROC curve is not available for this model and the model has a defined parent model. If omitted or False, or there is no parent model, will not attempt to return data from this model’s parent.
data_slice_filterDataSlice, optional: A dataslice used to filter the return values based on the dataslice.id. By default this function will use data_slice_filter.id == None which returns an unsliced insight. If data_slice_filter is None then get_roc_curve will raise a ValueError.

Returns

RocCurve: Model ROC curve data

Raises

ClientError: If the insight is not available for this model
(New in version v3.0) TypeError: If the underlying project type is multilabel
ValueError: If data_slice_filter passed as None

get_rulesets()¶

List the rulesets approximating this model generated by DataRobot Prime

If this model hasn’t been approximated yet, will return an empty list. Note that these are rulesets approximating this model, not rulesets used to construct this model.

Returns

rulesetslist of Ruleset

Return type: List[Ruleset]

get_supported_capabilities()¶

Retrieves a summary of the capabilities supported by a model.

New in version v2.14.

Returns

supportsBlending: bool

whether the model supports blending

supportsMonotonicConstraints: bool

whether the model supports monotonic constraints

hasWordCloud: bool

whether the model has word cloud data available

eligibleForPrime: bool

whether the model is eligible for Prime

hasParameters: bool

whether the model has parameters that can be retrieved

supportsCodeGeneration: bool

(New in version v2.18) whether the model supports code generation

supportsShap: bool

(New in version v2.18) True if the model supports Shapley package. i.e. Shapley based: feature Importance

supportsEarlyStopping: bool

(New in version v2.22) True if this is an early stopping tree-based model and number of trained iterations can be retrieved.

get_uri()¶

Returns

urlstr: Permanent static hyperlink to this model at leaderboard.

Return type: str

get_word_cloud(exclude_stop_words=False)¶

Retrieve word cloud data for the model.

Parameters

exclude_stop_wordsbool, optional: Set to True if you want stopwords filtered out of response.

Returns

WordCloud: Word cloud data for the model.

incremental_train(data_stage_id, training_data_name=None)¶

Submit a job to the queue to perform incremental training on an existing model using additional data. The id of the additional data to use for training is specified with the data_stage_id. Optionally a name for the iteration can be supplied by the user to help identify the contents of data in the iteration.

This functionality requires the INCREMENTAL_LEARNING feature flag to be enabled.

Parameters

data_stage_id: str: The id of the data stage to use for training.
training_data_namestr, optional: The name of the iteration or data stage to indicate what the incremental learning was performed on.

Returns

jobModelJob: The created job that is retraining the model

Return type: ModelJob

open_in_browser()¶

Opens class’ relevant web browser location. If default browser is not available the URL is logged.

Note: If text-mode browsers are used, the calling process will block until the user exits the browser.

Return type: None

request_approximation()¶

Request an approximation of this model using DataRobot Prime

This will create several rulesets that could be used to approximate this model. After comparing their scores and rule counts, the code used in the approximation can be downloaded and run locally.

Returns

jobJob: the job generating the rulesets

request_cross_class_accuracy_scores()¶

Request data disparity insights to be computed for the model.

Returns

status_idstr: A statusId of computation request.

request_data_disparity_insights(feature, compared_class_names)¶

Request data disparity insights to be computed for the model.

Parameters

featurestr: Bias and Fairness protected feature name.
compared_class_nameslist(str): List of two classes to compare

Returns

status_idstr: A statusId of computation request.

request_external_test(dataset_id, actual_value_column=None)¶

Request external test to compute scores and insights on an external test dataset

Parameters

dataset_idstring: The dataset to make predictions against (as uploaded from Project.upload_dataset)
actual_value_columnstring, optional: (New in version v2.21) For time series unsupervised projects only. Actual value column can be used to calculate the classification metrics and insights on the prediction dataset. Can’t be provided with the forecast_point parameter.
Returns
——-
jobJob: a Job representing external dataset insights computation

request_fairness_insights(fairness_metrics_set=None)¶

Request fairness insights to be computed for the model.

Parameters

fairness_metrics_setstr, optional: Can be one of <datarobot.enums.FairnessMetricsSet>. The fairness metric used to calculate the fairness scores.

Returns

status_idstr: A statusId of computation request.

request_frozen_datetime_model(training_row_count=None, training_duration=None, training_start_date=None, training_end_date=None, time_window_sample_pct=None, sampling_method=None)¶

Train a new frozen model with parameters from this model.

Requires that this model belongs to a datetime partitioned project. If it does not, an error will occur when submitting the job.

Frozen models use the same tuning parameters as their parent model instead of independently optimizing them to allow efficiently retraining models on larger amounts of the training data.

In addition of training_row_count and training_duration, frozen datetime models may be trained on an exact date range. Only one of training_row_count, training_duration, or training_start_date and training_end_date should be specified.

Models specified using training_start_date and training_end_date are the only ones that can be trained into the holdout data (once the holdout is unlocked).

All durations should be specified with a duration string such as those returned by the partitioning_methods.construct_duration_string helper method. Please see datetime partitioned project documentation for more information on duration strings.

Parameters

training_row_countint, optional: the number of rows of data that should be used to train the model. If specified, training_duration may not be specified.
training_durationstr, optional: a duration string specifying what time range the data used to train the model should span. If specified, training_row_count may not be specified.
training_start_datedatetime.datetime, optional: the start date of the data to train to model on. Only rows occurring at or after this datetime will be used. If training_start_date is specified, training_end_date must also be specified.
training_end_datedatetime.datetime, optional: the end date of the data to train the model on. Only rows occurring strictly before this datetime will be used. If training_end_date is specified, training_start_date must also be specified.
time_window_sample_pctint, optional: may only be specified when the requested model is a time window (e.g. duration or start and end dates). An integer between 1 and 99 indicating the percentage to sample by within the window. The points kept are determined by a random uniform sample. If specified, training_duration must be specified otherwise, the number of rows used to train the model and evaluate backtest scores and an error will occur.
sampling_methodstr, optional: (New in version v2.23) defines the way training data is selected. Can be either random or latest. In combination with training_row_count defines how rows are selected from backtest (latest by default). When training data is defined using time range (training_duration or use_project_settings) this setting changes the way time_window_sample_pct is applied (random by default). Applicable to OTV projects only.

Returns

model_jobModelJob: the modeling job training a frozen model

Return type: ModelJob

request_lift_chart(source, data_slice_id=None)¶

Request the model Lift Chart for the specified source.

Parameters

sourcestr: Lift chart data source. Check datarobot.enums.CHART_DATA_SOURCE for possible values.
data_slice_idstring, optional: ID for the data slice used in the request. If None, request unsliced insight data.

Returns

status_check_jobStatusCheckJob: Object contains all needed logic for a periodical status check of an async job.

Return type: StatusCheckJob

request_predictions(dataset_id=None, dataset=None, dataframe=None, file_path=None, file=None, include_prediction_intervals=None, prediction_intervals_size=None, forecast_point=None, predictions_start_date=None, predictions_end_date=None, actual_value_column=None, explanation_algorithm=None, max_explanations=None, max_ngram_explanations=None)¶

Requests predictions against a previously uploaded dataset.

Parameters

dataset_idstring, optional: The ID of the dataset to make predictions against (as uploaded from Project.upload_dataset)
datasetDataset, optional: The dataset to make predictions against (as uploaded from Project.upload_dataset)
dataframepd.DataFrame, optional: (New in v3.0) The dataframe to make predictions against
file_pathstr, optional: (New in v3.0) Path to file to make predictions against
fileIOBase, optional: (New in v3.0) File to make predictions against
include_prediction_intervalsbool, optional: (New in v2.16) For time series projects only. Specifies whether prediction intervals should be calculated for this request. Defaults to True if prediction_intervals_size is specified, otherwise defaults to False.
prediction_intervals_sizeint, optional: (New in v2.16) For time series projects only. Represents the percentile to use for the size of the prediction intervals. Defaults to 80 if include_prediction_intervals is True. Prediction intervals size must be between 1 and 100 (inclusive).
forecast_pointdatetime.datetime or None, optional: (New in version v2.20) For time series projects only. This is the default point relative to which predictions will be generated, based on the forecast window of the project. See the time series prediction documentation for more information.
predictions_start_datedatetime.datetime or None, optional: (New in version v2.20) For time series projects only. The start date for bulk predictions. Note that this parameter is for generating historical predictions using the training data. This parameter should be provided in conjunction with predictions_end_date. Can’t be provided with the forecast_point parameter.
predictions_end_datedatetime.datetime or None, optional: (New in version v2.20) For time series projects only. The end date for bulk predictions, exclusive. Note that this parameter is for generating historical predictions using the training data. This parameter should be provided in conjunction with predictions_start_date. Can’t be provided with the forecast_point parameter.
actual_value_columnstring, optional: (New in version v2.21) For time series unsupervised projects only. Actual value column can be used to calculate the classification metrics and insights on the prediction dataset. Can’t be provided with the forecast_point parameter.
explanation_algorithm: (New in version v2.21) optional; If set to ‘shap’, the: response will include prediction explanations based on the SHAP explainer (SHapley Additive exPlanations). Defaults to null (no prediction explanations).
max_explanations: (New in version v2.21) int optional; specifies the maximum number of: explanation values that should be returned for each row, ordered by absolute value, greatest to least. If null, no limit. In the case of ‘shap’: if the number of features is greater than the limit, the sum of remaining values will also be returned as shapRemainingTotal. Defaults to null. Cannot be set if explanation_algorithm is omitted.
max_ngram_explanations: optional; int or str: (New in version v2.29) Specifies the maximum number of text explanation values that should be returned. If set to all, text explanations will be computed and all the ngram explanations will be returned. If set to a non zero positive integer value, text explanations will be computed and this amount of descendingly sorted ngram explanations will be returned. By default text explanation won’t be triggered to be computed.

Returns

jobPredictJob: The job computing the predictions

Return type: PredictJob

request_residuals_chart(source, data_slice_id=None)¶

Request the model residuals chart for the specified source.

Parameters

sourcestr: Residuals chart data source. Check datarobot.enums.CHART_DATA_SOURCE for possible values.
data_slice_idstring, optional: ID for the data slice used in the request. If None, request unsliced insight data.

Returns

status_check_jobStatusCheckJob: Object contains all needed logic for a periodical status check of an async job.

Return type: StatusCheckJob

request_roc_curve(source, data_slice_id=None)¶

Request the model Roc Curve for the specified source.

Parameters

sourcestr: Roc Curve data source. Check datarobot.enums.CHART_DATA_SOURCE for possible values.
data_slice_idstring, optional: ID for the data slice used in the request. If None, request unsliced insight data.

Returns

status_check_jobStatusCheckJob: Object contains all needed logic for a periodical status check of an async job.

Return type: StatusCheckJob

request_transferable_export(prediction_intervals_size=None)¶

Request generation of an exportable model file for use in an on-premise DataRobot standalone prediction environment.

This function can only be used if model export is enabled, and will only be useful if you have an on-premise environment in which to import it.

This function does not download the exported file. Use download_export for that.

Parameters

prediction_intervals_sizeint, optional: (New in v2.19) For time series projects only. Represents the percentile to use for the size of the prediction intervals. Prediction intervals size must be between 1 and 100 (inclusive).

Returns

Job

Examples

model = datarobot.Model.get('project-id', 'model-id')
job = model.request_transferable_export()
job.wait_for_completion()
model.download_export('my_exported_model.drmodel')

# Client must be configured to use standalone prediction server for import:
datarobot.Client(token='my-token-at-standalone-server',
                 endpoint='standalone-server-url/api/v2')

imported_model = datarobot.ImportedModel.create('my_exported_model.drmodel')

Return type: Job

set_prediction_threshold(threshold)¶

Set a custom prediction threshold for the model.

May not be used once prediction_threshold_read_only is True for this model.

Parameters

thresholdfloat: only used for binary classification projects. The threshold to when deciding between the positive and negative classes when making predictions. Should be between 0.0 and 1.0 (inclusive).

star_model()¶

Mark the model as starred.

Model stars propagate to the web application and the API, and can be used to filter when listing models.

Return type: None

start_advanced_tuning_session()¶

Start an Advanced Tuning session. Returns an object that helps set up arguments for an Advanced Tuning model execution.

As of v2.17, all models other than blenders, open source, prime, baseline and user-created support Advanced Tuning.

Returns

AdvancedTuningSession: Session for setting up and running Advanced Tuning on a model

train_datetime(featurelist_id=None, training_row_count=None, training_duration=None, time_window_sample_pct=None, monotonic_increasing_featurelist_id=<object object>, monotonic_decreasing_featurelist_id=<object object>, use_project_settings=False, sampling_method=None, n_clusters=None)¶

Trains this model on a different featurelist or sample size.

Requires that this model is part of a datetime partitioned project; otherwise, an error will occur.

All durations should be specified with a duration string such as those returned by the partitioning_methods.construct_duration_string helper method. Please see datetime partitioned project documentation for more information on duration strings.

Parameters

featurelist_idstr, optional: the featurelist to use to train the model. If not specified, the featurelist of this model is used.
training_row_countint, optional: the number of rows of data that should be used to train the model. If specified, neither training_duration nor use_project_settings may be specified.
training_durationstr, optional: a duration string specifying what time range the data used to train the model should span. If specified, neither training_row_count nor use_project_settings may be specified.
use_project_settingsbool, optional: (New in version v2.20) defaults to False. If True, indicates that the custom backtest partitioning settings specified by the user will be used to train the model and evaluate backtest scores. If specified, neither training_row_count nor training_duration may be specified.
time_window_sample_pctint, optional: may only be specified when the requested model is a time window (e.g. duration or start and end dates). An integer between 1 and 99 indicating the percentage to sample by within the window. The points kept are determined by a random uniform sample. If specified, training_duration must be specified otherwise, the number of rows used to train the model and evaluate backtest scores and an error will occur.
sampling_methodstr, optional: (New in version v2.23) defines the way training data is selected. Can be either random or latest. In combination with training_row_count defines how rows are selected from backtest (latest by default). When training data is defined using time range (training_duration or use_project_settings) this setting changes the way time_window_sample_pct is applied (random by default). Applicable to OTV projects only.
monotonic_increasing_featurelist_idstr, optional: (New in version v2.18) optional, the id of the featurelist that defines the set of features with a monotonically increasing relationship to the target. Passing None disables increasing monotonicity constraint. Default (dr.enums.MONOTONICITY_FEATURELIST_DEFAULT) is the one specified by the blueprint.
monotonic_decreasing_featurelist_idstr, optional: (New in version v2.18) optional, the id of the featurelist that defines the set of features with a monotonically decreasing relationship to the target. Passing None disables decreasing monotonicity constraint. Default (dr.enums.MONOTONICITY_FEATURELIST_DEFAULT) is the one specified by the blueprint.
n_clusters: int, optional: (New in version 2.27) number of clusters to use in an unsupervised clustering model. This parameter is used only for unsupervised clustering models that don’t automatically determine the number of clusters.

Returns

jobModelJob: the created job to build the model

Return type: ModelJob

unstar_model()¶

Unmark the model as starred.

Model stars propagate to the web application and the API, and can be used to filter when listing models.

Return type: None

Frozen Model¶

class datarobot.models.FrozenModel(id=None, processes=None, featurelist_name=None, featurelist_id=None, project_id=None, sample_pct=None, training_row_count=None, training_duration=None, training_start_date=None, training_end_date=None, model_type=None, model_category=None, is_frozen=None, blueprint_id=None, metrics=None, parent_model_id=None, monotonic_increasing_featurelist_id=None, monotonic_decreasing_featurelist_id=None, supports_monotonic_constraints=None, is_starred=None, prediction_threshold=None, prediction_threshold_read_only=None, model_number=None, supports_composable_ml=None)¶

Represents a model tuned with parameters which are derived from another model

All durations are specified with a duration string such as those returned by the partitioning_methods.construct_duration_string helper method. Please see datetime partitioned project documentation for more information on duration strings.

Attributes

idstr: the id of the model
project_idstr: the id of the project the model belongs to
processeslist of str: the processes used by the model
featurelist_namestr: the name of the featurelist used by the model
featurelist_idstr: the id of the featurelist used by the model
sample_pctfloat: the percentage of the project dataset used in training the model
training_row_countint or None: the number of rows of the project dataset used in training the model. In a datetime partitioned project, if specified, defines the number of rows used to train the model and evaluate backtest scores; if unspecified, either training_duration or training_start_date and training_end_date was used to determine that instead.
training_durationstr or None: only present for models in datetime partitioned projects. If specified, a duration string specifying the duration spanned by the data used to train the model and evaluate backtest scores.
training_start_datedatetime or None: only present for frozen models in datetime partitioned projects. If specified, the start date of the data used to train the model.
training_end_datedatetime or None: only present for frozen models in datetime partitioned projects. If specified, the end date of the data used to train the model.
model_typestr: what model this is, e.g. ‘Nystroem Kernel SVM Regressor’
model_categorystr: what kind of model this is - ‘prime’ for DataRobot Prime models, ‘blend’ for blender models, and ‘model’ for other models
is_frozenbool: whether this model is a frozen model
parent_model_idstr: the id of the model that tuning parameters are derived from
blueprint_idstr: the id of the blueprint used in this model
metricsdict: a mapping from each metric to the model’s scores for that metric
monotonic_increasing_featurelist_idstr: optional, the id of the featurelist that defines the set of features with a monotonically increasing relationship to the target. If None, no such constraints are enforced.
monotonic_decreasing_featurelist_idstr: optional, the id of the featurelist that defines the set of features with a monotonically decreasing relationship to the target. If None, no such constraints are enforced.
supports_monotonic_constraintsbool: optional, whether this model supports enforcing monotonic constraints
is_starredbool: whether this model marked as starred
prediction_thresholdfloat: for binary classification projects, the threshold used for predictions
prediction_threshold_read_onlybool: indicated whether modification of the prediction threshold is forbidden. Threshold modification is forbidden once a model has had a deployment created or predictions made via the dedicated prediction API.
model_numberinteger: model number assigned to a model
supports_composable_mlbool or None: (New in version v2.26) whether this model is supported in the Composable ML.

classmethod get(project_id, model_id)¶

Retrieve a specific frozen model.

Parameters

project_idstr: The project’s id.
model_idstr: The model_id of the leaderboard item to retrieve.

Returns

modelFrozenModel: The queried instance.

Imported Model¶

Note

Imported Models are used in Stand Alone Scoring Engines. If you are not an administrator of such an engine, they are not relevant to you.

class datarobot.models.ImportedModel(id, imported_at=None, model_id=None, target=None, featurelist_name=None, dataset_name=None, model_name=None, project_id=None, note=None, origin_url=None, imported_by_username=None, project_name=None, created_by_username=None, created_by_id=None, imported_by_id=None, display_name=None)¶

Represents an imported model available for making predictions. These are only relevant for administrators of on-premise Stand Alone Scoring Engines.

ImportedModels are trained in one DataRobot application, exported as a .drmodel file, and then imported for use in a Stand Alone Scoring Engine.

Attributes

idstr: id of the import
model_namestr: model type describing the model generated by DataRobot
display_namestr: manually specified human-readable name of the imported model
notestr: manually added node about this imported model
imported_atdatetime: the time the model was imported
imported_by_usernamestr: username of the user who imported the model
imported_by_idstr: id of the user who imported the model
origin_urlstr: URL of the application the model was exported from
model_idstr: original id of the model prior to export
featurelist_namestr: name of the featurelist used to train the model
project_idstr: id of the project the model belonged to prior to export
project_namestr: name of the project the model belonged to prior to export
targetstr: the target of the project the model belonged to prior to export
dataset_namestr: filename of the dataset used to create the project the model belonged to
created_by_usernamestr: username of the user who created the model prior to export
created_by_idstr: id of the user who created the model prior to export

classmethod create(path, max_wait=600)¶

Import a previously exported model for predictions.

Parameters

pathstr: The path to the exported model file
max_waitint, optional: Time in seconds after which model import is considered unsuccessful

Return type: ImportedModel

classmethod get(import_id)¶

Retrieve imported model info

Parameters

import_idstr: The ID of the imported model.

Returns

imported_modelImportedModel: The ImportedModel instance

Return type: ImportedModel

classmethod list(limit=None, offset=None)¶

List the imported models.

Parameters

limitint: The number of records to return. The server will use a (possibly finite) default if not specified.
offsetint: The number of records to skip.

Returns

imported_modelslist[ImportedModel]

Return type: List[ImportedModel]

update(display_name=None, note=None)¶

Update the display name or note for an imported model. The ImportedModel object is updated in place.

Parameters

display_namestr: The new display name.
notestr: The new note.

Return type: None

delete()¶

Delete this imported model.

Return type: None

RatingTableModel¶

class datarobot.models.RatingTableModel(id=None, processes=None, featurelist_name=None, featurelist_id=None, project_id=None, sample_pct=None, training_row_count=None, training_duration=None, training_start_date=None, training_end_date=None, model_type=None, model_category=None, is_frozen=None, blueprint_id=None, metrics=None, rating_table_id=None, monotonic_increasing_featurelist_id=None, monotonic_decreasing_featurelist_id=None, supports_monotonic_constraints=None, is_starred=None, prediction_threshold=None, prediction_threshold_read_only=None, model_number=None, supports_composable_ml=None)¶

A model that has a rating table.

All durations are specified with a duration string such as those returned by the partitioning_methods.construct_duration_string helper method. Please see datetime partitioned project documentation for more information on duration strings.

Attributes

idstr: the id of the model
project_idstr: the id of the project the model belongs to
processeslist of str: the processes used by the model
featurelist_namestr: the name of the featurelist used by the model
featurelist_idstr: the id of the featurelist used by the model
sample_pctfloat or None: the percentage of the project dataset used in training the model. If the project uses datetime partitioning, the sample_pct will be None. See training_row_count, training_duration, and training_start_date and training_end_date instead.
training_row_countint or None: the number of rows of the project dataset used in training the model. In a datetime partitioned project, if specified, defines the number of rows used to train the model and evaluate backtest scores; if unspecified, either training_duration or training_start_date and training_end_date was used to determine that instead.
training_durationstr or None: only present for models in datetime partitioned projects. If specified, a duration string specifying the duration spanned by the data used to train the model and evaluate backtest scores.
training_start_datedatetime or None: only present for frozen models in datetime partitioned projects. If specified, the start date of the data used to train the model.
training_end_datedatetime or None: only present for frozen models in datetime partitioned projects. If specified, the end date of the data used to train the model.
model_typestr: what model this is, e.g. ‘Nystroem Kernel SVM Regressor’
model_categorystr: what kind of model this is - ‘prime’ for DataRobot Prime models, ‘blend’ for blender models, and ‘model’ for other models
is_frozenbool: whether this model is a frozen model
blueprint_idstr: the id of the blueprint used in this model
metricsdict: a mapping from each metric to the model’s scores for that metric
rating_table_idstr: the id of the rating table that belongs to this model
monotonic_increasing_featurelist_idstr: optional, the id of the featurelist that defines the set of features with a monotonically increasing relationship to the target. If None, no such constraints are enforced.
monotonic_decreasing_featurelist_idstr: optional, the id of the featurelist that defines the set of features with a monotonically decreasing relationship to the target. If None, no such constraints are enforced.
supports_monotonic_constraintsbool: optional, whether this model supports enforcing monotonic constraints
is_starredbool: whether this model marked as starred
prediction_thresholdfloat: for binary classification projects, the threshold used for predictions
prediction_threshold_read_onlybool: indicated whether modification of the prediction threshold is forbidden. Threshold modification is forbidden once a model has had a deployment created or predictions made via the dedicated prediction API.
model_numberinteger: model number assigned to a model
supports_composable_mlbool or None: (New in version v2.26) whether this model is supported in the Composable ML.

classmethod get(project_id, model_id)¶

Retrieve a specific rating table model

If the project does not have a rating table, a ClientError will occur.

Parameters

project_idstr: the id of the project the model belongs to
model_idstr: the id of the model to retrieve

Returns

modelRatingTableModel: the model

classmethod create_from_rating_table(project_id, rating_table_id)¶

Creates a new model from a validated rating table record. The RatingTable must not be associated with an existing model.

Parameters

project_idstr: the id of the project the rating table belongs to
rating_table_idstr: the id of the rating table to create this model from

Returns

job: Job: an instance of created async job

Raises

ClientError (422): Raised if creating model from a RatingTable that failed validation
JobAlreadyRequested: Raised if creating model from a RatingTable that is already associated with a RatingTableModel

Return type: Job

advanced_tune(params, description=None)¶

Generate a new model with the specified advanced-tuning parameters

As of v2.17, all models other than blenders, open source, prime, baseline and user-created support Advanced Tuning.

Parameters

paramsdict: Mapping of parameter ID to parameter value. The list of valid parameter IDs for a model can be found by calling get_advanced_tuning_parameters(). This endpoint does not need to include values for all parameters. If a parameter is omitted, its current_value will be used.
descriptionstr: Human-readable string describing the newly advanced-tuned model

Returns

ModelJob: The created job to build the model

Return type: ModelJob

cross_validate()¶

Run cross validation on the model.

Note

To perform Cross Validation on a new model with new parameters, use train instead.

Returns

ModelJob: The created job to build the model

delete()¶

Delete a model from the project’s leaderboard.

Return type: None

download_export(filepath)¶

Download an exportable model file for use in an on-premise DataRobot standalone prediction environment.

This function can only be used if model export is enabled, and will only be useful if you have an on-premise environment in which to import it.

Parameters

filepathstr: The path at which to save the exported model file.

Return type: None

download_scoring_code(file_name, source_code=False)¶

Download the Scoring Code JAR.

Parameters

file_namestr: File path where scoring code will be saved.
source_codebool, optional: Set to True to download source code archive. It will not be executable.

download_training_artifact(file_name)¶

Retrieve trained artifact(s) from a model containing one or more custom tasks.

Artifact(s) will be downloaded to the specified local filepath.

Parameters

file_namestr: File path where trained model artifact(s) will be saved.

classmethod from_data(data)¶

Instantiate an object of this class using a dict.

Parameters

datadict: Correctly snake_cased keys and their values.

Return type: TypeVar(T, bound= APIObject)

get_advanced_tuning_parameters()¶

Get the advanced-tuning parameters available for this model.

As of v2.17, all models other than blenders, open source, prime, baseline and user-created support Advanced Tuning.

Returns

dict

A dictionary describing the advanced-tuning parameters for the current model. There are two top-level keys, tuning_description and tuning_parameters.

tuning_description an optional value. If not None, then it indicates the user-specified description of this set of tuning parameter.

tuning_parameters is a list of a dicts, each has the following keys

parameter_name : (str) name of the parameter (unique per task, see below)
parameter_id : (str) opaque ID string uniquely identifying parameter
default_value : (*) the actual value used to train the model; either the single value of the parameter specified before training, or the best value from the list of grid-searched values (based on current_value)
current_value : (*) the single value or list of values of the parameter that were grid searched. Depending on the grid search specification, could be a single fixed value (no grid search), a list of discrete values, or a range.
task_name : (str) name of the task that this parameter belongs to
constraints: (dict) see the notes below
vertex_id: (str) ID of vertex that this parameter belongs to

Notes

The type of default_value and current_value is defined by the constraints structure. It will be a string or numeric Python type.

constraints is a dict with at least one, possibly more, of the following keys. The presence of a key indicates that the parameter may take on the specified type. (If a key is absent, this means that the parameter may not take on the specified type.) If a key on constraints is present, its value will be a dict containing all of the fields described below for that key.

"constraints": {
    "select": {
        "values": [<list(basestring or number) : possible values>]
    },
    "ascii": {},
    "unicode": {},
    "int": {
        "min": <int : minimum valid value>,
        "max": <int : maximum valid value>,
        "supports_grid_search": <bool : True if Grid Search may be
                                        requested for this param>
    },
    "float": {
        "min": <float : minimum valid value>,
        "max": <float : maximum valid value>,
        "supports_grid_search": <bool : True if Grid Search may be
                                        requested for this param>
    },
    "intList": {
        "min_length": <int : minimum valid length>,
        "max_length": <int : maximum valid length>
        "min_val": <int : minimum valid value>,
        "max_val": <int : maximum valid value>
        "supports_grid_search": <bool : True if Grid Search may be
                                        requested for this param>
    },
    "floatList": {
        "min_length": <int : minimum valid length>,
        "max_length": <int : maximum valid length>
        "min_val": <float : minimum valid value>,
        "max_val": <float : maximum valid value>
        "supports_grid_search": <bool : True if Grid Search may be
                                        requested for this param>
    }
}

The keys have meaning as follows:

select: Rather than specifying a specific data type, if present, it indicates that the parameter is permitted to take on any of the specified values. Listed values may be of any string or real (non-complex) numeric type.
ascii: The parameter may be a unicode object that encodes simple ASCII characters. (A-Z, a-z, 0-9, whitespace, and certain common symbols.) In addition to listed constraints, ASCII keys currently may not contain either newlines or semicolons.
unicode: The parameter may be any Python unicode object.
int: The value may be an object of type int within the specified range (inclusive). Please note that the value will be passed around using the JSON format, and some JSON parsers have undefined behavior with integers outside of the range [-(2**53)+1, (2**53)-1].
float: The value may be an object of type float within the specified range (inclusive).
intList, floatList: The value may be a list of int or float objects, respectively, following constraints as specified respectively by the int and float types (above).

Many parameters only specify one key under constraints. If a parameter specifies multiple keys, the parameter may take on any value permitted by any key.

Return type: AdvancedTuningParamsType

get_all_confusion_charts(fallback_to_parent_insights=False)¶

Retrieve a list of all confusion matrices available for the model.

Parameters

fallback_to_parent_insightsbool: (New in version v2.14) Optional, if True, this will return confusion chart data for this model’s parent for any source that is not available for this model and if this has a defined parent model. If omitted or False, or this model has no parent, this will not attempt to retrieve any data from this model’s parent.

Returns

list of ConfusionChart: Data for all available confusion charts for model.

get_all_feature_impacts(data_slice_filter=None)¶

Retrieve a list of all feature impact results available for the model.

Parameters

data_slice_filterDataSlice, optional: A dataslice used to filter the return values based on the dataslice.id. By default, this function will use data_slice_filter.id == None which returns an unsliced insight. If data_slice_filter is None then no data_slice filtering will be applied when requesting the roc_curve.

Returns

list of dicts: Data for all available model feature impacts. Or an empty list if not data found.

Examples

model = datarobot.Model(id='model-id', project_id='project-id')

# Get feature impact insights for sliced data
data_slice = datarobot.DataSlice(id='data-slice-id')
sliced_fi = model.get_all_feature_impacts(data_slice_filter=data_slice)

# Get feature impact insights for unsliced data
data_slice = datarobot.DataSlice()
unsliced_fi = model.get_all_feature_impacts(data_slice_filter=data_slice)

# Get all feature impact insights
all_fi = model.get_all_feature_impacts()

get_all_lift_charts(fallback_to_parent_insights=False, data_slice_filter=None)¶

Retrieve a list of all Lift charts available for the model.

Parameters

fallback_to_parent_insightsbool, optional: (New in version v2.14) Optional, if True, this will return lift chart data for this model’s parent for any source that is not available for this model and if this model has a defined parent model. If omitted or False, or this model has no parent, this will not attempt to retrieve any data from this model’s parent.
data_slice_filterDataSlice, optional: Filters the returned lift chart by data_slice_filter.id. If None (the default) applies no filter based on data_slice_id.

Returns

list of LiftChart: Data for all available model lift charts. Or an empty list if no data found.

Examples

model = datarobot.Model.get('project-id', 'model-id')

# Get lift chart insights for sliced data
sliced_lift_charts = model.get_all_lift_charts(data_slice_id='data-slice-id')

# Get lift chart insights for unsliced data
unsliced_lift_charts = model.get_all_lift_charts(unsliced_only=True)

# Get all lift chart insights
all_lift_charts = model.get_all_lift_charts()

get_all_multiclass_lift_charts(fallback_to_parent_insights=False)¶

Retrieve a list of all Lift charts available for the model.

Parameters

fallback_to_parent_insightsbool: (New in version v2.14) Optional, if True, this will return lift chart data for this model’s parent for any source that is not available for this model and if this model has a defined parent model. If omitted or False, or this model has no parent, this will not attempt to retrieve any data from this model’s parent.

Returns

list of LiftChart: Data for all available model lift charts.

get_all_residuals_charts(fallback_to_parent_insights=False, data_slice_filter=None)¶

Retrieve a list of all residuals charts available for the model.

Parameters

fallback_to_parent_insightsbool: Optional, if True, this will return residuals chart data for this model’s parent for any source that is not available for this model and if this model has a defined parent model. If omitted or False, or this model has no parent, this will not attempt to retrieve any data from this model’s parent.
data_slice_filterDataSlice, optional: Filters the returned residuals charts by data_slice_filter.id. If None (the default) applies no filter based on data_slice_id.

Returns

list of ResidualsChart: Data for all available model residuals charts.

Examples

model = datarobot.Model.get('project-id', 'model-id')

# Get residuals chart insights for sliced data
sliced_residuals_charts = model.get_all_residuals_charts(data_slice_id='data-slice-id')

# Get residuals chart insights for unsliced data
unsliced_residuals_charts = model.get_all_residuals_charts(unsliced_only=True)

# Get all residuals chart insights
all_residuals_charts = model.get_all_residuals_charts()

get_all_roc_curves(fallback_to_parent_insights=False, data_slice_filter=None)¶

Retrieve a list of all ROC curves available for the model.

Parameters

fallback_to_parent_insightsbool: (New in version v2.14) Optional, if True, this will return ROC curve data for this model’s parent for any source that is not available for this model and if this model has a defined parent model. If omitted or False, or this model has no parent, this will not attempt to retrieve any data from this model’s parent.
data_slice_filterDataSlice, optional: filters the returned roc_curve by data_slice_filter.id. If None (the default) applies no filter based on data_slice_id.

Returns

list of RocCurve: Data for all available model ROC curves. Or an empty list if no RocCurves are found.

Examples

model = datarobot.Model.get('project-id', 'model-id')
ds_filter=DataSlice(id='data-slice-id')

# Get roc curve insights for sliced data
sliced_roc = model.get_all_roc_curves(data_slice_filter=ds_filter)

# Get roc curve insights for unsliced data
data_slice_filter=DataSlice(id=None)
unsliced_roc = model.get_all_roc_curves(data_slice_filter=ds_filter)

# Get all roc curve insights
all_roc_curves = model.get_all_roc_curves()

get_confusion_chart(source, fallback_to_parent_insights=False)¶

Retrieve them model’s confusion matrix for the specified source.

Parameters

sourcestr: Confusion chart source. Check datarobot.enums.CHART_DATA_SOURCE for possible values.
fallback_to_parent_insightsbool: (New in version v2.14) Optional, if True, this will return confusion chart data for this model’s parent if the confusion chart is not available for this model and the defined parent model. If omitted or False, or there is no parent model, will not attempt to return insight data from this model’s parent.

Returns

ConfusionChart: Model ConfusionChart data

Raises

ClientError: If the insight is not available for this model

get_cross_class_accuracy_scores()¶

Retrieves a list of Cross Class Accuracy scores for the model.

Returns

json

get_cross_validation_scores(partition=None, metric=None)¶

Return a dictionary, keyed by metric, showing cross validation scores per partition.

Cross Validation should already have been performed using cross_validate or train.

Note

Models that computed cross validation before this feature was added will need to be deleted and retrained before this method can be used.

Parameters

partitionfloat: optional, the id of the partition (1,2,3.0,4.0,etc…) to filter results by can be a whole number positive integer or float value. 0 corresponds to the validation partition.
metric: unicode: optional name of the metric to filter to resulting cross validation scores by

Returns

cross_validation_scores: dict: A dictionary keyed by metric showing cross validation scores per partition.

get_data_disparity_insights(feature, class_name1, class_name2)¶

Retrieve a list of Cross Class Data Disparity insights for the model.

Parameters

featurestr: Bias and Fairness protected feature name.
class_name1str: One of the compared classes
class_name2str: Another compared class

Returns

json

get_fairness_insights(fairness_metrics_set=None, offset=0, limit=100)¶

Retrieve a list of Per Class Bias insights for the model.

Parameters

fairness_metrics_setstr, optional: Can be one of <datarobot.enums.FairnessMetricsSet>. The fairness metric used to calculate the fairness scores.
offsetint, optional: Number of items to skip.
limitint, optional: Number of items to return.

Returns

json

get_feature_effect(source, data_slice_id=None)¶

Retrieve Feature Effects for the model.

Feature Effects provides partial dependence and predicted vs actual values for top-500 features ordered by feature impact score.

The partial dependence shows marginal effect of a feature on the target variable after accounting for the average effects of all other predictive features. It indicates how, holding all other variables except the feature of interest as they were, the value of this feature affects your prediction.

Requires that Feature Effects has already been computed with request_feature_effect.

See get_feature_effect_metadata for retrieving information the available sources.

Parameters

sourcestring: The source Feature Effects are retrieved for.
data_slice_idstring, optional: ID for the data slice used in the request. If None, retrieve unsliced insight data.

Returns

feature_effectsFeatureEffects: The feature effects data.

Raises

ClientError (404): If the feature effects have not been computed or source is not valid value.

get_feature_effect_metadata()¶

Retrieve Feature Effects metadata. Response contains status and available model sources.

Feature Effect for the training partition is always available, with the exception of older projects that only supported Feature Effect for validation.
When a model is trained into validation or holdout without stacked predictions (i.e., no out-of-sample predictions in those partitions), Feature Effects is not available for validation or holdout.
Feature Effects for holdout is not available when holdout was not unlocked for the project.

Use source to retrieve Feature Effects, selecting one of the provided sources.

Returns

feature_effect_metadata: FeatureEffectMetadata

get_feature_effects_multiclass(source='training', class_=None)¶

Retrieve Feature Effects for the multiclass model.

Feature Effects provide partial dependence and predicted vs actual values for top-500 features ordered by feature impact score.

The partial dependence shows marginal effect of a feature on the target variable after accounting for the average effects of all other predictive features. It indicates how, holding all other variables except the feature of interest as they were, the value of this feature affects your prediction.

Requires that Feature Effects has already been computed with request_feature_effect.

See get_feature_effect_metadata for retrieving information the available sources.

Parameters

sourcestr: The source Feature Effects are retrieved for.
class_str or None: The class name Feature Effects are retrieved for.

Returns

list: The list of multiclass feature effects.

Raises

ClientError (404): If Feature Effects have not been computed or source is not valid value.

get_feature_impact(with_metadata=False, data_slice_filter=<datarobot.models.model.Sentinel object>)¶

Retrieve the computed Feature Impact results, a measure of the relevance of each feature in the model.

Feature Impact is computed for each column by creating new data with that column randomly permuted (but the others left unchanged), and seeing how the error metric score for the predictions is affected. The ‘impactUnnormalized’ is how much worse the error metric score is when making predictions on this modified data. The ‘impactNormalized’ is normalized so that the largest value is 1. In both cases, larger values indicate more important features.

If a feature is a redundant feature, i.e. once other features are considered it doesn’t contribute much in addition, the ‘redundantWith’ value is the name of feature that has the highest correlation with this feature. Note that redundancy detection is only available for jobs run after the addition of this feature. When retrieving data that predates this functionality, a NoRedundancyImpactAvailable warning will be used.

Elsewhere this technique is sometimes called ‘Permutation Importance’.

Requires that Feature Impact has already been computed with request_feature_impact.

Parameters

with_metadatabool: The flag indicating if the result should include the metadata as well.
data_slice_filterDataSlice, optional: A dataslice used to filter the return values based on the dataslice.id. By default, this function will use data_slice_filter.id == None which returns an unsliced insight. If data_slice_filter is None then get_feature_impact will raise a ValueError.

Returns

list or dict

The feature impact data response depends on the with_metadata parameter. The response is either a dict with metadata and a list with actual data or just a list with that data.

Each List item is a dict with the keys featureName, impactNormalized, and impactUnnormalized, redundantWith and count.

For dict response available keys are:

featureImpacts - Feature Impact data as a dictionary. Each item is a dict with
keys: featureName, impactNormalized, and impactUnnormalized, and redundantWith.

shapBased - A boolean that indicates whether Feature Impact was calculated using
Shapley values.

ranRedundancyDetection - A boolean that indicates whether redundant feature
identification was run while calculating this Feature Impact.

rowCount - An integer or None that indicates the number of rows that was used to
calculate Feature Impact. For the Feature Impact calculated with the default logic, without specifying the rowCount, we return None here.

count - An integer with the number of features under the featureImpacts.

Raises

ClientError (404): If the feature impacts have not been computed.
ValueError: If data_slice_filter passed as None

get_features_used()¶

Query the server to determine which features were used.

Note that the data returned by this method is possibly different than the names of the features in the featurelist used by this model. This method will return the raw features that must be supplied in order for predictions to be generated on a new set of data. The featurelist, in contrast, would also include the names of derived features.

Returns

featureslist of str: The names of the features used in the model.

Return type: List[str]

get_frozen_child_models()¶

Retrieve the IDs for all models that are frozen from this model.

Returns

A list of Models

get_labelwise_roc_curves(source, fallback_to_parent_insights=False)¶

Retrieve a list of LabelwiseRocCurve instances for a multilabel model the given source and all labels.

New in version v2.24.

Parameters

sourcestr: ROC curve data source. Check datarobot.enums.CHART_DATA_SOURCE for possible values.
fallback_to_parent_insightsbool: Optional, if True, this will return ROC curve data for this model’s parent if the ROC curve is not available for this model and the model has a defined parent model. If omitted or False, or there is no parent model, will not attempt to return data from this model’s parent.

Returns

list ofclass:LabelwiseRocCurve <datarobot.models.roc_curve.LabelwiseRocCurve>: Labelwise ROC Curve instances for source and all labels

Raises

ClientError: If the insight is not available for this model
(New in version v3.0) TypeError: If the underlying project type is binary

get_lift_chart(source, fallback_to_parent_insights=False, data_slice_filter=<datarobot.models.model.Sentinel object>)¶

Retrieve the model Lift chart for the specified source.

Parameters

sourcestr: Lift chart data source. Check datarobot.enums.CHART_DATA_SOURCE for possible values. (New in version v2.23) For time series and OTV models, also accepts values backtest_2, backtest_3, …, up to the number of backtests in the model.
fallback_to_parent_insightsbool: (New in version v2.14) Optional, if True, this will return lift chart data for this model’s parent if the lift chart is not available for this model and the model has a defined parent model. If omitted or False, or there is no parent model, will not attempt to return insight data from this model’s parent.
data_slice_filterDataSlice, optional: A dataslice used to filter the return values based on the dataslice.id. By default this function will use data_slice_filter.id == None which returns an unsliced insight. If data_slice_filter is None then get_lift_chart will raise a ValueError.

Returns

LiftChart: Model lift chart data

Raises

ClientError: If the insight is not available for this model
ValueError: If data_slice_filter passed as None

get_missing_report_info()¶

Retrieve a report on missing training data that can be used to understand missing values treatment in the model. The report consists of missing values resolutions for features numeric or categorical features that were part of building the model.

Returns

An iterable of MissingReportPerFeature: The queried model missing report, sorted by missing count (DESCENDING order).

get_model_blueprint_chart()¶

Retrieve a diagram that can be used to understand data flow in the blueprint.

Returns

ModelBlueprintChart: The queried model blueprint chart.

get_model_blueprint_documents()¶

Get documentation for tasks used in this model.

Returns

list of BlueprintTaskDocument: All documents available for the model.

get_model_blueprint_json()¶

Get the blueprint json representation used by this model.

Returns

BlueprintJson: Json representation of the blueprint stages.

Return type: Dict[str, Tuple[List[str], List[str], str]]

get_multiclass_feature_impact()¶

For multiclass it’s possible to calculate feature impact separately for each target class. The method for calculation is exactly the same, calculated in one-vs-all style for each target class.

Requires that Feature Impact has already been computed with request_feature_impact.

Returns

feature_impactslist of dict: The feature impact data. Each item is a dict with the keys ‘featureImpacts’ (list), ‘class’ (str). Each item in ‘featureImpacts’ is a dict with the keys ‘featureName’, ‘impactNormalized’, and ‘impactUnnormalized’, and ‘redundantWith’.

Raises

ClientError (404): If the multiclass feature impacts have not been computed.

get_multiclass_lift_chart(source, fallback_to_parent_insights=False)¶

Retrieve model Lift chart for the specified source.

Parameters

sourcestr: Lift chart data source. Check datarobot.enums.CHART_DATA_SOURCE for possible values.
fallback_to_parent_insightsbool: Optional, if True, this will return lift chart data for this model’s parent if the lift chart is not available for this model and the model has a defined parent model. If omitted or False, or there is no parent model, will not attempt to return insight data from this model’s parent.

Returns

list of LiftChart: Model lift chart data for each saved target class

Raises

ClientError: If the insight is not available for this model

get_multilabel_lift_charts(source, fallback_to_parent_insights=False)¶

Retrieve model Lift charts for the specified source.

New in version v2.24.

Parameters

sourcestr: Lift chart data source. Check datarobot.enums.CHART_DATA_SOURCE for possible values.
fallback_to_parent_insightsbool: Optional, if True, this will return lift chart data for this model’s parent if the lift chart is not available for this model and the model has a defined parent model. If omitted or False, or there is no parent model, will not attempt to return insight data from this model’s parent.

Returns

list of LiftChart: Model lift chart data for each saved target class

Raises

ClientError: If the insight is not available for this model

get_num_iterations_trained()¶

Retrieves the number of estimators trained by early-stopping tree-based models.

– versionadded:: v2.22

Returns

projectId: str: id of project containing the model
modelId: str: id of the model
data: array: list of numEstimatorsItem objects, one for each modeling stage.
numEstimatorsItem will be of the form:
stage: str: indicates the modeling stage (for multi-stage models); None of single-stage models
numIterations: int: the number of estimators or iterations trained by the model

get_or_request_feature_effect(source, max_wait=600, row_count=None, data_slice_id=None)¶

Retrieve Feature Effects for the model, requesting a new job if it hasn’t been run previously.

See get_feature_effect_metadata for retrieving information of source.

Parameters

sourcestring: The source Feature Effects are retrieved for.
max_waitint, optional: The maximum time to wait for a requested Feature Effect job to complete before erroring.
row_countint, optional: (New in version v2.21) The sample size to use for Feature Impact computation. Minimum is 10 rows. Maximum is 100000 rows or the training sample size of the model, whichever is less.
data_slice_idstr, optional: ID for the data slice used in the request. If None, request unsliced insight data.

Returns

feature_effectsFeatureEffects: The Feature Effects data.

get_or_request_feature_effects_multiclass(source, top_n_features=None, features=None, row_count=None, class_=None, max_wait=600)¶

Retrieve Feature Effects for the multiclass model, requesting a job if it hasn’t been run previously.

Parameters

sourcestring: The source Feature Effects retrieve for.
class_str or None: The class name Feature Effects retrieve for.
row_countint: The number of rows from dataset to use for Feature Impact calculation.
top_n_featuresint or None: Number of top features (ranked by Feature Impact) used to calculate Feature Effects.
featureslist or None: The list of features used to calculate Feature Effects.
max_waitint, optional: The maximum time to wait for a requested Feature Effects job to complete before erroring.

Returns

feature_effectslist of FeatureEffectsMulticlass: The list of multiclass feature effects data.

get_or_request_feature_impact(max_wait=600, **kwargs)¶

Retrieve feature impact for the model, requesting a job if it hasn’t been run previously

Parameters

max_waitint, optional: The maximum time to wait for a requested feature impact job to complete before erroring
**kwargs: Arbitrary keyword arguments passed to request_feature_impact.

Returns

feature_impactslist or dict: The feature impact data. See get_feature_impact for the exact schema.

get_parameters()¶

Retrieve model parameters.

Returns

ModelParameters: Model parameters for this model.

get_pareto_front()¶

Retrieve the Pareto Front for a Eureqa model.

This method is only supported for Eureqa models.

Returns

ParetoFront: Model ParetoFront data

get_prime_eligibility()¶

Check if this model can be approximated with DataRobot Prime

Returns

prime_eligibilitydict: a dict indicating whether a model can be approximated with DataRobot Prime (key can_make_prime) and why it may be ineligible (key message)

get_residuals_chart(source, fallback_to_parent_insights=False, data_slice_filter=<datarobot.models.model.Sentinel object>)¶

Retrieve model residuals chart for the specified source.

Parameters

sourcestr: Residuals chart data source. Check datarobot.enums.CHART_DATA_SOURCE for possible values.
fallback_to_parent_insightsbool: Optional, if True, this will return residuals chart data for this model’s parent if the residuals chart is not available for this model and the model has a defined parent model. If omitted or False, or there is no parent model, will not attempt to return residuals data from this model’s parent.
data_slice_filterDataSlice, optional: A dataslice used to filter the return values based on the dataslice.id. By default this function will use data_slice_filter.id == None which returns an unsliced insight. If data_slice_filter is None then get_residuals_chart will raise a ValueError.

Returns

ResidualsChart: Model residuals chart data

Raises

ClientError: If the insight is not available for this model
ValueError: If data_slice_filter passed as None

get_roc_curve(source, fallback_to_parent_insights=False, data_slice_filter=<datarobot.models.model.Sentinel object>)¶

Retrieve the ROC curve for a binary model for the specified source.

Parameters

sourcestr: ROC curve data source. Check datarobot.enums.CHART_DATA_SOURCE for possible values. (New in version v2.23) For time series and OTV models, also accepts values backtest_2, backtest_3, …, up to the number of backtests in the model.
fallback_to_parent_insightsbool: (New in version v2.14) Optional, if True, this will return ROC curve data for this model’s parent if the ROC curve is not available for this model and the model has a defined parent model. If omitted or False, or there is no parent model, will not attempt to return data from this model’s parent.
data_slice_filterDataSlice, optional: A dataslice used to filter the return values based on the dataslice.id. By default this function will use data_slice_filter.id == None which returns an unsliced insight. If data_slice_filter is None then get_roc_curve will raise a ValueError.

Returns

RocCurve: Model ROC curve data

Raises

ClientError: If the insight is not available for this model
(New in version v3.0) TypeError: If the underlying project type is multilabel
ValueError: If data_slice_filter passed as None

get_rulesets()¶

List the rulesets approximating this model generated by DataRobot Prime

If this model hasn’t been approximated yet, will return an empty list. Note that these are rulesets approximating this model, not rulesets used to construct this model.

Returns

rulesetslist of Ruleset

Return type: List[Ruleset]

get_supported_capabilities()¶

Retrieves a summary of the capabilities supported by a model.

New in version v2.14.

Returns

supportsBlending: bool

whether the model supports blending

supportsMonotonicConstraints: bool

whether the model supports monotonic constraints

hasWordCloud: bool

whether the model has word cloud data available

eligibleForPrime: bool

whether the model is eligible for Prime

hasParameters: bool

whether the model has parameters that can be retrieved

supportsCodeGeneration: bool

(New in version v2.18) whether the model supports code generation

supportsShap: bool

(New in version v2.18) True if the model supports Shapley package. i.e. Shapley based: feature Importance

supportsEarlyStopping: bool

(New in version v2.22) True if this is an early stopping tree-based model and number of trained iterations can be retrieved.

get_uri()¶

Returns

urlstr: Permanent static hyperlink to this model at leaderboard.

Return type: str

get_word_cloud(exclude_stop_words=False)¶

Retrieve word cloud data for the model.

Parameters

exclude_stop_wordsbool, optional: Set to True if you want stopwords filtered out of response.

Returns

WordCloud: Word cloud data for the model.

incremental_train(data_stage_id, training_data_name=None)¶

Submit a job to the queue to perform incremental training on an existing model using additional data. The id of the additional data to use for training is specified with the data_stage_id. Optionally a name for the iteration can be supplied by the user to help identify the contents of data in the iteration.

This functionality requires the INCREMENTAL_LEARNING feature flag to be enabled.

Parameters

data_stage_id: str: The id of the data stage to use for training.
training_data_namestr, optional: The name of the iteration or data stage to indicate what the incremental learning was performed on.

Returns

jobModelJob: The created job that is retraining the model

Return type: ModelJob

open_in_browser()¶

Opens class’ relevant web browser location. If default browser is not available the URL is logged.

Note: If text-mode browsers are used, the calling process will block until the user exits the browser.

Return type: None

request_approximation()¶

Request an approximation of this model using DataRobot Prime

This will create several rulesets that could be used to approximate this model. After comparing their scores and rule counts, the code used in the approximation can be downloaded and run locally.

Returns

jobJob: the job generating the rulesets

request_cross_class_accuracy_scores()¶

Request data disparity insights to be computed for the model.

Returns

status_idstr: A statusId of computation request.

request_data_disparity_insights(feature, compared_class_names)¶

Request data disparity insights to be computed for the model.

Parameters

featurestr: Bias and Fairness protected feature name.
compared_class_nameslist(str): List of two classes to compare

Returns

status_idstr: A statusId of computation request.

request_external_test(dataset_id, actual_value_column=None)¶

Request external test to compute scores and insights on an external test dataset

Parameters

dataset_idstring: The dataset to make predictions against (as uploaded from Project.upload_dataset)
actual_value_columnstring, optional: (New in version v2.21) For time series unsupervised projects only. Actual value column can be used to calculate the classification metrics and insights on the prediction dataset. Can’t be provided with the forecast_point parameter.
Returns
——-
jobJob: a Job representing external dataset insights computation

request_fairness_insights(fairness_metrics_set=None)¶

Request fairness insights to be computed for the model.

Parameters

fairness_metrics_setstr, optional: Can be one of <datarobot.enums.FairnessMetricsSet>. The fairness metric used to calculate the fairness scores.

Returns

status_idstr: A statusId of computation request.

request_feature_effect(row_count=None, data_slice_id=None)¶

Submit request to compute Feature Effects for the model.

See get_feature_effect for more information on the result of the job.

Parameters

row_countint: (New in version v2.21) The sample size to use for Feature Impact computation. Minimum is 10 rows. Maximum is 100000 rows or the training sample size of the model, whichever is less.
data_slice_idstr, optional: ID for the data slice used in the request. If None, request unsliced insight data.

Returns

jobJob: A Job representing the feature effect computation. To get the completed feature effect data, use job.get_result or job.get_result_when_complete.

Raises

JobAlreadyRequested (422): If the feature effect have already been requested.

request_feature_effects_multiclass(row_count=None, top_n_features=None, features=None)¶

Request Feature Effects computation for the multiclass model.

See get_feature_effect for more information on the result of the job.

Parameters

row_countint: The number of rows from dataset to use for Feature Impact calculation.
top_n_featuresint or None: Number of top features (ranked by feature impact) used to calculate Feature Effects.
featureslist or None: The list of features used to calculate Feature Effects.

Returns

jobJob: A Job representing Feature Effect computation. To get the completed Feature Effect data, use job.get_result or job.get_result_when_complete.

request_feature_impact(row_count=None, with_metadata=False, data_slice_id=None)¶

Request feature impacts to be computed for the model.

See get_feature_impact for more information on the result of the job.

Parameters

row_countint, optional: The sample size (specified in rows) to use for Feature Impact computation. This is not supported for unsupervised, multiclass (which has a separate method), and time series projects.
with_metadatabool, optional: Flag indicating whether the result should include the metadata. If true, metadata is included.
data_slice_idstr, optional: ID for the data slice used in the request. If None, request unsliced insight data.

Returns

jobJob or status_id: Job representing the Feature Impact computation. To retrieve the completed Feature Impact data, use job.get_result or job.get_result_when_complete.

Raises

JobAlreadyRequested (422): If the feature impacts have already been requested.

request_frozen_datetime_model(training_row_count=None, training_duration=None, training_start_date=None, training_end_date=None, time_window_sample_pct=None, sampling_method=None)¶

Train a new frozen model with parameters from this model.

Requires that this model belongs to a datetime partitioned project. If it does not, an error will occur when submitting the job.

Frozen models use the same tuning parameters as their parent model instead of independently optimizing them to allow efficiently retraining models on larger amounts of the training data.

In addition of training_row_count and training_duration, frozen datetime models may be trained on an exact date range. Only one of training_row_count, training_duration, or training_start_date and training_end_date should be specified.

Models specified using training_start_date and training_end_date are the only ones that can be trained into the holdout data (once the holdout is unlocked).

All durations should be specified with a duration string such as those returned by the partitioning_methods.construct_duration_string helper method. Please see datetime partitioned project documentation for more information on duration strings.

Parameters

training_row_countint, optional: the number of rows of data that should be used to train the model. If specified, training_duration may not be specified.
training_durationstr, optional: a duration string specifying what time range the data used to train the model should span. If specified, training_row_count may not be specified.
training_start_datedatetime.datetime, optional: the start date of the data to train to model on. Only rows occurring at or after this datetime will be used. If training_start_date is specified, training_end_date must also be specified.
training_end_datedatetime.datetime, optional: the end date of the data to train the model on. Only rows occurring strictly before this datetime will be used. If training_end_date is specified, training_start_date must also be specified.
time_window_sample_pctint, optional: may only be specified when the requested model is a time window (e.g. duration or start and end dates). An integer between 1 and 99 indicating the percentage to sample by within the window. The points kept are determined by a random uniform sample. If specified, training_duration must be specified otherwise, the number of rows used to train the model and evaluate backtest scores and an error will occur.
sampling_methodstr, optional: (New in version v2.23) defines the way training data is selected. Can be either random or latest. In combination with training_row_count defines how rows are selected from backtest (latest by default). When training data is defined using time range (training_duration or use_project_settings) this setting changes the way time_window_sample_pct is applied (random by default). Applicable to OTV projects only.

Returns

model_jobModelJob: the modeling job training a frozen model

Return type: ModelJob

request_frozen_model(sample_pct=None, training_row_count=None)¶

Train a new frozen model with parameters from this model

Note

This method only works if project the model belongs to is not datetime partitioned. If it is, use request_frozen_datetime_model instead.

Frozen models use the same tuning parameters as their parent model instead of independently optimizing them to allow efficiently retraining models on larger amounts of the training data.

Parameters

sample_pctfloat: optional, the percentage of the dataset to use with the model. If not provided, will use the value from this model.
training_row_countint: (New in version v2.9) optional, the integer number of rows of the dataset to use with the model. Only one of sample_pct and training_row_count should be specified.

Returns

model_jobModelJob: the modeling job training a frozen model

Return type: ModelJob

request_lift_chart(source, data_slice_id=None)¶

Request the model Lift Chart for the specified source.

Parameters

sourcestr: Lift chart data source. Check datarobot.enums.CHART_DATA_SOURCE for possible values.
data_slice_idstring, optional: ID for the data slice used in the request. If None, request unsliced insight data.

Returns

status_check_jobStatusCheckJob: Object contains all needed logic for a periodical status check of an async job.

Return type: StatusCheckJob

request_predictions(dataset_id=None, dataset=None, dataframe=None, file_path=None, file=None, include_prediction_intervals=None, prediction_intervals_size=None, forecast_point=None, predictions_start_date=None, predictions_end_date=None, actual_value_column=None, explanation_algorithm=None, max_explanations=None, max_ngram_explanations=None)¶

Requests predictions against a previously uploaded dataset.

Parameters

dataset_idstring, optional: The ID of the dataset to make predictions against (as uploaded from Project.upload_dataset)
datasetDataset, optional: The dataset to make predictions against (as uploaded from Project.upload_dataset)
dataframepd.DataFrame, optional: (New in v3.0) The dataframe to make predictions against
file_pathstr, optional: (New in v3.0) Path to file to make predictions against
fileIOBase, optional: (New in v3.0) File to make predictions against
include_prediction_intervalsbool, optional: (New in v2.16) For time series projects only. Specifies whether prediction intervals should be calculated for this request. Defaults to True if prediction_intervals_size is specified, otherwise defaults to False.
prediction_intervals_sizeint, optional: (New in v2.16) For time series projects only. Represents the percentile to use for the size of the prediction intervals. Defaults to 80 if include_prediction_intervals is True. Prediction intervals size must be between 1 and 100 (inclusive).
forecast_pointdatetime.datetime or None, optional: (New in version v2.20) For time series projects only. This is the default point relative to which predictions will be generated, based on the forecast window of the project. See the time series prediction documentation for more information.
predictions_start_datedatetime.datetime or None, optional: (New in version v2.20) For time series projects only. The start date for bulk predictions. Note that this parameter is for generating historical predictions using the training data. This parameter should be provided in conjunction with predictions_end_date. Can’t be provided with the forecast_point parameter.
predictions_end_datedatetime.datetime or None, optional: (New in version v2.20) For time series projects only. The end date for bulk predictions, exclusive. Note that this parameter is for generating historical predictions using the training data. This parameter should be provided in conjunction with predictions_start_date. Can’t be provided with the forecast_point parameter.
actual_value_columnstring, optional: (New in version v2.21) For time series unsupervised projects only. Actual value column can be used to calculate the classification metrics and insights on the prediction dataset. Can’t be provided with the forecast_point parameter.
explanation_algorithm: (New in version v2.21) optional; If set to ‘shap’, the: response will include prediction explanations based on the SHAP explainer (SHapley Additive exPlanations). Defaults to null (no prediction explanations).
max_explanations: (New in version v2.21) int optional; specifies the maximum number of: explanation values that should be returned for each row, ordered by absolute value, greatest to least. If null, no limit. In the case of ‘shap’: if the number of features is greater than the limit, the sum of remaining values will also be returned as shapRemainingTotal. Defaults to null. Cannot be set if explanation_algorithm is omitted.
max_ngram_explanations: optional; int or str: (New in version v2.29) Specifies the maximum number of text explanation values that should be returned. If set to all, text explanations will be computed and all the ngram explanations will be returned. If set to a non zero positive integer value, text explanations will be computed and this amount of descendingly sorted ngram explanations will be returned. By default text explanation won’t be triggered to be computed.

Returns

jobPredictJob: The job computing the predictions

Return type: PredictJob

request_residuals_chart(source, data_slice_id=None)¶

Request the model residuals chart for the specified source.

Parameters

sourcestr: Residuals chart data source. Check datarobot.enums.CHART_DATA_SOURCE for possible values.
data_slice_idstring, optional: ID for the data slice used in the request. If None, request unsliced insight data.

Returns

status_check_jobStatusCheckJob: Object contains all needed logic for a periodical status check of an async job.

Return type: StatusCheckJob

request_roc_curve(source, data_slice_id=None)¶

Request the model Roc Curve for the specified source.

Parameters

sourcestr: Roc Curve data source. Check datarobot.enums.CHART_DATA_SOURCE for possible values.
data_slice_idstring, optional: ID for the data slice used in the request. If None, request unsliced insight data.

Returns

status_check_jobStatusCheckJob: Object contains all needed logic for a periodical status check of an async job.

Return type: StatusCheckJob

request_training_predictions(data_subset, explanation_algorithm=None, max_explanations=None)¶

Start a job to build training predictions

Parameters

data_subsetstr

data set definition to build predictions on. Choices are:

dr.enums.DATA_SUBSET.ALL or string all for all data available. Not valid for
models in datetime partitioned projects

dr.enums.DATA_SUBSET.VALIDATION_AND_HOLDOUT or string validationAndHoldout for
all data except training set. Not valid for models in datetime partitioned projects

dr.enums.DATA_SUBSET.HOLDOUT or string holdout for holdout data set only

dr.enums.DATA_SUBSET.ALL_BACKTESTS or string allBacktests for downloading
the predictions for all backtest validation folds. Requires the model to have successfully scored all backtests. Datetime partitioned projects only.

explanation_algorithmdr.enums.EXPLANATIONS_ALGORITHM

(New in v2.21) Optional. If set to dr.enums.EXPLANATIONS_ALGORITHM.SHAP, the response will include prediction explanations based on the SHAP explainer (SHapley Additive exPlanations). Defaults to None (no prediction explanations).

max_explanationsint

(New in v2.21) Optional. Specifies the maximum number of explanation values that should be returned for each row, ordered by absolute value, greatest to least. In the case of dr.enums.EXPLANATIONS_ALGORITHM.SHAP: If not set, explanations are returned for all features. If the number of features is greater than the max_explanations, the sum of remaining values will also be returned as shap_remaining_total. Max 100. Defaults to null for datasets narrower than 100 columns, defaults to 100 for datasets wider than 100 columns. Is ignored if explanation_algorithm is not set.

Returns

Job: an instance of created async job

request_transferable_export(prediction_intervals_size=None)¶

Request generation of an exportable model file for use in an on-premise DataRobot standalone prediction environment.

This function can only be used if model export is enabled, and will only be useful if you have an on-premise environment in which to import it.

This function does not download the exported file. Use download_export for that.

Parameters

prediction_intervals_sizeint, optional: (New in v2.19) For time series projects only. Represents the percentile to use for the size of the prediction intervals. Prediction intervals size must be between 1 and 100 (inclusive).

Returns

Job

Examples

model = datarobot.Model.get('project-id', 'model-id')
job = model.request_transferable_export()
job.wait_for_completion()
model.download_export('my_exported_model.drmodel')

# Client must be configured to use standalone prediction server for import:
datarobot.Client(token='my-token-at-standalone-server',
                 endpoint='standalone-server-url/api/v2')

imported_model = datarobot.ImportedModel.create('my_exported_model.drmodel')

Return type: Job

retrain(sample_pct=None, featurelist_id=None, training_row_count=None, n_clusters=None)¶

Submit a job to the queue to train a blender model.

Parameters

sample_pct: float, optional: The sample size in percents (1 to 100) to use in training. If this parameter is used then training_row_count should not be given.
featurelist_idstr, optional: The featurelist id
training_row_countint, optional: The number of rows used to train the model. If this parameter is used, then sample_pct should not be given.
n_clusters: int, optional: (new in version 2.27) number of clusters to use in an unsupervised clustering model. This parameter is used only for unsupervised clustering models that do not determine the number of clusters automatically.

Returns

jobModelJob: The created job that is retraining the model

Return type: ModelJob

set_prediction_threshold(threshold)¶

Set a custom prediction threshold for the model.

May not be used once prediction_threshold_read_only is True for this model.

Parameters

thresholdfloat: only used for binary classification projects. The threshold to when deciding between the positive and negative classes when making predictions. Should be between 0.0 and 1.0 (inclusive).

star_model()¶

Mark the model as starred.

Model stars propagate to the web application and the API, and can be used to filter when listing models.

Return type: None

start_advanced_tuning_session()¶

Start an Advanced Tuning session. Returns an object that helps set up arguments for an Advanced Tuning model execution.

As of v2.17, all models other than blenders, open source, prime, baseline and user-created support Advanced Tuning.

Returns

AdvancedTuningSession: Session for setting up and running Advanced Tuning on a model

train(sample_pct=None, featurelist_id=None, scoring_type=None, training_row_count=None, monotonic_increasing_featurelist_id=<object object>, monotonic_decreasing_featurelist_id=<object object>)¶

Train the blueprint used in model on a particular featurelist or amount of data.

This method creates a new training job for worker and appends it to the end of the queue for this project. After the job has finished you can get the newly trained model by retrieving it from the project leaderboard, or by retrieving the result of the job.

Either sample_pct or training_row_count can be used to specify the amount of data to use, but not both. If neither are specified, a default of the maximum amount of data that can safely be used to train any blueprint without going into the validation data will be selected.

In smart-sampled projects, sample_pct and training_row_count are assumed to be in terms of rows of the minority class.

Note

For datetime partitioned projects, see train_datetime instead.

Parameters

sample_pctfloat, optional: The amount of data to use for training, as a percentage of the project dataset from 0 to 100.
featurelist_idstr, optional: The identifier of the featurelist to use. If not defined, the featurelist of this model is used.
scoring_typestr, optional: Either validation or crossValidation (also dr.SCORING_TYPE.validation or dr.SCORING_TYPE.cross_validation). validation is available for every partitioning type, and indicates that the default model validation should be used for the project. If the project uses a form of cross-validation partitioning, crossValidation can also be used to indicate that all of the available training/validation combinations should be used to evaluate the model.
training_row_countint, optional: The number of rows to use to train the requested model.
monotonic_increasing_featurelist_idstr: (new in version 2.11) optional, the id of the featurelist that defines the set of features with a monotonically increasing relationship to the target. Passing None disables increasing monotonicity constraint. Default (dr.enums.MONOTONICITY_FEATURELIST_DEFAULT) is the one specified by the blueprint.
monotonic_decreasing_featurelist_idstr: (new in version 2.11) optional, the id of the featurelist that defines the set of features with a monotonically decreasing relationship to the target. Passing None disables decreasing monotonicity constraint. Default (dr.enums.MONOTONICITY_FEATURELIST_DEFAULT) is the one specified by the blueprint.

Returns

model_job_idstr: id of created job, can be used as parameter to ModelJob.get method or wait_for_async_model_creation function

Examples

project = Project.get('project-id')
model = Model.get('project-id', 'model-id')
model_job_id = model.train(training_row_count=project.max_train_rows)

Return type: str

train_datetime(featurelist_id=None, training_row_count=None, training_duration=None, time_window_sample_pct=None, monotonic_increasing_featurelist_id=<object object>, monotonic_decreasing_featurelist_id=<object object>, use_project_settings=False, sampling_method=None, n_clusters=None)¶

Trains this model on a different featurelist or sample size.

Requires that this model is part of a datetime partitioned project; otherwise, an error will occur.

All durations should be specified with a duration string such as those returned by the partitioning_methods.construct_duration_string helper method. Please see datetime partitioned project documentation for more information on duration strings.

Parameters

featurelist_idstr, optional: the featurelist to use to train the model. If not specified, the featurelist of this model is used.
training_row_countint, optional: the number of rows of data that should be used to train the model. If specified, neither training_duration nor use_project_settings may be specified.
training_durationstr, optional: a duration string specifying what time range the data used to train the model should span. If specified, neither training_row_count nor use_project_settings may be specified.
use_project_settingsbool, optional: (New in version v2.20) defaults to False. If True, indicates that the custom backtest partitioning settings specified by the user will be used to train the model and evaluate backtest scores. If specified, neither training_row_count nor training_duration may be specified.
time_window_sample_pctint, optional: may only be specified when the requested model is a time window (e.g. duration or start and end dates). An integer between 1 and 99 indicating the percentage to sample by within the window. The points kept are determined by a random uniform sample. If specified, training_duration must be specified otherwise, the number of rows used to train the model and evaluate backtest scores and an error will occur.
sampling_methodstr, optional: (New in version v2.23) defines the way training data is selected. Can be either random or latest. In combination with training_row_count defines how rows are selected from backtest (latest by default). When training data is defined using time range (training_duration or use_project_settings) this setting changes the way time_window_sample_pct is applied (random by default). Applicable to OTV projects only.
monotonic_increasing_featurelist_idstr, optional: (New in version v2.18) optional, the id of the featurelist that defines the set of features with a monotonically increasing relationship to the target. Passing None disables increasing monotonicity constraint. Default (dr.enums.MONOTONICITY_FEATURELIST_DEFAULT) is the one specified by the blueprint.
monotonic_decreasing_featurelist_idstr, optional: (New in version v2.18) optional, the id of the featurelist that defines the set of features with a monotonically decreasing relationship to the target. Passing None disables decreasing monotonicity constraint. Default (dr.enums.MONOTONICITY_FEATURELIST_DEFAULT) is the one specified by the blueprint.
n_clusters: int, optional: (New in version 2.27) number of clusters to use in an unsupervised clustering model. This parameter is used only for unsupervised clustering models that don’t automatically determine the number of clusters.

Returns

jobModelJob: the created job to build the model

Return type: ModelJob

unstar_model()¶

Unmark the model as starred.

Model stars propagate to the web application and the API, and can be used to filter when listing models.

Return type: None

Combined Model¶

See API reference for Combined Model in Segmented Modeling API Reference

Advanced Tuning¶

class datarobot.models.advanced_tuning.AdvancedTuningSession(model)¶

A session enabling users to configure and run advanced tuning for a model.

Every model contains a set of one or more tasks. Every task contains a set of zero or more parameters. This class allows tuning the values of each parameter on each task of a model, before running that model.

This session is client-side only and is not persistent. Only the final model, constructed when run is called, is persisted on the DataRobot server.

Attributes

descriptionstr: Description for the new advance-tuned model. Defaults to the same description as the base model.

get_task_names()¶

Get the list of task names that are available for this model

Returns

list(str): List of task names

Return type: List[str]

get_parameter_names(task_name)¶

Get the list of parameter names available for a specific task

Returns

list(str): List of parameter names

Return type: List[str]

set_parameter(value, task_name=None, parameter_name=None, parameter_id=None)¶

Set the value of a parameter to be used

The caller must supply enough of the optional arguments to this function to uniquely identify the parameter that is being set. For example, a less-common parameter name such as ‘building_block__complementary_error_function’ might only be used once (if at all) by a single task in a model. In which case it may be sufficient to simply specify ‘parameter_name’. But a more-common name such as ‘random_seed’ might be used by several of the model’s tasks, and it may be necessary to also specify ‘task_name’ to clarify which task’s random seed is to be set. This function only affects client-side state. It will not check that the new parameter value(s) are valid.

Parameters

task_namestr: Name of the task whose parameter needs to be set
parameter_namestr: Name of the parameter to set
parameter_idstr: ID of the parameter to set
valueint, float, list, or str: New value for the parameter, with legal values determined by the parameter being set

Raises

NoParametersFoundException: if no matching parameters are found.
NonUniqueParametersException: if multiple parameters matched the specified filtering criteria

Return type: None

get_parameters()¶

Returns the set of parameters available to this model

The returned parameters have one additional key, “value”, reflecting any new values that have been set in this AdvancedTuningSession. When the session is run, “value” will be used, or if it is unset, “current_value”.

Returns

parametersdict: “Parameters” dictionary, same as specified on Model.get_advanced_tuning_params.
An additional field is added per parameter to the ‘tuning_parameters’ list in the dictionary:
valueint, float, list, or str: The current value of the parameter. None if none has been specified.

Return type: AdvancedTuningParamsType

run()¶

Submit this model for Advanced Tuning.

Returns

datarobot.models.modeljob.ModelJob: The created job to build the model

Return type: ModelJob

ModelJob¶

datarobot.models.modeljob.wait_for_async_model_creation(project_id, model_job_id, max_wait=600)¶

Given a Project id and ModelJob id poll for status of process responsible for model creation until model is created.

Parameters

project_idstr: The identifier of the project
model_job_idstr: The identifier of the ModelJob
max_waitint, optional: Time in seconds after which model creation is considered unsuccessful

Returns

modelModel: Newly created model

Raises

AsyncModelCreationError: Raised if status of fetched ModelJob object is error
AsyncTimeoutError: Model wasn’t created in time, specified by max_wait parameter

Return type: Model

class datarobot.models.ModelJob(data, completed_resource_url=None)¶

Tracks asynchronous work being done within a project

Attributes

idint: the id of the job
project_idstr: the id of the project the job belongs to
statusstr: the status of the job - will be one of datarobot.enums.QUEUE_STATUS
job_typestr: what kind of work the job is doing - will be ‘model’ for modeling jobs
is_blockedbool: if true, the job is blocked (cannot be executed) until its dependencies are resolved
sample_pctfloat: the percentage of the project’s dataset used in this modeling job
model_typestr: the model this job builds (e.g. ‘Nystroem Kernel SVM Regressor’)
processeslist of str: the processes used by the model
featurelist_idstr: the id of the featurelist used in this modeling job
blueprintBlueprint: the blueprint used in this modeling job

classmethod from_job(job)¶

Transforms a generic Job into a ModelJob

Parameters

job: Job: A generic job representing a ModelJob

Returns

model_job: ModelJob: A fully populated ModelJob with all the details of the job

Raises

ValueError:: If the generic Job was not a model job, e.g. job_type != JOB_TYPE.MODEL

Return type: ModelJob

classmethod get(project_id, model_job_id)¶

Fetches one ModelJob. If the job finished, raises PendingJobFinished exception.

Parameters

project_idstr: The identifier of the project the model belongs to
model_job_idstr: The identifier of the model_job

Returns

model_jobModelJob: The pending ModelJob

Raises

PendingJobFinished: If the job being queried already finished, and the server is re-routing to the finished model.
AsyncFailureError: Querying this resource gave a status code other than 200 or 303

Return type: ModelJob

classmethod get_model(project_id, model_job_id)¶

Fetches a finished model from the job used to create it.

Parameters

project_idstr: The identifier of the project the model belongs to
model_job_idstr: The identifier of the model_job

Returns

modelModel: The finished model

Raises

JobNotFinished: If the job has not finished yet
AsyncFailureError: Querying the model_job in question gave a status code other than 200 or 303

Return type: Model

cancel()¶: Cancel this job. If this job has not finished running, it will be removed and canceled.

get_result(params=None)¶

Parameters

paramsdict or None: Query parameters to be added to request to get results.
For featureEffects, source param is required to define source,
otherwise the default is `training`

Returns

resultobject

Return type depends on the job type:

for model jobs, a Model is returned
for predict jobs, a pandas.DataFrame (with predictions) is returned
for featureImpact jobs, a list of dicts by default (see with_metadata parameter of the FeatureImpactJob class and its get() method).
for primeRulesets jobs, a list of Rulesets
for primeModel jobs, a PrimeModel
for primeDownloadValidation jobs, a PrimeFile
for predictionExplanationInitialization jobs, a PredictionExplanationsInitialization
for predictionExplanations jobs, a PredictionExplanations
for featureEffects, a FeatureEffects

Raises

JobNotFinished: If the job is not finished, the result is not available.
AsyncProcessUnsuccessfulError: If the job errored or was aborted

get_result_when_complete(max_wait=600, params=None)¶

Parameters

max_waitint, optional: How long to wait for the job to finish.
paramsdict, optional: Query parameters to be added to request.

Returns

result: object: Return type is the same as would be returned by Job.get_result.

Raises

AsyncTimeoutError: If the job does not finish in time
AsyncProcessUnsuccessfulError: If the job errored or was aborted

refresh()¶: Update this object with the latest job data from the server.

wait_for_completion(max_wait=600)¶

Waits for job to complete.

Parameters

max_waitint, optional: How long to wait for the job to finish.

Return type: None

Pareto Front¶

class datarobot.models.pareto_front.ParetoFront(project_id, error_metric, hyperparameters, target_type, solutions)¶

Pareto front data for a Eureqa model.

The pareto front reflects the tradeoffs between error and complexity for particular model. The solutions reflect possible Eureqa models that are different levels of complexity. By default, only one solution will have a corresponding model, but models can be created for each solution.

Attributes

project_idstr: the ID of the project the model belongs to
error_metricstr: Eureqa error-metric identifier used to compute error metrics for this search. Note that Eureqa error metrics do NOT correspond 1:1 with DataRobot error metrics – the available metrics are not the same, and are computed from a subset of the training data rather than from the validation data.
hyperparametersdict: Hyperparameters used by this run of the Eureqa blueprint
target_typestr: Indicating what kind of modeling is being done in this project, either ‘Regression’, ‘Binary’ (Binary classification), or ‘Multiclass’ (Multiclass classification).
solutionslist(Solution): Solutions that Eureqa has found to model this data. Some solutions will have greater accuracy. Others will have slightly less accuracy but will use simpler expressions.

classmethod from_server_data(data, keep_attrs=None)¶

Instantiate an object of this class using the data directly from the server, meaning that the keys may have the wrong camel casing

Parameters

datadict: The directly translated dict of JSON from the server. No casing fixes have taken place
keep_attrslist: List of the dotted namespace notations for attributes to keep within the object structure even if their values are None

class datarobot.models.pareto_front.Solution(eureqa_solution_id, complexity, error, expression, expression_annotated, best_model, project_id)¶

Eureqa Solution.

A solution represents a possible Eureqa model; however not all solutions have models associated with them. It must have a model created before it can be used to make predictions, etc.

Attributes

eureqa_solution_id: str: ID of this Solution
complexity: int: Complexity score for this solution. Complexity score is a function of the mathematical operators used in the current solution. The Complexity calculation can be tuned via model hyperparameters.
error: float or None: Error for the current solution, as computed by Eureqa using the ‘error_metric’ error metric. It will be None if model refitted existing solution.
expression: str: Eureqa model equation string.
expression_annotated: str: Eureqa model equation string with variable names tagged for easy identification.
best_model: bool: True, if the model is determined to be the best

create_model()¶: Add this solution to the leaderboard, if it is not already present.

Partitioning¶

class datarobot.RandomCV(holdout_pct, reps, seed=0)¶

A partition in which observations are randomly assigned to cross-validation groups and the holdout set.

Parameters

holdout_pctint: the desired percentage of dataset to assign to holdout set
repsint: number of cross validation folds to use
seedint: a seed to use for randomization

class datarobot.StratifiedCV(holdout_pct, reps, seed=0)¶

A partition in which observations are randomly assigned to cross-validation groups and the holdout set, preserving in each group the same ratio of positive to negative cases as in the original data.

Parameters

holdout_pctint: the desired percentage of dataset to assign to holdout set
repsint: number of cross validation folds to use
seedint: a seed to use for randomization

class datarobot.GroupCV(holdout_pct, reps, partition_key_cols, seed=0)¶

A partition in which one column is specified, and rows sharing a common value for that column are guaranteed to stay together in the partitioning into cross-validation groups and the holdout set.

Parameters

holdout_pctint: the desired percentage of dataset to assign to holdout set
repsint: number of cross validation folds to use
partition_key_colslist: a list containing a single string, where the string is the name of the column whose values should remain together in partitioning
seedint: a seed to use for randomization

class datarobot.UserCV(user_partition_col, cv_holdout_level, seed=0)¶

A partition where the cross-validation folds and the holdout set are specified by the user.

Parameters

user_partition_colstring: the name of the column containing the partition assignments
cv_holdout_level: the value of the partition column indicating a row is part of the holdout set
seedint: a seed to use for randomization

class datarobot.RandomTVH(holdout_pct, validation_pct, seed=0)¶

Specifies a partitioning method in which rows are randomly assigned to training, validation, and holdout.

Parameters

holdout_pctint: the desired percentage of dataset to assign to holdout set
validation_pctint: the desired percentage of dataset to assign to validation set
seedint: a seed to use for randomization

class datarobot.UserTVH(user_partition_col, training_level, validation_level, holdout_level, seed=0)¶

Specifies a partitioning method in which rows are assigned by the user to training, validation, and holdout sets.

Parameters

user_partition_colstring: the name of the column containing the partition assignments
training_level: the value of the partition column indicating a row is part of the training set
validation_level: the value of the partition column indicating a row is part of the validation set
holdout_level: the value of the partition column indicating a row is part of the holdout set (use None if you want no holdout set)
seedint: a seed to use for randomization

class datarobot.StratifiedTVH(holdout_pct, validation_pct, seed=0)¶

A partition in which observations are randomly assigned to train, validation, and holdout sets, preserving in each group the same ratio of positive to negative cases as in the original data.

Parameters

holdout_pctint: the desired percentage of dataset to assign to holdout set
validation_pctint: the desired percentage of dataset to assign to validation set
seedint: a seed to use for randomization

class datarobot.GroupTVH(holdout_pct, validation_pct, partition_key_cols, seed=0)¶

A partition in which one column is specified, and rows sharing a common value for that column are guaranteed to stay together in the partitioning into the training, validation, and holdout sets.

Parameters

holdout_pctint: the desired percentage of dataset to assign to holdout set
validation_pctint: the desired percentage of dataset to assign to validation set
partition_key_colslist: a list containing a single string, where the string is the name of the column whose values should remain together in partitioning
seedint: a seed to use for randomization

class datarobot.DatetimePartitioningSpecification(datetime_partition_column, autopilot_data_selection_method=None, validation_duration=None, holdout_start_date=None, holdout_duration=None, disable_holdout=None, gap_duration=None, number_of_backtests=None, backtests=None, use_time_series=False, default_to_known_in_advance=False, default_to_do_not_derive=False, feature_derivation_window_start=None, feature_derivation_window_end=None, feature_settings=None, forecast_window_start=None, forecast_window_end=None, windows_basis_unit=None, treat_as_exponential=None, differencing_method=None, periodicities=None, multiseries_id_columns=None, use_cross_series_features=None, aggregation_type=None, cross_series_group_by_columns=None, calendar_id=None, holdout_end_date=None, unsupervised_mode=False, model_splits=None, allow_partial_history_time_series_predictions=False, unsupervised_type=None)¶

Uniquely defines a DatetimePartitioning for some project

Includes only the attributes of DatetimePartitioning that are directly controllable by users, not those determined by the DataRobot application based on the project dataset and the user-controlled settings.

This is the specification that should be passed to Project.analyze_and_model via the partitioning_method parameter. To see the full partitioning based on the project dataset, use DatetimePartitioning.generate.

All durations should be specified with a duration string such as those returned by the partitioning_methods.construct_duration_string helper method. Please see datetime partitioned project documentation for more information on duration strings.

Note that either (holdout_start_date, holdout_duration) or (holdout_start_date, holdout_end_date) can be used to specify holdout partitioning settings.

Attributes

datetime_partition_columnstr: the name of the column whose values as dates are used to assign a row to a particular partition
autopilot_data_selection_methodstr: one of datarobot.enums.DATETIME_AUTOPILOT_DATA_SELECTION_METHOD. Whether models created by the autopilot should use “rowCount” or “duration” as their data_selection_method.
validation_durationstr or None: the default validation_duration for the backtests
holdout_start_datedatetime.datetime or None: The start date of holdout scoring data. If holdout_start_date is specified, either holdout_duration or holdout_end_date must also be specified. If disable_holdout is set to True, holdout_start_date, holdout_duration, and holdout_end_date may not be specified.
holdout_durationstr or None: The duration of the holdout scoring data. If holdout_duration is specified, holdout_start_date must also be specified. If disable_holdout is set to True, holdout_duration, holdout_start_date, and holdout_end_date may not be specified.
holdout_end_datedatetime.datetime or None: The end date of holdout scoring data. If holdout_end_date is specified, holdout_start_date must also be specified. If disable_holdout is set to True, holdout_end_date, holdout_start_date, and holdout_duration may not be specified.
disable_holdoutbool or None: (New in version v2.8) Whether to suppress allocating a holdout fold. If set to True, holdout_start_date, holdout_duration, and holdout_end_date may not be specified.
gap_durationstr or None: The duration of the gap between training and holdout scoring data
number_of_backtestsint or None: the number of backtests to use
backtestslist of BacktestSpecification: the exact specification of backtests to use. The indices of the specified backtests should range from 0 to number_of_backtests - 1. If any backtest is left unspecified, a default configuration will be chosen.
use_time_seriesbool: (New in version v2.8) Whether to create a time series project (if True) or an OTV project which uses datetime partitioning (if False). The default behavior is to create an OTV project.
default_to_known_in_advancebool: (New in version v2.11) Optional, default False. Used for time series projects only. Sets whether all features default to being treated as known in advance. Known in advance features are expected to be known for dates in the future when making predictions, e.g., “is this a holiday?”. Individual features can be set to a value different than the default using the feature_settings parameter.
default_to_do_not_derivebool: (New in v2.17) Optional, default False. Used for time series projects only. Sets whether all features default to being treated as do-not-derive features, excluding them from feature derivation. Individual features can be set to a value different than the default by using the feature_settings parameter.
feature_derivation_window_startint or None: (New in version v2.8) Only used for time series projects. Offset into the past to define how far back relative to the forecast point the feature derivation window should start. Expressed in terms of the windows_basis_unit and should be negative value or zero.
feature_derivation_window_endint or None: (New in version v2.8) Only used for time series projects. Offset into the past to define how far back relative to the forecast point the feature derivation window should end. Expressed in terms of the windows_basis_unit and should be a negative value or zero.
feature_settingslist of FeatureSettings: (New in version v2.9) Optional, a list specifying per feature settings, can be left unspecified.
forecast_window_startint or None: (New in version v2.8) Only used for time series projects. Offset into the future to define how far forward relative to the forecast point the forecast window should start. Expressed in terms of the windows_basis_unit.
forecast_window_endint or None: (New in version v2.8) Only used for time series projects. Offset into the future to define how far forward relative to the forecast point the forecast window should end. Expressed in terms of the windows_basis_unit.
windows_basis_unitstring, optional: (New in version v2.14) Only used for time series projects. Indicates which unit is a basis for feature derivation window and forecast window. Valid options are detected time unit (one of the datarobot.enums.TIME_UNITS) or “ROW”. If omitted, the default value is the detected time unit.
treat_as_exponentialstring, optional: (New in version v2.9) defaults to “auto”. Used to specify whether to treat data as exponential trend and apply transformations like log-transform. Use values from the datarobot.enums.TREAT_AS_EXPONENTIAL enum.
differencing_methodstring, optional: (New in version v2.9) defaults to “auto”. Used to specify which differencing method to apply of case if data is stationary. Use values from datarobot.enums.DIFFERENCING_METHOD enum.
periodicitieslist of Periodicity, optional: (New in version v2.9) a list of datarobot.Periodicity. Periodicities units should be “ROW”, if the windows_basis_unit is “ROW”.
multiseries_id_columnslist of str or null: (New in version v2.11) a list of the names of multiseries id columns to define series within the training data. Currently only one multiseries id column is supported.
use_cross_series_featuresbool: (New in version v2.14) Whether to use cross series features.
aggregation_typestr, optional: (New in version v2.14) The aggregation type to apply when creating cross series features. Optional, must be one of “total” or “average”.
cross_series_group_by_columnslist of str, optional: (New in version v2.15) List of columns (currently of length 1). Optional setting that indicates how to further split series into related groups. For example, if every series is sales of an individual product, the series group-by could be the product category with values like “men’s clothing”, “sports equipment”, etc.. Can only be used in a multiseries project with use_cross_series_features set to True.
calendar_idstr, optional: (New in version v2.15) The id of the CalendarFile to use with this project.
unsupervised_mode: bool, optional: (New in version v2.20) defaults to False, indicates whether partitioning should be constructed for the unsupervised project.
model_splits: int, optional: (New in version v2.21) Sets the cap on the number of jobs per model used when building models to control number of jobs in the queue. Higher number of model splits will allow for less downsampling leading to the use of more post-processed data.
allow_partial_history_time_series_predictions: bool, optional: (New in version v2.24) Whether to allow time series models to make predictions using partial historical data.
unsupervised_type: str, optional: (New in version v3.2) The unsupervised project type, only valid if unsupervised_mode is True. Use values from datarobot.enums.UnsupervisedTypeEnum enum. If not specified then the project defaults to ‘anomaly’ when unsupervised_mode is True.

collect_payload()¶

Set up the dict that should be sent to the server when setting the target Returns ——- partitioning_spec : dict

Return type: Dict[str, Any]

prep_payload(project_id, max_wait=600)¶

Run any necessary validation and prep of the payload, including async operations

Mainly used for the datetime partitioning spec but implemented in general for consistency

Return type: None

update(**kwargs)¶

Update this instance, matching attributes to kwargs

Mainly used for the datetime partitioning spec but implemented in general for consistency

Return type: None

class datarobot.BacktestSpecification(index, gap_duration=None, validation_start_date=None, validation_duration=None, validation_end_date=None, primary_training_start_date=None, primary_training_end_date=None)¶

Uniquely defines a Backtest used in a DatetimePartitioning

Includes only the attributes of a backtest directly controllable by users. The other attributes are assigned by the DataRobot application based on the project dataset and the user-controlled settings.

There are two ways to specify an individual backtest:

Option 1: Use index, gap_duration, validation_start_date, and validation_duration. All durations should be specified with a duration string such as those returned by the partitioning_methods.construct_duration_string helper method.

import datarobot as dr

partitioning_spec = dr.DatetimePartitioningSpecification(
    backtests=[
        # modify the first backtest using option 1
        dr.BacktestSpecification(
            index=0,
            gap_duration=dr.partitioning_methods.construct_duration_string(),
            validation_start_date=datetime(year=2010, month=1, day=1),
            validation_duration=dr.partitioning_methods.construct_duration_string(years=1),
        )
    ],
    # other partitioning settings...
)

Option 2 (New in version v2.20): Use index, primary_training_start_date, primary_training_end_date, validation_start_date, and validation_end_date. In this case, note that setting primary_training_end_date and validation_start_date to the same timestamp will result with no gap being created.

import datarobot as dr

partitioning_spec = dr.DatetimePartitioningSpecification(
    backtests=[
        # modify the first backtest using option 2
        dr.BacktestSpecification(
            index=0,
            primary_training_start_date=datetime(year=2005, month=1, day=1),
            primary_training_end_date=datetime(year=2010, month=1, day=1),
            validation_start_date=datetime(year=2010, month=1, day=1),
            validation_end_date=datetime(year=2011, month=1, day=1),
        )
    ],
    # other partitioning settings...
)

All durations should be specified with a duration string such as those returned by the partitioning_methods.construct_duration_string helper method. Please see datetime partitioned project documentation for more information on duration strings.

Attributes

indexint: the index of the backtest to update
gap_durationstr: a duration string specifying the desired duration of the gap between training and validation scoring data for the backtest
validation_start_datedatetime.datetime: the desired start date of the validation scoring data for this backtest
validation_durationstr: a duration string specifying the desired duration of the validation scoring data for this backtest
validation_end_datedatetime.datetime: the desired end date of the validation scoring data for this backtest
primary_training_start_datedatetime.datetime: the desired start date of the training partition for this backtest
primary_training_end_datedatetime.datetime: the desired end date of the training partition for this backtest

class datarobot.FeatureSettings(feature_name, known_in_advance=None, do_not_derive=None)¶

Per feature settings

Attributes

feature_namestring: name of the feature
known_in_advancebool: (New in version v2.11) Optional, for time series projects only. Sets whether the feature is known in advance, i.e., values for future dates are known at prediction time. If not specified, the feature uses the value from the default_to_known_in_advance flag.
do_not_derivebool: (New in v2.17) Optional, for time series projects only. Sets whether the feature is excluded from feature derivation. If not specified, the feature uses the value from the default_to_do_not_derive flag.

collect_payload(use_a_priori=False)¶

Parameters

use_a_prioriboolSwitch to using the older a_priori key name instead of known_in_advance. Default: False

Returns

BacktestSpecification dictionary representation

Return type: FeatureSettingsPayload

class datarobot.Periodicity(time_steps, time_unit)¶

Periodicity configuration

Parameters

time_stepsint: Time step value
time_unitstring: Time step unit, valid options are values from datarobot.enums.TIME_UNITS

Examples

from datarobot as dr
periodicities = [
    dr.Periodicity(time_steps=10, time_unit=dr.enums.TIME_UNITS.HOUR),
    dr.Periodicity(time_steps=600, time_unit=dr.enums.TIME_UNITS.MINUTE)]
spec = dr.DatetimePartitioningSpecification(
    # ...
    periodicities=periodicities
)

class datarobot.DatetimePartitioning(project_id=None, datetime_partitioning_id=None, datetime_partition_column=None, date_format=None, autopilot_data_selection_method=None, validation_duration=None, available_training_start_date=None, available_training_duration=None, available_training_row_count=None, available_training_end_date=None, primary_training_start_date=None, primary_training_duration=None, primary_training_row_count=None, primary_training_end_date=None, gap_start_date=None, gap_duration=None, gap_row_count=None, gap_end_date=None, disable_holdout=None, holdout_start_date=None, holdout_duration=None, holdout_row_count=None, holdout_end_date=None, number_of_backtests=None, backtests=None, total_row_count=None, use_time_series=False, default_to_known_in_advance=False, default_to_do_not_derive=False, feature_derivation_window_start=None, feature_derivation_window_end=None, feature_settings=None, forecast_window_start=None, forecast_window_end=None, windows_basis_unit=None, treat_as_exponential=None, differencing_method=None, periodicities=None, multiseries_id_columns=None, number_of_known_in_advance_features=0, number_of_do_not_derive_features=0, use_cross_series_features=None, aggregation_type=None, cross_series_group_by_columns=None, calendar_id=None, calendar_name=None, model_splits=None, allow_partial_history_time_series_predictions=False, unsupervised_mode=False, unsupervised_type=None)¶

Full partitioning of a project for datetime partitioning.

To instantiate, use DatetimePartitioning.get(project_id).

Includes both the attributes specified by the user, as well as those determined by the DataRobot application based on the project dataset. In order to use a partitioning to set the target, call to_specification and pass the resulting DatetimePartitioningSpecification to Project.analyze_and_model via the partitioning_method parameter.

The available training data corresponds to all the data available for training, while the primary training data corresponds to the data that can be used to train while ensuring that all backtests are available. If a model is trained with more data than is available in the primary training data, then all backtests may not have scores available.

All durations are specified with a duration string such as those returned by the partitioning_methods.construct_duration_string helper method. Please see datetime partitioned project documentation for more information on duration strings.

Attributes

project_idstr: the id of the project this partitioning applies to
datetime_partitioning_idstr or None: the id of the datetime partitioning it is an optimized partitioning
datetime_partition_columnstr: the name of the column whose values as dates are used to assign a row to a particular partition
date_formatstr: the format (e.g. “%Y-%m-%d %H:%M:%S”) by which the partition column was interpreted (compatible with strftime)
autopilot_data_selection_methodstr: one of datarobot.enums.DATETIME_AUTOPILOT_DATA_SELECTION_METHOD. Whether models created by the autopilot use “rowCount” or “duration” as their data_selection_method.
validation_durationstr or None: the validation duration specified when initializing the partitioning - not directly significant if the backtests have been modified, but used as the default validation_duration for the backtests. Can be absent if this is a time series project with an irregular primary date/time feature.
available_training_start_datedatetime.datetime: The start date of the available training data for scoring the holdout
available_training_durationstr: The duration of the available training data for scoring the holdout
available_training_row_countint or None: The number of rows in the available training data for scoring the holdout. Only available when retrieving the partitioning after setting the target.
available_training_end_datedatetime.datetime: The end date of the available training data for scoring the holdout
primary_training_start_datedatetime.datetime or None: The start date of primary training data for scoring the holdout. Unavailable when the holdout fold is disabled.
primary_training_durationstr: The duration of the primary training data for scoring the holdout
primary_training_row_countint or None: The number of rows in the primary training data for scoring the holdout. Only available when retrieving the partitioning after setting the target.
primary_training_end_datedatetime.datetime or None: The end date of the primary training data for scoring the holdout. Unavailable when the holdout fold is disabled.
gap_start_datedatetime.datetime or None: The start date of the gap between training and holdout scoring data. Unavailable when the holdout fold is disabled.
gap_durationstr: The duration of the gap between training and holdout scoring data
gap_row_countint or None: The number of rows in the gap between training and holdout scoring data. Only available when retrieving the partitioning after setting the target.
gap_end_datedatetime.datetime or None: The end date of the gap between training and holdout scoring data. Unavailable when the holdout fold is disabled.
disable_holdoutbool or None: Whether to suppress allocating a holdout fold. If set to True, holdout_start_date, holdout_duration, and holdout_end_date may not be specified.
holdout_start_datedatetime.datetime or None: The start date of holdout scoring data. Unavailable when the holdout fold is disabled.
holdout_durationstr: The duration of the holdout scoring data
holdout_row_countint or None: The number of rows in the holdout scoring data. Only available when retrieving the partitioning after setting the target.
holdout_end_datedatetime.datetime or None: The end date of the holdout scoring data. Unavailable when the holdout fold is disabled.
number_of_backtestsint: the number of backtests used.
backtestslist of Backtest: the configured backtests.
total_row_countint: the number of rows in the project dataset. Only available when retrieving the partitioning after setting the target.
use_time_seriesbool: (New in version v2.8) Whether to create a time series project (if True) or an OTV project which uses datetime partitioning (if False). The default behavior is to create an OTV project.
default_to_known_in_advancebool: (New in version v2.11) Optional, default False. Used for time series projects only. Sets whether all features default to being treated as known in advance. Known in advance features are expected to be known for dates in the future when making predictions, e.g., “is this a holiday?”. Individual features can be set to a value different from the default using the feature_settings parameter.
default_to_do_not_derivebool: (New in v2.17) Optional, default False. Used for time series projects only. Sets whether all features default to being treated as do-not-derive features, excluding them from feature derivation. Individual features can be set to a value different from the default by using the feature_settings parameter.
feature_derivation_window_startint or None: (New in version v2.8) Only used for time series projects. Offset into the past to define how far back relative to the forecast point the feature derivation window should start. Expressed in terms of the windows_basis_unit.
feature_derivation_window_endint or None: (New in version v2.8) Only used for time series projects. Offset into the past to define how far back relative to the forecast point the feature derivation window should end. Expressed in terms of the windows_basis_unit.
feature_settingslist of FeatureSettings: (New in version v2.9) Optional, a list specifying per feature settings, can be left unspecified.
forecast_window_startint or None: (New in version v2.8) Only used for time series projects. Offset into the future to define how far forward relative to the forecast point the forecast window should start. Expressed in terms of the windows_basis_unit.
forecast_window_endint or None: (New in version v2.8) Only used for time series projects. Offset into the future to define how far forward relative to the forecast point the forecast window should end. Expressed in terms of the windows_basis_unit.
windows_basis_unitstring, optional: (New in version v2.14) Only used for time series projects. Indicates which unit is a basis for feature derivation window and forecast window. Valid options are detected time unit (one of the datarobot.enums.TIME_UNITS) or “ROW”. If omitted, the default value is detected time unit.
treat_as_exponentialstring, optional: (New in version v2.9) defaults to “auto”. Used to specify whether to treat data as exponential trend and apply transformations like log-transform. Use values from the datarobot.enums.TREAT_AS_EXPONENTIAL enum.
differencing_methodstring, optional: (New in version v2.9) defaults to “auto”. Used to specify which differencing method to apply of case if data is stationary. Use values from the datarobot.enums.DIFFERENCING_METHOD enum.
periodicitieslist of Periodicity, optional: (New in version v2.9) a list of datarobot.Periodicity. Periodicities units should be “ROW”, if the windows_basis_unit is “ROW”.
multiseries_id_columnslist of str or null: (New in version v2.11) a list of the names of multiseries id columns to define series within the training data. Currently only one multiseries id column is supported.
number_of_known_in_advance_featuresint: (New in version v2.14) Number of features that are marked as known in advance.
number_of_do_not_derive_featuresint: (New in v2.17) Number of features that are excluded from derivation.
use_cross_series_featuresbool: (New in version v2.14) Whether to use cross series features.
aggregation_typestr, optional: (New in version v2.14) The aggregation type to apply when creating cross series features. Optional, must be one of “total” or “average”.
cross_series_group_by_columnslist of str, optional: (New in version v2.15) List of columns (currently of length 1). Optional setting that indicates how to further split series into related groups. For example, if every series is sales of an individual product, the series group-by could be the product category with values like “men’s clothing”, “sports equipment”, etc.. Can only be used in a multiseries project with use_cross_series_features set to True.
calendar_idstr, optional: (New in version v2.15) Only available for time series projects. The id of the CalendarFile to use with this project.
calendar_namestr, optional: (New in version v2.17) Only available for time series projects. The name of the CalendarFile used with this project.
model_splits: int, optional: (New in version v2.21) Sets the cap on the number of jobs per model used when building models to control number of jobs in the queue. Higher number of model splits will allow for less downsampling leading to the use of more post-processed data.
allow_partial_history_time_series_predictions: bool, optional: (New in version v2.24) Whether to allow time series models to make predictions using partial historical data.
unsupervised_mode: bool, optional: (New in version v3.1) Whether the date/time partitioning is for an unsupervised project
unsupervised_type: str, optional: (New in version v3.2) The unsupervised project type, only valid if unsupervised_mode is True. Use values from datarobot.enums.UnsupervisedTypeEnum enum. If not specified then the project defaults to ‘anomaly’ when unsupervised_mode is True.

classmethod generate(project_id, spec, max_wait=600, target=None)¶

Preview the full partitioning determined by a DatetimePartitioningSpecification

Based on the project dataset and the partitioning specification, inspect the full partitioning that would be used if the same specification were passed into Project.analyze_and_model.

Parameters

project_idstr: the id of the project
specDatetimePartitioningSpec: the desired partitioning
max_waitint, optional: For some settings (e.g. generating a partitioning preview for a multiseries project for the first time), an asynchronous task must be run to analyze the dataset. max_wait governs the maximum time (in seconds) to wait before giving up. In all non-multiseries projects, this is unused.
targetstr, optional: the name of the target column. For unsupervised projects target may be None. Providing a target will ensure that partitions are correctly optimized for your dataset.

Returns

DatetimePartitioning: the full generated partitioning

classmethod get(project_id)¶

Retrieve the DatetimePartitioning from a project

Only available if the project has already set the target as a datetime project.

Parameters

project_idstr: the id of the project to retrieve partitioning for

Returns

DatetimePartitioningthe full partitioning for the project

Return type: DatetimePartitioning

classmethod generate_optimized(project_id, spec, target, max_wait=600)¶

Preview the full partitioning determined by a DatetimePartitioningSpecification

Based on the project dataset and the partitioning specification, inspect the full partitioning that would be used if the same specification were passed into Project.analyze_and_model.

Parameters

project_idstr: the id of the project
specDatetimePartitioningSpecification: the desired partitioning
targetstr: the name of the target column. For unsupervised projects target may be None.
max_waitint, optional: Governs the maximum time (in seconds) to wait before giving up.

Returns

DatetimePartitioning: the full generated partitioning

Return type: DatetimePartitioning

classmethod get_optimized(project_id, datetime_partitioning_id)¶

Retrieve an Optimized DatetimePartitioning from a project for the specified datetime_partitioning_id. A datetime_partitioning_id is created by using the generate_optimized function.

Parameters

project_idstr: the id of the project to retrieve partitioning for
datetime_partitioning_idObjectId: the ObjectId associated with the project to retrieve from mongo

Returns

DatetimePartitioningthe full partitioning for the project

Return type: DatetimePartitioning

classmethod feature_log_list(project_id, offset=None, limit=None)¶

Retrieve the feature derivation log content and log length for a time series project.

The Time Series Feature Log provides details about the feature generation process for a time series project. It includes information about which features are generated and their priority, as well as the detected properties of the time series data such as whether the series is stationary, and periodicities detected.

This route is only supported for time series projects that have finished partitioning.

The feature derivation log will include information about:

Detected stationarity of the series:

e.g. ‘Series detected as non-stationary’
Detected presence of multiplicative trend in the series:

e.g. ‘Multiplicative trend detected’
Detected presence of multiplicative trend in the series:

e.g. ‘Detected periodicities: 7 day’
Maximum number of feature to be generated:

e.g. ‘Maximum number of feature to be generated is 1440’
Window sizes used in rolling statistics / lag extractors

e.g. ‘The window sizes chosen to be: 2 months

(because the time step is 1 month and Feature Derivation Window is 2 months)’
Features that are specified as known-in-advance

e.g. ‘Variables treated as apriori: holiday’
Details about why certain variables are transformed in the input data

e.g. ‘Generating variable “y (log)” from “y” because multiplicative trend

is detected’
Details about features generated as timeseries features, and their priority

e.g. ‘Generating feature “date (actual)” from “date” (priority: 1)’

Parameters

project_idstr: project id to retrieve a feature derivation log for.
offsetint: optional, defaults is 0, this many results will be skipped.
limitint: optional, defaults to 100, at most this many results are returned. To specify no limit, use 0. The default may change without notice.

classmethod feature_log_retrieve(project_id)¶

Retrieve the feature derivation log content and log length for a time series project.

The Time Series Feature Log provides details about the feature generation process for a time series project. It includes information about which features are generated and their priority, as well as the detected properties of the time series data such as whether the series is stationary, and periodicities detected.

This route is only supported for time series projects that have finished partitioning.

The feature derivation log will include information about:

Detected stationarity of the series:

e.g. ‘Series detected as non-stationary’
Detected presence of multiplicative trend in the series:

e.g. ‘Multiplicative trend detected’
Detected presence of multiplicative trend in the series:

e.g. ‘Detected periodicities: 7 day’
Maximum number of feature to be generated:

e.g. ‘Maximum number of feature to be generated is 1440’
Window sizes used in rolling statistics / lag extractors

e.g. ‘The window sizes chosen to be: 2 months

(because the time step is 1 month and Feature Derivation Window is 2 months)’
Features that are specified as known-in-advance

e.g. ‘Variables treated as apriori: holiday’
Details about why certain variables are transformed in the input data

e.g. ‘Generating variable “y (log)” from “y” because multiplicative trend

is detected’
Details about features generated as timeseries features, and their priority

e.g. ‘Generating feature “date (actual)” from “date” (priority: 1)’

Parameters

project_idstr: project id to retrieve a feature derivation log for.

Return type: str

to_specification(use_holdout_start_end_format=False, use_backtest_start_end_format=False)¶

Render the DatetimePartitioning as a DatetimePartitioningSpecification

The resulting specification can be used when setting the target, and contains only the attributes directly controllable by users.

Parameters

use_holdout_start_end_formatbool, optional: Defaults to False. If True, will use holdout_end_date when configuring the holdout partition. If False, will use holdout_duration instead.
use_backtest_start_end_formatbool, optional: Defaults to False. If False, will use a duration-based approach for specifying backtests (gap_duration, validation_start_date, and validation_duration). If True, will use a start/end date approach for specifying backtests (primary_training_start_date, primary_training_end_date, validation_start_date, validation_end_date). In contrast, projects created in the Web UI will use the start/end date approach for specifying backtests. Set this parameter to True to mirror the behavior in the Web UI.

Returns

DatetimePartitioningSpecification: the specification for this partitioning

Return type: DatetimePartitioningSpecification

to_dataframe()¶

Render the partitioning settings as a dataframe for convenience of display

Excludes project_id, datetime_partition_column, date_format, autopilot_data_selection_method, validation_duration, and number_of_backtests, as well as the row count information, if present.

Also excludes the time series specific parameters for use_time_series, default_to_known_in_advance, default_to_do_not_derive, and defining the feature derivation and forecast windows.

Return type: DataFrame

classmethod datetime_partitioning_log_retrieve(project_id, datetime_partitioning_id)¶

Retrieve the datetime partitioning log content for an optimized datetime partitioning.

The datetime partitioning log provides details about the partitioning process for an OTV or time series project.

Parameters

project_idstr: The project ID of the project associated with the datetime partitioning.
datetime_partitioning_idstr: id of the optimized datetime partitioning

Return type: Any

classmethod datetime_partitioning_log_list(project_id, datetime_partitioning_id, offset=None, limit=None)¶

Retrieve the datetime partitioning log content and log length for an optimized datetime partitioning.

The Datetime Partitioning Log provides details about the partitioning process for an OTV or Time Series project.

Parameters

project_idstr: project id of the project associated with the datetime partitioning.
datetime_partitioning_idstr: id of the optimized datetime partitioning
offsetint or None: optional, defaults is 0, this many results will be skipped.
limitint or None: optional, defaults to 100, at most this many results are returned. To specify no limit, use 0. The default may change without notice.

Return type: Any

classmethod get_input_data(project_id, datetime_partitioning_id)¶

Retrieve the input used to create an optimized DatetimePartitioning from a project for the specified datetime_partitioning_id. A datetime_partitioning_id is created by using the generate_optimized function.

Parameters

project_idstr: The ID of the project to retrieve partitioning for.
datetime_partitioning_idObjectId: The ObjectId associated with the project to retrieve from Mongo.

Returns

DatetimePartitioningInputThe input to optimized datetime partitioning.

Return type: DatetimePartitioningSpecification

class datarobot.helpers.partitioning_methods.DatetimePartitioningId(datetime_partitioning_id, project_id)¶

Defines a DatetimePartitioningId used for datetime partitioning.

This class only includes the datetime_partitioning_id that identifies a previously optimized datetime partitioning and the project_id for the associated project.

This is the specification that should be passed to Project.analyze_and_model via the partitioning_method parameter. To see the full partitioning use DatetimePartitioning.get_optimized.

Attributes

datetime_partitioning_idstr: The ID of the datetime partitioning to use.
project_idstr: The ID of the project that the datetime partitioning is associated with.

collect_payload()¶

Set up the dict that should be sent to the server when setting the target Returns ——- partitioning_spec : dict

Return type: Dict[str, Any]

prep_payload(project_id, max_wait=600)¶

Run any necessary validation and prep of the payload, including async operations

Mainly used for the datetime partitioning spec but implemented in general for consistency

Return type: None

update(**kwargs)¶

Update this instance, matching attributes to kwargs

Mainly used for the datetime partitioning spec but implemented in general for consistency

Return type: NoReturn

class datarobot.helpers.partitioning_methods.Backtest(index=None, available_training_start_date=None, available_training_duration=None, available_training_row_count=None, available_training_end_date=None, primary_training_start_date=None, primary_training_duration=None, primary_training_row_count=None, primary_training_end_date=None, gap_start_date=None, gap_duration=None, gap_row_count=None, gap_end_date=None, validation_start_date=None, validation_duration=None, validation_row_count=None, validation_end_date=None, total_row_count=None)¶

A backtest used to evaluate models trained in a datetime partitioned project

When setting up a datetime partitioning project, backtests are specified by a BacktestSpecification.

The available training data corresponds to all the data available for training, while the primary training data corresponds to the data that can be used to train while ensuring that all backtests are available. If a model is trained with more data than is available in the primary training data, then all backtests may not have scores available.

All durations are specified with a duration string such as those returned by the partitioning_methods.construct_duration_string helper method. Please see datetime partitioned project documentation for more information on duration strings.

Attributes

indexint: the index of the backtest
available_training_start_datedatetime.datetime: the start date of the available training data for this backtest
available_training_durationstr: the duration of available training data for this backtest
available_training_row_countint or None: the number of rows of available training data for this backtest. Only available when retrieving from a project where the target is set.
available_training_end_datedatetime.datetime: the end date of the available training data for this backtest
primary_training_start_datedatetime.datetime: the start date of the primary training data for this backtest
primary_training_durationstr: the duration of the primary training data for this backtest
primary_training_row_countint or None: the number of rows of primary training data for this backtest. Only available when retrieving from a project where the target is set.
primary_training_end_datedatetime.datetime: the end date of the primary training data for this backtest
gap_start_datedatetime.datetime: the start date of the gap between training and validation scoring data for this backtest
gap_durationstr: the duration of the gap between training and validation scoring data for this backtest
gap_row_countint or None: the number of rows in the gap between training and validation scoring data for this backtest. Only available when retrieving from a project where the target is set.
gap_end_datedatetime.datetime: the end date of the gap between training and validation scoring data for this backtest
validation_start_datedatetime.datetime: the start date of the validation scoring data for this backtest
validation_durationstr: the duration of the validation scoring data for this backtest
validation_row_countint or None: the number of rows of validation scoring data for this backtest. Only available when retrieving from a project where the target is set.
validation_end_datedatetime.datetime: the end date of the validation scoring data for this backtest
total_row_countint or None: the number of rows in this backtest. Only available when retrieving from a project where the target is set.

to_specification(use_start_end_format=False)¶

Render this backtest as a BacktestSpecification.

The resulting specification includes only the attributes users can directly control, not those indirectly determined by the project dataset.

Parameters

use_start_end_formatbool: Default False. If False, will use a duration-based approach for specifying backtests (gap_duration, validation_start_date, and validation_duration). If True, will use a start/end date approach for specifying backtests (primary_training_start_date, primary_training_end_date, validation_start_date, validation_end_date). In contrast, projects created in the Web UI will use the start/end date approach for specifying backtests. Set this parameter to True to mirror the behavior in the Web UI.

Returns

BacktestSpecification: the specification for this backtest

Return type: BacktestSpecification

to_dataframe()¶

Render this backtest as a dataframe for convenience of display

Returns

backtest_partitioningpandas.Dataframe: the backtest attributes, formatted into a dataframe

Return type: DataFrame

class datarobot.helpers.partitioning_methods.FeatureSettingsPayload() -> new empty dictionary dict(mapping) -> new dictionary initialized from a mapping object's (key, value) pairs dict(iterable) -> new dictionary initialized as if via: d = {} for k, v in iterable: d[k] = v dict(**kwargs) -> new dictionary initialized with the name=value pairs in the keyword argument list. For example: dict(one=1, two=2)¶

datarobot.helpers.partitioning_methods.construct_duration_string(years=0, months=0, days=0, hours=0, minutes=0, seconds=0)¶

Construct a valid string representing a duration in accordance with ISO8601

A duration of six months, 3 days, and 12 hours could be represented as P6M3DT12H.

Parameters

yearsint: the number of years in the duration
monthsint: the number of months in the duration
daysint: the number of days in the duration
hoursint: the number of hours in the duration
minutesint: the number of minutes in the duration
secondsint: the number of seconds in the duration

Returns

duration_string: str: The duration string, specified compatibly with ISO8601

Return type: str

PayoffMatrix¶

class datarobot.models.PayoffMatrix(project_id, id, name=None, true_positive_value=None, true_negative_value=None, false_positive_value=None, false_negative_value=None)¶

Represents a Payoff Matrix, a costs/benefit scenario used for creating a profit curve.

Examples

import datarobot as dr

# create a payoff matrix
payoff_matrix = dr.PayoffMatrix.create(
    project_id,
    name,
    true_positive_value=100,
    true_negative_value=10,
    false_positive_value=0,
    false_negative_value=-10,
)

# list available payoff matrices
payoff_matrices = dr.PayoffMatrix.list(project_id)
payoff_matrix = payoff_matrices[0]

Attributes

project_idstr: id of the project with which the payoff matrix is associated.
idstr: id of the payoff matrix.
namestr: User-supplied label for the payoff matrix.
true_positive_valuefloat: Cost or benefit of a true positive classification
true_negative_valuefloat: Cost or benefit of a true negative classification
false_positive_valuefloat: Cost or benefit of a false positive classification
false_negative_valuefloat: Cost or benefit of a false negative classification

classmethod create(project_id, name, true_positive_value=1, true_negative_value=1, false_positive_value=- 1, false_negative_value=- 1)¶

Create a payoff matrix associated with a specific project.

Parameters

project_idstr: id of the project with which the payoff matrix will be associated

Returns

payoff_matrixPayoffMatrix: The newly created payoff matrix

Return type: PayoffMatrix

classmethod list(project_id)¶

Fetch all the payoff matrices for a project.

Parameters

project_idstr: id of the project
Returns
——-
List of PayoffMatrix: A list of PayoffMatrix objects
Raises
——
datarobot.errors.ClientError: if the server responded with 4xx status
datarobot.errors.ServerError: if the server responded with 5xx status

Return type: List[PayoffMatrix]

classmethod get(project_id, id)¶

Retrieve a specified payoff matrix.

Parameters

project_idstr: id of the project the model belongs to
idstr: id of the payoff matrix

Returns

PayoffMatrix object representing specified
payoff matrix

Raises

datarobot.errors.ClientError: if the server responded with 4xx status
datarobot.errors.ServerError: if the server responded with 5xx status

Return type: PayoffMatrix

classmethod update(project_id, id, name, true_positive_value, true_negative_value, false_positive_value, false_negative_value)¶

Update (replace) a payoff matrix. Note that all data fields are required.

Parameters

project_idstr: id of the project to which the payoff matrix belongs
idstr: id of the payoff matrix
namestr: User-supplied label for the payoff matrix
true_positive_valuefloat: True positive payoff value to use for the profit curve
true_negative_valuefloat: True negative payoff value to use for the profit curve
false_positive_valuefloat: False positive payoff value to use for the profit curve
false_negative_valuefloat: False negative payoff value to use for the profit curve

Returns

payoff_matrix: PayoffMatrix with updated values

Raises

datarobot.errors.ClientError: if the server responded with 4xx status
datarobot.errors.ServerError: if the server responded with 5xx status

Return type: PayoffMatrix

classmethod delete(project_id, id)¶

Delete a specified payoff matrix.

Parameters

project_idstr: id of the project the model belongs to
idstr: id of the payoff matrix

Returns

responserequests.Response: Empty response (204)

Raises

datarobot.errors.ClientError: if the server responded with 4xx status
datarobot.errors.ServerError: if the server responded with 5xx status

Return type: Response

classmethod from_data(data)¶

Instantiate an object of this class using a dict.

Parameters

datadict: Correctly snake_cased keys and their values.

Return type: TypeVar(T, bound= APIObject)

classmethod from_server_data(data, keep_attrs=None)¶

Instantiate an object of this class using the data directly from the server, meaning that the keys may have the wrong camel casing

Parameters

datadict: The directly translated dict of JSON from the server. No casing fixes have taken place
keep_attrsiterable: List, set or tuple of the dotted namespace notations for attributes to keep within the object structure even if their values are None

Return type: TypeVar(T, bound= APIObject)

PredictJob¶

datarobot.models.predict_job.wait_for_async_predictions(project_id, predict_job_id, max_wait=600)¶

Given a Project id and PredictJob id poll for status of process responsible for predictions generation until it’s finished

Parameters

project_idstr: The identifier of the project
predict_job_idstr: The identifier of the PredictJob
max_waitint, optional: Time in seconds after which predictions creation is considered unsuccessful

Returns

predictionspandas.DataFrame: Generated predictions.

Raises

AsyncPredictionsGenerationError: Raised if status of fetched PredictJob object is error
AsyncTimeoutError: Predictions weren’t generated in time, specified by max_wait parameter

Return type: DataFrame

class datarobot.models.PredictJob(data, completed_resource_url=None)¶

Tracks asynchronous work being done within a project

Attributes

idint: the id of the job
project_idstr: the id of the project the job belongs to
statusstr: the status of the job - will be one of datarobot.enums.QUEUE_STATUS
job_typestr: what kind of work the job is doing - will be ‘predict’ for predict jobs
is_blockedbool: if true, the job is blocked (cannot be executed) until its dependencies are resolved
messagestr: a message about the state of the job, typically explaining why an error occurred

classmethod from_job(job)¶

Transforms a generic Job into a PredictJob

Parameters

job: Job: A generic job representing a PredictJob

Returns

predict_job: PredictJob: A fully populated PredictJob with all the details of the job

Raises

ValueError:: If the generic Job was not a predict job, e.g. job_type != JOB_TYPE.PREDICT

Return type: PredictJob

classmethod get(project_id, predict_job_id)¶

Fetches one PredictJob. If the job finished, raises PendingJobFinished exception.

Parameters

project_idstr: The identifier of the project the model on which prediction was started belongs to
predict_job_idstr: The identifier of the predict_job

Returns

predict_jobPredictJob: The pending PredictJob

Raises

PendingJobFinished: If the job being queried already finished, and the server is re-routing to the finished predictions.
AsyncFailureError: Querying this resource gave a status code other than 200 or 303

Return type: PredictJob

classmethod get_predictions(project_id, predict_job_id, class_prefix='class_')¶

Fetches finished predictions from the job used to generate them.

Note

The prediction API for classifications now returns an additional prediction_values dictionary that is converted into a series of class_prefixed columns in the final dataframe. For example, <label> = 1.0 is converted to ‘class_1.0’. If you are on an older version of the client (prior to v2.8), you must update to v2.8 to correctly pivot this data.

Parameters

project_idstr: The identifier of the project to which belongs the model used for predictions generation
predict_job_idstr: The identifier of the predict_job
class_prefixstr: The prefix to append to labels in the final dataframe (e.g., apple -> class_apple)

Returns

predictionspandas.DataFrame: Generated predictions

Raises

JobNotFinished: If the job has not finished yet
AsyncFailureError: Querying the predict_job in question gave a status code other than 200 or 303

Return type: DataFrame

cancel()¶: Cancel this job. If this job has not finished running, it will be removed and canceled.

get_result(params=None)¶

Parameters

paramsdict or None: Query parameters to be added to request to get results.
For featureEffects, source param is required to define source,
otherwise the default is `training`

Returns

resultobject

Return type depends on the job type:

for model jobs, a Model is returned
for predict jobs, a pandas.DataFrame (with predictions) is returned
for featureImpact jobs, a list of dicts by default (see with_metadata parameter of the FeatureImpactJob class and its get() method).
for primeRulesets jobs, a list of Rulesets
for primeModel jobs, a PrimeModel
for primeDownloadValidation jobs, a PrimeFile
for predictionExplanationInitialization jobs, a PredictionExplanationsInitialization
for predictionExplanations jobs, a PredictionExplanations
for featureEffects, a FeatureEffects

Raises

JobNotFinished: If the job is not finished, the result is not available.
AsyncProcessUnsuccessfulError: If the job errored or was aborted

get_result_when_complete(max_wait=600, params=None)¶

Parameters

max_waitint, optional: How long to wait for the job to finish.
paramsdict, optional: Query parameters to be added to request.

Returns

result: object: Return type is the same as would be returned by Job.get_result.

Raises

AsyncTimeoutError: If the job does not finish in time
AsyncProcessUnsuccessfulError: If the job errored or was aborted

refresh()¶: Update this object with the latest job data from the server.

wait_for_completion(max_wait=600)¶

Waits for job to complete.

Parameters

max_waitint, optional: How long to wait for the job to finish.

Return type: None

Prediction Dataset¶

class datarobot.models.PredictionDataset(project_id, id, name, created, num_rows, num_columns, forecast_point=None, predictions_start_date=None, predictions_end_date=None, relax_known_in_advance_features_check=None, data_quality_warnings=None, forecast_point_range=None, data_start_date=None, data_end_date=None, max_forecast_date=None, actual_value_column=None, detected_actual_value_columns=None, contains_target_values=None, secondary_datasets_config_id=None)¶

A dataset uploaded to make predictions

Typically created via project.upload_dataset

Attributes

idstr

the id of the dataset

project_idstr

the id of the project the dataset belongs to

createdstr

the time the dataset was created

namestr

the name of the dataset

num_rowsint

the number of rows in the dataset

num_columnsint

the number of columns in the dataset

forecast_pointdatetime.datetime or None

For time series projects only. This is the default point relative to which predictions will be generated, based on the forecast window of the project. See the time series predictions documentation for more information.

predictions_start_datedatetime.datetime or None, optional

For time series projects only. The start date for bulk predictions. Note that this parameter is for generating historical predictions using the training data. This parameter should be provided in conjunction with predictions_end_date. Can’t be provided with the forecast_point parameter.

predictions_end_datedatetime.datetime or None, optional

For time series projects only. The end date for bulk predictions, exclusive. Note that this parameter is for generating historical predictions using the training data. This parameter should be provided in conjunction with predictions_start_date. Can’t be provided with the forecast_point parameter.

relax_known_in_advance_features_checkbool, optional

(New in version v2.15) For time series projects only. If True, missing values in the known in advance features are allowed in the forecast window at the prediction time. If omitted or False, missing values are not allowed.

data_quality_warningsdict, optional

(New in version v2.15) A dictionary that contains available warnings about potential problems in this prediction dataset. Available warnings include:

has_kia_missing_values_in_forecast_windowbool: Applicable for time series projects. If True, known in advance features have missing values in forecast window which may decrease prediction accuracy.
insufficient_rows_for_evaluating_modelsbool: Applicable for datasets which are used as external test sets. If True, there is not enough rows in dataset to calculate insights.
single_class_actual_value_columnbool: Applicable for datasets which are used as external test sets. If True, actual value column has only one class and such insights as ROC curve can not be calculated. Only applies for binary classification projects or unsupervised projects.

forecast_point_rangelist[datetime.datetime] or None, optional

(New in version v2.20) For time series projects only. Specifies the range of dates available for use as a forecast point.

data_start_datedatetime.datetime or None, optional

(New in version v2.20) For time series projects only. The minimum primary date of this prediction dataset.

data_end_datedatetime.datetime or None, optional

(New in version v2.20) For time series projects only. The maximum primary date of this prediction dataset.

max_forecast_datedatetime.datetime or None, optional

(New in version v2.20) For time series projects only. The maximum forecast date of this prediction dataset.

actual_value_columnstring, optional

(New in version v2.21) Optional, only available for unsupervised projects, in case dataset was uploaded with actual value column specified. Name of the column which will be used to calculate the classification metrics and insights.

detected_actual_value_columnslist of dict, optional

(New in version v2.21) For unsupervised projects only, list of detected actual value columns information containing missing count and name for each column.

contains_target_valuesbool, optional

(New in version v2.21) Only for supervised projects. If True, dataset contains target values and can be used to calculate the classification metrics and insights.

secondary_datasets_config_id: string or None, optional

(New in version v2.23) The Id of the alternative secondary dataset config to use during prediction for Feature discovery project.

classmethod get(project_id, dataset_id)¶

Retrieve information about a dataset uploaded for predictions

Parameters

project_id:: the id of the project to query
dataset_id:: the id of the dataset to retrieve

Returns

dataset: PredictionDataset: A dataset uploaded to make predictions

Return type: PredictionDataset

delete()¶

Delete a dataset uploaded for predictions

Will also delete predictions made using this dataset and cancel any predict jobs using this dataset.

Return type: None

Prediction Explanations¶

class datarobot.PredictionExplanationsInitialization(project_id, model_id, prediction_explanations_sample=None)¶

Represents a prediction explanations initialization of a model.

Attributes

project_idstr: id of the project the model belongs to
model_idstr: id of the model the prediction explanations initialization is for
prediction_explanations_samplelist of dict: a small sample of prediction explanations that could be generated for the model

classmethod get(project_id, model_id)¶

Retrieve the prediction explanations initialization for a model.

Prediction explanations initializations are a prerequisite for computing prediction explanations, and include a sample what the computed prediction explanations for a prediction dataset would look like.

Parameters

project_idstr: id of the project the model belongs to
model_idstr: id of the model the prediction explanations initialization is for

Returns

prediction_explanations_initializationPredictionExplanationsInitialization: The queried instance.

Raises

ClientError (404): If the project or model does not exist or the initialization has not been computed.

classmethod create(project_id, model_id)¶

Create a prediction explanations initialization for the specified model.

Parameters

project_idstr: id of the project the model belongs to
model_idstr: id of the model for which initialization is requested

Returns

jobJob: an instance of created async job

delete()¶: Delete this prediction explanations initialization.

class datarobot.PredictionExplanations(id, project_id, model_id, dataset_id, max_explanations, num_columns, finish_time, prediction_explanations_location, threshold_low=None, threshold_high=None, class_names=None, num_top_classes=None)¶

Represents prediction explanations metadata and provides access to computation results.

Examples

prediction_explanations = dr.PredictionExplanations.get(project_id, explanations_id)
for row in prediction_explanations.get_rows():
    print(row)  # row is an instance of PredictionExplanationsRow

Attributes

idstr: id of the record and prediction explanations computation result
project_idstr: id of the project the model belongs to
model_idstr: id of the model the prediction explanations are for
dataset_idstr: id of the prediction dataset prediction explanations were computed for
max_explanationsint: maximum number of prediction explanations to supply per row of the dataset
threshold_lowfloat: the lower threshold, below which a prediction must score in order for prediction explanations to be computed for a row in the dataset
threshold_highfloat: the high threshold, above which a prediction must score in order for prediction explanations to be computed for a row in the dataset
num_columnsint: the number of columns prediction explanations were computed for
finish_timefloat: timestamp referencing when computation for these prediction explanations finished
prediction_explanations_locationstr: where to retrieve the prediction explanations

classmethod get(project_id, prediction_explanations_id)¶

Retrieve a specific prediction explanations metadata.

Parameters

project_idstr: id of the project the explanations belong to
prediction_explanations_idstr: id of the prediction explanations

Returns

prediction_explanationsPredictionExplanations: The queried instance.

classmethod create(project_id, model_id, dataset_id, max_explanations=None, threshold_low=None, threshold_high=None, mode=None)¶

Create prediction explanations for the specified dataset.

In order to create PredictionExplanations for a particular model and dataset, you must first:

Compute feature impact for the model via datarobot.Model.get_feature_impact()

Compute a PredictionExplanationsInitialization for the model via datarobot.PredictionExplanationsInitialization.create(project_id, model_id)

Compute predictions for the model and dataset via datarobot.Model.request_predictions(dataset_id)

threshold_high and threshold_low are optional filters applied to speed up computation. When at least one is specified, only the selected outlier rows will have prediction explanations computed. Rows are considered to be outliers if their predicted value (in case of regression projects) or probability of being the positive class (in case of classification projects) is less than threshold_low or greater than thresholdHigh. If neither is specified, prediction explanations will be computed for all rows.

Parameters

project_idstr: id of the project the model belongs to
model_idstr: id of the model for which prediction explanations are requested
dataset_idstr: id of the prediction dataset for which prediction explanations are requested
threshold_lowfloat, optional: the lower threshold, below which a prediction must score in order for prediction explanations to be computed for a row in the dataset. If neither threshold_high nor threshold_low is specified, prediction explanations will be computed for all rows.
threshold_highfloat, optional: the high threshold, above which a prediction must score in order for prediction explanations to be computed. If neither threshold_high nor threshold_low is specified, prediction explanations will be computed for all rows.
max_explanationsint, optional: the maximum number of prediction explanations to supply per row of the dataset, default: 3.
modePredictionExplanationsMode, optional: mode of calculation for multiclass models, if not specified - server default is to explain only the predicted class, identical to passing TopPredictionsMode(1).

Returns

job: Job: an instance of created async job

classmethod list(project_id, model_id=None, limit=None, offset=None)¶

List of prediction explanations metadata for a specified project.

Parameters

project_idstr: id of the project to list prediction explanations for
model_idstr, optional: if specified, only prediction explanations computed for this model will be returned
limitint or None: at most this many results are returned, default: no limit
offsetint or None: this many results will be skipped, default: 0

Returns

prediction_explanationslist[PredictionExplanations]

get_rows(batch_size=None, exclude_adjusted_predictions=True)¶

Retrieve prediction explanations rows.

Parameters

batch_sizeint or None, optional: maximum number of prediction explanations rows to retrieve per request
exclude_adjusted_predictionsbool: Optional, defaults to True. Set to False to include adjusted predictions, which will differ from the predictions on some projects, e.g. those with an exposure column specified.

Yields

prediction_explanations_rowPredictionExplanationsRow: Represents prediction explanations computed for a prediction row.

is_multiclass()¶: Whether these explanations are for a multiclass project or a non-multiclass project

is_unsupervised_clustering_or_multiclass()¶: Clustering and muliclass XEMP always has either one of num_top_classes or class_names parameters set

get_number_of_explained_classes()¶: How many classes we attempt to explain for each row

get_all_as_dataframe(exclude_adjusted_predictions=True)¶

Retrieve all prediction explanations rows and return them as a pandas.DataFrame.

Returned dataframe has the following structure:

row_id : row id from prediction dataset

prediction : the output of the model for this row

adjusted_prediction : adjusted prediction values (only appears for projects that utilize prediction adjustments, e.g. projects with an exposure column)

class_0_label : a class level from the target (only appears for classification projects)

class_0_probability : the probability that the target is this class (only appears for classification projects)

class_1_label : a class level from the target (only appears for classification projects)

class_1_probability : the probability that the target is this class (only appears for classification projects)

explanation_0_feature : the name of the feature contributing to the prediction for this explanation

explanation_0_feature_value : the value the feature took on

explanation_0_label : the output being driven by this explanation. For regression projects, this is the name of the target feature. For classification projects, this is the class label whose probability increasing would correspond to a positive strength.

explanation_0_qualitative_strength : a human-readable description of how strongly the feature affected the prediction (e.g. ‘+++’, ‘–’, ‘+’) for this explanation

explanation_0_per_ngram_text_explanations : Text prediction explanations data in json formatted string.

explanation_0_strength : the amount this feature’s value affected the prediction

…

explanation_N_feature : the name of the feature contributing to the prediction for this explanation

explanation_N_feature_value : the value the feature took on

explanation_N_label : the output being driven by this explanation. For regression projects, this is the name of the target feature. For classification projects, this is the class label whose probability increasing would correspond to a positive strength.

explanation_N_qualitative_strength : a human-readable description of how strongly the feature affected the prediction (e.g. ‘+++’, ‘–’, ‘+’) for this explanation

explanation_N_per_ngram_text_explanations : Text prediction explanations data in json formatted string.

explanation_N_strength : the amount this feature’s value affected the prediction

For classification projects, the server does not guarantee any ordering on the prediction values, however within this function we sort the values so that class_X corresponds to the same class from row to row.

Parameters

exclude_adjusted_predictionsbool: Optional, defaults to True. Set this to False to include adjusted prediction values in the returned dataframe.

Returns

dataframe: pandas.DataFrame

download_to_csv(filename, encoding='utf-8', exclude_adjusted_predictions=True)¶

Save prediction explanations rows into CSV file.

Parameters

filenamestr or file object: path or file object to save prediction explanations rows
encodingstring, optional: A string representing the encoding to use in the output file, defaults to ‘utf-8’
exclude_adjusted_predictionsbool: Optional, defaults to True. Set to False to include adjusted predictions, which will differ from the predictions on some projects, e.g. those with an exposure column specified.

get_prediction_explanations_page(limit=None, offset=None, exclude_adjusted_predictions=True)¶

Get prediction explanations.

If you don’t want use a generator interface, you can access paginated prediction explanations directly.

Parameters

limitint or None: the number of records to return, the server will use a (possibly finite) default if not specified
offsetint or None: the number of records to skip, default 0
exclude_adjusted_predictionsbool: Optional, defaults to True. Set to False to include adjusted predictions, which will differ from the predictions on some projects, e.g. those with an exposure column specified.

Returns

prediction_explanationsPredictionExplanationsPage

delete()¶: Delete these prediction explanations.

class datarobot.models.prediction_explanations.PredictionExplanationsRow(row_id, prediction, prediction_values, prediction_explanations=None, adjusted_prediction=None, adjusted_prediction_values=None)¶

Represents prediction explanations computed for a prediction row.

Notes

PredictionValue contains:

label : describes what this model output corresponds to. For regression projects, it is the name of the target feature. For classification projects, it is a level from the target feature.
value : the output of the prediction. For regression projects, it is the predicted value of the target. For classification projects, it is the predicted probability the row belongs to the class identified by the label.

PredictionExplanation contains:

label : described what output was driven by this explanation. For regression projects, it is the name of the target feature. For classification projects, it is the class whose probability increasing would correspond to a positive strength of this prediction explanation.
feature : the name of the feature contributing to the prediction
feature_value : the value the feature took on for this row
strength : the amount this feature’s value affected the prediction
qualitative_strength : a human-readable description of how strongly the feature affected the prediction (e.g. ‘+++’, ‘–’, ‘+’)

Attributes

row_idint: which row this PredictionExplanationsRow describes
predictionfloat: the output of the model for this row
adjusted_predictionfloat or None: adjusted prediction value for projects that provide this information, None otherwise
prediction_valueslist: an array of dictionaries with a schema described as PredictionValue
adjusted_prediction_valueslist: same as prediction_values but for adjusted predictions
prediction_explanationslist: an array of dictionaries with a schema described as PredictionExplanation

class datarobot.models.prediction_explanations.PredictionExplanationsPage(id, count=None, previous=None, next=None, data=None, prediction_explanations_record_location=None, adjustment_method=None)¶

Represents a batch of prediction explanations received by one request.

Attributes

idstr: id of the prediction explanations computation result
datalist[dict]: list of raw prediction explanations; each row corresponds to a row of the prediction dataset
countint: total number of rows computed
previous_pagestr: where to retrieve previous page of prediction explanations, None if current page is the first
next_pagestr: where to retrieve next page of prediction explanations, None if current page is the last
prediction_explanations_record_locationstr: where to retrieve the prediction explanations metadata
adjustment_methodstr: Adjustment method that was applied to predictions, or ‘N/A’ if no adjustments were done.

classmethod get(project_id, prediction_explanations_id, limit=None, offset=0, exclude_adjusted_predictions=True)¶

Retrieve prediction explanations.

Parameters

project_idstr: id of the project the model belongs to
prediction_explanations_idstr: id of the prediction explanations
limitint or None: the number of records to return; the server will use a (possibly finite) default if not specified
offsetint or None: the number of records to skip, default 0
exclude_adjusted_predictionsbool: Optional, defaults to True. Set to False to include adjusted predictions, which will differ from the predictions on some projects, e.g. those with an exposure column specified.

Returns

prediction_explanationsPredictionExplanationsPage: The queried instance.

class datarobot.models.ShapMatrix(project_id, id, model_id=None, dataset_id=None)¶

Represents SHAP based prediction explanations and provides access to score values.

Examples

import datarobot as dr

# request SHAP matrix calculation
shap_matrix_job = dr.ShapMatrix.create(project_id, model_id, dataset_id)
shap_matrix = shap_matrix_job.get_result_when_complete()

# list available SHAP matrices
shap_matrices = dr.ShapMatrix.list(project_id)
shap_matrix = shap_matrices[0]

# get SHAP matrix as dataframe
shap_matrix_values = shap_matrix.get_as_dataframe()

Attributes

project_idstr: id of the project the model belongs to
shap_matrix_idstr: id of the generated SHAP matrix
model_idstr: id of the model used to
dataset_idstr: id of the prediction dataset SHAP values were computed for

classmethod create(project_id, model_id, dataset_id)¶

Calculate SHAP based prediction explanations against previously uploaded dataset.

Parameters

project_idstr: id of the project the model belongs to
model_idstr: id of the model for which prediction explanations are requested
dataset_idstr: id of the prediction dataset for which prediction explanations are requested (as uploaded from Project.upload_dataset)

Returns

jobShapMatrixJob: The job computing the SHAP based prediction explanations

Raises

ClientError: If the server responded with 4xx status. Possible reasons are project, model or dataset don’t exist, user is not allowed or model doesn’t support SHAP based prediction explanations
ServerError: If the server responded with 5xx status

Return type: ShapMatrixJob

classmethod list(project_id)¶

Fetch all the computed SHAP prediction explanations for a project.

Parameters

project_idstr: id of the project

Returns

List of ShapMatrix: A list of ShapMatrix objects

Raises

datarobot.errors.ClientError: if the server responded with 4xx status
datarobot.errors.ServerError: if the server responded with 5xx status

Return type: List[ShapMatrix]

classmethod get(project_id, id)¶

Retrieve the specific SHAP matrix.

Parameters

project_idstr: id of the project the model belongs to
idstr: id of the SHAP matrix

Returns

ShapMatrix object representing specified record

Return type: ShapMatrix

get_as_dataframe(read_timeout=60)¶

Retrieve SHAP matrix values as dataframe.

Returns

dataframepandas.DataFrame: A dataframe with SHAP scores
read_timeoutint (optional, default 60): New in version 2.29.

Wait this many seconds for the server to respond.

Raises

datarobot.errors.ClientError: if the server responded with 4xx status.
datarobot.errors.ServerError: if the server responded with 5xx status.

Return type: DataFrame

class datarobot.models.ClassListMode(class_names)¶

Calculate prediction explanations for the specified classes in each row.

Attributes

class_nameslist: List of class names that will be explained for each dataset row.

get_api_parameters(batch_route=False)¶

Get parameters passed in corresponding API call

Parameters

batch_routebool: Batch routes describe prediction calls with all possible parameters, so to distinguish explanation parameters from others they have prefix in parameters.

Returns

dict

class datarobot.models.TopPredictionsMode(num_top_classes)¶

Calculate prediction explanations for the number of top predicted classes in each row.

Attributes

num_top_classesint: Number of top predicted classes [1..10] that will be explained for each dataset row.

get_api_parameters(batch_route=False)¶

Get parameters passed in corresponding API call

Parameters

batch_routebool: Batch routes describe prediction calls with all possible parameters, so to distinguish explanation parameters from others they have prefix in parameters.

Returns

dict

Predictions¶

class datarobot.models.Predictions(project_id, prediction_id, model_id=None, dataset_id=None, includes_prediction_intervals=None, prediction_intervals_size=None, forecast_point=None, predictions_start_date=None, predictions_end_date=None, actual_value_column=None, explanation_algorithm=None, max_explanations=None, shap_warnings=None)¶

Represents predictions metadata and provides access to prediction results.

Examples

List all predictions for a project

import datarobot as dr

# Fetch all predictions for a project
all_predictions = dr.Predictions.list(project_id)

# Inspect all calculated predictions
for predictions in all_predictions:
    print(predictions)  # repr includes project_id, model_id, and dataset_id

Retrieve predictions by id

import datarobot as dr

# Getting predictions by id
predictions = dr.Predictions.get(project_id, prediction_id)

# Dump actual predictions
df = predictions.get_all_as_dataframe()
print(df)

Attributes

project_idstr: id of the project the model belongs to
model_idstr: id of the model
prediction_idstr: id of generated predictions
includes_prediction_intervalsbool, optional: (New in v2.16) For time series projects only. Indicates if prediction intervals will be part of the response. Defaults to False.
prediction_intervals_sizeint, optional: (New in v2.16) For time series projects only. Indicates the percentile used for prediction intervals calculation. Will be present only if includes_prediction_intervals is True.
forecast_pointdatetime.datetime, optional: (New in v2.20) For time series projects only. This is the default point relative to which predictions will be generated, based on the forecast window of the project. See the time series prediction documentation for more information.
predictions_start_datedatetime.datetime or None, optional: (New in v2.20) For time series projects only. The start date for bulk predictions. Note that this parameter is for generating historical predictions using the training data. This parameter should be provided in conjunction with predictions_end_date. Can’t be provided with the forecast_point parameter.
predictions_end_datedatetime.datetime or None, optional: (New in v2.20) For time series projects only. The end date for bulk predictions, exclusive. Note that this parameter is for generating historical predictions using the training data. This parameter should be provided in conjunction with predictions_start_date. Can’t be provided with the forecast_point parameter.
actual_value_columnstring, optional: (New in version v2.21) For time series unsupervised projects only. Actual value column which was used to calculate the classification metrics and insights on the prediction dataset. Can’t be provided with the forecast_point parameter.
explanation_algorithmdatarobot.enums.EXPLANATIONS_ALGORITHM, optional: (New in version v2.21) If set to ‘shap’, the response will include prediction explanations based on the SHAP explainer (SHapley Additive exPlanations). Defaults to null (no prediction explanations).
max_explanationsint, optional: (New in version v2.21) The maximum number of explanation values that should be returned for each row, ordered by absolute value, greatest to least. If null, no limit. In the case of ‘shap’: if the number of features is greater than the limit, the sum of remaining values will also be returned as shapRemainingTotal. Defaults to null. Cannot be set if explanation_algorithm is omitted.
shap_warningsdict, optional: (New in version v2.21) Will be present if explanation_algorithm was set to datarobot.enums.EXPLANATIONS_ALGORITHM.SHAP and there were additivity failures during SHAP values calculation.

classmethod list(project_id, model_id=None, dataset_id=None)¶

Fetch all the computed predictions metadata for a project.

Parameters

project_idstr: id of the project
model_idstr, optional: if specified, only predictions metadata for this model will be retrieved
dataset_idstr, optional: if specified, only predictions metadata for this dataset will be retrieved

Returns

A list ofpy:class:Predictions <datarobot.models.Predictions> objects

Return type: List[Predictions]

classmethod get(project_id, prediction_id)¶

Retrieve the specific predictions metadata

Parameters

project_idstr: id of the project the model belongs to
prediction_idstr: id of the prediction set

Returns

Predictions object representing specified
predictions

Return type: Predictions

get_all_as_dataframe(class_prefix='class_', serializer='json')¶

Retrieve all prediction rows and return them as a pandas.DataFrame.

Parameters

class_prefixstr, optional: The prefix to append to labels in the final dataframe. Default is class_ (e.g., apple -> class_apple)
serializerstr, optional: Serializer to use for the download. Options: json (default) or csv.

Returns

dataframe: pandas.DataFrame

Raises

datarobot.errors.ClientError: if the server responded with 4xx status.
datarobot.errors.ServerError: if the server responded with 5xx status.

Return type: DataFrame

download_to_csv(filename, encoding='utf-8', serializer='json')¶

Save prediction rows into CSV file.

Parameters

filenamestr or file object: path or file object to save prediction rows
encodingstring, optional: A string representing the encoding to use in the output file, defaults to ‘utf-8’
serializerstr, optional: Serializer to use for the download. Options: json (default) or csv.

Return type: None

PredictionServer¶

class datarobot.PredictionServer(id=None, url=None, datarobot_key=None)¶

A prediction server can be used to make predictions.

Attributes

idstr, optional: The id of the prediction server.
urlstr: The url of the prediction server.
datarobot_keystr, optional: The Datarobot-Key HTTP header used in requests to this prediction server. Note that in the datarobot.models.Deployment instance there is the default_prediction_server property which has this value as a “kebab-cased” key as opposed to “snake_cased”.

classmethod list()¶

Returns a list of prediction servers a user can use to make predictions.

New in version v2.17.

Returns

prediction_serverslist of PredictionServer instances: Contains a list of prediction servers that can be used to make predictions.

Examples

prediction_servers = PredictionServer.list()
prediction_servers
>>> [PredictionServer('https://example.com')]

Return type: List[PredictionServer]

PrimeFile¶

class datarobot.models.PrimeFile(id=None, project_id=None, parent_model_id=None, model_id=None, ruleset_id=None, language=None, is_valid=None)¶

Represents an executable file available for download of the code for a DataRobot Prime model

Attributes

idstr: the id of the PrimeFile
project_idstr: the id of the project this PrimeFile belongs to
parent_model_idstr: the model being approximated by this PrimeFile
model_idstr: the prime model this file represents
ruleset_idint: the ruleset being used in this PrimeFile
languagestr: the language of the code in this file - see enums.LANGUAGE for possibilities
is_validbool: whether the code passed basic validation

download(filepath)¶

Download the code and save it to a file

Parameters

filepath: string: the location to save the file to

Return type: None

Project¶

class datarobot.models.Project(id=None, project_name=None, mode=None, target=None, target_type=None, holdout_unlocked=None, metric=None, stage=None, partition=None, positive_class=None, created=None, advanced_options=None, max_train_pct=None, max_train_rows=None, file_name=None, credentials=None, feature_engineering_prediction_point=None, unsupervised_mode=None, use_feature_discovery=None, relationships_configuration_id=None, project_description=None, query_generator_id=None, segmentation=None, partitioning_method=None, catalog_id=None, catalog_version_id=None, use_gpu=None)¶

A project built from a particular training dataset

Attributes

idstr: the id of the project
project_namestr: the name of the project
project_descriptionstr: an optional description for the project
modeint: The current autopilot mode. 0: Full Autopilot. 2: Manual Mode. 4: Comprehensive Autopilot. null: Mode not set.
targetstr: the name of the selected target features
target_typestr: Indicating what kind of modeling is being done in this project Options are: ‘Regression’, ‘Binary’ (Binary classification), ‘Multiclass’ (Multiclass classification), ‘Multilabel’ (Multilabel classification)
holdout_unlockedbool: whether the holdout has been unlocked
metricstr: the selected project metric (e.g. LogLoss)
stagestr: the stage the project has reached - one of datarobot.enums.PROJECT_STAGE
partitiondict: information about the selected partitioning options
positive_classstr: for binary classification projects, the selected positive class; otherwise, None
createddatetime: the time the project was created
advanced_optionsAdvancedOptions: information on the advanced options that were selected for the project settings, e.g. a weights column or a cap of the runtime of models that can advance autopilot stages
max_train_pctfloat: The maximum percentage of the project dataset that can be used without going into the validation data or being too large to submit any blueprint for training
max_train_rowsint: the maximum number of rows that can be trained on without going into the validation data or being too large to submit any blueprint for training
file_namestr: The name of the file uploaded for the project dataset
credentialslist, optional: A list of credentials for the datasets used in relationship configuration (previously graphs). For Feature Discovery projects, the list must be formatted in dictionary record format. Provide the catalogVersionId and credentialId for each dataset that is to be used in the project that requires authentication.
feature_engineering_prediction_pointstr, optional: For time-aware Feature Engineering, this parameter specifies the column from the primary dataset to use as the prediction point.
unsupervised_modebool, optional: (New in version v2.20) defaults to False, indicates whether this is an unsupervised project.
relationships_configuration_idstr, optional: (New in version v2.21) id of the relationships configuration to use
query_generator_id: str, optional: (New in version v2.27) id of the query generator applied for time series data prep
segmentationdict, optional: information on the segmentation options for segmented project
partitioning_methodPartitioningMethod, optional: (New in version v3.0) The partitioning class for this project. This attribute should only be used with newly-created projects and before calling Project.analyze_and_model(). After the project has been aimed, see Project.partition for actual partitioning options.
catalog_idstr: (New in version v3.0) ID of the dataset used during creation of the project.
catalog_version_idstr: (New in version v3.0) The object ID of the catalog_version which the project’s dataset belongs to.
use_gpu: bool: (New in version v3.2) Whether project allows usage of GPUs

set_options(options=None, **kwargs)¶

Update the advanced options of this project.

Either accepts an AdvancedOptions object or indiviudal keyword arguments. This is an inplace update.

Raises

ValueError: Raised if an object passed to the options parameter is not an AdvancedOptions instance, a valid keyword argument from the AdvancedOptions class, or a combination of an AdvancedOptions instance AND keyword arguments.

Return type: None

get_options()¶

Return the stored advanced options for this project.

Returns

AdvancedOptions

Return type: AdvancedOptions

classmethod get(project_id)¶

Gets information about a project.

Parameters

project_idstr: The identifier of the project you want to load.

Returns

projectProject: The queried project

Examples

import datarobot as dr
p = dr.Project.get(project_id='54e639a18bd88f08078ca831')
p.id
>>>'54e639a18bd88f08078ca831'
p.project_name
>>>'Some project name'

Return type: TypeVar(TProject, bound= Project)

classmethod create(cls, sourcedata, project_name='Untitled Project', max_wait=600, read_timeout=600, dataset_filename=None, *, use_case=None)¶

Creates a project with provided data.

Project creation is asynchronous process, which means that after initial request we will keep polling status of async process that is responsible for project creation until it’s finished. For SDK users this only means that this method might raise exceptions related to it’s async nature.

Parameters

sourcedatabasestring, file, pathlib.Path or pandas.DataFrame: Dataset to use for the project. If string can be either a path to a local file, url to publicly available file or raw file content. If using a file, the filename must consist of ASCII characters only.
project_namestr, unicode, optional: The name to assign to the empty project.
max_waitint, optional: Time in seconds after which project creation is considered unsuccessful
read_timeout: int: The maximum number of seconds to wait for the server to respond indicating that the initial upload is complete
dataset_filenamestring or None, optional: (New in version v2.14) File name to use for dataset. Ignored for url and file path sources.
use_case: UseCase | string, optional: A single UseCase object or ID to add this new Project to. Must be a kwarg.

Returns

projectProject: Instance with initialized data.

Raises

InputNotUnderstoodError: Raised if sourcedata isn’t one of supported types.
AsyncFailureError: Polling for status of async process resulted in response with unsupported status code. Beginning in version 2.1, this will be ProjectAsyncFailureError, a subclass of AsyncFailureError
AsyncProcessUnsuccessfulError: Raised if project creation was unsuccessful
AsyncTimeoutError: Raised if project creation took more time, than specified by max_wait parameter

Examples

p = Project.create('/home/datasets/somedataset.csv',
                   project_name="New API project")
p.id
>>> '5921731dkqshda8yd28h'
p.project_name
>>> 'New API project'

Return type: TypeVar(TProject, bound= Project)

classmethod encrypted_string(plaintext)¶

Sends a string to DataRobot to be encrypted

This is used for passwords that DataRobot uses to access external data sources

Parameters

plaintextstr: The string to encrypt

Returns

ciphertextstr: The encrypted string

Return type: str

classmethod create_from_hdfs(cls, url, port=None, project_name=None, max_wait=600)¶

Create a project from a datasource on a WebHDFS server.

Parameters

urlstr: The location of the WebHDFS file, both server and full path. Per the DataRobot specification, must begin with hdfs://, e.g. hdfs:///tmp/10kDiabetes.csv
portint, optional: The port to use. If not specified, will default to the server default (50070)
project_namestr, optional: A name to give to the project
max_waitint: The maximum number of seconds to wait before giving up.

Returns

Project

Examples

p = Project.create_from_hdfs('hdfs:///tmp/somedataset.csv',
                             project_name="New API project")
p.id
>>> '5921731dkqshda8yd28h'
p.project_name
>>> 'New API project'

classmethod create_from_data_source(cls, data_source_id, username=None, password=None, credential_id=None, use_kerberos=None, credential_data=None, project_name=None, max_wait=600, *, use_case=None)¶

Create a project from a data source. Either data_source or data_source_id should be specified.

Parameters

data_source_idstr: the identifier of the data source.
usernamestr, optional: The username for database authentication. If supplied password must also be supplied.
passwordstr, optional: The password for database authentication. The password is encrypted at server side and never saved / stored. If supplied username must also be supplied.
credential_id: str, optional: The ID of the set of credentials to use instead of user and password. Note that with this change, username and password will become optional.
use_kerberos: bool, optional: Server default is False. If true, use kerberos authentication for database authentication.
credential_data: dict, optional: The credentials to authenticate with the database, to use instead of user/password or credential ID.
project_namestr, optional: optional, a name to give to the project.
max_waitint: optional, the maximum number of seconds to wait before giving up.
use_case: UseCase | string, optional: A single UseCase object or ID to add this new Project to. Must be a kwarg.

Returns

Project

Raises

InvalidUsageError: Raised if either username or password is passed without the other.

classmethod create_from_dataset(cls, dataset_id, dataset_version_id=None, project_name=None, user=None, password=None, credential_id=None, use_kerberos=None, credential_data=None, max_wait=600, *, use_case=None)¶

Create a Project from a datarobot.models.Dataset

Parameters

dataset_id: string: The ID of the dataset entry to user for the project’s Dataset
dataset_version_id: string, optional: The ID of the dataset version to use for the project dataset. If not specified - uses latest version associated with dataset_id
project_name: string, optional: The name of the project to be created. If not specified, will be “Untitled Project” for database connections, otherwise the project name will be based on the file used.
user: string, optional: The username for database authentication.
password: string, optional: The password (in cleartext) for database authentication. The password will be encrypted on the server side in scope of HTTP request and never saved or stored
credential_id: string, optional: The ID of the set of credentials to use instead of user and password.
use_kerberos: bool, optional: Server default is False. If true, use kerberos authentication for database authentication.
credential_data: dict, optional: The credentials to authenticate with the database, to use instead of user/password or credential ID.
max_wait: int: optional, the maximum number of seconds to wait before giving up.
use_case: UseCase | string, optional: A single UseCase object or ID to add this new Project to. Must be a kwarg.

Returns

Project

Return type: TypeVar(TProject, bound= Project)

classmethod create_segmented_project_from_clustering_model(cls, clustering_project_id, clustering_model_id, target, max_wait=600, *, use_case=None)¶

Create a new segmented project from a clustering model

Parameters

clustering_project_idstr: The identifier of the clustering project you want to use as the base.
clustering_model_idstr: The identifier of the clustering model you want to use as the segmentation method.
targetstr: The name of the target column that will be used from the clustering project.
max_wait: int: optional, the maximum number of seconds to wait before giving up.
use_case: UseCase | string, optional: A single UseCase object or ID to add this new Project to. Must be a kwarg.

Returns

projectProject: The created project

Return type: TypeVar(TProject, bound= Project)

classmethod from_async(async_location, max_wait=600)¶

Given a temporary async status location poll for no more than max_wait seconds until the async process (project creation or setting the target, for example) finishes successfully, then return the ready project

Parameters

async_locationstr: The URL for the temporary async status resource. This is returned as a header in the response to a request that initiates an async process
max_waitint: The maximum number of seconds to wait before giving up.

Returns

projectProject: The project, now ready

Raises

ProjectAsyncFailureError: If the server returned an unexpected response while polling for the asynchronous operation to resolve
AsyncProcessUnsuccessfulError: If the final result of the asynchronous operation was a failure
AsyncTimeoutError: If the asynchronous operation did not resolve within the time specified

Return type: TypeVar(TProject, bound= Project)

classmethod start(cls, sourcedata, target=None, project_name='Untitled Project', worker_count=None, metric=None, autopilot_on=True, blueprint_threshold=None, response_cap=None, partitioning_method=None, positive_class=None, target_type=None, unsupervised_mode=False, blend_best_models=None, prepare_model_for_deployment=None, consider_blenders_in_recommendation=None, scoring_code_only=None, min_secondary_validation_model_count=None, shap_only_mode=None, relationships_configuration_id=None, autopilot_with_feature_discovery=None, feature_discovery_supervised_feature_reduction=None, unsupervised_type=None, autopilot_cluster_list=None, bias_mitigation_feature_name=None, bias_mitigation_technique=None, include_bias_mitigation_feature_as_predictor_variable=None, *, use_case=None)¶

Chain together project creation, file upload, and target selection.

Note

While this function provides a simple means to get started, it does not expose all possible parameters. For advanced usage, using create, set_options and analyze_and_model directly is recommended.

Parameters

sourcedatastr or pandas.DataFrame: The path to the file to upload. Can be either a path to a local file or a publicly accessible URL (starting with http://, https://, file://, or s3://). If the source is a DataFrame, it will be serialized to a temporary buffer. If using a file, the filename must consist of ASCII characters only.
targetstr, optional: The name of the target column in the uploaded file. Should not be provided if unsupervised_mode is True.
project_namestr: The project name.

Returns

projectProject: The newly created and initialized project.

Other Parameters

worker_countint, optional: The number of workers that you want to allocate to this project.
metricstr, optional: The name of metric to use.
autopilot_onboolean, default True: Whether or not to begin modeling automatically.
blueprint_thresholdint, optional: Number of hours the model is permitted to run. Minimum 1
response_capfloat, optional: Quantile of the response distribution to use for response capping Must be in range 0.5 .. 1.0
partitioning_methodPartitioningMethod object, optional: Instance of one of the Partition Classes defined in datarobot.helpers.partitioning_methods. As an alternative, use Project.set_partitioning_method or Project.set_datetime_partitioning to set the partitioning for the project.
positive_classstr, float, or int; optional: Specifies a level of the target column that should be treated as the positive class for binary classification. May only be specified for binary classification targets.
target_typestr, optional: Override the automatically selected target_type. An example usage would be setting the target_type=’Multiclass’ when you want to preform a multiclass classification task on a numeric column that has a low cardinality. You can use TARGET_TYPE enum.
unsupervised_modeboolean, default False: Specifies whether to create an unsupervised project.
blend_best_models: bool, optional: blend best models during Autopilot run
scoring_code_only: bool, optional: Keep only models that can be converted to scorable java code during Autopilot run.
shap_only_mode: bool, optional: Keep only models that support SHAP values during Autopilot run. Use SHAP-based insights wherever possible. Defaults to False.
prepare_model_for_deployment: bool, optional: Prepare model for deployment during Autopilot run. The preparation includes creating reduced feature list models, retraining best model on higher sample size, computing insights and assigning “RECOMMENDED FOR DEPLOYMENT” label.
consider_blenders_in_recommendation: bool, optional: Include blenders when selecting a model to prepare for deployment in an Autopilot Run. Defaults to False.
min_secondary_validation_model_count: int, optional: Compute “All backtest” scores (datetime models) or cross validation scores for the specified number of highest ranking models on the Leaderboard, if over the Autopilot default.
relationships_configuration_idstr, optional: (New in version v2.23) id of the relationships configuration to use
autopilot_with_feature_discovery: bool, optional.: (New in version v2.23) If true, autopilot will run on a feature list that includes features found via search for interactions.
feature_discovery_supervised_feature_reduction: bool, optional: (New in version v2.23) Run supervised feature reduction for feature discovery projects.
unsupervised_typeUnsupervisedTypeEnum, optional: (New in version v2.27) Specifies whether an unsupervised project is anomaly detection or clustering.
autopilot_cluster_listlist(int), optional: (New in version v2.27) Specifies the list of clusters to build for each model during Autopilot. Specifying multiple values in a list will build models with each number of clusters for the Leaderboard.
bias_mitigation_feature_namestr, optional: The feature from protected features that will be used in a bias mitigation task to mitigate bias
bias_mitigation_techniquestr, optional: One of datarobot.enums.BiasMitigationTechnique Options: - ‘preprocessingReweighing’ - ‘postProcessingRejectionOptionBasedClassification’ The technique by which we’ll mitigate bias, which will inform which bias mitigation task we insert into blueprints
include_bias_mitigation_feature_as_predictor_variablebool, optional: Whether we should also use the mitigation feature as in input to the modeler just like any other categorical used for training, i.e. do we want the model to “train on” this feature in addition to using it for bias mitigation
use_case: UseCase | string, optional: A single UseCase object or ID to add this new Project to. Must be a kwarg.

Raises

AsyncFailureError: Polling for status of async process resulted in response with unsupported status code
AsyncProcessUnsuccessfulError: Raised if project creation or target setting was unsuccessful
AsyncTimeoutError: Raised if project creation or target setting timed out

Examples

Project.start("./tests/fixtures/file.csv",
              "a_target",
              project_name="test_name",
              worker_count=4,
              metric="a_metric")

This is an example of using a URL to specify the datasource:

Project.start("https://example.com/data/file.csv",
              "a_target",
              project_name="test_name",
              worker_count=4,
              metric="a_metric")

Return type: TypeVar(TProject, bound= Project)

classmethod list(search_params=None, use_cases=None, offset=None, limit=None)¶

Returns the projects associated with this account.

Parameters

search_paramsdict, optional.

If not None, the returned projects are filtered by lookup. Currently you can query projects by:

project_name

use_casesUnion[UseCase, List[UseCase], str, List[str]], optional.

If not None, the returned projects are filtered to those associated with a specific Use Case or Use Cases. Accepts either the entity or the ID.

offsetint, optional

If provided, specifies the number of results to skip.

limitint, optional

If provided, specifies the maximum number of results to return. If not provided, returns a maximum of 1000 results.

Returns

projectslist of Project instances: Contains a list of projects associated with this user account.

Raises

TypeError: Raised if search_params parameter is provided, but is not of supported type.

Examples

List all projects .. code-block:: python

p_list = Project.list() p_list >>> [Project(‘Project One’), Project(‘Two’)]

Search for projects by name .. code-block:: python

Project.list(search_params={‘project_name’: ‘red’}) >>> [Project(‘Predtime’), Project(‘Fred Project’)]

List 2nd and 3rd projects .. code-block:: python

Project.list(offset=1, limit=2) >>> [Project(‘Project 2’), Project(‘Project 3’)]

Return type: List[Project]

refresh()¶

Fetches the latest state of the project, and updates this object with that information. This is an in place update, not a new object.

Returns

selfProject: the now-updated project

Return type: None

delete()¶

Removes this project from your account.

Return type: None

analyze_and_model(target=None, mode='quick', metric=None, worker_count=None, positive_class=None, partitioning_method=None, featurelist_id=None, advanced_options=None, max_wait=600, target_type=None, credentials=None, feature_engineering_prediction_point=None, unsupervised_mode=False, relationships_configuration_id=None, class_mapping_aggregation_settings=None, segmentation_task_id=None, unsupervised_type=None, autopilot_cluster_list=None, use_gpu=None)¶

Set target variable of an existing project and begin the autopilot process or send data to DataRobot for feature analysis only if manual mode is specified.

Any options saved using set_options will be used if nothing is passed to advanced_options. However, saved options will be ignored if advanced_options are passed.

Target setting is an asynchronous process, which means that after initial request we will keep polling status of async process that is responsible for target setting until it’s finished. For SDK users this only means that this method might raise exceptions related to it’s async nature.

When execution returns to the caller, the autopilot process will already have commenced (again, unless manual mode is specified).

Parameters

targetstr, optional

The name of the target column in the uploaded file. Should not be provided if unsupervised_mode is True.

modestr, optional

You can use AUTOPILOT_MODE enum to choose between

AUTOPILOT_MODE.FULL_AUTO
AUTOPILOT_MODE.MANUAL
AUTOPILOT_MODE.QUICK
AUTOPILOT_MODE.COMPREHENSIVE: Runs all blueprints in the repository (warning: this may be extremely slow).

If unspecified, QUICK is used. If the MANUAL value is used, the model creation process will need to be started by executing the start_autopilot function with the desired featurelist. It will start immediately otherwise.

metricstr, optional

Name of the metric to use for evaluating models. You can query the metrics available for the target by way of Project.get_metrics. If none is specified, then the default recommended by DataRobot is used.

worker_countint, optional

The number of concurrent workers to request for this project. If None, then the default is used. (New in version v2.14) Setting this to -1 will request the maximum number available to your account.

partitioning_methodPartitioningMethod object, optional

Instance of one of the Partition Classes defined in datarobot.helpers.partitioning_methods. As an alternative, use Project.set_partitioning_method or Project.set_datetime_partitioning to set the partitioning for the project.

positive_classstr, float, or int; optional

Specifies a level of the target column that should be treated as the positive class for binary classification. May only be specified for binary classification targets.

featurelist_idstr, optional

Specifies which feature list to use.

advanced_optionsAdvancedOptions, optional

Used to set advanced options of project creation. Will override any options saved using set_options.

max_waitint, optional

Time in seconds after which target setting is considered unsuccessful.

target_typestr, optional

Override the automatically selected target_type. An example usage would be setting the target_type=’Multiclass’ when you want to preform a multiclass classification task on a numeric column that has a low cardinality. You can use TARGET_TYPE enum.

credentials: list, optional,

a list of credentials for the datasets used in relationship configuration (previously graphs).

feature_engineering_prediction_pointstr, optional

additional aim parameter.

unsupervised_modeboolean, default False

(New in version v2.20) Specifies whether to create an unsupervised project. If True, target may not be provided.

relationships_configuration_idstr, optional

(New in version v2.21) ID of the relationships configuration to use.

segmentation_task_idstr or SegmentationTask, optional

(New in version v2.28) The segmentation task that should be used to split the project for segmented modeling.

unsupervised_typeUnsupervisedTypeEnum, optional

(New in version v2.27) Specifies whether an unsupervised project is anomaly detection or clustering.

autopilot_cluster_listlist(int), optional

(New in version v2.27) Specifies the list of clusters to build for each model during Autopilot. Specifying multiple values in a list will build models with each number of clusters for the Leaderboard.

use_gpubool, optional

(New in version v3.2) Specifies whether project should use GPUs

Returns

projectProject: The instance with updated attributes.

Raises

AsyncFailureError: Polling for status of async process resulted in response with unsupported status code
AsyncProcessUnsuccessfulError: Raised if target setting was unsuccessful
AsyncTimeoutError: Raised if target setting took more time, than specified by max_wait parameter
TypeError: Raised if advanced_options, partitioning_method or target_type is provided, but is not of supported type

Rating Table¶

class datarobot.models.RatingTable(id, rating_table_name, original_filename, project_id, parent_model_id, model_id=None, model_job_id=None, validation_job_id=None, validation_error=None)¶

Interface to modify and download rating tables.

Attributes

idstr: The id of the rating table.
project_idstr: The id of the project this rating table belongs to.
rating_table_namestr: The name of the rating table.
original_filenamestr: The name of the file used to create the rating table.
parent_model_idstr: The model id of the model the rating table was validated against.
model_idstr: The model id of the model that was created from the rating table. Can be None if a model has not been created from the rating table.
model_job_idstr: The id of the job to create a model from this rating table. Can be None if a model has not been created from the rating table.
validation_job_idstr: The id of the created job to validate the rating table. Can be None if the rating table has not been validated.
validation_errorstr: Contains a description of any errors caused during validation.

classmethod get(project_id, rating_table_id)¶

Retrieve a single rating table

Parameters

project_idstr: The ID of the project the rating table is associated with.
rating_table_idstr: The ID of the rating table

Returns

rating_tableRatingTable: The queried instance

Return type: RatingTable

classmethod create(project_id, parent_model_id, filename, rating_table_name='Uploaded Rating Table')¶

Uploads and validates a new rating table CSV

Parameters

project_idstr: id of the project the rating table belongs to
parent_model_idstr: id of the model for which this rating table should be validated against
filenamestr: The path of the CSV file containing the modified rating table.
rating_table_namestr, optional: A human friendly name for the new rating table. The string may be truncated and a suffix may be added to maintain unique names of all rating tables.

Returns

job: Job: an instance of created async job

Raises

InputNotUnderstoodError: Raised if filename isn’t one of supported types.
ClientError (400): Raised if parent_model_id is invalid.

Return type: Job

download(filepath)¶

Download a csv file containing the contents of this rating table

Parameters

filepathstr: The path at which to save the rating table file.

Return type: None

rename(rating_table_name)¶

Renames a rating table to a different name.

Parameters

rating_table_namestr: The new name to rename the rating table to.

Return type: None

create_model()¶

Creates a new model from this rating table record. This rating table must not already be associated with a model and must be valid.

Returns

job: Job: an instance of created async job

Raises

ClientError (422): Raised if creating model from a RatingTable that failed validation
JobAlreadyRequested: Raised if creating model from a RatingTable that is already associated with a RatingTableModel

Return type: Job

Recommended Models¶

class datarobot.models.ModelRecommendation(project_id, model_id, recommendation_type)¶

A collection of information about a recommended model for a project.

Attributes

project_idstr: the id of the project the model belongs to
model_idstr: the id of the recommended model
recommendation_typestr: the type of model recommendation

classmethod get(project_id, recommendation_type=None)¶

Retrieves the default or specified by recommendation_type recommendation.

Parameters

project_idstr: The project’s id.
recommendation_typeenums.RECOMMENDED_MODEL_TYPE: The type of recommendation to get. If None, returns the default recommendation.

Returns

recommended_modelModelRecommendation

Return type: Optional[ModelRecommendation]

classmethod get_all(project_id)¶

Retrieves all of the current recommended models for the project.

Parameters

project_idstr: The project’s id.

Returns

recommended_modelslist of ModelRecommendation

Return type: List[ModelRecommendation]

classmethod get_recommendation(recommended_models, recommendation_type)¶

Returns the model in the given list with the requested type.

Parameters

recommended_modelslist of ModelRecommendation
recommendation_typeenums.RECOMMENDED_MODEL_TYPE: the type of model to extract from the recommended_models list

Returns

recommended_modelModelRecommendation or None if no model with the requested type exists

Return type: Optional[ModelRecommendation]

get_model()¶

Returns the Model associated with this ModelRecommendation.

Returns

recommended_modelModel or DatetimeModel if the project is datetime-partitioned

Return type: Union[DatetimeModel, Model]

Registered Model¶

class datarobot.models.RegisteredModel(id, name, description, created_at, modified_at, target, created_by, last_version_num, is_archived, modified_by=None)¶

A registered model is a logical grouping of model packages that are related to each other.

Attributes

idstr: The ID of the registered model.
namestr: The name of the registered model.
descriptionstr: The description of the registered model.
created_atstr: The creation time of the registered model.
modified_atstr: The last modification time for the registered model.
modified_bydatarobot.models.model_registry.common.UserMetadata: Information on the user who last modified the registered model.
targetTarget: Information on the target variable.
created_bydatarobot.models.model_registry.common.UserMetadata: Information on the creator of the registered model.
last_version_numint: The latest version number associated to this registered model.
is_archivedbool: Determines whether the registered model is archived.

classmethod get(registered_model_id)¶

Get a registered model by ID.

Parameters

registered_model_idstr: ID of the registered model to retrieve

Returns

registered_modelRegisteredModel: Registered Model Object

Examples

from datarobot import RegisteredModel
registered_model = RegisteredModel.get(registered_model_id='5c939e08962d741e34f609f0')
registered_model.id
>>>'5c939e08962d741e34f609f0'
registered_model.name
>>>'My Registered Model'

Return type: TypeVar(TRegisteredModel, bound= RegisteredModel)

classmethod list(limit=100, offset=None, sort_key=None, sort_direction=None, search=None, filters=None)¶

List all registered models a user can view.

Parameters

limitint, optional: Maximum number of registered models to return
offsetint, optional: Number of registered models to skip before returning results
sort_keyRegisteredModelSortKey, optional: Key to order result by
sort_directionRegisteredModelSortDirection, optional: Sort direction
searchstr, optional: A term to search for in registered model name, description, or target name
filtersRegisteredModelListFilters, optional: An object containing all filters that you’d like to apply to the resulting list of registered models.
Returns
——-
registered_modelsList[RegisteredModel]: A list of registered models user can view.

Examples

from datarobot import RegisteredModel
registered_models = RegisteredModel.list()
>>> [RegisteredModel('My Registered Model'), RegisteredModel('My Other Registered Model')]

from datarobot import RegisteredModel
from datarobot.models.model_registry import RegisteredModelListFilters
from datarobot.enums import RegisteredModelSortKey, RegisteredModelSortDirection
filters = RegisteredModelListFilters(target_type='Regression')
registered_models = RegisteredModel.list(
    filters=filters,
    sort_key=RegisteredModelSortKey.NAME.value,
    sort_direction=RegisteredModelSortDirection.DESC.value
    search='other')
>>> [RegisteredModel('My Other Registered Model')]

Return type: List[TypeVar(TRegisteredModel, bound= RegisteredModel)]

classmethod archive(registered_model_id)¶

Permanently archive a registered model and all of its versions.

Parameters

registered_model_idstr: ID of the registered model to be archived

Returns

Return type: None

classmethod update(registered_model_id, name)¶

Update the name of a registered model.

Parameters

registered_model_idstr: ID of the registered model to be updated
namestr: New name for the registered model

Returns

registered_modelRegisteredModel: Updated registered model object

Return type: TypeVar(TRegisteredModel, bound= RegisteredModel)

get_shared_roles(offset=None, limit=None, id=None)¶

Retrieve access control information for this registered model.

Parameters

offsetOptional[int]: The number of records to skip over. Optional. Default is 0.
limit: Optional[int]: The number of records to return. Optional. Default is 100.
id: Optional[str]: Return the access control information for a user with this user ID. Optional.

Return type: List[SharingRole]

share(roles)¶

Share this registered model or remove access from one or more user(s).

Parameters

rolesList[SharingRole]: A list of SharingRole instances, each of which references a user and a role to be assigned.

Examples

>>> from datarobot import RegisteredModel, SharingRole
>>> from datarobot.enums import SHARING_ROLE, SHARING_RECIPIENT_TYPE
>>> registered_model = RegisteredModel.get('5c939e08962d741e34f609f0')
>>> sharing_role = SharingRole(
...    role=SHARING_ROLE.CONSUMER,
...    recipient_type=SHARING_RECIPIENT_TYPE.USER,
...    id='5c939e08962d741e34f609f0',
...    can_share=True,
...    )
>>> registered_model.share(roles=[sharing_role])

Return type: None

get_version(version_id)¶

Retrieve a registered model version.

Parameters

version_idstr: The ID of the registered model version to retrieve.

Returns

registered_model_versionRegisteredModelVersion: A registered model version object.

Examples

from datarobot import RegisteredModel
registered_model = RegisteredModel.get('5c939e08962d741e34f609f0')
registered_model_version = registered_model.get_version('5c939e08962d741e34f609f0')
>>> RegisteredModelVersion('My Registered Model Version')

Return type: RegisteredModelVersion

list_versions(filters=None, search=None, sort_key=None, sort_direction=None, limit=None, offset=None)¶

Retrieve a list of registered model versions.

Parameters

filtersOptional[RegisteredModelVersionsListFilters]: A RegisteredModelVersionsListFilters instance used to filter the list of registered model versions returned.
searchOptional[str]: A search string used to filter the list of registered model versions returned.
sort_keyOptional[RegisteredModelVersionSortKey]: The key to use to sort the list of registered model versions returned.
sort_directionOptional[RegisteredModelSortDirection]: The direction to use to sort the list of registered model versions returned.
limitOptional[int]: The maximum number of registered model versions to return. Default is 100.
offsetOptional[int]: The number of registered model versions to skip over. Default is 0.

Returns

registered_model_versionsList[RegisteredModelVersion]: A list of registered model version objects.

Examples

from datarobot import RegisteredModel
from datarobot.models.model_registry import RegisteredModelVersionsListFilters
from datarobot.enums import RegisteredModelSortKey, RegisteredModelSortDirection
registered_model = RegisteredModel.get('5c939e08962d741e34f609f0')
filters = RegisteredModelVersionsListFilters(tags=['tag1', 'tag2'])
registered_model_versions = registered_model.list_versions(filters=filters)
>>> [RegisteredModelVersion('My Registered Model Version')]

Return type: List[RegisteredModelVersion]

list_associated_deployments(search=None, sort_key=None, sort_direction=None, limit=None, offset=None)¶

Retrieve a list of deployments associated with this registered model.

Parameters

searchOptional[str]
sort_keyOptional[RegisteredModelDeploymentSortKey]
sort_directionOptional[RegisteredModelSortDirection]
limitOptional[int]
offsetOptional[int]

Returns

deploymentsList[VersionAssociatedDeployment]: A list of deployments associated with this registered model.

Return type: List[VersionAssociatedDeployment]

class datarobot.models.RegisteredModelVersion(id, registered_model_id, registered_model_version, name, model_id, model_execution_type, is_archived, import_meta, source_meta, model_kind, target, model_description, datasets, timeseries, is_deprecated, permissions, active_deployment_count, bias_and_fairness=None, build_status=None, user_provided_id=None, updated_at=None, updated_by=None, tags=None, mlpkg_file_contents=None)¶

Represents a version of a registered model.

Parameters

idstr

The ID of the registered model version.

registered_model_idstr

The ID of the parent registered model.

registered_model_versionint

The version of the registered model.

namestr

The name of the registered model version.

model_idstr

The ID of the model.

model_execution_typestr

Type of model package. dedicated (native DataRobot models) and custom_inference_model` (user added inference models) both execute on DataRobot prediction servers, external do not

is_archivedbool

Whether the model package(version) is permanently archived (cannot be used in deployment or: replacement)

import_metaImportMeta

Information from when this Model Package was first saved.

source_metaSourceMeta

Meta information from where this model was generated

model_kindModelKind

Model attribute information.

targetTarget

Target information for the registered model version.

model_descriptionModelDescription

Model description information.

datasetsDataset

Dataset information for the registered model version.

timeseriesTimeseries

Timeseries information for the registered model version.

bias_and_fairnessBiasAndFairness

Bias and fairness information for the registered model version.

is_deprecatedbool

Whether the model package(version) is deprecated (cannot be used in deployment or: replacement)

permissionsList[str]

Permissions for the registered model version.

active_deployment_countint or None

Number of the active deployments associated with the registered model version.

build_statusstr or None

Model package build status. One of complete, inProgress, failed.

user_provided_idstr or None

User provided ID for the registered model version.

updated_atstr or None

The time the registered model version was last updated.

updated_byUserMetadata or None

The user who last updated the registered model version.

tagsList[TagWithId] or None

The tags associated with the registered model version.

mlpkg_file_contentsstr or None

The contents of the model package file.

classmethod create_for_leaderboard_item(model_id, name=None, prediction_threshold=None, distribution_prediction_model_id=None, description=None, compute_all_ts_intervals=None, registered_model_name=None, registered_model_id=None, tags=None, registered_model_tags=None, registered_model_description=None)¶

Parameters

model_idstr: ID of the DataRobot model.
namestr or None: Name of the version (model package).
prediction_thresholdfloat or None: Threshold used for binary classification in predictions.
distribution_prediction_model_idstr or None: ID of the DataRobot distribution prediction model trained on predictions from the DataRobot model.
descriptionstr or None: Description of the version (model package).
compute_all_ts_intervalsbool or None: Whether to compute all time series prediction intervals (1-100 percentiles).
registered_model_nameOptional[str]: Name of the new registered model that will be created from this model package. The model package will be created as version 1 of the created registered model. If neither registeredModelName nor registeredModelId is provided, it defaults to the model package name. Mutually exclusive with registeredModelId.
registered_model_idOptional[str]: Creates a model package as a new version for the provided registered model ID. Mutually exclusive with registeredModelName.
tagsOptional[List[Tag]]: Tags for the registered model version.
registered_model_tags: Optional[List[Tag]]: Tags for the registered model.
registered_model_description: Optional[str]: Description for the registered model.

Returns

regitered_model_versionRegisteredModelVersion: A new registered model version object.

Return type: TypeVar(TRegisteredModelVersion, bound= RegisteredModelVersion)

classmethod create_for_external(name, target, model_id=None, model_description=None, datasets=None, timeseries=None, registered_model_name=None, registered_model_id=None, tags=None, registered_model_tags=None, registered_model_description=None)¶

Create a new registered model version from an external model.

Parameters

namestr: Name of the registered model version.
targetExternalTarget: Target information for the registered model version.
model_idOptional[str]: Model ID of the registered model version.
model_descriptionOptional[ModelDescription]: Information about the model.
datasetsOptional[ExternalDatasets]: Dataset information for the registered model version.
timeseriesOptional[Timeseries]: Timeseries properties for the registered model version.
registered_model_nameOptional[str]: Name of the new registered model that will be created from this model package. The model package will be created as version 1 of the created registered model. If neither registeredModelName nor registeredModelId is provided, it defaults to the model package name. Mutually exclusive with registeredModelId.
registered_model_idOptional[str]: Creates a model package as a new version for the provided registered model ID. Mutually exclusive with registeredModelName.
tagsOptional[List[Tag]]: Tags for the registered model version.
registered_model_tags: Optional[List[Tag]]: Tags for the registered model.
registered_model_description: Optional[str]: Description for the registered model.

Returns

registered_model_versionRegisteredModelVersion: A new registered model version object.

Return type: TypeVar(TRegisteredModelVersion, bound= RegisteredModelVersion)

classmethod create_for_custom_model_version(custom_model_version_id, name=None, description=None, registered_model_name=None, registered_model_id=None, tags=None, registered_model_tags=None, registered_model_description=None)¶

Create a new registered model version from a custom model version.

Parameters

custom_model_version_idstr: ID of the custom model version.
nameOptional[str]: Name of the registered model version.
descriptionOptional[str]: Description of the registered model version.
registered_model_nameOptional[str]: Name of the new registered model that will be created from this model package. The model package will be created as version 1 of the created registered model. If neither registeredModelName nor registeredModelId is provided, it defaults to the model package name. Mutually exclusive with registeredModelId.
registered_model_idOptional[str]: Creates a model package as a new version for the provided registered model ID. Mutually exclusive with registeredModelName.
tagsOptional[List[Tag]]: Tags for the registered model version.
registered_model_tags: Optional[List[Tag]]: Tags for the registered model.
registered_model_description: Optional[str]: Description for the registered model.

Returns

registered_model_versionRegisteredModelVersion: A new registered model version object.

Return type: TypeVar(TRegisteredModelVersion, bound= RegisteredModelVersion)

list_associated_deployments(search=None, sort_key=None, sort_direction=None, limit=None, offset=None)¶

Retrieve a list of deployments associated with this registered model version.

Parameters

searchOptional[str]
sort_keyOptional[RegisteredModelDeploymentSortKey]
sort_directionOptional[RegisteredModelSortDirection]
limitOptional[int]
offsetOptional[int]

Returns

deploymentsList[VersionAssociatedDeployment]: A list of deployments associated with this registered model version.

Return type: List[VersionAssociatedDeployment]

class datarobot.models.model_registry.deployment.VersionAssociatedDeployment(id, currently_deployed, registered_model_version, is_challenger, status, label=None, first_deployed_at=None, first_deployed_by=None, created_by=None, prediction_environment=None)¶

Represents a deployment associated with a registered model version.

Parameters

idstr: The ID of the deployment.
currently_deployedbool: Whether this version is currently deployed.
registered_model_versionint: The version of the registered model associated with this deployment.
is_challengerbool: Whether the version associated with this deployment is a challenger.
statusstr: The status of the deployment.
labelstr, optional: The label of the deployment.
first_deployed_atdatetime.datetime, optional: The time the version was first deployed.
first_deployed_byUserMetadata, optional: The user who first deployed the version.
created_byUserMetadata, optional: The user who created the deployment.
prediction_environmentDeploymentPredictionEnvironment, optional: The prediction environment of the deployment.

ROC Curve¶

class datarobot.models.roc_curve.RocCurve(source, roc_points, negative_class_predictions, positive_class_predictions, source_model_id)¶

ROC curve data for model.

Attributes

sourcestr: ROC curve data source. Can be ‘validation’, ‘crossValidation’ or ‘holdout’.
roc_pointslist of dict: List of precalculated metrics associated with thresholds for ROC curve.
negative_class_predictionslist of float: List of predictions from example for negative class
positive_class_predictionslist of float: List of predictions from example for positive class
source_model_idstr: ID of the model this ROC curve represents; in some cases, insights from the parent of a frozen model may be used

classmethod from_server_data(data, keep_attrs=None, use_insights_format=False, **kwargs)¶

Overwrite APIObject.from_server_data to handle roc curve data retrieved from either legacy URL or /insights/ new URL.

Parameters

datadict: The directly translated dict of JSON from the server. No casing fixes have taken place.
keep_attrsiterable: List, set or tuple of the dotted namespace notations for attributes to keep within the object structure even if their values are None
use_insights_formatbool, optional: Whether to repack the data from the format used in the GET /insights/RocCur/ URL to the format used in the legacy URL.

Return type: RocCurve

class datarobot.models.roc_curve.LabelwiseRocCurve(source, roc_points, negative_class_predictions, positive_class_predictions, source_model_id, label, kolmogorov_smirnov_metric, auc)¶

Labelwise ROC curve data for one label and one source.

Attributes

sourcestr: ROC curve data source. Can be ‘validation’, ‘crossValidation’ or ‘holdout’.
roc_pointslist of dict: List of precalculated metrics associated with thresholds for ROC curve.
negative_class_predictionslist of float: List of predictions from example for negative class
positive_class_predictionslist of float: List of predictions from example for positive class
source_model_idstr: ID of the model this ROC curve represents; in some cases, insights from the parent of a frozen model may be used
labelstr: Label name for
kolmogorov_smirnov_metricfloat: Kolmogorov-Smirnov metric value for label
aucfloat: AUC metric value for label

Ruleset¶

class datarobot.models.Ruleset(project_id, parent_model_id, ruleset_id, rule_count, score, model_id=None)¶

Represents an approximation of a model with DataRobot Prime

Attributes

idstr: the id of the ruleset
rule_countint: the number of rules used to approximate the model
scorefloat: the validation score of the approximation
project_idstr: the project the approximation belongs to
parent_model_idstr: the model being approximated
model_idstr or None: the model using this ruleset (if it exists). Will be None if no such model has been trained.

request_model()¶

Request training for a model using this ruleset

Training a model using a ruleset is a necessary prerequisite for being able to download the code for a ruleset.

Returns

job: Job: the job fitting the new Prime model

Return type: Job

Segmented Modeling¶

API Reference for entities used in Segmented Modeling. See dedicated User Guide for examples.

class datarobot.CombinedModel(id=None, project_id=None, segmentation_task_id=None, is_active_combined_model=False)¶

A model from a segmented project. Combination of ordinary models in child segments projects.

Attributes

idstr: the id of the model
project_idstr: the id of the project the model belongs to
segmentation_task_idstr: the id of a segmentation task used in this model
is_active_combined_modelbool: flag indicating if this is the active combined model in segmented project

classmethod get(project_id, combined_model_id)¶

Retrieve combined model

Parameters

project_idstr: The project’s id.
combined_model_idstr: Id of the combined model.

Returns

CombinedModel: The queried combined model.

Return type: CombinedModel

classmethod set_segment_champion(project_id, model_id, clone=False)¶

Update a segment champion in a combined model by setting the model_id that belongs to the child project_id as the champion.

Parameters

project_idstr: The project id for the child model that contains the model id.
model_idstr: Id of the model to mark as the champion
clonebool: (New in version v2.29) optional, defaults to False. Defines if combined model has to be cloned prior to setting champion (champion will be set for new combined model if yes).

Returns

combined_model_idstr: Id of the combined model that was updated

Return type: str

get_segments_info()¶

Retrieve Combined Model segments info

Returns

list[SegmentInfo]: List of segments

Return type: List[SegmentInfo]

get_segments_as_dataframe(encoding='utf-8')¶

Retrieve Combine Models segments as a DataFrame.

Parameters

encodingstr, optional: A string representing the encoding to use in the output csv file. Defaults to ‘utf-8’.

Returns

DataFrame: Combined model segments

Return type: DataFrame

get_segments_as_csv(filename, encoding='utf-8')¶

Save the Combine Models segments to a csv.

Parameters

filenamestr or file object: The path or file object to save the data to.
encodingstr, optional: A string representing the encoding to use in the output csv file. Defaults to ‘utf-8’.

Return type: None

train(sample_pct=None, featurelist_id=None, scoring_type=None, training_row_count=None, monotonic_increasing_featurelist_id=<object object>, monotonic_decreasing_featurelist_id=<object object>)¶

Inherited from Model - CombinedModels cannot be retrained directly

Return type: NoReturn

train_datetime(featurelist_id=None, training_row_count=None, training_duration=None, time_window_sample_pct=None, monotonic_increasing_featurelist_id=<object object>, monotonic_decreasing_featurelist_id=<object object>, use_project_settings=False, sampling_method=None, n_clusters=None)¶

Inherited from Model - CombinedModels cannot be retrained directly

Return type: NoReturn

retrain(sample_pct=None, featurelist_id=None, training_row_count=None, n_clusters=None)¶

Inherited from Model - CombinedModels cannot be retrained directly

Return type: NoReturn

request_frozen_model(sample_pct=None, training_row_count=None)¶

Inherited from Model - CombinedModels cannot be retrained as frozen

Return type: NoReturn

request_frozen_datetime_model(training_row_count=None, training_duration=None, training_start_date=None, training_end_date=None, time_window_sample_pct=None, sampling_method=None)¶

Inherited from Model - CombinedModels cannot be retrained as frozen

Return type: NoReturn

cross_validate()¶

Inherited from Model - CombinedModels cannot request cross validation

Return type: NoReturn

class datarobot.SegmentationTask(id, project_id, name, type, created, segments_count, segments, metadata, data)¶

A Segmentation Task is used for segmenting an existing project into multiple child projects. Each child project (or segment) will be a separate autopilot run. Currently only user defined segmentation is supported.

Example for creating a new SegmentationTask for Time Series segmentation with a user defined id column:

from datarobot import SegmentationTask

# Create the SegmentationTask
segmentation_task_results = SegmentationTask.create(
    project_id=project.id,
    target=target,
    use_time_series=True,
    datetime_partition_column=datetime_partition_column,
    multiseries_id_columns=[multiseries_id_column],
    user_defined_segment_id_columns=[user_defined_segment_id_column]
)

# Retrieve the completed SegmentationTask object from the job results
segmentation_task = segmentation_task_results['completedJobs'][0]

Attributes

idObjectId: The id of the segmentation task.
project_idObjectId: The associated id of the parent project.
typestr: What type of job the segmentation task is associated with, e.g. auto_ml or auto_ts.
createddatetime: The date this segmentation task was created.
segments_countint: The number of segments the segmentation task generated.
segmentslist of strings: The segment names that the segmentation task generated.
metadatadict: List of features that help to identify the parameters used by the segmentation task.
datadict: Optional parameters that are associated with enabled metadata for the segmentation task.

classmethod from_data(data)¶

Instantiate an object of this class using a dict.

Parameters

datadict: Correctly snake_cased keys and their values.

Return type: SegmentationTask

collect_payload()¶

Convert the record to a dictionary

Return type: Dict[str, str]

classmethod create(project_id, target, use_time_series=False, datetime_partition_column=None, multiseries_id_columns=None, user_defined_segment_id_columns=None, max_wait=600, model_package_id=None)¶

Creates segmentation tasks for the project based on the defined parameters.

Parameters

project_idstr: The associated id of the parent project.
targetstr: The column that represents the target in the dataset.
use_time_seriesbool: Whether AutoTS or AutoML segmentations should be generated.
datetime_partition_columnstr or null: Required for Time Series. The name of the column whose values as dates are used to assign a row to a particular partition.
multiseries_id_columnslist of str or null: Required for Time Series. A list of the names of multiseries id columns to define series within the training data. Currently only one multiseries id column is supported.
user_defined_segment_id_columnslist of str or null: Required when using a column for segmentation. A list of the segment id columns to use to define what columns are used to manually segment data. Currently only one user defined segment id column is supported.
model_package_idstr: Required when using automated segmentation. The associated id of the model in the DataRobot Model Registry that will be used to perform automated segmentation on a dataset.
max_waitinteger: The number of seconds to wait

Returns

segmentation_tasksdict: Dictionary containing the numberOfJobs, completedJobs, and failedJobs. completedJobs is a list of SegmentationTask objects, while failed jobs is a list of dictionaries indicating problems with submitted tasks.

Return type: SegmentationTaskCreatedResponse

classmethod list(project_id)¶

List all of the segmentation tasks that have been created for a specific project_id.

Parameters

project_idstr: The id of the parent project

Returns

segmentation_taskslist of SegmentationTask: List of instances with initialized data.

Return type: List[SegmentationTask]

classmethod get(project_id, segmentation_task_id)¶

Retrieve information for a single segmentation task associated with a project_id.

Parameters

project_idstr: The id of the parent project
segmentation_task_idstr: The id of the segmentation task

Returns

segmentation_taskSegmentationTask: Instance with initialized data.

Return type: SegmentationTask

class datarobot.SegmentInfo(project_id, segment, project_stage, project_status_error, autopilot_done, model_count=None, model_id=None)¶

A SegmentInfo is an object containing information about the combined model segments

Attributes

project_idstr: The associated id of the child project.
segmentstr: the name of the segment
project_stagestr: A description of the current stage of the project
project_status_errorstr: Project status error message.
autopilot_donebool: Is autopilot done for the project.
model_countint: Count of trained models in project.
model_idstr: ID of segment champion model.

classmethod list(project_id, model_id)¶

List all of the segments that have been created for a specific project_id.

Parameters

project_idstr: The id of the parent project

Returns

segmentslist of datarobot.models.segmentation.SegmentInfo: List of instances with initialized data.

Return type: List[SegmentInfo]

class datarobot.models.segmentation.SegmentationTask(id, project_id, name, type, created, segments_count, segments, metadata, data)¶

A Segmentation Task is used for segmenting an existing project into multiple child projects. Each child project (or segment) will be a separate autopilot run. Currently only user defined segmentation is supported.

Example for creating a new SegmentationTask for Time Series segmentation with a user defined id column:

from datarobot import SegmentationTask

# Create the SegmentationTask
segmentation_task_results = SegmentationTask.create(
    project_id=project.id,
    target=target,
    use_time_series=True,
    datetime_partition_column=datetime_partition_column,
    multiseries_id_columns=[multiseries_id_column],
    user_defined_segment_id_columns=[user_defined_segment_id_column]
)

# Retrieve the completed SegmentationTask object from the job results
segmentation_task = segmentation_task_results['completedJobs'][0]

Attributes

idObjectId: The id of the segmentation task.
project_idObjectId: The associated id of the parent project.
typestr: What type of job the segmentation task is associated with, e.g. auto_ml or auto_ts.
createddatetime: The date this segmentation task was created.
segments_countint: The number of segments the segmentation task generated.
segmentslist of strings: The segment names that the segmentation task generated.
metadatadict: List of features that help to identify the parameters used by the segmentation task.
datadict: Optional parameters that are associated with enabled metadata for the segmentation task.

classmethod from_data(data)¶

Instantiate an object of this class using a dict.

Parameters

datadict: Correctly snake_cased keys and their values.

Return type: SegmentationTask

collect_payload()¶

Convert the record to a dictionary

Return type: Dict[str, str]

classmethod create(project_id, target, use_time_series=False, datetime_partition_column=None, multiseries_id_columns=None, user_defined_segment_id_columns=None, max_wait=600, model_package_id=None)¶

Creates segmentation tasks for the project based on the defined parameters.

Parameters

project_idstr: The associated id of the parent project.
targetstr: The column that represents the target in the dataset.
use_time_seriesbool: Whether AutoTS or AutoML segmentations should be generated.
datetime_partition_columnstr or null: Required for Time Series. The name of the column whose values as dates are used to assign a row to a particular partition.
multiseries_id_columnslist of str or null: Required for Time Series. A list of the names of multiseries id columns to define series within the training data. Currently only one multiseries id column is supported.
user_defined_segment_id_columnslist of str or null: Required when using a column for segmentation. A list of the segment id columns to use to define what columns are used to manually segment data. Currently only one user defined segment id column is supported.
model_package_idstr: Required when using automated segmentation. The associated id of the model in the DataRobot Model Registry that will be used to perform automated segmentation on a dataset.
max_waitinteger: The number of seconds to wait

Returns

segmentation_tasksdict: Dictionary containing the numberOfJobs, completedJobs, and failedJobs. completedJobs is a list of SegmentationTask objects, while failed jobs is a list of dictionaries indicating problems with submitted tasks.

Return type: SegmentationTaskCreatedResponse

classmethod list(project_id)¶

List all of the segmentation tasks that have been created for a specific project_id.

Parameters

project_idstr: The id of the parent project

Returns

segmentation_taskslist of SegmentationTask: List of instances with initialized data.

Return type: List[SegmentationTask]

classmethod get(project_id, segmentation_task_id)¶

Retrieve information for a single segmentation task associated with a project_id.

Parameters

project_idstr: The id of the parent project
segmentation_task_idstr: The id of the segmentation task

Returns

segmentation_taskSegmentationTask: Instance with initialized data.

Return type: SegmentationTask

class datarobot.models.segmentation.SegmentationTaskCreatedResponse() -> new empty dictionary dict(mapping) -> new dictionary initialized from a mapping object's (key, value) pairs dict(iterable) -> new dictionary initialized as if via: d = {} for k, v in iterable: d[k] = v dict(**kwargs) -> new dictionary initialized with the name=value pairs in the keyword argument list. For example: dict(one=1, two=2)¶

SHAP¶

class datarobot.models.ShapImpact(count, shap_impacts, row_count=None)¶

Represents SHAP impact score for a feature in a model.

New in version v2.21.

Notes

SHAP impact score for a feature has the following structure:

feature_name : (str) the feature name in dataset
impact_normalized : (float) normalized impact score value (largest value is 1)
impact_unnormalized : (float) raw impact score value

Attributes

countint: the number of SHAP Impact object returned
row_count: int or None: the sample size (specified in rows) to use for Shap Impact computation
shap_impactslist: a list which contains SHAP impact scores for top 1000 features used by a model

classmethod create(project_id, model_id, row_count=None)¶

Create SHAP impact for the specified model.

Parameters

project_idstr: id of the project the model belongs to
model_idstr: id of the model to calculate shap impact for
row_countint: the sample size (specified in rows) to use for Feature Impact computation

Returns

jobJob: an instance of created async job

Return type: Job

classmethod get(project_id, model_id)¶

Retrieve SHAP impact scores for features in a model.

Parameters

project_idstr: id of the project the model belongs to
model_idstr: id of the model the SHAP impact is for

Returns

shap_impactShapImpact: The queried instance.

Raises

ClientError (404): If the project or model does not exist or the SHAP impact has not been computed.

Return type: ShapImpact

SharingAccess¶

class datarobot.SharingAccess(username, role, can_share=None, can_use_data=None, user_id=None)¶

Represents metadata about whom a entity (e.g. a data store) has been shared with

New in version v2.14.

Currently DataStores, DataSources, Datasets, Projects (new in version v2.15) and CalendarFiles (new in version 2.15) can be shared.

This class can represent either access that has already been granted, or be used to grant access to additional users.

Attributes

usernamestr: a particular user
rolestr or None: if a string, represents a particular level of access and should be one of datarobot.enums.SHARING_ROLE. For more information on the specific access levels, see the sharing documentation. If None, can be passed to a share function to revoke access for a specific user.
can_sharebool or None: if a bool, indicates whether this user is permitted to further share. When False, the user has access to the entity, but can only revoke their own access but not modify any user’s access role. When True, the user can share with any other user at a access role up to their own. May be None if the SharingAccess was not retrieved from the DataRobot server but intended to be passed into a share function; this will be equivalent to passing True.
can_use_databool or None: if a bool, indicates whether this user should be able to view, download and process data (use to create projects, predictions, etc). For OWNER can_use_data is always True. If role is empty canUseData is ignored.
user_idstr or None: the id of the user

SharingRole¶

class datarobot.models.sharing.SharingRole(role, share_recipient_type, can_share=None, id=None, user_full_name=None, username=None)¶

Represents metadata about a user who has been granted access to an entity. At least one of id or username must be set.

Attributes

idstr or None: The ID of the user.
rolestr: Represents a particular level of access. Should be one of datarobot.enums.SHARING_ROLE.
share_recipient_typeSHARING_RECIPIENT_TYPE: The type of user for the object of the method. Can be user or organization.
user_full_namestr or None: The full name of the user.
usernamestr or None: The username (usually the email) of the user.
can_sharebool or None: Indicates whether this user is permitted to share with other users. When False, the user has access to the entity, but can only revoke their own access. They cannot not modify any user’s access role. When True, the user can share with any other user at an access role up to their own.

Training Predictions¶

class datarobot.models.training_predictions.TrainingPredictionsIterator(client, path, limit=None)¶

Lazily fetches training predictions from DataRobot API in chunks of specified size and then iterates rows from responses as named tuples. Each row represents a training prediction computed for a dataset’s row. Each named tuple has the following structure:

Notes

Each PredictionValue dict contains these keys:

label
describes what this model output corresponds to. For regression projects, it is the name of the target feature. For classification and multiclass projects, it is a label from the target feature.

value
the output of the prediction. For regression projects, it is the predicted value of the target. For classification and multiclass projects, it is the predicted probability that the row belongs to the class identified by the label.

Each PredictionExplanations dictionary contains these keys:

labelstring
describes what output was driven by this prediction explanation. For regression projects, it is the name of the target feature. For classification projects, it is the class whose probability increasing would correspond to a positive strength of this prediction explanation.

featurestring
the name of the feature contributing to the prediction

feature_valueobject
the value the feature took on for this row. The type corresponds to the feature (boolean, integer, number, string)

strengthfloat
algorithm-specific explanation value attributed to feature in this row

ShapMetadata dictionary contains these keys:

shap_remaining_totalfloat
The total of SHAP values for features beyond the max_explanations. This can be identically 0 in all rows, if max_explanations is greater than the number of features and thus all features are returned.

shap_base_valuefloat
the model’s average prediction over the training data. SHAP values are deviations from the base value.

warningsdict or None
SHAP values calculation warnings (e.g. additivity check failures in XGBoost models). Schema described as ShapWarnings.

ShapWarnings dictionary contains these keys:

mismatch_row_countint
the count of rows for which additivity check failed

max_normalized_mismatchfloat
the maximal relative normalized mismatch value

Examples

import datarobot as dr

# Fetch existing training predictions by their id
training_predictions = dr.TrainingPredictions.get(project_id, prediction_id)

# Iterate over predictions
for row in training_predictions.iterate_rows()
    print(row.row_id, row.prediction)

Attributes

row_idint: id of the record in original dataset for which training prediction is calculated
partition_idstr or float: id of the data partition that the row belongs to. “0.0” corresponds to the validation partition or backtest 1.
predictionfloat: the model’s prediction for this data row
prediction_valueslist of dictionaries: an array of dictionaries with a schema described as PredictionValue
timestampstr or None: (New in version v2.11) an ISO string representing the time of the prediction in time series project; may be None for non-time series projects
forecast_pointstr or None: (New in version v2.11) an ISO string representing the point in time used as a basis to generate the predictions in time series project; may be None for non-time series projects
forecast_distancestr or None: (New in version v2.11) how many time steps are between the forecast point and the timestamp in time series project; None for non-time series projects
series_idstr or None: (New in version v2.11) the id of the series in a multiseries project; may be NaN for single series projects; None for non-time series projects
prediction_explanationslist of dict or None: (New in version v2.21) The prediction explanations for each feature. The total elements in the array are bounded by max_explanations and feature count. Only present if prediction explanations were requested. Schema described as PredictionExplanations.
shap_metadatadict or None: (New in version v2.21) The additional information necessary to understand SHAP based prediction explanations. Only present if explanation_algorithm equals datarobot.enums.EXPLANATIONS_ALGORITHM.SHAP was added in compute request. Schema described as ShapMetadata.

class datarobot.models.training_predictions.TrainingPredictions(project_id, prediction_id, model_id=None, data_subset=None, explanation_algorithm=None, max_explanations=None, shap_warnings=None)¶

Represents training predictions metadata and provides access to prediction results.

Notes

Each element in shap_warnings has the following schema:

partition_namestr: the partition used for the prediction record.
valueobject: the warnings related to this partition.

The objects in value are:

mismatch_row_countint: the count of rows for which additivity check failed.
max_normalized_mismatchfloat: the maximal relative normalized mismatch value.

Examples

Compute training predictions for a model on the whole dataset

import datarobot as dr

# Request calculation of training predictions
training_predictions_job = model.request_training_predictions(dr.enums.DATA_SUBSET.ALL)
training_predictions = training_predictions_job.get_result_when_complete()
print('Training predictions {} are ready'.format(training_predictions.prediction_id))

# Iterate over actual predictions
for row in training_predictions.iterate_rows():
    print(row.row_id, row.partition_id, row.prediction)

List all training predictions for a project

import datarobot as dr

# Fetch all training predictions for a project
all_training_predictions = dr.TrainingPredictions.list(project_id)

# Inspect all calculated training predictions
for training_predictions in all_training_predictions:
    print(
        'Prediction {} is made for data subset "{}"'.format(
            training_predictions.prediction_id,
            training_predictions.data_subset,
        )
    )

Retrieve training predictions by id

import datarobot as dr

# Getting training predictions by id
training_predictions = dr.TrainingPredictions.get(project_id, prediction_id)

# Iterate over actual predictions
for row in training_predictions.iterate_rows():
    print(row.row_id, row.partition_id, row.prediction)

Attributes

project_idstr

id of the project the model belongs to

model_idstr

id of the model

prediction_idstr

id of generated predictions

data_subsetdatarobot.enums.DATA_SUBSET

data set definition used to build predictions. Choices are:

datarobot.enums.DATA_SUBSET.ALL
for all data available. Not valid for models in datetime partitioned projects.
datarobot.enums.DATA_SUBSET.VALIDATION_AND_HOLDOUT
for all data except training set. Not valid for models in datetime partitioned projects.
datarobot.enums.DATA_SUBSET.HOLDOUT
for holdout data set only.
datarobot.enums.DATA_SUBSET.ALL_BACKTESTS
for downloading the predictions for all backtest validation folds. Requires the model to have successfully scored all backtests. Datetime partitioned projects only.

explanation_algorithmdatarobot.enums.EXPLANATIONS_ALGORITHM

(New in version v2.21) Optional. If set to shap, the response will include prediction explanations based on the SHAP explainer (SHapley Additive exPlanations). Defaults to null (no prediction explanations).

max_explanationsint

(New in version v2.21) The number of top contributors that are included in prediction explanations. Max 100. Defaults to null for datasets narrower than 100 columns, defaults to 100 for datasets wider than 100 columns.

shap_warningslist

(New in version v2.21) Will be present if explanation_algorithm was set to datarobot.enums.EXPLANATIONS_ALGORITHM.SHAP and there were additivity failures during SHAP values calculation.

classmethod list(project_id)¶

Fetch all the computed training predictions for a project.

Parameters

project_idstr: id of the project

Returns

A list ofpy:class:TrainingPredictions objects

classmethod get(project_id, prediction_id)¶

Retrieve training predictions on a specified data set.

Parameters

project_idstr: id of the project the model belongs to
prediction_idstr: id of the prediction set

Returns

TrainingPredictions object which is ready to operate with specified predictions

iterate_rows(batch_size=None)¶

Retrieve training prediction rows as an iterator.

Parameters

batch_sizeint, optional: maximum number of training prediction rows to fetch per request

Returns

iteratorTrainingPredictionsIterator: an iterator which yields named tuples representing training prediction rows

get_all_as_dataframe(class_prefix='class_', serializer='json')¶

Retrieve all training prediction rows and return them as a pandas.DataFrame.

Returned dataframe has the following structure:

row_id : row id from the original dataset
prediction : the model’s prediction for this row
class_<label> : the probability that the target is this class (only appears for classification and multiclass projects)
timestamp : the time of the prediction (only appears for out of time validation or time series projects)
forecast_point : the point in time used as a basis to generate the predictions (only appears for time series projects)
forecast_distance : how many time steps are between timestamp and forecast_point (only appears for time series projects)
series_id : he id of the series in a multiseries project or None for a single series project (only appears for time series projects)

Parameters

class_prefixstr, optional: The prefix to append to labels in the final dataframe. Default is class_ (e.g., apple -> class_apple)
serializerstr, optional: Serializer to use for the download. Options: json (default) or csv.

Returns

dataframe: pandas.DataFrame

download_to_csv(filename, encoding='utf-8', serializer='json')¶

Save training prediction rows into CSV file.

Parameters

filenamestr or file object: path or file object to save training prediction rows
encodingstring, optional: A string representing the encoding to use in the output file, defaults to ‘utf-8’
serializerstr, optional: Serializer to use for the download. Options: json (default) or csv.

Types¶

class datarobot.models.RocCurveEstimatedMetric¶: Typed dict for estimated metric

class datarobot.models.AnomalyAssessmentRecordMetadata¶: Typed dict for record metadata

class datarobot.models.AnomalyAssessmentPreviewBin¶: Typed dict for preview bin

class datarobot.models.ShapleyFeatureContribution¶: Typed dict for shapley feature contribution

class datarobot.models.AnomalyAssessmentDataPoint¶: Typed dict for data points

class datarobot.models.RegionExplanationsData¶: Typed dict for region explanations

Use Cases¶

class datarobot.UseCase(id, name, created_at, created, updated_at, updated, models_count, projects_count, datasets_count, notebooks_count, applications_count, playgrounds_count, vector_databases_count, members, description=None, owners=None)¶

Representation of a Use Case.

Examples

import datarobot
with UseCase.get("2348ac"):
    print(f"The current use case is {dr.Context.use_case}")

Attributes

idstr: The ID of the Use Case.
namestr: The name of the Use Case.
descriptionstr: The description of the Use Case. Nullable.
created_atstr: The timestamp generated at record creation.
createdUseCaseUser: The user who created the Use Case.
updated_atstr: The timestamp generated when the record was last updated.
updatedUseCaseUser: The most recent user to update the Use Case.
models_countint: The number of models in a Use Case.
projects_countint: The number of projects in a Use Case.
datasets_count: int: The number of datasets in a Use Case.
notebooks_count: int: The number of notebooks in a Use Case.
applications_count: int: The number of applications in a Use Case.
playgrounds_count: int: The number of playgronuds in a Use Case.
vector_databases_count: int: The number of vector databases in a Use Case.
ownersList[UseCaseUser]: The most recent user to update the Use Case.
membersList[UseCaseUser]: The most recent user to update the Use Case.

classmethod get(use_case_id)¶

Gets information about a Use Case.

Parameters

use_case_idstr: The identifier of the Use Case you want to load.

Returns

use_caseUseCase: The queried Use Case.

Return type: UseCase

classmethod list(search_params=None)¶

Returns the Use Cases associated with this account.

Parameters

search_paramsdict, optional.

If not None, the returned projects are filtered by lookup. Currently, you can query use cases by:

offset - The number of records to skip over. Default 0.
limit - The number of records to return in the range from 1 to 100. Default 100.
search - Only return Use Cases with names that match the given string.
project_id - Only return Use Cases associated with the given project ID.
application_id - Only return Use Cases associated with the given app.
sort - The order to sort the Use Cases.

sort queries can use the following options:

id or -id
name or -name
description or -description
projects_count or -projects_count
datasets_count or -datasets_count
notebooks_count or -notebooks_count
applications_count or -applications_count
created_at or -created_at
created_by or -created_by
updated_at or -updated_at
updated_by or -updated_by

Returns

use_caseslist of UseCase instances: Contains a list of Use Cases associated with this user account.

Raises

TypeError: Raised if search_params parameter is provided, but is not of supported type.

Return type: List[UseCase]

classmethod create(name=None, description=None)¶

Create a new Use Case.

Parameters

namestr: Optional. The name of the new Use Case.
description: str: The description of the new Use Case. Optional.

Returns

use_caseUseCase: The created Use Case.

Return type: UseCase

classmethod delete(use_case_id)¶

Delete a Use Case.

Parameters

use_case_idstr: The ID of the Use Case to be deleted.

Return type: None

update(name=None, description=None)¶

Update a Use Case’s name or description.

Parameters

namestr: The updated name of the Use Case.
descriptionstr: The updated description of the Use Case.

Returns

use_caseUseCase: The updated Use Case.

Return type: UseCase

add(entity=None, entity_type=None, entity_id=None)¶

Add an entity (project, dataset, etc.) to a Use Case. Can only accept either an entity or an entity type and entity ID, but not both.

Projects and Applications can only be linked to a single Use Case. Datasets can be linked to multiple Use Cases.

There are some prerequisites for linking Projects to a Use Case which are explained in the user guide.

Parameters

entityUnion[UseCaseReferenceEntity, Project, Dataset, Application]: An existing entity to be linked to this Use Case. Cannot be used if entity_type and entity_id are passed.
entity_typeUseCaseEntityType: The entity type of the entity to link to this Use Case. Cannot be used if entity is passed.
entity_idstr: The ID of the entity to link to this Use Case. Cannot be used if entity is passed.

Returns

use_case_reference_entityUseCaseReferenceEntity: The newly created reference link between this Use Case and the entity.

Return type: UseCaseReferenceEntity

remove(entity=None, entity_type=None, entity_id=None)¶

Remove an entity from a Use Case. Can only accept either an entity or an entity type and entity ID, but not both.

Parameters

entityUnion[UseCaseReferenceEntity, Project, Dataset, Application]: An existing entity instance to be removed from a Use Case. Cannot be used if entity_type and entity_id are passed.
entity_typeUseCaseEntityType: The entity type of the entity to link to this Use Case. Cannot be used if entity is passed.
entity_idstr: The ID of the entity to link to this Use Case. Cannot be used if entity is passed.

Return type: None

share(roles)¶

Share this Use Case with or remove access from one or more user(s).

Parameters

rolesList[SharingRole]

A list of SharingRole instances, each of which references a user and a role to be assigned.

Currently, the only supported roles for Use Cases are OWNER, EDITOR, and CONSUMER, and the only supported SHARING_RECIPIENT_TYPE is USER.

To remove access, set a user’s role to datarobot.enums.SHARING_ROLE.NO_ROLE.

Examples

The SharingRole class is needed in order to share a Use Case with one or more users.

For example, suppose you had a list of user IDs you wanted to share this Use Case with. You could use a loop to generate a list of SharingRole objects for them, and bulk share this Use Case.

>>> from datarobot.models.use_cases.use_case import UseCase
>>> from datarobot.models.sharing import SharingRole
>>> from datarobot.enums import SHARING_ROLE, SHARING_RECIPIENT_TYPE
>>>
>>> user_ids = ["60912e09fd1f04e832a575c1", "639ce542862e9b1b1bfa8f1b", "63e185e7cd3a5f8e190c6393"]
>>> sharing_roles = []
>>> for user_id in user_ids:
...     new_sharing_role = SharingRole(
...         role=SHARING_ROLE.CONSUMER,
...         share_recipient_type=SHARING_RECIPIENT_TYPE.USER,
...         id=user_id,
...         can_share=True,
...     )
...     sharing_roles.append(new_sharing_role)
>>> use_case = UseCase.get(use_case_id="5f33f1fd9071ae13568237b2")
>>> use_case.share(roles=sharing_roles)

Similarly, a SharingRole instance can be used to remove a user’s access if the role is set to SHARING_ROLE.NO_ROLE, like in this example:

>>> from datarobot.models.use_cases.use_case import UseCase
>>> from datarobot.models.sharing import SharingRole
>>> from datarobot.enums import SHARING_ROLE, SHARING_RECIPIENT_TYPE
>>>
>>> user_to_remove = "[email protected]"
... remove_sharing_role = SharingRole(
...     role=SHARING_ROLE.NO_ROLE,
...     share_recipient_type=SHARING_RECIPIENT_TYPE.USER,
...     username=user_to_remove,
...     can_share=False,
... )
>>> use_case = UseCase.get(use_case_id="5f33f1fd9071ae13568237b2")
>>> use_case.share(roles=[remove_sharing_role])

Return type: None

get_shared_roles(offset=None, limit=None, id=None)¶

Retrieve access control information for this Use Case.

Parameters

offsetOptional[int]: The number of records to skip over. Optional. Default is 0.
limit: Optional[int]: The number of records to return. Optional. Default is 100.
id: Optional[str]: Return the access control information for a user with this user ID. Optional.

Return type: List[SharingRole]

list_projects()¶

List all projects associated with this Use Case.

Returns

projectsList[Project]: All projects associated with this Use Case.

Return type: List[TypeVar(T)]

list_datasets()¶

List all datasets associated with this Use Case.

Returns

datasetsList[Dataset]: All datasets associated with this Use Case.

Return type: List[TypeVar(T)]

list_applications()¶

List all applications associated with this Use Case.

Returns

applicationsList[Application]: All applications associated with this Use Case.

Return type: List[TypeVar(T)]

class datarobot.models.use_cases.use_case.UseCaseUser(id, full_name=None, email=None, userhash=None, username=None)¶

Representation of a Use Case user.

Attributes

idstr: The id of the user.
full_namestr: The full name of the user. Optional.
emailstr: The email address of the user. Optional.
userhashstr: User’s gravatar hash. Optional.
usernamestr: The username of the user. Optional.

class datarobot.models.use_cases.use_case.UseCaseReferenceEntity(id, entity_type, entity_id, use_case_id, created_at, created, is_deleted)¶

An entity associated with a Use Case.

Attributes

entity_typeUseCaseEntityType: The type of the entity.
use_case_idstr: The Use Case this entity is associated with.
idstr: The ID of the entity.
created_atstr: The date and time this entity was linked with the Use Case.
is_deletedbool: Whether or not the linked entity has been deleted.
createdUseCaseUser: The user who created the link between this entity and the Use Case.

User Blueprints¶

class datarobot.UserBlueprint(blender, blueprint_id, diagram, features, features_text, icons, insights, model_type, supported_target_types, user_blueprint_id, user_id, is_time_series=False, reference_model=False, shap_support=False, supports_gpu=False, blueprint=None, custom_task_version_metadata=None, hex_column_name_lookup=None, project_id=None, vertex_context=None, blueprint_context=None, **kwargs)¶

A representation of a blueprint which may be modified by the user, saved to a user’s AI Catalog, trained on projects, and shared with others.

It is recommended to install the python library called datarobot_bp_workshop, available via pip, for the best experience when building blueprints.

Please refer to http://blueprint-workshop.datarobot.com for tutorials, examples, and other documentation.

Parameters

blender: bool: Whether the blueprint is a blender.
blueprint_id: string: The deterministic id of the blueprint, based on its content.
custom_task_version_metadata: list(list(string)), Optional: An association of custom entity ids and task ids.
diagram: string: The diagram used by the UI to display the blueprint.
features: list(string): A list of the names of tasks used in the blueprint.
features_text: string: A description of the blueprint via the names of tasks used.
hex_column_name_lookup: list(UserBlueprintsHexColumnNameLookupEntry), Optional: A lookup between hex values and data column names used in the blueprint.
icons: list(int): The icon(s) associated with the blueprint.
insights: string: An indication of the insights generated by the blueprint.
is_time_series: bool (Default=False): Whether the blueprint contains time-series tasks.
model_type: string: The generated or provided title of the blueprint.
project_id: string, Optional: The id of the project the blueprint was originally created with, if applicable.
reference_model: bool (Default=False): Whether the blueprint is a reference model.
shap_support: bool (Default=False): Whether the blueprint supports shapley additive explanations.
supported_target_types: list(enum(‘binary’, ‘multiclass’, ‘multilabel’, ‘nonnegative’,
‘regression’, ‘unsupervised’, ‘unsupervisedclustering’)): The list of supported targets of the current blueprint.
supports_gpu: bool (Default=False): Whether the blueprint supports execution on the GPU.
user_blueprint_id: string: The unique id associated with the user blueprint.
user_id: string: The id of the user who owns the blueprint.
blueprint: list(dict) or list(UserBlueprintTask), Optional: The representation of a directed acyclic graph defining a pipeline of data through tasks and a final estimator.
vertex_context: list(VertexContextItem), Optional: Info about, warnings about, and errors with a specific vertex in the blueprint.
blueprint_context: VertexContextItemMessages: Warnings and errors which may describe or summarize warnings or errors in the blueprint’s vertices

classmethod list(limit=100, offset=0, project_id=None)¶

Fetch a list of the user blueprints the current user created

Parameters

limit: int (Default=100): The max number of results to return.
offset: int (Default=0): The number of results to skip (for pagination).
project_id: string, Optional: The id of the project, used to filter for original project_id.

Returns

list(UserBlueprint)

Raises

datarobot.errors.ClientError: if the server responded with 4xx status
datarobot.errors.ServerError: if the server responded with 5xx status

Return type: List[UserBlueprint]

classmethod get(user_blueprint_id, project_id=None)¶

Retrieve a user blueprint

Parameters

user_blueprint_id: string: Used to identify a specific user-owned blueprint.
project_id: string (optional, default is None): String representation of ObjectId for a given project. Used to validate selected columns in the user blueprint.

Returns

UserBlueprint

Raises

datarobot.errors.ClientError: if the server responded with 4xx status
datarobot.errors.ServerError: if the server responded with 5xx status

Return type: UserBlueprint

classmethod create(blueprint, model_type=None, project_id=None, save_to_catalog=True)¶

Create a user blueprint

Parameters

blueprint: list(dict) or list(UserBlueprintTask): A list of tasks in the form of dictionaries which define a blueprint.
model_type: string, Optional: The title to give to the blueprint.
project_id: string, Optional: The project associated with the blueprint. Necessary in the event of project specific tasks, such as column selection tasks.
save_to_catalog: bool, (Default=True): Whether the blueprint being created should be saved to the catalog.

Returns

UserBlueprint

Raises

datarobot.errors.ClientError: if the server responded with 4xx status
datarobot.errors.ServerError: if the server responded with 5xx status

Return type: UserBlueprint

classmethod create_from_custom_task_version_id(custom_task_version_id, save_to_catalog=True, description=None)¶

Create a user blueprint with a single custom task version

Parameters

custom_task_version_id: string: Id of custom task version from which the user blueprint is created
save_to_catalog: bool, (Default=True): Whether the blueprint being created should be saved to the catalog
description: string (Default=None): The description for the user blueprint that will be created from the custom task version.

Returns

UserBlueprint

Raises

datarobot.errors.ClientError: if the server responded with 4xx status
datarobot.errors.ServerError: if the server responded with 5xx status

Return type: UserBlueprint

classmethod clone_project_blueprint(blueprint_id, project_id, model_type=None, save_to_catalog=True)¶

Clone a blueprint from a project.

Parameters

blueprint_id: string: The id associated with the blueprint to create the user blueprint from.
model_type: string, Optional: The title to give to the blueprint.
project_id: string: The id of the project which the blueprint to copy comes from.
save_to_catalog: bool, (Default=True): Whether the blueprint being created should be saved to the catalog.

Returns

UserBlueprint

Raises

datarobot.errors.ClientError: if the server responded with 4xx status
datarobot.errors.ServerError: if the server responded with 5xx status

Return type: UserBlueprint

classmethod clone_user_blueprint(user_blueprint_id, model_type=None, project_id=None, save_to_catalog=True)¶

Clone a user blueprint.

Parameters

model_type: string, Optional: The title to give to the blueprint.
project_id: string, Optional: String representation of ObjectId for a given project. Used to validate selected columns in the user blueprint.
user_blueprint_id: string: The id of the existing user blueprint to copy.
save_to_catalog: bool, (Default=True): Whether the blueprint being created should be saved to the catalog.

Returns

UserBlueprint

Raises

datarobot.errors.ClientError: if the server responded with 4xx status
datarobot.errors.ServerError: if the server responded with 5xx status

Return type: UserBlueprint

classmethod update(blueprint, user_blueprint_id, model_type=None, project_id=None, include_project_id_if_none=False)¶

Update a user blueprint

Parameters

blueprint: list(dict) or list(UserBlueprintTask): A list of tasks in the form of dictionaries which define a blueprint. If None, will not be passed.
model_type: string, Optional: The title to give to the blueprint. If None, will not be passed.
project_id: string, Optional: The project associated with the blueprint. Necessary in the event of project specific tasks, such as column selection tasks. If None, will not be passed. To explicitly pass None, pass True to include_project_id_if_none (useful if unlinking a blueprint from a project)
user_blueprint_id: string: Used to identify a specific user-owned blueprint.
include_project_id_if_none: bool (Default=False): Allows project_id to be passed as None, instead of ignored. If set to False, will not pass project_id in the API request if it is set to None. If True, the project id will be passed even if it is set to None.

Returns

UserBlueprint

Raises

datarobot.errors.ClientError: if the server responded with 4xx status
datarobot.errors.ServerError: if the server responded with 5xx status

Return type: UserBlueprint

classmethod delete(user_blueprint_id)¶

Delete a user blueprint, specified by the userBlueprintId.

Parameters

user_blueprint_id: string: Used to identify a specific user-owned blueprint.

Returns

requests.models.Response

Raises

datarobot.errors.ClientError: if the server responded with 4xx status
datarobot.errors.ServerError: if the server responded with 5xx status

Return type: Response

classmethod get_input_types()¶

Retrieve the input types which can be used with User Blueprints.

Returns

UserBlueprintAvailableInput

Raises

datarobot.errors.ClientError: if the server responded with 4xx status
datarobot.errors.ServerError: if the server responded with 5xx status

Return type: UserBlueprintAvailableInput

classmethod add_to_project(project_id, user_blueprint_ids)¶

Add a list of user blueprints, by id, to a specified (by id) project’s repository.

Parameters

project_id: string: The projectId of the project for the repository to add the specified user blueprints to.
user_blueprint_ids: list(string) or string: The ids of the user blueprints to add to the specified project’s repository.

Returns

UserBlueprintAddToProjectMenu

Raises

datarobot.errors.ClientError: if the server responded with 4xx status
datarobot.errors.ServerError: if the server responded with 5xx status

Return type: UserBlueprintAddToProjectMenu

classmethod get_available_tasks(project_id=None, user_blueprint_id=None)¶

Retrieve the available tasks, organized into categories, which can be used to create or modify User Blueprints.

Parameters

project_id: string, Optional
user_blueprint_id: string, Optional

Returns

UserBlueprintAvailableTasks

Raises

datarobot.errors.ClientError: if the server responded with 4xx status
datarobot.errors.ServerError: if the server responded with 5xx status

Return type: UserBlueprintAvailableTasks

classmethod validate_task_parameters(output_method, task_code, task_parameters, project_id=None)¶

Validate that each value assigned to specified task parameters are valid.

Parameters

output_method: enum(‘P’, ‘Pm’, ‘S’, ‘Sm’, ‘T’, ‘TS’): The method representing how the task will output data.
task_code: string: The task code representing the task to validate parameter values.
task_parameters: list(UserBlueprintTaskParameterValidationRequestParamItem): A list of task parameters and proposed values to be validated.
project_id: string (optional, default is None): The projectId representing the project where this user blueprint is edited.

Returns

UserBlueprintValidateTaskParameters

Raises

datarobot.errors.ClientError: if the server responded with 4xx status
datarobot.errors.ServerError: if the server responded with 5xx status

Return type: UserBlueprintValidateTaskParameters

classmethod list_shared_roles(user_blueprint_id, limit=100, offset=0, id=None, name=None, share_recipient_type=None)¶

Get a list of users, groups and organizations that have an access to this user blueprint

Parameters

id: str, Optional: Only return the access control information for a organization, group or user with this ID.
limit: int (Default=100): At most this many results are returned.
name: string, Optional: Only return the access control information for a organization, group or user with this name.
offset: int (Default=0): This many results will be skipped.
share_recipient_type: enum(‘user’, ‘group’, ‘organization’), Optional: Describes the recipient type, either user, group, or organization.
user_blueprint_id: str: Used to identify a specific user-owned blueprint.

Returns

list(UserBlueprintSharedRolesResponseValidator)

Raises

datarobot.errors.ClientError: if the server responded with 4xx status
datarobot.errors.ServerError: if the server responded with 5xx status

Return type: List[UserBlueprintSharedRolesResponseValidator]

classmethod validate_blueprint(blueprint, project_id=None)¶

Validate a user blueprint and return information about the inputs expected and outputs provided by each task.

Parameters

blueprint: list(dict) or list(UserBlueprintTask): The representation of a directed acyclic graph defining a pipeline of data through tasks and a final estimator.
project_id: string (optional, default is None): The projectId representing the project where this user blueprint is edited.

Returns

list(VertexContextItem)

Raises

datarobot.errors.ClientError: if the server responded with 4xx status
datarobot.errors.ServerError: if the server responded with 5xx status

Return type: List[VertexContextItem]

classmethod update_shared_roles(user_blueprint_id, roles)¶

Share a user blueprint with a user, group, or organization

Parameters

user_blueprint_id: str: Used to identify a specific user-owned blueprint.
roles: list(or(GrantAccessControlWithUsernameValidator, GrantAccessControlWithIdValidator)): Array of GrantAccessControl objects., up to maximum 100 objects.

Returns

requests.models.Response

Raises

datarobot.errors.ClientError: if the server responded with 4xx status
datarobot.errors.ServerError: if the server responded with 5xx status

Return type: Response

classmethod search_catalog(search=None, tag=None, limit=100, offset=0, owner_user_id=None, owner_username=None, order_by='-created')¶

Fetch a list of the user blueprint catalog entries the current user has access to based on an optional search term, tags, owner user info, or sort order.

Parameters

search: string, Optional.: A value to search for in the dataset’s name, description, tags, column names, categories, and latest error. The search is case insensitive. If no value is provided for this parameter, or if the empty string is used, or if the string contains only whitespace, no filtering will be done. Partial matching is performed on dataset name and description fields while all other fields will only match if the search matches the whole value exactly.
tag: string, Optional.: If provided, the results will be filtered to include only items with the specified tag.
limit: int, Optional. (default: 0), at most this many results are returned. To specify no: limit, use 0. The default may change and a maximum limit may be imposed without notice.
offset: int, Optional. (default: 0), this many results will be skipped.
owner_user_id: string, Optional.: Filter results to those owned by one or more owner identified by UID.
owner_username: string, Optional.: Filter results to those owned by one or more owner identified by username.
order_by: string, Optional. Defaults to ‘-created’.: Sort order which will be applied to catalog list, valid options are “catalogName”, “originalName”, “description”, “created”, and “relevance”. For all options other than relevance, you may prefix the attribute name with a dash to sort in descending order. e.g. orderBy=’-catalogName’.

Return type: UserBlueprintCatalogSearch

class datarobot.models.user_blueprints.models.UserBlueprintAvailableInput(input_types, **kwargs)¶

Retrieve the input types which can be used with User Blueprints.

Parameters

input_types: list(UserBlueprintsInputType): A list of associated pairs of an input types and their human-readable names.

classmethod get_input_types()¶

Retrieve the input types which can be used with User Blueprints.

Returns

UserBlueprintAvailableInput

Raises

datarobot.errors.ClientError: if the server responded with 4xx status
datarobot.errors.ServerError: if the server responded with 5xx status

Return type: UserBlueprintAvailableInput

class datarobot.models.user_blueprints.models.UserBlueprintAddToProjectMenu(added_to_menu, not_added_to_menu=None, message=None, **kwargs)¶

Add a list of user blueprints, by id, to a specified (by id) project’s repository.

Parameters

added_to_menu: list(UserBlueprintAddedToMenuItem): The list of userBlueprintId and blueprintId pairs representing blueprints successfully added to the project repository.
not_added_to_menu: list(UserBlueprintNotAddedToMenuItem): The list of userBlueprintId and error message representing blueprints which failed to be added to the project repository.
message: string: A success message or a list of reasons why the list of blueprints could not be added to the project repository.

classmethod add_to_project(project_id, user_blueprint_ids)¶

Add a list of user blueprints, by id, to a specified (by id) project’s repository.

Parameters

project_id: string: The projectId of the project for the repository to add the specified user blueprints to.
user_blueprint_ids: list(string): The ids of the user blueprints to add to the specified project’s repository.

Returns

UserBlueprintAddToProjectMenu

Raises

datarobot.errors.ClientError: if the server responded with 4xx status
datarobot.errors.ServerError: if the server responded with 5xx status

Return type: UserBlueprintAddToProjectMenu

class datarobot.models.user_blueprints.models.UserBlueprintAvailableTasks(categories, tasks, **kwargs)¶

Retrieve the available tasks, organized into categories, which can be used to create or modify User Blueprints.

Parameters

categories: list(UserBlueprintTaskCategoryItem): A list of the available task categories, sub-categories, and tasks.
tasks: list(UserBlueprintTaskLookupEntry): A list of task codes and their task definitions.

classmethod get_available_tasks(project_id=None, user_blueprint_id=None)¶

Retrieve the available tasks, organized into categories, which can be used to create or modify User Blueprints.

Parameters

project_id: string, Optional
user_blueprint_id: string, Optional

Returns

UserBlueprintAvailableTasks

Raises

datarobot.errors.ClientError: if the server responded with 4xx status
datarobot.errors.ServerError: if the server responded with 5xx status

Return type: UserBlueprintAvailableTasks

class datarobot.models.user_blueprints.models.UserBlueprintValidateTaskParameters(errors, **kwargs)¶

Validate that each value assigned to specified task parameters are valid.

Parameters

errors: list(UserBlueprintsValidateTaskParameter): A list of the task parameters, their proposed values, and messages describing why each is not valid.

classmethod validate_task_parameters(output_method, task_code, task_parameters, project_id=None)¶

Validate that each value assigned to specified task parameters are valid.

Parameters

output_method: enum(‘P’, ‘Pm’, ‘S’, ‘Sm’, ‘T’, ‘TS’): The method representing how the task will output data.
task_code: string: The task code representing the task to validate parameter values.
task_parameters: list(UserBlueprintTaskParameterValidationRequestParamItem): A list of task parameters and proposed values to be validated.
project_id: string (optional, default is None): The projectId representing the project where this user blueprint is edited.

Returns

UserBlueprintValidateTaskParameters

Raises

datarobot.errors.ClientError: if the server responded with 4xx status
datarobot.errors.ServerError: if the server responded with 5xx status

Return type: UserBlueprintValidateTaskParameters

class datarobot.models.user_blueprints.models.UserBlueprintSharedRolesResponseValidator(id, name, role, share_recipient_type, **kwargs)¶

A list of SharedRoles objects.

Parameters

share_recipient_type: enum(‘user’, ‘group’, ‘organization’): Describes the recipient type, either user, group, or organization.
role: str, one of enum(‘CONSUMER’, ‘EDITOR’, ‘OWNER’): The role of the org/group/user on this dataset or “NO_ROLE” for removing access when used with route to modify access.
id: str: The ID of the recipient organization, group or user.
name: string: The name of the recipient organization, group or user.

class datarobot.models.user_blueprints.models.VertexContextItem(information, messages, task_id, **kwargs)¶

Info about, warnings about, and errors with a specific vertex in the blueprint.

Parameters

task_id: string: The id associated with a specific vertex in the blueprint.
information: VertexContextItemInfo
messages: VertexContextItemMessages

class datarobot.models.user_blueprints.models.UserBlueprintCatalogSearch(id, catalog_name, info_creator_full_name, user_blueprint_id, description=None, last_modifier_full_name=None, **kwargs)¶

An APIObject representing a user blueprint catalog entry the current user has access to based on an optional search term and/or tags.

Parameters

id: str: The ID of the catalog entry linked to the user blueprint.
catalog_name: str: The name of the user blueprint.
creator: str: The name of the user that created the user blueprint.
user_blueprint_id: str: The ID of the user blueprint.
description: str, Optional (Default=None): The description of the user blueprint.
last_modifier_name: str, Optional (Default=None): The name of the user that last modified the user blueprint.

classmethod search_catalog(search=None, tag=None, limit=100, offset=0, owner_user_id=None, owner_username=None, order_by='-created')¶

Fetch a list of the user blueprint catalog entries the current user has access to based on an optional search term, tags, owner user info, or sort order.

Parameters

search: string, Optional.: A value to search for in the dataset’s name, description, tags, column names, categories, and latest error. The search is case insensitive. If no value is provided for this parameter, or if the empty string is used, or if the string contains only whitespace, no filtering will be done. Partial matching is performed on dataset name and description fields while all other fields will only match if the search matches the whole value exactly.
tag: string, Optional.: If provided, the results will be filtered to include only items with the specified tag.
limit: int, Optional. (default: 0), at most this many results are returned. To specify no: limit, use 0. The default may change and a maximum limit may be imposed without notice.
offset: int, Optional. (default: 0), this many results will be skipped.
owner_user_id: string, Optional.: Filter results to those owned by one or more owner identified by UID.
owner_username: string, Optional.: Filter results to those owned by one or more owner identified by username.
order_by: string, Optional. Defaults to ‘-created’.: Sort order which will be applied to catalog list, valid options are “catalogName”, “originalName”, “description”, “created”, and “relevance”. For all options other than relevance, you may prefix the attribute name with a dash to sort in descending order. e.g. orderBy=’-catalogName’.

Return type: List[UserBlueprintCatalogSearch]

VisualAI¶

class datarobot.models.visualai.Image(image_id, project_id, height=0, width=0)¶

An image stored in a project’s dataset.

Attributes

idstr: Image ID for this image.
image_typestr: Image media type. Accessing this may require a server request and an associated delay in returning.
image_bytesbytes: Raw bytes of this image. Accessing this may require a server request and an associated delay in returning.
heightint: Height of the image in pixels.
widthint: Width of the image in pixels.

classmethod get(project_id, image_id)¶

Get a single image object from project.

Parameters

project_idstr: Id of the project that contains the images.
image_idstr: ID of image to load from the project.

Return type: Image

class datarobot.models.visualai.SampleImage(project_id, image_id, height, width, target_value=None)¶

A sample image in a project’s dataset.

If Project.stage is datarobot.enums.PROJECT_STAGE.EDA2 then the target_* attributes of this class will have values, otherwise the values will all be None.

Attributes

imageImage: Image object.
target_valueTargetValue: Value associated with the feature_name.
project_idstr: Id of the project that contains the images.

classmethod list(project_id, feature_name, target_value=None, target_bin_start=None, target_bin_end=None, offset=None, limit=None)¶

Get sample images from a project.

Parameters

project_idstr: Project that contains the images.
feature_namestr: Name of feature column that contains images.
target_valueTargetValue: For classification projects - target value to filter images. Please note that you can only use this parameter when the project has finished the EDA2 stage.
target_bin_startOptional[Union[int, float]]: For regression projects - only images corresponding to the target values above (inclusive) this value will be returned. Must be specified together with target_bin_end. Please note that you can only use this parameter when the project has finished the EDA2 stage.
target_bin_endOptional[Union[int, float]]: For regression projects - only images corresponding to the target values below (exclusive) this value will be returned. Must be specified together with target_bin_start. Please note that you can only use this parameter when the project has finished the EDA2 stage.
offsetOptional[int]: Number of images to be skipped.
limitOptional[int]: Number of images to be returned.

Return type: List[SampleImage]

class datarobot.models.visualai.DuplicateImage(image_id, row_count, project_id)¶

An image that was duplicated in the project dataset.

Attributes

imageImage: Image object.
countint: Number of times the image was duplicated.

classmethod list(project_id, feature_name, offset=None, limit=None)¶

Get all duplicate images in a project.

Parameters

project_idstr: Project that contains the images.
feature_namestr: Name of feature column that contains images.
offsetOptional[int]: Number of images to be skipped.
limitOptional[int]: Number of images to be returned.

Return type: List[DuplicateImage]

class datarobot.models.visualai.ImageEmbedding(feature_name, position_x, position_y, image_id, project_id, model_id, actual_target_value=None, target_values=None, target_bins=None)¶

Vector representation of an image in an embedding space.

A vector in an embedding space will allow linear computations to be carried out between images: for example computing the Euclidean distance of the images.

Attributes

imageImage: Image object used to create this map.
feature_namestr: Name of the feature column this embedding is associated with.
position_xint: X coordinate of the image in the embedding space.
position_yint: Y coordinate of the image in the embedding space.
actual_target_valueobject: Actual target value of the dataset row.
target_valuesOptional[List[str]]: For classification projects, a list of target values of this project.
target_binsOptional[List[Dict[str, float]]]: For regression projects, a list of target bins of this project.
project_idstr: Id of the project this Image Embedding belongs to.
model_idstr: Id of the model this Image Embedding belongs to.

classmethod compute(project_id, model_id)¶

Start the computation of image embeddings for the model.

Parameters

project_idstr: Project to start creation in.
model_idstr: Project’s model to start creation in.

Returns

str: URL to check for image embeddings progress.

Raises

datarobot.errors.ClientError: Server rejected creation due to client error. Most likely cause is bad project_id or model_id.

Return type: str

classmethod models(project_id)¶

For a given project_id, list all model_id - feature_name pairs with available Image Embeddings.

Parameters

project_idstr: Id of the project to list model_id - feature_name pairs with available Image Embeddings for.

Returns

list( tuple(model_id, feature_name) ): List of model and feature name pairs.

Return type: List[Tuple[str, str]]

classmethod list(project_id, model_id, feature_name)¶

Return a list of ImageEmbedding objects.

Parameters

project_id: str: Id of the project the model belongs to.
model_id: str: Id of the model to list Image Embeddings for.
feature_name: str: Name of feature column to list Image Embeddings for.

Return type: List[ImageEmbedding]

class datarobot.models.visualai.ImageActivationMap(feature_name, activation_values, image_width, image_height, image_id, overlay_image_id, project_id, model_id, actual_target_value=None, predicted_target_value=None, target_values=None, target_bins=None)¶

Mark areas of image with weight of impact on training.

This is a technique to display how various areas of the region were used in training, and their effect on predictions. Larger values in activation_values indicates a larger impact.

Attributes

imageImage: Image object used to create this map.
overlay_imageImage: Image object containing the original image overlaid by the activation heatmap.
feature_namestr: Name of the feature column that contains the value this map is based on.
activation_valuesList[List[int]]: A row-column matrix that contains the activation strengths for image regions. Values are integers in the range [0, 255].
actual_target_valueTargetValue: Actual target value of the dataset row.
predicted_target_valueTargetValue: Predicted target value of the dataset row that contains this image.
target_valuesOptional[List[str]]: For classification projects a list of target values of this project.
target_binsOptional[List[Dict[str, float]]]: For regression projects a list of target bins.
project_idstr: Id of the project this Activation Map belongs to.
model_idstr: Id of the model this Activation Map belongs to.

classmethod compute(project_id, model_id)¶

Start the computation of activation maps for the given model.

Parameters

project_idstr: Project to start creation in.
model_idstr: Project’s model to start creation in.

Returns

str: URL to check for image embeddings progress.

Raises

datarobot.errors.ClientError: Server rejected creation due to client error. Most likely cause is bad project_id or model_id.

Return type: str

classmethod models(project_id)¶

For a given project_id, list all model_id - feature_name pairs with available Image Activation Maps.

Parameters

project_idstr: Id of the project to list model_id - feature_name pairs with available Image Activation Maps for.

Returns

list( tuple(model_id, feature_name) ): List of model and feature name pairs.

Return type: List[Tuple[str, str]]

classmethod list(project_id, model_id, feature_name, offset=None, limit=None)¶

Return a list of ImageActivationMap objects.

Parameters

project_idstr: Project that contains the images.
model_idstr: Model that contains the images.
feature_namestr: Name of feature column that contains images.
offsetOptional[int]: Number of images to be skipped.
limitOptional[int]: Number of images to be returned.

Return type: List[ImageActivationMap]

class datarobot.models.visualai.ImageAugmentationOptions(id, name, project_id, min_transformation_probability, current_transformation_probability, max_transformation_probability, min_number_of_new_images, current_number_of_new_images, max_number_of_new_images, transformations=None)¶

A List of all supported Image Augmentation Transformations for a project. Includes additional information about minimum, maximum, and default values for a transformation.

Attributes

name: string: The name of the augmentation list
project_id: string: The project containing the image data to be augmented
min_transformation_probability: float: The minimum allowed value for transformation probability.
current_transformation_probability: float: Default setting for probability that each transformation will be applied to an image.
max_transformation_probability: float: The maximum allowed value for transformation probability.
min_number_of_new_images: int: The minimum allowed number of new rows to add for each existing row
current_number_of_new_images: int: The default number of new rows to add for each existing row
max_number_of_new_images: int: The maximum allowed number of new rows to add for each existing row
transformations: list[dict]: List of transformations to possibly apply to each image

classmethod get(project_id)¶

Returns a list of all supported transformations for the given project

Parameters

project_id (str) – sting The id of the project for which to return the list of supported transformations.

Return type

ImageAugmentationOptions

Returns

ImageAugmentationOptions: A list containing all the supported transformations for the project.

class datarobot.models.visualai.ImageAugmentationList(id, name, project_id, feature_name=None, in_use=False, initial_list=False, transformation_probability=0.0, number_of_new_images=1, transformations=None, samples_id=None)¶

A List of Image Augmentation Transformations

Attributes

name: string: The name of the augmentation list
project_id: string: The project containing the image data to be augmented
feature_name: string (optional): name of the feature that the augmentation list is associated with
in_use: boolean: Whether this is the list that will passed in to every blueprint during blueprint generation before autopilot
initial_list: boolean: True if this is the list to be used during training to produce augmentations
transformation_probability: float: Probability that each transformation will be applied to an image. Value should be between 0.01 - 1.0.
number_of_new_images: int: Number of new rows to add for each existing row
transformations: array: List of transformations to possibly apply to each image
samples_id: str: Id of last image augmentation sample generated for image augmentation list.

classmethod create(name, project_id, feature_name=None, in_use=None, initial_list=False, transformation_probability=0.0, number_of_new_images=1, transformations=None, samples_id=None)¶

create a new image augmentation list

Return type: ImageAugmentationList

classmethod list(project_id, feature_name=None)¶

List Image Augmentation Lists present in a project.

Parameters

project_idstr: Project Id to retrieve augmentation lists for.
feature_nameOptional[str]: If passed, the response will only include Image Augmentation Lists active for the provided feature name.

Returns

list[ImageAugmentationList]

Return type: List[ImageAugmentationList]

update(name=None, feature_name=None, initial_list=None, transformation_probability=None, number_of_new_images=None, transformations=None)¶

Update one or multiple attributes of the Image Augmentation List in the DataRobot backend as well on this object.

Parameters

nameOptional[str]: New name of the feature list.
feature_nameOptional[str]: The new feature name for which the Image Augmentation List is effective.
initial_listOptional[bool]: New flag that indicates whether this list will be used during Autopilot to perform image augmentation.
transformation_probabilityOptional[float]: New probability that each enabled transformation will be applied to an image. This does not apply to Horizontal or Vertical Flip, which are always set to 50%.
number_of_new_imagesOptional[int]: New number of new rows to add for each existing row, updating the existing augmentation list.
transformationsOptional[list]: New list of Transformations to possibly apply to each image.

Returns

ImageAugmentationList: Reference to self. The passed values will be updated in place.

Return type: ImageAugmentationList

retrieve_samples()¶

Lists already computed image augmentation sample for image augmentation list. Returns samples only if they have been already computed. It does not initialize computation.

Returns

List of class ImageAugmentationSample

Return type: List[ImageAugmentationSample]

compute_samples(max_wait=600)¶

Initializes computation and retrieves list of image augmentation samples for image augmentation list. If samples exited prior to this call method, this will compute fresh samples and return latest version of samples.

Returns

List of class ImageAugmentationSample

Return type: List[ImageAugmentationSample]

class datarobot.models.visualai.ImageAugmentationSample(image_id, project_id, height, width, original_image_id=None, sample_id=None)¶

A preview of the type of images that augmentations will create during training.

Attributes

sample_idObjectId: The id of the augmentation sample, used to group related images together
image_idObjectId: A reference to the Image which can be used to retrieve the image binary
project_idObjectId: A reference to the project containing the image
original_image_idObjectId: A reference to the original image that generated this image in the case of an augmented image. If this is None it signifies this is an original image
heightint: Image height in pixels
widthint: Image width in pixels

classmethod list(auglist_id=None)¶

Return a list of ImageAugmentationSample objects.

Parameters

auglist_id: str: ID for augmentation list to retrieve samples for

Returns

List of class ImageAugmentationSample

Return type: List[ImageAugmentationSample]

Word Cloud¶

class datarobot.models.word_cloud.WordCloud(ngrams)¶

Word cloud data for the model.

Notes

WordCloudNgram is a dict containing the following:

ngram (str) Word or ngram value.

coefficient (float) Value from [-1.0, 1.0] range, describes effect of this ngram on the target. Large negative value means strong effect toward negative class in classification and smaller target value in regression models. Large positive - toward positive class and bigger value respectively.

count (int) Number of rows in the training sample where this ngram appears.

frequency (float) Value from (0.0, 1.0] range, relative frequency of given ngram to most frequent ngram.

is_stopword (bool) True for ngrams that DataRobot evaluates as stopwords.

class (str or None) For classification - values of the target class for corresponding word or ngram. For regression - None.

Attributes

ngramslist of dicts: List of dicts with schema described as WordCloudNgram above.

most_frequent(top_n=5)¶

Return most frequent ngrams in the word cloud.

Parameters

top_nint: Number of ngrams to return

Returns

list of dict: Up to top_n top most frequent ngrams in the word cloud. If top_n bigger then total number of ngrams in word cloud - return all sorted by frequency in descending order.

Return type: List[WordCloudNgram]

most_important(top_n=5)¶

Return most important ngrams in the word cloud.

Parameters

top_nint: Number of ngrams to return

Returns

list of dict: Up to top_n top most important ngrams in the word cloud. If top_n bigger then total number of ngrams in word cloud - return all sorted by absolute coefficient value in descending order.

Return type: List[WordCloudNgram]

ngrams_per_class()¶

Split ngrams per target class values. Useful for multiclass models.

Returns

dict: Dictionary in the format of (class label) -> (list of ngrams for that class)

Return type: Dict[Optional[str], List[WordCloudNgram]]

class datarobot.models.word_cloud.WordCloudNgram() -> new empty dictionary dict(mapping) -> new dictionary initialized from a mapping object's (key, value) pairs dict(iterable) -> new dictionary initialized as if via: d = {} for k, v in iterable: d[k] = v dict(**kwargs) -> new dictionary initialized with the name=value pairs in the keyword argument list. For example: dict(one=1, two=2)¶

Data Slices¶

class datarobot.models.data_slice.DataSlice(id=None, name=None, filters=None, project_id=None)¶

Definition of a data slice

Attributes

idstr

ID of the data slice.

namestr

Name of the data slice definition.

filterslist[DataSliceFiltersType]

List of filters (dict) with params:

operandstr
Name of the feature to use in the filter.
operatorstr
Operator to use in the filter: ‘eq’, ‘in’, ‘<’, or ‘>’.
valuesUnion[str, int, float]
Values to use from the feature.

project_idstr

ID of the project that the model is part of.

classmethod list(project, offset=0, limit=100)¶

List the data slices in the same project

Parameters

projectUnion[str, Project]: ID of the project or Project object from which to list data slices.
offsetint, optional: Number of items to skip.
limitint, optional: Number of items to return.

Returns

data_sliceslist[DataSlice]

Examples

>>> import datarobot as dr
>>> ...  # set up your Client
>>> data_slices = dr.DataSlice.list("646d0ea0cd8eb2355a68b0e5")
>>> data_slices
[DataSlice(...), DataSlice(...), ...]

Return type: List[DataSlice]

classmethod create(name, filters, project)¶

Creates a data slice in the project with the given name and filters

Parameters

namestr

Name of the data slice definition.

filterslist[DataSliceFiltersType]

List of filters (dict) with params:

operandstr
Name of the feature to use in filter.
operatorstr
Operator to use: ‘eq’, ‘in’, ‘<’, or ‘>’.
valuesUnion[str, int, float]
Values to use from the feature.

projectUnion[str, Project]

Project ID or Project object from which to list data slices.

Returns

data_sliceDataSlice: The data slice object created

Examples

>>> import datarobot as dr
>>> ...  # set up your Client and retrieve a project
>>> data_slice = dr.DataSlice.create(
>>> ...    name='yes',
>>> ...    filters=[{'operand': 'binary_target', 'operator': 'eq', 'values': ['Yes']}],
>>> ...    project=project,
>>> ...  )
>>> data_slice
DataSlice(
    filters=[{'operand': 'binary_target', 'operator': 'eq', 'values': ['Yes']}],
    id=646d1296bd0c543d88923c9d,
    name=yes,
    project_id=646d0ea0cd8eb2355a68b0e5
)

Return type: DataSlice

delete()¶

Deletes the data slice from storage

Examples

>>> import datarobot as dr
>>> data_slice = dr.DataSlice.get('5a8ac9ab07a57a0001be501f')
>>> data_slice.delete()

>>> import datarobot as dr
>>> ... # get project or project_id
>>> data_slices = dr.DataSlice.list(project)  # project object or project_id
>>> data_slice = data_slices[0]  # choose a data slice from the list
>>> data_slice.delete()

Return type: None

request_size(source, model=None)¶

Submits a request to validate the data slice’s filters and calculate the data slice’s number of rows on a given source

Parameters

sourceINSIGHTS_SOURCES: Subset of data (partition or “source”) on which to apply the data slice for estimating available rows.
modelOptional[Union[str, Model]]: Model object or ID of the model. It is only required when source is “training”.

Returns

status_check_jobStatusCheckJob: Object contains all needed logic for a periodical status check of an async job.

Examples

>>> import datarobot as dr
>>> ... # get project or project_id
>>> data_slices = dr.DataSlice.list(project)  # project object or project_id
>>> data_slice = data_slices[0]  # choose a data slice from the list
>>> status_check_job = data_slice.request_size("validation")

Model is required when source is ‘training’

>>> import datarobot as dr
>>> ... # get project or project_id
>>> data_slices = dr.DataSlice.list(project)  # project object or project_id
>>> data_slice = data_slices[0]  # choose a data slice from the list
>>> status_check_job = data_slice.request_size("training", model)

Return type: StatusCheckJob

get_size_info(source, model=None)¶

Get information about the data slice applied to a source

Parameters

sourceINSIGHTS_SOURCES: Source (partition or subset) to which the data slice was applied
modelOptional[Union[str, Model]]: ID for the model whose training data was sliced with this data slice. Required when the source is “training”, and not used for other sources.

Returns

slice_size_infoDataSliceSizeInfo: Information of the data slice applied to a source

Examples

>>> import datarobot as dr
>>> ...  # set up your Client
>>> data_slices = dr.DataSlice.list("646d0ea0cd8eb2355a68b0e5")
>>> data_slice = slices[0]  # can be any slice in the list
>>> data_slice_size_info = data_slice.get_size_info("validation")
>>> data_slice_size_info
DataSliceSizeInfo(
    data_slice_id=6493a1776ea78e6644382535,
    messages=[
        {
            'level': 'WARNING',
            'description': 'Low Observation Count',
            'additional_info': 'Insufficient number of observations to compute some insights.'
        }
    ],
    model_id=None,
    project_id=646d0ea0cd8eb2355a68b0e5,
    slice_size=1,
    source=validation,
)
>>> data_slice_size_info.to_dict()
{
    'data_slice_id': '6493a1776ea78e6644382535',
    'messages': [
        {
            'level': 'WARNING',
            'description': 'Low Observation Count',
            'additional_info': 'Insufficient number of observations to compute some insights.'
        }
    ],
    'model_id': None,
    'project_id': '646d0ea0cd8eb2355a68b0e5',
    'slice_size': 1,
    'source': 'validation',
}

>>> import datarobot as dr
>>> ...  # set up your Client
>>> data_slice = dr.DataSlice.get("6493a1776ea78e6644382535")
>>> data_slice_size_info = data_slice.get_size_info("validation")

When using source=’training’, the model param is required.

>>> import datarobot as dr
>>> ...  # set up your Client
>>> model = dr.Model.get(project_id, model_id)
>>> data_slice = dr.DataSlice.get("6493a1776ea78e6644382535")
>>> data_slice_size_info = data_slice.get_size_info("training", model)

>>> import datarobot as dr
>>> ...  # set up your Client
>>> data_slice = dr.DataSlice.get("6493a1776ea78e6644382535")
>>> data_slice_size_info = data_slice.get_size_info("training", model_id)

Return type: DataSliceSizeInfo

classmethod get(data_slice_id)¶

Retrieve a specific data slice.

Parameters

data_slice_idstr: The identifier of the data slice to retrieve.

Returns

data_slice: DataSlice: The required data slice.

Examples

>>> import datarobot as dr
>>> dr.DataSlice.get('648b232b9da812a6aaa0b7a9')
DataSlice(filters=[{'operand': 'binary_target', 'operator': 'eq', 'values': ['Yes']}],
          id=648b232b9da812a6aaa0b7a9,
          name=test,
          project_id=644bc575572480b565ca42cd
          )

Return type: DataSlice

class datarobot.models.data_slice.DataSliceSizeInfo(data_slice_id=None, project_id=None, source=None, slice_size=None, messages=None, model_id=None)¶

Definition of a data slice applied to a source

Attributes

data_slice_idstr: ID of the data slice
project_idstr: ID of the project
sourcestr: Data source used to calculate the number of rows (slice size) after applying the data slice’s filters
model_idstr, optional: ID of the model, required when source (subset) is ‘training’
slice_sizeint: Number of rows in the data slice for a given source
messageslist[DataSliceSizeMessageType]: List of user-relevant messages related to a data slice

Batch Job¶

class datarobot.models.batch_job.IntakeSettings(*args, **kwargs)¶: Intake settings typed dict

class datarobot.models.batch_job.OutputSettings(*args, **kwargs)¶: Output settings typed dict

Document text extraction¶

class datarobot.models.documentai.document.FeaturesWithSamples(model_id, feature_name, document_task)¶

property document_task¶: Alias for field number 2

property feature_name¶: Alias for field number 1

property model_id¶: Alias for field number 0

class datarobot.models.documentai.document.DocumentPageFile(document_page_id, project_id=None, height=0, width=0, download_link=None)¶

Page of a document as an image file.

Attributes

project_idstr: The identifier of the project which the document page belongs to.
document_page_idstr: The unqique identifer for the document page.
heightint: The height of the document thumbnail in pixels.
widthint: The width of the document thumbnail in pixels.
thumbnail_bytesbytes: Document thumbnail as bytes.
mime_typestr: Mime image type of the document thumbnail.

property thumbnail_bytes: bytes¶

Document thumbnail as bytes.

Returns

bytes: Document thumbnail.

Return type: bytes

property mime_type: str¶

Mime image type of the document thumbnail. Example: ‘image/png’

Returns

str: Mime image type of the document thumbnail.

Return type: str

class datarobot.models.documentai.document.DocumentThumbnail(project_id, document_page_id, height=0, width=0, target_value=None)¶

Thumbnail of document from the project’s dataset.

If Project.stage is datarobot.enums.PROJECT_STAGE.EDA2 and it is a supervised project then the target_* attributes of this class will have values, otherwise the values will all be None.

Attributes

document: Document: The document object.
project_idstr: The identifier of the project which the document thumbnail belongs to.
target_value: str: The target value used for filtering thumbnails.

classmethod list(project_id, feature_name, target_value=None, offset=None, limit=None)¶

Get document thumbnails from a project.

Parameters

project_idstr: The identifier of the project which the document thumbnail belongs to.
feature_namestr: The name of feature that specifies the document type.
target_valueOptional[str], default None: The target value to filter thumbnails.
offsetOptional[int], default None: The number of documents to be skipped.
limitOptional[int], default None: The number of document thumbnails to return.

Returns

documentsList[DocumentThumbnail]: A list of DocumentThumbnail objects, each representing a single document.

Notes

Actual document thumbnails are not fetched from the server by this method. Instead the data gets loaded lazily when DocumentPageFile object attributes are accessed.

Examples

Fetch document thumbnails for the given project_id and feature_name.

from datarobot._experimental.models.documentai.document import DocumentThumbnail

# Fetch five documents from the EDA SAMPLE for the specified project and specific feature
document_thumbs = DocumentThumbnail.list(project_id, feature_name, limit=5)

# Fetch five documents for the specified project with target value filtering
# This option is only available after selecting the project target and starting modeling
target1_thumbs = DocumentThumbnail.list(project_id, feature_name, target_value='target1', limit=5)

Preview the document thumbnail.

from datarobot._experimental.models.documentai.document import DocumentThumbnail
from datarobot.helpers.image_utils import get_image_from_bytes

# Fetch 3 documents
document_thumbs = DocumentThumbnail.list(project_id, feature_name, limit=3)

for doc_thumb in document_thumbs:
    thumbnail = get_image_from_bytes(doc_thumb.document.thumbnail_bytes)
    thumbnail.show()

Return type: List[DocumentThumbnail]

class datarobot.models.documentai.document.DocumentTextExtractionSample¶

Stateless class for computing and retrieving Document Text Extraction Samples.

Notes

Actual document text extraction samples are not fetched from the server in the moment of a function call. Detailed information on the documents, the pages and the rendered images of them are fetched when accessed on demand (lazy loading).

Examples

1) Compute text extraction samples for a specific model, and fetch all existing document text extraction samples for a specific project.

from datarobot._experimental.models.documentai.document import DocumentTextExtractionSample

SPECIFIC_MODEL_ID1 = "model_id1"
SPECIFIC_MODEL_ID2 = "model_id2"
SPECIFIC_PROJECT_ID = "project_id"

# Order computation of document text extraction sample for specific model.
# By default `compute` method will await for computation to end before returning
DocumentTextExtractionSample.compute(SPECIFIC_MODEL_ID1, await_completion=False)
DocumentTextExtractionSample.compute(SPECIFIC_MODEL_ID2)

samples = DocumentTextExtractionSample.list_features_with_samples(SPECIFIC_PROJECT_ID)

2) Fetch document text extraction samples for a specific model_id and feature_name, and display all document sample pages.

from datarobot._experimental.models.documentai.document import DocumentTextExtractionSample
from datarobot.helpers.image_utils import get_image_from_bytes

SPECIFIC_MODEL_ID = "model_id"
SPECIFIC_FEATURE_NAME = "feature_name"

samples = DocumentTextExtractionSample.list_pages(
    model_id=SPECIFIC_MODEL_ID,
    feature_name=SPECIFIC_FEATURE_NAME
)
for sample in samples:
    thumbnail = sample.document_page.thumbnail
    image = get_image_from_bytes(thumbnail.thumbnail_bytes)
    image.show()

3) Fetch document text extraction samples for specific model_id and feature_name and display text extraction details for the first page. This example displays the image of the document with bounding boxes of detected text lines. It also returns a list of all text lines extracted from page along with their coordinates.

from datarobot._experimental.models.documentai.document import DocumentTextExtractionSample

SPECIFIC_MODEL_ID = "model_id"
SPECIFIC_FEATURE_NAME = "feature_name"

samples = DocumentTextExtractionSample.list_pages(SPECIFIC_MODEL_ID, SPECIFIC_FEATURE_NAME)
# Draw bounding boxes for first document page sample and display related text data.
image = samples[0].get_document_page_with_text_locations()
image.show()
# For each text block represented as bounding box object drawn on original image
# display its coordinates (top, left, bottom, right) and extracted text value
for text_line in samples[0].text_lines:
    print(text_line)

classmethod compute(model_id, await_completion=True, max_wait=600)¶

Starts computation of document text extraction samples for the model and, if successful, returns computed text samples for it. This method allows calculation to continue for a specified time and, if not complete, cancels the request.

Parameters

model_id: str: The identifier of the project’s model that start the creation of the cluster insights.
await_completion: bool: Determines whether the method should wait for completion before exiting or not.
max_wait: int (default=600): The maximum number of seconds to wait for the request to finish before raising an AsyncTimeoutError.

Raises

ClientError: Server rejected creation due to client error. Often, a bad model_id is causing these errors.
AsyncFailureError: Indicates whether any of the responses from the server are unexpected.
AsyncProcessUnsuccessfulError: Indicates whether the cluster insights computation failed or was cancelled.
AsyncTimeoutError: Indicates whether the cluster insights computation did not resolve within the specified time limit (max_wait).

Return type: None

classmethod list_features_with_samples(project_id)¶

Returns a list of features, model_id pairs with computed document text extraction samples.

Parameters

project_id: str: The project ID to retrieve the list of computed samples for.

Returns

List[FeaturesWithSamples]

Return type: List[FeaturesWithSamples]

classmethod list_pages(model_id, feature_name, document_index=None, document_task=None)¶

Returns a list of document text extraction sample pages.

Parameters

model_id: str: The model identifier.
feature_name: str: The specific feature name to retrieve.
document_index: Optional[int]: The specific document index to retrieve. Defaults to None.
document_task: Optional[str]: The document blueprint task.

Returns

List[DocumentTextExtractionSamplePage]

Return type: List[DocumentTextExtractionSamplePage]

classmethod list_documents(model_id, feature_name)¶

Returns a list of documents used for text extraction.

Parameters

model_id: str: The model identifier.
feature_name: str: The feature name.

Returns

List[DocumentTextExtractionSampleDocument]

Return type: List[DocumentTextExtractionSampleDocument]

class datarobot.models.documentai.document.DocumentTextExtractionSampleDocument(document_index, feature_name, thumbnail_id, thumbnail_width, thumbnail_height, thumbnail_link, document_task, actual_target_value=None, prediction=None)¶

Document text extraction source.

Holds data that contains feature and model prediction values, as well as the thumbnail of the document.

Attributes

document_index: int: The index of the document page sample.
feature_name: str: The name of the feature that the document text extraction sample is related to.
thumbnail_id: str: The document page ID.
thumbnail_width: int: The thumbnail image width.
thumbnail_height: int: The thumbnail image height.
thumbnail_link: str: The thumbnail image download link.
document_task: str: The document blueprint task that the document belongs to.
actual_target_value: Optional[Union[str, int, List[str]]]: The actual target value.
prediction: Optional[PredictionType]: Prediction values and labels.

classmethod list(model_id, feature_name, document_task=None)¶

List available documents with document text extraction samples.

Parameters

model_id: str: The identifier for the model.
feature_name: str: The name of the feature,
document_task: Optional[str]: The document blueprint task.

Returns

List[DocumentTextExtractionSampleDocument]

Return type: List[DocumentTextExtractionSampleDocument]

class datarobot.models.documentai.document.DocumentTextExtractionSamplePage(page_index, document_index, feature_name, document_page_id, document_page_width, document_page_height, document_page_link, text_lines, document_task, actual_target_value=None, prediction=None)¶

Document text extraction sample covering one document page.

Holds data about the document page, the recognized text, and the location of the text in the document page.

Attributes

page_index: int: Index of the page inside the document
document_index: int: Index of the document inside the dataset
feature_name: str: The name of the feature that the document text extraction sample belongs to.
document_page_id: str: The document page ID.
document_page_width: int: Document page width.
document_page_height: int: Document page height.
document_page_link: str: Document page link to download the document page image.
text_lines: List[Dict[str, Union[int, str]]]: A list of text lines and their coordinates.
document_task: str: The document blueprint task that the page belongs to.
actual_target_value: Optional[Union[str, int, List[str]]: Actual target value.
prediction: Optional[PredictionType]: Prediction values and labels.

classmethod list(model_id, feature_name, document_index=None, document_task=None)¶

Returns a list of document text extraction sample pages.

Parameters

model_id: str: The model identifier, used to retrieve document text extraction page samples.
feature_name: str: The feature name, used to retrieve document text extraction page samples.
document_index: Optional[int]: The specific document index to retrieve. Defaults to None.
document_task: Optional[str]: Document blueprint task.

Returns

List[DocumentTextExtractionSamplePage]

Return type: List[DocumentTextExtractionSamplePage]

get_document_page_with_text_locations(line_color='blue', line_width=3, padding=3)¶

Returns the document page with bounding boxes drawn around the text lines as a PIL.Image.

Parameters

line_color: str: The color used to draw a bounding box on the image page. Defaults to blue.
line_width: int: The line width of the bounding boxes that will be drawn. Defaults to 3.
padding: int: The additional space left between the text and the bounding box, measured in pixels. Defaults to 3.

Returns

Image: Returns a PIL.Image with drawn text-bounding boxes.

Return type: Image