API Reference

API Object

class datarobot.models.api_object.APIObject
classmethod from_data(data)

Instantiate an object of this class using a dict.

Parameters
data: dict

Correctly snake_cased keys and their values.

Return type

TypeVar(T, bound= APIObject)

classmethod from_server_data(data, keep_attrs=None)

Instantiate an object of this class using the data directly from the server, meaning that the keys may have the wrong camel casing.

Parameters
data: dict

The directly translated dict of JSON from the server. No casing fixes have taken place.

keep_attrs: iterable

List, set or tuple of the dotted namespace notations for attributes to keep within the object structure even if their values are None.

Return type

TypeVar(T, bound= APIObject)
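To illustrate what "casing fixes" means for from_server_data, the following is a minimal sketch of converting camelCase server keys to the snake_case keys that from_data expects. The helpers camel_to_snake and snake_case_keys are hypothetical names introduced for this example; they are not part of the datarobot package, and the library's own conversion may differ in detail.

```python
import re


def camel_to_snake(name):
    """Convert a camelCase key such as 'projectId' to 'project_id'."""
    # Insert an underscore before each capital letter that follows a
    # lowercase letter or digit, then lowercase the whole string.
    return re.sub(r"(?<=[a-z0-9])([A-Z])", r"_\1", name).lower()


def snake_case_keys(data):
    """Recursively rename dict keys; lists and scalar values pass through."""
    if isinstance(data, dict):
        return {camel_to_snake(k): snake_case_keys(v) for k, v in data.items()}
    if isinstance(data, list):
        return [snake_case_keys(item) for item in data]
    return data


# Made-up payload shaped like a server response (illustrative values only).
server_payload = {"projectId": "abc123", "modelInfo": {"sampleSize": 100}}
converted = snake_case_keys(server_payload)
```

After this conversion, a dict like converted has the "correctly snake_cased keys" that from_data describes.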

Advanced Options

class datarobot.helpers.AdvancedOptions(weights=None, response_cap=None, blueprint_threshold=None, seed=None, smart_downsampled=None, majority_downsampling_rate=None, offset=None, exposure=None, accuracy_optimized_mb=None, scaleout_modeling_mode=None, events_count=None, monotonic_increasing_featurelist_id=None, monotonic_decreasing_featurelist_id=None, only_include_monotonic_blueprints=None, allowed_pairwise_interaction_groups=None, blend_best_models=None, scoring_code_only=None, prepare_model_for_deployment=None, consider_blenders_in_recommendation=None, min_secondary_validation_model_count=None, shap_only_mode=None, autopilot_data_sampling_method=None, run_leakage_removed_feature_list=None, autopilot_with_feature_discovery=False, feature_discovery_supervised_feature_reduction=None, exponentially_weighted_moving_alpha=None, external_time_series_baseline_dataset_id=None, use_supervised_feature_reduction=True, primary_location_column=None, protected_features=None, preferable_target_value=None, fairness_metrics_set=None, fairness_threshold=None, bias_mitigation_feature_name=None, bias_mitigation_technique=None, include_bias_mitigation_feature_as_predictor_variable=None, default_monotonic_increasing_featurelist_id=None, default_monotonic_decreasing_featurelist_id=None, model_group_id=None, model_regime_id=None, model_baselines=None, incremental_learning_only_mode=None, incremental_learning_on_best_model=None, chunk_definition_id=None, incremental_learning_early_stopping_rounds=None)

Used when setting the target of a project to set advanced options of modeling process.

Parameters
weights: string, optional

The name of a column indicating the weight of each row.

response_cap: bool or float in [0.5, 1), optional

Defaults to None here, but server defaults to False. If specified, it is the quantile of the response distribution to use for response capping.

blueprint_threshold: int, optional

Number of hours models are permitted to run before being excluded from later autopilot stages. Minimum 1.

seed: int, optional

A seed to use for randomization.

smart_downsampled: bool, optional

Whether to use smart downsampling to throw away excess rows of the majority class. Only applicable to classification and zero-boosted regression projects.

majority_downsampling_rate: float, optional

The percentage between 0 and 100 of the majority rows that should be kept. Specify only if using smart downsampling. May not cause the majority class to become smaller than the minority class.

offset: list of str, optional

(New in version v2.6) The list of the names of the columns containing the offset of each row.

exposure: string, optional

(New in version v2.6) The name of a column containing the exposure of each row.

accuracy_optimized_mb: bool, optional

(New in version v2.6) Include additional, longer-running models that will be run by the autopilot and available to run manually.

scaleout_modeling_mode: string, optional

(Deprecated in 2.28. Will be removed in 2.30) DataRobot no longer supports scaleout models. Please remove any usage of this parameter as it will be removed from the API soon.

events_count: string, optional

(New in version v2.8) The name of a column specifying events count.

monotonic_increasing_featurelist_id: string, optional

(New in version v2.11) The id of the featurelist that defines the set of features with a monotonically increasing relationship to the target. If None, no such constraints are enforced. When specified, this will set a default for the project that can be overridden at model submission time if desired.

monotonic_decreasing_featurelist_id: string, optional

(New in version v2.11) The id of the featurelist that defines the set of features with a monotonically decreasing relationship to the target. If None, no such constraints are enforced. When specified, this will set a default for the project that can be overridden at model submission time if desired.

only_include_monotonic_blueprints: bool, optional

(New in version v2.11) When true, only blueprints that support enforcing monotonic constraints will be available in the project or selected for the autopilot.

allowed_pairwise_interaction_groups: list of tuple, optional

(New in version v2.19) For GA2M models - specify groups of columns for which pairwise interactions will be allowed. E.g. if set to [(A, B, C), (C, D)] then GA2M models will allow interactions between columns A x B, B x C, A x C, C x D. All others (A x D, B x D) will not be considered.

blend_best_models: bool, optional

(New in version v2.19) blend best models during Autopilot run.

scoring_code_only: bool, optional

(New in version v2.19) Keep only models that can be converted to scorable java code during Autopilot run

shap_only_mode: bool, optional

(New in version v2.21) Keep only models that support SHAP values during Autopilot run. Use SHAP-based insights wherever possible. Defaults to False.

prepare_model_for_deployment: bool, optional

(New in version v2.19) Prepare model for deployment during Autopilot run. The preparation includes creating reduced feature list models, retraining best model on higher sample size, computing insights and assigning “RECOMMENDED FOR DEPLOYMENT” label.

consider_blenders_in_recommendation: bool, optional

(New in version 2.22.0) Include blenders when selecting a model to prepare for deployment in an Autopilot Run. Defaults to False.

min_secondary_validation_model_count: int, optional

(New in version v2.19) Compute “All backtest” scores (datetime models) or cross validation scores for the specified number of the highest ranking models on the Leaderboard, if over the Autopilot default.

autopilot_data_sampling_method: str, optional

(New in version v2.23) one of datarobot.enums.DATETIME_AUTOPILOT_DATA_SAMPLING_METHOD. Applicable for OTV projects only, defines if autopilot uses “random” or “latest” sampling when iteratively building models on various training samples. Defaults to “random” for duration-based projects and to “latest” for row-based projects.

run_leakage_removed_feature_list: bool, optional

(New in version v2.23) Run Autopilot on Leakage Removed feature list (if exists).

autopilot_with_feature_discovery: bool, default ``False``, optional

(New in version v2.23) If true, autopilot will run on a feature list that includes features found via search for interactions.

feature_discovery_supervised_feature_reduction: bool, optional

(New in version v2.23) Run supervised feature reduction for feature discovery projects.

exponentially_weighted_moving_alpha: float, optional

(New in version v2.26) defaults to None, value between 0 and 1 (inclusive), indicates alpha parameter used in exponentially weighted moving average within feature derivation window.

external_time_series_baseline_dataset_id: str, optional

(New in version v2.26) If provided, will generate metrics scaled by external model predictions metric for time series projects. The external predictions catalog must be validated before autopilot starts, see Project.validate_external_time_series_baseline and external baseline predictions documentation for further explanation.

use_supervised_feature_reduction: bool, default ``True``, optional

Time Series only. When true, during feature generation DataRobot runs a supervised algorithm to retain only qualifying features. Setting to false can severely impact autopilot duration, especially for datasets with many features.

primary_location_column: str, optional

The name of primary location column.

protected_features: list of str, optional

(New in version v2.24) A list of project features to mark as protected for Bias and Fairness testing calculations. Max number of protected features allowed is 10.

preferable_target_value: str, optional

(New in version v2.24) A target value that should be treated as a favorable outcome for the prediction. For example, if we want to check gender discrimination for giving a loan and our target is named is_bad, then the positive outcome for the prediction would be No, which means that the loan is good and that’s what we treat as a favorable result for the loaner.

fairness_metrics_set: str, optional

(New in version v2.24) Metric to use for calculating fairness. Can be one of proportionalParity, equalParity, predictionBalance, trueFavorableAndUnfavorableRateParity or favorableAndUnfavorablePredictiveValueParity. Used and required only if Bias & Fairness in AutoML feature is enabled.

fairness_threshold: str, optional

(New in version v2.24) Threshold value for the fairness metric. Can be in a range of [0.0, 1.0]. If the relative (i.e. normalized) fairness score is below the threshold, then the user will see a visual indication on the

bias_mitigation_feature_name: str, optional

The feature from protected features that will be used in a bias mitigation task to mitigate bias.

bias_mitigation_technique: str, optional

One of datarobot.enums.BiasMitigationTechnique. Options:

  • ‘preprocessingReweighing’

  • ‘postProcessingRejectionOptionBasedClassification’

The technique by which we’ll mitigate bias, which will inform which bias mitigation task we insert into blueprints.

include_bias_mitigation_feature_as_predictor_variable: bool, optional

Whether we should also use the mitigation feature as an input to the modeler just like any other categorical used for training, i.e. do we want the model to “train on” this feature in addition to using it for bias mitigation.

default_monotonic_increasing_featurelist_id: str, optional

Returned from server on Project GET request - not able to be updated by user.

default_monotonic_decreasing_featurelist_id: str, optional

Returned from server on Project GET request - not able to be updated by user.

model_group_id: str, optional

(New in version v3.3) The name of a column containing the model group ID for each row.

model_regime_id: str, optional

(New in version v3.3) The name of a column containing the model regime ID for each row.

model_baselines: list of str, optional

(New in version v3.3) The list of the names of the columns containing the model baselines for each row.

incremental_learning_only_mode: bool, optional

(New in version v3.4) Keep only models that support incremental learning during Autopilot run.

incremental_learning_on_best_model: bool, optional

(New in version v3.4) Run incremental learning on the best model during Autopilot run.

chunk_definition_id: string, optional

(New in version v3.4) Unique definition for chunks needed to run automated incremental learning.

incremental_learning_early_stopping_rounds: int, optional

(New in version v3.4) Early stopping rounds used in the automated incremental learning service.
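The allowed_pairwise_interaction_groups description above gives the example [(A, B, C), (C, D)]. As a rough sketch of how such groups expand into permitted column pairs, the snippet below enumerates every pair inside each group. The function allowed_pairs is a hypothetical helper written for this illustration; it is not part of the datarobot package and says nothing about how the server evaluates the groups internally.

```python
from itertools import combinations


def allowed_pairs(groups):
    """Return the set of column pairs permitted by the given groups."""
    pairs = set()
    for group in groups:
        # Every pair of columns within one group may interact.
        for a, b in combinations(group, 2):
            pairs.add(frozenset((a, b)))
    return pairs


groups = [("A", "B", "C"), ("C", "D")]
pairs = allowed_pairs(groups)

# A x B, A x C, B x C and C x D are allowed; A x D and B x D are not.
a_d_allowed = frozenset(("A", "D")) in pairs
```

Using frozenset for each pair makes the check independent of column order, so ("B", "A") and ("A", "B") count as the same interaction.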

Examples

import datarobot as dr
advanced_options = dr.AdvancedOptions(
    weights='weights_column',
    offset=['offset_column'],
    exposure='exposure_column',
    response_cap=0.7,
    blueprint_threshold=2,
    smart_downsampled=True, majority_downsampling_rate=75.0)
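The example above keeps 75% of the majority rows. The parameter description says the rate must lie between 0 and 100 and may not make the majority class smaller than the minority class. The sketch below checks a candidate rate against those two stated rules for given class counts. The helper names and the row counts are assumptions made for illustration; the server's exact validation (rounding, boundary handling) is not documented here and may differ.

```python
def smallest_valid_rate(majority_rows, minority_rows):
    """Smallest percentage of majority rows that keeps the majority class
    at least as large as the minority class."""
    return 100.0 * minority_rows / majority_rows


def is_valid_rate(rate, majority_rows, minority_rows):
    """Check a rate against both constraints stated in the description."""
    kept = majority_rows * rate / 100.0
    return 0 <= rate <= 100 and kept >= minority_rows


# Hypothetical class counts for an imbalanced classification dataset.
majority_rows, minority_rows = 9000, 1000

ok_75 = is_valid_rate(75.0, majority_rows, minority_rows)   # keeps 6750 rows
ok_5 = is_valid_rate(5.0, majority_rows, minority_rows)     # keeps only 450 rows
```

With these counts, 75.0 passes and 5.0 fails because 450 majority rows would be fewer than the 1000 minority rows.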

get(_AdvancedOptions__key, _AdvancedOptions__default=None)

Return the value for key if key is in the dictionary, else default.

Return type

Optional[Any]

pop(_AdvancedOptions__key)

Remove key and return its value. If key is not found, d is returned if given; otherwise KeyError is raised.

Return type

Optional[Any]

update_individual_options(**kwargs)

Update individual attributes of an instance of AdvancedOptions.

Return type

None

Anomaly Assessment

class datarobot.models.anomaly_assessment.AnomalyAssessmentRecord(status, status_details, start_date, end_date, prediction_threshold, preview_location, delete_location, latest_explanations_location, **record_kwargs)

Object which keeps metadata about the anomaly assessment insight for a particular subset, backtest and series, along with the links used to retrieve the anomaly assessment data.

New in version v2.25.

Notes

Record contains:

  • record_id : the ID of the record.

  • project_id : the project ID of the record.

  • model_id : the model ID of the record.

  • backtest : the backtest of the record.

  • source : the source of the record.

  • series_id : the series id of the record for the multiseries projects.

  • status : the status of the insight.

  • status_details : the explanation of the status.

  • start_date : the ISO-formatted timestamp of the first prediction in the subset. Will be None if status is not AnomalyAssessmentStatus.COMPLETED.

  • end_date : the ISO-formatted timestamp of the last prediction in the subset. Will be None if status is not AnomalyAssessmentStatus.COMPLETED.

  • prediction_threshold : the threshold; all rows with anomaly scores greater than or equal to it have shap explanations computed. Will be None if status is not AnomalyAssessmentStatus.COMPLETED.

  • preview_location : URL to retrieve predictions preview for the subset. Will be None if status is not AnomalyAssessmentStatus.COMPLETED.

  • latest_explanations_location : the URL to retrieve the latest predictions with the shap explanations. Will be None if status is not AnomalyAssessmentStatus.COMPLETED.

  • delete_location : the URL to delete anomaly assessment record and relevant insight data.

Attributes
record_id: str

The ID of the record.

project_id: str

The ID of the project the record belongs to.

model_id: str

The ID of the model the record belongs to.

backtest: int or “holdout”

The backtest of the record.

source: “training” or “validation”

The source of the record.

series_id: str or None

The series id of the record for the multiseries projects. Defined only for the multiseries projects.

status: str

The status of the insight. One of datarobot.enums.AnomalyAssessmentStatus.

status_details: str

The explanation of the status.

start_date: str or None

See start_date info in Notes for more details.

end_date: str or None

See end_date info in Notes for more details.

prediction_threshold: float or None

See prediction_threshold info in Notes for more details.

preview_location: str or None

See preview_location info in Notes for more details.

latest_explanations_location: str or None

See latest_explanations_location info in Notes for more details.

delete_location: str

The URL to delete anomaly assessment record and relevant insight data.

classmethod list(project_id, model_id, backtest=None, source=None, series_id=None, limit=100, offset=0, with_data_only=False)

Retrieve the list of the anomaly assessment records for the project and model. Output can be filtered and limited.

Parameters
project_id: str

The ID of the project the record belongs to.

model_id: str

The ID of the model the record belongs to.

backtest: int or “holdout”

The backtest to filter records by.

source: “training” or “validation”

The source to filter records by.

series_id: str, optional

The series id to filter records by. Can be specified for multiseries projects.

limit: int, optional

100 by default. At most this many results are returned.

offset: int, optional

This many results will be skipped.

with_data_only: bool, False by default

Filter by status == AnomalyAssessmentStatus.COMPLETED. If True, records with no data or not supported will be omitted.

Returns
List[AnomalyAssessmentRecord]

The anomaly assessment records matching the filters.

Return type

List[AnomalyAssessmentRecord]
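Because list returns at most limit results per call and skips offset results, retrieving everything means requesting consecutive pages. The following is a generic sketch of that loop. The name iter_all and the fake_fetch stand-in are hypothetical and exist only for this example; fake_fetch slices a local list instead of contacting a server.

```python
def iter_all(fetch_page, page_size=100):
    """Yield every item by requesting consecutive pages.

    fetch_page is any callable accepting limit and offset keyword
    arguments and returning a list, mirroring the list() signature above.
    """
    offset = 0
    while True:
        page = fetch_page(limit=page_size, offset=offset)
        yield from page
        if len(page) < page_size:
            # A short (or empty) page means there is nothing left.
            break
        offset += page_size


# Stand-in for a server call: slices a local list of 250 fake records.
fake_records = list(range(250))


def fake_fetch(limit, offset):
    return fake_records[offset:offset + limit]


all_records = list(iter_all(fake_fetch))
```

With the real client, fetch_page could presumably be a small wrapper such as lambda limit, offset: AnomalyAssessmentRecord.list(project_id, model_id, limit=limit, offset=offset); that usage follows the signature shown above but is not tested here.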

classmethod compute(project_id, model_id, backtest, source, series_id=None)

Request anomaly assessment insight computation on the specified subset.

Parameters
project_id: str

The ID of the project to compute insight for.

model_id: str

The ID of the model to compute insight for.

backtest: int or “holdout”

The backtest to compute insight for.

source: “training” or “validation”

The source to compute insight for.

series_id: str, optional

The series id to compute insight for. Required for multiseries projects.

Returns
AnomalyAssessmentRecord

The anomaly assessment record.

Return type

AnomalyAssessmentRecord

delete()

Delete anomaly assessment record with preview and explanations.

Return type

None

get_predictions_preview()

Retrieve aggregated predictions statistics for the anomaly assessment record.

Returns
AnomalyAssessmentPredictionsPreview
Return type

AnomalyAssessmentPredictionsPreview

get_latest_explanations()

Retrieve latest predictions along with shap explanations for the most anomalous records.

Returns
AnomalyAssessmentExplanations
Return type

AnomalyAssessmentExplanations

get_explanations(start_date=None, end_date=None, points_count=None)

Retrieve predictions along with shap explanations for the most anomalous records in the specified date range or for a defined number of points. Two of the three parameters (start_date, end_date, points_count) must be specified.

Parameters
start_date: str, optional

The start of the date range to get explanations in. Example: 2020-01-01T00:00:00.000000Z

end_date: str, optional

The end of the date range to get explanations in. Example: 2020-10-01T00:00:00.000000Z

points_count: int, optional

The number of the rows to return.

Returns
AnomalyAssessmentExplanations
Return type

AnomalyAssessmentExplanations
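The "two of three" rule for get_explanations can be checked on the caller's side before making a request. The sketch below does that under the assumption that the rule means exactly two arguments; how the client or server reacts when all three are supplied is not stated above. The function check_range_arguments is a hypothetical helper, not part of the datarobot package.

```python
def check_range_arguments(start_date=None, end_date=None, points_count=None):
    """Raise ValueError unless exactly two of the three arguments are given."""
    provided = [
        name
        for name, value in (
            ("start_date", start_date),
            ("end_date", end_date),
            ("points_count", points_count),
        )
        if value is not None
    ]
    if len(provided) != 2:
        raise ValueError(
            "Specify exactly two of start_date, end_date, points_count; "
            f"got {provided or 'none'}"
        )
    return provided


# A start date plus a number of points is one valid combination.
chosen = check_range_arguments(
    start_date="2020-01-01T00:00:00.000000Z", points_count=10
)
```

The valid combinations are therefore a full date range, a start date with a point count, or an end date with a point count.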

get_explanations_data_in_regions(regions, prediction_threshold=0.0)

Get predictions along with explanations for the specified regions, sorted by predictions in descending order.

Parameters
regions: list of preview_bins

For each region explanations will be retrieved and merged.

prediction_threshold: float, optional

If specified, only points with score greater than or equal to the threshold will be returned.

Returns
dict in a form of {‘explanations’: explanations, ‘shap_base_value’: shap_base_value}
Return type

RegionExplanationsData

class datarobot.models.anomaly_assessment.AnomalyAssessmentExplanations(shap_base_value, data, start_date, end_date, count, **record_kwargs)

Object which keeps predictions along with shap explanations for the most anomalous records in the specified date range/for defined number of points.

New in version v2.25.

Notes

AnomalyAssessmentExplanations contains:

  • record_id : the id of the corresponding anomaly assessment record.

  • project_id : the project ID of the corresponding anomaly assessment record.

  • model_id : the model ID of the corresponding anomaly assessment record.

  • backtest : the backtest of the corresponding anomaly assessment record.

  • source : the source of the corresponding anomaly assessment record.

  • series_id : the series id of the corresponding anomaly assessment record for the multiseries projects.

  • start_date : the ISO-formatted first timestamp in the response. Will be None if there is no data in the specified range.

  • end_date : the ISO-formatted last timestamp in the response. Will be None if there is no data in the specified range.

  • count : The number of points in the response.

  • shap_base_value : the shap base value.

  • data : list of DataPoint objects in the specified date range.

DataPoint contains:

  • shap_explanation : None or an array of up to 10 ShapleyFeatureContribution objects. Only rows with the highest anomaly scores have Shapley explanations calculated. Value is None if prediction is lower than prediction_threshold.

  • timestamp (str) : ISO-formatted timestamp for the row.

  • prediction (float) : The output of the model for this row.

ShapleyFeatureContribution contains:

  • feature_value (str) : the feature value for this row. First 50 characters are returned.

  • strength (float) : the shap value for this feature and row.

  • feature (str) : the feature name.
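To make the nesting of data, DataPoint and ShapleyFeatureContribution concrete, the sketch below models two data points as plain dicts and ranks the contributions of an explained row by absolute strength. All values are made up for illustration, and plain dicts are an assumption: the client may expose these structures as dicts or as typed objects. The helper top_contributions is hypothetical.

```python
# Made-up data points shaped like the structures described above.
data_points = [
    {
        "timestamp": "2020-01-01T00:00:00.000000Z",
        "prediction": 0.91,
        "shap_explanation": [
            {"feature": "temperature", "feature_value": "104.2", "strength": 0.40},
            {"feature": "pressure", "feature_value": "1.1", "strength": -0.05},
        ],
    },
    {
        "timestamp": "2020-01-02T00:00:00.000000Z",
        "prediction": 0.12,
        "shap_explanation": None,  # prediction is below prediction_threshold
    },
]


def top_contributions(point, n=3):
    """Return up to n (feature, strength) pairs, largest |strength| first."""
    explanation = point["shap_explanation"] or []
    ranked = sorted(explanation, key=lambda c: abs(c["strength"]), reverse=True)
    return [(c["feature"], c["strength"]) for c in ranked[:n]]


# Only rows with computed explanations are worth inspecting further.
explained = [p for p in data_points if p["shap_explanation"] is not None]
```

Treating a None explanation as an empty list keeps the helper safe to call on every row, including those below the threshold.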

Attributes
record_id: str

The ID of the record.

project_id: str

The ID of the project the record belongs to.

model_id: str

The ID of the model the record belongs to.

backtest: int or “holdout”

The backtest of the record.

source: “training” or “validation”

The source of the record.

series_id: str or None

The series id of the record for the multiseries projects. Defined only for the multiseries projects.

start_date: str or None

The ISO-formatted datetime of the first row in the data.

end_date: str or None

The ISO-formatted datetime of the last row in the data.

data: array of `data_point` objects or None

See data info in Notes for more details.

shap_base_value: float

Shap base value.

count: int

The number of points in the data.

classmethod get(project_id, record_id, start_date=None, end_date=None, points_count=None)

Retrieve predictions along with shap explanations for the most anomalous records in the specified date range or for a defined number of points. Two of the three parameters (start_date, end_date, points_count) must be specified.

Parameters
project_id: str

The ID of the project.

record_id: str

The ID of the anomaly assessment record.

start_date: str, optional

The start of the date range to get explanations in. Example: 2020-01-01T00:00:00.000000Z

end_date: str, optional

The end of the date range to get explanations in. Example: 2020-10-01T00:00:00.000000Z

points_count: int, optional

The number of the rows to return.

Returns
AnomalyAssessmentExplanations
Return type

AnomalyAssessmentExplanations

class datarobot.models.anomaly_assessment.AnomalyAssessmentPredictionsPreview(start_date, end_date, preview_bins, **record_kwargs)

Aggregated predictions over time for the corresponding anomaly assessment record. Intended to find the bins with highest anomaly scores.

New in version v2.25.

Notes

AnomalyAssessmentPredictionsPreview contains:

  • record_id : the id of the corresponding anomaly assessment record.

  • project_id : the project ID of the corresponding anomaly assessment record.

  • model_id : the model ID of the corresponding anomaly assessment record.

  • backtest : the backtest of the corresponding anomaly assessment record.

  • source : the source of the corresponding anomaly assessment record.

  • series_id : the series id of the corresponding anomaly assessment record for the multiseries projects.

  • start_date : the ISO-formatted timestamp of the first prediction in the subset.

  • end_date : the ISO-formatted timestamp of the last prediction in the subset.

  • preview_bins : list of PreviewBin objects. The aggregated predictions for the subset. Bins boundaries may differ from actual start/end dates because this is an aggregation.

PreviewBin contains:

  • start_date (str) : the ISO-formatted datetime of the start of the bin.

  • end_date (str) : the ISO-formatted datetime of the end of the bin.

  • avg_predicted (float or None) : the average prediction of the model in the bin. None if there are no entries in the bin.

  • max_predicted (float or None) : the maximum prediction of the model in the bin. None if there are no entries in the bin.

  • frequency (int) : the number of the rows in the bin.

Attributes
record_id: str

The ID of the record.

project_id: str

The ID of the project the record belongs to.

model_id: str

The ID of the model the record belongs to.

backtest: int or “holdout”

The backtest of the record.

source: “training” or “validation”

The source of the record.

series_id: str or None

The series id of the record for the multiseries projects. Defined only for the multiseries projects.

start_date: str

The ISO-formatted timestamp of the first prediction in the subset.

end_date: str

The ISO-formatted timestamp of the last prediction in the subset.

preview_bins: list of preview_bin objects

The aggregated predictions for the subset. See more info in Notes.

classmethod get(project_id, record_id)

Retrieve aggregated predictions over time.

Parameters
project_id: str

The ID of the project.

record_id: str

The ID of the anomaly assessment record.

Returns
AnomalyAssessmentPredictionsPreview
Return type

AnomalyAssessmentPredictionsPreview

find_anomalous_regions(max_prediction_threshold=0.0)

Select the preview bins whose max_predicted value is greater than or equal to max_prediction_threshold, and sort the result by max_predicted in descending order.

Parameters
max_prediction_threshold: float, optional

Return bins with maximum anomaly score greater than or equal to max_prediction_threshold.

Returns
preview_bins: list of preview_bin

Filtered and sorted preview bins

Return type

List[AnomalyAssessmentPreviewBin]
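The filter-then-sort behaviour described for find_anomalous_regions can be sketched on plain dicts as follows. This is an illustration written from the description above, not the library's implementation. Skipping bins whose max_predicted is None is an assumption, and the bin values are made up.

```python
def find_anomalous_regions(preview_bins, max_prediction_threshold=0.0):
    """Keep bins whose max_predicted meets the threshold, highest first."""
    kept = [
        b
        for b in preview_bins
        # Empty bins have max_predicted set to None and are skipped here.
        if b["max_predicted"] is not None
        and b["max_predicted"] >= max_prediction_threshold
    ]
    return sorted(kept, key=lambda b: b["max_predicted"], reverse=True)


# Made-up preview bins; only the fields used by the sketch are filled in.
bins = [
    {"start_date": "2020-01-01T00:00:00.000000Z", "max_predicted": 0.4, "frequency": 24},
    {"start_date": "2020-01-02T00:00:00.000000Z", "max_predicted": 0.9, "frequency": 24},
    {"start_date": "2020-01-03T00:00:00.000000Z", "max_predicted": None, "frequency": 0},
    {"start_date": "2020-01-04T00:00:00.000000Z", "max_predicted": 0.5, "frequency": 24},
]

regions = find_anomalous_regions(bins, max_prediction_threshold=0.5)
```

Per the reference above, the bins returned by the real method are the kind of regions argument that get_explanations_data_in_regions accepts.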

Application

class datarobot.Application(id, application_type_id, user_id, model_deployment_id, name, created_by, created_at, updated_at, datasets, cloud_provider, deployment_ids, pool_used, permissions, has_custom_logo, org_id, deployment_status_id=None, description=None, related_entities=None, application_template_type=None, deployment_name=None, deactivation_status_id=None, created_first_name=None, creator_last_name=None, creator_userhash=None, deployments=None)

An entity associated with a DataRobot Application.

Attributes
id: str

The ID of the created application.

application_type_id: str

The ID of the type of the application.

user_id: str

The ID of the user who created the application.

model_deployment_id: str

The ID of the associated model deployment.

deactivation_status_id: str or None

The ID of the status object to track the asynchronous app deactivation process status. Will be None if the app was never deactivated.

name: str

The name of the application.

created_by: str

The username of the user who created the application.

created_at: str

The timestamp when the application was created.

updated_at: str

The timestamp when the application was updated.

datasets: List[str]

The list of dataset IDs associated with the application.

creator_first_name: Optional[str]

Application creator first name. Optional.

creator_last_name: Optional[str]

Application creator last name. Optional.

creator_userhash: Optional[str]

Application creator userhash. Optional.

deployment_status_id: str

The ID of the status object to track the asynchronous deployment process status.

description: str

A description of the application.

cloud_provider: str

The host of this application.

deployments: Optional[List[ApplicationDeployment]]

A list of deployment details. Optional.

deployment_ids: List[str]

A list of deployment IDs for this app.

deployment_name: Optional[str]

Name of the deployment. Optional.

application_template_type: Optional[str]

Application template type, purpose. Optional.

pool_used: bool

Whether the pool was used for the last app deployment.

permissions: List[str]

The list of permitted actions that the authenticated user can perform on this application. Permissions should be ApplicationPermission options.

has_custom_logo: bool

Whether the app has a custom logo.

related_entities: Optional[ApplcationRelatedEntity]

IDs of entities related to the app, for easy search.

org_id: str

ID of the app’s organization.

classmethod list(offset=None, limit=None, use_cases=None)

Retrieve a list of user applications.

Parameters
offset: Optional[int]

Optional. Retrieve applications in a list after this number.

limit: Optional[int]

Optional. Retrieve only this number of applications.

use_cases: Optional[Union[UseCase, List[UseCase], str, List[str]]]

Optional. Filter available Applications by a specific Use Case or Use Cases. Accepts either the entity or the ID. If set to [None], the method filters the application’s datasets by those not linked to a UseCase.

Returns
applications: List[Application]

The requested list of user applications.

Return type

List[Application]

classmethod get(application_id)

Retrieve a single application.

Parameters
application_id: str

The ID of the application to retrieve.

Returns
application: Application

The requested application.

Return type

Application

Batch Predictions

class datarobot.models.BatchPredictionJob(data, completed_resource_url=None)

A Batch Prediction Job is used to score large data sets on prediction servers using the Batch Prediction API.

Attributes
id: str

the id of the job

classmethod score(deployment, intake_settings=None, output_settings=None, csv_settings=None, timeseries_settings=None, num_concurrent=None, chunk_size=None, passthrough_columns=None, passthrough_columns_set=None, max_explanations=None, max_ngram_explanations=None, explanation_algorithm=None, threshold_high=None, threshold_low=None, prediction_threshold=None, prediction_warning_enabled=None, include_prediction_status=False, skip_drift_tracking=False, prediction_instance=None, abort_on_error=True, column_names_remapping=None, include_probabilities=True, include_probabilities_classes=None, download_timeout=120, download_read_timeout=660, upload_read_timeout=600, explanations_mode=None)

Create new batch prediction job, upload the scoring dataset and return a batch prediction job.

The default intake and output options are both localFile which requires the caller to pass the file parameter and either download the results using the download() method afterwards or pass a path to a file where the scored data will be downloaded to afterwards.

Returns
BatchPredictionJob

Instance of BatchPredictionJob

Parameters
deployment: Deployment or string ID

Deployment which will be used for scoring.

intake_settings: dict (optional)

A dict configuring where data is coming from. Supported options:

  • type : string, either localFile, s3, azure, gcp, dataset, jdbc, snowflake, synapse or bigquery

Note that to pass a dataset, you not only need to specify the type parameter as dataset, but you must also set the dataset parameter as a dr.Dataset object.

To score from a local file, add this parameter to the settings:

  • file : file-like object, string path to file or a pandas.DataFrame of scoring data

To score from S3, add the following parameters to the settings:

  • url : string, the URL to score (e.g.: s3://bucket/key)

  • credential_id : string (optional)

  • endpoint_url : string (optional), any non-default endpoint URL for S3 access (omit to use the default)

To score from JDBC, add the following parameters to the settings:

  • data_store_id : string, the ID of the external data store connected to the JDBC data source (see Database Connectivity).

  • query : string (optional if table, schema and/or catalog is specified), a self-supplied SELECT statement of the data set you wish to predict.

  • table : string (optional if query is specified), the name of specified database table.

  • schema : string (optional if query is specified), the name of specified database schema.

  • catalog : string (optional if query is specified), (new in v2.22) the name of specified database catalog.

  • fetch_size : int (optional), Changing the fetchSize can be used to balance throughput and memory usage.

  • credential_id : string (optional) the ID of the credentials holding information about a user with read-access to the JDBC data source (see Credentials).
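The intake options above are ordinary Python dicts. The sketch below builds an S3 and a JDBC intake dict and checks the JDBC one against the rules as literally stated: data_store_id is required, and query is needed only when none of table, schema or catalog is given. The helper check_jdbc_intake is hypothetical, the IDs are placeholders, and the table name and fetch_size value are made up. The server may apply further validation that is not reproduced here.

```python
def check_jdbc_intake(settings):
    """Check a JDBC intake dict against the rules listed above."""
    if settings.get("type") != "jdbc":
        raise ValueError("type must be 'jdbc'")
    if not settings.get("data_store_id"):
        raise ValueError("data_store_id is required")
    has_location = any(settings.get(k) for k in ("table", "schema", "catalog"))
    if not settings.get("query") and not has_location:
        raise ValueError("give a query, or a table, schema and/or catalog")
    return settings


# S3 intake: credential_id and endpoint_url are optional and omitted here.
s3_intake = {
    "type": "s3",
    "url": "s3://bucket/key",
}

# JDBC intake with placeholder IDs; substitute values from your own account.
jdbc_intake = check_jdbc_intake(
    {
        "type": "jdbc",
        "data_store_id": "<data-store-id>",
        "credential_id": "<credential-id>",
        "table": "scoring_rows",  # made-up table name
        "schema": "public",
        "fetch_size": 10000,  # made-up value
    }
)
```

Either dict would then be passed as the intake_settings argument of score.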

output_settings: dict (optional)

A dict configuring how scored data is to be saved. Supported options:

  • type : string, either localFile, s3, azure, gcp, jdbc, snowflake, synapse or bigquery

To save scored data to a local file, add this parameter to the settings:

  • path : string (optional), path to save the scored data as CSV. If a path is not specified, you must download the scored data yourself with job.download(). If a path is specified, the call will block until the job is done. If there are no other jobs currently processing for the targeted prediction instance, uploading, scoring and downloading will happen in parallel without waiting for a full job to complete. Otherwise, it will still block, but start downloading the scored data as soon as it starts generating data. This is the fastest method to get predictions.

To save scored data to S3, add the following parameters to the settings:

  • url : string, the URL for storing the results (e.g.: s3://bucket/key)

  • credential_id : string (optional)

  • endpoint_url : string (optional), any non-default endpoint URL for S3 access (omit to use the default)

To save scored data to JDBC, add the following parameters to the settings:

  • data_store_id : string, the ID of the external data store connected to the JDBC data source (see Database Connectivity).

  • table : string, the name of the specified database table.

  • schema : string (optional), the name of the specified database schema.

  • catalog : string (optional), (new in v2.22) the name of the specified database catalog.

  • statement_type : string, the type of insertion statement to create, one of datarobot.enums.AVAILABLE_STATEMENT_TYPES.

  • update_columns : list(string) (optional), a list of strings containing those column names to be updated in case statement_type is set to a value related to update or upsert.

  • where_columns : list(string) (optional), a list of strings containing those column names to be selected in case statement_type is set to a value related to insert or update.

  • credential_id : string, the ID of the credentials holding information about a user with write-access to the JDBC data source (see Credentials).

  • create_table_if_not_exists : bool (optional), if no existing table is detected, attempt to create it before writing data with the strategy defined in the statement_type parameter.

csv_settingsdict (optional)

CSV intake and output settings. Supported options:

  • delimiter : string (optional, default ,), fields are delimited by this character. Use the string tab to denote TSV (TAB separated values). Must be either a one-character string or the string tab.

  • quotechar : string (optional, default "), fields containing the delimiter must be quoted using this character.

  • encoding : string (optional, default utf-8), encoding for the CSV files. For example (but not limited to): shift_jis, latin_1 or mskanji.
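As an illustration of the delimiter rule above, the following sketch builds a csv_settings dict and rejects delimiters that are neither a single character nor the string tab. The helper name is hypothetical, and the double-quote default for quotechar is an assumption.

```python
def csv_settings(delimiter=",", quotechar='"', encoding="utf-8"):
    """Build a csv_settings dict, checking the delimiter rule client-side."""
    # The delimiter is either one character or the literal string "tab".
    if delimiter != "tab" and len(delimiter) != 1:
        raise ValueError("delimiter must be one character or the string 'tab'")
    return {
        "delimiter": delimiter,
        "quotechar": quotechar,  # assumed default: double quote
        "encoding": encoding,
    }


# Settings for a TAB-separated file encoded as latin_1.
tsv_settings = csv_settings(delimiter="tab", encoding="latin_1")
```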

timeseries_settingsdict (optional)

Configuration for time-series scoring. Supported options:

  • type : string, must be forecast or historical (default if not passed is forecast). forecast mode makes predictions using forecast_point or rows in the dataset without target. historical enables bulk prediction mode which calculates predictions for all possible forecast points and forecast distances in the dataset within predictions_start_date/predictions_end_date range.

  • forecast_point : datetime (optional), forecast point for the dataset, used for the forecast predictions, by default value will be inferred from the dataset. May be passed if timeseries_settings.type=forecast.

  • predictions_start_date : datetime (optional), used for historical predictions in order to override date from which predictions should be calculated. By default value will be inferred automatically from the dataset. May be passed if timeseries_settings.type=historical.

  • predictions_end_date : datetime (optional), used for historical predictions in order to override the date up to which predictions should be calculated. By default, the value will be inferred automatically from the dataset. May be passed if timeseries_settings.type=historical.

  • relax_known_in_advance_features_check : bool, (default False). If True, missing values in the known in advance features are allowed in the forecast window at the prediction time. If omitted or False, missing values are not allowed.
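The options above pair up with the two modes: forecast_point goes with forecast, and the start and end dates go with historical. The sketch below builds a timeseries_settings dict and enforces that pairing on the client side. The helper name is hypothetical, and treating a mismatched option as an error is an assumption based on the "May be passed if" wording.

```python
from datetime import datetime


def timeseries_settings(mode="forecast", forecast_point=None,
                        predictions_start_date=None,
                        predictions_end_date=None,
                        relax_known_in_advance_features_check=False):
    """Build a timeseries_settings dict for forecast or historical scoring."""
    if mode not in ("forecast", "historical"):
        raise ValueError("type must be forecast or historical")
    if mode != "forecast" and forecast_point is not None:
        raise ValueError("forecast_point applies to type=forecast only")
    has_range = (predictions_start_date is not None
                 or predictions_end_date is not None)
    if mode != "historical" and has_range:
        raise ValueError("start and end dates apply to type=historical only")
    settings = {
        "type": mode,
        "relax_known_in_advance_features_check":
            relax_known_in_advance_features_check,
    }
    optional = {
        "forecast_point": forecast_point,
        "predictions_start_date": predictions_start_date,
        "predictions_end_date": predictions_end_date,
    }
    # Omitted values are left for the server to infer from the dataset.
    settings.update(
        {key: value for key, value in optional.items() if value is not None}
    )
    return settings


historical = timeseries_settings(
    "historical",
    predictions_start_date=datetime(2024, 1, 1),
    predictions_end_date=datetime(2024, 3, 1),
)
```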

num_concurrentint (optional)

Number of concurrent chunks to score simultaneously. Defaults to the available number of cores of the deployment. Lower it to leave resources for real-time scoring.

chunk_sizestring or int (optional)

Which strategy should be used to determine the chunk size. Can be either a named strategy or a fixed size in bytes:

  • auto : use fixed or dynamic based on flipper

  • fixed : use 1MB for explanations, 5MB for regular requests

  • dynamic : use dynamic chunk sizes

  • int : use this many bytes per chunk

passthrough_columnslist[string] (optional)

Keep these columns from the scoring dataset in the scored dataset. This is useful for correlating predictions with source data.

passthrough_columns_setstring (optional)

To pass through every column from the scoring dataset, set this to all. Takes precedence over passthrough_columns if set.

max_explanationsint (optional)

Compute prediction explanations for this number of features.

max_ngram_explanationsint or str (optional)

Compute text explanations for this number of ngrams. Set to all to return all ngram explanations, or set to a positive integer value to limit the number of ngram explanations returned. By default, no ngram explanations will be computed and returned.

threshold_highfloat (optional)

Only compute prediction explanations for predictions above this threshold. Can be combined with threshold_low.

threshold_lowfloat (optional)

Only compute prediction explanations for predictions below this threshold. Can be combined with threshold_high.
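One way to read the combination of threshold_high and threshold_low is sketched below. The function is hypothetical and only models the selection rule locally. It assumes that when both thresholds are set, a prediction qualifies if it falls above the high threshold or below the low one, and that every prediction qualifies when neither is set.

```python
def qualifies_for_explanations(prediction, threshold_high=None,
                               threshold_low=None):
    """Model which predictions would receive prediction explanations."""
    if threshold_high is None and threshold_low is None:
        # Assumption: with no thresholds, every prediction qualifies.
        return True
    above = threshold_high is not None and prediction > threshold_high
    below = threshold_low is not None and prediction < threshold_low
    return above or below


predictions = [0.05, 0.50, 0.95]
explained = [
    value for value in predictions
    if qualifies_for_explanations(value, threshold_high=0.8, threshold_low=0.2)
]
```

With these thresholds, only the two predictions at the extremes are selected and the mid-range one is skipped.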

explanations_modePredictionExplanationsMode, optional

Mode of prediction explanations calculation for multiclass and clustering models. If not specified, the server default is to explain only the predicted class, identical to passing TopPredictionsMode(1).

prediction_warning_enabledboolean (optional)

Add prediction warnings to the scored data. Currently only supported for regression models.

include_prediction_statusboolean (optional)

Include the prediction_status column in the output, defaults to False.

skip_drift_trackingboolean (optional)

Skips drift tracking on any predictions made from this job. This is useful when running non-production workloads to not affect drift tracking and cause unnecessary alerts. Defaults to False.

prediction_instancedict (optional)

Defaults to instance specified by deployment or system configuration. Supported options:

  • hostName : string

  • sslEnabled : boolean (optional, default true). Set to false to run prediction requests from the batch prediction job without SSL.

  • datarobotKey : string (optional), if running a job against a prediction instance in the Managed AI Cloud, you must provide the organization level DataRobot-Key

  • apiKey : string (optional), by default, prediction requests will use the API key of the user that created the job. This allows you to make requests on behalf of other users.

abort_on_errorboolean (optional)

Default behavior is to abort the job if too many rows fail scoring. This will free up resources for other jobs that may score successfully. Set to false to unconditionally score every row no matter how many errors are encountered. Defaults to True.

column_names_remappingdict (optional)

Mapping with column renaming for output table. Defaults to {}.

include_probabilitiesboolean (optional)

Flag that enables returning of all probability columns. Defaults to True.

include_probabilities_classeslist (optional)

List the subset of classes if you do not want all the classes. Defaults to [].

download_timeoutint (optional)

New in version 2.22.

If using localFile output, wait this many seconds for the download to become available. See download().

download_read_timeoutint (optional, default 660)

New in version 2.22.

If using localFile output, wait this many seconds for the server to respond between chunks.

upload_read_timeout: int (optional, default 600)

New in version 2.28.

If using localFile intake, wait this many seconds for the server to respond after whole dataset upload.

prediction_threshold: float (optional)

New in version 3.4.0.

Threshold is the point that sets the class boundary for a predicted value. The model classifies an observation below the threshold as FALSE, and an observation above the threshold as TRUE. In other words, DataRobot automatically assigns the positive class label to any prediction exceeding the threshold. This value can be set between 0.0 and 1.0.
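The class boundary described above can be modelled locally as follows. This is an illustrative sketch with a hypothetical function name, not the server's implementation; how a probability exactly equal to the threshold is labelled is not stated here, so the strict comparison is an assumption.

```python
def positive_class(probability, prediction_threshold):
    """Return True when a probability falls on the positive side of the threshold."""
    if not 0.0 <= prediction_threshold <= 1.0:
        raise ValueError("prediction_threshold must be between 0.0 and 1.0")
    # Assumption: strictly greater than the threshold means positive.
    return probability > prediction_threshold


labels = [positive_class(p, prediction_threshold=0.6) for p in (0.4, 0.7)]
```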

Return type

BatchPredictionJob

classmethod apply_time_series_data_prep_and_score(deployment, intake_settings, timeseries_settings, **kwargs)

Prepare the dataset with time series data prep, create new batch prediction job, upload the scoring dataset, and return a batch prediction job.

The supported intake_settings are of type localFile or dataset.

For timeseries_settings of type forecast the forecast_point must be specified.

Refer to the datarobot.models.BatchPredictionJob.score() method for details on the other kwargs parameters.

New in version v3.1.

Returns
BatchPredictionJob

Instance of BatchPredictionJob

Raises
InvalidUsageError

If the deployment does not support time series data prep, or if the intake type is not supported for time series data prep.

Attributes
deploymentDeployment

Deployment which will be used for scoring.

intake_settingsdict

A dict configuring where data is coming from. Supported options:

  • type : string, either localFile, dataset

Note that to pass a dataset, you not only need to specify the type parameter as dataset, but you must also set the dataset parameter as a Dataset object.

To score from a local file, add this parameter to the settings:

  • file : file-like object, string path to file or a pandas.DataFrame of scoring data.

timeseries_settingsdict

Configuration for time-series scoring. Supported options:

  • type : string, must be forecast or historical (default if not passed is forecast). forecast mode makes predictions using forecast_point. historical enables bulk prediction mode which calculates predictions for all possible forecast points and forecast distances in the dataset within predictions_start_date/predictions_end_date range.

  • forecast_point : datetime (optional), forecast point for the dataset, used for the forecast predictions. Must be passed if timeseries_settings.type=forecast.

  • predictions_start_date : datetime (optional), used for historical predictions in order to override date from which predictions should be calculated. By default value will be inferred automatically from the dataset. May be passed if timeseries_settings.type=historical.

  • predictions_end_date : datetime (optional), used for historical predictions in order to override the date up to which predictions should be calculated. By default, the value will be inferred automatically from the dataset. May be passed if timeseries_settings.type=historical.

  • relax_known_in_advance_features_check : bool, (default False). If True, missing values in the known in advance features are allowed in the forecast window at the prediction time. If omitted or False, missing values are not allowed.

Return type

BatchPredictionJob
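A possible way to prepare the two required dicts for apply_time_series_data_prep_and_score() is sketched below. The helper names and the file name are hypothetical. The forecast_point check mirrors the requirement stated above, and the datarobot import is deferred so that the argument builder can be used on its own.

```python
from datetime import datetime


def data_prep_arguments(file, forecast_point=None, mode="forecast"):
    """Build intake_settings and timeseries_settings for a local file."""
    if mode == "forecast" and forecast_point is None:
        raise ValueError("forecast_point is required for type=forecast")
    timeseries_settings = {"type": mode}
    if forecast_point is not None:
        timeseries_settings["forecast_point"] = forecast_point
    return {
        "intake_settings": {"type": "localFile", "file": file},
        "timeseries_settings": timeseries_settings,
    }


def prepare_and_score(deployment, file, forecast_point):
    """Run data prep and scoring; needs a configured datarobot client."""
    import datarobot as dr  # deferred: only needed when actually scoring

    arguments = data_prep_arguments(file, forecast_point)
    return dr.BatchPredictionJob.apply_time_series_data_prep_and_score(
        deployment, **arguments
    )


arguments = data_prep_arguments("sales.csv", datetime(2024, 1, 1))
```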

classmethod score_to_file(deployment, intake_path, output_path, **kwargs)

Create new batch prediction job, upload the scoring dataset and download the scored CSV file concurrently.

Will block until the entire file is scored.

Refer to the datarobot.models.BatchPredictionJob.score() method for details on the other kwargs parameters.

Returns
BatchPredictionJob

Instance of BatchPredictionJob

Attributes
deploymentDeployment or string ID

Deployment which will be used for scoring.

intake_pathfile-like object/string path to file/pandas.DataFrame

Scoring data

output_pathstr

Filename to save the result under

classmethod apply_time_series_data_prep_and_score_to_file(deployment, intake_path, output_path, timeseries_settings, **kwargs)

Prepare the input dataset with time series data prep. Then, create a new batch prediction job using the prepared AI catalog item as input and concurrently download the scored CSV file.

The function call will return when the entire file is scored.

For timeseries_settings of type forecast the forecast_point must be specified.

Refer to the datarobot.models.BatchPredictionJob.score() method for details on the other kwargs parameters.

New in version v3.1.

Returns
BatchPredictionJob

Instance of BatchPredictionJob.

Raises
InvalidUsageError

If the deployment does not support time series data prep.

Attributes
deploymentDeployment

The deployment which will be used for scoring.

intake_pathfile-like object/string path to file/pandas.DataFrame

The scoring data.

output_pathstr

The filename under which you save the result.

timeseries_settingsdict

Configuration for time-series scoring. Supported options:

  • type : string, must be forecast or historical (default if not passed is forecast). forecast mode makes predictions using forecast_point. historical enables bulk prediction mode which calculates predictions for all possible forecast points and forecast distances in the dataset within predictions_start_date/predictions_end_date range.

  • forecast_point : datetime (optional), forecast point for the dataset, used for the forecast predictions. Must be passed if timeseries_settings.type=forecast.

  • predictions_start_date : datetime (optional), used for historical predictions in order to override date from which predictions should be calculated. By default value will be inferred automatically from the dataset. May be passed if timeseries_settings.type=historical.

  • predictions_end_date : datetime (optional), used for historical predictions in order to override the date up to which predictions should be calculated. By default, the value will be inferred automatically from the dataset. May be passed if timeseries_settings.type=historical.

  • relax_known_in_advance_features_check : bool, (default False). If True, missing values in the known in advance features are allowed in the forecast window at the prediction time. If omitted or False, missing values are not allowed.

Return type

BatchPredictionJob

classmethod score_s3(deployment, source_url, destination_url, credential=None, endpoint_url=None, **kwargs)

Create new batch prediction job, with a scoring dataset from S3 and writing the result back to S3.

This returns immediately after the job has been created. You must poll for job completion using get_status() or wait_for_completion() (see datarobot.models.Job).

Refer to the datarobot.models.BatchPredictionJob.score() method for details on the other kwargs parameters.

Returns
BatchPredictionJob

Instance of BatchPredictionJob

Attributes
deploymentDeployment or string ID

Deployment which will be used for scoring.

source_urlstring

The URL for the prediction dataset (e.g.: s3://bucket/key)

destination_urlstring

The URL for the scored dataset (e.g.: s3://bucket/key)

credentialstring or Credential (optional)

The AWS Credential object or credential id

endpoint_urlstring (optional)

Any non-default endpoint URL for S3 access (omit to use the default)
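Because score_s3() returns as soon as the job is created, a caller typically waits before reading the result. The wrapper below is a sketch of that pattern. The function name and the job_class parameter are hypothetical (the parameter only exists so the wrapper can be exercised without a server), and the URLs in the comment are placeholders.

```python
def score_s3_and_wait(deployment, source_url, destination_url,
                      credential=None, job_class=None):
    """Start an S3-to-S3 batch prediction job, wait, and return its status."""
    if job_class is None:
        # Deferred import: needs the datarobot package and a configured client.
        import datarobot as dr
        job_class = dr.BatchPredictionJob
    job = job_class.score_s3(
        deployment=deployment,
        source_url=source_url,
        destination_url=destination_url,
        credential=credential,
    )
    # score_s3 does not block, so wait here before reading the status.
    job.wait_for_completion()
    return job.get_status()


# Typical call (placeholder values):
# status = score_s3_and_wait("<deployment-id>", "s3://bucket/in.csv",
#                            "s3://bucket/out.csv", "<credential-id>")
```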

classmethod score_azure(deployment, source_url, destination_url, credential=None, **kwargs)

Create new batch prediction job, with a scoring dataset from Azure blob storage and writing the result back to Azure blob storage.

This returns immediately after the job has been created. You must poll for job completion using get_status() or wait_for_completion() (see datarobot.models.Job).

Refer to the datarobot.models.BatchPredictionJob.score() method for details on the other kwargs parameters.

Returns
BatchPredictionJob

Instance of BatchPredictionJob

Attributes
deploymentDeployment or string ID

Deployment which will be used for scoring.

source_urlstring

The URL for the prediction dataset (e.g.: https://storage_account.blob.endpoint/container/blob_name)

destination_urlstring

The URL for the scored dataset (e.g.: https://storage_account.blob.endpoint/container/blob_name)

credentialstring or Credential (optional)

The Azure Credential object or credential id

classmethod score_gcp(deployment, source_url, destination_url, credential=None, **kwargs)

Create new batch prediction job, with a scoring dataset from Google Cloud Storage and writing the result back to Google Cloud Storage.

This returns immediately after the job has been created. You must poll for job completion using get_status() or wait_for_completion() (see datarobot.models.Job).

Refer to the datarobot.models.BatchPredictionJob.score() method for details on the other kwargs parameters.

Returns
BatchPredictionJob

Instance of BatchPredictionJob

Attributes
deploymentDeployment or string ID

Deployment which will be used for scoring.

source_urlstring

The URL for the prediction dataset (e.g.: http(s)://storage.googleapis.com/[bucket]/[object])

destination_urlstring

The URL for the scored dataset (e.g.: http(s)://storage.googleapis.com/[bucket]/[object])

credentialstring or Credential (optional)

The GCP Credential object or credential id

classmethod score_from_existing(batch_prediction_job_id)

Create a new batch prediction job based on the settings from a previously created one

Returns
BatchPredictionJob

Instance of BatchPredictionJob

Attributes
batch_prediction_job_id: str

ID of the previous batch prediction job

Return type

BatchPredictionJob

classmethod score_pandas(deployment, df, read_timeout=660, **kwargs)

Run a batch prediction job, with a scoring dataset from a pandas dataframe. The output from the prediction will be joined to the passed DataFrame and returned.

Use column_names_remapping to drop or rename columns in the output.

This method blocks until the job has completed or raises an exception on errors.

Refer to the datarobot.models.BatchPredictionJob.score() method for details on the other kwargs parameters.

Returns
BatchPredictionJob

Instance of BatchPredictionJob

pandas.DataFrame

The original dataframe merged with the predictions

Attributes
deploymentDeployment or string ID

Deployment which will be used for scoring.

dfpandas.DataFrame

The dataframe to score

Return type

Tuple[BatchPredictionJob, DataFrame]
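Since score_pandas() returns a tuple rather than a single job, the call site needs to unpack it. A minimal sketch follows. The wrapper name and the job_class parameter are hypothetical (the parameter lets the wrapper run without a server), and extra keyword arguments are simply forwarded.

```python
def score_frame(deployment, df, job_class=None, **kwargs):
    """Score a DataFrame and return only the merged result frame."""
    if job_class is None:
        # Deferred import: needs the datarobot package and a configured client.
        import datarobot as dr
        job_class = dr.BatchPredictionJob
    # The call blocks until scoring finishes and returns (job, DataFrame).
    job, scored = job_class.score_pandas(deployment, df, **kwargs)
    return scored


# Typical call (placeholder ID, df is a pandas.DataFrame):
# scored_df = score_frame("<deployment-id>", df)
```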

classmethod score_with_leaderboard_model(model, intake_settings=None, output_settings=None, csv_settings=None, timeseries_settings=None, passthrough_columns=None, passthrough_columns_set=None, max_explanations=None, max_ngram_explanations=None, explanation_algorithm=None, threshold_high=None, threshold_low=None, prediction_threshold=None, prediction_warning_enabled=None, include_prediction_status=False, abort_on_error=True, column_names_remapping=None, include_probabilities=True, include_probabilities_classes=None, download_timeout=120, download_read_timeout=660, upload_read_timeout=600, explanations_mode=None)

Creates a new batch prediction job for a Leaderboard model by uploading the scoring dataset. Returns a batch prediction job.

The default intake and output options are both localFile, which requires the caller to pass the file parameter and either download the results using the download() method afterwards or pass a path to a file where the scored data will be downloaded to.

Returns
BatchPredictionJob

Instance of BatchPredictionJob

Attributes
modelModel or DatetimeModel or string ID

Model which will be used for scoring.

intake_settingsdict (optional)

A dict configuring where data is coming from. Supported options:

  • type : string, either localFile, dataset, or dss.

Note that to pass a dataset, you not only need to specify the type parameter as dataset, but you must also set the dataset parameter as a dr.Dataset object.

To score from a local file, add this parameter to the settings:

  • file : file-like object, string path to file or a pandas.DataFrame of scoring data.

To score a subset of training data, use the dss intake type and specify the following parameters:

  • project_id : project to fetch training data from. Access to project is required.

  • partition : subset of training data to score, one of datarobot.enums.TrainingDataSubsets.

output_settingsdict (optional)

A dict configuring how scored data is to be saved. Supported options:

  • type : string, localFile

To save scored data to a local file, add this parameter to the settings:

  • path : string (optional) The path to save the scored data as a CSV file. If a path is not specified, you must download the scored data yourself with job.download(). If a path is specified, the call is blocked until the job is done. If there are no other jobs currently processing for the targeted prediction instance, uploading, scoring, and downloading will happen in parallel without waiting for a full job to complete. Otherwise, it will still block, but start downloading the scored data as soon as it starts generating data. This is the fastest method to get predictions.

csv_settingsdict (optional)

CSV intake and output settings. Supported options:

  • delimiter : string (optional, default ,), fields are delimited by this character. Use the string tab to denote TSV (TAB separated values). Must be either a one-character string or the string tab.

  • quotechar : string (optional, default "), fields containing the delimiter must be quoted using this character.

  • encoding : string (optional, default utf-8), encoding for the CSV files. For example (but not limited to): shift_jis, latin_1 or mskanji.

timeseries_settingsdict (optional)

Configuration for time-series scoring. Supported options:

  • type : string, must be forecast, historical (default if not passed is forecast), or training. forecast mode makes predictions using forecast_point or rows in the dataset without target. historical enables bulk prediction mode which calculates predictions for all possible forecast points and forecast distances in the dataset within predictions_start_date/predictions_end_date range. training mode is a special case for predictions on subsets of training data. Note, that it must be used in conjunction with dss intake type only.

  • forecast_point : datetime (optional), forecast point for the dataset, used for the forecast predictions, by default value will be inferred from the dataset. May be passed if timeseries_settings.type=forecast.

  • predictions_start_date : datetime (optional), used for historical predictions in order to override date from which predictions should be calculated. By default value will be inferred automatically from the dataset. May be passed if timeseries_settings.type=historical.

  • predictions_end_date : datetime (optional), used for historical predictions in order to override the date up to which predictions should be calculated. By default, the value will be inferred automatically from the dataset. May be passed if timeseries_settings.type=historical.

  • relax_known_in_advance_features_check : bool, (default False). If True, missing values in the known in advance features are allowed in the forecast window at the prediction time. If omitted or False, missing values are not allowed.

passthrough_columnslist[string] (optional)

Keep these columns from the scoring dataset in the scored dataset. This is useful for correlating predictions with source data.

passthrough_columns_setstring (optional)

To pass through every column from the scoring dataset, set this to all. Takes precedence over passthrough_columns if set.

max_explanationsint (optional)

Compute prediction explanations for this amount of features.

max_ngram_explanationsint or str (optional)

Compute text explanations for this amount of ngrams. Set to all to return all ngram explanations, or set to a positive integer value to limit the amount of ngram explanations returned. By default no ngram explanations will be computed and returned.

threshold_highfloat (optional)

Only compute prediction explanations for predictions above this threshold. Can be combined with threshold_low.

threshold_lowfloat (optional)

Only compute prediction explanations for predictions below this threshold. Can be combined with threshold_high.

explanations_modePredictionExplanationsMode, optional

Mode of prediction explanations calculation for multiclass and clustering models. If not specified, the server default is to explain only the predicted class, identical to passing TopPredictionsMode(1).

prediction_warning_enabledboolean (optional)

Add prediction warnings to the scored data. Currently only supported for regression models.

include_prediction_statusboolean (optional)

Include the prediction_status column in the output, defaults to False.

abort_on_errorboolean (optional)

Default behavior is to abort the job if too many rows fail scoring. This will free up resources for other jobs that may score successfully. Set to false to unconditionally score every row no matter how many errors are encountered. Defaults to True.

column_names_remappingdict (optional)

Mapping with column renaming for output table. Defaults to {}.

include_probabilitiesboolean (optional)

Flag that enables returning of all probability columns. Defaults to True.

include_probabilities_classeslist (optional)

List the subset of classes if you do not want all the classes. Defaults to [].

download_timeoutint (optional)

New in version 2.22.

If using localFile output, wait this many seconds for the download to become available. See download().

download_read_timeoutint (optional, default 660)

New in version 2.22.

If using localFile output, wait this many seconds for the server to respond between chunks.

upload_read_timeout: int (optional, default 600)

New in version 2.28.

If using localFile intake, wait this many seconds for the server to respond after whole dataset upload.

prediction_threshold: float (optional)

New in version 3.4.0.

Threshold is the point that sets the class boundary for a predicted value. The model classifies an observation below the threshold as FALSE, and an observation above the threshold as TRUE. In other words, DataRobot automatically assigns the positive class label to any prediction exceeding the threshold. This value can be set between 0.0 and 1.0.

Return type

BatchPredictionJob

classmethod get(batch_prediction_job_id)

Get batch prediction job

Returns
BatchPredictionJob

Instance of BatchPredictionJob

Attributes
batch_prediction_job_id: str

ID of batch prediction job

Return type

BatchPredictionJob

download(fileobj, timeout=120, read_timeout=660)

Downloads the CSV result of a prediction job

Attributes
fileobj: file-like object

A file-like object where the CSV prediction results will be written to. Examples include an in-memory buffer (e.g., io.BytesIO) or a file on disk (opened for binary writing).

timeoutint (optional, default 120)

New in version 2.22.

Seconds to wait for the download to become available.

The download will not be available before the job has started processing. In case other jobs are occupying the queue, processing may not start immediately.

If the timeout is reached, the job will be aborted and RuntimeError is raised.

Set to -1 to wait infinitely.

read_timeoutint (optional, default 660)

New in version 2.22.

Seconds to wait for the server to respond between chunks.

Return type

None
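As a sketch of the in-memory option mentioned for fileobj, the helper below downloads a job's result into an io.BytesIO buffer and returns it as text. The helper name is hypothetical, and decoding as UTF-8 is an assumption that should match the csv_settings encoding used for the job.

```python
import io


def scored_csv_text(job, timeout=120, read_timeout=660, encoding="utf-8"):
    """Download a job's scored CSV into memory and return it as a string."""
    buffer = io.BytesIO()
    # download() writes the CSV bytes into the buffer and returns None.
    job.download(buffer, timeout=timeout, read_timeout=read_timeout)
    return buffer.getvalue().decode(encoding)


# Writing straight to disk instead (the file is opened for binary writing):
# with open("scored.csv", "wb") as handle:
#     job.download(handle)
```

Here job is a BatchPredictionJob, for example one returned by get() or score().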

delete(ignore_404_errors=False)

Cancel this job. If this job has not finished running, it will be removed and canceled.

Return type

None

get_status()

Get status of batch prediction job

Returns
BatchPredictionJob status data

Dict with job status

classmethod list_by_status(statuses=None)

Get jobs collection for specific set of statuses

Returns
BatchPredictionJob statuses

List of job statuses dicts with specific statuses

Attributes
statuses

List of statuses to filter jobs by ([ABORTED|COMPLETED…]). If statuses is not provided, all jobs for the user are returned.

Return type

List[BatchPredictionJob]
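The jobs returned by list_by_status() can be summarised with a short helper such as the one below. This is an illustrative sketch: the helper name is hypothetical, and it assumes that the dict returned by get_status() carries the job state under a "status" key.

```python
from collections import Counter


def count_by_status(jobs):
    """Count jobs per status; calls get_status() once for every job."""
    # Assumption: each status dict has a "status" entry such as "COMPLETED".
    return Counter(job.get_status()["status"] for job in jobs)


# Typical call (needs a configured datarobot client):
# import datarobot as dr
# jobs = dr.BatchPredictionJob.list_by_status(statuses=["ABORTED", "COMPLETED"])
# print(count_by_status(jobs))
```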

class datarobot.models.BatchPredictionJobDefinition(id=None, name=None, enabled=None, schedule=None, batch_prediction_job=None, created=None, updated=None, created_by=None, updated_by=None, last_failed_run_time=None, last_successful_run_time=None, last_started_job_status=None, last_scheduled_run_time=None)
classmethod get(batch_prediction_job_definition_id)

Get batch prediction job definition

Returns
BatchPredictionJobDefinition

Instance of BatchPredictionJobDefinition

Examples

>>> import datarobot as dr
>>> definition = dr.BatchPredictionJobDefinition.get('60912e09fd1f04e832a575c1')
>>> definition
BatchPredictionJobDefinition(60912e09fd1f04e832a575c1)
Attributes
batch_prediction_job_definition_id: str

ID of batch prediction job definition

Return type

BatchPredictionJobDefinition

classmethod list(search_name=None, deployment_id=None, limit=<datarobot.models.batch_prediction_job.MissingType object>, offset=0)

Get all job definitions

Parameters
search_namestr, optional

String for filtering job definitions. Job definitions that contain the string in their name will be returned. If not specified, all available job definitions will be returned.

deployment_id: str

The ID of the deployment the record belongs to.

limit: int, optional

0 by default. At most this many results are returned.

offset: int, optional

This many results will be skipped.

Returns
List[BatchPredictionJobDefinition]

List of job definitions the user has access to see

Examples

>>> import datarobot as dr
>>> definition = dr.BatchPredictionJobDefinition.list()
>>> definition
[
    BatchPredictionJobDefinition(60912e09fd1f04e832a575c1),
    BatchPredictionJobDefinition(6086ba053f3ef731e81af3ca)
]
Return type

List[BatchPredictionJobDefinition]

classmethod create(enabled, batch_prediction_job, name=None, schedule=None)

Creates a new batch prediction job definition to be run either at a scheduled interval or as a manual run.

Returns
BatchPredictionJobDefinition

Instance of BatchPredictionJobDefinition

Examples

>>> import datarobot as dr
>>> job_spec = {
...    "num_concurrent": 4,
...    "deployment_id": "foobar",
...    "intake_settings": {
...        "url": "s3://foobar/123",
...        "type": "s3",
...        "format": "csv"
...    },
...    "output_settings": {
...        "url": "s3://foobar/123",
...        "type": "s3",
...        "format": "csv"
...    },
...}
>>> schedule = {
...    "day_of_week": [
...        1
...    ],
...    "month": [
...        "*"
...    ],
...    "hour": [
...        16
...    ],
...    "minute": [
...        0
...    ],
...    "day_of_month": [
...        1
...    ]
...}
>>> definition = dr.BatchPredictionJobDefinition.create(
...    enabled=False,
...    batch_prediction_job=job_spec,
...    name="some_definition_name",
...    schedule=schedule
... )
>>> definition
BatchPredictionJobDefinition(60912e09fd1f04e832a575c1)
Attributes
enabledbool (default False)

Whether or not the definition should be active on a scheduled basis. If True, schedule is required.

batch_prediction_job: dict

The job specifications for your batch prediction job. It requires the same job input parameters as used with score(), only it will not initialize a job scoring, only store it as a definition for later use.

namestring (optional)

The name you want your job to be identified with. Must be unique across the organization’s existing jobs. If you don’t supply a name, a random one will be generated for you.

scheduledict (optional)

The schedule payload defines at what intervals the job should run, which can be combined in various ways to construct complex scheduling terms if needed. In all of the elements in the objects, you can supply either an asterisk ["*"] denoting “every” time denomination or an array of integers (e.g. [1, 2, 3]) to define a specific interval.

The schedule payload is split up in the following items:

Minute: The minute(s) of the day that the job will run. Allowed values are either ["*"] meaning every minute of the day or [0 ... 59].

Hour: The hour(s) of the day that the job will run. Allowed values are either ["*"] meaning every hour of the day or [0 ... 23].

Day of Month: The date(s) of the month that the job will run. Allowed values are either [1 ... 31] or ["*"] for all days of the month. This field is additive with dayOfWeek, meaning the job will run both on the date(s) defined in this field and the day specified by dayOfWeek (for example, dates 1st, 2nd, 3rd, plus every Tuesday). If dayOfMonth is set to ["*"] and dayOfWeek is defined, the scheduler will trigger on every day of the month that matches dayOfWeek (for example, Tuesday the 2nd, 9th, 16th, 23rd, 30th). Invalid dates such as February 31st are ignored.

Month: The month(s) of the year that the job will run. Allowed values are either [1 ... 12] or ["*"] for all months of the year. Strings, either 3-letter abbreviations or the full name of the month, can be used interchangeably (e.g., “jan” or “october”). Months that are not compatible with dayOfMonth are ignored, for example {"dayOfMonth": [31], "month":["feb"]}.

Day of Week: The day(s) of the week that the job will run. Allowed values are [0 ... 6], where Sunday=0, or ["*"] for all days of the week. Strings, either 3-letter abbreviations or the full name of the day, can be used interchangeably (e.g., “sunday”, “Sunday”, “sun”, and “Sun” all map to [0]). This field is additive with dayOfMonth, meaning the job will run both on the date specified by dayOfMonth and the day defined in this field.
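
The additive rule between dayOfMonth and dayOfWeek can be easier to follow in code. The following is a minimal sketch of the matching logic described above, not the scheduler's actual implementation. schedule_matches is a hypothetical helper, it only handles integer values and ["*"] (not month or day names), and treating dayOfWeek = ["*"] with explicit dates as "dates only" is an assumption made by symmetry with the documented case.

```python
from datetime import datetime


def _matches(allowed, value):
    """True if the field is the wildcard ["*"] or lists the value."""
    return allowed == ["*"] or value in allowed


def schedule_matches(schedule, when):
    """Hypothetical helper: would this schedule fire at `when`?"""
    if not (
        _matches(schedule["minute"], when.minute)
        and _matches(schedule["hour"], when.hour)
        and _matches(schedule["month"], when.month)
    ):
        return False

    days = schedule["day_of_month"]
    weekdays = schedule["day_of_week"]
    # datetime.weekday() uses Monday=0; the schedule uses Sunday=0.
    weekday = (when.weekday() + 1) % 7

    if days == ["*"] and weekdays == ["*"]:
        return True
    if days == ["*"]:
        # Documented case: only the weekday restricts the run days.
        return weekday in weekdays
    if weekdays == ["*"]:
        # Assumption: only the listed dates restrict the run days.
        return when.day in days
    # Additive: a listed date OR a listed weekday triggers the job.
    return when.day in days or weekday in weekdays
```

With day_of_month set to [1, 2, 6] and day_of_week set to [1], this sketch fires on the 1st, 2nd and 6th of each month and additionally on every Monday.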

Return type

BatchPredictionJobDefinition

update(enabled, batch_prediction_job=None, name=None, schedule=None)

Updates a job definition with the changed specs.

Takes the same input as create()

Returns
BatchPredictionJobDefinition

Instance of the updated BatchPredictionJobDefinition

Examples

>>> import datarobot as dr
>>> job_spec = {
...    "num_concurrent": 5,
...    "deployment_id": "foobar_new",
...    "intake_settings": {
...        "url": "s3://foobar/123",
...        "type": "s3",
...        "format": "csv"
...    },
...    "output_settings": {
...        "url": "s3://foobar/123",
...        "type": "s3",
...        "format": "csv"
...    },
...}
>>> schedule = {
...    "day_of_week": [
...        1
...    ],
...    "month": [
...        "*"
...    ],
...    "hour": [
...        "*"
...    ],
...    "minute": [
...        30, 59
...    ],
...    "day_of_month": [
...        1, 2, 6
...    ]
...}
>>> definition = dr.BatchPredictionJobDefinition.get('60912e09fd1f04e832a575c1')
>>> definition = definition.update(
...    enabled=False,
...    batch_prediction_job=job_spec,
...    name="updated_definition_name",
...    schedule=schedule
... )
>>> definition
BatchPredictionJobDefinition(60912e09fd1f04e832a575c1)
Attributes
enabledbool (default False)

Same as enabled in create().

batch_prediction_job: dict

Same as batch_prediction_job in create().

namestring (optional)

Same as name in create().

scheduledict

Same as schedule in create().

Return type

BatchPredictionJobDefinition

run_on_schedule(schedule)

Sets the run schedule of an already created job definition.

If the job was previously not enabled, this will also set the job to enabled.

Returns
BatchPredictionJobDefinition

Instance of the updated BatchPredictionJobDefinition with the new / updated schedule.

Examples

>>> import datarobot as dr
>>> definition = dr.BatchPredictionJobDefinition.create('...')
>>> schedule = {
...    "day_of_week": [
...        1
...    ],
...    "month": [
...        "*"
...    ],
...    "hour": [
...        "*"
...    ],
...    "minute": [
...        30, 59
...    ],
...    "day_of_month": [
...        1, 2, 6
...    ]
...}
>>> definition.run_on_schedule(schedule)
BatchPredictionJobDefinition(60912e09fd1f04e832a575c1)
Attributes
scheduledict

Same as schedule in create().

Return type

BatchPredictionJobDefinition

run_once()

Manually submits a batch prediction job to the queue, based off of an already created job definition.

Returns
BatchPredictionJob

Instance of BatchPredictionJob

Examples

>>> import datarobot as dr
>>> definition = dr.BatchPredictionJobDefinition.create('...')
>>> job = definition.run_once()
>>> job.wait_for_completion()
Return type

BatchPredictionJob

delete()

Deletes the job definition and disables any future schedules of this job if any. If a scheduled job is currently running, this will not be cancelled.

Examples

>>> import datarobot as dr
>>> definition = dr.BatchPredictionJobDefinition.get('5a8ac9ab07a57a0001be501f')
>>> definition.delete()
Return type

None

Batch Monitoring

class datarobot.models.BatchMonitoringJob(data, completed_resource_url=None)

A Batch Monitoring Job is used to monitor data sets outside the DataRobot app.

Attributes
idstr

the id of the job

classmethod get(project_id, job_id)

Get batch monitoring job

Returns
BatchMonitoringJob

Instance of BatchMonitoringJob

Attributes
job_id: str

ID of batch job

Return type

BatchMonitoringJob

download(fileobj, timeout=120, read_timeout=660)

Downloads the results of a monitoring job as a CSV.

Attributes
fileobj: A file-like object to which the CSV monitoring results will be written. Examples include an in-memory buffer (e.g., io.BytesIO) or a file on disk (opened for binary writing).

timeoutint (optional, default 120)

Seconds to wait for the download to become available.

The download will not be available before the job has started processing. In case other jobs are occupying the queue, processing may not start immediately.

If the timeout is reached, the job will be aborted and RuntimeError is raised.

Set to -1 to wait infinitely.

read_timeoutint (optional, default 660)

Seconds to wait for the server to respond between chunks.

Return type

None
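
As a rough illustration of the fileobj parameter, the sketch below collects the CSV results in an in-memory buffer. download_to_bytes is a hypothetical helper, and job is assumed to be any object exposing download(fileobj, timeout, read_timeout) as documented above, such as a BatchMonitoringJob.

```python
import io


def download_to_bytes(job, timeout=120, read_timeout=660):
    """Return a job's CSV results as bytes.

    `job` is assumed to expose download(fileobj, timeout, read_timeout)
    as documented above, e.g. a BatchMonitoringJob.
    """
    buffer = io.BytesIO()
    job.download(buffer, timeout=timeout, read_timeout=read_timeout)
    return buffer.getvalue()
```

To write straight to disk instead, pass a file opened with open(path, "wb") as the fileobj argument of download().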

classmethod run(deployment, intake_settings=None, output_settings=None, csv_settings=None, num_concurrent=None, chunk_size=None, abort_on_error=True, monitoring_aggregation=None, monitoring_columns=None, monitoring_output_settings=None, download_timeout=120, download_read_timeout=660, upload_read_timeout=600)

Create new batch monitoring job, upload the dataset, and return a batch monitoring job.

Returns
BatchMonitoringJob

Instance of BatchMonitoringJob

Examples

>>> import datarobot as dr
>>> job_spec = {
...     "intake_settings": {
...         "type": "jdbc",
...         "data_store_id": "645043933d4fbc3215f17e34",
...         "catalog": "SANDBOX",
...         "table": "10kDiabetes_output_actuals",
...         "schema": "SCORING_CODE_UDF_SCHEMA",
...         "credential_id": "645043b61a158045f66fb329"
...     },
...     "monitoring_columns": {
...         "predictions_columns": [
...             {
...                 "class_name": "True",
...                 "column_name": "readmitted_True_PREDICTION"
...             },
...             {
...                 "class_name": "False",
...                 "column_name": "readmitted_False_PREDICTION"
...             }
...         ],
...         "association_id_column": "rowID",
...         "actuals_value_column": "ACTUALS"
...     }
... }
>>> deployment_id = "foobar"
>>> job = dr.BatchMonitoringJob.run(deployment_id, **job_spec)
>>> job.wait_for_completion()
Attributes
deploymentDeployment or string ID

Deployment which will be used for monitoring.

intake_settingsdict

A dict configuring where the data comes from. Supported options:

  • type : string, either localFile, s3, azure, gcp, dataset, jdbc, snowflake, synapse or bigquery

Note that to pass a dataset, you not only need to specify the type parameter as dataset, but you must also set the dataset parameter as a dr.Dataset object.

To monitor from a local file, add this parameter to the settings:

  • file : A file-like object, string path to a file or a pandas.DataFrame of scoring data.

To monitor from S3, add the next parameters to the settings:

  • url : string, the URL to score (e.g.: s3://bucket/key).

  • credential_id : string (optional).

  • endpoint_url : string (optional), any non-default endpoint URL for S3 access (omit to use the default).

To monitor from JDBC, add the next parameters to the settings:

  • data_store_id : string, the ID of the external data store connected to the JDBC data source (see Database Connectivity).

  • query : string (optional if table, schema and/or catalog is specified), a self-supplied SELECT statement of the data set you wish to predict.

  • table : string (optional if query is specified), the name of specified database table.

  • schema : string (optional if query is specified), the name of specified database schema.

  • catalog : string (optional if query is specified), (new in v2.22) the name of specified database catalog.

  • fetch_size : int (optional), Changing the fetchSize can be used to balance throughput and memory usage.

  • credential_id : string (optional) the ID of the credentials holding information about a user with read-access to the JDBC data source (see Credentials).
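
The JDBC intake options above can be assembled with a small helper. This is a sketch only: jdbc_intake_settings is a hypothetical function, not part of the client, and the check that either query or table must be present is an interpretation of the "optional if ... is specified" wording.

```python
def jdbc_intake_settings(data_store_id, credential_id=None, query=None,
                         table=None, schema=None, catalog=None,
                         fetch_size=None):
    """Hypothetical helper that builds a JDBC intake_settings dict."""
    # Assumption: the source must be identified by a query or a table.
    if query is None and table is None:
        raise ValueError("Provide either a query or a table")

    settings = {"type": "jdbc", "data_store_id": data_store_id}
    optional = {
        "credential_id": credential_id,
        "query": query,
        "table": table,
        "schema": schema,
        "catalog": catalog,
        "fetch_size": fetch_size,
    }
    # Leave out options that were not supplied.
    settings.update({k: v for k, v in optional.items() if v is not None})
    return settings
```

The resulting dict can be passed as intake_settings to BatchMonitoringJob.run().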

output_settingsdict (optional)

A dict configuring how monitored data is to be saved. Supported options:

  • type : string, either localFile, s3, azure, gcp, jdbc, snowflake, synapse or bigquery

To save monitored data to a local file, add parameters to the settings:

  • path : string (optional), path to save the scored data as CSV. If a path is not specified, you must download the scored data yourself with job.download(). If a path is specified, the call will block until the job is done. If there are no other jobs currently processing for the targeted prediction instance, uploading, scoring, and downloading will happen in parallel without waiting for a full job to complete. Otherwise, it will still block, but start downloading the scored data as soon as it starts generating data. This is the fastest method to get predictions.

To save monitored data to S3, add the next parameters to the settings:

  • url : string, the URL for storing the results (e.g.: s3://bucket/key).

  • credential_id : string (optional).

  • endpoint_url : string (optional), any non-default endpoint URL for S3 access (omit to use the default).

To save monitored data to JDBC, add the next parameters to the settings:

  • data_store_id : string, the ID of the external data store connected to the JDBC data source (see Database Connectivity).

  • table : string, the name of specified database table.

  • schema : string (optional), the name of specified database schema.

  • catalog : string (optional), (new in v2.22) the name of specified database catalog.

  • statement_type : string, the type of insertion statement to create, one of datarobot.enums.AVAILABLE_STATEMENT_TYPES.

  • update_columns : list(string) (optional), a list of strings containing those column names to be updated in case statement_type is set to a value related to update or upsert.

  • where_columns : list(string) (optional), a list of strings containing those column names to be selected in case statement_type is set to a value related to insert or update.

  • credential_id : string, the ID of the credentials holding information about a user with write-access to the JDBC data source (see Credentials).

  • create_table_if_not_exists : bool (optional), If no existing table is detected, attempt to create it before writing data with the strategy defined in the statementType parameter.

csv_settingsdict (optional)

CSV intake and output settings. Supported options:

  • delimiter : string (optional, default ,), fields are delimited by this character. Use the string tab to denote TSV (TAB separated values). Must be either a one-character string or the string tab.

  • quotechar : string (optional, default "), fields containing the delimiter must be quoted using this character.

  • encoding : string (optional, default utf-8), encoding for the CSV files. For example (but not limited to): shift_jis, latin_1 or mskanji.
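
To see how these three options relate to an actual file, the sketch below writes rows with Python's standard csv module using a matching csv_settings dict. write_rows is a hypothetical helper, and mapping the special value tab to a tab character is taken from the delimiter description above.

```python
import csv
import io

# Settings for a TSV file; the same dict could be passed as csv_settings.
csv_settings = {"delimiter": "tab", "quotechar": '"', "encoding": "utf-8"}


def write_rows(rows, settings):
    """Hypothetical helper: encode rows as bytes matching `settings`."""
    delimiter = settings["delimiter"]
    if delimiter == "tab":
        delimiter = "\t"
    text = io.StringIO(newline="")
    writer = csv.writer(
        text,
        delimiter=delimiter,
        quotechar=settings["quotechar"],
        lineterminator="\n",
    )
    # Fields that contain the delimiter are quoted automatically.
    writer.writerows(rows)
    return text.getvalue().encode(settings["encoding"])
```

A field containing a tab is wrapped in the quote character, which is the behavior the quotechar option describes.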

num_concurrentint (optional)

Number of concurrent chunks to score simultaneously. Defaults to the available number of cores of the deployment. Lower it to leave resources for real-time scoring.

chunk_sizestring or int (optional)

Which strategy should be used to determine the chunk size. Can be either a named strategy or a fixed size in bytes.

  • auto : use fixed or dynamic based on flipper.

  • fixed : use 1MB for explanations, 5MB for regular requests.

  • dynamic : use dynamic chunk sizes.

  • int : use this many bytes per chunk.

abort_on_errorboolean (optional)

Default behavior is to abort the job if too many rows fail scoring. This will free up resources for other jobs that may score successfully. Set to False to unconditionally score every row no matter how many errors are encountered. Defaults to True.

download_timeoutint (optional)

New in version 2.22.

If using localFile output, wait this many seconds for the download to become available. See download().

download_read_timeoutint (optional, default 660)

New in version 2.22.

If using localFile output, wait this many seconds for the server to respond between chunks.

upload_read_timeout: int (optional, default 600)

New in version 2.28.

If using localFile intake, wait this many seconds for the server to respond after whole dataset upload.

Return type

BatchMonitoringJob

cancel(ignore_404_errors=False)

Cancel this job. If this job has not finished running, it will be removed and canceled.

Return type

None

get_status()

Get status of batch monitoring job

Returns
BatchMonitoringJob status data

Dict with job status

Return type

Any

class datarobot.models.BatchMonitoringJobDefinition(id=None, name=None, enabled=None, schedule=None, batch_monitoring_job=None, created=None, updated=None, created_by=None, updated_by=None, last_failed_run_time=None, last_successful_run_time=None, last_started_job_status=None, last_scheduled_run_time=None)
classmethod get(batch_monitoring_job_definition_id)

Get batch monitoring job definition

Returns
BatchMonitoringJobDefinition

Instance of BatchMonitoringJobDefinition

Examples

>>> import datarobot as dr
>>> definition = dr.BatchMonitoringJobDefinition.get('5a8ac9ab07a57a0001be501f')
>>> definition
BatchMonitoringJobDefinition(60912e09fd1f04e832a575c1)
Attributes
batch_monitoring_job_definition_id: str

ID of batch monitoring job definition

Return type

BatchMonitoringJobDefinition

classmethod list()

Get all monitoring job definitions

Returns
List[BatchMonitoringJobDefinition]

List of job definitions the user has access to see

Examples

>>> import datarobot as dr
>>> definition = dr.BatchMonitoringJobDefinition.list()
>>> definition
[
    BatchMonitoringJobDefinition(60912e09fd1f04e832a575c1),
    BatchMonitoringJobDefinition(6086ba053f3ef731e81af3ca)
]
Return type

List[BatchMonitoringJobDefinition]

classmethod create(enabled, batch_monitoring_job, name=None, schedule=None)

Creates a new batch monitoring job definition to be run either at a scheduled interval or as a manual run.

Returns
BatchMonitoringJobDefinition

Instance of BatchMonitoringJobDefinition

Examples

>>> import datarobot as dr
>>> job_spec = {
...    "num_concurrent": 4,
...    "deployment_id": "foobar",
...    "intake_settings": {
...        "url": "s3://foobar/123",
...        "type": "s3",
...        "format": "csv"
...    },
...    "output_settings": {
...        "url": "s3://foobar/123",
...        "type": "s3",
...        "format": "csv"
...    },
...}
>>> schedule = {
...    "day_of_week": [
...        1
...    ],
...    "month": [
...        "*"
...    ],
...    "hour": [
...        16
...    ],
...    "minute": [
...        0
...    ],
...    "day_of_month": [
...        1
...    ]
...}
>>> definition = dr.BatchMonitoringJobDefinition.create(
...    enabled=False,
...    batch_monitoring_job=job_spec,
...    name="some_definition_name",
...    schedule=schedule
... )
>>> definition
BatchMonitoringJobDefinition(60912e09fd1f04e832a575c1)
Attributes
enabledbool (default False)

Whether the definition should be active on a scheduled basis. If True, schedule is required.

batch_monitoring_job: dict

The job specifications for your batch monitoring job. It requires the same job input parameters as used with BatchMonitoringJob.

namestring (optional)

The name you want your job to be identified with. Must be unique across the organization’s existing jobs. If you don’t supply a name, a random one will be generated for you.

scheduledict (optional)

The schedule payload defines at what intervals the job should run. Its elements can be combined in various ways to construct complex scheduling terms if needed. For each element, you can supply either an asterisk ["*"], denoting “every” unit of that time denomination, or an array of integers (e.g. [1, 2, 3]) to define specific values.

The schedule payload is split up in the following items:

Minute: The minute(s) of the day that the job will run. Allowed values are either ["*"] meaning every minute of the day or [0 ... 59].

Hour: The hour(s) of the day that the job will run. Allowed values are either ["*"] meaning every hour of the day or [0 ... 23].

Day of Month: The date(s) of the month that the job will run. Allowed values are either [1 ... 31] or ["*"] for all days of the month. This field is additive with dayOfWeek, meaning the job will run both on the date(s) defined in this field and the day specified by dayOfWeek (for example, dates 1st, 2nd, 3rd, plus every Tuesday). If dayOfMonth is set to ["*"] and dayOfWeek is defined, the scheduler will trigger on every day of the month that matches dayOfWeek (for example, Tuesday the 2nd, 9th, 16th, 23rd, 30th). Invalid dates such as February 31st are ignored.

Month: The month(s) of the year that the job will run. Allowed values are either [1 ... 12] or ["*"] for all months of the year. Strings, either 3-letter abbreviations or the full name of the month, can be used interchangeably (e.g., “jan” or “october”). Months that are not compatible with dayOfMonth are ignored, for example {"dayOfMonth": [31], "month":["feb"]}.

Day of Week: The day(s) of the week that the job will run. Allowed values are [0 ... 6], where Sunday=0, or ["*"] for all days of the week. Strings, either 3-letter abbreviations or the full name of the day, can be used interchangeably (e.g., “sunday”, “Sunday”, “sun”, and “Sun” all map to [0]). This field is additive with dayOfMonth, meaning the job will run both on the date specified by dayOfMonth and the day defined in this field.
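
The name-to-number mapping for months and weekdays described above can be sketched as follows. normalize is a hypothetical helper written for illustration (the service performs its own parsing), and it assumes names are matched case-insensitively against either the full English name or its first three letters.

```python
MONTHS = ["january", "february", "march", "april", "may", "june", "july",
          "august", "september", "october", "november", "december"]
DAYS = ["sunday", "monday", "tuesday", "wednesday", "thursday", "friday",
        "saturday"]


def normalize(values, names, first):
    """Hypothetical helper: map names in a schedule field to integers.

    `first` is the number of the first name: 1 for months, 0 for days.
    """
    if values == ["*"]:
        return ["*"]
    result = []
    for value in values:
        if isinstance(value, str):
            key = value.lower()
            hits = [i for i, name in enumerate(names)
                    if key in (name, name[:3])]
            if not hits:
                raise ValueError(f"Unknown name: {value!r}")
            result.append(hits[0] + first)
        else:
            result.append(value)
    return result
```

For example, normalize(["sun"], DAYS, 0) and normalize(["Sunday"], DAYS, 0) both yield the weekday number used for Sunday.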

Return type

BatchMonitoringJobDefinition

update(enabled, batch_monitoring_job=None, name=None, schedule=None)

Updates a job definition with the changed specs.

Takes the same input as create()

Returns
BatchMonitoringJobDefinition

Instance of the updated BatchMonitoringJobDefinition

Examples

>>> import datarobot as dr
>>> job_spec = {
...    "num_concurrent": 5,
...    "deployment_id": "foobar_new",
...    "intake_settings": {
...        "url": "s3://foobar/123",
...        "type": "s3",
...        "format": "csv"
...    },
...    "output_settings": {
...        "url": "s3://foobar/123",
...        "type": "s3",
...        "format": "csv"
...    },
...}
>>> schedule = {
...    "day_of_week": [
...        1
...    ],
...    "month": [
...        "*"
...    ],
...    "hour": [
...        "*"
...    ],
...    "minute": [
...        30, 59
...    ],
...    "day_of_month": [
...        1, 2, 6
...    ]
...}
>>> definition = dr.BatchMonitoringJobDefinition.get('60912e09fd1f04e832a575c1')
>>> definition = definition.update(
...    enabled=False,
...    batch_monitoring_job=job_spec,
...    name="updated_definition_name",
...    schedule=schedule
... )
>>> definition
BatchMonitoringJobDefinition(60912e09fd1f04e832a575c1)
Attributes
enabledbool (default False)

Same as enabled in create().

batch_monitoring_job: dict

Same as batch_monitoring_job in create().

namestring (optional)

Same as name in create().

scheduledict

Same as schedule in create().

Return type

BatchMonitoringJobDefinition

run_on_schedule(schedule)

Sets the run schedule of an already created job definition.

If the job was previously not enabled, this will also set the job to enabled.

Returns
BatchMonitoringJobDefinition

Instance of the updated BatchMonitoringJobDefinition with the new / updated schedule.

Examples

>>> import datarobot as dr
>>> definition = dr.BatchMonitoringJobDefinition.create('...')
>>> schedule = {
...    "day_of_week": [
...        1
...    ],
...    "month": [
...        "*"
...    ],
...    "hour": [
...        "*"
...    ],
...    "minute": [
...        30, 59
...    ],
...    "day_of_month": [
...        1, 2, 6
...    ]
...}
>>> definition.run_on_schedule(schedule)
BatchMonitoringJobDefinition(60912e09fd1f04e832a575c1)
Attributes
scheduledict

Same as schedule in create().

Return type

BatchMonitoringJobDefinition

run_once()

Manually submits a batch monitoring job to the queue, based off of an already created job definition.

Returns
BatchMonitoringJob

Instance of BatchMonitoringJob

Examples

>>> import datarobot as dr
>>> definition = dr.BatchMonitoringJobDefinition.create('...')
>>> job = definition.run_once()
>>> job.wait_for_completion()
Return type

BatchMonitoringJob

delete()

Deletes the job definition and disables any future schedules of this job if any. If a scheduled job is currently running, this will not be cancelled.

Examples

>>> import datarobot as dr
>>> definition = dr.BatchMonitoringJobDefinition.get('5a8ac9ab07a57a0001be501f')
>>> definition.delete()
Return type

None

Status Check Job

class datarobot.models.StatusCheckJob(job_id, resource_type=None)

Tracks asynchronous task status

Attributes
job_idstr

The ID of the status the job belongs to.

wait_for_completion(max_wait=600)

Waits for job to complete.

Parameters
max_waitint, optional

How long to wait for the job to finish. If the time expires, DataRobot returns the current status.

Returns
statusJobStatusResult

Returns the current status of the job.

Return type

JobStatusResult
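
The max_wait behavior (return whatever status is current once time runs out, rather than raising) can be illustrated with a generic polling loop. This is a sketch under stated assumptions, not the client's implementation: wait_until, get_status, is_done and poll_interval are all hypothetical names.

```python
import time


def wait_until(get_status, is_done, max_wait=600, poll_interval=1.0):
    """Poll get_status() until is_done(status) or max_wait seconds pass.

    Returns the most recent status either way, mirroring the documented
    behavior of returning the current status when the time expires.
    """
    deadline = time.monotonic() + max_wait
    status = get_status()
    while not is_done(status) and time.monotonic() < deadline:
        time.sleep(poll_interval)
        status = get_status()
    return status
```

Callers therefore need to inspect the returned status to tell a finished job from a timeout.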

get_status()

Retrieve JobStatusResult object with the latest job status data from the server.

Return type

JobStatusResult

get_result_when_complete(max_wait=600)

Wait for the job to complete, then attempt to convert the resulting JSON into an object of type self.resource_type.

Returns
APIObject

A newly created resource of type self.resource_type.

Return type

APIObject

class datarobot.models.JobStatusResult(status: Optional[str], status_id: Optional[str], completed_resource_url: Optional[str], message: Optional[str])

This class represents a result of status check for submitted async jobs.

status: Optional[str]

Alias for field number 0

status_id: Optional[str]

Alias for field number 1

completed_resource_url: Optional[str]

Alias for field number 2

message: Optional[str]

Alias for field number 3
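
The "Alias for field number N" entries indicate that the result behaves like a named tuple, so each value is reachable by name or by position. The sketch below uses a stand-in class with the same four fields; StatusResultSketch and the values in it are placeholders for illustration, not real client output.

```python
from typing import NamedTuple, Optional


class StatusResultSketch(NamedTuple):
    """Stand-in with the same four fields as JobStatusResult."""
    status: Optional[str]
    status_id: Optional[str]
    completed_resource_url: Optional[str]
    message: Optional[str]


# Placeholder values, for illustration only.
result = StatusResultSketch("COMPLETED", "abc123", None, None)

# Field 0 and the `status` attribute refer to the same value.
same_value = result[0] == result.status

# Tuple unpacking follows the documented field order.
status, status_id, completed_resource_url, message = result
```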

Blueprint

class datarobot.models.Blueprint(id=None, processes=None, model_type=None, project_id=None, blueprint_category=None, monotonic_increasing_featurelist_id=None, monotonic_decreasing_featurelist_id=None, supports_monotonic_constraints=None, recommended_featurelist_id=None, supports_composable_ml=None, supports_incremental_learning=None)

A Blueprint which can be used to fit models

Attributes
idstr

the id of the blueprint

processeslist of str

the processes used by the blueprint

model_typestr

the model produced by the blueprint

project_idstr

the project the blueprint belongs to

blueprint_categorystr

(New in version v2.6) Describes the category of the blueprint and the kind of model it produces.

recommended_featurelist_id: str or null

(New in v2.18) The ID of the feature list recommended for this blueprint. If this field is not present, then there is no recommended feature list.

supports_composable_mlbool or None

(New in version v2.26) whether this blueprint is supported in Composable ML.

supports_incremental_learningbool or None

(New in version v3.3) whether this blueprint supports incremental learning.

classmethod get(project_id, blueprint_id)

Retrieve a blueprint.

Parameters
project_idstr

The project’s id.

blueprint_idstr

Id of blueprint to retrieve.

Returns
blueprintBlueprint

The queried blueprint.

Return type

Blueprint

get_json()

Get the blueprint json representation used by this model.

Returns
BlueprintJson

Json representation of the blueprint stages.

Return type

Dict[str, Tuple[List[str], List[str], str]]

get_chart()

Retrieve a chart.

Returns
BlueprintChart

The current blueprint chart.

Return type

BlueprintChart

get_documents()

Get documentation for tasks used in the blueprint.

Returns
list of BlueprintTaskDocument

All documents available for blueprint.

Return type

List[BlueprintTaskDocument]

classmethod from_data(data)

Instantiate an object of this class using a dict.

Parameters
datadict

Correctly snake_cased keys and their values.

Return type

TypeVar(T, bound= APIObject)

classmethod from_server_data(data, keep_attrs=None)

Instantiate an object of this class using the data directly from the server, meaning that the keys may have the wrong camel casing

Parameters
datadict

The directly translated dict of JSON from the server. No casing fixes have taken place

keep_attrsiterable

List, set or tuple of the dotted namespace notations for attributes to keep within the object structure even if their values are None

Return type

TypeVar(T, bound= APIObject)

class datarobot.models.BlueprintTaskDocument(title=None, task=None, description=None, parameters=None, links=None, references=None)

Document describing a task from a blueprint.

Attributes
titlestr

Title of document.

taskstr

Name of the task described in document.

descriptionstr

Task description.

parameterslist of dict(name, type, description)

Parameters that task can receive in human-readable format.

linkslist of dict(name, url)

External links used in document

referenceslist of dict(name, url)

References used in document. When no link is available, url is None.

class datarobot.models.BlueprintChart(nodes, edges)

A Blueprint chart that can be used to understand data flow in blueprint.

Attributes
nodeslist of dict (id, label)

Chart nodes, id unique in chart.

edgeslist of tuple (id1, id2)

Directions of data flow between blueprint chart nodes.

classmethod get(project_id, blueprint_id)

Retrieve a blueprint chart.

Parameters
project_idstr

The project’s id.

blueprint_idstr

Id of blueprint to retrieve chart.

Returns
BlueprintChart

The queried blueprint chart.

Return type

BlueprintChart

to_graphviz()

Get blueprint chart in graphviz DOT format.

Returns
unicode

String representation of chart in graphviz DOT language.

Return type

str
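
To show how the nodes and edges attributes relate to a DOT graph, here is a minimal sketch that builds DOT text from the same two structures. chart_to_dot is a hypothetical function; the exact text produced by to_graphviz() may differ in layout and attributes.

```python
def chart_to_dot(nodes, edges):
    """Hypothetical helper: render nodes and edges as a DOT digraph.

    nodes: list of dicts with "id" and "label" keys.
    edges: list of (id1, id2) tuples giving the direction of data flow.
    """
    lines = ["digraph blueprint {"]
    for node in nodes:
        node_id, label = node["id"], node["label"]
        lines.append(f'    "{node_id}" [label="{label}"];')
    for source, target in edges:
        lines.append(f'    "{source}" -> "{target}";')
    lines.append("}")
    return "\n".join(lines)
```

The resulting string can be saved to a .dot file and rendered with the Graphviz dot command.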

class datarobot.models.ModelBlueprintChart(nodes, edges)

A Blueprint chart that can be used to understand data flow in a model. A model blueprint chart is a reduced version of the repository blueprint chart that contains only the elements used to build this particular model.

Attributes
nodeslist of dict (id, label)

Chart nodes, id unique in chart.

edgeslist of tuple (id1, id2)

Directions of data flow between blueprint chart nodes.

classmethod get(project_id, model_id)

Retrieve a model blueprint chart.

Parameters
project_idstr

The project’s id.

model_idstr

Id of model to retrieve model blueprint chart.

Returns
ModelBlueprintChart

The queried model blueprint chart.

Return type

ModelBlueprintChart

to_graphviz()

Get blueprint chart in graphviz DOT format.

Returns
unicode

String representation of chart in graphviz DOT language.

Return type

str

Calendar File

class datarobot.CalendarFile(calendar_end_date=None, calendar_start_date=None, created=None, id=None, name=None, num_event_types=None, num_events=None, project_ids=None, role=None, multiseries_id_columns=None)

Represents the data for a calendar file.

For more information about calendar files, see the calendar documentation.

Attributes
idstr

The id of the calendar file.

calendar_start_datestr

The earliest date in the calendar.

calendar_end_datestr

The last date in the calendar.

createdstr

The date this calendar was created, i.e. uploaded to DR.

namestr

The name of the calendar.

num_event_typesint

The number of different event types.

num_eventsint

The number of events this calendar has.

project_idslist of strings

A list containing the projectIds of the projects using this calendar.

multiseries_id_columns: list of str or None

A list of columns in calendar which uniquely identify events for different series. Currently, only one column is supported. If multiseries id columns are not provided, calendar is considered to be single series.

rolestr

The access role the user has for this calendar.

classmethod create(file_path, calendar_name=None, multiseries_id_columns=None)

Creates a calendar using the given file. For information about calendar files, see the calendar documentation.

The provided file must be a CSV in the format:

Date,   Event,          Series ID,    Event Duration
<date>, <event_type>,   <series id>,  <event duration>
<date>, <event_type>,              ,  <event duration>

A header row is required, and the “Series ID” and “Event Duration” columns are optional.
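
A file in this layout can be produced with the standard csv module. The sketch below is illustrative only: write_calendar_csv and the event dict keys are hypothetical, and the date strings are passed through unchanged because the expected date format is not stated here.

```python
import csv


def write_calendar_csv(path, events):
    """Hypothetical helper: write events in the calendar CSV layout.

    Each event is a dict with "date" and "event" keys and optional
    "series_id" and "duration" keys; missing optional values are left
    empty.
    """
    with open(path, "w", newline="", encoding="utf-8") as handle:
        writer = csv.writer(handle)
        # The header row is required.
        writer.writerow(["Date", "Event", "Series ID", "Event Duration"])
        for event in events:
            writer.writerow([
                event["date"],
                event["event"],
                event.get("series_id", ""),
                event.get("duration", ""),
            ])
```

The path written here is what would then be passed as file_path to CalendarFile.create().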

Once the CalendarFile has been created, pass its ID with the DatetimePartitioningSpecification when setting the target for a time series project in order to use it.

Parameters
file_pathstring

A string representing a path to a local csv file.

calendar_namestring, optional

A name to assign to the calendar. Defaults to the name of the file if not provided.

multiseries_id_columnslist of str or None

A list of the names of multiseries id columns to define which series an event belongs to. Currently only one multiseries id column is supported.

Returns
calendar_fileCalendarFile

Instance with initialized data.

Raises
AsyncProcessUnsuccessfulError

Raised if there was an error processing the provided calendar file.

Examples

# Creating a calendar with a specified name
cal = dr.CalendarFile.create('/home/calendars/somecalendar.csv',
                             calendar_name='Some Calendar Name')
cal.id
>>> 5c1d4904211c0a061bc93013
cal.name
>>> Some Calendar Name

# Creating a calendar without specifying a name
cal = dr.CalendarFile.create('/home/calendars/somecalendar.csv')
cal.id
>>> 5c1d4904211c0a061bc93012
cal.name
>>> somecalendar.csv

# Creating a calendar with multiseries id columns
cal = dr.CalendarFile.create('/home/calendars/somemultiseriescalendar.csv',
                             calendar_name='Some Multiseries Calendar Name',
                             multiseries_id_columns=['series_id'])
cal.id
>>> 5da9bb21962d746f97e4daee
cal.name
>>> Some Multiseries Calendar Name
cal.multiseries_id_columns
>>> ['series_id']
Return type

CalendarFile

classmethod create_calendar_from_dataset(dataset_id, dataset_version_id=None, calendar_name=None, multiseries_id_columns=None, delete_on_error=False)

Creates a calendar using the given dataset. For information about calendar files, see the calendar documentation.

The provided dataset must have the following format:

Date,   Event,          Series ID,    Event Duration
<date>, <event_type>,   <series id>,  <event duration>
<date>, <event_type>,              ,  <event duration>

The “Series ID” and “Event Duration” columns are optional.

Once the CalendarFile has been created, pass its ID with the DatetimePartitioningSpecification when setting the target for a time series project in order to use it.

Parameters
dataset_idstring

The identifier of the dataset from which to create the calendar.

dataset_version_idstring, optional

The identifier of the dataset version from which to create the calendar.

calendar_namestring, optional

A name to assign to the calendar. Defaults to the name of the dataset if not provided.

multiseries_id_columnslist of str, optional

A list of the names of multiseries id columns to define which series an event belongs to. Currently only one multiseries id column is supported.

delete_on_errorboolean, optional

Whether to delete the calendar file from the Catalog if it is not valid.

Returns
calendar_fileCalendarFile

Instance with initialized data.

Raises
AsyncProcessUnsuccessfulError

Raised if there was an error processing the provided calendar file.

Examples

# Creating a calendar from a dataset
dataset = dr.Dataset.create_from_file('/home/calendars/somecalendar.csv')
cal = dr.CalendarFile.create_calendar_from_dataset(
    dataset.id, calendar_name='Some Calendar Name'
)
cal.id
>>> 5c1d4904211c0a061bc93013
cal.name
>>> Some Calendar Name

# Creating a calendar from a new dataset version
new_dataset_version = dr.Dataset.create_version_from_file(
    dataset.id, '/home/calendars/anothercalendar.csv'
)
cal = dr.CalendarFile.create_calendar_from_dataset(
    new_dataset_version.id, dataset_version_id=new_dataset_version.version_id
)
cal.id
>>> 5c1d4904211c0a061bc93012
cal.name
>>> anothercalendar.csv
Return type

CalendarFile

classmethod create_calendar_from_country_code(country_code, start_date, end_date)

Generates a calendar based on the provided country code and the start and end dates of the dataset. The provided country code should be uppercase and 2-3 characters long. See CalendarFile.get_allowed_country_codes for a list of allowed country codes.

Parameters
country_codestring

The country code for the country to use for generating the calendar.

start_datedatetime.datetime

The earliest date to include in the generated calendar.

end_datedatetime.datetime

The latest date to include in the generated calendar.

Returns
calendar_fileCalendarFile

Instance with initialized data.

Return type

CalendarFile
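As a minimal sketch, the documented format rule for country_code (uppercase, 2-3 characters) can be checked locally before the request is sent. The helper name normalize_country_code is hypothetical and not part of the datarobot package; it does not check the code against get_allowed_country_codes. The commented call at the end assumes an already configured client.

```python
from datetime import datetime


def normalize_country_code(code: str) -> str:
    """Return the code in uppercase, or raise ValueError if its length is wrong.

    Hypothetical helper: it only mirrors the documented format rule
    (uppercase, 2-3 characters) and does not consult the server.
    """
    normalized = code.strip().upper()
    if not 2 <= len(normalized) <= 3:
        raise ValueError(f"expected a 2-3 character country code, got {code!r}")
    return normalized


country_code = normalize_country_code("us")
start_date = datetime(2020, 1, 1)    # earliest date to include
end_date = datetime(2020, 12, 31)    # latest date to include

# With a configured client, the values would be passed as documented above:
# import datarobot as dr
# cal = dr.CalendarFile.create_calendar_from_country_code(
#     country_code, start_date, end_date
# )
```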

classmethod get_allowed_country_codes(offset=None, limit=None)

Retrieves the list of allowed country codes that can be used for generating the preloaded calendars.

Parameters
offsetint

Optional, defaults to 0. This many results will be skipped.

limitint

Optional, defaults to 100, maximum 1000. At most this many results are returned.

Returns
list

A list of dicts, each of which represents an allowed country code (see CountryCode).

Return type

List[CountryCode]
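Because offset and limit page through the results, retrieving every allowed country code may take several calls. The following is a sketch of one way to walk the pages; iter_all and fetch_page are hypothetical names, and the stopping rule (a page shorter than limit means the end) is an assumption rather than documented server behavior.

```python
from typing import Callable, Iterator, List


def iter_all(
    fetch_page: Callable[[int, int], List[dict]], limit: int = 100
) -> Iterator[dict]:
    """Yield every item by requesting offset/limit pages until a short page."""
    offset = 0
    while True:
        page = fetch_page(offset, limit)
        yield from page
        # Assumption: a page with fewer than `limit` items is the last one.
        if len(page) < limit:
            break
        offset += limit


# With a configured client, the fetcher could wrap the documented call:
# import datarobot as dr
# codes = list(iter_all(
#     lambda offset, limit: dr.CalendarFile.get_allowed_country_codes(
#         offset=offset, limit=limit
#     )
# ))
```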

classmethod get(calendar_id)

Gets the details of a calendar, given the id.

Parameters
calendar_idstr

The identifier of the calendar.

Returns
calendar_fileCalendarFile

The requested calendar.

Raises
DataError

Raised if the calendar_id is invalid, i.e. the specified CalendarFile does not exist.

Examples

cal = dr.CalendarFile.get(some_calendar_id)
cal.id
>>> some_calendar_id
Return type

CalendarFile

classmethod list(project_id=None, batch_size=None)

Gets the details of all calendars this user has view access for.

Parameters
project_idstr, optional

If provided, will filter for calendars associated only with the specified project.

batch_sizeint, optional

The number of calendars to retrieve in a single API call. If specified, the client may make multiple calls to retrieve the full list of calendars. If not specified, an appropriate default will be chosen by the server.

Returns
calendar_listlist of CalendarFile

A list of CalendarFile objects.

Examples

calendars = dr.CalendarFile.list()
len(calendars)
>>> 10
Return type

List[CalendarFile]

classmethod delete(calendar_id)

Deletes the calendar specified by calendar_id.

Parameters
calendar_idstr

The id of the calendar to delete. The requester must have OWNER access for this calendar.

Raises
ClientError

Raised if an invalid calendar_id is provided.

Examples

# Deleting with a valid calendar_id
status_code = dr.CalendarFile.delete(some_calendar_id)
status_code
>>> 204
dr.CalendarFile.get(some_calendar_id)
>>> ClientError: Item not found
Return type

None

classmethod update_name(calendar_id, new_calendar_name)

Changes the name of the specified calendar to the specified name. The requester must have at least READ_WRITE permissions on the calendar.

Parameters
calendar_idstr

The id of the calendar to update.

new_calendar_namestr

The new name to set for the specified calendar.

Returns
status_codeint

200 for success

Raises
ClientError

Raised if an invalid calendar_id is provided.

Examples

response = dr.CalendarFile.update_name(some_calendar_id, some_new_name)
response
>>> 200
cal = dr.CalendarFile.get(some_calendar_id)
cal.name
>>> some_new_name
Return type

int

classmethod share(calendar_id, access_list)

Shares the calendar with the specified users, assigning the specified roles.

Parameters
calendar_idstr

The id of the calendar to update

access_list:

A list of dr.SharingAccess objects. Specify None for the role to delete a user’s access from the specified CalendarFile. For more information on specific access levels, see the sharing documentation.

Returns
status_codeint

200 for success

Raises
ClientError

Raised if unable to update permissions for a user.

AssertionError

Raised if access_list is invalid.

Examples

# assuming some_user is a valid user, share this calendar with some_user
sharing_list = [dr.SharingAccess(some_user_username,
                                 dr.enums.SHARING_ROLE.READ_WRITE)]
response = dr.CalendarFile.share(some_calendar_id, sharing_list)
response
>>> 200

# delete some_user from this calendar, assuming they have access of some kind already
delete_sharing_list = [dr.SharingAccess(some_user_username,
                                        None)]
response = dr.CalendarFile.share(some_calendar_id, delete_sharing_list)
response
>>> 200

# Attempt to add an invalid user to a calendar
invalid_sharing_list = [dr.SharingAccess(invalid_username,
                                         dr.enums.SHARING_ROLE.READ_WRITE)]
dr.CalendarFile.share(some_calendar_id, invalid_sharing_list)
>>> ClientError: Unable to update access for this calendar
Return type

int

classmethod get_access_list(calendar_id, batch_size=None)

Retrieve a list of users that have access to this calendar.

Parameters
calendar_idstr

The id of the calendar to retrieve the access list for.

batch_sizeint, optional

The number of access records to retrieve in a single API call. If specified, the client may make multiple calls to retrieve the full list of access records. If not specified, an appropriate default will be chosen by the server.

Returns
access_control_listlist of SharingAccess

A list of SharingAccess objects.

Raises
ClientError

Raised if user does not have access to calendar or calendar does not exist.

Return type

List[SharingAccess]

class datarobot.models.calendar_file.CountryCode

Automated Documentation

class datarobot.models.automated_documentation.AutomatedDocument(entity_id=None, document_type=None, output_format=None, locale=None, template_id=None, id=None, filepath=None, created_at=None)

An automated documentation object.

New in version v2.24.

Attributes
document_typestr or None

Type of automated document. You can specify: MODEL_COMPLIANCE, AUTOPILOT_SUMMARY depending on your account settings. Required for document generation.

entity_idstr or None

ID of the entity to generate the document for. It can be model ID or project ID. Required for document generation.

output_formatstr or None

Format of the generated document, either docx or html. Required for document generation.

localestr or None

Localization of the document, dependent on your account settings. Default setting is EN_US.

template_idstr or None

Template ID to use for the document outline. Defaults to standard DataRobot template. See the documentation for ComplianceDocTemplate for more information.

idstr or None

ID of the document. Required to download or delete a document.

filepathstr or None

Path to save a downloaded document to. Include both the directory and the file name; if no path is set, the file is saved to the directory from which the script is launched.

created_atdatetime or None

Document creation timestamp.

classmethod list_available_document_types()

Get a list of all available document types and locales.

Returns
List of dicts

Examples

import datarobot as dr

dr.Client(token=my_token, endpoint=endpoint)
doc_types = dr.AutomatedDocument.list_available_document_types()
Return type

List[DocumentOption]

property is_model_compliance_initialized: Tuple[bool, str]

Check if model compliance documentation pre-processing is initialized. Model compliance documentation pre-processing must be initialized before generating documentation for a custom model.

Returns
Tuple of (boolean, string)
  • boolean flag is whether model compliance documentation pre-processing is initialized

  • string value is the initialization status

Return type

Tuple[bool, str]

initialize_model_compliance()

Initialize model compliance documentation pre-processing. Must be called before generating documentation for a custom model.

Returns
Tuple of (boolean, string)
  • boolean flag is whether model compliance documentation pre-processing is initialized

  • string value is the initialization status

Examples

import datarobot as dr

dr.Client(token=my_token, endpoint=endpoint)

# NOTE: entity_id is either a model id or a model package (version) id
doc = dr.AutomatedDocument(
        document_type="MODEL_COMPLIANCE",
        entity_id="6f50cdb77cc4f8d1560c3ed5",
        output_format="docx",
        locale="EN_US")

doc.initialize_model_compliance()
Return type

Tuple[bool, str]

generate(max_wait=600)

Request generation of an automated document.

Required attributes to request document generation: document_type, entity_id, and output_format.

Returns
requests.models.Response

Examples

import datarobot as dr

dr.Client(token=my_token, endpoint=endpoint)

doc = dr.AutomatedDocument(
        document_type="MODEL_COMPLIANCE",
        entity_id="6f50cdb77cc4f8d1560c3ed5",
        output_format="docx",
        locale="EN_US",
        template_id="50efc9db8aff6c81a374aeec",
        filepath="/Users/username/Documents/example.docx"
        )

doc.generate()
doc.download()
Return type

Response
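Since generation requires document_type, entity_id, and output_format, a small pre-flight check can report what is missing before a request is made. This is only a sketch; missing_generation_attrs is a hypothetical helper and not part of the datarobot package.

```python
REQUIRED_FOR_GENERATION = ("document_type", "entity_id", "output_format")


def missing_generation_attrs(**attrs):
    """Return the names of required attributes that are absent or empty."""
    return [name for name in REQUIRED_FOR_GENERATION if not attrs.get(name)]


missing = missing_generation_attrs(
    document_type="MODEL_COMPLIANCE",
    entity_id="6f50cdb77cc4f8d1560c3ed5",
)
if missing:
    print("Cannot generate yet, missing:", ", ".join(missing))
```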

download()

Download a generated Automated Document. Document ID is required to download a file.

Returns
requests.models.Response

Examples

Generating and downloading the generated document:

import datarobot as dr

dr.Client(token=my_token, endpoint=endpoint)

doc = dr.AutomatedDocument(
        document_type="AUTOPILOT_SUMMARY",
        entity_id="6050d07d9da9053ebb002ef7",
        output_format="docx",
        filepath="/Users/username/Documents/Project_Report_1.docx"
        )

doc.generate()
doc.download()

Downloading an earlier generated document when you know the document ID:

import datarobot as dr

dr.Client(token=my_token, endpoint=endpoint)
doc = dr.AutomatedDocument(id='5e8b6a34d2426053ab9a39ed')
doc.download()

Notice that filepath was not set for this document. In this case, the file is saved to the directory from which the script was launched.

Downloading a document chosen from a list of earlier generated documents:

import datarobot as dr

dr.Client(token=my_token, endpoint=endpoint)

model_id = "6f5ed3de855962e0a72a96fe"
docs = dr.AutomatedDocument.list_generated_documents(entity_ids=[model_id])
doc = docs[0]
doc.filepath = "/Users/me/Desktop/Recommended_model_doc.docx"
doc.download()
Return type

Response

delete()

Delete a document using its ID.

Returns
requests.models.Response

Examples

import datarobot as dr

dr.Client(token=my_token, endpoint=endpoint)
doc = dr.AutomatedDocument(id="5e8b6a34d2426053ab9a39ed")
doc.delete()

If you don’t know the document ID, you can follow the same workflow to get the ID as in the examples for the AutomatedDocument.download method.

Return type

Response

classmethod list_generated_documents(document_types=None, entity_ids=None, output_formats=None, locales=None, offset=None, limit=None)

Get information about all previously generated documents available for your account. The information includes document ID and type, ID of the entity it was generated for, time of creation, and other information.

Parameters
document_typesList of str or None

Query for one or more document types.

entity_idsList of str or None

Query generated documents by one or more entity IDs.

output_formatsList of str or None

Query for one or more output formats.

localesList of str or None

Query generated documents by one or more locales.

offset: int or None

Number of items to skip. Defaults to 0 if not provided.

limit: int or None

Number of items to return, maximum number of items is 1000.

Returns
List of AutomatedDocument objects, where each object contains the attributes described in AutomatedDocument.

Examples

To get a list of all generated documents:

import datarobot as dr

dr.Client(token=my_token, endpoint=endpoint)
docs = dr.AutomatedDocument.list_generated_documents()

To get a list of all AUTOPILOT_SUMMARY documents:

import datarobot as dr

dr.Client(token=my_token, endpoint=endpoint)
docs = dr.AutomatedDocument.list_generated_documents(document_types=["AUTOPILOT_SUMMARY"])

To get a list of 5 recently created automated documents in html format:

import datarobot as dr

dr.Client(token=my_token, endpoint=endpoint)
docs = dr.AutomatedDocument.list_generated_documents(output_formats=["html"], limit=5)

To get a list of automated documents created for specific entities (projects or models):

import datarobot as dr

dr.Client(token=my_token, endpoint=endpoint)
docs = dr.AutomatedDocument.list_generated_documents(
    entity_ids=["6051d3dbef875eb3be1be036",
                "6051d3e1fbe65cd7a5f6fde6",
                "6051d3e7f86c04486c2f9584"]
    )

Note that the list of results contains AutomatedDocument objects, which means that you can call the class's methods on them. Here is how you can list, download, and then delete from the server all automated documents related to a certain entity:

import datarobot as dr

dr.Client(token=my_token, endpoint=endpoint)

ids = ["6051d3dbef875eb3be1be036", "5fe1d3d55cd810ebdb60c517f"]
docs = dr.AutomatedDocument.list_generated_documents(entity_ids=ids)
for doc in docs:
    doc.download()
    doc.delete()
Return type

List[AutomatedDocument]

class datarobot.models.automated_documentation.DocumentOption

Challenger

class datarobot.models.deployment.challenger.Challenger(id, deployment_id=None, name=None, model=None, model_package=None, prediction_environment=None)

A challenger is an alternative model that is compared to the currently deployed model.

Attributes
idstr

The ID of the challenger.

deployment_idstr

The ID of the deployment.

namestr

The name of the challenger.

modeldict

The model of the challenger.

model_packagedict

The model package of the challenger.

prediction_environmentdict

The prediction environment of the challenger.

classmethod create(deployment_id, model_package_id, prediction_environment_id, name, max_wait=600)

Create a challenger for a deployment

Parameters
deployment_idstr

The ID of the deployment

model_package_idstr

The model package id of the challenger model

prediction_environment_idstr

The prediction environment id of the challenger model

namestr

The name of the challenger model

max_waitint, optional

The number of seconds to wait for successful resolution of a challenger creation job.

Examples

from datarobot import Challenger
challenger = Challenger.create(
    deployment_id="5c939e08962d741e34f609f0",
    name="Elastic-Net Classifier",
    model_package_id="5c0a969859b00004ba52e41b",
    prediction_environment_id="60b012436635fc00909df555"
)
Return type

Challenger

classmethod get(deployment_id, challenger_id)

Get a challenger for a deployment

Parameters
deployment_idstr

The ID of the deployment

challenger_idstr

The ID of the challenger

Returns
Challenger

The challenger object

Examples

from datarobot import Challenger
challenger = Challenger.get(
    deployment_id="5c939e08962d741e34f609f0",
    challenger_id="5c939e08962d741e34f609f0"
)

challenger.id
>>>'5c939e08962d741e34f609f0'
challenger.model_package['name']
>>> 'Elastic-Net Classifier'
Return type

Challenger

classmethod list(deployment_id)

List all challengers for a deployment

Parameters
deployment_idstr

The ID of the deployment

Returns
challengers: list

A list of challenger objects

Examples

from datarobot import Challenger
challengers = Challenger.list(deployment_id="5c939e08962d741e34f609f0")

challengers[0].id
>>>'5c939e08962d741e34f609f0'
challengers[0].model_package['name']
>>> 'Elastic-Net Classifier'
Return type

List[Challenger]

delete()

Delete a challenger for a deployment

Return type

None

update(name=None, prediction_environment_id=None)

Update name and prediction environment of a challenger

Parameters
name: str, optional

The name of the challenger model

prediction_environment_id: str, optional

The prediction environment id of the challenger model

Return type

None

Class Mapping Aggregation Settings

For multiclass projects with many unique values in the target column, you can specify parameters for aggregating rare values to improve modeling performance and to decrease the runtime and resource usage of the resulting models.

class datarobot.helpers.ClassMappingAggregationSettings(max_unaggregated_class_values=None, min_class_support=None, excluded_from_aggregation=None, aggregation_class_name=None)

Class mapping aggregation settings. For multiclass projects, these settings allow fine control over which target values are preserved as classes. Classes that are not preserved are aggregated into a single “catch everything else” class for multiclass projects, or ignored for multilabel projects. All attributes are optional; server-side defaults are used for any that are not specified.

Attributes
max_unaggregated_class_valuesint, optional

Maximum number of unique values allowed before aggregation takes effect.

min_class_supportint, optional

Minimum number of instances necessary for each target value in the dataset. All values with fewer instances will be aggregated.

excluded_from_aggregationlist, optional

List of target values that are guaranteed to be kept as is, regardless of other settings.

aggregation_class_namestr, optional

If some of the values will be aggregated - this is the name of the aggregation class that will replace them.
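To make the interaction of these settings concrete, here is a simplified local illustration. It is not DataRobot's server-side logic, whose exact rules are not spelled out here. The sketch assumes that excluded values are always kept and count toward the maximum, that remaining values are considered from most to least frequent, and that "OTHER" is a placeholder aggregation class name. The function name aggregate_rare_classes is hypothetical.

```python
from collections import Counter
from typing import Dict, Iterable, Sequence


def aggregate_rare_classes(
    target_values: Iterable[str],
    max_unaggregated_class_values: int,
    min_class_support: int,
    excluded_from_aggregation: Sequence[str] = (),
    aggregation_class_name: str = "OTHER",  # placeholder name, an assumption
) -> Dict[str, str]:
    """Map each original target value to the class it would end up in."""
    counts = Counter(target_values)
    # Excluded values are kept regardless of the other settings.
    keep = set(excluded_from_aggregation) & set(counts)
    # Most frequent values first; ties broken alphabetically for determinism.
    ranked = sorted(counts, key=lambda value: (-counts[value], value))
    for value in ranked:
        if value in keep:
            continue
        if len(keep) >= max_unaggregated_class_values:
            break
        if counts[value] >= min_class_support:
            keep.add(value)
    return {v: (v if v in keep else aggregation_class_name) for v in counts}


mapping = aggregate_rare_classes(
    ["a"] * 5 + ["b"] * 3 + ["c", "d"],
    max_unaggregated_class_values=2,
    min_class_support=2,
    excluded_from_aggregation=["d"],
)
```

In this run, "d" is kept because it is excluded from aggregation, "a" takes the one remaining slot, and "b" and "c" fall into the aggregation class.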

Client Configuration

datarobot.client.Client(token=None, endpoint=None, config_path=None, connect_timeout=None, user_agent_suffix=None, ssl_verify=None, max_retries=None, token_type=None, default_use_case=None, enable_api_consumer_tracking=None, trace_context=None)

Configures the global API client for the Python SDK. The client will be configured in one of the following ways, in order of priority.

Parameters
tokenstr, optional

API token.

endpointstr, optional

Base URL of API.

config_pathstr, optional

An alternate location of the config file.

connect_timeoutint, optional

How long the client should be willing to wait before giving up on establishing a connection with the server.

user_agent_suffixstr, optional

Additional text that is appended to the User-Agent HTTP header when communicating with the DataRobot REST API. This can be useful for identifying different applications that are built on top of the DataRobot Python Client, which can aid debugging and help track usage.

ssl_verifybool or str, optional

Whether to verify the SSL certificate. Can also be set to the path of a file containing certificates of trusted certification authorities. Default: True.

max_retriesint or urllib3.util.retry.Retry, optional

Either an integer number of times to retry connection errors, or a urllib3.util.retry.Retry object to configure retries.

token_type: str, optional

Authentication token type, either “Token” or “Bearer”. Use “Bearer” for a DataRobot OAuth2 token and “Token” for a token generated in Developer Tools. Default: “Token”.

default_use_case: str, optional

The entity ID of the default Use Case to use with any requests made by the client.

enable_api_consumer_tracking: bool, optional

Enable or disable user metrics tracking within the datarobot module. Default: False.

trace_context: str, optional

An ID or other string for identifying which code template or AI Accelerator was used to make a request.

Returns
The RESTClientObject instance created.

Notes

Token and endpoint must be specified from one source only. This is a restriction to prevent token leakage if environment variables or config file are used.

The DataRobotClientConfig parameters are looked up from the following sources, in order of priority:

  1. From call kwargs if specified;

  2. From a YAML file at the path specified in the config_path kwarg;

  3. From a YAML file at the path specified in the environment variable DATAROBOT_CONFIG_FILE;

  4. From environment variables;

  5. From the default values in the default YAML file at the path $HOME/.config/datarobot/drconfig.yaml.

This can also have the side effect of setting a default Use Case for client API requests.

Return type

RESTClientObject
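The lookup order in the Notes above can be sketched as a small pure-Python function. This only illustrates the documented priority and is not the client's implementation. The function name pick_config_source is hypothetical, and the environment variable names DATAROBOT_API_TOKEN and DATAROBOT_ENDPOINT are assumptions, since only DATAROBOT_CONFIG_FILE is named above.

```python
from typing import Mapping, Optional

DEFAULT_CONFIG_PATH = "$HOME/.config/datarobot/drconfig.yaml"


def pick_config_source(
    token: Optional[str] = None,
    endpoint: Optional[str] = None,
    config_path: Optional[str] = None,
    env: Optional[Mapping[str, str]] = None,
) -> str:
    """Return a label for the source that would win under the documented order."""
    env = env or {}
    if token or endpoint:                      # 1. call kwargs
        return "kwargs"
    if config_path:                            # 2. config_path kwarg
        return f"yaml:{config_path}"
    if env.get("DATAROBOT_CONFIG_FILE"):       # 3. config file from environment
        return f"yaml:{env['DATAROBOT_CONFIG_FILE']}"
    # 4. environment variables (names assumed for this sketch)
    if env.get("DATAROBOT_API_TOKEN") or env.get("DATAROBOT_ENDPOINT"):
        return "environment"
    return f"yaml:{DEFAULT_CONFIG_PATH}"       # 5. default YAML file


# Example: inspect the current process environment.
# import os
# print(pick_config_source(env=os.environ))
```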

datarobot.client.get_client()

Returns the global HTTP client for the Python SDK, instantiating it if necessary.

Return type

RESTClientObject

datarobot.client.set_client(client)

Configure the global HTTP client for the Python SDK. Returns previous instance.

Return type

Optional[RESTClientObject]

datarobot.client.client_configuration(*args, **kwargs)

This context manager can be used to temporarily change the global HTTP client.

In multithreaded scenarios, it is highly recommended to use a fresh manager object per thread.

DataRobot does not recommend nesting these contexts.

Parameters
argsParameters passed to datarobot.client.Client()
kwargsKeyword arguments passed to datarobot.client.Client()

Examples

from datarobot.client import client_configuration
from datarobot.models import Project

with client_configuration(token="api-key-here", endpoint="https://host-name.com"):
    Project.list()

from datarobot.client import Client, client_configuration
from datarobot.models import Project

Client()  # Interact with DataRobot using the default configuration.
Project.list()

with client_configuration(config_path="/path/to/a/drconfig.yaml"):
    # Interact with DataRobot using a different configuration.
    Project.list()
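The swap-and-restore behavior that such a context manager relies on can be sketched with local stand-ins. The names swap_client, current_client, and temporary_client below are hypothetical and deliberately distinct from the SDK's set_client, get_client, and client_configuration; the sketch only shows the general pattern of installing a value, returning the previous one, and restoring it on exit.

```python
from contextlib import contextmanager

_global_client = None  # local stand-in for a process-wide client


def swap_client(client):
    """Install a new global client and return the previous one."""
    global _global_client
    previous, _global_client = _global_client, client
    return previous


def current_client():
    """Return whatever client is currently installed."""
    return _global_client


@contextmanager
def temporary_client(client):
    """Use `client` inside the with-block, then restore the previous one."""
    previous = swap_client(client)
    try:
        yield client
    finally:
        # Restore even if the block raised an exception.
        swap_client(previous)


swap_client("default-client")
with temporary_client("temporary-client"):
    inside = current_client()
after = current_client()
```

Because the state being swapped is global, two threads entering such a block at the same time would overwrite each other's client, which is consistent with the caution about multithreaded use above.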
class datarobot.rest.RESTClientObject(auth, endpoint, connect_timeout=6.05, verify=True, user_agent_suffix=None, max_retries=None, authentication_type=None)
Parameters
connect_timeout

timeout for http request and connection

headers

headers for outgoing requests

open_in_browser()

Opens the DataRobot app in a web browser, or logs the URL if a browser is not available.

Return type

None

Clustering

class datarobot.models.ClusteringModel(id=None, processes=None, featurelist_name=None, featurelist_id=None, project_id=None, sample_pct=None, model_type=None, model_category=None, is_frozen=None, is_n_clusters_dynamically_determined=None, blueprint_id=None, metrics=None, monotonic_increasing_featurelist_id=None, monotonic_decreasing_featurelist_id=None, n_clusters=None, has_empty_clusters=None, supports_monotonic_constraints=None, is_starred=None, prediction_threshold=None, prediction_threshold_read_only=None, model_number=None, parent_model_id=None, supports_composable_ml=None, training_row_count=None, training_duration=None, training_start_date=None, training_end_date=None, data_selection_method=None, time_window_sample_pct=None, sampling_method=None, model_family_full_name=None, is_trained_into_validation=None, is_trained_into_holdout=None)

ClusteringModel extends the Model class. It provides properties and methods specific to clustering projects.

compute_insights(max_wait=600)

Compute and retrieve cluster insights for the model. This method waits for the job computing cluster insights to complete and then returns the results. If the computation takes longer than the specified max_wait, an exception is raised.

Parameters
project_id: str

Project to start creation in.

model_id: str

Project’s model to start creation in.

max_wait: int

Maximum number of seconds to wait before giving up

Returns
List of ClusterInsight
Raises
ClientError

Server rejected creation due to client error. Most likely cause is bad project_id or model_id.

AsyncFailureError

If any of the responses from the server are unexpected

AsyncProcessUnsuccessfulError

If the cluster insights computation has failed or was cancelled.

AsyncTimeoutError

If the cluster insights computation did not resolve in time

Return type

List[ClusterInsight]

property insights: List[ClusterInsight]

Return actual list of cluster insights if already computed.

Returns
List of ClusterInsight
Return type

List[ClusterInsight]

property clusters: List[Cluster]

Return actual list of Clusters.

Returns
List of Cluster
Return type

List[Cluster]

update_cluster_names(cluster_name_mappings)

Change many cluster names at once based on list of name mappings.

Parameters
cluster_name_mappings: List of tuples

Cluster name mappings, each consisting of the current cluster name and the new cluster name. Example:

cluster_name_mappings = [
    ("current cluster name 1", "new cluster name 1"),
    ("current cluster name 2", "new cluster name 2")]
Returns
List of Cluster
Raises
datarobot.errors.ClientError

Server rejected update of cluster names. Possible reasons include: incorrect format of mapping, mapping introduces duplicates.

Return type

List[Cluster]
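Since the server may reject a mapping that is malformed or that introduces duplicate names, a local pre-check can catch both cases before the request. This is a sketch under assumptions: check_cluster_name_mappings is a hypothetical helper, and the server's actual validation rules may differ.

```python
from typing import List, Sequence, Tuple


def check_cluster_name_mappings(
    existing_names: Sequence[str],
    mappings: Sequence[Tuple[str, str]],
) -> List[str]:
    """Return the cluster names after renaming, or raise ValueError."""
    renames = {}
    for pair in mappings:
        if len(pair) != 2 or not all(isinstance(name, str) for name in pair):
            raise ValueError(f"expected (current_name, new_name), got {pair!r}")
        current, new = pair
        if current not in existing_names:
            raise ValueError(f"unknown cluster name: {current!r}")
        renames[current] = new
    result = [renames.get(name, name) for name in existing_names]
    if len(set(result)) != len(result):
        raise ValueError("mapping introduces duplicate cluster names")
    return result


names = ["Cluster 1", "Cluster 2", "Cluster 3"]
renamed = check_cluster_name_mappings(names, [("Cluster 1", "Big spenders")])

# With a ClusteringModel instance, the same mappings would then be passed on:
# model.update_cluster_names([("Cluster 1", "Big spenders")])
```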

update_cluster_name(current_name, new_name)

Change cluster name from current_name to new_name.

Parameters
current_name: str

Current cluster name.

new_name: str

New cluster name.

Returns
List of Cluster
Raises
datarobot.errors.ClientError

Server rejected update of cluster names.

Return type

List[Cluster]

class datarobot.models.cluster.Cluster(**kwargs)

Representation of a single cluster.

Attributes
name: str

Current cluster name

percent: float

Percent of data contained in the cluster. This value is reported after cluster insights are computed for the model.

classmethod list(project_id, model_id)

Retrieve a list of clusters in the model.

Parameters
project_id: str

ID of the project that the model is part of.

model_id: str

ID of the model.

Returns
List of clusters
Return type

List[Cluster]

classmethod update_multiple_names(project_id, model_id, cluster_name_mappings)

Update many clusters at once based on list of name mappings.

Parameters
project_id: str

ID of the project that the model is part of.

model_id: str

ID of the model.

cluster_name_mappings: List of tuples

Cluster name mappings, consisting of the current and new names for each cluster. Example:

cluster_name_mappings = [
    ("current cluster name 1", "new cluster name 1"),
    ("current cluster name 2", "new cluster name 2")]
Returns
List of clusters
Raises
datarobot.errors.ClientError

Server rejected update of cluster names.

ValueError

Invalid cluster name mapping provided.

Return type

List[Cluster]

classmethod update_name(project_id, model_id, current_name, new_name)

Change cluster name from current_name to new_name

Parameters
project_id: str

ID of the project that the model is part of.

model_id: str

ID of the model.

current_name: str

Current cluster name

new_name: str

New cluster name

Returns
List of Cluster
Return type

List[Cluster]

class datarobot.models.cluster_insight.ClusterInsight(**kwargs)

Holds data on all insights related to a feature, as well as a breakdown per cluster.

Parameters
feature_name: str

Name of a feature from the dataset.

feature_type: str

Type of feature.

insightsList of classes (ClusterInsight)

List provides information regarding the importance of a specific feature in relation to each cluster. Results help understand how the model is grouping data and what each cluster represents.

feature_impact: float

Impact of a feature ranging from 0 to 1.

classmethod compute(project_id, model_id, max_wait=600)

Starts creation of cluster insights for the model and if successful, returns computed ClusterInsights. This method allows calculation to continue for a specified time and if not complete, cancels the request.

Parameters
project_id: str

ID of the project to begin creation of cluster insights for.

model_id: str

ID of the project model to begin creation of cluster insights for.

max_wait: int

Maximum number of seconds to wait before canceling the request.

Returns
List[ClusterInsight]
Raises
ClientError

Server rejected creation due to client error. Most likely cause is bad project_id or model_id.

AsyncFailureError

Raised if any of the responses from the server are unexpected.

AsyncProcessUnsuccessfulError

Raised if the cluster insights computation failed or was cancelled.

AsyncTimeoutError

Raised if the cluster insights computation did not resolve within the specified time limit (max_wait).

Return type

List[ClusterInsight]

Compliance Documentation Templates

class datarobot.models.compliance_doc_template.ComplianceDocTemplate(id, creator_id, creator_username, name, org_id=None, sections=None)

A compliance documentation template. Templates are used to customize contents of AutomatedDocument.

New in version v2.14.

Notes

Each section dictionary has the following schema:

  • title : title of the section

  • type : type of section. Must be one of “datarobot”, “user” or “table_of_contents”.

Each type of section has a different set of attributes, described below.

Sections of type "datarobot" represent sections owned by DataRobot. DataRobot sections have the following additional attributes:

  • content_id : The identifier of the content in this section. You can get the default template with get_default for a complete list of possible DataRobot section content ids.

  • sections : list of sub-section dicts nested under the parent section.

Sections of type "user" represent sections with user-defined content. These sections may contain text written by the user and have the following additional fields:

  • regularText : regular text of the section, optionally separated by \n to split paragraphs.

  • highlightedText : highlighted text of the section, optionally separated by \n to split paragraphs.

  • sections : list of sub-section dicts nested under the parent section.

A section of type "table_of_contents" represents a table of contents and has no additional attributes.
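As an illustration of the schema above, the following sketch builds a small sections list and checks it recursively. The validator validate_sections is a hypothetical helper, the content_id value is a placeholder rather than a real DataRobot content id, and treating content_id as mandatory for "datarobot" sections is an assumption.

```python
ALLOWED_TYPES = {"datarobot", "user", "table_of_contents"}


def validate_sections(sections):
    """Recursively check section dicts against the schema described above."""
    for section in sections:
        if "title" not in section:
            raise ValueError("every section needs a title")
        section_type = section.get("type")
        if section_type not in ALLOWED_TYPES:
            raise ValueError(f"unknown section type: {section_type!r}")
        # Assumption: DataRobot-owned sections must name their content.
        if section_type == "datarobot" and "content_id" not in section:
            raise ValueError("datarobot sections need a content_id")
        if section_type != "table_of_contents":
            validate_sections(section.get("sections", []))
    return True


sections = [
    {"title": "Table of Contents", "type": "table_of_contents"},
    {
        "title": "Model Overview",
        "type": "datarobot",
        "content_id": "HYPOTHETICAL_CONTENT_ID",  # placeholder, not a real id
        "sections": [],
    },
    {
        "title": "Reviewer Notes",
        "type": "user",
        "regularText": "First paragraph.\nSecond paragraph.",
        "highlightedText": "Needs sign-off.",
        "sections": [],
    },
]
is_valid = validate_sections(sections)
```

Real content ids can be read from the sections of the template returned by get_default.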

Attributes
idstr

the id of the template

namestr

the name of the template.

creator_idstr

the id of the user who created the template

creator_usernamestr

username of the user who created the template

org_idstr

the id of the organization the template belongs to

sectionslist of dicts

the sections of the template describing the structure of the document. Section schema is described in Notes section above.

classmethod get_default(template_type=None)

Get a default DataRobot template. This template is used for generating compliance documentation when no template is specified.

Parameters
template_typestr or None

Type of the template. Currently supported values are “normal” and “time_series”

Returns
templateComplianceDocTemplate

the default template object with sections attribute populated with default sections.

Return type

ComplianceDocTemplate

classmethod create_from_json_file(name, path)

Create a template with the specified name and sections in a JSON file.

This is useful when working with sections in a JSON file. Example:

default_template = ComplianceDocTemplate.get_default()
default_template.sections_to_json_file('path/to/example.json')
# ... edit example.json in your editor
my_template = ComplianceDocTemplate.create_from_json_file(
    name='my template',
    path='path/to/example.json'
)
Parameters
namestr

the name of the template. Must be unique for your user.

pathstr

the path to find the JSON file at

Returns
templateComplianceDocTemplate

the created template

Return type

ComplianceDocTemplate

classmethod create(name, sections)

Create a template with the specified name and sections.

Parameters
namestr

the name of the template. Must be unique for your user.

sectionslist

list of section objects

Returns
templateComplianceDocTemplate

the created template

Return type

ComplianceDocTemplate

classmethod get(template_id)

Retrieve a specific template.

Parameters
template_idstr

the id of the template to retrieve

Returns
templateComplianceDocTemplate

the retrieved template

Return type

ComplianceDocTemplate

classmethod list(name_part=None, limit=None, offset=None)

Get a paginated list of compliance documentation template objects.

Parameters
name_partstr or None

Return only the templates with names matching specified string. The matching is case-insensitive.

limitint

The number of records to return. The server will use a (possibly finite) default if not specified.

offsetint

The number of records to skip.

Returns
templateslist of ComplianceDocTemplate

the list of template objects

Return type

List[ComplianceDocTemplate]
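
The limit and offset parameters can be combined to walk through every page of results. The following is a minimal sketch, not part of the client: iter_all_records is a hypothetical helper, and fake_list is a stand-in for ComplianceDocTemplate.list so that the example runs without a server. It assumes that a page shorter than limit marks the end of the results.

```python
from typing import Callable, Iterator, List


def iter_all_records(fetch_page: Callable[..., List], limit: int = 100) -> Iterator:
    """Yield every record by requesting consecutive pages.

    Hypothetical helper: fetch_page must accept limit and offset keywords,
    as the list() classmethod documented above does.
    """
    offset = 0
    while True:
        page = fetch_page(limit=limit, offset=offset)
        if not page:
            break
        yield from page
        if len(page) < limit:
            # Assumption: a short page means there is nothing left to fetch.
            break
        offset += len(page)


# Stand-in data source so the sketch runs without a DataRobot server.
_FAKE_TEMPLATES = ["t1", "t2", "t3", "t4", "t5"]


def fake_list(name_part=None, limit=None, offset=None):
    start = offset or 0
    end = None if limit is None else start + limit
    return _FAKE_TEMPLATES[start:end]


all_templates = list(iter_all_records(fake_list, limit=2))
```

With a configured client, the same helper could be called as iter_all_records(ComplianceDocTemplate.list, limit=100).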

sections_to_json_file(path, indent=2)

Save the sections of the template to a JSON file at the specified path.

Parameters
pathstr

the path to save the file to

indentint

indentation to use in the json file.

Return type

None
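
The edit step between sections_to_json_file and create_from_json_file can also be scripted. The sketch below uses only the standard library and assumes the saved file is a JSON list of section dicts; the keys "title" and "text" are hypothetical placeholders, not the documented section schema.

```python
import json
import os
import tempfile

# Hypothetical section content; the real section schema is not shown here.
sections = [{"title": "Overview", "text": "Placeholder text"}]

path = os.path.join(tempfile.mkdtemp(), "example.json")

# Write the file the way a 2-space-indented export would look.
with open(path, "w", encoding="utf-8") as handle:
    json.dump(sections, handle, indent=2)

# Load, edit in code instead of in an editor, and save again.
with open(path, encoding="utf-8") as handle:
    loaded = json.load(handle)

loaded[0]["title"] = "Edited overview"

with open(path, "w", encoding="utf-8") as handle:
    json.dump(loaded, handle, indent=2)

with open(path, encoding="utf-8") as handle:
    edited_sections = json.load(handle)
```

The edited file could then be passed to ComplianceDocTemplate.create_from_json_file(name=..., path=path).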

update(name=None, sections=None)

Update the name or sections of an existing doc template.

Note that default or non-existent templates cannot be updated.

Parameters
namestr, optional

the new name for the template

sectionslist of dicts

list of sections

Return type

None

delete()

Delete the compliance documentation template.

Return type

None

Confusion Chart

class datarobot.models.confusion_chart.ConfusionChart(source, data, source_model_id)

Confusion Chart data for model.

Notes

ClassMetrics is a dict containing the following:

  • class_name (string) name of the class

  • actual_count (int) number of times this class is seen in the validation data

  • predicted_count (int) number of times this class has been predicted for the validation data

  • f1 (float) F1 score

  • recall (float) recall score

  • precision (float) precision score

  • was_actual_percentages (list of dict) one vs all actual percentages in format specified below.
    • other_class_name (string) the name of the other class

    • percentage (float) the percentage of the times this class was predicted when it was actually class (from 0 to 1)

  • was_predicted_percentages (list of dict) one vs all predicted percentages in format specified below.
    • other_class_name (string) the name of the other class

    • percentage (float) the percentage of the times this class was actual predicted (from 0 to 1)

  • confusion_matrix_one_vs_all (list of list) 2d list representing 2x2 one vs all matrix.
    • This represents the True/False Negative/Positive counts as integers for each class. The data structure looks like:

    • [ [ True Negative, False Positive ], [ False Negative, True Positive ] ]

Attributes
sourcestr

Confusion Chart data source. Can be ‘validation’, ‘crossValidation’ or ‘holdout’.

raw_datadict

All of the raw data for the Confusion Chart

confusion_matrixlist of list

The N x N confusion matrix

classeslist

The names of each of the classes

class_metricslist of dicts

List of dicts with schema described as ClassMetrics above.

source_model_idstr

ID of the model this Confusion chart represents; in some cases, insights from the parent of a frozen model may be used
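
The relationship between the N x N confusion_matrix and the per-class confusion_matrix_one_vs_all can be illustrated with a short sketch. It is not the library's implementation: one_vs_all and class_scores are hypothetical helpers, and the code assumes that rows are actual classes and columns are predicted classes.

```python
from typing import Dict, List


def one_vs_all(confusion_matrix: List[List[int]], index: int) -> List[List[int]]:
    """Collapse an N x N matrix into [[TN, FP], [FN, TP]] for one class.

    Assumption: confusion_matrix[actual][predicted] holds the counts.
    """
    total = sum(sum(row) for row in confusion_matrix)
    true_pos = confusion_matrix[index][index]
    actual_count = sum(confusion_matrix[index])                    # row sum
    predicted_count = sum(row[index] for row in confusion_matrix)  # column sum
    false_neg = actual_count - true_pos
    false_pos = predicted_count - true_pos
    true_neg = total - true_pos - false_neg - false_pos
    return [[true_neg, false_pos], [false_neg, true_pos]]


def class_scores(matrix_2x2: List[List[int]]) -> Dict[str, float]:
    """Compute precision, recall and F1 from a one-vs-all matrix."""
    (_, false_pos), (false_neg, true_pos) = matrix_2x2
    precision = true_pos / (true_pos + false_pos) if true_pos + false_pos else 0.0
    recall = true_pos / (true_pos + false_neg) if true_pos + false_neg else 0.0
    denominator = precision + recall
    f1 = 2 * precision * recall / denominator if denominator else 0.0
    return {"precision": precision, "recall": recall, "f1": f1}


matrix = [
    [5, 1, 0],
    [2, 3, 1],
    [0, 2, 6],
]
first_class = one_vs_all(matrix, 0)   # [[12, 2], [1, 5]]
first_scores = class_scores(first_class)
```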

Credentials

class datarobot.models.Credential(credential_id=None, name=None, credential_type=None, creation_date=None, description=None)
classmethod list()

Returns list of available credentials.

Returns
credentialslist of Credential instances

contains a list of available credentials.

Examples

>>> import datarobot as dr
>>> credentials = dr.Credential.list()
>>> credentials
[
    Credential('5e429d6ecf8a5f36c5693e03', 'my_s3_cred', 's3'),
    Credential('5e42cc4dcf8a5f3256865840', 'my_jdbc_cred', 'jdbc'),
]
Return type

List[Credential]

classmethod get(credential_id)

Gets the Credential.

Parameters
credential_idstr

the identifier of the credential.

Returns
credentialCredential

the requested credential.

Examples

>>> import datarobot as dr
>>> cred = dr.Credential.get('5e429d6ecf8a5f36c5693e03')
>>> cred
Credential('5e429d6ecf8a5f36c5693e03', 'my_s3_cred', 's3')
Return type

Credential

delete()

Deletes the Credential from the store.

Examples

>>> import datarobot as dr
>>> cred = dr.Credential.get('5a8ac9ab07a57a0001be501f')
>>> cred.delete()
Return type

None

classmethod create_basic(name, user, password, description=None)

Creates the credentials.

Parameters
namestr

the name to use for this set of credentials.

userstr

the username to store for this set of credentials.

passwordstr

the password to store for this set of credentials.

descriptionstr, optional

the description to use for this set of credentials.

Returns
credentialCredential

the created credential.

Examples

>>> import datarobot as dr
>>> cred = dr.Credential.create_basic(
...     name='my_basic_cred',
...     user='username',
...     password='password',
... )
>>> cred
Credential('5e429d6ecf8a5f36c5693e03', 'my_basic_cred', 'basic'),
Return type

Credential

classmethod create_oauth(name, token, refresh_token, description=None)

Creates the OAUTH credentials.

Parameters
namestr

the name to use for this set of credentials.

token: str

the OAUTH token

refresh_token: str

the OAUTH refresh token

descriptionstr, optional

the description to use for this set of credentials.

Returns
credentialCredential

the created credential.

Examples

>>> import datarobot as dr
>>> cred = dr.Credential.create_oauth(
...     name='my_oauth_cred',
...     token='XXX',
...     refresh_token='YYY',
... )
>>> cred
Credential('5e429d6ecf8a5f36c5693e03', 'my_oauth_cred', 'oauth'),
Return type

Credential

classmethod create_s3(name, aws_access_key_id=None, aws_secret_access_key=None, aws_session_token=None, config_id=None, description=None)

Creates the S3 credentials.

Parameters
namestr

the name to use for this set of credentials.

aws_access_key_idstr, optional

the AWS access key id.

aws_secret_access_keystr, optional

the AWS secret access key.

aws_session_tokenstr, optional

the AWS session token.

config_id: str, optional

The ID of the saved shared secure configuration. If specified, aws_access_key_id, aws_secret_access_key and aws_session_token cannot be included.

descriptionstr, optional

the description to use for this set of credentials.

Returns
credentialCredential

the created credential.

Examples

>>> import datarobot as dr
>>> cred = dr.Credential.create_s3(
...     name='my_s3_cred',
...     aws_access_key_id='XXX',
...     aws_secret_access_key='YYY',
...     aws_session_token='ZZZ',
... )
>>> cred
Credential('5e429d6ecf8a5f36c5693e03', 'my_s3_cred', 's3'),
Return type

Credential
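
Because config_id cannot be combined with the explicit AWS key parameters, it can help to check the arguments before making the request. The following is a minimal sketch under that reading of the parameter description; build_s3_credential_kwargs is a hypothetical helper, not part of the client, and the server may apply further validation.

```python
from typing import Any, Dict, Optional


def build_s3_credential_kwargs(
    name: str,
    aws_access_key_id: Optional[str] = None,
    aws_secret_access_key: Optional[str] = None,
    aws_session_token: Optional[str] = None,
    config_id: Optional[str] = None,
    description: Optional[str] = None,
) -> Dict[str, Any]:
    """Return keyword arguments for create_s3, rejecting conflicting input."""
    key_fields = {
        "aws_access_key_id": aws_access_key_id,
        "aws_secret_access_key": aws_secret_access_key,
        "aws_session_token": aws_session_token,
    }
    provided = {key: value for key, value in key_fields.items() if value is not None}
    if config_id is not None and provided:
        raise ValueError(
            "config_id cannot be combined with: " + ", ".join(sorted(provided))
        )
    kwargs: Dict[str, Any] = {"name": name, **provided}
    if config_id is not None:
        kwargs["config_id"] = config_id
    if description is not None:
        kwargs["description"] = description
    return kwargs


explicit_keys = build_s3_credential_kwargs(
    name="my_s3_cred", aws_access_key_id="XXX", aws_secret_access_key="YYY"
)
shared_config = build_s3_credential_kwargs(name="my_s3_cred", config_id="ZZZ")
```

The resulting dict could then be passed on as dr.Credential.create_s3(**explicit_keys).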

classmethod create_azure(name, azure_connection_string, description=None)

Creates the Azure storage credentials.

Parameters
namestr

the name to use for this set of credentials.

azure_connection_stringstr

the Azure connection string.

descriptionstr, optional

the description to use for this set of credentials.

Returns
credentialCredential

the created credential.

Examples

>>> import datarobot as dr
>>> cred = dr.Credential.create_azure(
...     name='my_azure_cred',
...     azure_connection_string='XXX',
... )
>>> cred
Credential('5e429d6ecf8a5f36c5693e03', 'my_azure_cred', 'azure'),
Return type

Credential

classmethod create_snowflake_key_pair(name, user=None, private_key=None, passphrase=None, config_id=None, description=None)

Creates the Snowflake Key Pair credentials.

Parameters
namestr

the name to use for this set of credentials.

user: str, optional

the Snowflake login name

private_key: str, optional

the private key, copied exactly from the user's private key file. Because the key spans multiple lines, enclose the key string in triple quotes when assigning it to a variable.

passphrase: str, optional

the string used to encrypt the private key

config_id: str, optional

The ID of the saved shared secure configuration. If specified, user, private_key and passphrase cannot be included.

descriptionstr, optional

the description to use for this set of credentials.

Returns
credentialCredential

the created credential.

Examples

>>> import datarobot as dr
>>> cred = dr.Credential.create_snowflake_key_pair(
...     name='key_pair_cred',
...     user='XXX',
...     private_key='YYY',
...     passphrase='ZZZ',
... )
>>> cred
Credential('5e429d6ecf8a5f36c5693e03', 'key_pair_cred', 'snowflake_key_pair_user_account'),
Return type

Credential
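
The private_key description asks for a multi-line string. The sketch below shows two ways to obtain one: a triple-quoted literal and a read from a file. The key body is a placeholder, read_private_key is a hypothetical helper, and datarobot is imported lazily inside the function so that the snippet can be loaded without a configured client.

```python
# A triple-quoted literal keeps the line breaks of the key file intact.
# The middle line is a placeholder, not real key material.
PRIVATE_KEY = """-----BEGIN PRIVATE KEY-----
<base64-encoded key material>
-----END PRIVATE KEY-----"""


def read_private_key(path: str) -> str:
    """Hypothetical helper: load the key text exactly as stored on disk."""
    with open(path, encoding="utf-8") as handle:
        return handle.read()


def create_key_pair_credential(private_key: str, passphrase: str = None):
    """Call the documented classmethod; requires a configured client."""
    import datarobot as dr  # imported here so the module loads without it

    return dr.Credential.create_snowflake_key_pair(
        name="key_pair_cred",
        user="XXX",
        private_key=private_key,
        passphrase=passphrase,
    )
```

Reading the key from a file avoids pasting it into source code; either form yields the same string.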

classmethod create_databricks_access_token(name, databricks_access_token, description=None)

Creates the Databricks access token credentials.

Parameters
namestr

the name to use for this set of credentials.

databricks_access_token: str

the Databricks personal access token

descriptionstr, optional

the description to use for this set of credentials.

Returns
credentialCredential

the created credential.

Examples

>>> import datarobot as dr
>>> cred = dr.Credential.create_databricks_access_token(
...     name='access_token_cred',
...     databricks_access_token='XXX',
... )
>>> cred
Credential('5e429d6ecf8a5f36c5693e03', 'access_token_cred', 'databricks_access_token_account'),
Return type

Credential

classmethod create_databricks_service_principal(name, client_id=None, client_secret=None, config_id=None, description=None)

Creates the Databricks service principal credentials.

Parameters
namestr

the name to use for this set of credentials.

client_id: str, optional

the client ID for Databricks Service Principal

client_secret: str, optional

the client secret for Databricks Service Principal

config_id: str, optional

The ID of the saved shared secure configuration. If specified, client_id and client_secret cannot be included.

descriptionstr, optional

the description to use for this set of credentials.

Returns
credentialCredential

the created credential.

Examples

>>> import datarobot as dr
>>> cred = dr.Credential.create_databricks_service_principal(
...     name='svc_principal_cred',
...     client_id='XXX',
...     client_secret='XXX',
... )
>>> cred
Credential('5e429d6ecf8a5f36c5693e03', 'svc_principal_cred', 'databricks_service_principal_account'),
Return type

Credential

classmethod create_gcp(name, gcp_key=None, description=None)

Creates the GCP credentials.

Parameters
namestr

the name to use for this set of credentials.

gcp_keystr | dict

the GCP key in json format or parsed as dict.

descriptionstr, optional

the description to use for this set of credentials.

Returns
credentialCredential

the created credential.

Examples

>>> import datarobot as dr
>>> cred = dr.Credential.create_gcp(
...     name='my_gcp_cred',
...     gcp_key='XXX',
... )
>>> cred
Credential('5e429d6ecf8a5f36c5693e03', 'my_gcp_cred', 'gcp'),
Return type

Credential
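
Since gcp_key accepts either a JSON string or a parsed dict, a key can be checked locally before it is sent. This is a minimal sketch of such a check; normalize_gcp_key is a hypothetical helper rather than client behavior, and the key contents shown are placeholders.

```python
import json
from typing import Any, Dict, Union


def normalize_gcp_key(gcp_key: Union[str, Dict[str, Any]]) -> Dict[str, Any]:
    """Return the key as a dict, parsing it first if it is a JSON string."""
    if isinstance(gcp_key, str):
        parsed = json.loads(gcp_key)  # raises json.JSONDecodeError on bad input
    elif isinstance(gcp_key, dict):
        parsed = dict(gcp_key)  # shallow copy so the caller's dict is untouched
    else:
        raise TypeError("gcp_key must be a JSON string or a dict")
    if not isinstance(parsed, dict):
        raise ValueError("gcp_key JSON must describe an object")
    return parsed


# Placeholder values only.
key_as_text = '{"type": "service_account", "project_id": "my-project"}'
key_as_dict = {"type": "service_account", "project_id": "my-project"}

from_text = normalize_gcp_key(key_as_text)
from_dict = normalize_gcp_key(key_as_dict)
```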

update(name=None, description=None, **kwargs)

Update the credential values of an existing credential. Updates this object in place.

New in version v3.2.

Parameters
namestr

The name to use for this set of credentials.

descriptionstr, optional

The description to use for this set of credentials; if omitted, and name is not omitted, then it clears any previous description for that name.

kwargs

Keyword arguments specific to the given credential_type that should be updated.

Return type

None

Prediction Environment

class datarobot.models.PredictionEnvironment(id, name, platform, description=None, permissions=None, is_deleted=None, supported_model_formats=None, import_meta=None, management_meta=None, health=None, is_managed_by_management_agent=None, plugin=None, datastore_id=None, credential_id=None)

A prediction environment entity.

New in version v3.3.0.

Attributes
id: str

The ID of the prediction environment.

name: str

The name of the prediction environment.

description: str, optional

The description of the prediction environment.

platform: str, optional

Indicates which platform is in use (AWS, GCP, DataRobot, etc.).

permissions: list, optional

A set of permissions for the prediction environment.

is_deleted: boolean, optional

The flag that shows whether this prediction environment is deleted.

supported_model_formats: list[PredictionEnvironmentModelFormats], optional

The list of supported model formats.

is_managed_by_management_agentboolean, optional

Determines if the prediction environment should be managed by the management agent. False by default.

datastore_idstr, optional

The ID of the data store connection configuration. Only applicable for external prediction environments managed by DataRobot.

credential_idstr, optional

The ID of the credential associated with the data connection. Only applicable for external prediction environments managed by DataRobot.

classmethod list()

Returns list of available external prediction environments.

Returns
prediction_environmentslist of PredictionEnvironment instances

contains a list of available prediction environments.

Examples

>>> import datarobot as dr
>>> prediction_environments = dr.PredictionEnvironment.list()
>>> prediction_environments
[
    PredictionEnvironment('5e429d6ecf8a5f36c5693e03', 'demo_pe', 'aws', 'env for demo testing'),
    PredictionEnvironment('5e42cc4dcf8a5f3256865840', 'azure_pe', 'azure', 'env for azure demo testing'),
]
Return type

List[PredictionEnvironment]

classmethod get(pe_id)

Gets the PredictionEnvironment by id.

Parameters
pe_idstr

the identifier of the PredictionEnvironment.

Returns
prediction_environmentPredictionEnvironment

the requested prediction environment object.

Examples

>>> import datarobot as dr
>>> pe = dr.PredictionEnvironment.get('5a8ac9ab07a57a1231be501f')
>>> pe
PredictionEnvironment('5a8ac9ab07a57a1231be501f', 'my_predict_env', 'aws', 'demo env'),
Return type

PredictionEnvironment

delete()

Deletes the prediction environment.

Examples

>>> import datarobot as dr
>>> pe = dr.PredictionEnvironment.get('5a8ac9ab07a57a1231be501f')
>>> pe.delete()
Return type

None

classmethod create(name, platform, description=None, plugin=None, supported_model_formats=None, is_managed_by_management_agent=False, datastore=None, credential=None)

Create a prediction environment.

Parameters
namestr

The name of the prediction environment.

descriptionstr, optional

The description of the prediction environment.

platformstr

Indicates which platform is in use (AWS, GCP, DataRobot, etc.).

pluginstr

Optional. The plugin name to use.

supported_model_formatslist[PredictionEnvironmentModelFormats], optional

The list of supported model formats. When not provided, the default value is inferred based on platform, (DataRobot platform: DataRobot, Custom Models; All other platforms: DataRobot, Custom Models, External Models).

is_managed_by_management_agentboolean, optional

Determines if this prediction environment should be managed by the management agent. default: False

datastoreDataStore|str, optional

The datastore object or ID of the data store connection configuration. Only applicable for external Prediction Environments managed by DataRobot.

credentialCredential|str, optional

The credential object or ID of the credential associated with the data connection. Only applicable for external Prediction Environments managed by DataRobot.

Returns
prediction_environmentPredictionEnvironment

the prediction environment that was created

Raises
datarobot.errors.ClientError

If the server responded with 4xx status.

datarobot.errors.ServerError

If the server responded with 5xx status.

Examples

>>> import datarobot as dr
>>> from datarobot.enums import PredictionEnvironmentPlatform
>>> pe = dr.PredictionEnvironment.create(
...     name='my_predict_env',
...     platform=PredictionEnvironmentPlatform.AWS,
...     description='demo prediction env',
... )
>>> pe
PredictionEnvironment('5e429d6ecf8a5f36c5693e99', 'my_predict_env', 'aws', 'demo prediction env'),
Return type

PredictionEnvironment
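
The default described for supported_model_formats can be written out as a small rule. The sketch below only restates that description: default_supported_model_formats is a hypothetical helper, the format labels are copied from the prose rather than from PredictionEnvironmentModelFormats, and the platform string "datarobot" is an assumed spelling.

```python
from typing import List


def default_supported_model_formats(platform: str) -> List[str]:
    """Restate the documented default: external models everywhere but DataRobot."""
    formats = ["DataRobot", "Custom Models"]
    # Assumption: the DataRobot platform is identified by the string "datarobot".
    if platform.lower() != "datarobot":
        formats.append("External Models")
    return formats


datarobot_defaults = default_supported_model_formats("datarobot")
aws_defaults = default_supported_model_formats("aws")
```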

Champion Model Package

class datarobot.models.deployment.champion_model_package.ChampionModelPackage(id, registered_model_id, registered_model_version, name, model_id, model_execution_type, is_archived, import_meta, source_meta, model_kind, target, model_description, datasets, timeseries, is_deprecated, bias_and_fairness=None, build_status=None, user_provided_id=None, updated_at=None, updated_by=None, tags=None, mlpkg_file_contents=None)

Represents a champion model package.

Parameters
idstr

The ID of the registered model version.

registered_model_idstr

The ID of the parent registered model.

registered_model_versionint

The version of the registered model.

namestr

The name of the registered model version.

model_idstr

The ID of the model.

model_execution_typestr

The type of model package (version). dedicated (native DataRobot models) and custom_inference_model (user added inference models) both execute on DataRobot prediction servers, while external does not.

is_archivedbool

Whether the model package (version) is permanently archived (cannot be used in deployment or replacement).

import_metaImportMeta

Information from when this model package (version) was first saved.

source_metaSourceMeta

Meta information from where the model was generated.

model_kindModelKind

Model attribute information.

targetTarget

Target information for the registered model version.

model_descriptionModelDescription

Model description information.

datasetsDataset

Dataset information for the registered model version.

timeseriesTimeseries

Time series information for the registered model version.

bias_and_fairnessBiasAndFairness

Bias and fairness information for the registered model version.

is_deprecatedbool

Whether the model package (version) is deprecated (cannot be used in deployment or replacement).

build_statusstr or None

Model package (version) build status. One of complete, inProgress, failed.

user_provided_idstr or None

User provided ID for the registered model version.

updated_atstr or None

The time the registered model version was last updated.

updated_byUserMetadata or None

The user who last updated the registered model version.

tagsList[TagWithId] or None

The tags associated with the registered model version.

mlpkg_file_contentsstr or None

The contents of the model package file.

Custom Metrics

class datarobot.models.deployment.custom_metrics.CustomMetric(id, name, units, baseline_values, is_model_specific, type, directionality, time_step='hour', description=None, association_id=None, value=None, sample_count=None, timestamp=None, batch=None, deployment_id=None)

A DataRobot custom metric.

New in version v3.4.

Attributes
id: str

The ID of the custom metric.

deployment_id: str

The ID of the deployment.

name: str

The name of the custom metric.

units: str

The units, or the y-axis label, of the given custom metric.

baseline_values: BaselinesValues

The baseline value used to add “reference dots” to the values over time chart.

is_model_specific: bool

Determines whether the metric is related to the model or deployment.

type: CustomMetricAggregationType

The aggregation type of the custom metric.

directionality: CustomMetricDirectionality

The directionality of the custom metric.

time_step: CustomMetricBucketTimeStep

Custom metric time bucket size.

description: str

A description of the custom metric.

association_id: DatasetColumn

A custom metric association_id column source when reading values from columnar dataset.

timestamp: DatasetColumn

A custom metric timestamp column source when reading values from columnar dataset.

value: DatasetColumn

A custom metric value source when reading values from columnar dataset.

sample_count: DatasetColumn

A custom metric sample source when reading values from columnar dataset.

batch: str

A custom metric batch ID source when reading values from columnar dataset.

classmethod create(name, deployment_id, units, is_model_specific, aggregation_type, directionality, time_step='hour', description=None, baseline_value=None, value_column_name=None, sample_count_column_name=None, timestamp_column_name=None, timestamp_format=None, batch_column_name=None)

Create a custom metric for a deployment

Parameters
name: str

The name of the custom metric.

deployment_id: str

The id of the deployment.

units: str

The units, or the y-axis label, of the given custom metric.

baseline_value: float

The baseline value used to add “reference dots” to the values over time chart.

is_model_specific: bool

Determines whether the metric is related to the model or deployment.

aggregation_type: CustomMetricAggregationType

The aggregation type of the custom metric.

directionality: CustomMetricDirectionality

The directionality of the custom metric.

time_step: CustomMetricBucketTimeStep

Custom metric time bucket size.

description: Optional[str]

A description of the custom metric.

value_column_name: Optional[str]

A custom metric value column name when reading values from columnar dataset.

sample_count_column_name: Optional[str]

Points to a weight column name if users provide pre-aggregated metric values from columnar dataset.

timestamp_column_name: Optional[str]

A custom metric timestamp column name when reading values from columnar dataset.

timestamp_format: Optional[str]

A custom metric timestamp format when reading values from columnar dataset.

batch_column_name: Optional[str]

A custom metric batch ID column name when reading values from columnar dataset.

Returns
CustomMetric

The custom metric object.

Examples

from datarobot.models.deployment import CustomMetric
from datarobot.enums import CustomMetricAggregationType, CustomMetricDirectionality

custom_metric = CustomMetric.create(
    deployment_id="5c939e08962d741e34f609f0",
    name="Sample metric",
    units="Y",
    baseline_value=12,
    is_model_specific=True,
    aggregation_type=CustomMetricAggregationType.AVERAGE,
    directionality=CustomMetricDirectionality.HIGHER_IS_BETTER
)
Return type

CustomMetric

classmethod get(deployment_id, custom_metric_id)

Get a custom metric for a deployment

Parameters
deployment_id: str

The ID of the deployment.

custom_metric_id: str

The ID of the custom metric.

Returns
CustomMetric

The custom metric object.

Examples

from datarobot.models.deployment import CustomMetric

custom_metric = CustomMetric.get(
    deployment_id="5c939e08962d741e34f609f0",
    custom_metric_id="65f17bdcd2d66683cdfc1113"
)

custom_metric.id
>>>'65f17bdcd2d66683cdfc1113'
Return type

CustomMetric

classmethod list(deployment_id)

List all custom metrics for a deployment

Parameters
deployment_id: str

The ID of the deployment.

Returns
custom_metrics: list

A list of custom metrics objects.

Examples

from datarobot.models.deployment import CustomMetric

custom_metrics = CustomMetric.list(deployment_id="5c939e08962d741e34f609f0")
custom_metrics[0].id
>>>'65f17bdcd2d66683cdfc1113'
Return type

List[CustomMetric]

classmethod delete(deployment_id, custom_metric_id)

Delete a custom metric associated with a deployment.

Parameters
deployment_id: str

The ID of the deployment.

custom_metric_id: str

The ID of the custom metric.

Returns
None

Examples

from datarobot.models.deployment import CustomMetric

CustomMetric.delete(
    deployment_id="5c939e08962d741e34f609f0",
    custom_metric_id="65f17bdcd2d66683cdfc1113"
)
Return type

None

update(name=None, units=None, aggregation_type=None, directionality=None, time_step=None, description=None, baseline_value=None, value_column_name=None, sample_count_column_name=None, timestamp_column_name=None, timestamp_format=None, batch_column_name=None)

Update metadata of a custom metric

Parameters
name: Optional[str]

The name of the custom metric.

units: Optional[str]

The units, or the y-axis label, of the given custom metric.

baseline_value: Optional[float]

The baseline value used to add “reference dots” to the values over time chart.

aggregation_type: Optional[CustomMetricAggregationType]

The aggregation type of the custom metric.

directionality: Optional[CustomMetricDirectionality]

The directionality of the custom metric.

time_step: Optional[CustomMetricBucketTimeStep]

Custom metric time bucket size.

description: Optional[str]

A description of the custom metric.

value_column_name: Optional[str]

A custom metric value column name when reading values from columnar dataset.

sample_count_column_name: Optional[str]

Points to a weight column name if users provide pre-aggregated metric values from columnar dataset.

timestamp_column_name: Optional[str]

A custom metric timestamp column name when reading values from columnar dataset.

timestamp_format: Optional[str]

A custom metric timestamp format when reading values from columnar dataset.

batch_column_name: Optional[str]

A custom metric batch ID column name when reading values from columnar dataset.

Returns
CustomMetric

The custom metric object.

Examples

from datarobot.models.deployment import CustomMetric
from datarobot.enums import CustomMetricAggregationType, CustomMetricDirectionality

custom_metric = CustomMetric.get(
    deployment_id="5c939e08962d741e34f609f0",
    custom_metric_id="65f17bdcd2d66683cdfc1113"
)
# update() accepts only the fields listed in its signature; deployment_id
# and is_model_specific are not among them.
custom_metric = custom_metric.update(
    name="Sample metric",
    units="Y",
    baseline_value=12,
    aggregation_type=CustomMetricAggregationType.AVERAGE,
    directionality=CustomMetricDirectionality.HIGHER_IS_BETTER
)
Return type

CustomMetric

unset_baseline()

Unset the baseline value of a custom metric

Returns
None

Examples

from datarobot.models.deployment import CustomMetric

custom_metric = CustomMetric.get(
    deployment_id="5c939e08962d741e34f609f0",
    custom_metric_id="65f17bdcd2d66683cdfc1113"
)
custom_metric.baseline_values
>>> [{'value': 12.0}]
custom_metric.unset_baseline()
custom_metric.baseline_values
>>> []
Return type

None

submit_values(data, model_id=None, model_package_id=None, dry_run=False, segments=None)

Submit aggregated custom metrics values from JSON.

Parameters
data: pd.DataFrame or List[CustomMetricBucket]

The data containing aggregated custom metric values.

model_id: Optional[str]

For a model metric: the ID of the associated champion/challenger model, used to update the metric values. For a deployment metric: the ID of the model is not needed.

model_package_id: Optional[str]

For a model metric: the ID of the associated champion/challenger model, used to update the metric values. For a deployment metric: the ID of the model package is not needed.

dry_run: Optional[bool]

Specifies whether or not metric data is submitted in production mode (where data is saved).

segments: Optional[CustomMetricSegmentFromJSON]

A list of segments for a custom metric used in segmented analysis.

Returns
None

Examples

from datarobot.models.deployment import CustomMetric

custom_metric = CustomMetric.get(
    deployment_id="5c939e08962d741e34f609f0",
    custom_metric_id="65f17bdcd2d66683cdfc1113"
)

# data for values over time
data = [{
    'value': 12,
    'sample_size': 3,
    'timestamp': '2024-03-15T14:00:00'
}]

# data with association ID
data = [{
    'value': 12,
    'sample_size': 3,
    'timestamp': '2024-03-15T14:00:00',
    'association_id': '65f44d04dbe192b552e752ed'
}]

# data for batches
data = [{
    'value': 12,
    'sample_size': 3,
    'batch': '65f44c93fedc5de16b673a0d'
}]

# for deployment specific metrics
custom_metric.submit_values(data=data)

# for model specific metrics pass model_package_id or model_id
custom_metric.submit_values(data=data, model_package_id="6421df32525c58cc6f991f25")

# dry run
custom_metric.submit_values(data=data, model_package_id="6421df32525c58cc6f991f25", dry_run=True)

# for segmented analysis
segments = [{"name": "custom_seg", "value": "val_1"}]
custom_metric.submit_values(data=data, model_package_id="6421df32525c58cc6f991f25", segments=segments)
Return type

None
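
The bucket dicts passed to submit_values hold pre-aggregated values. The sketch below shows one way raw observations might be grouped into hourly buckets with the value, sample_size and timestamp keys used in the example above. to_hourly_buckets is a hypothetical helper; averaging is an assumption that only suits an average-type metric, and hourly grouping is chosen because time_step defaults to 'hour'.

```python
from collections import defaultdict
from datetime import datetime
from typing import Dict, Iterable, List, Tuple


def to_hourly_buckets(observations: Iterable[Tuple[datetime, float]]) -> List[Dict]:
    """Group (timestamp, value) pairs by hour and average each group."""
    grouped = defaultdict(list)
    for moment, value in observations:
        hour_start = moment.replace(minute=0, second=0, microsecond=0)
        grouped[hour_start].append(value)
    return [
        {
            "value": sum(values) / len(values),  # assumption: mean aggregation
            "sample_size": len(values),
            "timestamp": hour_start.isoformat(),
        }
        for hour_start, values in sorted(grouped.items())
    ]


observations = [
    (datetime(2024, 3, 15, 14, 5), 10.0),
    (datetime(2024, 3, 15, 14, 40), 14.0),
    (datetime(2024, 3, 15, 15, 1), 7.0),
]
buckets = to_hourly_buckets(observations)
```

The resulting list could then be passed as custom_metric.submit_values(data=buckets).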

submit_single_value(value, model_id=None, model_package_id=None, dry_run=False, segments=None)

Submit a single custom metric value at the current moment.

Parameters
value: float

Single numeric custom metric value.

model_id: Optional[str]

For a model metric: the ID of the associated champion/challenger model, used to update the metric values. For a deployment metric: the ID of the model is not needed.

model_package_id: Optional[str]

For a model metric: the ID of the associated champion/challenger model, used to update the metric values. For a deployment metric: the ID of the model package is not needed.

dry_run: Optional[bool]

Specifies whether or not metric data is submitted in production mode (where data is saved).

segments: Optional[CustomMetricSegmentFromJSON]

A list of segments for a custom metric used in segmented analysis.

Returns
None

Examples

from datarobot.models.deployment import CustomMetric

custom_metric = CustomMetric.get(
    deployment_id="5c939e08962d741e34f609f0",
    custom_metric_id="65f17bdcd2d66683cdfc1113"
)

# for deployment specific metrics
custom_metric.submit_single_value(value=121)

# for model specific metrics pass model_package_id or model_id
custom_metric.submit_single_value(value=121, model_package_id="6421df32525c58cc6f991f25")

# dry run
custom_metric.submit_single_value(value=121, model_package_id="6421df32525c58cc6f991f25", dry_run=True)

# for segmented analysis
segments = [{"name": "custom_seg", "value": "val_1"}]
custom_metric.submit_single_value(value=121, model_package_id="6421df32525c58cc6f991f25", segments=segments)
Return type

None

submit_values_from_catalog(dataset_id, model_id=None, model_package_id=None, batch_id=None, segments=None)

Submit aggregated custom metrics values from dataset (AI catalog). The names of the columns in the dataset should correspond to the names of the columns that were defined in the custom metric. In addition, the format of the timestamps should also be the same as defined in the metric.

Parameters
dataset_id: str

The ID of the source dataset.

model_id: Optional[str]

For a model metric: the ID of the associated champion/challenger model, used to update the metric values. For a deployment metric: the ID of the model is not needed.

model_package_id: Optional[str]

For a model metric: the ID of the associated champion/challenger model, used to update the metric values. For a deployment metric: the ID of the model package is not needed.

batch_id: Optional[str]

Specifies a batch ID associated with all values provided by this dataset, an alternative to providing batch IDs as a column within a dataset (at the record level).

segments: Optional[CustomMetricSegmentFromDataset]

A list of segments for a custom metric used in segmented analysis.

Returns
None

Examples

from datarobot.models.deployment import CustomMetric

custom_metric = CustomMetric.get(
    deployment_id="5c939e08962d741e34f609f0",
    custom_metric_id="65f17bdcd2d66683cdfc1113"
)

# for deployment specific metrics
custom_metric.submit_values_from_catalog(dataset_id="61093144cabd630828bca321")

# for model specific metrics pass model_package_id or model_id
custom_metric.submit_values_from_catalog(
    dataset_id="61093144cabd630828bca321",
    model_package_id="6421df32525c58cc6f991f25"
)

# for segmented analysis
segments = [{"name": "custom_seg", "column": "column_with_segment_values"}]
custom_metric.submit_values_from_catalog(
    dataset_id="61093144cabd630828bca321",
    model_package_id="6421df32525c58cc6f991f25",
    segments=segments
)
Return type

None
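
Because the dataset columns and timestamp format must match what was defined on the custom metric, it can be convenient to keep those definitions in one place when preparing the file. The following is a minimal sketch using only the standard library. The column names and the strftime-style TIMESTAMP_FORMAT are hypothetical values standing in for whatever was passed as value_column_name, timestamp_column_name, sample_count_column_name and timestamp_format; the format syntax the server expects is not stated here.

```python
import csv
import io
from datetime import datetime
from typing import Iterable, Tuple

# Hypothetical values; they should mirror the custom metric's definition.
VALUE_COLUMN = "metric_value"
TIMESTAMP_COLUMN = "ts"
SAMPLE_COUNT_COLUMN = "n"
TIMESTAMP_FORMAT = "%Y-%m-%dT%H:%M:%S"


def rows_to_csv(rows: Iterable[Tuple[datetime, float, int]]) -> str:
    """Render (timestamp, value, sample count) rows as CSV text."""
    buffer = io.StringIO()
    writer = csv.DictWriter(
        buffer,
        fieldnames=[TIMESTAMP_COLUMN, VALUE_COLUMN, SAMPLE_COUNT_COLUMN],
        lineterminator="\n",
    )
    writer.writeheader()
    for moment, value, count in rows:
        writer.writerow(
            {
                TIMESTAMP_COLUMN: moment.strftime(TIMESTAMP_FORMAT),
                VALUE_COLUMN: value,
                SAMPLE_COUNT_COLUMN: count,
            }
        )
    return buffer.getvalue()


csv_text = rows_to_csv([(datetime(2024, 3, 15, 14, 0), 12.0, 3)])
```

The CSV text would still need to be uploaded to the AI catalog to obtain a dataset_id.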

get_values_over_time(start, end, model_package_id=None, model_id=None, segment_attribute=None, segment_value=None, bucket_size='P7D')

Retrieve values of a single custom metric over a time period.

Parameters
start: datetime or str

Start of the time period.

end: datetime or str

End of the time period.

model_id: Optional[str]

The ID of the model.

model_package_id: Optional[str]

The ID of the model package.

bucket_size: Optional[str]

Time duration of a bucket, in ISO 8601 time duration format.

segment_attribute: Optional[str]

The name of the segment on which segment analysis is being performed.

segment_value: Optional[str]

The value of the segment_attribute to segment on.

Returns
custom_metric_over_time: CustomMetricValuesOverTime

The queried custom metric values over time information.

Examples

from datarobot.models.deployment import CustomMetric
from datetime import datetime, timedelta

now = datetime.now()
custom_metric = CustomMetric.get(
    deployment_id="5c939e08962d741e34f609f0",
    custom_metric_id="65f17bdcd2d66683cdfc1113"
)
values_over_time = custom_metric.get_values_over_time(start=now - timedelta(days=7), end=now)

values_over_time.bucket_values
>>> {datetime.datetime(2024, 3, 22, 14, 0, tzinfo=tzutc()): 1.0,
>>>  datetime.datetime(2024, 3, 22, 15, 0, tzinfo=tzutc()): 123.0}

values_over_time.bucket_sample_sizes
>>> {datetime.datetime(2024, 3, 22, 14, 0, tzinfo=tzutc()): 1,
>>>  datetime.datetime(2024, 3, 22, 15, 0, tzinfo=tzutc()): 1}}

values_over_time.get_buckets_as_dataframe()
>>>                        start                       end  value  sample_size
>>> 0  2024-03-21 16:00:00+00:00 2024-03-21 17:00:00+00:00    NaN          NaN
>>> 1  2024-03-21 17:00:00+00:00 2024-03-21 18:00:00+00:00    NaN          NaN
Return type

CustomMetricValuesOverTime
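
The start, end, and bucket_size arguments can be prepared with the standard library alone. The following is a minimal sketch, not part of the client: utc_window and iso8601_duration are hypothetical helper names, and the set of durations the server accepts for bucket_size is not stated here, so treat any value other than the default 'P7D' as an assumption to verify.

```python
from datetime import datetime, timedelta, timezone


def utc_window(days, now=None):
    """Return (start, end) covering the last `days` days, timezone-aware in UTC."""
    end = now if now is not None else datetime.now(timezone.utc)
    return end - timedelta(days=days), end


def iso8601_duration(days=0, hours=0):
    """Format a whole number of days and hours as an ISO 8601 duration string."""
    if days < 0 or hours < 0 or (days == 0 and hours == 0):
        raise ValueError("need a positive number of days and/or hours")
    text = "P"
    if days:
        text += f"{days}D"
    if hours:
        text += f"T{hours}H"
    return text


# A fixed "now" keeps the example reproducible.
start, end = utc_window(days=7, now=datetime(2024, 3, 22, 16, 0, tzinfo=timezone.utc))
print(start.isoformat(), end.isoformat())
print(iso8601_duration(days=7), iso8601_duration(hours=1))

# With a CustomMetric instance, these values could be passed as:
# custom_metric.get_values_over_time(start=start, end=end, bucket_size=iso8601_duration(hours=1))
```

Using timezone-aware datetimes avoids ambiguity about which local time zone a naive datetime.now() refers to.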

get_summary(start, end, model_package_id=None, model_id=None, segment_attribute=None, segment_value=None)

Retrieve the summary of a custom metric over a time period.

Parameters
start: datetime or str

Start of the time period.

end: datetime or str

End of the time period.

model_id: Optional[str]

The ID of the model.

model_package_id: Optional[str]

The ID of the model package.

segment_attribute: Optional[str]

The name of the segment on which segment analysis is being performed.

segment_value: Optional[str]

The value of the segment_attribute to segment on.

Returns
custom_metric_summary: CustomMetricSummary

The summary of the custom metric.

Examples

from datarobot.models.deployment import CustomMetric
from datetime import datetime, timedelta

now = datetime.now()
custom_metric = CustomMetric.get(
    deployment_id="5c939e08962d741e34f609f0",
    custom_metric_id="65f17bdcd2d66683cdfc1113"
)
summary = custom_metric.get_summary(start=now - timedelta(days=7), end=now)

print(summary)
>> "CustomMetricSummary(2024-03-21 15:52:13.392178+00:00 - 2024-03-22 15:52:13.392168+00:00:
{'id': '65fd9b1c0c1a840bc6751ce0', 'name': 'Test METRIC', 'value': 215.0, 'sample_count': 13,
'baseline_value': 12.0, 'percent_change': 24.02})"
Return type

CustomMetricSummary

get_values_over_batch(batch_ids=None, model_package_id=None, model_id=None, segment_attribute=None, segment_value=None)

Retrieve values of a single custom metric over batches.

Parameters
batch_ids: Optional[List[str]]

A list of batch IDs to retrieve data for.

model_id: Optional[str]

The ID of the model.

model_package_id: Optional[str]

The ID of the model package.

segment_attribute: Optional[str]

The name of the segment on which segment analysis is being performed.

segment_value: Optional[str]

The value of the segment_attribute to segment on.

Returns
custom_metric_over_batch: CustomMetricValuesOverBatch

The queried custom metric values over batch information.

Examples

from datarobot.models.deployment import CustomMetric

custom_metric = CustomMetric.get(
    deployment_id="5c939e08962d741e34f609f0",
    custom_metric_id="65f17bdcd2d66683cdfc1113"
)
# all batch metrics are model specific
values_over_batch = custom_metric.get_values_over_batch(model_package_id='6421df32525c58cc6f991f25')

values_over_batch.bucket_values
>>> {'6572db2c9f9d4ad3b9de33d0': 35.0, '6572db2c9f9d4ad3b9de44e1': 105.0}

values_over_batch.bucket_sample_sizes
>>> {'6572db2c9f9d4ad3b9de33d0': 6, '6572db2c9f9d4ad3b9de44e1': 8}

values_over_batch.get_buckets_as_dataframe()
>>>                    batch_id                     batch_name  value  sample_size
>>> 0  6572db2c9f9d4ad3b9de33d0  Batch 1 - 03/26/2024 13:04:46   35.0            6
>>> 1  6572db2c9f9d4ad3b9de44e1  Batch 2 - 03/26/2024 13:06:04  105.0            8
Return type

CustomMetricValuesOverBatch
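
The two dictionaries returned by bucket_values and bucket_sample_sizes share the same batch ID keys, so they can be combined without pandas. The sketch below computes a sample-size-weighted mean across batches using the numbers from the example above. It assumes each bucket value is a per-batch average, which this reference does not state; weighted_mean is a hypothetical helper, not a client method.

```python
def weighted_mean(bucket_values, bucket_sample_sizes):
    """Average bucket values across batches, weighting each by its sample size.

    Assumption: each value is a per-batch average. Batches with no value or
    no samples are skipped.
    """
    total = 0.0
    count = 0
    for batch_id, value in bucket_values.items():
        size = bucket_sample_sizes.get(batch_id)
        if value is None or not size:
            continue
        total += value * size
        count += size
    return total / count if count else None


# Values copied from the example output above.
values = {"6572db2c9f9d4ad3b9de33d0": 35.0, "6572db2c9f9d4ad3b9de44e1": 105.0}
sizes = {"6572db2c9f9d4ad3b9de33d0": 6, "6572db2c9f9d4ad3b9de44e1": 8}
print(weighted_mean(values, sizes))  # (35*6 + 105*8) / 14 = 75.0
```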

get_batch_summary(batch_ids=None, model_package_id=None, model_id=None, segment_attribute=None, segment_value=None)

Retrieve the summary of a custom metric over a batch.

Parameters
batch_ids: Optional[List[str]]

A list of batch IDs to retrieve data for.

model_id: Optional[str]

The ID of the model.

model_package_id: Optional[str]

The ID of the model package.

segment_attribute: Optional[str]

The name of the segment on which segment analysis is being performed.

segment_value: Optional[str]

The value of the segment_attribute to segment on.

Returns
custom_metric_summary: CustomMetricBatchSummary

The batch summary of the custom metric.

Examples

from datarobot.models.deployment import CustomMetric

custom_metric = CustomMetric.get(
    deployment_id="5c939e08962d741e34f609f0",
    custom_metric_id="65f17bdcd2d66683cdfc1113"
)
# all batch metrics are model specific
batch_summary = custom_metric.get_batch_summary(model_package_id='6421df32525c58cc6f991f25')

print(batch_summary)
>> CustomMetricBatchSummary({'id': '6605396413434b3a7b74342c', 'name': 'batch metric', 'value': 41.25,
'sample_count': 28, 'baseline_value': 123.0, 'percent_change': -66.46})
Return type

CustomMetricBatchSummary
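
The figures printed in the batch summary example above are consistent with percent_change being the relative difference between value and baseline_value, expressed as a percentage. This is an inference from that one sample output, not documented behavior. A minimal sketch of that arithmetic, with percent_change as a hypothetical local function:

```python
def percent_change(value, baseline_value):
    """Relative difference from the baseline, in percent, rounded to 2 places.

    Assumption: this mirrors how the printed percent_change relates to value
    and baseline_value in the sample output; the server may compute it differently.
    """
    if not baseline_value:
        return None  # undefined without a non-zero baseline
    return round((value - baseline_value) / baseline_value * 100, 2)


# value and baseline_value taken from the sample batch summary above.
print(percent_change(41.25, 123.0))  # -66.46
```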

class datarobot.models.deployment.custom_metrics.CustomMetricValuesOverTime(buckets=None, summary=None, metric=None, deployment_id=None, segment_attribute=None, segment_value=None)

Custom metric over time information.

New in version v3.4.

Attributes
buckets: List[Bucket]

A list of bucketed time periods and the custom metric values aggregated over that period.

summary: Summary

The summary of values over time retrieval.

metric: Dict

A custom metric definition.

deployment_id: str

The ID of the deployment.

segment_attribute: str

The name of the segment on which segment analysis is being performed.

segment_value: str

The value of the segment_attribute to segment on.

classmethod get(deployment_id, custom_metric_id, start, end, model_id=None, model_package_id=None, segment_attribute=None, segment_value=None, bucket_size='P7D')

Retrieve values of a single custom metric over a time period.

Parameters
custom_metric_id: str

The ID of the custom metric.

deployment_id: str

The ID of the deployment.

start: datetime or str

Start of the time period.

end: datetime or str

End of the time period.

model_id: Optional[str]

The ID of the model.

model_package_id: Optional[str]

The ID of the model package.

bucket_size: Optional[str]

Time duration of a bucket, in ISO 8601 time duration format.

segment_attribute: Optional[str]

The name of the segment on which segment analysis is being performed.

segment_value: Optional[str]

The value of the segment_attribute to segment on.

Returns
custom_metric_over_time: CustomMetricValuesOverTime

The queried custom metric values over time information.

Return type

CustomMetricValuesOverTime

property bucket_values: Dict[datetime, int]

The metric value for all time buckets, keyed by start time of the bucket.

Returns
bucket_values: Dict
Return type

Dict[datetime, int]

property bucket_sample_sizes: Dict[datetime, int]

The sample size for all time buckets, keyed by start time of the bucket.

Returns
bucket_sample_sizes: Dict
Return type

Dict[datetime, int]

get_buckets_as_dataframe()

Retrieves all custom metrics buckets in a pandas DataFrame.

Returns
buckets: pd.DataFrame
Return type

DataFrame

class datarobot.models.deployment.custom_metrics.CustomMetricSummary(period, metric, deployment_id=None)

The summary of a custom metric.

New in version v3.4.

Attributes
period: Period

A time period defined by a start and end time.

metric: Dict

The summary of the custom metric.

classmethod get(deployment_id, custom_metric_id, start, end, model_id=None, model_package_id=None, segment_attribute=None, segment_value=None)

Retrieve the summary of a custom metric over a time period.

Parameters
custom_metric_id: str

The ID of the custom metric.

deployment_id: str

The ID of the deployment.

start: datetime or str

Start of the time period.

end: datetime or str

End of the time period.

model_id: Optional[str]

The ID of the model.

model_package_id: Optional[str]

The ID of the model package.

segment_attribute: Optional[str]

The name of the segment on which segment analysis is being performed.

segment_value: Optional[str]

The value of the segment_attribute to segment on.

Returns
custom_metric_summary: CustomMetricSummary

The summary of the custom metric.

Return type

CustomMetricSummary

class datarobot.models.deployment.custom_metrics.CustomMetricValuesOverBatch(buckets=None, metric=None, deployment_id=None, segment_attribute=None, segment_value=None)

Custom metric over batch information.

New in version v3.4.

Attributes
buckets: List[BatchBucket]

A list of buckets with custom metric values aggregated over batches.

metric: Dict

A custom metric definition.

deployment_id: str

The ID of the deployment.

segment_attribute: str

The name of the segment on which segment analysis is being performed.

segment_value: str

The value of the segment_attribute to segment on.

classmethod get(deployment_id, custom_metric_id, batch_ids=None, model_id=None, model_package_id=None, segment_attribute=None, segment_value=None)

Retrieve values of a single custom metric over batches.

Parameters
custom_metric_id: str

The ID of the custom metric.

deployment_id: str

The ID of the deployment.

batch_ids: Optional[List[str]]

A list of batch IDs to retrieve data for.

model_id: Optional[str]

The ID of the model.

model_package_id: Optional[str]

The ID of the model package.

segment_attribute: Optional[str]

The name of the segment on which segment analysis is being performed.

segment_value: Optional[str]

The value of the segment_attribute to segment on.

Returns
custom_metric_over_batch: CustomMetricValuesOverBatch

The queried custom metric values over batch information.

Return type

CustomMetricValuesOverBatch

property bucket_values: Dict[str, int]

The metric value for all batch buckets, keyed by batch ID.

Returns
bucket_values: Dict
Return type

Dict[str, int]

property bucket_sample_sizes: Dict[str, int]

The sample size for all batch buckets, keyed by batch ID.

Returns
bucket_sample_sizes: Dict
Return type

Dict[str, int]

get_buckets_as_dataframe()

Retrieves all custom metrics buckets in a pandas DataFrame.

Returns
buckets: pd.DataFrame
Return type

DataFrame

class datarobot.models.deployment.custom_metrics.CustomMetricBatchSummary(metric, deployment_id=None)

The batch summary of a custom metric.

New in version v3.4.

Attributes
metric: Dict

The summary of the batch custom metric.

classmethod get(deployment_id, custom_metric_id, batch_ids=None, model_id=None, model_package_id=None, segment_attribute=None, segment_value=None)

Retrieve the summary of a custom metric over a batch.

Parameters
custom_metric_id: str

The ID of the custom metric.

deployment_id: str

The ID of the deployment.

batch_ids: Optional[List[str]]

A list of batch IDs to retrieve data for.

model_id: Optional[str]

The ID of the model.

model_package_id: Optional[str]

The ID of the model package.

segment_attribute: Optional[str]

The name of the segment on which segment analysis is being performed.

segment_value: Optional[str]

The value of the segment_attribute to segment on.

Returns
custom_metric_summary: CustomMetricBatchSummary

The batch summary of the custom metric.

Return type

CustomMetricBatchSummary

class datarobot.models.deployment.custom_metrics.HostedCustomMetricTemplate(id, name, description, custom_metric_metadata, default_environment, items, template_metric_type)

Template for hosted custom metric.

classmethod list(search=None, order_by=None, metric_type=None, offset=None, limit=None)

List all hosted custom metric templates.

Parameters
search: Optional[str]

Search string.

order_by: Optional[ListHostedCustomMetricTemplatesSortQueryParams]

Ordering field.

metric_type: Optional[HostedCustomMetricsTemplateMetricTypeQueryParams]

Type of the metric.

offset: Optional[int]

Offset for pagination.

limit: Optional[int]

Limit for pagination.

Returns
templates: List[HostedCustomMetricTemplate]
Return type

List[HostedCustomMetricTemplate]

classmethod get(template_id)

Get a hosted custom metric template by ID.

Parameters
template_id: str

ID of the template.

Returns
template: HostedCustomMetricTemplate
Return type

HostedCustomMetricTemplate

class datarobot.models.deployment.custom_metrics.HostedCustomMetric(id, deployment, units, type, is_model_specific, directionality, time_step, created_at, created_by, name, custom_job_id, description=None, schedule=None, baseline_values=None, timestamp=None, value=None, sample_count=None, batch=None, parameter_overrides=None)

Hosted custom metric.

classmethod list(job_id, skip=None, limit=None)

List all hosted custom metrics for a job.

Parameters
job_id: str

ID of the job.

Returns
metrics: List[HostedCustomMetric]
Return type

List[HostedCustomMetric]

classmethod create_from_template(template_id, deployment_id, job_name, custom_metric_name, job_description=None, custom_metric_description=None, sidecar_deployment_id=None, baseline_value=None, timestamp=None, value=None, sample_count=None, batch=None, schedule=None, parameter_overrides=None)

Create a hosted custom metric from a template. This is a shortcut for two calls: Job.from_custom_metric_template(template_id) followed by HostedCustomMetric.create_from_custom_job().

Parameters
template_id: str

ID of the template.

deployment_id: str

ID of the deployment.

job_name: str

Name of the job.

custom_metric_name: str

Name of the metric.

job_description: Optional[str]

Description of the job.

custom_metric_description: Optional[str]

Description of the metric.

sidecar_deployment_id: Optional[str]

ID of the sidecar deployment.

baseline_value: Optional[float]

Baseline value.

timestamp: Optional[MetricTimestampSpoofing]

Timestamp details.

value: Optional[ValueField]

Value details.

sample_count: Optional[SampleCountField]

Sample count details.

batch: Optional[BatchField]

Batch details.

schedule: Optional[Schedule]

Schedule details.

parameter_overrides: Optional[List[RuntimeParameterValue]]

Parameter overrides.

Returns
metric: HostedCustomMetric
Return type

HostedCustomMetric
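
Only template_id, deployment_id, job_name, and custom_metric_name are required; every other argument defaults to None. The sketch below passes just those four by keyword. To keep it runnable without a DataRobot connection, the class is injected as an argument and a hypothetical stand-in (RecordingMetric) records the call; in real use, HostedCustomMetric would be passed instead. The template ID and the job naming convention are placeholders.

```python
def create_metric_from_template(metric_cls, template_id, deployment_id, metric_name):
    """Create a hosted custom metric, passing only the required arguments."""
    return metric_cls.create_from_template(
        template_id=template_id,
        deployment_id=deployment_id,
        job_name=f"{metric_name} job",  # hypothetical naming convention
        custom_metric_name=metric_name,
    )


class RecordingMetric:
    """Hypothetical stand-in for HostedCustomMetric that records the call."""

    @classmethod
    def create_from_template(cls, **kwargs):
        return kwargs


call = create_metric_from_template(
    RecordingMetric,
    template_id="TEMPLATE_ID",  # placeholder, not a real ID
    deployment_id="5c939e08962d741e34f609f0",
    metric_name="Response length",
)
print(call)
```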

classmethod create_from_custom_job(custom_job_id, deployment_id, name, description=None, baseline_value=None, timestamp=None, value=None, sample_count=None, batch=None, schedule=None, parameter_overrides=None)

Create a hosted custom metric from existing custom job.

Parameters
custom_job_id: str

ID of the custom job.

deployment_id: str

ID of the deployment.

name: str

Name of the metric.

description: Optional[str]

Description of the metric.

baseline_value: Optional[float]

Baseline value.

timestamp: Optional[MetricTimestampSpoofing]

Timestamp details.

value: Optional[ValueField]

Value details.

sample_count: Optional[SampleCountField]

Sample count details.

batch: Optional[BatchField]

Batch details.

schedule: Optional[Schedule]

Schedule details.

parameter_overrides: Optional[List[RuntimeParameterValue]]

Parameter overrides.

Returns
metric: HostedCustomMetric
Return type

HostedCustomMetric

update(name=None, description=None, units=None, directionality=None, aggregation_type=None, baseline_value=None, timestamp=None, value=None, sample_count=None, batch=None, schedule=None, parameter_overrides=None)

Update the hosted custom metric.

Parameters
name: Optional[str]

Name of the metric.

description: Optional[str]

Description of the metric.

units: Optional[str]

Units of the metric.

directionality: Optional[str]

Directionality of the metric.

aggregation_type: Optional[CustomMetricAggregationType]

Aggregation type of the metric.

baseline_value: Optional[float]

Baseline value.

timestamp: Optional[MetricTimestampSpoofing]

Timestamp details.

value: Optional[ValueField]

Value details.

sample_count: Optional[SampleCountField]

Sample count details.

batch: Optional[BatchField]

Batch details.

schedule: Optional[Schedule]

Schedule details.

parameter_overrides: Optional[List[RuntimeParameterValue]]

Parameter overrides.

Returns
metric: HostedCustomMetric
Return type

HostedCustomMetric

delete()

Delete the hosted custom metric.

Return type

None

class datarobot.models.deployment.custom_metrics.DeploymentDetails(id, name, creator_first_name=None, creator_last_name=None, creator_username=None, creator_gravatar_hash=None, created_at=None)

Information about a hosted custom metric deployment.

class datarobot.models.deployment.custom_metrics.MetricBaselineValue(value)

The baseline values for a custom metric.

class datarobot.models.deployment.custom_metrics.SampleCountField(column_name)

A weight column used with columnar datasets if pre-aggregated metric values are provided.

class datarobot.models.deployment.custom_metrics.ValueField(column_name)

A custom metric value source for when reading values from a columnar dataset like a file.

class datarobot.models.deployment.custom_metrics.MetricTimestampSpoofing(column_name, time_format=None)

Custom metric timestamp spoofing. Occurs when reading values from a file, like a dataset. By default, replicates pd.to_datetime formatting behavior.

class datarobot.models.deployment.custom_metrics.BatchField(column_name)

A custom metric batch ID source for when reading values from a columnar dataset like a file.
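
The four field classes above each name one column of a columnar dataset. The sketch below illustrates that mapping with the standard library only: it reads a small CSV and picks out the timestamp, value, sample count, and batch columns by name. The column names, the sample rows, and the strptime-style time format are all assumptions made for illustration; the client itself is not called.

```python
import csv
import io
from datetime import datetime

# Hypothetical dataset contents.
CSV_TEXT = """timestamp,value,sample_count,batch
2024-03-22 14:00:00,1.0,1,batch-1
2024-03-22 15:00:00,123.0,1,batch-2
"""

# Column names that would be given to MetricTimestampSpoofing, ValueField,
# SampleCountField, and BatchField respectively (assumed names).
COLUMNS = {
    "timestamp": "timestamp",
    "value": "value",
    "sample_count": "sample_count",
    "batch": "batch",
}
TIME_FORMAT = "%Y-%m-%d %H:%M:%S"  # assumed strptime-style format


def read_rows(text, columns, time_format):
    """Parse CSV text into dicts using the configured column names."""
    rows = []
    for record in csv.DictReader(io.StringIO(text)):
        rows.append({
            "timestamp": datetime.strptime(record[columns["timestamp"]], time_format),
            "value": float(record[columns["value"]]),
            "sample_count": int(record[columns["sample_count"]]),
            "batch": record[columns["batch"]],
        })
    return rows


rows = read_rows(CSV_TEXT, COLUMNS, TIME_FORMAT)
for row in rows:
    print(row)
```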

class datarobot.models.deployment.custom_metrics.HostedCustomMetricBlueprint(id, directionality, units, type, time_step, is_model_specific, custom_job_id, created_at, updated_at, created_by, updated_by)

Hosted custom metric blueprints provide an option to share custom metric settings between multiple custom metrics sharing the same custom jobs. When a custom job of a hosted custom metric type is connected to the deployment, all the custom metric parameters from the blueprint are automatically copied.

classmethod get(custom_job_id)

Get a hosted custom metric blueprint.

Parameters
custom_job_id: str

ID of the custom job.

Returns
blueprint: HostedCustomMetricBlueprint
Return type

HostedCustomMetricBlueprint

classmethod create(custom_job_id, directionality, units, type, time_step, is_model_specific)

Create a hosted custom metric blueprint.

Parameters
custom_job_id: str

ID of the custom job.

directionality: str

Directionality of the metric.

units: str

Units of the metric.

type: str

Type of the metric.

time_step: str

Time step of the metric.

is_model_specific: bool

Whether the metric is model specific.

Returns
blueprint: HostedCustomMetricBlueprint
Return type

HostedCustomMetricBlueprint

update(directionality=None, units=None, type=None, time_step=None, is_model_specific=None)

Update a hosted custom metric blueprint.

Parameters
directionality: Optional[str]

Directionality of the metric.

units: Optional[str]

Units of the metric.

type: Optional[str]

Type of the metric.

time_step: Optional[str]

Time step of the metric.

is_model_specific: Optional[bool]

Determines whether the metric is model specific.

Returns
updated_blueprint: HostedCustomMetricBlueprint
Return type

HostedCustomMetricBlueprint

Registry Jobs

class datarobot.models.registry.job.Job(id, name, created_at, items, description=None, environment_id=None, environment_version_id=None, entry_point=None, runtime_parameters=None)

A DataRobot job.

New in version v3.4.

Attributes
id: str

The ID of the job.

name: str

The name of the job.

created_at: str

ISO-8601 formatted timestamp of when the job was created.

items: List[JobFileItem]

A list of file items attached to the job.

description: str, optional

A job description.

environment_id: str, optional

The ID of the environment to use with the job.

environment_version_id: str, optional

The ID of the environment version to use with the job.

classmethod create(name, environment_id=None, environment_version_id=None, folder_path=None, files=None, file_data=None, runtime_parameter_values=None)

Create a job.

New in version v3.4.

Parameters
name: str

The name of the job.

environment_id: Optional[str]

The environment ID to use for job runs. The ID must be specified in order to run the job.

environment_version_id: Optional[str]

The environment version ID to use for job runs. If not specified, the latest version of the execution environment will be used.

folder_path: Optional[str]

The path to a folder containing files to be uploaded. Each file in the folder is uploaded under its path relative to the folder path.

files: Optional[Union[List[Tuple[str, str]], List[str]]]

The files to be uploaded to the job. The files can be defined in two ways: (1) a list of tuples, where the first element is the local path of the file to be uploaded and the second element is the file path in the job file system; or (2) a list of local paths of the files to be uploaded, in which case the files are added to the root of the job file system.

file_data: Optional[Dict[str, str]]

The file contents to be uploaded to the job, defined as a dictionary where the keys are the file paths in the job file system and the values are the file contents.

runtime_parameter_values: Optional[List[RuntimeParameterValue]]

Additional parameters to be injected into a model at runtime. The fieldName must match a fieldName that is listed in the runtimeParameterDefinitions section of the model-metadata.yaml file.

Returns
Job

created job

Raises
datarobot.errors.ClientError

if the server responded with 4xx status

datarobot.errors.ServerError

if the server responded with 5xx status

Return type

Job
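
The files and file_data arguments can be built from a local folder with pathlib. The sketch below produces the list-of-tuples form (local path, path in the job file system) and the dictionary form (path in the job file system, file content), using each file's path relative to the folder, as described for folder_path. files_from_folder and file_data_from_folder are hypothetical helpers, and the demo folder is created in a temporary directory.

```python
import tempfile
from pathlib import Path


def files_from_folder(folder_path):
    """Return (local_path, path_in_job) tuples for every file under folder_path."""
    root = Path(folder_path)
    pairs = []
    for path in sorted(root.rglob("*")):
        if path.is_file():
            pairs.append((str(path), path.relative_to(root).as_posix()))
    return pairs


def file_data_from_folder(folder_path, encoding="utf-8"):
    """Return {path_in_job: text content} for every file under folder_path."""
    return {
        job_path: Path(local_path).read_text(encoding=encoding)
        for local_path, job_path in files_from_folder(folder_path)
    }


# Demo on a throwaway folder with one top-level file and one nested file.
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / "run.sh").write_text("echo hi\n", encoding="utf-8")
    (root / "src").mkdir()
    (root / "src" / "metric.py").write_text("print(1)\n", encoding="utf-8")
    demo_files = files_from_folder(root)
    demo_file_data = file_data_from_folder(root)

print([job_path for _, job_path in demo_files])
print(demo_file_data)

# These could then be passed as Job.create(name=..., files=demo_files)
# or Job.create(name=..., file_data=demo_file_data).
```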

classmethod list()

List jobs.

New in version v3.4.

Returns
List[Job]

a list of jobs

Raises
datarobot.errors.ClientError

if the server responded with 4xx status

datarobot.errors.ServerError

if the server responded with 5xx status

Return type

List[Job]

classmethod get(job_id)

Get job by id.

New in version v3.4.

Parameters
job_id: str

The ID of the job.

Returns
Job

retrieved job

Raises
datarobot.errors.ClientError

if the server responded with 4xx status.

datarobot.errors.ServerError

if the server responded with 5xx status.

Return type

Job

update(name=None, entry_point=None, environment_id=None, environment_version_id=None, description=None, folder_path=None, files=None, file_data=None, runtime_parameter_values=None)

Update job properties.

New in version v3.4.

Parameters
name: str

The job name.

entry_point: Optional[str]

The job file item ID to use as an entry point of the job.

environment_id: Optional[str]

The environment ID to use for job runs. Must be specified in order to run the job.

environment_version_id: Optional[str]

The environment version ID to use for job runs. If not specified, the latest version of the execution environment will be used.

description: str

The job description.

folder_path: Optional[str]

The path to a folder containing files to be uploaded. Each file in the folder is uploaded under its path relative to the folder path.

files: Optional[Union[List[Tuple[str, str]], List[str]]]

The files to be uploaded to the job. The files can be defined in two ways: (1) a list of tuples, where the first element is the local path of the file to be uploaded and the second element is the file path in the job file system; or (2) a list of local paths of the files to be uploaded, in which case the files are added to the root of the job file system.

file_data: Optional[Dict[str, str]]

The file contents to be uploaded to the job, defined as a dictionary where the keys are the file paths in the job file system and the values are the file contents.

runtime_parameter_values: Optional[List[RuntimeParameterValue]]

Additional parameters to be injected into a model at runtime. The fieldName must match a fieldName that is listed in the runtimeParameterDefinitions section of the model-metadata.yaml file.

Raises
datarobot.errors.ClientError

if the server responded with 4xx status.

datarobot.errors.ServerError

if the server responded with 5xx status.

Return type

None

delete()

Delete job.

New in version v3.4.

Raises
datarobot.errors.ClientError

If the server responded with 4xx status.

datarobot.errors.ServerError

If the server responded with 5xx status.

Return type

None

refresh()

Update job with the latest data from server.

New in version v3.4.

Raises
datarobot.errors.ClientError

if the server responded with 4xx status

datarobot.errors.ServerError

if the server responded with 5xx status

Return type

None

Create a job from a custom metric gallery template.

Parameters
template_id: str

ID of the template.

name: str

Name of the job.

description: Optional[str]

Description of the job.

sidecar_deployment_id: Optional[str]

ID of the sidecar deployment. Only relevant for templates that use sidecar deployments.

Returns
Job

created job

Raises
datarobot.errors.ClientError

if the server responded with 4xx status.

datarobot.errors.ServerError

if the server responded with 5xx status.

Return type

Job

list_schedules()

List schedules for the job.

Returns
List[JobSchedule]

a list of schedules for the job.

Return type

List[JobSchedule]

class datarobot.models.registry.job.JobFileItem(id, file_name, file_path, file_source, created_at)

A file item attached to a DataRobot job.

New in version v3.4.

Attributes
id: str

The ID of the file item.

file_name: str

The name of the file item.

file_path: str

The path of the file item.

file_source: str

The source of the file item.

created_at: str

ISO-8601 formatted timestamp of when the version was created.

class datarobot.models.registry.job_run.JobRun(id, custom_job_id, created_at, items, status, duration, description=None, runtime_parameters=None)

A DataRobot job run.

New in version v3.4.

Attributes
id: str

The ID of the job run.

custom_job_id: str

The ID of the parent job.

description: str

A description of the job run.

created_at: str

ISO-8601 formatted timestamp of when the job run was created.

items: List[JobFileItem]

A list of file items attached to the job.

status: JobRunStatus

The status of the job run.

duration: float

The duration of the job run.

classmethod create(job_id, max_wait=600, runtime_parameter_values=None)

Create a job run.

New in version v3.4.

Parameters
job_id: str

The ID of the job.

max_wait: int, optional

The maximum time to wait for a terminal status (“succeeded”, “failed”, “interrupted”, or “canceled”). If set to None, the method returns without waiting.

runtime_parameter_values: Optional[List[RuntimeParameterValue]]

Additional parameters to be injected into a model at runtime. The fieldName must match a fieldName that is listed in the runtimeParameterDefinitions section of the model-metadata.yaml file.

Returns
JobRun

created job run

Raises
datarobot.errors.ClientError

if the server responded with 4xx status

datarobot.errors.ServerError

if the server responded with 5xx status

ValueError

if execution environment or entry point is not specified for the job

Return type

JobRun
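
The max_wait behavior described above can be pictured as a polling loop that stops at a terminal status or when the time budget runs out. The sketch below is an illustration of that idea, not the client's implementation: wait_for_terminal_status is a hypothetical helper, the statuses are modeled as plain lowercase strings, and max_wait is assumed to be in seconds.

```python
import time

# Terminal statuses named in the max_wait description.
TERMINAL_STATUSES = {"succeeded", "failed", "interrupted", "canceled"}


def wait_for_terminal_status(get_status, max_wait=600, poll_interval=5.0):
    """Poll get_status() until it returns a terminal status.

    max_wait is assumed to be in seconds. If it is None, return the current
    status immediately without waiting. Raises TimeoutError when the budget
    is exhausted before a terminal status is seen.
    """
    if max_wait is None:
        return get_status()
    deadline = time.monotonic() + max_wait
    while True:
        status = get_status()
        if status in TERMINAL_STATUSES:
            return status
        if time.monotonic() >= deadline:
            raise TimeoutError(
                f"no terminal status within {max_wait} seconds; last status: {status}"
            )
        time.sleep(poll_interval)


# Stand-in status source: two non-terminal readings, then success.
_statuses = iter(["running", "running", "succeeded"])
final_status = wait_for_terminal_status(
    lambda: next(_statuses), max_wait=5, poll_interval=0.01
)
print(final_status)
```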

classmethod list(job_id)

List job runs.

New in version v3.4.

Parameters
job_id: str

The ID of the job.

Returns
List[JobRun]

A list of job runs.

Raises
datarobot.errors.ClientError

if the server responded with 4xx status

datarobot.errors.ServerError

if the server responded with 5xx status

Return type

List[JobRun]

classmethod get(job_id, job_run_id)

Get job run by id.

New in version v3.4.

Parameters
job_id: str

The ID of the job.

job_run_id: str

The ID of the job run.

Returns
JobRun

The retrieved job run.

Raises
datarobot.errors.ClientError

if the server responded with 4xx status.

datarobot.errors.ServerError

if the server responded with 5xx status.

Return type

JobRun

update(description=None)

Update job run properties.

New in version v3.4.

Parameters
description: str

new job run description

Raises
datarobot.errors.ClientError

if the server responded with 4xx status.

datarobot.errors.ServerError

if the server responded with 5xx status.

Return type

None

cancel()

Cancel job run.

New in version v3.4.

Raises
datarobot.errors.ClientError

If the server responded with 4xx status.

datarobot.errors.ServerError

If the server responded with 5xx status.

Return type

None

refresh()

Update job run with the latest data from server.

New in version v3.4.

Raises
datarobot.errors.ClientError

if the server responded with 4xx status

datarobot.errors.ServerError

if the server responded with 5xx status

Return type

None

get_logs()

Get log of the job run.

New in version v3.4.

Raises
datarobot.errors.ClientError

if the server responded with 4xx status

datarobot.errors.ServerError

if the server responded with 5xx status

Return type

Optional[str]

delete_logs()

Delete the log of the job run.

New in version v3.4.

Raises
datarobot.errors.ClientError

if the server responded with 4xx status

datarobot.errors.ServerError

if the server responded with 5xx status

Return type

None

class datarobot.models.registry.job_run.JobRunStatus(value)

Enum of the job run statuses

class datarobot.models.registry.job.JobSchedule(id, custom_job_id, updated_at, updated_by, created_at, created_by, scheduled_job_id, schedule=None, deployment=None, parameter_overrides=None)

A job schedule.

New in version v3.5.

Attributes
id: str

The ID of the job schedule.

custom_job_id: str

The ID of the custom job.

updated_at: str

ISO-8601 formatted timestamp of when the schedule was updated.

updated_by: Dict[str, Any]

The user who updated the schedule.

created_at: str

ISO-8601 formatted timestamp of when the schedule was created.

created_by: Dict[str, Any]

The user who created the schedule.

scheduled_job_id: str

The ID of the scheduled job.

deployment: Dict[str, Any]

The deployment of the scheduled job.

schedule: Schedule

The schedule of the job.

parameter_overrides: List[RuntimeParameterValue]

The parameter overrides for this schedule.

update(schedule=None, parameter_overrides=None)

Update the job schedule.

Parameters
schedule: Optional[Schedule]

The schedule of the job.

parameter_overrides: Optional[List[RuntimeParameterValue]]

The parameter overrides for this schedule.

Return type

JobSchedule

delete()

Delete the job schedule.

Return type

None

classmethod create(custom_job_id, schedule, parameter_overrides=None)

Create a job schedule.

Parameters
custom_job_id: str

The ID of the custom job.

schedule: Schedule

The schedule of the job.

parameter_overrides: Optional[List[RuntimeParameterValue]]

The parameter overrides for this schedule.

Return type

JobSchedule

Custom Models

class datarobot.models.custom_model_version.CustomModelFileItem(id, file_name, file_path, file_source, created_at=None)

A file item attached to a DataRobot custom model version.

New in version v2.21.

Attributes
id: str

The ID of the file item.

file_name: str

The name of the file item.

file_path: str

The path of the file item.

file_source: str

The source of the file item.

created_at: str, optional

ISO-8601 formatted timestamp of when the version was created.

class datarobot.CustomInferenceModel(**kwargs)

A custom inference model.

New in version v2.21.

Attributes
id: str

The ID of the custom model.

name: str

The name of the custom model.

language: str

The programming language of the custom inference model. Can be “python”, “r”, “java” or “other”.

description: str

The description of the custom inference model.

target_type: datarobot.TARGET_TYPE

Target type of the custom inference model. Values: [datarobot.TARGET_TYPE.BINARY, datarobot.TARGET_TYPE.REGRESSION, datarobot.TARGET_TYPE.MULTICLASS, datarobot.TARGET_TYPE.UNSTRUCTURED, datarobot.TARGET_TYPE.ANOMALY, datarobot.TARGET_TYPE.TEXT_GENERATION]

target_name: str, optional

Target feature name. It is optional (ignored if provided) for the datarobot.TARGET_TYPE.UNSTRUCTURED or datarobot.TARGET_TYPE.ANOMALY target type.

latest_version: datarobot.CustomModelVersion or None

The latest version of the custom model if the model has a latest version.

deployments_count: int

The number of deployments of the custom model.

target_name: str

The custom model target name.

positive_class_label: str

For binary classification projects, a label of a positive class.

negative_class_label: str

For binary classification projects, a label of a negative class.

prediction_threshold: float

For binary classification projects, a threshold used for predictions.

training_data_assignment_in_progress: bool

Flag describing if training data assignment is in progress.

training_dataset_id: str, optional

The ID of a dataset assigned to the custom model.

training_dataset_version_id: str, optional

The ID of a dataset version assigned to the custom model.

training_data_file_name: str, optional

The name of assigned training data file.

training_data_partition_column: str, optional

The name of a partition column in a training dataset assigned to the custom model.

created_by: str

The username of a user who created the custom model.

updated_at: str

ISO-8601 formatted timestamp of when the custom model was updated.

created_at: str

ISO-8601 formatted timestamp of when the custom model was created.

network_egress_policy: datarobot.NETWORK_EGRESS_POLICY, optional

Determines whether the given custom model is isolated, or can access the public network. Values: [datarobot.NETWORK_EGRESS_POLICY.NONE, datarobot.NETWORK_EGRESS_POLICY.PUBLIC].

maximum_memory: int, optional

The maximum memory that might be allocated by the custom-model. If exceeded, the custom-model will be killed by k8s.

replicas: int, optional

A fixed number of replicas that will be deployed in the cluster.

is_training_data_for_versions_permanently_enabled: bool, optional

Whether training data assignment on the version level is permanently enabled for the model.

classmethod list(is_deployed=None, search_for=None, order_by=None)

List custom inference models available to the user.

New in version v2.21.

Parameters
is_deployed: bool, optional

Flag for filtering custom inference models. If set to True, only deployed custom inference models are returned. If set to False, only not deployed custom inference models are returned.

search_for: str, optional

String for filtering custom inference models. Only custom inference models that contain the string in their name or description are returned. If not specified, all custom models are returned.

order_by: str, optional

Property to sort custom inference models by. Supported properties are “created” and “updated”. Prefix the property name with a dash to sort in descending order, e.g. order_by="-created". By default, order_by is None, which returns custom models in descending order of creation time.

Returns
List[CustomInferenceModel]

A list of custom inference models.

Raises
datarobot.errors.ClientError

If the server responded with 4xx status

datarobot.errors.ServerError

If the server responded with 5xx status

Return type

List[CustomInferenceModel]
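
As a minimal sketch of how the three filters combine, the helper below asks for deployed models only, newest first. The helper name newest_deployed_models and its model_cls argument are illustrative assumptions, not part of the library. In real use, model_cls would be datarobot.CustomInferenceModel with a configured client.

```python
def newest_deployed_models(model_cls, search_for=None):
    """List deployed custom inference models, most recently created first.

    `model_cls` is any class exposing the documented `list` classmethod,
    for example `datarobot.CustomInferenceModel` (assumption: a client
    has already been configured by the caller).
    """
    return model_cls.list(
        is_deployed=True,       # keep only deployed models
        search_for=search_for,  # substring match on name or description
        order_by="-created",    # leading dash sorts in descending order
    )
```

Passing the class in as an argument keeps the sketch independent of network access, so the filter logic can be exercised with a stand-in class.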

classmethod get(custom_model_id)

Get custom inference model by id.

New in version v2.21.

Parameters
custom_model_id: str

The ID of the custom inference model.

Returns
CustomInferenceModel

Retrieved custom inference model.

Raises
datarobot.errors.ClientError

If the server responded with 4xx status.

datarobot.errors.ServerError

If the server responded with 5xx status.

Return type

CustomInferenceModel

download_latest_version(file_path)

Download the latest custom inference model version.

New in version v2.21.

Parameters
file_path: str

Path to create a file with custom model version content.

Raises
datarobot.errors.ClientError

If the server responded with 4xx status.

datarobot.errors.ServerError

If the server responded with 5xx status.

Return type

None

classmethod create(name, target_type, target_name=None, language=None, description=None, positive_class_label=None, negative_class_label=None, prediction_threshold=None, class_labels=None, class_labels_file=None, network_egress_policy=None, maximum_memory=None, replicas=None, is_training_data_for_versions_permanently_enabled=None)

Create a custom inference model.

New in version v2.21.

Parameters
name: str

Name of the custom inference model.

target_type: datarobot.TARGET_TYPE

Target type of the custom inference model. Values: [datarobot.TARGET_TYPE.BINARY, datarobot.TARGET_TYPE.REGRESSION, datarobot.TARGET_TYPE.MULTICLASS, datarobot.TARGET_TYPE.UNSTRUCTURED, datarobot.TARGET_TYPE.TEXT_GENERATION]

target_name: str, optional

Target feature name. Optional (ignored if provided) for the datarobot.TARGET_TYPE.UNSTRUCTURED target type.

language: str, optional

Programming language of the custom inference model.

description: str, optional

Description of the custom inference model.

positive_class_label: str, optional

Custom inference model positive class label for binary classification.

negative_class_label: str, optional

Custom inference model negative class label for binary classification.

prediction_threshold: float, optional

Custom inference model prediction threshold.

class_labels: List[str], optional

Custom inference model class labels for multiclass classification. Cannot be used with class_labels_file.

class_labels_file: str, optional

Path to file containing newline separated class labels for multiclass classification. Cannot be used with class_labels.

network_egress_policy: datarobot.NETWORK_EGRESS_POLICY, optional

Determines whether the given custom model is isolated, or can access the public network. Values: [datarobot.NETWORK_EGRESS_POLICY.NONE, datarobot.NETWORK_EGRESS_POLICY.PUBLIC]

maximum_memory: int, optional

The maximum memory that might be allocated by the custom-model. If exceeded, the custom-model will be killed by k8s.

replicas: int, optional

A fixed number of replicas that will be deployed in the cluster.

is_training_data_for_versions_permanently_enabled: bool, optional

Permanently enable training data assignment on the version level for the current model, instead of training data assignment on the model level.

Returns
CustomInferenceModel

The created custom inference model.

Raises
datarobot.errors.ClientError

If the server responded with 4xx status.

datarobot.errors.ServerError

If the server responded with 5xx status.

Return type

CustomInferenceModel
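
Because class_labels and class_labels_file cannot be used together, it can help to prepare the label arguments before calling create. The following is a sketch only: write_class_labels_file and multiclass_label_kwargs are hypothetical helper names, and the file layout (one label per line, no trailing newline) is an assumption based on the "newline separated" description above.

```python
def write_class_labels_file(labels, path):
    """Write one class label per line and return the path that was written."""
    with open(path, "w", encoding="utf-8") as handle:
        handle.write("\n".join(labels))
    return path


def multiclass_label_kwargs(class_labels=None, class_labels_file=None):
    """Build keyword arguments for at most one of the two label options."""
    if class_labels is not None and class_labels_file is not None:
        # The reference states the two parameters cannot be combined.
        raise ValueError("Use class_labels or class_labels_file, not both.")
    if class_labels is not None:
        return {"class_labels": list(class_labels)}
    if class_labels_file is not None:
        return {"class_labels_file": class_labels_file}
    return {}
```

The resulting dict can then be unpacked into the documented call, for example CustomInferenceModel.create(name=..., target_type=datarobot.TARGET_TYPE.MULTICLASS, target_name=..., **multiclass_label_kwargs(class_labels=["cat", "dog"])).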

classmethod copy_custom_model(custom_model_id)

Create a custom inference model by copying an existing one.

New in version v2.21.

Parameters
custom_model_id: str

The ID of the custom inference model to copy.

Returns
CustomInferenceModel

The created custom inference model.

Raises
datarobot.errors.ClientError

If the server responded with 4xx status.

datarobot.errors.ServerError

If the server responded with 5xx status.

Return type

CustomInferenceModel

update(name=None, language=None, description=None, target_name=None, positive_class_label=None, negative_class_label=None, prediction_threshold=None, class_labels=None, class_labels_file=None, is_training_data_for_versions_permanently_enabled=None)

Update custom inference model properties.

New in version v2.21.

Parameters
name: str, optional

New custom inference model name.

language: str, optional

New custom inference model programming language.

description: str, optional

New custom inference model description.

target_name: str, optional

New custom inference model target name.

positive_class_label: str, optional

New custom inference model positive class label.

negative_class_label: str, optional

New custom inference model negative class label.

prediction_threshold: float, optional

New custom inference model prediction threshold.

class_labels: List[str], optional

New custom inference model class labels for multiclass classification. Cannot be used with class_labels_file.

class_labels_file: str, optional

Path to file containing newline separated class labels for multiclass classification. Cannot be used with class_labels.

is_training_data_for_versions_permanently_enabled: bool, optional

Permanently enable training data assignment on the version level for the current model, instead of training data assignment on the model level.

Raises
datarobot.errors.ClientError

If the server responded with 4xx status.

datarobot.errors.ServerError

If the server responded with 5xx status.

Return type

None

refresh()

Update custom inference model with the latest data from server.

New in version v2.21.

Raises
datarobot.errors.ClientError

If the server responded with 4xx status.

datarobot.errors.ServerError

If the server responded with 5xx status.

Return type

None

delete()

Delete custom inference model.

New in version v2.21.

Raises
datarobot.errors.ClientError

If the server responded with 4xx status.

datarobot.errors.ServerError

If the server responded with 5xx status.

Return type

None

assign_training_data(dataset_id, partition_column=None, max_wait=600)

Assign training data to the custom inference model.

New in version v2.21.

Parameters
dataset_id: str

The ID of the training dataset to be assigned.

partition_column: str, optional

The name of a partition column in the training dataset.

max_wait: int, optional

The maximum time to wait for training data assignment, in seconds. If set to None, the method returns without waiting. Defaults to 600 (10 minutes).

Raises
datarobot.errors.ClientError

If the server responded with 4xx status.

datarobot.errors.ServerError

If the server responded with 5xx status.

Return type

None
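
When max_wait is None, the call returns before the assignment finishes, so the caller has to poll. A minimal sketch of such a loop follows, using only the documented training_data_assignment_in_progress attribute and refresh() method. The helper name wait_for_training_data and its poll_seconds and timeout_seconds parameters are assumptions made for illustration.

```python
import time


def wait_for_training_data(model, poll_seconds=10, timeout_seconds=600):
    """Poll until training data assignment finishes or the timeout elapses.

    Returns True if the assignment finished, False if the timeout was hit.
    """
    deadline = time.monotonic() + timeout_seconds
    while model.training_data_assignment_in_progress:
        if time.monotonic() >= deadline:
            return False
        time.sleep(poll_seconds)
        model.refresh()  # reload the model's attributes from the server
    return True
```

A typical sequence would be model.assign_training_data(dataset_id, max_wait=None) followed by wait_for_training_data(model).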

class datarobot.CustomModelTest(**kwargs)

A custom model test.

New in version v2.21.

Attributes
id: str

The ID of the test.

custom_model_image_id: str

The ID of a custom model image.

image_type: str

The type of the image: either CUSTOM_MODEL_IMAGE_TYPE.CUSTOM_MODEL_IMAGE if the testing attempt is using a CustomModelImage as its model, or CUSTOM_MODEL_IMAGE_TYPE.CUSTOM_MODEL_VERSION if the testing attempt is using a CustomModelVersion with dependency management.

overall_status: str

A string representing the testing status. Status can be:

- ‘not_tested’: the check was not run
- ‘failed’: the check failed
- ‘succeeded’: the check succeeded
- ‘warning’: the check resulted in a warning or a non-critical failure
- ‘in_progress’: the check is in progress

detailed_status: dict

Detailed testing status. Maps the testing types to their status and message. Each key of the dict is one of ‘errorCheck’, ‘nullValueImputation’, ‘longRunningService’, ‘sideEffects’. Each value is a dict with ‘message’ and ‘status’ keys.

created_by: str

The user who created the test.

dataset_id: str, optional

The ID of a dataset used for testing.

dataset_version_id: str, optional

The ID of a dataset version used for testing.

completed_at: str, optional

ISO-8601 formatted timestamp of when the test completed.

created_at: str, optional

ISO-8601 formatted timestamp of when the test was created.

network_egress_policy: datarobot.NETWORK_EGRESS_POLICY, optional

Determines whether the given custom model is isolated, or can access the public network. Values: [datarobot.NETWORK_EGRESS_POLICY.NONE, datarobot.NETWORK_EGRESS_POLICY.PUBLIC].

maximum_memory: int, optional

The maximum memory that might be allocated by the custom-model. If exceeded, the custom-model will be killed by k8s

replicas: int, optional

A fixed number of replicas that will be deployed in the cluster
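
The detailed_status attribute described above is a plain dict, so the checks that need attention can be pulled out with a few lines. This is a sketch under one stated assumption: that each per-check ‘status’ value uses the same vocabulary as overall_status. The helper name problem_checks is hypothetical.

```python
def problem_checks(detailed_status):
    """Map each check that failed or raised a warning to its message."""
    problems = {}
    for check_name, result in detailed_status.items():
        # Assumption: per-check statuses reuse the overall_status values.
        if result.get("status") in ("failed", "warning"):
            problems[check_name] = result.get("message", "")
    return problems
```

For a finished test object, problem_checks(test.detailed_status) returns an empty dict when every check succeeded.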

classmethod create(custom_model_id, custom_model_version_id, dataset_id=None, max_wait=600, network_egress_policy=None, maximum_memory=None, replicas=None)

Create and start a custom model test.

New in version v2.21.

Parameters
custom_model_id: str

The ID of the custom model.

custom_model_version_id: str

The ID of the custom model version.

dataset_id: str, optional

The id of the testing dataset for non-unstructured custom models. Ignored and not required for unstructured models.

max_wait: int, optional

The maximum time to wait for test completion. If set to None, the method returns without waiting.

network_egress_policy: datarobot.NETWORK_EGRESS_POLICY, optional

Determines whether the given custom model is isolated, or can access the public network. Values: [datarobot.NETWORK_EGRESS_POLICY.NONE, datarobot.NETWORK_EGRESS_POLICY.PUBLIC].

maximum_memory: int, optional

The maximum memory that might be allocated by the custom-model. If exceeded, the custom-model will be killed by k8s

replicas: int, optional

A fixed number of replicas that will be deployed in the cluster

Returns
CustomModelTest

created custom model test

Raises
datarobot.errors.ClientError

if the server responded with 4xx status

datarobot.errors.ServerError

if the server responded with 5xx status

classmethod list(custom_model_id)

List custom model tests.

New in version v2.21.

Parameters
custom_model_id: str

the id of the custom model

Returns
List[CustomModelTest]

a list of custom model tests

Raises
datarobot.errors.ClientError

if the server responded with 4xx status

datarobot.errors.ServerError

if the server responded with 5xx status

classmethod get(custom_model_test_id)

Get custom model test by id.

New in version v2.21.

Parameters
custom_model_test_id: str

the id of the custom model test

Returns
CustomModelTest

retrieved custom model test

Raises
datarobot.errors.ClientError

if the server responded with 4xx status.

datarobot.errors.ServerError

if the server responded with 5xx status.

get_log()

Get log of a custom model test.

New in version v2.21.

Raises
datarobot.errors.ClientError

if the server responded with 4xx status

datarobot.errors.ServerError

if the server responded with 5xx status

get_log_tail()

Get log tail of a custom model test.

New in version v2.21.

Raises
datarobot.errors.ClientError

if the server responded with 4xx status

datarobot.errors.ServerError

if the server responded with 5xx status
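
One way to combine create, overall_status, and get_log_tail is sketched below: run a test and fetch the log tail only when the result is not clean. The helper name run_test_with_log and the test_cls argument are illustrative assumptions. The return value of get_log_tail() is not specified in this section, so the sketch passes it through unchanged.

```python
def run_test_with_log(test_cls, custom_model_id, custom_model_version_id, dataset_id=None):
    """Create a custom model test and return (test, log_tail).

    `test_cls` is any class exposing the documented `create` classmethod,
    for example `datarobot.CustomModelTest`. `log_tail` is None unless the
    overall status is 'failed' or 'warning'.
    """
    test = test_cls.create(
        custom_model_id=custom_model_id,
        custom_model_version_id=custom_model_version_id,
        dataset_id=dataset_id,
    )
    log_tail = None
    if test.overall_status in ("failed", "warning"):
        log_tail = test.get_log_tail()
    return test, log_tail
```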

cancel()

Cancel custom model test that is in progress.

New in version v2.21.

Raises
datarobot.errors.ClientError

if the server responded with 4xx status

datarobot.errors.ServerError

if the server responded with 5xx status

refresh()

Update custom model test with the latest data from server.

New in version v2.21.

Raises
datarobot.errors.ClientError

if the server responded with 4xx status

datarobot.errors.ServerError

if the server responded with 5xx status

class datarobot.CustomModelVersion(**kwargs)

A version of a DataRobot custom model.

New in version v2.21.

Attributes
id: str

The ID of the custom model version.

custom_model_id: str

The ID of the custom model.

version_minor: int

A minor version number of the custom model version.

version_major: int

A major version number of the custom model version.

is_frozen: bool

A flag indicating whether the custom model version is frozen.

items: List[CustomModelFileItem]

A list of file items attached to the custom model version.

base_environment_id: str

The ID of the environment to use with the model.

base_environment_version_id: str

The ID of the environment version to use with the model.

label: str, optional

A short human-readable string to label the version.

description: str, optional

The custom model version description.

created_at: str, optional

ISO-8601 formatted timestamp of when the version was created.

dependencies: List[CustomDependency]

The parsed dependencies of the custom model version if the version has a valid requirements.txt file.

network_egress_policy: datarobot.NETWORK_EGRESS_POLICY, optional

Determines whether the given custom model is isolated, or can access the public network. Values: [datarobot.NETWORK_EGRESS_POLICY.NONE, datarobot.NETWORK_EGRESS_POLICY.PUBLIC].

maximum_memory: int, optional

The maximum memory that might be allocated by the custom-model. If exceeded, the custom-model will be killed by k8s.

replicas: int, optional

A fixed number of replicas that will be deployed in the cluster.

required_metadata_values: List[RequiredMetadataValue]

Additional parameters required by the execution environment. The required keys are defined by the fieldNames in the base environment’s requiredMetadataKeys.

training_data: TrainingData, optional

The information about the training data assigned to the model version.

holdout_data: HoldoutData, optional

The information about the holdout data assigned to the model version.

classmethod from_server_data(data, keep_attrs=None)

Instantiate an object of this class using the data directly from the server, meaning that the keys may have the wrong camel casing.

Parameters
data: dict

The directly translated dict of JSON from the server. No casing fixes have taken place.

keep_attrs: iterable

List, set or tuple of the dotted namespace notations for attributes to keep within the object structure even if their values are None.

Return type

CustomModelVersion

classmethod create_clean(custom_model_id, base_environment_id=None, is_major_update=True, folder_path=None, files=None, network_egress_policy=None, maximum_memory=None, replicas=None, required_metadata_values=None, training_dataset_id=None, partition_column=None, holdout_dataset_id=None, keep_training_holdout_data=None, max_wait=600, runtime_parameter_values=None, base_environment_version_id=None)

Create a custom model version without files from previous versions.

Create a version with training or holdout data: if training or holdout data related parameters are provided, the training data is assigned asynchronously. In this case:

* If max_wait is not None, the function returns once the job is finished.
* If max_wait is None, the function returns immediately. Progress can be polled by the user (see examples).

If training data assignment fails, the new version is still created, but creating a model package (version) from it and deploying it are not allowed. To check for a training data assignment error, inspect version.training_data.assignment_error["message"].

New in version v2.21.

Parameters
custom_model_id: str

The ID of the custom model.

base_environment_id: str

The base environment to use with this model version. At least one of “base_environment_id” and “base_environment_version_id” must be provided. If both are specified, the version must belong to the environment.

base_environment_version_id: str

The base environment version ID to use with this model version. At least one of “base_environment_id” and “base_environment_version_id” must be provided. If both are specified, the version must belong to the environment. If not specified: in case previous model versions exist, the value from the latest model version is inherited, otherwise, latest successfully built version of the environment specified in “base_environment_id” is used.

is_major_update: bool, optional

The flag defining whether the custom model version will be a minor or a major version. Defaults to True.

folder_path: str, optional

The path to a folder containing files to be uploaded. Each file in the folder is uploaded under a path relative to the folder path.

files: list, optional

A list of tuples, where each tuple holds the local filesystem path and the path where the file should be placed in the model. If the list contains strings instead, the basename of each string is used as the path in the model. Example: [("/home/user/Documents/myModel/file1.txt", "file1.txt"), ("/home/user/Documents/myModel/folder/file2.txt", "folder/file2.txt")] or ["/home/user/Documents/myModel/file1.txt", "/home/user/Documents/myModel/folder/file2.txt"]

network_egress_policy: datarobot.NETWORK_EGRESS_POLICY, optional

Determines whether the given custom model is isolated, or can access the public network. Values: [datarobot.NETWORK_EGRESS_POLICY.NONE, datarobot.NETWORK_EGRESS_POLICY.PUBLIC].

maximum_memory: int, optional

The maximum memory that might be allocated by the custom-model. If exceeded, the custom-model will be killed by k8s.

replicas: int, optional

A fixed number of replicas that will be deployed in the cluster.

required_metadata_values: List[RequiredMetadataValue]

Additional parameters required by the execution environment. The required keys are defined by the fieldNames in the base environment’s requiredMetadataKeys.

training_dataset_id: str, optional

The ID of the training dataset to assign to the custom model.

partition_column: str, optional

Name of a partition column in a training dataset assigned to the custom model. Can only be assigned for structured models.

holdout_dataset_id: str, optional

The ID of the holdout dataset to assign to the custom model. Can only be assigned for unstructured models.

keep_training_holdout_data: bool, optional

If the version should inherit training and holdout data from the previous version. Defaults to True. This field is only applicable if the model has training data for versions enabled, otherwise the field value will be ignored.

max_wait: int, optional

The maximum time to wait for training data assignment. If set to None, the method returns without waiting. Defaults to 10 minutes.

runtime_parameter_values: List[RuntimeParameterValue]

Additional parameters to be injected into a model at runtime. The fieldName must match a fieldName that is listed in the runtimeParameterDefinitions section of the model-metadata.yaml file.

Returns
CustomModelVersion

Created custom model version.

Raises
datarobot.errors.ClientError

If the server responded with 4xx status.

datarobot.errors.ServerError

If the server responded with 5xx status.

datarobot.errors.InvalidUsageError

If wrong parameters are provided.

datarobot.errors.TrainingDataAssignmentError

If training data assignment fails.

Examples

Create a version with blocking (default max_wait=600) training data assignment:

import datarobot as dr
from datarobot.errors import TrainingDataAssignmentError

dr.Client(token=my_token, endpoint=endpoint)

try:
    version = dr.CustomModelVersion.create_clean(
        custom_model_id="6444482e5583f6ee2e572265",
        base_environment_id="642209acc563893014a41e24",
        training_dataset_id="6421f2149a4f9b1bec6ad6dd",
    )
except TrainingDataAssignmentError as e:
    print(e)

Create a version with non-blocking training data assignment:

import time

import datarobot as dr

dr.Client(token=my_token, endpoint=endpoint)

# max_wait=None returns immediately instead of waiting for the assignment.
version = dr.CustomModelVersion.create_clean(
    custom_model_id="6444482e5583f6ee2e572265",
    base_environment_id="642209acc563893014a41e24",
    training_dataset_id="6421f2149a4f9b1bec6ad6dd",
    max_wait=None,
)

# Poll until the training data assignment is no longer in progress.
while version.training_data.assignment_in_progress:
    time.sleep(10)
    version.refresh()
if version.training_data.assignment_error:
    print(version.training_data.assignment_error["message"])

Return type

CustomModelVersion
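
The files parameter accepts either tuples or plain strings. The sketch below mirrors the described rule (a bare string is paired with its basename) so that the resulting upload layout can be inspected before calling create_clean. It illustrates the documented behavior and is not the library's own implementation; the helper name normalize_files is hypothetical.

```python
import os


def normalize_files(files):
    """Return (local_path, path_in_model) pairs for a mixed `files` list."""
    pairs = []
    for entry in files:
        if isinstance(entry, str):
            # A bare string is placed in the model under its basename.
            pairs.append((entry, os.path.basename(entry)))
        else:
            local_path, model_path = entry
            pairs.append((local_path, model_path))
    return pairs
```

Note that a bare string such as "/home/user/Documents/myModel/folder/file2.txt" maps to "file2.txt", so a tuple is needed to keep the "folder/" prefix inside the model.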

classmethod create_from_previous(custom_model_id, base_environment_id=None, is_major_update=True, folder_path=None, files=None, files_to_delete=None, network_egress_policy=None, maximum_memory=None, replicas=None, required_metadata_values=None, training_dataset_id=None, partition_column=None, holdout_dataset_id=None, keep_training_holdout_data=None, max_wait=600, runtime_parameter_values=None, base_environment_version_id=None)

Create a custom model version containing files from a previous version.

Create a version with training or holdout data: if training or holdout data related parameters are provided, the training data is assigned asynchronously. In this case:

* If max_wait is not None, the function returns once the job is finished.
* If max_wait is None, the function returns immediately. Progress can be polled by the user (see examples).

If training data assignment fails, the new version is still created, but creating a model package (version) from it and deploying it are not allowed. To check for a training data assignment error, inspect version.training_data.assignment_error["message"].

New in version v2.21.

Parameters
custom_model_id: str

The ID of the custom model.

base_environment_id: str

The base environment to use with this model version. At least one of “base_environment_id” and “base_environment_version_id” must be provided. If both are specified, the version must belong to the environment.

base_environment_version_id: str

The base environment version ID to use with this model version. At least one of “base_environment_id” and “base_environment_version_id” must be provided. If both are specified, the version must belong to the environment. If not specified: in case previous model versions exist, the value from the latest model version is inherited, otherwise, latest successfully built version of the environment specified in “base_environment_id” is used.

is_major_update: bool, optional

The flag defining if a custom model version will be a minor or a major version. Defaults to True.

folder_path: str, optional

The path to a folder containing files to be uploaded. Each file in the folder is uploaded under a path relative to the folder path.

files: list, optional

A list of tuples, where each tuple holds the local filesystem path and the path where the file should be placed in the model. If the list contains strings instead, the basename of each string is used as the path in the model. Example: [("/home/user/Documents/myModel/file1.txt", "file1.txt"), ("/home/user/Documents/myModel/folder/file2.txt", "folder/file2.txt")] or ["/home/user/Documents/myModel/file1.txt", "/home/user/Documents/myModel/folder/file2.txt"]

files_to_delete: list, optional

A list of file item IDs to be deleted. Example: ["5ea95f7a4024030aba48e4f9", "5ea6b5da402403181895cc51"]

network_egress_policy: datarobot.NETWORK_EGRESS_POLICY, optional

Determines whether the given custom model is isolated, or can access the public network. Values: [datarobot.NETWORK_EGRESS_POLICY.NONE, datarobot.NETWORK_EGRESS_POLICY.PUBLIC].

maximum_memory: int, optional

The maximum memory that might be allocated by the custom-model. If exceeded, the custom-model will be killed by k8s.

replicas: int, optional

A fixed number of replicas that will be deployed in the cluster.

required_metadata_values: List[RequiredMetadataValue]

Additional parameters required by the execution environment. The required keys are defined by the fieldNames in the base environment’s requiredMetadataKeys.

training_dataset_id: str, optional

The ID of the training dataset to assign to the custom model.

partition_column: str, optional

Name of a partition column in a training dataset assigned to the custom model. Can only be assigned for structured models.

holdout_dataset_id: str, optional

The ID of the holdout dataset to assign to the custom model. Can only be assigned for unstructured models.

keep_training_holdout_data: bool, optional

If the version should inherit training and holdout data from the previous version. Defaults to True. This field is only applicable if the model has training data for versions enabled, otherwise the field value will be ignored.

max_wait: int, optional

The maximum time to wait for training data assignment. If set to None, the method returns without waiting. Defaults to 10 minutes.

runtime_parameter_values: List[RuntimeParameterValue]

Additional parameters to be injected into the model at runtime. The fieldName must match a fieldName that is listed in the runtimeParameterDefinitions section of the model-metadata.yaml file. This list will be merged with any existing runtime values set from the prior version, so it is possible to specify a null value to unset specific parameters and fall back to the defaultValue from the definition.

Returns
CustomModelVersion

Created custom model version.

Raises
datarobot.errors.ClientError

If the server responded with 4xx status.

datarobot.errors.ServerError

If the server responded with 5xx status.

datarobot.errors.InvalidUsageError

If wrong parameters are provided.

datarobot.errors.TrainingDataAssignmentError

If training data assignment fails.

Examples

Create a version with blocking (default max_wait=600) training data assignment:

import datarobot as dr
from datarobot.errors import TrainingDataAssignmentError

dr.Client(token=my_token, endpoint=endpoint)

try:
    version = dr.CustomModelVersion.create_from_previous(
        custom_model_id="6444482e5583f6ee2e572265",
        base_environment_id="642209acc563893014a41e24",
        training_dataset_id="6421f2149a4f9b1bec6ad6dd",
    )
except TrainingDataAssignmentError as e:
    print(e)

Create a version with non-blocking training data assignment:

import time

import datarobot as dr

dr.Client(token=my_token, endpoint=endpoint)

# max_wait=None returns immediately instead of waiting for the assignment.
version = dr.CustomModelVersion.create_from_previous(
    custom_model_id="6444482e5583f6ee2e572265",
    base_environment_id="642209acc563893014a41e24",
    training_dataset_id="6421f2149a4f9b1bec6ad6dd",
    max_wait=None,
)

# Poll until the training data assignment is no longer in progress.
while version.training_data.assignment_in_progress:
    time.sleep(10)
    version.refresh()
if version.training_data.assignment_error:
    print(version.training_data.assignment_error["message"])

Return type

CustomModelVersion
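
Since files_to_delete takes file item IDs rather than paths, a small lookup over the previous version's items is usually needed first. The sketch below assumes that file_path on each CustomModelFileItem is the path of the file inside the model. The helper name file_item_ids_for_paths is hypothetical.

```python
def file_item_ids_for_paths(items, paths_to_remove):
    """Return the IDs of file items whose file_path is in paths_to_remove."""
    wanted = set(paths_to_remove)
    # Preserve the order in which the items appear on the version.
    return [item.id for item in items if item.file_path in wanted]
```

The result can be passed directly, for example create_from_previous(custom_model_id, base_environment_id=..., files_to_delete=file_item_ids_for_paths(previous_version.items, ["old_weights.pkl"])), where previous_version and the file name are placeholders.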

classmethod list(custom_model_id)

List custom model versions.

New in version v2.21.

Parameters
custom_model_id: str

The ID of the custom model.

Returns
List[CustomModelVersion]

A list of custom model versions.

Raises
datarobot.errors.ClientError

If the server responded with 4xx status.

datarobot.errors.ServerError

If the server responded with 5xx status.

Return type

List[CustomModelVersion]
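
The ordering of the returned list is not stated here, so a caller who needs the highest version can compare the documented version_major and version_minor attributes explicitly. A minimal sketch, with the hypothetical helper name newest_version:

```python
def newest_version(versions):
    """Return the version with the highest (major, minor) pair, or None."""
    if not versions:
        return None
    # Tuples compare element by element: major first, then minor.
    return max(versions, key=lambda v: (v.version_major, v.version_minor))
```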

classmethod get(custom_model_id, custom_model_version_id)

Get custom model version by id.

New in version v2.21.

Parameters
custom_model_id: str

The ID of the custom model.

custom_model_version_id: str

The ID of the custom model version to retrieve.

Returns
CustomModelVersion

Retrieved custom model version.

Raises
datarobot.errors.ClientError

If the server responded with 4xx status.

datarobot.errors.ServerError

If the server responded with 5xx status.

Return type

CustomModelVersion

download(file_path)

Download custom model version.

New in version v2.21.

Parameters
file_path: str

Path to create a file with custom model version content.

Raises
datarobot.errors.ClientError

If the server responded with 4xx status.

datarobot.errors.ServerError

If the server responded with 5xx status.

Return type

None

update(description=None, required_metadata_values=None)

Update custom model version properties.

New in version v2.21.

Parameters
description: str, optional

New custom model version description.

required_metadata_values: List[RequiredMetadataValue], optional

Additional parameters required by the execution environment. The required keys are defined by the fieldNames in the base environment’s requiredMetadataKeys.

Raises
datarobot.errors.ClientError

If the server responded with 4xx status.

datarobot.errors.ServerError

If the server responded with 5xx status.

Return type

None

refresh()

Update custom model version with the latest data from server.

New in version v2.21.

Raises
datarobot.errors.ClientError

If the server responded with 4xx status.

datarobot.errors.ServerError

If the server responded with 5xx status.

Return type

None

get_feature_impact(with_metadata=False)

Get custom model feature impact.

New in version v2.23.

Parameters
with_metadata: bool

The flag indicating if the result should include the metadata as well.

Returns
feature_impacts: list of dict

The feature impact data. Each item is a dict with the keys ‘featureName’, ‘impactNormalized’, ‘impactUnnormalized’, and ‘redundantWith’.

Raises
datarobot.errors.ClientError

If the server responded with 4xx status.

datarobot.errors.ServerError

If the server responded with 5xx status.

Return type

List[Dict[str, Any]]
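
The returned list of dicts can be ranked with standard Python. The sketch below assumes the list-of-dict shape described above (the default with_metadata=False). The helper name top_features and its n parameter are illustrative only.

```python
def top_features(feature_impacts, n=5):
    """Return the n (featureName, impactNormalized) pairs with the highest impact."""
    ranked = sorted(
        feature_impacts,
        key=lambda item: item["impactNormalized"],
        reverse=True,  # largest normalized impact first
    )
    return [(item["featureName"], item["impactNormalized"]) for item in ranked[:n]]
```

A typical call would be top_features(version.get_feature_impact(), n=3) once feature impact has been calculated for the version.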

calculate_feature_impact(max_wait=600)

Calculate custom model feature impact.

New in version v2.23.

Parameters
max_wait: int, optional

The maximum time to wait for feature impact calculation. If set to None, the method returns without waiting. Defaults to 10 minutes.

Raises
datarobot.errors.ClientError

if the server responded with 4xx status

datarobot.errors.ServerError

if the server responded with 5xx status

Return type

None

class datarobot.models.execution_environment.RequiredMetadataKey(**kwargs)

Definition of a metadata key that custom models using this environment must define.

New in version v2.25.

Attributes
field_name: str

The required field key. This value will be added as an environment variable when running custom models.

display_name: str

A human readable name for the required field.
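
Inside custom model code, the required values could be read back from the environment. The sketch below assumes that each environment variable is named exactly after its field_name, which this section implies but does not state outright. The helper name read_required_metadata and its environ parameter are hypothetical.

```python
import os


def read_required_metadata(field_names, environ=None):
    """Collect required metadata values from environment variables.

    Assumption: each variable is named exactly like its field_name.
    Raises KeyError listing every missing name.
    """
    environ = os.environ if environ is None else environ
    missing = [name for name in field_names if name not in environ]
    if missing:
        raise KeyError("Missing required metadata: " + ", ".join(missing))
    return {name: environ[name] for name in field_names}
```

Accepting an explicit environ mapping makes the lookup easy to exercise without touching the real process environment.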

class datarobot.models.CustomModelVersionConversion(**kwargs)

A conversion of a DataRobot custom model version.

New in version v2.27.

Attributes
id: str

The ID of the custom model version conversion.

custom_model_version_id: str

The ID of the custom model version.

created: str

ISO-8601 timestamp of when the custom model conversion was created.

main_program_item_id: str or None

The ID of the main program item.

log_message: str or None

The conversion output log message.

generated_metadata: dict or None

A dict containing two items: ‘outputDataset’ and ‘outputColumns’.

conversion_succeeded: bool

Whether the conversion succeeded or not.

conversion_in_progress: bool

Whether a given conversion is in progress or not.
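
The two boolean attributes and log_message together describe where a conversion stands. A minimal sketch of reading them follows. The helper name conversion_state and the returned strings are illustrative choices, not library values.

```python
def conversion_state(conversion):
    """Summarize a conversion as 'in progress', 'succeeded', or 'failed: ...'."""
    if conversion.conversion_in_progress:
        return "in progress"
    if conversion.conversion_succeeded:
        return "succeeded"
    # log_message may be None, so fall back to a placeholder.
    return "failed: " + (conversion.log_message or "no log message")
```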