API Reference

Advanced Options

class datarobot.helpers.AdvancedOptions(weights: Optional[str] = None, response_cap: Union[bool, float, None] = None, blueprint_threshold: Optional[int] = None, seed: Optional[int] = None, smart_downsampled: Optional[bool] = None, majority_downsampling_rate: Optional[float] = None, offset: Optional[List[str]] = None, exposure: Optional[str] = None, accuracy_optimized_mb: Optional[bool] = None, scaleout_modeling_mode: Optional[str] = None, events_count: Optional[str] = None, monotonic_increasing_featurelist_id: Optional[str] = None, monotonic_decreasing_featurelist_id: Optional[str] = None, only_include_monotonic_blueprints: Optional[bool] = None, allowed_pairwise_interaction_groups: Optional[List[Tuple[str, ...]]] = None, blend_best_models: Optional[bool] = None, scoring_code_only: Optional[bool] = None, prepare_model_for_deployment: Optional[bool] = None, consider_blenders_in_recommendation: Optional[bool] = None, min_secondary_validation_model_count: Optional[int] = None, shap_only_mode: Optional[bool] = None, autopilot_data_sampling_method: Optional[str] = None, run_leakage_removed_feature_list: Optional[bool] = None, autopilot_with_feature_discovery: Optional[bool] = False, feature_discovery_supervised_feature_reduction: Optional[bool] = None, exponentially_weighted_moving_alpha: Optional[float] = None, external_time_series_baseline_dataset_id: Optional[str] = None, use_supervised_feature_reduction: Optional[bool] = True, primary_location_column: Optional[str] = None, protected_features: Optional[List[str]] = None, preferable_target_value: Optional[str] = None, fairness_metrics_set: Optional[str] = None, fairness_threshold: Optional[str] = None, bias_mitigation_feature_name: Optional[str] = None, bias_mitigation_technique: Optional[str] = None, include_bias_mitigation_feature_as_predictor_variable: Optional[bool] = None, default_monotonic_increasing_featurelist_id: Optional[str] = None, default_monotonic_decreasing_featurelist_id: Optional[str] = None)

Used when setting the target of a project to configure advanced options for the modeling process.

Parameters:
weights : string, optional

The name of a column indicating the weight of each row

response_cap : bool or float in [0.5, 1), optional

Defaults to None here, but the server defaults to False. If specified, it is the quantile of the response distribution to use for response capping.

blueprint_threshold : int, optional

Number of hours models are permitted to run before being excluded from later autopilot stages. Minimum 1.

seed : int, optional

a seed to use for randomization

smart_downsampled : bool, optional

whether to use smart downsampling to throw away excess rows of the majority class. Only applicable to classification and zero-boosted regression projects.

majority_downsampling_rate : float, optional

the percentage between 0 and 100 of the majority rows that should be kept. Specify only if using smart downsampling. May not cause the majority class to become smaller than the minority class.

offset : list of str, optional

(New in version v2.6) the list of the names of the columns containing the offset of each row

exposure : string, optional

(New in version v2.6) the name of a column containing the exposure of each row

accuracy_optimized_mb : bool, optional

(New in version v2.6) Include additional, longer-running models that will be run by the autopilot and available to run manually.

scaleout_modeling_mode : string, optional

(Deprecated in 2.28. Will be removed in 2.30) DataRobot no longer supports scaleout models. Please remove any usage of this parameter as it will be removed from the API soon.

events_count : string, optional

(New in version v2.8) the name of a column specifying events count.

monotonic_increasing_featurelist_id : string, optional

(new in version 2.11) the id of the featurelist that defines the set of features with a monotonically increasing relationship to the target. If None, no such constraints are enforced. When specified, this will set a default for the project that can be overridden at model submission time if desired.

monotonic_decreasing_featurelist_id : string, optional

(new in version 2.11) the id of the featurelist that defines the set of features with a monotonically decreasing relationship to the target. If None, no such constraints are enforced. When specified, this will set a default for the project that can be overridden at model submission time if desired.

only_include_monotonic_blueprints : bool, optional

(new in version 2.11) when true, only blueprints that support enforcing monotonic constraints will be available in the project or selected for the autopilot.

allowed_pairwise_interaction_groups : list of tuple, optional

(New in version v2.19) For GA2M models - specify groups of columns for which pairwise interactions will be allowed. E.g. if set to [(A, B, C), (C, D)] then GA2M models will allow interactions between columns AxB, BxC, AxC, CxD. All others (AxD, BxD) will not be considered.

blend_best_models: bool, optional

(New in version v2.19) blend best models during Autopilot run.

scoring_code_only: bool, optional

(New in version v2.19) Keep only models that can be converted to scorable Java code during Autopilot run.

shap_only_mode: bool, optional

(New in version v2.21) Keep only models that support SHAP values during Autopilot run. Use SHAP-based insights wherever possible. Defaults to False.

prepare_model_for_deployment: bool, optional

(New in version v2.19) Prepare model for deployment during Autopilot run. The preparation includes creating reduced feature list models, retraining best model on higher sample size, computing insights and assigning “RECOMMENDED FOR DEPLOYMENT” label.

consider_blenders_in_recommendation: bool, optional

(New in version 2.22.0) Include blenders when selecting a model to prepare for deployment in an Autopilot Run. Defaults to False.

min_secondary_validation_model_count: int, optional

(New in version v2.19) Compute “All backtest” scores (datetime models) or cross validation scores for the specified number of the highest ranking models on the Leaderboard, if over the Autopilot default.

autopilot_data_sampling_method: str, optional

(New in version v2.23) one of datarobot.enums.DATETIME_AUTOPILOT_DATA_SAMPLING_METHOD. Applicable for OTV projects only, defines if autopilot uses “random” or “latest” sampling when iteratively building models on various training samples. Defaults to “random” for duration-based projects and to “latest” for row-based projects.

run_leakage_removed_feature_list: bool, optional

(New in version v2.23) Run Autopilot on Leakage Removed feature list (if exists).

autopilot_with_feature_discovery: bool, default ``False``, optional

(New in version v2.23) If true, autopilot will run on a feature list that includes features found via search for interactions.

feature_discovery_supervised_feature_reduction: bool, optional

(New in version v2.23) Run supervised feature reduction for feature discovery projects.

exponentially_weighted_moving_alpha: float, optional

(New in version v2.26) defaults to None, value between 0 and 1 (inclusive), indicates alpha parameter used in exponentially weighted moving average within feature derivation window.

external_time_series_baseline_dataset_id: str, optional

(New in version v2.26) If provided, will generate metrics scaled by external model predictions metric for time series projects. The external predictions catalog must be validated before autopilot starts, see Project.validate_external_time_series_baseline and external baseline predictions documentation for further explanation.

use_supervised_feature_reduction: bool, default ``True``, optional

Time Series only. When true, during feature generation DataRobot runs a supervised algorithm to retain only qualifying features. Setting to false can severely impact autopilot duration, especially for datasets with many features.

primary_location_column: str, optional.

The name of primary location column.

protected_features: list of str, optional.

(New in version v2.24) A list of project features to mark as protected for Bias and Fairness testing calculations. Max number of protected features allowed is 10.

preferable_target_value: str, optional.

(New in version v2.24) A target value that should be treated as a favorable outcome for the prediction. For example, if we want to check gender discrimination for giving a loan and our target is named is_bad, then the positive outcome for the prediction would be No, which means that the loan is good and that’s what we treat as a favorable result for the loaner.

fairness_metrics_set: str, optional.

(New in version v2.24) Metric to use for calculating fairness. Can be one of proportionalParity, equalParity, predictionBalance, trueFavorableAndUnfavorableRateParity or favorableAndUnfavorablePredictiveValueParity. Used and required only if Bias & Fairness in AutoML feature is enabled.

fairness_threshold: str, optional.

(New in version v2.24) Threshold value for the fairness metric. Can be in a range of [0.0, 1.0]. If the relative (i.e. normalized) fairness score is below the threshold, then the user will see a visual indication on the

bias_mitigation_feature_name : str, optional

The feature from protected features that will be used in a bias mitigation task to mitigate bias

bias_mitigation_technique : str, optional

One of datarobot.enums.BiasMitigationTechnique. Options:

  • ‘preprocessingReweighing’
  • ‘postProcessingRejectionOptionBasedClassification’

The technique by which we’ll mitigate bias, which will inform which bias mitigation task we insert into blueprints.

include_bias_mitigation_feature_as_predictor_variable : bool, optional

Whether we should also use the mitigation feature as an input to the modeler just like any other categorical used for training, i.e. do we want the model to “train on” this feature in addition to using it for bias mitigation.

default_monotonic_increasing_featurelist_id : str, optional

Returned from server on Project GET request - not able to be updated by user

default_monotonic_decreasing_featurelist_id : str, optional

Returned from server on Project GET request - not able to be updated by user

Examples

import datarobot as dr
advanced_options = dr.AdvancedOptions(
    weights='weights_column',
    offset=['offset_column'],
    exposure='exposure_column',
    response_cap=0.7,
    blueprint_threshold=2,
    smart_downsampled=True, majority_downsampling_rate=75.0)
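
The pairwise rule described for allowed_pairwise_interaction_groups can be checked by hand. The following is a minimal sketch in plain Python, not DataRobot's implementation; the helper name allowed_pairs is hypothetical. It expands each group into its unordered column pairs, using the groups from the parameter description.

```python
from itertools import combinations


def allowed_pairs(groups):
    """Expand interaction groups into a set of unordered column pairs."""
    pairs = set()
    for group in groups:
        # Any two distinct columns that share a group may interact.
        for left, right in combinations(sorted(set(group)), 2):
            pairs.add(frozenset((left, right)))
    return pairs


# The groups from the parameter description: [(A, B, C), (C, D)].
pairs = allowed_pairs([("A", "B", "C"), ("C", "D")])

print(sorted("x".join(sorted(p)) for p in pairs))
# ['AxB', 'AxC', 'BxC', 'CxD']
```

AxD and BxD are absent because no single group contains both columns.
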
update_individual_options(**kwargs) → None

Update individual attributes of an instance of AdvancedOptions.

Anomaly Assessment

class datarobot.models.anomaly_assessment.AnomalyAssessmentRecord(status, status_details, start_date, end_date, prediction_threshold, preview_location, delete_location, latest_explanations_location, **record_kwargs)

Object which keeps metadata about anomaly assessment insight for the particular subset, backtest and series and the links to proceed to get the anomaly assessment data.

New in version v2.25.

Notes

Record contains:

  • record_id : the ID of the record.
  • project_id : the project ID of the record.
  • model_id : the model ID of the record.
  • backtest : the backtest of the record.
  • source : the source of the record.
  • series_id : the series id of the record for the multiseries projects.
  • status : the status of the insight.
  • status_details : the explanation of the status.
  • start_date : the ISO-formatted timestamp of the first prediction in the subset. Will be None if status is not AnomalyAssessmentStatus.COMPLETED.
  • end_date : the ISO-formatted timestamp of the last prediction in the subset. Will be None if status is not AnomalyAssessmentStatus.COMPLETED.
  • prediction_threshold : the threshold; SHAP explanations are computed for all rows with anomaly scores greater than or equal to it. Will be None if status is not AnomalyAssessmentStatus.COMPLETED.
  • preview_location : URL to retrieve predictions preview for the subset. Will be None if status is not AnomalyAssessmentStatus.COMPLETED.
  • latest_explanations_location : the URL to retrieve the latest predictions with the shap explanations. Will be None if status is not AnomalyAssessmentStatus.COMPLETED.
  • delete_location : the URL to delete anomaly assessment record and relevant insight data.
Attributes:
record_id: str

The ID of the record.

project_id: str

The ID of the project record belongs to.

model_id: str

The ID of the model record belongs to.

backtest: int or “holdout”

The backtest of the record.

source: “training” or “validation”

The source of the record

series_id: str or None

The series id of the record for the multiseries projects. Defined only for the multiseries projects.

status: str

The status of the insight. One of datarobot.enums.AnomalyAssessmentStatus

status_details: str

The explanation of the status.

start_date: str or None

See start_date info in Notes for more details.

end_date: str or None

See end_date info in Notes for more details.

prediction_threshold: float or None

See prediction_threshold info in Notes for more details.

preview_location: str or None

See preview_location info in Notes for more details.

latest_explanations_location: str or None

See latest_explanations_location info in Notes for more details.

delete_location: str

The URL to delete anomaly assessment record and relevant insight data.

classmethod list(project_id, model_id, backtest=None, source=None, series_id=None, limit=100, offset=0, with_data_only=False)

Retrieve the list of the anomaly assessment records for the project and model. Output can be filtered and limited.

Parameters:
project_id: str

The ID of the project record belongs to.

model_id: str

The ID of the model record belongs to.

backtest: int or “holdout”

The backtest to filter records by.

source: “training” or “validation”

The source to filter records by.

series_id: str, optional

The series id to filter records by. Can be specified for multiseries projects.

limit: int, optional

100 by default. At most this many results are returned.

offset: int, optional

This many results will be skipped.

with_data_only: bool, False by default

Filter by status == AnomalyAssessmentStatus.COMPLETED. If True, records with no data or not supported will be omitted.

Returns:
list of AnomalyAssessmentRecord

The anomaly assessment records.

classmethod compute(project_id, model_id, backtest, source, series_id=None)

Request anomaly assessment insight computation on the specified subset.

Parameters:
project_id: str

The ID of the project to compute insight for.

model_id: str

The ID of the model to compute insight for.

backtest: int or “holdout”

The backtest to compute insight for.

source: “training” or “validation”

The source to compute insight for.

series_id: str, optional

The series id to compute insight for. Required for multiseries projects.

Returns:
AnomalyAssessmentRecord

The anomaly assessment record.

delete()

Delete anomaly assessment record with preview and explanations.

get_predictions_preview()

Retrieve aggregated predictions statistics for the anomaly assessment record.

Returns:
AnomalyAssessmentPredictionsPreview
get_latest_explanations()

Retrieve latest predictions along with shap explanations for the most anomalous records.

Returns:
AnomalyAssessmentExplanations
get_explanations(start_date=None, end_date=None, points_count=None)

Retrieve predictions along with shap explanations for the most anomalous records in the specified date range/for defined number of points. Two out of three parameters: start_date, end_date or points_count must be specified.

Parameters:
start_date: str, optional

The start of the date range to get explanations in. Example: 2020-01-01T00:00:00.000000Z

end_date: str, optional

The end of the date range to get explanations in. Example: 2020-10-01T00:00:00.000000Z

points_count: int, optional

The number of the rows to return.

Returns:
AnomalyAssessmentExplanations
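
The "two out of three" rule for start_date, end_date and points_count can be checked before a request is made. This is a hedged sketch in plain Python; the helper explanation_query is hypothetical and is not part of the client. It reads the rule as "exactly two", which is an assumption: the text does not say how the server treats a call that supplies all three.

```python
def explanation_query(start_date=None, end_date=None, points_count=None):
    """Return the supplied arguments, or raise if the two-of-three rule is broken."""
    supplied = {
        name: value
        for name, value in (
            ("start_date", start_date),
            ("end_date", end_date),
            ("points_count", points_count),
        )
        if value is not None
    }
    # Assumption: exactly two of the three arguments must be present.
    if len(supplied) != 2:
        raise ValueError(
            "specify exactly two of start_date, end_date, points_count; "
            f"got {sorted(supplied)}"
        )
    return supplied


# A start date plus a row count defines the window.
query = explanation_query(
    start_date="2020-01-01T00:00:00.000000Z", points_count=50
)
```

The returned dict could then be passed on as keyword arguments to get_explanations.
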
get_explanations_data_in_regions(regions, prediction_threshold=0.0)

Get predictions along with explanations for the specified regions, sorted by predictions in descending order.

Parameters:
regions: list of preview_bins

For each region explanations will be retrieved and merged.

prediction_threshold: float, optional

If specified, only points with score greater or equal to the threshold will be returned.

Returns:
dict in a form of {‘explanations’: explanations, ‘shap_base_value’: shap_base_value}
class datarobot.models.anomaly_assessment.AnomalyAssessmentExplanations(shap_base_value, data, start_date, end_date, count, **record_kwargs)

Object which keeps predictions along with shap explanations for the most anomalous records in the specified date range/for defined number of points.

New in version v2.25.

Notes

AnomalyAssessmentExplanations contains:

  • record_id : the id of the corresponding anomaly assessment record.
  • project_id : the project ID of the corresponding anomaly assessment record.
  • model_id : the model ID of the corresponding anomaly assessment record.
  • backtest : the backtest of the corresponding anomaly assessment record.
  • source : the source of the corresponding anomaly assessment record.
  • series_id : the series id of the corresponding anomaly assessment record for the multiseries projects.
  • start_date : the ISO-formatted first timestamp in the response. Will be None if there is no data in the specified range.
  • end_date : the ISO-formatted last timestamp in the response. Will be None if there is no data in the specified range.
  • count : The number of points in the response.
  • shap_base_value : the shap base value.
  • data : list of DataPoint objects in the specified date range.

DataPoint contains:

  • shap_explanation : None or an array of up to 10 ShapleyFeatureContribution objects. Only rows with the highest anomaly scores have Shapley explanations calculated. Value is None if prediction is lower than prediction_threshold.
  • timestamp (str) : ISO-formatted timestamp for the row.
  • prediction (float) : The output of the model for this row.

ShapleyFeatureContribution contains:

  • feature_value (str) : the feature value for this row. First 50 characters are returned.
  • strength (float) : the shap value for this feature and row.
  • feature (str) : the feature name.
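
To make the nesting of DataPoint and ShapleyFeatureContribution concrete, here is a small sketch that picks the strongest contribution for each row. It assumes the objects can be read like plain dicts keyed by the field names listed above; the sample values are invented for illustration and the helper top_contribution is hypothetical.

```python
def top_contribution(data_point):
    """Return the contribution with the largest absolute strength, or None."""
    explanation = data_point.get("shap_explanation")
    if not explanation:
        # Rows below prediction_threshold carry no SHAP explanation.
        return None
    return max(explanation, key=lambda item: abs(item["strength"]))


# Invented sample data that mirrors the documented field names.
data = [
    {
        "timestamp": "2020-01-01T00:00:00.000000Z",
        "prediction": 0.91,
        "shap_explanation": [
            {"feature": "temperature", "feature_value": "71.3", "strength": 0.40},
            {"feature": "pressure", "feature_value": "1.2", "strength": -0.55},
        ],
    },
    {
        "timestamp": "2020-01-02T00:00:00.000000Z",
        "prediction": 0.05,
        "shap_explanation": None,
    },
]

summary = []
for point in data:
    top = top_contribution(point)
    summary.append((point["timestamp"], top["feature"] if top else None))
```

Ranking by absolute strength treats large negative and large positive SHAP values as equally important, which is a design choice of this sketch.
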
Attributes:
record_id: str

The ID of the record.

project_id: str

The ID of the project record belongs to.

model_id: str

The ID of the model record belongs to.

backtest: int or “holdout”

The backtest of the record.

source: “training” or “validation”

The source of the record.

series_id: str or None

The series id of the record for the multiseries projects. Defined only for the multiseries projects.

start_date: str or None

The ISO-formatted datetime of the first row in the data.

end_date: str or None

The ISO-formatted datetime of the last row in the data.

data: array of `data_point` objects or None

See data info in Notes for more details.

shap_base_value: float

Shap base value.

count: int

The number of points in the data.

classmethod get(project_id, record_id, start_date=None, end_date=None, points_count=None)

Retrieve predictions along with shap explanations for the most anomalous records in the specified date range/for defined number of points. Two out of three parameters: start_date, end_date or points_count must be specified.

Parameters:
project_id: str

The ID of the project.

record_id: str

The ID of the anomaly assessment record.

start_date: str, optional

The start of the date range to get explanations in. Example: 2020-01-01T00:00:00.000000Z

end_date: str, optional

The end of the date range to get explanations in. Example: 2020-10-01T00:00:00.000000Z

points_count: int, optional

The number of the rows to return.

Returns:
AnomalyAssessmentExplanations
class datarobot.models.anomaly_assessment.AnomalyAssessmentPredictionsPreview(start_date, end_date, preview_bins, **record_kwargs)

Aggregated predictions over time for the corresponding anomaly assessment record. Intended to find the bins with highest anomaly scores.

New in version v2.25.

Notes

AnomalyAssessmentPredictionsPreview contains:

  • record_id : the id of the corresponding anomaly assessment record.
  • project_id : the project ID of the corresponding anomaly assessment record.
  • model_id : the model ID of the corresponding anomaly assessment record.
  • backtest : the backtest of the corresponding anomaly assessment record.
  • source : the source of the corresponding anomaly assessment record.
  • series_id : the series id of the corresponding anomaly assessment record for the multiseries projects.
  • start_date : the ISO-formatted timestamp of the first prediction in the subset.
  • end_date : the ISO-formatted timestamp of the last prediction in the subset.
  • preview_bins : list of PreviewBin objects. The aggregated predictions for the subset. Bins boundaries may differ from actual start/end dates because this is an aggregation.

PreviewBin contains:

  • start_date (str) : the ISO-formatted datetime of the start of the bin.
  • end_date (str) : the ISO-formatted datetime of the end of the bin.
  • avg_predicted (float or None) : the average prediction of the model in the bin. None if there are no entries in the bin.
  • max_predicted (float or None) : the maximum prediction of the model in the bin. None if there are no entries in the bin.
  • frequency (int) : the number of the rows in the bin.
Attributes:
record_id: str

The ID of the record.

project_id: str

The ID of the project record belongs to.

model_id: str

The ID of the model record belongs to.

backtest: int or “holdout”

The backtest of the record.

source: “training” or “validation”

The source of the record

series_id: str or None

The series id of the record for the multiseries projects. Defined only for the multiseries projects.

start_date: str

the ISO-formatted timestamp of the first prediction in the subset.

end_date: str

the ISO-formatted timestamp of the last prediction in the subset.

preview_bins: list of preview_bin objects.

The aggregated predictions for the subset. See more info in Notes.

classmethod get(project_id, record_id)

Retrieve aggregated predictions over time.

Parameters:
project_id: str

The ID of the project.

record_id: str

The ID of the anomaly assessment record.

Returns:
AnomalyAssessmentPredictionsPreview
find_anomalous_regions(max_prediction_threshold=0.0)

Select the preview bins whose max_predicted value is greater than or equal to max_prediction_threshold, and sort the result by max_predicted in descending order.

Parameters:
max_prediction_threshold: float, optional

Return bins with maximum anomaly score greater or equal to max_prediction_threshold.

Returns:
preview_bins: list of preview_bin

Filtered and sorted preview bins
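
The filter-and-sort behaviour described for find_anomalous_regions can be sketched as follows. This is an illustration in plain Python rather than the client's code: bins are modelled as dicts with the PreviewBin field names, the sample values are invented, and skipping bins whose max_predicted is None is an assumption.

```python
def anomalous_regions(preview_bins, max_prediction_threshold=0.0):
    """Keep bins at or above the threshold, highest max_predicted first."""
    selected = [
        bin_
        for bin_ in preview_bins
        # Assumption: empty bins (max_predicted is None) are never anomalous.
        if bin_["max_predicted"] is not None
        and bin_["max_predicted"] >= max_prediction_threshold
    ]
    return sorted(selected, key=lambda bin_: bin_["max_predicted"], reverse=True)


# Invented sample bins.
bins = [
    {"start_date": "2020-01-01T00:00:00Z", "max_predicted": 0.30, "frequency": 12},
    {"start_date": "2020-01-02T00:00:00Z", "max_predicted": None, "frequency": 0},
    {"start_date": "2020-01-03T00:00:00Z", "max_predicted": 0.95, "frequency": 9},
    {"start_date": "2020-01-04T00:00:00Z", "max_predicted": 0.70, "frequency": 11},
]

regions = anomalous_regions(bins, max_prediction_threshold=0.5)
```

The resulting list has the shape that get_explanations_data_in_regions expects for its regions argument.
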

Batch Predictions

class datarobot.models.BatchPredictionJob(data: Dict[str, Any], completed_resource_url: Optional[str] = None)

A Batch Prediction Job is used to score large data sets on prediction servers using the Batch Prediction API.

Attributes:
id : str

the id of the job

classmethod score(deployment: DeploymentType, intake_settings: Optional[IntakeSettings] = None, output_settings: Optional[OutputSettings] = None, csv_settings: Optional[CsvSettings] = None, timeseries_settings: Optional[TimeSeriesSettings] = None, num_concurrent: Optional[int] = None, chunk_size: Optional[Union[int, str]] = None, passthrough_columns: Optional[List[str]] = None, passthrough_columns_set: Optional[str] = None, max_explanations: Optional[int] = None, max_ngram_explanations: Optional[Union[int, str]] = None, threshold_high: Optional[float] = None, threshold_low: Optional[float] = None, prediction_warning_enabled: Optional[bool] = None, include_prediction_status: bool = False, skip_drift_tracking: bool = False, prediction_instance: Optional[PredictionInstance] = None, abort_on_error: bool = True, column_names_remapping: Optional[Dict[str, str]] = None, include_probabilities: bool = True, include_probabilities_classes: Optional[List[str]] = None, download_timeout: Optional[int] = 120, download_read_timeout: Optional[int] = 660, upload_read_timeout: Optional[int] = 600, explanations_mode: Optional[PredictionExplanationsMode] = None) → BatchPredictionJob

Create a new batch prediction job, upload the scoring dataset, and return the job.

The default intake and output options are both localFile. This requires the caller to pass the file parameter and then either download the results with the download() method or pass a path to a file where the scored data will be written.

Returns:
BatchPredictionJob

Instance of BatchPredictionJob

Parameters:
deployment : Deployment or string ID

Deployment which will be used for scoring.

intake_settings : dict (optional)

A dict configuring where the data comes from. Supported options:

  • type : string, either localFile, s3, azure, gcp, dataset, jdbc, snowflake, synapse or bigquery

Note that to pass a dataset, you not only need to specify the type parameter as dataset, but you must also set the dataset parameter as a dr.Dataset object.

To score from a local file, add this parameter to the settings:

  • file : file-like object, string path to file or a pandas.DataFrame of scoring data

To score from S3, add the following parameters to the settings:

  • url : string, the URL to score (e.g.: s3://bucket/key)
  • credential_id : string (optional)
  • endpoint_url : string (optional), any non-default endpoint URL for S3 access (omit to use the default)

To score from JDBC, add the following parameters to the settings:

  • data_store_id : string, the ID of the external data store connected to the JDBC data source (see Database Connectivity).
  • query : string (optional if table, schema and/or catalog is specified), a self-supplied SELECT statement of the data set you wish to predict.
  • table : string (optional if query is specified), the name of specified database table.
  • schema : string (optional if query is specified), the name of specified database schema.
  • catalog : string (optional if query is specified), (new in v2.22) the name of specified database catalog.
  • fetch_size : int (optional), Changing the fetchSize can be used to balance throughput and memory usage.
  • credential_id : string (optional) the ID of the credentials holding information about a user with read-access to the JDBC data source (see Credentials).
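
Since intake_settings is a plain dict, the S3 and JDBC variants above can be written out directly. The sketch below is illustrative only: the IDs are placeholders, the table, schema and fetch_size values are invented, and the helper has_jdbc_source is hypothetical. It reads the optionality notes as "supply a query or a table", which is an interpretation rather than a documented rule.

```python
# S3 intake: only the URL is required by the list above.
s3_intake = {
    "type": "s3",
    "url": "s3://bucket/key",
    "credential_id": "<credential-id>",  # placeholder, optional
}

# JDBC intake that reads a whole table instead of a custom query.
jdbc_intake = {
    "type": "jdbc",
    "data_store_id": "<data-store-id>",  # placeholder
    "credential_id": "<credential-id>",  # placeholder, optional
    "table": "scoring_rows",             # hypothetical table name
    "schema": "public",                  # hypothetical schema name
    "fetch_size": 10000,                 # invented value
}


def has_jdbc_source(settings):
    """Assumed reading of the rule: a query or a table must be present."""
    return bool(settings.get("query") or settings.get("table"))


jdbc_ok = has_jdbc_source(jdbc_intake)
```

Either dict would be passed as the intake_settings argument of score.
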
output_settings : dict (optional)

A dict configuring how scored data is to be saved. Supported options:

  • type : string, either localFile, s3, azure, gcp, jdbc, snowflake, synapse or bigquery

To save scored data to a local file, add this parameter to the settings:

  • path : string (optional), path to save the scored data as CSV. If a path is not specified, you must download the scored data yourself with job.download(). If a path is specified, the call will block until the job is done. If there are no other jobs currently processing for the targeted prediction instance, uploading, scoring, and downloading will happen in parallel without waiting for a full job to complete. Otherwise, it will still block, but start downloading the scored data as soon as it starts generating data. This is the fastest method to get predictions.

To save scored data to S3, add the following parameters to the settings:

  • url : string, the URL for storing the results (e.g.: s3://bucket/key)
  • credential_id : string (optional)
  • endpoint_url : string (optional), any non-default endpoint URL for S3 access (omit to use the default)

To save scored data to JDBC, add the following parameters to the settings:

  • data_store_id : string, the ID of the external data store connected to the JDBC data source (see Database Connectivity).
  • table : string, the name of specified database table.
  • schema : string (optional), the name of specified database schema.
  • catalog : string (optional), (new in v2.22) the name of specified database catalog.
  • statement_type : string, the type of insertion statement to create, one of datarobot.enums.AVAILABLE_STATEMENT_TYPES.
  • update_columns : list(string) (optional), a list of strings containing those column names to be updated in case statement_type is set to a value related to update or upsert.
  • where_columns : list(string) (optional), a list of strings containing those column names to be selected in case statement_type is set to a value related to insert or update.
  • credential_id : string, the ID of the credentials holding information about a user with write-access to the JDBC data source (see Credentials).
  • create_table_if_not_exists : bool (optional), If no existing table is detected, attempt to create it before writing data with the strategy defined in the statementType parameter.
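
A JDBC output_settings dict has more required keys than the intake variant: every entry above that is not marked optional. The sketch below collects them in a hypothetical helper, missing_jdbc_output_keys, so a configuration can be checked locally. The IDs are placeholders, the table and schema names are invented, and "insert" is assumed to be a member of datarobot.enums.AVAILABLE_STATEMENT_TYPES.

```python
# Keys listed above without an "(optional)" marker.
REQUIRED_JDBC_OUTPUT_KEYS = (
    "data_store_id",
    "table",
    "statement_type",
    "credential_id",
)


def missing_jdbc_output_keys(settings):
    """List the required JDBC output keys that are absent from the settings."""
    return [key for key in REQUIRED_JDBC_OUTPUT_KEYS if key not in settings]


jdbc_output = {
    "type": "jdbc",
    "data_store_id": "<data-store-id>",  # placeholder
    "credential_id": "<credential-id>",  # placeholder
    "table": "scored_rows",              # hypothetical table name
    "schema": "public",                  # hypothetical schema name
    "statement_type": "insert",          # assumed valid statement type
    "create_table_if_not_exists": True,
}

missing = missing_jdbc_output_keys(jdbc_output)
```

An empty result means the dict names every required key; it does not confirm that the values are accepted by the server.
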
csv_settings : dict (optional)

CSV intake and output settings. Supported options:

  • delimiter : string (optional, default ,), fields are delimited by this character. Use the string tab to denote TSV (TAB separated values). Must be either a one-character string or the string tab.
  • quotechar : string (optional, default "), fields containing the delimiter must be quoted using this character.
  • encoding : string (optional, default utf-8), encoding for the CSV files. For example (but not limited to): shift_jis, latin_1 or mskanji.
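
The csv_settings options correspond closely to the arguments of Python's standard csv module, which makes it easy to preview how a file will be parsed. The sketch below is an illustration, not the server's parser: the helper rows_for is hypothetical, it translates the special string tab into a tab character, and it assumes a default quote character of a double quote.

```python
import csv
import io


def rows_for(csv_settings, text):
    """Parse text the way the given csv_settings describe it."""
    delimiter = csv_settings.get("delimiter", ",")
    if delimiter == "tab":
        # The settings use the word "tab" to denote TSV input.
        delimiter = "\t"
    quotechar = csv_settings.get("quotechar", '"')  # assumed default
    reader = csv.reader(io.StringIO(text), delimiter=delimiter, quotechar=quotechar)
    return list(reader)


# A quoted field may contain the delimiter itself.
tsv_rows = rows_for({"delimiter": "tab"}, 'id\tnote\n1\t"a\tb"\n')
csv_rows = rows_for({}, "id,note\n2,plain\n")
```

The encoding option would apply when opening the file, for example open(path, encoding=csv_settings.get("encoding", "utf-8")).
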
timeseries_settings : dict (optional)

Configuration for time-series scoring. Supported options:

  • type : string, must be forecast or historical (default if not passed is forecast). forecast mode makes predictions using forecast_point or rows in the dataset without target. historical enables bulk prediction mode which calculates predictions for all possible forecast points and forecast distances in the dataset within predictions_start_date/predictions_end_date range.
  • forecast_point : datetime (optional), forecast point for the dataset, used for the forecast predictions, by default value will be inferred from the dataset. May be passed if timeseries_settings.type=forecast.
  • predictions_start_date : datetime (optional), used for historical predictions in order to override date from which predictions should be calculated. By default value will be inferred automatically from the dataset. May be passed if timeseries_settings.type=historical.
  • predictions_end_date : datetime (optional), used for historical predictions in order to override the date up to which predictions should be calculated. By default value will be inferred automatically from the dataset. May be passed if timeseries_settings.type=historical.
  • relax_known_in_advance_features_check : bool, (default False). If True, missing values in the known in advance features are allowed in the forecast window at the prediction time. If omitted or False, missing values are not allowed.
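
The options above pair up with the two modes: forecast_point belongs to forecast, and the predictions_start_date/predictions_end_date range belongs to historical. A local check of that pairing might look like the sketch below. The helper timeseries_mode is hypothetical, and rejecting keys from the other mode is an assumption about how strictly the pairing is enforced.

```python
from datetime import datetime

# Options that only make sense in one mode, per the list above.
MODE_KEYS = {
    "forecast": {"forecast_point"},
    "historical": {"predictions_start_date", "predictions_end_date"},
}
COMMON_KEYS = {"type", "relax_known_in_advance_features_check"}


def timeseries_mode(settings):
    """Return the effective mode, raising on keys that belong to the other mode."""
    mode = settings.get("type", "forecast")  # forecast is the documented default
    if mode not in MODE_KEYS:
        raise ValueError(f"type must be forecast or historical, got {mode!r}")
    unexpected = set(settings) - COMMON_KEYS - MODE_KEYS[mode]
    if unexpected:
        raise ValueError(f"{sorted(unexpected)} not valid for type={mode}")
    return mode


forecast_settings = {"type": "forecast", "forecast_point": datetime(2020, 5, 1)}
historical_settings = {
    "type": "historical",
    "predictions_start_date": datetime(2020, 1, 1),
    "predictions_end_date": datetime(2020, 4, 1),
}

modes = [timeseries_mode(forecast_settings), timeseries_mode(historical_settings)]
```
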
num_concurrent : int (optional)

Number of concurrent chunks to score simultaneously. Defaults to the available number of cores of the deployment. Lower it to leave resources for real-time scoring.

chunk_size : string or int (optional)

Which strategy should be used to determine the chunk size. Can be either a named strategy or a fixed size in bytes.

  • auto: use fixed or dynamic based on flipper
  • fixed: use 1MB for explanations, 5MB for regular requests
  • dynamic: use dynamic chunk sizes
  • int: use this many bytes per chunk

passthrough_columns : list[string] (optional)

Keep these columns from the scoring dataset in the scored dataset. This is useful for correlating predictions with source data.

passthrough_columns_set : string (optional)

To pass through every column from the scoring dataset, set this to all. Takes precedence over passthrough_columns if set.
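
The precedence between the two passthrough options can be expressed in a few lines. This is a sketch of the documented rule only; the helper passthrough_for is hypothetical and the column names are invented.

```python
def passthrough_for(columns, passthrough_columns=None, passthrough_columns_set=None):
    """Return the input columns that would be copied to the scored output."""
    if passthrough_columns_set == "all":
        # The set option wins over any explicit list.
        return list(columns)
    wanted = set(passthrough_columns or [])
    return [name for name in columns if name in wanted]


columns = ["row_id", "region", "amount"]  # invented column names

only_id = passthrough_for(columns, passthrough_columns=["row_id"])
everything = passthrough_for(
    columns, passthrough_columns=["row_id"], passthrough_columns_set="all"
)
```
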

max_explanations : int (optional)

Compute prediction explanations for this number of features.

max_ngram_explanations : int or str (optional)

Compute text explanations for this number of ngrams. Set to all to return all ngram explanations, or set to a positive integer value to limit the number of ngram explanations returned. By default no ngram explanations will be computed and returned.

threshold_high : float (optional)

Only compute prediction explanations for predictions above this threshold. Can be combined with threshold_low.

threshold_low : float (optional)

Only compute prediction explanations for predictions below this threshold. Can be combined with threshold_high.
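How the two thresholds combine can be sketched as a plain filter. This is an illustration only: the function needs_explanation is hypothetical, the strict comparisons are an assumption, and so is the reading that a prediction qualifies when it is above threshold_high or below threshold_low.

```python
from typing import Optional


def needs_explanation(
    prediction: float,
    threshold_high: Optional[float] = None,
    threshold_low: Optional[float] = None,
) -> bool:
    """Hypothetical filter mirroring the documented threshold options."""
    if threshold_high is None and threshold_low is None:
        return True  # no thresholds set: every prediction qualifies
    above = threshold_high is not None and prediction > threshold_high
    below = threshold_low is not None and prediction < threshold_low
    # Assumption: when both thresholds are set, either condition is enough.
    return above or below


predictions = [0.05, 0.40, 0.95]
explained = [p for p in predictions if needs_explanation(p, 0.9, 0.1)]
print(explained)
```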

explanations_mode : PredictionExplanationsMode, optional

Mode of prediction explanations calculation for multiclass and clustering models. If not specified, the server default is to explain only the predicted class, which is identical to passing TopPredictionsMode(1).

prediction_warning_enabled : boolean (optional)

Add prediction warnings to the scored data. Currently only supported for regression models.

include_prediction_status : boolean (optional)

Include the prediction_status column in the output, defaults to False.

skip_drift_tracking : boolean (optional)

Skips drift tracking on any predictions made from this job. This is useful when running non-production workloads, so that they do not affect drift tracking or cause unnecessary alerts. Defaults to False.

prediction_instance : dict (optional)

Defaults to instance specified by deployment or system configuration. Supported options:

  • hostName : string
  • sslEnabled : boolean (optional, default true). Set to false to run prediction requests from the batch prediction job without SSL.
  • datarobotKey : string (optional), if running a job against a prediction instance in the Managed AI Cloud, you must provide the organization level DataRobot-Key
  • apiKey : string (optional), by default, prediction requests will use the API key of the user that created the job. This allows you to make requests on behalf of other users.
abort_on_error : boolean (optional)

Default behavior is to abort the job if too many rows fail scoring. This will free up resources for other jobs that may score successfully. Set to false to unconditionally score every row no matter how many errors are encountered. Defaults to True.

column_names_remapping : dict (optional)

Mapping with column renaming for output table. Defaults to {}.

include_probabilities : boolean (optional)

Flag that enables returning of all probability columns. Defaults to True.

include_probabilities_classes : list (optional)

List the subset of classes if a user doesn’t want all the classes. Defaults to [].

download_timeout : int (optional)

New in version 2.22.

If using localFile output, wait this many seconds for the download to become available. See download().

download_read_timeout : int (optional, default 660)

New in version 2.22.

If using localFile output, wait this many seconds for the server to respond between chunks.

upload_read_timeout: int (optional, default 600)

New in version 2.28.

If using localFile intake, wait this many seconds for the server to respond after whole dataset upload.

classmethod apply_time_series_data_prep_and_score(deployment: Deployment, intake_settings: IntakeSettings, timeseries_settings: TimeSeriesSettings, **kwargs) → BatchPredictionJob

Prepare the dataset with time series data prep, create new batch prediction job, upload the scoring dataset, and return a batch prediction job.

The supported intake_settings are of type localFile or dataset.

For timeseries_settings of type forecast the forecast_point must be specified.

Refer to the datarobot.models.BatchPredictionJob.score() method for details on the other kwargs parameters.

New in version v3.1.

Returns:
BatchPredictionJob

Instance of BatchPredictionJob

Raises:
InvalidUsageError

If the deployment does not support time series data prep. If the intake type is not supported for time series data prep.

Attributes:
deployment : Deployment

Deployment which will be used for scoring.

intake_settings : dict

A dict configuring where data is coming from. Supported options:

  • type : string, either localFile, dataset

Note that to pass a dataset, you not only need to specify the type parameter as dataset, but you must also set the dataset parameter as a Dataset object.

To score from a local file, add this parameter to the settings:

  • file : file-like object, string path to file or a pandas.DataFrame of scoring data.
timeseries_settings : dict

Configuration for time-series scoring. Supported options:

  • type : string, must be forecast or historical (default if not passed is forecast). forecast mode makes predictions using forecast_point. historical enables bulk prediction mode which calculates predictions for all possible forecast points and forecast distances in the dataset within predictions_start_date/predictions_end_date range.
  • forecast_point : datetime (optional), forecast point for the dataset, used for the forecast predictions. Must be passed if timeseries_settings.type=forecast.
  • predictions_start_date : datetime (optional), used for historical predictions in order to override date from which predictions should be calculated. By default value will be inferred automatically from the dataset. May be passed if timeseries_settings.type=historical.
  • predictions_end_date : datetime (optional), used for historical predictions in order to override the date up to which predictions should be calculated. By default the value will be inferred automatically from the dataset. May be passed if timeseries_settings.type=historical.
  • relax_known_in_advance_features_check : bool, (default False). If True, missing values in the known in advance features are allowed in the forecast window at the prediction time. If omitted or False, missing values are not allowed.
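The options above can be assembled into a settings dict before calling the method. The following is a minimal sketch under stated assumptions: build_timeseries_settings is a hypothetical helper (not part of the datarobot package), and it only enforces the rule stated above that forecast_point is required for the forecast type.

```python
from datetime import datetime
from typing import Any, Dict, Optional


def build_timeseries_settings(
    mode: str = "forecast",
    forecast_point: Optional[datetime] = None,
    predictions_start_date: Optional[datetime] = None,
    predictions_end_date: Optional[datetime] = None,
    relax_known_in_advance_features_check: bool = False,
) -> Dict[str, Any]:
    """Hypothetical helper that assembles a timeseries_settings dict."""
    if mode not in ("forecast", "historical"):
        raise ValueError("type must be 'forecast' or 'historical'")
    settings: Dict[str, Any] = {
        "type": mode,
        "relax_known_in_advance_features_check": relax_known_in_advance_features_check,
    }
    if mode == "forecast":
        if forecast_point is None:
            raise ValueError("forecast_point is required when type is 'forecast'")
        settings["forecast_point"] = forecast_point
    else:
        # Both dates are optional; omitted ones are inferred from the dataset.
        if predictions_start_date is not None:
            settings["predictions_start_date"] = predictions_start_date
        if predictions_end_date is not None:
            settings["predictions_end_date"] = predictions_end_date
    return settings


forecast = build_timeseries_settings(forecast_point=datetime(2024, 1, 31))
historical = build_timeseries_settings(
    mode="historical", predictions_start_date=datetime(2023, 1, 1)
)
print(forecast)
print(historical)
```

The resulting dict would then be passed as timeseries_settings; the dates used here are placeholders.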
classmethod score_to_file(deployment: DeploymentType, intake_path, output_path: str, **kwargs)

Create new batch prediction job, upload the scoring dataset and download the scored CSV file concurrently.

Will block until the entire file is scored.

Refer to the datarobot.models.BatchPredictionJob.score() method for details on the other kwargs parameters.

Returns:
BatchPredictionJob

Instance of BatchPredictionJob

Attributes:
deployment : Deployment or string ID

Deployment which will be used for scoring.

intake_path : file-like object/string path to file/pandas.DataFrame

Scoring data

output_path : str

Filename to save the result under

classmethod apply_time_series_data_prep_and_score_to_file(deployment: Deployment, intake_path: Union[str, pd.DataFrame, io.IOBase], output_path: str, timeseries_settings: TimeSeriesSettings, **kwargs) → BatchPredictionJob

Prepare the input dataset with time series data prep. Then, create a new batch prediction job using the prepared AI catalog item as input and concurrently download the scored CSV file.

The function call will return when the entire file is scored.

For timeseries_settings of type forecast the forecast_point must be specified.

Refer to the datarobot.models.BatchPredictionJob.score() method for details on the other kwargs parameters.

New in version v3.1.

Returns:
BatchPredictionJob

Instance of BatchPredictionJob.

Raises:
InvalidUsageError

If the deployment does not support time series data prep.

Attributes:
deployment : Deployment

The deployment which will be used for scoring.

intake_path : file-like object/string path to file/pandas.DataFrame

The scoring data.

output_path : str

The filename under which you save the result.

timeseries_settings : dict

Configuration for time-series scoring. Supported options:

  • type : string, must be forecast or historical (default if not passed is forecast). forecast mode makes predictions using forecast_point. historical enables bulk prediction mode which calculates predictions for all possible forecast points and forecast distances in the dataset within predictions_start_date/predictions_end_date range.
  • forecast_point : datetime (optional), forecast point for the dataset, used for the forecast predictions. Must be passed if timeseries_settings.type=forecast.
  • predictions_start_date : datetime (optional), used for historical predictions in order to override date from which predictions should be calculated. By default value will be inferred automatically from the dataset. May be passed if timeseries_settings.type=historical.
  • predictions_end_date : datetime (optional), used for historical predictions in order to override the date up to which predictions should be calculated. By default the value will be inferred automatically from the dataset. May be passed if timeseries_settings.type=historical.
  • relax_known_in_advance_features_check : bool, (default False). If True, missing values in the known in advance features are allowed in the forecast window at the prediction time. If omitted or False, missing values are not allowed.
classmethod score_s3(deployment: DeploymentType, source_url: str, destination_url: str, credential=None, endpoint_url: Optional[str] = None, **kwargs)

Create new batch prediction job, with a scoring dataset from S3 and writing the result back to S3.

This returns immediately after the job has been created. You must poll for job completion using get_status() or wait_for_completion() (see datarobot.models.Job)
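Polling with get_status() might look like the sketch below. It is written against a plain callable so it runs without a server; the "status" key and the terminal status names are assumptions, and poll_until_done is a hypothetical helper rather than a library function. In most cases wait_for_completion() is the simpler choice.

```python
import time
from typing import Any, Callable, Dict

# Assumption: illustrative terminal status names; confirm them for your version.
TERMINAL_STATUSES = {"COMPLETED", "ABORTED", "FAILED"}


def poll_until_done(
    get_status: Callable[[], Dict[str, Any]],
    poll_interval: float = 5.0,
    max_wait: float = 600.0,
    sleep: Callable[[float], None] = time.sleep,
) -> Dict[str, Any]:
    """Call get_status() until it reports a terminal status or max_wait passes."""
    waited = 0.0
    while True:
        status = get_status()
        # Assumption: the status dict exposes the state under a "status" key.
        if status.get("status") in TERMINAL_STATUSES:
            return status
        if waited >= max_wait:
            raise TimeoutError(f"job still running after {max_wait} seconds")
        sleep(poll_interval)
        waited += poll_interval


# Stand-in for job.get_status so the sketch runs without a server.
fake_statuses = iter(
    [{"status": "INITIALIZING"}, {"status": "RUNNING"}, {"status": "COMPLETED"}]
)
final = poll_until_done(lambda: next(fake_statuses), sleep=lambda seconds: None)
print(final["status"])
```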

Refer to the datarobot.models.BatchPredictionJob.score() method for details on the other kwargs parameters.

Returns:
BatchPredictionJob

Instance of BatchPredictionJob

Attributes:
deployment : Deployment or string ID

Deployment which will be used for scoring.

source_url : string

The URL for the prediction dataset (e.g.: s3://bucket/key)

destination_url : string

The URL for the scored dataset (e.g.: s3://bucket/key)

credential : string or Credential (optional)

The AWS Credential object or credential id

endpoint_url : string (optional)

Any non-default endpoint URL for S3 access (omit to use the default)

classmethod score_azure(deployment: DeploymentType, source_url: str, destination_url: str, credential=None, **kwargs)

Create new batch prediction job, with a scoring dataset from Azure blob storage and writing the result back to Azure blob storage.

This returns immediately after the job has been created. You must poll for job completion using get_status() or wait_for_completion() (see datarobot.models.Job).

Refer to the datarobot.models.BatchPredictionJob.score() method for details on the other kwargs parameters.

Returns:
BatchPredictionJob

Instance of BatchPredictionJob

Attributes:
deployment : Deployment or string ID

Deployment which will be used for scoring.

source_url : string

The URL for the prediction dataset (e.g.: https://storage_account.blob.endpoint/container/blob_name)

destination_url : string

The URL for the scored dataset (e.g.: https://storage_account.blob.endpoint/container/blob_name)

credential : string or Credential (optional)

The Azure Credential object or credential id

classmethod score_gcp(deployment: DeploymentType, source_url: str, destination_url: str, credential=None, **kwargs)

Create new batch prediction job, with a scoring dataset from Google Cloud Storage and writing the result back to Google Cloud Storage.

This returns immediately after the job has been created. You must poll for job completion using get_status() or wait_for_completion() (see datarobot.models.Job).

Refer to the datarobot.models.BatchPredictionJob.score() method for details on the other kwargs parameters.

Returns:
BatchPredictionJob

Instance of BatchPredictionJob

Attributes:
deployment : Deployment or string ID

Deployment which will be used for scoring.

source_url : string

The URL for the prediction dataset (e.g.: http(s)://storage.googleapis.com/[bucket]/[object])

destination_url : string

The URL for the scored dataset (e.g.: http(s)://storage.googleapis.com/[bucket]/[object])

credential : string or Credential (optional)

The GCP Credential object or credential id

classmethod score_from_existing(batch_prediction_job_id: str) → datarobot.models.batch_prediction_job.BatchPredictionJob

Create a new batch prediction job based on the settings from a previously created one

Returns:
BatchPredictionJob

Instance of BatchPredictionJob

Attributes:
batch_prediction_job_id: str

ID of the previous batch prediction job

classmethod score_pandas(deployment: DeploymentType, df: pd.DataFrame, read_timeout: int = 660, **kwargs) → Tuple[BatchPredictionJob, pd.DataFrame]

Run a batch prediction job, with a scoring dataset from a pandas dataframe. The output from the prediction will be joined to the passed DataFrame and returned.

Use column_names_remapping to drop or rename columns in the output.

This method blocks until the job has completed or raises an exception on errors.

Refer to the datarobot.models.BatchPredictionJob.score() method for details on the other kwargs parameters.

Returns:
BatchPredictionJob

Instance of BatchPredictionJob

pandas.DataFrame

The original dataframe merged with the predictions

Attributes:
deployment : Deployment or string ID

Deployment which will be used for scoring.

df : pandas.DataFrame

The dataframe to score

classmethod get(batch_prediction_job_id: str) → datarobot.models.batch_prediction_job.BatchPredictionJob

Get batch prediction job

Returns:
BatchPredictionJob

Instance of BatchPredictionJob

Attributes:
batch_prediction_job_id: str

ID of batch prediction job

download(fileobj, timeout: int = 120, read_timeout: int = 660) → None

Downloads the CSV result of a prediction job

Attributes:
fileobj: file-like object

Write CSV data to this file-like object

timeout : int (optional, default 120)

New in version 2.22.

Seconds to wait for the download to become available.

The download will not be available before the job has started processing. In case other jobs are occupying the queue, processing may not start immediately.

If the timeout is reached, the job will be aborted and RuntimeError is raised.

Set to -1 to wait infinitely.

read_timeout : int (optional, default 660)

New in version 2.22.

Seconds to wait for the server to respond between chunks.

delete(ignore_404_errors: bool = False) → None

Cancel this job. If this job has not finished running, it will be removed and canceled.

get_status()

Get status of batch prediction job

Returns:
BatchPredictionJob status data

Dict with job status

classmethod list_by_status(statuses: Optional[List[str]] = None) → List[datarobot.models.batch_prediction_job.BatchPredictionJob]

Get jobs collection for specific set of statuses

Returns:
BatchPredictionJob statuses

List of job statuses dicts with specific statuses

Attributes:
statuses

List of statuses to filter jobs by ([ABORTED|COMPLETED…]). If statuses is not provided, all jobs for the user are returned.

class datarobot.models.BatchPredictionJobDefinition(id: Optional[str] = None, name: Optional[str] = None, enabled: Optional[bool] = None, schedule: Optional[Schedule] = None, batch_prediction_job=None, created: Optional[str] = None, updated: Optional[str] = None, created_by=None, updated_by=None, last_failed_run_time: Optional[str] = None, last_successful_run_time: Optional[str] = None, last_started_job_status: Optional[str] = None, last_scheduled_run_time: Optional[str] = None)
classmethod get(batch_prediction_job_definition_id: str) → datarobot.models.batch_prediction_job.BatchPredictionJobDefinition

Get batch prediction job definition

Returns:
BatchPredictionJobDefinition

Instance of BatchPredictionJobDefinition

Examples

>>> import datarobot as dr
>>> definition = dr.BatchPredictionJobDefinition.get('60912e09fd1f04e832a575c1')
>>> definition
BatchPredictionJobDefinition(60912e09fd1f04e832a575c1)
Attributes:
batch_prediction_job_definition_id: str

ID of batch prediction job definition

classmethod list() → List[datarobot.models.batch_prediction_job.BatchPredictionJobDefinition]

Get all job definitions

Returns:
List[BatchPredictionJobDefinition]

List of job definitions the user has access to see

Examples

>>> import datarobot as dr
>>> definition = dr.BatchPredictionJobDefinition.list()
>>> definition
[
    BatchPredictionJobDefinition(60912e09fd1f04e832a575c1),
    BatchPredictionJobDefinition(6086ba053f3ef731e81af3ca)
]
classmethod create(enabled: bool, batch_prediction_job, name: Optional[str] = None, schedule: Optional[Schedule] = None) → BatchPredictionJobDefinition

Creates a new batch prediction job definition to be run either at scheduled interval or as a manual run.

Returns:
BatchPredictionJobDefinition

Instance of BatchPredictionJobDefinition

Examples

>>> import datarobot as dr
>>> job_spec = {
...    "num_concurrent": 4,
...    "deployment_id": "foobar",
...    "intake_settings": {
...        "url": "s3://foobar/123",
...        "type": "s3",
...        "format": "csv"
...    },
...    "output_settings": {
...        "url": "s3://foobar/123",
...        "type": "s3",
...        "format": "csv"
...    },
...}
>>> schedule = {
...    "day_of_week": [
...        1
...    ],
...    "month": [
...        "*"
...    ],
...    "hour": [
...        16
...    ],
...    "minute": [
...        0
...    ],
...    "day_of_month": [
...        1
...    ]
...}
>>> definition = dr.BatchPredictionJobDefinition.create(
...    enabled=False,
...    batch_prediction_job=job_spec,
...    name="some_definition_name",
...    schedule=schedule
... )
>>> definition
BatchPredictionJobDefinition(60912e09fd1f04e832a575c1)
Attributes:
enabled : bool (default False)

Whether or not the definition should be active on a scheduled basis. If True, schedule is required.

batch_prediction_job: dict

The job specifications for your batch prediction job. It takes the same job input parameters as score(), but instead of starting a scoring job it only stores the specification as a definition for later use.

name : string (optional)

The name you want your job to be identified with. Must be unique across the organization’s existing jobs. If you don’t supply a name, a random one will be generated for you.

schedule : dict (optional)

The schedule payload defines at what intervals the job should run, which can be combined in various ways to construct complex scheduling terms if needed. In all of the elements in the objects, you can supply either an asterisk ["*"] denoting “every” time denomination or an array of integers (e.g. [1, 2, 3]) to define a specific interval.

The schedule payload is split up in the following items:

Minute: The minute(s) of the day that the job will run. Allowed values are either ["*"] meaning every minute of the day or [0 ... 59].

Hour: The hour(s) of the day that the job will run. Allowed values are either ["*"] meaning every hour of the day or [0 ... 23].

Day of Month: The date(s) of the month that the job will run. Allowed values are either [1 ... 31] or ["*"] for all days of the month. This field is additive with dayOfWeek, meaning the job will run both on the date(s) defined in this field and the day specified by dayOfWeek (for example, dates 1st, 2nd, 3rd, plus every Tuesday). If dayOfMonth is set to ["*"] and dayOfWeek is defined, the scheduler will trigger on every day of the month that matches dayOfWeek (for example, Tuesday the 2nd, 9th, 16th, 23rd, 30th). Invalid dates such as February 31st are ignored.

Month: The month(s) of the year that the job will run. Allowed values are either [1 ... 12] or ["*"] for all months of the year. Strings, either 3-letter abbreviations or the full name of the month, can be used interchangeably (e.g., “jan” or “october”). Months that are not compatible with dayOfMonth are ignored, for example {"dayOfMonth": [31], "month":["feb"]}

Day of Week: The day(s) of the week that the job will run. Allowed values are [0 ... 6], where Sunday=0, or ["*"] for all days of the week. Strings, either 3-letter abbreviations or the full name of the day, can be used interchangeably (e.g., “sunday”, “Sunday”, “sun”, or “Sun” all map to [0]). This field is additive with dayOfMonth, meaning the job will run both on the date specified by dayOfMonth and the day defined in this field.
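The additive behavior of the day fields is easier to see in code. The sketch below is not the scheduler's implementation: schedule_matches is a hypothetical function, it accepts integer values and ["*"] only (no month or day names), and the case where day_of_week is ["*"] while day_of_month is restricted follows cron conventions as an assumption, since the text above does not spell it out.

```python
from datetime import datetime
from typing import Dict, List, Union

Schedule = Dict[str, List[Union[int, str]]]


def _is_wildcard(values: List[Union[int, str]]) -> bool:
    return list(values) == ["*"]


def _matches(values: List[Union[int, str]], current: int) -> bool:
    return _is_wildcard(values) or current in values


def schedule_matches(schedule: Schedule, when: datetime) -> bool:
    """Sketch of the documented rules; integer values and ["*"] only."""
    if not (
        _matches(schedule["minute"], when.minute)
        and _matches(schedule["hour"], when.hour)
        and _matches(schedule["month"], when.month)
    ):
        return False
    day_of_month = schedule["day_of_month"]
    day_of_week = schedule["day_of_week"]
    weekday = when.isoweekday() % 7  # Sunday=0 ... Saturday=6
    if _is_wildcard(day_of_month):
        # Every date is allowed, so only day_of_week restricts the run.
        return _matches(day_of_week, weekday)
    if _is_wildcard(day_of_week):
        # Assumption: mirrors cron; only day_of_month restricts the run.
        return when.day in day_of_month
    # Both restricted: the fields are additive, so either one is enough.
    return when.day in day_of_month or weekday in day_of_week


schedule = {
    "minute": [0],
    "hour": [16],
    "month": ["*"],
    "day_of_month": [1],
    "day_of_week": [1],
}
print(schedule_matches(schedule, datetime(2024, 2, 1, 16, 0)))  # Thursday the 1st
print(schedule_matches(schedule, datetime(2024, 1, 8, 16, 0)))  # a Monday
print(schedule_matches(schedule, datetime(2024, 1, 9, 16, 0)))  # a Tuesday
```

With this schedule the job would run at 16:00 on the 1st of every month and also at 16:00 every Monday.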

update(enabled: bool, batch_prediction_job=None, name: Optional[str] = None, schedule: Optional[Schedule] = None) → BatchPredictionJobDefinition

Updates a job definition with the changed specs.

Takes the same input as create()

Returns:
BatchPredictionJobDefinition

Instance of the updated BatchPredictionJobDefinition

Examples

>>> import datarobot as dr
>>> job_spec = {
...    "num_concurrent": 5,
...    "deployment_id": "foobar_new",
...    "intake_settings": {
...        "url": "s3://foobar/123",
...        "type": "s3",
...        "format": "csv"
...    },
...    "output_settings": {
...        "url": "s3://foobar/123",
...        "type": "s3",
...        "format": "csv"
...    },
...}
>>> schedule = {
...    "day_of_week": [
...        1
...    ],
...    "month": [
...        "*"
...    ],
...    "hour": [
...        "*"
...    ],
...    "minute": [
...        30, 59
...    ],
...    "day_of_month": [
...        1, 2, 6
...    ]
...}
>>> definition = dr.BatchPredictionJobDefinition.get('60912e09fd1f04e832a575c1')
>>> definition = definition.update(
...    enabled=False,
...    batch_prediction_job=job_spec,
...    name="updated_definition_name",
...    schedule=schedule
... )
>>> definition
BatchPredictionJobDefinition(60912e09fd1f04e832a575c1)
Attributes:
enabled : bool (default False)

Same as enabled in create().

batch_prediction_job: dict

Same as batch_prediction_job in create().

name : string (optional)

Same as name in create().

schedule : dict

Same as schedule in create().

run_on_schedule(schedule: Schedule) → BatchPredictionJobDefinition

Sets the run schedule of an already created job definition.

If the job was previously not enabled, this will also set the job to enabled.

Returns:
BatchPredictionJobDefinition

Instance of the updated BatchPredictionJobDefinition with the new / updated schedule.

Examples

>>> import datarobot as dr
>>> definition = dr.BatchPredictionJobDefinition.create('...')
>>> schedule = {
...    "day_of_week": [
...        1
...    ],
...    "month": [
...        "*"
...    ],
...    "hour": [
...        "*"
...    ],
...    "minute": [
...        30, 59
...    ],
...    "day_of_month": [
...        1, 2, 6
...    ]
...}
>>> definition.run_on_schedule(schedule)
BatchPredictionJobDefinition(60912e09fd1f04e832a575c1)
Attributes:
schedule : dict

Same as schedule in create().

run_once() → datarobot.models.batch_prediction_job.BatchPredictionJob

Manually submits a batch prediction job to the queue, based on an already created job definition.

Returns:
BatchPredictionJob

Instance of BatchPredictionJob

Examples

>>> import datarobot as dr
>>> definition = dr.BatchPredictionJobDefinition.create('...')
>>> job = definition.run_once()
>>> job.wait_for_completion()
delete() → None

Deletes the job definition and disables any future scheduled runs of this job. A scheduled job that is currently running will not be cancelled.

Examples

>>> import datarobot as dr
>>> definition = dr.BatchPredictionJobDefinition.get('5a8ac9ab07a57a0001be501f')
>>> definition.delete()

Blueprint

class datarobot.models.Blueprint(id: Optional[str] = None, processes: Optional[List[str]] = None, model_type: Optional[str] = None, project_id: Optional[str] = None, blueprint_category: Optional[str] = None, monotonic_increasing_featurelist_id: Optional[str] = None, monotonic_decreasing_featurelist_id: Optional[str] = None, supports_monotonic_constraints: Optional[bool] = None, recommended_featurelist_id: Optional[str] = None, supports_composable_ml: Optional[bool] = None)

A Blueprint which can be used to fit models

Attributes:
id : str

the id of the blueprint

processes : list of str

the processes used by the blueprint

model_type : str

the model produced by the blueprint

project_id : str

the project the blueprint belongs to

blueprint_category : str

(New in version v2.6) Describes the category of the blueprint and the kind of model it produces.

recommended_featurelist_id: str or null

(New in v2.18) The ID of the feature list recommended for this blueprint. If this field is not present, then there is no recommended feature list.

supports_composable_ml : bool or None

(New in version v2.26) Whether this blueprint is supported in Composable ML.

classmethod get(project_id: str, blueprint_id: str) → datarobot.models.blueprint.Blueprint

Retrieve a blueprint.

Parameters:
project_id : str

The project’s id.

blueprint_id : str

Id of blueprint to retrieve.

Returns:
blueprint : Blueprint

The queried blueprint.

get_chart() → datarobot.models.blueprint.BlueprintChart

Retrieve a chart.

Returns:
BlueprintChart

The current blueprint chart.

get_documents() → List[datarobot.models.blueprint.BlueprintTaskDocument]

Get documentation for tasks used in the blueprint.

Returns:
list of BlueprintTaskDocument

All documents available for blueprint.

classmethod from_data(data: Union[Dict[str, Any], List[Dict[str, Any]]]) → T

Instantiate an object of this class using a dict.

Parameters:
data : dict

Correctly snake_cased keys and their values.

classmethod from_server_data(data: Union[Dict[str, Any], List[Dict[str, Any]]], keep_attrs: Optional[Iterable[str]] = None) → T

Instantiate an object of this class using the data directly from the server, meaning that the keys may have the wrong camel casing

Parameters:
data : dict

The directly translated dict of JSON from the server. No casing fixes have taken place

keep_attrs : iterable

List, set or tuple of the dotted namespace notations for attributes to keep within the object structure even if their values are None
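The casing difference between from_data and from_server_data can be illustrated with a small key converter. This is a sketch of the general idea only, not the library's conversion code: to_snake_case and snake_case_keys are hypothetical helpers, and the simple regular expression does not handle runs of capitals such as "ML".

```python
import re
from typing import Any


def to_snake_case(name: str) -> str:
    """Insert an underscore before each inner capital letter, then lowercase."""
    return re.sub(r"(?<!^)(?=[A-Z])", "_", name).lower()


def snake_case_keys(data: Any) -> Any:
    """Recursively convert dict keys from camelCase to snake_case."""
    if isinstance(data, dict):
        return {to_snake_case(key): snake_case_keys(value) for key, value in data.items()}
    if isinstance(data, list):
        return [snake_case_keys(item) for item in data]
    return data


# Made-up payload shaped like a server response.
server_payload = {"id": "abc", "projectId": "123", "modelType": "Example Model"}
converted = snake_case_keys(server_payload)
print(converted)
```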

class datarobot.models.BlueprintTaskDocument(title: Optional[str] = None, task: Optional[str] = None, description: Optional[str] = None, parameters: Optional[List[ParameterType]] = None, links: Optional[List[LinkType]] = None, references: Optional[List[ReferenceType]] = None)

Document describing a task from a blueprint.

Attributes:
title : str

Title of document.

task : str

Name of the task described in document.

description : str

Task description.

parameters : list of dict(name, type, description)

Parameters that task can receive in human-readable format.

links : list of dict(name, url)

External links used in document

references : list of dict(name, url)

References used in document. When no link is available, url is None.

class datarobot.models.BlueprintChart(nodes: List[Dict[str, str]], edges: List[Tuple[str, str]])

A Blueprint chart that can be used to understand data flow in blueprint.

Attributes:
nodes : list of dict (id, label)

Chart nodes, id unique in chart.

edges : list of tuple (id1, id2)

Directions of data flow between blueprint chart nodes.

classmethod get(project_id: str, blueprint_id: str) → datarobot.models.blueprint.BlueprintChart

Retrieve a blueprint chart.

Parameters:
project_id : str

The project’s id.

blueprint_id : str

Id of blueprint to retrieve chart.

Returns:
BlueprintChart

The queried blueprint chart.

to_graphviz() → str

Get blueprint chart in graphviz DOT format.

Returns:
unicode

String representation of chart in graphviz DOT language.
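To show what a DOT rendering of nodes and edges looks like, here is a minimal sketch that builds one from the two attributes described above. It is an assumption-laden illustration: chart_to_dot is a hypothetical function, the node labels are made up, and the exact text returned by to_graphviz() may differ.

```python
from typing import Dict, List, Tuple


def _quote(text: str) -> str:
    """Wrap text in double quotes, escaping backslashes and quotes for DOT."""
    return '"' + text.replace("\\", "\\\\").replace('"', '\\"') + '"'


def chart_to_dot(nodes: List[Dict[str, str]], edges: List[Tuple[str, str]]) -> str:
    """Render nodes (id, label) and directed edges (id1, id2) as a DOT digraph."""
    lines = ["digraph blueprint {"]
    for node in nodes:
        lines.append(f"    {_quote(node['id'])} [label={_quote(node['label'])}];")
    for source, target in edges:
        lines.append(f"    {_quote(source)} -> {_quote(target)};")
    lines.append("}")
    return "\n".join(lines)


# Made-up chart data in the documented shape.
nodes = [
    {"id": "1", "label": "Data"},
    {"id": "2", "label": "Imputation"},
    {"id": "3", "label": "Model"},
]
edges = [("1", "2"), ("2", "3")]
dot = chart_to_dot(nodes, edges)
print(dot)
```

The resulting string can be saved to a file and rendered with the Graphviz dot command-line tool.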

class datarobot.models.ModelBlueprintChart(nodes: List[Dict[str, str]], edges: List[Tuple[str, str]])

A Blueprint chart that can be used to understand data flow in a model. A model blueprint chart is a reduced repository blueprint chart that contains only the elements used to build this particular model.

Attributes:
nodes : list of dict (id, label)

Chart nodes, id unique in chart.

edges : list of tuple (id1, id2)

Directions of data flow between blueprint chart nodes.

classmethod get(project_id: str, model_id: str) → datarobot.models.blueprint.ModelBlueprintChart

Retrieve a model blueprint chart.

Parameters:
project_id : str

The project’s id.

model_id : str

Id of model to retrieve model blueprint chart.

Returns:
ModelBlueprintChart

The queried model blueprint chart.

to_graphviz() → str

Get blueprint chart in graphviz DOT format.

Returns:
unicode

String representation of chart in graphviz DOT language.

Calendar File

class datarobot.CalendarFile(calendar_end_date: Optional[str] = None, calendar_start_date: Optional[str] = None, created: Optional[str] = None, id: Optional[str] = None, name: Optional[str] = None, num_event_types: Optional[int] = None, num_events: Optional[int] = None, project_ids: Optional[List[str]] = None, role: Optional[str] = None, multiseries_id_columns: Optional[List[str]] = None)

Represents the data for a calendar file.

For more information about calendar files, see the calendar documentation.

Attributes:
id : str

The id of the calendar file.

calendar_start_date : str

The earliest date in the calendar.

calendar_end_date : str

The last date in the calendar.

created : str

The date this calendar was created, i.e. uploaded to DataRobot.

name : str

The name of the calendar.

num_event_types : int

The number of different event types.

num_events : int

The number of events this calendar has.

project_ids : list of strings

A list containing the projectIds of the projects using this calendar.

multiseries_id_columns: list of str or None

A list of columns in calendar which uniquely identify events for different series. Currently, only one column is supported. If multiseries id columns are not provided, calendar is considered to be single series.

role : str

The access role the user has for this calendar.

classmethod create(file_path: str, calendar_name: Optional[str] = None, multiseries_id_columns: Optional[List[str]] = None) → datarobot.models.calendar_file.CalendarFile

Creates a calendar using the given file. For information about calendar files, see the calendar documentation

The provided file must be a CSV in the format:

Date,   Event,          Series ID,    Event Duration
<date>, <event_type>,   <series id>,  <event duration>
<date>, <event_type>,              ,  <event duration>

A header row is required, and the “Series ID” and “Event Duration” columns are optional.
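A file in this layout can be produced with the standard csv module. The sketch below is illustrative: calendar_csv is a hypothetical helper, the events are made up, the ISO date format is an assumption, and optional columns are written only when at least one event uses them.

```python
import csv
import io
from typing import Dict, List

REQUIRED_COLUMNS = ["Date", "Event"]
OPTIONAL_COLUMNS = ["Series ID", "Event Duration"]


def calendar_csv(events: List[Dict[str, str]]) -> str:
    """Return CSV text with a header row and one row per event."""
    used_optional = [
        column for column in OPTIONAL_COLUMNS if any(column in event for event in events)
    ]
    buffer = io.StringIO()
    writer = csv.DictWriter(
        buffer,
        fieldnames=REQUIRED_COLUMNS + used_optional,
        restval="",  # leave optional cells empty when an event omits them
        lineterminator="\n",
    )
    writer.writeheader()
    writer.writerows(events)
    return buffer.getvalue()


# Made-up events; the second one applies to a single series only.
events = [
    {"Date": "2024-01-01", "Event": "New Year"},
    {"Date": "2024-11-29", "Event": "Promo", "Series ID": "store_1"},
]
text = calendar_csv(events)
print(text)
```

Writing the returned text to a .csv file gives a path that could then be passed as file_path.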

Once the CalendarFile has been created, pass its ID with the DatetimePartitioningSpecification when setting the target for a time series project in order to use it.

Parameters:
file_path : string

A string representing a path to a local csv file.

calendar_name : string, optional

A name to assign to the calendar. Defaults to the name of the file if not provided.

multiseries_id_columns : list of str or None

A list of the names of multiseries id columns to define which series an event belongs to. Currently only one multiseries id column is supported.

Returns:
calendar_file : CalendarFile

Instance with initialized data.

Raises:
AsyncProcessUnsuccessfulError

Raised if there was an error processing the provided calendar file.

Examples

# Creating a calendar with a specified name
cal = dr.CalendarFile.create('/home/calendars/somecalendar.csv',
                                         calendar_name='Some Calendar Name')
cal.id
>>> 5c1d4904211c0a061bc93013
cal.name
>>> Some Calendar Name

# Creating a calendar without specifying a name
cal = dr.CalendarFile.create('/home/calendars/somecalendar.csv')
cal.id
>>> 5c1d4904211c0a061bc93012
cal.name
>>> somecalendar.csv

# Creating a calendar with multiseries id columns
cal = dr.CalendarFile.create('/home/calendars/somemultiseriescalendar.csv',
                             calendar_name='Some Multiseries Calendar Name',
                             multiseries_id_columns=['series_id'])
cal.id
>>> 5da9bb21962d746f97e4daee
cal.name
>>> Some Multiseries Calendar Name
cal.multiseries_id_columns
>>> ['series_id']
classmethod create_calendar_from_dataset(dataset_id: str, dataset_version_id: Optional[str] = None, calendar_name: Optional[str] = None, multiseries_id_columns: Optional[List[str]] = None, delete_on_error: Optional[bool] = False) → datarobot.models.calendar_file.CalendarFile

Creates a calendar using the given dataset. For information about calendar files, see the calendar documentation

The provided dataset must have the following format:

Date,   Event,          Series ID,    Event Duration
<date>, <event_type>,   <series id>,  <event duration>
<date>, <event_type>,              ,  <event duration>

The “Series ID” and “Event Duration” columns are optional.

Once the CalendarFile has been created, pass its ID with the DatetimePartitioningSpecification when setting the target for a time series project in order to use it.

Parameters:
dataset_id : string

The identifier of the dataset from which to create the calendar.

dataset_version_id : string, optional

The identifier of the dataset version from which to create the calendar.

calendar_name : string, optional

A name to assign to the calendar. Defaults to the name of the dataset if not provided.

multiseries_id_columns : list of str, optional

A list of the names of multiseries id columns to define which series an event belongs to. Currently only one multiseries id column is supported.

delete_on_error : boolean, optional

Whether to delete the calendar file from the Catalog if it is not valid.

Returns:
calendar_file : CalendarFile

Instance with initialized data.

Raises:
AsyncProcessUnsuccessfulError

Raised if there was an error processing the provided calendar file.

Examples

# Creating a calendar from a dataset
dataset = dr.Dataset.create_from_file('/home/calendars/somecalendar.csv')
cal = dr.CalendarFile.create_calendar_from_dataset(
    dataset.id, calendar_name='Some Calendar Name'
)
cal.id
>>> 5c1d4904211c0a061bc93013
cal.name
>>> Some Calendar Name

# Creating a calendar from a new dataset version
new_dataset_version = dr.Dataset.create_version_from_file(
    dataset.id, '/home/calendars/anothercalendar.csv'
)
cal = dr.CalendarFile.create_calendar_from_dataset(
    new_dataset_version.id, dataset_version_id=new_dataset_version.version_id
)
cal.id
>>> 5c1d4904211c0a061bc93012
cal.name
>>> anothercalendar.csv
classmethod create_calendar_from_country_code(country_code: str, start_date: datetime.datetime, end_date: datetime.datetime) → datarobot.models.calendar_file.CalendarFile

Generates a calendar based on the provided country code and the dataset start and end dates. The provided country code should be uppercase and 2-3 characters long. See CalendarFile.get_allowed_country_codes for a list of allowed country codes.

Parameters:
country_code : string

The country code for the country to use for generating the calendar.

start_date : datetime.datetime

The earliest date to include in the generated calendar.

end_date : datetime.datetime

The latest date to include in the generated calendar.

Returns:
calendar_file : CalendarFile

Instance with initialized data.
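The constraints on country_code can be checked on the client before making the request. The helper below is a hypothetical sketch and not part of the SDK; the rule that start_date must not be later than end_date is an assumption.

```python
from datetime import datetime


def check_calendar_request(country_code, start_date, end_date):
    """Client-side sanity checks; a hypothetical helper, not part of the SDK."""
    if not (country_code.isupper() and 2 <= len(country_code) <= 3):
        raise ValueError(
            f"country_code must be uppercase and 2-3 characters long, got {country_code!r}"
        )
    # Assumption: a calendar range that ends before it starts is never intended.
    if start_date > end_date:
        raise ValueError("start_date must not be later than end_date")
    return country_code, start_date, end_date


args = check_calendar_request("US", datetime(2019, 1, 1), datetime(2020, 12, 31))
# With a configured client, the checked values could then be passed on:
# cal = dr.CalendarFile.create_calendar_from_country_code(*args)
```

This only catches malformed input early. Whether a well-formed code is actually allowed still has to be confirmed with CalendarFile.get_allowed_country_codes.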

classmethod get_allowed_country_codes(offset: Optional[int] = None, limit: Optional[int] = None) → List[CountryCode]

Retrieves the list of allowed country codes that can be used for generating the preloaded calendars.

Parameters:
offset : int

Optional, defaults to 0. This many results will be skipped.

limit : int

Optional, defaults to 100, maximum 1000. At most this many results are returned.

Returns:
list

A list of dicts, each of which represents an allowed country code. Each item has the following structure:

classmethod get(calendar_id: str) → datarobot.models.calendar_file.CalendarFile

Gets the details of a calendar, given the id.

Parameters:
calendar_id : str

The identifier of the calendar.

Returns:
calendar_file : CalendarFile

The requested calendar.

Raises:
DataError

Raised if the calendar_id is invalid, i.e. the specified CalendarFile does not exist.

Examples

cal = dr.CalendarFile.get(some_calendar_id)
cal.id
>>> some_calendar_id
classmethod list(project_id: Optional[str] = None, batch_size: Optional[int] = None) → List[datarobot.models.calendar_file.CalendarFile]

Gets the details of all calendars this user has view access for.

Parameters:
project_id : str, optional

If provided, will filter for calendars associated only with the specified project.

batch_size : int, optional

The number of calendars to retrieve in a single API call. If specified, the client may make multiple calls to retrieve the full list of calendars. If not specified, an appropriate default will be chosen by the server.

Returns:
calendar_list : list of CalendarFile

A list of CalendarFile objects.

Examples

calendars = dr.CalendarFile.list()
len(calendars)
>>> 10
classmethod delete(calendar_id: str) → None

Deletes the calendar specified by calendar_id.

Parameters:
calendar_id : str

The id of the calendar to delete. The requester must have OWNER access for this calendar.

Raises:
ClientError

Raised if an invalid calendar_id is provided.

Examples

# Deleting with a valid calendar_id
status_code = dr.CalendarFile.delete(some_calendar_id)
status_code
>>> 204
dr.CalendarFile.get(some_calendar_id)
>>> ClientError: Item not found
classmethod update_name(calendar_id: str, new_calendar_name: str) → int

Changes the name of the specified calendar to the specified name. The requester must have at least READ_WRITE permissions on the calendar.

Parameters:
calendar_id : str

The id of the calendar to update.

new_calendar_name : str

The new name to set for the specified calendar.

Returns:
status_code : int

200 for success

Raises:
ClientError

Raised if an invalid calendar_id is provided.

Examples

response = dr.CalendarFile.update_name(some_calendar_id, some_new_name)
response
>>> 200
cal = dr.CalendarFile.get(some_calendar_id)
cal.name
>>> some_new_name
classmethod share(calendar_id: str, access_list: List[datarobot.models.sharing.SharingAccess]) → int

Shares the calendar with the specified users, assigning the specified roles.

Parameters:
calendar_id : str

The id of the calendar to update.

access_list : list of SharingAccess

A list of dr.SharingAccess objects. Specify None for the role to delete a user’s access from the specified CalendarFile. For more information on specific access levels, see the sharing documentation.

Returns:
status_code : int

200 for success

Raises:
ClientError

Raised if unable to update permissions for a user.

AssertionError

Raised if access_list is invalid.

Examples

# assuming some_user is a valid user, share this calendar with some_user
sharing_list = [dr.SharingAccess(some_user_username,
                                 dr.enums.SHARING_ROLE.READ_WRITE)]
response = dr.CalendarFile.share(some_calendar_id, sharing_list)
response.status_code
>>> 200

# delete some_user from this calendar, assuming they have access of some kind already
delete_sharing_list = [dr.SharingAccess(some_user_username,
                                        None)]
response = dr.CalendarFile.share(some_calendar_id, delete_sharing_list)
response.status_code
>>> 200

# Attempt to add an invalid user to a calendar
invalid_sharing_list = [dr.SharingAccess(invalid_username,
                                         dr.enums.SHARING_ROLE.READ_WRITE)]
dr.CalendarFile.share(some_calendar_id, invalid_sharing_list)
>>> ClientError: Unable to update access for this calendar
classmethod get_access_list(calendar_id: str, batch_size: Optional[int] = None) → List[datarobot.models.sharing.SharingAccess]

Retrieve a list of users that have access to this calendar.

Parameters:
calendar_id : str

The id of the calendar to retrieve the access list for.

batch_size : int, optional

The number of access records to retrieve in a single API call. If specified, the client may make multiple calls to retrieve the full list of access records. If not specified, an appropriate default will be chosen by the server.

Returns:
access_control_list : list of SharingAccess

A list of SharingAccess objects.

Raises:
ClientError

Raised if user does not have access to calendar or calendar does not exist.

Automated Documentation

class datarobot.models.automated_documentation.AutomatedDocument(entity_id=None, document_type=None, output_format=None, locale=None, template_id=None, id=None, filepath=None, created_at=None)

An automated documentation object.

New in version v2.24.

Attributes:
document_type : str or None

Type of automated document. You can specify: MODEL_COMPLIANCE, AUTOPILOT_SUMMARY depending on your account settings. Required for document generation.

entity_id : str or None

ID of the entity to generate the document for. It can be model ID or project ID. Required for document generation.

output_format : str or None

Format of the generated document, either docx or html. Required for document generation.

locale : str or None

Localization of the document, dependent on your account settings. Default setting is EN_US.

template_id : str or None

Template ID to use for the document outline. Defaults to standard DataRobot template. See the documentation for ComplianceDocTemplate for more information.

id : str or None

ID of the document. Required to download or delete a document.

filepath : str or None

Path to save a downloaded document to. Either include a file path and name or the file will be saved to the directory from which the script is launched.

created_at : datetime or None

Document creation timestamp.

classmethod list_available_document_types()

Get a list of all available document types and locales.

Returns:
List of dicts

Examples

import datarobot as dr

dr.Client(token=my_token, endpoint=endpoint)
doc_types = dr.AutomatedDocument.list_available_document_types()
is_model_compliance_initialized

Check if model compliance documentation pre-processing is initialized. Model compliance documentation pre-processing must be initialized before generating documentation for a custom model.

Returns:
Tuple of (boolean, string)
  • boolean flag is whether model compliance documentation pre-processing is initialized
  • string value is the initialization status
initialize_model_compliance()

Initialize model compliance documentation pre-processing. Must be called before generating documentation for a custom model.

Returns:
Tuple of (boolean, string)
  • boolean flag is whether model compliance documentation pre-processing is initialized
  • string value is the initialization status

Examples

import datarobot as dr

dr.Client(token=my_token, endpoint=endpoint)

# NOTE: entity_id is either a model id or a model package id
doc = dr.AutomatedDocument(
        document_type="MODEL_COMPLIANCE",
        entity_id="6f50cdb77cc4f8d1560c3ed5",
        output_format="docx",
        locale="EN_US")

doc.initialize_model_compliance()
generate(max_wait: int = 600) → requests.models.Response

Request generation of an automated document.

Required attributes to request document generation: document_type, entity_id, and output_format.

Returns:
requests.models.Response

Examples

import datarobot as dr

dr.Client(token=my_token, endpoint=endpoint)

doc = dr.AutomatedDocument(
        document_type="MODEL_COMPLIANCE",
        entity_id="6f50cdb77cc4f8d1560c3ed5",
        output_format="docx",
        locale="EN_US",
        template_id="50efc9db8aff6c81a374aeec",
        filepath="/Users/username/Documents/example.docx"
        )

doc.generate()
doc.download()
download()

Download a generated Automated Document. Document ID is required to download a file.

Returns:
requests.models.Response

Examples

Generating and downloading the generated document:

import datarobot as dr

dr.Client(token=my_token, endpoint=endpoint)

doc = dr.AutomatedDocument(
        document_type="AUTOPILOT_SUMMARY",
        entity_id="6050d07d9da9053ebb002ef7",
        output_format="docx",
        filepath="/Users/username/Documents/Project_Report_1.docx"
        )

doc.generate()
doc.download()

Downloading an earlier generated document when you know the document ID:

import datarobot as dr

dr.Client(token=my_token, endpoint=endpoint)
doc = dr.AutomatedDocument(id='5e8b6a34d2426053ab9a39ed')
doc.download()

Notice that filepath was not set for this document. In this case, the file is saved to the directory from which the script was launched.

Downloading a document chosen from a list of earlier generated documents:

import datarobot as dr

dr.Client(token=my_token, endpoint=endpoint)

model_id = "6f5ed3de855962e0a72a96fe"
docs = dr.AutomatedDocument.list_generated_documents(entity_ids=[model_id])
doc = docs[0]
doc.filepath = "/Users/me/Desktop/Recommended_model_doc.docx"
doc.download()
delete()

Delete a document using its ID.

Returns:
requests.models.Response

Examples

import datarobot as dr

dr.Client(token=my_token, endpoint=endpoint)
doc = dr.AutomatedDocument(id="5e8b6a34d2426053ab9a39ed")
doc.delete()

If you don’t know the document ID, you can follow the same workflow to get the ID as in the examples for the AutomatedDocument.download method.

classmethod list_generated_documents(document_types=None, entity_ids=None, output_formats=None, locales=None, offset=None, limit=None)

Get information about all previously generated documents available for your account. The information includes document ID and type, ID of the entity it was generated for, time of creation, and other information.

Parameters:
document_types : List of str or None

Query for one or more document types.

entity_ids : List of str or None

Query generated documents by one or more entity IDs.

output_formats : List of str or None

Query for one or more output formats.

locales : List of str or None

Query generated documents by one or more locales.

offset: int or None

Number of items to skip. Defaults to 0 if not provided.

limit: int or None

Number of items to return, maximum number of items is 1000.

Returns:
List of AutomatedDocument objects, where each object contains the attributes described in AutomatedDocument.

Examples

To get a list of all generated documents:

import datarobot as dr

dr.Client(token=my_token, endpoint=endpoint)
docs = dr.AutomatedDocument.list_generated_documents()

To get a list of all AUTOPILOT_SUMMARY documents:

import datarobot as dr

dr.Client(token=my_token, endpoint=endpoint)
docs = dr.AutomatedDocument.list_generated_documents(document_types=["AUTOPILOT_SUMMARY"])

To get a list of 5 recently created automated documents in html format:

import datarobot as dr

dr.Client(token=my_token, endpoint=endpoint)
docs = dr.AutomatedDocument.list_generated_documents(output_formats=["html"], limit=5)

To get a list of automated documents created for specific entities (projects or models):

import datarobot as dr

dr.Client(token=my_token, endpoint=endpoint)
docs = dr.AutomatedDocument.list_generated_documents(
    entity_ids=["6051d3dbef875eb3be1be036",
                "6051d3e1fbe65cd7a5f6fde6",
                "6051d3e7f86c04486c2f9584"]
    )

Note that the list of results contains AutomatedDocument objects, which means that you can call the class's methods on them. Here’s how you can list, download, and then delete from the server all automated documents related to certain entities:

import datarobot as dr

dr.Client(token=my_token, endpoint=endpoint)

ids = ["6051d3dbef875eb3be1be036", "5fe1d3d55cd810ebdb60c517f"]
docs = dr.AutomatedDocument.list_generated_documents(entity_ids=ids)
for doc in docs:
    doc.download()
    doc.delete()

Class Mapping Aggregation Settings

For multiclass projects with many unique values in the target column, you can specify parameters for aggregating rare values to improve modeling performance and to decrease the runtime and resource usage of the resulting models.

class datarobot.helpers.ClassMappingAggregationSettings(max_unaggregated_class_values: Optional[int] = None, min_class_support: Optional[int] = None, excluded_from_aggregation: Optional[List[str]] = None, aggregation_class_name: Optional[str] = None)

Class mapping aggregation settings. For multiclass projects, these settings allow fine control over which target values are preserved as classes. Classes that are not preserved are:

  • aggregated into a single “catch everything else” class in the case of multiclass, or
  • ignored in the case of multilabel.

All attributes are optional; if not specified, server-side defaults are used.

Attributes:
max_unaggregated_class_values : int, optional

Maximum number of unique values allowed before aggregation kicks in.

min_class_support : int, optional

Minimum number of instances necessary for each target value in the dataset. All values with fewer instances will be aggregated.

excluded_from_aggregation : list, optional

List of target values that are guaranteed to be kept as is, regardless of other settings.

aggregation_class_name : str, optional

If some of the values are aggregated, this is the name of the aggregation class that replaces them.
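To make the interplay of these attributes concrete, here is a toy, client-side illustration of one plausible aggregation rule. It is not DataRobot's server-side implementation. The function name, the default class name "OTHER", and the choice to count excluded values toward the cap are all assumptions made for the sketch.

```python
from collections import Counter


def aggregate_rare_classes(targets, max_unaggregated_class_values,
                           min_class_support, excluded_from_aggregation=(),
                           aggregation_class_name="OTHER"):
    """Toy model of rare-class aggregation (assumed rules, not the server's)."""
    counts = Counter(targets)
    protected = set(excluded_from_aggregation)
    # Classes with enough support, most frequent first.
    supported = [
        value for value, n in counts.most_common()
        if n >= min_class_support and value not in protected
    ]
    # Assumption: protected classes present in the data count toward the cap.
    room = max(max_unaggregated_class_values - len(protected & set(counts)), 0)
    kept = protected | set(supported[:room])
    return [t if t in kept else aggregation_class_name for t in targets]


targets = ["a"] * 5 + ["b"] * 4 + ["c"] * 2 + ["d"] + ["e"]
result = aggregate_rare_classes(
    targets,
    max_unaggregated_class_values=3,
    min_class_support=2,
    excluded_from_aggregation=["e"],
)
print(Counter(result))
```

In this run "e" survives despite having a single instance because it is excluded from aggregation, "d" is aggregated for lack of support, and "c" is aggregated because the cap of three classes is already used by "a", "b", and "e".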

Client Configuration

datarobot.client.Client(token: Optional[str] = None, endpoint: Optional[str] = None, config_path: Optional[str] = None, connect_timeout: Optional[int] = None, user_agent_suffix: Optional[str] = None, ssl_verify: bool = True, max_retries: Union[int, urllib3.util.retry.Retry, None] = None, token_type: str = 'Token') → datarobot.rest.RESTClientObject

Configures the global API client for the Python SDK with optional configuration. Missing configuration will be read from env or config file.

Parameters:
token : str, optional

API token

endpoint : str, optional

Base url of API

config_path : str, optional

Alternate location of config file

connect_timeout : int, optional

How long the client should wait while establishing a connection with the server before giving up.

user_agent_suffix : str, optional

Additional text that is appended to the User-Agent HTTP header when communicating with the DataRobot REST API. This can be useful for identifying different applications that are built on top of the DataRobot Python Client, which can aid debugging and help track usage.

ssl_verify : bool or str, optional

Whether to check SSL certificate. Could be set to path with certificates of trusted certification authorities.

max_retries : int or datarobot.rest.Retry, optional

Either an integer number of times to retry connection errors, or a urllib3.util.retry.Retry object to configure retries.

token_type: str, “Token” by default

Authentication token type: Token, Bearer. “Bearer” is for DataRobot OAuth2 token, “Token” for token generated in Developer Tools.

Returns:
The RESTClientObject instance created.
datarobot.client.set_client(client: datarobot.rest.RESTClientObject) → Optional[datarobot.rest.RESTClientObject]

Configure the global HTTP client for the Python SDK. Returns previous instance.

datarobot.client.client_configuration(*args, **kwargs)

This context manager can be used to temporarily change the global HTTP client.

In multithreaded scenarios, it is highly recommended to use a fresh manager object per thread.

DataRobot does not recommend nesting these contexts.

Parameters:
args : Positional arguments passed to datarobot.client.Client()
kwargs : Keyword arguments passed to datarobot.client.Client()

Examples

from datarobot.client import client_configuration
from datarobot.models import Project

with client_configuration(token="api-key-here", endpoint="https://host-name.com"):
    Project.list()

from datarobot.client import Client, client_configuration
from datarobot.models import Project

Client()  # Interact with DataRobot using the default configuration.
Project.list()

with client_configuration(config_path="/path/to/a/drconfig.yaml"):
    # Interact with DataRobot using a different configuration.
    Project.list()
class datarobot.rest.RESTClientObject(auth: str, endpoint: str, connect_timeout: Optional[int] = 6.05, verify: bool = True, user_agent_suffix: Optional[str] = None, max_retries: Union[int, urllib3.util.retry.Retry, None] = None, authentication_type: Optional[str] = None)
Parameters:
connect_timeout

Timeout for HTTP requests and connections.

headers

Headers for outgoing requests.
open_in_browser() → None

Opens the DataRobot app in a web browser, or logs the URL if a browser is not available.

Clustering

class datarobot.models.ClusteringModel(id=None, processes=None, featurelist_name=None, featurelist_id=None, project_id=None, sample_pct=None, training_row_count=None, training_duration=None, training_start_date=None, training_end_date=None, model_type=None, model_category=None, is_frozen=None, is_n_clusters_dynamically_determined=None, blueprint_id=None, metrics=None, project=None, monotonic_increasing_featurelist_id=None, monotonic_decreasing_featurelist_id=None, n_clusters=None, has_empty_clusters=None, supports_monotonic_constraints=None, is_starred=None, prediction_threshold=None, prediction_threshold_read_only=None, model_number=None, parent_model_id=None, use_project_settings=None, supports_composable_ml=None)

ClusteringModel extends the Model class. It provides properties and methods specific to clustering projects.

compute_insights(max_wait: int = 600) → List[datarobot.models.cluster_insight.ClusterInsight]

Compute and retrieve cluster insights for the model. This method waits for the job computing cluster insights to complete and returns the results once it has finished. If the computation takes longer than the specified max_wait, an exception is raised.

Parameters:
project_id: str

Project to start creation in.

model_id: str

Project’s model to start creation in.

max_wait: int

Maximum number of seconds to wait before giving up

Returns:
List of ClusterInsight
Raises:
ClientError

Server rejected creation due to client error. Most likely cause is bad project_id or model_id.

AsyncFailureError

If any of the responses from the server are unexpected

AsyncProcessUnsuccessfulError

If the cluster insights computation has failed or was cancelled.

AsyncTimeoutError

If the cluster insights computation did not resolve in time

insights

Return actual list of cluster insights if already computed.

Returns:
List of ClusterInsight
clusters

Return actual list of Clusters.

Returns:
List of Cluster
update_cluster_names(cluster_name_mappings: List[Tuple[str, str]]) → List[datarobot.models.cluster.Cluster]

Change many cluster names at once based on list of name mappings.

Parameters:
cluster_name_mappings: List of tuples

Cluster name mappings, each consisting of the current cluster name and the new cluster name. Example:

cluster_name_mappings = [
    ("current cluster name 1", "new cluster name 1"),
    ("current cluster name 2", "new cluster name 2")]
Returns:
List of Cluster
Raises:
datarobot.errors.ClientError

Server rejected update of cluster names. Possible reasons include: incorrect format of mapping, mapping introduces duplicates.
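Both rejection reasons mentioned above can be caught before the request is sent. The helper below is a hypothetical pre-check, not part of the SDK. It assumes that the existing cluster names are already available as a list of strings, for example collected from the clusters property.

```python
def check_cluster_name_mappings(existing_names, cluster_name_mappings):
    """Return the names after renaming, or raise ValueError on a bad mapping."""
    renames = {}
    for pair in cluster_name_mappings:
        if len(pair) != 2:
            raise ValueError(f"expected (current_name, new_name), got {pair!r}")
        current_name, new_name = pair
        if current_name not in existing_names:
            raise ValueError(f"unknown cluster name: {current_name!r}")
        if current_name in renames:
            raise ValueError(f"cluster renamed twice: {current_name!r}")
        renames[current_name] = new_name
    result = [renames.get(name, name) for name in existing_names]
    if len(set(result)) != len(result):
        raise ValueError("mapping would introduce duplicate cluster names")
    return result


names = ["Cluster 1", "Cluster 2", "Cluster 3"]
renamed = check_cluster_name_mappings(
    names, [("Cluster 1", "Loyal customers"), ("Cluster 2", "New customers")]
)
print(renamed)
```

The server remains the authority on what it accepts, so a mapping that passes this check may still be rejected for other reasons.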

update_cluster_name(current_name: str, new_name: str) → List[datarobot.models.cluster.Cluster]

Change cluster name from current_name to new_name.

Parameters:
current_name: str

Current cluster name.

new_name: str

New cluster name.

Returns:
List of Cluster
Raises:
datarobot.errors.ClientError

Server rejected update of cluster names.

class datarobot.models.cluster.Cluster(**kwargs)

Representation of a single cluster.

Attributes:
name: str

Current cluster name

percent: float

Percent of data contained in the cluster. This value is reported after cluster insights are computed for the model.

classmethod list(project_id: str, model_id: str) → List[datarobot.models.cluster.Cluster]

Retrieve a list of clusters in the model.

Parameters:
project_id: str

ID of the project that the model is part of.

model_id: str

ID of the model.

Returns:
List of clusters
classmethod update_multiple_names(project_id: str, model_id: str, cluster_name_mappings: List[Tuple[str, str]]) → List[datarobot.models.cluster.Cluster]

Update many clusters at once based on list of name mappings.

Parameters:
project_id: str

ID of the project that the model is part of.

model_id: str

ID of the model.

cluster_name_mappings: List of tuples

Cluster name mappings, consisting of the current and new names for each cluster. Example:

cluster_name_mappings = [
    ("current cluster name 1", "new cluster name 1"),
    ("current cluster name 2", "new cluster name 2")]
Returns:
List of clusters
Raises:
datarobot.errors.ClientError

Server rejected update of cluster names.

ValueError

Invalid cluster name mapping provided.

classmethod update_name(project_id: str, model_id: str, current_name: str, new_name: str) → List[datarobot.models.cluster.Cluster]

Change cluster name from current_name to new_name

Parameters:
project_id: str

ID of the project that the model is part of.

model_id: str

ID of the model.

current_name: str

Current cluster name

new_name: str

New cluster name

Returns:
List of Cluster
class datarobot.models.cluster_insight.ClusterInsight(**kwargs)

Holds data on all insights related to a feature, as well as a breakdown per cluster.

Parameters:
feature_name: str

Name of a feature from the dataset.

feature_type: str

Type of feature.

insights : List of classes (ClusterInsight)

List provides information regarding the importance of a specific feature in relation to each cluster. Results help understand how the model is grouping data and what each cluster represents.

feature_impact: float

Impact of a feature ranging from 0 to 1.

classmethod compute(project_id: str, model_id: str, max_wait: int = 600) → List[datarobot.models.cluster_insight.ClusterInsight]

Starts creation of cluster insights for the model and, if successful, returns the computed ClusterInsights. This method allows the calculation to continue for a specified time and, if it is not complete by then, cancels the request.

Parameters:
project_id: str

ID of the project to begin creation of cluster insights for.

model_id: str

ID of the project model to begin creation of cluster insights for.

max_wait: int

Maximum number of seconds to wait before canceling the request.

Returns:
List[ClusterInsight]
Raises:
ClientError

Server rejected creation due to client error. Most likely cause is bad project_id or model_id.

AsyncFailureError

Indicates whether any of the responses from the server are unexpected.

AsyncProcessUnsuccessfulError

Indicates whether the cluster insights computation failed or was cancelled.

AsyncTimeoutError

Indicates whether the cluster insights computation did not resolve within the specified time limit (max_wait).

Compliance Documentation Templates

class datarobot.models.compliance_doc_template.ComplianceDocTemplate(id, creator_id, creator_username, name, org_id=None, sections=None)

A compliance documentation template. Templates are used to customize contents of AutomatedDocument.

New in version v2.14.

Notes

Each section dictionary has the following schema:

  • title : title of the section
  • type : type of section. Must be one of “datarobot”, “user” or “table_of_contents”.

Each type of section has a different set of attributes, described below.

Sections of type "datarobot" represent sections owned by DataRobot. DataRobot sections have the following additional attributes:

  • content_id : The identifier of the content in this section. You can get the default template with get_default for a complete list of possible DataRobot section content ids.
  • sections : list of sub-section dicts nested under the parent section.

Sections of type "user" represent sections with user-defined content. These sections may contain text written by the user and have the following additional fields:

  • regularText : regular text of the section, optionally separated by \n to split paragraphs.
  • highlightedText : highlighted text of the section, optionally separated by \n to split paragraphs.
  • sections : list of sub-section dicts nested under the parent section.

A section of type "table_of_contents" represents a table of contents and has no additional attributes.
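As a minimal sketch of this schema, the snippet below builds a sections list from plain dictionaries and walks it recursively. The titles and text are invented for illustration. No "datarobot" section is shown because valid content_id values have to be taken from the default template. Whether every field must be present on every "user" section is not stated here, so the sketch simply fills them all in.

```python
import json

# Hypothetical section content; only the keys follow the schema in the Notes.
sections = [
    {"title": "Table of Contents", "type": "table_of_contents"},
    {
        "title": "Model Purpose",
        "type": "user",
        "regularText": "First paragraph.\nSecond paragraph.",
        "highlightedText": "Reviewed by the risk team.",
        "sections": [
            {
                "title": "Intended Use",
                "type": "user",
                "regularText": "Internal forecasting only.",
                "highlightedText": "",
                "sections": [],
            }
        ],
    },
]


def section_types(section_list):
    """Collect the type of every section, including nested sub-sections."""
    found = []
    for section in section_list:
        found.append(section["type"])
        found.extend(section_types(section.get("sections", [])))
    return found


print(section_types(sections))
print(json.dumps(sections, indent=2))
```

A list shaped like this is what the sections argument of ComplianceDocTemplate.create expects, and the JSON output corresponds to the kind of file that create_from_json_file reads.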

Attributes:
id : str

the id of the template

name : str

the name of the template.

creator_id : str

the id of the user who created the template

creator_username : str

username of the user who created the template

org_id : str

the id of the organization the template belongs to

sections : list of dicts

the sections of the template describing the structure of the document. Section schema is described in Notes section above.

classmethod get_default(template_type=None)

Get a default DataRobot template. This template is used for generating compliance documentation when no template is specified.

Parameters:
template_type : str or None

Type of the template. Currently supported values are “normal” and “time_series”

Returns:
template : ComplianceDocTemplate

the default template object with sections attribute populated with default sections.

classmethod create_from_json_file(name, path)

Create a template with the specified name and sections in a JSON file.

This is useful when working with sections in a JSON file. Example:

default_template = ComplianceDocTemplate.get_default()
default_template.sections_to_json_file('path/to/example.json')
# ... edit example.json in your editor
my_template = ComplianceDocTemplate.create_from_json_file(
    name='my template',
    path='path/to/example.json'
)
Parameters:
name : str

the name of the template. Must be unique for your user.

path : str

the path to find the JSON file at

Returns:
template : ComplianceDocTemplate

the created template

classmethod create(name, sections)

Create a template with the specified name and sections.

Parameters:
name : str

the name of the template. Must be unique for your user.

sections : list

list of section objects

Returns:
template : ComplianceDocTemplate

the created template

classmethod get(template_id)

Retrieve a specific template.

Parameters:
template_id : str

the id of the template to retrieve

Returns:
template : ComplianceDocTemplate

the retrieved template

classmethod list(name_part=None, limit=None, offset=None)

Get a paginated list of compliance documentation template objects.

Parameters:
name_part : str or None

Return only the templates with names matching specified string. The matching is case-insensitive.

limit : int

The number of records to return. The server will use a (possibly finite) default if not specified.

offset : int

The number of records to skip.

Returns:
templates : list of ComplianceDocTemplate

the list of template objects

sections_to_json_file(path, indent=2)

Save sections of the template to a json file at the specified path

Parameters:
path : str

the path to save the file to

indent : int

indentation to use in the json file.

update(name=None, sections=None)

Update the name or sections of an existing doc template.

Note that default or non-existent templates can not be updated.

Parameters:
name : str, optional

the new name for the template

sections : list of dicts

list of sections

delete()

Delete the compliance documentation template.

Confusion Chart

class datarobot.models.confusion_chart.ConfusionChart(source, data, source_model_id)

Confusion Chart data for model.

Notes

ClassMetrics is a dict containing the following:

  • class_name (string) name of the class
  • actual_count (int) number of times this class is seen in the validation data
  • predicted_count (int) number of times this class has been predicted for the validation data
  • f1 (float) F1 score
  • recall (float) recall score
  • precision (float) precision score
  • was_actual_percentages (list of dict) one vs all actual percentages in format specified below.
    • other_class_name (string) the name of the other class
    • percentage (float) the percentage of the times this class was predicted when it was actually class (from 0 to 1)
  • was_predicted_percentages (list of dict) one vs all predicted percentages in format specified below.
    • other_class_name (string) the name of the other class
    • percentage (float) the percentage of the times this class was actual predicted (from 0 to 1)
  • confusion_matrix_one_vs_all (list of list) 2d list representing 2x2 one vs all matrix.
    • This represents the True/False Negative/Positive rates as integer for each class. The data structure looks like:
    • [ [ True Negative, False Positive ], [ False Negative, True Positive ] ]
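The one-vs-all matrix relates to the precision, recall, and F1 values through the standard definitions. The sketch below applies those textbook formulas to a made-up matrix. The function name and the handling of empty denominators (returning 0.0) are assumptions, and the source does not confirm that the stored metrics are computed exactly this way.

```python
def one_vs_all_scores(matrix):
    """Compute precision, recall, and F1 from [[TN, FP], [FN, TP]] counts."""
    (true_neg, false_pos), (false_neg, true_pos) = matrix
    predicted_pos = true_pos + false_pos
    actual_pos = true_pos + false_neg
    # Assumption: report 0.0 when a denominator is empty.
    precision = true_pos / predicted_pos if predicted_pos else 0.0
    recall = true_pos / actual_pos if actual_pos else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return {"precision": precision, "recall": recall, "f1": f1}


# Made-up counts: 50 TN, 10 FP, 5 FN, 35 TP.
scores = one_vs_all_scores([[50, 10], [5, 35]])
print(scores)
```

For these counts, precision is 35/45, recall is 35/40, and F1 is 70/85.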
Attributes:
source : str

Confusion Chart data source. Can be ‘validation’, ‘crossValidation’ or ‘holdout’.

raw_data : dict

All of the raw data for the Confusion Chart

confusion_matrix : list of list

The NxN confusion matrix

classes : list

The names of each of the classes

class_metrics : list of dicts

List of dicts with schema described as ClassMetrics above.

source_model_id : str

ID of the model this Confusion chart represents; in some cases, insights from the parent of a frozen model may be used

Credentials

class datarobot.models.Credential(credential_id: Optional[str] = None, name: Optional[str] = None, credential_type: Optional[str] = None, creation_date: Optional[datetime.datetime] = None, description: Optional[str] = None)
classmethod list() → List[datarobot.models.credential.Credential]

Returns list of available credentials.

Returns:
credentials : list of Credential instances

contains a list of available credentials.

Examples

>>> import datarobot as dr
>>> credentials = dr.Credential.list()
>>> credentials
[
    Credential('5e429d6ecf8a5f36c5693e03', 'my_s3_cred', 's3'),
    Credential('5e42cc4dcf8a5f3256865840', 'my_jdbc_cred', 'jdbc'),
]
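
Since list() returns every available credential, a common follow-up is to pick one by name. The sketch below assumes each returned object exposes name and credential_id attributes, matching the constructor arguments of Credential; the helper and the stand-in objects are hypothetical and exist only so the example runs without a server.

```python
from types import SimpleNamespace


def find_credential(credentials, name):
    """Return the first credential whose name matches, or None if there is none."""
    return next((cred for cred in credentials if cred.name == name), None)


# Stand-ins for the objects returned by dr.Credential.list(); illustrative only.
fake_credentials = [
    SimpleNamespace(credential_id="5e429d6ecf8a5f36c5693e03", name="my_s3_cred"),
    SimpleNamespace(credential_id="5e42cc4dcf8a5f3256865840", name="my_jdbc_cred"),
]
match = find_credential(fake_credentials, "my_jdbc_cred")
```

With a configured client the call would be find_credential(dr.Credential.list(), "my_jdbc_cred").
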
classmethod get(credential_id: str) → datarobot.models.credential.Credential

Gets the Credential.

Parameters:
credential_id : str

the identifier of the credential.

Returns:
credential : Credential

the requested credential.

Examples

>>> import datarobot as dr
>>> cred = dr.Credential.get('5e429d6ecf8a5f36c5693e03')
>>> cred
Credential('5e429d6ecf8a5f36c5693e03', 'my_s3_cred', 's3')
delete() → None

Deletes the Credential from the store.

Examples

>>> import datarobot as dr
>>> cred = dr.Credential.get('5a8ac9ab07a57a0001be501f')
>>> cred.delete()
classmethod create_basic(name: str, user: str, password: str, description: Optional[str] = None) → datarobot.models.credential.Credential

Creates the credentials.

Parameters:
name : str

the name to use for this set of credentials.

user : str

the username to store for this set of credentials.

password : str

the password to store for this set of credentials.

description : str, optional

the description to use for this set of credentials.

Returns:
credential : Credential

the created credential.

Examples

>>> import datarobot as dr
>>> cred = dr.Credential.create_basic(
...     name='my_basic_cred',
...     user='username',
...     password='password',
... )
>>> cred
Credential('5e429d6ecf8a5f36c5693e03', 'my_basic_cred', 'basic')
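
The example above passes the password as a literal. One possible way to avoid that is to read the values from environment variables and pass them on as keyword arguments. This is only a sketch: the variable names DB_USER and DB_PASSWORD and the helper itself are assumptions, not something the client defines.

```python
import os


def basic_credential_kwargs(name, environ=os.environ):
    """Build keyword arguments for Credential.create_basic from environment variables."""
    try:
        user = environ["DB_USER"]          # hypothetical variable name
        password = environ["DB_PASSWORD"]  # hypothetical variable name
    except KeyError as exc:
        raise RuntimeError(f"missing environment variable: {exc.args[0]}") from None
    return {"name": name, "user": user, "password": password}


# A plain dict stands in for os.environ so the example is reproducible.
kwargs = basic_credential_kwargs(
    "my_basic_cred", environ={"DB_USER": "username", "DB_PASSWORD": "password"}
)
# With a configured client: cred = dr.Credential.create_basic(**kwargs)
```
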
classmethod create_oauth(name: str, token: str, refresh_token: str, description: Optional[str] = None) → datarobot.models.credential.Credential

Creates the OAUTH credentials.

Parameters:
name : str

the name to use for this set of credentials.

token: str

the OAUTH token

refresh_token: str

the OAUTH refresh token

description : str, optional

the description to use for this set of credentials.

Returns:
credential : Credential

the created credential.

Examples

>>> import datarobot as dr
>>> cred = dr.Credential.create_oauth(
...     name='my_oauth_cred',
...     token='XXX',
...     refresh_token='YYY',
... )
>>> cred
Credential('5e429d6ecf8a5f36c5693e03', 'my_oauth_cred', 'oauth')
classmethod create_s3(name: str, aws_access_key_id: Optional[str] = None, aws_secret_access_key: Optional[str] = None, aws_session_token: Optional[str] = None, description: Optional[str] = None) → datarobot.models.credential.Credential

Creates the S3 credentials.

Parameters:
name : str

the name to use for this set of credentials.

aws_access_key_id : str, optional

the AWS access key id.

aws_secret_access_key : str, optional

the AWS secret access key.

aws_session_token : str, optional

the AWS session token.

description : str, optional

the description to use for this set of credentials.

Returns:
credential : Credential

the created credential.

Examples

>>> import datarobot as dr
>>> cred = dr.Credential.create_s3(
...     name='my_s3_cred',
...     aws_access_key_id='XXX',
...     aws_secret_access_key='YYY',
...     aws_session_token='ZZZ',
... )
>>> cred
Credential('5e429d6ecf8a5f36c5693e03', 'my_s3_cred', 's3')
classmethod create_azure(name: str, azure_connection_string: str, description: Optional[str] = None) → datarobot.models.credential.Credential

Creates the Azure storage credentials.

Parameters:
name : str

the name to use for this set of credentials.

azure_connection_string : str

the Azure connection string.

description : str, optional

the description to use for this set of credentials.

Returns:
credential : Credential

the created credential.

Examples

>>> import datarobot as dr
>>> cred = dr.Credential.create_azure(
...     name='my_azure_cred',
...     azure_connection_string='XXX',
... )
>>> cred
Credential('5e429d6ecf8a5f36c5693e03', 'my_azure_cred', 'azure')
classmethod create_gcp(name: str, gcp_key: Union[str, Dict[str, str], None] = None, description: Optional[str] = None) → datarobot.models.credential.Credential

Creates the GCP credentials.

Parameters:
name : str

the name to use for this set of credentials.

gcp_key : str | dict, optional

the GCP key in json format or parsed as dict.

description : str, optional

the description to use for this set of credentials.

Returns:
credential : Credential

the created credential.

Examples

>>> import datarobot as dr
>>> cred = dr.Credential.create_gcp(
...     name='my_gcp_cred',
...     gcp_key='XXX',
... )
>>> cred
Credential('5e429d6ecf8a5f36c5693e03', 'my_gcp_cred', 'gcp')
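
Because gcp_key accepts either JSON text or an already parsed dict, it can be convenient to parse and sanity-check the key before sending it. The sketch below is an assumption about how that might be done; the helper name and the placeholder key content are hypothetical, and a real key carries more fields.

```python
import json


def load_gcp_key(raw):
    """Parse a key given as JSON text and make sure it decodes to a dict."""
    key = json.loads(raw)
    if not isinstance(key, dict):
        raise ValueError("GCP key JSON must decode to an object")
    return key


# Placeholder content for illustration only.
sample = '{"type": "service_account", "project_id": "example-project"}'
gcp_key = load_gcp_key(sample)
# With a configured client:
# cred = dr.Credential.create_gcp(name="my_gcp_cred", gcp_key=gcp_key)
```
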

Custom Models

class datarobot.models.custom_model_version.CustomModelFileItem(id, file_name, file_path, file_source, created_at=None)

A file item attached to a DataRobot custom model version.

New in version v2.21.

Attributes:
id: str

id of the file item

file_name: str

name of the file item

file_path: str

path of the file item

file_source: str

source of the file item

created_at: str, optional

ISO-8601 formatted timestamp of when the version was created

class datarobot.CustomInferenceModel(**kwargs)

A custom inference model.

New in version v2.21.

Attributes:
id: str

id of the custom model

name: str

name of the custom model

language: str

programming language of the custom model. Can be “python”, “r”, “java” or “other”

description: str

description of the custom model

target_type: datarobot.TARGET_TYPE

target type of the custom inference model. Values: [datarobot.TARGET_TYPE.BINARY, datarobot.TARGET_TYPE.REGRESSION, datarobot.TARGET_TYPE.MULTICLASS, datarobot.TARGET_TYPE.UNSTRUCTURED, datarobot.TARGET_TYPE.ANOMALY]

target_name: str, optional

Target feature name; it is optional (ignored if provided) for datarobot.TARGET_TYPE.UNSTRUCTURED or datarobot.TARGET_TYPE.ANOMALY target type

latest_version: datarobot.CustomModelVersion or None

latest version of the custom model if the model has a latest version

deployments_count: int

number of deployments of the custom model

target_name: str

custom model target name

positive_class_label: str

for binary classification projects, a label of a positive class

negative_class_label: str

for binary classification projects, a label of a negative class

prediction_threshold: float

for binary classification projects, a threshold used for predictions

training_data_assignment_in_progress: bool

flag describing if training data assignment is in progress

training_dataset_id: str, optional

id of a dataset assigned to the custom model

training_dataset_version_id: str, optional

id of a dataset version assigned to the custom model

training_data_file_name: str, optional

name of assigned training data file

training_data_partition_column: str, optional

name of a partition column in a training dataset assigned to the custom model

created_by: str

username of the user who created the custom model

updated_at: str

ISO-8601 formatted timestamp of when the custom model was updated

created_at: str

ISO-8601 formatted timestamp of when the custom model was created

network_egress_policy: datarobot.NETWORK_EGRESS_POLICY, optional

Determines whether the given custom model is isolated, or can access the public network. Can be either ‘datarobot.NONE’ or ‘datarobot.PUBLIC’

maximum_memory: int, optional

The maximum memory that might be allocated by the custom-model. If exceeded, the custom-model will be killed by k8s

replicas: int, optional

A fixed number of replicas that will be deployed in the cluster

classmethod list(is_deployed=None, search_for=None, order_by=None)

List custom inference models available to the user.

New in version v2.21.

Parameters:
is_deployed: bool, optional

flag for filtering custom inference models. If set to True, only deployed custom inference models are returned. If set to False, only custom inference models that are not deployed are returned

search_for: str, optional

string for filtering custom inference models - only custom inference models that contain the string in name or description will be returned. If not specified, all custom models will be returned

order_by: str, optional

property to sort custom inference models by. Supported properties are “created” and “updated”. Prefix the attribute name with a dash to sort in descending order, e.g. order_by=’-created’. By default, the order_by parameter is None which will result in custom models being returned in order of creation time descending

Returns:
List[CustomInferenceModel]

a list of custom inference models.

Raises:
datarobot.errors.ClientError

if the server responded with 4xx status

datarobot.errors.ServerError

if the server responded with 5xx status
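
The order_by convention described above (a supported property name, optionally prefixed with a dash for descending order) can be sketched as a small helper. The helper is hypothetical; only the two property names and the dash prefix come from the parameter description.

```python
SORTABLE_PROPERTIES = ("created", "updated")


def build_order_by(field, descending=False):
    """Return an order_by string such as 'created' or '-created'."""
    if field not in SORTABLE_PROPERTIES:
        raise ValueError(f"order_by must be one of {SORTABLE_PROPERTIES}, got {field!r}")
    return f"-{field}" if descending else field


order_by = build_order_by("updated", descending=True)
# With a configured client:
# models = dr.CustomInferenceModel.list(is_deployed=True, order_by=order_by)
```
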

classmethod get(custom_model_id)

Get custom inference model by id.

New in version v2.21.

Parameters:
custom_model_id: str

id of the custom inference model

Returns:
CustomInferenceModel

retrieved custom inference model

Raises:
datarobot.errors.ClientError

if the server responded with 4xx status.

datarobot.errors.ServerError

if the server responded with 5xx status.

download_latest_version(file_path)

Download the latest custom inference model version.

New in version v2.21.

Parameters:
file_path: str

path to create a file with custom model version content

Raises:
datarobot.errors.ClientError

if the server responded with 4xx status.

datarobot.errors.ServerError

if the server responded with 5xx status.

classmethod create(name, target_type, target_name=None, language=None, description=None, positive_class_label=None, negative_class_label=None, prediction_threshold=None, class_labels=None, class_labels_file=None, network_egress_policy=None, maximum_memory=None, replicas=None)

Create a custom inference model.

New in version v2.21.

Parameters:
name: str

name of the custom inference model

target_type: datarobot.TARGET_TYPE

target type of the custom inference model. Values: [datarobot.TARGET_TYPE.BINARY, datarobot.TARGET_TYPE.REGRESSION, datarobot.TARGET_TYPE.MULTICLASS, datarobot.TARGET_TYPE.UNSTRUCTURED]

target_name: str, optional

Target feature name; it is optional (ignored if provided) for datarobot.TARGET_TYPE.UNSTRUCTURED target type

language: str, optional

programming language of the custom inference model

description: str, optional

description of the custom inference model

positive_class_label: str, optional

custom inference model positive class label for binary classification

negative_class_label: str, optional

custom inference model negative class label for binary classification

prediction_threshold: float, optional

custom inference model prediction threshold

class_labels: List[str], optional

custom inference model class labels for multiclass classification. Cannot be used with class_labels_file

class_labels_file: str, optional

path to file containing newline separated class labels for multiclass classification. Cannot be used with class_labels

network_egress_policy: datarobot.NETWORK_EGRESS_POLICY, optional

Determines whether the given custom model is isolated, or can access the public network. Can be either ‘datarobot.NONE’ or ‘datarobot.PUBLIC’

maximum_memory: int, optional

The maximum memory that might be allocated by the custom-model. If exceeded, the custom-model will be killed by k8s

replicas: int, optional

A fixed number of replicas that will be deployed in the cluster

Returns:
CustomInferenceModel

the created custom inference model

Raises:
datarobot.errors.ClientError

if the server responded with 4xx status

datarobot.errors.ServerError

if the server responded with 5xx status
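
For multiclass models, the parameter descriptions above state that class_labels and class_labels_file cannot be combined and that the file holds newline separated labels. A minimal sketch of both rules follows; the helper names and the sample labels are assumptions made for illustration.

```python
import os
import tempfile


def write_class_labels_file(labels, path):
    """Write one class label per line (newline separated)."""
    labels = list(labels)
    if any("\n" in label for label in labels):
        raise ValueError("a class label must not contain a newline")
    with open(path, "w", encoding="utf-8") as handle:
        handle.write("\n".join(labels))
    return path


def label_arguments(class_labels=None, class_labels_file=None):
    """Return keyword arguments, rejecting the combination that is not allowed."""
    if class_labels is not None and class_labels_file is not None:
        raise ValueError("use class_labels or class_labels_file, not both")
    if class_labels is not None:
        return {"class_labels": list(class_labels)}
    if class_labels_file is not None:
        return {"class_labels_file": class_labels_file}
    return {}


with tempfile.TemporaryDirectory() as tmp_dir:
    labels_path = write_class_labels_file(
        ["setosa", "versicolor", "virginica"], os.path.join(tmp_dir, "labels.txt")
    )
    with open(labels_path, encoding="utf-8") as handle:
        written = handle.read()
    kwargs = label_arguments(class_labels_file=labels_path)
```

The resulting kwargs could then be merged into a call such as dr.CustomInferenceModel.create(name=..., target_type=dr.TARGET_TYPE.MULTICLASS, target_name=..., **kwargs).
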

classmethod copy_custom_model(custom_model_id)

Create a custom inference model by copying an existing one.

New in version v2.21.

Parameters:
custom_model_id: str

id of the custom inference model to copy

Returns:
CustomInferenceModel

the created custom inference model

Raises:
datarobot.errors.ClientError

if the server responded with 4xx status

datarobot.errors.ServerError

if the server responded with 5xx status

update(name=None, language=None, description=None, target_name=None, positive_class_label=None, negative_class_label=None, prediction_threshold=None, class_labels=None, class_labels_file=None)

Update custom inference model properties.

New in version v2.21.

Parameters:
name: str, optional

new custom inference model name

language: str, optional

new custom inference model programming language

description: str, optional

new custom inference model description

target_name: str, optional

new custom inference model target name

positive_class_label: str, optional

new custom inference model positive class label

negative_class_label: str, optional

new custom inference model negative class label

prediction_threshold: float, optional

new custom inference model prediction threshold

class_labels: List[str], optional

custom inference model class labels for multiclass classification. Cannot be used with class_labels_file

class_labels_file: str, optional

path to file containing newline separated class labels for multiclass classification. Cannot be used with class_labels

Raises:
datarobot.errors.ClientError

if the server responded with 4xx status.

datarobot.errors.ServerError

if the server responded with 5xx status.

refresh()

Update custom inference model with the latest data from server.

New in version v2.21.

Raises:
datarobot.errors.ClientError

if the server responded with 4xx status

datarobot.errors.ServerError

if the server responded with 5xx status

delete()

Delete custom inference model.

New in version v2.21.

Raises:
datarobot.errors.ClientError

if the server responded with 4xx status

datarobot.errors.ServerError

if the server responded with 5xx status

assign_training_data(dataset_id, partition_column=None, max_wait=600)

Assign training data to the custom inference model.

New in version v2.21.

Parameters:
dataset_id: str

the id of the training dataset to be assigned

partition_column: str, optional

name of a partition column in the training dataset

max_wait: int, optional

maximum time, in seconds, to wait for the training data assignment. If set to None, the method returns without waiting. Defaults to 600 (10 min)

Raises:
datarobot.errors.ClientError

if the server responded with 4xx status

datarobot.errors.ServerError

if the server responded with 5xx status

class datarobot.CustomModelTest(**kwargs)

A custom model test.

New in version v2.21.

Attributes:
id: str

test id

custom_model_image_id: str

id of a custom model image

image_type: str

the type of the image, either CUSTOM_MODEL_IMAGE_TYPE.CUSTOM_MODEL_IMAGE if the testing attempt is using a CustomModelImage as its model or CUSTOM_MODEL_IMAGE_TYPE.CUSTOM_MODEL_VERSION if the testing attempt is using a CustomModelVersion with dependency management

overall_status: str

a string representing testing status. Status can be:

  • ‘not_tested’: the check was not run
  • ‘failed’: the check failed
  • ‘succeeded’: the check succeeded
  • ‘warning’: the check resulted in a warning, or in a non-critical failure
  • ‘in_progress’: the check is in progress

detailed_status: dict

detailed testing status - maps the testing types to their status and message. The keys of the dict are one of ‘errorCheck’, ‘nullValueImputation’, ‘longRunningService’, ‘sideEffects’. The values are dict with ‘message’ and ‘status’ keys.

created_by: str

a user who created a test

dataset_id: str, optional

id of a dataset used for testing

dataset_version_id: str, optional

id of a dataset version used for testing

completed_at: str, optional

ISO-8601 formatted timestamp of when the test has completed

created_at: str, optional

ISO-8601 formatted timestamp of when the test was created

network_egress_policy: datarobot.NETWORK_EGRESS_POLICY, optional

Determines whether the given custom model is isolated, or can access the public network. Can be either ‘datarobot.NONE’ or ‘datarobot.PUBLIC’

maximum_memory: int, optional

The maximum memory that might be allocated by the custom-model. If exceeded, the custom-model will be killed by k8s

replicas: int, optional

A fixed number of replicas that will be deployed in the cluster
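
The detailed_status attribute maps each testing type to a dict with ‘status’ and ‘message’ keys. The sketch below collects the checks that need attention. It assumes the per-check status uses the same vocabulary as overall_status, which the description above does not state explicitly; the helper name and the sample messages are made up.

```python
def problem_checks(detailed_status, bad_statuses=("failed", "warning")):
    """Map each check whose status is in bad_statuses to its message."""
    return {
        check: info.get("message", "")
        for check, info in detailed_status.items()
        if info.get("status") in bad_statuses
    }


# Made-up sample shaped like the detailed_status description.
example_status = {
    "errorCheck": {"status": "succeeded", "message": ""},
    "nullValueImputation": {"status": "failed", "message": "model returned NaN"},
    "longRunningService": {"status": "not_tested", "message": ""},
    "sideEffects": {"status": "warning", "message": "predictions changed between runs"},
}
problems = problem_checks(example_status)
```
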

classmethod create(custom_model_id, custom_model_version_id, dataset_id=None, max_wait=600, network_egress_policy=None, maximum_memory=None, replicas=None)

Create and start a custom model test.

New in version v2.21.

Parameters:
custom_model_id: str

the id of the custom model

custom_model_version_id: str

the id of the custom model version

dataset_id: str, optional

The id of the testing dataset for non-unstructured custom models. Ignored and not required for unstructured models.

max_wait: int, optional

max time to wait for a test completion. If set to None - method will return without waiting.

network_egress_policy: datarobot.NETWORK_EGRESS_POLICY, optional

Determines whether the given custom model is isolated, or can access the public network. Can be either ‘datarobot.NONE’ or ‘datarobot.PUBLIC’

maximum_memory: int, optional

The maximum memory that might be allocated by the custom-model. If exceeded, the custom-model will be killed by k8s

replicas: int, optional

A fixed number of replicas that will be deployed in the cluster

Returns:
CustomModelTest

created custom model test

Raises:
datarobot.errors.ClientError

if the server responded with 4xx status

datarobot.errors.ServerError

if the server responded with 5xx status

classmethod list(custom_model_id)

List custom model tests.

New in version v2.21.

Parameters:
custom_model_id: str

the id of the custom model

Returns:
List[CustomModelTest]

a list of custom model tests

Raises:
datarobot.errors.ClientError

if the server responded with 4xx status

datarobot.errors.ServerError

if the server responded with 5xx status

classmethod get(custom_model_test_id)

Get custom model test by id.

New in version v2.21.

Parameters:
custom_model_test_id: str

the id of the custom model test

Returns:
CustomModelTest

retrieved custom model test

Raises:
datarobot.errors.ClientError

if the server responded with 4xx status.

datarobot.errors.ServerError

if the server responded with 5xx status.

get_log()

Get log of a custom model test.

New in version v2.21.

Raises:
datarobot.errors.ClientError

if the server responded with 4xx status

datarobot.errors.ServerError

if the server responded with 5xx status

get_log_tail()

Get log tail of a custom model test.

New in version v2.21.

Raises:
datarobot.errors.ClientError

if the server responded with 4xx status

datarobot.errors.ServerError

if the server responded with 5xx status

cancel()

Cancel custom model test that is in progress.

New in version v2.21.

Raises:
datarobot.errors.ClientError

if the server responded with 4xx status

datarobot.errors.ServerError

if the server responded with 5xx status

refresh()

Update custom model test with the latest data from server.

New in version v2.21.

Raises:
datarobot.errors.ClientError

if the server responded with 4xx status

datarobot.errors.ServerError

if the server responded with 5xx status
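
When a test is created with max_wait=None, the call returns without waiting, so the caller has to poll. A minimal polling sketch built only on refresh() and overall_status is shown below; the helper, its timeout handling and the FakeTest stand-in are assumptions for illustration and are not part of the client.

```python
import time


def wait_for_test(test, timeout=600, poll_interval=5,
                  sleep=time.sleep, clock=time.monotonic):
    """Poll until overall_status leaves 'in_progress' and return the final status."""
    deadline = clock() + timeout
    while test.overall_status == "in_progress":
        if clock() >= deadline:
            raise TimeoutError("custom model test is still in progress")
        sleep(poll_interval)
        test.refresh()  # reloads the test with the latest data from the server
    return test.overall_status


class FakeTest:
    """Stand-in that finishes after two refreshes; not a DataRobot class."""

    def __init__(self):
        self.overall_status = "in_progress"
        self._refreshes = 0

    def refresh(self):
        self._refreshes += 1
        if self._refreshes >= 2:
            self.overall_status = "succeeded"


# The sleep function is replaced so the example finishes immediately.
final_status = wait_for_test(FakeTest(), sleep=lambda seconds: None)
```
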

class datarobot.CustomModelVersion(**kwargs)

A version of a DataRobot custom model.

New in version v2.21.

Attributes:
id: str

id of the custom model version

custom_model_id: str

id of the custom model

version_minor: int

a minor version number of custom model version

version_major: int

a major version number of custom model version

is_frozen: bool

a flag if the custom model version is frozen

items: List[CustomModelFileItem]

a list of file items attached to the custom model version

base_environment_id: str

id of the environment to use with the model

base_environment_version_id: str

id of the environment version to use with the model

label: str, optional

short human readable string to label the version

description: str, optional

custom model version description

created_at: str, optional

ISO-8601 formatted timestamp of when the version was created

dependencies: List[CustomDependency]

the parsed dependencies of the custom model version if the version has a valid requirements.txt file

network_egress_policy: datarobot.NETWORK_EGRESS_POLICY, optional

Determines whether the given custom model is isolated, or can access the public network. Can be either ‘datarobot.NONE’ or ‘datarobot.PUBLIC’

maximum_memory: int, optional

The maximum memory that might be allocated by the custom-model. If exceeded, the custom-model will be killed by k8s

replicas: int, optional

A fixed number of replicas that will be deployed in the cluster

required_metadata_values: List[RequiredMetadataValue]

Additional parameters required by the execution environment. The required keys are defined by the fieldNames in the base environment’s requiredMetadataKeys.

classmethod from_server_data(data, keep_attrs=None)

Instantiate an object of this class using the data directly from the server, meaning that the keys may have the wrong camel casing

Parameters:
data : dict

The directly translated dict of JSON from the server. No casing fixes have taken place

keep_attrs : iterable

List, set or tuple of the dotted namespace notations for attributes to keep within the object structure even if their values are None

classmethod create_clean(custom_model_id, base_environment_id, is_major_update=True, folder_path=None, files=None, network_egress_policy=None, maximum_memory=None, replicas=None, required_metadata_values=None)

Create a custom model version without files from previous versions.

New in version v2.21.

Parameters:
custom_model_id: str

the id of the custom model

base_environment_id: str

the id of the base environment to use with the custom model version

is_major_update: bool

the flag defining if a custom model version will be a minor or a major version. Defaults to True

folder_path: str, optional

the path to a folder containing files to be uploaded. Each file in the folder is uploaded under its path relative to the folder path

files: list, optional

the list of tuples, where the values in each tuple are the local filesystem path and the path where the file should be placed in the model. If the list is of strings, then basenames will be used for tuples. Example: [("/home/user/Documents/myModel/file1.txt", "file1.txt"), ("/home/user/Documents/myModel/folder/file2.txt", "folder/file2.txt")] or ["/home/user/Documents/myModel/file1.txt", "/home/user/Documents/myModel/folder/file2.txt"]

network_egress_policy: datarobot.NETWORK_EGRESS_POLICY, optional

Determines whether the given custom model is isolated, or can access the public network. Can be either ‘datarobot.NONE’ or ‘datarobot.PUBLIC’

maximum_memory: int, optional

The maximum memory that might be allocated by the custom-model. If exceeded, the custom-model will be killed by k8s

replicas: int, optional

A fixed number of replicas that will be deployed in the cluster

required_metadata_values: List[RequiredMetadataValue]

Additional parameters required by the execution environment. The required keys are defined by the fieldNames in the base environment’s requiredMetadataKeys.

Returns:
CustomModelVersion

created custom model version

Raises:
datarobot.errors.ClientError

if the server responded with 4xx status

datarobot.errors.ServerError

if the server responded with 5xx status
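
The files argument takes (local path, path in the model) tuples. When only part of a folder should be uploaded, such a list can be built by walking the folder. The following is a sketch under that assumption; the helper name and the suffix filter are hypothetical.

```python
import tempfile
from pathlib import Path


def files_from_folder(folder, suffixes=None):
    """Return (local_path, path_in_model) tuples for the files under folder."""
    root = Path(folder)
    pairs = []
    for path in sorted(root.rglob("*")):
        if not path.is_file():
            continue
        if suffixes is not None and path.suffix not in suffixes:
            continue
        # The second element is relative to the folder and uses forward slashes.
        pairs.append((str(path), path.relative_to(root).as_posix()))
    return pairs


# Build a throwaway folder so the example is reproducible.
with tempfile.TemporaryDirectory() as tmp_dir:
    root = Path(tmp_dir)
    (root / "folder").mkdir()
    (root / "file1.txt").write_text("a")
    (root / "folder" / "file2.txt").write_text("b")
    (root / "folder" / "debug.log").write_text("c")
    files = files_from_folder(root, suffixes={".txt"})

model_paths = [model_path for _, model_path in files]
```

The resulting list has the same shape as the tuple example in the files description and could be passed as files=files to create_clean.
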

classmethod create_from_previous(custom_model_id, base_environment_id, is_major_update=True, folder_path=None, files=None, files_to_delete=None, network_egress_policy=None, maximum_memory=None, replicas=None, required_metadata_values=None)

Create a custom model version containing files from a previous version.

New in version v2.21.

Parameters:
custom_model_id: str

the id of the custom model

base_environment_id: str

the id of the base environment to use with the custom model version

is_major_update: bool, optional

the flag defining if a custom model version will be a minor or a major version. Defaults to True

folder_path: str, optional

the path to a folder containing files to be uploaded. Each file in the folder is uploaded under its path relative to the folder path

files: list, optional

the list of tuples, where the values in each tuple are the local filesystem path and the path where the file should be placed in the model. If the list is of strings, then basenames will be used for tuples. Example: [("/home/user/Documents/myModel/file1.txt", "file1.txt"), ("/home/user/Documents/myModel/folder/file2.txt", "folder/file2.txt")] or ["/home/user/Documents/myModel/file1.txt", "/home/user/Documents/myModel/folder/file2.txt"]

files_to_delete: list, optional

the list of file item IDs to be deleted. Example: ["5ea95f7a4024030aba48e4f9", "5ea6b5da402403181895cc51"]

network_egress_policy: datarobot.NETWORK_EGRESS_POLICY, optional

Determines whether the given custom model is isolated, or can access the public network. Can be either ‘datarobot.NONE’ or ‘datarobot.PUBLIC’

maximum_memory: int, optional

The maximum memory that might be allocated by the custom-model. If exceeded, the custom-model will be killed by k8s

replicas: int, optional

A fixed number of replicas that will be deployed in the cluster

required_metadata_values: List[RequiredMetadataValue]

Additional parameters required by the execution environment. The required keys are defined by the fieldNames in the base environment’s requiredMetadataKeys.

Returns:
CustomModelVersion

created custom model version

Raises:
datarobot.errors.ClientError

if the server responded with 4xx status

datarobot.errors.ServerError

if the server responded with 5xx status
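
Since files_to_delete expects file item IDs rather than paths, the IDs usually have to be looked up in the items of an existing version. The sketch below assumes each item exposes the id and file_path attributes listed for CustomModelFileItem; the helper, the stand-in objects and the third ID are made up for illustration.

```python
from types import SimpleNamespace


def item_ids_for_paths(items, paths_to_remove):
    """Return the ids of the file items whose file_path is in paths_to_remove."""
    wanted = set(paths_to_remove)
    return [item.id for item in items if item.file_path in wanted]


# Stand-ins for the items of a CustomModelVersion; illustrative only.
items = [
    SimpleNamespace(id="5ea95f7a4024030aba48e4f9", file_path="file1.txt"),
    SimpleNamespace(id="5ea6b5da402403181895cc51", file_path="folder/file2.txt"),
    SimpleNamespace(id="000000000000000000000000", file_path="custom.py"),
]
files_to_delete = item_ids_for_paths(items, ["file1.txt", "folder/file2.txt"])
```

With a retrieved version, the call would be item_ids_for_paths(version.items, [...]), and the result could be passed as files_to_delete to create_from_previous.
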

classmethod list(custom_model_id)

List custom model versions.

New in version v2.21.

Parameters:
custom_model_id: str

the id of the custom model

Returns:
List[CustomModelVersion]

a list of custom model versions

Raises:
datarobot.errors.ClientError

if the server responded with 4xx status

datarobot.errors.ServerError

if the server responded with 5xx status

classmethod get(custom_model_id, custom_model_version_id)

Get custom model version by id.

New in version v2.21.

Parameters:
custom_model_id: str

the id of the custom model

custom_model_version_id: str

the id of the custom model version to retrieve

Returns:
CustomModelVersion

retrieved custom model version

Raises:
datarobot.errors.ClientError

if the server responded with 4xx status.

datarobot.errors.ServerError

if the server responded with 5xx status.

download(file_path)

Download custom model version.

New in version v2.21.

Parameters:
file_path: str

path to create a file with custom model version content

Raises:
datarobot.errors.ClientError

if the server responded with 4xx status.

datarobot.errors.ServerError

if the server responded with 5xx status.

update(description=None, required_metadata_values=None)

Update custom model version properties.

New in version v2.21.

Parameters:
description: str

new custom model version description

required_metadata_values: List[RequiredMetadataValue]

Additional parameters required by the execution environment. The required keys are defined by the fieldNames in the base environment’s requiredMetadataKeys.

Raises:
datarobot.errors.ClientError

if the server responded with 4xx status.

datarobot.errors.ServerError

if the server responded with 5xx status.

refresh()

Update custom model version with the latest data from server.

New in version v2.21.

Raises:
datarobot.errors.ClientError

if the server responded with 4xx status

datarobot.errors.ServerError

if the server responded with 5xx status

get_feature_impact(with_metadata=False)

Get custom model feature impact.

New in version v2.23.

Parameters:
with_metadata : bool

The flag indicating if the result should include the metadata as well.

Returns:
feature_impacts : list of dict

The feature impact data. Each item is a dict with the keys ‘featureName’, ‘impactNormalized’, ‘impactUnnormalized’, and ‘redundantWith’.

Raises:
datarobot.errors.ClientError

if the server responded with 4xx status.

datarobot.errors.ServerError

if the server responded with 5xx status.
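
A sketch of how the returned list might be post-processed: it keeps the highest-impact features and, optionally, skips those flagged as redundant. It assumes the default with_metadata=False shape (a plain list of dicts) and that ‘redundantWith’ is empty or None for non-redundant features; the helper name and sample values are made up.

```python
def top_features(feature_impacts, n=5, skip_redundant=True):
    """Return the names of the n features with the highest normalized impact."""
    rows = [
        row for row in feature_impacts
        if not (skip_redundant and row.get("redundantWith"))
    ]
    rows.sort(key=lambda row: row["impactNormalized"], reverse=True)
    return [row["featureName"] for row in rows[:n]]


# Made-up sample using the four documented keys.
sample_impacts = [
    {"featureName": "age", "impactNormalized": 1.0,
     "impactUnnormalized": 0.42, "redundantWith": None},
    {"featureName": "age_in_months", "impactNormalized": 0.97,
     "impactUnnormalized": 0.41, "redundantWith": "age"},
    {"featureName": "income", "impactNormalized": 0.55,
     "impactUnnormalized": 0.23, "redundantWith": None},
    {"featureName": "zip_code", "impactNormalized": 0.08,
     "impactUnnormalized": 0.03, "redundantWith": None},
]
best = top_features(sample_impacts, n=2)
```
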

calculate_feature_impact(max_wait=600)

Calculate custom model feature impact.

New in version v2.23.

Parameters:
max_wait: int, optional

maximum time, in seconds, to wait for the feature impact calculation. If set to None, the method returns without waiting. Defaults to 600 (10 min)

Raises:
datarobot.errors.ClientError

if the server responded with 4xx status

datarobot.errors.ServerError

if the server responded with 5xx status

class datarobot.models.execution_environment.RequiredMetadataKey(**kwargs)

Definition of a metadata key that custom models using this environment must define

New in version v2.25.

Attributes:
field_name: str

The required field key. This value will be added as an environment variable when running custom models.

display_name: str

A human readable name for the required field.

class datarobot.models.CustomModelVersionConversion(**kwargs)

A conversion of a DataRobot custom model version.

New in version v2.27.

Attributes:
id: str

ID of the custom model version conversion.

custom_model_version_id: str

ID of the custom model version.

created: str

ISO-8601 timestamp of when the custom model conversion was created.

main_program_item_id: str or None

ID of the main program item.

log_message: str or None

The conversion output log message.

generated_metadata: dict or None

The dict contains two items: ‘outputDataset’ & ‘outputColumns’.

conversion_succeeded: bool

Whether the conversion succeeded or not.

conversion_in_progress: bool

Whether a given conversion is in progress or not.

should_stop: bool

Whether the user asked to stop a conversion.
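
The three boolean attributes can be combined into a single readable state. The mapping below is an interpretation made for this sketch, not documented behavior: in particular, treating a finished, unsuccessful conversion with should_stop set as "stopped" is an assumption.

```python
from types import SimpleNamespace


def conversion_state(conversion):
    """Summarize the boolean flags of a conversion as one string."""
    if conversion.conversion_in_progress:
        return "stopping" if conversion.should_stop else "running"
    if conversion.conversion_succeeded:
        return "succeeded"
    # Assumed: a stop request explains why a finished conversion did not succeed.
    return "stopped" if conversion.should_stop else "failed"


# Stand-in for a CustomModelVersionConversion; illustrative only.
example = SimpleNamespace(
    conversion_in_progress=False, conversion_succeeded=True, should_stop=False
)
state = conversion_state(example)
```
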

classmethod run_conversion(custom_model_id, custom_model_version_id, main_program_item_id, max_wait=None)

Initiate a new custom model version conversion.

Parameters:
custom_model_id : str

The associated custom model ID.

custom_model_version_id : str

The associated custom model version ID.

main_program_item_id : str

The selected main program item ID. This should be one of the SAS items in the associated custom model version.

max_wait: int or None

Max wait time in seconds. If None, then don’t wait.

Returns:
conversion_id : str

The ID of the newly created conversion entity.

Raises:
datarobot.errors.ClientError

if the server responded with 4xx status

datarobot.errors.ServerError

if the server responded with 5xx status

classmethod stop_conversion(custom_model_id, custom_model_version_id, conversion_id)

Stop a conversion that is in progress.

Parameters:
custom_model_id : str

ID of the associated custom model.

custom_model_version_id : str

ID of the associated custom model version.

conversion_id : str

ID of a conversion that is in-progress.

Raises:
datarobot.errors.ClientError

if the server responded with 4xx status.

datarobot.errors.ServerError

if the server responded with 5xx status.

classmethod get(custom_model_id, custom_model_version_id, conversion_id)

Get custom model version conversion by id.

New in version v2.27.

Parameters:
custom_model_id: str

The ID of the custom model.

custom_model_version_id: str

The ID of the custom model version.

conversion_id: str

The ID of the conversion to retrieve.

Returns:
CustomModelVersionConversion

Retrieved custom model version conversion.

Raises:
datarobot.errors.ClientError

if the server responded with 4xx status.

datarobot.errors.ServerError

if the server responded with 5xx status.

classmethod get_latest(custom_model_id, custom_model_version_id)

Get latest custom model version conversion for a given custom model version.

New in version v2.27.

Parameters:
custom_model_id: str

The ID of the custom model.

custom_model_version_id: str

The ID of the custom model version.

Returns:
CustomModelVersionConversion or None

Retrieved latest conversion for a given custom model version.

Raises:
datarobot.errors.ClientError

if the server responded with 4xx status.

datarobot.errors.ServerError

if the server responded with 5xx status.

classmethod list(custom_model_id, custom_model_version_id)

Get custom model version conversions list per custom model version.

New in version v2.27.

Parameters:
custom_model_id: str

The ID of the custom model.

custom_model_version_id: str

The ID of the custom model version.

Returns:
List[CustomModelVersionConversion]

Retrieved conversions for a given custom model version.

Raises:
datarobot.errors.ClientError

if the server responded with 4xx status.

datarobot.errors.ServerError

if the server responded with 5xx status.

class datarobot.CustomModelVersionDependencyBuild(**kwargs)

Metadata about a DataRobot custom model version’s dependency build

New in version v2.22.

Attributes:
custom_model_id: str

id of the custom model

custom_model_version_id: str

id of the custom model version

build_status: str

the status of the custom model version’s dependency build

started_at: str

ISO-8601 formatted timestamp of when the build was started

completed_at: str, optional

ISO-8601 formatted timestamp of when the build has completed
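
The two timestamps can be combined to measure how long a build took. The following is a minimal sketch and not part of the datarobot client: build_duration_seconds is a hypothetical helper, and it assumes the timestamps are ISO-8601 strings that datetime.fromisoformat can parse once a trailing "Z" is rewritten as "+00:00". The stand-in object only mimics the attributes documented above.

```python
from datetime import datetime
from types import SimpleNamespace


def _parse_iso(timestamp):
    # Assumption: a trailing "Z" means UTC; rewrite it so that
    # datetime.fromisoformat accepts it on older Python versions too.
    if timestamp.endswith("Z"):
        timestamp = timestamp[:-1] + "+00:00"
    return datetime.fromisoformat(timestamp)


def build_duration_seconds(build):
    """Return the build duration in seconds, or None if the build has not completed."""
    if build.completed_at is None:
        return None
    delta = _parse_iso(build.completed_at) - _parse_iso(build.started_at)
    return delta.total_seconds()


# Stand-in with the documented attributes; a real object would come from
# CustomModelVersionDependencyBuild.get_build_info(...).
example = SimpleNamespace(
    started_at="2021-03-01T12:00:00Z",
    completed_at="2021-03-01T12:01:30Z",
)
print(build_duration_seconds(example))
```

Returning None for an unfinished build mirrors the fact that completed_at is optional.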

classmethod get_build_info(custom_model_id, custom_model_version_id)

Retrieve information about a custom model version’s dependency build

New in version v2.22.

Parameters:
custom_model_id: str

the id of the custom model

custom_model_version_id: str

the id of the custom model version

Returns:
CustomModelVersionDependencyBuild

the dependency build information

classmethod start_build(custom_model_id, custom_model_version_id, max_wait=600)

Start the dependency build for a custom model version.

New in version v2.22.

Parameters:
custom_model_id: str

the id of the custom model

custom_model_version_id: str

the id of the custom model version

max_wait: int, optional

maximum time, in seconds, to wait for the build to complete. If set to None, the method returns without waiting.

get_log()

Get log of a custom model version dependency build.

New in version v2.22.

Raises:
datarobot.errors.ClientError

if the server responded with 4xx status

datarobot.errors.ServerError

if the server responded with 5xx status

cancel()

Cancel custom model version dependency build that is in progress.

New in version v2.22.

Raises:
datarobot.errors.ClientError

if the server responded with 4xx status

datarobot.errors.ServerError

if the server responded with 5xx status

refresh()

Update custom model version dependency build with the latest data from server.

New in version v2.22.

Raises:
datarobot.errors.ClientError

if the server responded with 4xx status

datarobot.errors.ServerError

if the server responded with 5xx status

class datarobot.ExecutionEnvironment(**kwargs)

An execution environment entity.

New in version v2.21.

Attributes:
id: str

the id of the execution environment

name: str

the name of the execution environment

description: str, optional

the description of the execution environment

programming_language: str, optional

the programming language of the execution environment. Can be “python”, “r”, “java” or “other”

is_public: bool, optional

public accessibility of environment, visible only for admin user

created_at: str, optional

ISO-8601 formatted timestamp of when the execution environment was created

latest_version: ExecutionEnvironmentVersion, optional

the latest version of the execution environment

classmethod create(name, description=None, programming_language=None, required_metadata_keys=None)

Create an execution environment.

New in version v2.21.

Parameters:
name: str

execution environment name

description: str, optional

execution environment description

programming_language: str, optional

programming language of the environment to be created. Can be “python”, “r”, “java” or “other”. Default value - “other”

required_metadata_keys: List[RequiredMetadataKey]

Definition of the metadata keys that custom models using this environment must define

Returns:
ExecutionEnvironment

created execution environment

Raises:
datarobot.errors.ClientError

if the server responded with 4xx status

datarobot.errors.ServerError

if the server responded with 5xx status
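
Because programming_language accepts only the four values listed above, it can be convenient to validate it locally before calling the server. The following is a minimal sketch under that assumption: environment_create_kwargs is a hypothetical helper, not part of the datarobot client, and it only assembles keyword arguments without making any request.

```python
ALLOWED_LANGUAGES = ("python", "r", "java", "other")  # values listed in the docs above


def environment_create_kwargs(name, description=None, programming_language=None):
    """Build keyword arguments for ExecutionEnvironment.create.

    Unknown languages are rejected locally; arguments left as None are omitted
    so that the documented defaults apply.
    """
    if programming_language is not None and programming_language not in ALLOWED_LANGUAGES:
        raise ValueError(
            f"programming_language must be one of {ALLOWED_LANGUAGES}, "
            f"got {programming_language!r}"
        )
    kwargs = {"name": name}
    if description is not None:
        kwargs["description"] = description
    if programming_language is not None:
        kwargs["programming_language"] = programming_language
    return kwargs


print(environment_create_kwargs("my-env", programming_language="python"))
```

With the client installed and configured, the result could be passed on as dr.ExecutionEnvironment.create(**kwargs).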

classmethod list(search_for=None)

List execution environments available to the user.

New in version v2.21.

Parameters:
search_for: str, optional

the string for filtering execution environment - only execution environments that contain the string in name or description will be returned.

Returns:
List[ExecutionEnvironment]

a list of execution environments.

Raises:
datarobot.errors.ClientError

if the server responded with 4xx status

datarobot.errors.ServerError

if the server responded with 5xx status
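
Since search_for matches a substring in either the name or the description, a caller looking for one specific environment may still need to filter the result. The following is a minimal sketch of that client-side step: find_by_exact_name is a hypothetical helper, and the stand-in objects only mimic the documented id, name and description attributes.

```python
from types import SimpleNamespace


def find_by_exact_name(environments, name):
    """Return the first environment whose name equals `name`, or None."""
    for environment in environments:
        if environment.name == name:
            return environment
    return None


# Stand-ins for what a search for "python" might return: the filter matches
# the name *or* the description, so unrelated environments can appear.
candidates = [
    SimpleNamespace(id="env-1", name="python-3.9-base", description="Base image"),
    SimpleNamespace(id="env-2", name="r-base", description="R image with python bridge"),
    SimpleNamespace(id="env-3", name="python", description="Minimal image"),
]
match = find_by_exact_name(candidates, "python")
print(match.id if match else "not found")
```

With the client installed and configured, the candidates list could instead come from dr.ExecutionEnvironment.list(search_for="python").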

classmethod get(execution_environment_id)

Get execution environment by its id.

New in version v2.21.

Parameters:
execution_environment_id: str

ID of the execution environment to retrieve

Returns:
ExecutionEnvironment

retrieved execution environment

Raises:
datarobot.errors.ClientError

if the server responded with 4xx status

datarobot.errors.ServerError

if the server responded with 5xx status

delete()

Delete execution environment.

New in version v2.21.

Raises:
datarobot.errors.ClientError

if the server responded with 4xx status

datarobot.errors.ServerError

if the server responded with 5xx status

update(name=None, description=None, required_metadata_keys=None)

Update execution environment properties.

New in version v2.21.

Parameters:
name: str, optional

new execution environment name

description: str, optional

new execution environment description

required_metadata_keys: List[RequiredMetadataKey]

Definition of the metadata keys that custom models using this environment must define

Raises:
datarobot.errors.ClientError

if the server responded with 4xx status

datarobot.errors.ServerError

if the server responded with 5xx status

refresh()

Update execution environment with the latest data from server.

New in version v2.21.

Raises:
datarobot.errors.ClientError

if the server responded with 4xx status

datarobot.errors.ServerError

if the server responded with 5xx status

class datarobot.ExecutionEnvironmentVersion(**kwargs)

A version of a DataRobot execution environment.

New in version v2.21.

Attributes:
id: str

the id of the execution environment version

environment_id: str

the id of the execution environment the version belongs to

build_status: str

the status of the execution environment version build

label: str, optional

the label of the execution environment version

description: str, optional

the description of the execution environment version

created_at: str, optional

ISO-8601 formatted timestamp of when the execution environment version was created

docker_context_size: int, optional

The size of the uploaded Docker context in bytes if available or None if not

docker_image_size: int, optional

The size of the built Docker image in bytes if available or None if not

classmethod create(execution_environment_id, docker_context_path, label=None, description=None, max_wait=600)

Create an execution environment version.

New in version v2.21.

Parameters:
execution_environment_id: str

the id of the execution environment

docker_context_path: str

the path to a docker context archive or folder

label: str, optional

short human readable string to label the version

description: str, optional

execution environment version description

max_wait: int, optional

maximum time to wait for a final build status (“success” or “failed”). If set to None, the method returns without waiting.

Returns:
ExecutionEnvironmentVersion

created execution environment version

Raises:
datarobot.errors.AsyncTimeoutError

if the version did not reach a final state within max_wait seconds

datarobot.errors.ClientError

if the server responded with 4xx status

datarobot.errors.ServerError

if the server responded with 5xx status

classmethod list(execution_environment_id, build_status=None)

List execution environment versions available to the user.

New in version v2.21.

Parameters:
execution_environment_id: str

the id of the execution environment

build_status: str, optional

build status of the execution environment version to filter by. See datarobot.enums.EXECUTION_ENVIRONMENT_VERSION_BUILD_STATUS for valid options

Returns:
List[ExecutionEnvironmentVersion]

a list of execution environment versions.

Raises:
datarobot.errors.ClientError

if the server responded with 4xx status

datarobot.errors.ServerError

if the server responded with 5xx status

classmethod get(execution_environment_id, version_id)

Get execution environment version by id.

New in version v2.21.

Parameters:
execution_environment_id: str

the id of the execution environment

version_id: str

the id of the execution environment version to retrieve

Returns:
ExecutionEnvironmentVersion

retrieved execution environment version

Raises:
datarobot.errors.ClientError

if the server responded with 4xx status.

datarobot.errors.ServerError

if the server responded with 5xx status.

download(file_path)

Download execution environment version.

New in version v2.21.

Parameters:
file_path: str

path to create a file with execution environment version content

Returns:
ExecutionEnvironmentVersion

retrieved execution environment version

Raises:
datarobot.errors.ClientError

if the server responded with 4xx status.

datarobot.errors.ServerError

if the server responded with 5xx status.

get_build_log()

Get execution environment version build log and error.

New in version v2.21.

Returns:
Tuple[str, str]

the build log and the build error of the execution environment version. If there is no build error, None is returned for the error.

Raises:
datarobot.errors.ClientError

if the server responded with 4xx status.

datarobot.errors.ServerError

if the server responded with 5xx status.

refresh()

Update execution environment version with the latest data from server.

New in version v2.21.

Raises:
datarobot.errors.ClientError

if the server responded with 4xx status

datarobot.errors.ServerError

if the server responded with 5xx status
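
When a version is created with max_wait=None, the caller has to watch the build itself. The following is a minimal sketch of such a polling loop built on refresh() and build_status. It treats “success” and “failed” as the final statuses, as named in the create() documentation; wait_for_final_status and FakeVersion are hypothetical, and the “processing” status used by the stand-in is an assumption made for the demonstration.

```python
import time

FINAL_STATUSES = frozenset({"success", "failed"})  # final statuses named by create()


def wait_for_final_status(version, max_wait=600, interval=5,
                          sleep=time.sleep, clock=time.monotonic):
    """Call version.refresh() until build_status is final.

    Raises TimeoutError if no final status is reached within max_wait seconds.
    """
    deadline = clock() + max_wait
    while version.build_status not in FINAL_STATUSES:
        if clock() >= deadline:
            raise TimeoutError(
                f"build still {version.build_status!r} after {max_wait} seconds"
            )
        sleep(interval)
        version.refresh()
    return version.build_status


class FakeVersion:
    """Stand-in that reaches "success" on the third refresh (demonstration only)."""

    def __init__(self):
        self.build_status = "processing"  # hypothetical non-final status
        self.refresh_calls = 0

    def refresh(self):
        self.refresh_calls += 1
        if self.refresh_calls >= 3:
            self.build_status = "success"


fake = FakeVersion()
print(wait_for_final_status(fake, max_wait=60, interval=0))
```

The sleep and clock parameters exist only so the loop can be exercised without real delays; a real object could come from ExecutionEnvironmentVersion.create(..., max_wait=None).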

Custom Tasks

class datarobot.CustomTask(id: str, target_type: datarobot.enums.CUSTOM_TASK_TARGET_TYPE, latest_version: Optional[datarobot.models.custom_task_version.CustomTaskVersion], created_at: str, updated_at: str, name: str, description: str, language: datarobot.enums.Enum, created_by: str, calibrate_predictions: Optional[bool] = None)

A custom task. This can be in a partial state or a complete state. When latest_version is None, the task has been initialized with some metadata but is empty, and it is not yet usable for actual training. Once the first CustomTaskVersion has been created, you can put the CustomTask in UserBlueprints to train Models in Projects.

New in version v2.26.

Attributes:
id: str

id of the custom task

name: str

name of the custom task

language: str

programming language of the custom task. Can be “python”, “r”, “java” or “other”

description: str

description of the custom task

target_type: datarobot.enums.CUSTOM_TASK_TARGET_TYPE

the target type of the custom task. One of:

  • datarobot.enums.CUSTOM_TASK_TARGET_TYPE.BINARY
  • datarobot.enums.CUSTOM_TASK_TARGET_TYPE.REGRESSION
  • datarobot.enums.CUSTOM_TASK_TARGET_TYPE.MULTICLASS
  • datarobot.enums.CUSTOM_TASK_TARGET_TYPE.ANOMALY
  • datarobot.enums.CUSTOM_TASK_TARGET_TYPE.TRANSFORM
latest_version: datarobot.CustomTaskVersion or None

latest version of the custom task if the task has a latest version. If the latest version is None, the custom task is not ready for use in user blueprints. You must create its first CustomTaskVersion before you can use the CustomTask

created_by: str

username of the user who created the custom task

updated_at: str

ISO-8601 formatted timestamp of when the custom task was updated

created_at: str

ISO-8601 formatted timestamp of when the custom task was created

calibrate_predictions: bool

whether anomaly predictions should be calibrated to be between 0 and 1 by DR. Only applies to custom estimators with target type datarobot.enums.CUSTOM_TASK_TARGET_TYPE.ANOMALY
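
Because a task whose latest_version is None cannot yet be used in user blueprints, it can help to separate ready tasks from partially created ones. The following is a minimal sketch: split_by_readiness is a hypothetical helper, and the stand-in objects only mimic the documented name and latest_version attributes.

```python
from types import SimpleNamespace


def split_by_readiness(tasks):
    """Return (ready, not_ready) lists based on whether latest_version is set."""
    ready = [task for task in tasks if task.latest_version is not None]
    not_ready = [task for task in tasks if task.latest_version is None]
    return ready, not_ready


# Stand-ins for custom tasks; real objects could come from CustomTask.list().
tasks = [
    SimpleNamespace(name="scaler", latest_version=SimpleNamespace(id="v1")),
    SimpleNamespace(name="draft-estimator", latest_version=None),
]
ready, not_ready = split_by_readiness(tasks)
print([task.name for task in ready], [task.name for task in not_ready])
```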

classmethod from_server_data(data: Union[Dict[str, Any], List[Dict[str, Any]]], keep_attrs: Optional[Iterable[str]] = None) → datarobot.models.custom_task.CustomTask

Instantiate an object of this class using the data directly from the server, meaning that the keys may have the wrong camel casing

Parameters:
data : dict

The directly translated dict of JSON from the server. No casing fixes have taken place

keep_attrs : iterable

List, set or tuple of the dotted namespace notations for attributes to keep within the object structure even if their values are None

classmethod list(order_by: Optional[str] = None, search_for: Optional[str] = None) → List[datarobot.models.custom_task.CustomTask]

List custom tasks available to the user.

New in version v2.26.

Parameters:
search_for: str, optional

string for filtering custom tasks - only tasks that contain the string in name or description will be returned. If not specified, all custom tasks will be returned

order_by: str, optional

property to sort custom tasks by. Supported properties are “created” and “updated”. Prefix the property name with a dash to sort in descending order, e.g. order_by='-created'. By default, order_by is None, which returns custom tasks in descending order of creation time

Returns:
List[CustomTask]

a list of custom tasks.

Raises:
datarobot.errors.ClientError

if the server responded with 4xx status

datarobot.errors.ServerError

if the server responded with 5xx status
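
The order_by convention described above (a supported property name, optionally prefixed with a dash for descending order) can be sketched as a small helper. make_order_by is hypothetical and not part of the datarobot client; it only builds the string that would be passed as order_by.

```python
SORTABLE_FIELDS = ("created", "updated")  # supported properties per the docs above


def make_order_by(field, descending=False):
    """Build an order_by value; a leading dash requests descending order."""
    if field not in SORTABLE_FIELDS:
        raise ValueError(f"order_by supports {SORTABLE_FIELDS}, got {field!r}")
    return f"-{field}" if descending else field


print(make_order_by("created", descending=True))
```

With the client installed and configured, the result could be used as dr.CustomTask.list(order_by=make_order_by("updated")).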

classmethod get(custom_task_id: str) → datarobot.models.custom_task.CustomTask

Get custom task by id.

New in version v2.26.

Parameters:
custom_task_id: str

id of the custom task

Returns:
CustomTask

retrieved custom task

Raises:
datarobot.errors.ClientError

if the server responded with 4xx status.

datarobot.errors.ServerError

if the server responded with 5xx status.

classmethod copy(custom_task_id: str) → datarobot.models.custom_task.CustomTask

Create a custom task by copying an existing one.

New in version v2.26.

Parameters:
custom_task_id: str

id of the custom task to copy

Returns:
CustomTask
Raises:
datarobot.errors.ClientError

if the server responded with 4xx status

datarobot.errors.ServerError

if the server responded with 5xx status

classmethod create(name: str, target_type: datarobot.enums.CUSTOM_TASK_TARGET_TYPE, language: Optional[datarobot.enums.Enum] = None, description: Optional[str] = None, calibrate_predictions: Optional[bool] = None, **kwargs) → datarobot.models.custom_task.CustomTask

Creates only the metadata for a custom task. This task will not be usable until you have created a CustomTaskVersion attached to this task.

New in version v2.26.

Parameters:
name: str

name of the custom task

target_type: datarobot.enums.CUSTOM_TASK_TARGET_TYPE

the target type. Must be one of the following values; anything else will raise an error:

  • datarobot.enums.CUSTOM_TASK_TARGET_TYPE.BINARY
  • datarobot.enums.CUSTOM_TASK_TARGET_TYPE.REGRESSION
  • datarobot.enums.CUSTOM_TASK_TARGET_TYPE.MULTICLASS
  • datarobot.enums.CUSTOM_TASK_TARGET_TYPE.ANOMALY
  • datarobot.enums.CUSTOM_TASK_TARGET_TYPE.TRANSFORM
language: str, optional

programming language of the custom task. Can be “python”, “r”, “java” or “other”

description: str, optional

description of the custom task

calibrate_predictions: bool, optional

whether anomaly predictions should be calibrated to be between 0 and 1 by DR. If None, uses the default value from the DR app (True). Only applies to custom estimators with target type datarobot.enums.CUSTOM_TASK_TARGET_TYPE.ANOMALY

Returns:
CustomTask
Raises:
datarobot.errors.ClientError

if the server responded with 4xx status.

datarobot.errors.ServerError

if the server responded with 5xx status.

update(name: Optional[str] = None, language: Optional[datarobot.enums.Enum] = None, description: Optional[str] = None, **kwargs) → None

Update custom task properties.

New in version v2.26.

Parameters:
name: str, optional

new custom task name

language: str, optional

new custom task programming language

description: str, optional

new custom task description

Raises:
datarobot.errors.ClientError

if the server responded with 4xx status.

datarobot.errors.ServerError

if the server responded with 5xx status.

refresh() → None

Update custom task with the latest data from server.

New in version v2.26.

Raises:
datarobot.errors.ClientError

if the server responded with 4xx status

datarobot.errors.ServerError

if the server responded with 5xx status

delete() → None

Delete custom task.

New in version v2.26.

Raises:
datarobot.errors.ClientError

if the server responded with 4xx status

datarobot.errors.ServerError

if the server responded with 5xx status

download_latest_version(file_path: str) → None

Download the latest custom task version.

New in version v2.26.

Parameters:
file_path: str

the full path of the target zip file

Raises:
datarobot.errors.ClientError

if the server responded with 4xx status.

datarobot.errors.ServerError

if the server responded with 5xx status.

get_access_list() → List[datarobot.models.sharing.SharingAccess]

Retrieve access control settings of this custom task.

New in version v2.27.

Returns:
list of SharingAccess

share(access_list: List[datarobot.models.sharing.SharingAccess]) → None

Update the access control settings of this custom task.

New in version v2.27.

Parameters:
access_list : list of SharingAccess

A list of SharingAccess to update.

Raises:
datarobot.errors.ClientError

if the server responded with 4xx status

datarobot.errors.ServerError

if the server responded with 5xx status

Examples

Transfer access to the custom task from old_user@datarobot.com to new_user@datarobot.com

import datarobot as dr

new_access = dr.SharingAccess('new_user@datarobot.com',
                              dr.enums.SHARING_ROLE.OWNER, can_share=True)
access_list = [dr.SharingAccess('old_user@datarobot.com', None), new_access]

dr.CustomTask.get('custom-task-id').share(access_list)
class datarobot.models.custom_task_version.CustomTaskFileItem(id, file_name, file_path, file_source, created_at=None)

A file item attached to a DataRobot custom task version.

New in version v2.26.

Attributes:
id: str

id of the file item

file_name: str

name of the file item

file_path: str

path of the file item

file_source: str

source of the file item

created_at: str

ISO-8601 formatted timestamp of when the version was created

class datarobot.CustomTaskVersion(id, custom_task_id, version_major, version_minor, label, created_at, is_frozen, items, description=None, base_environment_id=None, maximum_memory=None, base_environment_version_id=None, dependencies=None, required_metadata_values=None, arguments=None)

A version of a DataRobot custom task.

New in version v2.26.

Attributes:
id: str

id of the custom task version

custom_task_id: str

id of the custom task

version_minor: int

a minor version number of custom task version

version_major: int

a major version number of custom task version

label: str

short human readable string to label the version

created_at: str

ISO-8601 formatted timestamp of when the version was created

is_frozen: bool

whether the custom task version is frozen

items: List[CustomTaskFileItem]

a list of file items attached to the custom task version

description: str, optional

custom task version description

base_environment_id: str, optional

id of the environment to use with the task

base_environment_version_id: str, optional

id of the environment version to use with the task

dependencies: List[CustomDependency]

the parsed dependencies of the custom task version if the version has a valid requirements.txt file

required_metadata_values: List[RequiredMetadataValue]

Additional parameters required by the execution environment. The required keys are defined by the fieldNames in the base environment’s requiredMetadataKeys.

arguments: List[UserBlueprintTaskArgument]

A list of custom task version arguments.

classmethod from_server_data(data, keep_attrs=None)

Instantiate an object of this class using the data directly from the server, meaning that the keys may have the wrong camel casing

Parameters:
data : dict

The directly translated dict of JSON from the server. No casing fixes have taken place

keep_attrs : iterable

List, set or tuple of the dotted namespace notations for attributes to keep within the object structure even if their values are None

classmethod create_clean(custom_task_id, base_environment_id, maximum_memory=None, is_major_update=True, folder_path=None, required_metadata_values=None)

Create a custom task version without files from previous versions.

New in version v2.26.

Parameters:
custom_task_id: str

the id of the custom task

base_environment_id: str

the id of the base environment to use with the custom task version

is_major_update: bool, optional

if the current version is 2.3, True would set the new version at 3.0. False would set the new version at 2.4. Defaults to True

folder_path: str, optional

the path to a folder containing files to be uploaded. Each file in the folder is uploaded under its path relative to the folder

required_metadata_values: List[RequiredMetadataValue]

Additional parameters required by the execution environment. The required keys are defined by the fieldNames in the base environment’s requiredMetadataKeys.

maximum_memory: int

The maximum amount of memory, in bytes, that custom tasks’ inference containers can run with.

Returns:
CustomTaskVersion

created custom task version

Raises:
datarobot.errors.ClientError

if the server responded with 4xx status

datarobot.errors.ServerError

if the server responded with 5xx status
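
The effect of is_major_update on the version number can be written out as simple arithmetic. The following is a minimal sketch that only mirrors the documented example (2.3 becomes 3.0 or 2.4); next_version is a hypothetical helper, and the actual numbers are assigned by the server.

```python
def next_version(version_major, version_minor, is_major_update=True):
    """Return the (major, minor) pair expected after creating a new version."""
    if is_major_update:
        # A major update bumps the major number and resets the minor number.
        return version_major + 1, 0
    # A minor update keeps the major number and bumps the minor number.
    return version_major, version_minor + 1


print(next_version(2, 3, is_major_update=True))
print(next_version(2, 3, is_major_update=False))
```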

classmethod create_from_previous(custom_task_id, base_environment_id, is_major_update=True, folder_path=None, files_to_delete=None, required_metadata_values=None, maximum_memory=None)

Create a custom task version containing files from a previous version.

New in version v2.26.

Parameters:
custom_task_id: str

the id of the custom task

base_environment_id: str

the id of the base environment to use with the custom task version

is_major_update: bool, optional

if the current version is 2.3, True would set the new version at 3.0. False would set the new version at 2.4. Defaults to True

folder_path: str, optional

the path to a folder containing files to be uploaded. Each file in the folder is uploaded under its path relative to the folder

files_to_delete: list, optional

the list of file item ids to be deleted. Example: [“5ea95f7a4024030aba48e4f9”, “5ea6b5da402403181895cc51”]

required_metadata_values: List[RequiredMetadataValue]

Additional parameters required by the execution environment. The required keys are defined by the fieldNames in the base environment’s requiredMetadataKeys.

maximum_memory: int

The maximum amount of memory, in bytes, that custom tasks’ inference containers can run with.

Returns:
CustomTaskVersion

created custom task version

Raises:
datarobot.errors.ClientError

if the server responded with 4xx status

datarobot.errors.ServerError

if the server responded with 5xx status
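
Since files_to_delete expects file item ids rather than paths, the ids usually have to be looked up from the items of an existing version. The following is a minimal sketch of that lookup: file_ids_matching is a hypothetical helper, the stand-in objects only mimic the id and file_path attributes of CustomTaskFileItem, and the file paths are made up for the demonstration.

```python
from fnmatch import fnmatchcase
from types import SimpleNamespace


def file_ids_matching(items, pattern):
    """Return ids of file items whose file_path matches a shell-style pattern."""
    return [item.id for item in items if fnmatchcase(item.file_path, pattern)]


# Stand-ins for CustomTaskVersion.items; ids and paths are illustrative only.
items = [
    SimpleNamespace(id="file-id-1", file_path="custom.py"),
    SimpleNamespace(id="5ea95f7a4024030aba48e4f9", file_path="artifacts/model.pkl"),
    SimpleNamespace(id="5ea6b5da402403181895cc51", file_path="artifacts/old.pkl"),
]
files_to_delete = file_ids_matching(items, "artifacts/*")
print(files_to_delete)
```

The resulting list could then be passed as files_to_delete to create_from_previous. fnmatchcase is used so that matching is case-sensitive on every platform.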

classmethod list(custom_task_id)

List custom task versions.

New in version v2.26.

Parameters:
custom_task_id: str

the id of the custom task

Returns:
List[CustomTaskVersion]

a list of custom task versions

Raises:
datarobot.errors.ClientError

if the server responded with 4xx status

datarobot.errors.ServerError

if the server responded with 5xx status

classmethod get(custom_task_id, custom_task_version_id)

Get custom task version by id.

New in version v2.26.

Parameters:
custom_task_id: str

the id of the custom task

custom_task_version_id: str

the id of the custom task version to retrieve

Returns:
CustomTaskVersion

retrieved custom task version

Raises:
datarobot.errors.ClientError

if the server responded with 4xx status.

datarobot.errors.ServerError

if the server responded with 5xx status.

download(file_path)

Download custom task version.

New in version v2.26.

Parameters:
file_path: str

path to create a file with custom task version content

Raises:
datarobot.errors.ClientError

if the server responded with 4xx status.

datarobot.errors.ServerError

if the server responded with 5xx status.

update(description=None, required_metadata_values=None)

Update custom task version properties.

New in version v2.26.

Parameters:
description: str

new custom task version description

required_metadata_values: List[RequiredMetadataValue]

Additional parameters required by the execution environment. The required keys are defined by the fieldNames in the base environment’s requiredMetadataKeys.

Raises:
datarobot.errors.ClientError

if the server responded with 4xx status.

datarobot.errors.ServerError

if the server responded with 5xx status.

refresh()

Update custom task version with the latest data from server.

New in version v2.26.

Raises:
datarobot.errors.ClientError

if the server responded with 4xx status

datarobot.errors.ServerError

if the server responded with 5xx status

start_dependency_build()

Start the dependency build for a custom task version and return build status.

New in version v2.27.

Returns:
CustomTaskVersionDependencyBuild

DTO of custom task version dependency build.

start_dependency_build_and_wait(max_wait)

Start the dependency build for a custom task version and wait while polling its status.

New in version v2.27.

Parameters:
max_wait: int

max time to wait for a build completion

Returns:
CustomTaskVersionDependencyBuild

DTO of custom task version dependency build.

Raises:
datarobot.errors.ClientError

if the server responded with 4xx status

datarobot.errors.ServerError

if the server responded with 5xx status

datarobot.errors.AsyncTimeoutError

Raised if the dependency build is not finished after max_wait.

cancel_dependency_build()

Cancel custom task version dependency build that is in progress.

New in version v2.27.

Raises:
datarobot.errors.ClientError

if the server responded with 4xx status

datarobot.errors.ServerError

if the server responded with 5xx status

get_dependency_build()

Retrieve information about a custom task version’s dependency build.

New in version v2.27.

Returns:
CustomTaskVersionDependencyBuild

DTO of custom task version dependency build.

download_dependency_build_log(file_directory='.')

Get log of a custom task version dependency build.

New in version v2.27.

Parameters:
file_directory: str (optional, default is “.”)

Directory path where the downloaded file is saved.

Raises:
datarobot.errors.ClientError

if the server responded with 4xx status

datarobot.errors.ServerError

if the server responded with 5xx status

Database Connectivity

class datarobot.DataDriver(id: Optional[str] = None, creator: Optional[str] = None, base_names: Optional[List[str]] = None, class_name: Optional[str] = None, canonical_name: Optional[str] = None)

A data driver

Attributes:
id : str

the id of the driver.

class_name : str

the Java class name for the driver.

canonical_name : str

the user-friendly name of the driver.

creator : str

the id of the user who created the driver.

base_names : list of str

a list of the file name(s) of the jar files.

classmethod list() → List[datarobot.models.driver.DataDriver]

Returns list of available drivers.

Returns:
drivers : list of DataDriver instances

contains a list of available drivers.

Examples

>>> import datarobot as dr
>>> drivers = dr.DataDriver.list()
>>> drivers
[DataDriver('mysql'), DataDriver('RedShift'), DataDriver('PostgreSQL')]
classmethod get(driver_id: str) → datarobot.models.driver.DataDriver

Gets the driver.

Parameters:
driver_id : str

the identifier of the driver.

Returns:
driver : DataDriver

the required driver.

Examples

>>> import datarobot as dr
>>> driver = dr.DataDriver.get('5ad08a1889453d0001ea7c5c')
>>> driver
DataDriver('PostgreSQL')
classmethod create(class_name: str, canonical_name: str, files: List[str]) → datarobot.models.driver.DataDriver

Creates the driver. Only available to admin users.

Parameters:
class_name : str

the Java class name for the driver.

canonical_name : str

the user-friendly name of the driver.

files : list of str

a list of file paths on the local file system for the driver's jar file(s).

Returns:
driver : DataDriver

the created driver.

Raises:
ClientError

raised if the user has not been granted the “Can manage JDBC database drivers” feature

Examples

>>> import datarobot as dr
>>> driver = dr.DataDriver.create(
...     class_name='org.postgresql.Driver',
...     canonical_name='PostgreSQL',
...     files=['/tmp/postgresql-42.2.2.jar']
... )
>>> driver
DataDriver('PostgreSQL')
update(class_name: Optional[str] = None, canonical_name: Optional[str] = None) → None

Updates the driver. Only available to admin users.

Parameters:
class_name : str

the Java class name for the driver.

canonical_name : str

the user-friendly name of the driver.

Raises:
ClientError

raised if the user has not been granted the “Can manage JDBC database drivers” feature

Examples

>>> import datarobot as dr
>>> driver = dr.DataDriver.get('5ad08a1889453d0001ea7c5c')
>>> driver.canonical_name
'PostgreSQL'
>>> driver.update(canonical_name='postgres')
>>> driver.canonical_name
'postgres'
delete() → None

Removes the driver. Only available to admin users.

Raises:
ClientError

raised if the user has not been granted the “Can manage JDBC database drivers” feature

class datarobot.Connector(id: Optional[str] = None, creator_id: Optional[str] = None, configuration_id: Optional[str] = None, base_name: Optional[str] = None, canonical_name: Optional[str] = None)

A connector

Attributes:
id : str

the id of the connector.

creator_id : str

the id of the user who created the connector.

base_name : str

the file name of the jar file.

canonical_name : str

the user-friendly name of the connector.

configuration_id : str

the id of the configuration of the connector.

classmethod list() → List[datarobot.models.connector.Connector]

Returns list of available connectors.

Returns:
connectors : list of Connector instances

contains a list of available connectors.

Examples

>>> import datarobot as dr
>>> connectors = dr.Connector.list()
>>> connectors
[Connector('ADLS Gen2 Connector'), Connector('S3 Connector')]
classmethod get(connector_id: str) → datarobot.models.connector.Connector

Gets the connector.

Parameters:
connector_id : str

the identifier of the connector.

Returns:
connector : Connector

the required connector.

Examples

>>> import datarobot as dr
>>> connector = dr.Connector.get('5fe1063e1c075e0245071446')
>>> connector
Connector('ADLS Gen2 Connector')
classmethod create(file_path: str) → datarobot.models.connector.Connector

Creates the connector from a jar file. Only available to admin users.

Parameters:
file_path : str

the path on the local file system to the connector's jar file.

Returns:
connector : Connector

the created connector.

Raises:
ClientError

raised if the user has not been granted the “Can manage connectors” feature

Examples

>>> import datarobot as dr
>>> connector = dr.Connector.create('/tmp/connector-adls-gen2.jar')
>>> connector
Connector('ADLS Gen2 Connector')
update(file_path: str) → datarobot.models.connector.Connector

Updates the connector with a new jar file. Only available to admin users.

Parameters:
file_path : str

the path on the local file system to the connector's new jar file.

Returns:
connector : Connector

the updated connector.

Raises:
ClientError

raised if the user has not been granted the “Can manage connectors” feature

Examples

>>> import datarobot as dr
>>> connector = dr.Connector.get('5fe1063e1c075e0245071446')
>>> connector.base_name
'connector-adls-gen2.jar'
>>> connector.update('/tmp/connector-s3.jar')
>>> connector.base_name
'connector-s3.jar'
delete() → None

Removes the connector. Only available to admin users.

Raises:
ClientError

raised if the user has not been granted the “Can manage connectors” feature.

class datarobot.DataStore(data_store_id: Optional[str] = None, data_store_type: Optional[str] = None, canonical_name: Optional[str] = None, creator: Optional[str] = None, updated: Optional[datetime.datetime] = None, params: Optional[datarobot.models.data_store.DataStoreParameters] = None, role: Optional[str] = None)

A data store. Represents a database.

Attributes:
id : str

The id of the data store.

data_store_type : str

The type of data store.

canonical_name : str

The user-friendly name of the data store.

creator : str

The id of the user who created the data store.

updated : datetime.datetime

The time of the last update

params : DataStoreParameters

A list specifying data store parameters.

role : str

Your access role for this data store.

classmethod list() → List[datarobot.models.data_store.DataStore]

Returns list of available data stores.

Returns:
data_stores : list of DataStore instances

contains a list of available data stores.

Examples

>>> import datarobot as dr
>>> data_stores = dr.DataStore.list()
>>> data_stores
[DataStore('Demo'), DataStore('Airlines')]
classmethod get(data_store_id: str) → datarobot.models.data_store.DataStore

Gets the data store.

Parameters:
data_store_id : str

the identifier of the data store.

Returns:
data_store : DataStore

the required data store.

Examples

>>> import datarobot as dr
>>> data_store = dr.DataStore.get('5a8ac90b07a57a0001be501e')
>>> data_store
DataStore('Demo')
classmethod create(data_store_type: str, canonical_name: str, driver_id: str, jdbc_url: str) → datarobot.models.data_store.DataStore

Creates the data store.

Parameters:
data_store_type : str

the type of data store.

canonical_name : str

the user-friendly name of the data store.

driver_id : str

the identifier of the DataDriver.

jdbc_url : str

the full JDBC url, for example jdbc:postgresql://my.dbaddress.org:5432/my_db.

Returns:
data_store : DataStore

the created data store.

Examples

>>> import datarobot as dr
>>> data_store = dr.DataStore.create(
...     data_store_type='jdbc',
...     canonical_name='Demo DB',
...     driver_id='5a6af02eb15372000117c040',
...     jdbc_url='jdbc:postgresql://my.db.address.org:5432/perftest'
... )
>>> data_store
DataStore('Demo DB')
update(canonical_name: Optional[str] = None, driver_id: Optional[str] = None, jdbc_url: Optional[str] = None) → None

Updates the data store.

Parameters:
canonical_name : str

optional, the user-friendly name of the data store.

driver_id : str

optional, the identifier of the DataDriver.

jdbc_url : str

optional, the full JDBC url, for example jdbc:postgresql://my.dbaddress.org:5432/my_db.

Examples

>>> import datarobot as dr
>>> data_store = dr.DataStore.get('5ad5d2afef5cd700014d3cae')
>>> data_store
DataStore('Demo DB')
>>> data_store.update(canonical_name='Demo DB updated')
>>> data_store
DataStore('Demo DB updated')
delete() → None

Removes the DataStore

test(username: str, password: str) → TestResponse

Tests database connection.

Parameters:
username : str

the username for database authentication.

password : str

the password for database authentication. The password is encrypted on the server side and is never saved or stored.

Returns:
message : dict

message with status.

Examples

>>> import datarobot as dr
>>> data_store = dr.DataStore.get('5ad5d2afef5cd700014d3cae')
>>> data_store.test(username='db_username', password='db_password')
{'message': 'Connection successful'}
schemas(username: str, password: str) → SchemasResponse

Returns list of available schemas.

Parameters:
username : str

the username for database authentication.

password : str

the password for database authentication. The password is encrypted on the server side and is never saved or stored.

Returns:
response : dict

dict with the catalog (database) name and a list of str naming the available schemas.

Examples

>>> import datarobot as dr
>>> data_store = dr.DataStore.get('5ad5d2afef5cd700014d3cae')
>>> data_store.schemas(username='db_username', password='db_password')
{'catalog': 'perftest', 'schemas': ['demo', 'information_schema', 'public']}
tables(username: str, password: str, schema: Optional[str] = None) → TablesResponse

Returns list of available tables in schema.

Parameters:
username : str

the username for database authentication.

password : str

the password for database authentication. The password is encrypted on the server side and is never saved or stored.

schema : str

optional, the schema name.

Returns:
response : dict

dict with catalog name and tables info

Examples

>>> import datarobot as dr
>>> data_store = dr.DataStore.get('5ad5d2afef5cd700014d3cae')
>>> data_store.tables(username='db_username', password='db_password', schema='demo')
{'tables': [{'type': 'TABLE', 'name': 'diagnosis', 'schema': 'demo'}, {'type': 'TABLE',
'name': 'kickcars', 'schema': 'demo'}, {'type': 'TABLE', 'name': 'patient',
'schema': 'demo'}, {'type': 'TABLE', 'name': 'transcript', 'schema': 'demo'}],
'catalog': 'perftest'}
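The dictionary returned by tables can be reduced to plain table names with ordinary Python. The sketch below is illustrative only: table_names is a hypothetical helper, not part of the datarobot package, and the sample response is copied from the example above.

```python
from typing import Any, Dict, List, Optional


def table_names(response: Dict[str, Any], table_type: Optional[str] = None) -> List[str]:
    """Return the names of the tables listed in a tables() response.

    table_names is a hypothetical helper, not part of the datarobot package.
    If table_type is given, only entries whose 'type' equals it are kept.
    """
    return [
        table["name"]
        for table in response.get("tables", [])
        if table_type is None or table.get("type") == table_type
    ]


# Sample response copied from the tables() example above.
response = {
    "tables": [
        {"type": "TABLE", "name": "diagnosis", "schema": "demo"},
        {"type": "TABLE", "name": "kickcars", "schema": "demo"},
        {"type": "TABLE", "name": "patient", "schema": "demo"},
        {"type": "TABLE", "name": "transcript", "schema": "demo"},
    ],
    "catalog": "perftest",
}

names = table_names(response, table_type="TABLE")
```

A name picked from this list could then be used as the table value of a DataSourceParameters object.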
classmethod from_server_data(data: Union[Dict[str, Any], List[Dict[str, Any]]], keep_attrs: Optional[List[str]] = None) → datarobot.models.data_store.DataStore

Instantiate an object of this class using the data directly from the server, meaning that the keys may have the wrong camel casing

Parameters:
data : dict

The directly translated dict of JSON from the server. No casing fixes have taken place

keep_attrs : iterable

List, set or tuple of the dotted namespace notations for attributes to keep within the object structure even if their values are None

get_access_list() → List[datarobot.models.sharing.SharingAccess]

Retrieve what users have access to this data store

New in version v2.14.

Returns:
list of SharingAccess
share(access_list: List[datarobot.models.sharing.SharingAccess]) → None

Modify the ability of users to access this data store

New in version v2.14.

Parameters:
access_list : list of SharingAccess

the modifications to make.

Raises:
datarobot.ClientError :

if you do not have permission to share this data store, if the user you’re sharing with doesn’t exist, if the same user appears multiple times in the access_list, or if these changes would leave the data store without an owner.

Examples

Transfer access to the data store from old_user@datarobot.com to new_user@datarobot.com

import datarobot as dr

new_access = dr.SharingAccess("new_user@datarobot.com",
                              dr.enums.SHARING_ROLE.OWNER, can_share=True)
access_list = [dr.SharingAccess("old_user@datarobot.com", None), new_access]

dr.DataStore.get('my-data-store-id').share(access_list)
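Because share raises ClientError when the same user appears more than once in access_list, it may help to check the planned usernames before building the list. A minimal sketch, assuming plain username strings; duplicate_usernames is a hypothetical helper, not part of the datarobot package.

```python
from collections import Counter
from typing import Iterable, List


def duplicate_usernames(usernames: Iterable[str]) -> List[str]:
    """Return, in sorted order, every username that occurs more than once."""
    counts = Counter(usernames)
    return sorted(name for name, count in counts.items() if count > 1)


planned = [
    "old_user@datarobot.com",
    "new_user@datarobot.com",
    "old_user@datarobot.com",  # accidental repeat
]
duplicates = duplicate_usernames(planned)
```

An empty result suggests the usernames are safe to turn into SharingAccess entries; the server remains the authority on what it accepts.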
class datarobot.DataSource(data_source_id: Optional[str] = None, data_source_type: Optional[str] = None, canonical_name: Optional[str] = None, creator: Optional[str] = None, updated: Optional[datetime.datetime] = None, params: Optional[datarobot.models.data_source.DataSourceParameters] = None, role: Optional[str] = None)

A data source. Represents a data request.

Attributes:
id : str

the id of the data source.

type : str

the type of data source.

canonical_name : str

the user-friendly name of the data source.

creator : str

the id of the user who created the data source.

updated : datetime.datetime

the time of the last update.

params : DataSourceParameters

a list specifying data source parameters.

role : str or None

if a string, represents a particular level of access and should be one of datarobot.enums.SHARING_ROLE. For more information on the specific access levels, see the sharing documentation. If None, can be passed to a share function to revoke access for a specific user.

classmethod list() → List[datarobot.models.data_source.DataSource]

Returns list of available data sources.

Returns:
data_sources : list of DataSource instances

contains a list of available data sources.

Examples

>>> import datarobot as dr
>>> data_sources = dr.DataSource.list()
>>> data_sources
[DataSource('Diagnostics'), DataSource('Airlines 100mb'), DataSource('Airlines 10mb')]
classmethod get(data_source_id: str) → TDataSource

Gets the data source.

Parameters:
data_source_id : str

the identifier of the data source.

Returns:
data_source : DataSource

the requested data source.

Examples

>>> import datarobot as dr
>>> data_source = dr.DataSource.get('5a8ac9ab07a57a0001be501f')
>>> data_source
DataSource('Diagnostics')
classmethod create(data_source_type: str, canonical_name: str, params: datarobot.models.data_source.DataSourceParameters) → TDataSource

Creates the data source.

Parameters:
data_source_type : str

the type of data source.

canonical_name : str

the user-friendly name of the data source.

params : DataSourceParameters

a list specifying data source parameters.

Returns:
data_source : DataSource

the created data source.

Examples

>>> import datarobot as dr
>>> params = dr.DataSourceParameters(
...     data_store_id='5a8ac90b07a57a0001be501e',
...     query='SELECT * FROM airlines10mb WHERE "Year" >= 1995;'
... )
>>> data_source = dr.DataSource.create(
...     data_source_type='jdbc',
...     canonical_name='airlines stats after 1995',
...     params=params
... )
>>> data_source
DataSource('airlines stats after 1995')
update(canonical_name: Optional[str] = None, params: Optional[datarobot.models.data_source.DataSourceParameters] = None) → None

Updates the data source.

Parameters:
canonical_name : str

optional, the user-friendly name of the data source.

params : DataSourceParameters

optional, the new parameters of the data source.

Examples

>>> import datarobot as dr
>>> data_source = dr.DataSource.get('5ad840cc613b480001570953')
>>> data_source
DataSource('airlines stats after 1995')
>>> params = dr.DataSourceParameters(
...     query='SELECT * FROM airlines10mb WHERE "Year" >= 1990;'
... )
>>> data_source.update(
...     canonical_name='airlines stats after 1990',
...     params=params
... )
>>> data_source
DataSource('airlines stats after 1990')
delete() → None

Removes the DataSource

classmethod from_server_data(data: ServerDataType, keep_attrs: Optional[Iterable[str]] = None) → TDataSource

Instantiate an object of this class using the data directly from the server, meaning that the keys may have the wrong camel casing

Parameters:
data : dict

The directly translated dict of JSON from the server. No casing fixes have taken place

keep_attrs : iterable

List, set or tuple of the dotted namespace notations for attributes to keep within the object structure even if their values are None

get_access_list() → List[datarobot.models.sharing.SharingAccess]

Retrieve what users have access to this data source

New in version v2.14.

Returns:
list of SharingAccess
share(access_list: List[datarobot.models.sharing.SharingAccess]) → None

Modify the ability of users to access this data source

New in version v2.14.

Parameters:
access_list : list of SharingAccess

The modifications to make.

Raises:
datarobot.ClientError:

If you do not have permission to share this data source, if the user you’re sharing with doesn’t exist, if the same user appears multiple times in the access_list, or if these changes would leave the data source without an owner.

Examples

Transfer access to the data source from old_user@datarobot.com to new_user@datarobot.com

from datarobot.enums import SHARING_ROLE
from datarobot.models.data_source import DataSource
from datarobot.models.sharing import SharingAccess

new_access = SharingAccess(
    "new_user@datarobot.com",
    SHARING_ROLE.OWNER,
    can_share=True,
)
access_list = [
    SharingAccess("old_user@datarobot.com", SHARING_ROLE.OWNER, can_share=True),
    new_access,
]

DataSource.get('my-data-source-id').share(access_list)
create_dataset(username: Optional[str] = None, password: Optional[str] = None, do_snapshot: Optional[bool] = None, persist_data_after_ingestion: Optional[bool] = None, categories: Optional[List[str]] = None, credential_id: Optional[str] = None, use_kerberos: Optional[bool] = None) → datarobot.models.dataset.Dataset

Create a Dataset from this data source.

New in version v2.22.

Parameters:
username: string, optional

The username for database authentication.

password: string, optional

The password (in cleartext) for database authentication. The password is encrypted on the server side as part of the HTTP request and is never saved or stored.

do_snapshot: bool, optional

If unset, uses the server default: True. If true, creates a snapshot dataset; if false, creates a remote dataset. Creating snapshots from non-file sources requires an additional permission, Enable Create Snapshot Data Source.

persist_data_after_ingestion: bool, optional

If unset, uses the server default: True. If true, will enforce saving all data (for download and sampling) and will allow a user to view the extended data profile (which includes data statistics like min/max/median/mean, histogram, etc.). If false, will not enforce saving data. The data schema (feature names and types) will still be available. Setting this parameter to false and do_snapshot to true will result in an error.

categories: list[string], optional

An array of strings describing the intended use of the dataset. The current supported options are “TRAINING” and “PREDICTION”.

credential_id: string, optional

The ID of the set of credentials to use instead of user and password. Note that with this change, username and password will become optional.

use_kerberos: bool, optional

If unset, uses the server default: False. If true, use kerberos authentication for database authentication.

Returns:
response: Dataset

The Dataset created from the uploaded data

class datarobot.DataSourceParameters(data_store_id: Optional[str] = None, table: Optional[str] = None, schema: Optional[str] = None, partition_column: Optional[str] = None, query: Optional[str] = None, fetch_size: Optional[int] = None)

Data request configuration

Attributes:
data_store_id : str

the id of the DataStore.

table : str

optional, the name of specified database table.

schema : str

optional, the name of the schema associated with the table.

partition_column : str

optional, the name of the partition column.

query : str

optional, the user specified SQL query.

fetch_size : int

optional, a user-specified fetch size in the range [1, 20000]. By default, a fetch size is assigned to balance throughput and memory usage.
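The documented range for fetch_size can be checked on the client before a DataSourceParameters object is built. The sketch below is an assumption-laden illustration: checked_fetch_size and the two constants are hypothetical names, not part of the datarobot package, and the server may apply its own validation.

```python
from typing import Optional

# Bounds taken from the fetch_size description above.
FETCH_SIZE_MIN = 1
FETCH_SIZE_MAX = 20000


def checked_fetch_size(fetch_size: Optional[int]) -> Optional[int]:
    """Return fetch_size unchanged if it is None or lies within [1, 20000].

    checked_fetch_size is a hypothetical helper, not part of the datarobot package.
    None is passed through so that the default fetch size is used.
    """
    if fetch_size is None:
        return None
    if not FETCH_SIZE_MIN <= fetch_size <= FETCH_SIZE_MAX:
        raise ValueError(
            f"fetch_size must be in [{FETCH_SIZE_MIN}, {FETCH_SIZE_MAX}], got {fetch_size}"
        )
    return fetch_size


size = checked_fetch_size(5000)
```

The returned value could then be passed as the fetch_size argument of DataSourceParameters.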

Datasets

class datarobot.models.Dataset(dataset_id: str, version_id: str, name: str, categories: List[str], created_at: str, is_data_engine_eligible: bool, is_latest_version: bool, is_snapshot: bool, processing_state: str, created_by: Optional[str] = None, data_persisted: Optional[bool] = None, size: Optional[int] = None, row_count: Optional[int] = None)

Represents a Dataset returned from the api/v2/datasets/ endpoints.

Attributes:
id: string

The ID of this dataset

name: string

The name of this dataset in the catalog

is_latest_version: bool

Whether this dataset version is the latest version of this dataset

version_id: string

The object ID of the catalog_version the dataset belongs to

categories: list(string)

An array of strings describing the intended use of the dataset. The supported options are “TRAINING” and “PREDICTION”.

created_at: string

The date when the dataset was created

created_by: string, optional

Username of the user who created the dataset

is_snapshot: bool

Whether the dataset version is an immutable snapshot of data which has previously been retrieved and saved to DataRobot

data_persisted: bool, optional

If true, user is allowed to view extended data profile (which includes data statistics like min/max/median/mean, histogram, etc.) and download data. If false, download is not allowed and only the data schema (feature names and types) will be available.

is_data_engine_eligible: bool

Whether this dataset can be a data source of a data engine query.

processing_state: string

Current ingestion process state of the dataset

row_count: int, optional

The number of rows in the dataset.

size: int, optional

The size of the dataset as a CSV in bytes.

get_uri() → str
Returns:
url : str

Permanent static hyperlink to this dataset in AI Catalog.

classmethod upload(source: Union[str, pandas.core.frame.DataFrame, io.IOBase]) → TDataset

This method covers Dataset creation from local materials (file & DataFrame) and a URL.

Parameters:
source: str, pd.DataFrame or file object

Pass a URL, filepath, file or DataFrame to create and return a Dataset.

Returns:
response: Dataset

The Dataset created from the uploaded data source.

Raises:
InvalidUsageError

If the source parameter cannot be determined to be a URL, filepath, file or DataFrame.

Examples

from datarobot.models.dataset import Dataset

# Upload a local file
dataset_one = Dataset.upload("./data/examples.csv")

# Create a dataset via URL
dataset_two = Dataset.upload(
    "https://raw.githubusercontent.com/curran/data/gh-pages/dbpedia/cities/data.csv"
)

# Create a dataset from a pandas DataFrame (my_df is an existing DataFrame)
dataset_three = Dataset.upload(my_df)

# Create a dataset from an open file object
with open("./data/examples.csv", "rb") as file_pointer:
    dataset_four = Dataset.create_from_file(filelike=file_pointer)
classmethod create_from_file(file_path: Optional[str] = None, filelike: Optional[io.IOBase] = None, categories: Optional[List[str]] = None, read_timeout: int = 600, max_wait: int = 600) → TDataset

A blocking call that creates a new Dataset from a file. Returns when the dataset has been successfully uploaded and processed.

Warning: This function does not clean up its open files. If you pass a filelike, you are responsible for closing it. If you pass a file_path, this will create a file object from the file_path but will not close it.

Parameters:
file_path: string, optional

The path to the file. This will create a file object pointing to that file but will not close it.

filelike: file, optional

An open and readable file object.

categories: list[string], optional

An array of strings describing the intended use of the dataset. The current supported options are “TRAINING” and “PREDICTION”.

read_timeout: int, optional

The maximum number of seconds to wait for the server to respond indicating that the initial upload is complete

max_wait: int, optional

Time in seconds after which dataset creation is considered unsuccessful

Returns:
response: Dataset

A fully armed and operational Dataset

classmethod create_from_in_memory_data(data_frame: Optional[pandas.core.frame.DataFrame] = None, records: Optional[List[Dict[str, Any]]] = None, categories: Optional[List[str]] = None, read_timeout: int = 600, max_wait: int = 600, fname: Optional[str] = None) → TDataset

A blocking call that creates a new Dataset from in-memory data. Returns when the dataset has been successfully uploaded and processed.

The data can be either a pandas DataFrame or a list of dictionaries with identical keys.

Parameters:
data_frame: DataFrame, optional

The data frame to upload

records: list[dict], optional

A list of dictionaries with identical keys to upload

categories: list[string], optional

An array of strings describing the intended use of the dataset. The current supported options are “TRAINING” and “PREDICTION”.

read_timeout: int, optional

The maximum number of seconds to wait for the server to respond indicating that the initial upload is complete

max_wait: int, optional

Time in seconds after which dataset creation is considered unsuccessful

fname: string, optional

The file name, “data.csv” by default

Returns:
response: Dataset

The Dataset created from the uploaded data.

Raises:
InvalidUsageError

If neither a DataFrame nor a list of records is passed.
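Since records is expected to be a list of dictionaries with identical keys, a quick local check can catch malformed input before any upload is attempted. A minimal sketch; have_identical_keys is a hypothetical helper, not part of the datarobot package, and the sample records are made up for illustration.

```python
from typing import Any, Dict, List


def have_identical_keys(records: List[Dict[str, Any]]) -> bool:
    """Return True if every dictionary in records has the same set of keys.

    An empty list is treated as consistent because there is nothing to compare.
    """
    if not records:
        return True
    expected = set(records[0])
    return all(set(record) == expected for record in records)


# Illustrative data only.
good = [{"year": 1995, "carrier": "AA"}, {"year": 1996, "carrier": "UA"}]
bad = [{"year": 1995, "carrier": "AA"}, {"year": 1996}]

good_ok = have_identical_keys(good)
bad_ok = have_identical_keys(bad)
```

Records that pass the check could then be supplied through the records argument of create_from_in_memory_data.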

classmethod create_from_url(url: str, do_snapshot: Optional[bool] = None, persist_data_after_ingestion: Optional[bool] = None, categories: Optional[List[str]] = None, max_wait: int = 600) → TDataset

A blocking call that creates a new Dataset from data stored at a url. Returns when the dataset has been successfully uploaded and processed.

Parameters:
url: string

The URL to use as the source of data for the dataset being created.

do_snapshot: bool, optional

If unset, uses the server default: True. If true, creates a snapshot dataset; if false, creates a remote dataset. Creating snapshots from non-file sources may be disabled by the permission, Disable AI Catalog Snapshots.

persist_data_after_ingestion: bool, optional

If unset, uses the server default: True. If true, will enforce saving all data (for download and sampling) and will allow a user to view the extended data profile (which includes data statistics like min/max/median/mean, histogram, etc.). If false, will not enforce saving data. The data schema (feature names and types) will still be available. Setting this parameter to false and do_snapshot to true will result in an error.

categories: list[string], optional

An array of strings describing the intended use of the dataset. The current supported options are “TRAINING” and “PREDICTION”.

max_wait: int, optional

Time in seconds after which dataset creation is considered unsuccessful.

Returns:
response: Dataset

The Dataset created from the uploaded data
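The one option combination that the parameter descriptions above rule out can be rejected locally before a request is sent. This is a sketch under stated assumptions: check_snapshot_options is a hypothetical helper, not part of the datarobot package, and it only flags the explicit combination named in the text, leaving unset values to the server defaults.

```python
from typing import Optional


def check_snapshot_options(
    do_snapshot: Optional[bool],
    persist_data_after_ingestion: Optional[bool],
) -> None:
    """Raise ValueError if persistence is disabled while a snapshot is requested."""
    if do_snapshot is True and persist_data_after_ingestion is False:
        raise ValueError(
            "persist_data_after_ingestion=False cannot be combined with do_snapshot=True"
        )


# Neither call raises: a remote dataset without persistence, and all defaults.
check_snapshot_options(do_snapshot=False, persist_data_after_ingestion=False)
check_snapshot_options(do_snapshot=None, persist_data_after_ingestion=None)
```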

classmethod create_from_data_source(data_source_id: str, username: Optional[str] = None, password: Optional[str] = None, do_snapshot: Optional[bool] = None, persist_data_after_ingestion: Optional[bool] = None, categories: Optional[List[str]] = None, credential_id: Optional[str] = None, use_kerberos: Optional[bool] = None, credential_data: Optional[Dict[str, str]] = None, max_wait: int = 600) → TDataset

A blocking call that creates a new Dataset from data stored at a DataSource. Returns when the dataset has been successfully uploaded and processed.

New in version v2.22.

Parameters:
data_source_id: string

The ID of the DataSource to use as the source of data.

username: string, optional

The username for database authentication.

password: string, optional

The password (in cleartext) for database authentication. The password is encrypted on the server side as part of the HTTP request and is never saved or stored.

do_snapshot: bool, optional

If unset, uses the server default: True. If true, creates a snapshot dataset; if false, creates a remote dataset. Creating snapshots from non-file sources may be disabled by the permission, Disable AI Catalog Snapshots.

persist_data_after_ingestion: bool, optional

If unset, uses the server default: True. If true, will enforce saving all data (for download and sampling) and will allow a user to view the extended data profile (which includes data statistics like min/max/median/mean, histogram, etc.). If false, will not enforce saving data. The data schema (feature names and types) will still be available. Setting this parameter to false and do_snapshot to true will result in an error.

categories: list[string], optional

An array of strings describing the intended use of the dataset. The current supported options are “TRAINING” and “PREDICTION”.

credential_id: string, optional

The ID of the set of credentials to use instead of user and password. Note that with this change, username and password will become optional.

use_kerberos: bool, optional

If unset, uses the server default: False. If true, use kerberos authentication for database authentication.

credential_data: dict, optional

The credentials to authenticate with the database, to use instead of user/password or credential ID.

max_wait: int, optional

Time in seconds after which dataset creation is considered unsuccessful.

Returns:
response: Dataset

The Dataset created from the uploaded data

classmethod create_from_query_generator(generator_id: str, dataset_id: Optional[str] = None, dataset_version_id: Optional[str] = None, max_wait: int = 600) → TDataset

A blocking call that creates a new Dataset from the query generator. Returns when the dataset has been successfully processed. If optional parameters are not specified the query is applied to the dataset_id and dataset_version_id stored in the query generator. If specified they will override the stored dataset_id/dataset_version_id, e.g. to prep a prediction dataset.

Parameters:
generator_id: str

The id of the query generator to use.

dataset_id: str, optional

The id of the dataset to apply the query to.

dataset_version_id: str, optional

The id of the dataset version to apply the query to. If not specified the latest version associated with dataset_id (if specified) is used.

max_wait : int

optional, the maximum number of seconds to wait before giving up.

Returns:
response: Dataset

The Dataset created from the query generator

classmethod get(dataset_id: str) → TDataset

Get information about a dataset.

Parameters:
dataset_id : string

the id of the dataset

Returns:
dataset : Dataset

the queried dataset

classmethod delete(dataset_id: str) → None

Soft-deletes a dataset. A deleted dataset cannot be retrieved, listed, or acted on, except to un-delete it.

Parameters:
dataset_id: string

The id of the dataset to mark for deletion

Returns:
None
classmethod un_delete(dataset_id: str) → None

Un-deletes a previously deleted dataset. If the dataset was not deleted, nothing happens.

Parameters:
dataset_id: string

The id of the dataset to un-delete

Returns:
None
classmethod list(category: Optional[str] = None, filter_failed: Optional[bool] = None, order_by: Optional[str] = None) → List[TDataset]

List all datasets a user can view.

Parameters:
category: string, optional

If specified, only dataset versions that have the specified category will be included in the results. Categories identify the intended use of the dataset; supported categories are “TRAINING” and “PREDICTION”.

filter_failed: bool, optional

If unset, uses the server default: False. Whether datasets that failed during import should be excluded from the results. If True invalid datasets will be excluded.

order_by: string, optional

If unset, uses the server default: “-created”. Sorting order applied to the catalog list. Valid options are “created” (ascending order by creation datetime) and “-created” (descending order by creation datetime).

Returns:
list[Dataset]

a list of datasets the user can view

classmethod iterate(offset: Optional[int] = None, limit: Optional[int] = None, category: Optional[str] = None, order_by: Optional[str] = None, filter_failed: Optional[bool] = None) → Generator[TDataset, None, None]

Get an iterator for the requested datasets a user can view. This lazily retrieves results. It does not get the next page from the server until the current page is exhausted.

Parameters:
offset: int, optional

If set, this many results will be skipped

limit: int, optional

Specifies the size of each page retrieved from the server. If unset, uses the server default.

category: string, optional

If specified, only dataset versions that have the specified category will be included in the results. Categories identify the intended use of the dataset; supported categories are “TRAINING” and “PREDICTION”.

filter_failed: bool, optional

If unset, uses the server default: False. Whether datasets that failed during import should be excluded from the results. If True invalid datasets will be excluded.

order_by: string, optional

If unset, uses the server default: “-created”. Sorting order applied to the catalog list. Valid options are “created” (ascending order by creation datetime) and “-created” (descending order by creation datetime).

Yields:
Dataset

An iterator of the datasets the user can view
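The lazy paging described for iterate can be pictured with a small generator. This is a sketch of the general technique, not the package's actual implementation: iterate_lazily and fake_fetch_page are hypothetical names, and the server is replaced by an in-memory list so the example runs offline.

```python
from typing import Callable, Iterator, List


def iterate_lazily(
    fetch_page: Callable[[int, int], List[str]],
    offset: int = 0,
    limit: int = 2,
) -> Iterator[str]:
    """Yield items one by one, requesting the next page only when it is needed."""
    while True:
        page = fetch_page(offset, limit)
        if not page:
            return  # an empty page means there are no more results
        yield from page
        offset += len(page)


# Stand-in for the server: five dataset names and a log of requested offsets.
NAMES = ["a", "b", "c", "d", "e"]
requested_offsets: List[int] = []


def fake_fetch_page(offset: int, limit: int) -> List[str]:
    requested_offsets.append(offset)
    return NAMES[offset:offset + limit]


iterator = iterate_lazily(fake_fetch_page)
first = next(iterator)  # only the first page has been requested at this point
```

Consuming the rest of the iterator triggers the remaining page requests, one at a time, as each page runs out.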

update() → None

Updates the Dataset attributes in place with the latest information from the server.

Returns:
None
modify(name: Optional[str] = None, categories: Optional[List[str]] = None) → None

Modifies the Dataset name and/or categories. Updates the object in place.

Parameters:
name: string, optional

The new name of the dataset

categories: list[string], optional

A list of strings describing the intended use of the dataset. The supported options are “TRAINING” and “PREDICTION”. If any categories were previously specified for the dataset, they will be overwritten.

Returns:
None
share(access_list: List[datarobot.models.sharing.SharingAccess], apply_grant_to_linked_objects: bool = False) → None

Modify the ability of users to access this dataset

Parameters:
access_list: list of : class:SharingAccess <datarobot.SharingAccess>

The modifications to make.

apply_grant_to_linked_objects: bool

If true, any user being granted access to the dataset is also granted read access to linked objects, such as DataSources and DataStores, that may be used by this dataset. Ignored if no such objects are relevant for the dataset. Defaults to False.

Raises:
datarobot.ClientError:

If you do not have permission to share this dataset, if the user you’re sharing with doesn’t exist, if the same user appears multiple times in the access_list, or if these changes would leave the dataset without an owner.

Examples

Transfer access to the dataset from old_user@datarobot.com to new_user@datarobot.com

from datarobot.enums import SHARING_ROLE
from datarobot.models.dataset import Dataset
from datarobot.models.sharing import SharingAccess

new_access = SharingAccess(
    "new_user@datarobot.com",
    SHARING_ROLE.OWNER,
    can_share=True,
)
access_list = [
    SharingAccess(
        "old_user@datarobot.com",
        SHARING_ROLE.OWNER,
        can_share=True,
        can_use_data=True,
    ),
    new_access,
]

Dataset.get('my-dataset-id').share(access_list)
get_details() → datarobot.models.dataset.DatasetDetails

Gets the details for this Dataset

Returns:
DatasetDetails
get_all_features(order_by: Optional[str] = None) → List[datarobot.models.feature.DatasetFeature]

Get a list of all the features for this dataset.

Parameters:
order_by: string, optional

If unset, uses the server default: ‘name’. How the features should be ordered. Can be ‘name’ or ‘featureType’.

Returns:
list[DatasetFeature]
iterate_all_features(offset: Optional[int] = None, limit: Optional[int] = None, order_by: Optional[str] = None) → Generator[datarobot.models.feature.DatasetFeature, None, None]

Get an iterator for the requested features of a dataset. This lazily retrieves results. It does not get the next page from the server until the current page is exhausted.

Parameters:
offset: int, optional

If set, this many results will be skipped.

limit: int, optional

Specifies the size of each page retrieved from the server. If unset, uses the server default.

order_by: string, optional

If unset, uses the server default: ‘name’. How the features should be ordered. Can be ‘name’ or ‘featureType’.

Yields:
DatasetFeature
get_featurelists() → List[datarobot.models.featurelist.DatasetFeaturelist]

Get DatasetFeaturelists created on this Dataset

Returns:
feature_lists: list[DatasetFeaturelist]
create_featurelist(name: str, features: List[str]) → datarobot.models.featurelist.DatasetFeaturelist

Create a new dataset featurelist

Parameters:
name : str

the name of the dataset featurelist to create. Names must be unique within the dataset, or the server will return an error.

features : list of str

the names of the features to include in the dataset featurelist. Each feature must be a dataset feature.

Returns:
featurelist : DatasetFeaturelist

the newly created featurelist

Examples

from datarobot.models.dataset import Dataset

dataset = Dataset.get('1234deadbeeffeeddead4321')
dataset_features = dataset.get_all_features()
selected_features = [feat.name for feat in dataset_features][:5]  # select first five
new_flist = dataset.create_featurelist('Simple Features', selected_features)
get_file(file_path: Optional[str] = None, filelike: Optional[io.IOBase] = None) → None

Retrieves all the originally uploaded data in CSV form. Writes it to either the file or a filelike object that can write bytes.

Only one of file_path or filelike can be provided, and it must be provided as a keyword argument (i.e. file_path='path-to-write-to'). If a file-like object is provided, the user is responsible for closing it when they are done.

The user must also have permission to download data.

Parameters:
file_path: string, optional

The destination to write the file to.

filelike: file, optional

A file-like object to write to. The object must be able to write bytes. The user is responsible for closing the object.

Returns:
None
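A minimal sketch of the filelike variant, assuming an in-memory `io.BytesIO` buffer as the bytes-capable object. The helper `download_csv_bytes` and the `FakeDataset` stand-in are hypothetical and exist only so the example runs without a server; with a real Dataset the same `get_file(filelike=...)` call would apply.

```python
import io


def download_csv_bytes(dataset) -> bytes:
    """Return the CSV content using the documented get_file(filelike=...) call."""
    buffer = io.BytesIO()  # a file-like object that accepts bytes
    try:
        dataset.get_file(filelike=buffer)  # passed as a keyword argument
        return buffer.getvalue()
    finally:
        buffer.close()  # the caller owns the object and must close it


class FakeDataset:
    """Hypothetical stand-in for a Dataset; not part of the client library."""

    def get_file(self, file_path=None, filelike=None):
        if (file_path is None) == (filelike is None):
            raise ValueError("provide exactly one of file_path or filelike")
        payload = b"a,b\n1,2\n"
        if filelike is not None:
            filelike.write(payload)
        else:
            with open(file_path, "wb") as handle:
                handle.write(payload)


csv_bytes = download_csv_bytes(FakeDataset())
```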
get_as_dataframe() → pandas.core.frame.DataFrame

Retrieves all the originally uploaded data in a pandas DataFrame.

New in version v3.0.

Returns:
pd.DataFrame
get_projects() → List[datarobot.models.dataset.ProjectLocation]

Retrieves the Dataset’s projects as ProjectLocation named tuples.

Returns:
locations: list[ProjectLocation]
create_project(project_name: Optional[str] = None, user: Optional[str] = None, password: Optional[str] = None, credential_id: Optional[str] = None, use_kerberos: Optional[bool] = None, credential_data: Optional[Dict[str, str]] = None) → datarobot.models.project.Project

Create a datarobot.models.Project from this dataset

Parameters:
project_name: string, optional

The name of the project to be created. If not specified, will be “Untitled Project” for database connections, otherwise the project name will be based on the file used.

user: string, optional

The username for database authentication.

password: string, optional

The password (in cleartext) for database authentication. The password is encrypted on the server side within the scope of the HTTP request and is never saved or stored.

credential_id: string, optional

The ID of the set of credentials to use instead of user and password.

use_kerberos: bool, optional

Server default is False. If true, use kerberos authentication for database authentication.

credential_data: dict, optional

The credentials to authenticate with the database, to use instead of user/password or credential ID.

Returns:
Project
classmethod create_version_from_file(dataset_id: str, file_path: Optional[str] = None, filelike: Optional[io.IOBase] = None, categories: Optional[List[str]] = None, read_timeout: int = 600, max_wait: int = 600) → TDataset

A blocking call that creates a new Dataset version from a file. Returns when the new dataset version has been successfully uploaded and processed.

Warning: This function does not clean up its open files. If you pass a filelike, you are responsible for closing it. If you pass a file_path, this will create a file object from the file_path but will not close it.

New in version v2.23.

Parameters:
dataset_id: string

The ID of the dataset for which a new version is to be created.

file_path: string, optional

The path to the file. This will create a file object pointing to that file but will not close it.

filelike: file, optional

An open and readable file object.

categories: list[string], optional

An array of strings describing the intended use of the dataset. The current supported options are “TRAINING” and “PREDICTION”.

read_timeout: int, optional

The maximum number of seconds to wait for the server to respond indicating that the initial upload is complete

max_wait: int, optional

Time in seconds after which dataset version creation is considered unsuccessful.

Returns:
response: Dataset

A fully armed and operational Dataset version

classmethod create_version_from_in_memory_data(dataset_id: str, data_frame: Optional[pandas.core.frame.DataFrame] = None, records: Optional[List[Dict[str, Any]]] = None, categories: Optional[List[str]] = None, read_timeout: int = 600, max_wait: int = 600) → TDataset

A blocking call that creates a new Dataset version for a dataset from in-memory data. Returns when the dataset has been successfully uploaded and processed.

The data can be either a pandas DataFrame or a list of dictionaries with identical keys.

New in version v2.23.

Parameters:
dataset_id: string

The ID of the dataset for which a new version is to be created.

data_frame: DataFrame, optional

The data frame to upload

records: list[dict], optional

A list of dictionaries with identical keys to upload

categories: list[string], optional

An array of strings describing the intended use of the dataset. The current supported options are “TRAINING” and “PREDICTION”.

read_timeout: int, optional

The maximum number of seconds to wait for the server to respond indicating that the initial upload is complete

max_wait: int, optional

Time in seconds after which dataset version creation is considered unsuccessful.

Returns:
response: Dataset

The Dataset version created from the uploaded data

Raises:
InvalidUsageError

If neither a DataFrame nor a list of records is passed.
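Because the records form requires dictionaries with identical keys, it can help to check that locally before uploading. The following is a sketch of such a pre-check; `check_identical_keys` is a hypothetical helper, not part of the client, and the sample records are made up.

```python
from typing import Any, Dict, List


def check_identical_keys(records: List[Dict[str, Any]]) -> List[str]:
    """Return the shared column names, or raise if any record differs."""
    if not records:
        raise ValueError("records must contain at least one dictionary")
    expected = set(records[0])
    for index, record in enumerate(records):
        if set(record) != expected:
            raise ValueError(
                f"record {index} has keys {sorted(record)}, "
                f"expected {sorted(expected)}"
            )
    return sorted(expected)


# Made-up sample data in the list-of-dictionaries form.
records = [{"id": 1, "amount": 9.5}, {"id": 2, "amount": 3.0}]
columns = check_identical_keys(records)
```

Records that pass this check could then be given to the `records` parameter described above.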

classmethod create_version_from_url(dataset_id: str, url: str, categories: Optional[List[str]] = None, max_wait: int = 600) → TDataset

A blocking call that creates a new Dataset version for a given dataset from data stored at a URL. Returns when the dataset version has been successfully uploaded and processed.

New in version v2.23.

Parameters:
dataset_id: string

The ID of the dataset for which a new version is to be created.

url: string

The URL to use as the source of data for the dataset being created.

categories: list[string], optional

An array of strings describing the intended use of the dataset. The current supported options are “TRAINING” and “PREDICTION”.

max_wait: int, optional

Time in seconds after which dataset version creation is considered unsuccessful.

Returns:
response: Dataset

The Dataset version created from the uploaded data

classmethod create_version_from_data_source(dataset_id: str, data_source_id: str, username: Optional[str] = None, password: Optional[str] = None, categories: Optional[List[str]] = None, credential_id: Optional[str] = None, use_kerberos: Optional[bool] = None, credential_data: Optional[Dict[str, str]] = None, max_wait: int = 600) → TDataset

A blocking call that creates a new Dataset version for a given dataset from data stored at a DataSource. Returns when the dataset version has been successfully uploaded and processed.

New in version v2.23.

Parameters:
dataset_id: string

The ID of the dataset for which a new version is to be created.

data_source_id: string

The ID of the DataSource to use as the source of data.

username: string, optional

The username for database authentication.

password: string, optional

The password (in cleartext) for database authentication. The password is encrypted on the server side within the scope of the HTTP request and is never saved or stored.

categories: list[string], optional

An array of strings describing the intended use of the dataset. The current supported options are “TRAINING” and “PREDICTION”.

credential_id: string, optional

The ID of the set of credentials to use instead of user and password. Note that with this change, username and password will become optional.

use_kerberos: bool, optional

If unset, uses the server default: False. If true, use kerberos authentication for database authentication.

credential_data: dict, optional

The credentials to authenticate with the database, to use instead of user/password or credential ID.

max_wait: int, optional

Time in seconds after which dataset version creation is considered unsuccessful.

Returns:
response: Dataset

The Dataset version created from the uploaded data

classmethod from_data(data: Union[Dict[str, Any], List[Dict[str, Any]]]) → T

Instantiate an object of this class using a dict.

Parameters:
data : dict

Correctly snake_cased keys and their values.

classmethod from_server_data(data: Union[Dict[str, Any], List[Dict[str, Any]]], keep_attrs: Optional[Iterable[str]] = None) → T

Instantiate an object of this class using the data directly from the server, meaning that the keys may not yet be correctly cased (for example, camelCase instead of snake_case).

Parameters:
data : dict

The directly translated dict of JSON from the server. No casing fixes have taken place

keep_attrs : iterable

List, set or tuple of the dotted namespace notations for attributes to keep within the object structure even if their values are None
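To illustrate the kind of casing fix that separates server data from the snake_cased form expected by from_data, here is a small stand-in converter. It is a sketch only: `to_snake_case` and `snake_case_keys` are hypothetical helpers, the payload values are placeholders, and the library's own conversion may differ in detail.

```python
import re
from typing import Any


def to_snake_case(name: str) -> str:
    """Convert a camelCase key such as 'isLatestVersion' to snake_case."""
    return re.sub(r"(?<=[a-z0-9])([A-Z])", r"_\1", name).lower()


def snake_case_keys(data: Any) -> Any:
    """Recursively rename dictionary keys, leaving values untouched."""
    if isinstance(data, dict):
        return {to_snake_case(key): snake_case_keys(value) for key, value in data.items()}
    if isinstance(data, list):
        return [snake_case_keys(item) for item in data]
    return data


# Placeholder payload shaped like JSON that uses camelCase keys.
server_payload = {
    "datasetId": "abc",
    "isLatestVersion": True,
    "featureCountByType": [{"featureType": "Numeric", "count": 3}],
}
converted = snake_case_keys(server_payload)
```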

open_in_browser() → None

Opens the class’s relevant location in a web browser. If a default browser is not available, the URL is logged.

Note: If text-mode browsers are used, the calling process will block until the user exits the browser.

class datarobot.DatasetDetails(dataset_id: str, version_id: str, categories: List[str], created_by: str, created_at: str, data_source_type: str, error: str, is_latest_version: bool, is_snapshot: bool, is_data_engine_eligible: bool, last_modification_date: str, last_modifier_full_name: str, name: str, uri: str, processing_state: str, data_persisted: Optional[bool] = None, data_engine_query_id: Optional[str] = None, data_source_id: Optional[str] = None, description: Optional[str] = None, eda1_modification_date: Optional[str] = None, eda1_modifier_full_name: Optional[str] = None, feature_count: Optional[int] = None, feature_count_by_type: Optional[List[datarobot.models.dataset.FeatureTypeCount]] = None, row_count: Optional[int] = None, size: Optional[int] = None, tags: Optional[List[str]] = None)

Represents a detailed view of a Dataset. The to_dataset method creates a Dataset from this details view.

Attributes:
dataset_id: string

The ID of this dataset

name: string

The name of this dataset in the catalog

is_latest_version: bool

Whether this dataset version is the latest version of this dataset

version_id: string

The object ID of the catalog_version the dataset belongs to

categories: list(string)

An array of strings describing the intended use of the dataset. The supported options are “TRAINING” and “PREDICTION”.

created_at: string

The date when the dataset was created

created_by: string

Username of the user who created the dataset

is_snapshot: bool

Whether the dataset version is an immutable snapshot of data which has previously been retrieved and saved to DataRobot

data_persisted: bool, optional

If true, user is allowed to view extended data profile (which includes data statistics like min/max/median/mean, histogram, etc.) and download data. If false, download is not allowed and only the data schema (feature names and types) will be available.

is_data_engine_eligible: bool

Whether this dataset can be a data source of a data engine query.

processing_state: string

Current ingestion process state of the dataset

row_count: int, optional

The number of rows in the dataset.

size: int, optional

The size of the dataset as a CSV in bytes.

data_engine_query_id: string, optional

ID of the source data engine query

data_source_id: string, optional

ID of the datasource used as the source of the dataset

data_source_type: string

the type of the datasource that was used as the source of the dataset

description: string, optional

the description of the dataset

eda1_modification_date: string, optional

the ISO 8601 formatted date and time when the EDA1 for the dataset was updated

eda1_modifier_full_name: string, optional

the user who was the last to update EDA1 for the dataset

error: string

details of exception raised during ingestion process, if any

feature_count: int, optional

total number of features in the dataset

feature_count_by_type: list[FeatureTypeCount]

number of features in the dataset grouped by feature type

last_modification_date: string

the ISO 8601 formatted date and time when the dataset was last modified

last_modifier_full_name: string

full name of user who was the last to modify the dataset

tags: list[string]

list of tags attached to the item

uri: string

the URI of the datasource, for example:

  • ‘file_name.csv’
  • ‘jdbc:DATA_SOURCE_GIVEN_NAME/SCHEMA.TABLE_NAME’
  • ‘jdbc:DATA_SOURCE_GIVEN_NAME/<query>’ (for query-based datasources)
  • ‘https://s3.amazonaws.com/datarobot_test/kickcars-sample-200.csv’

classmethod get(dataset_id: str) → TDatasetDetails

Get details for a Dataset from the server

Parameters:
dataset_id: str

The id for the Dataset from which to get details

Returns:
DatasetDetails
to_dataset() → datarobot.models.dataset.Dataset

Build a Dataset object from the information in this object

Returns:
Dataset

Data Engine Query Generator

class datarobot.DataEngineQueryGenerator(**generator_kwargs)

DataEngineQueryGenerator is used to set up time series data prep.

New in version v2.27.

Attributes:
id: str

id of the query generator

query: str

text of the generated Spark SQL query

datasets: list(QueryGeneratorDataset)

datasets associated with the query generator

generator_settings: QueryGeneratorSettings

the settings used to define the query

generator_type: str

“TimeSeries” is the only supported type

classmethod create(generator_type, datasets, generator_settings)

Creates a query generator entity.

New in version v2.27.

Parameters:
generator_type : str

Type of data engine query generator

datasets : List[QueryGeneratorDataset]

Source datasets in the Data Engine workspace.

generator_settings : dict

Data engine generator settings of the given generator_type.

Returns:
query_generator : DataEngineQueryGenerator

The created generator

Examples

import datarobot as dr
from datarobot.models.data_engine_query_generator import (
   QueryGeneratorDataset,
   QueryGeneratorSettings,
)
dataset = QueryGeneratorDataset(
   alias='My_Awesome_Dataset_csv',
   dataset_id='61093144cabd630828bca321',
   dataset_version_id=1,
)
settings = QueryGeneratorSettings(
   datetime_partition_column='date',
   time_unit='DAY',
   time_step=1,
   default_numeric_aggregation_method='sum',
   default_categorical_aggregation_method='mostFrequent',
)
g = dr.DataEngineQueryGenerator.create(
   generator_type='TimeSeries',
   datasets=[dataset],
   generator_settings=settings,
)
g.id
>>>'54e639a18bd88f08078ca831'
g.generator_type
>>>'TimeSeries'
classmethod get(generator_id)

Gets information about a query generator.

Parameters:
generator_id : str

The identifier of the query generator you want to load.

Returns:
query_generator : DataEngineQueryGenerator

The queried generator

Examples

import datarobot as dr
g = dr.DataEngineQueryGenerator.get(generator_id='54e639a18bd88f08078ca831')
g.id
>>>'54e639a18bd88f08078ca831'
g.generator_type
>>>'TimeSeries'
create_dataset(dataset_id=None, dataset_version_id=None, max_wait=600)

A blocking call that creates a new Dataset from the query generator. Returns when the dataset has been successfully processed. If the optional parameters are not specified, the query is applied to the dataset_id and dataset_version_id stored in the query generator. If specified, they override the stored dataset_id/dataset_version_id, e.g. to prep a prediction dataset.

Parameters:
dataset_id: str, optional

The id of the unprepped dataset to apply the query to

dataset_version_id: str, optional

The version_id of the unprepped dataset to apply the query to

Returns:
response: Dataset

The Dataset created from the query generator

prepare_prediction_dataset_from_catalog(project_id: str, dataset_id: str, dataset_version_id: Optional[str] = None, max_wait: Optional[int] = 600, relax_known_in_advance_features_check: Optional[bool] = None) → datarobot.models.prediction_dataset.PredictionDataset

Apply time series data prep to a catalog dataset and upload it to the project as a PredictionDataset.

New in version v3.1.

Parameters:
project_id : str

The id of the project to which you upload the prediction dataset.

dataset_id : str

The identifier of the dataset.

dataset_version_id : str, optional

The version id of the dataset to use.

max_wait : int, optional

The maximum number of seconds to wait before giving up.

relax_known_in_advance_features_check : bool, optional

For time series projects only. If True, missing values in the known in advance features are allowed in the forecast window at the prediction time. If omitted or False, missing values are not allowed.

Returns:
dataset : PredictionDataset

The newly uploaded dataset.

prepare_prediction_dataset(sourcedata: Union[str, pandas.core.frame.DataFrame, io.IOBase], project_id: str, max_wait: Optional[int] = 600, relax_known_in_advance_features_check: Optional[bool] = None) → datarobot.models.prediction_dataset.PredictionDataset

Apply time series data prep and upload the PredictionDataset to the project.

New in version v3.1.

Parameters:
sourcedata : str, file or pandas.DataFrame

Data to be used for predictions. If it is a string, it can be either a path to a local file, or raw file content. If using a file on disk, the filename must consist of ASCII characters only.

project_id : str

The id of the project to which you upload the prediction dataset.

max_wait : int, optional

The maximum number of seconds to wait for the uploaded dataset to be processed before raising an error.

relax_known_in_advance_features_check : bool, optional

For time series projects only. If True, missing values in the known in advance features are allowed in the forecast window at the prediction time. If omitted or False, missing values are not allowed.

Returns:
dataset : PredictionDataset

The newly uploaded dataset.

Raises:
InputNotUnderstoodError

Raised if sourcedata isn’t one of the supported types.

AsyncFailureError

Raised if polling for the status of an async process resulted in a response with an unsupported status code.

AsyncProcessUnsuccessfulError

Raised if processing the uploaded dataset was unsuccessful (i.e. the server reported an error in uploading the dataset).

AsyncTimeoutError

Raised if processing the uploaded dataset took more time than specified by the max_wait parameter.

Datetime Trend Plots

class datarobot.models.datetime_trend_plots.AccuracyOverTimePlotsMetadata(project_id, model_id, forecast_distance, resolutions, backtest_metadata, holdout_metadata, backtest_statuses, holdout_statuses)

Accuracy over Time metadata for datetime model.

New in version v2.25.

Notes

Backtest/holdout status is a dict containing the following:

  • training: string
    Status of the backtest/holdout training. One of datarobot.enums.DATETIME_TREND_PLOTS_STATUS
  • validation: string
    Status of the backtest/holdout validation. One of datarobot.enums.DATETIME_TREND_PLOTS_STATUS

Backtest/holdout metadata is a dict containing the following:

  • training: dict
    Start and end dates for the backtest/holdout training.
  • validation: dict
    Start and end dates for the backtest/holdout validation.

Each dict in the training and validation in backtest/holdout metadata is structured like:

  • start_date: datetime.datetime or None
    The datetime of the start of the chart data (inclusive). None if chart data is not computed.
  • end_date: datetime.datetime or None
    The datetime of the end of the chart data (exclusive). None if chart data is not computed.
Attributes:
project_id: string

The project ID.

model_id: string

The model ID.

forecast_distance: int or None

The forecast distance for which the metadata was retrieved. None for OTV projects.

resolutions: list of string

A list of datarobot.enums.DATETIME_TREND_PLOTS_RESOLUTION, which represents available time resolutions for which plots can be retrieved.

backtest_metadata: list of dict

List of backtest metadata dicts. The list index of each metadata dict is the backtest index. See backtest/holdout metadata info in Notes for more details.

holdout_metadata: dict

Holdout metadata dict. See backtest/holdout metadata info in Notes for more details.

backtest_statuses: list of dict

List of backtest status dicts. The list index of each status dict is the backtest index. See backtest/holdout status info in Notes for more details.

holdout_statuses: dict

Holdout status dict. See backtest/holdout status info in Notes for more details.
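As a sketch of how the metadata structure in the Notes might be consumed, the helper below picks out the backtests whose validation dates are set, which per the Notes means chart data has been computed. `backtests_with_validation_dates` is a hypothetical helper and the sample dates are made up.

```python
from datetime import datetime
from typing import Dict, List, Optional

DateSpan = Dict[str, Optional[datetime]]


def backtests_with_validation_dates(backtest_metadata: List[Dict[str, DateSpan]]) -> List[int]:
    """Return backtest indices whose validation start and end dates are both set."""
    ready = []
    for index, metadata in enumerate(backtest_metadata):
        span = metadata["validation"]
        if span["start_date"] is not None and span["end_date"] is not None:
            ready.append(index)
    return ready


# Placeholder metadata in the documented shape; the dates are made up.
sample_backtest_metadata = [
    {
        "training": {"start_date": datetime(2020, 1, 1), "end_date": datetime(2020, 7, 1)},
        "validation": {"start_date": datetime(2020, 7, 1), "end_date": datetime(2020, 8, 1)},
    },
    {
        "training": {"start_date": None, "end_date": None},
        "validation": {"start_date": None, "end_date": None},
    },
]
ready_backtests = backtests_with_validation_dates(sample_backtest_metadata)
```

The returned positions are backtest indices, since the list index of each metadata dict is the backtest index.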

class datarobot.models.datetime_trend_plots.AccuracyOverTimePlot(project_id, model_id, start_date, end_date, resolution, bins, statistics, calendar_events)

Accuracy over Time plot for datetime model.

New in version v2.25.

Notes

Bin is a dict containing the following:

  • start_date: datetime.datetime
    The datetime of the start of the bin (inclusive).
  • end_date: datetime.datetime
    The datetime of the end of the bin (exclusive).
  • actual: float or None
    Average actual value of the target in the bin. None if there are no entries in the bin.
  • predicted: float or None
    Average prediction of the model in the bin. None if there are no entries in the bin.
  • frequency: int or None
    Indicates number of values averaged in bin.

Statistics is a dict containing the following:

  • durbin_watson: float or None
    The Durbin-Watson statistic for the chart data. Value is between 0 and 4. The Durbin-Watson statistic is a test statistic used to detect the presence of autocorrelation at lag 1 in the residuals (prediction errors) from a regression analysis. More info: https://wikipedia.org/wiki/Durbin%E2%80%93Watson_statistic

Calendar event is a dict containing the following:

  • name: string
    Name of the calendar event.
  • date: datetime
    Date of the calendar event.
  • series_id: string or None
    The series ID for the event. If this event does not specify a series ID, then this will be None, indicating that the event applies to all series.
Attributes:
project_id: string

The project ID.

model_id: string

The model ID.

resolution: string

The resolution that is used for binning. One of datarobot.enums.DATETIME_TREND_PLOTS_RESOLUTION

start_date: datetime.datetime

The datetime of the start of the chart data (inclusive).

end_date: datetime.datetime

The datetime of the end of the chart data (exclusive).

bins: list of dict

List of plot bins. See bin info in Notes for more details.

statistics: dict

Statistics for plot. See statistics info in Notes for more details.

calendar_events: list of dict

List of calendar events for the plot. See calendar events info in Notes for more details.
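For readers unfamiliar with the durbin_watson value in the statistics dict, the sketch below computes the statistic from a sequence of residuals using its standard textbook definition: the sum of squared successive differences divided by the sum of squared residuals. How the server derives residuals from the chart data is not stated here, so `durbin_watson` below is an illustrative helper with made-up inputs, not the library's implementation.

```python
from typing import Sequence


def durbin_watson(residuals: Sequence[float]) -> float:
    """Textbook Durbin-Watson statistic; the result lies between 0 and 4."""
    if len(residuals) < 2:
        raise ValueError("at least two residuals are required")
    squared_sum = sum(e * e for e in residuals)
    if squared_sum == 0:
        raise ValueError("residuals must not all be zero")
    # Sum of squared differences between each residual and its predecessor.
    difference_sum = sum((b - a) ** 2 for a, b in zip(residuals, residuals[1:]))
    return difference_sum / squared_sum


alternating = durbin_watson([1.0, -1.0, 1.0, -1.0])  # strongly negative lag-1 pattern
constant = durbin_watson([2.0, 2.0, 2.0])            # perfectly positive lag-1 pattern
```

Values near 2 suggest little lag-1 autocorrelation, values toward 0 suggest positive autocorrelation, and values toward 4 suggest negative autocorrelation.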

class datarobot.models.datetime_trend_plots.AccuracyOverTimePlotPreview(project_id, model_id, start_date, end_date, bins)

Accuracy over Time plot preview for datetime model.

New in version v2.25.

Notes

Bin is a dict containing the following:

  • start_date: datetime.datetime
    The datetime of the start of the bin (inclusive).
  • end_date: datetime.datetime
    The datetime of the end of the bin (exclusive).
  • actual: float or None
    Average actual value of the target in the bin. None if there are no entries in the bin.
  • predicted: float or None
    Average prediction of the model in the bin. None if there are no entries in the bin.
Attributes:
project_id: string

The project ID.

model_id: string

The model ID.

start_date: datetime.datetime

The datetime of the start of the chart data (inclusive).

end_date: datetime.datetime

The datetime of the end of the chart data (exclusive).

bins: list of dict

List of plot bins. See bin info in Notes for more details.

class datarobot.models.datetime_trend_plots.ForecastVsActualPlotsMetadata(project_id, model_id, resolutions, backtest_metadata, holdout_metadata, backtest_statuses, holdout_statuses)

Forecast vs Actual plots metadata for datetime model.

New in version v2.25.

Notes

Backtest/holdout status is a dict containing the following:

  • training: dict
    Dict containing each of datarobot.enums.DATETIME_TREND_PLOTS_STATUS as dict key, and list of forecast distances for particular status as dict value.
  • validation: dict
    Dict containing each of datarobot.enums.DATETIME_TREND_PLOTS_STATUS as dict key, and list of forecast distances for particular status as dict value.

Backtest/holdout metadata is a dict containing the following:

  • training: dict
    Start and end dates for the backtest/holdout training.
  • validation: dict
    Start and end dates for the backtest/holdout validation.

Each dict in the training and validation in backtest/holdout metadata is structured like:

  • start_date: datetime.datetime or None
    The datetime of the start of the chart data (inclusive). None if chart data is not computed.
  • end_date: datetime.datetime or None
    The datetime of the end of the chart data (exclusive). None if chart data is not computed.
Attributes:
project_id: string

The project ID.

model_id: string

The model ID.

resolutions: list of string

A list of datarobot.enums.DATETIME_TREND_PLOTS_RESOLUTION, which represents available time resolutions for which plots can be retrieved.

backtest_metadata: list of dict

List of backtest metadata dicts. The list index of each metadata dict is the backtest index. See backtest/holdout metadata info in Notes for more details.

holdout_metadata: dict

Holdout metadata dict. See backtest/holdout metadata info in Notes for more details.

backtest_statuses: list of dict

List of backtest status dicts. The list index of each status dict is the backtest index. See backtest/holdout status info in Notes for more details.

holdout_statuses: dict

Holdout status dict. See backtest/holdout status info in Notes for more details.
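The status dicts for this class map each status to a list of forecast distances. A sketch of inverting that mapping, so a status can be looked up by forecast distance, follows. `status_by_forecast_distance` is a hypothetical helper, and the status strings in the sample are placeholders; real keys come from datarobot.enums.DATETIME_TREND_PLOTS_STATUS.

```python
from typing import Dict, List


def status_by_forecast_distance(status: Dict[str, List[int]]) -> Dict[int, str]:
    """Invert {status: [forecast distances]} into {forecast distance: status}."""
    return {
        distance: status_name
        for status_name, distances in status.items()
        for distance in distances
    }


# Placeholder status keys and made-up forecast distances.
sample_validation_status = {"completed": [1, 2], "notCompleted": [3]}
lookup = status_by_forecast_distance(sample_validation_status)
```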

class datarobot.models.datetime_trend_plots.ForecastVsActualPlot(project_id, model_id, forecast_distances, start_date, end_date, resolution, bins, calendar_events)

Forecast vs Actual plot for datetime model.

New in version v2.25.

Notes

Bin is a dict containing the following:

  • start_date: datetime.datetime
    The datetime of the start of the bin (inclusive).
  • end_date: datetime.datetime
    The datetime of the end of the bin (exclusive).
  • actual: float or None
    Average actual value of the target in the bin. None if there are no entries in the bin.
  • forecasts: list of float
    A list of average forecasts for the model for each forecast distance. Empty if there are no forecasts in the bin. Each index in the forecasts list maps to the same index in the forecast_distances list.
  • error: float or None
    Average absolute residual value of the bin. None if there are no entries in the bin.
  • normalized_error: float or None
    Normalized average absolute residual value of the bin. None if there are no entries in the bin.
  • frequency: int or None
    Indicates number of values averaged in bin.

Calendar event is a dict containing the following:

  • name: string
    Name of the calendar event.
  • date: datetime
    Date of the calendar event.
  • series_id: string or None
    The series ID for the event. If this event does not specify a series ID, then this will be None, indicating that the event applies to all series.
Attributes:
project_id: string

The project ID.

model_id: string

The model ID.

forecast_distances: list of int

A list of forecast distances that were retrieved.

resolution: string

The resolution that is used for binning. One of datarobot.enums.DATETIME_TREND_PLOTS_RESOLUTION

start_date: datetime.datetime

The datetime of the start of the chart data (inclusive).

end_date: datetime.datetime

The datetime of the end of the chart data (exclusive).

bins: list of dict

List of plot bins. See bin info in Notes for more details.

calendar_events: list of dict

List of calendar events for the plot. See calendar events info in Notes for more details.
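Since each entry in a bin's forecasts list lines up by index with forecast_distances, the two can be paired directly. The sketch below shows one way to do that; `forecasts_by_distance` is a hypothetical helper and the bin values are made up.

```python
from typing import Any, Dict, List


def forecasts_by_distance(forecast_distances: List[int], plot_bin: Dict[str, Any]) -> Dict[int, float]:
    """Pair each forecast in a bin with its forecast distance, by list index."""
    forecasts = plot_bin["forecasts"]
    if forecasts and len(forecasts) != len(forecast_distances):
        raise ValueError("forecasts and forecast_distances differ in length")
    return dict(zip(forecast_distances, forecasts))


# Made-up values in the documented bin shape (other keys omitted for brevity).
sample_distances = [1, 2, 3]
sample_bin = {"actual": 10.0, "forecasts": [9.5, 10.25, 11.0]}
empty_bin = {"actual": None, "forecasts": []}

paired = forecasts_by_distance(sample_distances, sample_bin)
paired_empty = forecasts_by_distance(sample_distances, empty_bin)
```

An empty forecasts list, which the Notes describe for bins without forecasts, yields an empty mapping here.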

class datarobot.models.datetime_trend_plots.ForecastVsActualPlotPreview(project_id, model_id, start_date, end_date, bins)

Forecast vs Actual plot preview for datetime model.

New in version v2.25.

Notes

Bin is a dict containing the following:

  • start_date: datetime.datetime
    The datetime of the start of the bin (inclusive).
  • end_date: datetime.datetime
    The datetime of the end of the bin (exclusive).
  • actual: float or None
    Average actual value of the target in the bin. None if there are no entries in the bin.
  • predicted: float or None
    Average prediction of the model in the bin. None if there are no entries in the bin.
Attributes:
project_id: string

The project ID.

model_id: string

The model ID.

start_date: datetime.datetime

The datetime of the start of the chart data (inclusive).

end_date: datetime.datetime

The datetime of the end of the chart data (exclusive).

bins: list of dict

List of plot bins. See bin info in Notes for more details.

class datarobot.models.datetime_trend_plots.AnomalyOverTimePlotsMetadata(project_id, model_id, resolutions, backtest_metadata, holdout_metadata, backtest_statuses, holdout_statuses)

Anomaly over Time metadata for datetime model.

New in version v2.25.

Notes

Backtest/holdout status is a dict containing the following:

  • training: string
    Status of the backtest/holdout training. One of datarobot.enums.DATETIME_TREND_PLOTS_STATUS
  • validation: string
    Status of the backtest/holdout validation. One of datarobot.enums.DATETIME_TREND_PLOTS_STATUS

Backtest/holdout metadata is a dict containing the following:

  • training: dict
    Start and end dates for the backtest/holdout training.
  • validation: dict
    Start and end dates for the backtest/holdout validation.

Each dict in the training and validation in backtest/holdout metadata is structured like:

  • start_date: datetime.datetime or None
    The datetime of the start of the chart data (inclusive). None if chart data is not computed.
  • end_date: datetime.datetime or None
    The datetime of the end of the chart data (exclusive). None if chart data is not computed.
Attributes:
project_id: string

The project ID.

model_id: string

The model ID.

resolutions: list of string

A list of datarobot.enums.DATETIME_TREND_PLOTS_RESOLUTION, which represents available time resolutions for which plots can be retrieved.

backtest_metadata: list of dict

List of backtest metadata dicts. The list index of each metadata dict is the backtest index. See backtest/holdout metadata info in Notes for more details.

holdout_metadata: dict

Holdout metadata dict. See backtest/holdout metadata info in Notes for more details.

backtest_statuses: list of dict

List of backtest status dicts. The list index of each status dict is the backtest index. See backtest/holdout status info in Notes for more details.

holdout_statuses: dict

Holdout status dict. See backtest/holdout status info in Notes for more details.

class datarobot.models.datetime_trend_plots.AnomalyOverTimePlot(project_id, model_id, start_date, end_date, resolution, bins, calendar_events)

Anomaly over Time plot for datetime model.

New in version v2.25.

Notes

Bin is a dict containing the following:

  • start_date: datetime.datetime
    The datetime of the start of the bin (inclusive).
  • end_date: datetime.datetime
    The datetime of the end of the bin (exclusive).
  • predicted: float or None
    Average prediction of the model in the bin. None if there are no entries in the bin.
  • frequency: int or None
    Indicates number of values averaged in bin.

Calendar event is a dict containing the following:

  • name: string
    Name of the calendar event.
  • date: datetime
    Date of the calendar event.
  • series_id: string or None
    The series ID for the event. If this event does not specify a series ID, then this will be None, indicating that the event applies to all series.
Attributes:
project_id: string

The project ID.

model_id: string

The model ID.

resolution: string

The resolution that is used for binning. One of datarobot.enums.DATETIME_TREND_PLOTS_RESOLUTION

start_date: datetime.datetime

The datetime of the start of the chart data (inclusive).

end_date: datetime.datetime

The datetime of the end of the chart data (exclusive).

bins: list of dict

List of plot bins. See bin info in Notes for more details.

calendar_events: list of dict

List of calendar events for the plot. See calendar events info in Notes for more details.

class datarobot.models.datetime_trend_plots.AnomalyOverTimePlotPreview(project_id, model_id, prediction_threshold, start_date, end_date, bins)

Anomaly over Time plot preview for datetime model.

New in version v2.25.

Notes

Bin is a dict containing the following:

  • start_date: datetime.datetime
    The datetime of the start of the bin (inclusive).
  • end_date: datetime.datetime
    The datetime of the end of the bin (exclusive).
Attributes:
project_id: string

The project ID.

model_id: string

The model ID.

prediction_threshold: float

Only bins with predictions exceeding this threshold are returned in the response.

start_date: datetime.datetime

The datetime of the start of the chart data (inclusive).

end_date: datetime.datetime

The datetime of the end of the chart data (exclusive).

bins: list of dict

List of plot bins. See bin info in Notes for more details.

Deployment

class datarobot.models.Deployment(id: str, label: Optional[str] = None, description: Optional[str] = None, status: Optional[str] = None, default_prediction_server: Optional[PredictionServer] = None, model: Optional[ModelDict] = None, capabilities: Optional[Dict[str, Any]] = None, prediction_usage: Optional[PredictionUsage] = None, permissions: Optional[List[str]] = None, service_health: Optional[Health] = None, model_health: Optional[Health] = None, accuracy_health: Optional[Health] = None, importance: Optional[str] = None, fairness_health: Optional[Health] = None, governance: Optional[Dict[str, Any]] = None, owners: Optional[Dict[str, Any]] = None, prediction_environment: Optional[Dict[str, Any]] = None)

A deployment created from a DataRobot model.

Attributes:
id : str

the id of the deployment

label : str

the label of the deployment

description : str

the description of the deployment

status : str

(New in version v2.29) deployment status

default_prediction_server : dict

Information about the default prediction server for the deployment. Accepts the following values:

  • id: str. Prediction server ID.
  • url: str, optional. Prediction server URL.
  • datarobot-key: str. Corresponds to the PredictionServer’s “snake_cased” datarobot_key parameter, which allows you to verify and access the prediction server.
importance : str, optional

deployment importance

model : dict

information on the model of the deployment

capabilities : dict

information on the capabilities of the deployment

prediction_usage : dict

information on the prediction usage of the deployment

permissions : list

(New in version v2.18) user’s permissions on the deployment

service_health : dict

information on the service health of the deployment

model_health : dict

information on the model health of the deployment

accuracy_health : dict

information on the accuracy health of the deployment

fairness_health : dict

information on the fairness health of a deployment

governance : dict

information on approval and change requests of a deployment

owners : dict

information on the owners of a deployment

prediction_environment : dict

information on the prediction environment of a deployment

classmethod create_from_learning_model(model_id: str, label: str, description: Optional[str] = None, default_prediction_server_id: Optional[str] = None, importance: Optional[str] = None, prediction_threshold: Optional[float] = None, status: Optional[str] = None) → TDeployment

Create a deployment from a DataRobot model.

New in version v2.17.

Parameters:
model_id : str

id of the DataRobot model to deploy

label : str

a human-readable label of the deployment

description : str, optional

a human-readable description of the deployment

default_prediction_server_id : str, optional

an identifier of a prediction server to be used as the default prediction server

importance : str, optional

deployment importance

prediction_threshold : float, optional

threshold used for binary classification in predictions

status : str, optional

deployment status

Returns:
deployment : Deployment

The created deployment

Examples

from datarobot import Project, Deployment
project = Project.get('5506fcd38bd88f5953219da0')
model = project.get_models()[0]
deployment = Deployment.create_from_learning_model(model.id, 'New Deployment')
deployment
>>> Deployment('New Deployment')
classmethod create_from_custom_model_version(custom_model_version_id: str, label: str, description: Optional[str] = None, default_prediction_server_id: Optional[str] = None, max_wait: int = 600, importance: Optional[str] = None) → TDeployment

Create a deployment from a DataRobot custom model image.

Parameters:
custom_model_version_id : str

id of the DataRobot custom model version to deploy. The version must have a base_environment_id.

label : str

a human readable label of the deployment

description : str, optional

a human readable description of the deployment

default_prediction_server_id : str, optional

an identifier of a prediction server to be used as the default prediction server

max_wait : int, optional

seconds to wait for successful resolution of a deployment creation job. The deployment supports making predictions only after the deployment creation job has successfully finished

importance : str, optional

deployment importance

Returns:
deployment : Deployment

The created deployment

classmethod list(order_by: Optional[str] = None, search: Optional[str] = None, filters: Optional[datarobot.models.deployment.DeploymentListFilters] = None) → List[TDeployment]

List all deployments a user can view.

New in version v2.17.

Parameters:
order_by : str, optional

(New in version v2.18) the order to sort the deployment list by, defaults to label

Allowed attributes to sort by are:

  • label
  • serviceHealth
  • modelHealth
  • accuracyHealth
  • recentPredictions
  • lastPredictionTimestamp

If the sort attribute is preceded by a hyphen, deployments will be sorted in descending order, otherwise in ascending order.

For health related sorting, ascending means failing, warning, passing, unknown.

search : str, optional

(New in version v2.18) case insensitive search against deployment’s label and description.

filters : datarobot.models.deployment.DeploymentListFilters, optional

(New in version v2.20) an object containing all filters that you’d like to apply to the resulting list of deployments. See DeploymentListFilters for details on usage.

Returns:
deployments : list

a list of deployments the user can view

Examples

from datarobot import Deployment
deployments = Deployment.list()
deployments
>>> [Deployment('New Deployment'), Deployment('Previous Deployment')]
from datarobot import Deployment
from datarobot.enums import DEPLOYMENT_SERVICE_HEALTH_STATUS
from datarobot.models.deployment import DeploymentListFilters
filters = DeploymentListFilters(
    role='OWNER',
    service_health=[DEPLOYMENT_SERVICE_HEALTH_STATUS.FAILING]
)
filtered_deployments = Deployment.list(filters=filters)
filtered_deployments
>>> [Deployment('Deployment I Own w/ Failing Service Health')]
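The order_by rules described above (a fixed set of sortable attributes, with a leading hyphen for descending order) can be sketched as a small helper. The function name build_order_by is hypothetical and not part of the client; it only assembles the string.

```python
# Sortable attributes as listed for the order_by parameter above.
SORTABLE_ATTRIBUTES = (
    "label",
    "serviceHealth",
    "modelHealth",
    "accuracyHealth",
    "recentPredictions",
    "lastPredictionTimestamp",
)


def build_order_by(attribute, descending=False):
    """Return an order_by string; a leading hyphen requests descending order."""
    if attribute not in SORTABLE_ATTRIBUTES:
        raise ValueError(f"unsupported sort attribute: {attribute!r}")
    return f"-{attribute}" if descending else attribute


print(build_order_by("label"))
print(build_order_by("lastPredictionTimestamp", descending=True))
```

The resulting string would then be passed as Deployment.list(order_by=...).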
classmethod get(deployment_id: str) → TDeployment

Get information about a deployment.

New in version v2.17.

Parameters:
deployment_id : str

the id of the deployment

Returns:
deployment : Deployment

the queried deployment

Examples

from datarobot import Deployment
deployment = Deployment.get(deployment_id='5c939e08962d741e34f609f0')
deployment.id
>>>'5c939e08962d741e34f609f0'
deployment.label
>>>'New Deployment'
predict_batch(source: Union[str, pandas.core.frame.DataFrame, io.IOBase], passthrough_columns: Optional[List[str]] = None, download_timeout: Optional[int] = None, download_read_timeout: Optional[int] = None, upload_read_timeout: Optional[int] = None) → pandas.core.frame.DataFrame

Using a deployment, make batch predictions and return results as a DataFrame.

If a DataFrame is passed as source, then the prediction results are merged with the original DataFrame and a new DataFrame is returned.

New in version v3.0.

Parameters:
source: str, pd.DataFrame or file object

Pass a filepath, file, or DataFrame for making batch predictions.

passthrough_columns : list[string] (optional)

Keep these columns from the scoring dataset in the scored dataset. This is useful for correlating predictions with source data.

download_timeout: int, optional

Wait this many seconds for the download to become available. See datarobot.models.BatchPredictionJob.score().

download_read_timeout: int, optional

Wait this many seconds for the server to respond between chunks. See datarobot.models.BatchPredictionJob.score().

upload_read_timeout: int, optional

Wait this many seconds for the server to respond after a whole dataset upload. See datarobot.models.BatchPredictionJob.score().

Returns:
pd.DataFrame

Prediction results in a pandas DataFrame.

Raises:
InvalidUsageError

If the source parameter cannot be determined to be a filepath, file, or DataFrame.

Examples

from datarobot.models.deployment import Deployment

deployment = Deployment.get("<MY_DEPLOYMENT_ID>")
prediction_results_as_dataframe = deployment.predict_batch(
    source="./my_local_file.csv",
)
get_uri() → str
Returns:
url : str

Deployment’s overview URI

update(label: Optional[str] = None, description: Optional[str] = None, importance: Optional[str] = None) → None

Update the label, description, or importance of this deployment.

New in version v2.19.

delete() → None

Delete this deployment.

New in version v2.17.

activate(max_wait: int = 600) → None

Activates this deployment. On success, the deployment status becomes active.

New in version v2.29.

Parameters:
max_wait : int, optional

The maximum time to wait for deployment activation to complete before erroring

deactivate(max_wait: int = 600) → None

Deactivates this deployment. On success, the deployment status becomes inactive.

New in version v2.29.

Parameters:
max_wait : int, optional

The maximum time to wait for deployment deactivation to complete before erroring

replace_model(new_model_id: str, reason: str, max_wait: int = 600) → None

Replace the model used in this deployment. To confirm model replacement eligibility, use validate_replacement_model() beforehand.

New in version v2.17.

Model replacement is an asynchronous process, which means some preparatory work may be performed after the initial request is completed. This function will not return until all preparatory work is fully finished.

Predictions made against this deployment will start using the new model as soon as the request is completed. There will be no interruption for predictions throughout the process.

Parameters:
new_model_id : str

The id of the new model to use. If replacing the deployment’s model with a CustomInferenceModel, a specific CustomModelVersion ID must be used.

reason : MODEL_REPLACEMENT_REASON

The reason for the model replacement. Must be one of ‘ACCURACY’, ‘DATA_DRIFT’, ‘ERRORS’, ‘SCHEDULED_REFRESH’, ‘SCORING_SPEED’, or ‘OTHER’. This value will be stored in the model history to keep track of why a model was replaced

max_wait : int, optional

(new in version 2.22) The maximum time to wait for model replacement job to complete before erroring

Examples

from datarobot import Deployment
from datarobot.enums import MODEL_REPLACEMENT_REASON
deployment = Deployment.get(deployment_id='5c939e08962d741e34f609f0')
deployment.model['id'], deployment.model['type']
>>>('5c0a979859b00004ba52e431', 'Decision Tree Classifier (Gini)')

deployment.replace_model('5c0a969859b00004ba52e41b', MODEL_REPLACEMENT_REASON.ACCURACY)
deployment.model['id'], deployment.model['type']
>>>('5c0a969859b00004ba52e41b', 'Support Vector Classifier (Linear Kernel)')
validate_replacement_model(new_model_id: str) → Tuple[str, str, Dict[str, Any]]

Validate a model can be used as the replacement model of the deployment.

New in version v2.17.

Parameters:
new_model_id : str

the id of the new model to validate

Returns:
status : str

status of the validation, will be one of ‘passing’, ‘warning’ or ‘failing’. If the status is passing or warning, use replace_model() to perform a model replacement. If the status is failing, refer to checks for more detail on why the new model cannot be used as a replacement.

message : str

message for the validation result

checks : dict

explain why the new model can or cannot replace the deployment’s current model

get_features() → List[FeatureDict]

Retrieve the list of features needed to make predictions on this deployment.

Returns:
features: list

a list of feature dicts

Notes

Each feature dict contains the following structure:

  • name : str, feature name
  • feature_type : str, feature type
  • importance : float, numeric measure of the relationship strength between the feature and target (independent of model or other features)
  • date_format : str or None, the date format string for how this feature was interpreted, null if not a date feature, compatible with https://docs.python.org/2/library/time.html#time.strftime.
  • known_in_advance : bool, whether the feature was selected as known in advance in a time series model, false for non-time series models.

Examples

from datarobot import Deployment
deployment = Deployment.get(deployment_id='5c939e08962d741e34f609f0')
features = deployment.get_features()
features[0]['feature_type']
>>>'Categorical'
features[0]['importance']
>>>0.133
submit_actuals(data: Union[pd.DataFrame, List[Actual]], batch_size: int = 10000) → None

Submit actuals for processing. The actuals submitted will be used to calculate accuracy metrics.

Parameters:
data : list or pandas.DataFrame

If data is a list, each item should be a dict-like object with the following keys and values; if data is a pandas.DataFrame, it should contain the following columns:

  • association_id: str, a unique identifier used with a prediction, max length 128 characters
  • actual_value: str or int or float, the actual value of a prediction; should be numeric for deployments with regression models or string for deployments with classification models
  • was_acted_on: bool, optional, indicates if the prediction was acted on in a way that could have affected the actual outcome
  • timestamp: datetime or string in RFC3339 format, optional. If the datetime provided does not have a timezone, it is assumed to be UTC.

batch_size : int, optional

the max number of actuals in each request

Raises:
ValueError

if input data is not a list of dict-like objects or a pandas.DataFrame, or if input data is empty

Examples

from datarobot import Deployment
deployment = Deployment.get(deployment_id='5c939e08962d741e34f609f0')
data = [{
    'association_id': '439917',
    'actual_value': 'True',
    'was_acted_on': True
}]
deployment.submit_actuals(data)
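As a rough sketch of how one actuals record could be assembled from the keys described above, the hypothetical helper make_actual below enforces the 128-character limit on association_id and makes the UTC assumption for naive timestamps explicit on the client side. It only builds the dict and does not call the API.

```python
from datetime import datetime, timezone


def make_actual(association_id, actual_value, was_acted_on=None, timestamp=None):
    """Build one dict in the shape described for submit_actuals()."""
    if len(association_id) > 128:
        raise ValueError("association_id is limited to 128 characters")
    actual = {"association_id": association_id, "actual_value": actual_value}
    if was_acted_on is not None:
        actual["was_acted_on"] = was_acted_on
    if timestamp is not None:
        # Naive datetimes are treated as UTC, mirroring the documented behavior.
        if timestamp.tzinfo is None:
            timestamp = timestamp.replace(tzinfo=timezone.utc)
        actual["timestamp"] = timestamp.isoformat()
    return actual


record = make_actual("439917", "True", True, datetime(2023, 1, 2, 3, 4, 5))
print(record["timestamp"])  # 2023-01-02T03:04:05+00:00
```

A list of such records could then be passed as deployment.submit_actuals(records).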
get_predictions_by_forecast_date_settings() → ForecastDateSettings

Retrieve predictions by forecast date settings of this deployment.

New in version v2.27.

Returns:
settings : dict

Predictions by forecast date settings of the deployment is a dict with the following format:

enabled : bool

Is True if predictions by forecast date is enabled for this deployment. To update this setting, see update_predictions_by_forecast_date_settings()

column_name : string

The column name in prediction datasets to be used as forecast date.

datetime_format : string

The datetime format of the forecast date column in prediction datasets.

update_predictions_by_forecast_date_settings(enable_predictions_by_forecast_date: bool, forecast_date_column_name: Optional[str] = None, forecast_date_format: Optional[str] = None, max_wait: int = 600) → None

Update predictions by forecast date settings of this deployment.

New in version v2.27.

Updating predictions by forecast date setting is an asynchronous process, which means some preparatory work may be performed after the initial request is completed. This function will not return until all preparatory work is fully finished.

Parameters:
enable_predictions_by_forecast_date : bool

set to True if predictions by forecast date is to be turned on or set to False if predictions by forecast date is to be turned off.

forecast_date_column_name: string, optional

The column name in prediction datasets to be used as forecast date. If enable_predictions_by_forecast_date is set to False, then the parameter will be ignored.

forecast_date_format: string, optional

The datetime format of the forecast date column in prediction datasets. If enable_predictions_by_forecast_date is set to False, then the parameter will be ignored.

max_wait : int, optional

seconds to wait for successful resolution

Examples

# To set predictions by forecast date settings to the same default settings you see when using
# the DataRobot web application, you use your 'Deployment' object like this:
deployment.update_predictions_by_forecast_date_settings(
   enable_predictions_by_forecast_date=True,
   forecast_date_column_name="date (actual)",
   forecast_date_format="%Y-%m-%d",
)
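Before enabling the setting, it may help to confirm that a sample value from the forecast date column parses under the chosen format. The sketch below assumes the format uses strftime-style directives, as the "%Y-%m-%d" value in the example suggests; matches_format is a hypothetical helper, not part of the client.

```python
from datetime import datetime


def matches_format(sample, datetime_format):
    """Return True when `sample` parses under `datetime_format`."""
    try:
        datetime.strptime(sample, datetime_format)
    except ValueError:
        return False
    return True


print(matches_format("2023-04-30", "%Y-%m-%d"))  # True
print(matches_format("30/04/2023", "%Y-%m-%d"))  # False
```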
get_challenger_models_settings() → ChallengerModelsSettings

Retrieve challenger models settings of this deployment.

New in version v2.27.

Returns:
settings : dict

Challenger models settings of the deployment is a dict with the following format:

enabled : bool

Is True if challenger models is enabled for this deployment. To update existing challenger_models settings, see update_challenger_models_settings()

update_challenger_models_settings(challenger_models_enabled: bool, max_wait: int = 600) → None

Update challenger models settings of this deployment.

New in version v2.27.

Updating challenger models setting is an asynchronous process, which means some preparatory work may be performed after the initial request is completed. This function will not return until all preparatory work is fully finished.

Parameters:
challenger_models_enabled : bool

set to True if challenger models is to be turned on or set to False if challenger models is to be turned off

max_wait : int, optional

seconds to wait for successful resolution

get_segment_analysis_settings() → SegmentAnalysisSettings

Retrieve segment analysis settings of this deployment.

New in version v2.27.

Returns:
settings : dict

Segment analysis settings of the deployment containing two items with keys enabled and attributes, which are further described below.

enabled : bool

Set to True if segment analysis is enabled for this deployment. To update existing setting, see update_segment_analysis_settings()

attributes : list

To create or update existing segment analysis attributes, see update_segment_analysis_settings()

update_segment_analysis_settings(segment_analysis_enabled: bool, segment_analysis_attributes: Optional[List[str]] = None, max_wait: int = 600) → None

Update segment analysis settings of this deployment.

New in version v2.27.

Updating segment analysis setting is an asynchronous process, which means some preparatory work may be performed after the initial request is completed. This function will not return until all preparatory work is fully finished.

Parameters:
segment_analysis_enabled : bool

set to True if segment analysis is to be turned on or set to False if segment analysis is to be turned off

segment_analysis_attributes: list, optional

A list of strings that gives the segment attributes selected for tracking.

max_wait : int, optional

seconds to wait for successful resolution

get_drift_tracking_settings() → DriftTrackingSettings

Retrieve drift tracking settings of this deployment.

New in version v2.17.

Returns:
settings : dict

Drift tracking settings of the deployment containing two nested dicts with key target_drift and feature_drift, which are further described below.

Target drift setting contains:

enabled : bool

If target drift tracking is enabled for this deployment. To create or update existing target_drift settings, see update_drift_tracking_settings()

Feature drift setting contains:

enabled : bool

If feature drift tracking is enabled for this deployment. To create or update existing feature_drift settings, see update_drift_tracking_settings()

update_drift_tracking_settings(target_drift_enabled: Optional[bool] = None, feature_drift_enabled: Optional[bool] = None, max_wait: int = 600) → None

Update drift tracking settings of this deployment.

New in version v2.17.

Updating drift tracking setting is an asynchronous process, which means some preparatory work may be performed after the initial request is completed. This function will not return until all preparatory work is fully finished.

Parameters:
target_drift_enabled : bool, optional

if target drift tracking is to be turned on

feature_drift_enabled : bool, optional

if feature drift tracking is to be turned on

max_wait : int, optional

seconds to wait for successful resolution

get_association_id_settings() → str

Retrieve association ID setting for this deployment.

New in version v2.19.

Returns:
association_id_settings : dict in the following format:
column_names : list[string], optional

name of the columns to be used as association ID,

required_in_prediction_requests : bool, optional

whether the association ID column is required in prediction requests

update_association_id_settings(column_names: Optional[List[str]] = None, required_in_prediction_requests: Optional[bool] = None, max_wait: int = 600) → None

Update association ID setting for this deployment.

New in version v2.19.

Parameters:
column_names : list[string], optional

name of the columns to be used as association ID; currently only a list containing a single string is supported

required_in_prediction_requests : bool, optional

whether the association ID column is required in prediction requests

max_wait : int, optional

seconds to wait for successful resolution

get_predictions_data_collection_settings() → Dict[str, bool]

Retrieve predictions data collection settings of this deployment.

New in version v2.21.

Returns:
predictions_data_collection_settings : dict in the following format:
enabled : bool

If predictions data collection is enabled for this deployment. To update existing predictions_data_collection settings, see update_predictions_data_collection_settings()

update_predictions_data_collection_settings(enabled: bool, max_wait: int = 600) → None

Update predictions data collection settings of this deployment.

New in version v2.21.

Updating predictions data collection setting is an asynchronous process, which means some preparatory work may be performed after the initial request is completed. This function will not return until all preparatory work is fully finished.

Parameters:
enabled: bool

if predictions data collection is to be turned on

max_wait : int, optional

seconds to wait for successful resolution

get_prediction_warning_settings() → PredictionWarningSettings

Retrieve prediction warning settings of this deployment.

New in version v2.19.

Returns:
settings : dict in the following format:
enabled : bool

If target prediction_warning is enabled for this deployment. To create or update existing prediction_warning settings, see update_prediction_warning_settings()

custom_boundaries : dict or None
If None default boundaries for a model are used. Otherwise has following keys:
upper : float

All predictions greater than provided value are considered anomalous

lower : float

All predictions less than provided value are considered anomalous

update_prediction_warning_settings(prediction_warning_enabled: bool, use_default_boundaries: Optional[bool] = None, lower_boundary: Optional[float] = None, upper_boundary: Optional[float] = None, max_wait: int = 600) → None

Update prediction warning settings of this deployment.

New in version v2.19.

Parameters:
prediction_warning_enabled : bool

If prediction warnings should be turned on.

use_default_boundaries : bool, optional

If default boundaries of the model should be used for the deployment.

upper_boundary : float, optional

All predictions greater than provided value will be considered anomalous

lower_boundary : float, optional

All predictions less than provided value will be considered anomalous

max_wait : int, optional

seconds to wait for successful resolution
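The parameters above can be assembled with a small helper, sketched here under two assumptions that the reference does not state: custom boundaries are supplied as a pair, and use_default_boundaries is set to False when they are given. The function name prediction_warning_kwargs is hypothetical.

```python
def prediction_warning_kwargs(lower_boundary=None, upper_boundary=None):
    """Build keyword arguments for update_prediction_warning_settings()."""
    if lower_boundary is None and upper_boundary is None:
        # No custom values: fall back to the model's default boundaries.
        return {"prediction_warning_enabled": True, "use_default_boundaries": True}
    if lower_boundary is None or upper_boundary is None:
        # Assumption: boundaries are provided together.
        raise ValueError("provide both boundaries or neither")
    if lower_boundary >= upper_boundary:
        raise ValueError("lower_boundary must be less than upper_boundary")
    return {
        "prediction_warning_enabled": True,
        "use_default_boundaries": False,
        "lower_boundary": lower_boundary,
        "upper_boundary": upper_boundary,
    }


print(prediction_warning_kwargs(lower_boundary=0.0, upper_boundary=250.0))
```

The result would be unpacked into the call, for example deployment.update_prediction_warning_settings(**kwargs).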

get_prediction_intervals_settings() → PredictionIntervalsSettings

Retrieve prediction intervals settings for this deployment.

New in version v2.19.

Returns:
dict in the following format:
enabled : bool

Whether prediction intervals are enabled for this deployment

percentiles : list[int]

List of enabled prediction intervals’ sizes for this deployment. Currently we only support one percentile at a time.

Notes

Note that prediction intervals are only supported for time series deployments.

update_prediction_intervals_settings(percentiles: List[int], enabled: bool = True, max_wait: int = 600) → None

Update prediction intervals settings for this deployment.

New in version v2.19.

Parameters:
percentiles : list[int]

The prediction intervals percentiles to enable for this deployment. Currently we only support setting one percentile at a time.

enabled : bool, optional (defaults to True)

Whether to enable showing prediction intervals in the results of predictions requested using this deployment.

max_wait : int, optional

seconds to wait for successful resolution

Raises:
AssertionError

If percentiles is in an invalid format

AsyncFailureError

If any of the responses from the server are unexpected

AsyncProcessUnsuccessfulError

If the prediction intervals calculation job has failed or has been cancelled.

AsyncTimeoutError

If the prediction intervals calculation job did not resolve in time

Notes

Updating prediction intervals settings is an asynchronous process, which means some preparatory work may be performed before the settings request is completed. This function will not return until all work is fully finished.

Note that prediction intervals are only supported for time series deployments.
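Because only one percentile can be set at a time, a caller may want to build the percentiles argument from a single checked value. The sketch below is a hypothetical client-side check; the accepted range of 1 to 100 is an assumption, since the reference does not list the valid values.

```python
def single_percentile(value):
    """Return a one-element percentiles list after basic sanity checks."""
    # bool is a subclass of int, so reject it explicitly.
    if isinstance(value, bool) or not isinstance(value, int):
        raise TypeError("percentile must be an int")
    # Assumption: valid percentiles fall in the range 1..100.
    if not 1 <= value <= 100:
        raise ValueError("percentile must be between 1 and 100")
    return [value]


print(single_percentile(80))  # [80]
```

The list would then be passed as deployment.update_prediction_intervals_settings(percentiles=single_percentile(80)).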

get_service_stats(model_id: Optional[str] = None, start_time: Optional[datetime.datetime] = None, end_time: Optional[datetime.datetime] = None, execution_time_quantile: Optional[float] = None, response_time_quantile: Optional[float] = None, slow_requests_threshold: Optional[float] = None) → datarobot.models.service_stats.ServiceStats

Retrieve value of service stat metrics over a certain time period.

New in version v2.18.

Parameters:
model_id : str, optional

the id of the model

start_time : datetime, optional

start of the time period

end_time : datetime, optional

end of the time period

execution_time_quantile : float, optional

quantile for executionTime, defaults to 0.5

response_time_quantile : float, optional

quantile for responseTime, defaults to 0.5

slow_requests_threshold : float, optional

threshold for slowRequests, defaults to 1000

Returns:
service_stats : ServiceStats

the queried service stats metrics information

get_service_stats_over_time(metric: Optional[str] = None, model_id: Optional[str] = None, start_time: Optional[datetime.datetime] = None, end_time: Optional[datetime.datetime] = None, bucket_size: Optional[str] = None, quantile: Optional[float] = None, threshold: Optional[int] = None) → datarobot.models.service_stats.ServiceStatsOverTime

Retrieve information about how a service stat metric changes over a certain time period.

New in version v2.18.

Parameters:
metric : SERVICE_STAT_METRIC, optional

the service stat metric to retrieve

model_id : str, optional

the id of the model

start_time : datetime, optional

start of the time period

end_time : datetime, optional

end of the time period

bucket_size : str, optional

time duration of a bucket, in ISO 8601 time duration format

quantile : float, optional

quantile for ‘executionTime’ or ‘responseTime’, ignored when querying other metrics

threshold : int, optional

threshold for ‘slowRequests’, ignored when querying other metrics

Returns:
service_stats_over_time : ServiceStatsOverTime

the queried service stats metric over time information
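The bucket_size parameter takes an ISO 8601 duration string. A minimal sketch of producing one from a datetime.timedelta follows; iso8601_duration is a hypothetical helper that handles whole seconds only, and which durations the server accepts is not covered here.

```python
from datetime import timedelta


def iso8601_duration(delta):
    """Format a positive timedelta as an ISO 8601 duration, e.g. PT1H or P7D."""
    total = int(delta.total_seconds())
    if total <= 0:
        raise ValueError("duration must be positive")
    days, rem = divmod(total, 86400)
    hours, rem = divmod(rem, 3600)
    minutes, seconds = divmod(rem, 60)
    date_part = f"{days}D" if days else ""
    time_part = "".join(
        f"{value}{unit}"
        for value, unit in ((hours, "H"), (minutes, "M"), (seconds, "S"))
        if value
    )
    # The "T" separator appears only when there is a time component.
    return "P" + date_part + ("T" + time_part if time_part else "")


print(iso8601_duration(timedelta(hours=1)))  # PT1H
print(iso8601_duration(timedelta(days=7)))   # P7D
```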

get_target_drift(model_id: Optional[str] = None, start_time: Optional[datetime.datetime] = None, end_time: Optional[datetime.datetime] = None, metric: Optional[str] = None) → datarobot.models.data_drift.TargetDrift

Retrieve target drift information over a certain time period.

New in version v2.21.

Parameters:
model_id : str

the id of the model

start_time : datetime

start of the time period

end_time : datetime

end of the time period

metric : str

(New in version v2.22) metric used to calculate the drift score

Returns:
target_drift : TargetDrift

the queried target drift information

get_feature_drift(model_id: Optional[str] = None, start_time: Optional[datetime.datetime] = None, end_time: Optional[datetime.datetime] = None, metric: Optional[str] = None) → List[datarobot.models.data_drift.FeatureDrift]

Retrieve drift information for deployment’s features over a certain time period.

New in version v2.21.

Parameters:
model_id : str

the id of the model

start_time : datetime

start of the time period

end_time : datetime

end of the time period

metric : str

(New in version v2.22) The metric used to calculate the drift score. Allowed values include psi, kl_divergence, dissimilarity, hellinger, and js_divergence.

Returns:
feature_drift_data : [FeatureDrift]

the queried feature drift information
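A sketch of preparing the arguments for a drift query over the most recent days, using the metric names listed above. The helper drift_query and its now parameter are hypothetical; any alignment the server may require for start_time and end_time is not handled here.

```python
from datetime import datetime, timedelta, timezone

# Metric names as listed for the metric parameter above.
DRIFT_METRICS = ("psi", "kl_divergence", "dissimilarity", "hellinger", "js_divergence")


def drift_query(days, metric="psi", now=None):
    """Return keyword arguments covering the last `days` days."""
    if metric not in DRIFT_METRICS:
        raise ValueError(f"unsupported drift metric: {metric!r}")
    if days <= 0:
        raise ValueError("days must be positive")
    end_time = now or datetime.now(timezone.utc)
    return {
        "start_time": end_time - timedelta(days=days),
        "end_time": end_time,
        "metric": metric,
    }


kwargs = drift_query(7, metric="hellinger")
print(kwargs["end_time"] - kwargs["start_time"])  # 7 days, 0:00:00
```

The dict could then be unpacked into the call, for example deployment.get_feature_drift(**kwargs).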

get_accuracy(model_id: Optional[str] = None, start_time: Optional[datetime.datetime] = None, end_time: Optional[datetime.datetime] = None, start: Optional[datetime.datetime] = None, end: Optional[datetime.datetime] = None, target_classes: Optional[List[str]] = None) → datarobot.models.accuracy.Accuracy

Retrieve values of accuracy metrics over a certain time period.

New in version v2.18.

Parameters:
model_id : str

the id of the model

start_time : datetime

start of the time period

end_time : datetime

end of the time period

target_classes : list[str], optional

Optional list of target class strings

Returns:
accuracy : Accuracy

the queried accuracy metrics information

get_accuracy_over_time(metric: Optional[datarobot.enums.ACCURACY_METRIC] = None, model_id: Optional[str] = None, start_time: Optional[datetime.datetime] = None, end_time: Optional[datetime.datetime] = None, bucket_size: Optional[str] = None, target_classes: Optional[List[str]] = None) → datarobot.models.accuracy.AccuracyOverTime

Retrieve information about how an accuracy metric changes over a certain time period.

New in version v2.18.

Parameters:
metric : ACCURACY_METRIC

the accuracy metric to retrieve

model_id : str

the id of the model

start_time : datetime

start of the time period

end_time : datetime

end of the time period

bucket_size : str

time duration of a bucket, in ISO 8601 time duration format

target_classes : list[str], optional

Optional list of target class strings

Returns:
accuracy_over_time : AccuracyOverTime

the queried accuracy metric over time information

update_secondary_dataset_config(secondary_dataset_config_id: str, credential_ids: Optional[List[str]] = None) → str

Update the secondary dataset config used by Feature discovery model for a given deployment.

New in version v2.23.

Parameters:
secondary_dataset_config_id: str

Id of the secondary dataset config

credential_ids: list or None

List of DatasetsCredentials used by the secondary datasets

Examples

from datarobot import Deployment
deployment = Deployment.get(deployment_id='5c939e08962d741e34f609f0')
config = deployment.update_secondary_dataset_config('5df109112ca582033ff44084')
config
>>> '5df109112ca582033ff44084'
get_secondary_dataset_config() → str

Get the secondary dataset config used by Feature discovery model for a given deployment.

New in version v2.23.

Returns:
secondary_dataset_config : SecondaryDatasetConfigurations

Id of the secondary dataset config

Examples

from datarobot import Deployment
deployment = Deployment.get(deployment_id='5c939e08962d741e34f609f0')
deployment.update_secondary_dataset_config('5df109112ca582033ff44084')
config = deployment.get_secondary_dataset_config()
config
>>> '5df109112ca582033ff44084'
get_prediction_results(model_id: Optional[str] = None, start_time: Optional[datetime.datetime] = None, end_time: Optional[datetime.datetime] = None, actuals_present: Optional[bool] = None, offset: Optional[int] = None, limit: Optional[int] = None) → List[Dict[str, Any]]

Retrieve a list of prediction results of the deployment.

New in version v2.24.

Parameters:
model_id : str

the id of the model

start_time : datetime

start of the time period

end_time : datetime

end of the time period

actuals_present : bool

filters prediction results to only those with actuals present, or only those with missing actuals

offset : int

this many results will be skipped

limit : int

at most this many results are returned

Returns:
prediction_results: list[dict]

a list of prediction results

Examples

from datarobot import Deployment
deployment = Deployment.get(deployment_id='5c939e08962d741e34f609f0')
results = deployment.get_prediction_results()
download_prediction_results(filepath: str, model_id: Optional[str] = None, start_time: Optional[datetime.datetime] = None, end_time: Optional[datetime.datetime] = None, actuals_present: Optional[bool] = None, offset: Optional[int] = None, limit: Optional[int] = None) → None

Download prediction results of the deployment as a CSV file.

New in version v2.24.

Parameters:
filepath : str

path of the csv file

model_id : str

the id of the model

start_time : datetime

start of the time period

end_time : datetime

end of the time period

actuals_present : bool

filters prediction results to only those that have actuals present (True) or only those with missing actuals (False)

offset : int

this many results will be skipped

limit : int

at most this many results are returned

Examples

from datarobot import Deployment
deployment = Deployment.get(deployment_id='5c939e08962d741e34f609f0')
deployment.download_prediction_results('path_to_prediction_results.csv')
download_scoring_code(filepath: str, source_code: bool = False, include_agent: bool = False, include_prediction_explanations: bool = False, include_prediction_intervals: bool = False) → None

Retrieve scoring code of the current deployed model.

New in version v2.24.

Parameters:
filepath : str

path of the scoring code file

source_code : bool

whether source code or binary of the scoring code will be retrieved

include_agent : bool

whether the scoring code retrieved will include tracking agent

include_prediction_explanations : bool

whether the scoring code retrieved will include prediction explanations

include_prediction_intervals : bool

whether the scoring code retrieved will support prediction intervals

Notes

Setting include_agent, include_prediction_explanations, or include_prediction_intervals to True can make the scoring code download take considerably longer.

Examples

from datarobot import Deployment
deployment = Deployment.get(deployment_id='5c939e08962d741e34f609f0')
deployment.download_scoring_code('path_to_scoring_code.jar')
delete_monitoring_data(model_id: str, start_time: Optional[datetime.datetime] = None, end_time: Optional[datetime.datetime] = None, max_wait: int = 600) → None

Delete deployment monitoring data.

Parameters:
model_id : str

id of the model to delete monitoring data

start_time : datetime, optional

start of the time period to delete monitoring data

end_time : datetime, optional

end of the time period to delete monitoring data

max_wait : int, optional

seconds to wait for successful resolution

classmethod from_data(data: Union[Dict[str, Any], List[Dict[str, Any]]]) → T

Instantiate an object of this class using a dict.

Parameters:
data : dict

Correctly snake_cased keys and their values.

classmethod from_server_data(data: Union[Dict[str, Any], List[Dict[str, Any]]], keep_attrs: Optional[Iterable[str]] = None) → T

Instantiate an object of this class using the data directly from the server, meaning that the keys may have the wrong camel casing

Parameters:
data : dict

The directly translated dict of JSON from the server. No casing fixes have taken place

keep_attrs : iterable

List, set or tuple of the dotted namespace notations for attributes to keep within the object structure even if their values are None
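
The difference between from_data (snake_cased keys) and from_server_data (keys as sent by the server) comes down to key casing. The sketch below illustrates one way such a conversion could work; it is an assumption for illustration, not the client's own conversion code, and camel_to_snake, snake_case_keys, and the sample payload are hypothetical.

```python
import re
from typing import Any


def camel_to_snake(name: str) -> str:
    """Insert an underscore before each interior capital, then lowercase."""
    return re.sub(r"(?<=[a-z0-9])([A-Z])", r"_\1", name).lower()


def snake_case_keys(value: Any) -> Any:
    """Recursively convert dict keys; lists and scalars pass through."""
    if isinstance(value, dict):
        return {camel_to_snake(k): snake_case_keys(v) for k, v in value.items()}
    if isinstance(value, list):
        return [snake_case_keys(item) for item in value]
    return value


# Made-up payload shaped like JSON with camelCase keys.
server_payload = {
    "modelId": "abc",
    "baselineSampleSize": 10,
    "period": {"start": None, "end": None},
}
converted = snake_case_keys(server_payload)
print(converted)
```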

open_in_browser() → None

Opens the class’s relevant location in a web browser. If the default browser is not available, the URL is logged.

Note: If text-mode browsers are used, the calling process will block until the user exits the browser.

class datarobot.models.deployment.DeploymentListFilters(role: Optional[str] = None, service_health: Optional[List[str]] = None, model_health: Optional[List[str]] = None, accuracy_health: Optional[List[str]] = None, execution_environment_type: Optional[List[str]] = None, importance: Optional[List[str]] = None)
class datarobot.models.ServiceStats(period: Optional[Period] = None, metrics: Optional[Metrics] = None, model_id: Optional[str] = None)

Deployment service stats information.

Attributes:
model_id : str

the model used to retrieve service stats metrics

period : dict

the time period used to retrieve service stats metrics

metrics : dict

the service stats metrics

classmethod get(deployment_id: str, model_id: Optional[str] = None, start_time: Optional[datetime.datetime] = None, end_time: Optional[datetime.datetime] = None, execution_time_quantile: Optional[float] = None, response_time_quantile: Optional[float] = None, slow_requests_threshold: Optional[float] = None) → datarobot.models.service_stats.ServiceStats

Retrieve value of service stat metrics over a certain time period.

New in version v2.18.

Parameters:
deployment_id : str

the id of the deployment

model_id : str, optional

the id of the model

start_time : datetime, optional

start of the time period

end_time : datetime, optional

end of the time period

execution_time_quantile : float, optional

quantile for executionTime, defaults to 0.5

response_time_quantile : float, optional

quantile for responseTime, defaults to 0.5

slow_requests_threshold : float, optional

threshold for slowRequests, defaults to 1000

Returns:
service_stats : ServiceStats

the queried service stats metrics

class datarobot.models.ServiceStatsOverTime(buckets: Optional[List[Bucket]] = None, summary: Optional[Bucket] = None, metric: Optional[str] = None, model_id: Optional[str] = None)

Deployment service stats over time information.

Attributes:
model_id : str

the model used to retrieve the service stat metric

metric : str

the service stat metric being retrieved

buckets : dict

how the service stat metric changes over time

summary : dict

summary for the service stat metric

classmethod get(deployment_id: str, metric: Optional[str] = None, model_id: Optional[str] = None, start_time: Optional[datetime.datetime] = None, end_time: Optional[datetime.datetime] = None, bucket_size: Optional[str] = None, quantile: Optional[float] = None, threshold: Optional[int] = None) → datarobot.models.service_stats.ServiceStatsOverTime

Retrieve information about how a service stat metric changes over a certain time period.

New in version v2.18.

Parameters:
deployment_id : str

the id of the deployment

metric : SERVICE_STAT_METRIC, optional

the service stat metric to retrieve

model_id : str, optional

the id of the model

start_time : datetime, optional

start of the time period

end_time : datetime, optional

end of the time period

bucket_size : str, optional

time duration of a bucket, in ISO 8601 time duration format

quantile : float, optional

quantile for ‘executionTime’ or ‘responseTime’, ignored when querying other metrics

threshold : int, optional

threshold for ‘slowRequests’, ignored when querying other metrics

Returns:
service_stats_over_time : ServiceStatsOverTime

the queried service stat over time information
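
The bucket_size parameter expects an ISO 8601 duration string. As a rough sketch, a datetime.timedelta can be turned into such a string as shown below; to_iso8601_duration is a hypothetical helper, and which durations the server actually accepts is not stated here.

```python
from datetime import timedelta


def to_iso8601_duration(delta: timedelta) -> str:
    """Format a positive timedelta as an ISO 8601 duration (whole seconds only)."""
    total_seconds = int(delta.total_seconds())  # sub-second precision is dropped
    if total_seconds <= 0:
        raise ValueError("duration must be positive")
    days, remainder = divmod(total_seconds, 86400)
    hours, remainder = divmod(remainder, 3600)
    minutes, seconds = divmod(remainder, 60)
    date_part = f"{days}D" if days else ""
    time_part = "".join(
        f"{amount}{unit}"
        for amount, unit in ((hours, "H"), (minutes, "M"), (seconds, "S"))
        if amount
    )
    # The "T" separator appears only when there is a time component.
    return "P" + date_part + ("T" + time_part if time_part else "")


print(to_iso8601_duration(timedelta(days=7)))
print(to_iso8601_duration(timedelta(hours=1)))
```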

bucket_values

The metric value for all time buckets, keyed by start time of the bucket.

Returns:
bucket_values: OrderedDict
class datarobot.models.TargetDrift(period=None, metric=None, model_id=None, target_name=None, drift_score=None, sample_size=None, baseline_sample_size=None)

Deployment target drift information.

Attributes:
model_id : str

the model used to retrieve target drift metric

period : dict

the time period used to retrieve target drift metric

metric : str

the data drift metric

target_name : str

name of the target

drift_score : float

target drift score

sample_size : int

count of data points for comparison

baseline_sample_size : int

count of data points for baseline

classmethod get(deployment_id: str, model_id: Optional[str] = None, start_time: Optional[datetime.datetime] = None, end_time: Optional[datetime.datetime] = None, metric: Optional[str] = None) → datarobot.models.data_drift.TargetDrift

Retrieve target drift information over a certain time period.

New in version v2.21.

Parameters:
deployment_id : str

the id of the deployment

model_id : str

the id of the model

start_time : datetime

start of the time period

end_time : datetime

end of the time period

metric : str

(New in version v2.22) metric used to calculate the drift score

Returns:
target_drift : TargetDrift

the queried target drift information

Examples

from datarobot import Deployment, TargetDrift
deployment = Deployment.get(deployment_id='5c939e08962d741e34f609f0')
target_drift = TargetDrift.get(deployment.id)
target_drift.period['end']
>>>'2019-08-01 00:00:00+00:00'
target_drift.drift_score
>>>0.03423
target_drift.target_name
>>>'readmitted'
class datarobot.models.FeatureDrift(period=None, metric=None, model_id=None, name=None, drift_score=None, feature_impact=None, sample_size=None, baseline_sample_size=None)

Deployment feature drift information.

Attributes:
model_id : str

the model used to retrieve feature drift metric

period : dict

the time period used to retrieve feature drift metric

metric : str

the data drift metric

name : str

name of the feature

drift_score : float

feature drift score

sample_size : int

count of data points for comparison

baseline_sample_size : int

count of data points for baseline

classmethod list(deployment_id: str, model_id: Optional[str] = None, start_time: Optional[datetime.datetime] = None, end_time: Optional[datetime.datetime] = None, metric: Optional[str] = None) → List[datarobot.models.data_drift.FeatureDrift]

Retrieve drift information for deployment’s features over a certain time period.

New in version v2.21.

Parameters:
deployment_id : str

the id of the deployment

model_id : str

the id of the model

start_time : datetime

start of the time period

end_time : datetime

end of the time period

metric : str

(New in version v2.22) metric used to calculate the drift score

Returns:
feature_drift_data : [FeatureDrift]

the queried feature drift information

Examples

from datarobot import Deployment, FeatureDrift
deployment = Deployment.get(deployment_id='5c939e08962d741e34f609f0')
feature_drift = FeatureDrift.list(deployment.id)[0]
feature_drift.period['end']
>>>'2019-08-01 00:00:00+00:00'
feature_drift.drift_score
>>>0.252
feature_drift.name
>>>'age'
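
Since FeatureDrift.list returns one record per feature, a common follow-up is to pick out the features that drifted most. The sketch below does this with a hypothetical stand-in class (DriftRecord) that mirrors only the name and drift_score attributes; the 0.2 threshold and the sample values other than 'age' are arbitrary assumptions.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class DriftRecord:
    """Hypothetical stand-in exposing the same attribute names as FeatureDrift."""

    name: str
    drift_score: Optional[float]


def most_drifted(records: List[DriftRecord], threshold: float) -> List[DriftRecord]:
    """Return records at or above the threshold, highest drift first."""
    flagged = [
        r for r in records if r.drift_score is not None and r.drift_score >= threshold
    ]
    return sorted(flagged, key=lambda r: r.drift_score, reverse=True)


records = [
    DriftRecord("age", 0.252),
    DriftRecord("weight", 0.04),
    DriftRecord("diagnosis", 0.31),
    DriftRecord("payer", None),  # no score available
]
flagged = most_drifted(records, threshold=0.2)
print([r.name for r in flagged])
```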
class datarobot.models.Accuracy(period: Optional[Period] = None, metrics: Optional[Dict[str, Metric]] = None, model_id: Optional[str] = None)

Deployment accuracy information.

Attributes:
model_id : str

the model used to retrieve accuracy metrics

period : dict

the time period used to retrieve accuracy metrics

metrics : dict

the accuracy metrics

classmethod get(deployment_id: str, model_id: Optional[str] = None, start_time: Optional[datetime.datetime] = None, end_time: Optional[datetime.datetime] = None, target_classes: Optional[List[str]] = None) → datarobot.models.accuracy.Accuracy

Retrieve values of accuracy metrics over a certain time period.

New in version v2.18.

Parameters:
deployment_id : str

the id of the deployment

model_id : str

the id of the model

start_time : datetime

start of the time period

end_time : datetime

end of the time period

target_classes : list[str], optional

Optional list of target class strings

Returns:
accuracy : Accuracy

the queried accuracy metrics information

Examples

from datarobot import Deployment, Accuracy
deployment = Deployment.get(deployment_id='5c939e08962d741e34f609f0')
accuracy = Accuracy.get(deployment.id)
accuracy.period['end']
>>>'2019-08-01 00:00:00+00:00'
accuracy.metrics['LogLoss']['value']
>>>0.7533
accuracy.metric_values['LogLoss']
>>>0.7533
metric_values

The value for all metrics, keyed by metric name.

Returns:
metric_values: Dict
metric_baselines

The baseline value for all metrics, keyed by metric name.

Returns:
metric_baselines: Dict
percent_changes

The percent change of value over baseline for all metrics, keyed by metric name.

Returns:
percent_changes: Dict
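
The exact formula behind percent_changes is not given here. The sketch below assumes the conventional definition, (value - baseline) / |baseline| * 100, and treats a missing or zero baseline as undefined; percent_change and the sample numbers are hypothetical.

```python
from typing import Dict, Optional


def percent_change(value: Optional[float], baseline: Optional[float]) -> Optional[float]:
    """Conventional percent change; None when it cannot be computed."""
    if value is None or baseline is None or baseline == 0:
        return None
    return (value - baseline) / abs(baseline) * 100.0


# Made-up dicts shaped like metric_values and metric_baselines.
metric_values: Dict[str, Optional[float]] = {"LogLoss": 0.75, "AUC": 0.80}
metric_baselines: Dict[str, Optional[float]] = {"LogLoss": 0.60, "AUC": None}

changes = {
    name: percent_change(value, metric_baselines.get(name))
    for name, value in metric_values.items()
}
print(changes)
```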
class datarobot.models.AccuracyOverTime(buckets: Optional[List[Bucket]] = None, summary: Optional[Summary] = None, baseline: Optional[Bucket] = None, metric: Optional[str] = None, model_id: Optional[str] = None)

Deployment accuracy over time information.

Attributes:
model_id : str

the model used to retrieve accuracy metric

metric : str

the accuracy metric being retrieved

buckets : dict

how the accuracy metric changes over time

summary : dict

summary for the accuracy metric

baseline : dict

baseline for the accuracy metric

classmethod get(deployment_id: str, metric: Optional[datarobot.enums.ACCURACY_METRIC] = None, model_id: Optional[str] = None, start_time: Optional[datetime.datetime] = None, end_time: Optional[datetime.datetime] = None, bucket_size: Optional[str] = None, target_classes: Optional[List[str]] = None) → datarobot.models.accuracy.AccuracyOverTime

Retrieve information about how an accuracy metric changes over a certain time period.

New in version v2.18.

Parameters:
deployment_id : str

the id of the deployment

metric : ACCURACY_METRIC

the accuracy metric to retrieve

model_id : str

the id of the model

start_time : datetime

start of the time period

end_time : datetime

end of the time period

bucket_size : str

time duration of a bucket, in ISO 8601 time duration format

target_classes : list[str], optional

Optional list of target class strings

Returns:
accuracy_over_time : AccuracyOverTime

the queried accuracy metric over time information

Examples

from datarobot import Deployment, AccuracyOverTime
from datarobot.enums import ACCURACY_METRIC
deployment = Deployment.get(deployment_id='5c939e08962d741e34f609f0')
accuracy_over_time = AccuracyOverTime.get(deployment.id, metric=ACCURACY_METRIC.LOGLOSS)
accuracy_over_time.metric
>>>'LogLoss'
accuracy_over_time.bucket_values
>>>{datetime.datetime(2019, 8, 1): 0.73, datetime.datetime(2019, 8, 2): 0.55}
classmethod get_as_dataframe(deployment_id: str, metrics: Optional[List[datarobot.enums.ACCURACY_METRIC]] = None, model_id: Optional[str] = None, start_time: Optional[datetime.datetime] = None, end_time: Optional[datetime.datetime] = None, bucket_size: Optional[str] = None) → pandas.core.frame.DataFrame

Retrieve information about how a list of accuracy metrics changes over a certain time period as a pandas DataFrame.

In the returned DataFrame, the columns correspond to the metrics being retrieved; the rows are labeled with the start time of each bucket.

Parameters:
deployment_id : str

the id of the deployment

metrics : [ACCURACY_METRIC]

the accuracy metrics to retrieve

model_id : str

the id of the model

start_time : datetime

start of the time period

end_time : datetime

end of the time period

bucket_size : str

time duration of a bucket, in ISO 8601 time duration format

Returns:
accuracy_over_time: pd.DataFrame
bucket_values

The metric value for all time buckets, keyed by start time of the bucket.

Returns:
bucket_values: Dict
bucket_sample_sizes

The sample size for all time buckets, keyed by start time of the bucket.

Returns:
bucket_sample_sizes: Dict
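
Because bucket_values and bucket_sample_sizes share the same keys, they can be combined into a sample-size-weighted average. The following is a sketch under assumptions: weighted_mean is a hypothetical helper, the sample sizes are made up, empty buckets are assumed to hold None, and a weighted average is only meaningful for metrics that average across rows.

```python
from datetime import datetime
from typing import Dict, Optional


def weighted_mean(
    bucket_values: Dict[datetime, Optional[float]],
    bucket_sample_sizes: Dict[datetime, Optional[int]],
) -> Optional[float]:
    """Average bucket values, weighting each bucket by its sample size."""
    weighted_sum = 0.0
    total_weight = 0
    for start, value in bucket_values.items():
        size = bucket_sample_sizes.get(start)
        if value is None or not size:
            continue  # skip buckets with no value or no samples
        weighted_sum += value * size
        total_weight += size
    return weighted_sum / total_weight if total_weight else None


bucket_values = {
    datetime(2019, 8, 1): 0.73,
    datetime(2019, 8, 2): 0.55,
    datetime(2019, 8, 3): None,
}
bucket_sample_sizes = {
    datetime(2019, 8, 1): 100,
    datetime(2019, 8, 2): 300,
    datetime(2019, 8, 3): 0,
}
overall = weighted_mean(bucket_values, bucket_sample_sizes)
print(overall)
```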

External Scores and Insights

class datarobot.ExternalScores(project_id: str, scores: List[Score], model_id: Optional[str] = None, dataset_id: Optional[str] = None, actual_value_column: Optional[str] = None)

Metric scores computed on a prediction dataset that has a target column or, for unsupervised projects, an actual value column. Contains the project metrics for supervised projects and a special set of classification metrics for unsupervised projects.

New in version v2.21.

Examples

List all scores for a dataset

import datarobot as dr
scores = dr.ExternalScores.list(project_id, dataset_id=dataset_id)
Attributes:
project_id: str

id of the project the model belongs to

model_id: str

id of the model

dataset_id: str

id of the prediction dataset with target or actual value column for unsupervised case

actual_value_column: str, optional

For unsupervised projects only. Actual value column which was used to calculate the classification metrics and insights on the prediction dataset.

scores: list of dicts in a form of {‘label’: metric_name, ‘value’: score}

Scores on the dataset.

classmethod create(project_id: str, model_id: str, dataset_id: str, actual_value_column: Optional[str] = None) → Job

Compute external dataset insights for the specified model.

Parameters:
project_id : str

id of the project the model belongs to

model_id : str

id of the model for which insights are requested

dataset_id : str

id of the dataset for which insights are requested

actual_value_column : str, optional

actual values column label, for unsupervised projects only

Returns:
job : Job

an instance of created async job

classmethod list(project_id: str, model_id: Optional[str] = None, dataset_id: Optional[str] = None, offset: int = 0, limit: int = 100) → List[datarobot.models.external_dataset_scores_insights.external_scores.ExternalScores]

Fetch external scores list for the project and optionally for model and dataset.

Parameters:
project_id: str

id of the project

model_id: str, optional

if specified, only scores for this model will be retrieved

dataset_id: str, optional

if specified, only scores for this dataset will be retrieved

offset: int, optional

this many results will be skipped, default: 0

limit: int, optional

at most this many results are returned, default: 100, max 1000. To return all results, specify 0

Returns:
A list of ExternalScores objects
classmethod get(project_id: str, model_id: str, dataset_id: str) → datarobot.models.external_dataset_scores_insights.external_scores.ExternalScores

Retrieve external scores for the project, model and dataset.

Parameters:
project_id: str

id of the project

model_id: str

id of the model whose scores will be retrieved

dataset_id: str

id of the dataset whose scores will be retrieved

Returns:
External Scores object
class datarobot.ExternalLiftChart(dataset_id: str, bins: List[Bin])

Lift chart for the model and prediction dataset with target or actual value column in unsupervised case.

New in version v2.21.

LiftChartBin is a dict containing the following:

  • actual (float) Sum of actual target values in bin
  • predicted (float) Sum of predicted target values in bin
  • bin_weight (float) The weight of the bin. For weighted projects, it is the sum of the weights of the rows in the bin. For unweighted projects, it is the number of rows in the bin.
Attributes:
dataset_id: str

id of the prediction dataset with target or actual value column for unsupervised case

bins: list of dict

List of dicts with schema described as LiftChartBin above.
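
Since actual and predicted are sums over each bin and bin_weight is the bin's total weight, dividing one by the other gives per-bin averages, which is usually what gets plotted. A minimal sketch, assuming bins shaped like LiftChartBin above; bin_means and the sample numbers are hypothetical.

```python
from typing import Dict, List


def bin_means(bins: List[Dict[str, float]]) -> List[Dict[str, float]]:
    """Convert per-bin sums into per-bin weighted averages."""
    rows = []
    for b in bins:
        weight = b["bin_weight"]
        if weight == 0:
            continue  # an empty bin has no defined average
        rows.append(
            {
                "actual_mean": b["actual"] / weight,
                "predicted_mean": b["predicted"] / weight,
            }
        )
    return rows


# Made-up bins for illustration.
bins = [
    {"actual": 2.0, "predicted": 3.0, "bin_weight": 10.0},
    {"actual": 6.0, "predicted": 5.0, "bin_weight": 10.0},
    {"actual": 0.0, "predicted": 0.0, "bin_weight": 0.0},
]
means = bin_means(bins)
print(means)
```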

classmethod list(project_id: str, model_id: str, dataset_id: Optional[str] = None, offset: int = 0, limit: int = 100) → List[datarobot.models.external_dataset_scores_insights.external_lift_chart.ExternalLiftChart]

Retrieve list of the lift charts for the model.

Parameters:
project_id: str

id of the project

model_id: str

id of the model whose lift charts will be retrieved

dataset_id: str, optional

if specified, only lift chart for this dataset will be retrieved

offset: int, optional

this many results will be skipped, default: 0

limit: int, optional

at most this many results are returned, default: 100, max 1000. To return all results, specify 0

Returns:
A list of ExternalLiftChart objects
classmethod get(project_id: str, model_id: str, dataset_id: str) → datarobot.models.external_dataset_scores_insights.external_lift_chart.ExternalLiftChart

Retrieve lift chart for the model and prediction dataset.

Parameters:
project_id: str

project id

model_id: str

model id

dataset_id: str

prediction dataset id with target or actual value column for unsupervised case

Returns:
ExternalLiftChart object
class datarobot.ExternalRocCurve(dataset_id: str, roc_points: List[EstimatedMetric], negative_class_predictions: List[float], positive_class_predictions: List[float])

ROC curve data for the model and prediction dataset with target or actual value column in unsupervised case.

New in version v2.21.

Attributes:
dataset_id: str

id of the prediction dataset with target or actual value column for unsupervised case

roc_points: list of dict

List of precalculated metrics associated with thresholds for ROC curve.

negative_class_predictions: list of float

List of predictions from example for negative class

positive_class_predictions: list of float

List of predictions from example for positive class
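
The two prediction lists can give a rough feel for class separation. The sketch below estimates AUC as the fraction of (positive, negative) pairs in which the positive example scores higher, counting ties as half. This is an approximation over sample predictions, not a value the client reports; pairwise_auc and the sample scores are hypothetical.

```python
from typing import List


def pairwise_auc(positive: List[float], negative: List[float]) -> float:
    """Fraction of positive/negative pairs ranked correctly (ties count 0.5)."""
    if not positive or not negative:
        raise ValueError("both lists must be non-empty")
    wins = 0.0
    for p in positive:
        for n in negative:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positive) * len(negative))


# Made-up sample predictions.
positive_class_predictions = [0.9, 0.8, 0.4]
negative_class_predictions = [0.1, 0.5, 0.3]
auc = pairwise_auc(positive_class_predictions, negative_class_predictions)
print(round(auc, 4))
```

The double loop is quadratic in the number of predictions, which is acceptable for small samples; a rank-based computation would be preferable for large lists.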

classmethod list(project_id: str, model_id: str, dataset_id: Optional[str] = None, offset: int = 0, limit: int = 100) → List[datarobot.models.external_dataset_scores_insights.external_roc_curve.ExternalRocCurve]

Retrieve list of the roc curves for the model.

Parameters:
project_id: str

id of the project

model_id: str

if specified, only ROC curves for this model will be retrieved

dataset_id: str, optional

if specified, only ROC curves for this dataset will be retrieved

offset: int, optional

this many results will be skipped, default: 0

limit: int, optional

at most this many results are returned, default: 100, max 1000. To return all results, specify 0

Returns:
A list of ExternalRocCurve objects
classmethod get(project_id: str, model_id: str, dataset_id: str) → datarobot.models.external_dataset_scores_insights.external_roc_curve.ExternalRocCurve

Retrieve ROC curve chart for the model and prediction dataset.

Parameters:
project_id: str

project id

model_id: str

model id

dataset_id: str

prediction dataset id with target or actual value column for unsupervised case

Returns:
ExternalRocCurve object

Feature

class datarobot.models.Feature(id, project_id=None, name=None, feature_type=None, importance=None, low_information=None, unique_count=None, na_count=None, date_format=None, min=None, max=None, mean=None, median=None, std_dev=None, time_series_eligible=None, time_series_eligibility_reason=None, time_step=None, time_unit=None, target_leakage=None, feature_lineage_id=None, key_summary=None, multilabel_insights=None)

A feature from a project’s dataset

These are features either included in the originally uploaded dataset or added to it via feature transformations. In time series projects, these will be distinct from the ModelingFeatures created during partitioning; otherwise, they will correspond to the same features. For more information about input and modeling features, see the time series documentation.

The min, max, mean, median, and std_dev attributes provide information about the distribution of the feature in the EDA sample data. For non-numeric features or features created prior to these summary statistics becoming available, they will be None. For features where the summary statistics are available, they will be in a format compatible with the data type, i.e. date type features will have their summary statistics expressed as ISO-8601 formatted date strings.

Attributes:
id : int

the id for the feature - note that name is used to reference the feature instead of id

project_id : str

the id of the project the feature belongs to

name : str

the name of the feature

feature_type : str

the type of the feature, e.g. ‘Categorical’, ‘Text’

importance : float or None

numeric measure of the strength of relationship between the feature and target (independent of any model or other features); may be None for non-modeling features such as partition columns

low_information : bool

whether a feature is considered too uninformative for modeling (e.g. because it has too few values)

unique_count : int

number of unique values

na_count : int or None

number of missing values

date_format : str or None

For Date features, the date format string for how this feature was interpreted, compatible with https://docs.python.org/2/library/time.html#time.strftime . For other feature types, None.

min : str, int, float, or None

The minimum value of the source data in the EDA sample

max : str, int, float, or None

The maximum value of the source data in the EDA sample

mean : str, int, float, or None

The arithmetic mean of the source data in the EDA sample

median : str, int, float, or None

The median of the source data in the EDA sample

std_dev : str, int, float, or None

The standard deviation of the source data in the EDA sample

time_series_eligible : bool

Whether this feature can be used as the datetime partition column in a time series project.

time_series_eligibility_reason : str

Why the feature is ineligible for the datetime partition column in a time series project, or ‘suitable’ when it is eligible.

time_step : int or None

For time series eligible features, a positive integer determining the interval at which windows can be specified. If used as the datetime partition column on a time series project, the feature derivation and forecast windows must start and end at an integer multiple of this value. None for features that are not time series eligible.

time_unit : str or None

For time series eligible features, the time unit covered by a single time step, e.g. ‘HOUR’, or None for features that are not time series eligible.

target_leakage : str

Whether a feature is considered to have target leakage or not. A value of ‘SKIPPED_DETECTION’ indicates that target leakage detection was not run on the feature. ‘FALSE’ indicates no leakage, ‘MODERATE’ indicates a moderate risk of target leakage, and ‘HIGH_RISK’ indicates a high risk of target leakage

feature_lineage_id : str

id of a lineage for automatically discovered features or derived time series features.

key_summary: list of dict

Statistics for the top 50 keys (truncated to 103 characters) of a Summarized Categorical column. Example:

{'key': 'DataRobot', 'summary': {'min': 0, 'max': 29815.0, 'stdDev': 6498.029, 'mean': 1490.75, 'median': 0.0, 'pctRows': 5.0}}

where,
key: string or None

name of the key

summary: dict

statistics of the key:

  • max: maximum value of the key
  • min: minimum value of the key
  • mean: mean value of the key
  • median: median value of the key
  • stdDev: standard deviation of the key
  • pctRows: percentage occurrence of the key in the EDA sample of the feature

multilabel_insights_key : str or None

For multicategorical columns this will contain a key for multilabel insights. The key is unique for a project, feature and EDA stage combination. This will be the key for the most recent, finished EDA stage.
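
The key_summary attribute described above is a plain list of dicts, so it can be ranked with ordinary Python. A small sketch, assuming entries shaped like the documented example; keys_by_coverage is a hypothetical helper, and the second and third entries are made up.

```python
from typing import Any, Dict, List, Optional, Tuple


def keys_by_coverage(
    key_summary: List[Dict[str, Any]]
) -> List[Tuple[Optional[str], float]]:
    """Return (key, pctRows) pairs, most frequent key first."""
    ranked = sorted(key_summary, key=lambda e: e["summary"]["pctRows"], reverse=True)
    return [(entry["key"], entry["summary"]["pctRows"]) for entry in ranked]


key_summary = [
    {
        "key": "DataRobot",
        "summary": {
            "min": 0, "max": 29815.0, "stdDev": 6498.029,
            "mean": 1490.75, "median": 0.0, "pctRows": 5.0,
        },
    },
    {  # made-up entry
        "key": "Example",
        "summary": {
            "min": 0, "max": 10.0, "stdDev": 2.5,
            "mean": 1.0, "median": 0.0, "pctRows": 12.5,
        },
    },
    {  # made-up entry; the key may be None
        "key": None,
        "summary": {
            "min": 0, "max": 1.0, "stdDev": 0.1,
            "mean": 0.01, "median": 0.0, "pctRows": 0.5,
        },
    },
]
ranking = keys_by_coverage(key_summary)
print(ranking)
```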

classmethod get(project_id, feature_name)

Retrieve a single feature

Parameters:
project_id : str

The ID of the project the feature is associated with.

feature_name : str

The name of the feature to retrieve

Returns:
feature : Feature

The queried instance

get_multiseries_properties(multiseries_id_columns, max_wait=600)

Retrieve time series properties for a potential multiseries datetime partition column

Multiseries time series projects use multiseries id columns to model multiple distinct series within a single project. This function returns the time series properties (time step and time unit) of this column if it were used as a datetime partition column with the specified multiseries id columns, running multiseries detection automatically if it has not previously been run successfully.

Parameters:
multiseries_id_columns : list of str

the name(s) of the multiseries id columns to use with this datetime partition column. Currently only one multiseries id column is supported.

max_wait : int, optional

if a multiseries detection task is run, the maximum amount of time to wait for it to complete before giving up

Returns:
properties : dict

A dict with three keys:

  • time_series_eligible : bool, whether the column can be used as a partition column
  • time_unit : str or null, the inferred time unit if used as a partition column
  • time_step : int or null, the inferred time step if used as a partition column
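
The time_step value matters because, as noted for the Feature attributes, windows must start and end at an integer multiple of it. The sketch below checks a candidate window against a properties dict of the shape listed above; window_is_aligned and the sample values are hypothetical.

```python
from typing import Any, Dict


def window_is_aligned(properties: Dict[str, Any], start: int, end: int) -> bool:
    """True when both window bounds are integer multiples of time_step."""
    if not properties.get("time_series_eligible"):
        return False  # column cannot serve as a partition column at all
    step = properties.get("time_step")
    if not step:
        return False  # no usable time step was inferred
    return start % step == 0 and end % step == 0


# Made-up properties, shaped like the returned dict.
properties = {"time_series_eligible": True, "time_unit": "DAY", "time_step": 7}

print(window_is_aligned(properties, 7, 14))
print(window_is_aligned(properties, 3, 10))
```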
get_cross_series_properties(datetime_partition_column, cross_series_group_by_columns, max_wait=600)

Retrieve cross-series properties for multiseries ID column.

This function returns the cross-series properties (eligibility as a group-by column) of this column if it were used with the specified datetime partition column and the current multiseries id column, running cross-series group-by validation automatically if it has not previously been run successfully.

Parameters:
datetime_partition_column : str

the name of the datetime partition column

cross_series_group_by_columns : list of str

the name(s) of the columns to use with this multiseries ID column. Currently only one cross-series group-by column is supported.

max_wait : int, optional

if a cross-series group-by validation task is run, the maximum amount of time to wait for it to complete before giving up

Returns:
properties : dict

A dict with three keys:

  • name : str, column name
  • eligibility : str, reason for column eligibility
  • isEligible : bool, is column eligible as cross-series group-by
get_multicategorical_histogram()

Retrieve multicategorical histogram for this feature

New in version v2.24.

Returns:
datarobot.models.MulticategoricalHistogram
Raises:
datarobot.errors.InvalidUsageError

if this method is called on an unsuited feature

ValueError

if no multilabel_insights_key is present for this feature

get_pairwise_correlations()

Retrieve pairwise label correlation for multicategorical features

New in version v2.24.

Returns:
datarobot.models.PairwiseCorrelations
Raises:
datarobot.errors.InvalidUsageError

if this method is called on an unsuited feature

ValueError

if no multilabel_insights_key is present for this feature

get_pairwise_joint_probabilities()

Retrieve pairwise label joint probabilities for multicategorical features

New in version v2.24.

Returns:
datarobot.models.PairwiseJointProbabilities
Raises:
datarobot.errors.InvalidUsageError

if this method is called on an unsuited feature

ValueError

if no multilabel_insights_key is present for this feature

get_pairwise_conditional_probabilities()

Retrieve pairwise label conditional probabilities for multicategorical features

New in version v2.24.

Returns:
datarobot.models.PairwiseConditionalProbabilities
Raises:
datarobot.errors.InvalidUsageError

if this method is called on an unsuited feature

ValueError

if no multilabel_insights_key is present for this feature

classmethod from_data(data: Union[Dict[str, Any], List[Dict[str, Any]]]) → T

Instantiate an object of this class using a dict.

Parameters:
data : dict

Correctly snake_cased keys and their values.

classmethod from_server_data(data: Union[Dict[str, Any], List[Dict[str, Any]]], keep_attrs: Optional[Iterable[str]] = None) → T

Instantiate an object of this class using the data directly from the server, meaning that the keys may have the wrong camel casing

Parameters:
data : dict

The directly translated dict of JSON from the server. No casing fixes have taken place

keep_attrs : iterable

List, set or tuple of the dotted namespace notations for attributes to keep within the object structure even if their values are None

get_histogram(bin_limit=None)

Retrieve a feature histogram

Parameters:
bin_limit : int or None

Desired maximum number of histogram bins. If omitted, the endpoint defaults to 60.

Returns:
featureHistogram : FeatureHistogram

The requested histogram with the desired number of bins

class datarobot.models.ModelingFeature(project_id=None, name=None, feature_type=None, importance=None, low_information=None, unique_count=None, na_count=None, date_format=None, min=None, max=None, mean=None, median=None, std_dev=None, parent_feature_names=None, key_summary=None, is_restored_after_reduction=None)

A feature used for modeling

In time series projects, a new set of modeling features is created after setting the partitioning options. These features are automatically derived from those in the project’s dataset and are the features used for modeling. Modeling features are only accessible once the target and partitioning options have been set. In projects that don’t use time series modeling, once the target has been set, ModelingFeatures and Features will behave the same.

For more information about input and modeling features, see the time series documentation.

As with the Feature object, the min, max, mean, median, and std_dev attributes provide information about the distribution of the feature in the EDA sample data. For non-numeric features, they will be None. For features where the summary statistics are available, they will be in a format compatible with the data type, i.e. date type features will have their summary statistics expressed as ISO-8601 formatted date strings.

Attributes:
project_id : str

the id of the project the feature belongs to

name : str

the name of the feature

feature_type : str

the type of the feature, e.g. ‘Categorical’, ‘Text’

importance : float or None

numeric measure of the strength of relationship between the feature and target (independent of any model or other features); may be None for non-modeling features such as partition columns

low_information : bool

whether a feature is considered too uninformative for modeling (e.g. because it has too few values)

unique_count : int

number of unique values

na_count : int or None

number of missing values

date_format : str or None

For Date features, the date format string for how this feature was interpreted, compatible with https://docs.python.org/2/library/time.html#time.strftime . For other feature types, None.

min : str, int, float, or None

The minimum value of the source data in the EDA sample

max : str, int, float, or None

The maximum value of the source data in the EDA sample

mean : str, int, float, or None

The arithmetic mean of the source data in the EDA sample

median : str, int, float, or None

The median of the source data in the EDA sample

std_dev : str, int, float, or None

The standard deviation of the source data in the EDA sample

parent_feature_names : list of str

A list of the names of input features used to derive this modeling feature. In cases where the input features and modeling features are the same, this will simply contain the feature’s name. Note that if a derived feature was used to create this modeling feature, the values here will not necessarily correspond to the features that must be supplied at prediction time.

key_summary: list of dict

Statistics for the top 50 keys (truncated to 103 characters) of a Summarized Categorical column. Example:

{'key': 'DataRobot', 'summary': {'min': 0, 'max': 29815.0, 'stdDev': 6498.029, 'mean': 1490.75, 'median': 0.0, 'pctRows': 5.0}}

where,
key: string or None

name of the key

summary: dict

statistics of the key

max: maximum value of the key. min: minimum value of the key. mean: mean value of the key. median: median value of the key. stdDev: standard deviation of the key. pctRows: percentage occurrence of key in the EDA sample of the feature.
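
As a small sketch of how a key_summary list can be consumed, the snippet below selects the keys that occur in at least a given percentage of rows. The first entry is the example shown above; the second entry and the helper name keys_above_pct_rows are made up for illustration.

```python
# Entries follow the documented shape: a "key" plus a "summary" dict.
key_summary = [
    {"key": "DataRobot", "summary": {"min": 0, "max": 29815.0, "stdDev": 6498.029,
                                      "mean": 1490.75, "median": 0.0, "pctRows": 5.0}},
    # Hypothetical second key, added only so the filter has something to do.
    {"key": "Other", "summary": {"min": 0, "max": 10.0, "stdDev": 2.5,
                                  "mean": 1.2, "median": 0.0, "pctRows": 42.0}},
]


def keys_above_pct_rows(entries, threshold):
    """Return keys whose pctRows is at least `threshold`, most frequent first."""
    selected = [e for e in entries if e["summary"]["pctRows"] >= threshold]
    selected.sort(key=lambda e: e["summary"]["pctRows"], reverse=True)
    return [e["key"] for e in selected]


frequent_keys = keys_above_pct_rows(key_summary, threshold=5.0)
```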

classmethod get(project_id, feature_name)

Retrieve a single modeling feature

Parameters:
project_id : str

The ID of the project the feature is associated with.

feature_name : str

The name of the feature to retrieve

Returns:
feature : ModelingFeature

The requested feature

class datarobot.models.DatasetFeature(id_, dataset_id=None, dataset_version_id=None, name=None, feature_type=None, low_information=None, unique_count=None, na_count=None, date_format=None, min_=None, max_=None, mean=None, median=None, std_dev=None, time_series_eligible=None, time_series_eligibility_reason=None, time_step=None, time_unit=None, target_leakage=None, target_leakage_reason=None)

A feature from a project’s dataset

These are features either included in the originally uploaded dataset or added to it via feature transformations.

The min, max, mean, median, and std_dev attributes provide information about the distribution of the feature in the EDA sample data. For non-numeric features or features created prior to these summary statistics becoming available, they will be None. For features where the summary statistics are available, they will be in a format compatible with the data type, i.e. date type features will have their summary statistics expressed as ISO-8601 formatted date strings.

Attributes:
id : int

the id for the feature - note that name is used to reference the feature instead of id

dataset_id : str

the id of the dataset the feature belongs to

dataset_version_id : str

the id of the dataset version the feature belongs to

name : str

the name of the feature

feature_type : str, optional

the type of the feature, e.g. ‘Categorical’, ‘Text’

low_information : bool, optional

whether a feature is considered too uninformative for modeling (e.g. because it has too few values)

unique_count : int, optional

number of unique values

na_count : int, optional

number of missing values

date_format : str, optional

For Date features, the date format string for how this feature was interpreted, compatible with https://docs.python.org/2/library/time.html#time.strftime . For other feature types, None.

min : str, int, float, optional

The minimum value of the source data in the EDA sample

max : str, int, float, optional

The maximum value of the source data in the EDA sample

mean : str, int, float, optional

The arithmetic mean of the source data in the EDA sample

median : str, int, float, optional

The median of the source data in the EDA sample

std_dev : str, int, float, optional

The standard deviation of the source data in the EDA sample

time_series_eligible : bool, optional

Whether this feature can be used as the datetime partition column in a time series project.

time_series_eligibility_reason : str, optional

Why the feature is ineligible for the datetime partition column in a time series project, or ‘suitable’ when it is eligible.

time_step : int, optional

For time series eligible features, a positive integer determining the interval at which windows can be specified. If used as the datetime partition column on a time series project, the feature derivation and forecast windows must start and end at an integer multiple of this value. None for features that are not time series eligible.

time_unit : str, optional

For time series eligible features, the time unit covered by a single time step, e.g. ‘HOUR’, or None for features that are not time series eligible.

target_leakage : str, optional

Whether a feature is considered to have target leakage or not. A value of ‘SKIPPED_DETECTION’ indicates that target leakage detection was not run on the feature. ‘FALSE’ indicates no leakage, ‘MODERATE’ indicates a moderate risk of target leakage, and ‘HIGH_RISK’ indicates a high risk of target leakage

target_leakage_reason: string, optional

The descriptive text explaining the reason for target leakage, if any.
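
The time_step constraint described above (window bounds must be integer multiples of the time step) can be checked with plain arithmetic. This is a minimal sketch; windows_align_with_time_step is a hypothetical helper, not a datarobot function, and the sample values are made up.

```python
def windows_align_with_time_step(time_step, window_bounds):
    """Return True when every window bound is an integer multiple of time_step."""
    if time_step is None or time_step <= 0:
        raise ValueError("time_step must be a positive integer")
    return all(bound % time_step == 0 for bound in window_bounds)


# Hypothetical feature with time_step=7 and time_unit='DAY'.
aligned = windows_align_with_time_step(7, [-28, -7, 7, 14])
misaligned = windows_align_with_time_step(7, [-10, -1])
```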

get_histogram(bin_limit=None)

Retrieve a feature histogram

Parameters:
bin_limit : int or None

Desired maximum number of histogram bins. If omitted, the endpoint uses 60 by default.

Returns:
featureHistogram : DatasetFeatureHistogram

The requested histogram with the desired number of bins

class datarobot.models.DatasetFeatureHistogram(plot)
classmethod get(dataset_id, feature_name, bin_limit=None, key_name=None)

Retrieve a single feature histogram

Parameters:
dataset_id : str

The ID of the Dataset the feature is associated with.

feature_name : str

The name of the feature to retrieve

bin_limit : int or None

Desired maximum number of histogram bins. If omitted, the endpoint uses 60 by default.

key_name: string or None

(Only required for summarized categorical features) Name of one of the top 50 keys for which the plot is to be retrieved.

Returns:
featureHistogram : DatasetFeatureHistogram

The queried instance with plot attribute in it.

class datarobot.models.FeatureHistogram(plot)
classmethod get(project_id, feature_name, bin_limit=None, key_name=None)

Retrieve a single feature histogram

Parameters:
project_id : str

The ID of the project the feature is associated with.

feature_name : str

The name of the feature to retrieve

bin_limit : int or None

Desired maximum number of histogram bins. If omitted, the endpoint uses 60 by default.

key_name: string or None

(Only required for summarized categorical features) Name of one of the top 50 keys for which the plot is to be retrieved.

Returns:
featureHistogram : FeatureHistogram

The queried instance with plot attribute in it.

class datarobot.models.InteractionFeature(rows, source_columns, bars, bubbles)

Interaction feature data

New in version v2.21.

Attributes:
rows: int

Total number of rows

source_columns: list(str)

names of two categorical features which were combined into this one

bars: list(dict)

dictionaries representing frequencies of each independent value from the source columns

bubbles: list(dict)

dictionaries representing frequencies of each combined value in the interaction feature.

classmethod get(project_id, feature_name)

Retrieve a single Interaction feature

Parameters:
project_id : str

The id of the project the feature belongs to

feature_name : str

The name of the Interaction feature to retrieve

Returns:
feature : InteractionFeature

The queried instance

class datarobot.models.MulticategoricalHistogram(feature_name, histogram)

Histogram for Multicategorical feature.

New in version v2.24.

Notes

HistogramValues contains:

  • values.[].label : string - Label name
  • values.[].plot : list - Histogram for label
  • values.[].plot.[].label_relevance : int - Label relevance value
  • values.[].plot.[].row_count : int - Row count where label has given relevance
  • values.[].plot.[].row_pct : float - Percentage of rows where label has given relevance
Attributes:
feature_name : str

Name of the feature

values : list(dict)

List of Histogram values with a schema described as HistogramValues

classmethod get(multilabel_insights_key)

Retrieves multicategorical histogram

You might find it more convenient to use Feature.get_multicategorical_histogram instead.

Parameters:
multilabel_insights_key: string

Key for multilabel insights, unique for a project, feature and EDA stage combination. The multilabel_insights_key can be retrieved via Feature.multilabel_insights_key.

Returns:
MulticategoricalHistogram

The multicategorical histogram for multilabel_insights_key

to_dataframe()

Convenience method to get all the information from this multicategorical_histogram instance in the form of a pandas.DataFrame.

Returns:
pandas.DataFrame

Histogram information as a DataFrame. The DataFrame contains these columns: feature_name, label, label_relevance, row_count, and row_pct.
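
To make the relationship between the nested HistogramValues schema and the flat column layout concrete, the sketch below flattens a values list into one row per label and relevance value. It uses plain Python rather than pandas, the helper flatten_histogram is hypothetical, and all numbers are made up; it is not the library's implementation of to_dataframe.

```python
def flatten_histogram(feature_name, values):
    """Produce one row (a dict) per (label, label_relevance) pair."""
    rows = []
    for entry in values:
        for point in entry["plot"]:
            rows.append({
                "feature_name": feature_name,
                "label": entry["label"],
                "label_relevance": point["label_relevance"],
                "row_count": point["row_count"],
                "row_pct": point["row_pct"],
            })
    return rows


# Made-up HistogramValues for a feature with two labels.
values = [
    {"label": "cat", "plot": [
        {"label_relevance": 0, "row_count": 60, "row_pct": 60.0},
        {"label_relevance": 1, "row_count": 40, "row_pct": 40.0},
    ]},
    {"label": "dog", "plot": [
        {"label_relevance": 0, "row_count": 75, "row_pct": 75.0},
        {"label_relevance": 1, "row_count": 25, "row_pct": 25.0},
    ]},
]
rows = flatten_histogram("animals", values)
```

Passing rows to pandas.DataFrame would give a frame with the five columns listed above.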

class datarobot.models.PairwiseCorrelations(*args, **kwargs)

Correlation of label pairs for multicategorical feature.

New in version v2.24.

Notes

CorrelationValues contain:

  • values.[].label_configuration : list of length 2 - Configuration of the label pair
  • values.[].label_configuration.[].label : str – Label name
  • values.[].statistic_value : float – Statistic value
Attributes:
feature_name : str

Name of the feature

values : list(dict)

List of correlation values with a schema described as CorrelationValues

statistic_dataframe : pandas.DataFrame

Correlation values for all label pairs as a DataFrame

classmethod get(multilabel_insights_key)

Retrieves pairwise correlations

You might find it more convenient to use Feature.get_pairwise_correlations instead.

Parameters:
multilabel_insights_key: string

Key for multilabel insights, unique for a project, feature and EDA stage combination. The multilabel_insights_key can be retrieved via Feature.multilabel_insights_key.

Returns:
PairwiseCorrelations

The pairwise label correlations

as_dataframe()

The pairwise label correlations as a (num_labels x num_labels) DataFrame.

Returns:
pandas.DataFrame

The pairwise label correlations. Index and column names allow the interpretation of the values.

class datarobot.models.PairwiseJointProbabilities(*args, **kwargs)

Joint probabilities of label pairs for multicategorical feature.

New in version v2.24.

Notes

ProbabilityValues contain:

  • values.[].label_configuration : list of length 2 - Configuration of the label pair
  • values.[].label_configuration.[].relevance : int – 0 for absence of the labels, 1 for the presence of labels
  • values.[].label_configuration.[].label : str – Label name
  • values.[].statistic_value : float – Statistic value
Attributes:
feature_name : str

Name of the feature

values : list(dict)

List of joint probability values with a schema described as ProbabilityValues

statistic_dataframes : dict(pandas.DataFrame)

Joint Probability values as DataFrames for different relevance combinations.

E.g. The probability P(A=0,B=1) can be retrieved via: pairwise_joint_probabilities.statistic_dataframes[(0,1)].loc['A', 'B']

classmethod get(multilabel_insights_key)

Retrieves pairwise joint probabilities

You might find it more convenient to use Feature.get_pairwise_joint_probabilities instead.

Parameters:
multilabel_insights_key: string

Key for multilabel insights, unique for a project, feature and EDA stage combination. The multilabel_insights_key can be retrieved via Feature.multilabel_insights_key.

Returns:
PairwiseJointProbabilities

The pairwise joint probabilities

as_dataframe(relevance_configuration)

Joint probabilities of label pairs as a (num_labels x num_labels) DataFrame.

Parameters:
relevance_configuration: tuple of length 2

Valid options are (0, 0), (0, 1), (1, 0) and (1, 1). Values of 0 indicate absence of labels and 1 indicates presence of labels. The first value describes the presence for the labels in axis=0 and the second value describes the presence for the labels in axis=1.

For example the matrix values for a relevance configuration of (0, 1) describe the probabilities of absent labels in the index axis and present labels in the column axis.

E.g. The probability P(A=0,B=1) can be retrieved via: pairwise_joint_probabilities.as_dataframe((0,1)).loc['A', 'B']

Returns:
pandas.DataFrame

The joint probabilities for the requested relevance_configuration. Index and column names allow the interpretation of the values.
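
The way a relevance configuration selects one table out of four can be sketched without pandas. The snippet below groups a ProbabilityValues list by its relevance pair and then looks up P(A=0,B=1). The helper group_by_relevance and the probabilities are made up for illustration; the library's own DataFrames are indexed by label names rather than by label pairs.

```python
from collections import defaultdict


def group_by_relevance(values):
    """Map (relevance_1, relevance_2) -> {(label_1, label_2): probability}."""
    tables = defaultdict(dict)
    for entry in values:
        first, second = entry["label_configuration"]
        relevance = (first["relevance"], second["relevance"])
        tables[relevance][(first["label"], second["label"])] = entry["statistic_value"]
    return dict(tables)


def make_entry(rel_a, rel_b, probability):
    """Build one made-up ProbabilityValues entry for the label pair (A, B)."""
    return {
        "label_configuration": [
            {"relevance": rel_a, "label": "A"},
            {"relevance": rel_b, "label": "B"},
        ],
        "statistic_value": probability,
    }


values = [make_entry(0, 0, 0.4), make_entry(0, 1, 0.1),
          make_entry(1, 0, 0.3), make_entry(1, 1, 0.2)]
tables = group_by_relevance(values)
p_a0_b1 = tables[(0, 1)][("A", "B")]
```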

class datarobot.models.PairwiseConditionalProbabilities(*args, **kwargs)

Conditional probabilities of label pairs for multicategorical feature.

New in version v2.24.

Notes

ProbabilityValues contain:

  • values.[].label_configuration : list of length 2 - Configuration of the label pair
  • values.[].label_configuration.[].relevance : int – 0 for absence of the labels, 1 for the presence of labels
  • values.[].label_configuration.[].label : str – Label name
  • values.[].statistic_value : float – Statistic value
Attributes:
feature_name : str

Name of the feature

values : list(dict)

List of conditional probability values with a schema described as ProbabilityValues

statistic_dataframes : dict(pandas.DataFrame)

Conditional probability values as DataFrames for different relevance combinations. The label names in the columns are the events on which we condition. The label names in the index are the events whose conditional probability, given the column events, is stored in the DataFrame.

E.g. The probability P(A=0|B=1) can be retrieved via: pairwise_conditional_probabilities.statistic_dataframes[(0,1)].loc['A', 'B']

classmethod get(multilabel_insights_key)

Retrieves pairwise conditional probabilities

You might find it more convenient to use Feature.get_pairwise_conditional_probabilities instead.

Parameters:
multilabel_insights_key: string

Key for multilabel insights, unique for a project, feature and EDA stage combination. The multilabel_insights_key can be retrieved via Feature.multilabel_insights_key.

Returns:
PairwiseConditionalProbabilities

The pairwise conditional probabilities

as_dataframe(relevance_configuration)

Conditional probabilities of label pairs as a (num_labels x num_labels) DataFrame. The label names in the columns are the events on which we condition. The label names in the index are the events whose conditional probability, given the column events, is stored in the DataFrame.

E.g. The probability P(A=0|B=1) can be retrieved via: pairwise_conditional_probabilities.as_dataframe((0, 1)).loc['A', 'B']

Parameters:
relevance_configuration: tuple of length 2

Valid options are (0, 0), (0, 1), (1, 0) and (1, 1). Values of 0 indicate absence of labels and 1 indicates presence of labels. The first value describes the presence for the labels in axis=0 and the second value describes the presence for the labels in axis=1.

For example the matrix values for a relevance configuration of (0, 1) describe the probabilities of absent labels in the index axis given the presence of labels in the column axis.

Returns:
pandas.DataFrame

The conditional probabilities for the requested relevance_configuration. Index and column names allow the interpretation of the values.
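
For readers who want to relate these values to joint probabilities, the textbook identity P(A=a | B=b) = P(A=a, B=b) / P(B=b) can be sketched as follows. The numbers are made up, the helper name conditional is hypothetical, and nothing here claims to be how DataRobot computes its statistics.

```python
import math

# Made-up joint probabilities for one label pair, keyed by (relevance of A, relevance of B).
joint = {(0, 0): 0.4, (0, 1): 0.1, (1, 0): 0.3, (1, 1): 0.2}


def conditional(joint, a, b):
    """Return P(A=a | B=b), using P(B=b) = P(A=0, B=b) + P(A=1, B=b)."""
    marginal_b = joint[(0, b)] + joint[(1, b)]
    if marginal_b == 0:
        raise ZeroDivisionError("P(B=b) is zero, so the conditional is undefined")
    return joint[(a, b)] / marginal_b


p_a0_given_b1 = conditional(joint, 0, 1)
p_a1_given_b1 = conditional(joint, 1, 1)
```

The two conditionals for a fixed B=b sum to one, which is a quick sanity check on any table of this kind.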

Feature Association

class datarobot.models.FeatureAssociationMatrix(strengths: Optional[List[Strength]] = None, features: Optional[List[Feature]] = None, project_id: Optional[str] = None)

Feature association statistics for a project.

Note

Projects created prior to v2.17 are not supported by this feature.

Examples

import datarobot as dr
from datarobot import enums

# retrieve feature association matrix
feature_association_matrix = dr.FeatureAssociationMatrix.get(project_id)
feature_association_matrix.strengths
feature_association_matrix.features

# retrieve feature association matrix for a metric, association type or a feature list
feature_association_matrix = dr.FeatureAssociationMatrix.get(
    project_id,
    metric=enums.FEATURE_ASSOCIATION_METRIC.SPEARMAN,
    association_type=enums.FEATURE_ASSOCIATION_TYPE.CORRELATION,
    featurelist_id=featurelist_id,
)
Attributes:
project_id : str

Id of the associated project.

strengths : list of dict

Pairwise statistics for the available features as structured below.

features : list of dict

Metadata for each feature and where it goes in the matrix.

classmethod get(project_id: str, metric: Optional[str] = None, association_type: Optional[str] = None, featurelist_id: Optional[str] = None) → datarobot.models.feature_association_matrix.feature_association_matrix.FeatureAssociationMatrix

Get feature association statistics.

Parameters:
project_id : str

Id of the project that contains the requested associations.

metric : enums.FEATURE_ASSOCIATION_METRIC

The name of a metric to get pairwise data for. Since ‘v2.19’ this is optional and defaults to enums.FEATURE_ASSOCIATION_METRIC.MUTUAL_INFO.

association_type : enums.FEATURE_ASSOCIATION_TYPE

The type of dependence for the data. Since ‘v2.19’ this is optional and defaults to enums.FEATURE_ASSOCIATION_TYPE.ASSOCIATION.

featurelist_id : str or None

Optional, the feature list to look up FAM data for. By default, the “Informative Features” or “Timeseries Informative Features” list is used, depending on the type of the project. (New in version v2.19)

Returns:
FeatureAssociationMatrix

Feature association pairwise metric strength data, feature clustering data, and ordering data for Feature Association Matrix visualization.

Feature Association Matrix Details

class datarobot.models.FeatureAssociationMatrixDetails(project_id: Optional[str] = None, chart_type: Optional[str] = None, values: Optional[List[Tuple[Any, Any, float]]] = None, features: Optional[List[str]] = None, types: Optional[List[str]] = None, featurelist_id: Optional[str] = None)

Plotting details for a pair of passed features present in the feature association matrix.

Note

Projects created prior to v2.17 are not supported by this feature.

Attributes:
project_id : str

Id of the project that contains the requested associations.

chart_type : str

Which type of plotting the pair of features gets in the UI. e.g. ‘HORIZONTAL_BOX’, ‘VERTICAL_BOX’, ‘SCATTER’ or ‘CONTINGENCY’

values : list

The data triplets for pairwise plotting, e.g. {"values": [[460.0, 428.5, 0.001], [1679.3, 259.0, 0.001], …]}. The first entry of each triplet is a value of feature1, the second entry is a value of feature2, and the third is the relative frequency of the pair of data points in the sample.

features : list

A list of the requested features, [feature1, feature2]

types : list

The type of feature1 and feature2. Possible values: “CATEGORICAL”, “NUMERIC”

featurelist_id : str

Id of the feature list to lookup FAM details for.
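
As a sketch of how the values triplets can be used, the snippet below computes frequency-weighted means of both features. It assumes both features are numeric (for categorical features the first two entries would not be numbers), reuses the two triplets from the example above, and introduces the hypothetical helper weighted_means.

```python
def weighted_means(values):
    """Return the frequency-weighted mean of feature1 and of feature2."""
    total = sum(freq for _, _, freq in values)
    if total == 0:
        raise ValueError("relative frequencies sum to zero")
    mean1 = sum(x * freq for x, _, freq in values) / total
    mean2 = sum(y * freq for _, y, freq in values) / total
    return mean1, mean2


# Triplets of (feature1 value, feature2 value, relative frequency).
values = [[460.0, 428.5, 0.001], [1679.3, 259.0, 0.001]]
mean_feature1, mean_feature2 = weighted_means(values)
```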

classmethod get(project_id: str, feature1: str, feature2: str, featurelist_id: Optional[str] = None) → datarobot.models.feature_association_matrix.feature_association_matrix_details.FeatureAssociationMatrixDetails

Get a sample of the actual values used to measure the association between a pair of features

New in version v2.17.

Parameters:
project_id : str

Id of the project of interest.

feature1 : str

Feature name for the first feature of interest.

feature2 : str

Feature name for the second feature of interest.

featurelist_id : str

Optional, the feature list to look up FAM data for. By default, the “Informative Features” or “Timeseries Informative Features” list is used, depending on the type of the project.

Returns:
FeatureAssociationMatrixDetails

The feature association plotting details for the provided pair of features.

Feature Association Featurelists

class datarobot.models.FeatureAssociationFeaturelists(project_id: Optional[str] = None, featurelists: Optional[List[FeatureListType]] = None)

Featurelists with feature association matrix availability flags for a project.

Attributes:
project_id : str

Id of the project that contains the requested associations.

featurelists : list of dict

The featurelists with the featurelist_id, title and the has_fam flag.

classmethod get(project_id: str) → datarobot.models.feature_association_matrix.feature_association_featurelists.FeatureAssociationFeaturelists

Get featurelists with feature association status for each.

Parameters:
project_id : str

Id of the project of interest.

Returns:
FeatureAssociationFeaturelists

Featurelists with feature association status for each.
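
A typical use of the featurelists attribute is to find the feature lists for which a feature association matrix is available. The sketch below works on plain dicts with the documented keys featurelist_id, title and has_fam; the ids are placeholders and featurelist_ids_with_fam is a hypothetical helper.

```python
# Made-up entries using the documented keys.
featurelists = [
    {"featurelist_id": "id-1", "title": "Informative Features", "has_fam": True},
    {"featurelist_id": "id-2", "title": "Raw Features", "has_fam": False},
]


def featurelist_ids_with_fam(featurelists):
    """Return the ids of the featurelists whose has_fam flag is set."""
    return [fl["featurelist_id"] for fl in featurelists if fl["has_fam"]]


available = featurelist_ids_with_fam(featurelists)
```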

Feature Discovery

Relationships Configuration

class datarobot.models.RelationshipsConfiguration(id, dataset_definitions=None, relationships=None, feature_discovery_mode=None, feature_discovery_settings=None)

A Relationships configuration specifies a set of secondary datasets as well as the relationships among them. It is used to configure Feature Discovery for a project to generate features automatically from these datasets.

Attributes:
id : string

Id of the created relationships configuration

dataset_definitions: list

Each element is the definition of one dataset.

relationships: list

Each element is a relationship between two datasets

feature_discovery_mode: str

Mode of feature discovery. Supported values are ‘default’ and ‘manual’

feature_discovery_settings: list

List of feature discovery settings used to customize the feature discovery process

The `dataset_definitions` structure is
identifier: string

Alias of the dataset (used directly as part of the generated feature names)

catalog_id: str, or None

Identifier of the catalog item

catalog_version_id: str

Identifier of the catalog item version

primary_temporal_key: string, optional

Name of the column indicating time of record creation

feature_list_id: string, optional

Identifier of the feature list. This decides which columns in the dataset are used for feature generation

snapshot_policy: str

Policy to use when creating a project or making predictions. Must be one of the following values:

  • ‘specified’: Use specific snapshot specified by catalogVersionId
  • ‘latest’: Use latest snapshot from the same catalog item
  • ‘dynamic’: Get data from the source (only applicable for JDBC datasets)

feature_lists: list

List of feature list info

data_source: dict

Data source info if the dataset is from data source

data_sources: list

List of data source details for a JDBC dataset

is_deleted: bool, optional

Whether the dataset is deleted or not

The `data source info` structure is
data_store_id: str

Id of the data store.

data_store_name : str

User-friendly name of the data store.

url : str

Url used to connect to the data store.

dbtable : str

Name of table from the data store.

schema: str

Schema definition of the table from the data store

catalog: str

Catalog name of the data source.

The `feature list info` structure is
id : str

Id of the featurelist

name : str

Name of the featurelist

features : list of str

Names of all the Features in the featurelist

dataset_id : str

Project the featurelist belongs to

creation_date : datetime.datetime

When the featurelist was created

user_created : bool

Whether the featurelist was created by a user or by DataRobot automation

created_by: str

Name of user who created it

description : str

Description of the featurelist. Can be updated by the user and may be supplied by default for DataRobot-created featurelists.

dataset_id: str

Dataset which is associated with the feature list

dataset_version_id: str or None

Version of the dataset which is associated with feature list. Only relevant for Informative features

The `relationships` schema is
dataset1_identifier: str or None

Identifier of the first dataset in this relationship. This is specified in the identifier field of dataset_definition structure. If None, then the relationship is with the primary dataset.

dataset2_identifier: str

Identifier of the second dataset in this relationship. This is specified in the identifier field of dataset_definition schema.

dataset1_keys: list of str (max length: 10 min length: 1)

Column(s) from the first dataset which are used to join to the second dataset

dataset2_keys: list of str (max length: 10 min length: 1)

Column(s) from the second dataset that are used to join to the first dataset

time_unit: str, or None

Time unit of the feature derivation window. Supported values are MILLISECOND, SECOND, MINUTE, HOUR, DAY, WEEK, MONTH, QUARTER, YEAR. If present, the feature engineering Graph will perform time-aware joins.

feature_derivation_window_start: int, or None

How many time_units of each dataset’s primary temporal key into the past relative to the datetimePartitionColumn the feature derivation window should begin. Will be a negative integer, if present. If present, the feature engineering Graph will perform time-aware joins.

feature_derivation_window_end: int, or None

How many timeUnits of each dataset’s record primary temporal key into the past relative to the datetimePartitionColumn the feature derivation window should end. Will be a non-positive integer, if present. If present, the feature engineering Graph will perform time-aware joins.

feature_derivation_window_time_unit: str or None

Time unit of the feature derivation window. Supported values are MILLISECOND, SECOND, MINUTE, HOUR, DAY, WEEK, MONTH, QUARTER, YEAR. If present, time-aware joins will be used. Only applicable when dataset1_identifier is not provided.

feature_derivation_windows: list of dict, or None

List of feature derivation windows settings. If present, time-aware joins will be used. Only allowed when feature_derivation_window_start, feature_derivation_window_end and feature_derivation_window_time_unit are not provided.

prediction_point_rounding: int, or None

Closest value of prediction_point_rounding_time_unit to round the prediction point into the past when applying the feature derivation window. Will be a positive integer, if present. Only applicable when dataset1_identifier is not provided.

prediction_point_rounding_time_unit: str, or None

Time unit of the prediction point rounding. Supported values are MILLISECOND, SECOND, MINUTE, HOUR, DAY, WEEK, MONTH, QUARTER, YEAR. Only applicable when dataset1_identifier is not provided.
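
The window fields above can be read as offsets from a prediction point. The sketch below turns a start, end and time unit into two timestamps. It covers only fixed-length units, treats "start must precede end" as an assumption of the sketch, and leaves open whether the bounds are inclusive; derivation_window and UNIT_LENGTH are hypothetical names, not part of the datarobot package.

```python
from datetime import datetime, timedelta

# Fixed-length units only. MONTH, QUARTER and YEAR need calendar arithmetic
# and are deliberately left out of this sketch.
UNIT_LENGTH = {
    "MILLISECOND": timedelta(milliseconds=1),
    "SECOND": timedelta(seconds=1),
    "MINUTE": timedelta(minutes=1),
    "HOUR": timedelta(hours=1),
    "DAY": timedelta(days=1),
    "WEEK": timedelta(weeks=1),
}


def derivation_window(prediction_point, start, end, time_unit):
    """Return the (begin, end) timestamps of the feature derivation window."""
    if start >= 0:
        raise ValueError("feature_derivation_window_start must be negative")
    if end > 0:
        raise ValueError("feature_derivation_window_end must be non-positive")
    if start >= end:
        # Assumption of this sketch: the window must have positive length.
        raise ValueError("window start must precede window end")
    step = UNIT_LENGTH[time_unit]
    return prediction_point + start * step, prediction_point + end * step


# Made-up prediction point with a 14-day window ending one day before it.
window_begin, window_end = derivation_window(datetime(2021, 1, 15), -14, -1, "DAY")
```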

The `feature_derivation_windows` is a list of dictionaries with the schema:
start: int

How many time_units of each dataset’s primary temporal key into the past relative to the datetimePartitionColumn the feature derivation window should begin.

end: int

How many timeUnits of each dataset’s record primary temporal key into the past relative to the datetimePartitionColumn the feature derivation window should end.

unit: string

Time unit of the feature derivation window. One of datarobot.enums.AllowedTimeUnitsSAFER.
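
The rule that feature_derivation_windows may not be combined with the three single-window fields can be checked before anything is sent to the server. The sketch below works on a plain dict that uses the documented field names. It assumes that datarobot.enums.AllowedTimeUnitsSAFER contains the same units listed for time_unit above, and find_window_problems is a hypothetical helper rather than a library function.

```python
# Assumed to mirror the units listed in the relationships schema.
ALLOWED_UNITS = {
    "MILLISECOND", "SECOND", "MINUTE", "HOUR", "DAY",
    "WEEK", "MONTH", "QUARTER", "YEAR",
}
SINGLE_WINDOW_FIELDS = (
    "feature_derivation_window_start",
    "feature_derivation_window_end",
    "feature_derivation_window_time_unit",
)


def find_window_problems(relationship):
    """Return a list of human-readable problems; an empty list means consistent."""
    problems = []
    windows = relationship.get("feature_derivation_windows") or []
    single_fields_set = [
        field for field in SINGLE_WINDOW_FIELDS if relationship.get(field) is not None
    ]
    if windows and single_fields_set:
        problems.append(
            "feature_derivation_windows cannot be combined with: "
            + ", ".join(single_fields_set)
        )
    for position, window in enumerate(windows):
        if window["unit"] not in ALLOWED_UNITS:
            problems.append(f"window {position}: unknown unit {window['unit']!r}")
    return problems


# Made-up relationship with two derivation windows.
multi_window = {
    "feature_derivation_windows": [
        {"start": -14, "end": -1, "unit": "DAY"},
        {"start": -48, "end": 0, "unit": "HOUR"},
    ]
}
problems = find_window_problems(multi_window)
```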

The `feature_discovery_settings` structure is:
name: str

Name of the feature discovery setting

value: bool

Value of the feature discovery setting

To see the list of possible settings, create a RelationshipsConfiguration without specifying
settings and check its `feature_discovery_settings` attribute, which is a list of possible
settings with their default values.
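
Once the default settings list is known, overriding a few entries is a matter of merging by name. The sketch below shows one way to do that on plain dicts. The setting names are taken from the create example in this section, the default values are made up, and override_settings is a hypothetical helper.

```python
def override_settings(defaults, overrides):
    """Return a new settings list with `overrides` (name -> bool) applied."""
    known_names = {setting["name"] for setting in defaults}
    unknown = sorted(set(overrides) - known_names)
    if unknown:
        raise KeyError(f"unknown feature discovery settings: {unknown}")
    return [
        {"name": s["name"], "value": overrides.get(s["name"], s["value"])}
        for s in defaults
    ]


# Made-up defaults in the documented name/value shape.
defaults = [
    {"name": "enable_categorical_statistics", "value": False},
    {"name": "enable_numeric_skewness", "value": False},
]
settings = override_settings(defaults, {"enable_numeric_skewness": True})
```

Returning a new list leaves the defaults untouched, so the same defaults can be reused for several configurations.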
classmethod create(dataset_definitions, relationships, feature_discovery_settings=None)

Create a Relationships Configuration

Parameters:
dataset_definitions: list of dataset definitions

Each element is a datarobot.helpers.feature_discovery.DatasetDefinition

relationships: list of relationships

Each element is a datarobot.helpers.feature_discovery.Relationship

feature_discovery_settings : list of feature discovery settings, optional

Each element is a dictionary or a datarobot.helpers.feature_discovery.FeatureDiscoverySetting. If not provided, default settings will be used.

Returns:
relationships_configuration: RelationshipsConfiguration

Created relationships configuration

Examples

import datarobot as dr

dataset_definition = dr.DatasetDefinition(
    identifier='profile',
    catalog_id='5fd06b4af24c641b68e4d88f',
    catalog_version_id='5fd06b4af24c641b68e4d88f'
)
relationship = dr.Relationship(
    dataset2_identifier='profile',
    dataset1_keys=['CustomerID'],
    dataset2_keys=['CustomerID'],
    feature_derivation_window_start=-14,
    feature_derivation_window_end=-1,
    feature_derivation_window_time_unit='DAY',
    prediction_point_rounding=1,
    prediction_point_rounding_time_unit='DAY'
)
dataset_definitions = [dataset_definition]
relationships = [relationship]
relationship_config = dr.RelationshipsConfiguration.create(
    dataset_definitions=dataset_definitions,
    relationships=relationships,
    feature_discovery_settings=[
        {'name': 'enable_categorical_statistics', 'value': True},
        {'name': 'enable_numeric_skewness', 'value': True},
    ]
)
>>> relationship_config.id
'5c88a37770fc42a2fcc62759'
get()

Retrieve the Relationships configuration for a given id

Returns:
relationships_configuration: RelationshipsConfiguration

The requested relationships configuration

Raises:
ClientError

Raised if an invalid relationships config id is provided.

Examples

relationships_config = dr.RelationshipsConfiguration(valid_config_id)
result = relationships_config.get()
>>> result.id
'5c88a37770fc42a2fcc62759'
replace(dataset_definitions, relationships, feature_discovery_settings=None)

Update a Relationships Configuration that is not yet used in a Feature Discovery project

Parameters:
dataset_definitions: list of dataset definitions

Each element is a datarobot.helpers.feature_discovery.DatasetDefinition

relationships: list of relationships

Each element is a datarobot.helpers.feature_discovery.Relationship

feature_discovery_settings : list of feature discovery settings, optional

Each element is a dictionary or a datarobot.helpers.feature_discovery.FeatureDiscoverySetting. If not provided, default settings will be used.

Returns:
relationships_configuration: RelationshipsConfiguration

the updated relationships configuration

delete()

Delete the Relationships configuration

Raises:
ClientError

Raised if an invalid relationships config id is provided.

Examples

# Deleting with a valid id
relationships_config = dr.RelationshipsConfiguration(valid_config_id)
status_code = relationships_config.delete()
>>> status_code
204
>>> relationships_config.get()
ClientError: Relationships Configuration not found

Dataset Definition

class datarobot.helpers.feature_discovery.DatasetDefinition(identifier: str, catalog_id: Optional[str], catalog_version_id: str, snapshot_policy: str = 'latest', feature_list_id: Optional[str] = None, primary_temporal_key: Optional[str] = None)

Dataset definition for Feature Discovery

New in version v2.25.

Examples

import datarobot as dr
dataset_definition = dr.DatasetDefinition(
    identifier='profile',
    catalog_id='5ec4aec1f072bc028e3471ae',
    catalog_version_id='5ec4aec2f072bc028e3471b1',
)

dataset_definition = dr.DatasetDefinition(
    identifier='transaction',
    catalog_id='5ec4aec1f072bc028e3471ae',
    catalog_version_id='5ec4aec2f072bc028e3471b1',
    primary_temporal_key='Date'
)
Attributes:
identifier: string

Alias of the dataset (used directly as part of the generated feature names)

catalog_id: string, optional

Identifier of the catalog item

catalog_version_id: string

Identifier of the catalog item version

primary_temporal_key: string, optional

Name of the column indicating time of record creation

feature_list_id: string, optional

Identifier of the feature list. This decides which columns in the dataset are used for feature generation

snapshot_policy: string, optional

Policy to use when creating a project or making predictions. If omitted, the endpoint uses ‘latest’ by default. Must be one of the following values:
  • ‘specified’: Use the specific snapshot specified by catalogVersionId
  • ‘latest’: Use the latest snapshot from the same catalog item
  • ‘dynamic’: Get data from the source (only applicable for JDBC datasets)

Relationship

class datarobot.helpers.feature_discovery.Relationship(dataset2_identifier: str, dataset1_keys: List[str], dataset2_keys: List[str], dataset1_identifier: Optional[str] = None, feature_derivation_window_start: Optional[int] = None, feature_derivation_window_end: Optional[int] = None, feature_derivation_window_time_unit: Optional[int] = None, feature_derivation_windows: Optional[List[Dict[str, Union[int, str]]]] = None, prediction_point_rounding: Optional[int] = None, prediction_point_rounding_time_unit: Optional[str] = None)

Relationship between datasets defined in DatasetDefinition

New in version v2.25.

Examples

import datarobot as dr
relationship = dr.Relationship(
    dataset1_identifier='profile',
    dataset2_identifier='transaction',
    dataset1_keys=['CustomerID'],
    dataset2_keys=['CustomerID']
)

relationship = dr.Relationship(
    dataset2_identifier='profile',
    dataset1_keys=['CustomerID'],
    dataset2_keys=['CustomerID'],
    feature_derivation_window_start=-14,
    feature_derivation_window_end=-1,
    feature_derivation_window_time_unit='DAY',
    prediction_point_rounding=1,
    prediction_point_rounding_time_unit='DAY'
)
Attributes:
dataset1_identifier: string, optional

Identifier of the first dataset in this relationship. This is specified in the identifier field of dataset_definition structure. If None, then the relationship is with the primary dataset.

dataset2_identifier: string

Identifier of the second dataset in this relationship. This is specified in the identifier field of dataset_definition schema.

dataset1_keys: list of string (max length: 10 min length: 1)

Column(s) from the first dataset which are used to join to the second dataset

dataset2_keys: list of string (max length: 10 min length: 1)

Column(s) from the second dataset that are used to join to the first dataset

feature_derivation_window_start: int, or None

How many time units of each dataset’s primary temporal key into the past relative to the datetimePartitionColumn the feature derivation window should begin. Will be a negative integer, if present. If present, the feature engineering graph will perform time-aware joins.

feature_derivation_window_end: int, optional

How many time units of each dataset’s primary temporal key into the past relative to the datetimePartitionColumn the feature derivation window should end. Will be a non-positive integer, if present. If present, the feature engineering graph will perform time-aware joins.

feature_derivation_window_time_unit: string, optional

Time unit of the feature derivation window. One of datarobot.enums.AllowedTimeUnitsSAFER. If present, time-aware joins will be used. Only applicable when dataset1_identifier is not provided.

feature_derivation_windows: list of dict, or None

List of feature derivation windows settings. If present, time-aware joins will be used. Only allowed when feature_derivation_window_start, feature_derivation_window_end and feature_derivation_window_time_unit are not provided.

prediction_point_rounding: int, optional

Closest value of prediction_point_rounding_time_unit to round the prediction point into the past when applying the feature derivation window. Will be a positive integer, if present. Only applicable when dataset1_identifier is not provided.

prediction_point_rounding_time_unit: string, optional

Time unit of the prediction point rounding. One of datarobot.enums.AllowedTimeUnitsSAFER. Only applicable when dataset1_identifier is not provided.

`feature_derivation_windows` is a list of dictionaries with the following schema:
start: int

How many time_units of each dataset’s primary temporal key into the past relative to the datetimePartitionColumn the feature derivation window should begin.

end: int

How many timeUnits of each dataset’s record primary temporal key into the past relative to the datetimePartitionColumn the feature derivation window should end.

unit: string

Time unit of the feature derivation window. One of datarobot.enums.AllowedTimeUnitsSAFER.
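
As a rough illustration of the schema above, the following sketch builds a `feature_derivation_windows` list from plain dictionaries and checks it against the constraints stated in this section (a negative start, a non-positive end). The helper name `validate_windows`, the extra check that start precedes end, and the `ALLOWED_UNITS` set are assumptions made for illustration. The authoritative unit values are those in datarobot.enums.AllowedTimeUnitsSAFER, and the server performs its own validation.

```python
from typing import Dict, List, Union

# Assumed subset of time units, for illustration only; the authoritative
# values live in datarobot.enums.AllowedTimeUnitsSAFER.
ALLOWED_UNITS = {"MINUTE", "HOUR", "DAY", "WEEK", "MONTH"}

Window = Dict[str, Union[int, str]]


def validate_windows(windows: List[Window]) -> List[str]:
    """Return a list of human-readable problems (empty if none were found)."""
    problems = []
    for i, window in enumerate(windows):
        missing = {"start", "end", "unit"} - set(window)
        if missing:
            problems.append(f"window {i}: missing keys {sorted(missing)}")
            continue
        start, end, unit = window["start"], window["end"], window["unit"]
        if not isinstance(start, int) or start >= 0:
            problems.append(f"window {i}: start must be a negative integer")
        if not isinstance(end, int) or end > 0:
            problems.append(f"window {i}: end must be a non-positive integer")
        # Assumed sanity check: the window must begin before it ends.
        if isinstance(start, int) and isinstance(end, int) and start >= end:
            problems.append(f"window {i}: start must be earlier than end")
        if unit not in ALLOWED_UNITS:
            problems.append(f"window {i}: unexpected unit {unit!r}")
    return problems


windows = [
    {"start": -14, "end": -1, "unit": "DAY"},
    {"start": -8, "end": 0, "unit": "WEEK"},
]
bad_windows = [{"start": 3, "end": -1, "unit": "DAY"}]

print(validate_windows(windows))
print(validate_windows(bad_windows))
```

A list that passes these checks could then be supplied as the feature_derivation_windows argument of Relationship, in place of the three separate feature_derivation_window_* arguments.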

Feature Lineage

class datarobot.models.FeatureLineage(steps=None)

Lineage of an automatically engineered feature.

Attributes:
steps: list

list of steps which were applied to build the feature.

`steps` structure is:
id : int

step id starting with 0.

step_type: str

one of data, action, join, or generatedData.

name: str

name of the step.

description: str

description of the step.

parents: list[int]

references to the ids of other steps.

is_time_aware: bool

indicates whether the step is time aware. Mandatory only for action and join steps. An action step provides additional information about the feature derivation window in the time_info field.

catalog_id: str

id of the catalog for a data step.

catalog_version_id: str

id of the catalog version for a data step.

group_by: list[str]

list of columns which this action step aggregated by.

columns: list

names of columns involved in the feature generation. Available only for data steps.

time_info: dict

description of the feature derivation window which was applied to this action step.

join_info: list[dict]

join step details.

`columns` structure is:
data_type: str

the type of the feature, e.g. ‘Categorical’, ‘Text’

is_input: bool

indicates features which provided data to transform in this lineage.

name: str

feature name.

is_cutoff: bool

indicates a cutoff column.

`time_info` structure is:
latest: dict

end of the feature derivation window applied.

duration: dict

size of the feature derivation window applied.

`latest` and `duration` structure is:
time_unit: str

time unit name like ‘MINUTE’, ‘DAY’, ‘MONTH’ etc.

duration: int

value/size of this duration object.

`join_info` structure is:
join_type: str

kind of join, left/right.

left_table: dict

information about a dataset which was considered as left.

right_table: dict

information about a dataset which was considered as right.

`left_table` and `right_table` structure is:
columns: list[str]

list of columns which datasets were joined by.

datasteps: list[int]

list of data steps id which brought the columns into the current step dataset.
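
To make the `steps` structure more concrete, here is a minimal sketch that follows the `parents` references from one step back to the `data` steps it was derived from. The sample lineage, the step names, the choice of step 0 as the starting point, and the helper name `source_data_steps` are all hypothetical; only the keys (`id`, `step_type`, `name`, `parents`, `is_time_aware`) come from the structure described above.

```python
from typing import Dict, List

# Hypothetical lineage, shaped like the `steps` structure described above.
steps = [
    {"id": 0, "step_type": "generatedData", "name": "transaction[Amount] (7 days sum)", "parents": [1]},
    {"id": 1, "step_type": "action", "name": "Sum", "parents": [2], "is_time_aware": True},
    {"id": 2, "step_type": "join", "name": "Join", "parents": [3, 4], "is_time_aware": True},
    {"id": 3, "step_type": "data", "name": "Primary dataset", "parents": []},
    {"id": 4, "step_type": "data", "name": "transaction", "parents": []},
]


def source_data_steps(steps: List[Dict], start_id: int = 0) -> List[str]:
    """Collect the names of `data` steps reachable from `start_id` via `parents`."""
    by_id = {step["id"]: step for step in steps}
    seen, stack, names = set(), [start_id], []
    while stack:
        step_id = stack.pop()
        if step_id in seen:
            continue  # a step may be reachable through more than one parent chain
        seen.add(step_id)
        step = by_id[step_id]
        if step["step_type"] == "data":
            names.append(step["name"])
        stack.extend(step.get("parents", []))
    return sorted(names)


print(source_data_steps(steps))
```

The same walk could be run over the `steps` attribute of a retrieved FeatureLineage, assuming its entries are exposed as dictionaries with these keys.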

classmethod get(project_id, id)

Retrieve a single FeatureLineage.

Parameters:
project_id : str

The id of the project the feature belongs to

id : str

id of a feature lineage to retrieve

Returns:
lineage : FeatureLineage

The queried instance

Secondary Dataset Configurations

class datarobot.models.SecondaryDatasetConfigurations(id: str, project_id: str, config: Optional[List[DatasetConfiguration]] = None, secondary_datasets: Optional[List[SecondaryDataset]] = None, name: Optional[str] = None, creator_full_name: Optional[str] = None, creator_user_id: Optional[str] = None, created: Optional[datetime] = None, featurelist_id: Optional[str] = None, credential_ids: Optional[StoredCredentials] = None, is_default: Optional[bool] = None, project_version: Optional[str] = None)

Create secondary dataset configurations for a given project

New in version v2.20.

Attributes:
id : str

Id of this secondary dataset configuration

project_id : str

Id of the associated project.

config: list of DatasetConfiguration (Deprecated in version v2.23)

List of secondary dataset configurations

secondary_datasets: list of SecondaryDataset (new in v2.23)

List of secondary datasets (secondaryDataset)

name: str

Verbose name of the SecondaryDatasetConfig. null if it wasn’t specified.

created: datetime.datetime

DR-formatted datetime. null for legacy (before DR 6.0) db records.

creator_user_id: str

Id of the user who created this config.

creator_full_name: str

Full name or email of the user who created this config.

featurelist_id: str, optional

Id of the feature list. null if it wasn’t specified.

credential_ids: list of DatasetsCredentials, optional

credentials used by the secondary datasets if the datasets used in the configuration are from datasource

is_default: bool, optional

Boolean flag indicating whether this is the default config created during the feature discovery aim

project_version: str, optional

Version of the project when it was created (release version)

classmethod create(project_id: str, secondary_datasets: List[datarobot.helpers.feature_discovery.SecondaryDataset], name: str, featurelist_id: Optional[str] = None) → datarobot.models.secondary_dataset.SecondaryDatasetConfigurations

create secondary dataset configurations

New in version v2.20.

Parameters:
project_id : str

id of the associated project.

secondary_datasets: list of SecondaryDataset (New in version v2.23)

List of secondary datasets used by the configuration. Each element is a datarobot.helpers.feature_discovery.SecondaryDataset

name: str (New in version v2.23)

Name of the secondary datasets configuration

featurelist_id: str, or None (New in version v2.23)

Id of the featurelist

Returns:
an instance of SecondaryDatasetConfigurations
Raises:
ClientError

raised if incorrect configuration parameters are provided

Examples

profile_secondary_dataset = dr.SecondaryDataset(
    identifier='profile',
    catalog_id='5ec4aec1f072bc028e3471ae',
    catalog_version_id='5ec4aec2f072bc028e3471b1',
    snapshot_policy='latest'
)

transaction_secondary_dataset = dr.SecondaryDataset(
    identifier='transaction',
    catalog_id='5ec4aec268f0f30289a03901',
    catalog_version_id='5ec4aec268f0f30289a03900',
    snapshot_policy='latest'
)

secondary_datasets = [profile_secondary_dataset, transaction_secondary_dataset]
new_secondary_dataset_config = dr.SecondaryDatasetConfigurations.create(
    project_id=project.id,
    name='My config',
    secondary_datasets=secondary_datasets
)

>>> new_secondary_dataset_config.id
'5fd1e86c589238a4e635e93d'
delete() → None

Removes the Secondary datasets configuration

New in version v2.21.

Raises:
ClientError

Raised if an invalid or already deleted secondary dataset config id is provided

Examples

# Deleting with a valid secondary_dataset_config id
status_code = dr.SecondaryDatasetConfigurations.delete(some_config_id)
status_code
>>> 204
get() → datarobot.models.secondary_dataset.SecondaryDatasetConfigurations

Retrieve a single secondary dataset configuration for a given id

New in version v2.21.

Returns:
secondary_dataset_configurations : SecondaryDatasetConfigurations

The requested secondary dataset configurations

Examples

config_id = '5fd1e86c589238a4e635e93d'
secondary_dataset_config = dr.SecondaryDatasetConfigurations(id=config_id).get()
>>> secondary_dataset_config
{
     'created': datetime.datetime(2020, 12, 9, 6, 16, 22, tzinfo=tzutc()),
     'creator_full_name': u'[email protected]',
     'creator_user_id': u'asdf4af1gf4bdsd2fba1de0a',
     'credential_ids': None,
     'featurelist_id': None,
     'id': u'5fd1e86c589238a4e635e93d',
     'is_default': True,
     'name': u'My config',
     'project_id': u'5fd06afce2456ec1e9d20457',
     'project_version': None,
     'secondary_datasets': [
            {
                'snapshot_policy': u'latest',
                'identifier': u'profile',
                'catalog_version_id': u'5fd06b4af24c641b68e4d88f',
                'catalog_id': u'5fd06b4af24c641b68e4d88e'
            },
            {
                'snapshot_policy': u'dynamic',
                'identifier': u'transaction',
                'catalog_version_id': u'5fd1e86c589238a4e635e98e',
                'catalog_id': u'5fd1e86c589238a4e635e98d'
            }
     ]
}
classmethod list(project_id: str, featurelist_id: Optional[str] = None, limit: Optional[int] = None, offset: Optional[int] = None) → List[datarobot.models.secondary_dataset.SecondaryDatasetConfigurations]

Returns list of secondary dataset configurations.

New in version v2.23.

Parameters:
project_id: str

The id of the project

featurelist_id: str, optional

Id of the feature list to filter the secondary datasets configurations

Returns:
secondary_dataset_configurations : list of SecondaryDatasetConfigurations

The requested list of secondary dataset configurations for a given project

Examples

pid = '5fd06afce2456ec1e9d20457'
secondary_dataset_configs = dr.SecondaryDatasetConfigurations.list(pid)
>>> secondary_dataset_configs[0]
    {
         'created': datetime.datetime(2020, 12, 9, 6, 16, 22, tzinfo=tzutc()),
         'creator_full_name': u'[email protected]',
         'creator_user_id': u'asdf4af1gf4bdsd2fba1de0a',
         'credential_ids': None,
         'featurelist_id': None,
         'id': u'5fd1e86c589238a4e635e93d',
         'is_default': True,
         'name': u'My config',
         'project_id': u'5fd06afce2456ec1e9d20457',
         'project_version': None,
         'secondary_datasets': [
                {
                    'snapshot_policy': u'latest',
                    'identifier': u'profile',
                    'catalog_version_id': u'5fd06b4af24c641b68e4d88f',
                    'catalog_id': u'5fd06b4af24c641b68e4d88e'
                },
                {
                    'snapshot_policy': u'dynamic',
                    'identifier': u'transaction',
                    'catalog_version_id': u'5fd1e86c589238a4e635e98e',
                    'catalog_id': u'5fd1e86c589238a4e635e98d'
                }
         ]
    }

Secondary Dataset

class datarobot.helpers.feature_discovery.SecondaryDataset(identifier: str, catalog_id: str, catalog_version_id: str, snapshot_policy: str = 'latest')

A secondary dataset to be used for feature discovery

New in version v2.25.

Examples

import datarobot as dr
secondary_dataset = dr.SecondaryDataset(
    identifier='profile',
    catalog_id='5ec4aec1f072bc028e3471ae',
    catalog_version_id='5ec4aec2f072bc028e3471b1',
)
Attributes:
identifier: string

Alias of the dataset (used directly as part of the generated feature names)

catalog_id: string

Identifier of the catalog item

catalog_version_id: string

Identifier of the catalog item version

snapshot_policy: string, optional

Policy to use when creating a project or making predictions. If omitted, the endpoint uses ‘latest’ by default. Must be one of the following values:
  • ‘specified’: Use the specific snapshot specified by catalogVersionId
  • ‘latest’: Use the latest snapshot from the same catalog item
  • ‘dynamic’: Get data from the source (only applicable for JDBC datasets)

Feature Effects

class datarobot.models.FeatureEffects(project_id, model_id, source, feature_effects, backtest_index=None)

Feature Effects provides partial dependence and predicted vs actual values for the top 500 features, ordered by feature impact score.

Partial dependence shows the marginal effect of a feature on the target variable after accounting for the average effects of all other predictive features. It indicates how the value of this feature affects your prediction when all other variables are held as they were.

Notes

featureEffects is a dict containing the following:

  • feature_name (string) Name of the feature
  • feature_type (string) dr.enums.FEATURE_TYPE, Feature type either numeric, categorical or datetime
  • feature_impact_score (float) Feature impact score
  • weight_label (string) optional, Weight label if configured for the project else null
  • partial_dependence (List) Partial dependence results
  • predicted_vs_actual (List) optional, Predicted versus actual results, may be omitted if there are insufficient qualified samples
partial_dependence is a dict containing the following:
  • is_capped (bool) Indicates whether the data for computation is capped
  • data (List) partial dependence results in the following format
data is a list of dict containing the following:
  • label (string) Contains label for categorical and numeric features as string
  • dependence (float) Value of partial dependence
predicted_vs_actual is a dict containing the following:
  • is_capped (bool) Indicates whether the data for computation is capped
  • data (List) pred vs actual results in the following format
data is a list of dict containing the following:
  • label (string) Label for categorical features; for numeric features, a range or numeric value.
  • bin (List) optional, For numeric features contains labels for left and right bin limits
  • predicted (float) Predicted value
  • actual (float) Actual value. Actual value is null for unsupervised timeseries models
  • row_count (int or float) Number of rows for the label and bin. Type is float if weight or exposure is set for the project.
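
As an illustration of the structure described in these notes, the sketch below picks the feature with the highest impact score from a list of per-feature entries and extracts its partial dependence points. The sample payload and the helper names `partial_dependence_points` and `dependence_range` are hypothetical. The optional `predicted_vs_actual` key is omitted from the sample.

```python
# Hypothetical `feature_effects` payload following the structure in the notes above.
feature_effects = [
    {
        "feature_name": "age",
        "feature_type": "numeric",
        "feature_impact_score": 0.42,
        "weight_label": None,
        "partial_dependence": {
            "is_capped": False,
            "data": [
                {"label": "20", "dependence": 0.10},
                {"label": "40", "dependence": 0.25},
                {"label": "60", "dependence": 0.31},
            ],
        },
    },
    {
        "feature_name": "region",
        "feature_type": "categorical",
        "feature_impact_score": 1.0,
        "weight_label": None,
        "partial_dependence": {
            "is_capped": False,
            "data": [
                {"label": "north", "dependence": 0.18},
                {"label": "south", "dependence": 0.27},
            ],
        },
    },
]


def partial_dependence_points(effect):
    """Return (label, dependence) pairs for one feature's partial dependence."""
    return [(p["label"], p["dependence"]) for p in effect["partial_dependence"]["data"]]


def dependence_range(effect):
    """Spread between the largest and smallest partial dependence values."""
    values = [dependence for _, dependence in partial_dependence_points(effect)]
    return max(values) - min(values)


# Pick the entry with the highest feature impact score.
top = max(feature_effects, key=lambda effect: effect["feature_impact_score"])
print(top["feature_name"], partial_dependence_points(top))
```

The same helpers could be applied to each entry of the feature_effects attribute, assuming the entries are plain dictionaries with the keys listed above.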
Attributes:
project_id: string

The project that contains requested model

model_id: string

The model to retrieve Feature Effects for

source: string

The source to retrieve Feature Effects for

feature_effects: list

Feature Effects for every feature

backtest_index: string, required only for DatetimeModels

The backtest index to retrieve Feature Effects for.

classmethod from_server_data(data, *args, **kwargs)

Instantiate an object of this class using the data directly from the server, meaning that the keys may have the wrong camel casing.

Parameters:
data : dict

The directly translated dict of JSON from the server. No casing fixes have taken place

class datarobot.models.FeatureEffectMetadata(status, sources)

Feature Effect Metadata for model, contains status and available model sources.

Notes

source is the expected parameter for retrieving Feature Effects. One of the provided sources must be used.

class datarobot.models.FeatureEffectMetadataDatetime(data)

Feature Effect Metadata for datetime model, contains list of feature effect metadata per backtest.

Notes

feature effect metadata per backtest contains:
  • status : string.
  • backtest_index : string.
  • sources : list(string).

source is the expected parameter for retrieving Feature Effects. One of the provided sources must be used.

backtest_index is the expected parameter for submitting a compute request and retrieving Feature Effects. One of the provided backtest indexes must be used.

Attributes:
data : list[FeatureEffectMetadataDatetimePerBacktest]

List feature effect metadata per backtest
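
The per-backtest metadata described above can be used to decide which backtest_index and source combinations are worth requesting. The following is a minimal sketch over plain dictionaries; the status strings and the helper name `backtests_with_source` are placeholders assumed for illustration, not values confirmed by this reference.

```python
# Hypothetical per-backtest metadata entries (status values are placeholders).
metadata = [
    {"status": "COMPLETED", "backtest_index": "0", "sources": ["training", "validation"]},
    {"status": "NOT_COMPLETED", "backtest_index": "1", "sources": ["training"]},
    {"status": "COMPLETED", "backtest_index": "holdout", "sources": ["holdout"]},
]


def backtests_with_source(entries, source):
    """Return the backtest indexes whose metadata lists the given source."""
    return [entry["backtest_index"] for entry in entries if source in entry["sources"]]


print(backtests_with_source(metadata, "training"))
```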

class datarobot.models.FeatureEffectMetadataDatetimePerBacktest(ff_metadata_datetime_per_backtest)

Convert dictionary into feature effect metadata per backtest which contains backtest_index, status and sources.

Feature Fit

class datarobot.models.FeatureFit(project_id, model_id, source, feature_fit, backtest_index=None)

Feature Fit provides partial dependence and predicted vs actual values for the top 500 features, ordered by feature importance score.

Partial dependence shows the marginal effect of a feature on the target variable after accounting for the average effects of all other predictive features. It indicates how the value of this feature affects your prediction when all other variables are held as they were.

Notes

featureFit is a dict containing the following:

  • feature_name (string) Name of the feature
  • feature_type (string) dr.enums.FEATURE_TYPE, Feature type either numeric, categorical or datetime
  • feature_importance_score (float) Feature importance score
  • weight_label (string) optional, Weight label if configured for the project else null
  • partial_dependence (List) Partial dependence results
  • predicted_vs_actual (List) optional, Predicted versus actual results, may be omitted if there are insufficient qualified samples
partial_dependence is a dict containing the following:
  • is_capped (bool) Indicates whether the data for computation is capped
  • data (List) partial dependence results in the following format
data is a list of dict containing the following:
  • label (string) Contains label for categorical and numeric features as string
  • dependence (float) Value of partial dependence
predicted_vs_actual is a dict containing the following:
  • is_capped (bool) Indicates whether the data for computation is capped
  • data (List) pred vs actual results in the following format
data is a list of dict containing the following:
  • label (string) Label for categorical features; for numeric features, a range or numeric value.
  • bin (List) optional, For numeric features contains labels for left and right bin limits
  • predicted (float) Predicted value
  • actual (float) Actual value. Actual value is null for unsupervised timeseries models
  • row_count (int or float) Number of rows for the label and bin. Type is float if weight or exposure is set for the project.
Attributes:
project_id: string

The project that contains requested model

model_id: string

The model to retrieve Feature Fit for

source: string

The source to retrieve Feature Fit for

feature_fit: list

Feature Fit data for every feature

backtest_index: string, required only for DatetimeModels

The backtest index to retrieve Feature Fit for.

classmethod from_server_data(data, *args, **kwargs)

Instantiate an object of this class using the data directly from the server, meaning that the keys may have the wrong camel casing.

Parameters:
data : dict

The directly translated dict of JSON from the server. No casing fixes have taken place

class datarobot.models.FeatureFitMetadata(status, sources)

Feature Fit Metadata for model, contains status and available model sources.

Notes

source is the expected parameter for retrieving Feature Fit. One of the provided sources must be used.

class datarobot.models.FeatureFitMetadataDatetime(data)

Feature Fit Metadata for datetime model, contains list of feature fit metadata per backtest.

Notes

feature fit metadata per backtest contains:

  • status : string.
  • backtest_index : string.
  • sources : list(string).

source is the expected parameter for retrieving Feature Fit. One of the provided sources must be used.

backtest_index is the expected parameter for submitting a compute request and retrieving Feature Fit. One of the provided backtest indexes must be used.

Attributes:
data : list[FeatureFitMetadataDatetimePerBacktest]

list feature fit metadata per backtest

class datarobot.models.FeatureFitMetadataDatetimePerBacktest(ff_metadata_datetime_per_backtest)

Convert dictionary into feature fit metadata per backtest which contains backtest_index, status and sources.

Feature List

class datarobot.DatasetFeaturelist(id: Optional[str] = None, name: Optional[str] = None, features: Optional[List[str]] = None, dataset_id: Optional[str] = None, dataset_version_id: Optional[str] = None, creation_date: Optional[datetime.datetime] = None, created_by: Optional[str] = None, user_created: Optional[bool] = None, description: Optional[str] = None)

A set of features attached to a dataset in the AI Catalog

Attributes:
id : str

the id of the dataset featurelist

dataset_id : str

the id of the dataset the featurelist belongs to

dataset_version_id: str, optional

the version id of the dataset this featurelist belongs to

name : str

the name of the dataset featurelist

features : list of str

a list of the names of features included in this dataset featurelist

creation_date : datetime.datetime

when the featurelist was created

created_by : str

the user name of the user who created this featurelist

user_created : bool

whether the featurelist was created by a user or by DataRobot automation

description : str, optional

the description of the featurelist. Only present on DataRobot-created featurelists.

classmethod get(dataset_id: str, featurelist_id: str) → TDatasetFeaturelist

Retrieve a dataset featurelist

Parameters:
dataset_id : str

the id of the dataset the featurelist belongs to

featurelist_id : str

the id of the dataset featurelist to retrieve

Returns:
featurelist : DatasetFeaturelist

the specified featurelist

delete() → None

Delete a dataset featurelist

A featurelist configured as the default featurelist of the dataset cannot be deleted.

update(name: Optional[str] = None) → None

Update the name of an existing featurelist

Note that only user-created featurelists can be renamed, and that names must not conflict with names used by other featurelists.

Parameters:
name : str, optional

the new name for the featurelist

class datarobot.models.Featurelist(id: Optional[str] = None, name: Optional[str] = None, features: Optional[List[str]] = None, project_id: Optional[str] = None, created: Optional[datetime.datetime] = None, is_user_created: Optional[bool] = None, num_models: Optional[int] = None, description: Optional[str] = None)

A set of features used in modeling

Attributes:
id : str

the id of the featurelist

name : str

the name of the featurelist

features : list of str

the names of all the Features in the featurelist

project_id : str

the project the featurelist belongs to

created : datetime.datetime

(New in version v2.13) when the featurelist was created

is_user_created : bool

(New in version v2.13) whether the featurelist was created by a user or by DataRobot automation

num_models : int

(New in version v2.13) the number of models currently using this featurelist. A model is considered to use a featurelist if it is used to train the model or as a monotonic constraint featurelist, or if the model is a blender with at least one component model using the featurelist.

description : str

(New in version v2.13) the description of the featurelist. Can be updated by the user and may be supplied by default for DataRobot-created featurelists.

classmethod from_data(data: ServerDataDictType) → TFeaturelist

Overrides the parent method to ensure description is always populated

Parameters:
data : dict

the data from the server, having gone through processing

classmethod get(project_id: str, featurelist_id: str) → TFeaturelist

Retrieve a known feature list

Parameters:
project_id : str

The id of the project the featurelist is associated with

featurelist_id : str

The ID of the featurelist to retrieve

Returns:
featurelist : Featurelist

The queried instance

Raises:
ValueError

raised if the passed project_id value is of an unsupported type

delete(dry_run: bool = False, delete_dependencies: bool = False) → DeleteFeatureListResult

Delete a featurelist, and any models and jobs using it

All models using a featurelist, whether as the training featurelist or as a monotonic constraint featurelist, will also be deleted when the deletion is executed, and any queued or running jobs using it will be cancelled. Similarly, predictions made on these models will also be deleted. All the entities that are to be deleted with a featurelist are described as “dependencies” of it. To preview the results of deleting a featurelist, call delete with dry_run=True.

When deleting a featurelist with dependencies, users must specify delete_dependencies=True to confirm they want to delete the featurelist and all its dependencies. Without that option, only featurelists with no dependencies may be successfully deleted and others will error.

Featurelists configured into the project as a default featurelist or as a default monotonic constraint featurelist cannot be deleted.

Featurelists used in a model deployment cannot be deleted until the model deployment is deleted.

Parameters:
dry_run : bool, optional

specify True to preview the result of deleting the featurelist, instead of actually deleting it.

delete_dependencies : bool, optional

specify True to delete featurelists with dependencies; if left at the default of False, featurelists without dependencies can still be deleted, while those with dependencies will raise an error when deletion is attempted.

Returns:
result : dict
A dictionary describing the result of deleting the featurelist, with the following keys
  • dry_run : bool, whether the deletion was a dry run or an actual deletion
  • can_delete : bool, whether the featurelist can actually be deleted
  • deletion_blocked_reason : str, why the featurelist can’t be deleted (if it can’t)
  • num_affected_models : int, the number of models using this featurelist
  • num_affected_jobs : int, the number of jobs using this featurelist
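
A dry run is most useful when its result is inspected before the real deletion. The sketch below summarizes a result dictionary using only the keys listed above. The helper name `describe_deletion`, the sample results, and the reading that a deletable featurelist with affected models or jobs needs delete_dependencies=True are assumptions based on the description in this section.

```python
def describe_deletion(result: dict) -> str:
    """Summarize a deletion result dict using only the documented keys."""
    if not result["can_delete"]:
        return f"blocked: {result['deletion_blocked_reason']}"
    affected = result["num_affected_models"] + result["num_affected_jobs"]
    if affected == 0:
        return "safe to delete: no dependencies"
    return (
        f"requires delete_dependencies=True: {result['num_affected_models']} "
        f"model(s) and {result['num_affected_jobs']} job(s) would be removed"
    )


# Hypothetical dry-run results, for illustration only.
dry_run_with_deps = {
    "dry_run": True,
    "can_delete": True,
    "deletion_blocked_reason": "",
    "num_affected_models": 3,
    "num_affected_jobs": 1,
}
dry_run_blocked = {
    "dry_run": True,
    "can_delete": False,
    "deletion_blocked_reason": "featurelist is the project default",
    "num_affected_models": 0,
    "num_affected_jobs": 0,
}

print(describe_deletion(dry_run_with_deps))
print(describe_deletion(dry_run_blocked))
```

In practice the dictionary would come from calling delete with dry_run=True on a featurelist, followed by a second call with delete_dependencies=True only if the summary is acceptable.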
classmethod from_server_data(data: Union[Dict[str, Any], List[Dict[str, Any]]], keep_attrs: Optional[Iterable[str]] = None) → T

Instantiate an object of this class using the data directly from the server, meaning that the keys may have the wrong camel casing

Parameters:
data : dict

The directly translated dict of JSON from the server. No casing fixes have taken place

keep_attrs : iterable

List, set or tuple of the dotted namespace notations for attributes to keep within the object structure even if their values are None

update(name: Optional[str] = None, description: Optional[str] = None) → None

Update the name or description of an existing featurelist

Note that only user-created featurelists can be renamed, and that names must not conflict with names used by other featurelists.

Parameters:
name : str, optional

the new name for the featurelist

description : str, optional

the new description for the featurelist

class datarobot.models.ModelingFeaturelist(id: Optional[str] = None, name: Optional[str] = None, features: Optional[List[str]] = None, project_id: Optional[str] = None, created: Optional[datetime.datetime] = None, is_user_created: Optional[bool] = None, num_models: Optional[int] = None, description: Optional[str] = None)

A set of features that can be used to build a model

In time series projects, a new set of modeling features is created after setting the partitioning options. These features are automatically derived from those in the project’s dataset and are the features used for modeling. Modeling features are only accessible once the target and partitioning options have been set. In projects that don’t use time series modeling, once the target has been set, ModelingFeaturelists and Featurelists will behave the same.

For more information about input and modeling features, see the time series documentation.

Attributes:
id : str

the id of the modeling featurelist

project_id : str

the id of the project the modeling featurelist belongs to

name : str

the name of the modeling featurelist

features : list of str

a list of the names of features included in this modeling featurelist

created : datetime.datetime

(New in version v2.13) when the featurelist was created

is_user_created : bool

(New in version v2.13) whether the featurelist was created by a user or by DataRobot automation

num_models : int

(New in version v2.13) the number of models currently using this featurelist. A model is considered to use a featurelist if it is used to train the model or as a monotonic constraint featurelist, or if the model is a blender with at least one component model using the featurelist.

description : str

(New in version v2.13) the description of the featurelist. Can be updated by the user and may be supplied by default for DataRobot-created featurelists.

classmethod get(project_id: str, featurelist_id: str) → TModelingFeaturelist

Retrieve a modeling featurelist

Modeling featurelists can only be retrieved once the target and partitioning options have been set.

Parameters:
project_id : str

the id of the project the modeling featurelist belongs to

featurelist_id : str

the id of the modeling featurelist to retrieve

Returns:
featurelist : ModelingFeaturelist

the specified featurelist

delete(dry_run: bool = False, delete_dependencies: bool = False) → DeleteFeatureListResult

Delete a featurelist, and any models and jobs using it

All models using a featurelist, whether as the training featurelist or as a monotonic constraint featurelist, will also be deleted when the deletion is executed, and any queued or running jobs using it will be cancelled. Predictions made on these models will also be deleted. All the entities that are to be deleted with a featurelist are described as “dependencies” of it. To preview the results of deleting a featurelist, call delete with dry_run=True.

When deleting a featurelist with dependencies, users must specify delete_dependencies=True to confirm they want to delete the featurelist and all its dependencies. Without that option, only featurelists with no dependencies can be deleted; attempting to delete any other featurelist results in an error.

Featurelists configured into the project as a default featurelist or as a default monotonic constraint featurelist cannot be deleted.

Featurelists used in a model deployment cannot be deleted until the model deployment is deleted.

Parameters:
dry_run : bool, optional

specify True to preview the result of deleting the featurelist, instead of actually deleting it.

delete_dependencies : bool, optional

specify True to delete a featurelist that has dependencies; if left at the default of False, only featurelists without dependencies can be deleted, and attempting to delete one with dependencies results in an error.

Returns:
result : dict
A dictionary describing the result of deleting the featurelist, with the following keys
  • dry_run : bool, whether the deletion was a dry run or an actual deletion
  • can_delete : bool, whether the featurelist can actually be deleted
  • deletion_blocked_reason : str, why the featurelist can’t be deleted (if it can’t)
  • num_affected_models : int, the number of models using this featurelist
  • num_affected_jobs : int, the number of jobs using this featurelist
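
The dry-run preview and the delete_dependencies confirmation can be combined into a single guarded call. The sketch below is illustrative only: `delete_if_safe` and its `allow_dependencies` flag are hypothetical names, and it assumes the returned result can be indexed by the keys listed above.

```python
def delete_if_safe(featurelist, allow_dependencies: bool = False) -> dict:
    """Preview a deletion with dry_run=True, then delete only if permitted.

    `featurelist` is any object exposing the documented
    delete(dry_run=..., delete_dependencies=...) method. The helper name and
    the `allow_dependencies` flag are illustrative, not part of the client.
    """
    preview = featurelist.delete(dry_run=True)
    if not preview["can_delete"]:
        # Blocked, e.g. a project default or a deployed featurelist.
        return preview
    has_dependencies = (
        preview["num_affected_models"] > 0 or preview["num_affected_jobs"] > 0
    )
    if has_dependencies and not allow_dependencies:
        # Leave it alone; the caller did not opt in to removing models/jobs.
        return preview
    return featurelist.delete(delete_dependencies=has_dependencies)
```

The caller can check the `dry_run` key of the returned dictionary to tell whether the deletion actually happened, and `deletion_blocked_reason` to see why it did not.
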
update(name: Optional[str] = None, description: Optional[str] = None) → None

Update the name or description of an existing featurelist

Note that only user-created featurelists can be renamed, and that names must not conflict with names used by other featurelists.

Parameters:
name : str, optional

the new name for the featurelist

description : str, optional

the new description for the featurelist
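
Because only user-created featurelists can be renamed, it can help to check `is_user_created` before calling update. A minimal sketch, assuming a hypothetical wrapper named `rename_featurelist` around the documented method:

```python
from typing import Optional


def rename_featurelist(
    featurelist, new_name: str, description: Optional[str] = None
) -> bool:
    """Rename a featurelist only when it was created by a user.

    Returns True if update() was called, False if the rename was skipped.
    Hypothetical helper; it relies only on the documented `is_user_created`
    attribute and update(name=..., description=...) method.
    """
    if not featurelist.is_user_created:
        # DataRobot-created featurelists cannot be renamed.
        return False
    featurelist.update(name=new_name, description=description)
    return True
```

The sketch does not check for name conflicts with other featurelists, and it skips DataRobot-created featurelists entirely, including description-only changes.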

Restoring Discarded Features

class datarobot.models.restore_discarded_features.DiscardedFeaturesInfo(total_restore_limit: int, remaining_restore_limit: int, count: int, features: List[str])

An object containing information about time series features that were discarded during the time series feature generation process. These features can be restored to the project. Restored features are included in All Time Series Features and can be used to create new feature lists.

New in version v2.27.

Attributes:
total_restore_limit : int

The total limit indicating how many features can be restored in this project.

remaining_restore_limit : int

The number of features that can still be restored in this project.

features : list of strings

Discarded features which can be restored.

count : int

Discarded features count.

classmethod restore(project_id: str, features_to_restore: List[str], max_wait: int = 600) → datarobot.models.restore_discarded_features.FeatureRestorationStatus

Restore features that were discarded during the time series feature generation process back to the project. After restoration, the features are included in All Time Series Features.

New in version v2.27.

Parameters:
project_id: string
features_to_restore: list of strings

List of the feature names to restore

max_wait: int, optional

Maximum time, in seconds, to wait for features to be restored. Defaults to 600 (10 minutes).

Returns:
status: FeatureRestorationStatus

information about which features were restored and which were not.

classmethod retrieve(project_id: str) → datarobot.models.restore_discarded_features.DiscardedFeaturesInfo

Retrieve the discarded features information for a given project.

New in version v2.27.

Parameters:
project_id: string
Returns:
info: DiscardedFeaturesInfo

information about the features that were discarded during the feature generation process, and the limits on how many features can be restored.
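
Since restoration is capped by a per-project limit, a caller may want to trim a wish list before calling restore. The following sketch is an assumption-laden illustration: `plan_restore` is a hypothetical helper that reads only the documented `features` and `remaining_restore_limit` attributes.

```python
from typing import List


def plan_restore(info, wanted: List[str]) -> List[str]:
    """Pick which discarded features to request, within the remaining limit.

    `info` is any object with the documented `features` and
    `remaining_restore_limit` attributes; `plan_restore` itself is a
    hypothetical helper and makes no API calls.
    """
    discarded = set(info.features)
    # Keep the caller's order, drop names that were never discarded.
    restorable = [name for name in wanted if name in discarded]
    return restorable[: max(info.remaining_restore_limit, 0)]
```

The returned list could then be passed as features_to_restore to the restore classmethod, with `info` obtained from retrieve.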

Job

class datarobot.models.Job(data: Dict[str, Any], completed_resource_url: Optional[str] = None)

Tracks asynchronous work being done within a project

Attributes:
id : int

the id of the job

project_id : str

the id of the project the job belongs to

status : str

the status of the job - will be one of datarobot.enums.QUEUE_STATUS

job_type : str

what kind of work the job is doing - will be one of datarobot.enums.JOB_TYPE

is_blocked : bool

if true, the job is blocked (cannot be executed) until its dependencies are resolved

classmethod get(project_id: str, job_id: str) → datarobot.models.job.Job

Fetches one job.

Parameters:
project_id : str

The identifier of the project in which the job resides

job_id : str

The job id

Returns:
job : Job

The job

Raises:
AsyncFailureError

Querying this resource gave a status code other than 200 or 303

cancel()

Cancel this job. If this job has not finished running, it will be removed and canceled.

get_result(params=None)
Parameters:
params : dict or None

Query parameters to be added to request to get results.

For featureEffects and featureFit jobs, the source param selects the data source; if it is omitted, the default is `training`.
Returns:
result : object
Return type depends on the job type:
  • for model jobs, a Model is returned
  • for predict jobs, a pandas.DataFrame (with predictions) is returned
  • for featureImpact jobs, a list of dicts by default (see with_metadata parameter of the FeatureImpactJob class and its get() method).
  • for primeRulesets jobs, a list of Rulesets
  • for primeModel jobs, a PrimeModel
  • for primeDownloadValidation jobs, a PrimeFile
  • for predictionExplanationInitialization jobs, a PredictionExplanationsInitialization
  • for predictionExplanations jobs, a PredictionExplanations
  • for featureEffects, a FeatureEffects
  • for featureFit, a FeatureFit
Raises:
JobNotFinished

If the job is not finished, the result is not available.

AsyncProcessUnsuccessfulError

If the job errored or was aborted
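
Because get_result raises JobNotFinished rather than blocking, it can be wrapped for non-blocking checks. A minimal sketch follows; `result_if_ready` is a hypothetical helper, and the `datarobot.errors` import path is an assumption (a local stand-in is used when the client is not installed).

```python
try:
    # Assumed import path for the client's exception classes.
    from datarobot.errors import JobNotFinished
except ImportError:
    class JobNotFinished(Exception):
        """Local stand-in so the sketch runs without the client installed."""


def result_if_ready(job, params=None, default=None):
    """Return job.get_result(params), or `default` if the job is still running.

    AsyncProcessUnsuccessfulError is deliberately not caught, so failed or
    aborted jobs still surface as errors.
    """
    try:
        return job.get_result(params)
    except JobNotFinished:
        return default
```

For a featureEffects or featureFit job, `params` might be a dictionary such as `{"source": "training"}`.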

get_result_when_complete(max_wait=600, params=None)
Parameters:
max_wait : int, optional

How long to wait for the job to finish.

params : dict, optional

Query parameters to be added to request.

Returns:
result: object

Return type is the same as would be returned by Job.get_result.

Raises:
AsyncTimeoutError

If the job does not finish in time

AsyncProcessUnsuccessfulError

If the job errored or was aborted
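
When several jobs are outstanding, the two documented exceptions can be used to separate results from failures. The sketch below is one possible approach under stated assumptions: `collect_results` is a hypothetical helper, and the `datarobot.errors` import path is assumed (local stand-ins are defined if the client is not installed).

```python
try:
    # Assumed import path for the client's exception classes.
    from datarobot.errors import AsyncProcessUnsuccessfulError, AsyncTimeoutError
except ImportError:
    class AsyncTimeoutError(Exception):
        """Local stand-in so the sketch runs without the client installed."""

    class AsyncProcessUnsuccessfulError(Exception):
        """Local stand-in so the sketch runs without the client installed."""


def collect_results(jobs, max_wait=600):
    """Wait on each job in turn; return (results, failures).

    `results` maps job.id to the job's result; `failures` maps job.id to a
    short reason string ("timeout" or "failed").
    """
    results, failures = {}, {}
    for job in jobs:
        try:
            results[job.id] = job.get_result_when_complete(max_wait=max_wait)
        except AsyncTimeoutError:
            failures[job.id] = "timeout"
        except AsyncProcessUnsuccessfulError:
            failures[job.id] = "failed"
    return results, failures
```

Note that `max_wait` applies to each job separately, so the worst-case total wait grows with the number of jobs.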

refresh()

Update this object with the latest job data from the server.

wait_for_completion(max_wait: int = 600) → None

Waits for job to complete.

Parameters:
max_wait : int, optional

How long to wait for the job to finish.

class datarobot.models.TrainingPredictionsJob(data, model_id, data_subset, **kwargs)
classmethod get(project_id, job_id, model_id=None, data_subset=None)

Fetches one training predictions job.

The resulting TrainingPredictions object will be annotated with model_id and data_subset.

Parameters:
project_id : str

The identifier of the project in which the job resides

job_id : str

The job id

model_id : str

The identifier of the model used for computing training predictions

data_subset : dr.enums.DATA_SUBSET, optional

Data subset used for computing training predictions

Returns:
job : TrainingPredictionsJob

The job

refresh()

Update this object with the latest job data from the server.

cancel()

Cancel this job. If this job has not finished running, it will be removed and canceled.

get_result(params=None)
Parameters:
params : dict or None

Query parameters to be added to request to get results.

For featureEffects and featureFit jobs, the source param selects the data source; if it is omitted, the default is `training`.
Returns:
result : object
Return type depends on the job type:
  • for model jobs, a Model is returned
  • for predict jobs, a pandas.DataFrame (with predictions) is returned
  • for featureImpact jobs, a list of dicts by default (see with_metadata parameter of the FeatureImpactJob class and its get() method).
  • for primeRulesets jobs, a list of Rulesets
  • for primeModel jobs, a PrimeModel
  • for primeDownloadValidation jobs, a PrimeFile
  • for predictionExplanationInitialization jobs, a PredictionExplanationsInitialization
  • for predictionExplanations jobs, a PredictionExplanations
  • for featureEffects, a FeatureEffects
  • for featureFit, a FeatureFit
Raises:
JobNotFinished

If the job is not finished, the result is not available.

AsyncProcessUnsuccessfulError

If the job errored or was aborted

get_result_when_complete(max_wait=600, params=None)
Parameters:
max_wait : int, optional

How long to wait for the job to finish.

params : dict, optional

Query parameters to be added to request.

Returns:
result: object

Return type is the same as would be returned by Job.get_result.

Raises:
AsyncTimeoutError

If the job does not finish in time

AsyncProcessUnsuccessfulError

If the job errored or was aborted

wait_for_completion(max_wait: int = 600) → None

Waits for job to complete.

Parameters:
max_wait : int, optional

How long to wait for the job to finish.

class datarobot.models.ShapMatrixJob(data: Dict[str, Any], model_id: Optional[str] = None, dataset_id: Optional[str] = None, **kwargs)
classmethod get(project_id: str, job_id: str, model_id: Optional[str] = None, dataset_id: Optional[str] = None) → datarobot.models.shap_matrix_job.ShapMatrixJob

Fetches one SHAP matrix job.

Parameters:
project_id : str

The identif