API Reference¶
API Object¶
- class datarobot.models.api_object.APIObject¶
- classmethod from_data(data)¶
Instantiate an object of this class using a dict.
- Parameters
- datadict
Correctly snake_cased keys and their values.
- Return type
TypeVar(T, bound=APIObject)
- classmethod from_server_data(data, keep_attrs=None)¶
Instantiate an object of this class using the data directly from the server, meaning that the keys may have the wrong camel casing
- Parameters
- datadict
The directly translated dict of JSON from the server. No casing fixes have taken place
- keep_attrsiterable
List, set or tuple of the dotted namespace notations for attributes to keep within the object structure even if their values are None
- Return type
TypeVar(T, bound=APIObject)
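A minimal sketch of the difference between the two constructors, using a hypothetical concrete subclass and illustrative payloads (the class and field names below are not part of the API):
# Hypothetical payloads; they only illustrate the casing difference.
server_payload = {"projectId": "abc123", "targetName": "is_bad"}   # camelCase keys, as returned by the server
local_payload = {"project_id": "abc123", "target_name": "is_bad"}  # snake_case keys, already converted

# from_server_data converts camelCase keys to snake_case before instantiating,
# while from_data expects keys that are already correctly snake_cased.
obj_from_server = SomeAPIObjectSubclass.from_server_data(server_payload)
obj_from_local = SomeAPIObjectSubclass.from_data(local_payload)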
Advanced Options¶
- class datarobot.helpers.AdvancedOptions(weights=None, response_cap=None, blueprint_threshold=None, seed=None, smart_downsampled=None, majority_downsampling_rate=None, offset=None, exposure=None, accuracy_optimized_mb=None, scaleout_modeling_mode=None, events_count=None, monotonic_increasing_featurelist_id=None, monotonic_decreasing_featurelist_id=None, only_include_monotonic_blueprints=None, allowed_pairwise_interaction_groups=None, blend_best_models=None, scoring_code_only=None, prepare_model_for_deployment=None, consider_blenders_in_recommendation=None, min_secondary_validation_model_count=None, shap_only_mode=None, autopilot_data_sampling_method=None, run_leakage_removed_feature_list=None, autopilot_with_feature_discovery=False, feature_discovery_supervised_feature_reduction=None, exponentially_weighted_moving_alpha=None, external_time_series_baseline_dataset_id=None, use_supervised_feature_reduction=True, primary_location_column=None, protected_features=None, preferable_target_value=None, fairness_metrics_set=None, fairness_threshold=None, bias_mitigation_feature_name=None, bias_mitigation_technique=None, include_bias_mitigation_feature_as_predictor_variable=None, default_monotonic_increasing_featurelist_id=None, default_monotonic_decreasing_featurelist_id=None)¶
Used when setting the target of a project to set advanced options of modeling process.
- Parameters
- weightsstring, optional
The name of a column indicating the weight of each row
- response_capbool or float in [0.5, 1), optional
Defaults to None here, but the server defaults to False. If specified, it is the quantile of the response distribution to use for response capping.
- blueprint_thresholdint, optional
Number of hours models are permitted to run before being excluded from later autopilot stages. Minimum 1.
- seedint, optional
a seed to use for randomization
- smart_downsampledbool, optional
whether to use smart downsampling to throw away excess rows of the majority class. Only applicable to classification and zero-boosted regression projects.
- majority_downsampling_ratefloat, optional
the percentage between 0 and 100 of the majority rows that should be kept. Specify only if using smart downsampling. Must not cause the majority class to become smaller than the minority class.
- offsetlist of str, optional
(New in version v2.6) the list of the names of the columns containing the offset of each row
- exposurestring, optional
(New in version v2.6) the name of a column containing the exposure of each row
- accuracy_optimized_mbbool, optional
(New in version v2.6) Include additional, longer-running models that will be run by the autopilot and available to run manually.
- scaleout_modeling_modestring, optional
(Deprecated in 2.28. Will be removed in 2.30) DataRobot no longer supports scaleout models. Please remove any usage of this parameter as it will be removed from the API soon.
- events_countstring, optional
(New in version v2.8) the name of a column specifying events count.
- monotonic_increasing_featurelist_idstring, optional
(new in version 2.11) the id of the featurelist that defines the set of features with a monotonically increasing relationship to the target. If None, no such constraints are enforced. When specified, this will set a default for the project that can be overridden at model submission time if desired.
- monotonic_decreasing_featurelist_idstring, optional
(new in version 2.11) the id of the featurelist that defines the set of features with a monotonically decreasing relationship to the target. If None, no such constraints are enforced. When specified, this will set a default for the project that can be overridden at model submission time if desired.
- only_include_monotonic_blueprintsbool, optional
(new in version 2.11) when true, only blueprints that support enforcing monotonic constraints will be available in the project or selected for the autopilot.
- allowed_pairwise_interaction_groupslist of tuple, optional
(New in version v2.19) For GA2M models - specify groups of columns for which pairwise interactions will be allowed. E.g. if set to [(A, B, C), (C, D)] then GA2M models will allow interactions between columns AxB, BxC, AxC, CxD. All others (AxD, BxD) will not be considered.
- blend_best_models: bool, optional
(New in version v2.19) blend best models during Autopilot run.
- scoring_code_only: bool, optional
(New in version v2.19) Keep only models that can be converted to scorable java code during Autopilot run
- shap_only_mode: bool, optional
(New in version v2.21) Keep only models that support SHAP values during Autopilot run. Use SHAP-based insights wherever possible. Defaults to False.
- prepare_model_for_deployment: bool, optional
(New in version v2.19) Prepare model for deployment during Autopilot run. The preparation includes creating reduced feature list models, retraining best model on higher sample size, computing insights and assigning “RECOMMENDED FOR DEPLOYMENT” label.
- consider_blenders_in_recommendation: bool, optional
(New in version 2.22.0) Include blenders when selecting a model to prepare for deployment in an Autopilot Run. Defaults to False.
- min_secondary_validation_model_count: int, optional
(New in version v2.19) Compute “All backtest” scores (datetime models) or cross validation scores for the specified number of the highest ranking models on the Leaderboard, if over the Autopilot default.
- autopilot_data_sampling_method: str, optional
(New in version v2.23) one of datarobot.enums.DATETIME_AUTOPILOT_DATA_SAMPLING_METHOD. Applicable for OTV projects only, defines if autopilot uses “random” or “latest” sampling when iteratively building models on various training samples. Defaults to “random” for duration-based projects and to “latest” for row-based projects.
- run_leakage_removed_feature_list: bool, optional
(New in version v2.23) Run Autopilot on Leakage Removed feature list (if exists).
- autopilot_with_feature_discovery: bool, default False, optional
(New in version v2.23) If true, autopilot will run on a feature list that includes features found via search for interactions.
- feature_discovery_supervised_feature_reduction: bool, optional
(New in version v2.23) Run supervised feature reduction for feature discovery projects.
- exponentially_weighted_moving_alpha: float, optional
(New in version v2.26) defaults to None, value between 0 and 1 (inclusive), indicates alpha parameter used in exponentially weighted moving average within feature derivation window.
- external_time_series_baseline_dataset_id: str, optional
(New in version v2.26) If provided, will generate metrics scaled by external model predictions metric for time series projects. The external predictions catalog must be validated before autopilot starts, see Project.validate_external_time_series_baseline and external baseline predictions documentation for further explanation.
- use_supervised_feature_reduction: bool, default True, optional
Time Series only. When true, during feature generation DataRobot runs a supervised algorithm to retain only qualifying features. Setting to false can severely impact autopilot duration, especially for datasets with many features.
- primary_location_column: str, optional.
The name of primary location column.
- protected_features: list of str, optional.
(New in version v2.24) A list of project features to mark as protected for Bias and Fairness testing calculations. Max number of protected features allowed is 10.
- preferable_target_value: str, optional.
(New in version v2.24) A target value that should be treated as a favorable outcome for the prediction. For example, if we want to check gender discrimination for giving a loan and our target is named is_bad, then the positive outcome for the prediction would be No, which means that the loan is good and that is what we treat as a favorable result for the loaner.
- fairness_metrics_set: str, optional.
(New in version v2.24) Metric to use for calculating fairness. Can be one of proportionalParity, equalParity, predictionBalance, trueFavorableAndUnfavorableRateParity or favorableAndUnfavorablePredictiveValueParity. Used and required only if the Bias & Fairness in AutoML feature is enabled.
- fairness_threshold: str, optional.
(New in version v2.24) Threshold value for the fairness metric. Can be in a range of [0.0, 1.0]. If the relative (i.e. normalized) fairness score is below the threshold, then the user will see a visual indication on the
- bias_mitigation_feature_namestr, optional
The feature from protected features that will be used in a bias mitigation task to mitigate bias
- bias_mitigation_techniquestr, optional
One of datarobot.enums.BiasMitigationTechnique. Options: ‘preprocessingReweighing’, ‘postProcessingRejectionOptionBasedClassification’. The technique by which we’ll mitigate bias, which will inform which bias mitigation task we insert into blueprints.
- include_bias_mitigation_feature_as_predictor_variablebool, optional
Whether we should also use the mitigation feature as an input to the modeler, just like any other categorical used for training, i.e. do we want the model to “train on” this feature in addition to using it for bias mitigation
- default_monotonic_increasing_featurelist_idstr, optional
Returned from server on Project GET request - not able to be updated by user
- default_monotonic_decreasing_featurelist_idstr, optional
Returned from server on Project GET request - not able to be updated by user
Examples
import datarobot as dr

advanced_options = dr.AdvancedOptions(
    weights='weights_column',
    offset=['offset_column'],
    exposure='exposure_column',
    response_cap=0.7,
    blueprint_threshold=2,
    smart_downsampled=True,
    majority_downsampling_rate=75.0)
- update_individual_options(**kwargs)¶
Update individual attributes of an instance of AdvancedOptions.
- Return type
None
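As a brief, hedged sketch of how this might be used together with the constructor shown above (the option values are illustrative):
import datarobot as dr

advanced_options = dr.AdvancedOptions(weights='weights_column', seed=42)
# Adjust individual attributes in place instead of rebuilding the whole object.
advanced_options.update_individual_options(seed=1234, blend_best_models=True)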
Anomaly Assessment¶
- class datarobot.models.anomaly_assessment.AnomalyAssessmentRecord(status, status_details, start_date, end_date, prediction_threshold, preview_location, delete_location, latest_explanations_location, **record_kwargs)¶
Object which keeps metadata about anomaly assessment insight for the particular subset, backtest and series and the links to proceed to get the anomaly assessment data.
New in version v2.25.
Notes
Record contains:
- record_id : the ID of the record.
- project_id : the project ID of the record.
- model_id : the model ID of the record.
- backtest : the backtest of the record.
- source : the source of the record.
- series_id : the series id of the record for the multiseries projects.
- status : the status of the insight.
- status_details : the explanation of the status.
- start_date : the ISO-formatted timestamp of the first prediction in the subset. Will be None if status is not AnomalyAssessmentStatus.COMPLETED.
- end_date : the ISO-formatted timestamp of the last prediction in the subset. Will be None if status is not AnomalyAssessmentStatus.COMPLETED.
- prediction_threshold : the threshold; all rows with anomaly scores greater or equal to it have shap explanations computed. Will be None if status is not AnomalyAssessmentStatus.COMPLETED.
- preview_location : URL to retrieve predictions preview for the subset. Will be None if status is not AnomalyAssessmentStatus.COMPLETED.
- latest_explanations_location : the URL to retrieve the latest predictions with the shap explanations. Will be None if status is not AnomalyAssessmentStatus.COMPLETED.
- delete_location : the URL to delete anomaly assessment record and relevant insight data.
- Attributes
- record_id: str
The ID of the record.
- project_id: str
The ID of the project record belongs to.
- model_id: str
The ID of the model record belongs to.
- backtest: int or “holdout”
The backtest of the record.
- source: “training” or “validation”
The source of the record
- series_id: str or None
The series id of the record for the multiseries projects. Defined only for the multiseries projects.
- status: str
The status of the insight. One of datarobot.enums.AnomalyAssessmentStatus
- status_details: str
The explanation of the status.
- start_date: str or None
See start_date info in Notes for more details.
- end_date: str or None
See end_date info in Notes for more details.
- prediction_threshold: float or None
See prediction_threshold info in Notes for more details.
- preview_location: str or None
See preview_location info in Notes for more details.
- latest_explanations_location: str or None
See latest_explanations_location info in Notes for more details.
- delete_location: str
The URL to delete anomaly assessment record and relevant insight data.
- classmethod list(project_id, model_id, backtest=None, source=None, series_id=None, limit=100, offset=0, with_data_only=False)¶
Retrieve the list of the anomaly assessment records for the project and model. Output can be filtered and limited.
- Parameters
- project_id: str
The ID of the project record belongs to.
- model_id: str
The ID of the model record belongs to.
- backtest: int or “holdout”
The backtest to filter records by.
- source: “training” or “validation”
The source to filter records by.
- series_id: str, optional
The series id to filter records by. Can be specified for multiseries projects.
- limit: int, optional
100 by default. At most this many results are returned.
- offset: int, optional
This many results will be skipped.
- with_data_only: bool, False by default
Filter by status == AnomalyAssessmentStatus.COMPLETED. If True, records with no data or not supported will be omitted.
- Returns
- AnomalyAssessmentRecord
The anomaly assessment record.
- Return type
List[AnomalyAssessmentRecord]
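A short usage sketch, assuming the project and model IDs below are placeholders for real ones:
from datarobot.models.anomaly_assessment import AnomalyAssessmentRecord

records = AnomalyAssessmentRecord.list(
    project_id="5a8ac9ab07a57a0001be501f",  # placeholder
    model_id="5a8ac9ab07a57a0001be5020",    # placeholder
    source="validation",
    with_data_only=True,  # skip records with no computed data
)
for record in records:
    print(record.backtest, record.series_id, record.status)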
- classmethod compute(project_id, model_id, backtest, source, series_id=None)¶
Request anomaly assessment insight computation on the specified subset.
- Parameters
- project_id: str
The ID of the project to compute insight for.
- model_id: str
The ID of the model to compute insight for.
- backtest: int or “holdout”
The backtest to compute insight for.
- source: “training” or “validation”
The source to compute insight for.
- series_id: str, optional
The series id to compute insight for. Required for multiseries projects.
- Returns
- AnomalyAssessmentRecord
The anomaly assessment record.
- Return type
- delete()¶
Delete anomaly assessment record with preview and explanations.
- Return type
None
- get_predictions_preview()¶
Retrieve aggregated predictions statistics for the anomaly assessment record.
- Returns
- AnomalyAssessmentPredictionsPreview
- Return type
- get_latest_explanations()¶
Retrieve latest predictions along with shap explanations for the most anomalous records.
- Returns
- AnomalyAssessmentExplanations
- Return type
- get_explanations(start_date=None, end_date=None, points_count=None)¶
Retrieve predictions along with shap explanations for the most anomalous records in the specified date range/for defined number of points. Two out of three parameters: start_date, end_date or points_count must be specified.
- Parameters
- start_date: str, optional
The start of the date range to get explanations in. Example: 2020-01-01T00:00:00.000000Z
- end_date: str, optional
The end of the date range to get explanations in. Example: 2020-10-01T00:00:00.000000Z
- points_count: int, optional
The number of the rows to return.
- Returns
- AnomalyAssessmentExplanations
- Return type
- get_explanations_data_in_regions(regions, prediction_threshold=0.0)¶
Get predictions along with explanations for the specified regions, sorted by predictions in descending order.
- Parameters
- regions: list of preview_bins
For each region explanations will be retrieved and merged.
- prediction_threshold: float, optional
If specified, only points with score greater or equal to the threshold will be returned.
- Returns
- dict in a form of {‘explanations’: explanations, ‘shap_base_value’: shap_base_value}
- Return type
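Putting the methods above together, a hedged sketch of requesting the insight and then pulling explanations for a date range (IDs and dates are placeholders; depending on timing the insight may still be computing, so check record.status):
from datarobot.models.anomaly_assessment import AnomalyAssessmentRecord

record = AnomalyAssessmentRecord.compute(
    project_id="5a8ac9ab07a57a0001be501f",  # placeholder
    model_id="5a8ac9ab07a57a0001be5020",    # placeholder
    backtest=0,
    source="validation",
)
# Two out of the three range parameters must be specified.
explanations = record.get_explanations(
    start_date="2020-01-01T00:00:00.000000Z",
    end_date="2020-10-01T00:00:00.000000Z",
)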
- class datarobot.models.anomaly_assessment.AnomalyAssessmentExplanations(shap_base_value, data, start_date, end_date, count, **record_kwargs)¶
Object which keeps predictions along with shap explanations for the most anomalous records in the specified date range/for defined number of points.
New in version v2.25.
Notes
AnomalyAssessmentExplanations contains:
- record_id : the id of the corresponding anomaly assessment record.
- project_id : the project ID of the corresponding anomaly assessment record.
- model_id : the model ID of the corresponding anomaly assessment record.
- backtest : the backtest of the corresponding anomaly assessment record.
- source : the source of the corresponding anomaly assessment record.
- series_id : the series id of the corresponding anomaly assessment record for the multiseries projects.
- start_date : the ISO-formatted first timestamp in the response. Will be None if there is no data in the specified range.
- end_date : the ISO-formatted last timestamp in the response. Will be None if there is no data in the specified range.
- count : the number of points in the response.
- shap_base_value : the shap base value.
- data : list of DataPoint objects in the specified date range.

DataPoint contains:
- shap_explanation : None or an array of up to 10 ShapleyFeatureContribution objects. Only rows with the highest anomaly scores have Shapley explanations calculated. Value is None if prediction is lower than prediction_threshold.
- timestamp (str) : ISO-formatted timestamp for the row.
- prediction (float) : the output of the model for this row.

ShapleyFeatureContribution contains:
- feature_value (str) : the feature value for this row. First 50 characters are returned.
- strength (float) : the shap value for this feature and row.
- feature (str) : the feature name.
- Attributes
- record_id: str
The ID of the record.
- project_id: str
The ID of the project record belongs to.
- model_id: str
The ID of the model record belongs to.
- backtest: int or “holdout”
The backtest of the record.
- source: “training” or “validation”
The source of the record.
- series_id: str or None
The series id of the record for the multiseries projects. Defined only for the multiseries projects.
- start_date: str or None
The ISO-formatted datetime of the first row in the data.
- end_date: str or None
The ISO-formatted datetime of the last row in the data.
- data: array of `data_point` objects or None
See data info in Notes for more details.
- shap_base_value: float
Shap base value.
- count: int
The number of points in the data.
- classmethod get(project_id, record_id, start_date=None, end_date=None, points_count=None)¶
Retrieve predictions along with shap explanations for the most anomalous records in the specified date range/for defined number of points. Two out of three parameters: start_date, end_date or points_count must be specified.
- Parameters
- project_id: str
The ID of the project.
- record_id: str
The ID of the anomaly assessment record.
- start_date: str, optional
The start of the date range to get explanations in. Example: 2020-01-01T00:00:00.000000Z
- end_date: str, optional
The end of the date range to get explanations in. Example: 2020-10-01T00:00:00.000000Z
- points_count: int, optional
The number of the rows to return.
- Returns
- AnomalyAssessmentExplanations
- Return type
- class datarobot.models.anomaly_assessment.AnomalyAssessmentPredictionsPreview(start_date, end_date, preview_bins, **record_kwargs)¶
Aggregated predictions over time for the corresponding anomaly assessment record. Intended to find the bins with highest anomaly scores.
New in version v2.25.
Notes
AnomalyAssessmentPredictionsPreview contains:
- record_id : the id of the corresponding anomaly assessment record.
- project_id : the project ID of the corresponding anomaly assessment record.
- model_id : the model ID of the corresponding anomaly assessment record.
- backtest : the backtest of the corresponding anomaly assessment record.
- source : the source of the corresponding anomaly assessment record.
- series_id : the series id of the corresponding anomaly assessment record for the multiseries projects.
- start_date : the ISO-formatted timestamp of the first prediction in the subset.
- end_date : the ISO-formatted timestamp of the last prediction in the subset.
- preview_bins : list of PreviewBin objects. The aggregated predictions for the subset. Bin boundaries may differ from actual start/end dates because this is an aggregation.

PreviewBin contains:
- start_date (str) : the ISO-formatted datetime of the start of the bin.
- end_date (str) : the ISO-formatted datetime of the end of the bin.
- avg_predicted (float or None) : the average prediction of the model in the bin. None if there are no entries in the bin.
- max_predicted (float or None) : the maximum prediction of the model in the bin. None if there are no entries in the bin.
- frequency (int) : the number of the rows in the bin.
- Attributes
- record_id: str
The ID of the record.
- project_id: str
The ID of the project record belongs to.
- model_id: str
The ID of the model record belongs to.
- backtest: int or “holdout”
The backtest of the record.
- source: “training” or “validation”
The source of the record
- series_id: str or None
The series id of the record for the multiseries projects. Defined only for the multiseries projects.
- start_date: str
the ISO-formatted timestamp of the first prediction in the subset.
- end_date: str
the ISO-formatted timestamp of the last prediction in the subset.
- preview_bins: list of preview_bin objects.
The aggregated predictions for the subset. See more info in Notes.
- classmethod get(project_id, record_id)¶
Retrieve aggregated predictions over time.
- Parameters
- project_id: str
The ID of the project.
- record_id: str
The ID of the anomaly assessment record.
- Returns
- AnomalyAssessmentPredictionsPreview
- Return type
- find_anomalous_regions(max_prediction_threshold=0.0)¶
- Sort preview bins by max_predicted value and select those with max predicted value greater or equal to max prediction threshold. Sort the result by max predicted value in descending order.
- Parameters
- max_prediction_threshold: float, optional
Return bins with maximum anomaly score greater or equal to max_prediction_threshold.
- Returns
- preview_bins: list of preview_bin
Filtered and sorted preview bins
- Return type
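A hedged sketch of combining the preview with region explanations; record is assumed to be an AnomalyAssessmentRecord obtained earlier (for example via list()), and the threshold values are illustrative:
# Find the bins with the highest anomaly scores, then fetch explanations for them.
preview = record.get_predictions_preview()
hot_bins = preview.find_anomalous_regions(max_prediction_threshold=0.8)
details = record.get_explanations_data_in_regions(hot_bins, prediction_threshold=0.8)
shap_base_value = details["shap_base_value"]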
Application¶
- class datarobot.Application(id, application_type_id, user_id, model_deployment_id, name, created_by, created_at, updated_at, datasets, cloud_provider, deployment_ids, pool_used, permissions, has_custom_logo, org_id, deployment_status_id=None, description=None, related_entities=None, application_template_type=None, deployment_name=None, deactivation_status_id=None, created_first_name=None, creator_last_name=None, creator_userhash=None, deployments=None)¶
An entity associated with a DataRobot Application.
- Attributes
- idstr
The ID of the created application.
- application_type_idstr
The ID of the type of the application.
- user_idstr
The ID of the user which created the application.
- model_deployment_idstr
The ID of the associated model deployment.
- deactivation_status_idstr or None
The ID of the status object to track the asynchronous app deactivation process status. Will be None if the app was never deactivated.
- namestr
The name of the application.
- created_bystr
The username of the user who created the application.
- created_atstr
The timestamp when the application was created.
- updated_atstr
The timestamp when the application was updated.
- datasetsList[str]
The list of dataset IDs associated with the application.
- creator_first_nameOptional[str]
Application creator first name. Optional.
- creator_last_nameOptional[str]
Application creator last name. Optional.
- creator_userhashOptional[str]
Application creator userhash. Optional.
- deployment_status_idstr
The ID of the status object to track the asynchronous deployment process status.
- descriptionstr
A description of the application.
- cloud_providerstr
The host of this application.
- deploymentsOptional[List[ApplicationDeployment]]
A list of deployment details. Optional.
- deployment_idsList[str]
A list of deployment IDs for this app.
- deployment_nameOptional[str]
Name of the deployment. Optional.
- application_template_typeOptional[str]
Application template type, purpose. Optional.
- pool_usedbool
Whether the pool was used for the last app deployment.
- permissionsList[str]
The list of permitted actions, which the authenticated user can perform on this application. Permissions should be ApplicationPermission options.
- has_custom_logobool
Whether the app has a custom logo.
- related_entitiesOptional[ApplicationRelatedEntity]
IDs of entities related to the app, for easy search.
- org_idstr
ID of the app’s organization.
- classmethod list(offset=None, limit=None, use_cases=None)¶
Retrieve a list of user applications.
- Parameters
- offsetOptional[int]
Optional. Retrieve applications in a list after this number.
- limitOptional[int]
Optional. Retrieve only this number of applications.
- use_cases: Optional[Union[UseCase, List[UseCase], str, List[str]]]
Optional. Filter available Applications by a specific Use Case or Use Cases. Accepts either the entity or the ID.
- Returns
- applicationsList[Application]
The requested list of user applications.
- Return type
List[Application]
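A minimal sketch of listing applications and then fetching one by ID:
from datarobot import Application

apps = Application.list(limit=10)
for app in apps:
    print(app.id, app.name)

# Retrieve a single application by its ID.
app = Application.get(apps[0].id)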
- classmethod get(application_id)¶
Retrieve a single application.
- Parameters
- application_idstr
The ID of the application to retrieve.
- Returns
- applicationApplication
The requested application.
- Return type
Batch Predictions¶
- class datarobot.models.BatchPredictionJob(data, completed_resource_url=None)¶
A Batch Prediction Job is used to score large data sets on prediction servers using the Batch Prediction API.
- Attributes
- idstr
the id of the job
- classmethod score(deployment, intake_settings=None, output_settings=None, csv_settings=None, timeseries_settings=None, num_concurrent=None, chunk_size=None, passthrough_columns=None, passthrough_columns_set=None, max_explanations=None, max_ngram_explanations=None, threshold_high=None, threshold_low=None, prediction_warning_enabled=None, include_prediction_status=False, skip_drift_tracking=False, prediction_instance=None, abort_on_error=True, column_names_remapping=None, include_probabilities=True, include_probabilities_classes=None, download_timeout=120, download_read_timeout=660, upload_read_timeout=600, explanations_mode=None)¶
Create new batch prediction job, upload the scoring dataset and return a batch prediction job.
The default intake and output options are both localFile, which requires the caller to pass the file parameter and either download the results using the download() method afterwards or pass a path to a file where the scored data will be downloaded.
- Returns
- BatchPredictionJob
Instance of BatchPredictionJob
- Attributes
- deploymentDeployment or string ID
Deployment which will be used for scoring.
- intake_settingsdict (optional)
A dict configuring where data is coming from. Supported options:
type : string, either localFile, s3, azure, gcp, dataset, jdbc, snowflake, synapse or bigquery
Note that to pass a dataset, you not only need to specify the type parameter as dataset, but you must also set the dataset parameter as a dr.Dataset object.
To score from a local file, add this parameter to the settings:
file : file-like object, string path to file or a pandas.DataFrame of scoring data
To score from S3, add the next parameters to the settings:
url : string, the URL to score (e.g.: s3://bucket/key)
credential_id : string (optional)
endpoint_url : string (optional), any non-default endpoint URL for S3 access (omit to use the default)
To score from JDBC, add the next parameters to the settings:
data_store_id : string, the ID of the external data store connected to the JDBC data source (see Database Connectivity).
query : string (optional if table, schema and/or catalog is specified), a self-supplied SELECT statement of the data set you wish to predict.
table : string (optional if query is specified), the name of specified database table.
schema : string (optional if query is specified), the name of specified database schema.
catalog : string (optional if query is specified), (new in v2.22) the name of specified database catalog.
fetch_size : int (optional), Changing the fetchSize can be used to balance throughput and memory usage.
credential_id : string (optional) the ID of the credentials holding information about a user with read-access to the JDBC data source (see Credentials).
- output_settingsdict (optional)
A dict configuring how scored data is to be saved. Supported options:
type : string, either localFile, s3, azure, gcp, jdbc, snowflake, synapse or bigquery
To save scored data to a local file, add this parameter to the settings:
path : string (optional), path to save the scored data as CSV. If a path is not specified, you must download the scored data yourself with job.download(). If a path is specified, the call will block until the job is done. If there are no other jobs currently processing for the targeted prediction instance, uploading, scoring and downloading will happen in parallel without waiting for a full job to complete. Otherwise, it will still block, but start downloading the scored data as soon as it starts generating data. This is the fastest method to get predictions.
To save scored data to S3, add the next parameters to the settings:
url : string, the URL for storing the results (e.g.: s3://bucket/key)
credential_id : string (optional)
endpoint_url : string (optional), any non-default endpoint URL for S3 access (omit to use the default)
To save scored data to JDBC, add the next parameters to the settings:
data_store_id : string, the ID of the external data store connected to the JDBC data source (see Database Connectivity).
table : string, the name of specified database table.
schema : string (optional), the name of specified database schema.
catalog : string (optional), (new in v2.22) the name of specified database catalog.
statement_type : string, the type of insertion statement to create, one of datarobot.enums.AVAILABLE_STATEMENT_TYPES.
update_columns : list(string) (optional), a list of strings containing those column names to be updated in case statement_type is set to a value related to update or upsert.
where_columns : list(string) (optional), a list of strings containing those column names to be selected in case statement_type is set to a value related to insert or update.
credential_id : string, the ID of the credentials holding information about a user with write-access to the JDBC data source (see Credentials).
create_table_if_not_exists : bool (optional), If no existing table is detected, attempt to create it before writing data with the strategy defined in the statementType parameter.
- csv_settingsdict (optional)
CSV intake and output settings. Supported options:
delimiter : string (optional, default ,), fields are delimited by this character. Use the string tab to denote TSV (TAB separated values). Must be either a one-character string or the string tab.
quotechar : string (optional, default “), fields containing the delimiter must be quoted using this character.
encoding : string (optional, default utf-8), encoding for the CSV files. For example (but not limited to): shift_jis, latin_1 or mskanji.
- timeseries_settingsdict (optional)
Configuration for time-series scoring. Supported options:
type : string, must be forecast or historical (default if not passed is forecast). forecast mode makes predictions using forecast_point or rows in the dataset without target. historical enables bulk prediction mode which calculates predictions for all possible forecast points and forecast distances in the dataset within predictions_start_date/predictions_end_date range.
forecast_point : datetime (optional), forecast point for the dataset, used for the forecast predictions; by default the value will be inferred from the dataset. May be passed if timeseries_settings.type=forecast.
predictions_start_date : datetime (optional), used for historical predictions in order to override the date from which predictions should be calculated. By default the value will be inferred automatically from the dataset. May be passed if timeseries_settings.type=historical.
predictions_end_date : datetime (optional), used for historical predictions in order to override the date up to which predictions should be calculated. By default the value will be inferred automatically from the dataset. May be passed if timeseries_settings.type=historical.
relax_known_in_advance_features_check : bool (default False). If True, missing values in the known in advance features are allowed in the forecast window at the prediction time. If omitted or False, missing values are not allowed.
- num_concurrentint (optional)
Number of concurrent chunks to score simultaneously. Defaults to the available number of cores of the deployment. Lower it to leave resources for real-time scoring.
- chunk_sizestring or int (optional)
Which strategy should be used to determine the chunk size. Can be either a named strategy or a fixed size in bytes: auto (use fixed or dynamic based on flipper), fixed (use 1MB for explanations, 5MB for regular requests), dynamic (use dynamic chunk sizes), or an int (use this many bytes per chunk).
- passthrough_columnslist[string] (optional)
Keep these columns from the scoring dataset in the scored dataset. This is useful for correlating predictions with source data.
- passthrough_columns_setstring (optional)
To pass through every column from the scoring dataset, set this to all. Takes precedence over passthrough_columns if set.
- max_explanationsint (optional)
Compute prediction explanations for this amount of features.
- max_ngram_explanationsint or str (optional)
Compute text explanations for this amount of ngrams. Set to all to return all ngram explanations, or set to a positive integer value to limit the amount of ngram explanations returned. By default no ngram explanations will be computed and returned.
- threshold_highfloat (optional)
Only compute prediction explanations for predictions above this threshold. Can be combined with threshold_low.
- threshold_lowfloat (optional)
Only compute prediction explanations for predictions below this threshold. Can be combined with threshold_high.
- explanations_modePredictionExplanationsMode, optional
Mode of prediction explanations calculation for multiclass and clustering models, if not specified - server default is to explain only the predicted class, identical to passing TopPredictionsMode(1).
- prediction_warning_enabledboolean (optional)
Add prediction warnings to the scored data. Currently only supported for regression models.
- include_prediction_statusboolean (optional)
Include the prediction_status column in the output, defaults to False.
- skip_drift_trackingboolean (optional)
Skips drift tracking on any predictions made from this job. This is useful when running non-production workloads to not affect drift tracking and cause unnecessary alerts. Defaults to False.
- prediction_instancedict (optional)
Defaults to instance specified by deployment or system configuration. Supported options:
hostName : string
sslEnabled : boolean (optional, default true). Set to false to run prediction requests from the batch prediction job without SSL.
datarobotKey : string (optional), if running a job against a prediction instance in the Managed AI Cloud, you must provide the organization level DataRobot-Key
apiKey : string (optional), by default, prediction requests will use the API key of the user that created the job. This allows you to make requests on behalf of other users.
- abort_on_errorboolean (optional)
Default behavior is to abort the job if too many rows fail scoring. This will free up resources for other jobs that may score successfully. Set to false to unconditionally score every row no matter how many errors are encountered. Defaults to True.
- column_names_remappingdict (optional)
Mapping with column renaming for output table. Defaults to {}.
- include_probabilitiesboolean (optional)
Flag that enables returning of all probability columns. Defaults to True.
- include_probabilities_classeslist (optional)
List the subset of classes if a user doesn’t want all the classes. Defaults to [].
- download_timeoutint (optional)
New in version 2.22.
If using localFile output, wait this many seconds for the download to become available. See download().
- download_read_timeoutint (optional, default 660)
New in version 2.22.
If using localFile output, wait this many seconds for the server to respond between chunks.
- upload_read_timeout: int (optional, default 600)
New in version 2.28.
If using localFile intake, wait this many seconds for the server to respond after whole dataset upload.
- Return type
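A minimal local-file scoring sketch using the default localFile intake and output described above; the deployment ID and file paths are placeholders:
import datarobot as dr

job = dr.BatchPredictionJob.score(
    deployment="5a8ac9ab07a57a0001be501f",  # placeholder deployment ID
    intake_settings={
        "type": "localFile",
        "file": "to_predict.csv",  # placeholder path to the scoring data
    },
    output_settings={
        "type": "localFile",
        "path": "predicted.csv",  # placeholder path; the call blocks until scoring is done
    },
)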
- classmethod apply_time_series_data_prep_and_score(deployment, intake_settings, timeseries_settings, **kwargs)¶
Prepare the dataset with time series data prep, create new batch prediction job, upload the scoring dataset, and return a batch prediction job.
The supported intake_settings are of type localFile or dataset.
For timeseries_settings of type forecast the forecast_point must be specified.
Refer to the datarobot.models.BatchPredictionJob.score() method for details on the other kwargs parameters.
New in version v3.1.
- Returns
- BatchPredictionJob
Instance of BatchPredictionJob
- Raises
- InvalidUsageError
If the deployment does not support time series data prep. If the intake type is not supported for time series data prep.
- Attributes
- deploymentDeployment
Deployment which will be used for scoring.
- intake_settingsdict
A dict configuring where data is coming from. Supported options:
type : string, either localFile, dataset
Note that to pass a dataset, you not only need to specify the type parameter as dataset, but you must also set the dataset parameter as a Dataset object.
To score from a local file, add this parameter to the settings:
file : file-like object, string path to file or a pandas.DataFrame of scoring data.
- timeseries_settingsdict
Configuration for time-series scoring. Supported options:
type : string, must be forecast or historical (default if not passed is forecast). forecast mode makes predictions using forecast_point. historical enables bulk prediction mode which calculates predictions for all possible forecast points and forecast distances in the dataset within predictions_start_date/predictions_end_date range.
forecast_point : datetime (optional), forecast point for the dataset, used for the forecast predictions. Must be passed if timeseries_settings.type=forecast.
predictions_start_date : datetime (optional), used for historical predictions in order to override the date from which predictions should be calculated. By default the value will be inferred automatically from the dataset. May be passed if timeseries_settings.type=historical.
predictions_end_date : datetime (optional), used for historical predictions in order to override the date up to which predictions should be calculated. By default the value will be inferred automatically from the dataset. May be passed if timeseries_settings.type=historical.
relax_known_in_advance_features_check : bool (default False). If True, missing values in the known in advance features are allowed in the forecast window at the prediction time. If omitted or False, missing values are not allowed.
- Return type
- classmethod score_to_file(deployment, intake_path, output_path, **kwargs)¶
Create new batch prediction job, upload the scoring dataset and download the scored CSV file concurrently.
Will block until the entire file is scored.
Refer to the datarobot.models.BatchPredictionJob.score() method for details on the other kwargs parameters.
- Returns
- BatchPredictionJob
Instance of BatchPredictionJob
- Attributes
- deploymentDeployment or string ID
Deployment which will be used for scoring.
- intake_pathfile-like object/string path to file/pandas.DataFrame
Scoring data
- output_pathstr
Filename to save the result under
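A hedged sketch of the same local-file flow using score_to_file, which blocks until the whole file is scored (the ID and paths are placeholders):
import datarobot as dr

dr.BatchPredictionJob.score_to_file(
    deployment="5a8ac9ab07a57a0001be501f",  # placeholder deployment ID
    intake_path="to_predict.csv",           # placeholder input path
    output_path="predicted.csv",            # placeholder output path
)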
- classmethod apply_time_series_data_prep_and_score_to_file(deployment, intake_path, output_path, timeseries_settings, **kwargs)¶
Prepare the input dataset with time series data prep. Then, create a new batch prediction job using the prepared AI catalog item as input and concurrently download the scored CSV file.
The function call will return when the entire file is scored.
For timeseries_settings of type forecast the forecast_point must be specified.
Refer to the datarobot.models.BatchPredictionJob.score() method for details on the other kwargs parameters.
New in version v3.1.
- Returns
- BatchPredictionJob
Instance of BatchPredictionJob.
- Raises
- InvalidUsageError
If the deployment does not support time series data prep.
- Attributes
- deploymentDeployment
The deployment which will be used for scoring.
- intake_pathfile-like object/string path to file/pandas.DataFrame
The scoring data.
- output_pathstr
The filename under which you save the result.
- timeseries_settingsdict
Configuration for time-series scoring. Supported options:
type : string, must be forecast or historical (default if not passed is forecast). forecast mode makes predictions using forecast_point. historical enables bulk prediction mode which calculates predictions for all possible forecast points and forecast distances in the dataset within predictions_start_date/predictions_end_date range.
forecast_point : datetime (optional), forecast point for the dataset, used for the forecast predictions. Must be passed if timeseries_settings.type=forecast.
predictions_start_date : datetime (optional), used for historical predictions in order to override the date from which predictions should be calculated. By default the value will be inferred automatically from the dataset. May be passed if timeseries_settings.type=historical.
predictions_end_date : datetime (optional), used for historical predictions in order to override the date up to which predictions should be calculated. By default the value will be inferred automatically from the dataset. May be passed if timeseries_settings.type=historical.
relax_known_in_advance_features_check : bool (default False). If True, missing values in the known in advance features are allowed in the forecast window at the prediction time. If omitted or False, missing values are not allowed.
- Return type
- classmethod score_s3(deployment, source_url, destination_url, credential=None, endpoint_url=None, **kwargs)¶
Create new batch prediction job, with a scoring dataset from S3 and writing the result back to S3.
This returns immediately after the job has been created. You must poll for job completion using get_status() or wait_for_completion() (see datarobot.models.Job)
Refer to the datarobot.models.BatchPredictionJob.score() method for details on the other kwargs parameters.
- Returns
- BatchPredictionJob
Instance of BatchPredictionJob
- Attributes
- deploymentDeployment or string ID
Deployment which will be used for scoring.
- source_urlstring
The URL for the prediction dataset (e.g.: s3://bucket/key)
- destination_urlstring
The URL for the scored dataset (e.g.: s3://bucket/key)
- credentialstring or Credential (optional)
The AWS Credential object or credential id
- endpoint_urlstring (optional)
Any non-default endpoint URL for S3 access (omit to use the default)
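A brief sketch of S3-to-S3 scoring; since the call returns as soon as the job is created, completion is polled afterwards (the IDs and URLs are placeholders):
import datarobot as dr

job = dr.BatchPredictionJob.score_s3(
    deployment="5a8ac9ab07a57a0001be501f",     # placeholder deployment ID
    source_url="s3://bucket/scoring.csv",      # placeholder intake URL
    destination_url="s3://bucket/scored.csv",  # placeholder output URL
    credential="5a8ac9ab07a57a0001be5021",     # placeholder credential ID (or a Credential object)
)
job.wait_for_completion()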
- classmethod score_azure(deployment, source_url, destination_url, credential=None, **kwargs)¶
Create new batch prediction job, with a scoring dataset from Azure blob storage and writing the result back to Azure blob storage.
This returns immediately after the job has been created. You must poll for job completion using get_status() or wait_for_completion() (see datarobot.models.Job).
Refer to the datarobot.models.BatchPredictionJob.score() method for details on the other kwargs parameters.
- Returns
- BatchPredictionJob
Instance of BatchPredictionJob
- Attributes
- deploymentDeployment or string ID
Deployment which will be used for scoring.
- source_urlstring
The URL for the prediction dataset (e.g.: https://storage_account.blob.endpoint/container/blob_name)
- destination_urlstring
The URL for the scored dataset (e.g.: https://storage_account.blob.endpoint/container/blob_name)
- credentialstring or Credential (optional)
The Azure Credential object or credential id
- classmethod score_gcp(deployment, source_url, destination_url, credential=None, **kwargs)¶
Create new batch prediction job, with a scoring dataset from Google Cloud Storage and writing the result back to one.
This returns immediately after the job has been created. You must poll for job completion using get_status() or wait_for_completion() (see datarobot.models.Job).
Refer to the datarobot.models.BatchPredictionJob.score() method for details on the other kwargs parameters.
- Returns
- BatchPredictionJob
Instance of BatchPredictionJob
- Attributes
- deploymentDeployment or string ID
Deployment which will be used for scoring.
- source_urlstring
The URL for the prediction dataset (e.g.: http(s)://storage.googleapis.com/[bucket]/[object])
- destination_urlstring
The URL for the scored dataset (e.g.: http(s)://storage.googleapis.com/[bucket]/[object])
- credentialstring or Credential (optional)
The GCP Credential object or credential id
- classmethod score_from_existing(batch_prediction_job_id)¶
Create a new batch prediction job based on the settings from a previously created one
- Returns
- BatchPredictionJob
Instance of BatchPredictionJob
- Attributes
- batch_prediction_job_id: str
ID of the previous batch prediction job
- Return type
- classmethod score_pandas(deployment, df, read_timeout=660, **kwargs)¶
Run a batch prediction job, with a scoring dataset from a pandas dataframe. The output from the prediction will be joined to the passed DataFrame and returned.
Use columnNamesRemapping to drop or rename columns in the output
This method blocks until the job has completed or raises an exception on errors.
Refer to the datarobot.models.BatchPredictionJob.score() method for details on the other kwargs parameters.
- Returns
- BatchPredictionJob
Instance of BatchPredictionJob
- pandas.DataFrame
The original dataframe merged with the predictions
- Attributes
- deploymentDeployment or string ID
Deployment which will be used for scoring.
- dfpandas.DataFrame
The dataframe to score
- Return type
Tuple[BatchPredictionJob, DataFrame]
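A short sketch of scoring a pandas DataFrame; the call blocks and returns both the job and the merged result (the path and deployment ID are placeholders):
import datarobot as dr
import pandas as pd

df = pd.read_csv("to_predict.csv")  # placeholder path
job, scored_df = dr.BatchPredictionJob.score_pandas(
    deployment="5a8ac9ab07a57a0001be501f",  # placeholder deployment ID
    df=df,
)
print(scored_df.head())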
- classmethod get(batch_prediction_job_id)¶
Get batch prediction job
- Returns
- BatchPredictionJob
Instance of BatchPredictionJob
- Attributes
- batch_prediction_job_id: str
ID of batch prediction job
- Return type
- download(fileobj, timeout=120, read_timeout=660)¶
Downloads the CSV result of a prediction job
- Attributes
- fileobj: A file-like object where the CSV prediction results will be written to.
Examples include an in-memory buffer (e.g., io.BytesIO) or a file on disk (opened for binary writing).
- timeoutint (optional, default 120)
New in version 2.22.
Seconds to wait for the download to become available.
The download will not be available before the job has started processing. In case other jobs are occupying the queue, processing may not start immediately.
If the timeout is reached, the job will be aborted and RuntimeError is raised.
Set to -1 to wait infinitely.
- read_timeoutint (optional, default 660)
New in version 2.22.
Seconds to wait for the server to respond between chunks.
- Return type
None
- delete(ignore_404_errors=False)¶
Cancel this job. If this job has not finished running, it will be removed and canceled.
- Return type
None
- get_status()¶
Get status of batch prediction job
- Returns
- BatchPredictionJob status data
Dict with job status
- classmethod list_by_status(statuses=None)¶
Get jobs collection for specific set of statuses
- Returns
- BatchPredictionJob statuses
List of job statuses dicts with specific statuses
- Attributes
- statuses
List of statuses to filter jobs by ([ABORTED|COMPLETED…]). If statuses is not provided, returns all jobs for the user.
- Return type
List[BatchPredictionJob]
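A minimal sketch of filtering jobs by status, using the status names mentioned above:
import datarobot as dr

# Only jobs in the given statuses are returned.
finished = dr.BatchPredictionJob.list_by_status(["ABORTED", "COMPLETED"])

# With no argument, all of the user's batch prediction jobs are returned.
all_jobs = dr.BatchPredictionJob.list_by_status()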
- class datarobot.models.BatchPredictionJobDefinition(id=None, name=None, enabled=None, schedule=None, batch_prediction_job=None, created=None, updated=None, created_by=None, updated_by=None, last_failed_run_time=None, last_successful_run_time=None, last_started_job_status=None, last_scheduled_run_time=None)¶
- classmethod get(batch_prediction_job_definition_id)¶
Get batch prediction job definition
- Returns
- BatchPredictionJobDefinition
Instance of BatchPredictionJobDefinition
Examples
>>> import datarobot as dr
>>> definition = dr.BatchPredictionJobDefinition.get('5a8ac9ab07a57a0001be501f')
>>> definition
BatchPredictionJobDefinition(60912e09fd1f04e832a575c1)
- Attributes
- batch_prediction_job_definition_id: str
ID of batch prediction job definition
- Return type
- classmethod list()¶
Get all job definitions
- Returns
- List[BatchPredictionJobDefinition]
List of job definitions the user has access to see
Examples
>>> import datarobot as dr
>>> definition = dr.BatchPredictionJobDefinition.list()
>>> definition
[
    BatchPredictionJobDefinition(60912e09fd1f04e832a575c1),
    BatchPredictionJobDefinition(6086ba053f3ef731e81af3ca)
]
- Return type
- classmethod create(enabled, batch_prediction_job, name=None, schedule=None)¶
Creates a new batch prediction job definition to be run either at scheduled interval or as a manual run.
- Returns
- BatchPredictionJobDefinition
Instance of BatchPredictionJobDefinition
Examples
>>> import datarobot as dr
>>> job_spec = {
...     "num_concurrent": 4,
...     "deployment_id": "foobar",
...     "intake_settings": {
...         "url": "s3://foobar/123",
...         "type": "s3",
...         "format": "csv"
...     },
...     "output_settings": {
...         "url": "s3://foobar/123",
...         "type": "s3",
...         "format": "csv"
...     },
... }
>>> schedule = {
...     "day_of_week": [1],
...     "month": ["*"],
...     "hour": [16],
...     "minute": [0],
...     "day_of_month": [1]
... }
>>> definition = BatchPredictionJobDefinition.create(
...     enabled=False,
...     batch_prediction_job=job_spec,
...     name="some_definition_name",
...     schedule=schedule
... )
>>> definition
BatchPredictionJobDefinition(60912e09fd1f04e832a575c1)
- Attributes
- enabledbool (default False)
Whether or not the definition should be active on a scheduled basis. If True, schedule is required.
- batch_prediction_job: dict
The job specifications for your batch prediction job. It requires the same job input parameters as used with score(), except that it will not initialize scoring; it only stores the specification as a definition for later use.
- namestring (optional)
The name you want your job to be identified with. Must be unique across the organization’s existing jobs. If you don’t supply a name, a random one will be generated for you.
- scheduledict (optional)
The schedule payload defines at what intervals the job should run, which can be combined in various ways to construct complex scheduling terms if needed. In all of the elements in the objects, you can supply either an asterisk ["*"] denoting “every” time denomination or an array of integers (e.g. [1, 2, 3]) to define a specific interval.

The schedule payload is split up in the following items:

Minute: The minute(s) of the day that the job will run. Allowed values are either ["*"] meaning every minute of the day or [0 ... 59].

Hour: The hour(s) of the day that the job will run. Allowed values are either ["*"] meaning every hour of the day or [0 ... 23].

Day of Month: The date(s) of the month that the job will run. Allowed values are either [1 ... 31] or ["*"] for all days of the month. This field is additive with dayOfWeek, meaning the job will run both on the date(s) defined in this field and the day specified by dayOfWeek (for example, dates 1st, 2nd, 3rd, plus every Tuesday). If dayOfMonth is set to ["*"] and dayOfWeek is defined, the scheduler will trigger on every day of the month that matches dayOfWeek (for example, Tuesday the 2nd, 9th, 16th, 23rd, 30th). Invalid dates such as February 31st are ignored.

Month: The month(s) of the year that the job will run. Allowed values are either [1 ... 12] or ["*"] for all months of the year. Strings, either 3-letter abbreviations or the full name of the month, can be used interchangeably (e.g., “jan” or “october”). Months that are not compatible with dayOfMonth are ignored, for example {"dayOfMonth": [31], "month": ["feb"]}.

Day of Week: The day(s) of the week that the job will run. Allowed values are [0 .. 6], where (Sunday=0), or ["*"], for all days of the week. Strings, either 3-letter abbreviations or the full name of the day, can be used interchangeably (e.g., “sunday”, “Sunday”, “sun”, or “Sun” all map to [0]). This field is additive with dayOfMonth, meaning the job will run both on the date specified by dayOfMonth and the day defined in this field.
- Return type
- update(enabled, batch_prediction_job=None, name=None, schedule=None)¶
Updates a job definition with the changed specs.
Takes the same input as create().
- Returns
- BatchPredictionJobDefinition
Instance of the updated BatchPredictionJobDefinition
Examples
>>> import datarobot as dr
>>> job_spec = {
...     "num_concurrent": 5,
...     "deployment_id": "foobar_new",
...     "intake_settings": {
...         "url": "s3://foobar/123",
...         "type": "s3",
...         "format": "csv"
...     },
...     "output_settings": {
...         "url": "s3://foobar/123",
...         "type": "s3",
...         "format": "csv"
...     },
... }
>>> schedule = {
...     "day_of_week": [1],
...     "month": ["*"],
...     "hour": ["*"],
...     "minute": [30, 59],
...     "day_of_month": [1, 2, 6]
... }
>>> definition = BatchPredictionJobDefinition.create(
...     enabled=False,
...     batch_prediction_job=job_spec,
...     name="updated_definition_name",
...     schedule=schedule
... )
>>> definition
BatchPredictionJobDefinition(60912e09fd1f04e832a575c1)
- Attributes
- Return type
- run_on_schedule(schedule)¶
Sets the run schedule of an already created job definition.
If the job was previously not enabled, this will also set the job to enabled.
- Returns
- BatchPredictionJobDefinition
Instance of the updated BatchPredictionJobDefinition with the new / updated schedule.
Examples
>>> import datarobot as dr
>>> definition = dr.BatchPredictionJobDefinition.create('...')
>>> schedule = {
...     "day_of_week": [1],
...     "month": ["*"],
...     "hour": ["*"],
...     "minute": [30, 59],
...     "day_of_month": [1, 2, 6]
... }
>>> definition.run_on_schedule(schedule)
BatchPredictionJobDefinition(60912e09fd1f04e832a575c1)
- Attributes
- scheduledict
Same as schedule in create().
- Return type
- run_once()¶
Manually submits a batch prediction job to the queue, based off of an already created job definition.
- Returns
- BatchPredictionJob
Instance of BatchPredictionJob
Examples
>>> import datarobot as dr
>>> definition = dr.BatchPredictionJobDefinition.create('...')
>>> job = definition.run_once()
>>> job.wait_for_completion()
- Return type
- delete()¶
Deletes the job definition and disables any future schedules of this job if any. If a scheduled job is currently running, this will not be cancelled.
Examples
>>> import datarobot as dr
>>> definition = dr.BatchPredictionJobDefinition.get('5a8ac9ab07a57a0001be501f')
>>> definition.delete()
- Return type
None
Batch Monitoring¶
- class datarobot.models.BatchMonitoringJob(data, completed_resource_url=None)¶
A Batch Monitoring Job is used to monitor data sets outside the DataRobot app.
- Attributes
- idstr
the id of the job
- classmethod get(project_id, job_id)¶
Get batch monitoring job
- Returns
- BatchMonitoringJob
Instance of BatchMonitoringJob
- Attributes
- job_id: str
ID of batch job
- Return type
- download(fileobj, timeout=120, read_timeout=660)¶
Downloads the results of a monitoring job as a CSV.
- Attributes
- fileobj: A file-like object where the CSV monitoring results will be written to.
Examples include an in-memory buffer (e.g., io.BytesIO) or a file on disk (opened for binary writing).
- timeoutint (optional, default 120)
Seconds to wait for the download to become available.
The download will not be available before the job has started processing. In case other jobs are occupying the queue, processing may not start immediately.
If the timeout is reached, the job will be aborted and RuntimeError is raised.
Set to -1 to wait infinitely.
- read_timeoutint (optional, default 660)
Seconds to wait for the server to respond between chunks.
- Return type
None
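For illustration, a minimal sketch of saving the results to disk, assuming job is an existing BatchMonitoringJob instance (for example, one returned by run() below); the file name is a placeholder:
# A sketch: `job` is assumed to be a BatchMonitoringJob instance.
with open("monitoring_results.csv", "wb") as results_csv:
    job.download(results_csv, timeout=300)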
- classmethod run(deployment, intake_settings=None, output_settings=None, csv_settings=None, num_concurrent=None, chunk_size=None, abort_on_error=True, monitoring_aggregation=None, monitoring_columns=None, monitoring_output_settings=None, download_timeout=120, download_read_timeout=660, upload_read_timeout=600)¶
Create a new batch monitoring job, upload the dataset, and return a batch monitoring job.
- Returns
- BatchMonitoringJob
Instance of BatchMonitoringJob
Examples
>>> import datarobot as dr
>>> job_spec = {
...     "intake_settings": {
...         "type": "jdbc",
...         "data_store_id": "645043933d4fbc3215f17e34",
...         "catalog": "SANDBOX",
...         "table": "10kDiabetes_output_actuals",
...         "schema": "SCORING_CODE_UDF_SCHEMA",
...         "credential_id": "645043b61a158045f66fb329"
...     },
...     "monitoring_columns": {
...         "predictions_columns": [
...             {
...                 "class_name": "True",
...                 "column_name": "readmitted_True_PREDICTION"
...             },
...             {
...                 "class_name": "False",
...                 "column_name": "readmitted_False_PREDICTION"
...             }
...         ],
...         "association_id_column": "rowID",
...         "actuals_value_column": "ACTUALS"
...     }
... }
>>> deployment_id = "foobar"
>>> job = dr.BatchMonitoringJob.run(deployment_id, **job_spec)
>>> job.wait_for_completion()
- Attributes
- deploymentDeployment or string ID
Deployment which will be used for monitoring.
- intake_settingsdict
A dict configuring where the data to monitor comes from. Supported options:
type : string, either localFile, s3, azure, gcp, dataset, jdbc, snowflake, synapse or bigquery
Note that to pass a dataset, you not only need to specify the type parameter as dataset, but you must also set the dataset parameter as a dr.Dataset object.
To monitor from a local file, add this parameter to the settings:
file : A file-like object, string path to a file or a pandas.DataFrame of scoring data.
To monitor from S3, add the next parameters to the settings:
url : string, the URL to score (e.g.: s3://bucket/key).
credential_id : string (optional).
endpoint_url : string (optional), any non-default endpoint URL for S3 access (omit to use the default).
To monitor from JDBC, add the next parameters to the settings:
data_store_id : string, the ID of the external data store connected to the JDBC data source (see Database Connectivity).
query : string (optional if table, schema and/or catalog is specified), a self-supplied SELECT statement of the data set you wish to predict.
table : string (optional if query is specified), the name of specified database table.
schema : string (optional if query is specified), the name of specified database schema.
catalog : string (optional if query is specified), (new in v2.22) the name of specified database catalog.
fetch_size : int (optional), Changing the fetchSize can be used to balance throughput and memory usage.
credential_id : string (optional) the ID of the credentials holding information about a user with read-access to the JDBC data source (see Credentials).
- output_settingsdict (optional)
A dict configuring how monitored data is to be saved. Supported options:
type : string, either localFile, s3, azure, gcp, jdbc, snowflake, synapse or bigquery
To save monitored data to a local file, add parameters to the settings:
path : string (optional), path to save the scored data as CSV. If a path is not specified, you must download the scored data yourself with job.download(). If a path is specified, the call will block until the job is done. If there are no other jobs currently processing for the targeted prediction instance, uploading, scoring, and downloading will happen in parallel without waiting for a full job to complete. Otherwise, it will still block, but start downloading the scored data as soon as it starts generating data. This is the fastest method to get predictions.
To save monitored data to S3, add the next parameters to the settings:
url : string, the URL for storing the results (e.g.: s3://bucket/key).
credential_id : string (optional).
endpoint_url : string (optional), any non-default endpoint URL for S3 access (omit to use the default).
To save monitored data to JDBC, add the next parameters to the settings:
data_store_id : string, the ID of the external data store connected to the JDBC data source (see Database Connectivity).
table : string, the name of specified database table.
schema : string (optional), the name of specified database schema.
catalog : string (optional), (new in v2.22) the name of specified database catalog.
statement_type : string, the type of insertion statement to create, one of datarobot.enums.AVAILABLE_STATEMENT_TYPES.
update_columns : list(string) (optional), a list of strings containing those column names to be updated in case statement_type is set to a value related to update or upsert.
where_columns : list(string) (optional), a list of strings containing those column names to be selected in case statement_type is set to a value related to insert or update.
credential_id : string, the ID of the credentials holding information about a user with write-access to the JDBC data source (see Credentials).
create_table_if_not_exists : bool (optional), If no existing table is detected, attempt to create it before writing data with the strategy defined in the statementType parameter.
- csv_settingsdict (optional)
CSV intake and output settings. Supported options:
delimiter : string (optional, default ,), fields are delimited by this character. Use the string tab to denote TSV (TAB separated values). Must be either a one-character string or the string tab.
quotechar : string (optional, default "), fields containing the delimiter must be quoted using this character.
encoding : string (optional, default utf-8), encoding for the CSV files. For example (but not limited to): shift_jis, latin_1 or mskanji.
- num_concurrentint (optional)
Number of concurrent chunks to score simultaneously. Defaults to the available number of cores of the deployment. Lower it to leave resources for real-time scoring.
- chunk_sizestring or int (optional)
Which strategy should be used to determine the chunk size. Can be either a named strategy or a fixed size in bytes.
- auto: use fixed or dynamic based on flipper.
- fixed: use 1MB for explanations, 5MB for regular requests.
- dynamic: use dynamic chunk sizes.
- int: use this many bytes per chunk.
- abort_on_errorboolean (optional)
Default behavior is to abort the job if too many rows fail scoring. This will free up resources for other jobs that may score successfully. Set to false to unconditionally score every row no matter how many errors are encountered. Defaults to True.
- download_timeoutint (optional)
New in version 2.22.
If using localFile output, wait this many seconds for the download to become available. See download().
- download_read_timeoutint (optional, default 660)
New in version 2.22.
If using localFile output, wait this many seconds for the server to respond between chunks.
- upload_read_timeout: int (optional, default 600)
New in version 2.28.
If using localFile intake, wait this many seconds for the server to respond after whole dataset upload.
- Return type
- cancel(ignore_404_errors=False)¶
Cancel this job. If this job has not finished running, it will be removed and canceled.
- Return type
None
- get_status()¶
Get status of batch monitoring job
- Returns
- BatchMonitoringJob status data
Dict with job status
- Return type
Any
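For example, assuming job is the BatchMonitoringJob returned by run() above, its state can be polled like this (a sketch; the exact keys of the returned dict are determined by the server):
# A sketch: `job` is assumed to be a BatchMonitoringJob instance.
status = job.get_status()  # returns a dict describing the current job state
print(status)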
- class datarobot.models.BatchMonitoringJobDefinition(id=None, name=None, enabled=None, schedule=None, batch_monitoring_job=None, created=None, updated=None, created_by=None, updated_by=None, last_failed_run_time=None, last_successful_run_time=None, last_started_job_status=None, last_scheduled_run_time=None)¶
- classmethod get(batch_monitoring_job_definition_id)¶
Get batch monitoring job definition
- Returns
- BatchMonitoringJobDefinition
Instance of BatchMonitoringJobDefinition
Examples
>>> import datarobot as dr
>>> definition = dr.BatchMonitoringJobDefinition.get('5a8ac9ab07a57a0001be501f')
>>> definition
BatchMonitoringJobDefinition(60912e09fd1f04e832a575c1)
- Attributes
- batch_monitoring_job_definition_id: str
ID of batch monitoring job definition
- Return type
- classmethod list()¶
Get all batch monitoring job definitions
- Returns
- List[BatchMonitoringJobDefinition]
List of job definitions the user has access to see
Examples
>>> import datarobot as dr
>>> definition = dr.BatchMonitoringJobDefinition.list()
>>> definition
[
    BatchMonitoringJobDefinition(60912e09fd1f04e832a575c1),
    BatchMonitoringJobDefinition(6086ba053f3ef731e81af3ca)
]
- Return type
- classmethod create(enabled, batch_monitoring_job, name=None, schedule=None)¶
Creates a new batch monitoring job definition to be run either at scheduled interval or as a manual run.
- Returns
- BatchMonitoringJobDefinition
Instance of BatchMonitoringJobDefinition
Examples
>>> import datarobot as dr
>>> job_spec = {
...     "num_concurrent": 4,
...     "deployment_id": "foobar",
...     "intake_settings": {
...         "url": "s3://foobar/123",
...         "type": "s3",
...         "format": "csv"
...     },
...     "output_settings": {
...         "url": "s3://foobar/123",
...         "type": "s3",
...         "format": "csv"
...     },
... }
>>> schedule = {
...     "day_of_week": [1],
...     "month": ["*"],
...     "hour": [16],
...     "minute": [0],
...     "day_of_month": [1]
... }
>>> definition = BatchMonitoringJobDefinition.create(
...     enabled=False,
...     batch_monitoring_job=job_spec,
...     name="some_definition_name",
...     schedule=schedule
... )
>>> definition
BatchMonitoringJobDefinition(60912e09fd1f04e832a575c1)
- Attributes
- enabledbool (default False)
Whether the definition should be active on a scheduled basis. If True, schedule is required.
- batch_monitoring_job: dict
The job specifications for your batch monitoring job. It requires the same job input parameters as used with BatchMonitoringJob
- namestring (optional)
The name you want your job to be identified with. Must be unique across the organization’s existing jobs. If you don’t supply a name, a random one will be generated for you.
- scheduledict (optional)
The schedule payload defines at what intervals the job should run, which can be combined in various ways to construct complex scheduling terms if needed. In all the elements in the objects, you can supply either an asterisk ["*"] denoting "every" time denomination or an array of integers (e.g. [1, 2, 3]) to define a specific interval.
The schedule payload is split up into the following items:
Minute: The minute(s) of the day that the job will run. Allowed values are either ["*"], meaning every minute of the day, or [0 ... 59].
Hour: The hour(s) of the day that the job will run. Allowed values are either ["*"], meaning every hour of the day, or [0 ... 23].
Day of Month: The date(s) of the month that the job will run. Allowed values are either [1 ... 31] or ["*"] for all days of the month. This field is additive with dayOfWeek, meaning the job will run both on the date(s) defined in this field and the day specified by dayOfWeek (for example, dates 1st, 2nd, 3rd, plus every Tuesday). If dayOfMonth is set to ["*"] and dayOfWeek is defined, the scheduler will trigger on every day of the month that matches dayOfWeek (for example, Tuesday the 2nd, 9th, 16th, 23rd, 30th). Invalid dates such as February 31st are ignored.
Month: The month(s) of the year that the job will run. Allowed values are either [1 ... 12] or ["*"] for all months of the year. Strings, either 3-letter abbreviations or the full name of the month, can be used interchangeably (e.g., "jan" or "october"). Months that are not compatible with dayOfMonth are ignored, for example {"dayOfMonth": [31], "month": ["feb"]}.
Day of Week: The day(s) of the week that the job will run. Allowed values are [0 .. 6], where Sunday=0, or ["*"] for all days of the week. Strings, either 3-letter abbreviations or the full name of the day, can be used interchangeably (e.g., "sunday", "Sunday", "sun", or "Sun" all map to [0]). This field is additive with dayOfMonth, meaning the job will run both on the date specified by dayOfMonth and the day defined in this field.
- Return type
- update(enabled, batch_monitoring_job=None, name=None, schedule=None)¶
Updates a job definition with the changed specs.
Takes the same input as
create()
- Returns
- BatchMonitoringJobDefinition
Instance of the updated BatchMonitoringJobDefinition
Examples
>>> import datarobot as dr
>>> job_spec = {
...     "num_concurrent": 5,
...     "deployment_id": "foobar_new",
...     "intake_settings": {
...         "url": "s3://foobar/123",
...         "type": "s3",
...         "format": "csv"
...     },
...     "output_settings": {
...         "url": "s3://foobar/123",
...         "type": "s3",
...         "format": "csv"
...     },
... }
>>> schedule = {
...     "day_of_week": [1],
...     "month": ["*"],
...     "hour": ["*"],
...     "minute": [30, 59],
...     "day_of_month": [1, 2, 6]
... }
>>> definition = BatchMonitoringJobDefinition.create(
...     enabled=False,
...     batch_monitoring_job=job_spec,
...     name="updated_definition_name",
...     schedule=schedule
... )
>>> definition
BatchMonitoringJobDefinition(60912e09fd1f04e832a575c1)
- Attributes
- Return type
- run_on_schedule(schedule)¶
Sets the run schedule of an already created job definition.
If the job was previously not enabled, this will also set the job to enabled.
- Returns
- BatchMonitoringJobDefinition
Instance of the updated BatchMonitoringJobDefinition with the new / updated schedule.
Examples
>>> import datarobot as dr
>>> definition = dr.BatchMonitoringJobDefinition.create('...')
>>> schedule = {
...     "day_of_week": [1],
...     "month": ["*"],
...     "hour": ["*"],
...     "minute": [30, 59],
...     "day_of_month": [1, 2, 6]
... }
>>> definition.run_on_schedule(schedule)
BatchMonitoringJobDefinition(60912e09fd1f04e832a575c1)
- Attributes
- scheduledict
Same as the schedule parameter in create().
- Return type
- run_once()¶
Manually submits a batch monitoring job to the queue, based off of an already created job definition.
- Returns
- BatchMonitoringJob
Instance of BatchMonitoringJob
Examples
>>> import datarobot as dr
>>> definition = dr.BatchMonitoringJobDefinition.create('...')
>>> job = definition.run_once()
>>> job.wait_for_completion()
- Return type
- delete()¶
Deletes the job definition and disables any future schedules of this job if any. If a scheduled job is currently running, this will not be cancelled.
Examples
>>> import datarobot as dr
>>> definition = dr.BatchMonitoringJobDefinition.get('5a8ac9ab07a57a0001be501f')
>>> definition.delete()
- Return type
None
Status Check Job¶
- class datarobot.models.StatusCheckJob(job_id, resource_type=None)¶
Tracks asynchronous task status
- Attributes
- job_idstr
The ID of the job whose status is being tracked.
- wait_for_completion(max_wait=600)¶
Waits for job to complete.
- Parameters
- max_waitint, optional
How long to wait for the job to finish. If the time expires, DataRobot returns the current status.
- Returns
- statusJobStatusResult
Returns the current status of the job.
- Return type
- get_status()¶
Retrieve JobStatusResult object with the latest job status data from the server.
- Return type
- class datarobot.models.JobStatusResult(status: Optional[str], status_id: Optional[str], completed_resource_url: Optional[str])¶
This class represents a result of status check for submitted async jobs.
- property status¶
Alias for field number 0
- property status_id¶
Alias for field number 1
- property completed_resource_url¶
Alias for field number 2
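As a brief sketch of how these pieces fit together, assuming status_check_job is a StatusCheckJob returned by another API call:
# A sketch: `status_check_job` is assumed to be an existing StatusCheckJob.
result = status_check_job.wait_for_completion(max_wait=300)
if result.completed_resource_url is not None:
    print("Job finished, resource at:", result.completed_resource_url)
else:
    print("Job still in progress, status:", result.status)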
Blueprint¶
- class datarobot.models.Blueprint(id=None, processes=None, model_type=None, project_id=None, blueprint_category=None, monotonic_increasing_featurelist_id=None, monotonic_decreasing_featurelist_id=None, supports_monotonic_constraints=None, recommended_featurelist_id=None, supports_composable_ml=None)¶
A Blueprint which can be used to fit models
- Attributes
- idstr
the id of the blueprint
- processeslist of str
the processes used by the blueprint
- model_typestr
the model produced by the blueprint
- project_idstr
the project the blueprint belongs to
- blueprint_categorystr
(New in version v2.6) Describes the category of the blueprint and the kind of model it produces.
- recommended_featurelist_id: str or null
(New in v2.18) The ID of the feature list recommended for this blueprint. If this field is not present, then there is no recommended feature list.
- supports_composable_mlbool or None
(New in version v2.26) whether this blueprint is supported in Composable ML.
- classmethod get(project_id, blueprint_id)¶
Retrieve a blueprint.
- Parameters
- project_idstr
The project’s id.
- blueprint_idstr
Id of blueprint to retrieve.
- Returns
- blueprintBlueprint
The queried blueprint.
- Return type
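For illustration, a minimal sketch of retrieving a blueprint and inspecting it (both IDs are placeholders):
import datarobot as dr

# Placeholders: substitute a real project ID and blueprint ID.
blueprint = dr.Blueprint.get("<project_id>", "<blueprint_id>")
print(blueprint.model_type)
print(blueprint.processes)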
- get_json()¶
Get the blueprint json representation used by this model.
- Returns
- BlueprintJson
Json representation of the blueprint stages.
- Return type
Dict
[str
,Tuple
[List
[str
],List
[str
],str
]]
- get_chart()¶
Retrieve a chart.
- Returns
- BlueprintChart
The current blueprint chart.
- Return type
- get_documents()¶
Get documentation for tasks used in the blueprint.
- Returns
- list of BlueprintTaskDocument
All documents available for blueprint.
- Return type
List
[BlueprintTaskDocument
]
- classmethod from_data(data)¶
Instantiate an object of this class using a dict.
- Parameters
- datadict
Correctly snake_cased keys and their values.
- Return type
TypeVar
(T
, bound=APIObject
)
- classmethod from_server_data(data, keep_attrs=None)¶
Instantiate an object of this class using the data directly from the server, meaning that the keys may have the wrong camel casing
- Parameters
- datadict
The directly translated dict of JSON from the server. No casing fixes have taken place
- keep_attrsiterable
List, set or tuple of the dotted namespace notations for attributes to keep within the object structure even if their values are None
- Return type
TypeVar
(T
, bound=APIObject
)
- class datarobot.models.BlueprintTaskDocument(title=None, task=None, description=None, parameters=None, links=None, references=None)¶
Document describing a task from a blueprint.
- Attributes
- titlestr
Title of document.
- taskstr
Name of the task described in document.
- descriptionstr
Task description.
- parameterslist of dict(name, type, description)
Parameters that task can receive in human-readable format.
- linkslist of dict(name, url)
External links used in document
- referenceslist of dict(name, url)
References used in document. When no link is available, url equals None.
- class datarobot.models.BlueprintChart(nodes, edges)¶
A Blueprint chart that can be used to understand data flow in blueprint.
- Attributes
- nodeslist of dict (id, label)
Chart nodes, id unique in chart.
- edgeslist of tuple (id1, id2)
Directions of data flow between blueprint chart nodes.
- classmethod get(project_id, blueprint_id)¶
Retrieve a blueprint chart.
- Parameters
- project_idstr
The project’s id.
- blueprint_idstr
Id of blueprint to retrieve chart.
- Returns
- BlueprintChart
The queried blueprint chart.
- Return type
- to_graphviz()¶
Get blueprint chart in graphviz DOT format.
- Returns
- unicode
String representation of chart in graphviz DOT language.
- Return type
str
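As a sketch, the DOT output can be written to a file and rendered with Graphviz (both IDs are placeholders):
import datarobot as dr

# Placeholders: substitute a real project ID and blueprint ID.
chart = dr.models.BlueprintChart.get("<project_id>", "<blueprint_id>")
with open("blueprint.gv", "w") as dot_file:
    dot_file.write(chart.to_graphviz())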
- class datarobot.models.ModelBlueprintChart(nodes, edges)¶
A Blueprint chart that can be used to understand data flow in a model. A model blueprint chart represents a reduced repository blueprint chart containing only the elements used to build this particular model.
- Attributes
- nodeslist of dict (id, label)
Chart nodes, id unique in chart.
- edgeslist of tuple (id1, id2)
Directions of data flow between blueprint chart nodes.
- classmethod get(project_id, model_id)¶
Retrieve a model blueprint chart.
- Parameters
- project_idstr
The project’s id.
- model_idstr
Id of model to retrieve model blueprint chart.
- Returns
- ModelBlueprintChart
The queried model blueprint chart.
- Return type
- to_graphviz()¶
Get blueprint chart in graphviz DOT format.
- Returns
- unicode
String representation of chart in graphviz DOT language.
- Return type
str
Calendar File¶
- class datarobot.CalendarFile(calendar_end_date=None, calendar_start_date=None, created=None, id=None, name=None, num_event_types=None, num_events=None, project_ids=None, role=None, multiseries_id_columns=None)¶
Represents the data for a calendar file.
For more information about calendar files, see the calendar documentation.
- Attributes
- idstr
The id of the calendar file.
- calendar_start_datestr
The earliest date in the calendar.
- calendar_end_datestr
The last date in the calendar.
- createdstr
The date this calendar was created, i.e. uploaded to DR.
- namestr
The name of the calendar.
- num_event_typesint
The number of different event types.
- num_eventsint
The number of events this calendar has.
- project_idslist of strings
A list containing the projectIds of the projects using this calendar.
- multiseries_id_columns: list of str or None
A list of columns in the calendar which uniquely identify events for different series. Currently, only one column is supported. If multiseries id columns are not provided, the calendar is considered to be single series.
- rolestr
The access role the user has for this calendar.
- classmethod create(file_path, calendar_name=None, multiseries_id_columns=None)¶
Creates a calendar using the given file. For information about calendar files, see the calendar documentation
The provided file must be a CSV in the format:
Date, Event, Series ID, Event Duration
<date>, <event_type>, <series id>, <event duration>
<date>, <event_type>, , <event duration>
A header row is required, and the “Series ID” and “Event Duration” columns are optional.
Once the CalendarFile has been created, pass its ID with the
DatetimePartitioningSpecification
when setting the target for a time series project in order to use it.
- Parameters
- file_pathstring
A string representing a path to a local csv file.
- calendar_namestring, optional
A name to assign to the calendar. Defaults to the name of the file if not provided.
- multiseries_id_columnslist of str or None
A list of the names of multiseries id columns to define which series an event belongs to. Currently only one multiseries id column is supported.
- Returns
- calendar_fileCalendarFile
Instance with initialized data.
- Raises
- AsyncProcessUnsuccessfulError
Raised if there was an error processing the provided calendar file.
Examples
# Creating a calendar with a specified name
cal = dr.CalendarFile.create('/home/calendars/somecalendar.csv',
                             calendar_name='Some Calendar Name')
cal.id
>>> 5c1d4904211c0a061bc93013
cal.name
>>> Some Calendar Name

# Creating a calendar without specifying a name
cal = dr.CalendarFile.create('/home/calendars/somecalendar.csv')
cal.id
>>> 5c1d4904211c0a061bc93012
cal.name
>>> somecalendar.csv

# Creating a calendar with multiseries id columns
cal = dr.CalendarFile.create('/home/calendars/somemultiseriescalendar.csv',
                             calendar_name='Some Multiseries Calendar Name',
                             multiseries_id_columns=['series_id'])
cal.id
>>> 5da9bb21962d746f97e4daee
cal.name
>>> Some Multiseries Calendar Name
cal.multiseries_id_columns
>>> ['series_id']
- Return type
- classmethod create_calendar_from_dataset(dataset_id, dataset_version_id=None, calendar_name=None, multiseries_id_columns=None, delete_on_error=False)¶
Creates a calendar using the given dataset. For information about calendar files, see the calendar documentation
The provided dataset must have the following format:
Date, Event, Series ID, Event Duration
<date>, <event_type>, <series id>, <event duration>
<date>, <event_type>, , <event duration>
The “Series ID” and “Event Duration” columns are optional.
Once the CalendarFile has been created, pass its ID with the
DatetimePartitioningSpecification
when setting the target for a time series project in order to use it.
- Parameters
- dataset_idstring
The identifier of the dataset from which to create the calendar.
- dataset_version_idstring, optional
The identifier of the dataset version from which to create the calendar.
- calendar_namestring, optional
A name to assign to the calendar. Defaults to the name of the dataset if not provided.
- multiseries_id_columnslist of str, optional
A list of the names of multiseries id columns to define which series an event belongs to. Currently only one multiseries id column is supported.
- delete_on_errorboolean, optional
Whether to delete the calendar file from the Catalog if it is not valid.
- Returns
- calendar_fileCalendarFile
Instance with initialized data.
- Raises
- AsyncProcessUnsuccessfulError
Raised if there was an error processing the provided calendar file.
Examples
# Creating a calendar from a dataset
dataset = dr.Dataset.create_from_file('/home/calendars/somecalendar.csv')
cal = dr.CalendarFile.create_calendar_from_dataset(
    dataset.id, calendar_name='Some Calendar Name'
)
cal.id
>>> 5c1d4904211c0a061bc93013
cal.name
>>> Some Calendar Name

# Creating a calendar from a new dataset version
new_dataset_version = dr.Dataset.create_version_from_file(
    dataset.id, '/home/calendars/anothercalendar.csv'
)
cal = dr.CalendarFile.create_calendar_from_dataset(
    dataset.id, dataset_version_id=new_dataset_version.version_id
)
cal.id
>>> 5c1d4904211c0a061bc93012
cal.name
>>> anothercalendar.csv
- Return type
- classmethod create_calendar_from_country_code(country_code, start_date, end_date)¶
Generates a calendar based on the provided country code and the provided start and end dates. The provided country code should be uppercase and 2-3 characters long. See CalendarFile.get_allowed_country_codes for a list of allowed country codes.
- Parameters
- country_codestring
The country code for the country to use for generating the calendar.
- start_datedatetime.datetime
The earliest date to include in the generated calendar.
- end_datedatetime.datetime
The latest date to include in the generated calendar.
- Returns
- calendar_fileCalendarFile
Instance with initialized data.
- Return type
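A minimal sketch of generating a preloaded calendar; the country code and dates below are examples only (see get_allowed_country_codes for valid codes):
import datetime
import datarobot as dr

# "US" and the date range are illustrative values.
cal = dr.CalendarFile.create_calendar_from_country_code(
    country_code="US",
    start_date=datetime.datetime(2018, 1, 1),
    end_date=datetime.datetime(2020, 1, 1),
)
print(cal.name, cal.calendar_start_date, cal.calendar_end_date)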
- classmethod get_allowed_country_codes(offset=None, limit=None)¶
Retrieves the list of allowed country codes that can be used for generating the preloaded calendars.
- Parameters
- offsetint
Optional, defaults to 0. This many results will be skipped.
- limitint
Optional, defaults to 100, maximum 1000. At most this many results are returned.
- Returns
- list
A list of dicts, each of which represents an allowed country code. Each item has the following structure:
name : (str) The name of the country.
code : (str) The code for this country. This is the value that should be supplied to CalendarFile.create_calendar_from_country_code.
- Return type
List
[CountryCode
]
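For example, to list the available codes (a sketch; each item is a dict with the name and code keys described above):
import datarobot as dr

for country in dr.CalendarFile.get_allowed_country_codes(limit=10):
    print(country["code"], "-", country["name"])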
- classmethod get(calendar_id)¶
Gets the details of a calendar, given the id.
- Parameters
- calendar_idstr
The identifier of the calendar.
- Returns
- calendar_fileCalendarFile
The requested calendar.
- Raises
- DataError
Raised if the calendar_id is invalid, i.e. the specified CalendarFile does not exist.
Examples
cal = dr.CalendarFile.get(some_calendar_id)
cal.id
>>> some_calendar_id
- Return type
- classmethod list(project_id=None, batch_size=None)¶
Gets the details of all calendars this user has view access for.
- Parameters
- project_idstr, optional
If provided, will filter for calendars associated only with the specified project.
- batch_sizeint, optional
The number of calendars to retrieve in a single API call. If specified, the client may make multiple calls to retrieve the full list of calendars. If not specified, an appropriate default will be chosen by the server.
- Returns
- calendar_listlist of
CalendarFile
A list of CalendarFile objects.
Examples
calendars = dr.CalendarFile.list()
len(calendars)
>>> 10
- Return type
List
[CalendarFile
]
- classmethod delete(calendar_id)¶
Deletes the calendar specified by calendar_id.
- Parameters
- calendar_idstr
The id of the calendar to delete. The requester must have OWNER access for this calendar.
- Raises
- ClientError
Raised if an invalid calendar_id is provided.
Examples
# Deleting with a valid calendar_id
status_code = dr.CalendarFile.delete(some_calendar_id)
status_code
>>> 204
dr.CalendarFile.get(some_calendar_id)
>>> ClientError: Item not found
- Return type
None
- classmethod update_name(calendar_id, new_calendar_name)¶
Changes the name of the specified calendar to the specified name. The requester must have at least READ_WRITE permissions on the calendar.
- Parameters
- calendar_idstr
The id of the calendar to update.
- new_calendar_namestr
The new name to set for the specified calendar.
- Returns
- status_codeint
200 for success
- Raises
- ClientError
Raised if an invalid calendar_id is provided.
Examples
response = dr.CalendarFile.update_name(some_calendar_id, some_new_name)
response
>>> 200
cal = dr.CalendarFile.get(some_calendar_id)
cal.name
>>> some_new_name
- Return type
int
- classmethod share(calendar_id, access_list)¶
Shares the calendar with the specified users, assigning the specified roles.
- Parameters
- calendar_idstr
The id of the calendar to update
- access_list:
A list of dr.SharingAccess objects. Specify None for the role to delete a user’s access from the specified CalendarFile. For more information on specific access levels, see the sharing documentation.
- Returns
- status_codeint
200 for success
- Raises
- ClientError
Raised if unable to update permissions for a user.
- AssertionError
Raised if access_list is invalid.
Examples
# assuming some_user is a valid user, share this calendar with some_user
sharing_list = [dr.SharingAccess(some_user_username,
                                 dr.enums.SHARING_ROLE.READ_WRITE)]
response = dr.CalendarFile.share(some_calendar_id, sharing_list)
response.status_code
>>> 200

# delete some_user from this calendar, assuming they have access of some kind already
delete_sharing_list = [dr.SharingAccess(some_user_username, None)]
response = dr.CalendarFile.share(some_calendar_id, delete_sharing_list)
response.status_code
>>> 200

# Attempt to add an invalid user to a calendar
invalid_sharing_list = [dr.SharingAccess(invalid_username,
                                         dr.enums.SHARING_ROLE.READ_WRITE)]
dr.CalendarFile.share(some_calendar_id, invalid_sharing_list)
>>> ClientError: Unable to update access for this calendar
- Return type
int
- classmethod get_access_list(calendar_id, batch_size=None)¶
Retrieve a list of users that have access to this calendar.
- Parameters
- calendar_idstr
The id of the calendar to retrieve the access list for.
- batch_sizeint, optional
The number of access records to retrieve in a single API call. If specified, the client may make multiple calls to retrieve the full list of calendars. If not specified, an appropriate default will be chosen by the server.
- Returns
- access_control_listlist of
SharingAccess
A list of
SharingAccess
objects.
- Raises
- ClientError
Raised if user does not have access to calendar or calendar does not exist.
- Return type
List
[SharingAccess
]
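A short sketch of inspecting who has access to a calendar, assuming each SharingAccess record exposes username and role attributes (the calendar ID is a placeholder):
import datarobot as dr

access_list = dr.CalendarFile.get_access_list(some_calendar_id)
for access in access_list:
    print(access.username, access.role)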
- class datarobot.models.calendar_file.CountryCode¶
A dict describing an allowed country code, with name and code keys (see CalendarFile.get_allowed_country_codes).
Automated Documentation¶
- class datarobot.models.automated_documentation.AutomatedDocument(entity_id=None, document_type=None, output_format=None, locale=None, template_id=None, id=None, filepath=None, created_at=None)¶
An automated documentation object.
New in version v2.24.
- Attributes
- document_typestr or None
Type of automated document. You can specify: MODEL_COMPLIANCE, AUTOPILOT_SUMMARY depending on your account settings. Required for document generation.
- entity_idstr or None
ID of the entity to generate the document for. It can be model ID or project ID. Required for document generation.
- output_formatstr or None
Format of the generated document, either docx or html. Required for document generation.
- localestr or None
Localization of the document, dependent on your account settings. Default setting is EN_US.
- template_idstr or None
Template ID to use for the document outline. Defaults to the standard DataRobot template. See the documentation for ComplianceDocTemplate for more information.
- idstr or None
ID of the document. Required to download or delete a document.
- filepathstr or None
Path to save a downloaded document to. Either include a file path and name or the file will be saved to the directory from which the script is launched.
- created_atdatetime or None
Document creation timestamp.
- classmethod list_available_document_types()¶
Get a list of all available document types and locales.
- Returns
- List of dicts
Examples
import datarobot as dr
dr.Client(token=my_token, endpoint=endpoint)
doc_types = dr.AutomatedDocument.list_available_document_types()
- Return type
List
[DocumentOption
]
- property is_model_compliance_initialized: Tuple[bool, str]¶
Check if model compliance documentation pre-processing is initialized. Model compliance documentation pre-processing must be initialized before generating documentation for a custom model.
- Returns
- Tuple of (boolean, string)
boolean flag is whether model compliance documentation pre-processing is initialized
string value is the initialization status
- Return type
Tuple
[bool
,str
]
- initialize_model_compliance()¶
Initialize model compliance documentation pre-processing. Must be called before generating documentation for a custom model.
- Returns
- Tuple of (boolean, string)
boolean flag is whether model compliance documentation pre-processing is initialized
string value is the initialization status
Examples
import datarobot as dr
dr.Client(token=my_token, endpoint=endpoint)

# NOTE: entity_id is either a model id or a model package id
doc = dr.AutomatedDocument(
    document_type="MODEL_COMPLIANCE",
    entity_id="6f50cdb77cc4f8d1560c3ed5",
    output_format="docx",
    locale="EN_US")

doc.initialize_model_compliance()
- Return type
Tuple
[bool
,str
]
- generate(max_wait=600)¶
Request generation of an automated document.
Required attributes to request document generation: document_type, entity_id, and output_format.
- Returns
requests.models.Response
Examples
import datarobot as dr
dr.Client(token=my_token, endpoint=endpoint)

doc = dr.AutomatedDocument(
    document_type="MODEL_COMPLIANCE",
    entity_id="6f50cdb77cc4f8d1560c3ed5",
    output_format="docx",
    locale="EN_US",
    template_id="50efc9db8aff6c81a374aeec",
    filepath="/Users/username/Documents/example.docx"
)

doc.generate()
doc.download()
- Return type
Response
- download()¶
Download a generated Automated Document. Document ID is required to download a file.
- Returns
requests.models.Response
Examples
Generating and downloading the generated document:
import datarobot as dr
dr.Client(token=my_token, endpoint=endpoint)

doc = dr.AutomatedDocument(
    document_type="AUTOPILOT_SUMMARY",
    entity_id="6050d07d9da9053ebb002ef7",
    output_format="docx",
    filepath="/Users/username/Documents/Project_Report_1.docx"
)

doc.generate()
doc.download()
Downloading an earlier generated document when you know the document ID:
import datarobot as dr
dr.Client(token=my_token, endpoint=endpoint)
doc = dr.AutomatedDocument(id='5e8b6a34d2426053ab9a39ed')
doc.download()
Notice that filepath was not set for this document. In this case, the file is saved to the directory from which the script was launched.
Downloading a document chosen from a list of earlier generated documents:
import datarobot as dr
dr.Client(token=my_token, endpoint=endpoint)

model_id = "6f5ed3de855962e0a72a96fe"
docs = dr.AutomatedDocument.list_generated_documents(entity_ids=[model_id])
doc = docs[0]
doc.filepath = "/Users/me/Desktop/Recommended_model_doc.docx"
doc.download()
- Return type
Response
- delete()¶
Delete a document using its ID.
- Returns
requests.models.Response
Examples
import datarobot as dr
dr.Client(token=my_token, endpoint=endpoint)
doc = dr.AutomatedDocument(id="5e8b6a34d2426053ab9a39ed")
doc.delete()
If you don't know the document ID, you can follow the same workflow to get the ID as in the examples for the AutomatedDocument.download method.
- Return type
Response
- classmethod list_generated_documents(document_types=None, entity_ids=None, output_formats=None, locales=None, offset=None, limit=None)¶
Get information about all previously generated documents available for your account. The information includes document ID and type, ID of the entity it was generated for, time of creation, and other information.
- Parameters
- document_typesList of str or None
Query for one or more document types.
- entity_idsList of str or None
Query generated documents by one or more entity IDs.
- output_formatsList of str or None
Query for one or more output formats.
- localesList of str or None
Query generated documents by one or more locales.
- offset: int or None
Number of items to skip. Defaults to 0 if not provided.
- limit: int or None
Number of items to return, maximum number of items is 1000.
- Returns
- List of AutomatedDocument objects, where each object contains attributes described in
AutomatedDocument
Examples
To get a list of all generated documents:
import datarobot as dr
dr.Client(token=my_token, endpoint=endpoint)
docs = AutomatedDocument.list_generated_documents()
To get a list of all AUTOPILOT_SUMMARY documents:
import datarobot as dr
dr.Client(token=my_token, endpoint=endpoint)
docs = AutomatedDocument.list_generated_documents(document_types=["AUTOPILOT_SUMMARY"])
To get a list of 5 recently created automated documents in html format:
import datarobot as dr
dr.Client(token=my_token, endpoint=endpoint)
docs = AutomatedDocument.list_generated_documents(output_formats=["html"], limit=5)
To get a list of automated documents created for specific entities (projects or models):
import datarobot as dr
dr.Client(token=my_token, endpoint=endpoint)
docs = AutomatedDocument.list_generated_documents(
    entity_ids=["6051d3dbef875eb3be1be036", "6051d3e1fbe65cd7a5f6fde6", "6051d3e7f86c04486c2f9584"]
)
Note that the list of results contains AutomatedDocument objects, which means that you can execute class-related methods on them. Here's how you can list, download, and then delete from the server all automated documents related to a certain entity:
import datarobot as dr
dr.Client(token=my_token, endpoint=endpoint)
ids = ["6051d3dbef875eb3be1be036", "5fe1d3d55cd810ebdb60c517f"]
docs = AutomatedDocument.list_generated_documents(entity_ids=ids)
for doc in docs:
    doc.download()
    doc.delete()
- Return type
List
[AutomatedDocument
]
- class datarobot.models.automated_documentation.DocumentOption¶
A dict describing an available document type and its supported locales (see AutomatedDocument.list_available_document_types).
Class Mapping Aggregation Settings¶
For multiclass projects with many unique values in the target column, you can specify parameters for aggregating rare values to improve modeling performance and decrease the runtime and resource usage of the resulting models.
- class datarobot.helpers.ClassMappingAggregationSettings(max_unaggregated_class_values=None, min_class_support=None, excluded_from_aggregation=None, aggregation_class_name=None)¶
Class mapping aggregation settings. For multiclass projects, allows fine control over which target values will be preserved as classes. Classes which aren't preserved will be aggregated into a single "catch everything else" class in the case of multiclass, or will be ignored in the case of multilabel. All attributes are optional; if not specified, server-side defaults will be used.
- Attributes
- max_unaggregated_class_valuesint, optional
Maximum number of unique values allowed before aggregation kicks in.
- min_class_supportint, optional
Minimum number of instances necessary for each target value in the dataset. All values with fewer instances will be aggregated.
- excluded_from_aggregationlist, optional
List of target values that are guaranteed to be kept as is, regardless of other settings.
- aggregation_class_namestr, optional
If some of the values will be aggregated, this is the name of the aggregation class that will replace them.
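A minimal sketch of constructing these settings (the values shown are arbitrary examples); the resulting object is typically passed along when setting the project target:
from datarobot.helpers import ClassMappingAggregationSettings

# All values below are illustrative only.
aggregation_settings = ClassMappingAggregationSettings(
    max_unaggregated_class_values=100,
    min_class_support=5,
    excluded_from_aggregation=["RARE_BUT_IMPORTANT"],
    aggregation_class_name="OTHER",
)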
Client Configuration¶
- datarobot.client.Client(token=None, endpoint=None, config_path=None, connect_timeout=None, user_agent_suffix=None, ssl_verify=True, max_retries=None, token_type='Token', default_use_case=None, enable_api_consumer_tracking=None, trace_context=None)¶
Configures the global API client for the Python SDK. The client will be configured in one of the following ways, in order of priority.
From call args iff token and endpoint kwargs are specified;
From a YAML file at the path specified in the config_path kwarg;
From a YAML file at the path specified in the env var DATAROBOT_CONFIG_FILE;
From env vars, iff DATAROBOT_ENDPOINT and DATAROBOT_API_TOKEN are specified;
From a YAML file at the path $HOME/.config/datarobot/drconfig.yaml.
Note
All client configuration must be done via a single method; there is no fallback to lower-priority methods.
This can also have the side effect of setting a default Use Case for client API requests.
- Parameters
- tokenstr, optional
API token
- endpointstr, optional
Base url of API
- config_pathstr, optional
Alternate location of config file
- connect_timeoutint, optional
How long the client should be willing to wait before establishing a connection with the server.
- user_agent_suffixstr, optional
Additional text that is appended to the User-Agent HTTP header when communicating with the DataRobot REST API. This can be useful for identifying different applications that are built on top of the DataRobot Python Client, which can aid debugging and help track usage.
- ssl_verifybool or str, optional
Whether to check SSL certificate. Could be set to path with certificates of trusted certification authorities.
- max_retriesint or datarobot.rest.Retry, optional
Either an integer number of times to retry connection errors, or a urllib3.util.retry.Retry object to configure retries.
- token_type: str, “Token” by default
Authentication token type: Token, Bearer. “Bearer” is for DataRobot OAuth2 token, “Token” for token generated in Developer Tools.
- default_use_case: str, optional
The entity ID of the default Use Case to use with any requests made by the client.
- enable_api_consumer_tracking: bool, optional
Enable and disable user metrics tracking within the datarobot module. Default: False.
- trace_context: str, optional
An ID or other string for identifying which code template or AI Accelerator was used to make a request.
- Returns
The RESTClientObject instance created.
- Return type
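For example, configuring the client directly from call arguments (the token and endpoint values are placeholders):
import datarobot as dr

# Substitute your own API token and endpoint URL.
dr.Client(
    token="<your-api-token>",
    endpoint="https://app.datarobot.com/api/v2",
)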
- datarobot.client.get_client()¶
Returns the global HTTP client for the Python SDK, instantiating it if necessary.
- Return type
- datarobot.client.set_client(client)¶
Configure the global HTTP client for the Python SDK. Returns previous instance.
- Return type
Optional
[RESTClientObject
]
- datarobot.client.client_configuration(*args, **kwargs)¶
This context manager can be used to temporarily change the global HTTP client.
In multithreaded scenarios, it is highly recommended to use a fresh manager object per thread.
DataRobot does not recommend nesting these contexts.
- Parameters
- argsParameters passed to datarobot.client.Client()
- kwargsKeyword arguments passed to datarobot.client.Client()
Examples
from datarobot.client import client_configuration
from datarobot.models import Project

with client_configuration(token="api-key-here", endpoint="https://host-name.com"):
    Project.list()
from datarobot.client import Client, client_configuration
from datarobot.models import Project

Client()  # Interact with DataRobot using the default configuration.
Project.list()

with client_configuration(config_path="/path/to/a/drconfig.yaml"):
    # Interact with DataRobot using a different configuration.
    Project.list()
- class datarobot.rest.RESTClientObject(auth, endpoint, connect_timeout=6.05, verify=True, user_agent_suffix=None, max_retries=None, authentication_type=None)¶
- Parameters
- connect_timeout
timeout for http request and connection
- headers
headers for outgoing requests
- open_in_browser()¶
Opens the DataRobot app in a web browser, or logs the URL if a browser is not available.
- Return type
None
Clustering¶
- class datarobot.models.ClusteringModel(id=None, processes=None, featurelist_name=None, featurelist_id=None, project_id=None, sample_pct=None, training_row_count=None, training_duration=None, training_start_date=None, training_end_date=None, model_type=None, model_category=None, is_frozen=None, is_n_clusters_dynamically_determined=None, blueprint_id=None, metrics=None, project=None, monotonic_increasing_featurelist_id=None, monotonic_decreasing_featurelist_id=None, n_clusters=None, has_empty_clusters=None, supports_monotonic_constraints=None, is_starred=None, prediction_threshold=None, prediction_threshold_read_only=None, model_number=None, parent_model_id=None, use_project_settings=None, supports_composable_ml=None)¶
ClusteringModel extends the Model class. It provides properties and methods specific to clustering projects.
- compute_insights(max_wait=600)¶
Compute and retrieve cluster insights for the model. This method awaits completion of the job computing cluster insights and returns results after it is finished. If computation takes longer than the specified max_wait, an exception will be raised.
- Parameters
- project_id: str
Project to start creation in.
- model_id: str
Project’s model to start creation in.
- max_wait: int
Maximum number of seconds to wait before giving up
- Returns
- List of ClusterInsight
- Raises
- ClientError
Server rejected creation due to client error. Most likely cause is a bad project_id or model_id.
- AsyncFailureError
If any of the responses from the server are unexpected
- AsyncProcessUnsuccessfulError
If the cluster insights computation has failed or was cancelled.
- AsyncTimeoutError
If the cluster insights computation did not resolve in time
- Return type
List
[ClusterInsight
]
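As a sketch, assuming model is a ClusteringModel retrieved from a clustering project:
# A sketch: `model` is assumed to be an existing ClusteringModel instance.
insights = model.compute_insights(max_wait=1200)
for insight in insights:
    print(insight.feature_name, insight.feature_impact)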
- property insights: List[datarobot.models.cluster_insight.ClusterInsight]¶
Return actual list of cluster insights if already computed.
- Returns
- List of ClusterInsight
- Return type
List
[ClusterInsight
]
- property clusters: List[datarobot.models.cluster.Cluster]¶
Return actual list of Clusters.
- Returns
- List of Cluster
- Return type
List
[Cluster
]
- update_cluster_names(cluster_name_mappings)¶
Change many cluster names at once based on list of name mappings.
- Parameters
- cluster_name_mappings: List of tuples
Cluster name mappings, each consisting of the current cluster name and the new cluster name. Example:
cluster_name_mappings = [
    ("current cluster name 1", "new cluster name 1"),
    ("current cluster name 2", "new cluster name 2"),
]
- Returns
- List of Cluster
- Raises
- datarobot.errors.ClientError
Server rejected update of cluster names. Possible reasons include: incorrect format of mapping, mapping introduces duplicates.
- Return type
List
[Cluster
]
- update_cluster_name(current_name, new_name)¶
Change cluster name from current_name to new_name.
- Parameters
- current_name: str
Current cluster name.
- new_name: str
New cluster name.
- Returns
- List of Cluster
- Raises
- datarobot.errors.ClientError
Server rejected update of cluster names.
- Return type
List
[Cluster
]
- class datarobot.models.cluster.Cluster(**kwargs)¶
Representation of a single cluster.
- Attributes
- name: str
Current cluster name
- percent: float
Percent of data contained in the cluster. This value is reported after cluster insights are computed for the model.
- classmethod list(project_id, model_id)¶
Retrieve a list of clusters in the model.
- Parameters
- project_id: str
ID of the project that the model is part of.
- model_id: str
ID of the model.
- Returns
- List of clusters
- Return type
List
[Cluster
]
- classmethod update_multiple_names(project_id, model_id, cluster_name_mappings)¶
Update many clusters at once based on list of name mappings.
- Parameters
- project_id: str
ID of the project that the model is part of.
- model_id: str
ID of the model.
- cluster_name_mappings: List of tuples
Cluster name mappings, consisting of the current and new name for each cluster. Example:
cluster_name_mappings = [
    ("current cluster name 1", "new cluster name 1"),
    ("current cluster name 2", "new cluster name 2"),
]
- Returns
- List of clusters
- Raises
- datarobot.errors.ClientError
Server rejected update of cluster names.
- ValueError
Invalid cluster name mapping provided.
- Return type
List
[Cluster
]
- classmethod update_name(project_id, model_id, current_name, new_name)¶
Change cluster name from current_name to new_name
- Parameters
- project_id: str
ID of the project that the model is part of.
- model_id: str
ID of the model.
- current_name: str
Current cluster name
- new_name: str
New cluster name
- Returns
- List of Cluster
- Return type
List
[Cluster
]
- class datarobot.models.cluster_insight.ClusterInsight(**kwargs)¶
Holds data on all insights related to a feature, as well as a breakdown per cluster.
- Parameters
- feature_name: str
Name of a feature from the dataset.
- feature_type: str
Type of feature.
- insightsList of classes (ClusterInsight)
The list provides information regarding the importance of a specific feature in relation to each cluster. The results help you understand how the model is grouping data and what each cluster represents.
- feature_impact: float
Impact of a feature ranging from 0 to 1.
- classmethod compute(project_id, model_id, max_wait=600)¶
Starts creation of cluster insights for the model and if successful, returns computed ClusterInsights. This method allows calculation to continue for a specified time and if not complete, cancels the request.
- Parameters
- project_id: str
ID of the project to begin creation of cluster insights for.
- model_id: str
ID of the project model to begin creation of cluster insights for.
- max_wait: int
Maximum number of seconds to wait before canceling the request.
- Returns
- List[ClusterInsight]
- Raises
- ClientError
Server rejected creation due to client error. Most likely cause is a bad project_id or model_id.
- AsyncFailureError
Indicates whether any of the responses from the server are unexpected.
- AsyncProcessUnsuccessfulError
Indicates whether the cluster insights computation failed or was cancelled.
- AsyncTimeoutError
Indicates whether the cluster insights computation did not resolve within the specified time limit (max_wait).
- Return type
List
[ClusterInsight
]
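For illustration, a minimal sketch of computing insights via the classmethod (both IDs are placeholders):
from datarobot.models.cluster_insight import ClusterInsight

# Placeholders: substitute a real project ID and model ID.
insights = ClusterInsight.compute(
    project_id="<project_id>",
    model_id="<model_id>",
    max_wait=1200,
)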
Compliance Documentation Templates¶
- class datarobot.models.compliance_doc_template.ComplianceDocTemplate(id, creator_id, creator_username, name, org_id=None, sections=None)¶
A compliance documentation template. Templates are used to customize the contents of AutomatedDocument.
New in version v2.14.
Notes
Each section dictionary has the following schema:
title : title of the section
type : type of section. Must be one of "datarobot", "user" or "table_of_contents".
Each type of section has a different set of attributes, described below.
Sections of type "datarobot" represent a section owned by DataRobot. DataRobot sections have the following additional attributes:
content_id : The identifier of the content in this section. You can get the default template with get_default for a complete list of possible DataRobot section content ids.
sections : list of sub-section dicts nested under the parent section.
Sections of type "user" represent a section with user-defined content. Those sections may contain text generated by the user and have the following additional fields:
regularText : regular text of the section, optionally separated by \n to split paragraphs.
highlightedText : highlighted text of the section, optionally separated by \n to split paragraphs.
sections : list of sub-section dicts nested under the parent section.
Sections of type "table_of_contents" represent a table of contents and have no additional attributes.
- Attributes
- idstr
the id of the template
- namestr
the name of the template.
- creator_idstr
the id of the user who created the template
- creator_usernamestr
username of the user who created the template
- org_idstr
the id of the organization the template belongs to
- sectionslist of dicts
the sections of the template describing the structure of the document. Section schema is described in Notes section above.
- classmethod get_default(template_type=None)¶
Get a default DataRobot template. This template is used for generating compliance documentation when no template is specified.
- Parameters
- template_typestr or None
Type of the template. Currently supported values are “normal” and “time_series”
- Returns
- templateComplianceDocTemplate
the default template object with the sections attribute populated with default sections.
- Return type
- classmethod create_from_json_file(name, path)¶
Create a template with the specified name and sections in a JSON file.
This is useful when working with sections in a JSON file. Example:
default_template = ComplianceDocTemplate.get_default()
default_template.sections_to_json_file('path/to/example.json')
# ... edit example.json in your editor
my_template = ComplianceDocTemplate.create_from_json_file(
    name='my template',
    path='path/to/example.json'
)
- Parameters
- namestr
the name of the template. Must be unique for your user.
- pathstr
the path to find the JSON file at
- Returns
- templateComplianceDocTemplate
the created template
- Return type
- classmethod create(name, sections)¶
Create a template with the specified name and sections.
- Parameters
- namestr
the name of the template. Must be unique for your user.
- sectionslist
list of section objects
- Returns
- templateComplianceDocTemplate
the created template
- Return type
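A minimal sketch using the section schema from the Notes above; the template name and section text are illustrative only:
from datarobot.models.compliance_doc_template import ComplianceDocTemplate

# Illustrative section content; adjust before using.
sections = [
    {
        "title": "Overview",
        "type": "user",
        "regularText": "Regular text for the overview section.",
        "highlightedText": "Highlighted text to revise before generating documentation.",
    },
]
template = ComplianceDocTemplate.create(name="my custom template", sections=sections)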
- classmethod get(template_id)¶
Retrieve a specific template.
- Parameters
- template_idstr
the id of the template to retrieve
- Returns
- templateComplianceDocTemplate
the retrieved template
- Return type
- classmethod list(name_part=None, limit=None, offset=None)¶
Get a paginated list of compliance documentation template objects.
- Parameters
- name_partstr or None
Return only the templates with names matching specified string. The matching is case-insensitive.
- limitint
The number of records to return. The server will use a (possibly finite) default if not specified.
- offsetint
The number of records to skip.
- Returns
- templateslist of ComplianceDocTemplate
the list of template objects
- Return type
List
[ComplianceDocTemplate
]
- sections_to_json_file(path, indent=2)¶
Save sections of the template to a json file at the specified path
- Parameters
- pathstr
the path to save the file to
- indentint
indentation to use in the json file.
- Return type
None
- update(name=None, sections=None)¶
Update the name or sections of an existing doc template.
Note that default or non-existent templates can not be updated.
- Parameters
- namestr, optional
the new name for the template
- sectionslist of dicts
list of sections
- Return type
None
- delete()¶
Delete the compliance documentation template.
- Return type
None
Confusion Chart¶
- class datarobot.models.confusion_chart.ConfusionChart(source, data, source_model_id)¶
Confusion Chart data for model.
Notes
ClassMetrics is a dict containing the following:
class_name (string) name of the class
actual_count (int) number of times this class is seen in the validation data
predicted_count (int) number of times this class has been predicted for the validation data
f1 (float) F1 score
recall (float) recall score
precision (float) precision score
was_actual_percentages (list of dict) one vs all actual percentages in the format specified below.
other_class_name (string) the name of the other class
percentage (float) the percentage of the times this class was predicted when it was actually this class (from 0 to 1)
was_predicted_percentages (list of dict) one vs all predicted percentages in the format specified below.
other_class_name (string) the name of the other class
percentage (float) the percentage of the times this class was actually predicted (from 0 to 1)
confusion_matrix_one_vs_all (list of list) 2d list representing a 2x2 one vs all matrix. This represents the True/False Negative/Positive rates as integers for each class. The data structure looks like:
[ [ True Negative, False Positive ],
  [ False Negative, True Positive ] ]
- Attributes
- sourcestr
Confusion Chart data source. Can be ‘validation’, ‘crossValidation’ or ‘holdout’.
- raw_datadict
All of the raw data for the Confusion Chart
- confusion_matrixlist of list
The NxN confusion matrix
- classeslist
The names of each of the classes
- class_metricslist of dicts
List of dicts with schema described as
ClassMetrics
above.- source_model_idstr
ID of the model this Confusion chart represents; in some cases, insights from the parent of a frozen model may be used
Credentials¶
- class datarobot.models.Credential(credential_id=None, name=None, credential_type=None, creation_date=None, description=None)¶
- classmethod list()¶
Returns list of available credentials.
- Returns
- credentialslist of Credential instances
contains a list of available credentials.
Examples
>>> import datarobot as dr
>>> data_sources = dr.Credential.list()
>>> data_sources
[
    Credential('5e429d6ecf8a5f36c5693e03', 'my_s3_cred', 's3'),
    Credential('5e42cc4dcf8a5f3256865840', 'my_jdbc_cred', 'jdbc'),
]
- Return type
List
[Credential
]
- classmethod get(credential_id)¶
Gets the Credential.
- Parameters
- credential_idstr
the identifier of the credential.
- Returns
- credentialCredential
the requested credential.
Examples
>>> import datarobot as dr
>>> cred = dr.Credential.get('5a8ac9ab07a57a0001be501f')
>>> cred
Credential('5e429d6ecf8a5f36c5693e03', 'my_s3_cred', 's3')
- Return type
- delete()¶
Deletes the Credential from the store.
Examples
>>> import datarobot as dr
>>> cred = dr.Credential.get('5a8ac9ab07a57a0001be501f')
>>> cred.delete()
- Return type
None
- classmethod create_basic(name, user, password, description=None)¶
Creates the credentials.
- Parameters
- namestr
the name to use for this set of credentials.
- userstr
the username to store for this set of credentials.
- passwordstr
the password to store for this set of credentials.
- descriptionstr, optional
the description to use for this set of credentials.
- Returns
- credentialCredential
the created credential.
Examples
>>> import datarobot as dr
>>> cred = dr.Credential.create_basic(
...     name='my_basic_cred',
...     user='username',
...     password='password',
... )
>>> cred
Credential('5e429d6ecf8a5f36c5693e03', 'my_basic_cred', 'basic')
- Return type
- classmethod create_oauth(name, token, refresh_token, description=None)¶
Creates the OAUTH credentials.
- Parameters
- namestr
the name to use for this set of credentials.
- token: str
the OAUTH token
- refresh_token: str
the OAUTH refresh token
- descriptionstr, optional
the description to use for this set of credentials.
- Returns
- credentialCredential
the created credential.
Examples
>>> import datarobot as dr
>>> cred = dr.Credential.create_oauth(
...     name='my_oauth_cred',
...     token='XXX',
...     refresh_token='YYY',
... )
>>> cred
Credential('5e429d6ecf8a5f36c5693e03', 'my_oauth_cred', 'oauth')
- Return type
- classmethod create_s3(name, aws_access_key_id=None, aws_secret_access_key=None, aws_session_token=None, description=None)¶
Creates the S3 credentials.
- Parameters
- namestr
the name to use for this set of credentials.
- aws_access_key_idstr, optional
the AWS access key id.
- aws_secret_access_keystr, optional
the AWS secret access key.
- aws_session_tokenstr, optional
the AWS session token.
- descriptionstr, optional
the description to use for this set of credentials.
- Returns
- credentialCredential
the created credential.
Examples
>>> import datarobot as dr
>>> cred = dr.Credential.create_s3(
...     name='my_s3_cred',
...     aws_access_key_id='XXX',
...     aws_secret_access_key='YYY',
...     aws_session_token='ZZZ',
... )
>>> cred
Credential('5e429d6ecf8a5f36c5693e03', 'my_s3_cred', 's3')
- Return type
- classmethod create_azure(name, azure_connection_string, description=None)¶
Creates the Azure storage credentials.
- Parameters
- namestr
the name to use for this set of credentials.
- azure_connection_stringstr
the Azure connection string.
- descriptionstr, optional
the description to use for this set of credentials.
- Returns
- credentialCredential
the created credential.
Examples
>>> import datarobot as dr
>>> cred = dr.Credential.create_azure(
...     name='my_azure_cred',
...     azure_connection_string='XXX',
... )
>>> cred
Credential('5e429d6ecf8a5f36c5693e03', 'my_azure_cred', 'azure')
- Return type
- classmethod create_gcp(name, gcp_key=None, description=None)¶
Creates the GCP credentials.
- Parameters
- namestr
the name to use for this set of credentials.
- gcp_keystr | dict
the GCP key in json format or parsed as dict.
- descriptionstr, optional
the description to use for this set of credentials.
- Returns
- credentialCredential
the created credential.
Examples
>>> import datarobot as dr
>>> cred = dr.Credential.create_gcp(
...     name='my_gcp_cred',
...     gcp_key='XXX',
... )
>>> cred
Credential('5e429d6ecf8a5f36c5693e03', 'my_gcp_cred', 'gcp')
- Return type
- update(name=None, description=None, **kwargs)¶
Update the credential values of an existing credential. Updates this object in place.
New in version v3.2.
- Parameters
- namestr
The name to use for this set of credentials.
- descriptionstr, optional
The description to use for this set of credentials; if omitted, and name is not omitted, then it clears any previous description for that name.
- kwargsKeyword arguments specific to the given credential_type that should be updated.
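Examples
A brief sketch of renaming an existing credential; the credential ID, new name, and description are hypothetical:
import datarobot as dr

cred = dr.Credential.get('5e429d6ecf8a5f36c5693e03')  # hypothetical ID
cred.update(name='my_s3_cred_rotated', description='Rotated S3 credentials')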
- Return type
None
Custom Models¶
- class datarobot.models.custom_model_version.CustomModelFileItem(id, file_name, file_path, file_source, created_at=None)¶
A file item attached to a DataRobot custom model version.
New in version v2.21.
- Attributes
- id: str
The ID of the file item.
- file_name: str
The name of the file item.
- file_path: str
The path of the file item.
- file_source: str
The source of the file item.
- created_at: str, optional
ISO-8601 formatted timestamp of when the version was created.
- class datarobot.CustomInferenceModel(**kwargs)¶
A custom inference model.
New in version v2.21.
- Attributes
- id: str
The ID of the custom model.
- name: str
The name of the custom model.
- language: str
The programming language of the custom inference model. Can be “python”, “r”, “java” or “other”.
- description: str
The description of the custom inference model.
- target_type: datarobot.TARGET_TYPE
Target type of the custom inference model. Values: [datarobot.TARGET_TYPE.BINARY, datarobot.TARGET_TYPE.REGRESSION, datarobot.TARGET_TYPE.MULTICLASS, datarobot.TARGET_TYPE.UNSTRUCTURED, datarobot.TARGET_TYPE.ANOMALY]
- target_name: str, optional
Target feature name. It is optional (and ignored if provided) for the datarobot.TARGET_TYPE.UNSTRUCTURED or datarobot.TARGET_TYPE.ANOMALY target types.
- latest_version: datarobot.CustomModelVersion or None
The latest version of the custom model if the model has a latest version.
- deployments_count: int
The number of deployments of the custom model.
- positive_class_label: str
For binary classification projects, a label of a positive class.
- negative_class_label: str
For binary classification projects, a label of a negative class.
- prediction_threshold: float
For binary classification projects, a threshold used for predictions.
- training_data_assignment_in_progress: bool
Flag describing if training data assignment is in progress.
- training_dataset_id: str, optional
The ID of a dataset assigned to the custom model.
- training_dataset_version_id: str, optional
The ID of a dataset version assigned to the custom model.
- training_data_file_name: str, optional
The name of assigned training data file.
- training_data_partition_column: str, optional
The name of a partition column in a training dataset assigned to the custom model.
- created_by: str
The username of a user who created the custom model.
- updated_at: str
ISO-8601 formatted timestamp of when the custom model was updated
- created_at: str
ISO-8601 formatted timestamp of when the custom model was created
- network_egress_policy: datarobot.NETWORK_EGRESS_POLICY, optional
Determines whether the given custom model is isolated, or can access the public network. Values: [datarobot.NETWORK_EGRESS_POLICY.NONE, datarobot.NETWORK_EGRESS_POLICY.DR_API_ACCESS, datarobot.NETWORK_EGRESS_POLICY.PUBLIC]. Note: datarobot.NETWORK_EGRESS_POLICY.DR_API_ACCESS value is only supported by the SaaS (cloud) environment.
- maximum_memory: int, optional
The maximum memory that might be allocated by the custom-model. If exceeded, the custom-model will be killed by k8s.
- replicas: int, optional
A fixed number of replicas that will be deployed in the cluster
- is_training_data_for_versions_permanently_enabled: bool, optional
Whether training data assignment on the version level is permanently enabled for the model.
- classmethod list(is_deployed=None, search_for=None, order_by=None)¶
List custom inference models available to the user.
New in version v2.21.
- Parameters
- is_deployed: bool, optional
Flag for filtering custom inference models. If set to True, only deployed custom inference models are returned. If set to False, only not deployed custom inference models are returned.
- search_for: str, optional
String for filtering custom inference models - only custom inference models that contain the string in name or description will be returned. If not specified, all custom models will be returned
- order_by: str, optional
Property to sort custom inference models by. Supported properties are “created” and “updated”. Prefix the attribute name with a dash to sort in descending order, e.g. order_by=’-created’. By default, the order_by parameter is None which will result in custom models being returned in order of creation time descending.
- Returns
- List[CustomInferenceModel]
A list of custom inference models.
- Raises
- datarobot.errors.ClientError
If the server responded with 4xx status
- datarobot.errors.ServerError
If the server responded with 5xx status
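Examples
For example, a minimal sketch that lists only deployed custom inference models, newest first (assumes an already configured client):
import datarobot as dr

models = dr.CustomInferenceModel.list(is_deployed=True, order_by='-created')
for model in models:
    print(model.id, model.name, model.target_type)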
- Return type
List
[CustomInferenceModel
]
- classmethod get(custom_model_id)¶
Get custom inference model by id.
New in version v2.21.
- Parameters
- custom_model_id: str
The ID of the custom inference model.
- Returns
- CustomInferenceModel
Retrieved custom inference model.
- Raises
- datarobot.errors.ClientError
If the server responded with 4xx status.
- datarobot.errors.ServerError
If the server responded with 5xx status.
- Return type
- download_latest_version(file_path)¶
Download the latest custom inference model version.
New in version v2.21.
- Parameters
- file_path: str
Path to create a file with custom model version content.
- Raises
- datarobot.errors.ClientError
If the server responded with 4xx status.
- datarobot.errors.ServerError
If the server responded with 5xx status.
- Return type
None
- classmethod create(name, target_type, target_name=None, language=None, description=None, positive_class_label=None, negative_class_label=None, prediction_threshold=None, class_labels=None, class_labels_file=None, network_egress_policy=None, maximum_memory=None, replicas=None, is_training_data_for_versions_permanently_enabled=None)¶
Create a custom inference model.
New in version v2.21.
- Parameters
- name: str
Name of the custom inference model.
- target_type: datarobot.TARGET_TYPE
Target type of the custom inference model. Values: [datarobot.TARGET_TYPE.BINARY, datarobot.TARGET_TYPE.REGRESSION, datarobot.TARGET_TYPE.MULTICLASS, datarobot.TARGET_TYPE.UNSTRUCTURED]
- target_name: str, optional
Target feature name. It is optional (and ignored if provided) for the datarobot.TARGET_TYPE.UNSTRUCTURED target type.
- language: str, optional
Programming language of the custom learning model.
- description: str, optional
Description of the custom learning model.
- positive_class_label: str, optional
Custom inference model positive class label for binary classification.
- negative_class_label: str, optional
Custom inference model negative class label for binary classification.
- prediction_threshold: float, optional
Custom inference model prediction threshold.
- class_labels: List[str], optional
Custom inference model class labels for multiclass classification. Cannot be used with class_labels_file.
- class_labels_file: str, optional
Path to file containing newline separated class labels for multiclass classification. Cannot be used with class_labels.
- network_egress_policy: datarobot.NETWORK_EGRESS_POLICY, optional
Determines whether the given custom model is isolated, or can access the public network. Values: [datarobot.NETWORK_EGRESS_POLICY.NONE, datarobot.NETWORK_EGRESS_POLICY.DR_API_ACCESS, datarobot.NETWORK_EGRESS_POLICY.PUBLIC] Note: datarobot.NETWORK_EGRESS_POLICY.DR_API_ACCESS value is only supported by the SaaS (cloud) environment.
- maximum_memory: int, optional
The maximum memory that might be allocated by the custom-model. If exceeded, the custom-model will be killed by k8s.
- replicas: int, optional
A fixed number of replicas that will be deployed in the cluster.
- is_training_data_for_versions_permanently_enabled: bool, optional
Permanently enable training data assignment on the version level for the current model, instead of training data assignment on the model level.
- Returns
- CustomInferenceModel
Created custom inference model.
- Raises
- datarobot.errors.ClientError
If the server responded with 4xx status.
- datarobot.errors.ServerError
If the server responded with 5xx status.
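Examples
A minimal sketch of creating a binary custom inference model; the model name, target column, and class labels are hypothetical:
import datarobot as dr

model = dr.CustomInferenceModel.create(
    name='my custom model',
    target_type=dr.TARGET_TYPE.BINARY,
    target_name='readmitted',          # hypothetical target column
    positive_class_label='Yes',
    negative_class_label='No',
    language='python',
)
print(model.id)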
- Return type
- classmethod copy_custom_model(custom_model_id)¶
Create a custom inference model by copying existing one.
New in version v2.21.
- Parameters
- custom_model_id: str
The ID of the custom inference model to copy.
- Returns
- CustomInferenceModel
Created custom inference model.
- Raises
- datarobot.errors.ClientError
If the server responded with 4xx status.
- datarobot.errors.ServerError
If the server responded with 5xx status.
- Return type
- update(name=None, language=None, description=None, target_name=None, positive_class_label=None, negative_class_label=None, prediction_threshold=None, class_labels=None, class_labels_file=None, is_training_data_for_versions_permanently_enabled=None)¶
Update custom inference model properties.
New in version v2.21.
- Parameters
- name: str, optional
New custom inference model name.
- language: str, optional
New custom inference model programming language.
- description: str, optional
New custom inference model description.
- target_name: str, optional
New custom inference model target name.
- positive_class_label: str, optional
New custom inference model positive class label.
- negative_class_label: str, optional
New custom inference model negative class label.
- prediction_threshold: float, optional
New custom inference model prediction threshold.
- class_labels: List[str], optional
Custom inference model class labels for multiclass classification. Cannot be used with class_labels_file.
- class_labels_file: str, optional
Path to file containing newline separated class labels for multiclass classification. Cannot be used with class_labels
- is_training_data_for_versions_permanently_enabled: bool, optional
Permanently enable training data assignment on the version level for the current model, instead of training data assignment on the model level.
- Raises
- datarobot.errors.ClientError
If the server responded with 4xx status.
- datarobot.errors.ServerError
If the server responded with 5xx status.
- Return type
None
- refresh()¶
Update custom inference model with the latest data from server.
New in version v2.21.
- Raises
- datarobot.errors.ClientError
If the server responded with 4xx status.
- datarobot.errors.ServerError
If the server responded with 5xx status.
- Return type
None
- delete()¶
Delete custom inference model.
New in version v2.21.
- Raises
- datarobot.errors.ClientError
If the server responded with 4xx status.
- datarobot.errors.ServerError
If the server responded with 5xx status.
- Return type
None
- assign_training_data(dataset_id, partition_column=None, max_wait=600)¶
Assign training data to the custom inference model.
New in version v2.21.
- Parameters
- dataset_id: str
The id of the training dataset to be assigned.
- partition_column: str, optional
Name of a partition column in the training dataset.
- max_wait: int, optional
Max time to wait for a training data assignment. If set to None - method will return without waiting. Defaults to 10 min.
- Raises
- datarobot.errors.ClientError
If the server responded with 4xx status
- datarobot.errors.ServerError
If the server responded with 5xx status
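Examples
A short sketch of assigning training data and blocking until the assignment completes; the model and dataset IDs are hypothetical:
import datarobot as dr

model = dr.CustomInferenceModel.get('5ebe96b84024035cc6a6560b')   # hypothetical ID
model.assign_training_data(
    dataset_id='5ebe96b84024035cc6a6560c',                        # hypothetical ID
    max_wait=600,  # block for up to 10 minutes
)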
- Return type
None
- class datarobot.CustomModelTest(**kwargs)¶
A custom model test.
New in version v2.21.
- Attributes
- id: str
test id
- custom_model_image_id: str
id of a custom model image
- image_type: str
the type of the image, either CUSTOM_MODEL_IMAGE_TYPE.CUSTOM_MODEL_IMAGE if the testing attempt is using a CustomModelImage as its model or CUSTOM_MODEL_IMAGE_TYPE.CUSTOM_MODEL_VERSION if the testing attempt is using a CustomModelVersion with dependency management
- overall_status: str
a string representing testing status. Status can be:
- ‘not_tested’: the check did not run
- ‘failed’: the check failed
- ‘succeeded’: the check succeeded
- ‘warning’: the check resulted in a warning, or in a non-critical failure
- ‘in_progress’: the check is in progress
- detailed_status: dict
detailed testing status - maps the testing types to their status and message. The keys of the dict are one of ‘errorCheck’, ‘nullValueImputation’, ‘longRunningService’, ‘sideEffects’. The values are dict with ‘message’ and ‘status’ keys.
- created_by: str
a user who created a test
- dataset_id: str, optional
id of a dataset used for testing
- dataset_version_id: str, optional
id of a dataset version used for testing
- completed_at: str, optional
ISO-8601 formatted timestamp of when the test has completed
- created_at: str, optional
ISO-8601 formatted timestamp of when the version was created
- network_egress_policy: datarobot.NETWORK_EGRESS_POLICY, optional
Determines whether the given custom model is isolated, or can access the public network. Values: [datarobot.NETWORK_EGRESS_POLICY.NONE, datarobot.NETWORK_EGRESS_POLICY.DR_API_ACCESS, datarobot.NETWORK_EGRESS_POLICY.PUBLIC]. Note: datarobot.NETWORK_EGRESS_POLICY.DR_API_ACCESS value is only supported by the SaaS (cloud) environment.
- maximum_memory: int, optional
The maximum memory that might be allocated by the custom-model. If exceeded, the custom-model will be killed by k8s
- replicas: int, optional
A fixed number of replicas that will be deployed in the cluster
- classmethod create(custom_model_id, custom_model_version_id, dataset_id=None, max_wait=600, network_egress_policy=None, maximum_memory=None, replicas=None)¶
Create and start a custom model test.
New in version v2.21.
- Parameters
- custom_model_id: str
the id of the custom model
- custom_model_version_id: str
the id of the custom model version
- dataset_id: str, optional
The id of the testing dataset for non-unstructured custom models. Ignored and not required for unstructured models.
- max_wait: int, optional
max time to wait for a test completion. If set to None - method will return without waiting.
- network_egress_policy: datarobot.NETWORK_EGRESS_POLICY, optional
Determines whether the given custom model is isolated, or can access the public network. Values: [datarobot.NETWORK_EGRESS_POLICY.NONE, datarobot.NETWORK_EGRESS_POLICY.DR_API_ACCESS, datarobot.NETWORK_EGRESS_POLICY.PUBLIC]. Note: datarobot.NETWORK_EGRESS_POLICY.DR_API_ACCESS value is only supported by the SaaS (cloud) environment.
- maximum_memory: int, optional
The maximum memory that might be allocated by the custom-model. If exceeded, the custom-model will be killed by k8s
- replicas: int, optional
A fixed number of replicas that will be deployed in the cluster
- Returns
- CustomModelTest
created custom model test
- Raises
- datarobot.errors.ClientError
if the server responded with 4xx status
- datarobot.errors.ServerError
if the server responded with 5xx status
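Examples
A sketch of starting a test and inspecting its outcome; all IDs are hypothetical:
import datarobot as dr

test = dr.CustomModelTest.create(
    custom_model_id='5ebe96b84024035cc6a6560b',            # hypothetical IDs
    custom_model_version_id='5ebe96b84024035cc6a6560d',
    dataset_id='5ebe96b84024035cc6a6560e',
    max_wait=600,
)
print(test.overall_status)
if test.overall_status == 'failed':
    print(test.get_log_tail())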
- classmethod list(custom_model_id)¶
List custom model tests.
New in version v2.21.
- Parameters
- custom_model_id: str
the id of the custom model
- Returns
- List[CustomModelTest]
a list of custom model tests
- Raises
- datarobot.errors.ClientError
if the server responded with 4xx status
- datarobot.errors.ServerError
if the server responded with 5xx status
- classmethod get(custom_model_test_id)¶
Get custom model test by id.
New in version v2.21.
- Parameters
- custom_model_test_id: str
the id of the custom model test
- Returns
- CustomModelTest
retrieved custom model test
- Raises
- datarobot.errors.ClientError
if the server responded with 4xx status.
- datarobot.errors.ServerError
if the server responded with 5xx status.
- get_log()¶
Get log of a custom model test.
New in version v2.21.
- Raises
- datarobot.errors.ClientError
if the server responded with 4xx status
- datarobot.errors.ServerError
if the server responded with 5xx status
- get_log_tail()¶
Get log tail of a custom model test.
New in version v2.21.
- Raises
- datarobot.errors.ClientError
if the server responded with 4xx status
- datarobot.errors.ServerError
if the server responded with 5xx status
- cancel()¶
Cancel custom model test that is in progress.
New in version v2.21.
- Raises
- datarobot.errors.ClientError
if the server responded with 4xx status
- datarobot.errors.ServerError
if the server responded with 5xx status
- refresh()¶
Update custom model test with the latest data from server.
New in version v2.21.
- Raises
- datarobot.errors.ClientError
if the server responded with 4xx status
- datarobot.errors.ServerError
if the server responded with 5xx status
- class datarobot.CustomModelVersion(**kwargs)¶
A version of a DataRobot custom model.
New in version v2.21.
- Attributes
- id: str
The ID of the custom model version.
- custom_model_id: str
The ID of the custom model.
- version_minor: int
A minor version number of the custom model version.
- version_major: int
A major version number of the custom model version.
- is_frozen: bool
A flag if the custom model version is frozen.
- items: List[CustomModelFileItem]
A list of file items attached to the custom model version.
- base_environment_id: str
The ID of the environment to use with the model.
- base_environment_version_id: str
The ID of the environment version to use with the model.
- label: str, optional
A short human readable string to label the version.
- description: str, optional
The custom model version description.
- created_at: str, optional
ISO-8601 formatted timestamp of when the version was created.
- dependencies: List[CustomDependency]
The parsed dependencies of the custom model version if the version has a valid requirements.txt file.
- network_egress_policy: datarobot.NETWORK_EGRESS_POLICY, optional
Determines whether the given custom model is isolated, or can access the public network. Values: [datarobot.NETWORK_EGRESS_POLICY.NONE, datarobot.NETWORK_EGRESS_POLICY.DR_API_ACCESS, datarobot.NETWORK_EGRESS_POLICY.PUBLIC]. Note: datarobot.NETWORK_EGRESS_POLICY.DR_API_ACCESS value is only supported by the SaaS (cloud) environment.
- maximum_memory: int, optional
The maximum memory that might be allocated by the custom-model. If exceeded, the custom-model will be killed by k8s.
- replicas: int, optional
A fixed number of replicas that will be deployed in the cluster.
- required_metadata_values: List[RequiredMetadataValue]
Additional parameters required by the execution environment. The required keys are defined by the fieldNames in the base environment’s requiredMetadataKeys.
- training_data: TrainingData, optional
The information about the training data assigned to the model version.
- holdout_data: HoldoutData, optional
The information about the holdout data assigned to the model version.
- classmethod from_server_data(data, keep_attrs=None)¶
Instantiate an object of this class using the data directly from the server, meaning that the keys may have the wrong camel casing
- Parameters
- datadict
The directly translated dict of JSON from the server. No casing fixes have taken place
- keep_attrsiterable
List, set or tuple of the dotted namespace notations for attributes to keep within the object structure even if their values are None
- Return type
- classmethod create_clean(custom_model_id, base_environment_id, is_major_update=True, folder_path=None, files=None, network_egress_policy=None, maximum_memory=None, replicas=None, required_metadata_values=None, training_dataset_id=None, partition_column=None, holdout_dataset_id=None, keep_training_holdout_data=None, max_wait=600)¶
Create a custom model version without files from previous versions.
Create a version with training or holdout data: if training/holdout data related parameters are provided, the training data is assigned asynchronously. In this case:
- if max_wait is not None, the function returns once the job is finished.
- if max_wait is None, the function returns immediately; progress can be polled by the user (see examples).
If training data assignment fails, the new version is still created, but it is not possible to create a model package for the model version or to deploy it. To check for a training data assignment error, check version.training_data.assignment_error["message"].
New in version v2.21.
- Parameters
- custom_model_id: str
The ID of the custom model.
- base_environment_id: str
The ID of the base environment to use with the custom model version.
- is_major_update: bool
The flag defining if a custom model version will be a minor or a major version. Defaults to True.
- folder_path: str, optional
The path to a folder containing files to be uploaded. Each file in the folder is uploaded under its path relative to the folder path.
- files: list, optional
The list of tuples, where values in each tuple are the local filesystem path and the path the file should be placed in the model. If the list is of strings, then basenames will be used for tuples. Example: [("/home/user/Documents/myModel/file1.txt", "file1.txt"), ("/home/user/Documents/myModel/folder/file2.txt", "folder/file2.txt")] or ["/home/user/Documents/myModel/file1.txt", "/home/user/Documents/myModel/folder/file2.txt"]
- network_egress_policy: datarobot.NETWORK_EGRESS_POLICY, optional
Determines whether the given custom model is isolated, or can access the public network. Values: [datarobot.NETWORK_EGRESS_POLICY.NONE, datarobot.NETWORK_EGRESS_POLICY.DR_API_ACCESS, datarobot.NETWORK_EGRESS_POLICY.PUBLIC]. Note: datarobot.NETWORK_EGRESS_POLICY.DR_API_ACCESS value is only supported by the SaaS (cloud) environment.
- maximum_memory: int, optional
The maximum memory that might be allocated by the custom-model. If exceeded, the custom-model will be killed by k8s.
- replicas: int, optional
A fixed number of replicas that will be deployed in the cluster.
- required_metadata_values: List[RequiredMetadataValue]
Additional parameters required by the execution environment. The required keys are defined by the fieldNames in the base environment’s requiredMetadataKeys.
- training_dataset_id: str, optional
The ID of the training dataset to assign to the custom model.
- partition_column: str, optional
Name of a partition column in a training dataset assigned to the custom model. Can only be assigned for structured models.
- holdout_dataset_id: str, optional
The ID of the holdout dataset to assign to the custom model. Can only be assigned for unstructured models.
- keep_training_holdout_data: bool, optional
If the version should inherit training and holdout data from the previous version. Defaults to True. This field is only applicable if the model has training data for versions enabled, otherwise the field value will be ignored.
- max_wait: int, optional
Max time to wait for training data assignment. If set to None - method will return without waiting. Defaults to 10 minutes.
- Returns
- CustomModelVersion
Created custom model version.
- Raises
- datarobot.errors.ClientError
If the server responded with 4xx status.
- datarobot.errors.ServerError
If the server responded with 5xx status.
- datarobot.errors.InvalidUsageError
If wrong parameters are provided.
- datarobot.errors.TrainingDataAssignmentError
If training data assignment fails.
Examples
Create a version with blocking (default max_wait=600) training data assignment:
import datarobot as dr
from datarobot.errors import TrainingDataAssignmentError

dr.Client(token=my_token, endpoint=endpoint)

try:
    version = dr.CustomModelVersion.create_clean(
        custom_model_id="6444482e5583f6ee2e572265",
        base_environment_id="642209acc563893014a41e24",
        training_dataset_id="6421f2149a4f9b1bec6ad6dd",
    )
except TrainingDataAssignmentError as e:
    print(e)
Create a version with non-blocking training data assignment:
import time

import datarobot as dr

dr.Client(token=my_token, endpoint=endpoint)

version = dr.CustomModelVersion.create_clean(
    custom_model_id="6444482e5583f6ee2e572265",
    base_environment_id="642209acc563893014a41e24",
    training_dataset_id="6421f2149a4f9b1bec6ad6dd",
    max_wait=None,
)

while version.training_data.assignment_in_progress:
    time.sleep(10)
    version.refresh()
if version.training_data.assignment_error:
    print(version.training_data.assignment_error["message"])
- Return type
- classmethod create_from_previous(custom_model_id, base_environment_id, is_major_update=True, folder_path=None, files=None, files_to_delete=None, network_egress_policy=None, maximum_memory=None, replicas=None, required_metadata_values=None, training_dataset_id=None, partition_column=None, holdout_dataset_id=None, keep_training_holdout_data=None, max_wait=600)¶
Create a custom model version containing files from a previous version.
Create a version with training/holdout data: if training/holdout data related parameters are provided, the training data is assigned asynchronously. In this case:
- if max_wait is not None, the function returns once the job is finished.
- if max_wait is None, the function returns immediately; progress can be polled by the user (see examples).
If training data assignment fails, the new version is still created, but it is not possible to create a model package for the model version or to deploy it. To check for a training data assignment error, check version.training_data.assignment_error["message"].
New in version v2.21.
- Parameters
- custom_model_id: str
The ID of the custom model.
- base_environment_id: str
The ID of the base environment to use with the custom model version.
- is_major_update: bool, optional
The flag defining if a custom model version will be a minor or a major version. Defaults to True.
- folder_path: str, optional
The path to a folder containing files to be uploaded. Each file in the folder is uploaded under its path relative to the folder path.
- files: list, optional
The list of tuples, where values in each tuple are the local filesystem path and the path the file should be placed in the model. If the list is of strings, then basenames will be used for tuples. Example: [("/home/user/Documents/myModel/file1.txt", "file1.txt"), ("/home/user/Documents/myModel/folder/file2.txt", "folder/file2.txt")] or ["/home/user/Documents/myModel/file1.txt", "/home/user/Documents/myModel/folder/file2.txt"]
- files_to_delete: list, optional
The list of file item IDs to be deleted. Example: ["5ea95f7a4024030aba48e4f9", "5ea6b5da402403181895cc51"]
- network_egress_policy: datarobot.NETWORK_EGRESS_POLICY, optional
Determines whether the given custom model is isolated, or can access the public network. Values: [datarobot.NETWORK_EGRESS_POLICY.NONE, datarobot.NETWORK_EGRESS_POLICY.DR_API_ACCESS, datarobot.NETWORK_EGRESS_POLICY.PUBLIC]. Note: datarobot.NETWORK_EGRESS_POLICY.DR_API_ACCESS value is only supported by the SaaS (cloud) environment.
- maximum_memory: int, optional
The maximum memory that might be allocated by the custom-model. If exceeded, the custom-model will be killed by k8s
- replicas: int, optional
A fixed number of replicas that will be deployed in the cluster
- required_metadata_values: List[RequiredMetadataValue]
Additional parameters required by the execution environment. The required keys are defined by the fieldNames in the base environment’s requiredMetadataKeys.
- training_dataset_id: str, optional
The ID of the training dataset to assign to the custom model.
- partition_column: str, optional
Name of a partition column in a training dataset assigned to the custom model. Can only be assigned for structured models.
- holdout_dataset_id: str, optional
The ID of the holdout dataset to assign to the custom model. Can only be assigned for unstructured models.
- keep_training_holdout_data: bool, optional
If the version should inherit training and holdout data from the previous version. Defaults to True. This field is only applicable if the model has training data for versions enabled, otherwise the field value will be ignored.
- max_wait: int, optional
Max time to wait for training data assignment. If set to None - method will return without waiting. Defaults to 10 minutes.
- Returns
- CustomModelVersion
created custom model version
- Raises
- datarobot.errors.ClientError
If the server responded with 4xx status.
- datarobot.errors.ServerError
If the server responded with 5xx status.
- datarobot.errors.InvalidUsageError
If wrong parameters are provided.
- datarobot.errors.TrainingDataAssignmentError
If training data assignment fails.
Examples
Create a version with blocking (default max_wait=600) training data assignment:
import datarobot as dr
from datarobot.errors import TrainingDataAssignmentError

dr.Client(token=my_token, endpoint=endpoint)

try:
    version = dr.CustomModelVersion.create_from_previous(
        custom_model_id="6444482e5583f6ee2e572265",
        base_environment_id="642209acc563893014a41e24",
        training_dataset_id="6421f2149a4f9b1bec6ad6dd",
    )
except TrainingDataAssignmentError as e:
    print(e)
Create a version with non-blocking training data assignment:
import time

import datarobot as dr

dr.Client(token=my_token, endpoint=endpoint)

version = dr.CustomModelVersion.create_from_previous(
    custom_model_id="6444482e5583f6ee2e572265",
    base_environment_id="642209acc563893014a41e24",
    training_dataset_id="6421f2149a4f9b1bec6ad6dd",
    max_wait=None,
)

while version.training_data.assignment_in_progress:
    time.sleep(10)
    version.refresh()
if version.training_data.assignment_error:
    print(version.training_data.assignment_error["message"])
- Return type
- classmethod list(custom_model_id)¶
List custom model versions.
New in version v2.21.
- Parameters
- custom_model_id: str
The ID of the custom model.
- Returns
- List[CustomModelVersion]
A list of custom model versions.
- Raises
- datarobot.errors.ClientError
If the server responded with 4xx status.
- datarobot.errors.ServerError
If the server responded with 5xx status.
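Examples
For example, a sketch that lists the versions of a custom model; the model ID is hypothetical:
import datarobot as dr

versions = dr.CustomModelVersion.list('5ebe96b84024035cc6a6560b')  # hypothetical ID
for version in versions:
    print(version.id, version.version_major, version.version_minor, version.label)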
- Return type
List
[CustomModelVersion
]
- classmethod get(custom_model_id, custom_model_version_id)¶
Get custom model version by id.
New in version v2.21.
- Parameters
- custom_model_id: str
The ID of the custom model.
- custom_model_version_id: str
The id of the custom model version to retrieve.
- Returns
- CustomModelVersion
Retrieved custom model version.
- Raises
- datarobot.errors.ClientError
If the server responded with 4xx status.
- datarobot.errors.ServerError
If the server responded with 5xx status.
- Return type
- download(file_path)¶
Download custom model version.
New in version v2.21.
- Parameters
- file_path: str
Path to create a file with custom model version content.
- Raises
- datarobot.errors.ClientError
If the server responded with 4xx status.
- datarobot.errors.ServerError
If the server responded with 5xx status.
- Return type
None
- update(description=None, required_metadata_values=None)¶
Update custom model version properties.
New in version v2.21.
- Parameters
- description: str, optional
New custom model version description.
- required_metadata_values: List[RequiredMetadataValue], optional
Additional parameters required by the execution environment. The required keys are defined by the fieldNames in the base environment’s requiredMetadataKeys.
- Raises
- datarobot.errors.ClientError
If the server responded with 4xx status.
- datarobot.errors.ServerError
If the server responded with 5xx status.
- Return type
None
- refresh()¶
Update custom model version with the latest data from server.
New in version v2.21.
- Raises
- datarobot.errors.ClientError
If the server responded with 4xx status.
- datarobot.errors.ServerError
If the server responded with 5xx status.
- Return type
None
- get_feature_impact(with_metadata=False)¶
Get custom model feature impact.
New in version v2.23.
- Parameters
- with_metadatabool
The flag indicating if the result should include the metadata as well.
- Returns
- feature_impactslist of dict
The feature impact data. Each item is a dict with the keys ‘featureName’, ‘impactNormalized’, ‘impactUnnormalized’, and ‘redundantWith’.
- Raises
- datarobot.errors.ClientError
If the server responded with 4xx status.
- datarobot.errors.ServerError
If the server responded with 5xx status.
- Return type
List
[Dict
[str
,Any
]]
- calculate_feature_impact(max_wait=600)¶
Calculate custom model feature impact.
New in version v2.23.
- Parameters
- max_wait: int, optional
Max time to wait for feature impact calculation. If set to None - method will return without waiting. Defaults to 10 min
- Raises
- datarobot.errors.ClientError
if the server responded with 4xx status
- datarobot.errors.ServerError
if the server responded with 5xx status
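Examples
A short sketch that computes and then reads feature impact for a version; the IDs are hypothetical:
import datarobot as dr

version = dr.CustomModelVersion.get(
    '5ebe96b84024035cc6a6560b',   # hypothetical custom model ID
    '5ebe96b84024035cc6a6560d',   # hypothetical version ID
)
version.calculate_feature_impact(max_wait=600)   # block until the calculation finishes
for row in version.get_feature_impact():
    print(row['featureName'], row['impactNormalized'])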
- Return type
None
- class datarobot.models.execution_environment.RequiredMetadataKey(**kwargs)¶
Definition of a metadata key that custom models using this environment must define
New in version v2.25.
- Attributes
- field_name: str
The required field key. This value will be added as an environment variable when running custom models.
- display_name: str
A human readable name for the required field.
- class datarobot.models.CustomModelVersionConversion(**kwargs)¶
A conversion of a DataRobot custom model version.
New in version v2.27.
- Attributes
- id: str
The ID of the custom model version conversion.
- custom_model_version_id: str
The ID of the custom model version.
- created: str
ISO-8601 timestamp of when the custom model conversion was created.
- main_program_item_id: str or None
The ID of the main program item.
- log_message: str or None
The conversion output log message.
- generated_metadata: dict or None
The dict contains two items: ‘outputDataset’ & ‘outputColumns’.
- conversion_succeeded: bool
Whether the conversion succeeded or not.
- conversion_in_progress: bool
Whether a given conversion is in progress or not.
- should_stop: bool
Whether the user asked to stop a conversion.
- classmethod run_conversion(custom_model_id, custom_model_version_id, main_program_item_id, max_wait=None)¶
Initiate a new custom model version conversion.
- Parameters
- custom_model_idstr
The associated custom model ID.
- custom_model_version_idstr
The associated custom model version ID.
- main_program_item_idstr
The selected main program item ID. This should be one of the SAS items in the associated custom model version.
- max_wait: int or None
Max wait time in seconds. If None, then don’t wait.
- Returns
- conversion_idstr
The ID of the newly created conversion entity.
- Raises
- datarobot.errors.ClientError
If the server responded with 4xx status.
- datarobot.errors.ServerError
If the server responded with 5xx status.
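Examples
A sketch of running a conversion and then fetching its result; the IDs are hypothetical and the import path follows the class path shown above:
from datarobot.models import CustomModelVersionConversion

conversion_id = CustomModelVersionConversion.run_conversion(
    custom_model_id='5ebe96b84024035cc6a6560b',          # hypothetical IDs
    custom_model_version_id='5ebe96b84024035cc6a6560d',
    main_program_item_id='5ebe96b84024035cc6a6560f',     # one of the SAS items in the version
    max_wait=600,
)
conversion = CustomModelVersionConversion.get(
    '5ebe96b84024035cc6a6560b', '5ebe96b84024035cc6a6560d', conversion_id
)
print(conversion.conversion_succeeded, conversion.log_message)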
- Return type
str
- classmethod stop_conversion(custom_model_id, custom_model_version_id, conversion_id)¶
Stop a conversion that is in progress.
- Parameters
- custom_model_idstr
The ID of the associated custom model.
- custom_model_version_idstr
The ID of the associated custom model version.
- conversion_id
The ID of a conversion that is in progress.
- Raises
- datarobot.errors.ClientError
If the server responded with 4xx status.
- datarobot.errors.ServerError
If the server responded with 5xx status.
- Return type
Response
- classmethod get(custom_model_id, custom_model_version_id, conversion_id)¶
Get custom model version conversion by id.
New in version v2.27.
- Parameters
- custom_model_id: str
The ID of the custom model.
- custom_model_version_id: str
The ID of the custom model version.
- conversion_id: str
The ID of the conversion to retrieve.
- Returns
- CustomModelVersionConversion
Retrieved custom model version conversion.
- Raises
- datarobot.errors.ClientError
If the server responded with 4xx status.
- datarobot.errors.ServerError
If the server responded with 5xx status.
- Return type
- classmethod get_latest(custom_model_id, custom_model_version_id)¶
Get latest custom model version conversion for a given custom model version.
New in version v2.27.
- Parameters
- custom_model_id: str
The ID of the custom model.
- custom_model_version_id: str
The ID of the custom model version.
- Returns
- CustomModelVersionConversion or None
Retrieved latest conversion for a given custom model version.
- Raises
- datarobot.errors.ClientError
If the server responded with 4xx status.
- datarobot.errors.ServerError
If the server responded with 5xx status.
- Return type
Optional
[CustomModelVersionConversion
]
- classmethod list(custom_model_id, custom_model_version_id)¶
Get custom model version conversions list per custom model version.
New in version v2.27.
- Parameters
- custom_model_id: str
The ID of the custom model.
- custom_model_version_id: str
The ID of the custom model version.
- Returns
- List[CustomModelVersionConversion]
Retrieved conversions for a given custom model version.
- Raises
- datarobot.errors.ClientError
If the server responded with 4xx status.
- datarobot.errors.ServerError
If the server responded with 5xx status.
- Return type
- class datarobot.CustomModelVersionDependencyBuild(**kwargs)¶
Metadata about a DataRobot custom model version’s dependency build
New in version v2.22.
- Attributes
- custom_model_id: str
The ID of the custom model.
- custom_model_version_id: str
The ID of the custom model version.
- build_status: str
The status of the custom model version’s dependency build.
- started_at: str
ISO-8601 formatted timestamp of when the build was started.
- completed_at: str, optional
ISO-8601 formatted timestamp of when the build has completed.
- classmethod get_build_info(custom_model_id, custom_model_version_id)¶
Retrieve information about a custom model version’s dependency build
New in version v2.22.
- Parameters
- custom_model_id: str
The ID of the custom model.
- custom_model_version_id: str
The ID of the custom model version.
- Returns
- CustomModelVersionDependencyBuild
The dependency build information.
- Return type
- classmethod start_build(custom_model_id, custom_model_version_id, max_wait=600)¶
Start a dependency build for a custom model version
New in version v2.22.
- Parameters
- custom_model_id: str
The ID of the custom model
- custom_model_version_id: str
the ID of the custom model version
- max_wait: int, optional
Max time to wait for a build completion. If set to None - method will return without waiting.
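Examples
A sketch of kicking off a dependency build and checking its status; the IDs are hypothetical:
import datarobot as dr

build = dr.CustomModelVersionDependencyBuild.start_build(
    custom_model_id='5ebe96b84024035cc6a6560b',          # hypothetical IDs
    custom_model_version_id='5ebe96b84024035cc6a6560d',
    max_wait=600,  # block until the build reaches a final state
)
if build is not None:
    print(build.build_status)
    print(build.get_log())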
- Return type
Optional
[CustomModelVersionDependencyBuild
]
- get_log()¶
Get log of a custom model version dependency build.
New in version v2.22.
- Raises
- datarobot.errors.ClientError
If the server responded with 4xx status.
- datarobot.errors.ServerError
If the server responded with 5xx status.
- Return type
str
- cancel()¶
Cancel custom model version dependency build that is in progress.
New in version v2.22.
- Raises
- datarobot.errors.ClientError
If the server responded with 4xx status.
- datarobot.errors.ServerError
If the server responded with 5xx status.
- Return type
None
- refresh()¶
Update custom model version dependency build with the latest data from server.
New in version v2.22.
- Raises
- datarobot.errors.ClientError
If the server responded with 4xx status.
- datarobot.errors.ServerError
If the server responded with 5xx status.
- Return type
None
- class datarobot.ExecutionEnvironment(**kwargs)¶
An execution environment entity.
New in version v2.21.
- Attributes
- id: str
the id of the execution environment
- name: str
the name of the execution environment
- description: str, optional
the description of the execution environment
- programming_language: str, optional
the programming language of the execution environment. Can be “python”, “r”, “java” or “other”
- is_public: bool, optional
public accessibility of environment, visible only for admin user
- created_at: str, optional
ISO-8601 formatted timestamp of when the execution environment version was created
- latest_version: ExecutionEnvironmentVersion, optional
the latest version of the execution environment
- classmethod create(name, description=None, programming_language=None, required_metadata_keys=None)¶
Create an execution environment.
New in version v2.21.
- Parameters
- name: str
execution environment name
- description: str, optional
execution environment description
- programming_language: str, optional
programming language of the environment to be created. Can be “python”, “r”, “java” or “other”. Default value - “other”
- required_metadata_keys: List[RequiredMetadataKey]
Definitions of metadata keys that custom models using this environment must define
- Returns
- ExecutionEnvironment
created execution environment
- Raises
- datarobot.errors.ClientError
if the server responded with 4xx status
- datarobot.errors.ServerError
if the server responded with 5xx status
- classmethod list(search_for=None)¶
List execution environments available to the user.
New in version v2.21.
- Parameters
- search_for: str, optional
the string for filtering execution environment - only execution environments that contain the string in name or description will be returned.
- Returns
- List[ExecutionEnvironment]
a list of execution environments.
- Raises
- datarobot.errors.ClientError
if the server responded with 4xx status
- datarobot.errors.ServerError
if the server responded with 5xx status
- classmethod get(execution_environment_id)¶
Get execution environment by it’s id.
New in version v2.21.
- Parameters
- execution_environment_id: str
ID of the execution environment to retrieve
- Returns
- ExecutionEnvironment
retrieved execution environment
- Raises
- datarobot.errors.ClientError
if the server responded with 4xx status
- datarobot.errors.ServerError
if the server responded with 5xx status
- delete()¶
Delete execution environment.
New in version v2.21.
- Raises
- datarobot.errors.ClientError
if the server responded with 4xx status
- datarobot.errors.ServerError
if the server responded with 5xx status
- update(name=None, description=None, required_metadata_keys=None)¶
Update execution environment properties.
New in version v2.21.
- Parameters
- name: str, optional
new execution environment name
- description: str, optional
new execution environment description
- required_metadata_keys: List[RequiredMetadataKey]
Definitions of metadata keys that custom models using this environment must define
- Raises
- datarobot.errors.ClientError
if the server responded with 4xx status
- datarobot.errors.ServerError
if the server responded with 5xx status
- refresh()¶
Update execution environment with the latest data from server.
New in version v2.21.
- Raises
- datarobot.errors.ClientError
if the server responded with 4xx status
- datarobot.errors.ServerError
if the server responded with 5xx status
- class datarobot.ExecutionEnvironmentVersion(**kwargs)¶
A version of a DataRobot execution environment.
New in version v2.21.
- Attributes
- id: str
the id of the execution environment version
- environment_id: str
the id of the execution environment the version belongs to
- build_status: str
the status of the execution environment version build
- label: str, optional
the label of the execution environment version
- description: str, optional
the description of the execution environment version
- created_at: str, optional
ISO-8601 formatted timestamp of when the execution environment version was created
- docker_context_size: int, optional
The size of the uploaded Docker context in bytes if available or None if not
- docker_image_size: int, optional
The size of the built Docker image in bytes if available or None if not
- classmethod create(execution_environment_id, docker_context_path, label=None, description=None, max_wait=600)¶
Create an execution environment version.
New in version v2.21.
- Parameters
- execution_environment_id: str
the id of the execution environment
- docker_context_path: str
the path to a docker context archive or folder
- label: str, optional
short human readable string to label the version
- description: str, optional
execution environment version description
- max_wait: int, optional
max time to wait for a final build status (“success” or “failed”). If set to None - method will return without waiting.
- Returns
- ExecutionEnvironmentVersion
created execution environment version
- Raises
- datarobot.errors.AsyncTimeoutError
if version did not reach final state during timeout seconds
- datarobot.errors.ClientError
if the server responded with 4xx status
- datarobot.errors.ServerError
if the server responded with 5xx status
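Examples
A minimal sketch of creating an environment and building a version from a local Docker context; the names and the context path are hypothetical:
import datarobot as dr

environment = dr.ExecutionEnvironment.create(
    name='my python environment',
    programming_language='python',
)
environment_version = dr.ExecutionEnvironmentVersion.create(
    execution_environment_id=environment.id,
    docker_context_path='/path/to/docker_context.zip',   # hypothetical archive or folder
    max_wait=600,   # wait for a final build status ("success" or "failed")
)
print(environment_version.build_status)
if environment_version.build_status == 'failed':
    build_log, build_error = environment_version.get_build_log()
    print(build_error)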
- classmethod list(execution_environment_id, build_status=None)¶
List execution environment versions available to the user.
New in version v2.21.
- Parameters
- execution_environment_id: str
the id of the execution environment
- build_status: str, optional
build status of the execution environment version to filter by. See datarobot.enums.EXECUTION_ENVIRONMENT_VERSION_BUILD_STATUS for valid options
- Returns
- List[ExecutionEnvironmentVersion]
a list of execution environment versions.
- Raises
- datarobot.errors.ClientError
if the server responded with 4xx status
- datarobot.errors.ServerError
if the server responded with 5xx status
- classmethod get(execution_environment_id, version_id)¶
Get execution environment version by id.
New in version v2.21.
- Parameters
- execution_environment_id: str
the id of the execution environment
- version_id: str
the id of the execution environment version to retrieve
- Returns
- ExecutionEnvironmentVersion
retrieved execution environment version
- Raises
- datarobot.errors.ClientError
if the server responded with 4xx status.
- datarobot.errors.ServerError
if the server responded with 5xx status.
- download(file_path)¶
Download execution environment version.
New in version v2.21.
- Parameters
- file_path: str
path to create a file with execution environment version content
- Returns
- ExecutionEnvironmentVersion
retrieved execution environment version
- Raises
- datarobot.errors.ClientError
if the server responded with 4xx status.
- datarobot.errors.ServerError
if the server responded with 5xx status.
- get_build_log()¶
Get execution environment version build log and error.
New in version v2.21.
- Returns
- Tuple[str, str]
retrieved execution environment version build log and error. If there is no build error - None is returned.
- Raises
- datarobot.errors.ClientError
if the server responded with 4xx status.
- datarobot.errors.ServerError
if the server responded with 5xx status.
- refresh()¶
Update execution environment version with the latest data from server.
New in version v2.21.
- Raises
- datarobot.errors.ClientError
if the server responded with 4xx status
- datarobot.errors.ServerError
if the server responded with 5xx status
- class datarobot.models.custom_model_version.HoldoutData(dataset_id=None, dataset_version_id=None, dataset_name=None, partition_column=None)¶
Holdout data assigned to a DataRobot custom model version.
New in version v3.2.
- Attributes
- dataset_id: str
The ID of the dataset.
- dataset_version_id: str
The ID of the dataset version.
- dataset_name: str
The name of the dataset.
- partition_column: str
The name of the partition column.
- class datarobot.models.custom_model_version.TrainingData(dataset_id=None, dataset_version_id=None, dataset_name=None, assignment_in_progress=None, assignment_error=None)¶
Training data assigned to a DataRobot custom model version.
New in version v3.2.
- Attributes
- dataset_id: str
The ID of the dataset.
- dataset_version_id: str
The ID of the dataset version.
- dataset_name: str
The name of the dataset.
- assignment_in_progress: bool
Whether the training data assignment is in progress.
- assignment_error: dict
The assignment error message.
Custom Tasks¶
- class datarobot.CustomTask(id, target_type, latest_version, created_at, updated_at, name, description, language, created_by, calibrate_predictions=None)¶
A custom task. This can be in a partial state or a complete state. When the latest_version is None, the empty task has been initialized with some metadata but is not yet usable for actual training. Once the first CustomTaskVersion has been created, you can put the CustomTask in UserBlueprints to train Models in Projects
New in version v2.26.
- Attributes
- id: str
id of the custom task
- name: str
name of the custom task
- language: str
programming language of the custom task. Can be “python”, “r”, “java” or “other”
- description: str
description of the custom task
- target_type: datarobot.enums.CUSTOM_TASK_TARGET_TYPE
the target type of the custom task. One of:
datarobot.enums.CUSTOM_TASK_TARGET_TYPE.BINARY
datarobot.enums.CUSTOM_TASK_TARGET_TYPE.REGRESSION
datarobot.enums.CUSTOM_TASK_TARGET_TYPE.MULTICLASS
datarobot.enums.CUSTOM_TASK_TARGET_TYPE.ANOMALY
datarobot.enums.CUSTOM_TASK_TARGET_TYPE.TRANSFORM
- latest_version: datarobot.CustomTaskVersion or None
latest version of the custom task if the task has a latest version. If the latest version is None, the custom task is not ready for use in user blueprints. You must create its first CustomTaskVersion before you can use the CustomTask
- created_by: str
The username of the user who created the custom task.
- updated_at: str
An ISO-8601 formatted timestamp of when the custom task was updated.
- created_at: str
ISO-8601 formatted timestamp of when the custom task was created
- calibrate_predictions: bool
whether anomaly predictions should be calibrated to be between 0 and 1 by DR. Only applies to custom estimators with target type datarobot.enums.CUSTOM_TASK_TARGET_TYPE.ANOMALY
- classmethod from_server_data(data, keep_attrs=None)¶
Instantiate an object of this class using the data directly from the server, meaning that the keys may have the wrong camel casing
- Parameters
- datadict
The directly translated dict of JSON from the server. No casing fixes have taken place
- keep_attrsiterable
List, set or tuple of the dotted namespace notations for attributes to keep within the object structure even if their values are None
- Return type
- classmethod list(order_by=None, search_for=None)¶
List custom tasks available to the user.
New in version v2.26.
- Parameters
- search_for: str, optional
string for filtering custom tasks - only tasks that contain the string in name or description will be returned. If not specified, all custom tasks will be returned
- order_by: str, optional
property to sort custom tasks by. Supported properties are “created” and “updated”. Prefix the attribute name with a dash to sort in descending order, e.g. order_by=’-created’. By default, the order_by parameter is None which will result in custom tasks being returned in order of creation time descending
- Returns
- List[CustomTask]
a list of custom tasks.
- Raises
- datarobot.errors.ClientError
if the server responded with 4xx status
- datarobot.errors.ServerError
if the server responded with 5xx status
- Return type
List
[CustomTask
]
- classmethod get(custom_task_id)¶
Get custom task by id.
New in version v2.26.
- Parameters
- custom_task_id: str
id of the custom task
- Returns
- CustomTask
retrieved custom task
- Raises
- datarobot.errors.ClientError
if the server responded with 4xx status.
- datarobot.errors.ServerError
if the server responded with 5xx status.
- Return type
- classmethod copy(custom_task_id)¶
Create a custom task by copying existing one.
New in version v2.26.
- Parameters
- custom_task_id: str
id of the custom task to copy
- Returns
- CustomTask
- Raises
- datarobot.errors.ClientError
if the server responded with 4xx status
- datarobot.errors.ServerError
if the server responded with 5xx status
- Return type
- classmethod create(name, target_type, language=None, description=None, calibrate_predictions=None, **kwargs)¶
Creates only the metadata for a custom task. This task will not be usable until you have created a CustomTaskVersion attached to this task.
New in version v2.26.
- Parameters
- name: str
name of the custom task
- target_type: datarobot.enums.CUSTOM_TASK_TARGET_TYPE
the target type, based on the following values. Anything else will raise an error
datarobot.enums.CUSTOM_TASK_TARGET_TYPE.BINARY
datarobot.enums.CUSTOM_TASK_TARGET_TYPE.REGRESSION
datarobot.enums.CUSTOM_TASK_TARGET_TYPE.MULTICLASS
datarobot.enums.CUSTOM_TASK_TARGET_TYPE.ANOMALY
datarobot.enums.CUSTOM_TASK_TARGET_TYPE.TRANSFORM
- language: str, optional
programming language of the custom task. Can be “python”, “r”, “java” or “other”
- description: str, optional
description of the custom task
- calibrate_predictions: bool, optional
whether anomaly predictions should be calibrated to be between 0 and 1 by DR. If None, uses the default value from the DR app (True). Only applies to custom estimators with target type datarobot.enums.CUSTOM_TASK_TARGET_TYPE.ANOMALY
- Returns
- CustomTask
- Raises
- datarobot.errors.ClientError
if the server responded with 4xx status.
- datarobot.errors.ServerError
if the server responded with 5xx status.
- Return type
CustomTask
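A minimal sketch (not part of the original reference) of creating the task metadata, assuming a configured client; the name and description are placeholders:
import datarobot as dr
from datarobot.enums import CUSTOM_TASK_TARGET_TYPE

# Creates metadata only -- a CustomTaskVersion must be added before the task is usable
task = dr.CustomTask.create(
    name="Example regression estimator",   # placeholder name
    target_type=CUSTOM_TASK_TARGET_TYPE.REGRESSION,
    language="python",
    description="Created from the Python client",
)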
- update(name=None, language=None, description=None, **kwargs)¶
Update custom task properties.
New in version v2.26.
- Parameters
- name: str, optional
new custom task name
- language: str, optional
new custom task programming language
- description: str, optional
new custom task description
- Raises
- datarobot.errors.ClientError
if the server responded with 4xx status.
- datarobot.errors.ServerError
if the server responded with 5xx status.
- Return type
None
- refresh()¶
Update custom task with the latest data from server.
New in version v2.26.
- Raises
- datarobot.errors.ClientError
if the server responded with 4xx status
- datarobot.errors.ServerError
if the server responded with 5xx status
- Return type
None
- delete()¶
Delete custom task.
New in version v2.26.
- Raises
- datarobot.errors.ClientError
if the server responded with 4xx status
- datarobot.errors.ServerError
if the server responded with 5xx status
- Return type
None
- download_latest_version(file_path)¶
Download the latest custom task version.
New in version v2.26.
- Parameters
- file_path: str
the full path of the target zip file
- Raises
- datarobot.errors.ClientError
if the server responded with 4xx status.
- datarobot.errors.ServerError
if the server responded with 5xx status.
- Return type
None
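For example, a minimal sketch (not part of the original reference), assuming a configured client and a placeholder task id:
import datarobot as dr

task = dr.CustomTask.get("custom-task-id")              # placeholder id
task.download_latest_version("/tmp/custom_task_latest.zip")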
- get_access_list()¶
Retrieve access control settings of this custom task.
New in version v2.27.
- Returns
- list of SharingAccess
- Return type
List
[SharingAccess
]
- share(access_list)¶
Update the access control settings of this custom task.
New in version v2.27.
- Parameters
- access_listlist of SharingAccess
A list of SharingAccess to update.
- Raises
- datarobot.errors.ClientError
if the server responded with 4xx status
- datarobot.errors.ServerError
if the server responded with 5xx status
Examples
Transfer access to the custom task from old_user@datarobot.com to new_user@datarobot.com
import datarobot as dr

new_access = dr.SharingAccess("new_user@datarobot.com", dr.enums.SHARING_ROLE.OWNER, can_share=True)
access_list = [dr.SharingAccess("old_user@datarobot.com", None), new_access]
dr.CustomTask.get('custom-task-id').share(access_list)
- Return type
None
- class datarobot.models.custom_task_version.CustomTaskFileItem(id, file_name, file_path, file_source, created_at=None)¶
A file item attached to a DataRobot custom task version.
New in version v2.26.
- Attributes
- id: str
id of the file item
- file_name: str
name of the file item
- file_path: str
path of the file item
- file_source: str
source of the file item
- created_at: str
ISO-8601 formatted timestamp of when the version was created
- class datarobot.CustomTaskVersion(id, custom_task_id, version_major, version_minor, label, created_at, is_frozen, items, description=None, base_environment_id=None, maximum_memory=None, base_environment_version_id=None, dependencies=None, required_metadata_values=None, arguments=None)¶
A version of a DataRobot custom task.
New in version v2.26.
- Attributes
- id: str
id of the custom task version
- custom_task_id: str
id of the custom task
- version_minor: int
a minor version number of custom task version
- version_major: int
a major version number of custom task version
- label: str
short human readable string to label the version
- created_at: str
ISO-8601 formatted timestamp of when the version was created
- is_frozen: bool
a flag if the custom task version is frozen
- items: List[CustomTaskFileItem]
a list of file items attached to the custom task version
- description: str, optional
custom task version description
- base_environment_id: str, optional
id of the environment to use with the task
- base_environment_version_id: str, optional
id of the environment version to use with the task
- dependencies: List[CustomDependency]
the parsed dependencies of the custom task version if the version has a valid requirements.txt file
- required_metadata_values: List[RequiredMetadataValue]
Additional parameters required by the execution environment. The required keys are defined by the fieldNames in the base environment’s requiredMetadataKeys.
- arguments: List[UserBlueprintTaskArgument]
A list of custom task version arguments.
- classmethod from_server_data(data, keep_attrs=None)¶
Instantiate an object of this class using the data directly from the server, meaning that the keys may have the wrong camel casing
- Parameters
- datadict
The directly translated dict of JSON from the server. No casing fixes have taken place
- keep_attrsiterable
List, set or tuple of the dotted namespace notations for attributes to keep within the object structure even if their values are None
- classmethod create_clean(custom_task_id, base_environment_id, maximum_memory=None, is_major_update=True, folder_path=None, required_metadata_values=None)¶
Create a custom task version without files from previous versions.
New in version v2.26.
- Parameters
- custom_task_id: str
the id of the custom task
- base_environment_id: str
the id of the base environment to use with the custom task version
- is_major_update: bool, optional
if the current version is 2.3, True would set the new version to 3.0, while False would set it to 2.4. Defaults to True
- folder_path: str, optional
the path to a folder containing files to be uploaded. Each file in the folder is uploaded under a path relative to the folder path
- required_metadata_values: List[RequiredMetadataValue]
Additional parameters required by the execution environment. The required keys are defined by the fieldNames in the base environment’s requiredMetadataKeys.
- maximum_memory: int
The maximum amount of memory, in bytes, that the custom task's inference containers may run with.
- Returns
- CustomTaskVersion
created custom task version
- Raises
- datarobot.errors.ClientError
if the server responded with 4xx status
- datarobot.errors.ServerError
if the server responded with 5xx status
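A minimal sketch (not part of the original reference) of creating a clean version from a local folder, assuming a configured client; the ids and folder path are placeholders:
import datarobot as dr

version = dr.CustomTaskVersion.create_clean(
    custom_task_id="custom-task-id",            # placeholder id
    base_environment_id="base-environment-id",  # placeholder id
    folder_path="./my_custom_task/",            # files are uploaded relative to this folder
    is_major_update=True,
)
print(version.version_major, version.version_minor)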
- classmethod create_from_previous(custom_task_id, base_environment_id, is_major_update=True, folder_path=None, files_to_delete=None, required_metadata_values=None, maximum_memory=None)¶
Create a custom task version containing files from a previous version.
New in version v2.26.
- Parameters
- custom_task_id: str
the id of the custom task
- base_environment_id: str
the id of the base environment to use with the custom task version
- is_major_update: bool, optional
if the current version is 2.3, True would set the new version to 3.0, while False would set it to 2.4. Defaults to True
- folder_path: str, optional
the path to a folder containing files to be uploaded. Each file in the folder is uploaded under a path relative to the folder path
- files_to_delete: list, optional
the list of file item ids to be deleted. Example: [“5ea95f7a4024030aba48e4f9”, “5ea6b5da402403181895cc51”]
- required_metadata_values: List[RequiredMetadataValue]
Additional parameters required by the execution environment. The required keys are defined by the fieldNames in the base environment’s requiredMetadataKeys.
- maximum_memory: int
The maximum amount of memory, in bytes, that the custom task's inference containers may run with.
- Returns
- CustomTaskVersion
created custom task version
- Raises
- datarobot.errors.ClientError
if the server responded with 4xx status
- datarobot.errors.ServerError
if the server responded with 5xx status
- classmethod list(custom_task_id)¶
List custom task versions.
New in version v2.26.
- Parameters
- custom_task_id: str
the id of the custom task
- Returns
- List[CustomTaskVersion]
a list of custom task versions
- Raises
- datarobot.errors.ClientError
if the server responded with 4xx status
- datarobot.errors.ServerError
if the server responded with 5xx status
- classmethod get(custom_task_id, custom_task_version_id)¶
Get custom task version by id.
New in version v2.26.
- Parameters
- custom_task_id: str
the id of the custom task
- custom_task_version_id: str
the id of the custom task version to retrieve
- Returns
- CustomTaskVersion
retrieved custom task version
- Raises
- datarobot.errors.ClientError
if the server responded with 4xx status.
- datarobot.errors.ServerError
if the server responded with 5xx status.
- download(file_path)¶
Download custom task version.
New in version v2.26.
- Parameters
- file_path: str
path to create a file with custom task version content
- Raises
- datarobot.errors.ClientError
if the server responded with 4xx status.
- datarobot.errors.ServerError
if the server responded with 5xx status.
- update(description=None, required_metadata_values=None)¶
Update custom task version properties.
New in version v2.26.
- Parameters
- description: str
new custom task version description
- required_metadata_values: List[RequiredMetadataValue]
Additional parameters required by the execution environment. The required keys are defined by the fieldNames in the base environment’s requiredMetadataKeys.
- Raises
- datarobot.errors.ClientError
if the server responded with 4xx status.
- datarobot.errors.ServerError
if the server responded with 5xx status.
- refresh()¶
Update custom task version with the latest data from server.
New in version v2.26.
- Raises
- datarobot.errors.ClientError
if the server responded with 4xx status
- datarobot.errors.ServerError
if the server responded with 5xx status
- start_dependency_build()¶
Start the dependency build for a custom task version and return build status.
New in version v2.27.
- Returns
- CustomTaskVersionDependencyBuild
DTO of custom task version dependency build.
- start_dependency_build_and_wait(max_wait)¶
Start the dependency build for a custom task version and wait while polling its status.
New in version v2.27.
- Parameters
- max_wait: int
max time to wait for a build completion
- Returns
- CustomTaskVersionDependencyBuild
DTO of custom task version dependency build.
- Raises
- datarobot.errors.ClientError
if the server responded with 4xx status
- datarobot.errors.ServerError
if the server responded with 5xx status
- datarobot.errors.AsyncTimeoutError
Raised if the dependency build is not finished after max_wait.
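For example, a minimal sketch (not part of the original reference), assuming a configured client and placeholder ids:
import datarobot as dr

version = dr.CustomTaskVersion.get("custom-task-id", "custom-task-version-id")  # placeholder ids
# Blocks for up to max_wait seconds while the dependency image is built
build = version.start_dependency_build_and_wait(max_wait=600)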
- cancel_dependency_build()¶
Cancel a custom task version dependency build that is in progress.
New in version v2.27.
- Raises
- datarobot.errors.ClientError
if the server responded with 4xx status
- datarobot.errors.ServerError
if the server responded with 5xx status
- get_dependency_build()¶
Retrieve information about a custom task version’s dependency build.
New in version v2.27.
- Returns
- CustomTaskVersionDependencyBuild
DTO of custom task version dependency build.
- download_dependency_build_log(file_directory='.')¶
Get the log of a custom task version dependency build.
New in version v2.27.
- Parameters
- file_directory: str (optional, default is “.”)
Directory path where the downloaded file will be saved.
- Raises
- datarobot.errors.ClientError
if the server responded with 4xx status
- datarobot.errors.ServerError
if the server responded with 5xx status
Database Connectivity¶
- class datarobot.DataDriver(id=None, creator=None, base_names=None, class_name=None, canonical_name=None)¶
A data driver
- Attributes
- idstr
the id of the driver.
- class_namestr
the Java class name for the driver.
- canonical_namestr
the user-friendly name of the driver.
- creatorstr
the id of the user who created the driver.
- base_nameslist of str
a list of the file name(s) of the jar files.
- classmethod list()¶
Returns list of available drivers.
- Returns
- driverslist of DataDriver instances
contains a list of available drivers.
Examples
>>> import datarobot as dr
>>> drivers = dr.DataDriver.list()
>>> drivers
[DataDriver('mysql'), DataDriver('RedShift'), DataDriver('PostgreSQL')]
- Return type
List
[DataDriver
]
- classmethod get(driver_id)¶
Gets the driver.
- Parameters
- driver_idstr
the identifier of the driver.
- Returns
- driverDataDriver
the required driver.
Examples
>>> import datarobot as dr
>>> driver = dr.DataDriver.get('5ad08a1889453d0001ea7c5c')
>>> driver
DataDriver('PostgreSQL')
- Return type
DataDriver
- classmethod create(class_name, canonical_name, files)¶
Creates the driver. Only available to admin users.
- Parameters
- class_namestr
the Java class name for the driver.
- canonical_namestr
the user-friendly name of the driver.
- fileslist of str
a list of the file paths, on the local file system, to the driver's jar file(s).
- Returns
- driverDataDriver
the created driver.
- Raises
- ClientError
raised if the user is not granted the Can manage JDBC database drivers permission
Examples
>>> import datarobot as dr
>>> driver = dr.DataDriver.create(
...     class_name='org.postgresql.Driver',
...     canonical_name='PostgreSQL',
...     files=['/tmp/postgresql-42.2.2.jar']
... )
>>> driver
DataDriver('PostgreSQL')
- Return type
DataDriver
- update(class_name=None, canonical_name=None)¶
Updates the driver. Only available to admin users.
- Parameters
- class_namestr
the Java class name for the driver.
- canonical_namestr
the user-friendly name of the driver.
- Raises
- ClientError
raised if the user is not granted the Can manage JDBC database drivers permission
Examples
>>> import datarobot as dr
>>> driver = dr.DataDriver.get('5ad08a1889453d0001ea7c5c')
>>> driver.canonical_name
'PostgreSQL'
>>> driver.update(canonical_name='postgres')
>>> driver.canonical_name
'postgres'
- Return type
None
- delete()¶
Removes the driver. Only available to admin users.
- Raises
- ClientError
raised if the user is not granted the Can manage JDBC database drivers permission
- Return type
None
- class datarobot.Connector(id=None, creator_id=None, configuration_id=None, base_name=None, canonical_name=None)¶
A connector
- Attributes
- idstr
the id of the connector.
- creator_idstr
the id of the user who created the connector.
- base_namestr
the file name of the jar file.
- canonical_namestr
the user-friendly name of the connector.
- configuration_idstr
the id of the configuration of the connector.
- classmethod list()¶
Returns list of available connectors.
- Returns
- connectorslist of Connector instances
contains a list of available connectors.
Examples
>>> import datarobot as dr
>>> connectors = dr.Connector.list()
>>> connectors
[Connector('ADLS Gen2 Connector'), Connector('S3 Connector')]
- Return type
List
[Connector
]
- classmethod get(connector_id)¶
Gets the connector.
- Parameters
- connector_idstr
the identifier of the connector.
- Returns
- connectorConnector
the required connector.
Examples
>>> import datarobot as dr
>>> connector = dr.Connector.get('5fe1063e1c075e0245071446')
>>> connector
Connector('ADLS Gen2 Connector')
- Return type
Connector
- classmethod create(file_path)¶
Creates the connector from a jar file. Only available to admin users.
- Parameters
- file_pathstr
the file path, on the local file system, to the connector's jar file.
- Returns
- connectorConnector
the created connector.
- Raises
- ClientError
raised if the user is not granted the Can manage connectors permission
Examples
>>> import datarobot as dr
>>> connector = dr.Connector.create('/tmp/connector-adls-gen2.jar')
>>> connector
Connector('ADLS Gen2 Connector')
- Return type
Connector
- update(file_path)¶
Updates the connector with new jar file. Only available to admin users.
- Parameters
- file_pathstr
the file path, on the local file system, to the connector's jar file.
- Returns
- connectorConnector
the updated connector.
- Raises
- ClientError
raised if the user is not granted the Can manage connectors permission
Examples
>>> import datarobot as dr
>>> connector = dr.Connector.get('5fe1063e1c075e0245071446')
>>> connector.base_name
'connector-adls-gen2.jar'
>>> connector.update('/tmp/connector-s3.jar')
>>> connector.base_name
'connector-s3.jar'
- Return type
Connector
- delete()¶
Removes the connector. Only available to admin users.
- Raises
- ClientError
raised if the user is not granted the Can manage connectors permission
- Return type
None
- class datarobot.DataStore(data_store_id=None, data_store_type=None, canonical_name=None, creator=None, updated=None, params=None, role=None)¶
A data store. Represents a database.
- Attributes
- idstr
The id of the data store.
- data_store_typestr
The type of data store.
- canonical_namestr
The user-friendly name of the data store.
- creatorstr
The id of the user who created the data store.
- updateddatetime.datetime
The time of the last update
- paramsDataStoreParameters
A list specifying data store parameters.
- rolestr
Your access role for this data store.
- classmethod list()¶
Returns list of available data stores.
- Returns
- data_storeslist of DataStore instances
contains a list of available data stores.
Examples
>>> import datarobot as dr
>>> data_stores = dr.DataStore.list()
>>> data_stores
[DataStore('Demo'), DataStore('Airlines')]
- Return type
List
[DataStore
]
- classmethod get(data_store_id)¶
Gets the data store.
- Parameters
- data_store_idstr
the identifier of the data store.
- Returns
- data_storeDataStore
the required data store.
Examples
>>> import datarobot as dr
>>> data_store = dr.DataStore.get('5a8ac90b07a57a0001be501e')
>>> data_store
DataStore('Demo')
- Return type
DataStore
- classmethod create(data_store_type, canonical_name, driver_id, jdbc_url)¶
Creates the data store.
- Parameters
- data_store_typestr
the type of data store.
- canonical_namestr
the user-friendly name of the data store.
- driver_idstr
the identifier of the DataDriver.
- jdbc_urlstr
the full JDBC url, for example jdbc:postgresql://my.dbaddress.org:5432/my_db.
- Returns
- data_storeDataStore
the created data store.
Examples
>>> import datarobot as dr
>>> data_store = dr.DataStore.create(
...     data_store_type='jdbc',
...     canonical_name='Demo DB',
...     driver_id='5a6af02eb15372000117c040',
...     jdbc_url='jdbc:postgresql://my.db.address.org:5432/perftest'
... )
>>> data_store
DataStore('Demo DB')
- Return type
DataStore
- update(canonical_name=None, driver_id=None, jdbc_url=None)¶
Updates the data store.
- Parameters
- canonical_namestr
optional, the user-friendly name of the data store.
- driver_idstr
optional, the identifier of the DataDriver.
- jdbc_urlstr
optional, the full JDBC url, for example jdbc:postgresql://my.dbaddress.org:5432/my_db.
Examples
>>> import datarobot as dr
>>> data_store = dr.DataStore.get('5ad5d2afef5cd700014d3cae')
>>> data_store
DataStore('Demo DB')
>>> data_store.update(canonical_name='Demo DB updated')
>>> data_store
DataStore('Demo DB updated')
- Return type
None
- delete()¶
Removes the DataStore
- Return type
None
- test(username=None, password=None, credential_id=None, use_kerberos=None, credential_data=None)¶
Tests database connection.
Changed in version v3.2: Added credential_id, use_kerberos and credential_data optional params and made username and password optional.
- Parameters
- usernamestr
optional, the username for database authentication.
- passwordstr
optional, the password for database authentication. The password is encrypted at server side and never saved / stored
- credential_idstr
optional, id of the set of credentials to use instead of username and password
- use_kerberosbool
optional, whether to use Kerberos for data store authentication
- credential_datadict
optional, the credentials to authenticate with the database, to use instead of user/password or credential ID
- Returns
- messagedict
message with status.
Examples
>>> import datarobot as dr
>>> data_store = dr.DataStore.get('5ad5d2afef5cd700014d3cae')
>>> data_store.test(username='db_username', password='db_password')
{'message': 'Connection successful'}
- Return type
- schemas(username, password)¶
Returns list of available schemas.
- Parameters
- usernamestr
the username for database authentication.
- passwordstr
the password for database authentication. The password is encrypted at server side and never saved / stored
- Returns
- responsedict
dict with database name and list of str - available schemas
Examples
>>> import datarobot as dr
>>> data_store = dr.DataStore.get('5ad5d2afef5cd700014d3cae')
>>> data_store.schemas(username='db_username', password='db_password')
{'catalog': 'perftest', 'schemas': ['demo', 'information_schema', 'public']}
- Return type
- tables(username, password, schema=None)¶
Returns list of available tables in schema.
- Parameters
- usernamestr
the username for database authentication.
- passwordstr
the password for database authentication. The password is encrypted at server side and never saved / stored
- schemastr
optional, the schema name.
- Returns
- responsedict
dict with catalog name and tables info
Examples
>>> import datarobot as dr
>>> data_store = dr.DataStore.get('5ad5d2afef5cd700014d3cae')
>>> data_store.tables(username='db_username', password='db_password', schema='demo')
{'tables': [{'type': 'TABLE', 'name': 'diagnosis', 'schema': 'demo'}, {'type': 'TABLE', 'name': 'kickcars', 'schema': 'demo'}, {'type': 'TABLE', 'name': 'patient', 'schema': 'demo'}, {'type': 'TABLE', 'name': 'transcript', 'schema': 'demo'}], 'catalog': 'perftest'}
- Return type
- classmethod from_server_data(data, keep_attrs=None)¶
Instantiate an object of this class using the data directly from the server, meaning that the keys may have the wrong camel casing
- Parameters
- datadict
The directly translated dict of JSON from the server. No casing fixes have taken place
- keep_attrsiterable
List, set or tuple of the dotted namespace notations for attributes to keep within the object structure even if their values are None
- Return type
- get_access_list()¶
Retrieve what users have access to this data store
New in version v2.14.
- Returns
- list of SharingAccess
- Return type
List
[SharingAccess
]
Retrieve what users have access to this data store
New in version v3.2.
- Returns
- list of SharingRole
- Return type
List
[SharingRole
]
Modify the ability of users to access this data store
New in version v2.14.
- Parameters
- access_listlist of SharingRole
the modifications to make.
- Raises
- datarobot.ClientError
if you do not have permission to share this data store, if the user you’re sharing with doesn’t exist, if the same user appears multiple times in the access_list, or if these changes would leave the data store without an owner.
Examples
The SharingRole class is needed in order to share a Data Store with one or more users. For example, suppose you had a list of user IDs you wanted to share this Data Store with. You could use a loop to generate a list of SharingRole objects for them, and bulk share this Data Store.
>>> import datarobot as dr
>>> from datarobot.models.sharing import SharingRole
>>> from datarobot.enums import SHARING_ROLE, SHARING_RECIPIENT_TYPE
>>>
>>> user_ids = ["60912e09fd1f04e832a575c1", "639ce542862e9b1b1bfa8f1b", "63e185e7cd3a5f8e190c6393"]
>>> sharing_roles = []
>>> for user_id in user_ids:
...     new_sharing_role = SharingRole(
...         role=SHARING_ROLE.CONSUMER,
...         share_recipient_type=SHARING_RECIPIENT_TYPE.USER,
...         id=user_id,
...         can_share=True,
...     )
...     sharing_roles.append(new_sharing_role)
>>> dr.DataStore.get('my-data-store-id').share(sharing_roles)
Similarly, a SharingRole instance can be used to remove a user’s access if the role is set to SHARING_ROLE.NO_ROLE, like in this example:
>>> import datarobot as dr
>>> from datarobot.models.sharing import SharingRole
>>> from datarobot.enums import SHARING_ROLE, SHARING_RECIPIENT_TYPE
>>>
>>> user_to_remove = "user.to.remove@example.com"
>>> remove_sharing_role = SharingRole(
...     role=SHARING_ROLE.NO_ROLE,
...     share_recipient_type=SHARING_RECIPIENT_TYPE.USER,
...     username=user_to_remove,
...     can_share=False,
... )
>>> dr.DataStore.get('my-data-store-id').share(roles=[remove_sharing_role])
- Return type
None
- class datarobot.DataSource(data_source_id=None, data_source_type=None, canonical_name=None, creator=None, updated=None, params=None, role=None)¶
A data source. Represents a data request.
- Attributes
- idstr
the id of the data source.
- typestr
the type of data source.
- canonical_namestr
the user-friendly name of the data source.
- creatorstr
the id of the user who created the data source.
- updateddatetime.datetime
the time of the last update.
- paramsDataSourceParameters
a list specifying data source parameters.
- rolestr or None
if a string, represents a particular level of access and should be one of datarobot.enums.SHARING_ROLE. For more information on the specific access levels, see the sharing documentation. If None, can be passed to a share function to revoke access for a specific user.
- classmethod list()¶
Returns list of available data sources.
- Returns
- data_sourceslist of DataSource instances
contains a list of available data sources.
Examples
>>> import datarobot as dr
>>> data_sources = dr.DataSource.list()
>>> data_sources
[DataSource('Diagnostics'), DataSource('Airlines 100mb'), DataSource('Airlines 10mb')]
- Return type
List
[DataSource
]
- classmethod get(data_source_id)¶
Gets the data source.
- Parameters
- data_source_idstr
the identifier of the data source.
- Returns
- data_sourceDataSource
the requested data source.
Examples
>>> import datarobot as dr
>>> data_source = dr.DataSource.get('5a8ac9ab07a57a0001be501f')
>>> data_source
DataSource('Diagnostics')
- Return type
TypeVar
(TDataSource
, bound=DataSource
)
- classmethod create(data_source_type, canonical_name, params)¶
Creates the data source.
- Parameters
- data_source_typestr
the type of data source.
- canonical_namestr
the user-friendly name of the data source.
- paramsDataSourceParameters
a list specifying data source parameters.
- Returns
- data_sourceDataSource
the created data source.
Examples
>>> import datarobot as dr
>>> params = dr.DataSourceParameters(
...     data_store_id='5a8ac90b07a57a0001be501e',
...     query='SELECT * FROM airlines10mb WHERE "Year" >= 1995;'
... )
>>> data_source = dr.DataSource.create(
...     data_source_type='jdbc',
...     canonical_name='airlines stats after 1995',
...     params=params
... )
>>> data_source
DataSource('airlines stats after 1995')
- Return type
TypeVar
(TDataSource
, bound=DataSource
)
- update(canonical_name=None, params=None)¶
Updates the data source.
- Parameters
- canonical_namestr
optional, the user-friendly name of the data source.
- paramsDataSourceParameters
optional, the updated data source parameters.
Examples
>>> import datarobot as dr
>>> data_source = dr.DataSource.get('5ad840cc613b480001570953')
>>> data_source
DataSource('airlines stats after 1995')
>>> params = dr.DataSourceParameters(
...     query='SELECT * FROM airlines10mb WHERE "Year" >= 1990;'
... )
>>> data_source.update(
...     canonical_name='airlines stats after 1990',
...     params=params
... )
>>> data_source
DataSource('airlines stats after 1990')
- Return type
None
- delete()¶
Removes the DataSource
- Return type
None
- classmethod from_server_data(data, keep_attrs=None)¶
Instantiate an object of this class using the data directly from the server, meaning that the keys may have the wrong camel casing
- Parameters
- datadict
The directly translated dict of JSON from the server. No casing fixes have taken place
- keep_attrsiterable
List, set or tuple of the dotted namespace notations for attributes to keep within the object structure even if their values are None
- Return type
TypeVar
(TDataSource
, bound=DataSource
)
- get_access_list()¶
Retrieve what users have access to this data source
New in version v2.14.
- Returns
- list of SharingAccess
- Return type
List
[SharingAccess
]
- share(access_list)¶
Modify the ability of users to access this data source
New in version v2.14.
- Parameters
- access_list: list of SharingAccess
The modifications to make.
- Raises
- datarobot.ClientError:
If you do not have permission to share this data source, if the user you’re sharing with doesn’t exist, if the same user appears multiple times in the access_list, or if these changes would leave the data source without an owner.
Examples
Transfer access to the data source from old_user@datarobot.com to new_user@datarobot.com
from datarobot.enums import SHARING_ROLE
from datarobot.models.data_source import DataSource
from datarobot.models.sharing import SharingAccess

new_access = SharingAccess(
    "new_user@datarobot.com",
    SHARING_ROLE.OWNER,
    can_share=True,
)
access_list = [
    SharingAccess("old_user@datarobot.com", SHARING_ROLE.OWNER, can_share=True),
    new_access,
]
DataSource.get('my-data-source-id').share(access_list)
- Return type
None
- create_dataset(username=None, password=None, do_snapshot=None, persist_data_after_ingestion=None, categories=None, credential_id=None, use_kerberos=None)¶
Create a Dataset from this data source.
New in version v2.22.
- Parameters
- username: string, optional
The username for database authentication.
- password: string, optional
The password (in cleartext) for database authentication. The password will be encrypted on the server side in scope of HTTP request and never saved or stored.
- do_snapshot: bool, optional
If unset, uses the server default: True. If true, creates a snapshot dataset; if false, creates a remote dataset. Creating snapshots from non-file sources requires an additional permission, Enable Create Snapshot Data Source.
- persist_data_after_ingestion: bool, optional
If unset, uses the server default: True. If true, will enforce saving all data (for download and sampling) and will allow a user to view extended data profile (which includes data statistics like min/max/median/mean, histogram, etc.). If false, will not enforce saving data. The data schema (feature names and types) still will be available. Specifying this parameter to false and doSnapshot to true will result in an error.
- categories: list[string], optional
An array of strings describing the intended use of the dataset. The current supported options are “TRAINING” and “PREDICTION”.
- credential_id: string, optional
The ID of the set of credentials to use instead of user and password. Note that with this change, username and password will become optional.
- use_kerberos: bool, optional
If unset, uses the server default: False. If true, use kerberos authentication for database authentication.
- Returns
- response: Dataset
The Dataset created from the uploaded data
- Return type
Dataset
- class datarobot.DataSourceParameters(data_store_id=None, table=None, schema=None, partition_column=None, query=None, fetch_size=None)¶
Data request configuration
- Attributes
- data_store_idstr
the id of the DataStore.
- tablestr
optional, the name of specified database table.
- schemastr
optional, the name of the schema associated with the table.
- partition_columnstr
optional, the name of the partition column.
- querystr
optional, the user specified SQL query.
- fetch_sizeint
optional, a user specified fetch size in the range [1, 20000]. By default a fetchSize will be assigned to balance throughput and memory usage
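A minimal sketch (not part of the original reference) contrasting table-based and query-based parameters, assuming a configured client and a placeholder data store id:
import datarobot as dr

# Table-based parameters: DataRobot reads the named table in the given schema
table_params = dr.DataSourceParameters(
    data_store_id="5a8ac90b07a57a0001be501e",   # placeholder id
    schema="demo",
    table="kickcars",
    fetch_size=10000,
)

# Query-based parameters: DataRobot runs the supplied SQL instead
query_params = dr.DataSourceParameters(
    data_store_id="5a8ac90b07a57a0001be501e",   # placeholder id
    query='SELECT * FROM demo.kickcars WHERE "Year" >= 2005;',
)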
Datasets¶
- class datarobot.models.Dataset(dataset_id, version_id, name, categories, created_at, is_data_engine_eligible, is_latest_version, is_snapshot, processing_state, created_by=None, data_persisted=None, size=None, row_count=None, recipe_id=None)¶
Represents a Dataset returned from the api/v2/datasets/ endpoints.
- Attributes
- id: string
The ID of this dataset
- name: string
The name of this dataset in the catalog
- is_latest_version: bool
Whether this dataset version is the latest version of this dataset
- version_id: string
The object ID of the catalog_version the dataset belongs to
- categories: list(string)
An array of strings describing the intended use of the dataset. The supported options are “TRAINING” and “PREDICTION”.
- created_at: string
The date when the dataset was created
- created_by: string, optional
Username of the user who created the dataset
- is_snapshot: bool
Whether the dataset version is an immutable snapshot of data which has previously been retrieved and saved to DataRobot
- data_persisted: bool, optional
If true, user is allowed to view extended data profile (which includes data statistics like min/max/median/mean, histogram, etc.) and download data. If false, download is not allowed and only the data schema (feature names and types) will be available.
- is_data_engine_eligible: bool
Whether this dataset can be a data source of a data engine query.
- processing_state: string
Current ingestion process state of the dataset
- row_count: int, optional
The number of rows in the dataset.
- size: int, optional
The size of the dataset as a CSV in bytes.
- get_uri()¶
- Returns
- urlstr
Permanent static hyperlink to this dataset in AI Catalog.
- Return type
str
- classmethod upload(source)¶
This method covers Dataset creation from local materials (file & DataFrame) and a URL.
- Parameters
- source: str, pd.DataFrame or file object
Pass a URL, filepath, file or DataFrame to create and return a Dataset.
- Returns
- response: Dataset
The Dataset created from the uploaded data source.
- Raises
- InvalidUsageError
If the source parameter cannot be determined to be a URL, filepath, file or DataFrame.
Examples
# Upload a local file
dataset_one = Dataset.upload("./data/examples.csv")

# Create a dataset via URL
dataset_two = Dataset.upload(
    "https://raw.githubusercontent.com/curran/data/gh-pages/dbpedia/cities/data.csv"
)

# Create dataset with a pandas Dataframe
dataset_three = Dataset.upload(my_df)

# Create dataset using a local file
with open("./data/examples.csv", "rb") as file_pointer:
    dataset_four = Dataset.create_from_file(filelike=file_pointer)
- Return type
TypeVar
(TDataset
, bound=Dataset
)
- classmethod create_from_file(cls, file_path=None, filelike=None, categories=None, read_timeout=600, max_wait=600, *, use_cases=None)¶
A blocking call that creates a new Dataset from a file. Returns when the dataset has been successfully uploaded and processed.
Warning: This function does not clean up its open files. If you pass a filelike, you are responsible for closing it. If you pass a file_path, this will create a file object from the file_path but will not close it.
- Parameters
- file_path: string, optional
The path to the file. This will create a file object pointing to that file but will not close it.
- filelike: file, optional
An open and readable file object.
- categories: list[string], optional
An array of strings describing the intended use of the dataset. The current supported options are “TRAINING” and “PREDICTION”.
- read_timeout: int, optional
The maximum number of seconds to wait for the server to respond indicating that the initial upload is complete
- max_wait: int, optional
Time in seconds after which dataset creation is considered unsuccessful
- use_cases: list[UseCase] | UseCase | list[string] | string, optional
A list of UseCase objects, UseCase object, list of Use Case ids or a single Use Case id to add this new Dataset to. Must be a kwarg.
- Returns
- response: Dataset
A fully armed and operational Dataset
- Return type
TypeVar
(TDataset
, bound=Dataset
)
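For example, a minimal sketch (not part of the original reference), assuming a configured client and a placeholder local path:
from datarobot.models.dataset import Dataset

# Blocks until the upload finishes and the dataset is processed
dataset = Dataset.create_from_file(
    file_path="./data/examples.csv",   # placeholder path
    categories=["TRAINING"],
)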
- classmethod create_from_in_memory_data(cls, data_frame=None, records=None, categories=None, read_timeout=600, max_wait=600, fname=None, *, use_cases=None)¶
A blocking call that creates a new Dataset from in-memory data. Returns when the dataset has been successfully uploaded and processed.
The data can be either a pandas DataFrame or a list of dictionaries with identical keys.
- Parameters
- data_frame: DataFrame, optional
The data frame to upload
- records: list[dict], optional
A list of dictionaries with identical keys to upload
- categories: list[string], optional
An array of strings describing the intended use of the dataset. The current supported options are “TRAINING” and “PREDICTION”.
- read_timeout: int, optional
The maximum number of seconds to wait for the server to respond indicating that the initial upload is complete
- max_wait: int, optional
Time in seconds after which dataset creation is considered unsuccessful
- fname: string, optional
The file name, “data.csv” by default
- use_cases: list[UseCase] | UseCase | list[string] | string, optional
A list of UseCase objects, UseCase object, list of Use Case IDs or a single Use Case ID to add this new dataset to. Must be a kwarg.
- Returns
- response: Dataset
The Dataset created from the uploaded data.
- Raises
- InvalidUsageError
If neither a DataFrame or list of records is passed.
- Return type
TypeVar
(TDataset
, bound=Dataset
)
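A minimal sketch (not part of the original reference) showing both accepted in-memory forms, using toy data:
import pandas as pd
from datarobot.models.dataset import Dataset

# From a pandas DataFrame
df = pd.DataFrame({"feature": [1, 2, 3], "target": [0, 1, 0]})
dataset_from_df = Dataset.create_from_in_memory_data(data_frame=df)

# From a list of dictionaries with identical keys
records = [{"feature": 1, "target": 0}, {"feature": 2, "target": 1}]
dataset_from_records = Dataset.create_from_in_memory_data(records=records, fname="toy.csv")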
- classmethod create_from_url(cls, url, do_snapshot=None, persist_data_after_ingestion=None, categories=None, max_wait=600, *, use_cases=None)¶
A blocking call that creates a new Dataset from data stored at a url. Returns when the dataset has been successfully uploaded and processed.
- Parameters
- url: string
The URL to use as the source of data for the dataset being created.
- do_snapshot: bool, optional
If unset, uses the server default: True. If true, creates a snapshot dataset; if false, creates a remote dataset. Creating snapshots from non-file sources may be disabled by the permission, Disable AI Catalog Snapshots.
- persist_data_after_ingestion: bool, optional
If unset, uses the server default: True. If true, will enforce saving all data (for download and sampling) and will allow a user to view extended data profile (which includes data statistics like min/max/median/mean, histogram, etc.). If false, will not enforce saving data. The data schema (feature names and types) still will be available. Specifying this parameter to false and doSnapshot to true will result in an error.
- categories: list[string], optional
An array of strings describing the intended use of the dataset. The current supported options are “TRAINING” and “PREDICTION”.
- max_wait: int, optional
Time in seconds after which dataset creation is considered unsuccessful.
- use_cases: list[UseCase] | UseCase | list[string] | string, optional
A list of UseCase objects, UseCase object, list of Use Case IDs or a single Use Case ID to add this new dataset to. Must be a kwarg.
- Returns
- response: Dataset
The Dataset created from the uploaded data
- Return type
TypeVar
(TDataset
, bound=Dataset
)
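For example, a minimal sketch (not part of the original reference) that snapshots a public CSV; the URL is reused from the uri examples later in this reference:
from datarobot.models.dataset import Dataset

dataset = Dataset.create_from_url(
    "https://s3.amazonaws.com/datarobot_test/kickcars-sample-200.csv",
    do_snapshot=True,            # False would create a remote (non-snapshot) dataset
    categories=["TRAINING"],
)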
- classmethod create_from_data_source(cls, data_source_id, username=None, password=None, do_snapshot=None, persist_data_after_ingestion=None, categories=None, credential_id=None, use_kerberos=None, credential_data=None, max_wait=600, *, use_cases=None)¶
A blocking call that creates a new Dataset from data stored at a DataSource. Returns when the dataset has been successfully uploaded and processed.
New in version v2.22.
- Parameters
- data_source_id: string
The ID of the DataSource to use as the source of data.
- username: string, optional
The username for database authentication.
- password: string, optional
The password (in cleartext) for database authentication. The password will be encrypted on the server side in scope of HTTP request and never saved or stored.
- do_snapshot: bool, optional
If unset, uses the server default: True. If true, creates a snapshot dataset; if false, creates a remote dataset. Creating snapshots from non-file sources may be disabled by the permission, Disable AI Catalog Snapshots.
- persist_data_after_ingestion: bool, optional
If unset, uses the server default: True. If true, will enforce saving all data (for download and sampling) and will allow a user to view extended data profile (which includes data statistics like min/max/median/mean, histogram, etc.). If false, will not enforce saving data. The data schema (feature names and types) still will be available. Specifying this parameter to false and doSnapshot to true will result in an error.
- categories: list[string], optional
An array of strings describing the intended use of the dataset. The current supported options are “TRAINING” and “PREDICTION”.
- credential_id: string, optional
The ID of the set of credentials to use instead of user and password. Note that with this change, username and password will become optional.
- use_kerberos: bool, optional
If unset, uses the server default: False. If true, use kerberos authentication for database authentication.
- credential_data: dict, optional
The credentials to authenticate with the database, to use instead of user/password or credential ID.
- max_wait: int, optional
Time in seconds after which dataset creation is considered unsuccessful.
- use_cases: list[UseCase] | UseCase | list[string] | string, optional
A list of UseCase objects, UseCase object, list of Use Case IDs or a single Use Case ID to add this new dataset to. Must be a kwarg.
- Returns
- response: Dataset
The Dataset created from the uploaded data
- Return type
TypeVar
(TDataset
, bound=Dataset
)
- classmethod create_from_query_generator(cls, generator_id, dataset_id=None, dataset_version_id=None, max_wait=600, *, use_cases=None)¶
A blocking call that creates a new Dataset from the query generator. Returns when the dataset has been successfully processed. If optional parameters are not specified the query is applied to the dataset_id and dataset_version_id stored in the query generator. If specified they will override the stored dataset_id/dataset_version_id, e.g. to prep a prediction dataset.
- Parameters
- generator_id: str
The id of the query generator to use.
- dataset_id: str, optional
The id of the dataset to apply the query to.
- dataset_version_id: str, optional
The id of the dataset version to apply the query to. If not specified the latest version associated with dataset_id (if specified) is used.
- max_waitint
optional, the maximum number of seconds to wait before giving up.
- use_cases: list[UseCase] | UseCase | list[string] | string, optional
A list of UseCase objects, UseCase object, list of Use Case IDs or a single Use Case ID to add this new dataset to. Must be a kwarg.
- Returns
- response: Dataset
The Dataset created from the query generator
- Return type
TypeVar
(TDataset
, bound=Dataset
)
- classmethod get(dataset_id)¶
Get information about a dataset.
- Parameters
- dataset_idstring
the id of the dataset
- Returns
- datasetDataset
the queried dataset
- Return type
TypeVar
(TDataset
, bound=Dataset
)
- classmethod delete(dataset_id)¶
Soft deletes a dataset. You cannot get it or list it or do actions with it, except for un-deleting it.
- Parameters
- dataset_id: string
The id of the dataset to mark for deletion
- Returns
- None
- Return type
None
- classmethod un_delete(dataset_id)¶
Un-deletes a previously deleted dataset. If the dataset was not deleted, nothing happens.
- Parameters
- dataset_id: string
The id of the dataset to un-delete
- Returns
- None
- Return type
None
- classmethod list(category=None, filter_failed=None, order_by=None, use_cases=None)¶
List all datasets a user can view.
- Parameters
- category: string, optional
Optional. If specified, only dataset versions that have the specified category will be included in the results. Categories identify the intended use of the dataset; supported categories are “TRAINING” and “PREDICTION”.
- filter_failed: bool, optional
If unset, uses the server default: False. Whether datasets that failed during import should be excluded from the results. If True invalid datasets will be excluded.
- order_by: string, optional
If unset, uses the server default: “-created”. Sorting order which will be applied to catalog list, valid options are: - “created” – ascending order by creation datetime; - “-created” – descending order by creation datetime.
- use_cases: Union[UseCase, List[UseCase], str, List[str]], optional
Filter available datasets by a specific Use Case or Cases. Accepts either the entity or the ID.
- Returns
- list[Dataset]
a list of datasets the user can view
- Return type
List
[TypeVar
(TDataset
, bound=Dataset
)]
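For example, a minimal sketch (not part of the original reference) of listing only training datasets, newest first:
from datarobot.models.dataset import Dataset

training_datasets = Dataset.list(category="TRAINING", order_by="-created")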
- classmethod iterate(offset=None, limit=None, category=None, order_by=None, filter_failed=None, use_cases=None)¶
Get an iterator for the requested datasets a user can view. This lazily retrieves results. It does not get the next page from the server until the current page is exhausted.
- Parameters
- offset: int, optional
If set, this many results will be skipped
- limit: int, optional
Specifies the size of each page retrieved from the server. If unset, uses the server default.
- category: string, optional
Optional. If specified, only dataset versions that have the specified category will be included in the results. Categories identify the intended use of the dataset; supported categories are “TRAINING” and “PREDICTION”.
- filter_failed: bool, optional
If unset, uses the server default: False. Whether datasets that failed during import should be excluded from the results. If True invalid datasets will be excluded.
- order_by: string, optional
If unset, uses the server default: “-created”. Sorting order which will be applied to catalog list, valid options are: - “created” – ascending order by creation datetime; - “-created” – descending order by creation datetime.
- use_cases: Union[UseCase, List[UseCase], str, List[str]], optional
Filter available datasets by a specific Use Case or Cases. Accepts either the entity or the ID.
- Yields
- Dataset
An iterator of the datasets the user can view
- Return type
Generator
[TypeVar
(TDataset
, bound=Dataset
),None
,None
]
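A minimal sketch (not part of the original reference); pages of 100 results are fetched lazily as the iterator is consumed, and it is assumed that datasets expose the id and name attributes documented above:
from datarobot.models.dataset import Dataset

for dataset in Dataset.iterate(limit=100, filter_failed=True):
    print(dataset.id, dataset.name)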
- update()¶
Updates the Dataset attributes in place with the latest information from the server.
- Returns
- None
- Return type
None
- modify(name=None, categories=None)¶
Modifies the Dataset name and/or categories. Updates the object in place.
- Parameters
- name: string, optional
The new name of the dataset
- categories: list[string], optional
A list of strings describing the intended use of the dataset. The supported options are “TRAINING” and “PREDICTION”. If any categories were previously specified for the dataset, they will be overwritten.
- Returns
- None
- Return type
None
Modify the ability of users to access this dataset
- Parameters
- access_list: list of SharingAccess
The modifications to make.
- apply_grant_to_linked_objects: bool
If true for any users being granted access to the dataset, grant the user read access to any linked objects such as DataSources and DataStores that may be used by this dataset. Ignored if no such objects are relevant for dataset, defaults to False.
- Raises
- datarobot.ClientError:
If you do not have permission to share this dataset, if the user you’re sharing with doesn’t exist, if the same user appears multiple times in the access_list, or if these changes would leave the dataset without an owner.
Examples
Transfer access to the dataset from old_user@datarobot.com to new_user@datarobot.com
from datarobot.enums import SHARING_ROLE
from datarobot.models.dataset import Dataset
from datarobot.models.sharing import SharingAccess

new_access = SharingAccess(
    "new_user@datarobot.com",
    SHARING_ROLE.OWNER,
    can_share=True,
)
access_list = [
    SharingAccess(
        "old_user@datarobot.com",
        SHARING_ROLE.OWNER,
        can_share=True,
        can_use_data=True,
    ),
    new_access,
]
Dataset.get('my-dataset-id').share(access_list)
- Return type
None
- get_details()¶
Gets the details for this Dataset
- Returns
- DatasetDetails
- Return type
DatasetDetails
- get_all_features(order_by=None)¶
Get a list of all the features for this dataset.
- Parameters
- order_by: string, optional
If unset, uses the server default: ‘name’. How the features should be ordered. Can be ‘name’ or ‘featureType’.
- Returns
- list[DatasetFeature]
- Return type
List
[DatasetFeature
]
- iterate_all_features(offset=None, limit=None, order_by=None)¶
Get an iterator for the requested features of a dataset. This lazily retrieves results. It does not get the next page from the server until the current page is exhausted.
- Parameters
- offset: int, optional
If set, this many results will be skipped.
- limit: int, optional
Specifies the size of each page retrieved from the server. If unset, uses the server default.
- order_by: string, optional
If unset, uses the server default: ‘name’. How the features should be ordered. Can be ‘name’ or ‘featureType’.
- Yields
- DatasetFeature
- Return type
Generator
[DatasetFeature
,None
,None
]
- get_featurelists()¶
Get DatasetFeaturelists created on this Dataset
- Returns
- feature_lists: list[DatasetFeaturelist]
- Return type
List
[DatasetFeaturelist
]
- create_featurelist(name, features)¶
Create a new dataset featurelist
- Parameters
- namestr
the name of the modeling featurelist to create. Names must be unique within the dataset, or the server will return an error.
- featureslist of str
the names of the features to include in the dataset featurelist. Each feature must be a dataset feature.
- Returns
- featurelistDatasetFeaturelist
the newly created featurelist
Examples
dataset = Dataset.get('1234deadbeeffeeddead4321')
dataset_features = dataset.get_all_features()
selected_features = [feat.name for feat in dataset_features][:5]  # select first five
new_flist = dataset.create_featurelist('Simple Features', selected_features)
- Return type
DatasetFeaturelist
- get_file(file_path=None, filelike=None)¶
Retrieves all the originally uploaded data in CSV form. Writes it to either the file or a filelike object that can write bytes.
Only one of file_path or filelike can be provided and it must be provided as a keyword argument (i.e. file_path=’path-to-write-to’). If a file-like object is provided, the user is responsible for closing it when they are done.
The user must also have permission to download data.
- Parameters
- file_path: string, optional
The destination to write the file to.
- filelike: file, optional
A file-like object to write to. The object must be able to write bytes. The user is responsible for closing the object
- Returns
- None
- Return type
None
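For example, a minimal sketch (not part of the original reference), assuming a placeholder dataset id and permission to download data:
from datarobot.models.dataset import Dataset

dataset = Dataset.get("my-dataset-id")                  # placeholder id
dataset.get_file(file_path="./examples_download.csv")   # file_path must be a keyword argument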
- get_as_dataframe(low_memory=False)¶
Retrieves all the originally uploaded data in a pandas DataFrame.
New in version v3.0.
- Parameters
- low_memory: bool, optional
If True, use local files to reduce memory usage which will be slower.
- Returns
- pd.DataFrame
- Return type
DataFrame
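For example, a minimal sketch (not part of the original reference), assuming a placeholder dataset id:
from datarobot.models.dataset import Dataset

dataset = Dataset.get("my-dataset-id")            # placeholder id
df = dataset.get_as_dataframe(low_memory=True)    # slower, but reduces memory usage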
- get_projects()¶
Retrieves the Dataset’s projects as ProjectLocation named tuples.
- Returns
- locations: list[ProjectLocation]
- Return type
List
[ProjectLocation
]
- create_project(project_name=None, user=None, password=None, credential_id=None, use_kerberos=None, credential_data=None, *, use_cases=None)¶
Create a datarobot.models.Project from this dataset.
- Parameters
- project_name: string, optional
The name of the project to be created. If not specified, will be “Untitled Project” for database connections, otherwise the project name will be based on the file used.
- user: string, optional
The username for database authentication.
- password: string, optional
The password (in cleartext) for database authentication. The password will be encrypted on the server side in scope of HTTP request and never saved or stored
- credential_id: string, optional
The ID of the set of credentials to use instead of user and password.
- use_kerberos: bool, optional
Server default is False. If true, use kerberos authentication for database authentication.
- credential_data: dict, optional
The credentials to authenticate with the database, to use instead of user/password or credential ID.
- use_cases: list[UseCase] | UseCase | list[string] | string, optional
A list of UseCase objects, UseCase object, list of Use Case ids or a single Use Case id to add this new Dataset to. Must be a kwarg.
- Returns
- Project
- Return type
Project
- classmethod create_version_from_file(dataset_id, file_path=None, filelike=None, categories=None, read_timeout=600, max_wait=600)¶
A blocking call that creates a new Dataset version from a file. Returns when the new dataset version has been successfully uploaded and processed.
Warning: This function does not clean up its open files. If you pass a filelike, you are responsible for closing it. If you pass a file_path, this will create a file object from the file_path but will not close it.
New in version v2.23.
- Parameters
- dataset_id: string
The ID of the dataset for which the new version will be created
- file_path: string, optional
The path to the file. This will create a file object pointing to that file but will not close it.
- filelike: file, optional
An open and readable file object.
- categories: list[string], optional
An array of strings describing the intended use of the dataset. The current supported options are “TRAINING” and “PREDICTION”.
- read_timeout: int, optional
The maximum number of seconds to wait for the server to respond indicating that the initial upload is complete
- max_wait: int, optional
Time in seconds after which dataset version creation is considered unsuccessful
- Returns
- response: Dataset
A fully armed and operational Dataset version
- Return type
TypeVar
(TDataset
, bound=Dataset
)
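For example, a minimal sketch (not part of the original reference), assuming a placeholder dataset id and local path:
from datarobot.models.dataset import Dataset

new_version = Dataset.create_version_from_file(
    dataset_id="my-dataset-id",               # placeholder id
    file_path="./data/examples_updated.csv",  # placeholder path
)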
- classmethod create_version_from_in_memory_data(dataset_id, data_frame=None, records=None, categories=None, read_timeout=600, max_wait=600)¶
A blocking call that creates a new Dataset version for a dataset from in-memory data. Returns when the dataset has been successfully uploaded and processed.
The data can be either a pandas DataFrame or a list of dictionaries with identical keys.
New in version v2.23.
- Parameters
- dataset_id: string
The ID of the dataset for which the new version will be created
- data_frame: DataFrame, optional
The data frame to upload
- records: list[dict], optional
A list of dictionaries with identical keys to upload
- categories: list[string], optional
An array of strings describing the intended use of the dataset. The current supported options are “TRAINING” and “PREDICTION”.
- read_timeout: int, optional
The maximum number of seconds to wait for the server to respond indicating that the initial upload is complete
- max_wait: int, optional
Time in seconds after which dataset version creation is considered unsuccessful
- Returns
- response: Dataset
The Dataset version created from the uploaded data
- Raises
- InvalidUsageError
If neither a DataFrame or list of records is passed.
- Return type
TypeVar
(TDataset
, bound=Dataset
)
- classmethod create_version_from_url(dataset_id, url, categories=None, max_wait=600)¶
A blocking call that creates a new Dataset from data stored at a url for a given dataset. Returns when the dataset has been successfully uploaded and processed.
New in version v2.23.
- Parameters
- dataset_id: string
The ID of the dataset for which the new version will be created
- url: string
The URL to use as the source of data for the dataset being created.
- categories: list[string], optional
An array of strings describing the intended use of the dataset. The current supported options are “TRAINING” and “PREDICTION”.
- max_wait: int, optional
Time in seconds after which dataset version creation is considered unsuccessful
- Returns
- response: Dataset
The Dataset version created from the uploaded data
- Return type
TypeVar
(TDataset
, bound=Dataset
)
- classmethod create_version_from_data_source(dataset_id, data_source_id, username=None, password=None, categories=None, credential_id=None, use_kerberos=None, credential_data=None, max_wait=600)¶
A blocking call that creates a new Dataset from data stored at a DataSource. Returns when the dataset has been successfully uploaded and processed.
New in version v2.23.
- Parameters
- dataset_id: string
The ID of the dataset for which the new version will be created
- data_source_id: string
The ID of the DataSource to use as the source of data.
- username: string, optional
The username for database authentication.
- password: string, optional
The password (in cleartext) for database authentication. The password will be encrypted on the server side in scope of HTTP request and never saved or stored.
- categories: list[string], optional
An array of strings describing the intended use of the dataset. The current supported options are “TRAINING” and “PREDICTION”.
- credential_id: string, optional
The ID of the set of credentials to use instead of user and password. Note that with this change, username and password will become optional.
- use_kerberos: bool, optional
If unset, uses the server default: False. If true, use kerberos authentication for database authentication.
- credential_data: dict, optional
The credentials to authenticate with the database, to use instead of user/password or credential ID.
- max_wait: int, optional
Time in seconds after which dataset version creation is considered unsuccessful
- Returns
- response: Dataset
The Dataset version created from the uploaded data
- Return type
TypeVar
(TDataset
, bound=Dataset
)
- classmethod from_data(data)¶
Instantiate an object of this class using a dict.
- Parameters
- datadict
Correctly snake_cased keys and their values.
- Return type
TypeVar
(T
, bound=APIObject
)
- classmethod from_server_data(data, keep_attrs=None)¶
Instantiate an object of this class using the data directly from the server, meaning that the keys may have the wrong camel casing
- Parameters
- datadict
The directly translated dict of JSON from the server. No casing fixes have taken place
- keep_attrsiterable
List, set or tuple of the dotted namespace notations for attributes to keep within the object structure even if their values are None
- Return type
TypeVar
(T
, bound=APIObject
)
- open_in_browser()¶
Opens the relevant web location for this object in the default browser. If a default browser is not available, the URL is logged instead.
Note: If text-mode browsers are used, the calling process will block until the user exits the browser.
- Return type
None
- class datarobot.DatasetDetails(dataset_id, version_id, categories, created_by, created_at, data_source_type, error, is_latest_version, is_snapshot, is_data_engine_eligible, last_modification_date, last_modifier_full_name, name, uri, processing_state, data_persisted=None, data_engine_query_id=None, data_source_id=None, description=None, eda1_modification_date=None, eda1_modifier_full_name=None, feature_count=None, feature_count_by_type=None, row_count=None, size=None, tags=None, recipe_id=None, is_wrangling_eligible=None)¶
Represents a detailed view of a Dataset. The to_dataset method creates a Dataset from this details view.
- Attributes
- dataset_id: string
The ID of this dataset
- name: string
The name of this dataset in the catalog
- is_latest_version: bool
Whether this dataset version is the latest version of this dataset
- version_id: string
The object ID of the catalog_version the dataset belongs to
- categories: list(string)
An array of strings describing the intended use of the dataset. The supported options are “TRAINING” and “PREDICTION”.
- created_at: string
The date when the dataset was created
- created_by: string
Username of the user who created the dataset
- is_snapshot: bool
Whether the dataset version is an immutable snapshot of data which has previously been retrieved and saved to DataRobot
- data_persisted: bool, optional
If true, user is allowed to view extended data profile (which includes data statistics like min/max/median/mean, histogram, etc.) and download data. If false, download is not allowed and only the data schema (feature names and types) will be available.
- is_data_engine_eligible: bool
Whether this dataset can be a data source of a data engine query.
- processing_state: string
Current ingestion process state of the dataset
- row_count: int, optional
The number of rows in the dataset.
- size: int, optional
The size of the dataset as a CSV in bytes.
- data_engine_query_id: string, optional
ID of the source data engine query
- data_source_id: string, optional
ID of the datasource used as the source of the dataset
- data_source_type: string
the type of the datasource that was used as the source of the dataset
- description: string, optional
the description of the dataset
- eda1_modification_date: string, optional
the ISO 8601 formatted date and time when the EDA1 for the dataset was updated
- eda1_modifier_full_name: string, optional
the user who was the last to update EDA1 for the dataset
- error: string
details of exception raised during ingestion process, if any
- feature_count: int, optional
total number of features in the dataset
- feature_count_by_type: list[FeatureTypeCount]
number of features in the dataset grouped by feature type
- last_modification_date: string
the ISO 8601 formatted date and time when the dataset was last modified
- last_modifier_full_name: string
full name of user who was the last to modify the dataset
- tags: list[string]
list of tags attached to the item
- uri: string
the URI of the data source, for example:
- 'file_name.csv'
- 'jdbc:DATA_SOURCE_GIVEN_NAME/SCHEMA.TABLE_NAME'
- 'jdbc:DATA_SOURCE_GIVEN_NAME/<query>' for query-based data sources
- 'https://s3.amazonaws.com/datarobot_test/kickcars-sample-200.csv'
- etc.
- classmethod get(dataset_id)¶
Get details for a Dataset from the server
- Parameters
- dataset_id: str
The id for the Dataset from which to get details
- Returns
- DatasetDetails
- Return type
TypeVar
(TDatasetDetails
, bound=DatasetDetails
)
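A minimal sketch of fetching details and converting them to a Dataset via to_dataset (the dataset ID is a placeholder):

    import datarobot as dr

    # Inspect catalog metadata for a dataset, then materialize a Dataset object from it.
    details = dr.DatasetDetails.get('5f43a1c9e2b1d6a0b9c00000')  # placeholder dataset ID
    print(details.name, details.row_count, details.processing_state)
    dataset = details.to_dataset()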
Data Engine Query Generator¶
- class datarobot.DataEngineQueryGenerator(**generator_kwargs)¶
DataEngineQueryGenerator is used to set up time series data prep.
New in version v2.27.
- Attributes
- id: str
id of the query generator
- query: str
text of the generated Spark SQL query
- datasets: list(QueryGeneratorDataset)
datasets associated with the query generator
- generator_settings: QueryGeneratorSettings
the settings used to define the query
- generator_type: str
“TimeSeries” is the only supported type
- classmethod create(generator_type, datasets, generator_settings)¶
Creates a query generator entity.
New in version v2.27.
- Parameters
- generator_typestr
Type of data engine query generator
- datasetsList[QueryGeneratorDataset]
Source datasets in the Data Engine workspace.
- generator_settingsdict
Data engine generator settings of the given generator_type.
- Returns
- query_generatorDataEngineQueryGenerator
The created generator
Examples
    import datarobot as dr
    from datarobot.models.data_engine_query_generator import (
        QueryGeneratorDataset,
        QueryGeneratorSettings,
    )

    dataset = QueryGeneratorDataset(
        alias='My_Awesome_Dataset_csv',
        dataset_id='61093144cabd630828bca321',
        dataset_version_id=1,
    )
    settings = QueryGeneratorSettings(
        datetime_partition_column='date',
        time_unit='DAY',
        time_step=1,
        default_numeric_aggregation_method='sum',
        default_categorical_aggregation_method='mostFrequent',
    )
    g = dr.DataEngineQueryGenerator.create(
        generator_type='TimeSeries',
        datasets=[dataset],
        generator_settings=settings,
    )
    g.id
    >>>'54e639a18bd88f08078ca831'
    g.generator_type
    >>>'TimeSeries'
- classmethod get(generator_id)¶
Gets information about a query generator.
- Parameters
- generator_idstr
The identifier of the query generator you want to load.
- Returns
- query_generatorDataEngineQueryGenerator
The queried generator
Examples
    import datarobot as dr

    g = dr.DataEngineQueryGenerator.get(generator_id='54e639a18bd88f08078ca831')
    g.id
    >>>'54e639a18bd88f08078ca831'
    g.generator_type
    >>>'TimeSeries'
- create_dataset(dataset_id=None, dataset_version_id=None, max_wait=600)¶
A blocking call that creates a new Dataset from the query generator. Returns when the dataset has been successfully processed. If the optional parameters are not specified, the query is applied to the dataset_id and dataset_version_id stored in the query generator. If specified, they override the stored dataset_id/dataset_version_id, e.g. to prep a prediction dataset.
- Parameters
- dataset_id: str, optional
The id of the unprepped dataset to apply the query to
- dataset_version_id: str, optional
The version_id of the unprepped dataset to apply the query to
- Returns
- response: Dataset
The Dataset created from the query generator
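Continuing the example above, a minimal sketch of applying the stored query to produce a prepped dataset:

    import datarobot as dr

    # Apply the generator's stored Spark SQL query to its source dataset(s)
    # and block until the prepped Dataset is ready.
    g = dr.DataEngineQueryGenerator.get(generator_id='54e639a18bd88f08078ca831')
    prepped_dataset = g.create_dataset()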
- prepare_prediction_dataset_from_catalog(project_id, dataset_id, dataset_version_id=None, max_wait=600, relax_known_in_advance_features_check=None)¶
Apply time series data prep to a catalog dataset and upload it to the project as a PredictionDataset.
New in version v3.1.
- Parameters
- project_idstr
The id of the project to which you upload the prediction dataset.
- dataset_idstr
The identifier of the dataset.
- dataset_version_idstr, optional
The version id of the dataset to use.
- max_waitint, optional
Optional, the maximum number of seconds to wait before giving up.
- relax_known_in_advance_features_checkbool, optional
For time series projects only. If True, missing values in the known in advance features are allowed in the forecast window at the prediction time. If omitted or False, missing values are not allowed.
- Returns
- datasetPredictionDataset
The newly uploaded dataset.
- Return type
- prepare_prediction_dataset(sourcedata, project_id, max_wait=600, relax_known_in_advance_features_check=None)¶
Apply time series data prep and upload the PredictionDataset to the project.
New in version v3.1.
- Parameters
- sourcedatastr, file or pandas.DataFrame
Data to be used for predictions. If it is a string, it can be either a path to a local file, or raw file content. If using a file on disk, the filename must consist of ASCII characters only.
- project_idstr
The id of the project to which you upload the prediction dataset.
- max_waitint, optional
The maximum number of seconds to wait for the uploaded dataset to be processed before raising an error.
- relax_known_in_advance_features_checkbool, optional
For time series projects only. If True, missing values in the known in advance features are allowed in the forecast window at the prediction time. If omitted or False, missing values are not allowed.
- Returns
- datasetPredictionDataset
The newly uploaded dataset.
- Raises
- InputNotUnderstoodError
Raised if sourcedata isn't one of the supported types.
- AsyncFailureError
Raised if polling for the status of an async process resulted in a response with an unsupported status code.
- AsyncProcessUnsuccessfulError
Raised if project creation was unsuccessful (i.e. the server reported an error in uploading the dataset).
- AsyncTimeoutError
Raised if processing the uploaded dataset took more time than specified by the
max_wait
parameter.
- Return type
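A minimal sketch, assuming a local scoring file; the path and project ID below are placeholders:

    import datarobot as dr

    # Apply time series data prep to a local scoring file and upload the result
    # to the project as a PredictionDataset.
    g = dr.DataEngineQueryGenerator.get(generator_id='54e639a18bd88f08078ca831')
    prediction_dataset = g.prepare_prediction_dataset(
        sourcedata='./unprepped_predictions.csv',   # placeholder local path
        project_id='5506fcd38bd88f5953219da0',      # placeholder project ID
    )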
Data Store¶
- class datarobot.models.data_store.TestResponse¶
- class datarobot.models.data_store.SchemasResponse¶
- class datarobot.models.data_store.TablesResponse¶
Datetime Trend Plots¶
- class datarobot.models.datetime_trend_plots.AccuracyOverTimePlotsMetadata(project_id, model_id, forecast_distance, resolutions, backtest_metadata, holdout_metadata, backtest_statuses, holdout_statuses)¶
Accuracy over Time metadata for datetime model.
New in version v2.25.
Notes
Backtest/holdout status is a dict containing the following:
- training: string
Status of the backtest/holdout training. One of
datarobot.enums.DATETIME_TREND_PLOTS_STATUS
- validation: string
Status of the backtest/holdout validation. One of
datarobot.enums.DATETIME_TREND_PLOTS_STATUS
Backtest/holdout metadata is a dict containing the following:
- training: dict
Start and end dates for the backtest/holdout training.
- validation: dict
Start and end dates for the backtest/holdout validation.
Each dict in the training and validation in backtest/holdout metadata is structured like:
- start_date: datetime.datetime or None
The datetime of the start of the chart data (inclusive). None if chart data is not computed.
- end_date: datetime.datetime or None
The datetime of the end of the chart data (exclusive). None if chart data is not computed.
- Attributes
- project_id: string
The project ID.
- model_id: string
The model ID.
- forecast_distance: int or None
The forecast distance for which the metadata was retrieved. None for OTV projects.
- resolutions: list of string
A list of
datarobot.enums.DATETIME_TREND_PLOTS_RESOLUTION
, which represents available time resolutions for which plots can be retrieved.
- backtest_metadata: list of dict
List of backtest metadata dicts. The list index of metadata dict is the backtest index. See backtest/holdout metadata info in Notes for more details.
- holdout_metadata: dict
Holdout metadata dict. See backtest/holdout metadata info in Notes for more details.
- backtest_statuses: list of dict
List of backtest statuses dict. The list index of status dict is the backtest index. See backtest/holdout status info in Notes for more details.
- holdout_statuses: dict
Holdout status dict. See backtest/holdout status info in Notes for more details.
- class datarobot.models.datetime_trend_plots.AccuracyOverTimePlot(project_id, model_id, start_date, end_date, resolution, bins, statistics, calendar_events)¶
Accuracy over Time plot for datetime model.
New in version v2.25.
Notes
Bin is a dict containing the following:
- start_date: datetime.datetime
The datetime of the start of the bin (inclusive).
- end_date: datetime.datetime
The datetime of the end of the bin (exclusive).
- actual: float or None
Average actual value of the target in the bin. None if there are no entries in the bin.
- predicted: float or None
Average prediction of the model in the bin. None if there are no entries in the bin.
- frequency: int or None
Indicates number of values averaged in bin.
Statistics is a dict containing the following:
- durbin_watson: float or None
The Durbin-Watson statistic for the chart data. Value is between 0 and 4. Durbin-Watson statistic is a test statistic used to detect the presence of autocorrelation at lag 1 in the residuals (prediction errors) from a regression analysis. More info https://wikipedia.org/wiki/Durbin%E2%80%93Watson_statistic
Calendar event is a dict containing the following:
- name: string
Name of the calendar event.
- date: datetime
Date of the calendar event.
- series_id: string or None
The series ID for the event. If this event does not specify a series ID, then this will be None, indicating that the event applies to all series.
- Attributes
- project_id: string
The project ID.
- model_id: string
The model ID.
- resolution: string
The resolution that is used for binning. One of
datarobot.enums.DATETIME_TREND_PLOTS_RESOLUTION
- start_date: datetime.datetime
The datetime of the start of the chart data (inclusive).
- end_date: datetime.datetime
The datetime of the end of the chart data (exclusive).
- bins: list of dict
List of plot bins. See bin info in Notes for more details.
- statistics: dict
Statistics for plot. See statistics info in Notes for more details.
- calendar_events: list of dict
List of calendar events for the plot. See calendar events info in Notes for more details.
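Because bins, statistics, and calendar_events are plain dicts, they can be loaded directly into pandas for inspection. A minimal sketch, assuming plot is an already-retrieved AccuracyOverTimePlot instance:

    import pandas as pd

    # `plot` is assumed to be an AccuracyOverTimePlot retrieved for a datetime model.
    bins_df = pd.DataFrame(plot.bins)  # columns: start_date, end_date, actual, predicted, frequency
    bins_df['residual'] = bins_df['actual'] - bins_df['predicted']
    print(plot.statistics.get('durbin_watson'))
    print(bins_df[['start_date', 'actual', 'predicted', 'residual']].head())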
- class datarobot.models.datetime_trend_plots.AccuracyOverTimePlotPreview(project_id, model_id, start_date, end_date, bins)¶
Accuracy over Time plot preview for datetime model.
New in version v2.25.
Notes
Bin is a dict containing the following:
- start_date: datetime.datetime
The datetime of the start of the bin (inclusive).
- end_date: datetime.datetime
The datetime of the end of the bin (exclusive).
- actual: float or None
Average actual value of the target in the bin. None if there are no entries in the bin.
- predicted: float or None
Average prediction of the model in the bin. None if there are no entries in the bin.
- Attributes
- project_id: string
The project ID.
- model_id: string
The model ID.
- start_date: datetime.datetime
The datetime of the start of the chart data (inclusive).
- end_date: datetime.datetime
The datetime of the end of the chart data (exclusive).
- bins: list of dict
List of plot bins. See bin info in Notes for more details.
- class datarobot.models.datetime_trend_plots.ForecastVsActualPlotsMetadata(project_id, model_id, resolutions, backtest_metadata, holdout_metadata, backtest_statuses, holdout_statuses)¶
Forecast vs Actual plots metadata for datetime model.
New in version v2.25.
Notes
Backtest/holdout status is a dict containing the following:
- training: dict
Dict containing each of
datarobot.enums.DATETIME_TREND_PLOTS_STATUS
as dict key, and list of forecast distances for particular status as dict value.
- validation: dict
Dict containing each of
datarobot.enums.DATETIME_TREND_PLOTS_STATUS
as dict key, and list of forecast distances for particular status as dict value.
Backtest/holdout metadata is a dict containing the following:
- training: dict
Start and end dates for the backtest/holdout training.
- validation: dict
Start and end dates for the backtest/holdout validation.
Each dict in the training and validation in backtest/holdout metadata is structured like:
- start_date: datetime.datetime or None
The datetime of the start of the chart data (inclusive). None if chart data is not computed.
- end_date: datetime.datetime or None
The datetime of the end of the chart data (exclusive). None if chart data is not computed.
- Attributes
- project_id: string
The project ID.
- model_id: string
The model ID.
- resolutions: list of string
A list of
datarobot.enums.DATETIME_TREND_PLOTS_RESOLUTION
, which represents available time resolutions for which plots can be retrieved.
- backtest_metadata: list of dict
List of backtest metadata dicts. The list index of metadata dict is the backtest index. See backtest/holdout metadata info in Notes for more details.
- holdout_metadata: dict
Holdout metadata dict. See backtest/holdout metadata info in Notes for more details.
- backtest_statuses: list of dict
List of backtest statuses dict. The list index of status dict is the backtest index. See backtest/holdout status info in Notes for more details.
- holdout_statuses: dict
Holdout status dict. See backtest/holdout status info in Notes for more details.
- class datarobot.models.datetime_trend_plots.ForecastVsActualPlot(project_id, model_id, forecast_distances, start_date, end_date, resolution, bins, calendar_events)¶
Forecast vs Actual plot for datetime model.
New in version v2.25.
Notes
Bin is a dict containing the following:
- start_date: datetime.datetime
The datetime of the start of the bin (inclusive).
- end_date: datetime.datetime
The datetime of the end of the bin (exclusive).
- actual: float or None
Average actual value of the target in the bin. None if there are no entries in the bin.
- forecasts: list of float
A list of average forecasts for the model for each forecast distance. Empty if there are no forecasts in the bin. Each index in the forecasts list maps to forecastDistances list index.
- error: float or None
Average absolute residual value of the bin. None if there are no entries in the bin.
- normalized_error: float or None
Normalized average absolute residual value of the bin. None if there are no entries in the bin.
- frequency: int or None
Indicates number of values averaged in bin.
Calendar event is a dict containing the following:
- name: string
Name of the calendar event.
- date: datetime
Date of the calendar event.
- series_id: string or None
The series ID for the event. If this event does not specify a series ID, then this will be None, indicating that the event applies to all series.
- Attributes
- project_id: string
The project ID.
- model_id: string
The model ID.
- forecast_distances: list of int
A list of forecast distances that were retrieved.
- resolution: string
The resolution that is used for binning. One of
datarobot.enums.DATETIME_TREND_PLOTS_RESOLUTION
- start_date: datetime.datetime
The datetime of the start of the chart data (inclusive).
- end_date: datetime.datetime
The datetime of the end of the chart data (exclusive).
- bins: list of dict
List of plot bins. See bin info in Notes for more details.
- calendar_events: list of dict
List of calendar events for the plot. See calendar events info in Notes for more details.
- class datarobot.models.datetime_trend_plots.ForecastVsActualPlotPreview(project_id, model_id, start_date, end_date, bins)¶
Forecast vs Actual plot preview for datetime model.
New in version v2.25.
Notes
Bin is a dict containing the following:
- start_date: datetime.datetime
The datetime of the start of the bin (inclusive).
- end_date: datetime.datetime
The datetime of the end of the bin (exclusive).
- actual: float or None
Average actual value of the target in the bin. None if there are no entries in the bin.
- predicted: float or None
Average prediction of the model in the bin. None if there are no entries in the bin.
- Attributes
- project_id: string
The project ID.
- model_id: string
The model ID.
- start_date: datetime.datetime
The datetime of the start of the chart data (inclusive).
- end_date: datetime.datetime
The datetime of the end of the chart data (exclusive).
- bins: list of dict
List of plot bins. See bin info in Notes for more details.
- class datarobot.models.datetime_trend_plots.AnomalyOverTimePlotsMetadata(project_id, model_id, resolutions, backtest_metadata, holdout_metadata, backtest_statuses, holdout_statuses)¶
Anomaly over Time metadata for datetime model.
New in version v2.25.
Notes
Backtest/holdout status is a dict containing the following:
- training: string
Status of the backtest/holdout training. One of
datarobot.enums.DATETIME_TREND_PLOTS_STATUS
- validation: string
Status of the backtest/holdout validation. One of
datarobot.enums.DATETIME_TREND_PLOTS_STATUS
Backtest/holdout metadata is a dict containing the following:
- training: dict
Start and end dates for the backtest/holdout training.
- validation: dict
Start and end dates for the backtest/holdout validation.
Each dict in the training and validation in backtest/holdout metadata is structured like:
- start_date: datetime.datetime or None
The datetime of the start of the chart data (inclusive). None if chart data is not computed.
- end_date: datetime.datetime or None
The datetime of the end of the chart data (exclusive). None if chart data is not computed.
- Attributes
- project_id: string
The project ID.
- model_id: string
The model ID.
- resolutions: list of string
A list of
datarobot.enums.DATETIME_TREND_PLOTS_RESOLUTION
, which represents available time resolutions for which plots can be retrieved.
- backtest_metadata: list of dict
List of backtest metadata dicts. The list index of metadata dict is the backtest index. See backtest/holdout metadata info in Notes for more details.
- holdout_metadata: dict
Holdout metadata dict. See backtest/holdout metadata info in Notes for more details.
- backtest_statuses: list of dict
List of backtest statuses dict. The list index of status dict is the backtest index. See backtest/holdout status info in Notes for more details.
- holdout_statuses: dict
Holdout status dict. See backtest/holdout status info in Notes for more details.
- class datarobot.models.datetime_trend_plots.AnomalyOverTimePlot(project_id, model_id, start_date, end_date, resolution, bins, calendar_events)¶
Anomaly over Time plot for datetime model.
New in version v2.25.
Notes
Bin is a dict containing the following:
- start_date: datetime.datetime
The datetime of the start of the bin (inclusive).
- end_date: datetime.datetime
The datetime of the end of the bin (exclusive).
- predicted: float or None
Average prediction of the model in the bin. None if there are no entries in the bin.
- frequency: int or None
Indicates number of values averaged in bin.
Calendar event is a dict containing the following:
- name: string
Name of the calendar event.
- date: datetime
Date of the calendar event.
- series_id: string or None
The series ID for the event. If this event does not specify a series ID, then this will be None, indicating that the event applies to all series.
- Attributes
- project_id: string
The project ID.
- model_id: string
The model ID.
- resolution: string
The resolution that is used for binning. One of
datarobot.enums.DATETIME_TREND_PLOTS_RESOLUTION
- start_date: datetime.datetime
The datetime of the start of the chart data (inclusive).
- end_date: datetime.datetime
The datetime of the end of the chart data (exclusive).
- bins: list of dict
List of plot bins. See bin info in Notes for more details.
- calendar_events: list of dict
List of calendar events for the plot. See calendar events info in Notes for more details.
- class datarobot.models.datetime_trend_plots.AnomalyOverTimePlotPreview(project_id, model_id, prediction_threshold, start_date, end_date, bins)¶
Anomaly over Time plot preview for datetime model.
New in version v2.25.
Notes
Bin is a dict containing the following:
- start_date: datetime.datetime
The datetime of the start of the bin (inclusive).
- end_date: datetime.datetime
The datetime of the end of the bin (exclusive).
- Attributes
- project_id: string
The project ID.
- model_id: string
The model ID.
- prediction_threshold: float
Only bins with predictions exceeding this threshold are returned in the response.
- start_date: datetime.datetime
The datetime of the start of the chart data (inclusive).
- end_date: datetime.datetime
The datetime of the end of the chart data (exclusive).
- bins: list of dict
List of plot bins. See bin info in Notes for more details.
Deployment¶
- class datarobot.models.Deployment(id, label=None, description=None, status=None, default_prediction_server=None, model=None, capabilities=None, prediction_usage=None, permissions=None, service_health=None, model_health=None, accuracy_health=None, importance=None, fairness_health=None, governance=None, owners=None, prediction_environment=None)¶
A deployment created from a DataRobot model.
- Attributes
- idstr
the id of the deployment
- labelstr
the label of the deployment
- descriptionstr
the description of the deployment
- statusstr
(New in version v2.29) deployment status
- default_prediction_serverdict
Information about the default prediction server for the deployment. Contains the following values:
- id: str. Prediction server ID.
- url: str, optional. Prediction server URL.
- datarobot-key: str. Corresponds to the PredictionServer's "snake_cased" datarobot_key parameter that allows you to verify and access the prediction server.
- importancestr, optional
deployment importance
- modeldict
information on the model of the deployment
- capabilitiesdict
information on the capabilities of the deployment
- prediction_usagedict
information on the prediction usage of the deployment
- permissionslist
(New in version v2.18) user’s permissions on the deployment
- service_healthdict
information on the service health of the deployment
- model_healthdict
information on the model health of the deployment
- accuracy_healthdict
information on the accuracy health of the deployment
- fairness_healthdict
information on the fairness health of a deployment
- governancedict
information on approval and change requests of a deployment
- ownersdict
information on the owners of a deployment
- prediction_environmentdict
information on the prediction environment of a deployment
- classmethod create_from_learning_model(model_id, label, description=None, default_prediction_server_id=None, importance=None, prediction_threshold=None, status=None)¶
Create a deployment from a DataRobot model.
New in version v2.17.
- Parameters
- model_idstr
id of the DataRobot model to deploy
- labelstr
a human-readable label of the deployment
- descriptionstr, optional
a human-readable description of the deployment
- default_prediction_server_idstr, optional
an identifier of a prediction server to be used as the default prediction server
- importancestr, optional
deployment importance
- prediction_thresholdfloat, optional
threshold used for binary classification in predictions
- statusstr, optional
deployment status
- Returns
- deploymentDeployment
The created deployment
Examples
    from datarobot import Project, Deployment

    project = Project.get('5506fcd38bd88f5953219da0')
    model = project.get_models()[0]
    deployment = Deployment.create_from_learning_model(model.id, 'New Deployment')
    deployment
    >>> Deployment('New Deployment')
- Return type
TypeVar
(TDeployment
, bound=Deployment
)
- classmethod create_from_custom_model_version(custom_model_version_id, label, description=None, default_prediction_server_id=None, max_wait=600, importance=None)¶
Create a deployment from a DataRobot custom model image.
- Parameters
- custom_model_version_idstr
id of the DataRobot custom model version to deploy. The version must have a base_environment_id.
- labelstr
a human readable label of the deployment
- descriptionstr, optional
a human readable description of the deployment
- default_prediction_server_idstr, optional
an identifier of a prediction server to be used as the default prediction server
- max_waitint, optional
seconds to wait for successful resolution of a deployment creation job. Deployment supports making predictions only after a deployment creation job has successfully finished
- importancestr, optional
deployment importance
- Returns
- deploymentDeployment
The created deployment
- Return type
TypeVar
(TDeployment
, bound=Deployment
)
- classmethod list(order_by=None, search=None, filters=None)¶
List all deployments a user can view.
New in version v2.17.
- Parameters
- order_bystr, optional
(New in version v2.18) the order to sort the deployment list by, defaults to label
Allowed attributes to sort by are:
label
serviceHealth
modelHealth
accuracyHealth
recentPredictions
lastPredictionTimestamp
If the sort attribute is preceded by a hyphen, deployments will be sorted in descending order, otherwise in ascending order.
For health related sorting, ascending means failing, warning, passing, unknown.
- searchstr, optional
(New in version v2.18) case insensitive search against deployment’s label and description.
- filtersdatarobot.models.deployment.DeploymentListFilters, optional
(New in version v2.20) an object containing all filters that you’d like to apply to the resulting list of deployments. See
DeploymentListFilters
for details on usage.
- Returns
- deploymentslist
a list of deployments the user can view
Examples
    from datarobot import Deployment

    deployments = Deployment.list()
    deployments
    >>> [Deployment('New Deployment'), Deployment('Previous Deployment')]
    from datarobot import Deployment
    from datarobot.models.deployment import DeploymentListFilters
    from datarobot.enums import DEPLOYMENT_SERVICE_HEALTH_STATUS

    filters = DeploymentListFilters(
        role='OWNER',
        service_health=[DEPLOYMENT_SERVICE_HEALTH_STATUS.FAILING]
    )
    filtered_deployments = Deployment.list(filters=filters)
    filtered_deployments
    >>> [Deployment('Deployment I Own w/ Failing Service Health')]
- Return type
List
[TypeVar
(TDeployment
, bound=Deployment
)]
- classmethod get(deployment_id)¶
Get information about a deployment.
New in version v2.17.
- Parameters
- deployment_idstr
the id of the deployment
- Returns
- deploymentDeployment
the queried deployment
Examples
    from datarobot import Deployment

    deployment = Deployment.get(deployment_id='5c939e08962d741e34f609f0')
    deployment.id
    >>>'5c939e08962d741e34f609f0'
    deployment.label
    >>>'New Deployment'
- Return type
TypeVar
(TDeployment
, bound=Deployment
)
- predict_batch(source, passthrough_columns=None, download_timeout=None, download_read_timeout=None, upload_read_timeout=None)¶
A convenience method for making predictions with csv file or pandas DataFrame using a batch prediction job.
For advanced usage, use
datarobot.models.BatchPredictionJob
directly.New in version v3.0.
- Parameters
- source: str, pd.DataFrame or file object
Pass a filepath, file, or DataFrame for making batch predictions.
- passthrough_columnslist[string] (optional)
Keep these columns from the scoring dataset in the scored dataset. This is useful for correlating predictions with source data.
- download_timeout: int, optional
Wait this many seconds for the download to become available. See
datarobot.models.BatchPredictionJob.score()
.- download_read_timeout: int, optional
Wait this many seconds for the server to respond between chunks. See
datarobot.models.BatchPredictionJob.score()
.- upload_read_timeout: int, optional
Wait this many seconds for the server to respond after a whole dataset upload. See
datarobot.models.BatchPredictionJob.score()
.
- Returns
- pd.DataFrame
Prediction results in a pandas DataFrame.
- Raises
- InvalidUsageError
If the source parameter cannot be determined to be a filepath, file, or DataFrame.
Examples
    from datarobot.models.deployment import Deployment

    deployment = Deployment.get("<MY_DEPLOYMENT_ID>")
    prediction_results_as_dataframe = deployment.predict_batch(
        source="./my_local_file.csv",
    )
- Return type
DataFrame
- get_uri()¶
- Returns
- urlstr
Deployment’s overview URI
- Return type
str
- update(label=None, description=None, importance=None)¶
Update the label and description of this deployment.
New in version v2.19.
- Return type
None
- delete()¶
Delete this deployment.
New in version v2.17.
- Return type
None
- activate(max_wait=600)¶
Activates this deployment. On success, the deployment status becomes active.
New in version v2.29.
- Parameters
- max_waitint, optional
The maximum time to wait for deployment activation to complete before erroring
- Return type
None
- deactivate(max_wait=600)¶
Deactivates this deployment. On success, the deployment status becomes inactive.
New in version v2.29.
- Parameters
- max_waitint, optional
The maximum time to wait for deployment deactivation to complete before erroring
- Return type
None
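For example, a minimal sketch of toggling a deployment's status (the deployment ID is reused from the examples above):

    from datarobot import Deployment

    deployment = Deployment.get(deployment_id='5c939e08962d741e34f609f0')
    deployment.deactivate()            # on success the status becomes 'inactive'
    deployment.activate(max_wait=600)  # on success the status becomes 'active'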
- replace_model(new_model_id, reason, max_wait=600)¶
- Replace the model used in this deployment. To confirm model replacement eligibility, use
validate_replacement_model()
beforehand.
New in version v2.17.
Model replacement is an asynchronous process, which means some preparatory work may be performed after the initial request is completed. This function will not return until all preparatory work is fully finished.
Predictions made against this deployment will start using the new model as soon as the request is completed. There will be no interruption for predictions throughout the process.
- Parameters
- new_model_idstr
The id of the new model to use. If replacing the deployment’s model with a CustomInferenceModel, a specific CustomModelVersion ID must be used.
- reasonMODEL_REPLACEMENT_REASON
The reason for the model replacement. Must be one of ‘ACCURACY’, ‘DATA_DRIFT’, ‘ERRORS’, ‘SCHEDULED_REFRESH’, ‘SCORING_SPEED’, or ‘OTHER’. This value will be stored in the model history to keep track of why a model was replaced
- max_waitint, optional
(new in version 2.22) The maximum time to wait for model replacement job to complete before erroring
Examples
    from datarobot import Deployment
    from datarobot.enums import MODEL_REPLACEMENT_REASON

    deployment = Deployment.get(deployment_id='5c939e08962d741e34f609f0')
    deployment.model['id'], deployment.model['type']
    >>>('5c0a979859b00004ba52e431', 'Decision Tree Classifier (Gini)')
    deployment.replace_model('5c0a969859b00004ba52e41b', MODEL_REPLACEMENT_REASON.ACCURACY)
    deployment.model['id'], deployment.model['type']
    >>>('5c0a969859b00004ba52e41b', 'Support Vector Classifier (Linear Kernel)')
- Return type
None
- validate_replacement_model(new_model_id)¶
Validate a model can be used as the replacement model of the deployment.
New in version v2.17.
- Parameters
- new_model_idstr
the id of the new model to validate
- Returns
- statusstr
status of the validation, will be one of ‘passing’, ‘warning’ or ‘failing’. If the status is passing or warning, use
replace_model()
to perform a model replacement. If the status is failing, refer to checks
for more detail on why the new model cannot be used as a replacement.
- messagestr
message for the validation result
- checksdict
explain why the new model can or cannot replace the deployment’s current model
- Return type
Tuple
[str
,str
,Dict
[str
,Any
]]
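A minimal sketch of checking eligibility before replacing the model (the model ID is reused from the replace_model example; the reason is passed as one of its documented string values):

    from datarobot import Deployment

    deployment = Deployment.get(deployment_id='5c939e08962d741e34f609f0')
    status, message, checks = deployment.validate_replacement_model(
        new_model_id='5c0a969859b00004ba52e41b',
    )
    if status in ('passing', 'warning'):
        deployment.replace_model('5c0a969859b00004ba52e41b', reason='ACCURACY')
    else:
        # `checks` explains why the candidate cannot replace the current model.
        print(message, checks)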
- get_features()¶
Retrieve the list of features needed to make predictions on this deployment.
- Returns
- features: list
a list of feature dict
Notes
Each feature dict contains the following structure:
name
: str, feature namefeature_type
: str, feature typeimportance
: float, numeric measure of the relationship strength between the feature and target (independent of model or other features)date_format
: str or None, the date format string for how this feature was interpreted, null if not a date feature, compatible with https://docs.python.org/2/library/time.html#time.strftime.known_in_advance
: bool, whether the feature was selected as known in advance in a time series model, false for non-time series models.
Examples
    from datarobot import Deployment

    deployment = Deployment.get(deployment_id='5c939e08962d741e34f609f0')
    features = deployment.get_features()
    features[0]['feature_type']
    >>>'Categorical'
    features[0]['importance']
    >>>0.133
- Return type
List
[FeatureDict
]
- submit_actuals(data, batch_size=10000)¶
Submit actuals for processing. The actuals submitted will be used to calculate accuracy metrics.
- Parameters
- data: list or pandas.DataFrame
If data is a list, each item should be a dict-like object with the following keys and values; if data is a pandas.DataFrame, it should contain the following columns:
- association_id: str, a unique identifier used with a prediction, max length 128 characters
- actual_value: str or int or float, the actual value of a prediction; should be numeric for deployments with regression models or a string for deployments with classification models
- was_acted_on: bool, optional, indicates if the prediction was acted on in a way that could have affected the actual outcome
- timestamp: datetime or string in RFC3339 format, optional. If the datetime provided does not have a timezone, we assume it is UTC.
- batch_size: int, optional
The max number of actuals in each request.
- Raises
- ValueError
if the input data is not a list of dict-like objects or a pandas.DataFrame, or if the input data is empty
Examples
    from datarobot import Deployment, AccuracyOverTime

    deployment = Deployment.get(deployment_id='5c939e08962d741e34f609f0')
    data = [{
        'association_id': '439917',
        'actual_value': 'True',
        'was_acted_on': True
    }]
    deployment.submit_actuals(data)
- Return type
None
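In addition to the list form shown above, a pandas.DataFrame with the same columns can be submitted; a minimal sketch with illustrative values:

    import pandas as pd
    from datarobot import Deployment

    deployment = Deployment.get(deployment_id='5c939e08962d741e34f609f0')
    actuals = pd.DataFrame({
        'association_id': ['439917', '439918'],  # illustrative association IDs
        'actual_value': ['True', 'False'],
    })
    deployment.submit_actuals(actuals, batch_size=10000)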
- submit_actuals_from_catalog_async(dataset_id, actual_value_column, association_id_column, dataset_version_id=None, timestamp_column=None, was_acted_on_column=None)¶
Submit actuals from AI Catalog for processing. The actuals submitted will be used to calculate accuracy metrics.
- Parameters
- dataset_id: str,
The ID of the source dataset.
- dataset_version_id: str, optional
The ID of the dataset version to apply the query to. If not specified, the latest version associated with dataset_id is used.
- association_id_column: str,
The name of the column that contains a unique identifier used with a prediction.
- actual_value_column: str,
The name of the column that contains the actual value of a prediction.
- was_acted_on_column: str, optional,
The name of the column that indicates if the prediction was acted on in a way that could have affected the actual outcome.
- timestamp_column: str, optional,
The name of the column that contains datetime or string in RFC3339 format.
- Returns
- status_check_jobStatusCheckJob
Object contains all needed logic for a periodical status check of an async job.
- Raises
- ValueError
if dataset_id, actual_value_column, or association_id_column is not provided
Examples
    from datarobot import Deployment

    deployment = Deployment.get(deployment_id='5c939e08962d741e34f609f0')
    status_check_job = deployment.submit_actuals_from_catalog_async(data)
- Return type
- get_predictions_by_forecast_date_settings()¶
Retrieve predictions by forecast date settings of this deployment.
New in version v2.27.
- Returns
- settingsdict
Predictions by forecast date settings of the deployment is a dict with the following format:
- enabledbool
Is ‘’True’’ if predictions by forecast date is enabled for this deployment. To update this setting, see
update_predictions_by_forecast_date_settings()
- column_namestring
The column name in prediction datasets to be used as forecast date.
- datetime_formatstring
The datetime format of the forecast date column in prediction datasets.
- Return type
- update_predictions_by_forecast_date_settings(enable_predictions_by_forecast_date, forecast_date_column_name=None, forecast_date_format=None, max_wait=600)¶
Update predictions by forecast date settings of this deployment.
New in version v2.27.
Updating predictions by forecast date setting is an asynchronous process, which means some preparatory work may be performed after the initial request is completed. This function will not return until all preparatory work is fully finished.
- Parameters
- enable_predictions_by_forecast_datebool
set to ‘’True’’ if predictions by forecast date is to be turned on or set to ‘’False’’ if predictions by forecast date is to be turned off.
- forecast_date_column_name: string, optional
The column name in prediction datasets to be used as forecast date. If ‘’enable_predictions_by_forecast_date’’ is set to ‘’False’’, then the parameter will be ignored.
- forecast_date_format: string, optional
The datetime format of the forecast date column in prediction datasets. If ‘’enable_predictions_by_forecast_date’’ is set to ‘’False’’, then the parameter will be ignored.
- max_waitint, optional
seconds to wait for successful resolution
Examples
    # To set predictions by forecast date settings to the same default settings you see when using
    # the DataRobot web application, you use your 'Deployment' object like this:
    deployment.update_predictions_by_forecast_date_settings(
        enable_predictions_by_forecast_date=True,
        forecast_date_column_name="date (actual)",
        forecast_date_format="%Y-%m-%d",
    )
- Return type
None
- get_challenger_models_settings()¶
Retrieve challenger models settings of this deployment.
New in version v2.27.
- Returns
- settingsdict
Challenger models settings of the deployment is a dict with the following format:
- enabledbool
Is ‘’True’’ if challenger models is enabled for this deployment. To update existing ‘’challenger_models’’ settings, see
update_challenger_models_settings()
- Return type
- update_challenger_models_settings(challenger_models_enabled, max_wait=600)¶
Update challenger models settings of this deployment.
New in version v2.27.
Updating challenger models setting is an asynchronous process, which means some preparatory work may be performed after the initial request is completed. This function will not return until all preparatory work is fully finished.
- Parameters
- challenger_models_enabledbool
set to ‘’True’’ if challenger models is to be turned on or set to ‘’False’’ if challenger models is to be turned off
- max_waitint, optional
seconds to wait for successful resolution
- Return type
None
- get_segment_analysis_settings()¶
Retrieve segment analysis settings of this deployment.
New in version v2.27.
- Returns
- settingsdict
Segment analysis settings of the deployment containing two items with keys
enabled
andattributes
, which are further described below.- enabledbool
Set to ‘’True’’ if segment analysis is enabled for this deployment. To update existing setting, see
update_segment_analysis_settings()
- attributeslist
To create or update existing segment analysis attributes, see
update_segment_analysis_settings()
- Return type
- update_segment_analysis_settings(segment_analysis_enabled, segment_analysis_attributes=None, max_wait=600)¶
Update segment analysis settings of this deployment.
New in version v2.27.
Updating segment analysis setting is an asynchronous process, which means some preparatory work may be performed after the initial request is completed. This function will not return until all preparatory work is fully finished.
- Parameters
- segment_analysis_enabledbool
set to ‘’True’’ if segment analysis is to be turned on or set to ‘’False’’ if segment analysis is to be turned off
- segment_analysis_attributes: list, optional
A list of strings that gives the segment attributes selected for tracking.
- max_waitint, optional
seconds to wait for successful resolution
- Return type
None
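A minimal sketch of enabling segment analysis; the attribute names below are illustrative and must correspond to columns tracked for the deployment:

    from datarobot import Deployment

    deployment = Deployment.get(deployment_id='5c939e08962d741e34f609f0')
    deployment.update_segment_analysis_settings(
        segment_analysis_enabled=True,
        segment_analysis_attributes=['country', 'device_type'],  # illustrative attributes
    )
    settings = deployment.get_segment_analysis_settings()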
- get_bias_and_fairness_settings()¶
Retrieve bias and fairness settings of this deployment.
New in version v3.2.0.
- Returns
- settingsdict in the following format:
- protected_featuresList[str]
A list of features to mark as protected.
- preferable_target_valuebool
A target value that should be treated as a positive outcome for the prediction.
- fairness_metric_setstr
Can be one of <datarobot.enums.FairnessMetricsSet>. A set of fairness metrics to use for calculating fairness.
- fairness_thresholdfloat
Threshold value of the fairness metric. Cannot be less than 0 or greater than 1.
- Return type
Optional
[BiasAndFairnessSettings
]
- update_bias_and_fairness_settings(protected_features, fairness_metric_set, fairness_threshold, preferable_target_value, max_wait=600)¶
Update bias and fairness settings of this deployment.
New in version v3.2.0.
Updating bias and fairness setting is an asynchronous process, which means some preparatory work may be performed after the initial request is completed. This function will not return until all preparatory work is fully finished.
- Parameters
- protected_featuresList[str]
A list of features to mark as protected.
- preferable_target_valuebool
A target value that should be treated as a positive outcome for the prediction.
- fairness_metric_setstr
Can be one of <datarobot.enums.FairnessMetricsSet>. The fairness metric used to calculate the fairness scores.
- fairness_thresholdfloat
Threshold value of the fairness metric. Cannot be less than 0 or greater than 1.
- max_waitint, optional
seconds to wait for successful resolution
- Return type
None
- get_drift_tracking_settings()¶
Retrieve drift tracking settings of this deployment.
New in version v2.17.
- Returns
- settingsdict
Drift tracking settings of the deployment containing two nested dicts with key
target_drift
andfeature_drift
, which are further described below.Target drift
setting contains:- enabledbool
If target drift tracking is enabled for this deployment. To create or update existing ‘’target_drift’’ settings, see
update_drift_tracking_settings()
Feature drift
setting contains:- enabledbool
If feature drift tracking is enabled for this deployment. To create or update existing ‘’feature_drift’’ settings, see
update_drift_tracking_settings()
- Return type
- update_drift_tracking_settings(target_drift_enabled=None, feature_drift_enabled=None, max_wait=600)¶
Update drift tracking settings of this deployment.
New in version v2.17.
Updating drift tracking setting is an asynchronous process, which means some preparatory work may be performed after the initial request is completed. This function will not return until all preparatory work is fully finished.
- Parameters
- target_drift_enabledbool, optional
if target drift tracking is to be turned on
- feature_drift_enabledbool, optional
if feature drift tracking is to be turned on
- max_waitint, optional
seconds to wait for successful resolution
- Return type
None
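A minimal sketch of enabling both kinds of drift tracking and reading the settings back:

    from datarobot import Deployment

    deployment = Deployment.get(deployment_id='5c939e08962d741e34f609f0')
    deployment.update_drift_tracking_settings(
        target_drift_enabled=True,
        feature_drift_enabled=True,
    )
    settings = deployment.get_drift_tracking_settings()
    print(settings['target_drift']['enabled'], settings['feature_drift']['enabled'])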
- get_association_id_settings()¶
Retrieve association ID setting for this deployment.
New in version v2.19.
- Returns
- association_id_settingsdict in the following format:
- column_nameslist[string], optional
name of the columns to be used as association ID,
- required_in_prediction_requestsbool, optional
whether the association ID column is required in prediction requests
- Return type
str
- update_association_id_settings(column_names=None, required_in_prediction_requests=None, max_wait=600)¶
Update association ID setting for this deployment.
New in version v2.19.
- Parameters
- column_nameslist[string], optional
name of the columns to be used as association ID, currently only support a list of one string
- required_in_prediction_requestsbool, optional
whether the association ID column is required in prediction requests
- max_waitint, optional
seconds to wait for successful resolution
- Return type
None
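A minimal sketch; the column name is illustrative, and currently only a single column is supported:

    from datarobot import Deployment

    deployment = Deployment.get(deployment_id='5c939e08962d741e34f609f0')
    deployment.update_association_id_settings(
        column_names=['transaction_id'],          # illustrative column name
        required_in_prediction_requests=True,
    )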
- get_predictions_data_collection_settings()¶
Retrieve predictions data collection settings of this deployment.
New in version v2.21.
- Returns
- predictions_data_collection_settingsdict in the following format:
- enabledbool
If predictions data collection is enabled for this deployment. To update existing ‘’predictions_data_collection’’ settings, see
update_predictions_data_collection_settings()
- Return type
Dict
[str
,bool
]
- update_predictions_data_collection_settings(enabled, max_wait=600)¶
Update predictions data collection settings of this deployment.
New in version v2.21.
Updating predictions data collection setting is an asynchronous process, which means some preparatory work may be performed after the initial request is completed. This function will not return until all preparatory work is fully finished.
- Parameters
- enabled: bool
if predictions data collection is to be turned on
- max_waitint, optional
seconds to wait for successful resolution
- Return type
None
- get_prediction_warning_settings()¶
Retrieve prediction warning settings of this deployment.
New in version v2.19.
- Returns
- settingsdict in the following format:
- enabledbool
If target prediction_warning is enabled for this deployment. To create or update existing ‘’prediction_warning’’ settings, see
update_prediction_warning_settings()
- custom_boundariesdict or None
If None, the default boundaries for the model are used. Otherwise, it contains the following keys:
- upperfloat
All predictions greater than provided value are considered anomalous
- lowerfloat
All predictions less than provided value are considered anomalous
- Return type
- update_prediction_warning_settings(prediction_warning_enabled, use_default_boundaries=None, lower_boundary=None, upper_boundary=None, max_wait=600)¶
Update prediction warning settings of this deployment.
New in version v2.19.
- Parameters
- prediction_warning_enabledbool
If prediction warnings should be turned on.
- use_default_boundariesbool, optional
If default boundaries of the model should be used for the deployment.
- upper_boundaryfloat, optional
All predictions greater than provided value will be considered anomalous
- lower_boundaryfloat, optional
All predictions less than provided value will be considered anomalous
- max_waitint, optional
seconds to wait for successful resolution
- Return type
None
- get_prediction_intervals_settings()¶
Retrieve prediction intervals settings for this deployment.
New in version v2.19.
- Returns
- dict in the following format:
- enabledbool
Whether prediction intervals are enabled for this deployment
- percentileslist[int]
List of enabled prediction intervals’ sizes for this deployment. Currently we only support one percentile at a time.
Notes
Note that prediction intervals are only supported for time series deployments.
- Return type
- update_prediction_intervals_settings(percentiles, enabled=True, max_wait=600)¶
Update prediction intervals settings for this deployment.
New in version v2.19.
- Parameters
- percentileslist[int]
The prediction intervals percentiles to enable for this deployment. Currently we only support setting one percentile at a time.
- enabledbool, optional (defaults to True)
Whether to enable showing prediction intervals in the results of predictions requested using this deployment.
- max_waitint, optional
seconds to wait for successful resolution
- Raises
- AssertionError
If
percentiles
is in an invalid format- AsyncFailureError
If any of the responses from the server are unexpected
- AsyncProcessUnsuccessfulError
If the prediction intervals calculation job has failed or has been cancelled.
- AsyncTimeoutError
If the prediction intervals calculation job did not resolve in time
Notes
Updating prediction intervals settings is an asynchronous process, which means some preparatory work may be performed before the settings request is completed. This function will not return until all work is fully finished.
Note that prediction intervals are only supported for time series deployments.
- Return type
None
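A minimal sketch for a time series deployment; the percentile value is illustrative:

    from datarobot import Deployment

    deployment = Deployment.get(deployment_id='5c939e08962d741e34f609f0')
    # Only one percentile can be enabled at a time.
    deployment.update_prediction_intervals_settings(percentiles=[80], enabled=True)
    settings = deployment.get_prediction_intervals_settings()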
- get_service_stats(model_id=None, start_time=None, end_time=None, execution_time_quantile=None, response_time_quantile=None, slow_requests_threshold=None)¶
Retrieves values of many service stat metrics aggregated over a time period.
New in version v2.18.
- Parameters
- model_idstr, optional
the id of the model
- start_timedatetime, optional
start of the time period
- end_timedatetime, optional
end of the time period
- execution_time_quantilefloat, optional
quantile for executionTime, defaults to 0.5
- response_time_quantilefloat, optional
quantile for responseTime, defaults to 0.5
- slow_requests_thresholdfloat, optional
threshold for slowRequests, defaults to 1000
- Returns
- service_statsServiceStats
the queried service stats metrics information
- Return type
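A minimal sketch of querying service stats for a one-month window (the dates are illustrative):

    from datetime import datetime
    from datarobot import Deployment

    deployment = Deployment.get(deployment_id='5c939e08962d741e34f609f0')
    service_stats = deployment.get_service_stats(
        start_time=datetime(2023, 1, 1),
        end_time=datetime(2023, 2, 1),
        response_time_quantile=0.9,
    )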
- get_service_stats_over_time(metric=None, model_id=None, start_time=None, end_time=None, bucket_size=None, quantile=None, threshold=None)¶
Retrieves values of a single service stat metric over a time period.
New in version v2.18.
- Parameters
- metricSERVICE_STAT_METRIC, optional
the service stat metric to retrieve
- model_idstr, optional
the id of the model
- start_timedatetime, optional
start of the time period
- end_timedatetime, optional
end of the time period
- bucket_sizestr, optional
time duration of a bucket, in ISO 8601 time duration format
- quantilefloat, optional
quantile for ‘executionTime’ or ‘responseTime’, ignored when querying other metrics
- thresholdint, optional
threshold for ‘slowQueries’, ignored when querying other metrics
- Returns
- service_stats_over_timeServiceStatsOverTime
the queried service stats metric over time information
- Return type
- get_target_drift(model_id=None, start_time=None, end_time=None, metric=None)¶
Retrieve target drift information over a certain time period.
New in version v2.21.
- Parameters
- model_idstr
the id of the model
- start_timedatetime
start of the time period
- end_timedatetime
end of the time period
- metricstr
(New in version v2.22) metric used to calculate the drift score
- Returns
- target_driftTargetDrift
the queried target drift information
- Return type
- get_feature_drift(model_id=None, start_time=None, end_time=None, metric=None)¶
Retrieve drift information for deployment’s features over a certain time period.
New in version v2.21.
- Parameters
- model_idstr
the id of the model
- start_timedatetime
start of the time period
- end_timedatetime
end of the time period
- metricstr
(New in version v2.22) The metric used to calculate the drift score. Allowed values include psi, kl_divergence, dissimilarity, hellinger, and js_divergence.
- Returns
- feature_drift_data[FeatureDrift]
the queried feature drift information
- Return type
List
[FeatureDrift
]
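A minimal sketch of retrieving feature drift with the psi metric over an illustrative time window:

    from datetime import datetime
    from datarobot import Deployment

    deployment = Deployment.get(deployment_id='5c939e08962d741e34f609f0')
    feature_drift = deployment.get_feature_drift(
        start_time=datetime(2023, 1, 1),
        end_time=datetime(2023, 2, 1),
        metric='psi',
    )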
- get_predictions_over_time(model_ids=None, start_time=None, end_time=None, bucket_size=None, target_classes=None, include_percentiles=False)¶
Retrieve stats of deployment’s prediction response over a certain time period.
New in version v3.2.
- Parameters
- model_idslist[str]
ID of models to retrieve prediction stats
- start_timedatetime
start of the time period
- end_timedatetime
end of the time period
- bucket_sizeBUCKET_SIZE
time duration of each bucket
- target_classeslist[str]
class names of target, only for deployments with multiclass target
- include_percentilesbool
whether the returned data includes percentiles; only for deployments with a binary or regression target
- Returns
- predictions_over_timePredictionsOverTime
the queried predictions over time information
Examples
    from datarobot import Deployment

    deployment = Deployment.get(deployment_id='5c939e08962d741e34f609f0')
    predictions_over_time = deployment.get_predictions_over_time()
    predictions_over_time.buckets[0]['mean_predicted_value']
    >>>0.3772
    predictions_over_time.buckets[0]['row_count']
    >>>2000
- Return type
- get_accuracy(model_id=None, start_time=None, end_time=None, start=None, end=None, target_classes=None)¶
Retrieves values of many accuracy metrics aggregated over a time period.
New in version v2.18.
- Parameters
- model_idstr
the id of the model
- start_timedatetime
start of the time period
- end_timedatetime
end of the time period
- target_classeslist[str], optional
Optional list of target class strings
- Returns
- accuracyAccuracy
the queried accuracy metrics information
- Return type
- get_accuracy_over_time(metric=None, model_id=None, start_time=None, end_time=None, bucket_size=None, target_classes=None)¶
Retrieves values of a single accuracy metric over a time period.
New in version v2.18.
- Parameters
- metricACCURACY_METRIC
the accuracy metric to retrieve
- model_idstr
the id of the model
- start_timedatetime
start of the time period
- end_timedatetime
end of the time period
- bucket_sizestr
time duration of a bucket, in ISO 8601 time duration format
- target_classeslist[str], optional
Optional list of target class strings
- Returns
- accuracy_over_timeAccuracyOverTime
the queried accuracy metric over time information
- Return type
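A minimal sketch; the metric name is illustrative and the bucket size uses the ISO 8601 duration format (here, seven days):

    from datarobot import Deployment

    deployment = Deployment.get(deployment_id='5c939e08962d741e34f609f0')
    accuracy_over_time = deployment.get_accuracy_over_time(
        metric='RMSE',       # illustrative accuracy metric
        bucket_size='P7D',   # ISO 8601 duration: 7 days
    )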
- get_fairness_scores_over_time(start_time=None, end_time=None, bucket_size=None, model_id=None, protected_feature=None, fairness_metric=None)¶
Retrieves values of a single fairness score over a time period.
New in version v3.2.
- Parameters
- model_idstr
the id of the model
- start_timedatetime
start of the time period
- end_timedatetime
end of the time period
- bucket_sizestr
time duration of a bucket, in ISO 8601 time duration format
- protected_featurestr
name of protected feature
- fairness_metricstr
A consolidation of the fairness metrics by the use case.
- Returns
- fairness_scores_over_timeFairnessScoresOverTime
the queried fairness score over time information
- Return type
FairnessScoresOverTime
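A minimal sketch follows; the deployment ID and the protected feature name are placeholders.
from datarobot import Deployment
deployment = Deployment.get(deployment_id='5c939e08962d741e34f609f0')
# fairness scores in one-week buckets for a hypothetical protected feature
fairness_scores_over_time = deployment.get_fairness_scores_over_time(
    bucket_size='P7D', protected_feature='gender')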
- update_secondary_dataset_config(secondary_dataset_config_id, credential_ids=None)¶
Update the secondary dataset config used by Feature discovery model for a given deployment.
New in version v2.23.
- Parameters
- secondary_dataset_config_id: str
Id of the secondary dataset config
- credential_ids: list or None
List of DatasetsCredentials used by the secondary datasets
Examples
from datarobot import Deployment
deployment = Deployment(deployment_id='5c939e08962d741e34f609f0')
config = deployment.update_secondary_dataset_config('5df109112ca582033ff44084')
config
>>> '5df109112ca582033ff44084'
- Return type
str
- get_secondary_dataset_config()¶
Get the secondary dataset config used by Feature discovery model for a given deployment.
New in version v2.23.
- Returns
- secondary_dataset_configSecondaryDatasetConfigurations
Id of the secondary dataset config
Examples
from datarobot import Deployment
deployment = Deployment(deployment_id='5c939e08962d741e34f609f0')
deployment.update_secondary_dataset_config('5df109112ca582033ff44084')
config = deployment.get_secondary_dataset_config()
config
>>> '5df109112ca582033ff44084'
- Return type
str
- get_prediction_results(model_id=None, start_time=None, end_time=None, actuals_present=None, offset=None, limit=None)¶
Retrieve a list of prediction results of the deployment.
New in version v2.24.
- Parameters
- model_idstr
the id of the model
- start_timedatetime
start of the time period
- end_timedatetime
end of the time period
- actuals_presentbool
filter prediction results to only those with actuals present, or only those with actuals missing
- offsetint
this many results will be skipped
- limitint
at most this many results are returned
- Returns
- prediction_results: list[dict]
a list of prediction results
Examples
from datarobot import Deployment
deployment = Deployment.get(deployment_id='5c939e08962d741e34f609f0')
results = deployment.get_prediction_results()
- Return type
List
[Dict
[str
,Any
]]
- download_prediction_results(filepath, model_id=None, start_time=None, end_time=None, actuals_present=None, offset=None, limit=None)¶
Download prediction results of the deployment as a CSV file.
New in version v2.24.
- Parameters
- filepathstr
path of the csv file
- model_idstr
the id of the model
- start_timedatetime
start of the time period
- end_timedatetime
end of the time period
- actuals_presentbool
filter prediction results to only those with actuals present, or only those with actuals missing
- offsetint
this many results will be skipped
- limitint
at most this many results are returned
Examples
from datarobot import Deployment
deployment = Deployment.get(deployment_id='5c939e08962d741e34f609f0')
results = deployment.download_prediction_results('path_to_prediction_results.csv')
- Return type
None
- download_scoring_code(filepath, source_code=False, include_agent=False, include_prediction_explanations=False, include_prediction_intervals=False)¶
Retrieve scoring code of the current deployed model.
New in version v2.24.
- Parameters
- filepathstr
path of the scoring code file
- source_codebool
whether source code or binary of the scoring code will be retrieved
- include_agentbool
whether the scoring code retrieved will include tracking agent
- include_prediction_explanationsbool
whether the scoring code retrieved will include prediction explanations
- include_prediction_intervalsbool
whether the scoring code retrieved will support prediction intervals
Notes
When include_agent, include_prediction_explanations, or include_prediction_intervals is set to True, downloading the scoring code can take considerably longer.
Examples
from datarobot import Deployment
deployment = Deployment.get(deployment_id='5c939e08962d741e34f609f0')
results = deployment.download_scoring_code('path_to_scoring_code.jar')
- Return type
None
- delete_monitoring_data(model_id, start_time=None, end_time=None, max_wait=600)¶
Delete deployment monitoring data.
- Parameters
- model_idstr
id of the model whose monitoring data should be deleted
- start_timedatetime, optional
start of the time period to delete monitoring data
- end_timedatetime, optional
end of the time period to delete monitoring data
- max_waitint, optional
seconds to wait for successful resolution
- Return type
None
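A minimal sketch follows; both IDs and the cutoff date are placeholders.
from datetime import datetime
from datarobot import Deployment
deployment = Deployment.get(deployment_id='5c939e08962d741e34f609f0')
# delete monitoring data recorded before 2023 for one model of the deployment
deployment.delete_monitoring_data(model_id='5c0a969859b00004ba52e431', end_time=datetime(2023, 1, 1))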
Get a list of users, groups, and organizations that have access to this deployment.
- Parameters
- id: str, Optional
Only return the access control information for an organization, group, or user with this ID.
- name: string, Optional
Only return the access control information for an organization, group, or user with this name.
- share_recipient_type: enum(‘user’, ‘group’, ‘organization’), Optional
Only returns results with the given recipient type.
- limit: int (Default=0)
At most this many results are returned.
- offset: int (Default=0)
This many results will be skipped.
- Returns
- list(DeploymentSharedRole)
- Return type
List
[DeploymentSharedRole
]
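The sketch below assumes this listing corresponds to the deployment's get_shared_roles method (the signature line was lost in extraction above); the deployment ID is a placeholder.
from datarobot import Deployment
deployment = Deployment.get(deployment_id='5c939e08962d741e34f609f0')
# list only user-type recipients that have access to the deployment
shared_roles = deployment.get_shared_roles(share_recipient_type='user')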
Share a deployment with a user, group, or organization
- Parameters
- roles: list(or(GrantAccessControlWithUsernameValidator, GrantAccessControlWithIdValidator))
Array of GrantAccessControl objects; a maximum of 100 objects may be supplied.
- Return type
None
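The sketch below is an assumption-heavy illustration: it presumes the method is named update_shared_roles and that plain dicts matching the validators above are accepted; the username and role value are placeholders.
from datarobot import Deployment
deployment = Deployment.get(deployment_id='5c939e08962d741e34f609f0')
# grant CONSUMER access to a single user (method name and dict payload are assumptions)
deployment.update_shared_roles([
    {'role': 'CONSUMER', 'share_recipient_type': 'user', 'username': 'user@example.com'},
])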
- classmethod from_data(data)¶
Instantiate an object of this class using a dict.
- Parameters
- datadict
Correctly snake_cased keys and their values.
- Return type
TypeVar
(T
, bound=APIObject
)
- classmethod from_server_data(data, keep_attrs=None)¶
Instantiate an object of this class using the data directly from the server, meaning that the keys may have the wrong camel casing
- Parameters
- datadict
The directly translated dict of JSON from the server. No casing fixes have taken place
- keep_attrsiterable
List, set or tuple of the dotted namespace notations for attributes to keep within the object structure even if their values are None
- Return type
TypeVar
(T
, bound=APIObject
)
- open_in_browser()¶
Opens the class' relevant web browser location. If the default browser is not available, the URL is logged.
Note: If a text-mode browser is used, the calling process will block until the user exits the browser.
- Return type
None
- class datarobot.models.deployment.DeploymentListFilters(role=None, service_health=None, model_health=None, accuracy_health=None, execution_environment_type=None, importance=None)¶
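A sketch of using these filters is shown below; it assumes Deployment.list accepts a filters argument, and the role and importance values are illustrative.
from datarobot import Deployment
from datarobot.models.deployment import DeploymentListFilters
# only deployments owned by the caller and marked as high importance (values illustrative)
filters = DeploymentListFilters(role='OWNER', importance=['HIGH'])
deployments = Deployment.list(filters=filters)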
- class datarobot.models.deployment.ServiceStats(period=None, metrics=None, model_id=None)¶
Deployment service stats information.
- Attributes
- model_idstr
the model used to retrieve service stats metrics
- perioddict
the time period used to retrieve service stats metrics
- metricsdict
the service stats metrics
- classmethod get(deployment_id, model_id=None, start_time=None, end_time=None, execution_time_quantile=None, response_time_quantile=None, slow_requests_threshold=None)¶
Retrieve value of service stat metrics over a certain time period.
New in version v2.18.
- Parameters
- deployment_idstr
the id of the deployment
- model_idstr, optional
the id of the model
- start_timedatetime, optional
start of the time period
- end_timedatetime, optional
end of the time period
- execution_time_quantilefloat, optional
quantile for executionTime, defaults to 0.5
- response_time_quantilefloat, optional
quantile for responseTime, defaults to 0.5
- slow_requests_thresholdfloat, optional
threshold for slowRequests, defaults to 1000
- Returns
- service_statsServiceStats
the queried service stats metrics
- Return type
ServiceStats
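A minimal sketch follows; the deployment ID is a placeholder.
from datarobot.models.deployment import ServiceStats
service_stats = ServiceStats.get(deployment_id='5c939e08962d741e34f609f0')
# metrics is a dict of service stat values for the deployment's current model
service_stats.metrics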
- class datarobot.models.deployment.ServiceStatsOverTime(buckets=None, summary=None, metric=None, model_id=None)¶
Deployment service stats over time information.
- Attributes
- model_idstr
the model used to retrieve the service stat metric
- metricstr
the service stat metric being retrieved
- bucketsdict
how the service stat metric changes over time
- summarydict
summary for the service stat metric
- classmethod get(deployment_id, metric=None, model_id=None, start_time=None, end_time=None, bucket_size=None, quantile=None, threshold=None)¶
Retrieve information about how a service stat metric changes over a certain time period.
New in version v2.18.
- Parameters
- deployment_idstr
the id of the deployment
- metricSERVICE_STAT_METRIC, optional
the service stat metric to retrieve
- model_idstr, optional
the id of the model
- start_timedatetime, optional
start of the time period
- end_timedatetime, optional
end of the time period
- bucket_sizestr, optional
time duration of a bucket, in ISO 8601 time duration format
- quantilefloat, optional
quantile for ‘executionTime’ or ‘responseTime’, ignored when querying other metrics
- thresholdint, optional
threshold for ‘slowQueries’, ignored