API Reference¶
API Object¶
- class datarobot.models.api_object.APIObject¶
- classmethod from_data(data)¶
Instantiate an object of this class using a dict.
- Parameters
- datadict
Correctly snake_cased keys and their values.
- Return type
TypeVar(T, bound=APIObject)
- classmethod from_server_data(data, keep_attrs=None)¶
Instantiate an object of this class using the data directly from the server, meaning that the keys may have the wrong camel casing.
- Parameters
- datadict
The directly translated dict of JSON from the server. No casing fixes have taken place
- keep_attrsiterable
List, set or tuple of the dotted namespace notations for attributes to keep within the object structure even if their values are None
- Return type
TypeVar(T, bound=APIObject)
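For illustration, a minimal sketch of parsing a raw server payload into an APIObject subclass (the Project class is real, but the payload keys and values below are illustrative assumptions):

import datarobot as dr

# Raw server-style payload with camelCase keys (illustrative values).
server_json = {"projectId": "5a8ac9ab07a57a0001be501f", "projectName": "My Project"}

# from_server_data converts the camelCase keys into snake_cased attributes.
project = dr.Project.from_server_data(server_json)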
Advanced Options¶
- class datarobot.helpers.AdvancedOptions(weights=None, response_cap=None, blueprint_threshold=None, seed=None, smart_downsampled=None, majority_downsampling_rate=None, offset=None, exposure=None, accuracy_optimized_mb=None, scaleout_modeling_mode=None, events_count=None, monotonic_increasing_featurelist_id=None, monotonic_decreasing_featurelist_id=None, only_include_monotonic_blueprints=None, allowed_pairwise_interaction_groups=None, blend_best_models=None, scoring_code_only=None, prepare_model_for_deployment=None, consider_blenders_in_recommendation=None, min_secondary_validation_model_count=None, shap_only_mode=None, autopilot_data_sampling_method=None, run_leakage_removed_feature_list=None, autopilot_with_feature_discovery=False, feature_discovery_supervised_feature_reduction=None, exponentially_weighted_moving_alpha=None, external_time_series_baseline_dataset_id=None, use_supervised_feature_reduction=True, primary_location_column=None, protected_features=None, preferable_target_value=None, fairness_metrics_set=None, fairness_threshold=None, bias_mitigation_feature_name=None, bias_mitigation_technique=None, include_bias_mitigation_feature_as_predictor_variable=None, default_monotonic_increasing_featurelist_id=None, default_monotonic_decreasing_featurelist_id=None, model_group_id=None, model_regime_id=None, model_baselines=None, incremental_learning_only_mode=None, incremental_learning_on_best_model=None, chunk_definition_id=None, incremental_learning_early_stopping_rounds=None)¶
Used when setting the target of a project to set advanced options of modeling process.
- Parameters
- weightsstring, optional
The name of a column indicating the weight of each row
- response_capbool or float in [0.5, 1), optional
Defaults to none here, but server defaults to False. If specified, it is the quantile of the response distribution to use for response capping.
- blueprint_thresholdint, optional
Number of hours models are permitted to run before being excluded from later autopilot stages. Minimum: 1.
- seedint, optional
a seed to use for randomization
- smart_downsampledbool, optional
whether to use smart downsampling to throw away excess rows of the majority class. Only applicable to classification and zero-boosted regression projects.
- majority_downsampling_ratefloat, optional
the percentage between 0 and 100 of the majority rows that should be kept. Specify only if using smart downsampling. May not cause the majority class to become smaller than the minority class.
- offsetlist of str, optional
(New in version v2.6) the list of the names of the columns containing the offset of each row
- exposurestring, optional
(New in version v2.6) the name of a column containing the exposure of each row
- accuracy_optimized_mbbool, optional
(New in version v2.6) Include additional, longer-running models that will be run by the autopilot and available to run manually.
- scaleout_modeling_modestring, optional
(Deprecated in 2.28. Will be removed in 2.30) DataRobot no longer supports scaleout models. Please remove any usage of this parameter as it will be removed from the API soon.
- events_countstring, optional
(New in version v2.8) the name of a column specifying events count.
- monotonic_increasing_featurelist_idstring, optional
(new in version 2.11) the id of the featurelist that defines the set of features with a monotonically increasing relationship to the target. If None, no such constraints are enforced. When specified, this will set a default for the project that can be overridden at model submission time if desired.
- monotonic_decreasing_featurelist_idstring, optional
(new in version 2.11) the id of the featurelist that defines the set of features with a monotonically decreasing relationship to the target. If None, no such constraints are enforced. When specified, this will set a default for the project that can be overridden at model submission time if desired.
- only_include_monotonic_blueprintsbool, optional
(new in version 2.11) when true, only blueprints that support enforcing monotonic constraints will be available in the project or selected for the autopilot.
- allowed_pairwise_interaction_groupslist of tuple, optional
(New in version v2.19) For GA2M models - specify groups of columns for which pairwise interactions will be allowed. E.g. if set to [(A, B, C), (C, D)] then GA2M models will allow interactions between columns A x B, B x C, A x C, C x D. All others (A x D, B x D) will not be considered.
- blend_best_models: bool, optional
(New in version v2.19) blend best models during Autopilot run.
- scoring_code_only: bool, optional
(New in version v2.19) Keep only models that can be converted to scorable java code during Autopilot run
- shap_only_mode: bool, optional
(New in version v2.21) Keep only models that support SHAP values during Autopilot run. Use SHAP-based insights wherever possible. Defaults to False.
- prepare_model_for_deployment: bool, optional
(New in version v2.19) Prepare model for deployment during Autopilot run. The preparation includes creating reduced feature list models, retraining best model on higher sample size, computing insights and assigning “RECOMMENDED FOR DEPLOYMENT” label.
- consider_blenders_in_recommendation: bool, optional
(New in version 2.22.0) Include blenders when selecting a model to prepare for deployment in an Autopilot Run. Defaults to False.
- min_secondary_validation_model_count: int, optional
(New in version v2.19) Compute “All backtest” scores (datetime models) or cross validation scores for the specified number of the highest ranking models on the Leaderboard, if over the Autopilot default.
- autopilot_data_sampling_method: str, optional
(New in version v2.23) One of datarobot.enums.DATETIME_AUTOPILOT_DATA_SAMPLING_METHOD. Applicable for OTV projects only; defines if autopilot uses "random" or "latest" sampling when iteratively building models on various training samples. Defaults to "random" for duration-based projects and to "latest" for row-based projects.
- run_leakage_removed_feature_list: bool, optional
(New in version v2.23) Run Autopilot on Leakage Removed feature list (if exists).
- autopilot_with_feature_discovery: bool, optional (default False)
(New in version v2.23) If true, autopilot will run on a feature list that includes features found via search for interactions.
- feature_discovery_supervised_feature_reduction: bool, optional
(New in version v2.23) Run supervised feature reduction for feature discovery projects.
- exponentially_weighted_moving_alpha: float, optional
(New in version v2.26) defaults to None, value between 0 and 1 (inclusive), indicates alpha parameter used in exponentially weighted moving average within feature derivation window.
- external_time_series_baseline_dataset_id: str, optional
(New in version v2.26) If provided, will generate metrics scaled by external model predictions metric for time series projects. The external predictions catalog must be validated before autopilot starts; see Project.validate_external_time_series_baseline and the external baseline predictions documentation for further explanation.
- use_supervised_feature_reduction: bool, optional (default True)
Time Series only. When true, during feature generation DataRobot runs a supervised algorithm to retain only qualifying features. Setting to false can severely impact autopilot duration, especially for datasets with many features.
- primary_location_column: str, optional.
The name of primary location column.
- protected_features: list of str, optional.
(New in version v2.24) A list of project features to mark as protected for Bias and Fairness testing calculations. Max number of protected features allowed is 10.
- preferable_target_value: str, optional.
(New in version v2.24) A target value that should be treated as a favorable outcome for the prediction. For example, if we want to check gender discrimination for giving a loan and our target is named is_bad, then the positive outcome for the prediction would be No, which means that the loan is good and that is what we treat as a favorable result for the loaner.
- fairness_metrics_set: str, optional.
(New in version v2.24) Metric to use for calculating fairness. Can be one of proportionalParity, equalParity, predictionBalance, trueFavorableAndUnfavorableRateParity or favorableAndUnfavorablePredictiveValueParity. Used and required only if the Bias & Fairness in AutoML feature is enabled.
- fairness_threshold: str, optional.
(New in version v2.24) Threshold value for the fairness metric. Can be in a range of [0.0, 1.0]. If the relative (i.e. normalized) fairness score is below the threshold, then the user will see a visual indication on the
- bias_mitigation_feature_namestr, optional
The feature from protected features that will be used in a bias mitigation task to mitigate bias
- bias_mitigation_techniquestr, optional
One of datarobot.enums.BiasMitigationTechnique. Options: 'preprocessingReweighing', 'postProcessingRejectionOptionBasedClassification'. The technique by which we'll mitigate bias, which will inform which bias mitigation task we insert into blueprints.
- include_bias_mitigation_feature_as_predictor_variablebool, optional
Whether we should also use the mitigation feature as an input to the modeler, just like any other categorical used for training, i.e. do we want the model to "train on" this feature in addition to using it for bias mitigation.
- default_monotonic_increasing_featurelist_idstr, optional
Returned from server on Project GET request - not able to be updated by user
- default_monotonic_decreasing_featurelist_idstr, optional
Returned from server on Project GET request - not able to be updated by user
- model_group_id: Optional[str] = None,
(New in version v3.3) The name of a column containing the model group ID for each row.
- model_regime_id: Optional[str] = None,
(New in version v3.3) The name of a column containing the model regime ID for each row.
- model_baselines: Optional[List[str]] = None,
(New in version v3.3) The list of the names of the columns containing the model baselines for each row.
- incremental_learning_only_mode: Optional[bool] = None,
(New in version v3.4) Keep only models that support incremental learning during Autopilot run.
- incremental_learning_on_best_model: Optional[bool] = None,
(New in version v3.4) Run incremental learning on the best model during Autopilot run.
- chunk_definition_idstring, optional
(New in version v3.4) Unique definition for chunks needed to run automated incremental learning.
- incremental_learning_early_stopping_roundsOptional[int] = None
(New in version v3.4) Early stopping rounds used in the automated incremental learning service.
Examples
import datarobot as dr

advanced_options = dr.AdvancedOptions(
    weights='weights_column',
    offset=['offset_column'],
    exposure='exposure_column',
    response_cap=0.7,
    blueprint_threshold=2,
    smart_downsampled=True,
    majority_downsampling_rate=75.0)
- get(_AdvancedOptions__key, _AdvancedOptions__default=None)¶
Return the value for key if key is in the dictionary, else default.
- Return type
Optional[Any]
- pop(_AdvancedOptions__key)¶
Remove the specified key and return the corresponding value. If the key is not found, a KeyError is raised.
- Return type
Optional[Any]
- update_individual_options(**kwargs)¶
Update individual attributes of an instance of AdvancedOptions.
- Return type
None
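For illustration, a short sketch of updating options after construction (the column name and values are placeholders):

import datarobot as dr

advanced_options = dr.AdvancedOptions(weights='weights_column')

# Change individual attributes in place without rebuilding the object.
advanced_options.update_individual_options(seed=42, smart_downsampled=True,
                                           majority_downsampling_rate=75.0)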
Anomaly Assessment¶
- class datarobot.models.anomaly_assessment.AnomalyAssessmentRecord(status, status_details, start_date, end_date, prediction_threshold, preview_location, delete_location, latest_explanations_location, **record_kwargs)¶
Object which keeps metadata about anomaly assessment insight for the particular subset, backtest and series and the links to proceed to get the anomaly assessment data.
New in version v2.25.
Notes
Record contains:
- record_id : the ID of the record.
- project_id : the project ID of the record.
- model_id : the model ID of the record.
- backtest : the backtest of the record.
- source : the source of the record.
- series_id : the series id of the record for the multiseries projects.
- status : the status of the insight.
- status_details : the explanation of the status.
- start_date : the ISO-formatted timestamp of the first prediction in the subset. Will be None if status is not AnomalyAssessmentStatus.COMPLETED.
- end_date : the ISO-formatted timestamp of the last prediction in the subset. Will be None if status is not AnomalyAssessmentStatus.COMPLETED.
- prediction_threshold : the threshold; all rows with anomaly scores greater or equal to it have shap explanations computed. Will be None if status is not AnomalyAssessmentStatus.COMPLETED.
- preview_location : URL to retrieve predictions preview for the subset. Will be None if status is not AnomalyAssessmentStatus.COMPLETED.
- latest_explanations_location : the URL to retrieve the latest predictions with the shap explanations. Will be None if status is not AnomalyAssessmentStatus.COMPLETED.
- delete_location : the URL to delete the anomaly assessment record and relevant insight data.
- Attributes
- record_id: str
The ID of the record.
- project_id: str
The ID of the project record belongs to.
- model_id: str
The ID of the model record belongs to.
- backtest: int or “holdout”
The backtest of the record.
- source: “training” or “validation”
The source of the record
- series_id: str or None
The series id of the record for the multiseries projects. Defined only for the multiseries projects.
- status: str
The status of the insight. One of datarobot.enums.AnomalyAssessmentStatus.
- status_details: str
The explanation of the status.
- start_date: str or None
See start_date info in Notes for more details.
- end_date: str or None
See end_date info in Notes for more details.
- prediction_threshold: float or None
See prediction_threshold info in Notes for more details.
- preview_location: str or None
See preview_location info in Notes for more details.
- latest_explanations_location: str or None
See latest_explanations_location info in Notes for more details.
- delete_location: str
The URL to delete anomaly assessment record and relevant insight data.
- classmethod list(project_id, model_id, backtest=None, source=None, series_id=None, limit=100, offset=0, with_data_only=False)¶
Retrieve the list of the anomaly assessment records for the project and model. Output can be filtered and limited.
- Parameters
- project_id: str
The ID of the project record belongs to.
- model_id: str
The ID of the model record belongs to.
- backtest: int or “holdout”
The backtest to filter records by.
- source: “training” or “validation”
The source to filter records by.
- series_id: str, optional
The series id to filter records by. Can be specified for multiseries projects.
- limit: int, optional
100 by default. At most this many results are returned.
- offset: int, optional
This many results will be skipped.
- with_data_only: bool, False by default
Filter by status == AnomalyAssessmentStatus.COMPLETED. If True, records with no data or not supported will be omitted.
- Returns
- AnomalyAssessmentRecord
The anomaly assessment record.
- Return type
List[AnomalyAssessmentRecord]
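For illustration, a minimal sketch of listing the records that have data (the IDs are placeholders):

from datarobot.models.anomaly_assessment import AnomalyAssessmentRecord

records = AnomalyAssessmentRecord.list(
    '<project_id>', '<model_id>',
    backtest=0, source='validation', with_data_only=True)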
- classmethod compute(project_id, model_id, backtest, source, series_id=None)¶
Request anomaly assessment insight computation on the specified subset.
- Parameters
- project_id: str
The ID of the project to compute insight for.
- model_id: str
The ID of the model to compute insight for.
- backtest: int or “holdout”
The backtest to compute insight for.
- source: “training” or “validation”
The source to compute insight for.
- series_id: str, optional
The series id to compute insight for. Required for multiseries projects.
- Returns
- AnomalyAssessmentRecord
The anomaly assessment record.
- Return type
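For illustration, a sketch of requesting the insight for the holdout of a multiseries project (the IDs and series value are placeholders):

from datarobot.models.anomaly_assessment import AnomalyAssessmentRecord

record = AnomalyAssessmentRecord.compute(
    '<project_id>', '<model_id>',
    backtest='holdout', source='validation',
    series_id='<series_id>')  # series_id is required only for multiseries projects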
- delete()¶
Delete anomaly assessment record with preview and explanations.
- Return type
None
- get_predictions_preview()¶
Retrieve aggregated predictions statistics for the anomaly assessment record.
- Returns
- AnomalyAssessmentPredictionsPreview
- Return type
- get_latest_explanations()¶
Retrieve latest predictions along with shap explanations for the most anomalous records.
- Returns
- AnomalyAssessmentExplanations
- Return type
- get_explanations(start_date=None, end_date=None, points_count=None)¶
Retrieve predictions along with shap explanations for the most anomalous records in the specified date range or for the defined number of points. Two out of three parameters (start_date, end_date, points_count) must be specified.
- Parameters
- start_date: str, optional
The start of the date range to get explanations in. Example:
2020-01-01T00:00:00.000000Z
- end_date: str, optional
The end of the date range to get explanations in. Example:
2020-10-01T00:00:00.000000Z
- points_count: int, optional
The number of the rows to return.
- Returns
- AnomalyAssessmentExplanations
- Return type
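For illustration, a sketch of fetching explanations for a date window, assuming record is an AnomalyAssessmentRecord obtained as in the compute example above:

explanations = record.get_explanations(
    start_date='2020-01-01T00:00:00.000000Z',
    end_date='2020-10-01T00:00:00.000000Z')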
- get_explanations_data_in_regions(regions, prediction_threshold=0.0)¶
Get predictions along with explanations for the specified regions, sorted by predictions in descending order.
- Parameters
- regions: list of preview_bins
For each region explanations will be retrieved and merged.
- prediction_threshold: float, optional
If specified, only points with score greater or equal to the threshold will be returned.
- Returns
- dict in a form of {‘explanations’: explanations, ‘shap_base_value’: shap_base_value}
- Return type
- class datarobot.models.anomaly_assessment.AnomalyAssessmentExplanations(shap_base_value, data, start_date, end_date, count, **record_kwargs)¶
Object which keeps predictions along with shap explanations for the most anomalous records in the specified date range or for the defined number of points.
New in version v2.25.
Notes
AnomalyAssessmentExplanations contains:
- record_id : the id of the corresponding anomaly assessment record.
- project_id : the project ID of the corresponding anomaly assessment record.
- model_id : the model ID of the corresponding anomaly assessment record.
- backtest : the backtest of the corresponding anomaly assessment record.
- source : the source of the corresponding anomaly assessment record.
- series_id : the series id of the corresponding anomaly assessment record for the multiseries projects.
- start_date : the ISO-formatted first timestamp in the response. Will be None if there is no data in the specified range.
- end_date : the ISO-formatted last timestamp in the response. Will be None if there is no data in the specified range.
- count : the number of points in the response.
- shap_base_value : the shap base value.
- data : list of DataPoint objects in the specified date range.

DataPoint contains:
- shap_explanation : None or an array of up to 10 ShapleyFeatureContribution objects. Only rows with the highest anomaly scores have Shapley explanations calculated. Value is None if prediction is lower than prediction_threshold.
- timestamp (str) : ISO-formatted timestamp for the row.
- prediction (float) : the output of the model for this row.

ShapleyFeatureContribution contains:
- feature_value (str) : the feature value for this row. First 50 characters are returned.
- strength (float) : the shap value for this feature and row.
- feature (str) : the feature name.
- Attributes
- record_id: str
The ID of the record.
- project_id: str
The ID of the project record belongs to.
- model_id: str
The ID of the model record belongs to.
- backtest: int or “holdout”
The backtest of the record.
- source: “training” or “validation”
The source of the record.
- series_id: str or None
The series id of the record for the multiseries projects. Defined only for the multiseries projects.
- start_date: str or None
The ISO-formatted datetime of the first row in the data.
- end_date: str or None
The ISO-formatted datetime of the last row in the data.
- data: array of `data_point` objects or None
See data info in Notes for more details.
- shap_base_value: float
Shap base value.
- count: int
The number of points in the data.
- classmethod get(project_id, record_id, start_date=None, end_date=None, points_count=None)¶
Retrieve predictions along with shap explanations for the most anomalous records in the specified date range or for the defined number of points. Two out of three parameters (start_date, end_date, points_count) must be specified.
- Parameters
- project_id: str
The ID of the project.
- record_id: str
The ID of the anomaly assessment record.
- start_date: str, optional
The start of the date range to get explanations in. Example:
2020-01-01T00:00:00.000000Z
- end_date: str, optional
The end of the date range to get explanations in. Example:
2020-10-01T00:00:00.000000Z
- points_count: int, optional
The number of the rows to return.
- Returns
- AnomalyAssessmentExplanations
- Return type
- class datarobot.models.anomaly_assessment.AnomalyAssessmentPredictionsPreview(start_date, end_date, preview_bins, **record_kwargs)¶
Aggregated predictions over time for the corresponding anomaly assessment record. Intended to find the bins with highest anomaly scores.
New in version v2.25.
Notes
AnomalyAssessmentPredictionsPreview contains:
- record_id : the id of the corresponding anomaly assessment record.
- project_id : the project ID of the corresponding anomaly assessment record.
- model_id : the model ID of the corresponding anomaly assessment record.
- backtest : the backtest of the corresponding anomaly assessment record.
- source : the source of the corresponding anomaly assessment record.
- series_id : the series id of the corresponding anomaly assessment record for the multiseries projects.
- start_date : the ISO-formatted timestamp of the first prediction in the subset.
- end_date : the ISO-formatted timestamp of the last prediction in the subset.
- preview_bins : list of PreviewBin objects. The aggregated predictions for the subset. Bin boundaries may differ from actual start/end dates because this is an aggregation.

PreviewBin contains:
- start_date (str) : the ISO-formatted datetime of the start of the bin.
- end_date (str) : the ISO-formatted datetime of the end of the bin.
- avg_predicted (float or None) : the average prediction of the model in the bin. None if there are no entries in the bin.
- max_predicted (float or None) : the maximum prediction of the model in the bin. None if there are no entries in the bin.
- frequency (int) : the number of the rows in the bin.
- Attributes
- record_id: str
The ID of the record.
- project_id: str
The ID of the project record belongs to.
- model_id: str
The ID of the model record belongs to.
- backtest: int or “holdout”
The backtest of the record.
- source: “training” or “validation”
The source of the record
- series_id: str or None
The series id of the record for the multiseries projects. Defined only for the multiseries projects.
- start_date: str
the ISO-formatted timestamp of the first prediction in the subset.
- end_date: str
the ISO-formatted timestamp of the last prediction in the subset.
- preview_bins: list of preview_bin objects.
The aggregated predictions for the subset. See more info in Notes.
- classmethod get(project_id, record_id)¶
Retrieve aggregated predictions over time.
- Parameters
- project_id: str
The ID of the project.
- record_id: str
The ID of the anomaly assessment record.
- Returns
- AnomalyAssessmentPredictionsPreview
- Return type
- find_anomalous_regions(max_prediction_threshold=0.0)¶
- Sort preview bins by max_predicted value and select those with a max_predicted value greater than or equal to max_prediction_threshold. Sort the result by max_predicted value in descending order.
- Parameters
- max_prediction_threshold: float, optional
Return bins with maximum anomaly score greater or equal to max_prediction_threshold.
- Returns
- preview_bins: list of preview_bin
Filtered and sorted preview bins
- Return type
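For illustration, a sketch combining get() and find_anomalous_regions() (the IDs and threshold are placeholders):

from datarobot.models.anomaly_assessment import AnomalyAssessmentPredictionsPreview

preview = AnomalyAssessmentPredictionsPreview.get('<project_id>', '<record_id>')

# Keep only the bins whose maximum anomaly score is at least 0.8,
# sorted by max_predicted in descending order.
hot_bins = preview.find_anomalous_regions(max_prediction_threshold=0.8)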
Application¶
- class datarobot.Application(id, application_type_id, user_id, model_deployment_id, name, created_by, created_at, updated_at, datasets, cloud_provider, deployment_ids, pool_used, permissions, has_custom_logo, org_id, deployment_status_id=None, description=None, related_entities=None, application_template_type=None, deployment_name=None, deactivation_status_id=None, created_first_name=None, creator_last_name=None, creator_userhash=None, deployments=None)¶
An entity associated with a DataRobot Application.
- Attributes
- idstr
The ID of the created application.
- application_type_idstr
The ID of the type of the application.
- user_idstr
The ID of the user which created the application.
- model_deployment_idstr
The ID of the associated model deployment.
- deactivation_status_idstr or None
The ID of the status object to track the asynchronous app deactivation process status. Will be None if the app was never deactivated.
- namestr
The name of the application.
- created_bystr
The username of the user who created the application.
- created_atstr
The timestamp when the application was created.
- updated_atstr
The timestamp when the application was updated.
- datasetsList[str]
The list of datasets IDs associated with the application.
- creator_first_nameOptional[str]
Application creator first name. Optional.
- creator_last_nameOptional[str]
Application creator last name. Optional.
- creator_userhashOptional[str]
Application creator userhash. Optional.
- deployment_status_idstr
The ID of the status object to track the asynchronous deployment process status.
- descriptionstr
A description of the application.
- cloud_providerstr
The host of this application.
- deploymentsOptional[List[ApplicationDeployment]]
A list of deployment details. Optional.
- deployment_idsList[str]
A list of deployment IDs for this app.
- deployment_nameOptional[str]
Name of the deployment. Optional.
- application_template_typeOptional[str]
Application template type, purpose. Optional.
- pool_usedbool
Whether the pool was used for the last app deployment.
- permissionsList[str]
The list of permitted actions, which the authenticated user can perform on this application. Permissions should be ApplicationPermission options.
- has_custom_logobool
Whether the app has a custom logo.
- related_entitiesOptional[ApplicationRelatedEntity]
IDs of entities related to the app, for easy search.
- org_idstr
ID of the app’s organization.
- classmethod list(offset=None, limit=None, use_cases=None)¶
Retrieve a list of user applications.
- Parameters
- offsetOptional[int]
Optional. Retrieve applications in a list after this number.
- limitOptional[int]
Optional. Retrieve only this number of applications.
- use_cases: Optional[Union[UseCase, List[UseCase], str, List[str]]]
Optional. Filter available Applications by a specific Use Case or Use Cases. Accepts either the entity or the ID. If set to [None], the method filters the application’s datasets by those not linked to a UseCase.
- Returns
- applicationsList[Application]
The requested list of user applications.
- Return type
List[Application]
- classmethod get(application_id)¶
Retrieve a single application.
- Parameters
- application_idstr
The ID of the application to retrieve.
- Returns
- applicationApplication
The requested application.
- Return type
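For illustration, a minimal sketch of listing applications and retrieving one by ID:

import datarobot as dr

apps = dr.Application.list(limit=10)
app = dr.Application.get(apps[0].id)
print(app.name, app.permissions)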
Batch Predictions¶
- class datarobot.models.BatchPredictionJob(data, completed_resource_url=None)¶
A Batch Prediction Job is used to score large data sets on prediction servers using the Batch Prediction API.
- Attributes
- idstr
the id of the job
- classmethod score(deployment, intake_settings=None, output_settings=None, csv_settings=None, timeseries_settings=None, num_concurrent=None, chunk_size=None, passthrough_columns=None, passthrough_columns_set=None, max_explanations=None, max_ngram_explanations=None, explanation_algorithm=None, threshold_high=None, threshold_low=None, prediction_threshold=None, prediction_warning_enabled=None, include_prediction_status=False, skip_drift_tracking=False, prediction_instance=None, abort_on_error=True, column_names_remapping=None, include_probabilities=True, include_probabilities_classes=None, download_timeout=120, download_read_timeout=660, upload_read_timeout=600, explanations_mode=None)¶
Create new batch prediction job, upload the scoring dataset and return a batch prediction job.
The default intake and output options are both localFile which requires the caller to pass the file parameter and either download the results using the download() method afterwards or pass a path to a file where the scored data will be downloaded to afterwards.
- Returns
- BatchPredictionJob
Instance of BatchPredictionJob
- Attributes
- deploymentDeployment or string ID
Deployment which will be used for scoring.
- intake_settingsdict (optional)
A dict configuring where the data is coming from. Supported options:
type : string, either localFile, s3, azure, gcp, dataset, jdbc, snowflake, synapse or bigquery
Note that to pass a dataset, you not only need to specify the type parameter as dataset, but you must also set the dataset parameter as a dr.Dataset object.
To score from a local file, add this parameter to the settings:
file : file-like object, string path to file or a pandas.DataFrame of scoring data
To score from S3, add the next parameters to the settings:
url : string, the URL to score (e.g.: s3://bucket/key)
credential_id : string (optional)
endpoint_url : string (optional), any non-default endpoint URL for S3 access (omit to use the default)
To score from JDBC, add the next parameters to the settings:
data_store_id : string, the ID of the external data store connected to the JDBC data source (see Database Connectivity).
query : string (optional if table, schema and/or catalog is specified), a self-supplied SELECT statement of the data set you wish to predict.
table : string (optional if query is specified), the name of specified database table.
schema : string (optional if query is specified), the name of specified database schema.
catalog : string (optional if query is specified), (new in v2.22) the name of specified database catalog.
fetch_size : int (optional), Changing the fetchSize can be used to balance throughput and memory usage.
credential_id : string (optional) the ID of the credentials holding information about a user with read-access to the JDBC data source (see Credentials).
- output_settingsdict (optional)
A dict configuring how scored data is to be saved. Supported options:
type : string, either localFile, s3, azure, gcp, jdbc, snowflake, synapse or bigquery
To save scored data to a local file, add this parameter to the settings:
path : string (optional), path to save the scored data as CSV. If a path is not specified, you must download the scored data yourself with job.download(). If a path is specified, the call will block until the job is done. If there are no other jobs currently processing for the targeted prediction instance, uploading, scoring, and downloading will happen in parallel without waiting for a full job to complete. Otherwise, it will still block, but start downloading the scored data as soon as it starts generating data. This is the fastest method to get predictions.
To save scored data to S3, add the next parameters to the settings:
url : string, the URL for storing the results (e.g.: s3://bucket/key)
credential_id : string (optional)
endpoint_url : string (optional), any non-default endpoint URL for S3 access (omit to use the default)
To save scored data to JDBC, add the next parameters to the settings:
data_store_id : string, the ID of the external data store connected to the JDBC data source (see Database Connectivity).
table : string, the name of specified database table.
schema : string (optional), the name of specified database schema.
catalog : string (optional), (new in v2.22) the name of specified database catalog.
statement_type : string, the type of insertion statement to create, one of datarobot.enums.AVAILABLE_STATEMENT_TYPES.
update_columns : list(string) (optional), a list of strings containing those column names to be updated in case statement_type is set to a value related to update or upsert.
where_columns : list(string) (optional), a list of strings containing those column names to be selected in case statement_type is set to a value related to insert or update.
credential_id : string, the ID of the credentials holding information about a user with write-access to the JDBC data source (see Credentials).
create_table_if_not_exists : bool (optional), If no existing table is detected, attempt to create it before writing data with the strategy defined in the statementType parameter.
- csv_settingsdict (optional)
CSV intake and output settings. Supported options:
delimiter : string (optional, default ,), fields are delimited by this character. Use the string tab to denote TSV (TAB separated values). Must be either a one-character string or the string tab.
quotechar : string (optional, default “), fields containing the delimiter must be quoted using this character.
encoding : string (optional, default utf-8), encoding for the CSV files. For example (but not limited to): shift_jis, latin_1 or mskanji.
- timeseries_settingsdict (optional)
Configuration for time-series scoring. Supported options:
type : string, must be forecast or historical (default if not passed is forecast). forecast mode makes predictions using forecast_point or rows in the dataset without target. historical enables bulk prediction mode which calculates predictions for all possible forecast points and forecast distances in the dataset within predictions_start_date/predictions_end_date range.
forecast_point : datetime (optional), forecast point for the dataset, used for the forecast predictions, by default value will be inferred from the dataset. May be passed if timeseries_settings.type=forecast.
predictions_start_date : datetime (optional), used for historical predictions in order to override date from which predictions should be calculated. By default value will be inferred automatically from the dataset. May be passed if timeseries_settings.type=historical.
predictions_end_date : datetime (optional), used for historical predictions in order to override date up to which predictions should be calculated. By default value will be inferred automatically from the dataset. May be passed if timeseries_settings.type=historical.
relax_known_in_advance_features_check : bool, (default False). If True, missing values in the known in advance features are allowed in the forecast window at the prediction time. If omitted or False, missing values are not allowed.
- num_concurrentint (optional)
Number of concurrent chunks to score simultaneously. Defaults to the available number of cores of the deployment. Lower it to leave resources for real-time scoring.
- chunk_sizestring or int (optional)
Which strategy should be used to determine the chunk size. Can be either a named strategy or a fixed size in bytes.
- auto: use fixed or dynamic based on flipper
- fixed: use 1MB for explanations, 5MB for regular requests
- dynamic: use dynamic chunk sizes
- int: use this many bytes per chunk
- passthrough_columnslist[string] (optional)
Keep these columns from the scoring dataset in the scored dataset. This is useful for correlating predictions with source data.
- passthrough_columns_setstring (optional)
To pass through every column from the scoring dataset, set this to all. Takes precedence over passthrough_columns if set.
- max_explanationsint (optional)
Compute prediction explanations for this amount of features.
- max_ngram_explanationsint or str (optional)
Compute text explanations for this amount of ngrams. Set to all to return all ngram explanations, or set to a positive integer value to limit the amount of ngram explanations returned. By default no ngram explanations will be computed and returned.
- threshold_highfloat (optional)
Only compute prediction explanations for predictions above this threshold. Can be combined with threshold_low.
- threshold_lowfloat (optional)
Only compute prediction explanations for predictions below this threshold. Can be combined with threshold_high.
- explanations_modePredictionExplanationsMode, optional
Mode of prediction explanations calculation for multiclass and clustering models, if not specified - server default is to explain only the predicted class, identical to passing TopPredictionsMode(1).
- prediction_warning_enabledboolean (optional)
Add prediction warnings to the scored data. Currently only supported for regression models.
- include_prediction_statusboolean (optional)
Include the prediction_status column in the output, defaults to False.
- skip_drift_trackingboolean (optional)
Skips drift tracking on any predictions made from this job. This is useful when running non-production workloads to not affect drift tracking and cause unnecessary alerts. Defaults to False.
- prediction_instancedict (optional)
Defaults to instance specified by deployment or system configuration. Supported options:
hostName : string
sslEnabled : boolean (optional, default true). Set to false to run prediction requests from the batch prediction job without SSL.
datarobotKey : string (optional), if running a job against a prediction instance in the Managed AI Cloud, you must provide the organization level DataRobot-Key
apiKey : string (optional), by default, prediction requests will use the API key of the user that created the job. This allows you to make requests on behalf of other users.
- abort_on_errorboolean (optional)
Default behavior is to abort the job if too many rows fail scoring. This will free up resources for other jobs that may score successfully. Set to false to unconditionally score every row no matter how many errors are encountered. Defaults to True.
- column_names_remappingdict (optional)
Mapping with column renaming for output table. Defaults to {}.
- include_probabilitiesboolean (optional)
Flag that enables returning of all probability columns. Defaults to True.
- include_probabilities_classeslist (optional)
List the subset of classes if a user doesn’t want all the classes. Defaults to [].
- download_timeoutint (optional)
New in version 2.22.
If using localFile output, wait this many seconds for the download to become available. See download().
- download_read_timeoutint (optional, default 660)
New in version 2.22.
If using localFile output, wait this many seconds for the server to respond between chunks.
- upload_read_timeout: int (optional, default 600)
New in version 2.28.
If using localFile intake, wait this many seconds for the server to respond after whole dataset upload.
- prediction_threshold: float (optional)
New in version 3.4.0.
Threshold is the point that sets the class boundary for a predicted value. The model classifies an observation below the threshold as FALSE, and an observation above the threshold as TRUE. In other words, DataRobot automatically assigns the positive class label to any prediction exceeding the threshold. This value can be set between 0.0 and 1.0.
- Return type
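For illustration, a sketch of scoring a local CSV file against a deployment and writing the results to disk (the deployment ID, file names, and passthrough column are placeholders):

import datarobot as dr

job = dr.BatchPredictionJob.score(
    deployment='<deployment_id>',
    intake_settings={'type': 'localFile', 'file': 'to_predict.csv'},
    # Specifying a path blocks the call until the scored CSV is written.
    output_settings={'type': 'localFile', 'path': 'predicted.csv'},
    passthrough_columns=['customer_id'],  # keep this source column in the output (illustrative)
)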
- classmethod apply_time_series_data_prep_and_score(deployment, intake_settings, timeseries_settings, **kwargs)¶
Prepare the dataset with time series data prep, create new batch prediction job, upload the scoring dataset, and return a batch prediction job.
The supported intake_settings are of type localFile or dataset.
For timeseries_settings of type forecast the forecast_point must be specified.
Refer to the
datarobot.models.BatchPredictionJob.score()
method for details on the other kwargs parameters.
New in version v3.1.
- Returns
- BatchPredictionJob
Instance of BatchPredictionJob
- Raises
- InvalidUsageError
If the deployment does not support time series data prep. If the intake type is not supported for time series data prep.
- Attributes
- deploymentDeployment
Deployment which will be used for scoring.
- intake_settingsdict
A dict configuring where data is coming from. Supported options:
type : string, either localFile, dataset
Note that to pass a dataset, you not only need to specify the type parameter as dataset, but you must also set the dataset parameter as a
Dataset
object.
To score from a local file, add this parameter to the settings:
file : file-like object, string path to file or a pandas.DataFrame of scoring data.
- timeseries_settingsdict
Configuration for time-series scoring. Supported options:
type : string, must be forecast or historical (default if not passed is forecast). forecast mode makes predictions using forecast_point. historical enables bulk prediction mode which calculates predictions for all possible forecast points and forecast distances in the dataset within predictions_start_date/predictions_end_date range.
forecast_point : datetime (optional), forecast point for the dataset, used for the forecast predictions. Must be passed if timeseries_settings.type=forecast.
predictions_start_date : datetime (optional), used for historical predictions in order to override date from which predictions should be calculated. By default value will be inferred automatically from the dataset. May be passed if timeseries_settings.type=historical.
predictions_end_date : datetime (optional), used for historical predictions in order to override date up to which predictions should be calculated. By default value will be inferred automatically from the dataset. May be passed if timeseries_settings.type=historical.
relax_known_in_advance_features_check : bool, (default False). If True, missing values in the known in advance features are allowed in the forecast window at the prediction time. If omitted or False, missing values are not allowed.
- Return type
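For illustration, a sketch of a forecast-mode run with time series data prep; the deployment ID, file name, and forecast point are placeholders, and fetching the deployment object via Deployment.get is an assumption for the example:

from datetime import datetime

import datarobot as dr

deployment = dr.Deployment.get('<deployment_id>')

job = dr.BatchPredictionJob.apply_time_series_data_prep_and_score(
    deployment,
    intake_settings={'type': 'localFile', 'file': 'to_predict.csv'},
    timeseries_settings={'type': 'forecast',
                         'forecast_point': datetime(2023, 6, 1)},
)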
- classmethod score_to_file(deployment, intake_path, output_path, **kwargs)¶
Create new batch prediction job, upload the scoring dataset and download the scored CSV file concurrently.
Will block until the entire file is scored.
Refer to the
datarobot.models.BatchPredictionJob.score()
method for details on the other kwargs parameters.
- Returns
- BatchPredictionJob
Instance of BatchPredictionJob
- Attributes
- deploymentDeployment or string ID
Deployment which will be used for scoring.
- intake_pathfile-like object/string path to file/pandas.DataFrame
Scoring data
- output_pathstr
Filename to save the result under
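For illustration, a sketch that scores a local file and saves the results as CSV (the deployment ID and paths are placeholders):

import datarobot as dr

# Blocks until the entire file is scored and downloaded.
job = dr.BatchPredictionJob.score_to_file(
    '<deployment_id>',
    'to_predict.csv',
    'predicted.csv',
)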
- classmethod apply_time_series_data_prep_and_score_to_file(deployment, intake_path, output_path, timeseries_settings, **kwargs)¶
Prepare the input dataset with time series data prep. Then, create a new batch prediction job using the prepared AI catalog item as input and concurrently download the scored CSV file.
The function call will return when the entire file is scored.
For timeseries_settings of type forecast the forecast_point must be specified.
Refer to the
datarobot.models.BatchPredictionJob.score()
method for details on the other kwargs parameters.
New in version v3.1.
- Returns
- BatchPredictionJob
Instance of BatchPredictionJob.
- Raises
- InvalidUsageError
If the deployment does not support time series data prep.
- Attributes
- deploymentDeployment
The deployment which will be used for scoring.
- intake_pathfile-like object/string path to file/pandas.DataFrame
The scoring data.
- output_pathstr
The filename under which you save the result.
- timeseries_settingsdict
Configuration for time-series scoring. Supported options:
type : string, must be forecast or historical (default if not passed is forecast). forecast mode makes predictions using forecast_point. historical enables bulk prediction mode which calculates predictions for all possible forecast points and forecast distances in the dataset within predictions_start_date/predictions_end_date range.
forecast_point : datetime (optional), forecast point for the dataset, used for the forecast predictions. Must be passed if timeseries_settings.type=forecast.
predictions_start_date : datetime (optional), used for historical predictions in order to override date from which predictions should be calculated. By default value will be inferred automatically from the dataset. May be passed if timeseries_settings.type=historical.
predictions_end_date : datetime (optional), used for historical predictions in order to override date up to which predictions should be calculated. By default value will be inferred automatically from the dataset. May be passed if timeseries_settings.type=historical.
relax_known_in_advance_features_check : bool, (default False). If True, missing values in the known in advance features are allowed in the forecast window at the prediction time. If omitted or False, missing values are not allowed.
- Return type
- classmethod score_s3(deployment, source_url, destination_url, credential=None, endpoint_url=None, **kwargs)¶
Create new batch prediction job, with a scoring dataset from S3 and writing the result back to S3.
This returns immediately after the job has been created. You must poll for job completion using get_status() or wait_for_completion() (see datarobot.models.Job)
Refer to the
datarobot.models.BatchPredictionJob.score()
method for details on the other kwargs parameters.
- Returns
- BatchPredictionJob
Instance of BatchPredictionJob
- Attributes
- deploymentDeployment or string ID
Deployment which will be used for scoring.
- source_urlstring
The URL for the prediction dataset (e.g.: s3://bucket/key)
- destination_urlstring
The URL for the scored dataset (e.g.: s3://bucket/key)
- credentialstring or Credential (optional)
The AWS Credential object or credential id
- endpoint_urlstring (optional)
Any non-default endpoint URL for S3 access (omit to use the default)
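For illustration, a sketch of S3-to-S3 scoring with stored credentials (the bucket paths and IDs are placeholders):

import datarobot as dr

job = dr.BatchPredictionJob.score_s3(
    deployment='<deployment_id>',
    source_url='s3://mybucket/to_predict.csv',
    destination_url='s3://mybucket/predicted.csv',
    credential='<credential_id>',
)

# The call returns immediately; poll until the job finishes.
job.wait_for_completion()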
- classmethod score_azure(deployment, source_url, destination_url, credential=None, **kwargs)¶
Create new batch prediction job, with a scoring dataset from Azure blob storage and writing the result back to Azure blob storage.
This returns immediately after the job has been created. You must poll for job completion using get_status() or wait_for_completion() (see datarobot.models.Job).
Refer to the
datarobot.models.BatchPredictionJob.score()
method for details on the other kwargs parameters.
- Returns
- BatchPredictionJob
Instance of BatchPredictionJob
- Attributes
- deploymentDeployment or string ID
Deployment which will be used for scoring.
- source_urlstring
The URL for the prediction dataset (e.g.: https://storage_account.blob.endpoint/container/blob_name)
- destination_urlstring
The URL for the scored dataset (e.g.: https://storage_account.blob.endpoint/container/blob_name)
- credentialstring or Credential (optional)
The Azure Credential object or credential id
- classmethod score_gcp(deployment, source_url, destination_url, credential=None, **kwargs)¶
Create new batch prediction job, with a scoring dataset from Google Cloud Storage and writing the result back to one.
This returns immediately after the job has been created. You must poll for job completion using get_status() or wait_for_completion() (see datarobot.models.Job).
Refer to the
datarobot.models.BatchPredictionJob.score()
method for details on the other kwargs parameters.
- Returns
- BatchPredictionJob
Instance of BatchPredictionJob
- Attributes
- deploymentDeployment or string ID
Deployment which will be used for scoring.
- source_urlstring
The URL for the prediction dataset (e.g.: http(s)://storage.googleapis.com/[bucket]/[object])
- destination_urlstring
The URL for the scored dataset (e.g.: http(s)://storage.googleapis.com/[bucket]/[object])
- credentialstring or Credential (optional)
The GCP Credential object or credential id
- classmethod score_from_existing(batch_prediction_job_id)¶
Create a new batch prediction job based on the settings from a previously created one
- Returns
- BatchPredictionJob
Instance of BatchPredictionJob
- Attributes
- batch_prediction_job_id: str
ID of the previous batch prediction job
- Return type
- classmethod score_pandas(deployment, df, read_timeout=660, **kwargs)¶
Run a batch prediction job, with a scoring dataset from a pandas dataframe. The output from the prediction will be joined to the passed DataFrame and returned.
Use columnNamesRemapping to drop or rename columns in the output
This method blocks until the job has completed or raises an exception on errors.
Refer to the
datarobot.models.BatchPredictionJob.score()
method for details on the other kwargs parameters.
- Returns
- BatchPredictionJob
Instance of BatchPredictionJob
- pandas.DataFrame
The original dataframe merged with the predictions
- Attributes
- deploymentDeployment or string ID
Deployment which will be used for scoring.
- dfpandas.DataFrame
The dataframe to score
- Return type
Tuple[BatchPredictionJob, DataFrame]
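For illustration, a sketch of scoring a DataFrame in memory (the deployment ID and CSV path are placeholders):

import datarobot as dr
import pandas as pd

df = pd.read_csv('to_predict.csv')

# Blocks until scoring finishes; the predictions are merged onto the input frame.
job, scored_df = dr.BatchPredictionJob.score_pandas('<deployment_id>', df)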
- classmethod score_with_leaderboard_model(model, intake_settings=None, output_settings=None, csv_settings=None, timeseries_settings=None, passthrough_columns=None, passthrough_columns_set=None, max_explanations=None, max_ngram_explanations=None, explanation_algorithm=None, threshold_high=None, threshold_low=None, prediction_threshold=None, prediction_warning_enabled=None, include_prediction_status=False, abort_on_error=True, column_names_remapping=None, include_probabilities=True, include_probabilities_classes=None, download_timeout=120, download_read_timeout=660, upload_read_timeout=600, explanations_mode=None)¶
Creates a new batch prediction job for a Leaderboard model by uploading the scoring dataset. Returns a batch prediction job.
The default intake and output options are both localFile, which requires the caller to pass the file parameter and either download the results using the download() method afterwards or pass a path to a file where the scored data will be downloaded to.
- Returns
- BatchPredictionJob
Instance of BatchPredictionJob
- Attributes
- modelModel or DatetimeModel or string ID
Model which will be used for scoring.
- intake_settingsdict (optional)
A dict configuring how data is coming from. Supported options:
type : string, either localFile, dataset, or dss.
Note that to pass a dataset, you not only need to specify the type parameter as dataset, but you must also set the dataset parameter as a dr.Dataset object.
To score from a local file, add this parameter to the settings:
file : file-like object, string path to file or a pandas.DataFrame of scoring data.
To score a subset of training data, use the dss intake type and specify the following parameters:
project_id : the project to fetch training data from. Access to the project is required.
partition : subset of training data to score, one of
datarobot.enums.TrainingDataSubsets
.
- output_settingsdict (optional)
A dict configuring how scored data is to be saved. Supported options:
type : string, localFile
To save scored data to a local file, add this parameter to the settings:
path : string (optional) The path to save the scored data as a CSV file. If a path is not specified, you must download the scored data yourself with job.download(). If a path is specified, the call is blocked until the job is done. If there are no other jobs currently processing for the targeted prediction instance, uploading, scoring, and downloading will happen in parallel without waiting for a full job to complete. Otherwise, it will still block, but start downloading the scored data as soon as it starts generating data. This is the fastest method to get predictions.
- csv_settingsdict (optional)
CSV intake and output settings. Supported options:
delimiter : string (optional, default ,), fields are delimited by this character. Use the string tab to denote TSV (TAB separated values). Must be either a one-character string or the string tab.
quotechar : string (optional, default “), fields containing the delimiter must be quoted using this character.
encoding : string (optional, default utf-8), encoding for the CSV files. For example (but not limited to): shift_jis, latin_1 or mskanji.
- timeseries_settingsdict (optional)
Configuration for time-series scoring. Supported options:
type : string, must be forecast, historical (default if not passed is forecast), or training. forecast mode makes predictions using forecast_point or rows in the dataset without target. historical enables bulk prediction mode which calculates predictions for all possible forecast points and forecast distances in the dataset within predictions_start_date/predictions_end_date range. training mode is a special case for predictions on subsets of training data. Note, that it must be used in conjunction with dss intake type only.
forecast_point : datetime (optional), forecast point for the dataset, used for the forecast predictions, by default value will be inferred from the dataset. May be passed if timeseries_settings.type=forecast.
predictions_start_date : datetime (optional), used for historical predictions in order to override date from which predictions should be calculated. By default value will be inferred automatically from the dataset. May be passed if timeseries_settings.type=historical.
predictions_end_date : datetime (optional), used for historical predictions in order to override date up to which predictions should be calculated. By default value will be inferred automatically from the dataset. May be passed if timeseries_settings.type=historical.
relax_known_in_advance_features_check : bool, (default False). If True, missing values in the known in advance features are allowed in the forecast window at the prediction time. If omitted or False, missing values are not allowed.
- passthrough_columnslist[string] (optional)
Keep these columns from the scoring dataset in the scored dataset. This is useful for correlating predictions with source data.
- passthrough_columns_setstring (optional)
To pass through every column from the scoring dataset, set this to all. Takes precedence over passthrough_columns if set.
- max_explanationsint (optional)
Compute prediction explanations for this amount of features.
- max_ngram_explanationsint or str (optional)
Compute text explanations for this amount of ngrams. Set to all to return all ngram explanations, or set to a positive integer value to limit the amount of ngram explanations returned. By default no ngram explanations will be computed and returned.
- threshold_highfloat (optional)
Only compute prediction explanations for predictions above this threshold. Can be combined with threshold_low.
- threshold_lowfloat (optional)
Only compute prediction explanations for predictions below this threshold. Can be combined with threshold_high.
- explanations_modePredictionExplanationsMode, optional
Mode of prediction explanations calculation for multiclass and clustering models, if not specified - server default is to explain only the predicted class, identical to passing TopPredictionsMode(1).
- prediction_warning_enabledboolean (optional)
Add prediction warnings to the scored data. Currently only supported for regression models.
- include_prediction_statusboolean (optional)
Include the prediction_status column in the output, defaults to False.
- abort_on_errorboolean (optional)
Default behavior is to abort the job if too many rows fail scoring. This will free up resources for other jobs that may score successfully. Set to false to unconditionally score every row no matter how many errors are encountered. Defaults to True.
- column_names_remappingdict (optional)
Mapping with column renaming for output table. Defaults to {}.
- include_probabilitiesboolean (optional)
Flag that enables returning of all probability columns. Defaults to True.
- include_probabilities_classeslist (optional)
List the subset of classes if you do not want all the classes. Defaults to [].
- download_timeoutint (optional)
New in version 2.22.
If using localFile output, wait this many seconds for the download to become available. See download().
- download_read_timeoutint (optional, default 660)
New in version 2.22.
If using localFile output, wait this many seconds for the server to respond between chunks.
- upload_read_timeout: int (optional, default 600)
New in version 2.28.
If using localFile intake, wait this many seconds for the server to respond after whole dataset upload.
- prediction_threshold: float (optional)
New in version 3.4.0.
Threshold is the point that sets the class boundary for a predicted value. The model classifies an observation below the threshold as FALSE, and an observation above the threshold as TRUE. In other words, DataRobot automatically assigns the positive class label to any prediction exceeding the threshold. This value can be set between 0.0 and 1.0.
- Return type
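A minimal sketch of submitting a batch prediction job via score() using a few of the options described above; the deployment ID, S3 URLs, and passthrough column name are placeholders, not values from this reference.

import datarobot as dr

job = dr.BatchPredictionJob.score(
    deployment="5dc5b1015e6e762a6241f9aa",  # placeholder deployment ID
    intake_settings={"type": "s3", "url": "s3://bucket/scoring.csv"},
    output_settings={"type": "s3", "url": "s3://bucket/scored.csv"},
    passthrough_columns=["customer_id"],     # placeholder column name
    max_explanations=3,
    threshold_high=0.8,
    threshold_low=0.2,
)
job.wait_for_completion()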
- classmethod get(batch_prediction_job_id)¶
Get batch prediction job
- Returns
- BatchPredictionJob
Instance of BatchPredictionJob
- Attributes
- batch_prediction_job_id: str
ID of batch prediction job
- Return type
- download(fileobj, timeout=120, read_timeout=660)¶
Downloads the CSV result of a prediction job
- Attributes
- fileobj: A file-like object to which the CSV prediction results will be written.
Examples include an in-memory buffer (e.g., io.BytesIO) or a file on disk (opened for binary writing).
- timeoutint (optional, default 120)
New in version 2.22.
Seconds to wait for the download to become available.
The download will not be available before the job has started processing. In case other jobs are occupying the queue, processing may not start immediately.
If the timeout is reached, the job will be aborted and RuntimeError is raised.
Set to -1 to wait infinitely.
- read_timeoutint (optional, default 660)
New in version 2.22.
Seconds to wait for the server to respond between chunks.
- Return type
None
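For example, assuming the job was created with localFile output, the results can be streamed to a local file; the job ID below is a placeholder.

import datarobot as dr

job = dr.BatchPredictionJob.get("5dc5b1015e6e762a6241f9aa")  # placeholder job ID
with open("predictions.csv", "wb") as results_csv:
    job.download(results_csv, timeout=120)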
- delete(ignore_404_errors=False)¶
Cancel this job. If this job has not finished running, it will be removed and canceled.
- Return type
None
- get_status()¶
Get status of batch prediction job
- Returns
- BatchPredictionJob status data
Dict with job status
- classmethod list_by_status(statuses=None)¶
Get jobs collection for specific set of statuses
- Returns
- BatchPredictionJob statuses
List of job statuses dicts with specific statuses
- Attributes
- statuses
List of statuses to filter jobs ([ABORTED|COMPLETED…]) if statuses is not provided, returns all jobs for user
- Return type
List
[BatchPredictionJob
]
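A hedged usage sketch of filtering jobs by status; the status string shown is one illustrative value from the set described above.

import datarobot as dr

# All batch prediction jobs visible to the current user
all_jobs = dr.BatchPredictionJob.list_by_status()

# Only jobs that completed successfully (status value is illustrative)
completed_jobs = dr.BatchPredictionJob.list_by_status(["COMPLETED"])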
- class datarobot.models.BatchPredictionJobDefinition(id=None, name=None, enabled=None, schedule=None, batch_prediction_job=None, created=None, updated=None, created_by=None, updated_by=None, last_failed_run_time=None, last_successful_run_time=None, last_started_job_status=None, last_scheduled_run_time=None)¶
- classmethod get(batch_prediction_job_definition_id)¶
Get batch prediction job definition
- Returns
- BatchPredictionJobDefinition
Instance of BatchPredictionJobDefinition
Examples
>>> import datarobot as dr
>>> definition = dr.BatchPredictionJobDefinition.get('5a8ac9ab07a57a0001be501f')
>>> definition
BatchPredictionJobDefinition(60912e09fd1f04e832a575c1)
- Attributes
- batch_prediction_job_definition_id: str
ID of batch prediction job definition
- Return type
- classmethod list(search_name=None, deployment_id=None, limit=<datarobot.models.batch_prediction_job.MissingType object>, offset=0)¶
Get all job definitions
- Parameters
- search_namestr, optional
String for filtering job definitions Job definitions that contain the string in name will be returned. If not specified, all available job definitions will be returned.
- deployment_id: str
The ID of the deployment the record belongs to.
- limit: int, optional
0 by default. At most this many results are returned.
- offset: int, optional
This many results will be skipped.
- Returns
- List[BatchPredictionJobDefinition]
List of job definitions the user has access to see
Examples
>>> import datarobot as dr
>>> definition = dr.BatchPredictionJobDefinition.list()
>>> definition
[
  BatchPredictionJobDefinition(60912e09fd1f04e832a575c1),
  BatchPredictionJobDefinition(6086ba053f3ef731e81af3ca)
]
- Return type
- classmethod create(enabled, batch_prediction_job, name=None, schedule=None)¶
Creates a new batch prediction job definition to be run either at scheduled interval or as a manual run.
- Returns
- BatchPredictionJobDefinition
Instance of BatchPredictionJobDefinition
Examples
>>> import datarobot as dr
>>> job_spec = {
...     "num_concurrent": 4,
...     "deployment_id": "foobar",
...     "intake_settings": {
...         "url": "s3://foobar/123",
...         "type": "s3",
...         "format": "csv"
...     },
...     "output_settings": {
...         "url": "s3://foobar/123",
...         "type": "s3",
...         "format": "csv"
...     },
... }
>>> schedule = {
...     "day_of_week": [1],
...     "month": ["*"],
...     "hour": [16],
...     "minute": [0],
...     "day_of_month": [1]
... }
>>> definition = BatchPredictionJobDefinition.create(
...     enabled=False,
...     batch_prediction_job=job_spec,
...     name="some_definition_name",
...     schedule=schedule
... )
>>> definition
BatchPredictionJobDefinition(60912e09fd1f04e832a575c1)
- Attributes
- enabledbool (default False)
Whether or not the definition should be active on a scheduled basis. If True, schedule is required.
- batch_prediction_job: dict
The job specifications for your batch prediction job. It requires the same job input parameters as used with
score()
, except that it will not start a scoring job; it only stores the specification as a definition for later use.
- namestring (optional)
The name you want your job to be identified with. Must be unique across the organization’s existing jobs. If you don’t supply a name, a random one will be generated for you.
- scheduledict (optional)
The
schedule
payload defines at what intervals the job should run, which can be combined in various ways to construct complex scheduling terms if needed. In all of the elements in the objects, you can supply either an asterisk["*"]
denoting “every” time denomination or an array of integers (e.g.[1, 2, 3]
) to define a specific interval.The
schedule
payload is split up in the following items:Minute:
The minute(s) of the day that the job will run. Allowed values are either
["*"]
meaning every minute of the day or[0 ... 59]
Hour: The hour(s) of the day that the job will run. Allowed values are either
["*"]
meaning every hour of the day or[0 ... 23]
.Day of Month: The date(s) of the month that the job will run. Allowed values are either
[1 ... 31]
or["*"]
for all days of the month. This field is additive withdayOfWeek
, meaning the job will run both on the date(s) defined in this field and the day specified bydayOfWeek
(for example, dates 1st, 2nd, 3rd, plus every Tuesday). IfdayOfMonth
is set to["*"]
anddayOfWeek
is defined, the scheduler will trigger on every day of the month that matchesdayOfWeek
(for example, Tuesday the 2nd, 9th, 16th, 23rd, 30th). Invalid dates such as February 31st are ignored.Month: The month(s) of the year that the job will run. Allowed values are either
[1 ... 12]
or["*"]
for all months of the year. Strings, either 3-letter abbreviations or the full name of the month, can be used interchangeably (e.g., “jan” or “october”). Months that are not compatible withdayOfMonth
are ignored, for example{"dayOfMonth": [31], "month":["feb"]}
Day of Week: The day(s) of the week that the job will run. Allowed values are
[0 .. 6]
, where (Sunday=0), or["*"]
, for all days of the week. Strings, either 3-letter abbreviations or the full name of the day, can be used interchangeably (e.g., “sunday”, “Sunday”, “sun”, or “Sun”, all map to[0]
. This field is additive withdayOfMonth
, meaning the job will run both on the date specified bydayOfMonth
and the day defined in this field.
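For instance, following the rules above, a schedule payload that runs at 00:00 on weekdays (Monday=1 through Friday=5) could look like the following hedged illustration:

schedule = {
    "minute": [0],
    "hour": [0],
    "day_of_month": ["*"],
    "month": ["*"],
    "day_of_week": [1, 2, 3, 4, 5],
}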
- Return type
- update(enabled, batch_prediction_job=None, name=None, schedule=None)¶
Updates a job definition with the changed specs.
Takes the same input as
create()
- Returns
- BatchPredictionJobDefinition
Instance of the updated BatchPredictionJobDefinition
Examples
>>> import datarobot as dr
>>> job_spec = {
...     "num_concurrent": 5,
...     "deployment_id": "foobar_new",
...     "intake_settings": {
...         "url": "s3://foobar/123",
...         "type": "s3",
...         "format": "csv"
...     },
...     "output_settings": {
...         "url": "s3://foobar/123",
...         "type": "s3",
...         "format": "csv"
...     },
... }
>>> schedule = {
...     "day_of_week": [1],
...     "month": ["*"],
...     "hour": ["*"],
...     "minute": [30, 59],
...     "day_of_month": [1, 2, 6]
... }
>>> definition = BatchPredictionJobDefinition.create(
...     enabled=False,
...     batch_prediction_job=job_spec,
...     name="updated_definition_name",
...     schedule=schedule
... )
>>> definition
BatchPredictionJobDefinition(60912e09fd1f04e832a575c1)
- Attributes
- Return type
- run_on_schedule(schedule)¶
Sets the run schedule of an already created job definition.
If the job was previously not enabled, this will also set the job to enabled.
- Returns
- BatchPredictionJobDefinition
Instance of the updated BatchPredictionJobDefinition with the new / updated schedule.
Examples
>>> import datarobot as dr
>>> definition = dr.BatchPredictionJobDefinition.create('...')
>>> schedule = {
...     "day_of_week": [1],
...     "month": ["*"],
...     "hour": ["*"],
...     "minute": [30, 59],
...     "day_of_month": [1, 2, 6]
... }
>>> definition.run_on_schedule(schedule)
BatchPredictionJobDefinition(60912e09fd1f04e832a575c1)
- Attributes
- scheduledict
Same as
schedule
increate()
.
- Return type
- run_once()¶
Manually submits a batch prediction job to the queue, based off of an already created job definition.
- Returns
- BatchPredictionJob
Instance of BatchPredictionJob
Examples
>>> import datarobot as dr
>>> definition = dr.BatchPredictionJobDefinition.create('...')
>>> job = definition.run_once()
>>> job.wait_for_completion()
- Return type
- delete()¶
Deletes the job definition and disables any future schedules of this job if any. If a scheduled job is currently running, this will not be cancelled.
Examples
>>> import datarobot as dr
>>> definition = dr.BatchPredictionJobDefinition.get('5a8ac9ab07a57a0001be501f')
>>> definition.delete()
- Return type
None
Batch Monitoring¶
- class datarobot.models.BatchMonitoringJob(data, completed_resource_url=None)¶
A Batch Monitoring Job is used to monitor datasets outside the DataRobot app.
- Attributes
- idstr
the id of the job
- classmethod get(project_id, job_id)¶
Get batch monitoring job
- Returns
- BatchMonitoringJob
Instance of BatchMonitoringJob
- Attributes
- job_id: str
ID of batch job
- Return type
- download(fileobj, timeout=120, read_timeout=660)¶
Downloads the results of a monitoring job as a CSV.
- Attributes
- fileobj: A file-like object to which the CSV monitoring results will be written.
Examples include an in-memory buffer (e.g., io.BytesIO) or a file on disk (opened for binary writing).
- timeoutint (optional, default 120)
Seconds to wait for the download to become available.
The download will not be available before the job has started processing. In case other jobs are occupying the queue, processing may not start immediately.
If the timeout is reached, the job will be aborted and RuntimeError is raised.
Set to -1 to wait infinitely.
- read_timeoutint (optional, default 660)
Seconds to wait for the server to respond between chunks.
- Return type
None
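A hedged sketch of downloading monitoring results to a local file; both IDs below are placeholders.

import datarobot as dr

job = dr.BatchMonitoringJob.get(
    "60b012436635fc00909df555",  # placeholder project ID
    "5dc5b1015e6e762a6241f9aa",  # placeholder job ID
)
with open("monitoring_results.csv", "wb") as results_csv:
    job.download(results_csv)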
- classmethod run(deployment, intake_settings=None, output_settings=None, csv_settings=None, num_concurrent=None, chunk_size=None, abort_on_error=True, monitoring_aggregation=None, monitoring_columns=None, monitoring_output_settings=None, download_timeout=120, download_read_timeout=660, upload_read_timeout=600)¶
Create new batch monitoring job, upload the dataset, and return a batch monitoring job.
- Returns
- BatchMonitoringJob
Instance of BatchMonitoringJob
Examples
>>> import datarobot as dr
>>> job_spec = {
...     "intake_settings": {
...         "type": "jdbc",
...         "data_store_id": "645043933d4fbc3215f17e34",
...         "catalog": "SANDBOX",
...         "table": "10kDiabetes_output_actuals",
...         "schema": "SCORING_CODE_UDF_SCHEMA",
...         "credential_id": "645043b61a158045f66fb329"
...     },
...     "monitoring_columns": {
...         "predictions_columns": [
...             {
...                 "class_name": "True",
...                 "column_name": "readmitted_True_PREDICTION"
...             },
...             {
...                 "class_name": "False",
...                 "column_name": "readmitted_False_PREDICTION"
...             }
...         ],
...         "association_id_column": "rowID",
...         "actuals_value_column": "ACTUALS"
...     }
... }
>>> deployment_id = "foobar"
>>> job = dr.BatchMonitoringJob.run(deployment_id, **job_spec)
>>> job.wait_for_completion()
- Attributes
- deploymentDeployment or string ID
Deployment which will be used for monitoring.
- intake_settingsdict
A dict configuring where the data comes from. Supported options:
type : string, either localFile, s3, azure, gcp, dataset, jdbc, snowflake, synapse or bigquery
Note that to pass a dataset, you not only need to specify the type parameter as dataset, but you must also set the dataset parameter as a dr.Dataset object.
To monitor from a local file, add this parameter to the settings:
file : A file-like object, string path to a file or a pandas.DataFrame of scoring data.
To monitor from S3, add the next parameters to the settings:
url : string, the URL to score (e.g.: s3://bucket/key).
credential_id : string (optional).
endpoint_url : string (optional), any non-default endpoint URL for S3 access (omit to use the default).
To monitor from JDBC, add the next parameters to the settings:
data_store_id : string, the ID of the external data store connected to the JDBC data source (see Database Connectivity).
query : string (optional if table, schema and/or catalog is specified), a self-supplied SELECT statement of the data set you wish to predict.
table : string (optional if query is specified), the name of specified database table.
schema : string (optional if query is specified), the name of specified database schema.
catalog : string (optional if query is specified), (new in v2.22) the name of specified database catalog.
fetch_size : int (optional), Changing the fetchSize can be used to balance throughput and memory usage.
credential_id : string (optional) the ID of the credentials holding information about a user with read-access to the JDBC data source (see Credentials).
- output_settingsdict (optional)
A dict configuring how monitored data is to be saved; an illustrative JDBC example follows this parameter list. Supported options:
type : string, either localFile, s3, azure, gcp, jdbc, snowflake, synapse or bigquery
To save monitored data to a local file, add parameters to the settings:
path : string (optional), path to save the scored data as CSV. If a path is not specified, you must download the scored data yourself with job.download(). If a path is specified, the call blocks until the job is done. If no other jobs are currently processing for the targeted prediction instance, uploading, scoring, and downloading happen in parallel without waiting for a full job to complete. Otherwise, the call still blocks, but downloading the scored data starts as soon as data is generated. This is the fastest method to get predictions.
To save monitored data to S3, add the next parameters to the settings:
url : string, the URL for storing the results (e.g.: s3://bucket/key).
credential_id : string (optional).
endpoint_url : string (optional), any non-default endpoint URL for S3 access (omit to use the default).
To save monitored data to JDBC, add the next parameters to the settings:
data_store_id : string, the ID of the external data store connected to the JDBC data source (see Database Connectivity).
table : string, the name of specified database table.
schema : string (optional), the name of specified database schema.
catalog : string (optional), (new in v2.22) the name of specified database catalog.
statement_type : string, the type of insertion statement to create, one of
datarobot.enums.AVAILABLE_STATEMENT_TYPES
.update_columns : list(string) (optional), a list of strings containing those column names to be updated in case statement_type is set to a value related to update or upsert.
where_columns : list(string) (optional), a list of strings containing those column names to be selected in case statement_type is set to a value related to insert or update.
credential_id : string, the ID of the credentials holding information about a user with write-access to the JDBC data source (see Credentials).
create_table_if_not_exists : bool (optional), If no existing table is detected, attempt to create it before writing data with the strategy defined in the statementType parameter.
- csv_settingsdict (optional)
CSV intake and output settings. Supported options:
delimiter : string (optional, default ,), fields are delimited by this character. Use the string tab to denote TSV (TAB separated values). Must be either a one-character string or the string tab.
quotechar : string (optional, default “), fields containing the delimiter must be quoted using this character.
encoding : string (optional, default utf-8), encoding for the CSV files. For example (but not limited to): shift_jis, latin_1 or mskanji.
- num_concurrentint (optional)
Number of concurrent chunks to score simultaneously. Defaults to the available number of cores of the deployment. Lower it to leave resources for real-time scoring.
- chunk_sizestring or int (optional)
Which strategy should be used to determine the chunk size. Can be either a named strategy or a fixed size in bytes. - auto: use fixed or dynamic based on flipper. - fixed: use 1MB for explanations, 5MB for regular requests. - dynamic: use dynamic chunk sizes. - int: use this many bytes per chunk.
- abort_on_errorboolean (optional)
Default behavior is to abort the job if too many rows fail scoring, which frees up resources for other jobs that may score successfully. Set to False to score every row unconditionally, no matter how many errors are encountered. Defaults to True.
- download_timeoutint (optional)
New in version 2.22.
If using localFile output, wait this many seconds for the download to become available. See download().
- download_read_timeoutint (optional, default 660)
New in version 2.22.
If using localFile output, wait this many seconds for the server to respond between chunks.
- upload_read_timeout: int (optional, default 600)
New in version 2.28.
If using localFile intake, wait this many seconds for the server to respond after whole dataset upload.
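As a hedged illustration of the JDBC output options described above, an output_settings payload might look like the following; all IDs and table names are placeholders, and statement_type must be one of datarobot.enums.AVAILABLE_STATEMENT_TYPES.

output_settings = {
    "type": "jdbc",
    "data_store_id": "645043933d4fbc3215f17e34",  # placeholder data store ID
    "credential_id": "645043b61a158045f66fb329",  # placeholder credential ID
    "table": "monitoring_results",                # placeholder table name
    "schema": "PUBLIC",                           # placeholder schema name
    "statement_type": "insert",                   # illustrative; see AVAILABLE_STATEMENT_TYPES
    "create_table_if_not_exists": True,
}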
- Return type
- cancel(ignore_404_errors=False)¶
Cancel this job. If this job has not finished running, it will be removed and canceled.
- Return type
None
- get_status()¶
Get status of batch monitoring job
- Returns
- BatchMonitoringJob status data
Dict with job status
- Return type
Any
- class datarobot.models.BatchMonitoringJobDefinition(id=None, name=None, enabled=None, schedule=None, batch_monitoring_job=None, created=None, updated=None, created_by=None, updated_by=None, last_failed_run_time=None, last_successful_run_time=None, last_started_job_status=None, last_scheduled_run_time=None)¶
- classmethod get(batch_monitoring_job_definition_id)¶
Get batch monitoring job definition
- Returns
- BatchMonitoringJobDefinition
Instance of BatchMonitoringJobDefinition
Examples
>>> import datarobot as dr
>>> definition = dr.BatchMonitoringJobDefinition.get('5a8ac9ab07a57a0001be501f')
>>> definition
BatchMonitoringJobDefinition(60912e09fd1f04e832a575c1)
- Attributes
- batch_monitoring_job_definition_id: str
ID of batch monitoring job definition
- Return type
- classmethod list()¶
Get all monitoring job definitions
- Returns
- List[BatchMonitoringJobDefinition]
List of job definitions the user has access to see
Examples
>>> import datarobot as dr
>>> definition = dr.BatchMonitoringJobDefinition.list()
>>> definition
[
  BatchMonitoringJobDefinition(60912e09fd1f04e832a575c1),
  BatchMonitoringJobDefinition(6086ba053f3ef731e81af3ca)
]
- Return type
- classmethod create(enabled, batch_monitoring_job, name=None, schedule=None)¶
Creates a new batch monitoring job definition to be run either at scheduled interval or as a manual run.
- Returns
- BatchMonitoringJobDefinition
Instance of BatchMonitoringJobDefinition
Examples
>>> import datarobot as dr
>>> job_spec = {
...     "num_concurrent": 4,
...     "deployment_id": "foobar",
...     "intake_settings": {
...         "url": "s3://foobar/123",
...         "type": "s3",
...         "format": "csv"
...     },
...     "output_settings": {
...         "url": "s3://foobar/123",
...         "type": "s3",
...         "format": "csv"
...     },
... }
>>> schedule = {
...     "day_of_week": [1],
...     "month": ["*"],
...     "hour": [16],
...     "minute": [0],
...     "day_of_month": [1]
... }
>>> definition = BatchMonitoringJobDefinition.create(
...     enabled=False,
...     batch_monitoring_job=job_spec,
...     name="some_definition_name",
...     schedule=schedule
... )
>>> definition
BatchMonitoringJobDefinition(60912e09fd1f04e832a575c1)
- Attributes
- enabledbool (default False)
Whether the definition should be active on a scheduled basis. If True, schedule is required.
- batch_monitoring_job: dict
The job specifications for your batch monitoring job. It requires the same job input parameters as used with BatchMonitoringJob
- namestring (optional)
The name you want your job to be identified with. Must be unique across the organization’s existing jobs. If you don’t supply a name, a random one will be generated for you.
- scheduledict (optional)
The
schedule
payload defines at what intervals the job should run, which can be combined in various ways to construct complex scheduling terms if needed. In all the elements in the objects, you can supply either an asterisk["*"]
denoting “every” time denomination or an array of integers (e.g.[1, 2, 3]
) to define a specific interval.The
schedule
payload is split up in the following items:Minute:
The minute(s) of the day that the job will run. Allowed values are either
["*"]
meaning every minute of the day or[0 ... 59]
Hour: The hour(s) of the day that the job will run. Allowed values are either
["*"]
meaning every hour of the day or[0 ... 23]
.Day of Month: The date(s) of the month that the job will run. Allowed values are either
[1 ... 31]
or["*"]
for all days of the month. This field is additive withdayOfWeek
, meaning the job will run both on the date(s) defined in this field and the day specified bydayOfWeek
(for example, dates 1st, 2nd, 3rd, plus every Tuesday). IfdayOfMonth
is set to["*"]
anddayOfWeek
is defined, the scheduler will trigger on every day of the month that matchesdayOfWeek
(for example, Tuesday the 2nd, 9th, 16th, 23rd, 30th). Invalid dates such as February 31st are ignored.Month: The month(s) of the year that the job will run. Allowed values are either
[1 ... 12]
or["*"]
for all months of the year. Strings, either 3-letter abbreviations or the full name of the month, can be used interchangeably (e.g., “jan” or “october”). Months that are not compatible withdayOfMonth
are ignored, for example{"dayOfMonth": [31], "month":["feb"]}
Day of Week: The day(s) of the week that the job will run. Allowed values are
[0 .. 6]
, where (Sunday=0), or["*"]
, for all days of the week. Strings, either 3-letter abbreviations or the full name of the day, can be used interchangeably (e.g., “sunday”, “Sunday”, “sun”, or “Sun”, all map to[0]
. This field is additive withdayOfMonth
, meaning the job will run both on the date specified bydayOfMonth
and the day defined in this field.
- Return type
- update(enabled, batch_monitoring_job=None, name=None, schedule=None)¶
Updates a job definition with the changed specs.
Takes the same input as
create()
- Returns
- BatchMonitoringJobDefinition
Instance of the updated BatchMonitoringJobDefinition
Examples
>>> import datarobot as dr
>>> job_spec = {
...     "num_concurrent": 5,
...     "deployment_id": "foobar_new",
...     "intake_settings": {
...         "url": "s3://foobar/123",
...         "type": "s3",
...         "format": "csv"
...     },
...     "output_settings": {
...         "url": "s3://foobar/123",
...         "type": "s3",
...         "format": "csv"
...     },
... }
>>> schedule = {
...     "day_of_week": [1],
...     "month": ["*"],
...     "hour": ["*"],
...     "minute": [30, 59],
...     "day_of_month": [1, 2, 6]
... }
>>> definition = BatchMonitoringJobDefinition.create(
...     enabled=False,
...     batch_monitoring_job=job_spec,
...     name="updated_definition_name",
...     schedule=schedule
... )
>>> definition
BatchMonitoringJobDefinition(60912e09fd1f04e832a575c1)
- Attributes
- Return type
- run_on_schedule(schedule)¶
Sets the run schedule of an already created job definition.
If the job was previously not enabled, this will also set the job to enabled.
- Returns
- BatchMonitoringJobDefinition
Instance of the updated BatchMonitoringJobDefinition with the new / updated schedule.
Examples
>>> import datarobot as dr
>>> definition = dr.BatchMonitoringJobDefinition.create('...')
>>> schedule = {
...     "day_of_week": [1],
...     "month": ["*"],
...     "hour": ["*"],
...     "minute": [30, 59],
...     "day_of_month": [1, 2, 6]
... }
>>> definition.run_on_schedule(schedule)
BatchMonitoringJobDefinition(60912e09fd1f04e832a575c1)
- Attributes
- scheduledict
Same as
schedule
increate()
.
- Return type
- run_once()¶
Manually submits a batch monitoring job to the queue, based off of an already created job definition.
- Returns
- BatchMonitoringJob
Instance of BatchMonitoringJob
Examples
>>> import datarobot as dr
>>> definition = dr.BatchMonitoringJobDefinition.create('...')
>>> job = definition.run_once()
>>> job.wait_for_completion()
- Return type
- delete()¶
Deletes the job definition and disables any future schedules of this job if any. If a scheduled job is currently running, this will not be cancelled.
Examples
>>> import datarobot as dr
>>> definition = dr.BatchMonitoringJobDefinition.get('5a8ac9ab07a57a0001be501f')
>>> definition.delete()
- Return type
None
Status Check Job¶
- class datarobot.models.StatusCheckJob(job_id, resource_type=None)¶
Tracks asynchronous task status
- Attributes
- job_idstr
The ID of the job the status belongs to.
- wait_for_completion(max_wait=600)¶
Waits for job to complete.
- Parameters
- max_waitint, optional
How long to wait for the job to finish. If the time expires, DataRobot returns the current status.
- Returns
- statusJobStatusResult
Returns the current status of the job.
- Return type
- get_status()¶
Retrieve a JobStatusResult object with the latest job status data from the server.
- Return type
- class datarobot.models.JobStatusResult(status: Optional[str], status_id: Optional[str], completed_resource_url: Optional[str], message: Optional[str])¶
This class represents a result of status check for submitted async jobs.
- status: Optional[str]¶
Alias for field number 0
- status_id: Optional[str]¶
Alias for field number 1
- completed_resource_url: Optional[str]¶
Alias for field number 2
- message: Optional[str]¶
Alias for field number 3
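A hedged sketch of reading the fields of a JobStatusResult returned by wait_for_completion(); the status_check_job variable is assumed to come from an API call that returns a StatusCheckJob.

status = status_check_job.wait_for_completion(max_wait=300)
if status.completed_resource_url is not None:
    # The job finished and produced a resource we can fetch.
    print("Finished:", status.completed_resource_url)
else:
    # Still running or failed; inspect the raw status fields.
    print("Not finished:", status.status, status.message)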
Blueprint¶
- class datarobot.models.Blueprint(id=None, processes=None, model_type=None, project_id=None, blueprint_category=None, monotonic_increasing_featurelist_id=None, monotonic_decreasing_featurelist_id=None, supports_monotonic_constraints=None, recommended_featurelist_id=None, supports_composable_ml=None, supports_incremental_learning=None)¶
A Blueprint which can be used to fit models
- Attributes
- idstr
the id of the blueprint
- processeslist of str
the processes used by the blueprint
- model_typestr
the model produced by the blueprint
- project_idstr
the project the blueprint belongs to
- blueprint_categorystr
(New in version v2.6) Describes the category of the blueprint and the kind of model it produces.
- recommended_featurelist_id: str or null
(New in v2.18) The ID of the feature list recommended for this blueprint. If this field is not present, then there is no recommended feature list.
- supports_composable_mlbool or None
(New in version v2.26) whether this blueprint is supported in Composable ML.
- supports_incremental_learningbool or None
(New in version v3.3) whether this blueprint supports incremental learning.
- classmethod get(project_id, blueprint_id)¶
Retrieve a blueprint.
- Parameters
- project_idstr
The project’s id.
- blueprint_idstr
Id of blueprint to retrieve.
- Returns
- blueprintBlueprint
The queried blueprint.
- Return type
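A brief sketch of retrieving a blueprint and inspecting it; the project and blueprint IDs are placeholders.

import datarobot as dr

blueprint = dr.Blueprint.get(
    "5c939e08962d741e34f609f0",  # placeholder project ID
    "5c0a969859b00004ba52e41b",  # placeholder blueprint ID
)
print(blueprint.model_type)
print(blueprint.processes)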
- get_json()¶
Get the blueprint json representation used by this model.
- Returns
- BlueprintJson
Json representation of the blueprint stages.
- Return type
Dict
[str
,Tuple
[List
[str
],List
[str
],str
]]
- get_chart()¶
Retrieve a chart.
- Returns
- BlueprintChart
The current blueprint chart.
- Return type
- get_documents()¶
Get documentation for tasks used in the blueprint.
- Returns
- list of BlueprintTaskDocument
All documents available for blueprint.
- Return type
List
[BlueprintTaskDocument
]
- classmethod from_data(data)¶
Instantiate an object of this class using a dict.
- Parameters
- datadict
Correctly snake_cased keys and their values.
- Return type
TypeVar
(T
, bound=APIObject
)
- classmethod from_server_data(data, keep_attrs=None)¶
Instantiate an object of this class using the data directly from the server, meaning that the keys may have the wrong camel casing
- Parameters
- datadict
The directly translated dict of JSON from the server. No casing fixes have taken place
- keep_attrsiterable
List, set or tuple of the dotted namespace notations for attributes to keep within the object structure even if their values are None
- Return type
TypeVar
(T
, bound=APIObject
)
- class datarobot.models.BlueprintTaskDocument(title=None, task=None, description=None, parameters=None, links=None, references=None)¶
Document describing a task from a blueprint.
- Attributes
- titlestr
Title of document.
- taskstr
Name of the task described in document.
- descriptionstr
Task description.
- parameterslist of dict(name, type, description)
Parameters that task can receive in human-readable format.
- linkslist of dict(name, url)
External links used in document
- referenceslist of dict(name, url)
References used in the document. When no link is available, url equals None.
- class datarobot.models.BlueprintChart(nodes, edges)¶
A Blueprint chart that can be used to understand data flow in blueprint.
- Attributes
- nodeslist of dict (id, label)
Chart nodes, id unique in chart.
- edgeslist of tuple (id1, id2)
Directions of data flow between blueprint chart nodes.
- classmethod get(project_id, blueprint_id)¶
Retrieve a blueprint chart.
- Parameters
- project_idstr
The project’s id.
- blueprint_idstr
Id of blueprint to retrieve chart.
- Returns
- BlueprintChart
The queried blueprint chart.
- Return type
- to_graphviz()¶
Get blueprint chart in graphviz DOT format.
- Returns
- unicode
String representation of chart in graphviz DOT language.
- Return type
str
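For instance, the DOT output can be written to a file and rendered with any Graphviz tooling; the IDs below are placeholders.

import datarobot as dr

chart = dr.models.BlueprintChart.get(
    "5c939e08962d741e34f609f0",  # placeholder project ID
    "5c0a969859b00004ba52e41b",  # placeholder blueprint ID
)
with open("blueprint_chart.gv", "w") as dot_file:
    dot_file.write(chart.to_graphviz())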
- class datarobot.models.ModelBlueprintChart(nodes, edges)¶
A Blueprint chart that can be used to understand data flow in a model. A model blueprint chart is a reduced repository blueprint chart containing only the elements that were used to build this particular model.
- Attributes
- nodeslist of dict (id, label)
Chart nodes, id unique in chart.
- edgeslist of tuple (id1, id2)
Directions of data flow between blueprint chart nodes.
- classmethod get(project_id, model_id)¶
Retrieve a model blueprint chart.
- Parameters
- project_idstr
The project’s id.
- model_idstr
Id of model to retrieve model blueprint chart.
- Returns
- ModelBlueprintChart
The queried model blueprint chart.
- Return type
- to_graphviz()¶
Get blueprint chart in graphviz DOT format.
- Returns
- unicode
String representation of chart in graphviz DOT language.
- Return type
str
Calendar File¶
- class datarobot.CalendarFile(calendar_end_date=None, calendar_start_date=None, created=None, id=None, name=None, num_event_types=None, num_events=None, project_ids=None, role=None, multiseries_id_columns=None)¶
Represents the data for a calendar file.
For more information about calendar files, see the calendar documentation.
- Attributes
- idstr
The id of the calendar file.
- calendar_start_datestr
The earliest date in the calendar.
- calendar_end_datestr
The last date in the calendar.
- createdstr
The date this calendar was created, i.e. uploaded to DR.
- namestr
The name of the calendar.
- num_event_typesint
The number of different event types.
- num_eventsint
The number of events this calendar has.
- project_idslist of strings
A list containing the projectIds of the projects using this calendar.
- multiseries_id_columns: list of str or None
A list of columns in calendar which uniquely identify events for different series. Currently, only one column is supported. If multiseries id columns are not provided, calendar is considered to be single series.
- rolestr
The access role the user has for this calendar.
- classmethod create(file_path, calendar_name=None, multiseries_id_columns=None)¶
Creates a calendar using the given file. For information about calendar files, see the calendar documentation
The provided file must be a CSV in the format:
Date, Event, Series ID, Event Duration
<date>, <event_type>, <series id>, <event duration>
<date>, <event_type>, , <event duration>
A header row is required, and the “Series ID” and “Event Duration” columns are optional.
Once the CalendarFile has been created, pass its ID with the
DatetimePartitioningSpecification
when setting the target for a time series project in order to use it.
- Parameters
- file_pathstring
A string representing a path to a local csv file.
- calendar_namestring, optional
A name to assign to the calendar. Defaults to the name of the file if not provided.
- multiseries_id_columnslist of str or None
A list of the names of multiseries id columns to define which series an event belongs to. Currently only one multiseries id column is supported.
- Returns
- calendar_fileCalendarFile
Instance with initialized data.
- Raises
- AsyncProcessUnsuccessfulError
Raised if there was an error processing the provided calendar file.
Examples
# Creating a calendar with a specified name
cal = dr.CalendarFile.create('/home/calendars/somecalendar.csv',
                             calendar_name='Some Calendar Name')
cal.id
>>> 5c1d4904211c0a061bc93013
cal.name
>>> Some Calendar Name

# Creating a calendar without specifying a name
cal = dr.CalendarFile.create('/home/calendars/somecalendar.csv')
cal.id
>>> 5c1d4904211c0a061bc93012
cal.name
>>> somecalendar.csv

# Creating a calendar with multiseries id columns
cal = dr.CalendarFile.create('/home/calendars/somemultiseriescalendar.csv',
                             calendar_name='Some Multiseries Calendar Name',
                             multiseries_id_columns=['series_id'])
cal.id
>>> 5da9bb21962d746f97e4daee
cal.name
>>> Some Multiseries Calendar Name
cal.multiseries_id_columns
>>> ['series_id']
- Return type
- classmethod create_calendar_from_dataset(dataset_id, dataset_version_id=None, calendar_name=None, multiseries_id_columns=None, delete_on_error=False)¶
Creates a calendar using the given dataset. For information about calendar files, see the calendar documentation
The provided dataset must have the following format:
Date, Event, Series ID, Event Duration
<date>, <event_type>, <series id>, <event duration>
<date>, <event_type>, , <event duration>
The “Series ID” and “Event Duration” columns are optional.
Once the CalendarFile has been created, pass its ID with the
DatetimePartitioningSpecification
when setting the target for a time series project in order to use it.
- Parameters
- dataset_idstring
The identifier of the dataset from which to create the calendar.
- dataset_version_idstring, optional
The identifier of the dataset version from which to create the calendar.
- calendar_namestring, optional
A name to assign to the calendar. Defaults to the name of the dataset if not provided.
- multiseries_id_columnslist of str, optional
A list of the names of multiseries id columns to define which series an event belongs to. Currently only one multiseries id column is supported.
- delete_on_errorboolean, optional
Whether to delete the calendar file from the Catalog if it is not valid.
- Returns
- calendar_fileCalendarFile
Instance with initialized data.
- Raises
- AsyncProcessUnsuccessfulError
Raised if there was an error processing the provided calendar file.
Examples
# Creating a calendar from a dataset
dataset = dr.Dataset.create_from_file('/home/calendars/somecalendar.csv')
cal = dr.CalendarFile.create_calendar_from_dataset(
    dataset.id, calendar_name='Some Calendar Name'
)
cal.id
>>> 5c1d4904211c0a061bc93013
cal.name
>>> Some Calendar Name

# Creating a calendar from a new dataset version
new_dataset_version = dr.Dataset.create_version_from_file(
    dataset.id, '/home/calendars/anothercalendar.csv'
)
cal = dr.CalendarFile.create(
    new_dataset_version.id, dataset_version_id=new_dataset_version.version_id
)
cal.id
>>> 5c1d4904211c0a061bc93012
cal.name
>>> anothercalendar.csv
- Return type
- classmethod create_calendar_from_country_code(country_code, start_date, end_date)¶
Generates a calendar based on the provided country code and dataset start date and end dates. The provided country code should be uppercase and 2-3 characters long. See
CalendarFile.get_allowed_country_codes
for a list of allowed country codes.
- Parameters
- country_codestring
The country code for the country to use for generating the calendar.
- start_datedatetime.datetime
The earliest date to include in the generated calendar.
- end_datedatetime.datetime
The latest date to include in the generated calendar.
- Returns
- calendar_fileCalendarFile
Instance with initialized data.
- Return type
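A hedged sketch of generating a preloaded calendar for a single country; the country code must be one returned by get_allowed_country_codes, and "US" is used here only as an illustration.

import datetime
import datarobot as dr

cal = dr.CalendarFile.create_calendar_from_country_code(
    country_code="US",  # illustrative; validate against get_allowed_country_codes()
    start_date=datetime.datetime(2023, 1, 1),
    end_date=datetime.datetime(2024, 1, 1),
)
print(cal.id, cal.name)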
- classmethod get_allowed_country_codes(offset=None, limit=None)¶
Retrieves the list of allowed country codes that can be used for generating the preloaded calendars.
- Parameters
- offsetint
Optional, defaults to 0. This many results will be skipped.
- limitint
Optional, defaults to 100, maximum 1000. At most this many results are returned.
- Returns
- list
A list of dicts, each of which represents an allowed country code. Each item has the following structure:
name
: (str) The name of the country.code
: (str) The code for this country. This is the value that should be supplied toCalendarFile.create_calendar_from_country_code
.
- Return type
List
[CountryCode
]
- classmethod get(calendar_id)¶
Gets the details of a calendar, given the id.
- Parameters
- calendar_idstr
The identifier of the calendar.
- Returns
- calendar_fileCalendarFile
The requested calendar.
- Raises
- DataError
Raised if the calendar_id is invalid, i.e. the specified CalendarFile does not exist.
Examples
cal = dr.CalendarFile.get(some_calendar_id)
cal.id
>>> some_calendar_id
- Return type
- classmethod list(project_id=None, batch_size=None)¶
Gets the details of all calendars this user has view access for.
- Parameters
- project_idstr, optional
If provided, will filter for calendars associated only with the specified project.
- batch_sizeint, optional
The number of calendars to retrieve in a single API call. If specified, the client may make multiple calls to retrieve the full list of calendars. If not specified, an appropriate default will be chosen by the server.
- Returns
- calendar_listlist of
CalendarFile
A list of CalendarFile objects.
- calendar_listlist of
Examples
calendars = dr.CalendarFile.list()
len(calendars)
>>> 10
- Return type
List
[CalendarFile
]
- classmethod delete(calendar_id)¶
Deletes the calendar specified by calendar_id.
- Parameters
- calendar_idstr
The id of the calendar to delete. The requester must have OWNER access for this calendar.
- Raises
- ClientError
Raised if an invalid calendar_id is provided.
Examples
# Deleting with a valid calendar_id
status_code = dr.CalendarFile.delete(some_calendar_id)
status_code
>>> 204
dr.CalendarFile.get(some_calendar_id)
>>> ClientError: Item not found
- Return type
None
- classmethod update_name(calendar_id, new_calendar_name)¶
Changes the name of the specified calendar to the specified name. The requester must have at least READ_WRITE permissions on the calendar.
- Parameters
- calendar_idstr
The id of the calendar to update.
- new_calendar_namestr
The new name to set for the specified calendar.
- Returns
- status_codeint
200 for success
- Raises
- ClientError
Raised if an invalid calendar_id is provided.
Examples
response = dr.CalendarFile.update_name(some_calendar_id, some_new_name)
response
>>> 200
cal = dr.CalendarFile.get(some_calendar_id)
cal.name
>>> some_new_name
- Return type
int
- classmethod share(calendar_id, access_list)¶
Shares the calendar with the specified users, assigning the specified roles.
- Parameters
- calendar_idstr
The id of the calendar to update
- access_list:
A list of dr.SharingAccess objects. Specify None for the role to delete a user’s access from the specified CalendarFile. For more information on specific access levels, see the sharing documentation.
- Returns
- status_codeint
200 for success
- Raises
- ClientError
Raised if unable to update permissions for a user.
- AssertionError
Raised if access_list is invalid.
Examples
# assuming some_user is a valid user, share this calendar with some_user
sharing_list = [dr.SharingAccess(some_user_username,
                                 dr.enums.SHARING_ROLE.READ_WRITE)]
response = dr.CalendarFile.share(some_calendar_id, sharing_list)
response.status_code
>>> 200

# delete some_user from this calendar, assuming they have access of some kind already
delete_sharing_list = [dr.SharingAccess(some_user_username, None)]
response = dr.CalendarFile.share(some_calendar_id, delete_sharing_list)
response.status_code
>>> 200

# Attempt to add an invalid user to a calendar
invalid_sharing_list = [dr.SharingAccess(invalid_username,
                                         dr.enums.SHARING_ROLE.READ_WRITE)]
dr.CalendarFile.share(some_calendar_id, invalid_sharing_list)
>>> ClientError: Unable to update access for this calendar
- Return type
int
- classmethod get_access_list(calendar_id, batch_size=None)¶
Retrieve a list of users that have access to this calendar.
- Parameters
- calendar_idstr
The id of the calendar to retrieve the access list for.
- batch_sizeint, optional
The number of access records to retrieve in a single API call. If specified, the client may make multiple calls to retrieve the full list of calendars. If not specified, an appropriate default will be chosen by the server.
- Returns
- access_control_listlist of
SharingAccess
A list of
SharingAccess
objects.
- access_control_listlist of
- Raises
- ClientError
Raised if user does not have access to calendar or calendar does not exist.
- Return type
List
[SharingAccess
]
- class datarobot.models.calendar_file.CountryCode¶
A dict-based object describing an allowed country code, as returned by CalendarFile.get_allowed_country_codes; each item carries name and code keys.
Automated Documentation¶
- class datarobot.models.automated_documentation.AutomatedDocument(entity_id=None, document_type=None, output_format=None, locale=None, template_id=None, id=None, filepath=None, created_at=None)¶
An automated documentation object.
New in version v2.24.
- Attributes
- document_typestr or None
Type of automated document. You can specify:
MODEL_COMPLIANCE
,AUTOPILOT_SUMMARY
depending on your account settings. Required for document generation.
- entity_idstr or None
ID of the entity to generate the document for. It can be model ID or project ID. Required for document generation.
- output_formatstr or None
Format of the generated document, either
docx
orhtml
. Required for document generation.
- localestr or None
Localization of the document, dependent on your account settings. Default setting is
EN_US
.
- template_idstr or None
Template ID to use for the document outline. Defaults to standard DataRobot template. See the documentation for
ComplianceDocTemplate
for more information.
- idstr or None
ID of the document. Required to download or delete a document.
- filepathstr or None
Path to save a downloaded document to. Either include a file path and name or the file will be saved to the directory from which the script is launched.
- created_atdatetime or None
Document creation timestamp.
- classmethod list_available_document_types()¶
Get a list of all available document types and locales.
- Returns
- List of dicts
Examples
import datarobot as dr

dr.Client(token=my_token, endpoint=endpoint)

doc_types = dr.AutomatedDocument.list_available_document_types()
- Return type
List
[DocumentOption
]
- property is_model_compliance_initialized: Tuple[bool, str]¶
Check if model compliance documentation pre-processing is initialized. Model compliance documentation pre-processing must be initialized before generating documentation for a custom model.
- Returns
- Tuple of (boolean, string)
boolean flag is whether model compliance documentation pre-processing is initialized
string value is the initialization status
- Return type
Tuple
[bool
,str
]
- initialize_model_compliance()¶
Initialize model compliance documentation pre-processing. Must be called before generating documentation for a custom model.
- Returns
- Tuple of (boolean, string)
boolean flag is whether model compliance documentation pre-processing is initialized
string value is the initialization status
Examples
import datarobot as dr

dr.Client(token=my_token, endpoint=endpoint)

# NOTE: entity_id is either a model id or a model package (version) id
doc = dr.AutomatedDocument(
    document_type="MODEL_COMPLIANCE",
    entity_id="6f50cdb77cc4f8d1560c3ed5",
    output_format="docx",
    locale="EN_US")

doc.initialize_model_compliance()
- Return type
Tuple
[bool
,str
]
- generate(max_wait=600)¶
Request generation of an automated document.
Required attributes to request document generation:
document_type
,entity_id
, andoutput_format
.- Returns
requests.models.Response
Examples
import datarobot as dr

dr.Client(token=my_token, endpoint=endpoint)

doc = dr.AutomatedDocument(
    document_type="MODEL_COMPLIANCE",
    entity_id="6f50cdb77cc4f8d1560c3ed5",
    output_format="docx",
    locale="EN_US",
    template_id="50efc9db8aff6c81a374aeec",
    filepath="/Users/username/Documents/example.docx"
)

doc.generate()
doc.download()
- Return type
Response
- download()¶
Download a generated Automated Document. Document ID is required to download a file.
- Returns
requests.models.Response
Examples
Generating and downloading the generated document:
import datarobot as dr

dr.Client(token=my_token, endpoint=endpoint)

doc = dr.AutomatedDocument(
    document_type="AUTOPILOT_SUMMARY",
    entity_id="6050d07d9da9053ebb002ef7",
    output_format="docx",
    filepath="/Users/username/Documents/Project_Report_1.docx"
)

doc.generate()
doc.download()
Downloading an earlier generated document when you know the document ID:
import datarobot as dr

dr.Client(token=my_token, endpoint=endpoint)

doc = dr.AutomatedDocument(id='5e8b6a34d2426053ab9a39ed')
doc.download()
Notice that
filepath
was not set for this document. In this case, the file is saved to the directory from which the script was launched.Downloading a document chosen from a list of earlier generated documents:
import datarobot as dr

dr.Client(token=my_token, endpoint=endpoint)

model_id = "6f5ed3de855962e0a72a96fe"
docs = dr.AutomatedDocument.list_generated_documents(entity_ids=[model_id])
doc = docs[0]
doc.filepath = "/Users/me/Desktop/Recommended_model_doc.docx"
doc.download()
- Return type
Response
- delete()¶
Delete a document using its ID.
- Returns
requests.models.Response
Examples
import datarobot as dr

dr.Client(token=my_token, endpoint=endpoint)

doc = dr.AutomatedDocument(id="5e8b6a34d2426053ab9a39ed")
doc.delete()
If you don’t know the document ID, you can follow the same workflow to get the ID as in the examples for the
AutomatedDocument.download
method.- Return type
Response
- classmethod list_generated_documents(document_types=None, entity_ids=None, output_formats=None, locales=None, offset=None, limit=None)¶
Get information about all previously generated documents available for your account. The information includes document ID and type, ID of the entity it was generated for, time of creation, and other information.
- Parameters
- document_typesList of str or None
Query for one or more document types.
- entity_idsList of str or None
Query generated documents by one or more entity IDs.
- output_formatsList of str or None
Query for one or more output formats.
- localesList of str or None
Query generated documents by one or more locales.
- offset: int or None
Number of items to skip. Defaults to 0 if not provided.
- limit: int or None
Number of items to return, maximum number of items is 1000.
- Returns
- List of AutomatedDocument objects, where each object contains attributes described in
AutomatedDocument
Examples
To get a list of all generated documents:
import datarobot as dr

dr.Client(token=my_token, endpoint=endpoint)

docs = AutomatedDocument.list_generated_documents()
To get a list of all AUTOPILOT_SUMMARY documents:

import datarobot as dr

dr.Client(token=my_token, endpoint=endpoint)

docs = AutomatedDocument.list_generated_documents(document_types=["AUTOPILOT_SUMMARY"])
To get a list of 5 recently created automated documents in html format:

import datarobot as dr

dr.Client(token=my_token, endpoint=endpoint)

docs = AutomatedDocument.list_generated_documents(output_formats=["html"], limit=5)
To get a list of automated documents created for specific entities (projects or models):
import datarobot as dr

dr.Client(token=my_token, endpoint=endpoint)

docs = AutomatedDocument.list_generated_documents(
    entity_ids=["6051d3dbef875eb3be1be036",
                "6051d3e1fbe65cd7a5f6fde6",
                "6051d3e7f86c04486c2f9584"]
)
Note that the list of results contains AutomatedDocument objects, which means that you can execute class-related methods on them. Here's how you can list, download, and then delete from the server all automated documents related to a certain entity:

import datarobot as dr

dr.Client(token=my_token, endpoint=endpoint)

ids = ["6051d3dbef875eb3be1be036", "5fe1d3d55cd810ebdb60c517f"]
docs = AutomatedDocument.list_generated_documents(entity_ids=ids)
for doc in docs:
    doc.download()
    doc.delete()
- Return type
List
[AutomatedDocument
]
- class datarobot.models.automated_documentation.DocumentOption¶
A dict-based object describing an available document type and its locales, as returned by AutomatedDocument.list_available_document_types.
Challenger¶
- class datarobot.models.deployment.challenger.Challenger(id, deployment_id=None, name=None, model=None, model_package=None, prediction_environment=None)¶
A challenger is an alternative model being compared to the model currently deployed
- Attributes
- idstr
The ID of the challenger.
- deployment_idstr
The ID of the deployment.
- namestr
The name of the challenger.
- modeldict
The model of the challenger.
- model_packagedict
The model package of the challenger.
- prediction_environmentdict
The prediction environment of the challenger.
- classmethod create(deployment_id, model_package_id, prediction_environment_id, name, max_wait=600)¶
Create a challenger for a deployment
- Parameters
- deployment_idstr
The ID of the deployment
- model_package_idstr
The model package id of the challenger model
- prediction_environment_idstr
The prediction environment id of the challenger model
- namestr
The name of the challenger model
- max_waitint, optional
The amount of seconds to wait for successful resolution of a challenger creation job.
Examples
from datarobot import Challenger

challenger = Challenger.create(
    deployment_id="5c939e08962d741e34f609f0",
    name="Elastic-Net Classifier",
    model_package_id="5c0a969859b00004ba52e41b",
    prediction_environment_id="60b012436635fc00909df555"
)
- Return type
- classmethod get(deployment_id, challenger_id)¶
Get a challenger for a deployment
- Parameters
- deployment_idstr
The ID of the deployment
- challenger_idstr
The ID of the challenger
- Returns
- Challenger
The challenger object
Examples
from datarobot import Challenger

challenger = Challenger.get(
    deployment_id="5c939e08962d741e34f609f0",
    challenger_id="5c939e08962d741e34f609f0"
)

challenger.id
>>>'5c939e08962d741e34f609f0'
challenger.model_package['name']
>>> 'Elastic-Net Classifier'
- Return type
- classmethod list(deployment_id)¶
List all challengers for a deployment
- Parameters
- deployment_idstr
The ID of the deployment
- Returns
- challengers: list
A list of challenger objects
Examples
from datarobot import Challenger

challengers = Challenger.list(deployment_id="5c939e08962d741e34f609f0")

challengers[0].id
>>>'5c939e08962d741e34f609f0'
challengers[0].model_package['name']
>>> 'Elastic-Net Classifier'
- Return type
List
[Challenger
]
- delete()¶
Delete a challenger for a deployment
- Return type
None
- update(name=None, prediction_environment_id=None)¶
Update name and prediction environment of a challenger
- Parameters
- name: str, optional
The name of the challenger model
- prediction_environment_id: str, optional
The prediction environment id of the challenger model
- Return type
None
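A short sketch of renaming an existing challenger; the IDs are the same placeholder values used in the examples above.

from datarobot import Challenger

challenger = Challenger.get(
    deployment_id="5c939e08962d741e34f609f0",
    challenger_id="5c939e08962d741e34f609f0",
)
challenger.update(name="Elastic-Net Classifier (renamed)")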
Class Mapping Aggregation Settings¶
For multiclass projects with many unique values in the target column, you can specify parameters for aggregating rare values in order to improve modeling performance and decrease the runtime and resource usage of the resulting models.
- class datarobot.helpers.ClassMappingAggregationSettings(max_unaggregated_class_values=None, min_class_support=None, excluded_from_aggregation=None, aggregation_class_name=None)¶
Class mapping aggregation settings. For multiclass projects, allows fine control over which target values will be preserved as classes. Classes which aren’t preserved will be aggregated into a single “catch everything else” class in the multiclass case, or ignored in the multilabel case. All attributes are optional; if not specified, server-side defaults will be used.
- Attributes
- max_unaggregated_class_valuesint, optional
Maximum amount of unique values allowed before aggregation kicks in.
- min_class_supportint, optional
Minimum number of instances necessary for each target value in the dataset. All values with less instances will be aggregated.
- excluded_from_aggregationlist, optional
List of target values that are guaranteed to be kept as is, regardless of other settings.
- aggregation_class_namestr, optional
If some values are aggregated, this is the name of the aggregation class that will replace them.
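A minimal sketch of constructing the settings object; the numbers and class names are placeholders, and the object is then passed along with the other advanced options when setting the project target (the exact target-setting call is not shown in this section).

from datarobot.helpers import ClassMappingAggregationSettings

aggregation_settings = ClassMappingAggregationSettings(
    max_unaggregated_class_values=100,            # placeholder threshold
    min_class_support=25,                         # placeholder minimum support
    excluded_from_aggregation=["important_rare_class"],  # placeholder class name
    aggregation_class_name="OTHER",               # placeholder aggregate class name
)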
Client Configuration¶
- datarobot.client.Client(token=None, endpoint=None, config_path=None, connect_timeout=None, user_agent_suffix=None, ssl_verify=None, max_retries=None, token_type=None, default_use_case=None, enable_api_consumer_tracking=None, trace_context=None)¶
Configures the global API client for the Python SDK. The client will be configured in one of the following ways, in order of priority.
- Parameters
- tokenstr, optional
API token.
- endpointstr, optional
Base URL of API.
- config_pathstr, optional
An alternate location of the config file.
- connect_timeoutint, optional
How long the client should be willing to wait before giving up on establishing a connection with the server.
- user_agent_suffixstr, optional
Additional text that is appended to the User-Agent HTTP header when communicating with the DataRobot REST API. This can be useful for identifying different applications that are built on top of the DataRobot Python Client, which can aid debugging and help track usage.
- ssl_verifybool or str, optional
Whether to check SSL certificate. Could be set to path with certificates of trusted certification authorities. Default: True.
- max_retriesint or urllib3.util.retry.Retry, optional
Either an integer number of times to retry connection errors, or a urllib3.util.retry.Retry object to configure retries.
- token_type: str, optional
Authentication token type: Token, Bearer. “Bearer” is for DataRobot OAuth2 token, “Token” for token generated in Developer Tools. Default: “Token”.
- default_use_case: str, optional
The entity ID of the default Use Case to use with any requests made by the client.
- enable_api_consumer_tracking: bool, optional
Enable and disable user metrics tracking within the datarobot module. Default: False.
- trace_context: str, optional
An ID or other string for identifying which code template or AI Accelerator was used to make a request.
- Returns
- The
RESTClientObject
instance created.
- The
Notes
Token and endpoint must be specified from one source only. This is a restriction to prevent token leakage if environment variables or config file are used.
The DataRobotClientConfig parameters are looked up in the following order:
From call kwargs if specified;
From a YAML file at the path specified in the
config_path
kwarg;From a YAML file at the path specified in the environment variables
DATAROBOT_CONFIG_FILE
;From environment variables;
From the default values in the default YAML file at the path $HOME/.config/datarobot/drconfig.yaml.
This can also have the side effect of setting a default Use Case for client API requests.
- Return type
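A minimal sketch of configuring the client explicitly in code; the token and endpoint values are placeholders and should be replaced with your own.

import datarobot as dr

dr.Client(
    token="your-api-token-here",                  # placeholder API token
    endpoint="https://app.datarobot.com/api/v2",  # placeholder endpoint URL
)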
- datarobot.client.get_client()¶
Returns the global HTTP client for the Python SDK, instantiating it if necessary.
- Return type
- datarobot.client.set_client(client)¶
Configure the global HTTP client for the Python SDK. Returns previous instance.
- Return type
Optional
[RESTClientObject
]
- datarobot.client.client_configuration(*args, **kwargs)¶
This context manager can be used to temporarily change the global HTTP client.
In multithreaded scenarios, it is highly recommended to use a fresh manager object per thread.
DataRobot does not recommend nesting these contexts.
- Parameters
- argsParameters passed to datarobot.client.Client()
- kwargsKeyword arguments passed to datarobot.client.Client()
Examples
from datarobot.client import client_configuration
from datarobot.models import Project

with client_configuration(token="api-key-here", endpoint="https://host-name.com"):
    Project.list()

from datarobot.client import Client, client_configuration
from datarobot.models import Project

Client()  # Interact with DataRobot using the default configuration.
Project.list()

with client_configuration(config_path="/path/to/a/drconfig.yaml"):
    # Interact with DataRobot using a different configuration.
    Project.list()
- class datarobot.rest.RESTClientObject(auth, endpoint, connect_timeout=6.05, verify=True, user_agent_suffix=None, max_retries=None, authentication_type=None)¶
- Parameters
- connect_timeout
Timeout for HTTP requests and connections.
- headers
Headers for outgoing requests.
- open_in_browser()¶
Opens the DataRobot app in a web browser, or logs the URL if a browser is not available.
- Return type
None
Clustering¶
- class datarobot.models.ClusteringModel(id=None, processes=None, featurelist_name=None, featurelist_id=None, project_id=None, sample_pct=None, model_type=None, model_category=None, is_frozen=None, is_n_clusters_dynamically_determined=None, blueprint_id=None, metrics=None, monotonic_increasing_featurelist_id=None, monotonic_decreasing_featurelist_id=None, n_clusters=None, has_empty_clusters=None, supports_monotonic_constraints=None, is_starred=None, prediction_threshold=None, prediction_threshold_read_only=None, model_number=None, parent_model_id=None, supports_composable_ml=None, training_row_count=None, training_duration=None, training_start_date=None, training_end_date=None, data_selection_method=None, time_window_sample_pct=None, sampling_method=None, model_family_full_name=None, is_trained_into_validation=None, is_trained_into_holdout=None)¶
ClusteringModel extends the Model class. It provides properties and methods specific to clustering projects.
- compute_insights(max_wait=600)¶
Compute and retrieve cluster insights for the model. This method awaits completion of the job computing cluster insights and returns results after it is finished. If the computation takes longer than the specified max_wait, an exception is raised.
- Parameters
- project_id: str
Project to start creation in.
- model_id: str
Project’s model to start creation in.
- max_wait: int
Maximum number of seconds to wait before giving up
- Returns
- List of ClusterInsight
- Raises
- ClientError
Server rejected creation due to client error. The most likely cause is a bad project_id or model_id.
- AsyncFailureError
If any of the responses from the server are unexpected
- AsyncProcessUnsuccessfulError
If the cluster insights computation has failed or was cancelled.
- AsyncTimeoutError
If the cluster insights computation did not resolve in time
- Return type
List
[ClusterInsight
]
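A minimal sketch of calling compute_insights on an existing clustering model, assuming the model is retrieved through the get method inherited from Model; the project and model IDs are placeholders.
import datarobot as dr

# Placeholder IDs for an existing clustering project and model.
model = dr.models.ClusteringModel.get("5c939e08962d741e34f609f0", "5e429d6ecf8a5f36c5693e03")

# Block until the cluster insights job finishes (up to 10 minutes).
insights = model.compute_insights(max_wait=600)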
- property insights: List[ClusterInsight]¶
Return actual list of cluster insights if already computed.
- Returns
- List of ClusterInsight
- Return type
List
[ClusterInsight
]
- property clusters: List[Cluster]¶
Return actual list of Clusters.
- Returns
- List of Cluster
- Return type
List
[Cluster
]
- update_cluster_names(cluster_name_mappings)¶
Change many cluster names at once based on list of name mappings.
- Parameters
- cluster_name_mappings: List of tuples
Cluster name mappings consisting of the current cluster name and the new cluster name. Example:
cluster_name_mappings = [
    ("current cluster name 1", "new cluster name 1"),
    ("current cluster name 2", "new cluster name 2"),
]
- Returns
- List of Cluster
- Raises
- datarobot.errors.ClientError
Server rejected update of cluster names. Possible reasons include: incorrect format of mapping, mapping introduces duplicates.
- Return type
List
[Cluster
]
- update_cluster_name(current_name, new_name)¶
Change cluster name from current_name to new_name.
- Parameters
- current_name: str
Current cluster name.
- new_name: str
New cluster name.
- Returns
- List of Cluster
- Raises
- datarobot.errors.ClientError
Server rejected update of cluster names.
- Return type
List
[Cluster
]
- class datarobot.models.cluster.Cluster(**kwargs)¶
Representation of a single cluster.
- Attributes
- name: str
Current cluster name
- percent: float
Percent of data contained in the cluster. This value is reported after cluster insights are computed for the model.
- classmethod list(project_id, model_id)¶
Retrieve a list of clusters in the model.
- Parameters
- project_id: str
ID of the project that the model is part of.
- model_id: str
ID of the model.
- Returns
- List of clusters
- Return type
List
[Cluster
]
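A short usage sketch of Cluster.list; the project and model IDs are placeholders for an existing clustering project and model.
from datarobot.models.cluster import Cluster

# Placeholder IDs.
clusters = Cluster.list(
    project_id="5c939e08962d741e34f609f0",
    model_id="5e429d6ecf8a5f36c5693e03",
)
for cluster in clusters:
    print(cluster.name, cluster.percent)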
- classmethod update_multiple_names(project_id, model_id, cluster_name_mappings)¶
Update many clusters at once based on list of name mappings.
- Parameters
- project_id: str
ID of the project that the model is part of.
- model_id: str
ID of the model.
- cluster_name_mappings: List of tuples
Cluster name mappings, consisting of the current and new name for each cluster. Example:
cluster_name_mappings = [
    ("current cluster name 1", "new cluster name 1"),
    ("current cluster name 2", "new cluster name 2"),
]
- Returns
- List of clusters
- Raises
- datarobot.errors.ClientError
Server rejected update of cluster names.
- ValueError
Invalid cluster name mapping provided.
- Return type
List
[Cluster
]
- classmethod update_name(project_id, model_id, current_name, new_name)¶
Change cluster name from current_name to new_name
- Parameters
- project_id: str
ID of the project that the model is part of.
- model_id: str
ID of the model.
- current_name: str
Current cluster name
- new_name: str
New cluster name
- Returns
- List of Cluster
- Return type
List
[Cluster
]
- class datarobot.models.cluster_insight.ClusterInsight(**kwargs)¶
Holds data on all insights related to a feature, as well as a breakdown per cluster.
- Parameters
- feature_name: str
Name of a feature from the dataset.
- feature_type: str
Type of feature.
- insightsList of classes (ClusterInsight)
The list provides information regarding the importance of a specific feature in relation to each cluster. Results help in understanding how the model groups data and what each cluster represents.
- feature_impact: float
Impact of a feature ranging from 0 to 1.
- classmethod compute(project_id, model_id, max_wait=600)¶
Starts creation of cluster insights for the model and, if successful, returns the computed ClusterInsights. This method allows the calculation to continue for a specified time and, if not complete, cancels the request.
- Parameters
- project_id: str
ID of the project to begin creation of cluster insights for.
- model_id: str
ID of the project model to begin creation of cluster insights for.
- max_wait: int
Maximum number of seconds to wait before canceling the request.
- Returns
- List[ClusterInsight]
- Raises
- ClientError
Server rejected creation due to client error. The most likely cause is a bad project_id or model_id.
- AsyncFailureError
Raised if any of the responses from the server are unexpected.
- AsyncProcessUnsuccessfulError
Raised if the cluster insights computation failed or was cancelled.
- AsyncTimeoutError
Raised if the cluster insights computation did not resolve within the specified time limit (max_wait).
- Return type
List
[ClusterInsight
]
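A minimal sketch of triggering ClusterInsight.compute directly through the class method; the project and model IDs are placeholders.
from datarobot.models.cluster_insight import ClusterInsight

# Placeholder IDs; waits up to 10 minutes for the insights job to finish.
insights = ClusterInsight.compute(
    project_id="5c939e08962d741e34f609f0",
    model_id="5e429d6ecf8a5f36c5693e03",
    max_wait=600,
)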
Compliance Documentation Templates¶
- class datarobot.models.compliance_doc_template.ComplianceDocTemplate(id, creator_id, creator_username, name, org_id=None, sections=None)¶
A compliance documentation template. Templates are used to customize contents of AutomatedDocument.
New in version v2.14.
Notes
Each section dictionary has the following schema:
title: title of the section
type: type of the section. Must be one of “datarobot”, “user” or “table_of_contents”.
Each type of section has a different set of attributes, described below.
Sections of type "datarobot" represent a section owned by DataRobot. DataRobot sections have the following additional attributes:
content_id: the identifier of the content in this section. You can get the default template with get_default for a complete list of possible DataRobot section content ids.
sections: list of sub-section dicts nested under the parent section.
Sections of type "user" represent a section with user-defined content. Those sections may contain text generated by the user and have the following additional fields:
regularText: regular text of the section, optionally separated by \n to split paragraphs.
highlightedText: highlighted text of the section, optionally separated by \n to split paragraphs.
sections: list of sub-section dicts nested under the parent section.
Sections of type "table_of_contents" represent a table of contents and have no additional attributes.
- Attributes
- idstr
the id of the template
- namestr
the name of the template.
- creator_idstr
the id of the user who created the template
- creator_usernamestr
username of the user who created the template
- org_idstr
the id of the organization the template belongs to
- sectionslist of dicts
the sections of the template describing the structure of the document. Section schema is described in Notes section above.
- classmethod get_default(template_type=None)¶
Get a default DataRobot template. This template is used for generating compliance documentation when no template is specified.
- Parameters
- template_typestr or None
Type of the template. Currently supported values are “normal” and “time_series”
- Returns
- templateComplianceDocTemplate
the default template object with the sections attribute populated with default sections.
- Return type
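For illustration, a small sketch of calling get_default and inspecting the section structure (section dicts follow the schema described in the Notes above).
from datarobot.models.compliance_doc_template import ComplianceDocTemplate

default_template = ComplianceDocTemplate.get_default()
for section in default_template.sections:
    print(section["title"], section["type"])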
- classmethod create_from_json_file(name, path)¶
Create a template with the specified name and the sections stored in a JSON file.
This is useful when working with sections in a JSON file. Example:
default_template = ComplianceDocTemplate.get_default()
default_template.sections_to_json_file('path/to/example.json')
# ... edit example.json in your editor
my_template = ComplianceDocTemplate.create_from_json_file(
    name='my template',
    path='path/to/example.json'
)
- Parameters
- namestr
the name of the template. Must be unique for your user.
- pathstr
the path to find the JSON file at
- Returns
- templateComplianceDocTemplate
the created template
- Return type
- classmethod create(name, sections)¶
Create a template with the specified name and sections.
- Parameters
- namestr
the name of the template. Must be unique for your user.
- sectionslist
list of section objects
- Returns
- templateComplianceDocTemplate
the created template
- Return type
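A minimal sketch of calling create with a single user-defined section, following the section schema from the Notes above; the name and text values are placeholders.
from datarobot.models.compliance_doc_template import ComplianceDocTemplate

sections = [
    {
        "title": "Example section",  # placeholder title
        "type": "user",              # user-defined content
        "regularText": "Regular text of the section.",
        "highlightedText": "Highlighted text of the section.",
    }
]
template = ComplianceDocTemplate.create(name="my template", sections=sections)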
- classmethod get(template_id)¶
Retrieve a specific template.
- Parameters
- template_idstr
the id of the template to retrieve
- Returns
- templateComplianceDocTemplate
the retrieved template
- Return type
- classmethod list(name_part=None, limit=None, offset=None)¶
Get a paginated list of compliance documentation template objects.
- Parameters
- name_partstr or None
Return only the templates with names matching the specified string. The matching is case-insensitive.
- limitint
The number of records to return. The server will use a (possibly finite) default if not specified.
- offsetint
The number of records to skip.
- Returns
- templateslist of ComplianceDocTemplate
the list of template objects
- Return type
List
[ComplianceDocTemplate
]
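A short usage sketch of list; the name fragment and paging values are placeholders.
from datarobot.models.compliance_doc_template import ComplianceDocTemplate

# Case-insensitive name filter with explicit paging.
templates = ComplianceDocTemplate.list(name_part="my", limit=10, offset=0)
for template in templates:
    print(template.id, template.name)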
- sections_to_json_file(path, indent=2)¶
Save sections of the template to a json file at the specified path
- Parameters
- pathstr
the path to save the file to
- indentint
indentation to use in the json file.
- Return type
None
- update(name=None, sections=None)¶
Update the name or sections of an existing doc template.
Note that default or non-existent templates cannot be updated.
- Parameters
- namestr, optional
the new name for the template
- sectionslist of dicts
list of sections
- Return type
None
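A minimal sketch of renaming an existing user-created template with update; the template ID is a placeholder (default templates cannot be updated).
from datarobot.models.compliance_doc_template import ComplianceDocTemplate

template = ComplianceDocTemplate.get("5c939e08962d741e34f609f0")  # placeholder ID
template.update(name="my renamed template")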
- delete()¶
Delete the compliance documentation template.
- Return type
None
Confusion Chart¶
- class datarobot.models.confusion_chart.ConfusionChart(source, data, source_model_id)¶
Confusion Chart data for model.
Notes
ClassMetrics is a dict containing the following:
class_name (string): name of the class
actual_count (int): number of times this class is seen in the validation data
predicted_count (int): number of times this class has been predicted for the validation data
f1 (float): F1 score
recall (float): recall score
precision (float): precision score
was_actual_percentages (list of dict): one vs all actual percentages in the format specified below.
other_class_name (string): the name of the other class
percentage (float): the percentage of the times this class was predicted when it was actually this class (from 0 to 1)
was_predicted_percentages (list of dict): one vs all predicted percentages in the format specified below.
other_class_name (string): the name of the other class
percentage (float): the percentage of the times this class was actually predicted (from 0 to 1)
confusion_matrix_one_vs_all (list of list): 2d list representing the 2x2 one vs all matrix. This represents the True/False Negative/Positive rates as integers for each class. The data structure looks like:
[ [ True Negative, False Positive ], [ False Negative, True Positive ] ]
- Attributes
- sourcestr
Confusion Chart data source. Can be ‘validation’, ‘crossValidation’ or ‘holdout’.
- raw_datadict
All of the raw data for the Confusion Chart
- confusion_matrixlist of list
The N x N confusion matrix
- classeslist
The names of each of the classes
- class_metricslist of dicts
List of dicts with schema described as ClassMetrics above.
- source_model_idstr
ID of the model this Confusion chart represents; in some cases, insights from the parent of a frozen model may be used
Credentials¶
- class datarobot.models.Credential(credential_id=None, name=None, credential_type=None, creation_date=None, description=None)¶
- classmethod list()¶
Returns list of available credentials.
- Returns
- credentialslist of Credential instances
contains a list of available credentials.
Examples
>>> import datarobot as dr
>>> data_sources = dr.Credential.list()
>>> data_sources
[
    Credential('5e429d6ecf8a5f36c5693e03', 'my_s3_cred', 's3'),
    Credential('5e42cc4dcf8a5f3256865840', 'my_jdbc_cred', 'jdbc'),
]
- Return type
List
[Credential
]
- classmethod get(credential_id)¶
Gets the Credential.
- Parameters
- credential_idstr
the identifier of the credential.
- Returns
- credentialCredential
the requested credential.
Examples
>>> import datarobot as dr
>>> cred = dr.Credential.get('5a8ac9ab07a57a0001be501f')
>>> cred
Credential('5e429d6ecf8a5f36c5693e03', 'my_s3_cred', 's3')
- Return type
- delete()¶
Deletes the Credential from the store.
- Parameters
- credential_idstr
the identifier of the credential.
- Returns
- credentialCredential
the requested credential.
Examples
>>> import datarobot as dr
>>> cred = dr.Credential.get('5a8ac9ab07a57a0001be501f')
>>> cred.delete()
- Return type
None
- classmethod create_basic(name, user, password, description=None)¶
Creates the credentials.
- Parameters
- namestr
the name to use for this set of credentials.
- userstr
the username to store for this set of credentials.
- passwordstr
the password to store for this set of credentials.
- descriptionstr, optional
the description to use for this set of credentials.
- Returns
- credentialCredential
the created credential.
Examples
>>> import datarobot as dr
>>> cred = dr.Credential.create_basic(
...     name='my_basic_cred',
...     user='username',
...     password='password',
... )
>>> cred
Credential('5e429d6ecf8a5f36c5693e03', 'my_basic_cred', 'basic')
- Return type
- classmethod create_oauth(name, token, refresh_token, description=None)¶
Creates the OAUTH credentials.
- Parameters
- namestr
the name to use for this set of credentials.
- token: str
the OAUTH token
- refresh_token: str
The OAUTH refresh token.
- descriptionstr, optional
the description to use for this set of credentials.
- Returns
- credentialCredential
the created credential.
Examples
>>> import datarobot as dr
>>> cred = dr.Credential.create_oauth(
...     name='my_oauth_cred',
...     token='XXX',
...     refresh_token='YYY',
... )
>>> cred
Credential('5e429d6ecf8a5f36c5693e03', 'my_oauth_cred', 'oauth')
- Return type
- classmethod create_s3(name, aws_access_key_id=None, aws_secret_access_key=None, aws_session_token=None, config_id=None, description=None)¶
Creates the S3 credentials.
- Parameters
- namestr
the name to use for this set of credentials.
- aws_access_key_idstr, optional
the AWS access key id.
- aws_secret_access_keystr, optional
the AWS secret access key.
- aws_session_tokenstr, optional
the AWS session token.
- config_id: str, optional
The ID of the saved shared secure configuration. If specified, cannot include awsAccessKeyId, awsSecretAccessKey or awsSessionToken.
- descriptionstr, optional
the description to use for this set of credentials.
- Returns
- credentialCredential
the created credential.
Examples
>>> import datarobot as dr
>>> cred = dr.Credential.create_s3(
...     name='my_s3_cred',
...     aws_access_key_id='XXX',
...     aws_secret_access_key='YYY',
...     aws_session_token='ZZZ',
... )
>>> cred
Credential('5e429d6ecf8a5f36c5693e03', 'my_s3_cred', 's3')
- Return type
- classmethod create_azure(name, azure_connection_string, description=None)¶
Creates the Azure storage credentials.
- Parameters
- namestr
the name to use for this set of credentials.
- azure_connection_stringstr
the Azure connection string.
- descriptionstr, optional
the description to use for this set of credentials.
- Returns
- credentialCredential
the created credential.
Examples
>>> import datarobot as dr
>>> cred = dr.Credential.create_azure(
...     name='my_azure_cred',
...     azure_connection_string='XXX',
... )
>>> cred
Credential('5e429d6ecf8a5f36c5693e03', 'my_azure_cred', 'azure')
- Return type
- classmethod create_snowflake_key_pair(name, user=None, private_key=None, passphrase=None, config_id=None, description=None)¶
Creates the Snowflake Key Pair credentials.
- Parameters
- namestr
the name to use for this set of credentials.
- user: str, optional
the Snowflake login name
- private_key: str, optional
the private key copied exactly from the user's private key file. Since it contains multiple lines, when assigning it to a variable, put the key string inside triple quotes
- passphrase: str, optional
the string used to encrypt the private key
- config_id: str, optional
The ID of the saved shared secure configuration. If specified, cannot include user, privateKeyStr or passphrase.
- descriptionstr, optional
the description to use for this set of credentials.
- Returns
- credentialCredential
the created credential.
Examples
>>> import datarobot as dr
>>> cred = dr.Credential.create_snowflake_key_pair(
...     name='key_pair_cred',
...     user='XXX',
...     private_key='YYY',
...     passphrase='ZZZ',
... )
>>> cred
Credential('5e429d6ecf8a5f36c5693e03', 'key_pair_cred', 'snowflake_key_pair_user_account')
- Return type
- classmethod create_databricks_access_token(name, databricks_access_token, description=None)¶
Creates the Databricks access token credentials.
- Parameters
- namestr
the name to use for this set of credentials.
- databricks_access_token: str, optional
the Databricks personal access token
- descriptionstr, optional
the description to use for this set of credentials.
- Returns
- credentialCredential
the created credential.
Examples
>>> import datarobot as dr
>>> cred = dr.Credential.create_databricks_access_token(
...     name='access_token_cred',
...     databricks_access_token='XXX',
... )
>>> cred
Credential('5e429d6ecf8a5f36c5693e03', 'access_token_cred', 'databricks_access_token_account')
- Return type
- classmethod create_databricks_service_principal(name, client_id=None, client_secret=None, config_id=None, description=None)¶
Creates the Databricks service principal credentials.
- Parameters
- namestr
the name to use for this set of credentials.
- client_id: str, optional
the client ID for Databricks Service Principal
- client_secret: str, optional
the client secret for Databricks Service Principal
- config_id: str, optional
The ID of the saved shared secure configuration. If specified, cannot include clientId and clientSecret.
- descriptionstr, optional
the description to use for this set of credentials.
- Returns
- credentialCredential
the created credential.
Examples
>>> import datarobot as dr
>>> cred = dr.Credential.create_databricks_service_principal(
...     name='svc_principal_cred',
...     client_id='XXX',
...     client_secret='XXX',
... )
>>> cred
Credential('5e429d6ecf8a5f36c5693e03', 'svc_principal_cred', 'databricks_service_principal_account')
- Return type
- classmethod create_gcp(name, gcp_key=None, description=None)¶
Creates the GCP credentials.
- Parameters
- namestr
the name to use for this set of credentials.
- gcp_keystr | dict
the GCP key in json format or parsed as dict.
- descriptionstr, optional
the description to use for this set of credentials.
- Returns
- credentialCredential
the created credential.
Examples
>>> import datarobot as dr
>>> cred = dr.Credential.create_gcp(
...     name='my_gcp_cred',
...     gcp_key='XXX',
... )
>>> cred
Credential('5e429d6ecf8a5f36c5693e03', 'my_gcp_cred', 'gcp')
- Return type
- update(name=None, description=None, **kwargs)¶
Update the credential values of an existing credential. Updates this object in place.
New in version v3.2.
- Parameters
- namestr
The name to use for this set of credentials.
- descriptionstr, optional
The description to use for this set of credentials; if omitted, and name is not omitted, then it clears any previous description for that name.
- kwargsKeyword arguments specific to the given credential_type that should be updated.
- Return type
None
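For illustration, a short sketch of update in the same style as the other Credential examples; the ID, name, and description values are placeholders.
>>> import datarobot as dr
>>> cred = dr.Credential.get('5e429d6ecf8a5f36c5693e03')
>>> cred.update(name='my_renamed_cred', description='updated description')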
Prediction Environment¶
- class datarobot.models.PredictionEnvironment(id, name, platform, description=None, permissions=None, is_deleted=None, supported_model_formats=None, import_meta=None, management_meta=None, health=None, is_managed_by_management_agent=None, plugin=None, datastore_id=None, credential_id=None)¶
A prediction environment entity.
New in version v3.3.0.
- Attributes
- id: str
The ID of the prediction environment.
- name: str
The name of the prediction environment.
- description: str, optional
The description of the prediction environment.
- platform: str, optional
Indicates which platform is in use (AWS, GCP, DataRobot, etc.).
- permissions: list, optional
A set of permissions for the prediction environment.
- is_deleted: boolean, optional
The flag that shows whether this prediction environment has been deleted.
- supported_model_formats: list[PredictionEnvironmentModelFormats], optional
The list of supported model formats.
- is_managed_by_management_agentboolean, optional
Determines if the prediction environment should be managed by the management agent. False by default.
- datastore_idstr, optional
The ID of the data store connection configuration. Only applicable for external prediction environments managed by DataRobot.
- credential_idstr, optional
The ID of the credential associated with the data connection. Only applicable for external prediction environments managed by DataRobot.
- classmethod list()¶
Returns list of available external prediction environments.
- Returns
- prediction_environmentslist of PredictionEnvironment instances
contains a list of available prediction environments.
Examples
>>> import datarobot as dr
>>> prediction_environments = dr.PredictionEnvironment.list()
>>> prediction_environments
[
    PredictionEnvironment('5e429d6ecf8a5f36c5693e03', 'demo_pe', 'aws', 'env for demo testing'),
    PredictionEnvironment('5e42cc4dcf8a5f3256865840', 'azure_pe', 'azure', 'env for azure demo testing'),
]
- Return type
List
[PredictionEnvironment
]
- classmethod get(pe_id)¶
Gets the PredictionEnvironment by id.
- Parameters
- pe_idstr
the identifier of the PredictionEnvironment.
- Returns
- prediction_environmentPredictionEnvironment
the requested prediction environment object.
Examples
>>> import datarobot as dr
>>> pe = dr.PredictionEnvironment.get('5a8ac9ab07a57a1231be501f')
>>> pe
PredictionEnvironment('5a8ac9ab07a57a1231be501f', 'my_predict_env', 'aws', 'demo env')
- Return type
- delete()¶
Deletes the prediction environment.
Examples
>>> import datarobot as dr
>>> pe = dr.PredictionEnvironment.get('5a8ac9ab07a57a1231be501f')
>>> pe.delete()
- Return type
None
- classmethod create(name, platform, description=None, plugin=None, supported_model_formats=None, is_managed_by_management_agent=False, datastore=None, credential=None)¶
Create a prediction environment.
- Parameters
- namestr
The name of the prediction environment.
- descriptionstr, optional
The description of the prediction environment.
- platformstr
Indicates which platform is in use (AWS, GCP, DataRobot, etc.).
- pluginstr
Optional. The plugin name to use.
- supported_model_formatslist[PredictionEnvironmentModelFormats], optional
The list of supported model formats. When not provided, the default value is inferred based on platform, (DataRobot platform: DataRobot, Custom Models; All other platforms: DataRobot, Custom Models, External Models).
- is_managed_by_management_agentboolean, optional
Determines if this prediction environment should be managed by the management agent. default: False
- datastoreDataStore|str, optional
The datastore object or ID of the data store connection configuration. Only applicable for external Prediction Environments managed by DataRobot.
- credentialCredential|str, optional
The credential object or ID of the credential associated with the data connection. Only applicable for external Prediction Environments managed by DataRobot.
- Returns
- prediction_environmentPredictionEnvironment
the created prediction environment.
- Raises
- datarobot.errors.ClientError
If the server responded with 4xx status.
- datarobot.errors.ServerError
If the server responded with 5xx status.
Examples
>>> import datarobot as dr
>>> pe = dr.PredictionEnvironment.create(
...     name='my_predict_env',
...     platform=PredictionEnvironmentPlatform.AWS,
...     description='demo prediction env',
... )
>>> pe
PredictionEnvironment('5e429d6ecf8a5f36c5693e99', 'my_predict_env', 'aws', 'demo prediction env')
- Return type
Champion Model Package¶
- class datarobot.models.deployment.champion_model_package.ChampionModelPackage(id, registered_model_id, registered_model_version, name, model_id, model_execution_type, is_archived, import_meta, source_meta, model_kind, target, model_description, datasets, timeseries, is_deprecated, bias_and_fairness=None, build_status=None, user_provided_id=None, updated_at=None, updated_by=None, tags=None, mlpkg_file_contents=None)¶
Represents a champion model package.
- Parameters
- idstr
The ID of the registered model version.
- registered_model_idstr
The ID of the parent registered model.
- registered_model_versionint
The version of the registered model.
- namestr
The name of the registered model version.
- model_idstr
The ID of the model.
- model_execution_typestr
The type of model package (version). dedicated (native DataRobot models) and custom_inference_model (user-added inference models) both execute on DataRobot prediction servers, while external does not.
- is_archivedbool
Whether the model package (version) is permanently archived (cannot be used in deployment or replacement).
- import_metaImportMeta
Information from when this model package (version) was first saved.
- source_metaSourceMeta
Meta information from where the model was generated.
- model_kindModelKind
Model attribute information.
- targetTarget
Target information for the registered model version.
- model_descriptionModelDescription
Model description information.
- datasetsDataset
Dataset information for the registered model version.
- timeseriesTimeseries
Time series information for the registered model version.
- bias_and_fairnessBiasAndFairness
Bias and fairness information for the registered model version.
- is_deprecatedbool
Whether the model package (version) is deprecated (cannot be used in deployment or replacement).
- build_statusstr or None
Model package (version) build status. One of complete, inProgress, failed.
- user_provided_idstr or None
User provided ID for the registered model version.
- updated_atstr or None
The time the registered model version was last updated.
- updated_byUserMetadata or None
The user who last updated the registered model version.
- tagsList[TagWithId] or None
The tags associated with the registered model version.
- mlpkg_file_contentsstr or None
The contents of the model package file.
Custom Metrics¶
- class datarobot.models.deployment.custom_metrics.CustomMetric(id, name, units, baseline_values, is_model_specific, type, directionality, time_step='hour', description=None, association_id=None, value=None, sample_count=None, timestamp=None, batch=None, deployment_id=None)¶
A DataRobot custom metric.
New in version v3.4.
- Attributes
- id: str
The ID of the custom metric.
- deployment_id: str
The ID of the deployment.
- name: str
The name of the custom metric.
- units: str
The units, or the y-axis label, of the given custom metric.
- baseline_values: BaselinesValues
The baseline value used to add “reference dots” to the values over time chart.
- is_model_specific: bool
Determines whether the metric is related to the model or deployment.
- type: CustomMetricAggregationType
The aggregation type of the custom metric.
- directionality: CustomMetricDirectionality
The directionality of the custom metric.
- time_step: CustomMetricBucketTimeStep
Custom metric time bucket size.
- description: str
A description of the custom metric.
- association_id: DatasetColumn
A custom metric association_id column source when reading values from columnar dataset.
- timestamp: DatasetColumn
A custom metric timestamp column source when reading values from columnar dataset.
- value: DatasetColumn
A custom metric value source when reading values from columnar dataset.
- sample_count: DatasetColumn
A custom metric sample source when reading values from columnar dataset.
- batch: str
A custom metric batch ID source when reading values from columnar dataset.
- classmethod create(name, deployment_id, units, is_model_specific, aggregation_type, directionality, time_step='hour', description=None, baseline_value=None, value_column_name=None, sample_count_column_name=None, timestamp_column_name=None, timestamp_format=None, batch_column_name=None)¶
Create a custom metric for a deployment
- Parameters
- name: str
The name of the custom metric.
- deployment_id: str
The id of the deployment.
- units: str
The units, or the y-axis label, of the given custom metric.
- baseline_value: float
The baseline value used to add “reference dots” to the values over time chart.
- is_model_specific: bool
Determines whether the metric is related to the model or deployment.
- aggregation_type: CustomMetricAggregationType
The aggregation type of the custom metric.
- directionality: CustomMetricDirectionality
The directionality of the custom metric.
- time_step: CustomMetricBucketTimeStep
Custom metric time bucket size.
- description: Optional[str]
A description of the custom metric.
- value_column_name: Optional[str]
A custom metric value column name when reading values from columnar dataset.
- sample_count_column_name: Optional[str]
Points to a weight column name if users provide pre-aggregated metric values from columnar dataset.
- timestamp_column_name: Optional[str]
A custom metric timestamp column name when reading values from columnar dataset.
- timestamp_format: Optional[str]
A custom metric timestamp format when reading values from columnar dataset.
- batch_column_name: Optional[str]
A custom metric batch ID column name when reading values from columnar dataset.
- Returns
- CustomMetric
The custom metric object.
Examples
from datarobot.models.deployment import CustomMetric
from datarobot.enums import CustomMetricAggregationType, CustomMetricDirectionality

custom_metric = CustomMetric.create(
    deployment_id="5c939e08962d741e34f609f0",
    name="Sample metric",
    units="Y",
    baseline_value=12,
    is_model_specific=True,
    aggregation_type=CustomMetricAggregationType.AVERAGE,
    directionality=CustomMetricDirectionality.HIGHER_IS_BETTER
)
- Return type
- classmethod get(deployment_id, custom_metric_id)¶
Get a custom metric for a deployment
- Parameters
- deployment_id: str
The ID of the deployment.
- custom_metric_id: str
The ID of the custom metric.
- Returns
- CustomMetric
The custom metric object.
Examples
from datarobot.models.deployment import CustomMetric

custom_metric = CustomMetric.get(
    deployment_id="5c939e08962d741e34f609f0",
    custom_metric_id="65f17bdcd2d66683cdfc1113"
)
custom_metric.id
>>> '65f17bdcd2d66683cdfc1113'
- Return type
- classmethod list(deployment_id)¶
List all custom metrics for a deployment
- Parameters
- deployment_id: str
The ID of the deployment.
- Returns
- custom_metrics: list
A list of custom metrics objects.
Examples
from datarobot.models.deployment import CustomMetric

custom_metrics = CustomMetric.list(deployment_id="5c939e08962d741e34f609f0")
custom_metrics[0].id
>>> '65f17bdcd2d66683cdfc1113'
- Return type
List
[CustomMetric
]
- classmethod delete(deployment_id, custom_metric_id)¶
Delete a custom metric associated with a deployment.
- Parameters
- deployment_id: str
The ID of the deployment.
- custom_metric_id: str
The ID of the custom metric.
- Returns
- None
Examples
from datarobot.models.deployment import CustomMetric

CustomMetric.delete(
    deployment_id="5c939e08962d741e34f609f0",
    custom_metric_id="65f17bdcd2d66683cdfc1113"
)
- Return type
None
- update(name=None, units=None, aggregation_type=None, directionality=None, time_step=None, description=None, baseline_value=None, value_column_name=None, sample_count_column_name=None, timestamp_column_name=None, timestamp_format=None, batch_column_name=None)¶
Update metadata of a custom metric
- Parameters
- name: Optional[str]
The name of the custom metric.
- units: Optional[str]
The units, or the y-axis label, of the given custom metric.
- baseline_value: Optional[float]
The baseline value used to add “reference dots” to the values over time chart.
- aggregation_type: Optional[CustomMetricAggregationType]
The aggregation type of the custom metric.
- directionality: Optional[CustomMetricDirectionality]
The directionality of the custom metric.
- time_step: Optional[CustomMetricBucketTimeStep]
Custom metric time bucket size.
- description: Optional[str]
A description of the custom metric.
- value_column_name: Optional[str]
A custom metric value column name when reading values from columnar dataset.
- sample_count_column_name: Optional[str]
Points to a weight column name if users provide pre-aggregated metric values from columnar dataset.
- timestamp_column_name: Optional[str]
A custom metric timestamp column name when reading values from columnar dataset.
- timestamp_format: Optional[str]
A custom metric timestamp format when reading values from columnar dataset.
- batch_column_name: Optional[str]
A custom metric batch ID column name when reading values from columnar dataset.
- Returns
- CustomMetric
The custom metric object.
Examples
from datarobot.models.deployment import CustomMetric
from datarobot.enums import CustomMetricAggregationType, CustomMetricDirectionality

custom_metric = CustomMetric.get(
    deployment_id="5c939e08962d741e34f609f0",
    custom_metric_id="65f17bdcd2d66683cdfc1113"
)
custom_metric = custom_metric.update(
    name="Sample metric",
    units="Y",
    baseline_value=12,
    aggregation_type=CustomMetricAggregationType.AVERAGE,
    directionality=CustomMetricDirectionality.HIGHER_IS_BETTER
)
- Return type
- unset_baseline()¶
Unset the baseline value of a custom metric
- Returns
- None
Examples
from datarobot.models.deployment import CustomMetric

custom_metric = CustomMetric.get(
    deployment_id="5c939e08962d741e34f609f0",
    custom_metric_id="65f17bdcd2d66683cdfc1113"
)
custom_metric.baseline_values
>>> [{'value': 12.0}]

custom_metric.unset_baseline()
custom_metric.baseline_values
>>> []
- Return type
None
- submit_values(data, model_id=None, model_package_id=None, dry_run=False, segments=None)¶
Submit aggregated custom metrics values from JSON.
- Parameters
- data: pd.DataFrame or List[CustomMetricBucket]
The data containing aggregated custom metric values.
- model_id: Optional[str]
For a model metric: the ID of the associated champion/challenger model, used to update the metric values. For a deployment metric: the ID of the model is not needed.
- model_package_id: Optional[str]
For a model metric: the ID of the associated champion/challenger model package, used to update the metric values. For a deployment metric: the ID of the model package is not needed.
- dry_run: Optional[bool]
Specifies whether or not metric data is submitted in production mode (where data is saved).
- segments: Optional[CustomMetricSegmentFromJSON]
A list of segments for a custom metric used in segmented analysis.
- Returns
- None
Examples
from datarobot.models.deployment import CustomMetric

custom_metric = CustomMetric.get(
    deployment_id="5c939e08962d741e34f609f0",
    custom_metric_id="65f17bdcd2d66683cdfc1113"
)

# data for values over time
data = [{
    'value': 12,
    'sample_size': 3,
    'timestamp': '2024-03-15T14:00:00'
}]

# data with association ID
data = [{
    'value': 12,
    'sample_size': 3,
    'timestamp': '2024-03-15T14:00:00',
    'association_id': '65f44d04dbe192b552e752ed'
}]

# data for batches
data = [{
    'value': 12,
    'sample_size': 3,
    'batch': '65f44c93fedc5de16b673a0d'
}]

# for deployment-specific metrics
custom_metric.submit_values(data=data)

# for model-specific metrics, pass model_package_id or model_id
custom_metric.submit_values(data=data, model_package_id="6421df32525c58cc6f991f25")

# dry run
custom_metric.submit_values(data=data, model_package_id="6421df32525c58cc6f991f25", dry_run=True)

# for segmented analysis
segments = [{"name": "custom_seg", "value": "val_1"}]
custom_metric.submit_values(data=data, model_package_id="6421df32525c58cc6f991f25", segments=segments)
- Return type
None
- submit_single_value(value, model_id=None, model_package_id=None, dry_run=False, segments=None)¶
Submit a single custom metric value at the current moment.
- Parameters
- value: float
Single numeric custom metric value.
- model_id: Optional[str]
For a model metric: the ID of the associated champion/challenger model, used to update the metric values. For a deployment metric: the ID of the model is not needed.
- model_package_id: Optional[str]
For a model metric: the ID of the associated champion/challenger model package, used to update the metric values. For a deployment metric: the ID of the model package is not needed.
- dry_run: Optional[bool]
Specifies whether or not metric data is submitted in production mode (where data is saved).
- segments: Optional[CustomMetricSegmentFromJSON]
A list of segments for a custom metric used in segmented analysis.
- Returns
- None
Examples
from datarobot.models.deployment import CustomMetric

custom_metric = CustomMetric.get(
    deployment_id="5c939e08962d741e34f609f0",
    custom_metric_id="65f17bdcd2d66683cdfc1113"
)

# for deployment-specific metrics
custom_metric.submit_single_value(value=121)

# for model-specific metrics, pass model_package_id or model_id
custom_metric.submit_single_value(value=121, model_package_id="6421df32525c58cc6f991f25")

# dry run
custom_metric.submit_single_value(value=121, model_package_id="6421df32525c58cc6f991f25", dry_run=True)

# for segmented analysis
segments = [{"name": "custom_seg", "value": "val_1"}]
custom_metric.submit_single_value(value=121, model_package_id="6421df32525c58cc6f991f25", segments=segments)
- Return type
None
- submit_values_from_catalog(dataset_id, model_id=None, model_package_id=None, batch_id=None, segments=None)¶
Submit aggregated custom metrics values from dataset (AI catalog). The names of the columns in the dataset should correspond to the names of the columns that were defined in the custom metric. In addition, the format of the timestamps should also be the same as defined in the metric.
- Parameters
- dataset_id: str
The ID of the source dataset.
- model_id: Optional[str]
For a model metric: the ID of the associated champion/challenger model, used to update the metric values. For a deployment metric: the ID of the model is not needed.
- model_package_id: Optional[str]
For a model metric: the ID of the associated champion/challenger model package, used to update the metric values. For a deployment metric: the ID of the model package is not needed.
- batch_id: Optional[str]
Specifies a batch ID associated with all values provided by this dataset, an alternative to providing batch IDs as a column within a dataset (at the record level).
- segments: Optional[CustomMetricSegmentFromDataset]
A list of segments for a custom metric used in segmented analysis.
- Returns
- None
Examples
from datarobot.models.deployment import CustomMetric

custom_metric = CustomMetric.get(
    deployment_id="5c939e08962d741e34f609f0",
    custom_metric_id="65f17bdcd2d66683cdfc1113"
)

# for deployment-specific metrics
custom_metric.submit_values_from_catalog(dataset_id="61093144cabd630828bca321")

# for model-specific metrics, pass model_package_id or model_id
custom_metric.submit_values_from_catalog(
    dataset_id="61093144cabd630828bca321",
    model_package_id="6421df32525c58cc6f991f25"
)

# for segmented analysis
segments = [{"name": "custom_seg", "column": "column_with_segment_values"}]
custom_metric.submit_values_from_catalog(
    dataset_id="61093144cabd630828bca321",
    model_package_id="6421df32525c58cc6f991f25",
    segments=segments
)
- Return type
None
- get_values_over_time(start, end, model_package_id=None, model_id=None, segment_attribute=None, segment_value=None, bucket_size='P7D')¶
Retrieve values of a single custom metric over a time period.
- Parameters
- start: datetime or str
Start of the time period.
- end: datetime or str
End of the time period.
- model_id: Optional[str]
The ID of the model.
- model_package_id: Optional[str]
The ID of the model package.
- bucket_size: Optional[str]
Time duration of a bucket, in ISO 8601 time duration format.
- segment_attribute: Optional[str]
The name of the segment on which segment analysis is being performed.
- segment_value: Optional[str]
The value of the segment_attribute to segment on.
- Returns
- custom_metric_over_time: CustomMetricValuesOverTime
The queried custom metric values over time information.
Examples
from datarobot.models.deployment import CustomMetric
from datetime import datetime, timedelta

now = datetime.now()
custom_metric = CustomMetric.get(
    deployment_id="5c939e08962d741e34f609f0",
    custom_metric_id="65f17bdcd2d66683cdfc1113"
)
values_over_time = custom_metric.get_values_over_time(start=now - timedelta(days=7), end=now)

values_over_time.bucket_values
>>> {datetime.datetime(2024, 3, 22, 14, 0, tzinfo=tzutc()): 1.0,
>>>  datetime.datetime(2024, 3, 22, 15, 0, tzinfo=tzutc()): 123.0}

values_over_time.bucket_sample_sizes
>>> {datetime.datetime(2024, 3, 22, 14, 0, tzinfo=tzutc()): 1,
>>>  datetime.datetime(2024, 3, 22, 15, 0, tzinfo=tzutc()): 1}

values_over_time.get_buckets_as_dataframe()
>>>                        start                       end  value  sample_size
>>> 0  2024-03-21 16:00:00+00:00 2024-03-21 17:00:00+00:00    NaN          NaN
>>> 1  2024-03-21 17:00:00+00:00 2024-03-21 18:00:00+00:00    NaN          NaN
- Return type
- get_summary(start, end, model_package_id=None, model_id=None, segment_attribute=None, segment_value=None)¶
Retrieve the summary of a custom metric over a time period.
- Parameters
- start: datetime or str
Start of the time period.
- end: datetime or str
End of the time period.
- model_id: Optional[str]
The ID of the model.
- model_package_id: Optional[str]
The ID of the model package.
- segment_attribute: Optional[str]
The name of the segment on which segment analysis is being performed.
- segment_value: Optional[str]
The value of the segment_attribute to segment on.
- Returns
- custom_metric_summary: CustomMetricSummary
The summary of the custom metric.
Examples
from datarobot.models.deployment import CustomMetric
from datetime import datetime, timedelta

now = datetime.now()
custom_metric = CustomMetric.get(
    deployment_id="5c939e08962d741e34f609f0",
    custom_metric_id="65f17bdcd2d66683cdfc1113"
)
summary = custom_metric.get_summary(start=now - timedelta(days=7), end=now)
print(summary)
>> "CustomMetricSummary(2024-03-21 15:52:13.392178+00:00 - 2024-03-22 15:52:13.392168+00:00:
>>  {'id': '65fd9b1c0c1a840bc6751ce0', 'name': 'Test METRIC', 'value': 215.0, 'sample_count': 13,
>>  'baseline_value': 12.0, 'percent_change': 24.02})"
- Return type
- get_values_over_batch(batch_ids=None, model_package_id=None, model_id=None, segment_attribute=None, segment_value=None)¶
Retrieve values of a single custom metric over batches.
- Parameters
- batch_idsOptional[List[str]]
Specify a list of batch IDs to pull the data for.
- model_id: Optional[str]
The ID of the model.
- model_package_id: Optional[str]
The ID of the model package.
- segment_attribute: Optional[str]
The name of the segment on which segment analysis is being performed.
- segment_value: Optional[str]
The value of the segment_attribute to segment on.
- Returns
- custom_metric_over_batch: CustomMetricValuesOverBatch
The queried custom metric values over batch information.
Examples
from datarobot.models.deployment import CustomMetric

custom_metric = CustomMetric.get(
    deployment_id="5c939e08962d741e34f609f0",
    custom_metric_id="65f17bdcd2d66683cdfc1113"
)

# all batch metrics are model-specific
values_over_batch = custom_metric.get_values_over_batch(model_package_id='6421df32525c58cc6f991f25')

values_over_batch.bucket_values
>>> {'6572db2c9f9d4ad3b9de33d0': 35.0, '6572db2c9f9d4ad3b9de44e1': 105.0}

values_over_batch.bucket_sample_sizes
>>> {'6572db2c9f9d4ad3b9de33d0': 6, '6572db2c9f9d4ad3b9de44e1': 8}

values_over_batch.get_buckets_as_dataframe()
>>>                    batch_id                     batch_name  value  sample_size
>>> 0  6572db2c9f9d4ad3b9de33d0  Batch 1 - 03/26/2024 13:04:46   35.0            6
>>> 1  6572db2c9f9d4ad3b9de44e1  Batch 2 - 03/26/2024 13:06:04  105.0            8
- Return type
- get_batch_summary(batch_ids=None, model_package_id=None, model_id=None, segment_attribute=None, segment_value=None)¶
Retrieve the summary of a custom metric over a batch.
- Parameters
- batch_idsOptional[List[str]]
Specify a list of batch IDs to pull the data for.
- model_id: Optional[str]
The ID of the model.
- model_package_id: Optional[str]
The ID of the model package.
- segment_attribute: Optional[str]
The name of the segment on which segment analysis is being performed.
- segment_value: Optional[str]
The value of the segment_attribute to segment on.
- Returns
- custom_metric_summary: CustomMetricBatchSummary
The batch summary of the custom metric.
Examples
from datarobot.models.deployment import CustomMetric

custom_metric = CustomMetric.get(
    deployment_id="5c939e08962d741e34f609f0",
    custom_metric_id="65f17bdcd2d66683cdfc1113"
)

# all batch metrics are model-specific
batch_summary = custom_metric.get_batch_summary(model_package_id='6421df32525c58cc6f991f25')
print(batch_summary)
>> CustomMetricBatchSummary({'id': '6605396413434b3a7b74342c', 'name': 'batch metric', 'value': 41.25,
>>  'sample_count': 28, 'baseline_value': 123.0, 'percent_change': -66.46})
- Return type
- class datarobot.models.deployment.custom_metrics.CustomMetricValuesOverTime(buckets=None, summary=None, metric=None, deployment_id=None, segment_attribute=None, segment_value=None)¶
Custom metric over time information.
New in version v3.4.
- Attributes
- buckets: List[Bucket]
A list of bucketed time periods and the custom metric values aggregated over that period.
- summary: Summary
The summary of values over time retrieval.
- metric: Dict
A custom metric definition.
- deployment_id: str
The ID of the deployment.
- segment_attribute: str
The name of the segment on which segment analysis is being performed.
- segment_value: str
The value of the segment_attribute to segment on.
- classmethod get(deployment_id, custom_metric_id, start, end, model_id=None, model_package_id=None, segment_attribute=None, segment_value=None, bucket_size='P7D')¶
Retrieve values of a single custom metric over a time period.
- Parameters
- custom_metric_id: str
The ID of the custom metric.
- deployment_id: str
The ID of the deployment.
- start: datetime or str
Start of the time period.
- end: datetime or str
End of the time period.
- model_id: Optional[str]
The ID of the model.
- model_package_id: Optional[str]
The ID of the model package.
- bucket_size: Optional[str]
Time duration of a bucket, in ISO 8601 time duration format.
- segment_attribute: Optional[str]
The name of the segment on which segment analysis is being performed.
- segment_value: Optional[str]
The value of the segment_attribute to segment on.
- Returns
- custom_metric_over_time: CustomMetricValuesOverTime
The queried custom metric values over time information.
- Return type
- property bucket_values: Dict[datetime, int]¶
The metric value for all time buckets, keyed by start time of the bucket.
- Returns
- bucket_values: Dict
- Return type
Dict
[datetime
,int
]
- property bucket_sample_sizes: Dict[datetime, int]¶
The sample size for all time buckets, keyed by start time of the bucket.
- Returns
- bucket_sample_sizes: Dict
- Return type
Dict
[datetime
,int
]
- get_buckets_as_dataframe()¶
Retrieves all custom metrics buckets in a pandas DataFrame.
- Returns
- buckets: pd.DataFrame
- Return type
DataFrame
- class datarobot.models.deployment.custom_metrics.CustomMetricSummary(period, metric, deployment_id=None)¶
The summary of a custom metric.
New in version v3.4.
- Attributes
- period: Period
A time period defined by a start and end time.
- metric: Dict
The summary of the custom metric.
- classmethod get(deployment_id, custom_metric_id, start, end, model_id=None, model_package_id=None, segment_attribute=None, segment_value=None)¶
Retrieve the summary of a custom metric over a time period.
- Parameters
- custom_metric_id: str
The ID of the custom metric.
- deployment_id: str
The ID of the deployment.
- start: datetime or str
Start of the time period.
- end: datetime or str
End of the time period.
- model_id: Optional[str]
The ID of the model.
- model_package_id: Optional[str]
The ID of the model package.
- segment_attribute: Optional[str]
The name of the segment on which segment analysis is being performed.
- segment_value: Optional[str]
The value of the segment_attribute to segment on.
- Returns
- custom_metric_summary: CustomMetricSummary
The summary of the custom metric.
- Return type
- class datarobot.models.deployment.custom_metrics.CustomMetricValuesOverBatch(buckets=None, metric=None, deployment_id=None, segment_attribute=None, segment_value=None)¶
Custom metric over batch information.
New in version v3.4.
- Attributes
- buckets: List[BatchBucket]
A list of buckets with custom metric values aggregated over batches.
- metric: Dict
A custom metric definition.
- deployment_id: str
The ID of the deployment.
- segment_attribute: str
The name of the segment on which segment analysis is being performed.
- segment_value: str
The value of the segment_attribute to segment on.
- classmethod get(deployment_id, custom_metric_id, batch_ids=None, model_id=None, model_package_id=None, segment_attribute=None, segment_value=None)¶
Retrieve values of a single custom metric over batches.
- Parameters
- custom_metric_id: str
The ID of the custom metric.
- deployment_id: str
The ID of the deployment.
- batch_idsOptional[List[str]]
Specify a list of batch IDs to pull the data for.
- model_id: Optional[str]
The ID of the model.
- model_package_id: Optional[str]
The ID of the model package.
- segment_attribute: Optional[str]
The name of the segment on which segment analysis is being performed.
- segment_value: Optional[str]
The value of the segment_attribute to segment on.
- Returns
- custom_metric_over_batch: CustomMetricValuesOverBatch
The queried custom metric values over batch information.
- Return type
- property bucket_values: Dict[str, int]¶
The metric value for all batch buckets, keyed by batch ID
- Returns
- bucket_values: Dict
- Return type
Dict
[str
,int
]
- property bucket_sample_sizes: Dict[str, int]¶
The sample size for all batch buckets, keyed by batch ID.
- Returns
- bucket_sample_sizes: Dict
- Return type
Dict
[str
,int
]
- get_buckets_as_dataframe()¶
Retrieves all custom metrics buckets in a pandas DataFrame.
- Returns
- buckets: pd.DataFrame
- Return type
DataFrame
- class datarobot.models.deployment.custom_metrics.CustomMetricBatchSummary(metric, deployment_id=None)¶
The batch summary of a custom metric.
New in version v3.4.
- Attributes
- metric: Dict
The summary of the batch custom metric.
- classmethod get(deployment_id, custom_metric_id, batch_ids=None, model_id=None, model_package_id=None, segment_attribute=None, segment_value=None)¶
Retrieve the summary of a custom metric over a batch.
- Parameters
- custom_metric_id: str
The ID of the custom metric.
- deployment_id: str
The ID of the deployment.
- batch_idsOptional[List[str]]
Specify a list of batch IDs to pull the data for.
- model_id: Optional[str]
The ID of the model.
- model_package_id: Optional[str]
The ID of the model package.
- segment_attribute: Optional[str]
The name of the segment on which segment analysis is being performed.
- segment_value: Optional[str]
The value of the segment_attribute to segment on.
- Returns
- custom_metric_summary: CustomMetricBatchSummary
The batch summary of the custom metric.
- Return type
- class datarobot.models.deployment.custom_metrics.HostedCustomMetricTemplate(id, name, description, custom_metric_metadata, default_environment, items, template_metric_type)¶
Template for hosted custom metric.
- classmethod list(search=None, order_by=None, metric_type=None, offset=None, limit=None)¶
List all hosted custom metric templates.
- Parameters
- search: Optional[str]
Search string.
- order_by: Optional[ListHostedCustomMetricTemplatesSortQueryParams]
Ordering field.
- metric_type: Optional[HostedCustomMetricsTemplateMetricTypeQueryParams]
Type of the metric.
- offset: Optional[int]
Offset for pagination.
- limit: Optional[int]
Limit for pagination.
- Returns
- templates: List[HostedCustomMetricTemplate]
- Return type
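A short usage sketch of list; the search string and limit are placeholders.
from datarobot.models.deployment.custom_metrics import HostedCustomMetricTemplate

templates = HostedCustomMetricTemplate.list(search="drift", limit=10)
for template in templates:
    print(template.id, template.name)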
- classmethod get(template_id)¶
Get a hosted custom metric template by ID.
- Parameters
- template_id: str
ID of the template.
- Returns
- templateHostedCustomMetricTemplate
- Return type
- class datarobot.models.deployment.custom_metrics.HostedCustomMetric(id, deployment, units, type, is_model_specific, directionality, time_step, created_at, created_by, name, custom_job_id, description=None, schedule=None, baseline_values=None, timestamp=None, value=None, sample_count=None, batch=None, parameter_overrides=None)¶
Hosted custom metric.
- classmethod list(job_id, skip=None, limit=None)¶
List all hosted custom metrics for a job.
- Parameters
- job_id: str
ID of the job.
- Returns
- metrics: List[HostedCustomMetric]
- Return type
List
[HostedCustomMetric
]
- classmethod create_from_template(template_id, deployment_id, job_name, custom_metric_name, job_description=None, custom_metric_description=None, sidecar_deployment_id=None, baseline_value=None, timestamp=None, value=None, sample_count=None, batch=None, schedule=None, parameter_overrides=None)¶
Create a hosted custom metric from a template. This is a shortcut for two calls: Job.from_custom_metric_template(template_id) and HostedCustomMetrics.create_from_custom_job().
- Parameters
- template_id: str
ID of the template.
- deployment_id: str
ID of the deployment.
- job_name: str
Name of the job.
- custom_metric_name: str
Name of the metric.
- job_description: Optional[str]
Description of the job.
- custom_metric_description: Optional[str]
Description of the metric.
- sidecar_deployment_id: Optional[str]
ID of the sidecar deployment.
- baseline_value: Optional[float]
Baseline value.
- timestamp: Optional[MetricTimestampSpoofing]
Timestamp details.
- value: Optional[ValueField]
Value details.
- sample_count: Optional[SampleCountField]
Sample count details.
- batch: Optional[BatchField]
Batch details.
- schedule: Optional[Schedule]
Schedule details.
- parameter_overrides: Optional[List[RuntimeParameterValue]]
Parameter overrides.
- Returns
- metric: HostedCustomMetric
- Return type
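A minimal sketch of calling create_from_template to create the custom job and the deployment metric in one call; all IDs and names below are placeholders.
from datarobot.models.deployment.custom_metrics import HostedCustomMetric

metric = HostedCustomMetric.create_from_template(
    template_id="65f17bdcd2d66683cdfc1113",    # placeholder template ID
    deployment_id="5c939e08962d741e34f609f0",  # placeholder deployment ID
    job_name="Hosted custom metric job",
    custom_metric_name="Hosted custom metric",
)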
- classmethod create_from_custom_job(custom_job_id, deployment_id, name, description=None, baseline_value=None, timestamp=None, value=None, sample_count=None, batch=None, schedule=None, parameter_overrides=None)¶
Create a hosted custom metric from existing custom job.
- Parameters
- custom_job_id: str
ID of the custom job.
- deployment_id: str
ID of the deployment.
- name: str
Name of the metric.
- description: Optional[str]
Description of the metric.
- baseline_value: Optional[float]
Baseline value.
- timestamp: Optional[MetricTimestampSpoofing]
Timestamp details.
- value: Optional[ValueField]
Value details.
- sample_count: Optional[SampleCountField]
Sample count details.
- batch: Optional[BatchField]
Batch details.
- schedule: Optional[Schedule]
Schedule details.
- parameter_overrides: Optional[List[RuntimeParameterValue]]
Parameter overrides.
- Returns
- metric: HostedCustomMetric
- Return type
- update(name=None, description=None, units=None, directionality=None, aggregation_type=None, baseline_value=None, timestamp=None, value=None, sample_count=None, batch=None, schedule=None, parameter_overrides=None)¶
Update the hosted custom metric.
- Parameters
- name: Optional[str]
Name of the metric.
- description: Optional[str]
Description of the metric.
- units: Optional[str]
Units of the metric.
- directionality: Optional[str]
Directionality of the metric.
- aggregation_type: Optional[CustomMetricAggregationType]
Aggregation type of the metric.
- baseline_value: Optional[float]
Baseline values.
- timestamp: Optional[MetricTimestampSpoofing]
Timestamp details.
- value: Optional[ValueField]
Value details.
- sample_count: Optional[SampleCountField]
Sample count details.
- batch: Optional[BatchField]
Batch details.
- schedule: Optional[Schedule]
Schedule details.
- parameter_overrides: Optional[List[RuntimeParameterValue]]
Parameter overrides.
- Returns
- metric: HostedCustomMetric
- Return type
- delete()¶
Delete the hosted custom metric.
- Return type
None
- class datarobot.models.deployment.custom_metrics.DeploymentDetails(id, name, creator_first_name=None, creator_last_name=None, creator_username=None, creator_gravatar_hash=None, created_at=None)¶
Information about a hosted custom metric deployment.
- class datarobot.models.deployment.custom_metrics.MetricBaselineValue(value)¶
The baseline values for a custom metric.
- class datarobot.models.deployment.custom_metrics.SampleCountField(column_name)¶
A weight column used with columnar datasets if pre-aggregated metric values are provided.
- class datarobot.models.deployment.custom_metrics.ValueField(column_name)¶
A custom metric value source for when reading values from a columnar dataset like a file.
- class datarobot.models.deployment.custom_metrics.MetricTimestampSpoofing(column_name, time_format=None)¶
Custom metric timestamp spoofing. Occurs when reading values from a file, like a dataset. By default, replicates pd.to_datetime formatting behavior.
- class datarobot.models.deployment.custom_metrics.BatchField(column_name)¶
A custom metric batch ID source for when reading values from a columnar dataset like a file.
- class datarobot.models.deployment.custom_metrics.HostedCustomMetricBlueprint(id, directionality, units, type, time_step, is_model_specific, custom_job_id, created_at, updated_at, created_by, updated_by)¶
Hosted custom metric blueprints provide an option to share custom metric settings between multiple custom metrics that use the same custom job. When a custom job of the hosted custom metric type is connected to a deployment, all of the custom metric parameters from the blueprint are copied automatically.
- classmethod get(custom_job_id)¶
Get a hosted custom metric blueprint.
- Parameters
- custom_job_id: str
ID of the custom job.
- Returns
- blueprint: HostedCustomMetricBlueprint
- Return type
- classmethod create(custom_job_id, directionality, units, type, time_step, is_model_specific)¶
Create a hosted custom metric blueprint.
- Parameters
- custom_job_id: str
ID of the custom job.
- directionality: str
Directionality of the metric.
- units: str
Units of the metric.
- type: str
Type of the metric.
- time_step: str
Time step of the metric.
- is_model_specific: bool
Whether the metric is model specific.
- Returns
- blueprint: HostedCustomMetricBlueprint
- Return type
- update(directionality=None, units=None, type=None, time_step=None, is_model_specific=None)¶
Update a hosted custom metric blueprint.
- Parameters
- directionality: Optional[str]
Directionality of the metric.
- units: Optional[str]
Units of the metric.
- type: Optional[str]
Type of the metric.
- time_step: Optional[str]
Time step of the metric.
- is_model_specific: Optional[bool]
Determines whether the metric is model specific.
- Returns
- updated_blueprint: HostedCustomMetricBlueprint
- Return type
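Example: a minimal sketch of creating and updating a blueprint for a custom job; my_token, endpoint, the custom job ID, and the directionality/units/type/time_step values are illustrative placeholders.
import datarobot as dr
from datarobot.models.deployment.custom_metrics import HostedCustomMetricBlueprint

dr.Client(token=my_token, endpoint=endpoint)

blueprint = HostedCustomMetricBlueprint.create(
    custom_job_id="65f0000000000000000000cc",
    directionality="higherIsBetter",  # placeholder; use a value your DataRobot instance accepts
    units="rows",
    type="gauge",  # placeholder metric type
    time_step="hour",
    is_model_specific=False,
)

# Retrieve the blueprint attached to the custom job and adjust its settings.
blueprint = HostedCustomMetricBlueprint.get(custom_job_id="65f0000000000000000000cc")
updated_blueprint = blueprint.update(units="records")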
Registry Jobs¶
- class datarobot.models.registry.job.Job(id, name, created_at, items, description=None, environment_id=None, environment_version_id=None, entry_point=None, runtime_parameters=None)¶
A DataRobot job.
New in version v3.4.
- Attributes
- id: str
The ID of the job.
- name: str
The name of the job.
- created_at: str
ISO-8601 formatted timestamp of when the job was created.
- items: List[JobFileItem]
A list of file items attached to the job.
- description: str, optional
A job description.
- environment_id: str, optional
The ID of the environment to use with the job.
- environment_version_id: str, optional
The ID of the environment version to use with the job.
- classmethod create(name, environment_id=None, environment_version_id=None, folder_path=None, files=None, file_data=None, runtime_parameter_values=None)¶
Create a job.
New in version v3.4.
- Parameters
- name: str
The name of the job.
- environment_id: Optional[str]
The environment ID to use for job runs. The ID must be specified in order to run the job.
- environment_version_id: Optional[str]
The environment version ID to use for job runs. If not specified, the latest version of the execution environment will be used.
- folder_path: Optional[str]
The path to a folder containing files to be uploaded. Each file in the folder is uploaded under a path relative to the folder path.
- files: Optional[Union[List[Tuple[str, str]], List[str]]]
The files to be uploaded to the job. The files can be defined in two ways: 1. A list of tuples where the first element is the local path of the file to be uploaded and the second element is the file path in the job file system. 2. A list of local paths of the files to be uploaded. In this case, the files are added to the root of the job file system.
- file_data: Optional[Dict[str, str]]
The file contents to be uploaded to the job, defined as a dictionary where keys are the file paths in the job file system and values are the file contents.
- runtime_parameter_values: Optional[List[RuntimeParameterValue]]
Additional parameters to be injected into a model at runtime. The fieldName must match a fieldName that is listed in the runtimeParameterDefinitions section of the model-metadata.yaml file.
- Returns
- Job
created job
- Raises
- datarobot.errors.ClientError
if the server responded with 4xx status
- datarobot.errors.ServerError
if the server responded with 5xx status
- Return type
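Example: a minimal sketch of creating a job from in-memory file content; my_token, endpoint, and the environment ID are placeholders.
import datarobot as dr
from datarobot.models.registry.job import Job

dr.Client(token=my_token, endpoint=endpoint)

job = Job.create(
    name="Hello job",
    environment_id="65f0000000000000000000dd",  # required in order to run the job
    file_data={"run.sh": "echo 'hello from a DataRobot job'"},
)
print(job.id, job.name)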
- classmethod list()¶
List jobs.
New in version v3.4.
- Returns
- List[Job]
a list of jobs
- Raises
- datarobot.errors.ClientError
if the server responded with 4xx status
- datarobot.errors.ServerError
if the server responded with 5xx status
- Return type
List[Job]
- classmethod get(job_id)¶
Get job by id.
New in version v3.4.
- Parameters
- job_id: str
The ID of the job.
- Returns
- Job
retrieved job
- Raises
- datarobot.errors.ClientError
if the server responded with 4xx status.
- datarobot.errors.ServerError
if the server responded with 5xx status.
- Return type
- update(name=None, entry_point=None, environment_id=None, environment_version_id=None, description=None, folder_path=None, files=None, file_data=None, runtime_parameter_values=None)¶
Update job properties.
New in version v3.4.
- Parameters
- name: str
The job name.
- entry_point: Optional[str]
The job file item ID to use as an entry point of the job.
- environment_id: Optional[str]
The environment ID to use for job runs. Must be specified in order to run the job.
- environment_version_id: Optional[str]
The environment version ID to use for job runs. If not specified, the latest version of the execution environment will be used.
- description: str
The job description.
- folder_path: Optional[str]
The path to a folder containing files to be uploaded. Each file in the folder is uploaded under a path relative to the folder path.
- files: Optional[Union[List[Tuple[str, str]], List[str]]]
The files to be uploaded to the job. The files can be defined in two ways: 1. A list of tuples where the first element is the local path of the file to be uploaded and the second element is the file path in the job file system. 2. A list of local paths of the files to be uploaded. In this case, the files are added to the root of the job file system.
- file_data: Optional[Dict[str, str]]
The file contents to be uploaded to the job, defined as a dictionary where keys are the file paths in the job file system and values are the file contents.
- runtime_parameter_values: Optional[List[RuntimeParameterValue]]
Additional parameters to be injected into a model at runtime. The fieldName must match a fieldName that is listed in the runtimeParameterDefinitions section of the model-metadata.yaml file.
- Raises
- datarobot.errors.ClientError
if the server responded with 4xx status.
- datarobot.errors.ServerError
if the server responded with 5xx status.
- Return type
None
- delete()¶
Delete job.
New in version v3.4.
- Raises
- datarobot.errors.ClientError
If the server responded with 4xx status.
- datarobot.errors.ServerError
If the server responded with 5xx status.
- Return type
None
- refresh()¶
Update job with the latest data from server.
New in version v3.4.
- Raises
- datarobot.errors.ClientError
if the server responded with 4xx status
- datarobot.errors.ServerError
if the server responded with 5xx status
- Return type
None
- classmethod create_from_custom_metric_gallery_template(template_id, name, description=None, sidecar_deployment_id=None)¶
Create a job from a custom metric gallery template.
- Parameters
- template_id: str
ID of the template.
- name: str
Name of the job.
- description: Optional[str]
Description of the job.
- sidecar_deployment_id: Optional[str]
ID of the sidecar deployment. Only relevant for templates that use sidecar deployments.
- Returns
- Job
retrieved job
- Raises
- datarobot.errors.ClientError
if the server responded with 4xx status.
- datarobot.errors.ServerError
if the server responded with 5xx status.
- Return type
- list_schedules()¶
List schedules for the job.
- Returns
- List[JobSchedule]
a list of schedules for the job.
- Return type
List[JobSchedule]
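Example: a minimal sketch of creating a job from a custom metric gallery template and inspecting its schedules; my_token, endpoint, and the template ID are placeholders.
import datarobot as dr
from datarobot.models.registry.job import Job

dr.Client(token=my_token, endpoint=endpoint)

job = Job.create_from_custom_metric_gallery_template(
    template_id="65f0000000000000000000aa",
    name="Custom metric job from template",
)
for schedule in job.list_schedules():
    print(schedule.id, schedule.scheduled_job_id)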
- class datarobot.models.registry.job.JobFileItem(id, file_name, file_path, file_source, created_at)¶
A file item attached to a DataRobot job.
New in version v3.4.
- Attributes
- id: str
The ID of the file item.
- file_name: str
The name of the file item.
- file_path: str
The path of the file item.
- file_source: str
The source of the file item.
- created_at: str
ISO-8601 formatted timestamp of when the version was created.
- class datarobot.models.registry.job_run.JobRun(id, custom_job_id, created_at, items, status, duration, description=None, runtime_parameters=None)¶
A DataRobot job run.
New in version v3.4.
- Attributes
- id: str
The ID of the job run.
- custom_job_id: str
The ID of the parent job.
- description: str
A description of the job run.
- created_at: str
ISO-8601 formatted timestamp of when the job run was created.
- items: List[JobFileItem]
A list of file items attached to the job.
- status: JobRunStatus
The status of the job run.
- duration: float
The duration of the job run.
- classmethod create(job_id, max_wait=600, runtime_parameter_values=None)¶
Create a job run.
New in version v3.4.
- Parameters
- job_id: str
The ID of the job.
- max_wait: int, optional
max time to wait for a terminal status (“succeeded”, “failed”, “interrupted”, “canceled”). If set to None - method will return without waiting.
- runtime_parameter_values: Optional[List[RuntimeParameterValue]]
Additional parameters to be injected into a model at runtime. The fieldName must match a fieldName that is listed in the runtimeParameterDefinitions section of the model-metadata.yaml file.
- Returns
- JobRun
The created job run.
- Raises
- datarobot.errors.ClientError
if the server responded with 4xx status
- datarobot.errors.ServerError
if the server responded with 5xx status
- ValueError
if execution environment or entry point is not specified for the job
- Return type
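Example: a minimal sketch of running a job and reading its log; my_token, endpoint, and the job ID are placeholders, and the job is assumed to already have an environment and entry point configured.
import datarobot as dr
from datarobot.models.registry.job_run import JobRun

dr.Client(token=my_token, endpoint=endpoint)

# Blocks for up to max_wait seconds until the run reaches a terminal status.
run = JobRun.create(job_id="65f0000000000000000000ee", max_wait=600)
print(run.status, run.duration)

# The log may be None if the run produced no output.
print(run.get_logs())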
- classmethod list(job_id)¶
List job runs.
New in version v3.4.
- Parameters
- job_id: str
The ID of the job.
- Returns
- List[JobRun]
A list of job runs.
- Raises
- datarobot.errors.ClientError
if the server responded with 4xx status
- datarobot.errors.ServerError
if the server responded with 5xx status
- Return type
List[JobRun]
- classmethod get(job_id, job_run_id)¶
Get job run by id.
New in version v3.4.
- Parameters
- job_id: str
The ID of the job.
- job_run_id: str
The ID of the job run.
- Returns
- JobRun
The retrieved job run.
- Raises
- datarobot.errors.ClientError
if the server responded with 4xx status.
- datarobot.errors.ServerError
if the server responded with 5xx status.
- Return type
- update(description=None)¶
Update job run properties.
New in version v3.4.
- Parameters
- description: str
new job run description
- Raises
- datarobot.errors.ClientError
if the server responded with 4xx status.
- datarobot.errors.ServerError
if the server responded with 5xx status.
- Return type
None
- cancel()¶
Cancel job run.
New in version v3.4.
- Raises
- datarobot.errors.ClientError
If the server responded with 4xx status.
- datarobot.errors.ServerError
If the server responded with 5xx status.
- Return type
None
- refresh()¶
Update job run with the latest data from server.
New in version v3.4.
- Raises
- datarobot.errors.ClientError
if the server responded with 4xx status
- datarobot.errors.ServerError
if the server responded with 5xx status
- Return type
None
- get_logs()¶
Get log of the job run.
New in version v3.4.
- Raises
- datarobot.errors.ClientError
if the server responded with 4xx status
- datarobot.errors.ServerError
if the server responded with 5xx status
- Return type
Optional[str]
- delete_logs()¶
Delete the log of the job run.
New in version v3.4.
- Raises
- datarobot.errors.ClientError
if the server responded with 4xx status
- datarobot.errors.ServerError
if the server responded with 5xx status
- Return type
None
- class datarobot.models.registry.job_run.JobRunStatus(value)¶
Enum of the job run statuses
- class datarobot.models.registry.job.JobSchedule(id, custom_job_id, updated_at, updated_by, created_at, created_by, scheduled_job_id, schedule=None, deployment=None, parameter_overrides=None)¶
A job schedule.
New in version v3.5.
- Attributes
- id: str
The ID of the job schedule.
- custom_job_id: str
The ID of the custom job.
- updated_at: str
ISO-8601 formatted timestamp of when the schedule was updated.
- updated_by: Dict[str, Any]
The user who updated the schedule.
- created_at: str
ISO-8601 formatted timestamp of when the schedule was created.
- created_by: Dict[str, Any]
The user who created the schedule.
- scheduled_job_id: str
The ID of the scheduled job.
- deployment: Dict[str, Any]
The deployment of the scheduled job.
- schedule: Schedule
The schedule of the job.
- parameter_overrides: List[RuntimeParameterValue]
The parameter overrides for this schedule.
- update(schedule=None, parameter_overrides=None)¶
Update the job schedule.
- Parameters
- schedule: Optional[Schedule]
The schedule of the job.
- parameter_overrides: Optional[List[RuntimeParameterValue]]
The parameter overrides for this schedule.
- Return type
- delete()¶
Delete the job schedule.
- Return type
None
- classmethod create(custom_job_id, schedule, parameter_overrides=None)¶
Create a job schedule.
- Parameters
- custom_job_id: str
The ID of the custom job.
- schedule: Schedule
The schedule of the job.
- parameter_overrides: Optional[List[RuntimeParameterValue]]
The parameter overrides for this schedule.
- Return type
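Example: a minimal sketch of scheduling a custom job; my_token, endpoint, and the custom job ID are placeholders, and schedule is assumed to be an already-constructed Schedule object.
import datarobot as dr
from datarobot.models.registry.job import JobSchedule

dr.Client(token=my_token, endpoint=endpoint)

job_schedule = JobSchedule.create(
    custom_job_id="65f0000000000000000000ee",
    schedule=schedule,  # a datarobot Schedule object built elsewhere
)

# Adjust or remove the schedule later.
job_schedule.update(schedule=schedule)
job_schedule.delete()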
Custom Models¶
- class datarobot.models.custom_model_version.CustomModelFileItem(id, file_name, file_path, file_source, created_at=None)¶
A file item attached to a DataRobot custom model version.
New in version v2.21.
- Attributes
- id: str
The ID of the file item.
- file_name: str
The name of the file item.
- file_path: str
The path of the file item.
- file_source: str
The source of the file item.
- created_at: str, optional
ISO-8601 formatted timestamp of when the version was created.
- class datarobot.CustomInferenceModel(**kwargs)¶
A custom inference model.
New in version v2.21.
- Attributes
- id: str
The ID of the custom model.
- name: str
The name of the custom model.
- language: str
The programming language of the custom inference model. Can be “python”, “r”, “java” or “other”.
- description: str
The description of the custom inference model.
- target_type: datarobot.TARGET_TYPE
Target type of the custom inference model. Values: [datarobot.TARGET_TYPE.BINARY, datarobot.TARGET_TYPE.REGRESSION, datarobot.TARGET_TYPE.MULTICLASS, datarobot.TARGET_TYPE.UNSTRUCTURED, datarobot.TARGET_TYPE.ANOMALY, datarobot.TARGET_TYPE.TEXT_GENERATION]
- target_name: str, optional
Target feature name. It is optional (and ignored if provided) for the datarobot.TARGET_TYPE.UNSTRUCTURED and datarobot.TARGET_TYPE.ANOMALY target types.
- latest_version: datarobot.CustomModelVersion or None
The latest version of the custom model if the model has a latest version.
- deployments_count: int
The number of deployments of the custom model.
- positive_class_label: str
For binary classification projects, a label of a positive class.
- negative_class_label: str
For binary classification projects, a label of a negative class.
- prediction_threshold: float
For binary classification projects, a threshold used for predictions.
- training_data_assignment_in_progress: bool
Flag describing if training data assignment is in progress.
- training_dataset_id: str, optional
The ID of a dataset assigned to the custom model.
- training_dataset_version_id: str, optional
The ID of a dataset version assigned to the custom model.
- training_data_file_name: str, optional
The name of assigned training data file.
- training_data_partition_column: str, optional
The name of a partition column in a training dataset assigned to the custom model.
- created_by: str
The username of a user who created the custom model.
- updated_at: str
ISO-8601 formatted timestamp of when the custom model was updated
- created_at: str
ISO-8601 formatted timestamp of when the custom model was created
- network_egress_policy: datarobot.NETWORK_EGRESS_POLICY, optional
Determines whether the given custom model is isolated, or can access the public network. Values: [datarobot.NETWORK_EGRESS_POLICY.NONE, datarobot.NETWORK_EGRESS_POLICY.PUBLIC].
- maximum_memory: int, optional
The maximum memory that might be allocated by the custom-model. If exceeded, the custom-model will be killed by k8s.
- replicas: int, optional
A fixed number of replicas that will be deployed in the cluster
- is_training_data_for_versions_permanently_enabled: bool, optional
Whether training data assignment on the version level is permanently enabled for the model.
- classmethod list(is_deployed=None, search_for=None, order_by=None)¶
List custom inference models available to the user.
New in version v2.21.
- Parameters
- is_deployed: bool, optional
A flag for filtering custom inference models. If set to True, only deployed custom inference models are returned. If set to False, only custom inference models that are not deployed are returned.
- search_for: str, optional
String for filtering custom inference models - only custom inference models that contain the string in name or description will be returned. If not specified, all custom models will be returned
- order_by: str, optional
Property to sort custom inference models by. Supported properties are “created” and “updated”. Prefix the attribute name with a dash to sort in descending order, e.g. order_by=’-created’. By default, the order_by parameter is None which will result in custom models being returned in order of creation time descending.
- Returns
- List[CustomInferenceModel]
A list of custom inference models.
- Raises
- datarobot.errors.ClientError
If the server responded with 4xx status
- datarobot.errors.ServerError
If the server responded with 5xx status
- Return type
List[CustomInferenceModel]
- classmethod get(custom_model_id)¶
Get custom inference model by id.
New in version v2.21.
- Parameters
- custom_model_id: str
The ID of the custom inference model.
- Returns
- CustomInferenceModel
Retrieved custom inference model.
- Raises
- datarobot.errors.ClientError
If the server responded with 4xx status.
- datarobot.errors.ServerError
If the server responded with 5xx status.
- Return type
- download_latest_version(file_path)¶
Download the latest custom inference model version.
New in version v2.21.
- Parameters
- file_path: str
Path to create a file with custom model version content.
- Raises
- datarobot.errors.ClientError
If the server responded with 4xx status.
- datarobot.errors.ServerError
If the server responded with 5xx status.
- Return type
None
- classmethod create(name, target_type, target_name=None, language=None, description=None, positive_class_label=None, negative_class_label=None, prediction_threshold=None, class_labels=None, class_labels_file=None, network_egress_policy=None, maximum_memory=None, replicas=None, is_training_data_for_versions_permanently_enabled=None)¶
Create a custom inference model.
New in version v2.21.
- Parameters
- name: str
Name of the custom inference model.
- target_type: datarobot.TARGET_TYPE
Target type of the custom inference model. Values: [datarobot.TARGET_TYPE.BINARY, datarobot.TARGET_TYPE.REGRESSION, datarobot.TARGET_TYPE.MULTICLASS, datarobot.TARGET_TYPE.UNSTRUCTURED, datarobot.TARGET_TYPE.TEXT_GENERATION]
- target_name: str, optional
Target feature name. It is optional (and ignored if provided) for the datarobot.TARGET_TYPE.UNSTRUCTURED target type.
- language: str, optional
Programming language of the custom learning model.
- description: str, optional
Description of the custom learning model.
- positive_class_label: str, optional
Custom inference model positive class label for binary classification.
- negative_class_label: str, optional
Custom inference model negative class label for binary classification.
- prediction_threshold: float, optional
Custom inference model prediction threshold.
- class_labels: List[str], optional
Custom inference model class labels for multiclass classification. Cannot be used with class_labels_file.
- class_labels_file: str, optional
Path to file containing newline separated class labels for multiclass classification. Cannot be used with class_labels.
- network_egress_policy: datarobot.NETWORK_EGRESS_POLICY, optional
Determines whether the given custom model is isolated, or can access the public network. Values: [datarobot.NETWORK_EGRESS_POLICY.NONE, datarobot.NETWORK_EGRESS_POLICY.PUBLIC]
- maximum_memory: int, optional
The maximum memory that might be allocated by the custom-model. If exceeded, the custom-model will be killed by k8s.
- replicas: int, optional
A fixed number of replicas that will be deployed in the cluster.
- is_training_data_for_versions_permanently_enabled: bool, optional
Permanently enable training data assignment on the version level for the current model, instead of training data assignment on the model level.
- Returns
- CustomInferenceModel
The created custom inference model.
- Raises
- datarobot.errors.ClientError
If the server responded with 4xx status.
- datarobot.errors.ServerError
If the server responded with 5xx status.
- Return type
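Example: a minimal sketch of creating a regression custom inference model; my_token, endpoint, and the target name are placeholders.
import datarobot as dr

dr.Client(token=my_token, endpoint=endpoint)

model = dr.CustomInferenceModel.create(
    name="My regression model",
    target_type=dr.TARGET_TYPE.REGRESSION,
    target_name="sale_price",
    language="python",
)
print(model.id)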
- classmethod copy_custom_model(custom_model_id)¶
Create a custom inference model by copying existing one.
New in version v2.21.
- Parameters
- custom_model_id: str
The ID of the custom inference model to copy.
- Returns
- CustomInferenceModel
The created custom inference model.
- Raises
- datarobot.errors.ClientError
If the server responded with 4xx status.
- datarobot.errors.ServerError
If the server responded with 5xx status.
- Return type
- update(name=None, language=None, description=None, target_name=None, positive_class_label=None, negative_class_label=None, prediction_threshold=None, class_labels=None, class_labels_file=None, is_training_data_for_versions_permanently_enabled=None)¶
Update custom inference model properties.
New in version v2.21.
- Parameters
- name: str, optional
New custom inference model name.
- language: str, optional
New custom inference model programming language.
- description: str, optional
New custom inference model description.
- target_name: str, optional
New custom inference model target name.
- positive_class_label: str, optional
New custom inference model positive class label.
- negative_class_label: str, optional
New custom inference model negative class label.
- prediction_threshold: float, optional
New custom inference model prediction threshold.
- class_labels: List[str], optional
Custom inference model class labels for multiclass classification. Cannot be used with class_labels_file.
- class_labels_file: str, optional
Path to file containing newline separated class labels for multiclass classification. Cannot be used with class_labels
- is_training_data_for_versions_permanently_enabled: bool, optional
Permanently enable training data assignment on the version level for the current model, instead of training data assignment on the model level.
- Raises
- datarobot.errors.ClientError
If the server responded with 4xx status.
- datarobot.errors.ServerError
If the server responded with 5xx status.
- Return type
None
- refresh()¶
Update custom inference model with the latest data from server.
New in version v2.21.
- Raises
- datarobot.errors.ClientError
If the server responded with 4xx status.
- datarobot.errors.ServerError
If the server responded with 5xx status.
- Return type
None
- delete()¶
Delete custom inference model.
New in version v2.21.
- Raises
- datarobot.errors.ClientError
If the server responded with 4xx status.
- datarobot.errors.ServerError
If the server responded with 5xx status.
- Return type
None
- assign_training_data(dataset_id, partition_column=None, max_wait=600)¶
Assign training data to the custom inference model.
New in version v2.21.
- Parameters
- dataset_id: str
The ID of the training dataset to be assigned.
- partition_column: str, optional
The name of a partition column in the training dataset.
- max_wait: int, optional
The max time to wait for a training data assignment. If set to None, then method will return without waiting. Defaults to 10 min.
- Raises
- datarobot.errors.ClientError
If the server responded with 4xx status
- datarobot.errors.ServerError
If the server responded with 5xx status
- Return type
None
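Example: a minimal sketch of assigning training data at the model level; my_token, endpoint, and the model and dataset IDs are placeholders.
import datarobot as dr

dr.Client(token=my_token, endpoint=endpoint)

model = dr.CustomInferenceModel.get("65f0000000000000000000f1")
# Blocks for up to 10 minutes by default while the assignment completes.
model.assign_training_data(dataset_id="65f0000000000000000000f2")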
- class datarobot.CustomModelTest(**kwargs)¶
A custom model test.
New in version v2.21.
- Attributes
- id: str
test id
- custom_model_image_id: str
id of a custom model image
- image_type: str
the type of the image, either CUSTOM_MODEL_IMAGE_TYPE.CUSTOM_MODEL_IMAGE if the testing attempt is using a CustomModelImage as its model or CUSTOM_MODEL_IMAGE_TYPE.CUSTOM_MODEL_VERSION if the testing attempt is using a CustomModelVersion with dependency management
- overall_status: str
A string representing the testing status. The status can be: ‘not_tested’ (the check was not run), ‘failed’ (the check failed), ‘succeeded’ (the check succeeded), ‘warning’ (the check resulted in a warning or a non-critical failure), or ‘in_progress’ (the check is in progress).
- detailed_status: dict
detailed testing status - maps the testing types to their status and message. The keys of the dict are one of ‘errorCheck’, ‘nullValueImputation’, ‘longRunningService’, ‘sideEffects’. The values are dict with ‘message’ and ‘status’ keys.
- created_by: str
a user who created a test
- dataset_id: str, optional
id of a dataset used for testing
- dataset_version_id: str, optional
id of a dataset version used for testing
- completed_at: str, optional
ISO-8601 formatted timestamp of when the test has completed
- created_at: str, optional
ISO-8601 formatted timestamp of when the version was created
- network_egress_policy: datarobot.NETWORK_EGRESS_POLICY, optional
Determines whether the given custom model is isolated, or can access the public network. Values: [datarobot.NETWORK_EGRESS_POLICY.NONE, datarobot.NETWORK_EGRESS_POLICY.PUBLIC].
- maximum_memory: int, optional
The maximum memory that might be allocated by the custom-model. If exceeded, the custom-model will be killed by k8s
- replicas: int, optional
A fixed number of replicas that will be deployed in the cluster
- classmethod create(custom_model_id, custom_model_version_id, dataset_id=None, max_wait=600, network_egress_policy=None, maximum_memory=None, replicas=None)¶
Create and start a custom model test.
New in version v2.21.
- Parameters
- custom_model_id: str
the id of the custom model
- custom_model_version_id: str
the id of the custom model version
- dataset_id: str, optional
The id of the testing dataset for non-unstructured custom models. Ignored and not required for unstructured models.
- max_wait: int, optional
max time to wait for a test completion. If set to None - method will return without waiting.
- network_egress_policy: datarobot.NETWORK_EGRESS_POLICY, optional
Determines whether the given custom model is isolated, or can access the public network. Values: [datarobot.NETWORK_EGRESS_POLICY.NONE, datarobot.NETWORK_EGRESS_POLICY.PUBLIC].
- maximum_memory: int, optional
The maximum memory that might be allocated by the custom-model. If exceeded, the custom-model will be killed by k8s
- replicas: int, optional
A fixed number of replicas that will be deployed in the cluster
- Returns
- CustomModelTest
created custom model test
- Raises
- datarobot.errors.ClientError
if the server responded with 4xx status
- datarobot.errors.ServerError
if the server responded with 5xx status
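Example: a minimal sketch of testing a custom model version; my_token, endpoint, and all IDs are placeholders.
import datarobot as dr

dr.Client(token=my_token, endpoint=endpoint)

test = dr.CustomModelTest.create(
    custom_model_id="65f0000000000000000000f1",
    custom_model_version_id="65f0000000000000000000f3",
    dataset_id="65f0000000000000000000f2",  # not required for unstructured models
)
print(test.overall_status)
for check, details in test.detailed_status.items():
    print(check, details["status"], details["message"])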
- classmethod list(custom_model_id)¶
List custom model tests.
New in version v2.21.
- Parameters
- custom_model_id: str
the id of the custom model
- Returns
- List[CustomModelTest]
a list of custom model tests
- Raises
- datarobot.errors.ClientError
if the server responded with 4xx status
- datarobot.errors.ServerError
if the server responded with 5xx status
- classmethod get(custom_model_test_id)¶
Get custom model test by id.
New in version v2.21.
- Parameters
- custom_model_test_id: str
the id of the custom model test
- Returns
- CustomModelTest
retrieved custom model test
- Raises
- datarobot.errors.ClientError
if the server responded with 4xx status.
- datarobot.errors.ServerError
if the server responded with 5xx status.
- get_log()¶
Get log of a custom model test.
New in version v2.21.
- Raises
- datarobot.errors.ClientError
if the server responded with 4xx status
- datarobot.errors.ServerError
if the server responded with 5xx status
- get_log_tail()¶
Get log tail of a custom model test.
New in version v2.21.
- Raises
- datarobot.errors.ClientError
if the server responded with 4xx status
- datarobot.errors.ServerError
if the server responded with 5xx status
- cancel()¶
Cancel custom model test that is in progress.
New in version v2.21.
- Raises
- datarobot.errors.ClientError
if the server responded with 4xx status
- datarobot.errors.ServerError
if the server responded with 5xx status
- refresh()¶
Update custom model test with the latest data from server.
New in version v2.21.
- Raises
- datarobot.errors.ClientError
if the server responded with 4xx status
- datarobot.errors.ServerError
if the server responded with 5xx status
- class datarobot.CustomModelVersion(**kwargs)¶
A version of a DataRobot custom model.
New in version v2.21.
- Attributes
- id: str
The ID of the custom model version.
- custom_model_id: str
The ID of the custom model.
- version_minor: int
A minor version number of the custom model version.
- version_major: int
A major version number of the custom model version.
- is_frozen: bool
A flag if the custom model version is frozen.
- items: List[CustomModelFileItem]
A list of file items attached to the custom model version.
- base_environment_id: str
The ID of the environment to use with the model.
- base_environment_version_id: str
The ID of the environment version to use with the model.
- label: str, optional
A short human readable string to label the version.
- description: str, optional
The custom model version description.
- created_at: str, optional
ISO-8601 formatted timestamp of when the version was created.
- dependencies: List[CustomDependency]
The parsed dependencies of the custom model version if the version has a valid requirements.txt file.
- network_egress_policy: datarobot.NETWORK_EGRESS_POLICY, optional
Determines whether the given custom model is isolated, or can access the public network. Values: [datarobot.NETWORK_EGRESS_POLICY.NONE, datarobot.NETWORK_EGRESS_POLICY.PUBLIC].
- maximum_memory: int, optional
The maximum memory that might be allocated by the custom-model. If exceeded, the custom-model will be killed by k8s.
- replicas: int, optional
A fixed number of replicas that will be deployed in the cluster.
- required_metadata_values: List[RequiredMetadataValue]
Additional parameters required by the execution environment. The required keys are defined by the fieldNames in the base environment’s requiredMetadataKeys.
- training_data: TrainingData, optional
The information about the training data assigned to the model version.
- holdout_data: HoldoutData, optional
The information about the holdout data assigned to the model version.
- classmethod from_server_data(data, keep_attrs=None)¶
Instantiate an object of this class using the data directly from the server, meaning that the keys may have the wrong camel casing
- Parameters
- datadict
The directly translated dict of JSON from the server. No casing fixes have taken place
- keep_attrsiterable
List, set or tuple of the dotted namespace notations for attributes to keep within the object structure even if their values are None
- Return type
- classmethod create_clean(custom_model_id, base_environment_id=None, is_major_update=True, folder_path=None, files=None, network_egress_policy=None, maximum_memory=None, replicas=None, required_metadata_values=None, training_dataset_id=None, partition_column=None, holdout_dataset_id=None, keep_training_holdout_data=None, max_wait=600, runtime_parameter_values=None, base_environment_version_id=None)¶
Create a custom model version without files from previous versions.
Create a version with training or holdout data: if training/holdout data related parameters are provided, the training data is assigned asynchronously. In this case, if max_wait is not None, the function returns once the job is finished; if max_wait is None, the function returns immediately and progress can be polled by the user (see the examples below).
If training data assignment fails, the new version is still created, but it cannot be used to create a model package (version) or to be deployed. To check for a training data assignment error, check version.training_data.assignment_error[“message”].
New in version v2.21.
- Parameters
- custom_model_id: str
The ID of the custom model.
- base_environment_id: str
The base environment to use with this model version. At least one of “base_environment_id” and “base_environment_version_id” must be provided. If both are specified, the version must belong to the environment.
- base_environment_version_id: str
The base environment version ID to use with this model version. At least one of “base_environment_id” and “base_environment_version_id” must be provided. If both are specified, the version must belong to the environment. If not specified: in case previous model versions exist, the value from the latest model version is inherited, otherwise, latest successfully built version of the environment specified in “base_environment_id” is used.
- is_major_update: bool, optional
The flag defining if a custom model version will be a minor or a major version. Defaults to True.
- folder_path: str, optional
The path to a folder containing files to be uploaded. Each file in the folder is uploaded under a path relative to the folder path.
- files: list, optional
The list of tuples, where values in each tuple are the local filesystem path and the path the file should be placed in the model. If the list is of strings, then basenames will be used for tuples. Example: [(“/home/user/Documents/myModel/file1.txt”, “file1.txt”), (“/home/user/Documents/myModel/folder/file2.txt”, “folder/file2.txt”)] or [“/home/user/Documents/myModel/file1.txt”, “/home/user/Documents/myModel/folder/file2.txt”]
- network_egress_policy: datarobot.NETWORK_EGRESS_POLICY, optional
Determines whether the given custom model is isolated, or can access the public network. Values: [datarobot.NETWORK_EGRESS_POLICY.NONE, datarobot.NETWORK_EGRESS_POLICY.PUBLIC].
- maximum_memory: int, optional
The maximum memory that might be allocated by the custom-model. If exceeded, the custom-model will be killed by k8s.
- replicas: int, optional
A fixed number of replicas that will be deployed in the cluster.
- required_metadata_values: List[RequiredMetadataValue]
Additional parameters required by the execution environment. The required keys are defined by the fieldNames in the base environment’s requiredMetadataKeys.
- training_dataset_id: str, optional
The ID of the training dataset to assign to the custom model.
- partition_column: str, optional
Name of a partition column in a training dataset assigned to the custom model. Can only be assigned for structured models.
- holdout_dataset_id: str, optional
The ID of the holdout dataset to assign to the custom model. Can only be assigned for unstructured models.
- keep_training_holdout_data: bool, optional
If the version should inherit training and holdout data from the previous version. Defaults to True. This field is only applicable if the model has training data for versions enabled, otherwise the field value will be ignored.
- max_wait: int, optional
Max time to wait for training data assignment. If set to None - method will return without waiting. Defaults to 10 minutes.
- runtime_parameter_values: List[RuntimeParameterValue]
Additional parameters to be injected into a model at runtime. The fieldName must match a fieldName that is listed in the runtimeParameterDefinitions section of the model-metadata.yaml file.
- Returns
- CustomModelVersion
Created custom model version.
- Raises
- datarobot.errors.ClientError
If the server responded with 4xx status.
- datarobot.errors.ServerError
If the server responded with 5xx status.
- datarobot.errors.InvalidUsageError
If wrong parameters are provided.
- datarobot.errors.TrainingDataAssignmentError
If training data assignment fails.
Examples
Create a version with blocking (default max_wait=600) training data assignment:
import datarobot as dr
from datarobot.errors import TrainingDataAssignmentError

dr.Client(token=my_token, endpoint=endpoint)

try:
    version = dr.CustomModelVersion.create_clean(
        custom_model_id="6444482e5583f6ee2e572265",
        base_environment_id="642209acc563893014a41e24",
        training_dataset_id="6421f2149a4f9b1bec6ad6dd",
    )
except TrainingDataAssignmentError as e:
    print(e)
Create a version with non-blocking training data assignment:
import time

import datarobot as dr

dr.Client(token=my_token, endpoint=endpoint)

version = dr.CustomModelVersion.create_clean(
    custom_model_id="6444482e5583f6ee2e572265",
    base_environment_id="642209acc563893014a41e24",
    training_dataset_id="6421f2149a4f9b1bec6ad6dd",
    max_wait=None,
)
while version.training_data.assignment_in_progress:
    time.sleep(10)
    version.refresh()
if version.training_data.assignment_error:
    print(version.training_data.assignment_error["message"])
- Return type
- classmethod create_from_previous(custom_model_id, base_environment_id=None, is_major_update=True, folder_path=None, files=None, files_to_delete=None, network_egress_policy=None, maximum_memory=None, replicas=None, required_metadata_values=None, training_dataset_id=None, partition_column=None, holdout_dataset_id=None, keep_training_holdout_data=None, max_wait=600, runtime_parameter_values=None, base_environment_version_id=None)¶
Create a custom model version containing files from a previous version.
Create a version with training/holdout data: if training/holdout data related parameters are provided, the training data is assigned asynchronously. In this case, if max_wait is not None, the function returns once the job is finished; if max_wait is None, the function returns immediately and progress can be polled by the user (see the examples below).
If training data assignment fails, the new version is still created, but it cannot be used to create a model package (version) or to be deployed. To check for a training data assignment error, check version.training_data.assignment_error[“message”].
New in version v2.21.
- Parameters
- custom_model_id: str
The ID of the custom model.
- base_environment_id: str
The base environment to use with this model version. At least one of “base_environment_id” and “base_environment_version_id” must be provided. If both are specified, the version must belong to the environment.
- base_environment_version_id: str
The base environment version ID to use with this model version. At least one of “base_environment_id” and “base_environment_version_id” must be provided. If both are specified, the version must belong to the environment. If not specified: in case previous model versions exist, the value from the latest model version is inherited, otherwise, latest successfully built version of the environment specified in “base_environment_id” is used.
- is_major_update: bool, optional
The flag defining if a custom model version will be a minor or a major version. Defaults to True.
- folder_path: str, optional
The path to a folder containing files to be uploaded. Each file in the folder is uploaded under a path relative to the folder path.
- files: list, optional
The list of tuples, where values in each tuple are the local filesystem path and the path the file should be placed in the model. If the list is of strings, then basenames will be used for tuples. Example: [(“/home/user/Documents/myModel/file1.txt”, “file1.txt”), (“/home/user/Documents/myModel/folder/file2.txt”, “folder/file2.txt”)] or [“/home/user/Documents/myModel/file1.txt”, “/home/user/Documents/myModel/folder/file2.txt”]
- files_to_delete: list, optional
The list of file item IDs to be deleted. Example: [“5ea95f7a4024030aba48e4f9”, “5ea6b5da402403181895cc51”]
- network_egress_policy: datarobot.NETWORK_EGRESS_POLICY, optional
Determines whether the given custom model is isolated, or can access the public network. Values: [datarobot.NETWORK_EGRESS_POLICY.NONE, datarobot.NETWORK_EGRESS_POLICY.PUBLIC].
- maximum_memory: int, optional
The maximum memory that might be allocated by the custom-model. If exceeded, the custom-model will be killed by k8s
- replicas: int, optional
A fixed number of replicas that will be deployed in the cluster
- required_metadata_values: List[RequiredMetadataValue]
Additional parameters required by the execution environment. The required keys are defined by the fieldNames in the base environment’s requiredMetadataKeys.
- training_dataset_id: str, optional
The ID of the training dataset to assign to the custom model.
- partition_column: str, optional
Name of a partition column in a training dataset assigned to the custom model. Can only be assigned for structured models.
- holdout_dataset_id: str, optional
The ID of the holdout dataset to assign to the custom model. Can only be assigned for unstructured models.
- keep_training_holdout_data: bool, optional
If the version should inherit training and holdout data from the previous version. Defaults to True. This field is only applicable if the model has training data for versions enabled, otherwise the field value will be ignored.
- max_wait: int, optional
Max time to wait for training data assignment. If set to None - method will return without waiting. Defaults to 10 minutes.
- runtime_parameter_values: List[RuntimeParameterValue]
Additional parameters to be injected into the model at runtime. The fieldName must match a fieldName that is listed in the runtimeParameterDefinitions section of the model-metadata.yaml file. This list will be merged with any existing runtime values set from the prior version, so it is possible to specify a null value to unset specific parameters and fall back to the defaultValue from the definition.
- Returns
- CustomModelVersion
created custom model version
- Raises
- datarobot.errors.ClientError
If the server responded with 4xx status.
- datarobot.errors.ServerError
If the server responded with 5xx status.
- datarobot.errors.InvalidUsageError
If wrong parameters are provided.
- datarobot.errors.TrainingDataAssignmentError
If training data assignment fails.
Examples
Create a version with blocking (default max_wait=600) training data assignment:
import datarobot as dr
from datarobot.errors import TrainingDataAssignmentError

dr.Client(token=my_token, endpoint=endpoint)

try:
    version = dr.CustomModelVersion.create_from_previous(
        custom_model_id="6444482e5583f6ee2e572265",
        base_environment_id="642209acc563893014a41e24",
        training_dataset_id="6421f2149a4f9b1bec6ad6dd",
    )
except TrainingDataAssignmentError as e:
    print(e)
Create a version with non-blocking training data assignment:
import time

import datarobot as dr

dr.Client(token=my_token, endpoint=endpoint)

version = dr.CustomModelVersion.create_from_previous(
    custom_model_id="6444482e5583f6ee2e572265",
    base_environment_id="642209acc563893014a41e24",
    training_dataset_id="6421f2149a4f9b1bec6ad6dd",
    max_wait=None,
)
while version.training_data.assignment_in_progress:
    time.sleep(10)
    version.refresh()
if version.training_data.assignment_error:
    print(version.training_data.assignment_error["message"])
- Return type
- classmethod list(custom_model_id)¶
List custom model versions.
New in version v2.21.
- Parameters
- custom_model_id: str
The ID of the custom model.
- Returns
- List[CustomModelVersion]
A list of custom model versions.
- Raises
- datarobot.errors.ClientError
If the server responded with 4xx status.
- datarobot.errors.ServerError
If the server responded with 5xx status.
- Return type
List[CustomModelVersion]
- classmethod get(custom_model_id, custom_model_version_id)¶
Get custom model version by id.
New in version v2.21.
- Parameters
- custom_model_id: str
The ID of the custom model.
- custom_model_version_id: str
The id of the custom model version to retrieve.
- Returns
- CustomModelVersion
Retrieved custom model version.
- Raises
- datarobot.errors.ClientError
If the server responded with 4xx status.
- datarobot.errors.ServerError
If the server responded with 5xx status.
- Return type
- download(file_path)¶
Download custom model version.
New in version v2.21.
- Parameters
- file_path: str
Path to create a file with custom model version content.
- Raises
- datarobot.errors.ClientError
If the server responded with 4xx status.
- datarobot.errors.ServerError
If the server responded with 5xx status.
- Return type
None
- update(description=None, required_metadata_values=None)¶
Update custom model version properties.
New in version v2.21.
- Parameters
- description: str, optional
New custom model version description.
- required_metadata_values: List[RequiredMetadataValue], optional
Additional parameters required by the execution environment. The required keys are defined by the fieldNames in the base environment’s requiredMetadataKeys.
- Raises
- datarobot.errors.ClientError
If the server responded with 4xx status.
- datarobot.errors.ServerError
If the server responded with 5xx status.
- Return type
None
- refresh()¶
Update custom model version with the latest data from server.
New in version v2.21.
- Raises
- datarobot.errors.ClientError
If the server responded with 4xx status.
- datarobot.errors.ServerError
If the server responded with 5xx status.
- Return type
None
- get_feature_impact(with_metadata=False)¶
Get custom model feature impact.
New in version v2.23.
- Parameters
- with_metadatabool
The flag indicating if the result should include the metadata as well.
- Returns
- feature_impactslist of dict
The feature impact data. Each item is a dict with the keys ‘featureName’, ‘impactNormalized’, ‘impactUnnormalized’, and ‘redundantWith’.
- Raises
- datarobot.errors.ClientError
If the server responded with 4xx status.
- datarobot.errors.ServerError
If the server responded with 5xx status.
- Return type
List[Dict[str, Any]]
- calculate_feature_impact(max_wait=600)¶
Calculate custom model feature impact.
New in version v2.23.
- Parameters
- max_wait: int, optional
Max time to wait for feature impact calculation. If set to None - method will return without waiting. Defaults to 10 min
- Raises
- datarobot.errors.ClientError
if the server responded with 4xx status
- datarobot.errors.ServerError
if the server responded with 5xx status
- Return type
None
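Example: a minimal sketch of calculating and reading feature impact for a custom model version; my_token, endpoint, and the IDs are placeholders, and the version is assumed to have training data assigned.
import datarobot as dr

dr.Client(token=my_token, endpoint=endpoint)

version = dr.CustomModelVersion.get(
    custom_model_id="65f0000000000000000000f1",
    custom_model_version_id="65f0000000000000000000f3",
)
version.calculate_feature_impact(max_wait=600)
for row in version.get_feature_impact():
    print(row["featureName"], row["impactNormalized"])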
- class datarobot.models.execution_environment.RequiredMetadataKey(**kwargs)¶
Definition of a metadata key that custom models using this environment must define
New in version v2.25.
- Attributes
- field_name: str
The required field key. This value will be added as an environment variable when running custom models.
- display_name: str
A human readable name for the required field.
- class datarobot.models.CustomModelVersionConversion(**kwargs)¶
A conversion of a DataRobot custom model version.
New in version v2.27.
- Attributes
- id: str
The ID of the custom model version conversion.
- custom_model_version_id: str
The ID of the custom model version.
- created: str
ISO-8601 timestamp of when the custom model conversion was created.
- main_program_item_id: str or None
The ID of the main program item.
- log_message: str or None
The conversion output log message.
- generated_metadata: dict or None
The dict contains two items: ‘outputDataset’ & ‘outputColumns’.
- conversion_succeeded: bool
Whether the conversion succeeded or not.
- conversion_in_progress: bool
Whether a given conversion is in progress or not.
- should_stop: bool
Whether the user asked to stop a conversion.
- classmethod run_conversion(custom_model_id, custom_model_version_id, main_program_item_id, max_wait=None)¶
Initiate a new custom model version conversion.
- Parameters
- custom_model_idstr
The associated custom model ID.
- custom_model_version_idstr
The associated custom model version ID.
- main_program_item_idstr
The selected main program item ID. This should be one of the SAS items in the associated custom model version.
- max_wait: int or None
Max wait time in seconds. If None, then don’t wait.
- Returns
- conversion_idstr
The ID of the newly created conversion entity.
- Raises
- datarobot.errors.ClientError
If the server responded with 4xx status.
- datarobot.errors.ServerError
If the server responded with 5xx status.
- Return type
str
- classmethod stop_conversion(custom_model_id, custom_model_version_id, conversion_id)¶
Stop a conversion that is in progress.
- Parameters
- custom_model_idstr
The ID of the associated custom model.
- custom_model_version_idstr
The ID of the associated custom model version.
- conversion_id
The ID of a conversion that is in-progress.
- Raises
- datarobot.errors.ClientError
If the server responded with 4xx status.
- datarobot.errors.ServerError
If the server responded with 5xx status.
- Return type
Response
- classmethod get(custom_model_id, custom_model_version_id, conversion_id)¶
Get custom model version conversion by id.
New in version v2.27.
- Parameters
- custom_model_id: str
The ID of the custom model.
- custom_model_version_id: str
The ID of the custom model version.
- conversion_id: str
The ID of the conversion to retrieve.
- Returns
- CustomModelVersionConversion
Retrieved custom model version conversion.
- Raises
- datarobot.errors.ClientError
If the server responded with 4xx status.
- datarobot.errors.ServerError
If the server responded with 5xx status.
- Return type
- classmethod get_latest(custom_model_id, custom_model_version_id)¶
Get latest custom model version conversion for a given custom model version.
New in version v2.27.
- Parameters
- custom_model_id: str
The ID of the custom model.
- custom_model_version_id: str
The ID of the custom model version.
- Returns
- CustomModelVersionConversion or None
Retrieved latest conversion for a given custom model version.
- Raises
- datarobot.errors.ClientError
If the server responded with 4xx status.
- datarobot.errors.ServerError
If the server responded with 5xx status.
- Return type
Optional[CustomModelVersionConversion]
- classmethod list(custom_model_id, custom_model_version_id)¶
Get custom model version conversions list per custom model version.
New in version v2.27.
- Parameters
- custom_model_id: str
The ID of the custom model.
- custom_model_version_id: str
The ID of the custom model version.
- Returns
- List[CustomModelVersionConversion]
Retrieved conversions for a given custom model version.
- Raises
- datarobot.errors.ClientError
If the server responded with 4xx status.
- datarobot.errors.ServerError
If the server responded with 5xx status.
- Return type
- class datarobot.CustomModelVersionDependencyBuild(**kwargs)¶
Metadata about a DataRobot custom model version’s dependency build
New in version v2.22.
- Attributes
- custom_model_id: str
The ID of the custom model.
- custom_model_version_id: str
The ID of the custom model version.
- build_status: str
The status of the custom model version’s dependency build.
- started_at: str
ISO-8601 formatted timestamp of when the build was started.
- completed_at: str, optional
ISO-8601 formatted timestamp of when the build has completed.
- classmethod get_build_info(custom_model_id, custom_model_version_id)¶
Retrieve information about a custom model version’s dependency build
New in version v2.22.
- Parameters
- custom_model_id: str
The ID of the custom model.
- custom_model_version_id: str
The ID of the custom model version.
- Returns
- CustomModelVersionDependencyBuild
The dependency build information.
- Return type
- classmethod start_build(custom_model_id, custom_model_version_id, max_wait=600)¶
Start the dependency build for a custom model version.
New in version v2.22.
- Parameters
- custom_model_id: str
The ID of the custom model
- custom_model_version_id: str
the ID of the custom model version
- max_wait: int, optional
Max time to wait for a build completion. If set to None - method will return without waiting.
- Return type
Optional[CustomModelVersionDependencyBuild]
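Example: a minimal sketch of building dependencies for a custom model version that includes a requirements.txt; my_token, endpoint, and the IDs are placeholders.
import datarobot as dr

dr.Client(token=my_token, endpoint=endpoint)

build = dr.CustomModelVersionDependencyBuild.start_build(
    custom_model_id="65f0000000000000000000f1",
    custom_model_version_id="65f0000000000000000000f3",
    max_wait=600,  # wait up to 10 minutes for the build to complete
)
if build is not None:
    print(build.build_status)
    print(build.get_log())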
- get_log()¶
Get log of a custom model version dependency build.
New in version v2.22.
- Raises
- datarobot.errors.ClientError
If the server responded with 4xx status.
- datarobot.errors.ServerError
If the server responded with 5xx status.
- Return type
str
- cancel()¶
Cancel custom model version dependency build that is in progress.
New in version v2.22.
- Raises
- datarobot.errors.ClientError
If the server responded with 4xx status.
- datarobot.errors.ServerError
If the server responded with 5xx status.
- Return type
None
- refresh()¶
Update custom model version dependency build with the latest data from server.
New in version v2.22.
- Raises
- datarobot.errors.ClientError
If the server responded with 4xx status.
- datarobot.errors.ServerError
If the server responded with 5xx status.
- Return type
None
- class datarobot.ExecutionEnvironment(**kwargs)¶
An execution environment entity.
New in version v2.21.
- Attributes
- id: str
the id of the execution environment
- name: str
the name of the execution environment
- description: str, optional
the description of the execution environment
- programming_language: str, optional
the programming language of the execution environment. Can be “python”, “r”, “java” or “other”
- is_public: bool, optional
public accessibility of the environment; this field is visible only to admin users
- created_at: str, optional
ISO-8601 formatted timestamp of when the execution environment version was created
- latest_version: ExecutionEnvironmentVersion, optional
the latest version of the execution environment
- classmethod create(name, description=None, programming_language=None, required_metadata_keys=None)¶
Create an execution environment.
New in version v2.21.
- Parameters
- name: str
execution environment name
- description: str, optional
execution environment description
- programming_language: str, optional
programming language of the environment to be created. Can be “python”, “r”, “java” or “other”. Default value - “other”
- required_metadata_keys: List[RequiredMetadataKey]
Definitions of the metadata keys that custom models using this environment must define
- Returns
- ExecutionEnvironment
created execution environment
- Raises
- datarobot.errors.ClientError
if the server responded with 4xx status
- datarobot.errors.ServerError
if the server responded with 5xx status
- classmethod list(search_for=None)¶
List execution environments available to the user.
New in version v2.21.
- Parameters
- search_for: str, optional
the string for filtering execution environment - only execution environments that contain the string in name or description will be returned.
- Returns
- List[ExecutionEnvironment]
a list of execution environments.
- Raises
- datarobot.errors.ClientError
if the server responded with 4xx status
- datarobot.errors.ServerError
if the server responded with 5xx status
- classmethod get(execution_environment_id)¶
Get execution environment by its id.
New in version v2.21.
- Parameters
- execution_environment_id: str
ID of the execution environment to retrieve
- Returns
- ExecutionEnvironment
retrieved execution environment
- Raises
- datarobot.errors.ClientError
if the server responded with 4xx status
- datarobot.errors.ServerError
if the server responded with 5xx status
- delete()¶
Delete execution environment.
New in version v2.21.
- Raises
- datarobot.errors.ClientError
if the server responded with 4xx status
- datarobot.errors.ServerError
if the server responded with 5xx status
- update(name=None, description=None, required_metadata_keys=None)¶
Update execution environment properties.
New in version v2.21.
- Parameters
- name: str, optional
new execution environment name
- description: str, optional
new execution environment description
- required_metadata_keys: List[RequiredMetadataKey]
Definition of the metadata keys that custom models using this environment must define
- Raises
- datarobot.errors.ClientError
if the server responded with 4xx status
- datarobot.errors.ServerError
if the server responded with 5xx status
- refresh()¶
Update execution environment with the latest data from server.
New in version v2.21.
- Raises
- datarobot.errors.ClientError
if the server responded with 4xx status
- datarobot.errors.ServerError
if the server responded with 5xx status
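A brief sketch of creating an execution environment and looking it up again; the name, description, and search string below are illustrative placeholders:
import datarobot as dr

# Create a new environment for Python-based custom models and tasks.
env = dr.ExecutionEnvironment.create(
    name="My Python Environment",
    description="Scikit-learn scoring environment",
    programming_language="python",
)

# Find environments by a substring of their name or description,
# or retrieve a specific one by its id.
matches = dr.ExecutionEnvironment.list(search_for="Python")
same_env = dr.ExecutionEnvironment.get(env.id)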
- class datarobot.ExecutionEnvironmentVersion(**kwargs)¶
A version of a DataRobot execution environment.
New in version v2.21.
- Attributes
- id: str
the id of the execution environment version
- environment_id: str
the id of the execution environment the version belongs to
- build_status: str
the status of the execution environment version build
- label: str, optional
the label of the execution environment version
- description: str, optional
the description of the execution environment version
- created_at: str, optional
ISO-8601 formatted timestamp of when the execution environment version was created
- docker_context_size: int, optional
The size of the uploaded Docker context in bytes if available or None if not
- docker_image_size: int, optional
The size of the built Docker image in bytes if available or None if not
- classmethod create(execution_environment_id, docker_context_path, label=None, description=None, max_wait=600)¶
Create an execution environment version.
New in version v2.21.
- Parameters
- execution_environment_id: str
the id of the execution environment
- docker_context_path: str
the path to a docker context archive or folder
- label: str, optional
short human readable string to label the version
- description: str, optional
execution environment version description
- max_wait: int, optional
max time to wait for a final build status (“success” or “failed”). If set to None, the method returns without waiting.
- Returns
- ExecutionEnvironmentVersion
created execution environment version
- Raises
- datarobot.errors.AsyncTimeoutError
if version did not reach final state during timeout seconds
- datarobot.errors.ClientError
if the server responded with 4xx status
- datarobot.errors.ServerError
if the server responded with 5xx status
- classmethod list(execution_environment_id, build_status=None)¶
List execution environment versions available to the user.
New in version v2.21.
- Parameters
- execution_environment_id: str
the id of the execution environment
- build_status: str, optional
build status of the execution environment version to filter by. See datarobot.enums.EXECUTION_ENVIRONMENT_VERSION_BUILD_STATUS for valid options
- Returns
- List[ExecutionEnvironmentVersion]
a list of execution environment versions.
- Raises
- datarobot.errors.ClientError
if the server responded with 4xx status
- datarobot.errors.ServerError
if the server responded with 5xx status
- classmethod get(execution_environment_id, version_id)¶
Get execution environment version by id.
New in version v2.21.
- Parameters
- execution_environment_id: str
the id of the execution environment
- version_id: str
the id of the execution environment version to retrieve
- Returns
- ExecutionEnvironmentVersion
retrieved execution environment version
- Raises
- datarobot.errors.ClientError
if the server responded with 4xx status.
- datarobot.errors.ServerError
if the server responded with 5xx status.
- download(file_path)¶
Download execution environment version.
New in version v2.21.
- Parameters
- file_path: str
path to create a file with execution environment version content
- Returns
- ExecutionEnvironmentVersion
retrieved execution environment version
- Raises
- datarobot.errors.ClientError
if the server responded with 4xx status.
- datarobot.errors.ServerError
if the server responded with 5xx status.
- get_build_log()¶
Get execution environment version build log and error.
New in version v2.21.
- Returns
- Tuple[str, str]
retrieved execution environment version build log and error. If there is no build error - None is returned.
- Raises
- datarobot.errors.ClientError
if the server responded with 4xx status.
- datarobot.errors.ServerError
if the server responded with 5xx status.
- refresh()¶
Update execution environment version with the latest data from server.
New in version v2.21.
- Raises
- datarobot.errors.ClientError
if the server responded with 4xx status
- datarobot.errors.ServerError
if the server responded with 5xx status
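The sketch below builds a version from a local Docker context and then checks the build log; the environment id and context path are placeholders:
import datarobot as dr

# Build a new version from a folder (or archive) containing a Dockerfile.
version = dr.ExecutionEnvironmentVersion.create(
    execution_environment_id="5ea6b5da402403181895cc51",  # placeholder id
    docker_context_path="./docker_context",               # placeholder path
    label="v1",
    max_wait=600,  # block until the build reaches a final status
)

# Inspect the build log; the second element is None when there was no error.
log, error = version.get_build_log()
if error is not None:
    print(log)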
- class datarobot.models.custom_model_version.HoldoutData(dataset_id=None, dataset_version_id=None, dataset_name=None, partition_column=None)¶
Holdout data assigned to a DataRobot custom model version.
New in version v3.2.
- Attributes
- dataset_id: str
The ID of the dataset.
- dataset_version_id: str
The ID of the dataset version.
- dataset_name: str
The name of the dataset.
- partition_column: str
The name of the partitions column.
- class datarobot.models.custom_model_version.TrainingData(dataset_id=None, dataset_version_id=None, dataset_name=None, assignment_in_progress=None, assignment_error=None)¶
Training data assigned to a DataRobot custom model version.
New in version v3.2.
- Attributes
- dataset_id: str
The ID of the dataset.
- dataset_version_id: str
The ID of the dataset version.
- dataset_name: str
The name of the dataset.
- assignment_in_progress: bool
The status of the assignment in progress.
- assignment_error: dict
The assignment error message.
- class datarobot.models.custom_model_version.RuntimeParameter(**kwargs)¶
Definition of a runtime parameter used for the custom model version. It includes the override value, if provided.
New in version v3.4.0.
- Attributes
- field_name: str
The runtime parameter name. This value is added as an environment variable when running custom models.
- type: str
The value type accepted by the runtime parameter.
- description: str
Describes how the runtime parameter impacts the running model.
- allow_empty: bool
Indicates if the runtime parameter must be set before registration.
- min_value: float
The minimum value for a numeric field.
- max_value: float
The maximum value for a numeric field.
- default_value: str, bool, float or None
The default value for the given field.
- override_value: str, bool, float or None
The value set by the user that overrides the default set in the runtime parameter definition.
- current_value: str, bool, float or None
After the default and the override values are applied, this is the value of the runtime parameter.
- credential_type: str
Describes the type of credential, used only for credentials parameters.
- class datarobot.models.custom_model_version.RuntimeParameterValue(**kwargs)¶
The definition of a runtime parameter value used for the custom model version, this defines the runtime parameter override.
New in version v3.4.0.
- Attributes
- field_name: str
The runtime parameter name. This value is added as an environment variable when running custom models.
- type: str
The value type accepted by the runtime parameter.
- value: str, bool or float
After the default and the override values are applied, this is the value of the runtime parameter.
Custom Tasks¶
- class datarobot.CustomTask(id, target_type, latest_version, created_at, updated_at, name, description, language, created_by, calibrate_predictions=None)¶
A custom task. This can be in a partial state or a complete state. When the latest_version is None, the empty task has been initialized with some metadata. It is not yet usable for actual training. Once the first CustomTaskVersion has been created, you can put the CustomTask in UserBlueprints to train Models in Projects.
New in version v2.26.
- Attributes
- id: str
id of the custom task
- name: str
name of the custom task
- language: str
programming language of the custom task. Can be “python”, “r”, “java” or “other”
- description: str
description of the custom task
- target_type: datarobot.enums.CUSTOM_TASK_TARGET_TYPE
the target type of the custom task. One of:
datarobot.enums.CUSTOM_TASK_TARGET_TYPE.BINARY
datarobot.enums.CUSTOM_TASK_TARGET_TYPE.REGRESSION
datarobot.enums.CUSTOM_TASK_TARGET_TYPE.MULTICLASS
datarobot.enums.CUSTOM_TASK_TARGET_TYPE.ANOMALY
datarobot.enums.CUSTOM_TASK_TARGET_TYPE.TRANSFORM
- latest_version: datarobot.CustomTaskVersion or None
latest version of the custom task if the task has a latest version. If the latest version is None, the custom task is not ready for use in user blueprints. You must create its first CustomTaskVersion before you can use the CustomTask
- created_by: str
The username of the user who created the custom task.
- updated_at: str
An ISO-8601 formatted timestamp of when the custom task was updated.
- created_at: str
ISO-8601 formatted timestamp of when the custom task was created
- calibrate_predictions: bool
whether anomaly predictions should be calibrated to be between 0 and 1 by DR. Only applies to custom estimators with target type datarobot.enums.CUSTOM_TASK_TARGET_TYPE.ANOMALY
- classmethod from_server_data(data, keep_attrs=None)¶
Instantiate an object of this class using the data directly from the server, meaning that the keys may have the wrong camel casing
- Parameters
- datadict
The directly translated dict of JSON from the server. No casing fixes have taken place
- keep_attrsiterable
List, set or tuple of the dotted namespace notations for attributes to keep within the object structure even if their values are None
- Return type
- classmethod list(order_by=None, search_for=None)¶
List custom tasks available to the user.
New in version v2.26.
- Parameters
- search_for: str, optional
string for filtering custom tasks - only tasks that contain the string in name or description will be returned. If not specified, all custom tasks will be returned
- order_by: str, optional
property to sort custom tasks by. Supported properties are “created” and “updated”. Prefix the attribute name with a dash to sort in descending order, e.g. order_by=’-created’. By default, the order_by parameter is None which will result in custom tasks being returned in order of creation time descending
- Returns
- List[CustomTask]
a list of custom tasks.
- Raises
- datarobot.errors.ClientError
if the server responded with 4xx status
- datarobot.errors.ServerError
if the server responded with 5xx status
- Return type
List
[CustomTask
]
- classmethod get(custom_task_id)¶
Get custom task by id.
New in version v2.26.
- Parameters
- custom_task_id: str
id of the custom task
- Returns
- CustomTask
retrieved custom task
- Raises
- datarobot.errors.ClientError
if the server responded with 4xx status.
- datarobot.errors.ServerError
if the server responded with 5xx status.
- Return type
- classmethod copy(custom_task_id)¶
Create a custom task by copying existing one.
New in version v2.26.
- Parameters
- custom_task_id: str
id of the custom task to copy
- Returns
- CustomTask
- Raises
- datarobot.errors.ClientError
if the server responded with 4xx status
- datarobot.errors.ServerError
if the server responded with 5xx status
- Return type
- classmethod create(name, target_type, language=None, description=None, calibrate_predictions=None, **kwargs)¶
Creates only the metadata for a custom task. This task will not be usable until you have created a CustomTaskVersion attached to this task.
New in version v2.26.
- Parameters
- name: str
name of the custom task
- target_type: datarobot.enums.CUSTOM_TASK_TARGET_TYPE
the target type, based on the following values. Anything else will raise an error:
datarobot.enums.CUSTOM_TASK_TARGET_TYPE.BINARY
datarobot.enums.CUSTOM_TASK_TARGET_TYPE.REGRESSION
datarobot.enums.CUSTOM_TASK_TARGET_TYPE.MULTICLASS
datarobot.enums.CUSTOM_TASK_TARGET_TYPE.ANOMALY
datarobot.enums.CUSTOM_TASK_TARGET_TYPE.TRANSFORM
- language: str, optional
programming language of the custom task. Can be “python”, “r”, “java” or “other”
- description: str, optional
description of the custom task
- calibrate_predictions: bool, optional
whether anomaly predictions should be calibrated to be between 0 and 1 by DR. If None, uses the default value from the DR app (True). Only applies to custom estimators with target type datarobot.enums.CUSTOM_TASK_TARGET_TYPE.ANOMALY
- Returns
- CustomTask
- Raises
- datarobot.errors.ClientError
if the server responded with 4xx status.
- datarobot.errors.ServerError
if the server responded with 5xx status.
- Return type
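For example, a transform task could be created as follows; the name and description are placeholders, and only the metadata is created until a CustomTaskVersion is added:
import datarobot as dr
from datarobot.enums import CUSTOM_TASK_TARGET_TYPE

# Create task metadata only; a CustomTaskVersion must be added before training.
task = dr.CustomTask.create(
    name="My Preprocessing Task",                  # placeholder name
    target_type=CUSTOM_TASK_TARGET_TYPE.TRANSFORM,
    language="python",
    description="Custom missing-value imputation",
)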
- update(name=None, language=None, description=None, **kwargs)¶
Update custom task properties.
New in version v2.26.
- Parameters
- name: str, optional
new custom task name
- language: str, optional
new custom task programming language
- description: str, optional
new custom task description
- Raises
- datarobot.errors.ClientError
if the server responded with 4xx status.
- datarobot.errors.ServerError
if the server responded with 5xx status.
- Return type
None
- refresh()¶
Update custom task with the latest data from server.
New in version v2.26.
- Raises
- datarobot.errors.ClientError
if the server responded with 4xx status
- datarobot.errors.ServerError
if the server responded with 5xx status
- Return type
None
- delete()¶
Delete custom task.
New in version v2.26.
- Raises
- datarobot.errors.ClientError
if the server responded with 4xx status
- datarobot.errors.ServerError
if the server responded with 5xx status
- Return type
None
- download_latest_version(file_path)¶
Download the latest custom task version.
New in version v2.26.
- Parameters
- file_path: str
the full path of the target zip file
- Raises
- datarobot.errors.ClientError
if the server responded with 4xx status.
- datarobot.errors.ServerError
if the server responded with 5xx status.
- Return type
None
- get_access_list()¶
Retrieve access control settings of this custom task.
New in version v2.27.
- Returns
- list of SharingAccess
- Return type
List
[SharingAccess
]
- share(access_list)¶
Update the access control settings of this custom task.
New in version v2.27.
- Parameters
- access_listlist of SharingAccess
A list of SharingAccess to update.
- Raises
- datarobot.errors.ClientError
if the server responded with 4xx status
- datarobot.errors.ServerError
if the server responded with 5xx status
Examples
Transfer access to the custom task from old_user@datarobot.com to new_user@datarobot.com
import datarobot as dr

new_access = dr.SharingAccess(
    "new_user@datarobot.com", dr.enums.SHARING_ROLE.OWNER, can_share=True
)
access_list = [dr.SharingAccess("old_user@datarobot.com", None), new_access]

dr.CustomTask.get('custom-task-id').share(access_list)
- Return type
None
- class datarobot.models.custom_task_version.CustomTaskFileItem(id, file_name, file_path, file_source, created_at=None)¶
A file item attached to a DataRobot custom task version.
New in version v2.26.
- Attributes
- id: str
id of the file item
- file_name: str
name of the file item
- file_path: str
path of the file item
- file_source: str
source of the file item
- created_at: str
ISO-8601 formatted timestamp of when the version was created
- class datarobot.enums.CustomTaskOutgoingNetworkPolicy(value)¶
The way to set and view a CustomTaskVersion’s outgoing network policy.
- class datarobot.CustomTaskVersion(id, custom_task_id, version_major, version_minor, label, created_at, is_frozen, items, description=None, base_environment_id=None, maximum_memory=None, base_environment_version_id=None, dependencies=None, required_metadata_values=None, arguments=None, outgoing_network_policy=None)¶
A version of a DataRobot custom task.
New in version v2.26.
- Attributes
- id: str
id of the custom task version
- custom_task_id: str
id of the custom task
- version_minor: int
a minor version number of custom task version
- version_major: int
a major version number of custom task version
- label: str
short human readable string to label the version
- created_at: str
ISO-8601 formatted timestamp of when the version was created
- is_frozen: bool
a flag if the custom task version is frozen
- items: List[CustomTaskFileItem]
a list of file items attached to the custom task version
- description: str, optional
custom task version description
- base_environment_id: str, optional
id of the environment to use with the task
- base_environment_version_id: str, optional
id of the environment version to use with the task
- dependencies: List[CustomDependency]
the parsed dependencies of the custom task version if the version has a valid requirements.txt file
- required_metadata_values: List[RequiredMetadataValue]
Additional parameters required by the execution environment. The required keys are defined by the fieldNames in the base environment’s requiredMetadataKeys.
- arguments: List[UserBlueprintTaskArgument]
A list of custom task version arguments.
- outgoing_network_policy: Optional[CustomTaskOutgoingNetworkPolicy]
The outgoing network policy of the custom task version.
- classmethod from_server_data(data, keep_attrs=None)¶
Instantiate an object of this class using the data directly from the server, meaning that the keys may have the wrong camel casing
- Parameters
- datadict
The directly translated dict of JSON from the server. No casing fixes have taken place
- keep_attrsiterable
List, set or tuple of the dotted namespace notations for attributes to keep within the object structure even if their values are None
- classmethod create_clean(custom_task_id, base_environment_id, maximum_memory=None, is_major_update=True, folder_path=None, required_metadata_values=None, outgoing_network_policy=None)¶
Create a custom task version without files from previous versions.
New in version v2.26.
- Parameters
- custom_task_id: str
the id of the custom task
- base_environment_id: str
the id of the base environment to use with the custom task version
- maximum_memory: Optional[int]
The maximum amount of memory, in bytes, that the custom task’s inference containers may run with.
- is_major_update: bool
If the current version is 2.3, True would set the new version at 3.0. False would set the new version at 2.4. Defaults to True.
- folder_path: Optional[str]
The path to a folder containing files to be uploaded. Each file in the folder is uploaded under its path relative to the folder path.
- required_metadata_values: Optional[List[RequiredMetadataValue]]
Additional parameters required by the execution environment. The required keys are defined by the fieldNames in the base environment’s requiredMetadataKeys.
- outgoing_network_policy: Optional[CustomTaskOutgoingNetworkPolicy]
You must enable custom task network access permissions to pass any value other than None. Specifies whether your custom task version is able to make network calls. None will set the value to DataRobot’s default.
- Returns
- CustomTaskVersion
created custom task version
- Raises
- datarobot.errors.ClientError
if the server responded with 4xx status
- datarobot.errors.ServerError
if the server responded with 5xx status
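A minimal sketch of uploading task code with create_clean; the ids and the folder path are placeholders, and the folder is assumed to contain the task code (for example custom.py and requirements.txt):
import datarobot as dr

version = dr.CustomTaskVersion.create_clean(
    custom_task_id="60dd1f8e7a3a2d1f2d2c9a11",       # placeholder id
    base_environment_id="5ea6b5da402403181895cc51",  # placeholder id
    folder_path="./custom_task_code",                # placeholder path
)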
- classmethod create_from_previous(custom_task_id, base_environment_id, maximum_memory=None, is_major_update=True, folder_path=None, files_to_delete=None, required_metadata_values=None, outgoing_network_policy=None)¶
Create a custom task version containing files from a previous version.
New in version v2.26.
- Parameters
- custom_task_id: str
the id of the custom task
- base_environment_id: str
the id of the base environment to use with the custom task version
- maximum_memory: Optional[int]
The maximum amount of memory, in bytes, that the custom task’s inference containers may run with.
- is_major_update: bool
If the current version is 2.3, True would set the new version at 3.0. False would set the new version at 2.4. Defaults to True.
- folder_path: Optional[str]
The path to a folder containing files to be uploaded. Each file in the folder is uploaded under its path relative to the folder path.
- files_to_delete: Optional[List[str]]
the list of file item ids to be deleted. Example: [“5ea95f7a4024030aba48e4f9”, “5ea6b5da402403181895cc51”]
- required_metadata_values: Optional[List[RequiredMetadataValue]]
Additional parameters required by the execution environment. The required keys are defined by the fieldNames in the base environment’s requiredMetadataKeys.
- outgoing_network_policy: Optional[CustomTaskOutgoingNetworkPolicy]
You must enable custom task network access permissions to pass any value other than None. Specifies whether your custom task version is able to make network calls. None will get the value from the previous version if you have the proper permissions, or use DataRobot’s default.
- Returns
- CustomTaskVersion
created custom task version
- Raises
- datarobot.errors.ClientError
if the server responded with 4xx status
- datarobot.errors.ServerError
if the server responded with 5xx status
- classmethod list(custom_task_id)¶
List custom task versions.
New in version v2.26.
- Parameters
- custom_task_id: str
the id of the custom task
- Returns
- List[CustomTaskVersion]
a list of custom task versions
- Raises
- datarobot.errors.ClientError
if the server responded with 4xx status
- datarobot.errors.ServerError
if the server responded with 5xx status
- classmethod get(custom_task_id, custom_task_version_id)¶
Get custom task version by id.
New in version v2.26.
- Parameters
- custom_task_id: str
the id of the custom task
- custom_task_version_id: str
the id of the custom task version to retrieve
- Returns
- CustomTaskVersion
retrieved custom task version
- Raises
- datarobot.errors.ClientError
if the server responded with 4xx status.
- datarobot.errors.ServerError
if the server responded with 5xx status.
- download(file_path)¶
Download custom task version.
New in version v2.26.
- Parameters
- file_path: str
path to create a file with custom task version content
- Raises
- datarobot.errors.ClientError
if the server responded with 4xx status.
- datarobot.errors.ServerError
if the server responded with 5xx status.
- update(description=None, required_metadata_values=None)¶
Update custom task version properties.
New in version v2.26.
- Parameters
- description: str
new custom task version description
- required_metadata_values: List[RequiredMetadataValue]
Additional parameters required by the execution environment. The required keys are defined by the fieldNames in the base environment’s requiredMetadataKeys.
- Raises
- datarobot.errors.ClientError
if the server responded with 4xx status.
- datarobot.errors.ServerError
if the server responded with 5xx status.
- refresh()¶
Update custom task version with the latest data from server.
New in version v2.26.
- Raises
- datarobot.errors.ClientError
if the server responded with 4xx status
- datarobot.errors.ServerError
if the server responded with 5xx status
- start_dependency_build()¶
Start the dependency build for a custom task version and return build status.
New in version v2.27.
- Returns
- CustomTaskVersionDependencyBuild
DTO of custom task version dependency build.
- start_dependency_build_and_wait(max_wait)¶
Start the dependency build for a custom task version and wait while polling status.
New in version v2.27.
- Parameters
- max_wait: int
max time to wait for a build completion
- Returns
- CustomTaskVersionDependencyBuild
DTO of custom task version dependency build.
- Raises
- datarobot.errors.ClientError
if the server responded with 4xx status
- datarobot.errors.ServerError
if the server responded with 5xx status
- datarobot.errors.AsyncTimeoutError
Raised if the dependency build is not finished after max_wait.
- cancel_dependency_build()¶
Cancel custom task version dependency build that is in progress.
New in version v2.27.
- Raises
- datarobot.errors.ClientError
if the server responded with 4xx status
- datarobot.errors.ServerError
if the server responded with 5xx status
- get_dependency_build()¶
Retrieve information about a custom task version’s dependency build.
New in version v2.27.
- Returns
- CustomTaskVersionDependencyBuild
DTO of custom task version dependency build.
- download_dependency_build_log(file_directory='.')¶
Get log of a custom task version dependency build.
New in version v2.27.
- Parameters
- file_directory: str (optional, default is “.”)
Directory path where the downloaded file will be saved.
- Raises
- datarobot.errors.ClientError
if the server responded with 4xx status
- datarobot.errors.ServerError
if the server responded with 5xx status
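A short sketch of the dependency build workflow for a custom task version, using placeholder ids:
import datarobot as dr

version = dr.CustomTaskVersion.get(
    "60dd1f8e7a3a2d1f2d2c9a11",  # placeholder custom task id
    "60dd1fb27a3a2d1f2d2c9a12",  # placeholder custom task version id
)

# Build the dependencies declared in requirements.txt, waiting up to ten minutes,
# then save the build log to the current directory for inspection.
build = version.start_dependency_build_and_wait(max_wait=600)
version.download_dependency_build_log(file_directory=".")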
Database Connectivity¶
- class datarobot.DataDriver(id=None, creator=None, base_names=None, class_name=None, canonical_name=None, database_driver=None, type=None, version=None)¶
A data driver
- Attributes
- idstr
the id of the driver.
- class_namestr
the Java class name for the driver.
- canonical_namestr
the user-friendly name of the driver.
- creatorstr
the id of the user who created the driver.
- base_nameslist of str
a list of the file name(s) of the jar files.
- classmethod list(typ=None)¶
Returns list of available drivers.
- Parameters
- typDataDriverListTypes
If specified, filters by specified driver type.
- Returns
- driverslist of DataDriver instances
contains a list of available drivers.
Examples
>>> import datarobot as dr
>>> drivers = dr.DataDriver.list()
>>> drivers
[DataDriver('mysql'), DataDriver('RedShift'), DataDriver('PostgreSQL')]
- Return type
List
[DataDriver
]
- classmethod get(driver_id)¶
Gets the driver.
- Parameters
- driver_idstr
the identifier of the driver.
- Returns
- driverDataDriver
the required driver.
Examples
>>> import datarobot as dr
>>> driver = dr.DataDriver.get('5ad08a1889453d0001ea7c5c')
>>> driver
DataDriver('PostgreSQL')
- Return type
- classmethod create(class_name, canonical_name, files=None, typ=None, database_driver=None)¶
Creates the driver. Only available to admin users.
- Parameters
- class_namestr
the Java class name for the driver. Specify None if typ is DataDriverTypes.DR_DATABASE_V1.
- canonical_namestr
the user-friendly name of the driver.
- fileslist of str
a list of the file paths on the file system for the driver.
- typ: str
Optional. Specify the type of the driver. Defaults to DataDriverTypes.JDBC, may also be DataDriverTypes.DR_DATABASE_V1.
- database_driver: str
Optional. Specify when typ is DataDriverTypes.DR_DATABASE_V1 to create a native database driver. See DrDatabaseV1Types enum for some of the types, but that list may not be exhaustive.
- Returns
- driverDataDriver
the created driver.
- Raises
- ClientError
raised if the user is not granted the Can manage JDBC database drivers feature
Examples
>>> import datarobot as dr
>>> driver = dr.DataDriver.create(
...     class_name='org.postgresql.Driver',
...     canonical_name='PostgreSQL',
...     files=['/tmp/postgresql-42.2.2.jar']
... )
>>> driver
DataDriver('PostgreSQL')
- Return type
- update(class_name=None, canonical_name=None)¶
Updates the driver. Only available to admin users.
- Parameters
- class_namestr
the Java class name for the driver.
- canonical_namestr
the user-friendly name of the driver.
- Raises
- ClientError
raised if the user is not granted the Can manage JDBC database drivers feature
Examples
>>> import datarobot as dr
>>> driver = dr.DataDriver.get('5ad08a1889453d0001ea7c5c')
>>> driver.canonical_name
'PostgreSQL'
>>> driver.update(canonical_name='postgres')
>>> driver.canonical_name
'postgres'
- Return type
None
- delete()¶
Removes the driver. Only available to admin users.
- Raises
- ClientError
raised if the user is not granted the Can manage JDBC database drivers feature
- Return type
None
- class datarobot.Connector(id=None, creator_id=None, configuration_id=None, base_name=None, canonical_name=None, connector_type=None)¶
A connector
- Attributes
- idstr
the id of the connector.
- creator_idstr
the id of the user who created the connector.
- base_namestr
the file name of the jar file.
- canonical_namestr
the user-friendly name of the connector.
- configuration_idstr
the id of the configuration of the connector.
- classmethod list()¶
Returns list of available connectors.
- Returns
- connectorslist of Connector instances
contains a list of available connectors.
Examples
>>> import datarobot as dr
>>> connectors = dr.Connector.list()
>>> connectors
[Connector('ADLS Gen2 Connector'), Connector('S3 Connector')]
- Return type
List
[Connector
]
- classmethod get(connector_id)¶
Gets the connector.
- Parameters
- connector_idstr
the identifier of the connector.
- Returns
- connectorConnector
the required connector.
Examples
>>> import datarobot as dr
>>> connector = dr.Connector.get('5fe1063e1c075e0245071446')
>>> connector
Connector('ADLS Gen2 Connector')
- Return type
- classmethod create(file_path)¶
Creates the connector from a jar file. Only available to admin users.
- Parameters
- file_pathstr
the file path on the file system for the connector.
- Returns
- connectorConnector
the created connector.
- Raises
- ClientError
raised if the user is not granted the Can manage connectors feature
Examples
>>> import datarobot as dr
>>> connector = dr.Connector.create('/tmp/connector-adls-gen2.jar')
>>> connector
Connector('ADLS Gen2 Connector')
- Return type
- update(file_path)¶
Updates the connector with new jar file. Only available to admin users.
- Parameters
- file_pathstr
the file path on the file system for the connector.
- Returns
- connectorConnector
the updated connector.
- Raises
- ClientError
raised if the user is not granted the Can manage connectors feature
Examples
>>> import datarobot as dr
>>> connector = dr.Connector.get('5fe1063e1c075e0245071446')
>>> connector.base_name
'connector-adls-gen2.jar'
>>> connector.update('/tmp/connector-s3.jar')
>>> connector.base_name
'connector-s3.jar'
- Return type
- delete()¶
Removes the connector. Only available to admin users.
- Raises
- ClientError
raised if the user is not granted the Can manage connectors feature
- Return type
None
- class datarobot.DataStore(data_store_id=None, data_store_type=None, canonical_name=None, creator=None, updated=None, params=None, role=None)¶
A data store. Represents a database.
- Attributes
- idstr
The id of the data store.
- data_store_typestr
The type of data store.
- canonical_namestr
The user-friendly name of the data store.
- creatorstr
The id of the user who created the data store.
- updateddatetime.datetime
The time of the last update
- paramsDataStoreParameters
A list specifying data store parameters.
- rolestr
Your access role for this data store.
- classmethod list(typ=None, name=None)¶
Returns list of available data stores.
- Parameters
- typstr
If specified, filters by specified data store type. If not specified, the default is DataStoreListTypes.JDBC.
- name: str
If specified, filters by data store names that match or contain this name. The search is case-insensitive.
- Returns
- data_storeslist of DataStore instances
contains a list of available data stores.
Examples
>>> import datarobot as dr
>>> data_stores = dr.DataStore.list()
>>> data_stores
[DataStore('Demo'), DataStore('Airlines')]
- Return type
List
[DataStore
]
- classmethod get(data_store_id)¶
Gets the data store.
- Parameters
- data_store_idstr
the identifier of the data store.
- Returns
- data_storeDataStore
the required data store.
Examples
>>> import datarobot as dr
>>> data_store = dr.DataStore.get('5a8ac90b07a57a0001be501e')
>>> data_store
DataStore('Demo')
- Return type
- classmethod create(data_store_type, canonical_name, driver_id=None, jdbc_url=None, fields=None, connector_id=None)¶
Creates the data store.
- Parameters
- data_store_typestr or DataStoreTypes
the type of data store.
- canonical_namestr
the user-friendly name of the data store.
- driver_idstr
Optional. The identifier of the DataDriver if data_store_type is DataStoreListTypes.JDBC or DataStoreListTypes.DR_DATABASE_V1.
- jdbc_urlstr
Optional. The full JDBC URL (for example: jdbc:postgresql://my.dbaddress.org:5432/my_db).
- fields: list
Optional. If the type is dr-database-v1, then the fields specify the configuration.
- connector_id: str
Optional. The identifier of the Connector if data_store_type is DataStoreListTypes.DR_CONNECTOR_V1
- Returns
- data_storeDataStore
the created data store.
Examples
>>> import datarobot as dr
>>> data_store = dr.DataStore.create(
...     data_store_type='jdbc',
...     canonical_name='Demo DB',
...     driver_id='5a6af02eb15372000117c040',
...     jdbc_url='jdbc:postgresql://my.db.address.org:5432/perftest'
... )
>>> data_store
DataStore('Demo DB')
- Return type
- update(canonical_name=None, driver_id=None, connector_id=None, jdbc_url=None, fields=None)¶
Updates the data store.
- Parameters
- canonical_namestr
optional, the user-friendly name of the data store.
- driver_idstr
Optional. The identifier of the DataDriver, if the type is one of DataStoreTypes.DR_DATABASE_V1 or DataStoreTypes.JDBC.
- connector_idstr
Optional. The identifier of the Connector, if the type is DataStoreTypes.DR_CONNECTOR_V1.
- jdbc_urlstr
Optional. The full JDBC URL (for example: jdbc:postgresql://my.dbaddress.org:5432/my_db).
- fields: list
Optional. If the type is dr-database-v1, then the fields specify the configuration.
Examples
>>> import datarobot as dr
>>> data_store = dr.DataStore.get('5ad5d2afef5cd700014d3cae')
>>> data_store
DataStore('Demo DB')
>>> data_store.update(canonical_name='Demo DB updated')
>>> data_store
DataStore('Demo DB updated')
- Return type
None
- delete()¶
Removes the DataStore
- Return type
None
- test(username=None, password=None, credential_id=None, use_kerberos=None, credential_data=None)¶
Tests database connection.
Changed in version v3.2: Added credential_id, use_kerberos and credential_data optional params and made username and password optional.
- Parameters
- usernamestr
optional, the username for database authentication.
- passwordstr
optional, the password for database authentication. The password is encrypted at server side and never saved / stored
- credential_idstr
optional, id of the set of credentials to use instead of username and password
- use_kerberosbool
optional, whether to use Kerberos for data store authentication
- credential_datadict
optional, the credentials to authenticate with the database, to use instead of user/password or credential ID
- Returns
- messagedict
message with status.
Examples
>>> import datarobot as dr
>>> data_store = dr.DataStore.get('5ad5d2afef5cd700014d3cae')
>>> data_store.test(username='db_username', password='db_password')
{'message': 'Connection successful'}
- Return type
- schemas(username, password)¶
Returns list of available schemas.
- Parameters
- usernamestr
the username for database authentication.
- passwordstr
the password for database authentication. The password is encrypted at server side and never saved / stored
- Returns
- responsedict
dict with database name and list of str - available schemas
Examples
>>> import datarobot as dr
>>> data_store = dr.DataStore.get('5ad5d2afef5cd700014d3cae')
>>> data_store.schemas(username='db_username', password='db_password')
{'catalog': 'perftest', 'schemas': ['demo', 'information_schema', 'public']}
- Return type
- tables(username, password, schema=None)¶
Returns list of available tables in schema.
- Parameters
- usernamestr
optional, the username for database authentication.
- passwordstr
optional, the password for database authentication. The password is encrypted at server side and never saved / stored
- schemastr
optional, the schema name.
- Returns
- responsedict
dict with catalog name and tables info
Examples
>>> import datarobot as dr
>>> data_store = dr.DataStore.get('5ad5d2afef5cd700014d3cae')
>>> data_store.tables(username='db_username', password='db_password', schema='demo')
{'tables': [{'type': 'TABLE', 'name': 'diagnosis', 'schema': 'demo'}, {'type': 'TABLE', 'name': 'kickcars', 'schema': 'demo'}, {'type': 'TABLE', 'name': 'patient', 'schema': 'demo'}, {'type': 'TABLE', 'name': 'transcript', 'schema': 'demo'}], 'catalog': 'perftest'}
- Return type
- classmethod from_server_data(data, keep_attrs=None)¶
Instantiate an object of this class using the data directly from the server, meaning that the keys may have the wrong camel casing
- Parameters
- datadict
The directly translated dict of JSON from the server. No casing fixes have taken place
- keep_attrsiterable
List, set or tuple of the dotted namespace notations for attributes to keep within the object structure even if their values are None
- Return type
- get_access_list()¶
Retrieve what users have access to this data store
New in version v2.14.
- Returns
- list of SharingAccess
- Return type
List
[SharingAccess
]
Retrieve what users have access to this data store
New in version v3.2.
- Returns
- list of SharingRole
- Return type
List
[SharingRole
]
Modify the ability of users to access this data store
New in version v2.14.
- Parameters
- access_listlist of SharingRole
the modifications to make.
- Raises
- datarobot.ClientError
if you do not have permission to share this data store, if the user you’re sharing with doesn’t exist, if the same user appears multiple times in the access_list, or if these changes would leave the data store without an owner.
Examples
The SharingRole class is needed in order to share a Data Store with one or more users. For example, suppose you had a list of user IDs you wanted to share this DataStore with. You could use a loop to generate a list of SharingRole objects for them, and bulk share this Data Store.
>>> import datarobot as dr
>>> from datarobot.models.sharing import SharingRole
>>> from datarobot.enums import SHARING_ROLE, SHARING_RECIPIENT_TYPE
>>>
>>> user_ids = ["60912e09fd1f04e832a575c1", "639ce542862e9b1b1bfa8f1b", "63e185e7cd3a5f8e190c6393"]
>>> sharing_roles = []
>>> for user_id in user_ids:
...     new_sharing_role = SharingRole(
...         role=SHARING_ROLE.CONSUMER,
...         share_recipient_type=SHARING_RECIPIENT_TYPE.USER,
...         id=user_id,
...         can_share=True,
...     )
...     sharing_roles.append(new_sharing_role)
>>> dr.DataStore.get('my-data-store-id').share(sharing_roles)
Similarly, a SharingRole instance can be used to remove a user’s access if the role is set to SHARING_ROLE.NO_ROLE, like in this example:
>>> import datarobot as dr
>>> from datarobot.models.sharing import SharingRole
>>> from datarobot.enums import SHARING_ROLE, SHARING_RECIPIENT_TYPE
>>>
>>> user_to_remove = "[email protected]"
>>> remove_sharing_role = SharingRole(
...     role=SHARING_ROLE.NO_ROLE,
...     share_recipient_type=SHARING_RECIPIENT_TYPE.USER,
...     username=user_to_remove,
...     can_share=False,
... )
>>> dr.DataStore.get('my-data-store-id').share(roles=[remove_sharing_role])
- Return type
None
- class datarobot.DataSource(data_source_id=None, data_source_type=None, canonical_name=None, creator=None, updated=None, params=None, role=None)¶
A data source. Represents a data request.
- Attributes
- idstr
the id of the data source.
- typestr
the type of data source.
- canonical_namestr
the user-friendly name of the data source.
- creatorstr
the id of the user who created the data source.
- updateddatetime.datetime
the time of the last update.
- paramsDataSourceParameters
a list specifying data source parameters.
- rolestr or None
if a string, represents a particular level of access and should be one of datarobot.enums.SHARING_ROLE. For more information on the specific access levels, see the sharing documentation. If None, can be passed to a share function to revoke access for a specific user.
- classmethod list(typ=None)¶
Returns list of available data sources.
- Parameters
- typDataStoreListTypes
If specified, filters by specified datasource type. If not specified it will default to DataStoreListTypes.DATABASES
- Returns
- data_sourceslist of DataSource instances
contains a list of available data sources.
Examples
>>> import datarobot as dr
>>> data_sources = dr.DataSource.list()
>>> data_sources
[DataSource('Diagnostics'), DataSource('Airlines 100mb'), DataSource('Airlines 10mb')]
- Return type
List
[DataSource
]
- classmethod get(data_source_id)¶
Gets the data source.
- Parameters
- data_source_idstr
the identifier of the data source.
- Returns
- data_sourceDataSource
the requested data source.
Examples
>>> import datarobot as dr
>>> data_source = dr.DataSource.get('5a8ac9ab07a57a0001be501f')
>>> data_source
DataSource('Diagnostics')
- Return type
TypeVar
(TDataSource
, bound=DataSource
)
- classmethod create(data_source_type, canonical_name, params)¶
Creates the data source.
- Parameters
- data_source_typestr or DataStoreTypes
the type of data source.
- canonical_namestr
the user-friendly name of the data source.
- paramsDataSourceParameters
a list specifying data source parameters.
- Returns
- data_sourceDataSource
the created data source.
Examples
>>> import datarobot as dr
>>> params = dr.DataSourceParameters(
...     data_store_id='5a8ac90b07a57a0001be501e',
...     query='SELECT * FROM airlines10mb WHERE "Year" >= 1995;'
... )
>>> data_source = dr.DataSource.create(
...     data_source_type='jdbc',
...     canonical_name='airlines stats after 1995',
...     params=params
... )
>>> data_source
DataSource('airlines stats after 1995')
- Return type
TypeVar
(TDataSource
, bound=DataSource
)
- update(canonical_name=None, params=None)¶
Updates the data source.
- Parameters
- canonical_namestr
optional, the user-friendly name of the data source.
- paramsDataSourceParameters
optional, the data source parameters.
Examples
>>> import datarobot as dr
>>> data_source = dr.DataSource.get('5ad840cc613b480001570953')
>>> data_source
DataSource('airlines stats after 1995')
>>> params = dr.DataSourceParameters(
...     query='SELECT * FROM airlines10mb WHERE "Year" >= 1990;'
... )
>>> data_source.update(
...     canonical_name='airlines stats after 1990',
...     params=params
... )
>>> data_source
DataSource('airlines stats after 1990')
- Return type
None
- delete()¶
Removes the DataSource
- Return type
None
- classmethod from_server_data(data, keep_attrs=None)¶
Instantiate an object of this class using the data directly from the server, meaning that the keys may have the wrong camel casing
- Parameters
- datadict
The directly translated dict of JSON from the server. No casing fixes have taken place
- keep_attrsiterable
List, set or tuple of the dotted namespace notations for attributes to keep within the object structure even if their values are None
- Return type
TypeVar
(TDataSource
, bound=DataSource
)
- get_access_list()¶
Retrieve what users have access to this data source
New in version v2.14.
- Returns
- list ofclass:SharingAccess <datarobot.SharingAccess>
- Return type
List
[SharingAccess
]
- share(access_list)¶
Modify the ability of users to access this data source
New in version v2.14.
- Parameters
- access_list: list of SharingAccess
The modifications to make.
- Raises
- datarobot.ClientError:
If you do not have permission to share this data source, if the user you’re sharing with doesn’t exist, if the same user appears multiple times in the access_list, or if these changes would leave the data source without an owner.
Examples
Transfer access to the data source from old_user@datarobot.com to new_user@datarobot.com
from datarobot.enums import SHARING_ROLE
from datarobot.models.data_source import DataSource
from datarobot.models.sharing import SharingAccess

new_access = SharingAccess(
    "new_user@datarobot.com",
    SHARING_ROLE.OWNER,
    can_share=True,
)
access_list = [
    SharingAccess("old_user@datarobot.com", SHARING_ROLE.OWNER, can_share=True),
    new_access,
]

DataSource.get('my-data-source-id').share(access_list)
- Return type
None
- create_dataset(username=None, password=None, do_snapshot=None, persist_data_after_ingestion=None, categories=None, credential_id=None, use_kerberos=None)¶
Create a
Dataset
from this data source.New in version v2.22.
- Parameters
- username: string, optional
The username for database authentication.
- password: string, optional
The password (in cleartext) for database authentication. The password will be encrypted on the server side in scope of HTTP request and never saved or stored.
- do_snapshot: bool, optional
If unset, uses the server default: True. If true, creates a snapshot dataset; if false, creates a remote dataset. Creating snapshots from non-file sources requires an additional permission, Enable Create Snapshot Data Source.
- persist_data_after_ingestion: bool, optional
If unset, uses the server default: True. If true, will enforce saving all data (for download and sampling) and will allow a user to view extended data profile (which includes data statistics like min/max/median/mean, histogram, etc.). If false, will not enforce saving data. The data schema (feature names and types) still will be available. Specifying this parameter to false and doSnapshot to true will result in an error.
- categories: list[string], optional
An array of strings describing the intended use of the dataset. The current supported options are “TRAINING” and “PREDICTION”.
- credential_id: string, optional
The ID of the set of credentials to use instead of user and password. Note that with this change, username and password will become optional.
- use_kerberos: bool, optional
If unset, uses the server default: False. If true, use kerberos authentication for database authentication.
- Returns
- response: Dataset
The Dataset created from the uploaded data
- Return type
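For instance, a snapshot dataset could be created from an existing data source using stored credentials; the ids below are placeholders:
import datarobot as dr

data_source = dr.DataSource.get("5ad840cc613b480001570953")  # placeholder id

# Materialize the query result as a snapshot dataset in the AI Catalog,
# authenticating with a stored credential instead of a username/password pair.
dataset = data_source.create_dataset(
    credential_id="5e4bc5b35e6e763beb9db147",  # placeholder credential id
    do_snapshot=True,
    categories=["TRAINING"],
)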
- class datarobot.DataSourceParameters(data_store_id=None, catalog=None, table=None, schema=None, partition_column=None, query=None, fetch_size=None, path=None)¶
Data request configuration
- Attributes
- data_store_idstr
the id of the DataStore.
- tablestr
Optional. The name of specified database table.
- schemastr
Optional. The name of the schema associated with the table.
- partition_columnstr
Optional. The name of the partition column.
- querystr
Optional. The user specified SQL query.
- fetch_sizeint
Optional. A user specified fetch size in the range [1, 20000]. By default a fetchSize will be assigned to balance throughput and memory usage
- path: str
Optional. The user-specified path for BLOB storage
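As an illustration, the parameters can point at a whole table rather than a SQL query; the ids, schema, and table names below are placeholders:
import datarobot as dr

params = dr.DataSourceParameters(
    data_store_id="5a8ac90b07a57a0001be501e",  # placeholder data store id
    schema="demo",
    table="kickcars",
    fetch_size=10000,
)
data_source = dr.DataSource.create(
    data_source_type="jdbc",
    canonical_name="kickcars table",
    params=params,
)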
Datasets¶
- class datarobot.models.Dataset(dataset_id, version_id, name, categories, created_at, is_data_engine_eligible, is_latest_version, is_snapshot, processing_state, created_by=None, data_persisted=None, size=None, row_count=None, recipe_id=None, sample_size=None)¶
Represents a Dataset returned from the api/v2/datasets/ endpoints.
- Attributes
- id: string
The ID of this dataset
- name: string
The name of this dataset in the catalog
- is_latest_version: bool
Whether this dataset version is the latest version of this dataset
- version_id: string
The object ID of the catalog_version the dataset belongs to
- categories: list(string)
An array of strings describing the intended use of the dataset. The supported options are “TRAINING” and “PREDICTION”.
- created_at: string
The date when the dataset was created
- created_by: string, optional
Username of the user who created the dataset
- is_snapshot: bool
Whether the dataset version is an immutable snapshot of data which has previously been retrieved and saved to DataRobot
- data_persisted: bool, optional
If true, user is allowed to view extended data profile (which includes data statistics like min/max/median/mean, histogram, etc.) and download data. If false, download is not allowed and only the data schema (feature names and types) will be available.
- is_data_engine_eligible: bool
Whether this dataset can be a data source of a data engine query.
- processing_state: string
Current ingestion process state of the dataset
- row_count: int, optional
The number of rows in the dataset.
- size: int, optional
The size of the dataset as a CSV in bytes.
- sample_size: dict, optional
The size of data fetched during dataset registration. For example, to fetch the first 95 rows, the sample_size value is {‘type’: ‘rows’, ‘value’: 95}. Currently only ‘rows’ type is supported.
- get_uri()¶
- Returns
- urlstr
Permanent static hyperlink to this dataset in AI Catalog.
- Return type
str
- classmethod upload(source)¶
This method covers Dataset creation from local materials (file & DataFrame) and a URL.
- Parameters
- source: str, pd.DataFrame or file object
Pass a URL, filepath, file or DataFrame to create and return a Dataset.
- Returns
- response: Dataset
The Dataset created from the uploaded data source.
- Raises
- InvalidUsageError
If the source parameter cannot be determined to be a URL, filepath, file or DataFrame.
Examples
# Upload a local file
dataset_one = Dataset.upload("./data/examples.csv")

# Create a dataset via URL
dataset_two = Dataset.upload(
    "https://raw.githubusercontent.com/curran/data/gh-pages/dbpedia/cities/data.csv"
)

# Create dataset with a pandas Dataframe
dataset_three = Dataset.upload(my_df)

# Create dataset using a local file
with open("./data/examples.csv", "rb") as file_pointer:
    dataset_four = Dataset.create_from_file(filelike=file_pointer)
- Return type
TypeVar
(TDataset
, bound=Dataset
)
- classmethod create_from_file(cls, file_path=None, filelike=None, categories=None, read_timeout=600, max_wait=600, *, use_cases=None)¶
A blocking call that creates a new Dataset from a file. Returns when the dataset has been successfully uploaded and processed.
Warning: This function does not clean up its open files. If you pass a filelike, you are responsible for closing it. If you pass a file_path, this will create a file object from the file_path but will not close it.
- Parameters
- file_path: string, optional
The path to the file. This will create a file object pointing to that file but will not close it.
- filelike: file, optional
An open and readable file object.
- categories: list[string], optional
An array of strings describing the intended use of the dataset. The current supported options are “TRAINING” and “PREDICTION”.
- read_timeout: int, optional
The maximum number of seconds to wait for the server to respond indicating that the initial upload is complete
- max_wait: int, optional
Time in seconds after which dataset creation is considered unsuccessful
- use_cases: list[UseCase] | UseCase | list[string] | string, optional
A list of UseCase objects, UseCase object, list of Use Case ids or a single Use Case id to add this new Dataset to. Must be a kwarg.
- Returns
- response: Dataset
A fully armed and operational Dataset
- Return type
TypeVar
(TDataset
, bound=Dataset
)
- classmethod create_from_in_memory_data(cls, data_frame=None, records=None, categories=None, read_timeout=600, max_wait=600, fname=None, *, use_cases=None)¶
A blocking call that creates a new Dataset from in-memory data. Returns when the dataset has been successfully uploaded and processed.
The data can be either a pandas DataFrame or a list of dictionaries with identical keys.
- Parameters
- data_frame: DataFrame, optional
The data frame to upload
- records: list[dict], optional
A list of dictionaries with identical keys to upload
- categories: list[string], optional
An array of strings describing the intended use of the dataset. The current supported options are “TRAINING” and “PREDICTION”.
- read_timeout: int, optional
The maximum number of seconds to wait for the server to respond indicating that the initial upload is complete
- max_wait: int, optional
Time in seconds after which dataset creation is considered unsuccessful
- fname: string, optional
The file name, “data.csv” by default
- use_cases: list[UseCase] | UseCase | list[string] | string, optional
A list of UseCase objects, UseCase object, list of Use Case IDs or a single Use Case ID to add this new dataset to. Must be a kwarg.
- Returns
- response: Dataset
The Dataset created from the uploaded data.
- Raises
- InvalidUsageError
If neither a DataFrame nor a list of records is passed.
- Return type
TypeVar
(TDataset
, bound=Dataset
)
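A short sketch showing both in-memory input forms; the column names and values are illustrative:
import pandas as pd
from datarobot.models import Dataset

# From a pandas DataFrame.
df = pd.DataFrame({"feature": [1, 2, 3], "target": [0, 1, 0]})
dataset_from_df = Dataset.create_from_in_memory_data(data_frame=df)

# From a list of dictionaries with identical keys, with an explicit file name.
records = [{"feature": 1, "target": 0}, {"feature": 2, "target": 1}]
dataset_from_records = Dataset.create_from_in_memory_data(records=records, fname="toy.csv")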
- classmethod create_from_url(cls, url, do_snapshot=None, persist_data_after_ingestion=None, categories=None, sample_size=None, max_wait=600, *, use_cases=None)¶
A blocking call that creates a new Dataset from data stored at a url. Returns when the dataset has been successfully uploaded and processed.
- Parameters
- url: string
The URL to use as the source of data for the dataset being created.
- do_snapshot: bool, optional
If unset, uses the server default: True. If true, creates a snapshot dataset; if false, creates a remote dataset. Creating snapshots from non-file sources may be disabled by the permission, Disable AI Catalog Snapshots.
- persist_data_after_ingestion: bool, optional
If unset, uses the server default: True. If true, will enforce saving all data (for download and sampling) and will allow a user to view extended data profile (which includes data statistics like min/max/median/mean, histogram, etc.). If false, will not enforce saving data. The data schema (feature names and types) still will be available. Specifying this parameter to false and doSnapshot to true will result in an error.
- categories: list[string], optional
An array of strings describing the intended use of the dataset. The current supported options are “TRAINING” and “PREDICTION”.
- sample_size: dict, optional
The size of data fetched during dataset registration. For example, to fetch the first 95 rows, the sample_size value would be: {‘type’: ‘rows’, ‘value’: 95}. Currently only ‘rows’ type is supported.
- max_wait: int, optional
Time in seconds after which dataset creation is considered unsuccessful.
- use_cases: list[UseCase] | UseCase | list[string] | string, optional
A list of UseCase objects, UseCase object, list of Use Case IDs or a single Use Case ID to add this new dataset to. Must be a kwarg.
- Returns
- response: Dataset
The Dataset created from the uploaded data
- Return type
TypeVar
(TDataset
, bound=Dataset
)
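For example, assuming the URL below is reachable, a dataset could be registered with only the first 95 rows fetched:
from datarobot.models import Dataset

dataset = Dataset.create_from_url(
    "https://raw.githubusercontent.com/curran/data/gh-pages/dbpedia/cities/data.csv",
    categories=["TRAINING"],
    sample_size={"type": "rows", "value": 95},
)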
- classmethod create_from_datastage(cls, datastage_id, categories=None, max_wait=600, *, use_cases=None)¶
A blocking call that creates a new Dataset from data stored as a DataStage. Returns when the dataset has been successfully uploaded and processed.
- Parameters
- datastage_id: string
The ID of the DataStage to use as the source of data for the dataset being created.
- categories: list[string], optional
An array of strings describing the intended use of the dataset. The current supported options are “TRAINING” and “PREDICTION”.
- max_wait: int, optional
Time in seconds after which dataset creation is considered unsuccessful.
- Returns
- response: Dataset
The Dataset created from the uploaded data
- Return type
TypeVar
(TDataset
, bound=Dataset
)
- classmethod create_from_data_source(cls, data_source_id, username=None, password=None, do_snapshot=None, persist_data_after_ingestion=None, categories=None, credential_id=None, use_kerberos=None, credential_data=None, sample_size=None, max_wait=600, *, use_cases=None)¶
A blocking call that creates a new Dataset from data stored at a DataSource. Returns when the dataset has been successfully uploaded and processed.
New in version v2.22.
- Parameters
- data_source_id: string
The ID of the DataSource to use as the source of data.
- username: string, optional
The username for database authentication.
- password: string, optional
The password (in cleartext) for database authentication. The password will be encrypted on the server side in scope of HTTP request and never saved or stored.
- do_snapshot: bool, optional
If unset, uses the server default: True. If true, creates a snapshot dataset; if false, creates a remote dataset. Creating snapshots from non-file sources may be disabled by the permission, Disable AI Catalog Snapshots.
- persist_data_after_ingestion: bool, optional
If unset, uses the server default: True. If true, will enforce saving all data (for download and sampling) and will allow a user to view extended data profile (which includes data statistics like min/max/median/mean, histogram, etc.). If false, will not enforce saving data. The data schema (feature names and types) still will be available. Specifying this parameter to false and doSnapshot to true will result in an error.
- categories: list[string], optional
An array of strings describing the intended use of the dataset. The current supported options are “TRAINING” and “PREDICTION”.
- credential_id: string, optional
The ID of the set of credentials to use instead of user and password. Note that with this change, username and password will become optional.
- use_kerberos: bool, optional
If unset, uses the server default: False. If true, use kerberos authentication for database authentication.
- credential_data: dict, optional
The credentials to authenticate with the database, to use instead of user/password or credential ID.
- sample_size: dict, optional
The size of data fetched during dataset registration. For example, to fetch the first 95 rows, the sample_size value would be: {‘type’: ‘rows’, ‘value’: 95}. Currently only ‘rows’ type is supported.
- max_wait: int, optional
Time in seconds after which dataset creation is considered unsuccessful.
- use_cases: list[UseCase] | UseCase | list[string] | string, optional
A list of UseCase objects, UseCase object, list of Use Case IDs or a single Use Case ID to add this new dataset to. Must be a kwarg.
- Returns
- response: Dataset
The Dataset created from the uploaded data
- Return type
TypeVar
(TDataset
, bound=Dataset
)
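As a usage sketch for create_from_data_source, authenticating with a stored credential (all IDs are placeholders):

import datarobot as dr

# Snapshot the contents of a registered DataSource into the AI Catalog.
dataset = dr.Dataset.create_from_data_source(
    data_source_id='5e31cc4dcf8a5f36c5693e01',  # hypothetical DataSource ID
    credential_id='5e31cc4dcf8a5f36c5693e02',   # hypothetical credential ID
    do_snapshot=True,
    categories=['TRAINING'],
    max_wait=1200,
)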
- classmethod create_from_query_generator(cls, generator_id, dataset_id=None, dataset_version_id=None, max_wait=600, *, use_cases=None)¶
A blocking call that creates a new Dataset from the query generator. Returns when the dataset has been successfully processed. If optional parameters are not specified the query is applied to the dataset_id and dataset_version_id stored in the query generator. If specified they will override the stored dataset_id/dataset_version_id, e.g. to prep a prediction dataset.
- Parameters
- generator_id: str
The id of the query generator to use.
- dataset_id: str, optional
The id of the dataset to apply the query to.
- dataset_version_id: str, optional
The id of the dataset version to apply the query to. If not specified the latest version associated with dataset_id (if specified) is used.
- max_wait: int, optional
The maximum number of seconds to wait before giving up.
- use_cases: list[UseCase] | UseCase | list[string] | string, optional
A list of UseCase objects, UseCase object, list of Use Case IDs or a single Use Case ID to add this new dataset to. Must be a kwarg.
- Returns
- response: Dataset
The Dataset created from the query generator
- Return type
TypeVar
(TDataset
, bound=Dataset
)
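As a usage sketch for create_from_query_generator, applying the query stored in a generator (the generator ID is a placeholder):

import datarobot as dr

# Uses the dataset_id/dataset_version_id stored in the query generator.
prepped = dr.Dataset.create_from_query_generator(
    generator_id='54e639a18bd88f08078ca831',  # hypothetical query generator ID
)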
- classmethod get(dataset_id)¶
Get information about a dataset.
- Parameters
- dataset_idstring
the id of the dataset
- Returns
- datasetDataset
the queried dataset
- Return type
TypeVar
(TDataset
, bound=Dataset
)
- classmethod delete(dataset_id)¶
Soft-deletes a dataset. A deleted dataset cannot be retrieved, listed, or otherwise acted upon, except to un-delete it.
- Parameters
- dataset_id: string
The id of the dataset to mark for deletion
- Returns
- None
- Return type
None
- classmethod un_delete(dataset_id)¶
Un-deletes a previously deleted dataset. If the dataset was not deleted, nothing happens.
- Parameters
- dataset_id: string
The id of the dataset to un-delete
- Returns
- None
- Return type
None
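A sketch showing delete and un_delete together (the dataset ID is a placeholder):

import datarobot as dr

dataset_id = '5e31cc4dcf8a5f36c5693e04'  # hypothetical dataset ID
dr.Dataset.delete(dataset_id)     # dataset no longer appears in listings
dr.Dataset.un_delete(dataset_id)  # dataset becomes visible and usable again
dataset = dr.Dataset.get(dataset_id)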
- classmethod list(category=None, filter_failed=None, order_by=None, use_cases=None)¶
List all datasets a user can view.
- Parameters
- category: string, optional
Optional. If specified, only dataset versions that have the specified category will be included in the results. Categories identify the intended use of the dataset; supported categories are “TRAINING” and “PREDICTION”.
- filter_failed: bool, optional
If unset, uses the server default: False. Whether datasets that failed during import should be excluded from the results. If True, invalid datasets will be excluded.
- order_by: string, optional
If unset, uses the server default: “-created”. Sorting order applied to the catalog list. Valid options are:
- “created” – ascending order by creation datetime
- “-created” – descending order by creation datetime
- use_cases: Union[UseCase, List[UseCase], str, List[str]], optional
Filter available datasets by a specific Use Case or Cases. Accepts either the entity or the ID. If set to [None], the method filters the project’s datasets by those not linked to a UseCase.
- Returns
- list[Dataset]
a list of datasets the user can view
- Return type
List
[TypeVar
(TDataset
, bound=Dataset
)]
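As a usage sketch for list, restricting the results to valid training datasets:

import datarobot as dr

training_datasets = dr.Dataset.list(category='TRAINING', filter_failed=True)
for ds in training_datasets:
    print(ds.id, ds.name)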
- classmethod iterate(offset=None, limit=None, category=None, order_by=None, filter_failed=None, use_cases=None)¶
Get an iterator for the requested datasets a user can view. This lazily retrieves results. It does not get the next page from the server until the current page is exhausted.
- Parameters
- offset: int, optional
If set, this many results will be skipped
- limit: int, optional
Specifies the size of each page retrieved from the server. If unset, uses the server default.
- category: string, optional
Optional. If specified, only dataset versions that have the specified category will be included in the results. Categories identify the intended use of the dataset; supported categories are “TRAINING” and “PREDICTION”.
- filter_failed: bool, optional
If unset, uses the server default: False. Whether datasets that failed during import should be excluded from the results. If True, invalid datasets will be excluded.
- order_by: string, optional
If unset, uses the server default: “-created”. Sorting order applied to the catalog list. Valid options are:
- “created” – ascending order by creation datetime
- “-created” – descending order by creation datetime
- use_cases: Union[UseCase, List[UseCase], str, List[str]], optional
Filter available datasets by a specific Use Case or Cases. Accepts either the entity or the ID. If set to [None], the method filters the project’s datasets by those not linked to a UseCase.
- Yields
- Dataset
An iterator of the datasets the user can view.
- Return type
Generator
[TypeVar
(TDataset
, bound=Dataset
),None
,None
]
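As a usage sketch for iterate; pages of 50 results are fetched lazily as the loop consumes the generator:

import datarobot as dr

for ds in dr.Dataset.iterate(limit=50, order_by='-created'):
    print(ds.id, ds.name)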
- update()¶
Updates the Dataset attributes in place with the latest information from the server.
- Returns
- None
- Return type
None
- modify(name=None, categories=None)¶
Modifies the Dataset name and/or categories. Updates the object in place.
- Parameters
- name: string, optional
The new name of the dataset
- categories: list[string], optional
A list of strings describing the intended use of the dataset. The supported options are “TRAINING” and “PREDICTION”. If any categories were previously specified for the dataset, they will be overwritten. If omitted or None, keep previous categories. To clear them specify []
- Returns
- None
- Return type
None
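As a usage sketch for modify (the dataset ID is a placeholder):

import datarobot as dr

dataset = dr.Dataset.get('5e31cc4dcf8a5f36c5693e04')  # hypothetical dataset ID
dataset.modify(name='Loans - cleaned', categories=['TRAINING'])
dataset.modify(categories=[])  # clear any previously assigned categories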
- share(access_list, apply_grant_to_linked_objects=False)¶
Modify the ability of users to access this dataset.
- Parameters
- access_list: list of SharingAccess
The modifications to make.
- apply_grant_to_linked_objects: bool
If true for any users being granted access to the dataset, grant the user read access to any linked objects such as DataSources and DataStores that may be used by this dataset. Ignored if no such objects are relevant for dataset, defaults to False.
- Raises
- datarobot.ClientError:
If you do not have permission to share this dataset, if the user you’re sharing with doesn’t exist, if the same user appears multiple times in the access_list, or if these changes would leave the dataset without an owner.
Examples
Transfer access to the dataset from old_user@datarobot.com to new_user@datarobot.com
from datarobot.enums import SHARING_ROLE
from datarobot.models.dataset import Dataset
from datarobot.models.sharing import SharingAccess

new_access = SharingAccess(
    "[email protected]",
    SHARING_ROLE.OWNER,
    can_share=True,
)
access_list = [
    SharingAccess(
        "[email protected]",
        SHARING_ROLE.OWNER,
        can_share=True,
        can_use_data=True,
    ),
    new_access,
]

Dataset.get('my-dataset-id').share(access_list)
- Return type
None
- get_details()¶
Gets the details for this Dataset
- Returns
- DatasetDetails
- Return type
- get_all_features(order_by=None)¶
Get a list of all the features for this dataset.
- Parameters
- order_by: string, optional
If unset, uses the server default: ‘name’. How the features should be ordered. Can be ‘name’ or ‘featureType’.
- Returns
- list[DatasetFeature]
- Return type
List
[DatasetFeature
]
- iterate_all_features(offset=None, limit=None, order_by=None)¶
Get an iterator for the requested features of a dataset. This lazily retrieves results. It does not get the next page from the server until the current page is exhausted.
- Parameters
- offset: int, optional
If set, this many results will be skipped.
- limit: int, optional
Specifies the size of each page retrieved from the server. If unset, uses the server default.
- order_by: string, optional
If unset, uses the server default: ‘name’. How the features should be ordered. Can be ‘name’ or ‘featureType’.
- Yields
- DatasetFeature
- Return type
Generator
[DatasetFeature
,None
,None
]
- get_featurelists()¶
Get DatasetFeaturelists created on this Dataset
- Returns
- feature_lists: list[DatasetFeaturelist]
- Return type
List
[DatasetFeaturelist
]
- create_featurelist(name, features)¶
Create a new dataset featurelist
- Parameters
- namestr
the name of the modeling featurelist to create. Names must be unique within the dataset, or the server will return an error.
- featureslist of str
the names of the features to include in the dataset featurelist. Each feature must be a dataset feature.
- Returns
- featurelistDatasetFeaturelist
the newly created featurelist
Examples
dataset = Dataset.get('1234deadbeeffeeddead4321')
dataset_features = dataset.get_all_features()
selected_features = [feat.name for feat in dataset_features][:5]  # select first five
new_flist = dataset.create_featurelist('Simple Features', selected_features)
- Return type
- get_file(file_path=None, filelike=None)¶
Retrieves all the originally uploaded data in CSV form. Writes it to either the file or a filelike object that can write bytes.
Only one of file_path or filelike can be provided and it must be provided as a keyword argument (i.e. file_path=’path-to-write-to’). If a file-like object is provided, the user is responsible for closing it when they are done.
The user must also have permission to download data.
- Parameters
- file_path: string, optional
The destination to write the file to.
- filelike: file, optional
A file-like object to write to. The object must be able to write bytes. The user is responsible for closing the object
- Returns
- None
- Return type
None
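As a usage sketch for get_file, writing the CSV either to a path or to an already-open binary file object (the ID and file names are placeholders):

import datarobot as dr

dataset = dr.Dataset.get('5e31cc4dcf8a5f36c5693e04')  # hypothetical dataset ID
dataset.get_file(file_path='my_dataset.csv')  # write to disk

with open('my_dataset_copy.csv', 'wb') as f:  # caller owns and closes the file object
    dataset.get_file(filelike=f)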
- get_as_dataframe(low_memory=False)¶
Retrieves all the originally uploaded data in a pandas DataFrame.
New in version v3.0.
- Parameters
- low_memory: bool, optional
If True, use local files to reduce memory usage, which will be slower.
- Returns
- pd.DataFrame
- Return type
DataFrame
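As a usage sketch for get_as_dataframe (the dataset ID is a placeholder):

import datarobot as dr

dataset = dr.Dataset.get('5e31cc4dcf8a5f36c5693e04')  # hypothetical dataset ID
df = dataset.get_as_dataframe()
print(df.shape)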
- get_projects()¶
Retrieves the Dataset’s projects as ProjectLocation named tuples.
- Returns
- locations: list[ProjectLocation]
- Return type
List
[ProjectLocation
]
- create_project(project_name=None, user=None, password=None, credential_id=None, use_kerberos=None, credential_data=None, *, use_cases=None)¶
Create a datarobot.models.Project from this dataset.
- Parameters
- project_name: string, optional
The name of the project to be created. If not specified, will be “Untitled Project” for database connections, otherwise the project name will be based on the file used.
- user: string, optional
The username for database authentication.
- password: string, optional
The password (in cleartext) for database authentication. The password will be encrypted on the server side in scope of HTTP request and never saved or stored
- credential_id: string, optional
The ID of the set of credentials to use instead of user and password.
- use_kerberos: bool, optional
Server default is False. If true, use kerberos authentication for database authentication.
- credential_data: dict, optional
The credentials to authenticate with the database, to use instead of user/password or credential ID.
- use_cases: list[UseCase] | UseCase | list[string] | string, optional
A list of UseCase objects, UseCase object, list of Use Case ids or a single Use Case id to add this new Dataset to. Must be a kwarg.
- Returns
- Project
- Return type
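As a usage sketch for create_project (the dataset ID and project name are placeholders):

import datarobot as dr

dataset = dr.Dataset.get('5e31cc4dcf8a5f36c5693e04')  # hypothetical dataset ID
project = dataset.create_project(project_name='Loans from catalog')
print(project.id)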
- classmethod create_version_from_file(dataset_id, file_path=None, filelike=None, categories=None, read_timeout=600, max_wait=600)¶
A blocking call that creates a new Dataset version from a file. Returns when the new dataset version has been successfully uploaded and processed.
Warning: This function does not clean up its open files. If you pass a filelike, you are responsible for closing it. If you pass a file_path, this will create a file object from the file_path but will not close it.
New in version v2.23.
- Parameters
- dataset_id: string
The ID of the dataset for which a new version will be created
- file_path: string, optional
The path to the file. This will create a file object pointing to that file but will not close it.
- filelike: file, optional
An open and readable file object.
- categories: list[string], optional
An array of strings describing the intended use of the dataset. The current supported options are “TRAINING” and “PREDICTION”.
- read_timeout: int, optional
The maximum number of seconds to wait for the server to respond indicating that the initial upload is complete
- max_wait: int, optional
Time in seconds after which dataset version creation is considered unsuccessful
- Returns
- response: Dataset
A fully armed and operational Dataset version
- Return type
TypeVar
(TDataset
, bound=Dataset
)
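As a usage sketch for create_version_from_file (the dataset ID and file name are placeholders):

import datarobot as dr

new_version = dr.Dataset.create_version_from_file(
    dataset_id='5e31cc4dcf8a5f36c5693e04',  # hypothetical dataset ID
    file_path='loans_2024_refresh.csv',     # hypothetical local CSV
)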
- classmethod create_version_from_in_memory_data(dataset_id, data_frame=None, records=None, categories=None, read_timeout=600, max_wait=600)¶
A blocking call that creates a new Dataset version for a dataset from in-memory data. Returns when the dataset has been successfully uploaded and processed.
The data can be either a pandas DataFrame or a list of dictionaries with identical keys.
New in version v2.23.
- Parameters
- dataset_id: string
The ID of the dataset for which a new version will be created
- data_frame: DataFrame, optional
The data frame to upload
- records: list[dict], optional
A list of dictionaries with identical keys to upload
- categories: list[string], optional
An array of strings describing the intended use of the dataset. The current supported options are “TRAINING” and “PREDICTION”.
- read_timeout: int, optional
The maximum number of seconds to wait for the server to respond indicating that the initial upload is complete
- max_wait: int, optional
Time in seconds after which dataset version creation is considered unsuccessful
- Returns
- response: Dataset
The Dataset version created from the uploaded data
- Raises
- InvalidUsageError
If neither a DataFrame nor a list of records is passed.
- Return type
TypeVar
(TDataset
, bound=Dataset
)
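As a sketch of create_version_from_in_memory_data using a small pandas DataFrame (the dataset ID and column names are placeholders):

import datarobot as dr
import pandas as pd

df = pd.DataFrame({'amount': [1000, 2500], 'defaulted': [0, 1]})  # hypothetical data
new_version = dr.Dataset.create_version_from_in_memory_data(
    dataset_id='5e31cc4dcf8a5f36c5693e04',  # hypothetical dataset ID
    data_frame=df,
)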
- classmethod create_version_from_url(dataset_id, url, categories=None, max_wait=600)¶
A blocking call that creates a new Dataset from data stored at a url for a given dataset. Returns when the dataset has been successfully uploaded and processed.
New in version v2.23.
- Parameters
- dataset_id: string
The ID of the dataset for which a new version will be created
- url: string
The URL to use as the source of data for the dataset being created.
- categories: list[string], optional
An array of strings describing the intended use of the dataset. The current supported options are “TRAINING” and “PREDICTION”.
- max_wait: int, optional
Time in seconds after which dataset version creation is considered unsuccessful
- Returns
- response: Dataset
The Dataset version created from the uploaded data
- Return type
TypeVar
(TDataset
, bound=Dataset
)
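As a usage sketch for create_version_from_url (the dataset ID is a placeholder; the URL mirrors the example URI shown for DatasetDetails below):

import datarobot as dr

new_version = dr.Dataset.create_version_from_url(
    dataset_id='5e31cc4dcf8a5f36c5693e04',  # hypothetical dataset ID
    url='https://s3.amazonaws.com/my_data/my_dataset.csv',
    max_wait=1200,
)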
- classmethod create_version_from_datastage(dataset_id, datastage_id, categories=None, max_wait=600)¶
A blocking call that creates a new Dataset from data stored as a DataStage for a given dataset. Returns when the dataset has been successfully uploaded and processed.
- Parameters
- dataset_id: string
The ID of the dataset for which a new version will be created
- datastage_id: string
The ID of the DataStage to use as the source of data for the dataset being created.
- categories: list[string], optional
An array of strings describing the intended use of the dataset. The current supported options are “TRAINING” and “PREDICTION”.
- max_wait: int, optional
Time in seconds after which dataset version creation is considered unsuccessful
- Returns
- response: Dataset
The Dataset version created from the uploaded data
- Return type
TypeVar
(TDataset
, bound=Dataset
)
- classmethod create_version_from_data_source(dataset_id, data_source_id, username=None, password=None, categories=None, credential_id=None, use_kerberos=None, credential_data=None, max_wait=600)¶
A blocking call that creates a new Dataset from data stored at a DataSource. Returns when the dataset has been successfully uploaded and processed.
New in version v2.23.
- Parameters
- dataset_id: string
The ID of the dataset for which a new version will be created
- data_source_id: string
The ID of the DataSource to use as the source of data.
- username: string, optional
The username for database authentication.
- password: string, optional
The password (in cleartext) for database authentication. The password will be encrypted on the server side in scope of HTTP request and never saved or stored.
- categories: list[string], optional
An array of strings describing the intended use of the dataset. The current supported options are “TRAINING” and “PREDICTION”.
- credential_id: string, optional
The ID of the set of credentials to use instead of user and password. Note that with this change, username and password will become optional.
- use_kerberos: bool, optional
If unset, uses the server default: False. If true, use kerberos authentication for database authentication.
- credential_data: dict, optional
The credentials to authenticate with the database, to use instead of user/password or credential ID.
- max_wait: int, optional
Time in seconds after which dataset version creation is considered unsuccessful
- Returns
- response: Dataset
The Dataset version created from the uploaded data
- Return type
TypeVar
(TDataset
, bound=Dataset
)
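As a usage sketch for create_version_from_data_source using a stored credential (all IDs are placeholders):

import datarobot as dr

new_version = dr.Dataset.create_version_from_data_source(
    dataset_id='5e31cc4dcf8a5f36c5693e04',      # hypothetical dataset ID
    data_source_id='5e31cc4dcf8a5f36c5693e01',  # hypothetical DataSource ID
    credential_id='5e31cc4dcf8a5f36c5693e02',   # hypothetical credential ID
)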
- classmethod from_data(data)¶
Instantiate an object of this class using a dict.
- Parameters
- datadict
Correctly snake_cased keys and their values.
- Return type
TypeVar
(T
, bound=APIObject
)
- classmethod from_server_data(data, keep_attrs=None)¶
Instantiate an object of this class using the data directly from the server, meaning that the keys may have the wrong camel casing
- Parameters
- datadict
The directly translated dict of JSON from the server. No casing fixes have taken place
- keep_attrsiterable
List, set or tuple of the dotted namespace notations for attributes to keep within the object structure even if their values are None
- Return type
TypeVar
(T
, bound=APIObject
)
- open_in_browser()¶
Opens the class’ relevant web browser location. If a default browser is not available, the URL is logged.
Note: If text-mode browsers are used, the calling process will block until the user exits the browser.
- Return type
None
- class datarobot.DatasetDetails(dataset_id, version_id, categories, created_by, created_at, data_source_type, error, is_latest_version, is_snapshot, is_data_engine_eligible, last_modification_date, last_modifier_full_name, name, uri, processing_state, data_persisted=None, data_engine_query_id=None, data_source_id=None, description=None, eda1_modification_date=None, eda1_modifier_full_name=None, feature_count=None, feature_count_by_type=None, row_count=None, size=None, tags=None, recipe_id=None, is_wrangling_eligible=None, sample_size=None)¶
Represents a detailed view of a Dataset. The to_dataset method creates a Dataset from this details view.
- Attributes
- dataset_id: string
The ID of this dataset
- name: string
The name of this dataset in the catalog
- is_latest_version: bool
Whether this dataset version is the latest version of this dataset
- version_id: string
The object ID of the catalog_version the dataset belongs to
- categories: list(string)
An array of strings describing the intended use of the dataset. The supported options are “TRAINING” and “PREDICTION”.
- created_at: string
The date when the dataset was created
- created_by: string
Username of the user who created the dataset
- is_snapshot: bool
Whether the dataset version is an immutable snapshot of data which has previously been retrieved and saved to DataRobot
- data_persisted: bool, optional
If true, user is allowed to view extended data profile (which includes data statistics like min/max/median/mean, histogram, etc.) and download data. If false, download is not allowed and only the data schema (feature names and types) will be available.
- is_data_engine_eligible: bool
Whether this dataset can be a data source of a data engine query.
- processing_state: string
Current ingestion process state of the dataset
- row_count: int, optional
The number of rows in the dataset.
- size: int, optional
The size of the dataset as a CSV in bytes.
- data_engine_query_id: string, optional
ID of the source data engine query
- data_source_id: string, optional
ID of the datasource used as the source of the dataset
- data_source_type: string
the type of the datasource that was used as the source of the dataset
- description: string, optional
the description of the dataset
- eda1_modification_date: string, optional
the ISO 8601 formatted date and time when the EDA1 for the dataset was updated
- eda1_modifier_full_name: string, optional
the user who was the last to update EDA1 for the dataset
- error: string
details of exception raised during ingestion process, if any
- feature_count: int, optional
total number of features in the dataset
- feature_count_by_type: list[FeatureTypeCount]
number of features in the dataset grouped by feature type
- last_modification_date: string
the ISO 8601 formatted date and time when the dataset was last modified
- last_modifier_full_name: string
full name of user who was the last to modify the dataset
- tags: list[string]
list of tags attached to the item
- uri: string
the URI of the data source, for example:
- ‘file_name.csv’
- ‘jdbc:DATA_SOURCE_GIVEN_NAME/SCHEMA.TABLE_NAME’
- ‘jdbc:DATA_SOURCE_GIVEN_NAME/<query>’ (for query-based data sources)
- ‘https://s3.amazonaws.com/my_data/my_dataset.csv’
- etc.
- sample_size: dict, optional
The size of data fetched during dataset registration. For example, to fetch the first 95 rows, the sample_size value is {‘type’: ‘rows’, ‘value’: 95}. Currently only ‘rows’ type is supported.
- classmethod get(dataset_id)¶
Get details for a Dataset from the server
- Parameters
- dataset_id: str
The id for the Dataset from which to get details
- Returns
- DatasetDetails
- Return type
TypeVar
(TDatasetDetails
, bound=DatasetDetails
)
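As a usage sketch for DatasetDetails.get, inspecting a few documented attributes and converting back to a Dataset via to_dataset (the dataset ID is a placeholder):

import datarobot as dr

details = dr.DatasetDetails.get('5e31cc4dcf8a5f36c5693e04')  # hypothetical dataset ID
print(details.row_count, details.feature_count, details.processing_state)
dataset = details.to_dataset()  # full Dataset object for further operations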
Data Engine Query Generator¶
- class datarobot.DataEngineQueryGenerator(**generator_kwargs)¶
DataEngineQueryGenerator is used to set up time series data prep.
New in version v2.27.
- Attributes
- id: str
id of the query generator
- query: str
text of the generated Spark SQL query
- datasets: list(QueryGeneratorDataset)
datasets associated with the query generator
- generator_settings: QueryGeneratorSettings
the settings used to define the query
- generator_type: str
“TimeSeries” is the only supported type
- classmethod create(generator_type, datasets, generator_settings)¶
Creates a query generator entity.
New in version v2.27.
- Parameters
- generator_typestr
Type of data engine query generator
- datasetsList[QueryGeneratorDataset]
Source datasets in the Data Engine workspace.
- generator_settingsdict
Data engine generator settings of the given generator_type.
- Returns
- query_generatorDataEngineQueryGenerator
The created generator
Examples
import datarobot as dr
from datarobot.models.data_engine_query_generator import (
    QueryGeneratorDataset,
    QueryGeneratorSettings,
)

dataset = QueryGeneratorDataset(
    alias='My_Awesome_Dataset_csv',
    dataset_id='61093144cabd630828bca321',
    dataset_version_id=1,
)
settings = QueryGeneratorSettings(
    datetime_partition_column='date',
    time_unit='DAY',
    time_step=1,
    default_numeric_aggregation_method='sum',
    default_categorical_aggregation_method='mostFrequent',
)
g = dr.DataEngineQueryGenerator.create(
    generator_type='TimeSeries',
    datasets=[dataset],
    generator_settings=settings,
)
g.id
>>> '54e639a18bd88f08078ca831'
g.generator_type
>>> 'TimeSeries'
- classmethod get(generator_id)¶
Gets information about a query generator.
- Parameters
- generator_idstr
The identifier of the query generator you want to load.
- Returns
- query_generatorDataEngineQueryGenerator
The queried generator
Examples
import datarobot as dr

g = dr.DataEngineQueryGenerator.get(generator_id='54e639a18bd88f08078ca831')
g.id
>>> '54e639a18bd88f08078ca831'
g.generator_type
>>> 'TimeSeries'
- create_dataset(dataset_id=None, dataset_version_id=None, max_wait=600)¶
A blocking call that creates a new Dataset from the query generator. Returns when the dataset has been successfully processed. If the optional parameters are not specified, the query is applied to the dataset_id and dataset_version_id stored in the query generator. If specified, they will override the stored dataset_id/dataset_version_id, e.g. to prep a prediction dataset.
- Parameters
- dataset_id: str, optional
The id of the unprepped dataset to apply the query to
- dataset_version_id: str, optional
The version_id of the unprepped dataset to apply the query to
- max_wait: int, optional
The maximum number of seconds to wait before giving up.
- Returns
- response: Dataset
The Dataset created from the query generator
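As a usage sketch for create_dataset on an existing generator (the generator ID is a placeholder):

import datarobot as dr

g = dr.DataEngineQueryGenerator.get(generator_id='54e639a18bd88f08078ca831')
# With no overrides, the dataset/version stored in the generator is used.
prepped_dataset = g.create_dataset()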
- prepare_prediction_dataset_from_catalog(project_id, dataset_id, dataset_version_id=None, max_wait=600, relax_known_in_advance_features_check=None)¶
Apply time series data prep to a catalog dataset and upload it to the project as a PredictionDataset.
New in version v3.1.
- Parameters
- project_idstr
The id of the project to which you upload the prediction dataset.
- dataset_idstr
The identifier of the dataset.
- dataset_version_idstr, optional
The version id of the dataset to use.
- max_waitint, optional
Optional, the maximum number of seconds to wait before giving up.
- relax_known_in_advance_features_checkbool, optional
For time series projects only. If True, missing values in the known in advance features are allowed in the forecast window at the prediction time. If omitted or False, missing values are not allowed.
- Returns
- datasetPredictionDataset
The newly uploaded dataset.
- Return type
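As a usage sketch for prepare_prediction_dataset_from_catalog (the generator, project, and dataset IDs are placeholders):

import datarobot as dr

g = dr.DataEngineQueryGenerator.get(generator_id='54e639a18bd88f08078ca831')
prediction_dataset = g.prepare_prediction_dataset_from_catalog(
    project_id='5506fcd38bd88f5953219da0',  # hypothetical time series project ID
    dataset_id='61093144cabd630828bca321',  # catalog dataset to prep
)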
- prepare_prediction_dataset(sourcedata, project_id, max_wait=600, relax_known_in_advance_features_check=None)¶
Apply time series data prep and upload the PredictionDataset to the project.
New in version v3.1.
- Parameters
- sourcedatastr, file or pandas.DataFrame
Data to be used for predictions. If it is a string, it can be either a path to a local file, or raw file content. If using a file on disk, the filename must consist of ASCII characters only.
- project_idstr
The id of the project to which you upload the prediction dataset.
- max_waitint, optional
The maximum number of seconds to wait for the uploaded dataset to be processed before raising an error.
- relax_known_in_advance_features_checkbool, optional
For time series projects only. If True, missing values in the known in advance features are allowed in the forecast window at the prediction time. If omitted or False, missing values are not allowed.
- Returns
- datasetPredictionDataset
The newly uploaded dataset.
- Raises
- InputNotUnderstoodError
Raised if sourcedata isn’t one of the supported types.
- AsyncFailureError
Raised if polling for the status of an async process resulted in a response with an unsupported status code.
- AsyncProcessUnsuccessfulError
Raised if the upload was unsuccessful (i.e. the server reported an error in uploading the dataset).
- AsyncTimeoutError
Raised if processing the uploaded dataset took more time than specified by the max_wait parameter.
- Return type
Data Exports¶
- class datarobot.models.deployment.data_exports.PredictionDataExport(id, period, created_at, model_id, status, data=None, error=None, batches=None, deployment_id=None)¶
A prediction data export.
New in version v3.4.
- Attributes
- id: str
The ID of the prediction data export.
- model_id: str
The ID of the model (or null if not specified).
- created_at: datetime
Prediction data export creation timestamp.
- period: Period
A prediction data time range definition.
- status: ExportStatus
A prediction data export processing state.
- error: ExportError
Error description, appears when prediction data export job failed (status is FAILED).
- batches: ExportBatches
Metadata associated with exported batch.
- deployment_id: str
The ID of the deployment.
- classmethod list(deployment_id, status=None, model_id=None, batch=None, offset=0, limit=100)¶
Retrieve a list of prediction data exports.
- Parameters
- deployment_id: str
The ID of the deployment.
- model_id: Optional[str]
The ID of the model used for prediction data export.
- status: Optional[ExportStatus]
A prediction data export processing state.
- batch: Optional[bool]
If true, only return batch exports. If false, only return real-time exports. If not provided, return both real-time and batch exports.
- limit: Optional[int]
The maximum number of objects to return. The default is 100 (0 means no limit).
- offset: Optional[int]
The starting offset of the results. The default is 0.
- Returns
- prediction_data_exports: list
A list of PredictionDataExport objects.
Examples
from datarobot.models.deployment import PredictionDataExport

prediction_data_exports = PredictionDataExport.list(deployment_id='5c939e08962d741e34f609f0')
- Return type
List
[PredictionDataExport
]
- classmethod get(deployment_id, export_id)¶
Retrieve a single prediction data export.
- Parameters
- deployment_id: str
The ID of the deployment.
- export_id: str
The ID of the prediction data export.
- Returns
- prediction_data_export: PredictionDataExport
A prediction data export.
Examples
from datarobot.models.deployment import PredictionDataExport

prediction_data_export = PredictionDataExport.get(
    deployment_id='5c939e08962d741e34f609f0', export_id='65fbe59aaa3f847bd5acc75b'
)
- Return type
- classmethod create(deployment_id, start, end, model_id=None, batch_ids=None, max_wait=600)¶
Create a deployment prediction data export. Waits until ready and fetches PredictionDataExport after the export finishes. This method is blocking.
- Parameters
- deployment_id: str
The ID of the deployment.
- start: Union[datetime, str]
Inclusive start of the time range.
- end: Union[datetime, str]
Exclusive end of the time range.
- model_id: Optional[str]
The ID of the model.
- batch_ids: Optional[List[str]]
IDs of batches to export. Null for real-time data exports.
- max_wait: int,
Seconds to wait for successful resolution.
- Returns
- prediction_data_export: PredictionDataExport
A prediction data export.
Examples
from datetime import datetime, timedelta

from datarobot.models.deployment import PredictionDataExport

now = datetime.now()
prediction_data_export = PredictionDataExport.create(
    deployment_id='5c939e08962d741e34f609f0', start=now - timedelta(days=7), end=now
)
- Return type
- fetch_data()¶
Return data from prediction export as datarobot Dataset.
- Returns
- prediction_datasets: List[Dataset]
List of datasets for a given export, most often it is just one.
Examples
from datarobot.models.deployment import PredictionDataExport

prediction_data_export = PredictionDataExport.get(
    deployment_id='5c939e08962d741e34f609f0', export_id='65fbe59aaa3f847bd5acc75b'
)
prediction_datasets = prediction_data_export.fetch_data()
- Return type
List
[Dataset
]
- class datarobot.models.deployment.data_exports.ActualsDataExport(id, period, created_at, model_id, status, data=None, error=None, only_matched_predictions=None, deployment_id=None)¶
An actuals data export.
New in version v3.4.
- Attributes
- id: str
The ID of the actuals data export.
- model_id: str
The ID of the model (or null if not specified).
- created_at: datetime
Actuals data export creation timestamp.
- period: Period
An actuals data time range definition.
- status: ExportStatus
A data export processing state.
- error: ExportError
Error description, appears when actuals data export job failed (status is FAILED).
- only_matched_predictions: bool
If true, exports actuals with matching predictions only.
- deployment_id: str
The ID of the deployment.
- classmethod list(deployment_id, status=None, offset=0, limit=100)¶
Retrieve a list of actuals data exports.
- Parameters
- deployment_id: str
The ID of the deployment.
- status: Optional[ExportStatus]
Actuals data export processing state.
- limit: Optional[int]
The maximum number of objects to return. The default is 100 (0 means no limit).
- offset: Optional[int]
The starting offset of the results. The default is 0.
- Returns
- actuals_data_exports: list
A list of ActualsDataExport objects.
Examples
from datarobot.models.deployment import ActualsDataExport

actuals_data_exports = ActualsDataExport.list(deployment_id='5c939e08962d741e34f609f0')
- Return type
List
[ActualsDataExport
]
- classmethod get(deployment_id, export_id)¶
Retrieve a single actuals data export.
- Parameters
- deployment_id: str
The ID of the deployment.
- export_id: str
The ID of the actuals data export.
- Returns
- actuals_data_export: ActualsDataExport
An actuals data export.
Examples
from datarobot.models.deployment import ActualsDataExport

actuals_data_export = ActualsDataExport.get(
    deployment_id='5c939e08962d741e34f609f0', export_id='65fb0a6c9bb187781cfdea36'
)
- Return type
- classmethod create(deployment_id, start, end, model_id=None, only_matched_predictions=None, max_wait=600)¶
Create a deployment actuals data export. Waits until ready and fetches ActualsDataExport after the export finishes. This method is blocking.
- Parameters
- deployment_id: str
The ID of the deployment.
- start: Union[datetime, str]
Inclusive start of the time range.
- end: Union[datetime, str]
Exclusive end of the time range.
- model_id: Optional[str]
The ID of the model.
- only_matched_predictions: Optional[bool]
If true, exports actuals with matching predictions only.
- max_wait: int
Seconds to wait for successful resolution.
- Returns
- actuals_data_export: ActualsDataExport
An actuals data export.
Examples
from datetime import datetime, timedelta

from datarobot.models.deployment import ActualsDataExport

now = datetime.now()
actuals_data_export = ActualsDataExport.create(
    deployment_id='5c939e08962d741e34f609f0', start=now - timedelta(days=7), end=now
)
- Return type
- fetch_data()¶
Return data from actuals export as datarobot Dataset.
- Returns
- actuals_datasets: List[Dataset]
List of datasets for a given export, most often it is just one.
Examples
from datarobot.models.deployment import ActualsDataExport

actuals_data_export = ActualsDataExport.get(
    deployment_id='5c939e08962d741e34f609f0', export_id='65fb0a6c9bb187781cfdea36'
)
actuals_datasets = actuals_data_export.fetch_data()
- Return type
List
[Dataset
]
- class datarobot.models.deployment.data_exports.TrainingDataExport(id, created_at, model_id, model_package_id, data=None, deployment_id=None)¶
A training data export.
New in version v3.4.
- Attributes
- id: str
The ID of the training data export.
- model_id: str
The ID of the model (or null if not specified).
- model_package_id: str
The ID of the model package.
- created_at: datetime
Training data export creation timestamp.
- deployment_id: str
The ID of the deployment.
- classmethod list(deployment_id)¶
Retrieve a list of successful training data exports.
- Parameters
- deployment_id: str
The ID of the deployment.
- Returns
- training_data_exports: list
A list of TrainingDataExport objects.
Examples
from datarobot.models.deployment import TrainingDataExport

training_data_exports = TrainingDataExport.list(deployment_id='5c939e08962d741e34f609f0')
- Return type
List
[TrainingDataExport
]
- classmethod get(deployment_id, export_id)¶
Retrieve a single training data export.
- Parameters
- deployment_id: str
The ID of the deployment.
- export_id: str
The ID of the training data export.
- Returns
- training_data_export: TrainingDataExport
A training data export.
Examples
from datarobot.models.deployment import TrainingDataExport

training_data_export = TrainingDataExport.get(
    deployment_id='5c939e08962d741e34f609f0', export_id='65fbf2356124f1daa3acc522'
)
- Return type
- classmethod create(deployment_id, model_id=None, max_wait=600)¶
Create a single training data export. Waits until ready and fetches TrainingDataExport after the export finishes. This method is blocking.
- Parameters
- deployment_id: str
The ID of the deployment.
- model_id: Optional[str]
The ID of the model.
- max_wait: int
Seconds to wait for successful resolution.
- Returns
- dataset_id: str
A created dataset with training data.
Examples
from datarobot.models.deployment import TrainingDataExport

dataset_id = TrainingDataExport.create(deployment_id='5c939e08962d741e34f609f0')
- Return type
str
- fetch_data()¶
Return data from training data export as datarobot Dataset.
- Returns
- training_dataset: Dataset
A dataset for a given export.
Examples
from datarobot.models.deployment import TrainingDataExport

training_data_export = TrainingDataExport.get(
    deployment_id='5c939e08962d741e34f609f0', export_id='65fbf2356124f1daa3acc522'
)
training_data_export = training_data_export.fetch_data()
- Return type
Data Store¶
- class datarobot.models.data_store.TestResponse¶
- class datarobot.models.data_store.SchemasResponse¶
- class datarobot.models.data_store.TablesResponse¶
Datetime Trend Plots¶
- class datarobot.models.datetime_trend_plots.AccuracyOverTimePlotsMetadata(project_id, model_id, forecast_distance, resolutions, backtest_metadata, holdout_metadata, backtest_statuses, holdout_statuses)¶
Accuracy over Time metadata for datetime model.
New in version v2.25.
Notes
Backtest/holdout status is a dict containing the following:
- training: string
Status of the backtest/holdout training. One of datarobot.enums.DATETIME_TREND_PLOTS_STATUS
- validation: string
Status of the backtest/holdout validation. One of datarobot.enums.DATETIME_TREND_PLOTS_STATUS
Backtest/holdout metadata is a dict containing the following:
- training: dict
Start and end dates for the backtest/holdout training.
- validation: dict
Start and end dates for the backtest/holdout validation.
Each dict in the training and validation in backtest/holdout metadata is structured like:
- start_date: datetime.datetime or None
The datetime of the start of the chart data (inclusive). None if chart data is not computed.
- end_date: datetime.datetime or None
The datetime of the end of the chart data (exclusive). None if chart data is not computed.
- Attributes
- project_id: string
The project ID.
- model_id: string
The model ID.
- forecast_distance: int or None
The forecast distance for which the metadata was retrieved. None for OTV projects.
- resolutions: list of string
A list of datarobot.enums.DATETIME_TREND_PLOTS_RESOLUTION, which represents available time resolutions for which plots can be retrieved.
- backtest_metadata: list of dict
List of backtest metadata dicts. The list index of metadata dict is the backtest index. See backtest/holdout metadata info in Notes for more details.
- holdout_metadata: dict
Holdout metadata dict. See backtest/holdout metadata info in Notes for more details.
- backtest_statuses: list of dict
List of backtest statuses dict. The list index of status dict is the backtest index. See backtest/holdout status info in Notes for more details.
- holdout_statuses: dict
Holdout status dict. See backtest/holdout status info in Notes for more details.
- class datarobot.models.datetime_trend_plots.AccuracyOverTimePlot(project_id, model_id, start_date, end_date, resolution, bins, statistics, calendar_events)¶
Accuracy over Time plot for datetime model.
New in version v2.25.
Notes
Bin is a dict containing the following:
- start_date: datetime.datetime
The datetime of the start of the bin (inclusive).
- end_date: datetime.datetime
The datetime of the end of the bin (exclusive).
- actual: float or None
Average actual value of the target in the bin. None if there are no entries in the bin.
- predicted: float or None
Average prediction of the model in the bin. None if there are no entries in the bin.
- frequency: int or None
Indicates number of values averaged in bin.
Statistics is a dict containing the following:
- durbin_watson: float or None
The Durbin-Watson statistic for the chart data. Value is between 0 and 4. Durbin-Watson statistic is a test statistic used to detect the presence of autocorrelation at lag 1 in the residuals (prediction errors) from a regression analysis. More info https://wikipedia.org/wiki/Durbin%E2%80%93Watson_statistic
Calendar event is a dict containing the following:
- name: string
Name of the calendar event.
- date: datetime
Date of the calendar event.
- series_id: string or None
The series ID for the event. If this event does not specify a series ID, then this will be None, indicating that the event applies to all series.
- Attributes
- project_id: string
The project ID.
- model_id: string
The model ID.
- resolution: string
The resolution that is used for binning. One of datarobot.enums.DATETIME_TREND_PLOTS_RESOLUTION
- start_date: datetime.datetime
The datetime of the start of the chart data (inclusive).
- end_date: datetime.datetime
The datetime of the end of the chart data (exclusive).
- bins: list of dict
List of plot bins. See bin info in Notes for more details.
- statistics: dict
Statistics for plot. See statistics info in Notes for more details.
- calendar_events: list of dict
List of calendar events for the plot. See calendar events info in Notes for more details.
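As a sketch of working with the bin structure documented above, assuming an AccuracyOverTimePlot instance has already been retrieved elsewhere:

import pandas as pd

def accuracy_bins_to_frame(plot):
    # Flatten plot.bins (start_date, end_date, actual, predicted, frequency) into a DataFrame.
    df = pd.DataFrame(plot.bins)
    df['abs_error'] = (df['actual'] - df['predicted']).abs()
    return df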
- class datarobot.models.datetime_trend_plots.AccuracyOverTimePlotPreview(project_id, model_id, start_date, end_date, bins)¶
Accuracy over Time plot preview for datetime model.
New in version v2.25.
Notes
Bin is a dict containing the following:
- start_date: datetime.datetime
The datetime of the start of the bin (inclusive).
- end_date: datetime.datetime
The datetime of the end of the bin (exclusive).
- actual: float or None
Average actual value of the target in the bin. None if there are no entries in the bin.
- predicted: float or None
Average prediction of the model in the bin. None if there are no entries in the bin.
- Attributes
- project_id: string
The project ID.
- model_id: string
The model ID.
- start_date: datetime.datetime
The datetime of the start of the chart data (inclusive).
- end_date: datetime.datetime
The datetime of the end of the chart data (exclusive).
- bins: list of dict
List of plot bins. See bin info in Notes for more details.
- class datarobot.models.datetime_trend_plots.ForecastVsActualPlotsMetadata(project_id, model_id, resolutions, backtest_metadata, holdout_metadata, backtest_statuses, holdout_statuses)¶
Forecast vs Actual plots metadata for datetime model.
New in version v2.25.
Notes
Backtest/holdout status is a dict containing the following:
- training: dict
Dict containing each of datarobot.enums.DATETIME_TREND_PLOTS_STATUS as a dict key, and a list of forecast distances for that status as the dict value.
- validation: dict
Dict containing each of datarobot.enums.DATETIME_TREND_PLOTS_STATUS as a dict key, and a list of forecast distances for that status as the dict value.
Backtest/holdout metadata is a dict containing the following:
- training: dict
Start and end dates for the backtest/holdout training.
- validation: dict
Start and end dates for the backtest/holdout validation.
Each dict in the training and validation in backtest/holdout metadata is structured like:
- start_date: datetime.datetime or None
The datetime of the start of the chart data (inclusive). None if chart data is not computed.
- end_date: datetime.datetime or None
The datetime of the end of the chart data (exclusive). None if chart data is not computed.
- Attributes
- project_id: string
The project ID.
- model_id: string
The model ID.
- resolutions: list of string
A list of datarobot.enums.DATETIME_TREND_PLOTS_RESOLUTION, which represents available time resolutions for which plots can be retrieved.
- backtest_metadata: list of dict
List of backtest metadata dicts. The list index of metadata dict is the backtest index. See backtest/holdout metadata info in Notes for more details.
- holdout_metadata: dict
Holdout metadata dict. See backtest/holdout metadata info in Notes for more details.
- backtest_statuses: list of dict
List of backtest statuses dict. The list index of status dict is the backtest index. See backtest/holdout status info in Notes for more details.
- holdout_statuses: dict
Holdout status dict. See backtest/holdout status info in Notes for more details.
- class datarobot.models.datetime_trend_plots.ForecastVsActualPlot(project_id, model_id, forecast_distances, start_date, end_date, resolution, bins, calendar_events)¶
Forecast vs Actual plot for datetime model.
New in version v2.25.
Notes
Bin is a dict containing the following:
- start_date: datetime.datetime
The datetime of the start of the bin (inclusive).
- end_date: datetime.datetime
The datetime of the end of the bin (exclusive).
- actual: float or None
Average actual value of the target in the bin. None if there are no entries in the bin.
- forecasts: list of float
A list of average forecasts for the model for each forecast distance. Empty if there are no forecasts in the bin. Each index in the forecasts list maps to forecastDistances list index.
- error: float or None
Average absolute residual value of the bin. None if there are no entries in the bin.
- normalized_error: float or None
Normalized average absolute residual value of the bin. None if there are no entries in the bin.
- frequency: int or None
Indicates number of values averaged in bin.
Calendar event is a dict containing the following:
- name: string
Name of the calendar event.
- date: datetime
Date of the calendar event.
- series_id: string or None
The series ID for the event. If this event does not specify a series ID, then this will be None, indicating that the event applies to all series.
- Attributes
- project_id: string
The project ID.
- model_id: string
The model ID.
- forecast_distances: list of int
A list of forecast distances that were retrieved.
- resolution: string
The resolution that is used for binning. One of datarobot.enums.DATETIME_TREND_PLOTS_RESOLUTION
- start_date: datetime.datetime
The datetime of the start of the chart data (inclusive).
- end_date: datetime.datetime
The datetime of the end of the chart data (exclusive).
- bins: list of dict
List of plot bins. See bin info in Notes for more details.
- calendar_events: list of dict
List of calendar events for the plot. See calendar events info in Notes for more details.
- class datarobot.models.datetime_trend_plots.ForecastVsActualPlotPreview(project_id, model_id, start_date, end_date, bins)¶
Forecast vs Actual plot preview for datetime model.
New in version v2.25.
Notes
Bin is a dict containing the following:
- start_date: datetime.datetime
The datetime of the start of the bin (inclusive).
- end_date: datetime.datetime
The datetime of the end of the bin (exclusive).
- actual: float or None
Average actual value of the target in the bin. None if there are no entries in the bin.
- predicted: float or None
Average prediction of the model in the bin. None if there are no entries in the bin.
- Attributes
- project_id: string
The project ID.
- model_id: string
The model ID.
- start_date: datetime.datetime
The datetime of the start of the chart data (inclusive).
- end_date: datetime.datetime
The datetime of the end of the chart data (exclusive).
- bins: list of dict
List of plot bins. See bin info in Notes for more details.
- class datarobot.models.datetime_trend_plots.AnomalyOverTimePlotsMetadata(project_id, model_id, resolutions, backtest_metadata, holdout_metadata, backtest_statuses, holdout_statuses)¶
Anomaly over Time metadata for datetime model.
New in version v2.25.
Notes
Backtest/holdout status is a dict containing the following:
- training: string
Status of the backtest/holdout training. One of datarobot.enums.DATETIME_TREND_PLOTS_STATUS
- validation: string
Status of the backtest/holdout validation. One of datarobot.enums.DATETIME_TREND_PLOTS_STATUS
Backtest/holdout metadata is a dict containing the following:
- training: dict
Start and end dates for the backtest/holdout training.
- validation: dict
Start and end dates for the backtest/holdout validation.
Each dict in the training and validation in backtest/holdout metadata is structured like:
- start_date: datetime.datetime or None
The datetime of the start of the chart data (inclusive). None if chart data is not computed.
- end_date: datetime.datetime or None
The datetime of the end of the chart data (exclusive). None if chart data is not computed.
- Attributes
- project_id: string
The project ID.
- model_id: string
The model ID.
- resolutions: list of string
A list of datarobot.enums.DATETIME_TREND_PLOTS_RESOLUTION, which represents available time resolutions for which plots can be retrieved.
- backtest_metadata: list of dict
List of backtest metadata dicts. The list index of metadata dict is the backtest index. See backtest/holdout metadata info in Notes for more details.
- holdout_metadata: dict
Holdout metadata dict. See backtest/holdout metadata info in Notes for more details.
- backtest_statuses: list of dict
List of backtest statuses dict. The list index of status dict is the backtest index. See backtest/holdout status info in Notes for more details.
- holdout_statuses: dict
Holdout status dict. See backtest/holdout status info in Notes for more details.
- class datarobot.models.datetime_trend_plots.AnomalyOverTimePlot(project_id, model_id, start_date, end_date, resolution, bins, calendar_events)¶
Anomaly over Time plot for datetime model.
New in version v2.25.
Notes
Bin is a dict containing the following:
- start_date: datetime.datetime
The datetime of the start of the bin (inclusive).
- end_date: datetime.datetime
The datetime of the end of the bin (exclusive).
- predicted: float or None
Average prediction of the model in the bin. None if there are no entries in the bin.
- frequency: int or None
Indicates number of values averaged in bin.
Calendar event is a dict containing the following:
- name: string
Name of the calendar event.
- date: datetime
Date of the calendar event.
- series_id: string or None
The series ID for the event. If this event does not specify a series ID, then this will be None, indicating that the event applies to all series.
- Attributes
- project_id: string
The project ID.
- model_id: string
The model ID.
- resolution: string
The resolution that is used for binning. One of datarobot.enums.DATETIME_TREND_PLOTS_RESOLUTION
- start_date: datetime.datetime
The datetime of the start of the chart data (inclusive).
- end_date: datetime.datetime
The datetime of the end of the chart data (exclusive).
- bins: list of dict
List of plot bins. See bin info in Notes for more details.
- calendar_events: list of dict
List of calendar events for the plot. See calendar events info in Notes for more details.
- class datarobot.models.datetime_trend_plots.AnomalyOverTimePlotPreview(project_id, model_id, prediction_threshold, start_date, end_date, bins)¶
Anomaly over Time plot preview for datetime model.
New in version v2.25.
Notes
Bin is a dict containing the following:
- start_date: datetime.datetime
The datetime of the start of the bin (inclusive).
- end_date: datetime.datetime
The datetime of the end of the bin (exclusive).
- Attributes
- project_id: string
The project ID.
- model_id: string
The model ID.
- prediction_threshold: float
Only bins with predictions exceeding this threshold are returned in the response.
- start_date: datetime.datetime
The datetime of the start of the chart data (inclusive).
- end_date: datetime.datetime
The datetime of the end of the chart data (exclusive).
- bins: list of dict
List of plot bins. See bin info in Notes for more details.
Deployment¶
- class datarobot.models.Deployment(id, label=None, description=None, status=None, default_prediction_server=None, model=None, model_package=None, capabilities=None, prediction_usage=None, permissions=None, service_health=None, model_health=None, accuracy_health=None, importance=None, fairness_health=None, governance=None, owners=None, prediction_environment=None)¶
A deployment created from a DataRobot model.
- Attributes
- idstr
the id of the deployment
- labelstr
the label of the deployment
- descriptionstr
the description of the deployment
- statusstr
(New in version v2.29) deployment status
- default_prediction_serverdict
Information about the default prediction server for the deployment. Accepts the following values:
id: str. Prediction server ID.
url: str, optional. Prediction server URL.
datarobot-key: str. Corresponds to the PredictionServer’s “snake_cased” datarobot_key parameter that allows you to verify and access the prediction server.
- importancestr, optional
deployment importance
- modeldict
information on the model of the deployment
- model_packagedict
(New in version v3.4) information on the model package of the deployment
- capabilitiesdict
information on the capabilities of the deployment
- prediction_usagedict
information on the prediction usage of the deployment
- permissionslist
(New in version v2.18) user’s permissions on the deployment
- service_healthdict
information on the service health of the deployment
- model_healthdict
information on the model health of the deployment
- accuracy_healthdict
information on the accuracy health of the deployment
- fairness_healthdict
information on the fairness health of a deployment
- governancedict
information on approval and change requests of a deployment
- ownersdict
information on the owners of a deployment
- prediction_environmentdict
information on the prediction environment of a deployment
- classmethod create_from_learning_model(model_id, label, description=None, default_prediction_server_id=None, importance=None, prediction_threshold=None, status=None, max_wait=600)¶
Create a deployment from a DataRobot model.
New in version v2.17.
- Parameters
- model_idstr
id of the DataRobot model to deploy
- labelstr
a human-readable label of the deployment
- descriptionstr, optional
a human-readable description of the deployment
- default_prediction_server_idstr, optional
an identifier of a prediction server to be used as the default prediction server
- importancestr, optional
deployment importance
- prediction_thresholdfloat, optional
threshold used for binary classification in predictions
- statusstr, optional
deployment status
- max_wait: int, optional
Seconds to wait for successful resolution of a deployment creation job. Deployment supports making predictions only after a deployment creating job has successfully finished.
- Returns
- deploymentDeployment
The created deployment
Examples
from datarobot import Project, Deployment

project = Project.get('5506fcd38bd88f5953219da0')
model = project.get_models()[0]
deployment = Deployment.create_from_learning_model(model.id, 'New Deployment')
deployment
>>> Deployment('New Deployment')
- Return type
TypeVar
(TDeployment
, bound=Deployment
)
- classmethod create_from_leaderboard(model_id, label, description=None, default_prediction_server_id=None, importance=None, prediction_threshold=None, status=None, max_wait=600)¶
Create a deployment from a Leaderboard.
New in version v2.17.
- Parameters
- model_idstr
id of the Leaderboard to deploy
- labelstr
a human-readable label of the deployment
- descriptionstr, optional
a human-readable description of the deployment
- default_prediction_server_idstr, optional
an identifier of a prediction server to be used as the default prediction server
- importancestr, optional
deployment importance
- prediction_thresholdfloat, optional
threshold used for binary classification in predictions
- statusstr, optional
deployment status
- max_waitint, optional
The number of seconds to wait for successful resolution of a deployment creation job. Deployment supports making predictions only after a deployment creation job has successfully finished.
- Returns
- deploymentDeployment
The created deployment
Examples
from datarobot import Project, Deployment
project = Project.get('5506fcd38bd88f5953219da0')
model = project.get_models()[0]
deployment = Deployment.create_from_leaderboard(model.id, 'New Deployment')
deployment
>>> Deployment('New Deployment')
- Return type
TypeVar
(TDeployment
, bound=Deployment
)
- classmethod create_from_custom_model_version(custom_model_version_id, label, description=None, default_prediction_server_id=None, max_wait=600, importance=None)¶
Create a deployment from a DataRobot custom model image.
- Parameters
- custom_model_version_idstr
The ID of the DataRobot custom model version to deploy. The version must have a base_environment_id.
- labelstr
A label of the deployment.
- descriptionstr, optional
A description of the deployment.
- default_prediction_server_idstr
An identifier of a prediction server to be used as the default prediction server. Required for SaaS users and optional for Self-Managed users.
- max_waitint, optional
Seconds to wait for successful resolution of a deployment creation job. Deployment supports making predictions only after a deployment creation job has successfully finished.
- importancestr, optional
Deployment importance level.
- Returns
- deploymentDeployment
The created deployment
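Examples
The following is an illustrative sketch based on the parameters documented above; the IDs shown are placeholders.
from datarobot import Deployment
deployment = Deployment.create_from_custom_model_version(
    custom_model_version_id='<CUSTOM_MODEL_VERSION_ID>',
    label='New Custom Model Deployment',
    default_prediction_server_id='<PREDICTION_SERVER_ID>',  # required for SaaS users
)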
- Return type
TypeVar
(TDeployment
, bound=Deployment
)
- classmethod create_from_registered_model_version(model_package_id, label, description=None, default_prediction_server_id=None, prediction_environment_id=None, importance=None, user_provided_id=None, additional_metadata=None, max_wait=600)¶
Create a deployment from a DataRobot model package (version).
- Parameters
- model_package_idstr
The ID of the DataRobot model package (version) to deploy.
- labelstr
A human readable label of the deployment.
- descriptionstr, optional
A human readable description of the deployment.
- default_prediction_server_idstr, optional
An identifier of a prediction server to be used as the default prediction server. When working with prediction environments, the default prediction server ID should not be provided.
- prediction_environment_idstr, optional
An identifier of a prediction environment to be used for model deployment.
- importancestr, optional
Deployment importance level.
- user_provided_idstr, optional
A user-provided unique ID associated with a deployment definition in a remote git repository.
- additional_metadatadict, optional
A key/value pair dict with additional metadata.
- max_waitint, optional
The number of seconds to wait for successful resolution of a deployment creation job. Deployment supports making predictions only after a deployment creation job has successfully finished.
- Returns
- deploymentDeployment
The created deployment
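Examples
An illustrative sketch based on the parameters documented above; the IDs shown are placeholders.
from datarobot import Deployment
deployment = Deployment.create_from_registered_model_version(
    model_package_id='<REGISTERED_MODEL_VERSION_ID>',
    label='Deployment from Registered Model Version',
    prediction_environment_id='<PREDICTION_ENVIRONMENT_ID>',
)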
- Return type
TypeVar
(TDeployment
, bound=Deployment
)
- classmethod list(order_by=None, search=None, filters=None)¶
List all deployments a user can view.
New in version v2.17.
- Parameters
- order_bystr, optional
(New in version v2.18) the order to sort the deployment list by, defaults to label
Allowed attributes to sort by are:
label
serviceHealth
modelHealth
accuracyHealth
recentPredictions
lastPredictionTimestamp
If the sort attribute is preceded by a hyphen, deployments will be sorted in descending order, otherwise in ascending order.
For health related sorting, ascending means failing, warning, passing, unknown.
- searchstr, optional
(New in version v2.18) case insensitive search against deployment’s label and description.
- filtersdatarobot.models.deployment.DeploymentListFilters, optional
(New in version v2.20) an object containing all filters that you’d like to apply to the resulting list of deployments. See
DeploymentListFilters
for details on usage.
- Returns
- deploymentslist
a list of deployments the user can view
Examples
from datarobot import Deployment
deployments = Deployment.list()
deployments
>>> [Deployment('New Deployment'), Deployment('Previous Deployment')]
from datarobot import Deployment
from datarobot.models.deployment import DeploymentListFilters
from datarobot.enums import DEPLOYMENT_SERVICE_HEALTH_STATUS
filters = DeploymentListFilters(
    role='OWNER',
    service_health=[DEPLOYMENT_SERVICE_HEALTH_STATUS.FAILING]
)
filtered_deployments = Deployment.list(filters=filters)
filtered_deployments
>>> [Deployment('Deployment I Own w/ Failing Service Health')]
- Return type
List
[TypeVar
(TDeployment
, bound=Deployment
)]
- classmethod get(deployment_id)¶
Get information about a deployment.
New in version v2.17.
- Parameters
- deployment_idstr
the id of the deployment
- Returns
- deploymentDeployment
the queried deployment
Examples
from datarobot import Deployment
deployment = Deployment.get(deployment_id='5c939e08962d741e34f609f0')
deployment.id
>>>'5c939e08962d741e34f609f0'
deployment.label
>>>'New Deployment'
- Return type
TypeVar
(TDeployment
, bound=Deployment
)
- predict_batch(source, passthrough_columns=None, download_timeout=None, download_read_timeout=None, upload_read_timeout=None)¶
A convenience method for making predictions with csv file or pandas DataFrame using a batch prediction job.
For advanced usage, use datarobot.models.BatchPredictionJob directly.
New in version v3.0.
- Parameters
- source: str, pd.DataFrame or file object
Pass a filepath, file, or DataFrame for making batch predictions.
- passthrough_columnslist[string] (optional)
Keep these columns from the scoring dataset in the scored dataset. This is useful for correlating predictions with source data.
- download_timeout: int, optional
Wait this many seconds for the download to become available. See datarobot.models.BatchPredictionJob.score().
- download_read_timeout: int, optional
Wait this many seconds for the server to respond between chunks. See datarobot.models.BatchPredictionJob.score().
- upload_read_timeout: int, optional
Wait this many seconds for the server to respond after a whole dataset upload. See datarobot.models.BatchPredictionJob.score().
- Returns
- pd.DataFrame
Prediction results in a pandas DataFrame.
- Raises
- InvalidUsageError
If the source parameter cannot be determined to be a filepath, file, or DataFrame.
Examples
from datarobot.models.deployment import Deployment
deployment = Deployment.get("<MY_DEPLOYMENT_ID>")
prediction_results_as_dataframe = deployment.predict_batch(
    source="./my_local_file.csv",
)
- Return type
DataFrame
- get_uri()¶
- Returns
- urlstr
Deployment’s overview URI
- Return type
str
- update(label=None, description=None, importance=None)¶
Update the label and description of this deployment.
New in version v2.19.
- Return type
None
- delete()¶
Delete this deployment.
New in version v2.17.
- Return type
None
- activate(max_wait=600)¶
Activate this deployment. When successful, the deployment status becomes active.
New in version v2.29.
- Parameters
- max_waitint, optional
The maximum time to wait for deployment activation to complete before erroring
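Examples
An illustrative sketch; the deployment ID is a placeholder.
from datarobot import Deployment
deployment = Deployment.get(deployment_id='5c939e08962d741e34f609f0')
deployment.activate(max_wait=600)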
- Return type
None
- deactivate(max_wait=600)¶
Deactivate this deployment. When successful, the deployment status becomes inactive.
New in version v2.29.
- Parameters
- max_waitint, optional
The maximum time to wait for deployment deactivation to complete before erroring
- Return type
None
- replace_model(new_model_id, reason, max_wait=600, new_registered_model_version_id=None)¶
- Replace the model used in this deployment. To confirm model replacement eligibility, use
validate_replacement_model()
beforehand.
New in version v2.17.
Model replacement is an asynchronous process, which means some preparatory work may be performed after the initial request is completed. This function will not return until all preparatory work is fully finished.
Predictions made against this deployment will start using the new model as soon as the request is completed. There will be no interruption for predictions throughout the process.
- Parameters
- new_model_idOptional[str]
The id of the new model to use. If replacing the deployment’s model with a CustomInferenceModel, a specific CustomModelVersion ID must be used. If None, new_registered_model_version_id must be specified.
- reasonMODEL_REPLACEMENT_REASON
The reason for the model replacement. Must be one of ‘ACCURACY’, ‘DATA_DRIFT’, ‘ERRORS’, ‘SCHEDULED_REFRESH’, ‘SCORING_SPEED’, or ‘OTHER’. This value will be stored in the model history to keep track of why a model was replaced
- max_waitint, optional
(new in version 2.22) The maximum time to wait for model replacement job to complete before erroring
- new_registered_model_version_idOptional[str]
(new in version 3.4) The registered model version (model package) ID of the new model to use. Must be passed if new_model_id is None.
Examples
from datarobot import Deployment
from datarobot.enums import MODEL_REPLACEMENT_REASON
deployment = Deployment.get(deployment_id='5c939e08962d741e34f609f0')
deployment.model['id'], deployment.model['type']
>>>('5c0a979859b00004ba52e431', 'Decision Tree Classifier (Gini)')
deployment.replace_model('5c0a969859b00004ba52e41b', MODEL_REPLACEMENT_REASON.ACCURACY)
deployment.model['id'], deployment.model['type']
>>>('5c0a969859b00004ba52e41b', 'Support Vector Classifier (Linear Kernel)')
- Return type
None
- perform_model_replace(new_registered_model_version_id, reason, max_wait=600)¶
- Replace the model used in this deployment. To confirm model replacement eligibility, use
validate_replacement_model()
beforehand.
New in version v3.4.
Model replacement is an asynchronous process, which means some preparatory work may be performed after the initial request is completed. This function will not return until all preparatory work is fully finished.
Predictions made against this deployment will start using the new model as soon as the request is completed. There will be no interruption for predictions throughout the process.
- Parameters
- new_registered_model_version_idstr
The registered model version (model package) ID of the new model to use.
- reasonMODEL_REPLACEMENT_REASON
The reason for the model replacement. Must be one of ‘ACCURACY’, ‘DATA_DRIFT’, ‘ERRORS’, ‘SCHEDULED_REFRESH’, ‘SCORING_SPEED’, or ‘OTHER’. This value will be stored in the model history to keep track of why a model was replaced
- max_waitint, optional
The maximum time to wait for model replacement job to complete before erroring
Examples
from datarobot import Deployment
from datarobot.enums import MODEL_REPLACEMENT_REASON
deployment = Deployment.get(deployment_id='5c939e08962d741e34f609f0')
deployment.model_package['id']
>>>'5c0a979859b00004ba52e431'
deployment.perform_model_replace('5c0a969859b00004ba52e41b', MODEL_REPLACEMENT_REASON.ACCURACY)
deployment.model_package['id']
>>>'5c0a969859b00004ba52e41b'
- Return type
None
- validate_replacement_model(new_model_id=None, new_registered_model_version_id=None)¶
Validate a model can be used as the replacement model of the deployment.
New in version v2.17.
- Parameters
- new_model_idOptional[str]
the id of the new model to validate
- new_registered_model_version_idOptional[str]
(new in version 3.4) The registered model version (model package) ID of the new model to use.
- Returns
- statusstr
status of the validation; will be one of ‘passing’, ‘warning’ or ‘failing’. If the status is passing or warning, use
replace_model()
to perform a model replacement. If the status is failing, refer to checks for more detail on why the new model cannot be used as a replacement.
- messagestr
message for the validation result
- checksdict
explain why the new model can or cannot replace the deployment’s current model
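Examples
An illustrative sketch based on the documented return values; the IDs are placeholders.
from datarobot import Deployment
from datarobot.enums import MODEL_REPLACEMENT_REASON
deployment = Deployment.get(deployment_id='5c939e08962d741e34f609f0')
status, message, checks = deployment.validate_replacement_model(new_model_id='5c0a969859b00004ba52e41b')
if status in ('passing', 'warning'):
    deployment.replace_model('5c0a969859b00004ba52e41b', MODEL_REPLACEMENT_REASON.ACCURACY)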
- Return type
Tuple
[str
,str
,Dict
[str
,Any
]]
- get_features()¶
Retrieve the list of features needed to make predictions on this deployment.
- Returns
- features: list
a list of feature dict
Notes
Each feature dict contains the following structure:
name: str, feature name
feature_type: str, feature type
importance: float, numeric measure of the relationship strength between the feature and target (independent of model or other features)
date_format: str or None, the date format string for how this feature was interpreted, null if not a date feature, compatible with https://docs.python.org/2/library/time.html#time.strftime
known_in_advance: bool, whether the feature was selected as known in advance in a time series model, false for non-time series models.
Examples
from datarobot import Deployment
deployment = Deployment.get(deployment_id='5c939e08962d741e34f609f0')
features = deployment.get_features()
features[0]['feature_type']
>>>'Categorical'
features[0]['importance']
>>>0.133
- Return type
List
[FeatureDict
]
- submit_actuals(data, batch_size=10000)¶
Submit actuals for processing. The actuals submitted will be used to calculate accuracy metrics.
- Parameters
- data: list or pandas.DataFrame
If data is a list, each item should be a dict-like object with the following keys and values; if data is a pandas.DataFrame, it should contain the following columns:
- association_id: str, a unique identifier used with a prediction, max length 128 characters
- actual_value: str or int or float, the actual value of a prediction; should be numeric for deployments with regression models or string for deployments with classification models
- was_acted_on: bool, optional, indicates if the prediction was acted on in a way that could have affected the actual outcome
- timestamp: datetime or string in RFC3339 format, optional. If the datetime provided does not have a timezone, we assume it is UTC.
- batch_size: int, optional
the max number of actuals in each request
- Raises
- ValueError
if input data is not a list of dict-like objects or a pandas.DataFrame, or if input data is empty
Examples
from datarobot import Deployment, AccuracyOverTime
deployment = Deployment.get(deployment_id='5c939e08962d741e34f609f0')
data = [{
    'association_id': '439917',
    'actual_value': 'True',
    'was_acted_on': True
}]
deployment.submit_actuals(data)
- Return type
None
- submit_actuals_from_catalog_async(dataset_id, actual_value_column, association_id_column, dataset_version_id=None, timestamp_column=None, was_acted_on_column=None)¶
Submit actuals from AI Catalog for processing. The actuals submitted will be used to calculate accuracy metrics.
- Parameters
- dataset_id: str,
The ID of the source dataset.
- dataset_version_id: str, optional
The ID of the dataset version to apply the query to. If not specified, the latest version associated with dataset_id is used.
- association_id_column: str,
The name of the column that contains a unique identifier used with a prediction.
- actual_value_column: str,
The name of the column that contains the actual value of a prediction.
- was_acted_on_column: str, optional,
The name of the column that indicates if the prediction was acted on in a way that could have affected the actual outcome.
- timestamp_column: str, optional,
The name of the column that contains datetime or string in RFC3339 format.
- Returns
- status_check_jobStatusCheckJob
Object contains all needed logic for a periodical status check of an async job.
- Raises
- ValueError
if dataset_id, actual_value_column, or association_id_column is not provided
Examples
from datarobot import Deployment
deployment = Deployment.get(deployment_id='5c939e08962d741e34f609f0')
status_check_job = deployment.submit_actuals_from_catalog_async(
    dataset_id='<DATASET_ID>',
    actual_value_column='<ACTUAL_VALUE_COLUMN>',
    association_id_column='<ASSOCIATION_ID_COLUMN>',
)
- Return type
- get_predictions_by_forecast_date_settings()¶
Retrieve predictions by forecast date settings of this deployment.
New in version v2.27.
- Returns
- settingsdict
Predictions by forecast date settings of the deployment is a dict with the following format:
- enabledbool
Is True if predictions by forecast date is enabled for this deployment. To update this setting, see
update_predictions_by_forecast_date_settings()
- column_namestring
The column name in prediction datasets to be used as forecast date.
- datetime_formatstring
The datetime format of the forecast date column in prediction datasets.
- Return type
- update_predictions_by_forecast_date_settings(enable_predictions_by_forecast_date, forecast_date_column_name=None, forecast_date_format=None, max_wait=600)¶
Update predictions by forecast date settings of this deployment.
New in version v2.27.
Updating predictions by forecast date setting is an asynchronous process, which means some preparatory work may be performed after the initial request is completed. This function will not return until all preparatory work is fully finished.
- Parameters
- enable_predictions_by_forecast_datebool
Set to True to turn predictions by forecast date on, or False to turn it off.
- forecast_date_column_name: string, optional
The column name in prediction datasets to be used as forecast date. If enable_predictions_by_forecast_date is set to False, this parameter is ignored.
- forecast_date_format: string, optional
The datetime format of the forecast date column in prediction datasets. If enable_predictions_by_forecast_date is set to False, this parameter is ignored.
- max_waitint, optional
seconds to wait for successful resolution
Examples
# To set predictions by forecast date settings to the same default settings you see when using
# the DataRobot web application, you use your 'Deployment' object like this:
deployment.update_predictions_by_forecast_date_settings(
    enable_predictions_by_forecast_date=True,
    forecast_date_column_name="date (actual)",
    forecast_date_format="%Y-%m-%d",
)
- Return type
None
- get_challenger_models_settings()¶
Retrieve challenger models settings of this deployment.
New in version v2.27.
- Returns
- settingsdict
Challenger models settings of the deployment is a dict with the following format:
- enabledbool
Is True if challenger models are enabled for this deployment. To update existing challenger_models settings, see
update_challenger_models_settings()
- Return type
- update_challenger_models_settings(challenger_models_enabled, max_wait=600)¶
Update challenger models settings of this deployment.
New in version v2.27.
Updating challenger models setting is an asynchronous process, which means some preparatory work may be performed after the initial request is completed. This function will not return until all preparatory work is fully finished.
- Parameters
- challenger_models_enabledbool
Set to True to turn challenger models on, or False to turn them off
- max_waitint, optional
seconds to wait for successful resolution
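Examples
An illustrative sketch; the deployment ID is a placeholder.
from datarobot import Deployment
deployment = Deployment.get(deployment_id='5c939e08962d741e34f609f0')
deployment.update_challenger_models_settings(challenger_models_enabled=True)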
- Return type
None
- get_segment_analysis_settings()¶
Retrieve segment analysis settings of this deployment.
New in version v2.27.
- Returns
- settingsdict
Segment analysis settings of the deployment containing two items with keys enabled and attributes, which are further described below.
- enabledbool
Set to True if segment analysis is enabled for this deployment. To update existing settings, see
update_segment_analysis_settings()
- attributeslist
To create or update existing segment analysis attributes, see
update_segment_analysis_settings()
- Return type
- update_segment_analysis_settings(segment_analysis_enabled, segment_analysis_attributes=None, max_wait=600)¶
Update segment analysis settings of this deployment.
New in version v2.27.
Updating segment analysis setting is an asynchronous process, which means some preparatory work may be performed after the initial request is completed. This function will not return until all preparatory work is fully finished.
- Parameters
- segment_analysis_enabledbool
Set to True to turn segment analysis on, or False to turn it off
- segment_analysis_attributes: list, optional
A list of strings that gives the segment attributes selected for tracking.
- max_waitint, optional
seconds to wait for successful resolution
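Examples
An illustrative sketch; the deployment ID and segment attribute names are placeholders.
from datarobot import Deployment
deployment = Deployment.get(deployment_id='5c939e08962d741e34f609f0')
deployment.update_segment_analysis_settings(
    segment_analysis_enabled=True,
    segment_analysis_attributes=['country', 'device_type'],
)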
- Return type
None
- get_bias_and_fairness_settings()¶
Retrieve bias and fairness settings of this deployment.
New in version v3.2.
- Returns
- settingsdict in the following format:
- protected_featuresList[str]
A list of features to mark as protected.
- preferable_target_valuebool
A target value that should be treated as a positive outcome for the prediction.
- fairness_metric_setstr
Can be one of datarobot.enums.FairnessMetricsSet. A set of fairness metrics to use for calculating fairness.
- fairness_thresholdfloat
Threshold value of the fairness metric. Cannot be less than 0 or greater than 1.
- Return type
Optional
[BiasAndFairnessSettings
]
- update_bias_and_fairness_settings(protected_features, fairness_metric_set, fairness_threshold, preferable_target_value, max_wait=600)¶
Update bias and fairness settings of this deployment.
New in version v3.2.
Updating bias and fairness setting is an asynchronous process, which means some preparatory work may be performed after the initial request is completed. This function will not return until all preparatory work is fully finished.
- Parameters
- protected_featuresList[str]
A list of features to mark as protected.
- preferable_target_valuebool
A target value that should be treated as a positive outcome for the prediction.
- fairness_metric_setstr
Can be one of datarobot.enums.FairnessMetricsSet. The fairness metric used to calculate the fairness scores.
- fairness_thresholdfloat
Threshold value of the fairness metric. Cannot be less than 0 or greater than 1.
- max_waitint, optional
seconds to wait for successful resolution
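Examples
An illustrative sketch; the deployment ID, protected feature name, and fairness metric value are placeholders and assume a binary classification deployment.
from datarobot import Deployment
deployment = Deployment.get(deployment_id='5c939e08962d741e34f609f0')
deployment.update_bias_and_fairness_settings(
    protected_features=['gender'],
    fairness_metric_set='proportionalParity',
    fairness_threshold=0.8,
    preferable_target_value=True,
)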
- Return type
None
- get_challenger_replay_settings()¶
Retrieve challenger replay settings of this deployment.
New in version v3.4.
- Returns
- settingsdict in the following format:
- enabledbool
If challenger replay is enabled. To update existing challenger_replay settings, see
update_challenger_replay_settings()
- scheduleSchedule
The recurring schedule for the challenger replay job.
- Return type
- update_challenger_replay_settings(enabled, schedule=None)¶
Update challenger replay settings of this deployment.
New in version v3.4.
- Parameters
- enabledbool
If challenger replay is enabled.
- scheduleOptional[Schedule]
The recurring schedule for the challenger replay job.
- Return type
None
- get_drift_tracking_settings()¶
Retrieve drift tracking settings of this deployment.
New in version v2.17.
- Returns
- settingsdict
Drift tracking settings of the deployment containing two nested dicts with keys target_drift and feature_drift, which are further described below.
The target_drift setting contains:
- enabledbool
If target drift tracking is enabled for this deployment. To create or update existing target_drift settings, see
update_drift_tracking_settings()
The feature_drift setting contains:
- enabledbool
If feature drift tracking is enabled for this deployment. To create or update existing feature_drift settings, see
update_drift_tracking_settings()
- Return type
- update_drift_tracking_settings(target_drift_enabled=None, feature_drift_enabled=None, max_wait=600)¶
Update drift tracking settings of this deployment.
New in version v2.17.
Updating drift tracking setting is an asynchronous process, which means some preparatory work may be performed after the initial request is completed. This function will not return until all preparatory work is fully finished.
- Parameters
- target_drift_enabledbool, optional
if target drift tracking is to be turned on
- feature_drift_enabledbool, optional
if feature drift tracking is to be turned on
- max_waitint, optional
seconds to wait for successful resolution
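Examples
An illustrative sketch; the deployment ID is a placeholder.
from datarobot import Deployment
deployment = Deployment.get(deployment_id='5c939e08962d741e34f609f0')
deployment.update_drift_tracking_settings(target_drift_enabled=True, feature_drift_enabled=True)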
- Return type
None
- get_association_id_settings()¶
Retrieve association ID setting for this deployment.
New in version v2.19.
- Returns
- association_id_settingsdict in the following format:
- column_nameslist[string], optional
name of the columns to be used as association ID,
- required_in_prediction_requestsbool, optional
whether the association ID column is required in prediction requests
- Return type
str
- update_association_id_settings(column_names=None, required_in_prediction_requests=None, max_wait=600)¶
Update association ID setting for this deployment.
New in version v2.19.
- Parameters
- column_nameslist[string], optional
name of the columns to be used as association ID; currently only a list containing one string is supported
- required_in_prediction_requestsbool, optional
whether the association ID column is required in prediction requests
- max_waitint, optional
seconds to wait for successful resolution
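Examples
An illustrative sketch; the deployment ID and column name are placeholders.
from datarobot import Deployment
deployment = Deployment.get(deployment_id='5c939e08962d741e34f609f0')
deployment.update_association_id_settings(
    column_names=['transaction_id'],
    required_in_prediction_requests=True,
)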
- Return type
None
- get_predictions_data_collection_settings()¶
Retrieve predictions data collection settings of this deployment.
New in version v2.21.
- Returns
- predictions_data_collection_settingsdict in the following format:
- enabledbool
If predictions data collection is enabled for this deployment. To update existing predictions_data_collection settings, see
update_predictions_data_collection_settings()
- Return type
Dict
[str
,bool
]
- update_predictions_data_collection_settings(enabled, max_wait=600)¶
Update predictions data collection settings of this deployment.
New in version v2.21.
Updating predictions data collection setting is an asynchronous process, which means some preparatory work may be performed after the initial request is completed. This function will not return until all preparatory work is fully finished.
- Parameters
- enabled: bool
if predictions data collection is to be turned on
- max_waitint, optional
seconds to wait for successful resolution
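Examples
An illustrative sketch; the deployment ID is a placeholder.
from datarobot import Deployment
deployment = Deployment.get(deployment_id='5c939e08962d741e34f609f0')
deployment.update_predictions_data_collection_settings(enabled=True)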
- Return type
None
- get_prediction_warning_settings()¶
Retrieve prediction warning settings of this deployment.
New in version v2.19.
- Returns
- settingsdict in the following format:
- enabledbool
If prediction warnings are enabled for this deployment. To create or update existing prediction_warning settings, see
update_prediction_warning_settings()
- custom_boundariesdict or None
If None, the default boundaries for the model are used. Otherwise, it has the following keys:
- upperfloat
All predictions greater than provided value are considered anomalous
- lowerfloat
All predictions less than provided value are considered anomalous
- Return type
- update_prediction_warning_settings(prediction_warning_enabled, use_default_boundaries=None, lower_boundary=None, upper_boundary=None, max_wait=600)¶
Update prediction warning settings of this deployment.
New in version v2.19.
- Parameters
- prediction_warning_enabledbool
If prediction warnings should be turned on.
- use_default_boundariesbool, optional
If default boundaries of the model should be used for the deployment.
- upper_boundaryfloat, optional
All predictions greater than provided value will be considered anomalous
- lower_boundaryfloat, optional
All predictions less than provided value will be considered anomalous
- max_waitint, optional
seconds to wait for successful resolution
- Return type
None
- get_prediction_intervals_settings()¶
Retrieve prediction intervals settings for this deployment.
New in version v2.19.
- Returns
- dict in the following format:
- enabledbool
Whether prediction intervals are enabled for this deployment
- percentileslist[int]
List of enabled prediction intervals’ sizes for this deployment. Currently we only support one percentile at a time.
Notes
Note that prediction intervals are only supported for time series deployments.
- Return type
- update_prediction_intervals_settings(percentiles, enabled=True, max_wait=600)¶
Update prediction intervals settings for this deployment.
New in version v2.19.
- Parameters
- percentileslist[int]
The prediction intervals percentiles to enable for this deployment. Currently we only support setting one percentile at a time.
- enabledbool, optional (defaults to True)
Whether to enable showing prediction intervals in the results of predictions requested using this deployment.
- max_waitint, optional
seconds to wait for successful resolution
- Raises
- AssertionError
If
percentiles
is in an invalid format- AsyncFailureError
If any of the responses from the server are unexpected
- AsyncProcessUnsuccessfulError
If the prediction intervals calculation job has failed or has been cancelled.
- AsyncTimeoutError
If the prediction intervals calculation job did not resolve in time
Notes
Updating prediction intervals settings is an asynchronous process, which means some preparatory work may be performed before the settings request is completed. This function will not return until all work is fully finished.
Note that prediction intervals are only supported for time series deployments.
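Examples
An illustrative sketch for a time series deployment; the deployment ID and percentile value are placeholders.
from datarobot import Deployment
deployment = Deployment.get(deployment_id='5c939e08962d741e34f609f0')
deployment.update_prediction_intervals_settings(percentiles=[80], enabled=True)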
- Return type
None
- get_health_settings()¶
Retrieve health settings of this deployment.
New in version v3.4.
- Returns
- settingsdict in the following format:
- servicedict
Service health settings.
- data_driftdict
Data drift health settings.
- accuracydict
Accuracy health settings.
- fairnessdict
Fairness health settings.
- custom_metricsdict
Custom metrics health settings.
- predictions_timelinessdict
Predictions timeliness health settings.
- actuals_timelinessdict
Actuals timeliness health settings.
- Return type
- update_health_settings(service=None, data_drift=None, accuracy=None, fairness=None, custom_metrics=None, predictions_timeliness=None, actuals_timeliness=None)¶
Update health settings of this deployment.
New in version v3.4.
- Parameters
- servicedict
Service health settings.
- data_driftdict
Data drift health settings.
- accuracydict
Accuracy health settings.
- fairnessdict
Fairness health settings.
- custom_metricsdict
Custom metrics health settings.
- predictions_timelinessdict
Predictions timeliness health settings.
- actuals_timelinessdict
Actuals timeliness health settings.
- Return type
- get_default_health_settings()¶
Retrieve default health settings of this deployment.
New in version v3.4.
- Returns
- settingsdict in the following format:
- servicedict
Service health settings.
- data_driftdict
Data drift health settings.
- accuracydict
Accuracy health settings.
- fairnessdict
Fairness health settings.
- custom_metricsdict
Custom metrics health settings.
- predictions_timelinessdict
Predictions timeliness health settings.
- actuals_timelinessdict
Actuals timeliness health settings.
- Return type
- get_service_stats(model_id=None, start_time=None, end_time=None, execution_time_quantile=None, response_time_quantile=None, slow_requests_threshold=None)¶
Retrieves values of many service stat metrics aggregated over a time period.
New in version v2.18.
- Parameters
- model_idstr, optional
the id of the model
- start_timedatetime, optional
start of the time period
- end_timedatetime, optional
end of the time period
- execution_time_quantilefloat, optional
quantile for executionTime, defaults to 0.5
- response_time_quantilefloat, optional
quantile for responseTime, defaults to 0.5
- slow_requests_thresholdfloat, optional
threshold for slowRequests, defaults to 1000
- Returns
- service_statsServiceStats
the queried service stats metrics information
- Return type
- get_service_stats_over_time(metric=None, model_id=None, start_time=None, end_time=None, bucket_size=None, quantile=None, threshold=None)¶
Retrieves values of a single service stat metric over a time period.
New in version v2.18.
- Parameters
- metricSERVICE_STAT_METRIC, optional
the service stat metric to retrieve
- model_idstr, optional
the id of the model
- start_timedatetime, optional
start of the time period
- end_timedatetime, optional
end of the time period
- bucket_sizestr, optional
time duration of a bucket, in ISO 8601 time duration format
- quantilefloat, optional
quantile for ‘executionTime’ or ‘responseTime’, ignored when querying other metrics
- thresholdint, optional
threshold for ‘slowQueries’, ignored when querying other metrics
- Returns
- service_stats_over_timeServiceStatsOverTime
the queried service stats metric over time information
- Return type
- get_target_drift(model_id=None, start_time=None, end_time=None, metric=None)¶
Retrieve target drift information over a certain time period.
New in version v2.21.
- Parameters
- model_idstr
the id of the model
- start_timedatetime
start of the time period
- end_timedatetime
end of the time period
- metricstr
(New in version v2.22) metric used to calculate the drift score
- Returns
- target_driftTargetDrift
the queried target drift information
- Return type
- get_feature_drift(model_id=None, start_time=None, end_time=None, metric=None)¶
Retrieve drift information for deployment’s features over a certain time period.
New in version v2.21.
- Parameters
- model_idstr
the id of the model
- start_timedatetime
start of the time period
- end_timedatetime
end of the time period
- metricstr
(New in version v2.22) The metric used to calculate the drift score. Allowed values include psi, kl_divergence, dissimilarity, hellinger, and js_divergence.
- Returns
- feature_drift_data[FeatureDrift]
the queried feature drift information
- Return type
List
[FeatureDrift
]
- get_predictions_over_time(model_ids=None, start_time=None, end_time=None, bucket_size=None, target_classes=None, include_percentiles=False)¶
Retrieve stats of deployment’s prediction response over a certain time period.
New in version v3.2.
- Parameters
- model_idslist[str]
ID of models to retrieve prediction stats
- start_timedatetime
start of the time period
- end_timedatetime
end of the time period
- bucket_sizeBUCKET_SIZE
time duration of each bucket
- target_classeslist[str]
class names of target, only for deployments with multiclass target
- include_percentilesbool
whether the returned data includes percentiles; only for a deployment with a binary or regression target
- Returns
- predictions_over_timePredictionsOverTime
the queried predictions over time information
Examples
from datarobot import Deployment
deployment = Deployment.get(deployment_id='5c939e08962d741e34f609f0')
predictions_over_time = deployment.get_predictions_over_time()
predictions_over_time.buckets[0]['mean_predicted_value']
>>>0.3772
predictions_over_time.buckets[0]['row_count']
>>>2000
- Return type
- get_accuracy(model_id=None, start_time=None, end_time=None, start=None, end=None, target_classes=None)¶
Retrieves values of many accuracy metrics aggregated over a time period.
New in version v2.18.
- Parameters
- model_idstr
the id of the model
- start_timedatetime
start of the time period
- end_timedatetime
end of the time period
- target_classeslist[str], optional
Optional list of target class strings
- Returns
- accuracyAccuracy
the queried accuracy metrics information
- Return type
- get_accuracy_over_time(metric=None, model_id=None, start_time=None, end_time=None, bucket_size=None, target_classes=None)¶
Retrieves values of a single accuracy metric over a time period.
New in version v2.18.
- Parameters
- metricACCURACY_METRIC
the accuracy metric to retrieve
- model_idstr
the id of the model
- start_timedatetime
start of the time period
- end_timedatetime
end of the time period
- bucket_sizestr
time duration of a bucket, in ISO 8601 time duration format
- target_classeslist[str], optional
Optional list of target class strings
- Returns
- accuracy_over_timeAccuracyOverTime
the queried accuracy metric over time information
- Return type
- get_predictions_vs_actuals_over_time(model_ids=None, start_time=None, end_time=None, bucket_size=None, target_classes=None)¶
Retrieve information for deployment’s predictions vs actuals over a certain time period.
New in version v3.3.
- Parameters
- model_idslist[str]
The ID of models to retrieve predictions vs actuals stats for.
- start_timedatetime
Start of the time period.
- end_timedatetime
End of the time period.
- bucket_sizeBUCKET_SIZE
Time duration of each bucket.
- target_classeslist[str]
Class names of target, only for deployments with a multiclass target.
- Returns
- predictions_vs_actuals_over_timePredictionsVsActualsOverTime
The queried predictions vs actuals over time information.
Examples
from datarobot import Deployment
deployment = Deployment.get(deployment_id='5c939e08962d741e34f609f0')
predictions_over_time = deployment.get_predictions_vs_actuals_over_time()
predictions_over_time.buckets[0]['mean_actual_value']
>>>0.6673
predictions_over_time.buckets[0]['row_count_with_actual']
>>>500
- Return type
- get_fairness_scores_over_time(start_time=None, end_time=None, bucket_size=None, model_id=None, protected_feature=None, fairness_metric=None)¶
Retrieves values of a single fairness score over a time period.
New in version v3.2.
- Parameters
- model_idstr
the id of the model
- start_timedatetime
start of the time period
- end_timedatetime
end of the time period
- bucket_sizestr
time duration of a bucket, in ISO 8601 time duration format
- protected_featurestr
name of protected feature
- fairness_metricstr
A consolidation of the fairness metrics by the use case.
- Returns
- fairness_scores_over_timeFairnessScoresOverTime
the queried fairness score over time information
- Return type
- update_secondary_dataset_config(secondary_dataset_config_id, credential_ids=None)¶
Update the secondary dataset config used by Feature discovery model for a given deployment.
New in version v2.23.
- Parameters
- secondary_dataset_config_id: str
Id of the secondary dataset config
- credential_ids: list or None
List of DatasetsCredentials used by the secondary datasets
Examples
from datarobot import Deployment
deployment = Deployment.get(deployment_id='5c939e08962d741e34f609f0')
config = deployment.update_secondary_dataset_config('5df109112ca582033ff44084')
config
>>> '5df109112ca582033ff44084'
- Return type
str
- get_secondary_dataset_config()¶
Get the secondary dataset config used by Feature discovery model for a given deployment.
New in version v2.23.
- Returns
- secondary_dataset_configSecondaryDatasetConfigurations
Id of the secondary dataset config
Examples
from datarobot import Deployment
deployment = Deployment.get(deployment_id='5c939e08962d741e34f609f0')
deployment.update_secondary_dataset_config('5df109112ca582033ff44084')
config = deployment.get_secondary_dataset_config()
config
>>> '5df109112ca582033ff44084'
- Return type
str
- get_prediction_results(model_id=None, start_time=None, end_time=None, actuals_present=None, offset=None, limit=None)¶
Retrieve a list of prediction results of the deployment.
New in version v2.24.
- Parameters
- model_idstr
the id of the model
- start_timedatetime
start of the time period
- end_timedatetime
end of the time period
- actuals_presentbool
filters prediction results to only those with actuals present, or only those with missing actuals
- offsetint
this many results will be skipped
- limitint
at most this many results are returned
- Returns
- prediction_results: list[dict]
a list of prediction results
Examples
from datarobot import Deployment
deployment = Deployment.get(deployment_id='5c939e08962d741e34f609f0')
results = deployment.get_prediction_results()
- Return type
List
[Dict
[str
,Any
]]
- download_prediction_results(filepath, model_id=None, start_time=None, end_time=None, actuals_present=None, offset=None, limit=None)¶
Download prediction results of the deployment as a CSV file.
New in version v2.24.
- Parameters
- filepathstr
path of the csv file
- model_idstr
the id of the model
- start_timedatetime
start of the time period
- end_timedatetime
end of the time period
- actuals_presentbool
filters prediction results to only those with actuals present, or only those with missing actuals
- offsetint
this many results will be skipped
- limitint
at most this many results are returned
Examples
from datarobot import Deployment
deployment = Deployment.get(deployment_id='5c939e08962d741e34f609f0')
results = deployment.download_prediction_results('path_to_prediction_results.csv')
- Return type
None
- download_scoring_code(filepath, source_code=False, include_agent=False, include_prediction_explanations=False, include_prediction_intervals=False, max_wait=600)¶
Retrieve scoring code of the current deployed model.
New in version v2.24.
- Parameters
- filepathstr
path of the scoring code file
- source_codebool
whether source code or binary of the scoring code will be retrieved
- include_agentbool
whether the scoring code retrieved will include tracking agent
- include_prediction_explanationsbool
whether the scoring code retrieved will include prediction explanations
- include_prediction_intervalsbool
whether the scoring code retrieved will support prediction intervals
- max_wait: int, optional
Seconds to wait for successful resolution of a deployment creation job. Deployment supports making predictions only after a deployment creation job has successfully finished.
Notes
When setting include_agent, include_prediction_explanations, or include_prediction_intervals to True, it can take considerably longer to download the scoring code.
Examples
from datarobot import Deployment
deployment = Deployment.get(deployment_id='5c939e08962d741e34f609f0')
results = deployment.download_scoring_code('path_to_scoring_code.jar')
- Return type
None
- download_model_package_file(filepath, compute_all_ts_intervals=False)¶
Retrieve model package file (mlpkg) of the current deployed model.
New in version v3.3.
- Parameters
- filepathstr
The file path of the model package file.
- compute_all_ts_intervalsbool
Includes all time series intervals into the built Model Package (.mlpkg) if set to True.
Examples
from datarobot import Deployment
deployment = Deployment.get(deployment_id='5c939e08962d741e34f609f0')
deployment.download_model_package_file('path_to_model_package.mlpkg')
- Return type
None
- delete_monitoring_data(model_id, start_time=None, end_time=None, max_wait=600)¶
Delete deployment monitoring data.
- Parameters
- model_idstr
id of the model to delete monitoring data
- start_timedatetime, optional
start of the time period to delete monitoring data
- end_timedatetime, optional
end of the time period to delete monitoring data
- max_waitint, optional
seconds to wait for successful resolution
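Examples
An illustrative sketch; the deployment and model IDs are placeholders.
from datetime import datetime, timezone
from datarobot import Deployment
deployment = Deployment.get(deployment_id='5c939e08962d741e34f609f0')
deployment.delete_monitoring_data(
    model_id='5c0a969859b00004ba52e41b',
    start_time=datetime(2019, 8, 1, tzinfo=timezone.utc),
    end_time=datetime(2019, 9, 1, tzinfo=timezone.utc),
)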
- Return type
None
Get a list of users, groups and organizations that have access to this deployment.
- Parameters
- id: str, Optional
Only return the access control information for an organization, group or user with this ID.
- name: string, Optional
Only return the access control information for an organization, group or user with this name.
- share_recipient_type: enum(‘user’, ‘group’, ‘organization’), Optional
Only returns results with the given recipient type.
- limit: int (Default=0)
At most this many results are returned.
- offset: int (Default=0)
This many results will be skipped.
- Returns
- List[DeploymentSharedRole]
- Return type
List
[DeploymentSharedRole
]
Share a deployment with a user, group, or organization
- Parameters
- roles: list(or(GrantAccessControlWithUsernameValidator, GrantAccessControlWithIdValidator))
Array of GrantAccessControl objects, up to a maximum of 100 objects.
- Return type
None
- list_challengers()¶
Get a list of challengers for this deployment.
New in version v3.4.
- Returns
- list(Challenger)
- Return type
List
[Challenger
]
- get_champion_model_package()¶
Get a champion model package for this deployment.
- Returns
- champion_model_packageChampionModelPackage
A champion model package object.
Examples
from datarobot import Deployment
deployment = Deployment.get(deployment_id='5c939e08962d741e34f609f0')
champion_model_package = deployment.get_champion_model_package()
- Return type
- list_prediction_data_exports(model_id=None, status=None, batch=None, offset=0, limit=100)¶
Retrieve a list of asynchronous prediction data exports.
- Parameters
- model_id: Optional[str]
The ID of the model used for prediction data export.
- status: Optional[str]
A prediction data export processing state.
- batch: Optional[bool]
If true, only return batch exports. If false, only return real-time exports. If not provided, return both real-time and batch exports.
- limit: Optional[int]
The maximum number of objects to return. The default is 100 (0 means no limit).
- offset: Optional[int]
The starting offset of the results. The default is 0.
- Returns
- prediction_data_exports: List[PredictionDataExport]
A list of prediction data exports.
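Examples
An illustrative sketch; the deployment ID is a placeholder.
from datarobot import Deployment
deployment = Deployment.get(deployment_id='5c939e08962d741e34f609f0')
prediction_data_exports = deployment.list_prediction_data_exports(batch=True, limit=10)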
- Return type
List
[PredictionDataExport
]
- list_actuals_data_exports(status=None, offset=0, limit=100)¶
Retrieve a list of asynchronous actuals data exports.
- Parameters
- status: Optional[str]
Actuals data export processing state.
- limit: Optional[int]
The maximum number of objects to return. The default is 100 (0 means no limit).
- offset: Optional[int]
The starting offset of the results. The default is 0.
- Returns
- actuals_data_exports: List[ActualsDataExport]
A list of actuals data exports.
- Return type
List
[ActualsDataExport
]
- list_training_data_exports()¶
Retrieve a list of successful training data exports.
- Returns
- training_data_export: List[TrainingDataExport]
A list of training data exports.
- Return type
List
[TrainingDataExport
]
- get_segment_attributes(monitoringType='serviceHealth')¶
Get a list of segment attributes for this deployment.
New in version v3.5.
- Parameters
- monitoringType: str, Optional
- Returns
- list(str)
Examples
from datarobot import Deployment
deployment = Deployment.get(deployment_id='5c939e08962d741e34f609f0')
segment_attributes = deployment.get_segment_attributes(DEPLOYMENT_MONITORING_TYPE.SERVICE_HEALTH)
- Return type
List
[str
]
- get_segment_values(segmentAttribute=None, limit=100, offset=0, search=None)¶
Get a list of segment values for this deployment.
New in version v3.5.
- Parameters
- segmentAttribute: str, Optional
Represents the different ways that prediction requests can be viewed.
- limit: int, Optional
The maximum number of values to return.
- offset: int, Optional
The starting point of the values to be returned.
- search: str, Optional
A string to filter the values.
- Returns
- list(str)
Examples
from datarobot import Deployment
deployment = Deployment.get(deployment_id='5c939e08962d741e34f609f0')
segment_values = deployment.get_segment_values(segmentAttribute='DataRobot-Consumer')
- Return type
List
[str
]
- classmethod from_data(data)¶
Instantiate an object of this class using a dict.
- Parameters
- datadict
Correctly snake_cased keys and their values.
- Return type
TypeVar
(T
, bound=APIObject
)
- classmethod from_server_data(data, keep_attrs=None)¶
Instantiate an object of this class using the data directly from the server, meaning that the keys may have the wrong camel casing
- Parameters
- datadict
The directly translated dict of JSON from the server. No casing fixes have taken place
- keep_attrsiterable
List, set or tuple of the dotted namespace notations for attributes to keep within the object structure even if their values are None
- Return type
TypeVar
(T
, bound=APIObject
)
- open_in_browser()¶
Opens the class’ relevant web browser location. If a default browser is not available, the URL is logged.
Note: If text-mode browsers are used, the calling process will block until the user exits the browser.
- Return type
None
- class datarobot.models.deployment.DeploymentListFilters(role=None, service_health=None, model_health=None, accuracy_health=None, execution_environment_type=None, importance=None)¶
- class datarobot.models.deployment.ServiceStats(period=None, metrics=None, model_id=None)¶
Deployment service stats information.
- Attributes
- model_idstr
the model used to retrieve service stats metrics
- perioddict
the time period used to retrieve service stats metrics
- metricsdict
the service stats metrics
- classmethod get(deployment_id, model_id=None, start_time=None, end_time=None, execution_time_quantile=None, response_time_quantile=None, slow_requests_threshold=None)¶
Retrieve value of service stat metrics over a certain time period.
New in version v2.18.
- Parameters
- deployment_idstr
the id of the deployment
- model_idstr, optional
the id of the model
- start_timedatetime, optional
start of the time period
- end_timedatetime, optional
end of the time period
- execution_time_quantilefloat, optional
quantile for executionTime, defaults to 0.5
- response_time_quantilefloat, optional
quantile for responseTime, defaults to 0.5
- slow_requests_thresholdfloat, optional
threshold for slowRequests, defaults to 1000
- Returns
- service_statsServiceStats
the queried service stats metrics
- Return type
- class datarobot.models.deployment.ServiceStatsOverTime(buckets=None, summary=None, metric=None, model_id=None)¶
Deployment service stats over time information.
- Attributes
- model_idstr
the model used to retrieve the service stat metric
- metricstr
the service stat metric being retrieved
- bucketsdict
how the service stat metric changes over time
- summarydict
summary for the service stat metric
- classmethod get(deployment_id, metric=None, model_id=None, start_time=None, end_time=None, bucket_size=None, quantile=None, threshold=None)¶
Retrieve information about how a service stat metric changes over a certain time period.
New in version v2.18.
- Parameters
- deployment_idstr
the id of the deployment
- metricSERVICE_STAT_METRIC, optional
the service stat metric to retrieve
- model_idstr, optional
the id of the model
- start_timedatetime, optional
start of the time period
- end_timedatetime, optional
end of the time period
- bucket_sizestr, optional
time duration of a bucket, in ISO 8601 time duration format
- quantilefloat, optional
quantile for ‘executionTime’ or ‘responseTime’, ignored when querying other metrics
- thresholdint, optional
threshold for ‘slowQueries’, ignored when querying other metrics
- Returns
- service_stats_over_timeServiceStatsOverTime
the queried service stat over time information
- Return type
- property bucket_values: OrderedDict[str, Union[int, float, None]]¶
The metric value for all time buckets, keyed by start time of the bucket.
- Returns
- bucket_values: OrderedDict
- class datarobot.models.deployment.TargetDrift(period=None, metric=None, model_id=None, target_name=None, drift_score=None, sample_size=None, baseline_sample_size=None)¶
Deployment target drift information.
- Attributes
- model_idstr
the model used to retrieve target drift metric
- perioddict
the time period used to retrieve target drift metric
- metricstr
the data drift metric
- target_namestr
name of the target
- drift_scorefloat
target drift score
- sample_sizeint
count of data points for comparison
- baseline_sample_sizeint
count of data points for baseline
- classmethod get(deployment_id, model_id=None, start_time=None, end_time=None, metric=None)¶
Retrieve target drift information over a certain time period.
New in version v2.21.
- Parameters
- deployment_idstr
the id of the deployment
- model_idstr
the id of the model
- start_timedatetime
start of the time period
- end_timedatetime
end of the time period
- metricstr
(New in version v2.22) metric used to calculate the drift score
- Returns
- target_driftTargetDrift
the queried target drift information
Examples
from datarobot import Deployment, TargetDrift
deployment = Deployment.get(deployment_id='5c939e08962d741e34f609f0')
target_drift = TargetDrift.get(deployment.id)
target_drift.period['end']
>>>'2019-08-01 00:00:00+00:00'
target_drift.drift_score
>>>0.03423
target_drift.target_name
>>>'readmitted'
- Return type
- class datarobot.models.deployment.FeatureDrift(period=None, metric=None, model_id=None, name=None, drift_score=None, feature_impact=None, sample_size=None, baseline_sample_size=None)¶
Deployment feature drift information.
- Attributes
- model_idstr
the model used to retrieve feature drift metric
- perioddict
the time period used to retrieve feature drift metric
- metricstr
the data drift metric
- namestr
name of the feature
- drift_scorefloat
feature drift score
- sample_sizeint
count of data points for comparison
- baseline_sample_sizeint
count of data points for baseline
- classmethod list(deployment_id, model_id=None, start_time=None, end_time=None, metric=None)¶
Retrieve drift information for deployment’s features over a certain time period.
New in version v2.21.
- Parameters
- deployment_idstr
the id of the deployment
- model_idstr
the id of the model
- start_timedatetime
start of the time period
- end_timedatetime
end of the time period
- metricstr
(New in version v2.22) metric used to calculate the drift score
- Returns
- feature_drift_data[FeatureDrift]
the queried feature drift information
Examples
from datarobot import Deployment
from datarobot.models.deployment import FeatureDrift
deployment = Deployment.get(deployment_id='5c939e08962d741e34f609f0')
feature_drift = FeatureDrift.list(deployment.id)[0]
feature_drift.period
>>>'2019-08-01 00:00:00+00:00'
feature_drift.drift_score
>>>0.252
feature_drift.name
>>>'age'
- Return type
List
[FeatureDrift
]
- class datarobot.models.deployment.PredictionsOverTime(baselines=None, buckets=None)¶
Deployment predictions over time information.
- Attributes
- baselinesList
target baseline for each model queried
- bucketsList
predictions over time bucket for each model and bucket queried
- classmethod get(deployment_id, model_ids=None, start_time=None, end_time=None, bucket_size=None, target_classes=None, include_percentiles=False)¶
Retrieve information for deployment’s prediction response over a certain time period.
New in version v3.2.
- Parameters
- deployment_idstr
the id of the deployment
- model_idslist[str]
ID of models to retrieve prediction stats
- start_timedatetime
start of the time period
- end_timedatetime
end of the time period
- bucket_sizeBUCKET_SIZE
time duration of each bucket
- target_classeslist[str]
class names of target, only for deployments with multiclass target
- include_percentilesbool
whether the returned data includes percentiles; only for a deployment with a binary or regression target
- Returns
- predictions_over_timePredictionsOverTime
the queried predictions over time information
- Return type
- class datarobot.models.deployment.Accuracy(period=None, metrics=None, model_id=None)¶
Deployment accuracy information.
- Attributes
- model_idstr
the model used to retrieve accuracy metrics
- perioddict
the time period used to retrieve accuracy metrics
- metricsdict
the accuracy metrics
- classmethod get(deployment_id, model_id=None, start_time=None, end_time=None, target_classes=None)¶
Retrieve values of accuracy metrics over a certain time period.
New in version v2.18.
- Parameters
- deployment_idstr
the id of the deployment
- model_idstr
the id of the model
- start_timedatetime
start of the time period
- end_timedatetime
end of the time period
- target_classeslist[str], optional
Optional list of target class strings
- Returns
- accuracyAccuracy
the queried accuracy metrics information
Examples
from datarobot import Deployment, Accuracy
deployment = Deployment.get(deployment_id='5c939e08962d741e34f609f0')
accuracy = Accuracy.get(deployment.id)
accuracy.period['end']
>>>'2019-08-01 00:00:00+00:00'
accuracy.metric['LogLoss']['value']
>>>0.7533
accuracy.metric_values['LogLoss']
>>>0.7533
- Return type
- property metric_values: Dict[str, Optional[int]]¶
The value for all metrics, keyed by metric name.
- Returns
- metric_values: Dict
- Return type
Dict
[str
,Optional
[int
]]
- property metric_baselines: Dict[str, Optional[int]]¶
The baseline value for all metrics, keyed by metric name.
- Returns
- metric_baselines: Dict
- Return type
Dict
[str
,Optional
[int
]]
- property percent_changes: Dict[str, Optional[int]]¶
The percent change of value over baseline for all metrics, keyed by metric name.
- Returns
- percent_changes: Dict
- Return type
Dict
[str
,Optional
[int
]]
- class datarobot.models.deployment.AccuracyOverTime(buckets=None, summary=None, baseline=None, metric=None, model_id=None)¶
Deployment accuracy over time information.
- Attributes
- model_idstr
the model used to retrieve accuracy metric
- metricstr
the accuracy metric being retrieved
- bucketsdict
how the accuracy metric changes over time
- summarydict
summary for the accuracy metric
- baselinedict
baseline for the accuracy metric
- classmethod get(deployment_id, metric=None, model_id=None, start_time=None, end_time=None, bucket_size=None, target_classes=None)¶
Retrieve information about how an accuracy metric changes over a certain time period.
New in version v2.18.
- Parameters
- deployment_idstr
the id of the deployment
- metricACCURACY_METRIC
the accuracy metric to retrieve
- model_idstr
the id of the model
- start_timedatetime
start of the time period
- end_timedatetime
end of the time period
- bucket_sizestr
time duration of a bucket, in ISO 8601 time duration format
- target_classeslist[str], optional
Optional list of target class strings
- Returns
- accuracy_over_timeAccuracyOverTime
the queried accuracy metric over time information
Examples
from datarobot import Deployment, AccuracyOverTime
from datarobot.enums import ACCURACY_METRIC
deployment = Deployment.get(deployment_id='5c939e08962d741e34f609f0')
accuracy_over_time = AccuracyOverTime.get(deployment.id, metric=ACCURACY_METRIC.LOGLOSS)
accuracy_over_time.metric
>>>'LogLoss'
accuracy_over_time.metric_values
>>>{datetime.datetime(2019, 8, 1): 0.73, datetime.datetime(2019, 8, 2): 0.55}
- Return type
- classmethod get_as_dataframe(deployment_id, metrics=None, model_id=None, start_time=None, end_time=None, bucket_size=None)¶
Retrieve information about how a list of accuracy metrics change over a certain time period as pandas DataFrame.
In the returned DataFrame, the columns correspond to the metrics being retrieved; the rows are labeled with the start time of each bucket.
- Parameters
- deployment_idstr
the id of the deployment
- metrics[ACCURACY_METRIC]
the accuracy metrics to retrieve
- model_idstr
the id of the model
- start_timedatetime
start of the time period
- end_timedatetime
end of the time period
- bucket_sizestr
time duration of a bucket, in ISO 8601 time duration format
- Returns
- accuracy_over_time: pd.DataFrame
- Return type
DataFrame
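A minimal sketch of pulling bucketed accuracy values into pandas with get_as_dataframe; the deployment ID, time window, and bucket size below are placeholder values.
from datetime import datetime
from datarobot import AccuracyOverTime
from datarobot.enums import ACCURACY_METRIC

deployment_id = '5c939e08962d741e34f609f0'  # placeholder deployment ID
accuracy_df = AccuracyOverTime.get_as_dataframe(
    deployment_id,
    metrics=[ACCURACY_METRIC.LOGLOSS],
    start_time=datetime(2019, 8, 1),
    end_time=datetime(2019, 9, 1),
    bucket_size='P7D',  # weekly buckets, ISO 8601 duration format
)
print(accuracy_df.head())  # one column per metric, rows labeled by bucket start time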
- property bucket_values: Dict[datetime, int]¶
The metric value for all time buckets, keyed by start time of the bucket.
- Returns
- bucket_values: Dict
- Return type
Dict
[datetime
,int
]
- property bucket_sample_sizes: Dict[datetime, int]¶
The sample size for all time buckets, keyed by start time of the bucket.
- Returns
- bucket_sample_sizes: Dict
- Return type
Dict
[datetime
,int
]
- class datarobot.models.deployment.PredictionsVsActualsOverTime(summary=None, baselines=None, buckets=None)¶
Deployment predictions vs actuals over time information.
- Attributes
- summarydict
predictions vs actuals over time summary for all models and buckets queried
- baselinesList
target baseline for each model queried
- bucketsList
predictions vs actuals over time bucket for each model and bucket queried
- classmethod get(deployment_id, model_ids=None, start_time=None, end_time=None, bucket_size=None, target_classes=None)¶
Retrieve information for deployment’s predictions vs actuals over a certain time period.
New in version v3.3.
- Parameters
- deployment_idstr
the id of the deployment
- model_idslist[str]
IDs of the models to retrieve predictions vs actuals stats for
- start_timedatetime
start of the time period
- end_timedatetime
end of the time period
- bucket_sizeBUCKET_SIZE
time duration of each bucket
- target_classeslist[str]
class names of target, only for deployments with multiclass target
- Returns
- predictions_vs_actuals_over_timePredictionsVsActualsOverTime
the queried predictions vs actuals over time information
- Return type
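A minimal sketch of querying predictions vs actuals statistics for a deployment; the deployment and model IDs are placeholder values.
from datetime import datetime
from datarobot.models.deployment import PredictionsVsActualsOverTime

stats = PredictionsVsActualsOverTime.get(
    deployment_id='5c939e08962d741e34f609f0',  # placeholder deployment ID
    model_ids=['5d1111111111111111111111'],    # placeholder model ID
    start_time=datetime(2023, 1, 1),
    end_time=datetime(2023, 2, 1),
)
print(stats.summary)    # summary across all queried models and buckets
print(stats.baselines)  # target baseline for each queried model
for bucket in stats.buckets:
    print(bucket)       # per-model, per-bucket predictions vs actuals data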
- class datarobot.models.deployment.bias_and_fairness.FairnessScoresOverTime(summary=None, buckets=None, protected_feature=None, fairness_threshold=None, model_id=None, model_package_id=None, favorable_target_outcome=None)¶
Deployment fairness over time information.
- Attributes
- bucketsList
fairness over time bucket for each model and bucket queried
- summarydict
summary for the fairness score
- protected_featurestr
name of protected feature
- fairness_thresholdfloat
threshold used to compute fairness results
- model_idstr
model id for which fairness is computed
- model_package_idstr
model package (version) id for which fairness is computed
- favorable_target_outcomebool
preferable class of the target
- classmethod get(deployment_id, model_id=None, start_time=None, end_time=None, bucket_size=None, fairness_metric=None, protected_feature=None)¶
Retrieve information for deployment’s fairness score response over a certain time period.
New in version FUTURE.
- Parameters
- deployment_idstr
the id of the deployment
- model_idstr
id of the model to retrieve fairness score stats for
- start_timedatetime
start of the time period
- end_timedatetime
end of the time period
- protected_featurestr
name of the protected feature
- fairness_metricstr
A consolidation of the fairness metrics by the use case.
- bucket_sizeBUCKET_SIZE
time duration of each bucket
- Returns
- fairness_scores_over_timeFairnessScoresOverTime
the queried fairness score over time information
- Return type
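A minimal sketch of retrieving fairness scores over time, assuming a deployment with bias and fairness monitoring configured; the deployment ID, protected feature name, and fairness metric identifier shown here are illustrative placeholders.
from datetime import datetime
from datarobot.models.deployment.bias_and_fairness import FairnessScoresOverTime

fairness = FairnessScoresOverTime.get(
    deployment_id='5c939e08962d741e34f609f0',  # placeholder deployment ID
    protected_feature='gender',                # hypothetical protected feature name
    fairness_metric='proportionalParity',      # hypothetical metric identifier
    start_time=datetime(2023, 1, 1),
    end_time=datetime(2023, 2, 1),
)
print(fairness.summary)
for bucket in fairness.buckets:
    print(bucket)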
- Parameters
- share_recipient_type: enum(‘user’, ‘group’, ‘organization’)
Describes the recipient type, either user, group, or organization.
- role: str, one of enum(‘CONSUMER’, ‘USER’, ‘OWNER’)
The role of the org/group/user on this deployment.
- id: str
The ID of the recipient organization, group or user.
- name: string
The name of the recipient organization, group or user.
- Parameters
- share_recipient_type: enum(‘user’, ‘group’, ‘organization’)
Describes the recipient type, either user, group, or organization.
- role: enum(‘OWNER’, ‘USER’, ‘OBSERVER’, ‘NO_ROLE’)
The role of the recipient on this entity. One of OWNER, USER, OBSERVER, NO_ROLE. If NO_ROLE is specified, any existing role for the recipient will be removed.
- id: str
The ID of the recipient.
- Parameters
- role: string
The role of the recipient on this entity. One of OWNER, USER, CONSUMER, NO_ROLE. If NO_ROLE is specified, any existing role for the user will be removed.
- username: string
Username of the user to update the access role for.
- class datarobot.models.deployment.deployment.FeatureDict¶
- class datarobot.models.deployment.deployment.ForecastDateSettings¶
- class datarobot.models.deployment.deployment.ChallengerModelsSettings¶
- class datarobot.models.deployment.deployment.SegmentAnalysisSettings¶
- class datarobot.models.deployment.deployment.BiasAndFairnessSettings¶
- class datarobot.models.deployment.deployment.ChallengerReplaySettings¶
- class datarobot.models.deployment.deployment.HealthSettings¶
- class datarobot.models.deployment.deployment.DriftTrackingSettings¶
- class datarobot.models.deployment.deployment.PredictionWarningSettings¶
- class datarobot.models.deployment.deployment.PredictionIntervalsSettings¶
External Baseline Validation¶
- class datarobot.models.external_baseline_validation.ExternalBaselineValidationInfo(baseline_validation_job_id, project_id, catalog_version_id, target, datetime_partition_column, is_external_baseline_dataset_valid, multiseries_id_columns=None, holdout_start_date=None, holdout_end_date=None, backtests=None, forecast_window_start=None, forecast_window_end=None, message=None)¶
An object containing information about external time series baseline predictions validation results.
- Attributes
- baseline_validation_job_idstr
the identifier of the baseline validation job
- project_idstr
the identifier of the project
- catalog_version_idstr
the identifier of the catalog version used in the validation job
- targetstr
the name of the target feature
- datetime_partition_columnstr
the name of the column whose values as dates are used to assign a row to a particular partition
- is_external_baseline_dataset_validbool
whether the external baseline dataset passes the validation check
- multiseries_id_columnslist of str or null
a list of the names of multiseries id columns to define series within the training data. Currently only one multiseries id column is supported.
- holdout_start_datestr or None
the start date of holdout scoring data
- holdout_end_datestr or None
the end date of holdout scoring data
- backtestslist of dicts containing validation_start_date and validation_end_date or None
the configured backtests of the time series project
- forecast_window_startint
offset into the future to define how far forward relative to the forecast point the forecast window should start.
- forecast_window_endint
offset into the future to define how far forward relative to the forecast point the forecast window should end.
- messagestr or None
the description of the issue with external baseline validation job
- classmethod get(project_id, validation_job_id)¶
Get information about external baseline validation job
- Parameters
- project_idstring
the identifier of the project
- validation_job_idstring
the identifier of the external baseline validation job
- Returns
- info: ExternalBaselineValidationInfo
information about external baseline validation job
- Return type
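A minimal sketch of checking the outcome of an external baseline validation job; the project and validation job IDs are placeholders.
from datarobot.models.external_baseline_validation import ExternalBaselineValidationInfo

info = ExternalBaselineValidationInfo.get(
    project_id='5c939e08962d741e34f609f0',  # placeholder project ID
    validation_job_id='17',                 # placeholder validation job ID
)
if info.is_external_baseline_dataset_valid:
    print('External baseline dataset passed validation')
else:
    print('Validation failed:', info.message)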
External Scores and Insights¶
- class datarobot.ExternalScores(project_id, scores, model_id=None, dataset_id=None, actual_value_column=None)¶
Metric scores on prediction dataset with target or actual value column in unsupervised case. Contains project metrics for supervised and special classification metrics set for unsupervised projects.
New in version v2.21.
Examples
List all scores for a dataset
import datarobot as dr
scores = dr.ExternalScores.list(project_id, dataset_id=dataset_id)
- Attributes
- project_id: str
id of the project the model belongs to
- model_id: str
id of the model
- dataset_id: str
id of the prediction dataset with target or actual value column for unsupervised case
- actual_value_column: str, optional
For unsupervised projects only. Actual value column which was used to calculate the classification metrics and insights on the prediction dataset.
- scores: list of dicts in a form of {‘label’: metric_name, ‘value’: score}
Scores on the dataset.
- classmethod create(project_id, model_id, dataset_id, actual_value_column=None)¶
Compute external dataset insights for the specified model.
- Parameters
- project_idstr
id of the project the model belongs to
- model_idstr
id of the model for which insights are requested
- dataset_idstr
id of the dataset for which insights are requested
- actual_value_columnstr, optional
actual values column label, for unsupervised projects only
- Returns
- jobJob
an instance of created async job
- Return type
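A minimal sketch of computing and then retrieving external scores; the IDs are placeholders, and the job returned by create is waited on before fetching results.
import datarobot as dr

project_id = '5c939e08962d741e34f609f0'   # placeholder project ID
model_id = '5d2222222222222222222222'     # placeholder model ID
dataset_id = '5d3333333333333333333333'   # placeholder prediction dataset ID

job = dr.ExternalScores.create(project_id, model_id, dataset_id)
job.wait_for_completion()  # block until the async insight computation finishes
scores = dr.ExternalScores.get(project_id, model_id, dataset_id)
print(scores.scores)  # list of {'label': metric_name, 'value': score} dicts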
- classmethod list(project_id, model_id=None, dataset_id=None, offset=0, limit=100)¶
Fetch external scores list for the project and optionally for model and dataset.
- Parameters
- project_id: str
id of the project
- model_id: str, optional
if specified, only scores for this model will be retrieved
- dataset_id: str, optional
if specified, only scores for this dataset will be retrieved
- offset: int, optional
this many results will be skipped, default: 0
- limit: int, optional
at most this many results are returned, default: 100, max 1000. To return all results, specify 0
- Returns
- A list of ExternalScores objects
- Return type
List
[ExternalScores
]
- classmethod get(project_id, model_id, dataset_id)¶
Retrieve external scores for the project, model and dataset.
- Parameters
- project_id: str
id of the project
- model_id: str
if specified, only scores for this model will be retrieved
- dataset_id: str
if specified, only scores for this dataset will be retrieved
- Returns
External Scores
object
- Return type
- class datarobot.ExternalLiftChart(dataset_id, bins)¶
Lift chart for the model and prediction dataset with target or actual value column in unsupervised case.
New in version v2.21.
LiftChartBin is a dict containing the following:
- actual (float): Sum of actual target values in bin
- predicted (float): Sum of predicted target values in bin
- bin_weight (float): The weight of the bin. For weighted projects, it is the sum of the weights of the rows in the bin. For unweighted projects, it is the number of rows in the bin.
- Attributes
- dataset_id: str
id of the prediction dataset with target or actual value column for unsupervised case
- bins: list of dict
List of dicts with the schema described as LiftChartBin above.
- classmethod list(project_id, model_id, dataset_id=None, offset=0, limit=100)¶
Retrieve list of the lift charts for the model.
- Parameters
- project_id: str
id of the project
- model_id: str
if specified, only lift chart for this model will be retrieved
- dataset_id: str, optional
if specified, only lift chart for this dataset will be retrieved
- offset: int, optional
this many results will be skipped, default: 0
- limit: int, optional
at most this many results are returned, default: 100, max 1000. To return all results, specify 0
- Returns
- A list of ExternalLiftChart objects
- Return type
List
[ExternalLiftChart
]
- classmethod get(project_id, model_id, dataset_id)¶
Retrieve lift chart for the model and prediction dataset.
- Parameters
- project_id: str
project id
- model_id: str
model id
- dataset_id: str
prediction dataset id with target or actual value column for unsupervised case
- Returns
ExternalLiftChart
object
- Return type
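A minimal sketch of reading lift chart bins computed on an external prediction dataset; the IDs are placeholders.
import datarobot as dr

project_id = '5c939e08962d741e34f609f0'   # placeholder project ID
model_id = '5d2222222222222222222222'     # placeholder model ID
dataset_id = '5d3333333333333333333333'   # placeholder prediction dataset ID

lift_chart = dr.ExternalLiftChart.get(project_id, model_id, dataset_id)
for lift_bin in lift_chart.bins[:3]:
    # each bin follows the LiftChartBin schema described above
    print(lift_bin['actual'], lift_bin['predicted'], lift_bin['bin_weight'])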
- class datarobot.ExternalRocCurve(dataset_id, roc_points, negative_class_predictions, positive_class_predictions)¶
ROC curve data for the model and prediction dataset with target or actual value column in unsupervised case.
New in version v2.21.
- Attributes
- dataset_id: str
id of the prediction dataset with target or actual value column for unsupervised case
- roc_points: list of dict
List of precalculated metrics associated with thresholds for ROC curve.
- negative_class_predictions: list of float
List of predictions from example for negative class
- positive_class_predictions: list of float
List of predictions from example for positive class
- classmethod list(project_id, model_id, dataset_id=None, offset=0, limit=100)¶
Retrieve a list of the ROC curves for the model.
- Parameters
- project_id: str
id of the project
- model_id: str
if specified, only the ROC curve for this model will be retrieved
- dataset_id: str, optional
if specified, only the ROC curve for this dataset will be retrieved
- offset: int, optional
this many results will be skipped, default: 0
- limit: int, optional
at most this many results are returned, default: 100, max 1000. To return all results, specify 0
- Returns
- A list of ExternalRocCurve objects
- Return type
List
[ExternalRocCurve
]
- classmethod get(project_id, model_id, dataset_id)¶
Retrieve ROC curve chart for the model and prediction dataset.
- Parameters
- project_id: str
project id
- model_id: str
model id
- dataset_id: str
prediction dataset id with target or actual value column for unsupervised case
- Returns
ExternalRocCurve
object
- Return type
Insights¶
- class datarobot.insights.base.BaseInsight(id, entity_id, project_id, source, data, data_slice_id=None, external_dataset_id=None)¶
Base Insight class for modern insights
This class serves as a template for modern insights created using the Root Insights framework. It provides most necessary functions for easily implementing classes that wrap specific insights.
- get_uri()¶
Defines the URI for browser-based interactions with this insight.
- Return type
str
- classmethod from_server_data(data, keep_attrs=None)¶
Override from_server_data to handle paginated responses
- Return type
Self
- classmethod compute(entity_id, source=INSIGHTS_SOURCES.VALIDATION, data_slice_id=None, external_dataset_id=None, entity_type=ENTITY_TYPES.DATAROBOT_MODEL, **kwargs)¶
Submit an insight compute request. You can use create if you want to wait synchronously for the completion of the job. May be overridden by insight subclasses to accept additional parameters.
- Parameters
- entity_id: str
ID of the entity to compute the insight.
- source: str
Source type to use when computing the insight.
- data_slice_id: Optional[str]
Data slice ID to use when computing the insight.
- external_dataset_id: Optional[str]
External dataset ID to use when computing the insight.
- entity_type: Optional[ENTITY_TYPES]
Type of the entity associated with the insight. Select one of the ENTITY_TYPES enum values, or accept the default, “datarobotModel”.
- Returns
- StatusCheckJob
Status check job entity for the asynchronous insight calculation.
- Return type
- classmethod create(entity_id, source=INSIGHTS_SOURCES.VALIDATION, data_slice_id=None, external_dataset_id=None, entity_type=ENTITY_TYPES.DATAROBOT_MODEL, max_wait=600, **kwargs)¶
Create an insight and wait for completion. May be overridden by insight subclasses to accept additional parameters.
- Parameters
- entity_id: str
ID of the entity to compute the insight.
- source: str
Source type to use when computing the insight.
- data_slice_id: Optional[str]
Data slice ID to use when computing the insight.
- external_dataset_id: Optional[str]
External dataset ID to use when computing the insight.
- entity_type: Optional[ENTITY_TYPES]
Type of the entity associated with the insight. Select one of the ENTITY_TYPES enum values, or accept the default, “datarobotModel”.
- max_wait: int
Number of seconds to wait for the result.
- Returns
- Self
Entity of the newly or already computed insights.
- Return type
Self
- classmethod list(entity_id)¶
List all generated insights.
- Parameters
- entity_id: str
ID of the entity to list all generated insights.
- Returns
- List[Any]
List of newly or already computed insights.
- Return type
List
[Self
]
- class datarobot.insights.ShapMatrix(id, entity_id, project_id, source, data, data_slice_id=None, external_dataset_id=None)¶
Class for SHAP Matrix calculations. Use the standard methods of BaseInsight to compute and retrieve SHAP matrices:
- compute: submit a request to compute a SHAP matrix, and return immediately
- create: submit a request to compute a SHAP matrix, and wait for it to finish
- list: retrieve all ShapMatrix results for a model, possibly on multiple datasets or data slices.
- property matrix: Any¶
SHAP matrix values.
- Return type
Any
- property base_value: float¶
SHAP base value for the matrix values
- Return type
float
- property columns: List[str]¶
List of columns associated with the SHAP matrix
- Return type
List
[str
]
- property link_function: str¶
Link function used to generate the SHAP matrix
- Return type
str
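A minimal sketch of computing a SHAP matrix synchronously with create and reading its properties; the model ID is a placeholder, and the source defaults to the validation partition.
from datarobot.insights import ShapMatrix

shap_matrix = ShapMatrix.create(entity_id='64a0b1c2d3e4f5a6b7c8d9e0')  # placeholder model ID
print(shap_matrix.columns)     # feature names for the matrix columns
print(shap_matrix.base_value)  # SHAP base value
print(shap_matrix.matrix)      # SHAP values for the scored rows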
- class datarobot.insights.ShapPreview(id, entity_id, project_id, source, data, data_slice_id=None, external_dataset_id=None)¶
Class for SHAP Preview calculations. Use the standard methods of BaseInsight to compute and retrieve SHAP previews:
- compute: submit a request to compute a SHAP preview, and return immediately
- create: submit a request to compute a SHAP preview, and wait for it to finish
- list: retrieve all ShapPreview results for a model, possibly on multiple datasets or data slices.
- property previews: List[Dict[str, Any]]¶
SHAP preview values
- Returns
- previewList[Dict[str, Any]]
A list of the ShapPreview values for each row
- Return type
List
[Dict
[str
,Any
]]
- property previews_count: int¶
Number of SHAP preview rows
- Returns
- int
- Return type
int
- class datarobot.insights.ShapImpact(id, entity_id, project_id, source, data, data_slice_id=None, external_dataset_id=None)¶
Class for SHAP Impact calculations. Use the standard methods of BaseInsight to compute and retrieve SHAP impact:
- compute: submit a request to compute a SHAP impact, and return immediately
- create: submit a request to compute a SHAP impact, and wait for it to finish
- list: retrieve all ShapImpact results for a model, possibly on multiple datasets or data slices.
- property shap_impacts: List[List[Any]]¶
SHAP impact values
- Returns
- shap impacts
A list of the SHAP impact values
- Return type
List
[List
[Any
]]
- property base_value: List[float]¶
A list of base prediction values
- Return type
List
[float
]
- property capping: Optional[Dict[str, Any]]¶
Capping for the models in the blender
- Return type
Optional
[Dict
[str
,Any
]]
- property link: Optional[str]¶
Shared link function of the models in the blender
- Return type
Optional
[str
]
- property row_count: int¶
Number of SHAP impact rows
- Return type
int
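A minimal sketch of computing SHAP impact for a model; the model ID is a placeholder, and compute could be used instead to submit the request without waiting.
from datarobot.insights import ShapImpact

shap_impact = ShapImpact.create(entity_id='64a0b1c2d3e4f5a6b7c8d9e0')  # placeholder model ID
print(shap_impact.row_count)         # number of SHAP impact rows
print(shap_impact.shap_impacts[:5])  # first few SHAP impact entries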
Feature¶
- class datarobot.models.Feature(id, project_id=None, name=None, feature_type=None, importance=None, low_information=None, unique_count=None, na_count=None, date_format=None, min=None, max=None, mean=None, median=None, std_dev=None, time_series_eligible=None, time_series_eligibility_reason=None, time_step=None, time_unit=None, target_leakage=None, feature_lineage_id=None, key_summary=None, multilabel_insights=None)¶
A feature from a project’s dataset
These are features either included in the originally uploaded dataset or added to it via feature transformations. In time series projects, these will be distinct from the ModelingFeatures created during partitioning; otherwise, they will correspond to the same features. For more information about input and modeling features, see the time series documentation.
The min, max, mean, median, and std_dev attributes provide information about the distribution of the feature in the EDA sample data. For non-numeric features or features created prior to these summary statistics becoming available, they will be None. For features where the summary statistics are available, they will be in a format compatible with the data type, i.e. date type features will have their summary statistics expressed as ISO-8601 formatted date strings.
- Attributes
- idint
the id for the feature - note that name is used to reference the feature instead of id
- project_idstr
the id of the project the feature belongs to
- namestr
the name of the feature
- feature_typestr
the type of the feature, e.g. ‘Categorical’, ‘Text’
- importancefloat or None
numeric measure of the strength of relationship between the feature and target (independent of any model or other features); may be None for non-modeling features such as partition columns
- low_informationbool
whether a feature is considered too uninformative for modeling (e.g. because it has too few values)
- unique_countint
number of unique values
- na_countint or None
number of missing values
- date_formatstr or None
For Date features, the date format string for how this feature was interpreted, compatible with https://docs.python.org/2/library/time.html#time.strftime . For other feature types, None.
- minstr, int, float, or None
The minimum value of the source data in the EDA sample
- maxstr, int, float, or None
The maximum value of the source data in the EDA sample
- meanstr, int, or, float
The arithmetic mean of the source data in the EDA sample
- medianstr, int, float, or None
The median of the source data in the EDA sample
- std_devstr, int, float, or None
The standard deviation of the source data in the EDA sample
- time_series_eligiblebool
Whether this feature can be used as the datetime partition column in a time series project.
- time_series_eligibility_reasonstr
Why the feature is ineligible for the datetime partition column in a time series project, or ‘suitable’ when it is eligible.
- time_stepint or None
For time series eligible features, a positive integer determining the interval at which windows can be specified. If used as the datetime partition column on a time series project, the feature derivation and forecast windows must start and end at an integer multiple of this value. None for features that are not time series eligible.
- time_unitstr or None
For time series eligible features, the time unit covered by a single time step, e.g. ‘HOUR’, or None for features that are not time series eligible.
- target_leakagestr
Whether a feature is considered to have target leakage or not. A value of ‘SKIPPED_DETECTION’ indicates that target leakage detection was not run on the feature. ‘FALSE’ indicates no leakage, ‘MODERATE’ indicates a moderate risk of target leakage, and ‘HIGH_RISK’ indicates a high risk of target leakage
- feature_lineage_idstr
id of a lineage for automatically discovered features or derived time series features.
- key_summary: list of dict
Statistics for top 50 keys (truncated to 103 characters) of Summarized Categorical column example:
{‘key’:’DataRobot’, ‘summary’:{‘min’:0, ‘max’:29815.0, ‘stdDev’:6498.029, ‘mean’:1490.75, ‘median’:0.0, ‘pctRows’:5.0}}
- where,
- key: string or None
name of the key
- summary: dict
statistics of the key
max: maximum value of the key. min: minimum value of the key. mean: mean value of the key. median: median value of the key. stdDev: standard deviation of the key. pctRows: percentage occurrence of key in the EDA sample of the feature.
- multilabel_insights_keystr or None
For multicategorical columns this will contain a key for multilabel insights. The key is unique for a project, feature and EDA stage combination. This will be the key for the most recent, finished EDA stage.
- classmethod get(project_id, feature_name)¶
Retrieve a single feature
- Parameters
- project_idstr
The ID of the project the feature is associated with.
- feature_namestr
The name of the feature to retrieve
- Returns
- featureFeature
The queried instance
- get_multiseries_properties(multiseries_id_columns, max_wait=600)¶
Retrieve time series properties for a potential multiseries datetime partition column
Multiseries time series projects use multiseries id columns to model multiple distinct series within a single project. This function returns the time series properties (time step and time unit) of this column if it were used as a datetime partition column with the specified multiseries id columns, running multiseries detection automatically if it had not previously been successfully run.
- Parameters
- multiseries_id_columnslist of str
the name(s) of the multiseries id columns to use with this datetime partition column. Currently only one multiseries id column is supported.
- max_waitint, optional
if a multiseries detection task is run, the maximum amount of time to wait for it to complete before giving up
- Returns
- propertiesdict
A dict with three keys:
time_series_eligible : bool, whether the column can be used as a partition column
time_unit : str or null, the inferred time unit if used as a partition column
time_step : int or null, the inferred time step if used as a partition column
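A minimal sketch of retrieving a feature and checking its multiseries properties; the project ID and column names are placeholders.
from datarobot.models import Feature

project_id = '5c939e08962d741e34f609f0'    # placeholder project ID
feature = Feature.get(project_id, 'date')  # hypothetical datetime column name
properties = feature.get_multiseries_properties(
    multiseries_id_columns=['series_id'],  # hypothetical multiseries id column
)
print(properties['time_series_eligible'])
print(properties['time_unit'], properties['time_step'])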
- get_cross_series_properties(datetime_partition_column, cross_series_group_by_columns, max_wait=600)¶
Retrieve cross-series properties for multiseries ID column.
This function returns the cross-series properties (eligibility as group-by column) of this column if it were used with the specified datetime partition column and the current multiseries id column, running cross-series group-by validation automatically if it had not previously been successfully run.
- Parameters
- datetime_partition_columndatetime partition column
- cross_series_group_by_columnslist of str
the name(s) of the columns to use with this multiseries ID column. Currently only one cross-series group-by column is supported.
- max_waitint, optional
if a multiseries detection task is run, the maximum amount of time to wait for it to complete before giving up
- Returns
- propertiesdict
A dict with three keys:
name : str, column name
eligibility : str, reason for column eligibility
isEligible : bool, is column eligible as cross-series group-by
- get_multicategorical_histogram()¶
Retrieve multicategorical histogram for this feature
New in version v2.24.
- Returns
- Raises
- datarobot.errors.InvalidUsageError
if this method is called on an unsuitable feature
- ValueError
if no multilabel_insights_key is present for this feature
- get_pairwise_correlations()¶
Retrieve pairwise label correlation for multicategorical features
New in version v2.24.
- Returns
- Raises
- datarobot.errors.InvalidUsageError
if this method is called on an unsuitable feature
- ValueError
if no multilabel_insights_key is present for this feature
- get_pairwise_joint_probabilities()¶
Retrieve pairwise label joint probabilities for multicategorical features
New in version v2.24.
- Returns
- Raises
- datarobot.errors.InvalidUsageError
if this method is called on an unsuitable feature
- ValueError
if no multilabel_insights_key is present for this feature
- get_pairwise_conditional_probabilities()¶
Retrieve pairwise label conditional probabilities for multicategorical features
New in version v2.24.
- Returns
- Raises
- datarobot.errors.InvalidUsageError
if this method is called on an unsuitable feature
- ValueError
if no multilabel_insights_key is present for this feature
- classmethod from_data(data)¶
Instantiate an object of this class using a dict.
- Parameters
- datadict
Correctly snake_cased keys and their values.
- Return type
TypeVar
(T
, bound=APIObject
)
- classmethod from_server_data(data, keep_attrs=None)¶
Instantiate an object of this class using the data directly from the server, meaning that the keys may have the wrong camel casing
- Parameters
- datadict
The directly translated dict of JSON from the server. No casing fixes have taken place
- keep_attrsiterable
List, set or tuple of the dotted namespace notations for attributes to keep within the object structure even if their values are None
- Return type
TypeVar
(T
, bound=APIObject
)
- get_histogram(bin_limit=None)¶
Retrieve a feature histogram
- Parameters
- bin_limitint or None
Desired max number of histogram bins. If omitted, the endpoint will use 60 by default.
- Returns
- featureHistogramFeatureHistogram
The requested histogram with the desired number of bins
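A minimal sketch of fetching a feature histogram; the project ID and feature name are placeholders.
from datarobot.models import Feature

feature = Feature.get('5c939e08962d741e34f609f0', 'annual_income')  # placeholder project ID and feature name
histogram = feature.get_histogram(bin_limit=30)
print(histogram.plot)  # list of histogram bins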
- class datarobot.models.ModelingFeature(project_id=None, name=None, feature_type=None, importance=None, low_information=None, unique_count=None, na_count=None, date_format=None, min=None, max=None, mean=None, median=None, std_dev=None, parent_feature_names=None, key_summary=None, is_restored_after_reduction=None)¶
A feature used for modeling
In time series projects, a new set of modeling features is created after setting the partitioning options. These features are automatically derived from those in the project’s dataset and are the features used for modeling. Modeling features are only accessible once the target and partitioning options have been set. In projects that don’t use time series modeling, once the target has been set, ModelingFeatures and Features will behave the same.
For more information about input and modeling features, see the time series documentation.
As with the Feature object, the min, max, mean, median, and std_dev attributes provide information about the distribution of the feature in the EDA sample data. For non-numeric features, they will be None. For features where the summary statistics are available, they will be in a format compatible with the data type, i.e. date type features will have their summary statistics expressed as ISO-8601 formatted date strings.
- Attributes
- project_idstr
the id of the project the feature belongs to
- namestr
the name of the feature
- feature_typestr
the type of the feature, e.g. ‘Categorical’, ‘Text’
- importancefloat or None
numeric measure of the strength of relationship between the feature and target (independent of any model or other features); may be None for non-modeling features such as partition columns
- low_informationbool
whether a feature is considered too uninformative for modeling (e.g. because it has too few values)
- unique_countint
number of unique values
- na_countint or None
number of missing values
- date_formatstr or None
For Date features, the date format string for how this feature was interpreted, compatible with https://docs.python.org/2/library/time.html#time.strftime . For other feature types, None.
- minstr, int, float, or None
The minimum value of the source data in the EDA sample
- maxstr, int, float, or None
The maximum value of the source data in the EDA sample
- meanstr, int, or, float
The arithmetic mean of the source data in the EDA sample
- medianstr, int, float, or None
The median of the source data in the EDA sample
- std_devstr, int, float, or None
The standard deviation of the source data in the EDA sample
- parent_feature_nameslist of str
A list of the names of input features used to derive this modeling feature. In cases where the input features and modeling features are the same, this will simply contain the feature’s name. Note that if a derived feature was used to create this modeling feature, the values here will not necessarily correspond to the features that must be supplied at prediction time.
- key_summary: list of dict
Statistics for top 50 keys (truncated to 103 characters) of Summarized Categorical column example:
{‘key’:’DataRobot’, ‘summary’:{‘min’:0, ‘max’:29815.0, ‘stdDev’:6498.029, ‘mean’:1490.75, ‘median’:0.0, ‘pctRows’:5.0}}
- where,
- key: string or None
name of the key
- summary: dict
statistics of the key
max: maximum value of the key. min: minimum value of the key. mean: mean value of the key. median: median value of the key. stdDev: standard deviation of the key. pctRows: percentage occurrence of key in the EDA sample of the feature.
- classmethod get(project_id, feature_name)¶
Retrieve a single modeling feature
- Parameters
- project_idstr
The ID of the project the feature is associated with.
- feature_namestr
The name of the feature to retrieve
- Returns
- featureModelingFeature
The requested feature
- class datarobot.models.DatasetFeature(id_, dataset_id=None, dataset_version_id=None, name=None, feature_type=None, low_information=None, unique_count=None, na_count=None, date_format=None, min_=None, max_=None, mean=None, median=None, std_dev=None, time_series_eligible=None, time_series_eligibility_reason=None, time_step=None, time_unit=None, target_leakage=None, target_leakage_reason=None)¶
A feature from a project’s dataset
These are features either included in the originally uploaded dataset or added to it via feature transformations.
The min, max, mean, median, and std_dev attributes provide information about the distribution of the feature in the EDA sample data. For non-numeric features or features created prior to these summary statistics becoming available, they will be None. For features where the summary statistics are available, they will be in a format compatible with the data type, i.e. date type features will have their summary statistics expressed as ISO-8601 formatted date strings.
- Attributes
- idint
the id for the feature - note that name is used to reference the feature instead of id
- dataset_idstr
the id of the dataset the feature belongs to
- dataset_version_idstr
the id of the dataset version the feature belongs to
- namestr
the name of the feature
- feature_typestr, optional
the type of the feature, e.g. ‘Categorical’, ‘Text’
- low_informationbool, optional
whether a feature is considered too uninformative for modeling (e.g. because it has too few values)
- unique_countint, optional
number of unique values
- na_countint, optional
number of missing values
- date_formatstr, optional
For Date features, the date format string for how this feature was interpreted, compatible with https://docs.python.org/2/library/time.html#time.strftime . For other feature types, None.
- minstr, int, float, optional
The minimum value of the source data in the EDA sample
- maxstr, int, float, optional
The maximum value of the source data in the EDA sample
- meanstr, int, float, optional
The arithmetic mean of the source data in the EDA sample
- medianstr, int, float, optional
The median of the source data in the EDA sample
- std_devstr, int, float, optional
The standard deviation of the source data in the EDA sample
- time_series_eligiblebool, optional
Whether this feature can be used as the datetime partition column in a time series project.
- time_series_eligibility_reasonstr, optional
Why the feature is ineligible for the datetime partition column in a time series project, or ‘suitable’ when it is eligible.
- time_stepint, optional
For time series eligible features, a positive integer determining the interval at which windows can be specified. If used as the datetime partition column on a time series project, the feature derivation and forecast windows must start and end at an integer multiple of this value. None for features that are not time series eligible.
- time_unitstr, optional
For time series eligible features, the time unit covered by a single time step, e.g. ‘HOUR’, or None for features that are not time series eligible.
- target_leakagestr, optional
Whether a feature is considered to have target leakage or not. A value of ‘SKIPPED_DETECTION’ indicates that target leakage detection was not run on the feature. ‘FALSE’ indicates no leakage, ‘MODERATE’ indicates a moderate risk of target leakage, and ‘HIGH_RISK’ indicates a high risk of target leakage
- target_leakage_reason: string, optional
The descriptive text explaining the reason for target leakage, if any.
- get_histogram(bin_limit=None)¶
Retrieve a feature histogram
- Parameters
- bin_limitint or None
Desired max number of histogram bins. If omitted, the endpoint will use 60 by default.
- Returns
- featureHistogramDatasetFeatureHistogram
The requested histogram with the desired number of bins
- class datarobot.models.DatasetFeatureHistogram(plot)¶
- classmethod get(dataset_id, feature_name, bin_limit=None, key_name=None)¶
Retrieve a single feature histogram
- Parameters
- dataset_idstr
The ID of the Dataset the feature is associated with.
- feature_namestr
The name of the feature to retrieve
- bin_limitint or None
Desired max number of histogram bins. If omitted, by default the endpoint will use 60.
- key_name: string or None
(Only required for summarized categorical features) Name of the key, from the top 50 keys, for which the plot should be retrieved
- Returns
- featureHistogramFeatureHistogram
The queried instance with plot attribute in it.
- class datarobot.models.FeatureHistogram(plot)¶
- classmethod get(project_id, feature_name, bin_limit=None, key_name=None)¶
Retrieve a single feature histogram
- Parameters
- project_idstr
The ID of the project the feature is associated with.
- feature_namestr
The name of the feature to retrieve
- bin_limitint or None
Desired max number of histogram bins. If omitted, the endpoint will use 60 by default.
- key_name: string or None
(Only required for summarized categorical features) Name of the key, from the top 50 keys, for which the plot should be retrieved
- Returns
- featureHistogramFeatureHistogram
The queried instance with plot attribute in it.
- class datarobot.models.InteractionFeature(rows, source_columns, bars, bubbles)¶
Interaction feature data
New in version v2.21.
- Attributes
- rows: int
Total number of rows
- source_columns: list(str)
names of two categorical features which were combined into this one
- bars: list(dict)
dictionaries representing frequencies of each independent value from the source columns
- bubbles: list(dict)
dictionaries representing frequencies of each combined value in the interaction feature.
- classmethod get(project_id, feature_name)¶
Retrieve a single Interaction feature
- Parameters
- project_idstr
The id of the project the feature belongs to
- feature_namestr
The name of the Interaction feature to retrieve
- Returns
- featureInteractionFeature
The queried instance
- class datarobot.models.MulticategoricalHistogram(feature_name, histogram)¶
Histogram for Multicategorical feature.
New in version v2.24.
Notes
HistogramValues contains:
- values.[].label : string - Label name
- values.[].plot : list - Histogram for label
- values.[].plot.[].label_relevance : int - Label relevance value
- values.[].plot.[].row_count : int - Row count where label has given relevance
- values.[].plot.[].row_pct : float - Percentage of rows where label has given relevance
- Attributes
- feature_namestr
Name of the feature
- valueslist(dict)
List of Histogram values with a schema described as
HistogramValues
- classmethod get(multilabel_insights_key)¶
Retrieves multicategorical histogram
You might find it more convenient to use Feature.get_multicategorical_histogram instead.
- Parameters
- multilabel_insights_key: string
Key for multilabel insights, unique for a project, feature and EDA stage combination. The multilabel_insights_key can be retrieved via Feature.multilabel_insights_key.
- Returns
- MulticategoricalHistogram
The multicategorical histogram for multilabel_insights_key
- to_dataframe()¶
Convenience method to get all the information from this multicategorical_histogram instance in the form of a pandas.DataFrame.
- Returns
- pandas.DataFrame
Histogram information as a pandas.DataFrame. The dataframe will contain these columns: feature_name, label, label_relevance, row_count and row_pct
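A minimal sketch of retrieving a multicategorical histogram through the Feature helper and converting it to a DataFrame; the project ID and feature name are placeholders.
from datarobot.models import Feature

feature = Feature.get('5c939e08962d741e34f609f0', 'tags')  # hypothetical multicategorical feature
histogram = feature.get_multicategorical_histogram()
df = histogram.to_dataframe()
print(df[['label', 'label_relevance', 'row_count', 'row_pct']].head())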
- class datarobot.models.PairwiseCorrelations(*args, **kwargs)¶
Correlation of label pairs for multicategorical feature.
New in version v2.24.
Notes
CorrelationValues contain:
- values.[].label_configuration : list of length 2 - Configuration of the label pair
- values.[].label_configuration.[].label : str - Label name
- values.[].statistic_value : float - Statistic value
- Attributes
- feature_namestr
Name of the feature
- valueslist(dict)
List of correlation values with a schema described as
CorrelationValues
- statistic_dataframepandas.DataFrame
Correlation values for all label pairs as a DataFrame
- classmethod get(multilabel_insights_key)¶
Retrieves pairwise correlations
You might find it more convenient to use Feature.get_pairwise_correlations instead.
- Parameters
- multilabel_insights_key: string
Key for multilabel insights, unique for a project, feature and EDA stage combination. The multilabel_insights_key can be retrieved via Feature.multilabel_insights_key.
- Returns
- PairwiseCorrelations
The pairwise label correlations
- as_dataframe()¶
The pairwise label correlations as a (num_labels x num_labels) DataFrame.
- Returns
- pandas.DataFrame
The pairwise label correlations. Index and column names allow the interpretation of the values.
- class datarobot.models.PairwiseJointProbabilities(*args, **kwargs)¶
Joint probabilities of label pairs for multicategorical feature.
New in version v2.24.
Notes
ProbabilityValues contain:
- values.[].label_configuration : list of length 2 - Configuration of the label pair
- values.[].label_configuration.[].relevance : int - 0 for absence of the labels, 1 for the presence of labels
- values.[].label_configuration.[].label : str - Label name
- values.[].statistic_value : float - Statistic value
- Attributes
- feature_namestr
Name of the feature
- valueslist(dict)
List of joint probability values with a schema described as
ProbabilityValues
- statistic_dataframesdict(pandas.DataFrame)
Joint Probability values as DataFrames for different relevance combinations.
E.g. The probability P(A=0,B=1) can be retrieved via:
pairwise_joint_probabilities.statistic_dataframes[(0,1)].loc['A', 'B']
- classmethod get(multilabel_insights_key)¶
Retrieves pairwise joint probabilities
You might find it more convenient to use Feature.get_pairwise_joint_probabilities instead.
- Parameters
- multilabel_insights_key: string
Key for multilabel insights, unique for a project, feature and EDA stage combination. The multilabel_insights_key can be retrieved via Feature.multilabel_insights_key.
- Returns
- PairwiseJointProbabilities
The pairwise joint probabilities
- as_dataframe(relevance_configuration)¶
Joint probabilities of label pairs as a (num_labels x num_labels) DataFrame.
- Parameters
- relevance_configuration: tuple of length 2
Valid options are (0, 0), (0, 1), (1, 0) and (1, 1). Values of 0 indicate absence of labels and 1 indicates presence of labels. The first value describes the presence for the labels in axis=0 and the second value describes the presence for the labels in axis=1.
For example the matrix values for a relevance configuration of (0, 1) describe the probabilities of absent labels in the index axis and present labels in the column axis.
E.g. The probability P(A=0,B=1) can be retrieved via:
pairwise_joint_probabilities.as_dataframe((0,1)).loc['A', 'B']
- Returns
- pandas.DataFrame
The joint probabilities for the requested relevance_configuration. Index and column names allow the interpretation of the values.
- class datarobot.models.PairwiseConditionalProbabilities(*args, **kwargs)¶
Conditional probabilities of label pairs for multicategorical feature.
New in version v2.24.
Notes
ProbabilityValues contain:
- values.[].label_configuration : list of length 2 - Configuration of the label pair
- values.[].label_configuration.[].relevance : int - 0 for absence of the labels, 1 for the presence of labels
- values.[].label_configuration.[].label : str - Label name
- values.[].statistic_value : float - Statistic value
- Attributes
- feature_namestr
Name of the feature
- valueslist(dict)
List of conditional probability values with a schema described as
ProbabilityValues
- statistic_dataframesdict(pandas.DataFrame)
Conditional Probability values as DataFrames for different relevance combinations. The label names in the columns are the events, on which we condition. The label names in the index are the events whose conditional probability given the indexes is in the dataframe.
E.g. The probability P(A=0|B=1) can be retrieved via:
pairwise_conditional_probabilities.statistic_dataframes[(0,1)].loc['A', 'B']
- classmethod get(multilabel_insights_key)¶
Retrieves pairwise conditional probabilities
You might find it more convenient to use Feature.get_pairwise_conditional_probabilities instead.
- Parameters
- multilabel_insights_key: string
Key for multilabel insights, unique for a project, feature and EDA stage combination. The multilabel_insights_key can be retrieved via Feature.multilabel_insights_key.
- Returns
- PairwiseConditionalProbabilities
The pairwise conditional probabilities
- as_dataframe(relevance_configuration)¶
Conditional probabilities of label pairs as a (num_labels x num_labels) DataFrame. The label names in the columns are the events, on which we condition. The label names in the index are the events whose conditional probability given the indexes is in the dataframe.
E.g. The probability P(A=0|B=1) can be retrieved via:
pairwise_conditional_probabilities.as_dataframe((0, 1)).loc['A', 'B']
- Parameters
- relevance_configuration: tuple of length 2
Valid options are (0, 0), (0, 1), (1, 0) and (1, 1). Values of 0 indicate absence of labels and 1 indicates presence of labels. The first value describes the presence for the labels in axis=0 and the second value describes the presence for the labels in axis=1.
For example the matrix values for a relevance configuration of (0, 1) describe the probabilities of absent labels in the index axis given the presence of labels in the column axis.
- Returns
- pandas.DataFrame
The conditional probabilities for the requested relevance_configuration. Index and column names allow the interpretation of the values.
Feature Association¶
- class datarobot.models.FeatureAssociationMatrix(strengths=None, features=None, project_id=None)¶
Feature association statistics for a project.
Note
Projects created prior to v2.17 are not supported by this feature.
Examples
import datarobot as dr
from datarobot import enums

# retrieve feature association matrix
feature_association_matrix = dr.FeatureAssociationMatrix.get(project_id)
feature_association_matrix.strengths
feature_association_matrix.features

# retrieve feature association matrix for a metric, association type or a feature list
feature_association_matrix = dr.FeatureAssociationMatrix.get(
    project_id,
    metric=enums.FEATURE_ASSOCIATION_METRIC.SPEARMAN,
    association_type=enums.FEATURE_ASSOCIATION_TYPE.CORRELATION,
    featurelist_id=featurelist_id,
)
- Attributes
- project_idstr
Id of the associated project.
- strengthslist of dict
Pairwise statistics for the available features as structured below.
- featureslist of dict
Metadata for each feature and where it goes in the matrix.
- classmethod get(project_id, metric=None, association_type=None, featurelist_id=None)¶
Get feature association statistics.
- Parameters
- project_idstr
Id of the project that contains the requested associations.
- metricenums.FEATURE_ASSOCIATION_METRIC
The name of a metric to get pairwise data for. Since ‘v2.19’ this is optional and defaults to enums.FEATURE_ASSOCIATION_METRIC.MUTUAL_INFO.
- association_typeenums.FEATURE_ASSOCIATION_TYPE
The type of dependence for the data. Since ‘v2.19’ this is optional and defaults to enums.FEATURE_ASSOCIATION_TYPE.ASSOCIATION.
- featurelist_idstr or None
Optional, the feature list to lookup FAM data for. By default, depending on the type of the project “Informative Features” or “Timeseries Informative Features” list will be used. (New in version v2.19)
- Returns
- FeatureAssociationMatrix
Feature association pairwise metric strength data, feature clustering data, and ordering data for Feature Association Matrix visualization.
- Return type
- classmethod create(project_id, featurelist_id)¶
Compute the Feature Association Matrix for a Feature List
- Parameters
- project_idstr
The ID of the project that the feature list belongs to.
- featurelist_idstr
The ID of the feature list for which insights are requested.
- Returns
- status_check_jobStatusCheckJob
Object contains all needed logic for a periodical status check of an async job.
- Return type
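A minimal sketch of requesting a feature association matrix for a feature list and retrieving it once computed; the IDs are placeholders, and the wait_for_completion call assumes the StatusCheckJob polling helper is available.
import datarobot as dr

project_id = '5c939e08962d741e34f609f0'      # placeholder project ID
featurelist_id = '5d4444444444444444444444'  # placeholder feature list ID

status_check_job = dr.FeatureAssociationMatrix.create(project_id, featurelist_id)
status_check_job.wait_for_completion()  # poll the async job until it finishes (assumed helper)
fam = dr.FeatureAssociationMatrix.get(project_id, featurelist_id=featurelist_id)
print(len(fam.strengths), 'pairwise strengths for', len(fam.features), 'features')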
Feature Association Matrix Details¶
- class datarobot.models.FeatureAssociationMatrixDetails(project_id=None, chart_type=None, values=None, features=None, types=None, featurelist_id=None)¶
Plotting details for a pair of passed features present in the feature association matrix.
Note
Projects created prior to v2.17 are not supported by this feature.
- Attributes
- project_idstr
Id of the project that contains the requested associations.
- chart_typestr
Which type of plotting the pair of features gets in the UI. e.g. ‘HORIZONTAL_BOX’, ‘VERTICAL_BOX’, ‘SCATTER’ or ‘CONTINGENCY’
- valueslist
The data triplets for pairwise plotting e.g. {“values”: [[460.0, 428.5, 0.001], [1679.3, 259.0, 0.001], …] The first entry of each list is a value of feature1, the second entry of each list is a value of feature2, and the third is the relative frequency of the pair of datapoints in the sample.
- featureslist
A list of the requested features, [feature1, feature2]
- typeslist
The type of feature1 and feature2. Possible values: “CATEGORICAL”, “NUMERIC”
- featurelist_idstr
Id of the feature list to lookup FAM details for.
- classmethod get(project_id, feature1, feature2, featurelist_id=None)¶
Get a sample of the actual values used to measure the association between a pair of features
New in version v2.17.
- Parameters
- project_idstr
Id of the project of interest.
- feature1str
Feature name for the first feature of interest.
- feature2str
Feature name for the second feature of interest.
- featurelist_idstr
Optional, the feature list to lookup FAM data for. By default, depending on the type of the project “Informative Features” or “Timeseries Informative Features” list will be used.
- Returns
- FeatureAssociationMatrixDetails
The feature association plotting for provided pair of features.
- Return type
Feature Association Featurelists¶
- class datarobot.models.FeatureAssociationFeaturelists(project_id=None, featurelists=None)¶
Featurelists with feature association matrix availability flags for a project.
- Attributes
- project_idstr
Id of the project that contains the requested associations.
- featurelistslist of dict
The featurelists with the featurelist_id, title and the has_fam flag.
- classmethod get(project_id)¶
Get featurelists with feature association status for each.
- Parameters
- project_idstr
Id of the project of interest.
- Returns
- FeatureAssociationFeaturelists
Featurelists with feature association status for each.
- Return type
Feature Discovery¶
Relationships Configuration¶
- class datarobot.models.RelationshipsConfiguration(id, dataset_definitions=None, relationships=None, feature_discovery_mode=None, feature_discovery_settings=None)¶
A Relationships configuration specifies a set of secondary datasets as well as the relationships among them. It is used to configure Feature Discovery for a project to generate features automatically from these datasets.
- Attributes
- idstring
Id of the created relationships configuration
- dataset_definitions: list
Each element is a dataset_definitions for a dataset.
- relationships: list
Each element is a relationship between two datasets
- feature_discovery_mode: str
Mode of feature discovery. Supported values are ‘default’ and ‘manual’
- feature_discovery_settings: list
List of feature discovery settings used to customize the feature discovery process
- The `dataset_definitions` structure is
- identifier: string
Alias of the dataset (used directly as part of the generated feature names)
- catalog_id: str, or None
Identifier of the catalog item
- catalog_version_id: str
Identifier of the catalog item version
- primary_temporal_key: string, optional
Name of the column indicating time of record creation
- feature_list_id: string, optional
Identifier of the feature list. This decides which columns in the dataset are used for feature generation
- snapshot_policy: str
Policy to use when creating a project or making predictions. Must be one of the following values: ‘specified’ (use the specific snapshot specified by catalogVersionId), ‘latest’ (use the latest snapshot from the same catalog item), or ‘dynamic’ (get data from the source; only applicable for JDBC datasets).
- feature_lists: list
List of feature list info
- data_source: dict
Data source info if the dataset is from data source
- data_sources: list
List of Data source details for a JDBC datasets
- is_deleted: bool, optional
Whether the dataset is deleted or not
- The `data source info` structured is
- data_store_id: str
Id of the data store.
- data_store_namestr
User-friendly name of the data store.
- urlstr
Url used to connect to the data store.
- dbtablestr
Name of table from the data store.
- schema: str
Schema definition of the table from the data store
- catalog: str
Catalog name of the data source.
- The `feature list info` structure is
- idstr
Id of the featurelist
- namestr
Name of the featurelist
- featureslist of str
Names of all the Features in the featurelist
- dataset_idstr
Dataset the featurelist belongs to
- creation_datedatetime.datetime
When the featurelist was created
- user_createdbool
Whether the featurelist was created by a user or by DataRobot automation
- created_by: str
Name of user who created it
- descriptionstr
Description of the featurelist. Can be updated by the user and may be supplied by default for DataRobot-created featurelists.
- dataset_id: str
Dataset which is associated with the feature list
- dataset_version_id: str or None
Version of the dataset which is associated with feature list. Only relevant for Informative features
- The `relationships` schema is
- dataset1_identifier: str or None
Identifier of the first dataset in this relationship. This is specified in the identifier field of dataset_definition structure. If None, then the relationship is with the primary dataset.
- dataset2_identifier: str
Identifier of the second dataset in this relationship. This is specified in the identifier field of dataset_definition schema.
- dataset1_keys: list of str (max length: 10 min length: 1)
Column(s) from the first dataset which are used to join to the second dataset
- dataset2_keys: list of str (max length: 10 min length: 1)
Column(s) from the second dataset that are used to join to the first dataset
- time_unit: str, or None
Time unit of the feature derivation window. Supported values are MILLISECOND, SECOND, MINUTE, HOUR, DAY, WEEK, MONTH, QUARTER, YEAR. If present, the feature engineering Graph will perform time-aware joins.
- feature_derivation_window_start: int, or None
How many time_units of each dataset’s primary temporal key into the past relative to the datetimePartitionColumn the feature derivation window should begin. Will be a negative integer, if present. If present, the feature engineering Graph will perform time-aware joins.
- feature_derivation_window_end: int, or None
How many timeUnits of each dataset’s record primary temporal key into the past relative to the datetimePartitionColumn the feature derivation window should end. Will be a non-positive integer, if present. If present, the feature engineering Graph will perform time-aware joins.
- feature_derivation_window_time_unit: str or None
Time unit of the feature derivation window. Supported values are MILLISECOND, SECOND, MINUTE, HOUR, DAY, WEEK, MONTH, QUARTER, YEAR. If present, time-aware joins will be used. Only applicable when dataset1_identifier is not provided.
- feature_derivation_windows: list of dict, or None
List of feature derivation windows settings. If present, time-aware joins will be used. Only allowed when feature_derivation_window_start, feature_derivation_window_end and feature_derivation_window_time_unit are not provided.
- prediction_point_rounding: int, or None
Closest value of prediction_point_rounding_time_unit to round the prediction point into the past when applying the feature derivation window. Will be a positive integer, if present. Only applicable when dataset1_identifier is not provided.
- prediction_point_rounding_time_unit: str, or None
Time unit of the prediction point rounding. Supported values are MILLISECOND, SECOND, MINUTE, HOUR, DAY, WEEK, MONTH, QUARTER, YEAR. Only applicable when dataset1_identifier is not provided.
- The `feature_derivation_windows` is a list of dictionaries with the following schema:
- start: int
How many time_units of each dataset’s primary temporal key into the past relative to the datetimePartitionColumn the feature derivation window should begin.
- end: int
How many timeUnits of each dataset’s record primary temporal key into the past relative to the datetimePartitionColumn the feature derivation window should end.
- unit: string
Time unit of the feature derivation window. One of datarobot.enums.AllowedTimeUnitsSAFER.
- The `feature_discovery_settings` structure is:
- name: str
Name of the feature discovery setting
- value: bool
Value of the feature discovery setting
- To see the list of possible settings, create a RelationshipsConfiguration without specifying settings and check its `feature_discovery_settings` attribute, which is a list of possible settings with their default values.
- classmethod create(dataset_definitions, relationships, feature_discovery_settings=None)¶
Create a Relationships Configuration
- Parameters
- dataset_definitions: list of dataset definitions
Each element is a
datarobot.helpers.feature_discovery.DatasetDefinition
- relationships: list of relationships
Each element is a
datarobot.helpers.feature_discovery.Relationship
- feature_discovery_settingslist of feature discovery settings, optional
Each element is a dictionary or a
datarobot.helpers.feature_discovery.FeatureDiscoverySetting
. If not provided, default settings will be used.
- Returns
- relationships_configuration: RelationshipsConfiguration
Created relationships configuration
Examples
dataset_definition = dr.DatasetDefinition(
    identifier='profile',
    catalog_id='5fd06b4af24c641b68e4d88f',
    catalog_version_id='5fd06b4af24c641b68e4d88f'
)
relationship = dr.Relationship(
    dataset2_identifier='profile',
    dataset1_keys=['CustomerID'],
    dataset2_keys=['CustomerID'],
    feature_derivation_window_start=-14,
    feature_derivation_window_end=-1,
    feature_derivation_window_time_unit='DAY',
    prediction_point_rounding=1,
    prediction_point_rounding_time_unit='DAY'
)
dataset_definitions = [dataset_definition]
relationships = [relationship]
relationship_config = dr.RelationshipsConfiguration.create(
    dataset_definitions=dataset_definitions,
    relationships=relationships,
    feature_discovery_settings=[
        {'name': 'enable_categorical_statistics', 'value': True},
        {'name': 'enable_numeric_skewness', 'value': True},
    ]
)
>>> relationship_config.id
'5c88a37770fc42a2fcc62759'
- get()¶
Retrieve the Relationships configuration for a given id
- Returns
- relationships_configuration: RelationshipsConfiguration
The requested relationships configuration
- Raises
- ClientError
Raised if an invalid relationships config id is provided.
Examples
relationships_config = dr.RelationshipsConfiguration(valid_config_id)
result = relationships_config.get()
>>> result.id
'5c88a37770fc42a2fcc62759'
- replace(dataset_definitions, relationships, feature_discovery_settings=None)¶
Update a relationships configuration, provided it is not used by a feature discovery project
- Parameters
- dataset_definitions: list of dataset definition
Each element is a
datarobot.helpers.feature_discovery.DatasetDefinition
- relationships: list of relationships
Each element is a
datarobot.helpers.feature_discovery.Relationship
- feature_discovery_settingslist of feature discovery settings, optional
Each element is a dictionary or a
datarobot.helpers.feature_discovery.FeatureDiscoverySetting
. If not provided, default settings will be used.
- Returns
- relationships_configuration: RelationshipsConfiguration
the updated relationships configuration
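A minimal sketch of updating an unused configuration, reusing the dataset_definitions and relationships built in the create() example above:
updated_config = relationship_config.replace(
    dataset_definitions=dataset_definitions,
    relationships=relationships,
)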
- delete()¶
Delete the Relationships configuration
- Raises
- ClientError
Raised if an invalid relationships config id is provided.
Examples
# Deleting with a valid id
relationships_config = dr.RelationshipsConfiguration(valid_config_id)
status_code = relationships_config.delete()
status_code
>>> 204
relationships_config.get()
>>> ClientError: Relationships Configuration not found
Dataset Definition¶
- class datarobot.helpers.feature_discovery.DatasetDefinition(identifier, catalog_id, catalog_version_id, snapshot_policy='latest', feature_list_id=None, primary_temporal_key=None)¶
Dataset definition for the Feature Discovery
New in version v2.25.
Examples
import datarobot as dr

dataset_definition = dr.DatasetDefinition(
    identifier='profile',
    catalog_id='5ec4aec1f072bc028e3471ae',
    catalog_version_id='5ec4aec2f072bc028e3471b1',
)

dataset_definition = dr.DatasetDefinition(
    identifier='transaction',
    catalog_id='5ec4aec1f072bc028e3471ae',
    catalog_version_id='5ec4aec2f072bc028e3471b1',
    primary_temporal_key='Date'
)
- Attributes
- identifier: string
Alias of the dataset (used directly as part of the generated feature names)
- catalog_id: string, optional
Identifier of the catalog item
- catalog_version_id: string
Identifier of the catalog item version
- primary_temporal_key: string, optional
Name of the column indicating time of record creation
- feature_list_id: string, optional
Identifier of the feature list. This decides which columns in the dataset are used for feature generation
- snapshot_policy: string, optional
Policy to use when creating a project or making predictions. If omitted, by default endpoint will use ‘latest’. Must be one of the following values: ‘specified’: Use specific snapshot specified by catalogVersionId ‘latest’: Use latest snapshot from the same catalog item ‘dynamic’: Get data from the source (only applicable for JDBC datasets)
Relationship¶
- class datarobot.helpers.feature_discovery.Relationship(dataset2_identifier, dataset1_keys, dataset2_keys, dataset1_identifier=None, feature_derivation_window_start=None, feature_derivation_window_end=None, feature_derivation_window_time_unit=None, feature_derivation_windows=None, prediction_point_rounding=None, prediction_point_rounding_time_unit=None)¶
Relationship between dataset defined in DatasetDefinition
New in version v2.25.
Examples
import datarobot as dr

relationship = dr.Relationship(
    dataset1_identifier='profile',
    dataset2_identifier='transaction',
    dataset1_keys=['CustomerID'],
    dataset2_keys=['CustomerID']
)

relationship = dr.Relationship(
    dataset2_identifier='profile',
    dataset1_keys=['CustomerID'],
    dataset2_keys=['CustomerID'],
    feature_derivation_window_start=-14,
    feature_derivation_window_end=-1,
    feature_derivation_window_time_unit='DAY',
    prediction_point_rounding=1,
    prediction_point_rounding_time_unit='DAY'
)
- Attributes
- dataset1_identifier: string, optional
Identifier of the first dataset in this relationship. This is specified in the identifier field of dataset_definition structure. If None, then the relationship is with the primary dataset.
- dataset2_identifier: string
Identifier of the second dataset in this relationship. This is specified in the identifier field of dataset_definition schema.
- dataset1_keys: list of string (max length: 10 min length: 1)
Column(s) from the first dataset which are used to join to the second dataset
- dataset2_keys: list of string (max length: 10 min length: 1)
Column(s) from the second dataset that are used to join to the first dataset
- feature_derivation_window_start: int, or None
How many time_units of each dataset’s primary temporal key into the past relative to the datetimePartitionColumn the feature derivation window should begin. Will be a negative integer, if present. If present, the feature engineering Graph will perform time-aware joins.
- feature_derivation_window_end: int, optional
How many timeUnits of each dataset’s record primary temporal key into the past relative to the datetimePartitionColumn the feature derivation window should end. Will be a non-positive integer, if present. If present, the feature engineering Graph will perform time-aware joins.
- feature_derivation_window_time_unit: string, optional
Time unit of the feature derivation window. One of datarobot.enums.AllowedTimeUnitsSAFER. If present, time-aware joins will be used. Only applicable when dataset1_identifier is not provided.
- feature_derivation_windows: list of dict, or None
List of feature derivation windows settings. If present, time-aware joins will be used. Only allowed when feature_derivation_window_start, feature_derivation_window_end and feature_derivation_window_time_unit are not provided.
- prediction_point_rounding: int, optional
Closest value of prediction_point_rounding_time_unit to round the prediction point into the past when applying the feature derivation window. Will be a positive integer, if present. Only applicable when dataset1_identifier is not provided.
- prediction_point_rounding_time_unit: string, optional
Time unit of the prediction point rounding. One of datarobot.enums.AllowedTimeUnitsSAFER. Only applicable when dataset1_identifier is not provided.
- The `feature_derivation_windows` is a list of dictionaries with the following schema:
- start: int
How many time_units of each dataset’s primary temporal key into the past relative to the datetimePartitionColumn the feature derivation window should begin.
- end: int
How many timeUnits of each dataset’s record primary temporal key into the past relative to the datetimePartitionColumn the feature derivation window should end.
- unit: string
Time unit of the feature derivation window. One of datarobot.enums.AllowedTimeUnitsSAFER.
Feature Lineage¶
- class datarobot.models.FeatureLineage(steps=None)¶
Lineage of an automatically engineered feature.
- Attributes
- steps: list
list of steps which were applied to build the feature.
- `steps` structure is:
- idint
step id starting with 0.
- step_type: str
one of the data/action/json/generatedData.
- name: str
name of the step.
- description: str
description of the step.
- parents: list[int]
references to other steps id.
- is_time_aware: bool
Indicator of the step being time aware. Mandatory only for action and join steps. An action step provides additional information about the feature derivation window in the time_info field.
- catalog_id: str
id of the catalog for a data step.
- catalog_version_id: str
id of the catalog version for a data step.
- group_by: list[str]
list of columns which this action step aggregated by.
- columns: list
names of columns involved in the feature generation. Available only for data steps.
- time_info: dict
description of the feature derivation window which was applied to this action step.
- join_info: list[dict]
join step details.
- `columns` structure is
- data_type: str
the type of the feature, e.g. ‘Categorical’, ‘Text’
- is_input: bool
indicates features which provided data to transform in this lineage.
- name: str
feature name.
- is_cutoff: bool
indicates a cutoff column.
- `time_info` structure is:
- latest: dict
end of the feature derivation window applied.
- duration: dict
size of the feature derivation window applied.
- `latest` and `duration` structure is:
- time_unit: str
time unit name like ‘MINUTE’, ‘DAY’, ‘MONTH’ etc.
- duration: int
value/size of this duration object.
- `join_info` structure is:
- join_type: str
kind of join, left/right.
- left_table: dict
information about a dataset which was considered as left.
- right_table: dict
information about a dataset which was considered as right.
- `left_table` and `right_table` structure is:
- columns: list[str]
list of columns which datasets were joined by.
- datasteps: list[int]
list of data step ids which brought the columns into the current step dataset.
- classmethod get(project_id, id)¶
Retrieve a single FeatureLineage.
- Parameters
- project_idstr
The id of the project the feature belongs to
- idstr
id of a feature lineage to retrieve
- Returns
- lineageFeatureLineage
The queried instance
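A minimal usage sketch (the ids below are placeholders):
import datarobot as dr

lineage = dr.models.FeatureLineage.get(project_id='5fd06afce2456ec1e9d20457', id='5fd06b4af24c641b68e4d88f')
for step in lineage.steps:
    print(step['id'], step['step_type'], step['name'])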
Secondary Dataset Configurations¶
- class datarobot.models.SecondaryDatasetConfigurations(id, project_id, config=None, secondary_datasets=None, name=None, creator_full_name=None, creator_user_id=None, created=None, featurelist_id=None, credential_ids=None, is_default=None, project_version=None)¶
Create secondary dataset configurations for a given project
New in version v2.20.
- Attributes
- idstr
Id of this secondary dataset configuration
- project_idstr
Id of the associated project.
- config: list of DatasetConfiguration (Deprecated in version v2.23)
List of secondary dataset configurations
- secondary_datasets: list of SecondaryDataset (new in v2.23)
List of secondary datasets (secondaryDataset)
- name: str
Verbose name of the SecondaryDatasetConfig. null if it wasn’t specified.
- created: datetime.datetime
DR-formatted datetime. null for legacy (before DR 6.0) db records.
- creator_user_id: str
Id of the user who created this config.
- creator_full_name: str
Full name or email of the user who created this config.
- featurelist_id: str, optional
Id of the feature list. null if it wasn’t specified.
- credential_ids: list of DatasetsCredentials, optional
credentials used by the secondary datasets if the datasets used in the configuration are from datasource
- is_default: bool, optional
Boolean flag indicating whether this is the default config created during the feature discovery aim.
- project_version: str, optional
Version of the project when it was created (release version)
- classmethod create(project_id, secondary_datasets, name, featurelist_id=None)¶
Create secondary dataset configurations
New in version v2.20.
- Parameters
- project_idstr
id of the associated project.
- secondary_datasets: list of SecondaryDataset (New in version v2.23)
List of secondary datasets used by the configuration; each element is a
datarobot.helpers.feature_discovery.SecondaryDataset
- name: str (New in version v2.23)
Name of the secondary datasets configuration
- featurelist_id: str, or None (New in version v2.23)
Id of the featurelist
- Returns
- an instance of SecondaryDatasetConfigurations
- Raises
- ClientError
raised if incorrect configuration parameters are provided
Examples
profile_secondary_dataset = dr.SecondaryDataset(
    identifier='profile',
    catalog_id='5ec4aec1f072bc028e3471ae',
    catalog_version_id='5ec4aec2f072bc028e3471b1',
    snapshot_policy='latest'
)
transaction_secondary_dataset = dr.SecondaryDataset(
    identifier='transaction',
    catalog_id='5ec4aec268f0f30289a03901',
    catalog_version_id='5ec4aec268f0f30289a03900',
    snapshot_policy='latest'
)
secondary_datasets = [profile_secondary_dataset, transaction_secondary_dataset]
new_secondary_dataset_config = dr.SecondaryDatasetConfigurations.create(
    project_id=project.id,
    name='My config',
    secondary_datasets=secondary_datasets
)
>>> new_secondary_dataset_config.id
'5fd1e86c589238a4e635e93d'
- Return type
- delete()¶
Removes the Secondary datasets configuration
New in version v2.21.
- Raises
- ClientError
Raised if an invalid or already deleted secondary dataset config id is provided
Examples
# Deleting with a valid secondary_dataset_config id
config = dr.SecondaryDatasetConfigurations(id=some_config_id)
status_code = config.delete()
status_code
>>> 204
- Return type
None
- get()¶
Retrieve a single secondary dataset configuration for a given id
New in version v2.21.
- Returns
- secondary_dataset_configurationsSecondaryDatasetConfigurations
The requested secondary dataset configurations
Examples
config_id = '5fd1e86c589238a4e635e93d'
secondary_dataset_config = dr.SecondaryDatasetConfigurations(id=config_id).get()
>>> secondary_dataset_config
{
    'created': datetime.datetime(2020, 12, 9, 6, 16, 22, tzinfo=tzutc()),
    'creator_full_name': u'[email protected]',
    'creator_user_id': u'asdf4af1gf4bdsd2fba1de0a',
    'credential_ids': None,
    'featurelist_id': None,
    'id': u'5fd1e86c589238a4e635e93d',
    'is_default': True,
    'name': u'My config',
    'project_id': u'5fd06afce2456ec1e9d20457',
    'project_version': None,
    'secondary_datasets': [
        {
            'snapshot_policy': u'latest',
            'identifier': u'profile',
            'catalog_version_id': u'5fd06b4af24c641b68e4d88f',
            'catalog_id': u'5fd06b4af24c641b68e4d88e'
        },
        {
            'snapshot_policy': u'dynamic',
            'identifier': u'transaction',
            'catalog_version_id': u'5fd1e86c589238a4e635e98e',
            'catalog_id': u'5fd1e86c589238a4e635e98d'
        }
    ]
}
- Return type
- classmethod list(project_id, featurelist_id=None, limit=None, offset=None)¶
Returns list of secondary dataset configurations.
New in version v2.23.
- Parameters
- project_id: str
The Id of project
- featurelist_id: str, optional
Id of the feature list to filter the secondary datasets configurations
- Returns
- secondary_dataset_configurationslist of SecondaryDatasetConfigurations
The requested list of secondary dataset configurations for a given project
Examples
pid = '5fd06afce2456ec1e9d20457'
secondary_dataset_configs = dr.SecondaryDatasetConfigurations.list(pid)
>>> secondary_dataset_configs[0]
{
    'created': datetime.datetime(2020, 12, 9, 6, 16, 22, tzinfo=tzutc()),
    'creator_full_name': u'[email protected]',
    'creator_user_id': u'asdf4af1gf4bdsd2fba1de0a',
    'credential_ids': None,
    'featurelist_id': None,
    'id': u'5fd1e86c589238a4e635e93d',
    'is_default': True,
    'name': u'My config',
    'project_id': u'5fd06afce2456ec1e9d20457',
    'project_version': None,
    'secondary_datasets': [
        {
            'snapshot_policy': u'latest',
            'identifier': u'profile',
            'catalog_version_id': u'5fd06b4af24c641b68e4d88f',
            'catalog_id': u'5fd06b4af24c641b68e4d88e'
        },
        {
            'snapshot_policy': u'dynamic',
            'identifier': u'transaction',
            'catalog_version_id': u'5fd1e86c589238a4e635e98e',
            'catalog_id': u'5fd1e86c589238a4e635e98d'
        }
    ]
}
- Return type
Secondary Dataset¶
- class datarobot.helpers.feature_discovery.SecondaryDataset(identifier, catalog_id, catalog_version_id, snapshot_policy='latest')¶
A secondary dataset to be used for feature discovery
New in version v2.25.
Examples
import datarobot as dr

dataset_definition = dr.SecondaryDataset(
    identifier='profile',
    catalog_id='5ec4aec1f072bc028e3471ae',
    catalog_version_id='5ec4aec2f072bc028e3471b1',
)
- Attributes
- identifier: string
Alias of the dataset (used directly as part of the generated feature names)
- catalog_id: string
Identifier of the catalog item
- catalog_version_id: string
Identifier of the catalog item version
- snapshot_policy: string, optional
Policy to use while creating a project or making predictions. If omitted, by default endpoint will use ‘latest’. Must be one of the following values: ‘specified’: Use specific snapshot specified by catalogVersionId ‘latest’: Use latest snapshot from the same catalog item ‘dynamic’: Get data from the source (only applicable for JDBC datasets)
Feature Effects¶
- class datarobot.models.FeatureEffects(project_id, model_id, source, feature_effects, data_slice_id=None, backtest_index=None)¶
Feature Effects provides partial dependence and predicted vs actual values for the top 500 features ordered by feature impact score.
The partial dependence shows marginal effect of a feature on the target variable after accounting for the average effects of all other predictive features. It indicates how, holding all other variables except the feature of interest as they were, the value of this feature affects your prediction.
Notes
featureEffects is a dict containing the following:
- feature_name (string) Name of the feature
- feature_type (string) dr.enums.FEATURE_TYPE, Feature type either numeric, categorical or datetime
- feature_impact_score (float) Feature impact score
- weight_label (string) optional, Weight label if configured for the project else null
- partial_dependence (List) Partial dependence results
- predicted_vs_actual (List) optional, Predicted versus actual results, may be omitted if there are insufficient qualified samples

partial_dependence is a dict containing the following:
- is_capped (bool) Indicates whether the data for computation is capped
- data (List) partial dependence results in the following format

data is a list of dicts containing the following:
- label (string) Contains label for categorical and numeric features as string
- dependence (float) Value of partial dependence

predicted_vs_actual is a dict containing the following:
- is_capped (bool) Indicates whether the data for computation is capped
- data (List) pred vs actual results in the following format

data is a list of dicts containing the following:
- label (string) Contains label for categorical features; for numeric features contains range or numeric value
- bin (List) optional, For numeric features contains labels for left and right bin limits
- predicted (float) Predicted value
- actual (float) Actual value. Actual value is null for unsupervised timeseries models
- row_count (int or float) Number of rows for the label and bin. Type is float if weight or exposure is set for the project.
- Attributes
- project_id: string
The project that contains requested model
- model_id: string
The model to retrieve Feature Effects for
- source: string
The source to retrieve Feature Effects for
- data_slice_id: string or None
The slice to retrieve Feature Effects for; if None, retrieve unsliced data
- feature_effects: list
Feature Effects for every feature
- backtest_index: string, required only for DatetimeModels,
The backtest index to retrieve Feature Effects for.
- classmethod from_server_data(data, *args, use_insights_format=False, **kwargs)¶
Instantiate an object of this class using the data directly from the server, meaning that the keys may have the wrong camel casing.
- Parameters
- datadict
The directly translated dict of JSON from the server. No casing fixes have taken place
- use_insights_formatbool, optional
Whether to repack the data from the format used in the GET /insights/featureEffects/ URL to the format used in the legacy URL.
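A hedged sketch of reading the per-feature results, assuming feature_effects_obj is an already retrieved FeatureEffects instance (dict keys as documented in the Notes above):
# feature_effects_obj is assumed to be a FeatureEffects instance retrieved elsewhere
for fe in feature_effects_obj.feature_effects:
    print(fe['feature_name'], fe['feature_type'], fe['feature_impact_score'])
    for point in fe['partial_dependence']['data']:
        print('  ', point['label'], point['dependence'])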
- class datarobot.models.FeatureEffectMetadata(status, sources)¶
Feature Effect Metadata for model, contains status and available model sources.
Notes
source is the expected parameter to retrieve Feature Effects. One of the provided sources shall be used.
- class datarobot.models.FeatureEffectMetadataDatetime(data)¶
Feature Effect Metadata for datetime model, contains list of feature effect metadata per backtest.
Notes
feature effect metadata per backtest contains:
- status: string
- backtest_index: string
- sources: list(string)

source is the expected parameter to retrieve Feature Effects. One of the provided sources shall be used.
backtest_index is the expected parameter to submit a compute request and retrieve Feature Effects. One of the provided backtest indexes shall be used.
- Attributes
- datalist[FeatureEffectMetadataDatetimePerBacktest]
List feature effect metadata per backtest
- class datarobot.models.FeatureEffectMetadataDatetimePerBacktest(ff_metadata_datetime_per_backtest)¶
Convert dictionary into feature effect metadata per backtest which contains backtest_index, status and sources.
Feature List¶
- class datarobot.DatasetFeaturelist(id=None, name=None, features=None, dataset_id=None, dataset_version_id=None, creation_date=None, created_by=None, user_created=None, description=None)¶
A set of features attached to a dataset in the AI Catalog
- Attributes
- idstr
the id of the dataset featurelist
- dataset_idstr
the id of the dataset the featurelist belongs to
- dataset_version_id: str, optional
the version id of the dataset this featurelist belongs to
- namestr
the name of the dataset featurelist
- featureslist of str
a list of the names of features included in this dataset featurelist
- creation_datedatetime.datetime
when the featurelist was created
- created_bystr
the user name of the user who created this featurelist
- user_createdbool
whether the featurelist was created by a user or by DataRobot automation
- descriptionstr, optional
the description of the featurelist. Only present on DataRobot-created featurelists.
- classmethod get(dataset_id, featurelist_id)¶
Retrieve a dataset featurelist
- Parameters
- dataset_idstr
the id of the dataset the featurelist belongs to
- featurelist_idstr
the id of the dataset featurelist to retrieve
- Returns
- featurelistDatasetFeatureList
the specified featurelist
- Return type
TypeVar
(TDatasetFeaturelist
, bound=DatasetFeaturelist
)
- delete()¶
Delete a dataset featurelist
Featurelists configured into the dataset as a default featurelist cannot be deleted.
- Return type
None
- update(name=None)¶
Update the name of an existing featurelist
Note that only user-created featurelists can be renamed, and that names must not conflict with names used by other featurelists.
- Parameters
- namestr, optional
the new name for the featurelist
- Return type
None
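A minimal usage sketch (the ids below are placeholders):
import datarobot as dr

dataset_featurelist = dr.DatasetFeaturelist.get('5ec4aec1f072bc028e3471ae', '5ec4aec2f072bc028e3471b1')
dataset_featurelist.update(name='renamed featurelist')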
- class datarobot.models.Featurelist(id=None, name=None, features=None, project_id=None, created=None, is_user_created=None, num_models=None, description=None)¶
A set of features used in modeling
- Attributes
- idstr
the id of the featurelist
- namestr
the name of the featurelist
- featureslist of str
the names of all the Features in the featurelist
- project_idstr
the project the featurelist belongs to
- createddatetime.datetime
(New in version v2.13) when the featurelist was created
- is_user_createdbool
(New in version v2.13) whether the featurelist was created by a user or by DataRobot automation
- num_modelsint
(New in version v2.13) the number of models currently using this featurelist. A model is considered to use a featurelist if it is used to train the model or as a monotonic constraint featurelist, or if the model is a blender with at least one component model using the featurelist.
- descriptionstr
(New in version v2.13) the description of the featurelist. Can be updated by the user and may be supplied by default for DataRobot-created featurelists.
- classmethod from_data(data)¶
Overrides the parent method to ensure description is always populated
- Parameters
- datadict
the data from the server, having gone through processing
- Return type
TypeVar
(TFeaturelist
, bound=Featurelist
)
- classmethod get(project_id, featurelist_id)¶
Retrieve a known feature list
- Parameters
- project_idstr
The id of the project the featurelist is associated with
- featurelist_idstr
The ID of the featurelist to retrieve
- Returns
- featurelistFeaturelist
The queried instance
- Raises
- ValueError
the passed project_id parameter value is of an unsupported type
- Return type
TypeVar
(TFeaturelist
, bound=Featurelist
)
- delete(dry_run=False, delete_dependencies=False)¶
Delete a featurelist, and any models and jobs using it
All models using a featurelist, whether as the training featurelist or as a monotonic constraint featurelist, will also be deleted when the deletion is executed and any queued or running jobs using it will be cancelled. Similarly, predictions made on these models will also be deleted. All the entities that are to be deleted with a featurelist are described as “dependencies” of it. To preview the results of deleting a featurelist, call delete with dry_run=True
When deleting a featurelist with dependencies, users must specify delete_dependencies=True to confirm they want to delete the featurelist and all its dependencies. Without that option, only featurelists with no dependencies may be successfully deleted and others will error.
Featurelists configured into the project as a default featurelist or as a default monotonic constraint featurelist cannot be deleted.
Featurelists used in a model deployment cannot be deleted until the model deployment is deleted.
- Parameters
- dry_runbool, optional
specify True to preview the result of deleting the featurelist, instead of actually deleting it.
- delete_dependenciesbool, optional
specify True to successfully delete featurelists with dependencies; if left False by default, featurelists without dependencies can be successfully deleted and those with dependencies will error upon attempting to delete them.
- Returns
- resultdict
- A dictionary describing the result of deleting the featurelist, with the following keys
dry_run : bool, whether the deletion was a dry run or an actual deletion
can_delete : bool, whether the featurelist can actually be deleted
deletion_blocked_reason : str, why the featurelist can’t be deleted (if it can’t)
num_affected_models : int, the number of models using this featurelist
num_affected_jobs : int, the number of jobs using this featurelist
- Return type
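A minimal sketch of previewing and then executing a deletion (the ids below are placeholders):
import datarobot as dr

featurelist = dr.models.Featurelist.get('5fd06afce2456ec1e9d20457', '5fd06b4af24c641b68e4d88f')
preview = featurelist.delete(dry_run=True)
if preview['can_delete']:
    featurelist.delete(delete_dependencies=True)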
- classmethod from_server_data(data, keep_attrs=None)¶
Instantiate an object of this class using the data directly from the server, meaning that the keys may have the wrong camel casing
- Parameters
- datadict
The directly translated dict of JSON from the server. No casing fixes have taken place
- keep_attrsiterable
List, set or tuple of the dotted namespace notations for attributes to keep within the object structure even if their values are None
- Return type
TypeVar
(T
, bound=APIObject
)
- update(name=None, description=None)¶
Update the name or description of an existing featurelist
Note that only user-created featurelists can be renamed, and that names must not conflict with names used by other featurelists.
- Parameters
- namestr, optional
the new name for the featurelist
- descriptionstr, optional
the new description for the featurelist
- Return type
None
- class datarobot.models.ModelingFeaturelist(id=None, name=None, features=None, project_id=None, created=None, is_user_created=None, num_models=None, description=None)¶
A set of features that can be used to build a model
In time series projects, a new set of modeling features is created after setting the partitioning options. These features are automatically derived from those in the project’s dataset and are the features used for modeling. Modeling features are only accessible once the target and partitioning options have been set. In projects that don’t use time series modeling, once the target has been set, ModelingFeaturelists and Featurelists will behave the same.
For more information about input and modeling features, see the time series documentation.
- Attributes
- idstr
the id of the modeling featurelist
- project_idstr
the id of the project the modeling featurelist belongs to
- namestr
the name of the modeling featurelist
- featureslist of str
a list of the names of features included in this modeling featurelist
- createddatetime.datetime
(New in version v2.13) when the featurelist was created
- is_user_createdbool
(New in version v2.13) whether the featurelist was created by a user or by DataRobot automation
- num_modelsint
(New in version v2.13) the number of models currently using this featurelist. A model is considered to use a featurelist if it is used to train the model or as a monotonic constraint featurelist, or if the model is a blender with at least one component model using the featurelist.
- descriptionstr
(New in version v2.13) the description of the featurelist. Can be updated by the user and may be supplied by default for DataRobot-created featurelists.
- classmethod get(project_id, featurelist_id)¶
Retrieve a modeling featurelist
Modeling featurelists can only be retrieved once the target and partitioning options have been set.
- Parameters
- project_idstr
the id of the project the modeling featurelist belongs to
- featurelist_idstr
the id of the modeling featurelist to retrieve
- Returns
- featurelistModelingFeaturelist
the specified featurelist
- Return type
TypeVar
(TModelingFeaturelist
, bound=ModelingFeaturelist
)
- update(name=None, description=None)¶
Update the name or description of an existing featurelist
Note that only user-created featurelists can be renamed, and that names must not conflict with names used by other featurelists.
- Parameters
- namestr, optional
the new name for the featurelist
- descriptionstr, optional
the new description for the featurelist
- Return type
None
- delete(dry_run=False, delete_dependencies=False)¶
Delete a featurelist, and any models and jobs using it
All models using a featurelist, whether as the training featurelist or as a monotonic constraint featurelist, will also be deleted when the deletion is executed and any queued or running jobs using it will be cancelled. Similarly, predictions made on these models will also be deleted. All the entities that are to be deleted with a featurelist are described as “dependencies” of it. To preview the results of deleting a featurelist, call delete with dry_run=True
When deleting a featurelist with dependencies, users must specify delete_dependencies=True to confirm they want to delete the featurelist and all its dependencies. Without that option, only featurelists with no dependencies may be successfully deleted and others will error.
Featurelists configured into the project as a default featurelist or as a default monotonic constraint featurelist cannot be deleted.
Featurelists used in a model deployment cannot be deleted until the model deployment is deleted.
- Parameters
- dry_runbool, optional
specify True to preview the result of deleting the featurelist, instead of actually deleting it.
- delete_dependenciesbool, optional
specify True to successfully delete featurelists with dependencies; if left False by default, featurelists without dependencies can be successfully deleted and those with dependencies will error upon attempting to delete them.
- Returns
- resultdict
- A dictionary describing the result of deleting the featurelist, with the following keys
dry_run : bool, whether the deletion was a dry run or an actual deletion
can_delete : bool, whether the featurelist can actually be deleted
deletion_blocked_reason : str, why the featurelist can’t be deleted (if it can’t)
num_affected_models : int, the number of models using this featurelist
num_affected_jobs : int, the number of jobs using this featurelist
- Return type
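A minimal usage sketch (the ids below are placeholders):
import datarobot as dr

modeling_featurelist = dr.models.ModelingFeaturelist.get('5fd06afce2456ec1e9d20457', '5fd06b4af24c641b68e4d88f')
modeling_featurelist.update(description='Derived time series features')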
- class datarobot.models.featurelist.DeleteFeatureListResult¶
A dict describing the result of deleting a featurelist, with the keys listed under delete() above.
Restoring Discarded Features¶
- class datarobot.models.restore_discarded_features.DiscardedFeaturesInfo(total_restore_limit, remaining_restore_limit, count, features)¶
An object containing information about time series features which were reduced during the time series feature generation process. These features can be restored back to the project. They will be included in All Time Series Features and can be used to create new feature lists.
New in version v2.27.
- Attributes
- total_restore_limitint
The total limit indicating how many features can be restored in this project.
- remaining_restore_limitint
The remaining available number of the features which can be restored in this project.
- featureslist of strings
Discarded features which can be restored.
- countint
Discarded features count.
- classmethod restore(project_id, features_to_restore, max_wait=600)¶
Restore features discarded during the time series feature generation process back to the project. After restoration, the features will be included in All Time Series Features.
New in version v2.27.
- Parameters
- project_id: string
- features_to_restore: list of strings
List of the feature names to restore
- max_wait: int, optional
max time to wait for features to be restored. Defaults to 10 min
- Returns
- status: FeatureRestorationStatus
information about features which were restored and which were not.
- Return type
- classmethod retrieve(project_id)¶
Retrieve the discarded features information for a given project.
New in version v2.27.
- Parameters
- project_id: string
- Returns
- info: DiscardedFeaturesInfo
Information about features which were discarded during the feature generation process and the limits on how many features can be restored.
- Return type
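A minimal sketch of checking the restore limits and restoring a couple of features (the project id is a placeholder):
from datarobot.models.restore_discarded_features import DiscardedFeaturesInfo

info = DiscardedFeaturesInfo.retrieve('5fd06afce2456ec1e9d20457')
if info.count and info.remaining_restore_limit >= 2:
    status = DiscardedFeaturesInfo.restore('5fd06afce2456ec1e9d20457', info.features[:2])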
- class datarobot.models.restore_discarded_features.FeatureRestorationStatus(warnings, features_to_restore)¶
Status of the feature restoration process.
New in version v2.27.
- Attributes
- warningslist of strings
Warnings generated for those features which failed to restore
- remaining_restore_limitint
The remaining available number of the features which can be restored in this project.
- restored_featureslist of strings
Features which were restored
Job¶
- class datarobot.models.Job(data, completed_resource_url=None)¶
Tracks asynchronous work being done within a project
- Attributes
- idint
the id of the job
- project_idstr
the id of the project the job belongs to
- statusstr
the status of the job - will be one of
datarobot.enums.QUEUE_STATUS
- job_typestr
what kind of work the job is doing - will be one of
datarobot.enums.JOB_TYPE
- is_blockedbool
if true, the job is blocked (cannot be executed) until its dependencies are resolved
- classmethod get(project_id, job_id)¶
Fetches one job.
- Parameters
- project_idstr
The identifier of the project in which the job resides
- job_idstr
The job id
- Returns
- jobJob
The job
- Raises
- AsyncFailureError
Querying this resource gave a status code other than 200 or 303
- Return type
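A minimal usage sketch (the ids below are placeholders):
import datarobot as dr

job = dr.models.Job.get(project_id='5fd06afce2456ec1e9d20457', job_id='42')
job.wait_for_completion(max_wait=600)
result = job.get_result()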
- cancel()¶
Cancel this job. If this job has not finished running, it will be removed and canceled.
- get_result(params=None)¶
- Parameters
- paramsdict or None
Query parameters to be added to request to get results.
For featureEffects, source param is required to define source, otherwise the default is `training`
- Returns
- resultobject
- Return type depends on the job type:
for model jobs, a Model is returned
for predict jobs, a pandas.DataFrame (with predictions) is returned
for featureImpact jobs, a list of dicts by default (see the with_metadata parameter of the FeatureImpactJob class and its get() method)
for primeRulesets jobs, a list of Rulesets
for primeModel jobs, a PrimeModel
for primeDownloadValidation jobs, a PrimeFile
for predictionExplanationInitialization jobs, a PredictionExplanationsInitialization
for predictionExplanations jobs, a PredictionExplanations
for featureEffects, a FeatureEffects
- Raises
- JobNotFinished
If the job is not finished, the result is not available.
- AsyncProcessUnsuccessfulError
If the job errored or was aborted
- get_result_when_complete(max_wait=600, params=None)¶
- Parameters
- max_waitint, optional
How long to wait for the job to finish.
- paramsdict, optional
Query parameters to be added to request.
- Returns
- result: object
Return type is the same as would be returned by Job.get_result.
- Raises
- AsyncTimeoutError
If the job does not finish in time
- AsyncProcessUnsuccessfulError
If the job errored or was aborted
- refresh()¶
Update this object with the latest job data from the server.
- wait_for_completion(max_wait=600)¶
Waits for job to complete.
- Parameters
- max_waitint, optional
How long to wait for the job to finish.
- Return type
None
- class datarobot.models.TrainingPredictionsJob(data, model_id, data_subset, **kwargs)¶
- classmethod get(project_id, job_id, model_id=None, data_subset=None)¶
Fetches one training predictions job.
The resulting TrainingPredictions object will be annotated with model_id and data_subset.
- Parameters
- project_idstr
The identifier of the project in which the job resides
- job_idstr
The job id
- model_idstr
The identifier of the model used for computing training predictions
- data_subsetdr.enums.DATA_SUBSET, optional
Data subset used for computing training predictions
- Returns
- jobTrainingPredictionsJob
The job
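A minimal usage sketch (the ids below are placeholders):
import datarobot as dr

tp_job = dr.models.TrainingPredictionsJob.get(
    project_id='5fd06afce2456ec1e9d20457',
    job_id='42',
    model_id='5fd06b4af24c641b68e4d88e',
)
training_predictions = tp_job.get_result_when_complete()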
- refresh()¶
Update this object with the latest job data from the server.
- cancel()¶
Cancel this job. If this job has not finished running, it will be removed and canceled.
- get_result(params=None)¶
- Parameters
- paramsdict or None
Query parameters to be added to request to get results.
For featureEffects, source param is required to define source, otherwise the default is `training`
- Returns
- resultobject
- Return type depends on the job type:
for model jobs, a Model is returned
for predict jobs, a pandas.DataFrame (with predictions) is returned
for featureImpact jobs, a list of dicts by default (see the with_metadata parameter of the FeatureImpactJob class and its get() method)
for primeRulesets jobs, a list of Rulesets
for primeModel jobs, a PrimeModel
for primeDownloadValidation jobs, a PrimeFile
for predictionExplanationInitialization jobs, a PredictionExplanationsInitialization
for predictionExplanations jobs, a PredictionExplanations
for featureEffects, a FeatureEffects
- Raises
- JobNotFinished
If the job is not finished, the result is not available.
- AsyncProcessUnsuccessfulError
If the job errored or was aborted
- get_result_when_complete(max_wait=600, params=None)¶
- Parameters
- max_waitint, optional
How long to wait for the job to finish.
- paramsdict, optional
Query parameters to be added to request.
- Returns
- result: object
Return type is the same as would be returned by Job.get_result.
- Raises
- AsyncTimeoutError
If the job does not finish in time
- AsyncProcessUnsuccessfulError
If the job errored or was aborted
- wait_for_completion(max_wait=600)¶
Waits for job to complete.
- Parameters
- max_waitint, optional
How long to wait for the job to finish.
- Return type
None
- class datarobot.models.ShapMatrixJob(data, model_id=None, dataset_id=None, **kwargs)¶
- classmethod get(project_id, job_id, model_id=None, dataset_id=None)¶
Fetches one SHAP matrix job.
- Parameters
- project_idstr
The identifier of the project in which the job resides
- job_idstr
The job identifier
- model_idstr
The identifier of the model used for computing prediction explanations
- dataset_idstr
The identifier of the dataset against which prediction explanations should be computed
- Returns
- jobShapMatrixJob
The job
- Raises
- AsyncFailureError
Querying this resource gave a status code other than 200 or 303
- Return type
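A minimal usage sketch (the ids below are placeholders):
import datarobot as dr

shap_job = dr.models.ShapMatrixJob.get(
    project_id='5fd06afce2456ec1e9d20457',
    job_id='42',
    model_id='5fd06b4af24c641b68e4d88e',
    dataset_id='5ec4aec1f072bc028e3471ae',
)
shap_matrix = shap_job.get_result_when_complete()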
- refresh()¶
Update this object with the latest job data from the server.
- Return type
None
- cancel()¶
Cancel this job. If this job has not finished running, it will be removed and canceled.
- get_result(params=None)¶
- Parameters
- paramsdict or None
Query parameters to be added to request to get results.
For featureEffects, source param is required to define source, otherwise the default is `training`
- Returns
- resultobject
- Return type depends on the job type:
for model jobs, a Model is returned
for predict jobs, a pandas.DataFrame (with predictions) is returned
for featureImpact jobs, a list of dicts by default (see the with_metadata parameter of the FeatureImpactJob class and its get() method)
for primeRulesets jobs, a list of Rulesets
for primeModel jobs, a PrimeModel
for primeDownloadValidation jobs, a PrimeFile
for predictionExplanationInitialization jobs, a PredictionExplanationsInitialization
for predictionExplanations jobs, a PredictionExplanations
for featureEffects, a FeatureEffects
- Raises
- JobNotFinished
If the job is not finished, the result is not available.
- AsyncProcessUnsuccessfulError
If the job errored or was aborted
- get_result_when_complete(max_wait=600, params=None)¶
- Parameters
- max_waitint, optional
How long to wait for the job to finish.
- paramsdict, optional
Query parameters to be added to request.
- Returns
- result: object
Return type is the same as would be returned by Job.get_result.
- Raises
- AsyncTimeoutError
If the job does not finish in time
- AsyncProcessUnsuccessfulError
If the job errored or was aborted
- wait_for_completion(max_wait=600)¶
Waits for job to complete.
- Parameters
- max_waitint, optional
How long to wait for the job to finish.
- Return type
None
- class datarobot.models.FeatureImpactJob(data, completed_resource_url=None, with_metadata=False)¶
Custom Feature Impact job to handle different return value structures.
The original implementation had just the data, and the new one also includes some metadata.
In general, we aim to keep the number of Job classes low by just utilizing the job_type attribute to control any specific formatting; however in this case when we needed to support a new representation with the _same_ job_type, customizing the behavior of _make_result_from_location allowed us to achieve our ends without complicating the _make_result_from_json method.
- classmethod get(project_id, job_id, with_metadata=False)¶
Fetches one job.
- Parameters
- project_idstr
The identifier of the project in which the job resides
- job_idstr
The job id
- with_metadatabool
To make this job return the metadata (i.e. the full object of the completed resource) set the with_metadata flag to True.
- Returns
- jobJob
The job
- Raises
- AsyncFailureError
Querying this resource gave a status code other than 200 or 303
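A minimal usage sketch (the ids below are placeholders):
import datarobot as dr

fi_job = dr.models.FeatureImpactJob.get('5fd06afce2456ec1e9d20457', '42', with_metadata=True)
feature_impact = fi_job.get_result_when_complete(max_wait=600)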
- cancel()¶
Cancel this job. If this job has not finished running, it will be removed and canceled.
- get_result(params=None)¶
- Parameters
- paramsdict or None
Query parameters to be added to request to get results.
For featureEffects, source param is required to define source, otherwise the default is `training`
- Returns
- resultobject
- Return type depends on the job type:
for model jobs, a Model is returned
for predict jobs, a pandas.DataFrame (with predictions) is returned
for featureImpact jobs, a list of dicts by default (see the with_metadata parameter of the FeatureImpactJob class and its get() method)
for primeRulesets jobs, a list of Rulesets
for primeModel jobs, a PrimeModel
for primeDownloadValidation jobs, a PrimeFile
for predictionExplanationInitialization jobs, a PredictionExplanationsInitialization
for predictionExplanations jobs, a PredictionExplanations
for featureEffects, a FeatureEffects
- Raises
- JobNotFinished
If the job is not finished, the result is not available.
- AsyncProcessUnsuccessfulError
If the job errored or was aborted
- get_result_when_complete(max_wait=600, params=None)¶
- Parameters
- max_waitint, optional
How long to wait for the job to finish.
- paramsdict, optional
Query parameters to be added to request.
- Returns
- result: object
Return type is the same as would be returned by Job.get_result.
- Raises
- AsyncTimeoutError
If the job does not finish in time
- AsyncProcessUnsuccessfulError
If the job errored or was aborted
- refresh()¶
Update this object with the latest job data from the server.
- wait_for_completion(max_wait=600)¶
Waits for job to complete.
- Parameters
- max_waitint, optional
How long to wait for the job to finish.
- Return type
None
Lift Chart¶
- class datarobot.models.lift_chart.LiftChart(source, bins, source_model_id, target_class, data_slice_id=None)¶
Lift chart data for model.
Notes
LiftChartBin is a dict containing the following:
- actual (float) Sum of actual target values in bin
- predicted (float) Sum of predicted target values in bin
- bin_weight (float) The weight of the bin. For weighted projects, it is the sum of the weights of the rows in the bin. For unweighted projects, it is the number of rows in the bin.
- Attributes
- sourcestr
Lift chart data source. Can be ‘validation’, ‘crossValidation’ or ‘holdout’.
- binslist of dict
List of dicts with schema described as LiftChartBin above.
- source_model_idstr
ID of the model this lift chart represents; in some cases, insights from the parent of a frozen model may be used
- target_classstr, optional
For multiclass lift - target class for this lift chart data.
- data_slice_id: string or None
The slice to retrieve Lift Chart for; if None, retrieve unsliced data.
- classmethod from_server_data(data, keep_attrs=None, use_insights_format=False, **kwargs)¶
Override APIObject.from_server_data to handle lift chart data retrieved from either the legacy URL or the new /insights/ URL.
- Parameters
- datadict
The directly translated dict of JSON from the server. No casing fixes have taken place
- use_insights_formatbool, optional
Whether to repack the data from the format used in the GET /insights/liftChart/ URL to the format used in the legacy URL.
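A hedged sketch of reading the bins, assuming lift_chart is an already retrieved LiftChart instance:
# lift_chart is assumed to be a LiftChart instance retrieved elsewhere
total_actual = sum(b['actual'] for b in lift_chart.bins)
total_predicted = sum(b['predicted'] for b in lift_chart.bins)
print(lift_chart.source, total_actual, total_predicted)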
Missing Values Report¶
- class datarobot.models.missing_report.MissingValuesReport(missing_values_report)¶
Missing values report for model, contains list of reports per feature sorted by missing count in descending order.
Notes
Report per feature contains:
- feature: feature name
- type: feature type – ‘Numeric’ or ‘Categorical’
- missing_count: missing values count in training data
- missing_percentage: missing values percentage in training data
- tasks: list of information for each task which was applied to the feature

task information contains:
- id: the number of the task in the blueprint diagram
- name: task name
- descriptions: human readable aggregated information about how the task handles missing values. The following descriptions may be present: what value is imputed for missing values, whether the feature being missing is treated as a feature by the task, whether missing values are treated as infrequent values, whether infrequent values are treated as missing values, and whether missing values are ignored.
- classmethod get(project_id, model_id)¶
Retrieve a missing report.
- Parameters
- project_idstr
The project’s id.
- model_idstr
The model’s id.
- Returns
- MissingValuesReport
The queried missing report.
- Return type
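A minimal usage sketch (the ids below are placeholders):
from datarobot.models.missing_report import MissingValuesReport

report = MissingValuesReport.get(
    project_id='5fd06afce2456ec1e9d20457', model_id='5fd06b4af24c641b68e4d88e'
)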
Models¶
GenericModel¶
- class datarobot.models.GenericModel(id=None, processes=None, featurelist_name=None, featurelist_id=None, project_id=None, sample_pct=None, training_row_count=None, training_duration=None, training_start_date=None, training_end_date=None, model_type=None, model_category=None, is_frozen=None, blueprint_id=None, metrics=None, is_starred=None, model_family=None, model_number=None, parent_model_id=None, data_selection_method=None, time_window_sample_pct=None, sampling_method=None, is_trained_into_validation=None, is_trained_into_holdout=None, number_of_clusters=None)¶
GenericModel [ModelRecord] is the object returned from the /modelRecords list route. It contains the most generic model information.
Model¶
- class datarobot.models.Model(id=None, processes=None, featurelist_name=None, featurelist_id=None, project_id=None, sample_pct=None, model_type=None, model_category=None, is_frozen=None, is_n_clusters_dynamically_determined=None, blueprint_id=None, metrics=None, monotonic_increasing_featurelist_id=None, monotonic_decreasing_featurelist_id=None, n_clusters=None, has_empty_clusters=None, supports_monotonic_constraints=None, is_starred=None, prediction_threshold=None, prediction_threshold_read_only=None, model_number=None, parent_model_id=None, supports_composable_ml=None, training_row_count=None, training_duration=None, training_start_date=None, training_end_date=None, data_selection_method=None, time_window_sample_pct=None, sampling_method=None, model_family_full_name=None, is_trained_into_validation=None, is_trained_into_holdout=None)¶
A model trained on a project’s dataset capable of making predictions.
All durations are specified with a duration string such as those returned by the partitioning_methods.construct_duration_string helper method. See the datetime partitioned project documentation for more information on duration strings.
- Attributes
- idstr
ID of the model.
- project_idstr
ID of the project the model belongs to.
- processeslist of str
Processes used by the model.
- featurelist_namestr
Name of the featurelist used by the model.
- featurelist_idstr
ID of the featurelist used by the model.
- sample_pctfloat or None
Percentage of the project dataset used in model training. If the project uses datetime partitioning, the sample_pct will be None. See training_row_count, training_duration, and training_start_date / training_end_date instead.
- training_row_countint or None
Number of rows of the project dataset used in model training. In a datetime partitioned project, if specified, defines the number of rows used to train the model and evaluate backtest scores; if unspecified, either training_duration or training_start_date and training_end_date is used instead.
- training_durationstr or None
For datetime partitioned projects only. If specified, defines the duration spanned by the data used to train the model and evaluate backtest scores.
- training_start_datedatetime or None
For frozen models in datetime partitioned projects only. If specified, the start date of the data used to train the model.
- training_end_datedatetime or None
For frozen models in datetime partitioned projects only. If specified, the end date of the data used to train the model.
- model_typestr
Type of model, for example ‘Nystroem Kernel SVM Regressor’.
- model_categorystr
Category of model, for example ‘prime’ for DataRobot Prime models, ‘blend’ for blender models, and ‘model’ for other models.
- is_frozenbool
Whether this model is a frozen model.
- is_n_clusters_dynamically_determinedbool
(New in version v2.27) Optional. Whether this model determines the number of clusters dynamically.
- blueprint_idstr
ID of the blueprint used to build this model.
- metricsdict
Mapping from each metric to the model’s score for that metric.
- monotonic_increasing_featurelist_idstr
Optional. ID of the featurelist that defines the set of features with a monotonically increasing relationship to the target. If None, no such constraints are enforced.
- monotonic_decreasing_featurelist_idstr
Optional. ID of the featurelist that defines the set of features with a monotonically decreasing relationship to the target. If None, no such constraints are enforced.
- n_clustersint
(New in version v2.27) Optional. Number of data clusters discovered by model.
- has_empty_clusters: bool
(New in version v2.27) Optional. Whether clustering model produces empty clusters.
- supports_monotonic_constraintsbool
Optional. Whether this model supports enforcing monotonic constraints.
- is_starredbool
Whether this model is marked as a starred model.
- prediction_thresholdfloat
Binary classification projects only. Threshold used for predictions.
- prediction_threshold_read_onlybool
Whether modification of the prediction threshold is forbidden. Threshold modification is forbidden once a model has had a deployment created or predictions made via the dedicated prediction API.
- model_numberinteger
Model number assigned to the model.
- parent_model_idstr or None
(New in version v2.20) ID of the model that tuning parameters are derived from.
- supports_composable_mlbool or None
(New in version v2.26) Whether this model is supported in Composable ML.
- classmethod get(project, model_id)¶
Retrieve a specific model.
- Parameters
- projectstr
Project ID.
- model_idstr
ID of the model to retrieve.
- Returns
- modelModel
Queried instance.
- Raises
- ValueError
the passed project parameter value is of an unsupported type
- Return type
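A minimal usage sketch (the ids below are placeholders):
import datarobot as dr

model = dr.models.Model.get(project='5fd06afce2456ec1e9d20457', model_id='5fd06b4af24c641b68e4d88e')
print(model.model_type, model.metrics)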
- advanced_tune(params, description=None)¶
Generate a new model with the specified advanced-tuning parameters
As of v2.17, all models other than blenders, open source, prime, baseline and user-created support Advanced Tuning.
- Parameters
- paramsdict
Mapping of parameter ID to parameter value. The list of valid parameter IDs for a model can be found by calling get_advanced_tuning_parameters(). This endpoint does not need to include values for all parameters. If a parameter is omitted, its current_value will be used.
- descriptionstr
Human-readable string describing the newly advanced-tuned model
- Returns
- ModelJob
The created job to build the model
- Return type
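A sketch of the advanced-tuning flow, assuming a hypothetical parameter named 'colsample_bytree' exists for this blueprint (look up real names with get_advanced_tuning_parameters()):
import datarobot as dr

model = dr.Model.get('project-id', 'model-id')
tuning = model.get_advanced_tuning_parameters()

# Map the human-readable parameter name to its opaque parameter_id.
param_id = next(
    p['parameter_id']
    for p in tuning['tuning_parameters']
    if p['parameter_name'] == 'colsample_bytree'  # hypothetical parameter name
)

# Omitted parameters keep their current_value.
model_job = model.advanced_tune({param_id: 0.8}, description='lower column sampling')
tuned_model = model_job.get_result_when_complete()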
- cross_validate()¶
Run cross validation on the model.
Note
To perform Cross Validation on a new model with new parameters, use train instead.
- Returns
- ModelJob
The created job to build the model
- delete()¶
Delete a model from the project’s leaderboard.
- Return type
None
- download_scoring_code(file_name, source_code=False)¶
Download the Scoring Code JAR.
- Parameters
- file_namestr
File path where scoring code will be saved.
- source_codebool, optional
Set to True to download source code archive. It will not be executable.
- Return type
None
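For example (the file names are placeholders):
import datarobot as dr

model = dr.Model.get('project-id', 'model-id')

# Executable Scoring Code JAR for making predictions outside of DataRobot.
model.download_scoring_code('scoring_code.jar')

# Source code archive for review; this JAR is not executable.
model.download_scoring_code('scoring_code_source.jar', source_code=True)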
- download_training_artifact(file_name)¶
Retrieve trained artifact(s) from a model containing one or more custom tasks.
Artifact(s) will be downloaded to the specified local filepath.
- Parameters
- file_namestr
File path where trained model artifact(s) will be saved.
- classmethod from_data(data)¶
Instantiate an object of this class using a dict.
- Parameters
- datadict
Correctly snake_cased keys and their values.
- Return type
TypeVar
(T
, bound=APIObject
)
- get_advanced_tuning_parameters()¶
Get the advanced-tuning parameters available for this model.
As of v2.17, all models other than blenders, open source, prime, baseline and user-created support Advanced Tuning.
- Returns
- dict
A dictionary describing the advanced-tuning parameters for the current model. There are two top-level keys, tuning_description and tuning_parameters.
tuning_description is an optional value. If not None, it indicates the user-specified description of this set of tuning parameters.
tuning_parameters is a list of dicts, each with the following keys:
parameter_name : (str) name of the parameter (unique per task, see below)
parameter_id : (str) opaque ID string uniquely identifying parameter
default_value : (*) the actual value used to train the model; either the single value of the parameter specified before training, or the best value from the list of grid-searched values (based on current_value)
current_value : (*) the single value or list of values of the parameter that were grid searched. Depending on the grid search specification, could be a single fixed value (no grid search), a list of discrete values, or a range.
task_name : (str) name of the task that this parameter belongs to
constraints: (dict) see the notes below
vertex_id: (str) ID of vertex that this parameter belongs to
Notes
The type of default_value and current_value is defined by the constraints structure. It will be a string or numeric Python type.
constraints is a dict with at least one, possibly more, of the following keys. The presence of a key indicates that the parameter may take on the specified type. (If a key is absent, this means that the parameter may not take on the specified type.) If a key on constraints is present, its value will be a dict containing all of the fields described below for that key.
"constraints": { "select": { "values": [<list(basestring or number) : possible values>] }, "ascii": {}, "unicode": {}, "int": { "min": <int : minimum valid value>, "max": <int : maximum valid value>, "supports_grid_search": <bool : True if Grid Search may be requested for this param> }, "float": { "min": <float : minimum valid value>, "max": <float : maximum valid value>, "supports_grid_search": <bool : True if Grid Search may be requested for this param> }, "intList": { "min_length": <int : minimum valid length>, "max_length": <int : maximum valid length> "min_val": <int : minimum valid value>, "max_val": <int : maximum valid value> "supports_grid_search": <bool : True if Grid Search may be requested for this param> }, "floatList": { "min_length": <int : minimum valid length>, "max_length": <int : maximum valid length> "min_val": <float : minimum valid value>, "max_val": <float : maximum valid value> "supports_grid_search": <bool : True if Grid Search may be requested for this param> } }
The keys have meaning as follows:
select: Rather than specifying a specific data type, if present, it indicates that the parameter is permitted to take on any of the specified values. Listed values may be of any string or real (non-complex) numeric type.
ascii: The parameter may be a unicode object that encodes simple ASCII characters. (A-Z, a-z, 0-9, whitespace, and certain common symbols.) In addition to listed constraints, ASCII keys currently may not contain either newlines or semicolons.
unicode: The parameter may be any Python unicode object.
int: The value may be an object of type int within the specified range (inclusive). Please note that the value will be passed around using the JSON format, and some JSON parsers have undefined behavior with integers outside of the range [-(2**53)+1, (2**53)-1].
float: The value may be an object of type float within the specified range (inclusive).
intList, floatList: The value may be a list of int or float objects, respectively, following constraints as specified respectively by the int and float types (above).
Many parameters only specify one key under constraints. If a parameter specifies multiple keys, the parameter may take on any value permitted by any key.
- Return type
- get_all_confusion_charts(fallback_to_parent_insights=False)¶
Retrieve a list of all confusion matrices available for the model.
- Parameters
- fallback_to_parent_insightsbool
(New in version v2.14) Optional, if True, this will return confusion chart data for this model’s parent for any source that is not available for this model and if this has a defined parent model. If omitted or False, or this model has no parent, this will not attempt to retrieve any data from this model’s parent.
- Returns
- list of ConfusionChart
Data for all available confusion charts for model.
- get_all_feature_impacts(data_slice_filter=None)¶
Retrieve a list of all feature impact results available for the model.
- Parameters
- data_slice_filterDataSlice, optional
A dataslice used to filter the return values based on the dataslice.id. By default, this function will use data_slice_filter.id == None, which returns an unsliced insight. If data_slice_filter is None, then no data_slice filtering will be applied when requesting the feature impacts.
- Returns
- list of dicts
Data for all available model feature impacts, or an empty list if no data is found.
Examples
model = datarobot.Model(id='model-id', project_id='project-id')

# Get feature impact insights for sliced data
data_slice = datarobot.DataSlice(id='data-slice-id')
sliced_fi = model.get_all_feature_impacts(data_slice_filter=data_slice)

# Get feature impact insights for unsliced data
data_slice = datarobot.DataSlice()
unsliced_fi = model.get_all_feature_impacts(data_slice_filter=data_slice)

# Get all feature impact insights
all_fi = model.get_all_feature_impacts()
- get_all_lift_charts(fallback_to_parent_insights=False, data_slice_filter=None)¶
Retrieve a list of all Lift charts available for the model.
- Parameters
- fallback_to_parent_insightsbool, optional
(New in version v2.14) Optional, if True, this will return lift chart data for this model’s parent for any source that is not available for this model and if this model has a defined parent model. If omitted or False, or this model has no parent, this will not attempt to retrieve any data from this model’s parent.
- data_slice_filterDataSlice, optional
Filters the returned lift chart by data_slice_filter.id. If None (the default) applies no filter based on data_slice_id.
- Returns
- list of LiftChart
Data for all available model lift charts, or an empty list if no data is found.
Examples
model = datarobot.Model.get('project-id', 'model-id')

# Get lift chart insights for sliced data
sliced_lift_charts = model.get_all_lift_charts(data_slice_filter=datarobot.DataSlice(id='data-slice-id'))

# Get lift chart insights for unsliced data
unsliced_lift_charts = model.get_all_lift_charts(data_slice_filter=datarobot.DataSlice())

# Get all lift chart insights
all_lift_charts = model.get_all_lift_charts()
- get_all_multiclass_lift_charts(fallback_to_parent_insights=False)¶
Retrieve a list of all Lift charts available for the model.
- Parameters
- fallback_to_parent_insightsbool
(New in version v2.14) Optional, if True, this will return lift chart data for this model’s parent for any source that is not available for this model and if this model has a defined parent model. If omitted or False, or this model has no parent, this will not attempt to retrieve any data from this model’s parent.
- Returns
- list of LiftChart
Data for all available model lift charts.
- get_all_residuals_charts(fallback_to_parent_insights=False, data_slice_filter=None)¶
Retrieve a list of all residuals charts available for the model.
- Parameters
- fallback_to_parent_insightsbool
Optional, if True, this will return residuals chart data for this model’s parent for any source that is not available for this model and if this model has a defined parent model. If omitted or False, or this model has no parent, this will not attempt to retrieve any data from this model’s parent.
- data_slice_filterDataSlice, optional
Filters the returned residuals charts by data_slice_filter.id. If None (the default) applies no filter based on data_slice_id.
- Returns
- list of ResidualsChart
Data for all available model residuals charts.
Examples
model = datarobot.Model.get('project-id', 'model-id')

# Get residuals chart insights for sliced data
sliced_residuals_charts = model.get_all_residuals_charts(data_slice_filter=datarobot.DataSlice(id='data-slice-id'))

# Get residuals chart insights for unsliced data
unsliced_residuals_charts = model.get_all_residuals_charts(data_slice_filter=datarobot.DataSlice())

# Get all residuals chart insights
all_residuals_charts = model.get_all_residuals_charts()
- get_all_roc_curves(fallback_to_parent_insights=False, data_slice_filter=None)¶
Retrieve a list of all ROC curves available for the model.
- Parameters
- fallback_to_parent_insightsbool
(New in version v2.14) Optional, if True, this will return ROC curve data for this model’s parent for any source that is not available for this model and if this model has a defined parent model. If omitted or False, or this model has no parent, this will not attempt to retrieve any data from this model’s parent.
- data_slice_filterDataSlice, optional
Filters the returned roc_curve by data_slice_filter.id. If None (the default) applies no filter based on data_slice_id.
- Returns
- list of RocCurve
Data for all available model ROC curves. Or an empty list if no RocCurves are found.
Examples
model = datarobot.Model.get('project-id', 'model-id')
ds_filter = DataSlice(id='data-slice-id')

# Get roc curve insights for sliced data
sliced_roc = model.get_all_roc_curves(data_slice_filter=ds_filter)

# Get roc curve insights for unsliced data
unsliced_filter = DataSlice(id=None)
unsliced_roc = model.get_all_roc_curves(data_slice_filter=unsliced_filter)

# Get all roc curve insights
all_roc_curves = model.get_all_roc_curves()
- get_confusion_chart(source, fallback_to_parent_insights=False)¶
Retrieve a multiclass model’s confusion matrix for the specified source.
- Parameters
- sourcestr
Confusion chart source. Check datarobot.enums.CHART_DATA_SOURCE for possible values.
- fallback_to_parent_insightsbool
(New in version v2.14) Optional, if True, this will return confusion chart data for this model’s parent if the confusion chart is not available for this model and this model has a defined parent model. If omitted or False, or there is no parent model, this will not attempt to return insight data from this model’s parent.
- Returns
- ConfusionChart
Model ConfusionChart data
- Raises
- ClientError
If the insight is not available for this model
- get_cross_class_accuracy_scores()¶
Retrieves a list of Cross Class Accuracy scores for the model.
- Returns
- json
- get_cross_validation_scores(partition=None, metric=None)¶
Return a dictionary, keyed by metric, showing cross validation scores per partition.
Cross Validation should already have been performed using cross_validate or train.
Note
Models that computed cross validation before this feature was added will need to be deleted and retrained before this method can be used.
- Parameters
- partitionfloat
Optional. The id of the partition to filter results by (1, 2, 3.0, 4.0, etc.); may be a positive whole-number integer or float value. 0 corresponds to the validation partition.
- metric: unicode
Optional. Name of the metric to filter the resulting cross validation scores by.
- Returns
- cross_validation_scores: dict
A dictionary keyed by metric showing cross validation scores per partition.
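A minimal sketch combining the two calls (the metric name is an assumption; use one of your project's metrics):
import datarobot as dr

model = dr.Model.get('project-id', 'model-id')

# Queue cross validation and block until the job finishes.
cv_job = model.cross_validate()
cv_job.wait_for_completion()

# Scores per partition, filtered to a single metric.
scores = model.get_cross_validation_scores(metric='RMSE')
print(scores)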
- get_data_disparity_insights(feature, class_name1, class_name2)¶
Retrieve a list of Cross Class Data Disparity insights for the model.
- Parameters
- featurestr
Bias and Fairness protected feature name.
- class_name1str
One of the compared classes
- class_name2str
Another compared class
- Returns
- json
- get_fairness_insights(fairness_metrics_set=None, offset=0, limit=100)¶
Retrieve a list of Per Class Bias insights for the model.
- Parameters
- fairness_metrics_setstr, optional
Can be one of <datarobot.enums.FairnessMetricsSet>. The fairness metric used to calculate the fairness scores.
- offsetint, optional
Number of items to skip.
- limitint, optional
Number of items to return.
- Returns
- json
- get_feature_effect(source, data_slice_id=None)¶
Retrieve Feature Effects for the model.
Feature Effects provides partial dependence and predicted vs actual values for the top 500 features, ordered by feature impact score.
The partial dependence shows marginal effect of a feature on the target variable after accounting for the average effects of all other predictive features. It indicates how, holding all other variables except the feature of interest as they were, the value of this feature affects your prediction.
Requires that Feature Effects has already been computed with request_feature_effect.
See get_feature_effect_metadata for retrieving information on the available sources.
- Parameters
- sourcestring
The source Feature Effects are retrieved for.
- data_slice_idstring, optional
ID for the data slice used in the request. If None, retrieve unsliced insight data.
- Returns
- feature_effectsFeatureEffects
The feature effects data.
- Raises
- ClientError (404)
If the feature effects have not been computed or the source is not a valid value.
- get_feature_effect_metadata()¶
Retrieve Feature Effects metadata. Response contains status and available model sources.
Feature Effect for the training partition is always available, with the exception of older projects that only supported Feature Effect for validation.
When a model is trained into validation or holdout without stacked predictions (i.e., no out-of-sample predictions in those partitions), Feature Effects is not available for validation or holdout.
Feature Effects for holdout is not available when holdout was not unlocked for the project.
Use source to retrieve Feature Effects, selecting one of the provided sources.
- Returns
- feature_effect_metadata: FeatureEffectMetadata
- get_feature_effects_multiclass(source='training', class_=None)¶
Retrieve Feature Effects for the multiclass model.
Feature Effects provides partial dependence and predicted vs actual values for the top 500 features, ordered by feature impact score.
The partial dependence shows marginal effect of a feature on the target variable after accounting for the average effects of all other predictive features. It indicates how, holding all other variables except the feature of interest as they were, the value of this feature affects your prediction.
Requires that Feature Effects has already been computed with request_feature_effect.
See get_feature_effect_metadata for retrieving information on the available sources.
- Parameters
- sourcestr
The source Feature Effects are retrieved for.
- class_str or None
The class name Feature Effects are retrieved for.
- Returns
- list
The list of multiclass feature effects.
- Raises
- ClientError (404)
If Feature Effects have not been computed or the source is not a valid value.
- get_feature_impact(with_metadata=False, data_slice_filter=<datarobot.models.model.Sentinel object>)¶
Retrieve the computed Feature Impact results, a measure of the relevance of each feature in the model.
Feature Impact is computed for each column by creating new data with that column randomly permuted (but the others left unchanged), and seeing how the error metric score for the predictions is affected. The ‘impactUnnormalized’ is how much worse the error metric score is when making predictions on this modified data. The ‘impactNormalized’ is normalized so that the largest value is 1. In both cases, larger values indicate more important features.
If a feature is a redundant feature, i.e. once other features are considered it doesn’t contribute much in addition, the ‘redundantWith’ value is the name of feature that has the highest correlation with this feature. Note that redundancy detection is only available for jobs run after the addition of this feature. When retrieving data that predates this functionality, a NoRedundancyImpactAvailable warning will be used.
Elsewhere this technique is sometimes called ‘Permutation Importance’.
Requires that Feature Impact has already been computed with request_feature_impact.
- Parameters
- with_metadatabool
The flag indicating if the result should include the metadata as well.
- data_slice_filterDataSlice, optional
A dataslice used to filter the return values based on the dataslice.id. By default, this function will use data_slice_filter.id == None which returns an unsliced insight. If data_slice_filter is None then get_feature_impact will raise a ValueError.
- Returns
- list or dict
The feature impact data response depends on the with_metadata parameter. The response is either a dict with metadata and a list with actual data or just a list with that data.
Each list item is a dict with the keys featureName, impactNormalized, impactUnnormalized, redundantWith and count.
For the dict response, the available keys are:
featureImpacts - Feature Impact data as a dictionary. Each item is a dict with the keys featureName, impactNormalized, impactUnnormalized and redundantWith.
shapBased - A boolean that indicates whether Feature Impact was calculated using Shapley values.
ranRedundancyDetection - A boolean that indicates whether redundant feature identification was run while calculating this Feature Impact.
rowCount - An integer or None that indicates the number of rows that was used to calculate Feature Impact. For Feature Impact calculated with the default logic, without specifying the rowCount, None is returned here.
count - An integer with the number of features under featureImpacts.
- Raises
- ClientError (404)
If the feature impacts have not been computed.
- ValueError
If data_slice_filter is passed as None.
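A sketch of the request-then-retrieve flow (IDs are placeholders; the keys follow the schema described above):
import datarobot as dr

model = dr.Model.get('project-id', 'model-id')

# Compute Feature Impact if needed, wait for the job, and return the results.
feature_impacts = model.get_or_request_feature_impact(max_wait=600)

for item in feature_impacts:
    print(item['featureName'], item['impactNormalized'], item['redundantWith'])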
- get_features_used()¶
Query the server to determine which features were used.
Note that the data returned by this method is possibly different from the names of the features in the featurelist used by this model. This method will return the raw features that must be supplied in order for predictions to be generated on a new set of data. The featurelist, in contrast, would also include the names of derived features.
- Returns
- featureslist of str
The names of the features used in the model.
- Return type
List
[str
]
- get_frozen_child_models()¶
Retrieve the IDs for all models that are frozen from this model.
- Returns
- A list of Models
- get_labelwise_roc_curves(source, fallback_to_parent_insights=False)¶
Retrieve a list of LabelwiseRocCurve instances for a multilabel model for the given source and all labels. This method is valid only for multilabel projects. For binary projects, use the Model.get_roc_curve API.
New in version v2.24.
- Parameters
- sourcestr
ROC curve data source. Check datarobot.enums.CHART_DATA_SOURCE for possible values.
- fallback_to_parent_insightsbool
Optional, if True, this will return ROC curve data for this model’s parent if the ROC curve is not available for this model and the model has a defined parent model. If omitted or False, or there is no parent model, will not attempt to return data from this model’s parent.
- Returns
- list of LabelwiseRocCurve
Labelwise ROC Curve instances for source and all labels
- Raises
- ClientError
If the insight is not available for this model
- get_lift_chart(source, fallback_to_parent_insights=False, data_slice_filter=<datarobot.models.model.Sentinel object>)¶
Retrieve the model Lift chart for the specified source.
- Parameters
- sourcestr
Lift chart data source. Check datarobot.enums.CHART_DATA_SOURCE for possible values. (New in version v2.23) For time series and OTV models, also accepts values backtest_2, backtest_3, …, up to the number of backtests in the model.
- fallback_to_parent_insightsbool
(New in version v2.14) Optional, if True, this will return lift chart data for this model’s parent if the lift chart is not available for this model and the model has a defined parent model. If omitted or False, or there is no parent model, will not attempt to return insight data from this model’s parent.
- data_slice_filterDataSlice, optional
A dataslice used to filter the return values based on the dataslice.id. By default this function will use data_slice_filter.id == None which returns an unsliced insight. If data_slice_filter is None then get_lift_chart will raise a ValueError.
- Returns
- LiftChart
Model lift chart data
- Raises
- ClientError
If the insight is not available for this model
- ValueError
If data_slice_filter is passed as None.
- get_missing_report_info()¶
Retrieve a report on missing training data that can be used to understand missing values treatment in the model. The report consists of missing value resolutions for the numeric and categorical features that were part of building the model.
- Returns
- An iterable of MissingReportPerFeature
The queried model missing report, sorted by missing count (DESCENDING order).
- get_model_blueprint_chart()¶
Retrieve a diagram that can be used to understand data flow in the blueprint.
- Returns
- ModelBlueprintChart
The queried model blueprint chart.
- get_model_blueprint_documents()¶
Get documentation for tasks used in this model.
- Returns
- list of BlueprintTaskDocument
All documents available for the model.
- get_model_blueprint_json()¶
Get the blueprint json representation used by this model.
- Returns
- BlueprintJson
Json representation of the blueprint stages.
- Return type
Dict
[str
,Tuple
[List
[str
],List
[str
],str
]]
- get_multiclass_feature_impact()¶
For multiclass it’s possible to calculate feature impact separately for each target class. The method for calculation is exactly the same, calculated in one-vs-all style for each target class.
Requires that Feature Impact has already been computed with request_feature_impact.
- Returns
- feature_impactslist of dict
The feature impact data. Each item is a dict with the keys ‘featureImpacts’ (list) and ‘class’ (str). Each item in ‘featureImpacts’ is a dict with the keys ‘featureName’, ‘impactNormalized’, ‘impactUnnormalized’, and ‘redundantWith’.
- Raises
- ClientError (404)
If the multiclass feature impacts have not been computed.
- get_multiclass_lift_chart(source, fallback_to_parent_insights=False)¶
Retrieve model Lift chart for the specified source.
- Parameters
- sourcestr
Lift chart data source. Check datarobot.enums.CHART_DATA_SOURCE for possible values.
- fallback_to_parent_insightsbool
Optional, if True, this will return lift chart data for this model’s parent if the lift chart is not available for this model and the model has a defined parent model. If omitted or False, or there is no parent model, will not attempt to return insight data from this model’s parent.
- Returns
- list of LiftChart
Model lift chart data for each saved target class
- Raises
- ClientError
If the insight is not available for this model
- get_multilabel_lift_charts(source, fallback_to_parent_insights=False)¶
Retrieve model Lift charts for the specified source.
New in version v2.24.
- Parameters
- sourcestr
Lift chart data source. Check datarobot.enums.CHART_DATA_SOURCE for possible values.
- fallback_to_parent_insightsbool
Optional, if True, this will return lift chart data for this model’s parent if the lift chart is not available for this model and the model has a defined parent model. If omitted or False, or there is no parent model, will not attempt to return insight data from this model’s parent.
- Returns
- list of LiftChart
Model lift chart data for each saved target class
- Raises
- ClientError
If the insight is not available for this model
- get_num_iterations_trained()¶
Retrieves the number of estimators trained by early-stopping tree-based models.
New in version v2.22.
- Returns
- projectId: str
id of project containing the model
- modelId: str
id of the model
- data: array
list of numEstimatorsItem objects, one for each modeling stage.
- numEstimatorsItem will be of the form:
- stage: str
indicates the modeling stage (for multi-stage models); None for single-stage models
- numIterations: int
the number of estimators or iterations trained by the model
- get_or_request_feature_effect(source, max_wait=600, row_count=None, data_slice_id=None)¶
Retrieve Feature Effects for the model, requesting a new job if it hasn’t been run previously.
See get_feature_effect_metadata for retrieving information on the available sources.
- Parameters
- sourcestring
The source Feature Effects are retrieved for.
- max_waitint, optional
The maximum time to wait for a requested Feature Effect job to complete before erroring.
- row_countint, optional
(New in version v2.21) The sample size to use for Feature Impact computation. Minimum is 10 rows. Maximum is 100000 rows or the training sample size of the model, whichever is less.
- data_slice_idstr, optional
ID for the data slice used in the request. If None, request unsliced insight data.
- Returns
- feature_effectsFeatureEffects
The Feature Effects data.
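For example, a sketch that checks the available sources first and then uses the training source (always available per the metadata notes above):
import datarobot as dr

model = dr.Model.get('project-id', 'model-id')

# Inspect which sources Feature Effects can be retrieved for.
metadata = model.get_feature_effect_metadata()
print(metadata)

# Compute (if necessary) and retrieve Feature Effects for the training partition.
feature_effects = model.get_or_request_feature_effect(source='training', max_wait=600)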
- get_or_request_feature_effects_multiclass(source, top_n_features=None, features=None, row_count=None, class_=None, max_wait=600)¶
Retrieve Feature Effects for the multiclass model, requesting a job if it hasn’t been run previously.
- Parameters
- sourcestring
The source Feature Effects are retrieved for.
- class_str or None
The class name Feature Effects are retrieved for.
- row_countint
The number of rows from dataset to use for Feature Impact calculation.
- top_n_featuresint or None
Number of top features (ranked by Feature Impact) used to calculate Feature Effects.
- featureslist or None
The list of features used to calculate Feature Effects.
- max_waitint, optional
The maximum time to wait for a requested Feature Effects job to complete before erroring.
- Returns
- feature_effectslist of FeatureEffectsMulticlass
The list of multiclass feature effects data.
- get_or_request_feature_impact(max_wait=600, **kwargs)¶
Retrieve feature impact for the model, requesting a job if it hasn’t been run previously
- Parameters
- max_waitint, optional
The maximum time to wait for a requested feature impact job to complete before erroring
- **kwargs
Arbitrary keyword arguments passed to
request_feature_impact
.
- Returns
- feature_impactslist or dict
The feature impact data. See
get_feature_impact
for the exact schema.
- get_parameters()¶
Retrieve model parameters.
- Returns
- ModelParameters
Model parameters for this model.
- get_pareto_front()¶
Retrieve the Pareto Front for a Eureqa model.
This method is only supported for Eureqa models.
- Returns
- ParetoFront
Model ParetoFront data
- get_prime_eligibility()¶
Check if this model can be approximated with DataRobot Prime
- Returns
- prime_eligibilitydict
a dict indicating whether a model can be approximated with DataRobot Prime (key can_make_prime) and why it may be ineligible (key message)
- get_residuals_chart(source, fallback_to_parent_insights=False, data_slice_filter=<datarobot.models.model.Sentinel object>)¶
Retrieve model residuals chart for the specified source.
- Parameters
- sourcestr
Residuals chart data source. Check datarobot.enums.CHART_DATA_SOURCE for possible values.
- fallback_to_parent_insightsbool
Optional, if True, this will return residuals chart data for this model’s parent if the residuals chart is not available for this model and the model has a defined parent model. If omitted or False, or there is no parent model, will not attempt to return residuals data from this model’s parent.
- data_slice_filterDataSlice, optional
A dataslice used to filter the return values based on the dataslice.id. By default this function will use data_slice_filter.id == None which returns an unsliced insight. If data_slice_filter is None then get_residuals_chart will raise a ValueError.
- Returns
- ResidualsChart
Model residuals chart data
- Raises
- ClientError
If the insight is not available for this model
- ValueError
If data_slice_filter is passed as None.
- get_roc_curve(source, fallback_to_parent_insights=False, data_slice_filter=<datarobot.models.model.Sentinel object>)¶
Retrieve the ROC curve for a binary model for the specified source. This method is valid only for binary projects. For multilabel projects, use Model.get_labelwise_roc_curves.
- Parameters
- sourcestr
ROC curve data source. Check datarobot.enums.CHART_DATA_SOURCE for possible values. (New in version v2.23) For time series and OTV models, also accepts values backtest_2, backtest_3, …, up to the number of backtests in the model.
- fallback_to_parent_insightsbool
(New in version v2.14) Optional, if True, this will return ROC curve data for this model’s parent if the ROC curve is not available for this model and the model has a defined parent model. If omitted or False, or there is no parent model, will not attempt to return data from this model’s parent.
- data_slice_filterDataSlice, optional
A dataslice used to filter the return values based on the dataslice.id. By default this function will use data_slice_filter.id == None which returns an unsliced insight. If data_slice_filter is None then get_roc_curve will raise a ValueError.
- Returns
- RocCurve
Model ROC curve data
- Raises
- ClientError
If the insight is not available for this model
- (New in version v3.0) TypeError
If the underlying project type is multilabel
- ValueError
If data_slice_filter is passed as None.
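A minimal sketch for a binary project; it assumes the per-point keys threshold and f1_score, which the RocCurve points typically carry:
import datarobot as dr

model = dr.Model.get('project-id', 'model-id')

roc = model.get_roc_curve('validation')

# Pick the threshold with the best F1 score on the validation partition.
best_point = max(roc.roc_points, key=lambda point: point['f1_score'])
print(best_point['threshold'], best_point['f1_score'])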
- get_rulesets()¶
List the rulesets approximating this model generated by DataRobot Prime
If this model hasn’t been approximated yet, will return an empty list. Note that these are rulesets approximating this model, not rulesets used to construct this model.
- Returns
- rulesetslist of Ruleset
- Return type
List
[Ruleset
]
- get_supported_capabilities()¶
Retrieves a summary of the capabilities supported by a model.
New in version v2.14.
- Returns
- supportsBlending: bool
whether the model supports blending
- supportsMonotonicConstraints: bool
whether the model supports monotonic constraints
- hasWordCloud: bool
whether the model has word cloud data available
- eligibleForPrime: bool
whether the model is eligible for Prime
- hasParameters: bool
whether the model has parameters that can be retrieved
- supportsCodeGeneration: bool
(New in version v2.18) whether the model supports code generation
- supportsShap: bool
(New in version v2.18) True if the model supports the Shapley package, i.e. Shapley-based feature importance.
- supportsEarlyStopping: bool
(New in version v2.22) True if this is an early stopping tree-based model and number of trained iterations can be retrieved.
- get_uri()¶
- Returns
- urlstr
Permanent static hyperlink to this model in the leaderboard.
- Return type
str
- get_word_cloud(exclude_stop_words=False)¶
Retrieve word cloud data for the model.
- Parameters
- exclude_stop_wordsbool, optional
Set to True if you want stop words filtered out of the response.
- Returns
- WordCloud
Word cloud data for the model.
- incremental_train(data_stage_id, training_data_name=None)¶
Submit a job to the queue to perform incremental training on an existing model. See train_incremental documentation.
- Return type
- classmethod list(project_id, sort_by_partition='validation', sort_by_metric=None, with_metric=None, search_term=None, featurelists=None, families=None, blueprints=None, labels=None, characteristics=None, training_filters=None, number_of_clusters=None, limit=100, offset=0)¶
Retrieve paginated model records, sorted by scores, with optional filtering.
- Parameters
- sort_by_partition: str, one of `validation`, `backtesting`, `crossValidation` or `holdout`
Set the partition to use for sorted (by score) list of models. validation is the default.
- sort_by_metric: str
Set the project metric to use for model sorting. DataRobot-selected project optimization metric is the default.
- with_metric: str
For a single-metric list of results, specify that project metric.
- search_term: str
If specified, only models containing the term in their name or processes are returned.
- featurelists: list of str
If specified, only models trained on selected featurelists are returned.
- families: list of str
If specified, only models belonging to selected families are returned.
- blueprints: list of str
If specified, only models trained on specified blueprint IDs are returned.
- labels: list of str, `starred` or `prepared for deployment`
If specified, only models tagged with all listed labels are returned.
- characteristics: list of str
If specified, only models matching all listed characteristics are returned.
- training_filters: list of str
If specified, only models matching at least one of the listed training conditions are returned. The following formats are supported.
For autoML and datetime partitioned projects:
- number of rows in training subset
For datetime partitioned projects:
- <training duration>, example P6Y0M0D
- <training_duration>-<time_window_sample_percent>-<sampling_method>, example P6Y0M0D-78-Random (returns models trained on 6 years of data, sampling rate 78%, random sampling)
- Start/end date
- Project settings
- number_of_clusters: list of int
Filter models by number of clusters. Applicable only in unsupervised clustering projects.
- limit: int
- offset: int
- Returns
- generic_models: list of GenericModel
- Return type
List
[GenericModel
]
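For example, to page through starred models sorted by validation score (the project ID is a placeholder):
import datarobot as dr

models = dr.Model.list(
    'project-id',
    sort_by_partition='validation',
    labels=['starred'],
    limit=20,
    offset=0,
)
for model in models:
    print(model)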
- open_in_browser()¶
Opens the class’ relevant web browser location. If a default browser is not available, the URL is logged.
Note: If text-mode browsers are used, the calling process will block until the user exits the browser.
- Return type
None
- request_approximation()¶
Request an approximation of this model using DataRobot Prime
This will create several rulesets that could be used to approximate this model. After comparing their scores and rule counts, the code used in the approximation can be downloaded and run locally.
- Returns
- jobJob
the job generating the rulesets
- request_cross_class_accuracy_scores()¶
Request Cross Class Accuracy scores to be computed for the model.
- Returns
- status_idstr
A statusId of computation request.
- request_data_disparity_insights(feature, compared_class_names)¶
Request data disparity insights to be computed for the model.
- Parameters
- featurestr
Bias and Fairness protected feature name.
- compared_class_nameslist(str)
List of two classes to compare
- Returns
- status_idstr
A statusId of computation request.
- request_external_test(dataset_id, actual_value_column=None)¶
Request external test to compute scores and insights on an external test dataset
- Parameters
- dataset_idstring
The dataset to make predictions against (as uploaded from Project.upload_dataset)
- actual_value_columnstring, optional
(New in version v2.21) For time series unsupervised projects only. Actual value column can be used to calculate the classification metrics and insights on the prediction dataset. Can’t be provided with the forecast_point parameter.
- Returns
- jobJob
a Job representing external dataset insights computation
- request_fairness_insights(fairness_metrics_set=None)¶
Request fairness insights to be computed for the model.
- Parameters
- fairness_metrics_setstr, optional
Can be one of <datarobot.enums.FairnessMetricsSet>. The fairness metric used to calculate the fairness scores.
- Returns
- status_idstr
A statusId of computation request.
- request_feature_effect(row_count=None, data_slice_id=None)¶
Submit request to compute Feature Effects for the model.
See get_feature_effect for more information on the result of the job.
- Parameters
- row_countint
(New in version v2.21) The sample size to use for Feature Impact computation. Minimum is 10 rows. Maximum is 100000 rows or the training sample size of the model, whichever is less.
- data_slice_idstr, optional
ID for the data slice used in the request. If None, request unsliced insight data.
- Returns
- jobJob
A Job representing the feature effect computation. To get the completed feature effect data, use job.get_result or job.get_result_when_complete.
- Raises
- JobAlreadyRequested (422)
If the feature effect has already been requested.
- request_feature_effects_multiclass(row_count=None, top_n_features=None, features=None)¶
Request Feature Effects computation for the multiclass model.
See get_feature_effect for more information on the result of the job.
- Parameters
- row_countint
The number of rows from dataset to use for Feature Impact calculation.
- top_n_featuresint or None
Number of top features (ranked by feature impact) used to calculate Feature Effects.
- featureslist or None
The list of features used to calculate Feature Effects.
- Returns
- jobJob
A Job representing Feature Effect computation. To get the completed Feature Effect data, use job.get_result or job.get_result_when_complete.
- request_feature_impact(row_count=None, with_metadata=False, data_slice_id=None)¶
Request feature impacts to be computed for the model.
See get_feature_impact for more information on the result of the job.
- Parameters
- row_countint, optional
The sample size (specified in rows) to use for Feature Impact computation. This is not supported for unsupervised, multiclass (which has a separate method), and time series projects.
- with_metadatabool, optional
Flag indicating whether the result should include the metadata. If true, metadata is included.
- data_slice_idstr, optional
ID for the data slice used in the request. If None, request unsliced insight data.
- Returns
- jobJob or status_id
Job representing the Feature Impact computation. To retrieve the completed Feature Impact data, use job.get_result or job.get_result_when_complete.
- Raises
- JobAlreadyRequested (422)
If the feature impacts have already been requested.
- request_frozen_datetime_model(training_row_count=None, training_duration=None, training_start_date=None, training_end_date=None, time_window_sample_pct=None, sampling_method=None)¶
Train a new frozen model with parameters from this model.
Requires that this model belongs to a datetime partitioned project. If it does not, an error will occur when submitting the job.
Frozen models use the same tuning parameters as their parent model instead of independently optimizing them to allow efficiently retraining models on larger amounts of the training data.
In addition to training_row_count and training_duration, frozen datetime models may be trained on an exact date range. Only one of training_row_count, training_duration, or training_start_date and training_end_date should be specified.
Models specified using training_start_date and training_end_date are the only ones that can be trained into the holdout data (once the holdout is unlocked).
All durations should be specified with a duration string such as those returned by the partitioning_methods.construct_duration_string helper method. Please see the datetime partitioned project documentation for more information on duration strings.
- Parameters
- training_row_countint, optional
the number of rows of data that should be used to train the model. If specified, training_duration may not be specified.
- training_durationstr, optional
a duration string specifying what time range the data used to train the model should span. If specified, training_row_count may not be specified.
- training_start_datedatetime.datetime, optional
the start date of the data to train the model on. Only rows occurring at or after this datetime will be used. If training_start_date is specified, training_end_date must also be specified.
- training_end_datedatetime.datetime, optional
the end date of the data to train the model on. Only rows occurring strictly before this datetime will be used. If training_end_date is specified, training_start_date must also be specified.
- time_window_sample_pctint, optional
may only be specified when the requested model is a time window (e.g. duration or start and end dates). An integer between 1 and 99 indicating the percentage to sample by within the window. The points kept are determined by a random uniform sample. If specified, training_duration must also be specified; otherwise an error will occur.
- sampling_methodstr, optional
(New in version v2.23) Defines the way training data is selected. Can be either random or latest. In combination with training_row_count, defines how rows are selected from the backtest (latest by default). When training data is defined using a time range (training_duration or use_project_settings), this setting changes the way time_window_sample_pct is applied (random by default). Applicable to OTV projects only.
- Returns
- model_jobModelJob
the modeling job training a frozen model
- Return type
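A sketch that trains a frozen model on the most recent two years of data; the construct_duration_string import path and keyword are assumptions, so check the helper's documentation:
import datarobot as dr
from datarobot.helpers.partitioning_methods import construct_duration_string

model = dr.Model.get('project-id', 'model-id')

# A two-year training window expressed as a duration string, e.g. 'P2Y0M0D'.
duration = construct_duration_string(years=2)
model_job = model.request_frozen_datetime_model(training_duration=duration)
frozen_model = model_job.get_result_when_complete()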
- request_frozen_model(sample_pct=None, training_row_count=None)¶
Train a new frozen model with parameters from this model
Note
This method only works if the project the model belongs to is not datetime partitioned. If it is, use request_frozen_datetime_model instead.
Frozen models use the same tuning parameters as their parent model instead of independently optimizing them to allow efficiently retraining models on larger amounts of the training data.
- Parameters
- sample_pctfloat
optional, the percentage of the dataset to use with the model. If not provided, will use the value from this model.
- training_row_countint
(New in version v2.9) optional, the integer number of rows of the dataset to use with the model. Only one of sample_pct and training_row_count should be specified.
- Returns
- model_jobModelJob
the modeling job training a frozen model
- Return type
- request_lift_chart(source, data_slice_id=None)¶
Request the model Lift Chart for the specified source.
- Parameters
- sourcestr
Lift chart data source. Check datarobot.enums.CHART_DATA_SOURCE for possible values.
- data_slice_idstring, optional
ID for the data slice used in the request. If None, request unsliced insight data.
- Returns
- status_check_jobStatusCheckJob
Object contains all needed logic for a periodical status check of an async job.
- Return type
- request_predictions(dataset_id=None, dataset=None, dataframe=None, file_path=None, file=None, include_prediction_intervals=None, prediction_intervals_size=None, forecast_point=None, predictions_start_date=None, predictions_end_date=None, actual_value_column=None, explanation_algorithm=None, max_explanations=None, max_ngram_explanations=None)¶
Requests predictions against a previously uploaded dataset.
- Parameters
- dataset_idstring, optional
The ID of the dataset to make predictions against (as uploaded from Project.upload_dataset)
- datasetDataset, optional
The dataset to make predictions against (as uploaded from Project.upload_dataset)
- dataframepd.DataFrame, optional
(New in v3.0) The dataframe to make predictions against
- file_pathstr, optional
(New in v3.0) Path to file to make predictions against
- fileIOBase, optional
(New in v3.0) File to make predictions against
- include_prediction_intervalsbool, optional
(New in v2.16) For time series projects only. Specifies whether prediction intervals should be calculated for this request. Defaults to True if prediction_intervals_size is specified, otherwise defaults to False.
- prediction_intervals_sizeint, optional
(New in v2.16) For time series projects only. Represents the percentile to use for the size of the prediction intervals. Defaults to 80 if include_prediction_intervals is True. Prediction intervals size must be between 1 and 100 (inclusive).
- forecast_pointdatetime.datetime or None, optional
(New in version v2.20) For time series projects only. This is the default point relative to which predictions will be generated, based on the forecast window of the project. See the time series prediction documentation for more information.
- predictions_start_datedatetime.datetime or None, optional
(New in version v2.20) For time series projects only. The start date for bulk predictions. Note that this parameter is for generating historical predictions using the training data. This parameter should be provided in conjunction with predictions_end_date. Can’t be provided with the forecast_point parameter.
- predictions_end_datedatetime.datetime or None, optional
(New in version v2.20) For time series projects only. The end date for bulk predictions, exclusive. Note that this parameter is for generating historical predictions using the training data. This parameter should be provided in conjunction with predictions_start_date. Can’t be provided with the forecast_point parameter.
- actual_value_columnstring, optional
(New in version v2.21) For time series unsupervised projects only. Actual value column can be used to calculate the classification metrics and insights on the prediction dataset. Can’t be provided with the forecast_point parameter.
- explanation_algorithmstr, optional
(New in version v2.21) If set to ‘shap’, the response will include prediction explanations based on the SHAP explainer (SHapley Additive exPlanations). Defaults to null (no prediction explanations).
- max_explanationsint, optional
(New in version v2.21) Specifies the maximum number of explanation values that should be returned for each row, ordered by absolute value, greatest to least. If null, no limit. In the case of ‘shap’: if the number of features is greater than the limit, the sum of remaining values will also be returned as shapRemainingTotal. Defaults to null. Cannot be set if explanation_algorithm is omitted.
- max_ngram_explanationsint or str, optional
(New in version v2.29) Specifies the maximum number of text explanation values that should be returned. If set to all, text explanations will be computed and all the ngram explanations will be returned. If set to a non-zero positive integer value, text explanations will be computed and that number of ngram explanations, sorted in descending order, will be returned. By default, text explanations won’t be computed.
- Returns
- jobPredictJob
The job computing the predictions
- Return type
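A minimal end-to-end sketch against an uploaded dataset (the file path is a placeholder):
import datarobot as dr

project = dr.Project.get('project-id')
model = dr.Model.get('project-id', 'model-id')

# Upload the scoring data, then request predictions from this model.
prediction_dataset = project.upload_dataset('./data_to_predict.csv')
predict_job = model.request_predictions(dataset_id=prediction_dataset.id)

# Block until the predictions are ready; the result is a pandas DataFrame.
predictions = predict_job.get_result_when_complete()
print(predictions.head())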
- request_residuals_chart(source, data_slice_id=None)¶
Request the model residuals chart for the specified source.
- Parameters
- sourcestr
Residuals chart data source. Check datarobot.enums.CHART_DATA_SOURCE for possible values.
- data_slice_idstring, optional
ID for the data slice used in the request. If None, request unsliced insight data.
- Returns
- status_check_jobStatusCheckJob
Object contains all needed logic for a periodical status check of an async job.
- Return type
- request_roc_curve(source, data_slice_id=None)¶
Request the model Roc Curve for the specified source.
- Parameters
- sourcestr
Roc Curve data source. Check datarobot.enums.CHART_DATA_SOURCE for possible values.
- data_slice_idstring, optional
ID for the data slice used in the request. If None, request unsliced insight data.
- Returns
- status_check_jobStatusCheckJob
Object contains all needed logic for a periodical status check of an async job.
- Return type
- request_training_predictions(data_subset, explanation_algorithm=None, max_explanations=None)¶
Start a job to build training predictions
- Parameters
- data_subsetstr
data set definition to build predictions on. Choices are:
- dr.enums.DATA_SUBSET.ALL or string all for all data available. Not valid for models in datetime partitioned projects
- dr.enums.DATA_SUBSET.VALIDATION_AND_HOLDOUT or string validationAndHoldout for all data except the training set. Not valid for models in datetime partitioned projects
- dr.enums.DATA_SUBSET.HOLDOUT or string holdout for the holdout data set only
- dr.enums.DATA_SUBSET.ALL_BACKTESTS or string allBacktests for downloading the predictions for all backtest validation folds. Requires the model to have successfully scored all backtests. Datetime partitioned projects only.
- explanation_algorithmdr.enums.EXPLANATIONS_ALGORITHM
(New in v2.21) Optional. If set to dr.enums.EXPLANATIONS_ALGORITHM.SHAP, the response will include prediction explanations based on the SHAP explainer (SHapley Additive exPlanations). Defaults to None (no prediction explanations).
- max_explanationsint
(New in v2.21) Optional. Specifies the maximum number of explanation values that should be returned for each row, ordered by absolute value, greatest to least. In the case of dr.enums.EXPLANATIONS_ALGORITHM.SHAP: if not set, explanations are returned for all features. If the number of features is greater than max_explanations, the sum of remaining values will also be returned as shap_remaining_total. Max 100. Defaults to null for datasets narrower than 100 columns, and to 100 for datasets wider than 100 columns. Is ignored if explanation_algorithm is not set.
- Returns
- Job
an instance of created async job
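For example, a sketch computing holdout training predictions with SHAP explanations:
import datarobot as dr

model = dr.Model.get('project-id', 'model-id')

job = model.request_training_predictions(
    dr.enums.DATA_SUBSET.HOLDOUT,
    explanation_algorithm=dr.enums.EXPLANATIONS_ALGORITHM.SHAP,
)
training_predictions = job.get_result_when_complete()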
- retrain(sample_pct=None, featurelist_id=None, training_row_count=None, n_clusters=None)¶
Submit a job to the queue to retrain the model.
- Parameters
- sample_pct: float, optional
The sample size, in percent (1 to 100), to use in training. If this parameter is used then training_row_count should not be given.
- featurelist_idstr, optional
The featurelist id
- training_row_countint, optional
The number of rows used to train the model. If this parameter is used, then sample_pct should not be given.
- n_clusters: int, optional
(new in version 2.27) number of clusters to use in an unsupervised clustering model. This parameter is used only for unsupervised clustering models that do not determine the number of clusters automatically.
- Returns
- jobModelJob
The created job that is retraining the model
- Return type
- set_prediction_threshold(threshold)¶
Set a custom prediction threshold for the model.
May not be used once prediction_threshold_read_only is True for this model.
- Parameters
- thresholdfloat
Only used for binary classification projects. The threshold to use when deciding between the positive and negative classes when making predictions. Should be between 0.0 and 1.0 (inclusive).
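For example (the threshold value is a placeholder; check prediction_threshold_read_only first):
import datarobot as dr

model = dr.Model.get('project-id', 'model-id')

if not model.prediction_threshold_read_only:
    model.set_prediction_threshold(0.42)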
- star_model()¶
Mark the model as starred.
Model stars propagate to the web application and the API, and can be used to filter when listing models.
- Return type
None
- start_advanced_tuning_session()¶
Start an Advanced Tuning session. Returns an object that helps set up arguments for an Advanced Tuning model execution.
As of v2.17, all models other than blenders, open source, prime, baseline and user-created support Advanced Tuning.
- Returns
- AdvancedTuningSession
Session for setting up and running Advanced Tuning on a model
- train(sample_pct=None, featurelist_id=None, scoring_type=None, training_row_count=None, monotonic_increasing_featurelist_id=<object object>, monotonic_decreasing_featurelist_id=<object object>)¶
Train the blueprint used in this model on a particular featurelist or amount of data.
This method creates a new training job for a worker and appends it to the end of the queue for this project. After the job has finished you can get the newly trained model by retrieving it from the project leaderboard, or by retrieving the result of the job.
Either sample_pct or training_row_count can be used to specify the amount of data to use, but not both. If neither are specified, a default of the maximum amount of data that can safely be used to train any blueprint without going into the validation data will be selected.
In smart-sampled projects, sample_pct and training_row_count are assumed to be in terms of rows of the minority class.
Note
For datetime partitioned projects, see train_datetime instead.
- Parameters
- sample_pctfloat, optional
The amount of data to use for training, as a percentage of the project dataset from 0 to 100.
- featurelist_idstr, optional
The identifier of the featurelist to use. If not defined, the featurelist of this model is used.
- scoring_typestr, optional
Either validation or crossValidation (also dr.SCORING_TYPE.validation or dr.SCORING_TYPE.cross_validation). validation is available for every partitioning type, and indicates that the default model validation should be used for the project. If the project uses a form of cross-validation partitioning, crossValidation can also be used to indicate that all of the available training/validation combinations should be used to evaluate the model.
- training_row_countint, optional
The number of rows to use to train the requested model.
- monotonic_increasing_featurelist_idstr
(new in version 2.11) optional, the id of the featurelist that defines the set of features with a monotonically increasing relationship to the target. Passing
None
disables increasing monotonicity constraint. Default (dr.enums.MONOTONICITY_FEATURELIST_DEFAULT
) is the one specified by the blueprint.- monotonic_decreasing_featurelist_idstr
(new in version 2.11) optional, the id of the featurelist that defines the set of features with a monotonically decreasing relationship to the target. Passing
None
disables decreasing monotonicity constraint. Default (dr.enums.MONOTONICITY_FEATURELIST_DEFAULT
) is the one specified by the blueprint.
- Returns
- model_job_idstr
id of created job, can be used as parameter to
ModelJob.get
method orwait_for_async_model_creation
function
Examples
project = Project.get('project-id')
model = Model.get('project-id', 'model-id')
model_job_id = model.train(training_row_count=project.max_train_rows)
- Return type
str
- train_datetime(featurelist_id=None, training_row_count=None, training_duration=None, time_window_sample_pct=None, monotonic_increasing_featurelist_id=<object object>, monotonic_decreasing_featurelist_id=<object object>, use_project_settings=False, sampling_method=None, n_clusters=None)¶
Trains this model on a different featurelist or sample size.
Requires that this model is part of a datetime partitioned project; otherwise, an error will occur.
All durations should be specified with a duration string such as those returned by the partitioning_methods.construct_duration_string helper method. Please see the datetime partitioned project documentation for more information on duration strings.
- Parameters
- featurelist_idstr, optional
the featurelist to use to train the model. If not specified, the featurelist of this model is used.
- training_row_countint, optional
the number of rows of data that should be used to train the model. If specified, neither
training_duration
noruse_project_settings
may be specified.- training_durationstr, optional
a duration string specifying what time range the data used to train the model should span. If specified, neither
training_row_count
noruse_project_settings
may be specified.- use_project_settingsbool, optional
(New in version v2.20) defaults to
False
. IfTrue
, indicates that the custom backtest partitioning settings specified by the user will be used to train the model and evaluate backtest scores. If specified, neithertraining_row_count
nortraining_duration
may be specified.- time_window_sample_pctint, optional
may only be specified when the requested model is a time window (e.g. duration or start and end dates). An integer between 1 and 99 indicating the percentage to sample by within the window. The points kept are determined by a random uniform sample. If specified, training_duration must also be specified; otherwise an error will occur.
- sampling_methodstr, optional
(New in version v2.23) defines the way training data is selected. Can be either random or latest. In combination with training_row_count defines how rows are selected from backtest (latest by default). When training data is defined using time range (training_duration or use_project_settings) this setting changes the way time_window_sample_pct is applied (random by default). Applicable to OTV projects only.
- monotonic_increasing_featurelist_idstr, optional
(New in version v2.18) optional, the id of the featurelist that defines the set of features with a monotonically increasing relationship to the target. Passing None disables increasing monotonicity constraint. Default (dr.enums.MONOTONICITY_FEATURELIST_DEFAULT) is the one specified by the blueprint.
- monotonic_decreasing_featurelist_idstr, optional
(New in version v2.18) optional, the id of the featurelist that defines the set of features with a monotonically decreasing relationship to the target. Passing None disables decreasing monotonicity constraint. Default (dr.enums.MONOTONICITY_FEATURELIST_DEFAULT) is the one specified by the blueprint.
- n_clusters: int, optional
(New in version 2.27) number of clusters to use in an unsupervised clustering model. This parameter is used only for unsupervised clustering models that don’t automatically determine the number of clusters.
- Returns
- jobModelJob
the created job to build the model
- Return type
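Example
A minimal usage sketch, assuming the project and model ids below are placeholders and that the construct_duration_string helper referenced above is importable from datarobot.helpers.partitioning_methods:
import datarobot as dr
from datarobot.helpers.partitioning_methods import construct_duration_string

# Placeholder ids; the model must belong to a datetime partitioned project.
model = dr.Model.get('project-id', 'model-id')
duration = construct_duration_string(years=1)  # duration string spanning one year
job = model.train_datetime(training_duration=duration)
new_model = job.get_result_when_complete()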
- train_incremental(data_stage_id, training_data_name=None, data_stage_encoding=None, data_stage_delimiter=None, data_stage_compression=None)¶
Submit a job to the queue to perform incremental training on an existing model using additional data. The id of the additional data to use for training is specified with the data_stage_id. Optionally a name for the iteration can be supplied by the user to help identify the contents of data in the iteration.
This functionality requires the INCREMENTAL_LEARNING feature flag to be enabled.
- Parameters
- data_stage_id: str
The id of the data stage to use for training.
- training_data_namestr, optional
The name of the iteration or data stage to indicate what the incremental learning was performed on.
- data_stage_encodingstr, optional
The encoding type of the data in the data stage (default: UTF-8). Supported formats: UTF-8, ASCII, WINDOWS1252
- data_stage_delimiterstr, optional
The delimiter used by the data in the data stage (default: ‘,’).
- data_stage_compressionstr, optional
The compression type of the data stage file, e.g. ‘zip’ (default: None). Supported formats: zip
- Returns
- jobModelJob
The created job that is retraining the model
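Example
A minimal sketch; all ids are placeholders, the data stage is assumed to exist already, and the INCREMENTAL_LEARNING feature flag mentioned above must be enabled:
import datarobot as dr

# Placeholder ids for the project, model, and previously created data stage.
model = dr.Model.get('project-id', 'model-id')
job = model.train_incremental(
    data_stage_id='data-stage-id',
    training_data_name='week-2-increment',
)
updated_model = job.get_result_when_complete()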
- unstar_model()¶
Unmark the model as starred.
Model stars propagate to the web application and the API, and can be used to filter when listing models.
- Return type
None
- class datarobot.models.model.AdvancedTuningParamsType¶
- class datarobot.models.model.BiasMitigationFeatureInfo(messages)¶
PrimeModel¶
- class datarobot.models.PrimeModel(id=None, processes=None, featurelist_name=None, featurelist_id=None, project_id=None, sample_pct=None, model_type=None, model_category=None, is_frozen=None, blueprint_id=None, metrics=None, ruleset_id=None, rule_count=None, score=None, monotonic_increasing_featurelist_id=None, monotonic_decreasing_featurelist_id=None, supports_monotonic_constraints=None, is_starred=None, prediction_threshold=None, prediction_threshold_read_only=None, model_number=None, parent_model_id=None, supports_composable_ml=None, training_row_count=None, training_duration=None, training_start_date=None, training_end_date=None, data_selection_method=None, time_window_sample_pct=None, sampling_method=None, model_family_full_name=None, is_trained_into_validation=None, is_trained_into_holdout=None)¶
Represents a DataRobot Prime model approximating a parent model with downloadable code.
All durations are specified with a duration string such as those returned by the
partitioning_methods.construct_duration_string
helper method. Please see datetime partitioned project documentation for more information on duration strings.- Attributes
- idstr
the id of the model
- project_idstr
the id of the project the model belongs to
- processeslist of str
the processes used by the model
- featurelist_namestr
the name of the featurelist used by the model
- featurelist_idstr
the id of the featurelist used by the model
- sample_pctfloat
the percentage of the project dataset used in training the model
- training_row_countint or None
the number of rows of the project dataset used in training the model. In a datetime partitioned project, if specified, defines the number of rows used to train the model and evaluate backtest scores; if unspecified, either training_duration or training_start_date and training_end_date was used to determine that instead.
- training_durationstr or None
only present for models in datetime partitioned projects. If specified, a duration string specifying the duration spanned by the data used to train the model and evaluate backtest scores.
- training_start_datedatetime or None
only present for frozen models in datetime partitioned projects. If specified, the start date of the data used to train the model.
- training_end_datedatetime or None
only present for frozen models in datetime partitioned projects. If specified, the end date of the data used to train the model.
- model_typestr
what model this is, e.g. ‘DataRobot Prime’
- model_categorystr
what kind of model this is - always ‘prime’ for DataRobot Prime models
- is_frozenbool
whether this model is a frozen model
- blueprint_idstr
the id of the blueprint used in this model
- metricsdict
a mapping from each metric to the model’s scores for that metric
- rulesetRuleset
the ruleset used in the Prime model
- parent_model_idstr
the id of the model that this Prime model approximates
- monotonic_increasing_featurelist_idstr
optional, the id of the featurelist that defines the set of features with a monotonically increasing relationship to the target. If None, no such constraints are enforced.
- monotonic_decreasing_featurelist_idstr
optional, the id of the featurelist that defines the set of features with a monotonically decreasing relationship to the target. If None, no such constraints are enforced.
- supports_monotonic_constraintsbool
optional, whether this model supports enforcing monotonic constraints
- is_starredbool
whether this model is marked as starred
- prediction_thresholdfloat
for binary classification projects, the threshold used for predictions
- prediction_threshold_read_onlybool
indicates whether modification of the prediction threshold is forbidden. Threshold modification is forbidden once a model has had a deployment created or predictions made via the dedicated prediction API.
- supports_composable_mlbool or None
(New in version v2.26) whether this model is supported in Composable ML.
- classmethod get(project_id, model_id)¶
Retrieve a specific prime model.
- Parameters
- project_idstr
The id of the project the prime model belongs to
- model_idstr
The
model_id
of the prime model to retrieve.
- Returns
- modelPrimeModel
The queried instance.
- request_download_validation(language)¶
Prep and validate the downloadable code for the ruleset associated with this model.
- Parameters
- languagestr
the language the code should be downloaded in - see
datarobot.enums.PRIME_LANGUAGE
for available languages
- Returns
- jobJob
A job tracking the code preparation and validation
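Example
A short illustrative sketch of preparing Prime code for download; the ids are placeholders and the available languages come from datarobot.enums.PRIME_LANGUAGE as noted above:
import datarobot as dr

prime_model = dr.PrimeModel.get('project-id', 'prime-model-id')  # placeholder ids
job = prime_model.request_download_validation(dr.enums.PRIME_LANGUAGE.PYTHON)
job.wait_for_completion()  # code can be downloaded once validation succeeds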
- advanced_tune(params, description=None)¶
Generate a new model with the specified advanced-tuning parameters
As of v2.17, all models other than blenders, open source, prime, baseline and user-created support Advanced Tuning.
- Parameters
- paramsdict
Mapping of parameter ID to parameter value. The list of valid parameter IDs for a model can be found by calling get_advanced_tuning_parameters(). This endpoint does not need to include values for all parameters. If a parameter is omitted, its current_value will be used.
- descriptionstr
Human-readable string describing the newly advanced-tuned model
- Returns
- ModelJob
The created job to build the model
- Return type
- cross_validate()¶
Run cross validation on the model.
Note
To perform Cross Validation on a new model with new parameters, use
train
instead.- Returns
- ModelJob
The created job to build the model
- delete()¶
Delete a model from the project’s leaderboard.
- Return type
None
- download_scoring_code(file_name, source_code=False)¶
Download the Scoring Code JAR.
- Parameters
- file_namestr
File path where scoring code will be saved.
- source_codebool, optional
Set to True to download source code archive. It will not be executable.
- Return type
None
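Example
A sketch of downloading the Scoring Code artifacts to local files; the ids and file names are placeholders:
import datarobot as dr

model = dr.Model.get('project-id', 'model-id')  # placeholder ids
model.download_scoring_code('model_scoring_code.jar')
# Optionally fetch the non-executable source archive as well.
model.download_scoring_code('model_scoring_code_source.jar', source_code=True)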
- download_training_artifact(file_name)¶
Retrieve trained artifact(s) from a model containing one or more custom tasks.
Artifact(s) will be downloaded to the specified local filepath.
- Parameters
- file_namestr
File path where trained model artifact(s) will be saved.
- classmethod from_data(data)¶
Instantiate an object of this class using a dict.
- Parameters
- datadict
Correctly snake_cased keys and their values.
- Return type
TypeVar
(T
, bound=APIObject
)
- get_advanced_tuning_parameters()¶
Get the advanced-tuning parameters available for this model.
As of v2.17, all models other than blenders, open source, prime, baseline and user-created support Advanced Tuning.
- Returns
- dict
A dictionary describing the advanced-tuning parameters for the current model. There are two top-level keys, tuning_description and tuning_parameters.
tuning_description is an optional value. If not None, it indicates the user-specified description of this set of tuning parameters.
tuning_parameters is a list of dicts, each with the following keys:
parameter_name : (str) name of the parameter (unique per task, see below)
parameter_id : (str) opaque ID string uniquely identifying parameter
default_value : (*) the actual value used to train the model; either the single value of the parameter specified before training, or the best value from the list of grid-searched values (based on current_value)
current_value : (*) the single value or list of values of the parameter that were grid searched. Depending on the grid search specification, could be a single fixed value (no grid search), a list of discrete values, or a range.
task_name : (str) name of the task that this parameter belongs to
constraints: (dict) see the notes below
vertex_id: (str) ID of vertex that this parameter belongs to
Notes
The type of default_value and current_value is defined by the constraints structure. It will be a string or numeric Python type.
constraints is a dict with at least one, possibly more, of the following keys. The presence of a key indicates that the parameter may take on the specified type. (If a key is absent, this means that the parameter may not take on the specified type.) If a key on constraints is present, its value will be a dict containing all of the fields described below for that key.
"constraints": { "select": { "values": [<list(basestring or number) : possible values>] }, "ascii": {}, "unicode": {}, "int": { "min": <int : minimum valid value>, "max": <int : maximum valid value>, "supports_grid_search": <bool : True if Grid Search may be requested for this param> }, "float": { "min": <float : minimum valid value>, "max": <float : maximum valid value>, "supports_grid_search": <bool : True if Grid Search may be requested for this param> }, "intList": { "min_length": <int : minimum valid length>, "max_length": <int : maximum valid length> "min_val": <int : minimum valid value>, "max_val": <int : maximum valid value> "supports_grid_search": <bool : True if Grid Search may be requested for this param> }, "floatList": { "min_length": <int : minimum valid length>, "max_length": <int : maximum valid length> "min_val": <float : minimum valid value>, "max_val": <float : maximum valid value> "supports_grid_search": <bool : True if Grid Search may be requested for this param> } }
The keys have meaning as follows:
select: Rather than specifying a specific data type, if present, it indicates that the parameter is permitted to take on any of the specified values. Listed values may be of any string or real (non-complex) numeric type.
ascii: The parameter may be a unicode object that encodes simple ASCII characters. (A-Z, a-z, 0-9, whitespace, and certain common symbols.) In addition to listed constraints, ASCII keys currently may not contain either newlines or semicolons.
unicode: The parameter may be any Python unicode object.
int: The value may be an object of type int within the specified range (inclusive). Please note that the value will be passed around using the JSON format, and some JSON parsers have undefined behavior with integers outside of the range [-(2**53)+1, (2**53)-1].
float: The value may be an object of type float within the specified range (inclusive).
intList, floatList: The value may be a list of int or float objects, respectively, following constraints as specified respectively by the int and float types (above).
Many parameters only specify one key under constraints. If a parameter specifies multiple keys, the parameter may take on any value permitted by any key.
- Return type
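Example
An illustrative sketch of reading the tunable parameters and submitting an advanced-tuned model; the ids are placeholders, and which parameter to override depends on the blueprint:
import datarobot as dr

model = dr.Model.get('project-id', 'model-id')  # placeholder ids
tuning = model.get_advanced_tuning_parameters()
first_param = tuning['tuning_parameters'][0]
# Re-submit the model with an explicit value for a single parameter.
job = model.advanced_tune(
    {first_param['parameter_id']: first_param['default_value']},
    description='explicitly pinned parameter',
)
tuned_model = job.get_result_when_complete()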
- get_all_confusion_charts(fallback_to_parent_insights=False)¶
Retrieve a list of all confusion matrices available for the model.
- Parameters
- fallback_to_parent_insightsbool
(New in version v2.14) Optional, if True, this will return confusion chart data for this model’s parent for any source that is not available for this model, provided this model has a defined parent model. If omitted or False, or this model has no parent, this will not attempt to retrieve any data from this model’s parent.
- Returns
- list of ConfusionChart
Data for all available confusion charts for model.
- get_all_feature_impacts(data_slice_filter=None)¶
Retrieve a list of all feature impact results available for the model.
- Parameters
- data_slice_filterDataSlice, optional
A dataslice used to filter the return values based on the dataslice.id. By default, this function will use data_slice_filter.id == None which returns an unsliced insight. If data_slice_filter is None then no data_slice filtering will be applied when requesting the feature impacts.
- Returns
- list of dicts
Data for all available model feature impacts, or an empty list if no data is found.
Examples
model = datarobot.Model(id='model-id', project_id='project-id')

# Get feature impact insights for sliced data
data_slice = datarobot.DataSlice(id='data-slice-id')
sliced_fi = model.get_all_feature_impacts(data_slice_filter=data_slice)

# Get feature impact insights for unsliced data
data_slice = datarobot.DataSlice()
unsliced_fi = model.get_all_feature_impacts(data_slice_filter=data_slice)

# Get all feature impact insights
all_fi = model.get_all_feature_impacts()
- get_all_lift_charts(fallback_to_parent_insights=False, data_slice_filter=None)¶
Retrieve a list of all Lift charts available for the model.
- Parameters
- fallback_to_parent_insightsbool, optional
(New in version v2.14) Optional, if True, this will return lift chart data for this model’s parent for any source that is not available for this model and if this model has a defined parent model. If omitted or False, or this model has no parent, this will not attempt to retrieve any data from this model’s parent.
- data_slice_filterDataSlice, optional
Filters the returned lift chart by data_slice_filter.id. If None (the default) applies no filter based on data_slice_id.
- Returns
- list of LiftChart
Data for all available model lift charts, or an empty list if no data is found.
Examples
model = datarobot.Model.get('project-id', 'model-id')

# Get lift chart insights for sliced data
sliced_lift_charts = model.get_all_lift_charts(data_slice_filter=datarobot.DataSlice(id='data-slice-id'))

# Get lift chart insights for unsliced data
unsliced_lift_charts = model.get_all_lift_charts(data_slice_filter=datarobot.DataSlice())

# Get all lift chart insights
all_lift_charts = model.get_all_lift_charts()
- get_all_multiclass_lift_charts(fallback_to_parent_insights=False)¶
Retrieve a list of all Lift charts available for the model.
- Parameters
- fallback_to_parent_insightsbool
(New in version v2.14) Optional, if True, this will return lift chart data for this model’s parent for any source that is not available for this model and if this model has a defined parent model. If omitted or False, or this model has no parent, this will not attempt to retrieve any data from this model’s parent.
- Returns
- list of LiftChart
Data for all available model lift charts.
- get_all_residuals_charts(fallback_to_parent_insights=False, data_slice_filter=None)¶
Retrieve a list of all residuals charts available for the model.
- Parameters
- fallback_to_parent_insightsbool
Optional, if True, this will return residuals chart data for this model’s parent for any source that is not available for this model and if this model has a defined parent model. If omitted or False, or this model has no parent, this will not attempt to retrieve any data from this model’s parent.
- data_slice_filterDataSlice, optional
Filters the returned residuals charts by data_slice_filter.id. If None (the default) applies no filter based on data_slice_id.
- Returns
- list of ResidualsChart
Data for all available model residuals charts.
Examples
model = datarobot.Model.get('project-id', 'model-id')

# Get residuals chart insights for sliced data
sliced_residuals_charts = model.get_all_residuals_charts(data_slice_filter=datarobot.DataSlice(id='data-slice-id'))

# Get residuals chart insights for unsliced data
unsliced_residuals_charts = model.get_all_residuals_charts(data_slice_filter=datarobot.DataSlice())

# Get all residuals chart insights
all_residuals_charts = model.get_all_residuals_charts()
- get_all_roc_curves(fallback_to_parent_insights=False, data_slice_filter=None)¶
Retrieve a list of all ROC curves available for the model.
- Parameters
- fallback_to_parent_insightsbool
(New in version v2.14) Optional, if True, this will return ROC curve data for this model’s parent for any source that is not available for this model and if this model has a defined parent model. If omitted or False, or this model has no parent, this will not attempt to retrieve any data from this model’s parent.
- data_slice_filterDataSlice, optional
filters the returned roc_curve by data_slice_filter.id. If None (the default) applies no filter based on data_slice_id.
- Returns
- list of RocCurve
Data for all available model ROC curves. Or an empty list if no RocCurves are found.
Examples
model = datarobot.Model.get('project-id', 'model-id')

# Get roc curve insights for sliced data
ds_filter = DataSlice(id='data-slice-id')
sliced_roc = model.get_all_roc_curves(data_slice_filter=ds_filter)

# Get roc curve insights for unsliced data
unsliced_filter = DataSlice(id=None)
unsliced_roc = model.get_all_roc_curves(data_slice_filter=unsliced_filter)

# Get all roc curve insights
all_roc_curves = model.get_all_roc_curves()
- get_confusion_chart(source, fallback_to_parent_insights=False)¶
Retrieve a multiclass model’s confusion matrix for the specified source.
- Parameters
- sourcestr
Confusion chart source. Check datarobot.enums.CHART_DATA_SOURCE for possible values.
- fallback_to_parent_insightsbool
(New in version v2.14) Optional, if True, this will return confusion chart data for this model’s parent if the confusion chart is not available for this model and the defined parent model. If omitted or False, or there is no parent model, will not attempt to return insight data from this model’s parent.
- Returns
- ConfusionChart
Model ConfusionChart data
- Raises
- ClientError
If the insight is not available for this model
- get_cross_class_accuracy_scores()¶
Retrieves a list of Cross Class Accuracy scores for the model.
- Returns
- json
- get_cross_validation_scores(partition=None, metric=None)¶
Return a dictionary, keyed by metric, showing cross validation scores per partition.
Cross Validation should already have been performed using
cross_validate
ortrain
.Note
Models that computed cross validation before this feature was added will need to be deleted and retrained before this method can be used.
- Parameters
- partitionfloat
optional, the id of the partition (1, 2, 3.0, 4.0, etc.) to filter results by. Can be a positive whole number integer or float value. 0 corresponds to the validation partition.
- metric: unicode
optional, the name of the metric to filter the resulting cross validation scores by
- Returns
- cross_validation_scores: dict
A dictionary keyed by metric showing cross validation scores per partition.
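Example
A sketch of running cross validation and then reading the per-partition scores; the ids and the metric name are placeholders:
import datarobot as dr

model = dr.Model.get('project-id', 'model-id')  # placeholder ids
cv_job = model.cross_validate()
cv_job.wait_for_completion()
scores = model.get_cross_validation_scores(metric='RMSE')  # metric name is illustrative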
- get_data_disparity_insights(feature, class_name1, class_name2)¶
Retrieve a list of Cross Class Data Disparity insights for the model.
- Parameters
- featurestr
Bias and Fairness protected feature name.
- class_name1str
One of the compared classes
- class_name2str
Another compared class
- Returns
- json
- get_fairness_insights(fairness_metrics_set=None, offset=0, limit=100)¶
Retrieve a list of Per Class Bias insights for the model.
- Parameters
- fairness_metrics_setstr, optional
Can be one of <datarobot.enums.FairnessMetricsSet>. The fairness metric used to calculate the fairness scores.
- offsetint, optional
Number of items to skip.
- limitint, optional
Number of items to return.
- Returns
- json
- get_feature_effect(source, data_slice_id=None)¶
Retrieve Feature Effects for the model.
Feature Effects provides partial dependence and predicted vs actual values for top-500 features ordered by feature impact score.
The partial dependence shows marginal effect of a feature on the target variable after accounting for the average effects of all other predictive features. It indicates how, holding all other variables except the feature of interest as they were, the value of this feature affects your prediction.
Requires that Feature Effects has already been computed with
request_feature_effect
.See
get_feature_effect_metadata
for retrieving information on the available sources.
- Parameters
- sourcestring
The source Feature Effects are retrieved for.
- data_slice_idstring, optional
ID for the data slice used in the request. If None, retrieve unsliced insight data.
- Returns
- feature_effectsFeatureEffects
The feature effects data.
- Raises
- ClientError (404)
If the feature effects have not been computed or source is not a valid value.
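Example
A sketch of computing and then retrieving Feature Effects; the ids are placeholders, and 'training' is used as the source since the metadata notes below state it is generally available:
import datarobot as dr

model = dr.Model.get('project-id', 'model-id')  # placeholder ids
job = model.request_feature_effect()
job.wait_for_completion()
feature_effects = model.get_feature_effect(source='training')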
- get_feature_effect_metadata()¶
Retrieve Feature Effects metadata. Response contains status and available model sources.
Feature Effect for the training partition is always available, with the exception of older projects that only supported Feature Effect for validation.
When a model is trained into validation or holdout without stacked predictions (i.e., no out-of-sample predictions in those partitions), Feature Effects is not available for validation or holdout.
Feature Effects for holdout is not available when holdout was not unlocked for the project.
Use source to retrieve Feature Effects, selecting one of the provided sources.
- Returns
- feature_effect_metadata: FeatureEffectMetadata
- get_feature_effects_multiclass(source='training', class_=None)¶
Retrieve Feature Effects for the multiclass model.
Feature Effects provide partial dependence and predicted vs actual values for top-500 features ordered by feature impact score.
The partial dependence shows marginal effect of a feature on the target variable after accounting for the average effects of all other predictive features. It indicates how, holding all other variables except the feature of interest as they were, the value of this feature affects your prediction.
Requires that Feature Effects has already been computed with
request_feature_effect
.See
get_feature_effect_metadata
for retrieving information on the available sources.
- Parameters
- sourcestr
The source Feature Effects are retrieved for.
- class_str or None
The class name Feature Effects are retrieved for.
- Returns
- list
The list of multiclass feature effects.
- Raises
- ClientError (404)
If Feature Effects have not been computed or source is not a valid value.
- get_feature_impact(with_metadata=False, data_slice_filter=<datarobot.models.model.Sentinel object>)¶
Retrieve the computed Feature Impact results, a measure of the relevance of each feature in the model.
Feature Impact is computed for each column by creating new data with that column randomly permuted (but the others left unchanged), and seeing how the error metric score for the predictions is affected. The ‘impactUnnormalized’ is how much worse the error metric score is when making predictions on this modified data. The ‘impactNormalized’ is normalized so that the largest value is 1. In both cases, larger values indicate more important features.
If a feature is a redundant feature, i.e. once other features are considered it doesn’t contribute much in addition, the ‘redundantWith’ value is the name of the feature that has the highest correlation with this feature. Note that redundancy detection is only available for jobs run after the addition of this feature. When retrieving data that predates this functionality, a NoRedundancyImpactAvailable warning will be used.
Elsewhere this technique is sometimes called ‘Permutation Importance’.
Requires that Feature Impact has already been computed with
request_feature_impact
.- Parameters
- with_metadatabool
The flag indicating if the result should include the metadata as well.
- data_slice_filterDataSlice, optional
A dataslice used to filter the return values based on the dataslice.id. By default, this function will use data_slice_filter.id == None which returns an unsliced insight. If data_slice_filter is None then get_feature_impact will raise a ValueError.
- Returns
- list or dict
The feature impact data response depends on the with_metadata parameter. The response is either a dict with metadata and a list with actual data or just a list with that data.
Each list item is a dict with the keys featureName, impactNormalized, impactUnnormalized, redundantWith, and count.
For the dict response, the available keys are:
featureImpacts - Feature Impact data as a dictionary. Each item is a dict with the keys featureName, impactNormalized, impactUnnormalized, and redundantWith.
shapBased - A boolean that indicates whether Feature Impact was calculated using Shapley values.
ranRedundancyDetection - A boolean that indicates whether redundant feature identification was run while calculating this Feature Impact.
rowCount - An integer or None that indicates the number of rows that was used to calculate Feature Impact. For the Feature Impact calculated with the default logic, without specifying the rowCount, we return None here.
count - An integer with the number of features under featureImpacts.
- Raises
- ClientError (404)
If the feature impacts have not been computed.
- ValueError
If data_slice_filter passed as None
- get_features_used()¶
Query the server to determine which features were used.
Note that the data returned by this method is possibly different from the names of the features in the featurelist used by this model. This method will return the raw features that must be supplied in order for predictions to be generated on a new set of data. The featurelist, in contrast, would also include the names of derived features.
- Returns
- featureslist of str
The names of the features used in the model.
- Return type
List
[str
]
- get_frozen_child_models()¶
Retrieve the IDs for all models that are frozen from this model.
- Returns
- A list of Models
- get_labelwise_roc_curves(source, fallback_to_parent_insights=False)¶
Retrieve a list of LabelwiseRocCurve instances for a multilabel model for the given source and all labels. This method is valid only for multilabel projects. For binary projects, use the Model.get_roc_curve API.
New in version v2.24.
- Parameters
- sourcestr
ROC curve data source. Check datarobot.enums.CHART_DATA_SOURCE for possible values.
- fallback_to_parent_insightsbool
Optional, if True, this will return ROC curve data for this model’s parent if the ROC curve is not available for this model and the model has a defined parent model. If omitted or False, or there is no parent model, will not attempt to return data from this model’s parent.
- Returns
- list of LabelwiseRocCurve
Labelwise ROC Curve instances for
source
and all labels
- Raises
- ClientError
If the insight is not available for this model
- get_lift_chart(source, fallback_to_parent_insights=False, data_slice_filter=<datarobot.models.model.Sentinel object>)¶
Retrieve the model Lift chart for the specified source.
- Parameters
- sourcestr
Lift chart data source. Check datarobot.enums.CHART_DATA_SOURCE for possible values. (New in version v2.23) For time series and OTV models, also accepts values backtest_2, backtest_3, …, up to the number of backtests in the model.
- fallback_to_parent_insightsbool
(New in version v2.14) Optional, if True, this will return lift chart data for this model’s parent if the lift chart is not available for this model and the model has a defined parent model. If omitted or False, or there is no parent model, will not attempt to return insight data from this model’s parent.
- data_slice_filterDataSlice, optional
A dataslice used to filter the return values based on the dataslice.id. By default this function will use data_slice_filter.id == None which returns an unsliced insight. If data_slice_filter is None then get_lift_chart will raise a ValueError.
- Returns
- LiftChart
Model lift chart data
- Raises
- ClientError
If the insight is not available for this model
- ValueError
If data_slice_filter passed as None
- get_missing_report_info()¶
Retrieve a report on missing training data that can be used to understand missing values treatment in the model. The report consists of missing value resolutions for numeric or categorical features that were part of building the model.
- Returns
- An iterable of MissingReportPerFeature
The queried model missing report, sorted by missing count (DESCENDING order).
- get_model_blueprint_chart()¶
Retrieve a diagram that can be used to understand data flow in the blueprint.
- Returns
- ModelBlueprintChart
The queried model blueprint chart.
- get_model_blueprint_documents()¶
Get documentation for tasks used in this model.
- Returns
- list of BlueprintTaskDocument
All documents available for the model.
- get_model_blueprint_json()¶
Get the blueprint json representation used by this model.
- Returns
- BlueprintJson
Json representation of the blueprint stages.
- Return type
Dict
[str
,Tuple
[List
[str
],List
[str
],str
]]
- get_multiclass_feature_impact()¶
For multiclass it’s possible to calculate feature impact separately for each target class. The method for calculation is exactly the same, calculated in one-vs-all style for each target class.
Requires that Feature Impact has already been computed with
request_feature_impact
.- Returns
- feature_impactslist of dict
The feature impact data. Each item is a dict with the keys ‘featureImpacts’ (list), ‘class’ (str). Each item in ‘featureImpacts’ is a dict with the keys ‘featureName’, ‘impactNormalized’, ‘impactUnnormalized’, and ‘redundantWith’.
- Raises
- ClientError (404)
If the multiclass feature impacts have not been computed.
- get_multiclass_lift_chart(source, fallback_to_parent_insights=False)¶
Retrieve model Lift chart for the specified source.
- Parameters
- sourcestr
Lift chart data source. Check datarobot.enums.CHART_DATA_SOURCE for possible values.
- fallback_to_parent_insightsbool
Optional, if True, this will return lift chart data for this model’s parent if the lift chart is not available for this model and the model has a defined parent model. If omitted or False, or there is no parent model, will not attempt to return insight data from this model’s parent.
- Returns
- list of LiftChart
Model lift chart data for each saved target class
- Raises
- ClientError
If the insight is not available for this model
- get_multilabel_lift_charts(source, fallback_to_parent_insights=False)¶
Retrieve model Lift charts for the specified source.
New in version v2.24.
- Parameters
- sourcestr
Lift chart data source. Check datarobot.enums.CHART_DATA_SOURCE for possible values.
- fallback_to_parent_insightsbool
Optional, if True, this will return lift chart data for this model’s parent if the lift chart is not available for this model and the model has a defined parent model. If omitted or False, or there is no parent model, will not attempt to return insight data from this model’s parent.
- Returns
- list of LiftChart
Model lift chart data for each saved target class
- Raises
- ClientError
If the insight is not available for this model
- get_num_iterations_trained()¶
Retrieves the number of estimators trained by early-stopping tree-based models.
New in version v2.22.
- Returns
- projectId: str
id of project containing the model
- modelId: str
id of the model
- data: array
list of numEstimatorsItem objects, one for each modeling stage.
- numEstimatorsItem will be of the form:
- stage: str
indicates the modeling stage (for multi-stage models); None for single-stage models
- numIterations: int
the number of estimators or iterations trained by the model
- get_or_request_feature_effect(source, max_wait=600, row_count=None, data_slice_id=None)¶
Retrieve Feature Effects for the model, requesting a new job if it hasn’t been run previously.
See
get_feature_effect_metadata
for retrieving information of source.- Parameters
- sourcestring
The source Feature Effects are retrieved for.
- max_waitint, optional
The maximum time to wait for a requested Feature Effect job to complete before erroring.
- row_countint, optional
(New in version v2.21) The sample size to use for Feature Impact computation. Minimum is 10 rows. Maximum is 100000 rows or the training sample size of the model, whichever is less.
- data_slice_idstr, optional
ID for the data slice used in the request. If None, request unsliced insight data.
- Returns
- feature_effectsFeatureEffects
The Feature Effects data.
- get_or_request_feature_effects_multiclass(source, top_n_features=None, features=None, row_count=None, class_=None, max_wait=600)¶
Retrieve Feature Effects for the multiclass model, requesting a job if it hasn’t been run previously.
- Parameters
- sourcestring
The source Feature Effects are retrieved for.
- class_str or None
The class name Feature Effects are retrieved for.
- row_countint
The number of rows from dataset to use for Feature Impact calculation.
- top_n_featuresint or None
Number of top features (ranked by Feature Impact) used to calculate Feature Effects.
- featureslist or None
The list of features used to calculate Feature Effects.
- max_waitint, optional
The maximum time to wait for a requested Feature Effects job to complete before erroring.
- Returns
- feature_effectslist of FeatureEffectsMulticlass
The list of multiclass feature effects data.
- get_or_request_feature_impact(max_wait=600, **kwargs)¶
Retrieve feature impact for the model, requesting a job if it hasn’t been run previously
- Parameters
- max_waitint, optional
The maximum time to wait for a requested feature impact job to complete before erroring
- **kwargs
Arbitrary keyword arguments passed to
request_feature_impact
.
- Returns
- feature_impactslist or dict
The feature impact data. See
get_feature_impact
for the exact schema.
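Example
A sketch that computes Feature Impact if needed and prints the normalized scores; the ids are placeholders:
import datarobot as dr

model = dr.Model.get('project-id', 'model-id')  # placeholder ids
impact = model.get_or_request_feature_impact(max_wait=900, with_metadata=True)
for item in impact['featureImpacts']:
    print(item['featureName'], item['impactNormalized'])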
- get_parameters()¶
Retrieve model parameters.
- Returns
- ModelParameters
Model parameters for this model.
- get_pareto_front()¶
Retrieve the Pareto Front for a Eureqa model.
This method is only supported for Eureqa models.
- Returns
- ParetoFront
Model ParetoFront data
- get_prime_eligibility()¶
Check if this model can be approximated with DataRobot Prime
- Returns
- prime_eligibilitydict
a dict indicating whether a model can be approximated with DataRobot Prime (key can_make_prime) and why it may be ineligible (key message)
- get_residuals_chart(source, fallback_to_parent_insights=False, data_slice_filter=<datarobot.models.model.Sentinel object>)¶
Retrieve model residuals chart for the specified source.
- Parameters
- sourcestr
Residuals chart data source. Check datarobot.enums.CHART_DATA_SOURCE for possible values.
- fallback_to_parent_insightsbool
Optional, if True, this will return residuals chart data for this model’s parent if the residuals chart is not available for this model and the model has a defined parent model. If omitted or False, or there is no parent model, will not attempt to return residuals data from this model’s parent.
- data_slice_filterDataSlice, optional
A dataslice used to filter the return values based on the dataslice.id. By default this function will use data_slice_filter.id == None which returns an unsliced insight. If data_slice_filter is None then get_residuals_chart will raise a ValueError.
- Returns
- ResidualsChart
Model residuals chart data
- Raises
- ClientError
If the insight is not available for this model
- ValueError
If data_slice_filter passed as None
- get_roc_curve(source, fallback_to_parent_insights=False, data_slice_filter=<datarobot.models.model.Sentinel object>)¶
Retrieve the ROC curve for a binary model for the specified source. This method is valid only for binary projects. For multilabel projects, use Model.get_labelwise_roc_curves.
- Parameters
- sourcestr
ROC curve data source. Check datarobot.enums.CHART_DATA_SOURCE for possible values. (New in version v2.23) For time series and OTV models, also accepts values backtest_2, backtest_3, …, up to the number of backtests in the model.
- fallback_to_parent_insightsbool
(New in version v2.14) Optional, if True, this will return ROC curve data for this model’s parent if the ROC curve is not available for this model and the model has a defined parent model. If omitted or False, or there is no parent model, will not attempt to return data from this model’s parent.
- data_slice_filterDataSlice, optional
A dataslice used to filter the return values based on the dataslice.id. By default this function will use data_slice_filter.id == None which returns an unsliced insight. If data_slice_filter is None then get_roc_curve will raise a ValueError.
- Returns
- RocCurve
Model ROC curve data
- Raises
- ClientError
If the insight is not available for this model
- (New in version v3.0) TypeError
If the underlying project type is multilabel
- ValueError
If data_slice_filter passed as None
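Example
A sketch of retrieving the validation ROC curve, both unsliced and for a specific data slice; the ids are placeholders and the source comes from datarobot.enums.CHART_DATA_SOURCE as noted above:
import datarobot as dr

model = dr.Model.get('project-id', 'model-id')  # placeholder ids
roc = model.get_roc_curve(source=dr.enums.CHART_DATA_SOURCE.VALIDATION)
sliced_roc = model.get_roc_curve(
    source=dr.enums.CHART_DATA_SOURCE.VALIDATION,
    data_slice_filter=dr.DataSlice(id='data-slice-id'),  # placeholder slice id
)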
- get_rulesets()¶
List the rulesets approximating this model generated by DataRobot Prime
If this model hasn’t been approximated yet, will return an empty list. Note that these are rulesets approximating this model, not rulesets used to construct this model.
- Returns
- rulesetslist of Ruleset
- Return type
List
[Ruleset
]
- get_supported_capabilities()¶
Retrieves a summary of the capabilities supported by a model.
New in version v2.14.
- Returns
- supportsBlending: bool
whether the model supports blending
- supportsMonotonicConstraints: bool
whether the model supports monotonic constraints
- hasWordCloud: bool
whether the model has word cloud data available
- eligibleForPrime: bool
whether the model is eligible for Prime
- hasParameters: bool
whether the model has parameters that can be retrieved
- supportsCodeGeneration: bool
(New in version v2.18) whether the model supports code generation
- supportsShap: bool
(New in version v2.18) True if the model supports the Shapley package, i.e. Shapley-based feature importance.
- supportsEarlyStopping: bool
(New in version v2.22) True if this is an early stopping tree-based model and number of trained iterations can be retrieved.
- get_uri()¶
- Returns
- urlstr
Permanent static hyperlink to this model at leaderboard.
- Return type
str
- get_word_cloud(exclude_stop_words=False)¶
Retrieve word cloud data for the model.
- Parameters
- exclude_stop_wordsbool, optional
Set to True if you want stopwords filtered out of response.
- Returns
- WordCloud
Word cloud data for the model.
- incremental_train(data_stage_id, training_data_name=None)¶
Submit a job to the queue to perform incremental training on an existing model. See train_incremental documentation.
- Return type
- classmethod list(project_id, sort_by_partition='validation', sort_by_metric=None, with_metric=None, search_term=None, featurelists=None, families=None, blueprints=None, labels=None, characteristics=None, training_filters=None, number_of_clusters=None, limit=100, offset=0)¶
Retrieve paginated model records, sorted by scores, with optional filtering.
- Parameters
- sort_by_partition: str, one of `validation`, `backtesting`, `crossValidation` or `holdout`
Set the partition to use for sorted (by score) list of models. validation is the default.
- sort_by_metric: str
Set the project metric to use for model sorting. DataRobot-selected project optimization metric is the default.
- with_metric: str
For a single-metric list of results, specify that project metric.
- search_term: str
If specified, only models containing the term in their name or processes are returned.
- featurelists: list of str
If specified, only models trained on selected featurelists are returned.
- families: list of str
If specified, only models belonging to selected families are returned.
- blueprints: list of str
If specified, only models trained on specified blueprint IDs are returned.
- labels: list of str, `starred` or `prepared for deployment`
If specified, only models tagged with all listed labels are returned.
- characteristics: list of str
If specified, only models matching all listed characteristics are returned.
- training_filters: list of str
If specified, only models matching at least one of the listed training conditions are returned. The following formats are supported.
For autoML and datetime partitioned projects:
- number of rows in training subset
For datetime partitioned projects:
- <training duration>, example P6Y0M0D
- <training_duration>-<time_window_sample_percent>-<sampling_method>, example P6Y0M0D-78-Random (returns models trained on 6 years of data, sampling rate 78%, random sampling)
- Start/end date
- Project settings
- number_of_clusters: list of int
Filter models by number of clusters. Applicable only in unsupervised clustering projects.
- limit: int
- offset: int
- Returns
- generic_models: list of GenericModel
- Return type
List
[GenericModel
]
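Example
A sketch of listing leaderboard records with a few of the filters above; the project id and metric name are placeholders, and it assumes the classmethod is available on the base Model class as well as the subclasses documented here:
import datarobot as dr

models = dr.Model.list(
    'project-id',                  # placeholder project id
    sort_by_partition='holdout',
    with_metric='AUC',             # metric name is illustrative
    search_term='Gradient',
    limit=20,
)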
- open_in_browser()¶
Opens the class’ relevant web location in the default browser. If the default browser is not available, the URL is logged.
Note: If text-mode browsers are used, the calling process will block until the user exits the browser.
- Return type
None
- request_cross_class_accuracy_scores()¶
Request Cross Class Accuracy scores to be computed for the model.
- Returns
- status_idstr
A statusId of computation request.
- request_data_disparity_insights(feature, compared_class_names)¶
Request data disparity insights to be computed for the model.
- Parameters
- featurestr
Bias and Fairness protected feature name.
- compared_class_nameslist(str)
List of two classes to compare
- Returns
- status_idstr
A statusId of computation request.
- request_external_test(dataset_id, actual_value_column=None)¶
Request external test to compute scores and insights on an external test dataset
- Parameters
- dataset_idstring
The dataset to make predictions against (as uploaded from Project.upload_dataset)
- actual_value_columnstring, optional
(New in version v2.21) For time series unsupervised projects only. Actual value column can be used to calculate the classification metrics and insights on the prediction dataset. Can’t be provided with the
forecast_point
parameter.- Returns
- jobJob
a Job representing external dataset insights computation
- request_fairness_insights(fairness_metrics_set=None)¶
Request fairness insights to be computed for the model.
- Parameters
- fairness_metrics_setstr, optional
Can be one of <datarobot.enums.FairnessMetricsSet>. The fairness metric used to calculate the fairness scores.
- Returns
- status_idstr
A statusId of computation request.
- request_feature_effect(row_count=None, data_slice_id=None)¶
Submit request to compute Feature Effects for the model.
See
get_feature_effect
for more information on the result of the job.- Parameters
- row_countint
(New in version v2.21) The sample size to use for Feature Impact computation. Minimum is 10 rows. Maximum is 100000 rows or the training sample size of the model, whichever is less.
- data_slice_idstr, optional
ID for the data slice used in the request. If None, request unsliced insight data.
- Returns
- jobJob
A Job representing the feature effect computation. To get the completed feature effect data, use job.get_result or job.get_result_when_complete.
- Raises
- JobAlreadyRequested (422)
If the feature effect have already been requested.
- request_feature_effects_multiclass(row_count=None, top_n_features=None, features=None)¶
Request Feature Effects computation for the multiclass model.
See
get_feature_effect
for more information on the result of the job.- Parameters
- row_countint
The number of rows from dataset to use for Feature Impact calculation.
- top_n_featuresint or None
Number of top features (ranked by feature impact) used to calculate Feature Effects.
- featureslist or None
The list of features used to calculate Feature Effects.
- Returns
- jobJob
A Job representing Feature Effect computation. To get the completed Feature Effect data, use job.get_result or job.get_result_when_complete.
- request_feature_impact(row_count=None, with_metadata=False, data_slice_id=None)¶
Request feature impacts to be computed for the model.
See
get_feature_impact
for more information on the result of the job.- Parameters
- row_countint, optional
The sample size (specified in rows) to use for Feature Impact computation. This is not supported for unsupervised, multiclass (which has a separate method), and time series projects.
- with_metadatabool, optional
Flag indicating whether the result should include the metadata. If true, metadata is included.
- data_slice_idstr, optional
ID for the data slice used in the request. If None, request unsliced insight data.
- Returns
- jobJob or status_id
Job representing the Feature Impact computation. To retrieve the completed Feature Impact data, use job.get_result or job.get_result_when_complete.
- Raises
- JobAlreadyRequested (422)
If the feature impacts have already been requested.
- request_lift_chart(source, data_slice_id=None)¶
Request the model Lift Chart for the specified source.
- Parameters
- sourcestr
Lift chart data source. Check datarobot.enums.CHART_DATA_SOURCE for possible values.
- data_slice_idstring, optional
ID for the data slice used in the request. If None, request unsliced insight data.
- Returns
- status_check_jobStatusCheckJob
Object contains all needed logic for a periodical status check of an async job.
- Return type
- request_predictions(dataset_id=None, dataset=None, dataframe=None, file_path=None, file=None, include_prediction_intervals=None, prediction_intervals_size=None, forecast_point=None, predictions_start_date=None, predictions_end_date=None, actual_value_column=None, explanation_algorithm=None, max_explanations=None, max_ngram_explanations=None)¶
Requests predictions against a previously uploaded dataset.
- Parameters
- dataset_idstring, optional
The ID of the dataset to make predictions against (as uploaded from Project.upload_dataset)
- dataset
Dataset
, optional The dataset to make predictions against (as uploaded from Project.upload_dataset)
- dataframepd.DataFrame, optional
(New in v3.0) The dataframe to make predictions against
- file_pathstr, optional
(New in v3.0) Path to file to make predictions against
- fileIOBase, optional
(New in v3.0) File to make predictions against
- include_prediction_intervalsbool, optional
(New in v2.16) For time series projects only. Specifies whether prediction intervals should be calculated for this request. Defaults to True if prediction_intervals_size is specified, otherwise defaults to False.
- prediction_intervals_sizeint, optional
(New in v2.16) For time series projects only. Represents the percentile to use for the size of the prediction intervals. Defaults to 80 if include_prediction_intervals is True. Prediction intervals size must be between 1 and 100 (inclusive).
- forecast_pointdatetime.datetime or None, optional
(New in version v2.20) For time series projects only. This is the default point relative to which predictions will be generated, based on the forecast window of the project. See the time series prediction documentation for more information.
- predictions_start_datedatetime.datetime or None, optional
(New in version v2.20) For time series projects only. The start date for bulk predictions. Note that this parameter is for generating historical predictions using the training data. This parameter should be provided in conjunction with
predictions_end_date
. Can’t be provided with theforecast_point
parameter.- predictions_end_datedatetime.datetime or None, optional
(New in version v2.20) For time series projects only. The end date for bulk predictions, exclusive. Note that this parameter is for generating historical predictions using the training data. This parameter should be provided in conjunction with
predictions_start_date
. Can’t be provided with theforecast_point
parameter.- actual_value_columnstring, optional
(New in version v2.21) For time series unsupervised projects only. Actual value column can be used to calculate the classification metrics and insights on the prediction dataset. Can’t be provided with the
forecast_point
parameter.- explanation_algorithm: (New in version v2.21) optional; If set to ‘shap’, the
response will include prediction explanations based on the SHAP explainer (SHapley Additive exPlanations). Defaults to null (no prediction explanations).
- max_explanations: (New in version v2.21) int optional; specifies the maximum number of
explanation values that should be returned for each row, ordered by absolute value, greatest to least. If null, no limit. In the case of ‘shap’: if the number of features is greater than the limit, the sum of remaining values will also be returned as shapRemainingTotal. Defaults to null. Cannot be set if explanation_algorithm is omitted.
- max_ngram_explanations: optional; int or str
(New in version v2.29) Specifies the maximum number of text explanation values that should be returned. If set to all, text explanations will be computed and all the ngram explanations will be returned. If set to a non-zero positive integer, text explanations will be computed and that many ngram explanations, sorted in descending order, will be returned. By default, text explanations are not computed.
- Returns
- jobPredictJob
The job computing the predictions
- Return type
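Example
A sketch of scoring an uploaded dataset with this model; the project id, model id, and file path are placeholders:
import datarobot as dr

project = dr.Project.get('project-id')                         # placeholder id
prediction_dataset = project.upload_dataset('./to_score.csv')  # placeholder path
model = dr.Model.get('project-id', 'model-id')                 # placeholder ids
predict_job = model.request_predictions(dataset_id=prediction_dataset.id)
predictions = predict_job.get_result_when_complete()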
- request_residuals_chart(source, data_slice_id=None)¶
Request the model residuals chart for the specified source.
- Parameters
- sourcestr
Residuals chart data source. Check datarobot.enums.CHART_DATA_SOURCE for possible values.
- data_slice_idstring, optional
ID for the data slice used in the request. If None, request unsliced insight data.
- Returns
- status_check_jobStatusCheckJob
Object contains all needed logic for a periodical status check of an async job.
- Return type
- request_roc_curve(source, data_slice_id=None)¶
Request the model Roc Curve for the specified source.
- Parameters
- sourcestr
Roc Curve data source. Check datarobot.enums.CHART_DATA_SOURCE for possible values.
- data_slice_idstring, optional
ID for the data slice used in the request. If None, request unsliced insight data.
- Returns
- status_check_jobStatusCheckJob
Object contains all needed logic for a periodical status check of an async job.
- Return type
- request_training_predictions(data_subset, explanation_algorithm=None, max_explanations=None)¶
Start a job to build training predictions
- Parameters
- data_subsetstr
data set definition to build predictions on. Choices are:
- dr.enums.DATA_SUBSET.ALL or string all for all data available. Not valid for models in datetime partitioned projects.
- dr.enums.DATA_SUBSET.VALIDATION_AND_HOLDOUT or string validationAndHoldout for all data except the training set. Not valid for models in datetime partitioned projects.
- dr.enums.DATA_SUBSET.HOLDOUT or string holdout for the holdout data set only.
- dr.enums.DATA_SUBSET.ALL_BACKTESTS or string allBacktests for downloading the predictions for all backtest validation folds. Requires the model to have successfully scored all backtests. Datetime partitioned projects only.
- explanation_algorithmdr.enums.EXPLANATIONS_ALGORITHM
(New in v2.21) Optional. If set to dr.enums.EXPLANATIONS_ALGORITHM.SHAP, the response will include prediction explanations based on the SHAP explainer (SHapley Additive exPlanations). Defaults to None (no prediction explanations).
- max_explanationsint
(New in v2.21) Optional. Specifies the maximum number of explanation values that should be returned for each row, ordered by absolute value, greatest to least. In the case of dr.enums.EXPLANATIONS_ALGORITHM.SHAP: If not set, explanations are returned for all features. If the number of features is greater than the
max_explanations
, the sum of remaining values will also be returned asshap_remaining_total
. Max 100. Defaults to null for datasets narrower than 100 columns, defaults to 100 for datasets wider than 100 columns. Is ignored ifexplanation_algorithm
is not set.
- Returns
- Job
an instance of created async job
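Example
A sketch of requesting holdout training predictions with SHAP explanations; the ids are placeholders, and the enums are those referenced in the parameter descriptions above:
import datarobot as dr

model = dr.Model.get('project-id', 'model-id')  # placeholder ids
job = model.request_training_predictions(
    dr.enums.DATA_SUBSET.HOLDOUT,
    explanation_algorithm=dr.enums.EXPLANATIONS_ALGORITHM.SHAP,
    max_explanations=10,
)
training_predictions = job.get_result_when_complete()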
- retrain(sample_pct=None, featurelist_id=None, training_row_count=None, n_clusters=None)¶
Submit a job to the queue to retrain this model with a different sample size, featurelist, or number of clusters.
- Parameters
- sample_pct: float, optional
The sample size in percents (1 to 100) to use in training. If this parameter is used then training_row_count should not be given.
- featurelist_idstr, optional
The featurelist id
- training_row_countint, optional
The number of rows used to train the model. If this parameter is used, then sample_pct should not be given.
- n_clusters: int, optional
(new in version 2.27) number of clusters to use in an unsupervised clustering model. This parameter is used only for unsupervised clustering models that do not determine the number of clusters automatically.
- Returns
- jobModelJob
The created job that is retraining the model
- Return type
- set_prediction_threshold(threshold)¶
Set a custom prediction threshold for the model.
May not be used once
prediction_threshold_read_only
is True for this model.- Parameters
- thresholdfloat
only used for binary classification projects. The threshold used when deciding between the positive and negative classes when making predictions. Should be between 0.0 and 1.0 (inclusive).
- star_model()¶
Mark the model as starred.
Model stars propagate to the web application and the API, and can be used to filter when listing models.
- Return type
None
- start_advanced_tuning_session()¶
Start an Advanced Tuning session. Returns an object that helps set up arguments for an Advanced Tuning model execution.
As of v2.17, all models other than blenders, open source, prime, baseline and user-created support Advanced Tuning.
- Returns
- AdvancedTuningSession
Session for setting up and running Advanced Tuning on a model
- train_incremental(data_stage_id, training_data_name=None, data_stage_encoding=None, data_stage_delimiter=None, data_stage_compression=None)¶
Submit a job to the queue to perform incremental training on an existing model using additional data. The id of the additional data to use for training is specified with the data_stage_id. Optionally a name for the iteration can be supplied by the user to help identify the contents of data in the iteration.
This functionality requires the INCREMENTAL_LEARNING feature flag to be enabled.
- Parameters
- data_stage_id: str
The id of the data stage to use for training.
- training_data_namestr, optional
The name of the iteration or data stage to indicate what the incremental learning was performed on.
- data_stage_encodingstr, optional
The encoding type of the data in the data stage (default: UTF-8). Supported formats: UTF-8, ASCII, WINDOWS1252
- data_stage_delimiterstr, optional
The delimiter used by the data in the data stage (default: ‘,’).
- data_stage_compressionstr, optional
The compression type of the data stage file, e.g. ‘zip’ (default: None). Supported formats: zip
- Returns
- jobModelJob
The created job that is retraining the model
- unstar_model()¶
Unmark the model as starred.
Model stars propagate to the web application and the API, and can be used to filter when listing models.
- Return type
None
BlenderModel¶
- class datarobot.models.BlenderModel(id=None, processes=None, featurelist_name=None, featurelist_id=None, project_id=None, sample_pct=None, model_type=None, model_category=None, is_frozen=None, blueprint_id=None, metrics=None, model_ids=None, blender_method=None, monotonic_increasing_featurelist_id=None, monotonic_decreasing_featurelist_id=None, supports_monotonic_constraints=None, is_starred=None, prediction_threshold=None, prediction_threshold_read_only=None, model_number=None, parent_model_id=None, supports_composable_ml=None, training_row_count=None, training_duration=None, training_start_date=None, training_end_date=None, data_selection_method=None, time_window_sample_pct=None, sampling_method=None, model_family_full_name=None, is_trained_into_validation=None, is_trained_into_holdout=None)¶
Represents blender model that combines prediction results from other models.
All durations are specified with a duration string such as those returned by the
partitioning_methods.construct_duration_string
helper method. Please see datetime partitioned project documentation for more information on duration strings.- Attributes
- idstr
the id of the model
- project_idstr
the id of the project the model belongs to
- processeslist of str
the processes used by the model
- featurelist_namestr
the name of the featurelist used by the model
- featurelist_idstr
the id of the featurelist used by the model
- sample_pctfloat
the percentage of the project dataset used in training the model
- training_row_countint or None
the number of rows of the project dataset used in training the model. In a datetime partitioned project, if specified, defines the number of rows used to train the model and evaluate backtest scores; if unspecified, either training_duration or training_start_date and training_end_date was used to determine that instead.
- training_durationstr or None
only present for models in datetime partitioned projects. If specified, a duration string specifying the duration spanned by the data used to train the model and evaluate backtest scores.
- training_start_datedatetime or None
only present for frozen models in datetime partitioned projects. If specified, the start date of the data used to train the model.
- training_end_datedatetime or None
only present for frozen models in datetime partitioned projects. If specified, the end date of the data used to train the model.
- model_typestr
what model this is, e.g. ‘AVG Blender’
- model_categorystr
what kind of model this is - always ‘blend’ for blender models
- is_frozenbool
whether this model is a frozen model
- blueprint_idstr
the id of the blueprint used in this model
- metricsdict
a mapping from each metric to the model’s scores for that metric
- model_idslist of str
List of model ids used in blender
- blender_methodstr
Method used to blend results from underlying models
- monotonic_increasing_featurelist_idstr
optional, the id of the featurelist that defines the set of features with a monotonically increasing relationship to the target. If None, no such constraints are enforced.
- monotonic_decreasing_featurelist_idstr
optional, the id of the featurelist that defines the set of features with a monotonically decreasing relationship to the target. If None, no such constraints are enforced.
- supports_monotonic_constraintsbool
optional, whether this model supports enforcing monotonic constraints
- is_starredbool
whether this model is marked as starred
- prediction_thresholdfloat
for binary classification projects, the threshold used for predictions
- prediction_threshold_read_onlybool
indicates whether modification of the prediction threshold is forbidden. Threshold modification is forbidden once a model has had a deployment created or predictions made via the dedicated prediction API.
- model_numberinteger
model number assigned to a model
- parent_model_idstr or None
(New in version v2.20) the id of the model that tuning parameters are derived from
- supports_composable_mlbool or None
(New in version v2.26) whether this model is supported in Composable ML.
- classmethod get(project_id, model_id)¶
Retrieve a specific blender.
- Parameters
- project_idstr
The project’s id.
- model_idstr
The
model_id
of the leaderboard item to retrieve.
- Returns
- modelBlenderModel
The queried instance.
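For example (IDs are placeholders), retrieving a blender and inspecting which models it combines:

import datarobot as dr

blender = dr.models.BlenderModel.get('project-id', 'model-id')
print(blender.blender_method)   # e.g. 'AVG'
print(blender.model_ids)        # ids of the underlying leaderboard models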
- advanced_tune(params, description=None)¶
Generate a new model with the specified advanced-tuning parameters
As of v2.17, all models other than blenders, open source, prime, baseline and user-created support Advanced Tuning.
- Parameters
- paramsdict
Mapping of parameter ID to parameter value. The list of valid parameter IDs for a model can be found by calling get_advanced_tuning_parameters(). This endpoint does not need to include values for all parameters. If a parameter is omitted, its current_value will be used.
- descriptionstr
Human-readable string describing the newly advanced-tuned model
- Returns
- ModelJob
The created job to build the model
- Return type
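Since blenders themselves do not support Advanced Tuning (see the note above), here is a hedged sketch against a regular leaderboard model; the choice of parameter is purely illustrative and the dict keys follow get_advanced_tuning_parameters():

import datarobot as dr

model = dr.Model.get('project-id', 'model-id')
tuning = model.get_advanced_tuning_parameters()
# pick the first integer-constrained parameter, just for illustration
param = next(p for p in tuning['tuning_parameters'] if 'int' in p['constraints'])
job = model.advanced_tune(
    params={param['parameter_id']: param['constraints']['int']['max']},
    description='try the maximum allowed value',
)
tuned_model = job.get_result_when_complete()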
- cross_validate()¶
Run cross validation on the model.
Note
To perform Cross Validation on a new model with new parameters, use
train
instead.- Returns
- ModelJob
The created job to build the model
- delete()¶
Delete a model from the project’s leaderboard.
- Return type
None
- download_scoring_code(file_name, source_code=False)¶
Download the Scoring Code JAR.
- Parameters
- file_namestr
File path where scoring code will be saved.
- source_codebool, optional
Set to True to download source code archive. It will not be executable.
- Return type
None
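For example (assuming Scoring Code is available for this model):

import datarobot as dr

model = dr.Model.get('project-id', 'model-id')
model.download_scoring_code('model.jar')                            # executable scoring JAR
model.download_scoring_code('model-source.jar', source_code=True)   # source archive, not executable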
- download_training_artifact(file_name)¶
Retrieve trained artifact(s) from a model containing one or more custom tasks.
Artifact(s) will be downloaded to the specified local filepath.
- Parameters
- file_namestr
File path where trained model artifact(s) will be saved.
- classmethod from_data(data)¶
Instantiate an object of this class using a dict.
- Parameters
- datadict
Correctly snake_cased keys and their values.
- Return type
TypeVar
(T
, bound=APIObject
)
- get_advanced_tuning_parameters()¶
Get the advanced-tuning parameters available for this model.
As of v2.17, all models other than blenders, open source, prime, baseline and user-created support Advanced Tuning.
- Returns
- dict
A dictionary describing the advanced-tuning parameters for the current model. There are two top-level keys, tuning_description and tuning_parameters.
tuning_description is an optional value. If not None, it indicates the user-specified description of this set of tuning parameters.
tuning_parameters is a list of dicts, each of which has the following keys:
parameter_name : (str) name of the parameter (unique per task, see below)
parameter_id : (str) opaque ID string uniquely identifying parameter
default_value : (*) the actual value used to train the model; either the single value of the parameter specified before training, or the best value from the list of grid-searched values (based on current_value)
current_value : (*) the single value or list of values of the parameter that were grid searched. Depending on the grid search specification, could be a single fixed value (no grid search), a list of discrete values, or a range.
task_name : (str) name of the task that this parameter belongs to
constraints: (dict) see the notes below
vertex_id: (str) ID of vertex that this parameter belongs to
Notes
The type of default_value and current_value is defined by the constraints structure. It will be a string or numeric Python type.
constraints is a dict with at least one, possibly more, of the following keys. The presence of a key indicates that the parameter may take on the specified type. (If a key is absent, this means that the parameter may not take on the specified type.) If a key on constraints is present, its value will be a dict containing all of the fields described below for that key.
"constraints": { "select": { "values": [<list(basestring or number) : possible values>] }, "ascii": {}, "unicode": {}, "int": { "min": <int : minimum valid value>, "max": <int : maximum valid value>, "supports_grid_search": <bool : True if Grid Search may be requested for this param> }, "float": { "min": <float : minimum valid value>, "max": <float : maximum valid value>, "supports_grid_search": <bool : True if Grid Search may be requested for this param> }, "intList": { "min_length": <int : minimum valid length>, "max_length": <int : maximum valid length> "min_val": <int : minimum valid value>, "max_val": <int : maximum valid value> "supports_grid_search": <bool : True if Grid Search may be requested for this param> }, "floatList": { "min_length": <int : minimum valid length>, "max_length": <int : maximum valid length> "min_val": <float : minimum valid value>, "max_val": <float : maximum valid value> "supports_grid_search": <bool : True if Grid Search may be requested for this param> } }
The keys have meaning as follows:
select: Rather than specifying a specific data type, if present, it indicates that the parameter is permitted to take on any of the specified values. Listed values may be of any string or real (non-complex) numeric type.
ascii: The parameter may be a unicode object that encodes simple ASCII characters. (A-Z, a-z, 0-9, whitespace, and certain common symbols.) In addition to listed constraints, ASCII keys currently may not contain either newlines or semicolons.
unicode: The parameter may be any Python unicode object.
int: The value may be an object of type int within the specified range (inclusive). Please note that the value will be passed around using the JSON format, and some JSON parsers have undefined behavior with integers outside of the range [-(2**53)+1, (2**53)-1].
float: The value may be an object of type float within the specified range (inclusive).
intList, floatList: The value may be a list of int or float objects, respectively, following constraints as specified respectively by the int and float types (above).
Many parameters only specify one key under constraints. If a parameter specifies multiple keys, the parameter may take on any value permitted by any key.
- Return type
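A short sketch that walks the structure described above and prints each parameter together with its constraint types (run against a non-blender leaderboard model, per the note above); key names follow the schema documented here:

import datarobot as dr

model = dr.Model.get('project-id', 'model-id')
info = model.get_advanced_tuning_parameters()
for p in info['tuning_parameters']:
    print(p['task_name'], p['parameter_name'], p['current_value'], sorted(p['constraints']))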
- get_all_confusion_charts(fallback_to_parent_insights=False)¶
Retrieve a list of all confusion matrices available for the model.
- Parameters
- fallback_to_parent_insightsbool
(New in version v2.14) Optional, if True, this will return confusion chart data for this model’s parent for any source that is not available for this model and if this has a defined parent model. If omitted or False, or this model has no parent, this will not attempt to retrieve any data from this model’s parent.
- Returns
- list of ConfusionChart
Data for all available confusion charts for model.
- get_all_feature_impacts(data_slice_filter=None)¶
Retrieve a list of all feature impact results available for the model.
- Parameters
- data_slice_filterDataSlice, optional
A dataslice used to filter the return values based on the dataslice.id. By default, this function will use data_slice_filter.id == None which returns an unsliced insight. If data_slice_filter is None then no data_slice filtering will be applied when requesting the feature impact results.
- Returns
- list of dicts
Data for all available model feature impacts, or an empty list if no data is found.
Examples
model = datarobot.Model(id='model-id', project_id='project-id')

# Get feature impact insights for sliced data
data_slice = datarobot.DataSlice(id='data-slice-id')
sliced_fi = model.get_all_feature_impacts(data_slice_filter=data_slice)

# Get feature impact insights for unsliced data
data_slice = datarobot.DataSlice()
unsliced_fi = model.get_all_feature_impacts(data_slice_filter=data_slice)

# Get all feature impact insights
all_fi = model.get_all_feature_impacts()
- get_all_lift_charts(fallback_to_parent_insights=False, data_slice_filter=None)¶
Retrieve a list of all Lift charts available for the model.
- Parameters
- fallback_to_parent_insightsbool, optional
(New in version v2.14) Optional, if True, this will return lift chart data for this model’s parent for any source that is not available for this model and if this model has a defined parent model. If omitted or False, or this model has no parent, this will not attempt to retrieve any data from this model’s parent.
- data_slice_filterDataSlice, optional
Filters the returned lift chart by data_slice_filter.id. If None (the default) applies no filter based on data_slice_id.
- Returns
- list of LiftChart
Data for all available model lift charts, or an empty list if no data is found.
Examples
model = datarobot.Model.get('project-id', 'model-id')

# Get lift chart insights for sliced data
sliced_lift_charts = model.get_all_lift_charts(data_slice_filter=datarobot.DataSlice(id='data-slice-id'))

# Get lift chart insights for unsliced data
unsliced_lift_charts = model.get_all_lift_charts(data_slice_filter=datarobot.DataSlice())

# Get all lift chart insights
all_lift_charts = model.get_all_lift_charts()
- get_all_multiclass_lift_charts(fallback_to_parent_insights=False)¶
Retrieve a list of all Lift charts available for the model.
- Parameters
- fallback_to_parent_insightsbool
(New in version v2.14) Optional, if True, this will return lift chart data for this model’s parent for any source that is not available for this model and if this model has a defined parent model. If omitted or False, or this model has no parent, this will not attempt to retrieve any data from this model’s parent.
- Returns
- list of LiftChart
Data for all available model lift charts.
- get_all_residuals_charts(fallback_to_parent_insights=False, data_slice_filter=None)¶
Retrieve a list of all residuals charts available for the model.
- Parameters
- fallback_to_parent_insightsbool
Optional, if True, this will return residuals chart data for this model’s parent for any source that is not available for this model and if this model has a defined parent model. If omitted or False, or this model has no parent, this will not attempt to retrieve any data from this model’s parent.
- data_slice_filterDataSlice, optional
Filters the returned residuals charts by data_slice_filter.id. If None (the default) applies no filter based on data_slice_id.
- Returns
- list of ResidualsChart
Data for all available model residuals charts.
Examples
model = datarobot.Model.get('project-id', 'model-id')

# Get residuals chart insights for sliced data
sliced_residuals_charts = model.get_all_residuals_charts(data_slice_filter=datarobot.DataSlice(id='data-slice-id'))

# Get residuals chart insights for unsliced data
unsliced_residuals_charts = model.get_all_residuals_charts(data_slice_filter=datarobot.DataSlice())

# Get all residuals chart insights
all_residuals_charts = model.get_all_residuals_charts()
- get_all_roc_curves(fallback_to_parent_insights=False, data_slice_filter=None)¶
Retrieve a list of all ROC curves available for the model.
- Parameters
- fallback_to_parent_insightsbool
(New in version v2.14) Optional, if True, this will return ROC curve data for this model’s parent for any source that is not available for this model and if this model has a defined parent model. If omitted or False, or this model has no parent, this will not attempt to retrieve any data from this model’s parent.
- data_slice_filterDataSlice, optional
filters the returned roc_curve by data_slice_filter.id. If None (the default) applies no filter based on data_slice_id.
- Returns
- list of RocCurve
Data for all available model ROC curves. Or an empty list if no RocCurves are found.
Examples
model = datarobot.Model.get('project-id', 'model-id')

# Get roc curve insights for sliced data
ds_filter = datarobot.DataSlice(id='data-slice-id')
sliced_roc = model.get_all_roc_curves(data_slice_filter=ds_filter)

# Get roc curve insights for unsliced data
unsliced_filter = datarobot.DataSlice(id=None)
unsliced_roc = model.get_all_roc_curves(data_slice_filter=unsliced_filter)

# Get all roc curve insights
all_roc_curves = model.get_all_roc_curves()
- get_confusion_chart(source, fallback_to_parent_insights=False)¶
Retrieve a multiclass model’s confusion matrix for the specified source.
- Parameters
- sourcestr
Confusion chart source. Check datarobot.enums.CHART_DATA_SOURCE for possible values.
- fallback_to_parent_insightsbool
(New in version v2.14) Optional, if True, this will return confusion chart data for this model’s parent if the confusion chart is not available for this model and the defined parent model. If omitted or False, or there is no parent model, will not attempt to return insight data from this model’s parent.
- Returns
- ConfusionChart
Model ConfusionChart data
- Raises
- ClientError
If the insight is not available for this model
- get_cross_class_accuracy_scores()¶
Retrieves a list of Cross Class Accuracy scores for the model.
- Returns
- json
- get_cross_validation_scores(partition=None, metric=None)¶
Return a dictionary, keyed by metric, showing cross validation scores per partition.
Cross Validation should already have been performed using
cross_validate
ortrain
.Note
Models that computed cross validation before this feature was added will need to be deleted and retrained before this method can be used.
- Parameters
- partitionfloat
optional, the id of the partition (1, 2, 3.0, 4.0, etc.) to filter results by. Can be a whole-number positive integer or float value. 0 corresponds to the validation partition.
- metric: unicode
optional, the name of the metric to filter the resulting cross validation scores by
- Returns
- cross_validation_scores: dict
A dictionary keyed by metric showing cross validation scores per partition.
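A hedged sketch that runs cross validation and then pulls the scores for one metric (the metric name is illustrative):

import datarobot as dr

model = dr.Model.get('project-id', 'model-id')
cv_job = model.cross_validate()          # returns a ModelJob
cv_job.wait_for_completion()
scores = model.get_cross_validation_scores(metric='AUC')
print(scores)                            # AUC scores keyed by partition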
- get_data_disparity_insights(feature, class_name1, class_name2)¶
Retrieve a list of Cross Class Data Disparity insights for the model.
- Parameters
- featurestr
Bias and Fairness protected feature name.
- class_name1str
One of the compared classes
- class_name2str
Another compared class
- Returns
- json
- get_fairness_insights(fairness_metrics_set=None, offset=0, limit=100)¶
Retrieve a list of Per Class Bias insights for the model.
- Parameters
- fairness_metrics_setstr, optional
Can be one of <datarobot.enums.FairnessMetricsSet>. The fairness metric used to calculate the fairness scores.
- offsetint, optional
Number of items to skip.
- limitint, optional
Number of items to return.
- Returns
- json
- get_feature_effect(source, data_slice_id=None)¶
Retrieve Feature Effects for the model.
Feature Effects provides partial dependence and predicted vs actual values for top-500 features ordered by feature impact score.
The partial dependence shows marginal effect of a feature on the target variable after accounting for the average effects of all other predictive features. It indicates how, holding all other variables except the feature of interest as they were, the value of this feature affects your prediction.
Requires that Feature Effects has already been computed with
request_feature_effect
.See
get_feature_effect_metadata
for retrieving information about the available sources.- Parameters
- sourcestring
The source Feature Effects are retrieved for.
- data_slice_idstring, optional
ID for the data slice used in the request. If None, retrieve unsliced insight data.
- Returns
- feature_effectsFeatureEffects
The feature effects data.
- Raises
- ClientError (404)
If the feature effects have not been computed or source is not a valid value.
- get_feature_effect_metadata()¶
Retrieve Feature Effects metadata. Response contains status and available model sources.
Feature Effect for the training partition is always available, with the exception of older projects that only supported Feature Effect for validation.
When a model is trained into validation or holdout without stacked predictions (i.e., no out-of-sample predictions in those partitions), Feature Effects is not available for validation or holdout.
Feature Effects for holdout is not available when holdout was not unlocked for the project.
Use source to retrieve Feature Effects, selecting one of the provided sources.
- Returns
- feature_effect_metadata: FeatureEffectMetadata
- get_feature_effects_multiclass(source='training', class_=None)¶
Retrieve Feature Effects for the multiclass model.
Feature Effects provide partial dependence and predicted vs actual values for top-500 features ordered by feature impact score.
The partial dependence shows marginal effect of a feature on the target variable after accounting for the average effects of all other predictive features. It indicates how, holding all other variables except the feature of interest as they were, the value of this feature affects your prediction.
Requires that Feature Effects has already been computed with
request_feature_effect
.See
get_feature_effect_metadata
for retrieving information about the available sources.- Parameters
- sourcestr
The source Feature Effects are retrieved for.
- class_str or None
The class name Feature Effects are retrieved for.
- Returns
- list
The list of multiclass feature effects.
- Raises
- ClientError (404)
If Feature Effects have not been computed or source is not a valid value.
- get_feature_impact(with_metadata=False, data_slice_filter=<datarobot.models.model.Sentinel object>)¶
Retrieve the computed Feature Impact results, a measure of the relevance of each feature in the model.
Feature Impact is computed for each column by creating new data with that column randomly permuted (but the others left unchanged), and seeing how the error metric score for the predictions is affected. The ‘impactUnnormalized’ is how much worse the error metric score is when making predictions on this modified data. The ‘impactNormalized’ is normalized so that the largest value is 1. In both cases, larger values indicate more important features.
If a feature is a redundant feature, i.e. once other features are considered it doesn’t contribute much in addition, the ‘redundantWith’ value is the name of the feature that has the highest correlation with this feature. Note that redundancy detection is only available for jobs run after the addition of this feature. When retrieving data that predates this functionality, a NoRedundancyImpactAvailable warning will be used.
Elsewhere this technique is sometimes called ‘Permutation Importance’.
Requires that Feature Impact has already been computed with
request_feature_impact
.- Parameters
- with_metadatabool
The flag indicating if the result should include the metadata as well.
- data_slice_filterDataSlice, optional
A dataslice used to filter the return values based on the dataslice.id. By default, this function will use data_slice_filter.id == None which returns an unsliced insight. If data_slice_filter is None then get_feature_impact will raise a ValueError.
- Returns
- list or dict
The feature impact data response depends on the with_metadata parameter. The response is either a dict with metadata and a list with actual data or just a list with that data.
Each list item is a dict with the keys featureName, impactNormalized, impactUnnormalized, redundantWith and count.
For a dict response the available keys are:
featureImpacts - Feature Impact data as a list of dicts. Each item is a dict with the keys featureName, impactNormalized, impactUnnormalized and redundantWith.
shapBased - A boolean that indicates whether Feature Impact was calculated using Shapley values.
ranRedundancyDetection - A boolean that indicates whether redundant feature identification was run while calculating this Feature Impact.
rowCount - An integer or None that indicates the number of rows used to calculate Feature Impact. For Feature Impact calculated with the default logic, without specifying rowCount, None is returned here.
count - An integer with the number of features under featureImpacts.
- Raises
- ClientError (404)
If the feature impacts have not been computed.
- ValueError
If data_slice_filter passed as None
- get_features_used()¶
Query the server to determine which features were used.
Note that the data returned by this method is possibly different than the names of the features in the featurelist used by this model. This method will return the raw features that must be supplied in order for predictions to be generated on a new set of data. The featurelist, in contrast, would also include the names of derived features.
- Returns
- featureslist of str
The names of the features used in the model.
- Return type
List
[str
]
- get_frozen_child_models()¶
Retrieve the IDs for all models that are frozen from this model.
- Returns
- A list of Models
- get_labelwise_roc_curves(source, fallback_to_parent_insights=False)¶
Retrieve a list of LabelwiseRocCurve instances for a multilabel model for the given source and all labels. This method is valid only for multilabel projects. For binary projects, use Model.get_roc_curve API .
New in version v2.24.
- Parameters
- sourcestr
ROC curve data source. Check datarobot.enums.CHART_DATA_SOURCE for possible values.
- fallback_to_parent_insightsbool
Optional, if True, this will return ROC curve data for this model’s parent if the ROC curve is not available for this model and the model has a defined parent model. If omitted or False, or there is no parent model, will not attempt to return data from this model’s parent.
- Returns
- list of LabelwiseRocCurve
Labelwise ROC Curve instances for source and all labels
- Raises
- ClientError
If the insight is not available for this model
- get_lift_chart(source, fallback_to_parent_insights=False, data_slice_filter=<datarobot.models.model.Sentinel object>)¶
Retrieve the model Lift chart for the specified source.
- Parameters
- sourcestr
Lift chart data source. Check datarobot.enums.CHART_DATA_SOURCE for possible values. (New in version v2.23) For time series and OTV models, also accepts values backtest_2, backtest_3, …, up to the number of backtests in the model.
- fallback_to_parent_insightsbool
(New in version v2.14) Optional, if True, this will return lift chart data for this model’s parent if the lift chart is not available for this model and the model has a defined parent model. If omitted or False, or there is no parent model, will not attempt to return insight data from this model’s parent.
- data_slice_filterDataSlice, optional
A dataslice used to filter the return values based on the dataslice.id. By default this function will use data_slice_filter.id == None which returns an unsliced insight. If data_slice_filter is None then get_lift_chart will raise a ValueError.
- Returns
- LiftChart
Model lift chart data
- Raises
- ClientError
If the insight is not available for this model
- ValueError
If data_slice_filter passed as None
- get_missing_report_info()¶
Retrieve a report on missing training data that can be used to understand missing values treatment in the model. The report consists of missing values resolutions for numeric or categorical features that were part of building the model.
- Returns
- An iterable of MissingReportPerFeature
The queried model missing report, sorted by missing count (DESCENDING order).
- get_model_blueprint_chart()¶
Retrieve a diagram that can be used to understand data flow in the blueprint.
- Returns
- ModelBlueprintChart
The queried model blueprint chart.
- get_model_blueprint_documents()¶
Get documentation for tasks used in this model.
- Returns
- list of BlueprintTaskDocument
All documents available for the model.
- get_model_blueprint_json()¶
Get the blueprint json representation used by this model.
- Returns
- BlueprintJson
Json representation of the blueprint stages.
- Return type
Dict
[str
,Tuple
[List
[str
],List
[str
],str
]]
- get_multiclass_feature_impact()¶
For multiclass it’s possible to calculate feature impact separately for each target class. The method for calculation is exactly the same, calculated in one-vs-all style for each target class.
Requires that Feature Impact has already been computed with
request_feature_impact
.- Returns
- feature_impactslist of dict
The feature impact data. Each item is a dict with the keys ‘featureImpacts’ (list) and ‘class’ (str). Each item in ‘featureImpacts’ is a dict with the keys ‘featureName’, ‘impactNormalized’, ‘impactUnnormalized’, and ‘redundantWith’.
- Raises
- ClientError (404)
If the multiclass feature impacts have not been computed.
- get_multiclass_lift_chart(source, fallback_to_parent_insights=False)¶
Retrieve model Lift chart for the specified source.
- Parameters
- sourcestr
Lift chart data source. Check datarobot.enums.CHART_DATA_SOURCE for possible values.
- fallback_to_parent_insightsbool
Optional, if True, this will return lift chart data for this model’s parent if the lift chart is not available for this model and the model has a defined parent model. If omitted or False, or there is no parent model, will not attempt to return insight data from this model’s parent.
- Returns
- list of LiftChart
Model lift chart data for each saved target class
- Raises
- ClientError
If the insight is not available for this model
- get_multilabel_lift_charts(source, fallback_to_parent_insights=False)¶
Retrieve model Lift charts for the specified source.
New in version v2.24.
- Parameters
- sourcestr
Lift chart data source. Check datarobot.enums.CHART_DATA_SOURCE for possible values.
- fallback_to_parent_insightsbool
Optional, if True, this will return lift chart data for this model’s parent if the lift chart is not available for this model and the model has a defined parent model. If omitted or False, or there is no parent model, will not attempt to return insight data from this model’s parent.
- Returns
- list of LiftChart
Model lift chart data for each saved target class
- Raises
- ClientError
If the insight is not available for this model
- get_num_iterations_trained()¶
Retrieves the number of estimators trained by early-stopping tree-based models.
New in version v2.22.
- Returns
- projectId: str
id of project containing the model
- modelId: str
id of the model
- data: array
list of numEstimatorsItem objects, one for each modeling stage.
- numEstimatorsItem will be of the form:
- stage: str
indicates the modeling stage (for multi-stage models); None for single-stage models
- numIterations: int
the number of estimators or iterations trained by the model
- get_or_request_feature_effect(source, max_wait=600, row_count=None, data_slice_id=None)¶
Retrieve Feature Effects for the model, requesting a new job if it hasn’t been run previously.
See
get_feature_effect_metadata
for retrieving information about the available sources.- Parameters
- sourcestring
The source Feature Effects are retrieved for.
- max_waitint, optional
The maximum time to wait for a requested Feature Effect job to complete before erroring.
- row_countint, optional
(New in version v2.21) The sample size to use for Feature Effects computation. Minimum is 10 rows. Maximum is 100000 rows or the training sample size of the model, whichever is less.
- data_slice_idstr, optional
ID for the data slice used in the request. If None, request unsliced insight data.
- Returns
- feature_effectsFeatureEffects
The Feature Effects data.
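For example (source follows datarobot.enums.CHART_DATA_SOURCE; the per-feature key names below are assumptions about the FeatureEffects payload):

import datarobot as dr

model = dr.Model.get('project-id', 'model-id')
fe = model.get_or_request_feature_effect(source='validation', max_wait=900)
for feat in fe.feature_effects:
    # 'feature_name' / 'feature_impact_score' are assumed key names
    print(feat['feature_name'], feat['feature_impact_score'])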
- get_or_request_feature_effects_multiclass(source, top_n_features=None, features=None, row_count=None, class_=None, max_wait=600)¶
Retrieve Feature Effects for the multiclass model, requesting a job if it hasn’t been run previously.
- Parameters
- sourcestring
The source Feature Effects are retrieved for.
- class_str or None
The class name Feature Effects are retrieved for.
- row_countint
The number of rows from dataset to use for Feature Impact calculation.
- top_n_featuresint or None
Number of top features (ranked by Feature Impact) used to calculate Feature Effects.
- featureslist or None
The list of features used to calculate Feature Effects.
- max_waitint, optional
The maximum time to wait for a requested Feature Effects job to complete before erroring.
- Returns
- feature_effectslist of FeatureEffectsMulticlass
The list of multiclass feature effects data.
- get_or_request_feature_impact(max_wait=600, **kwargs)¶
Retrieve feature impact for the model, requesting a job if it hasn’t been run previously
- Parameters
- max_waitint, optional
The maximum time to wait for a requested feature impact job to complete before erroring
- **kwargs
Arbitrary keyword arguments passed to
request_feature_impact
.
- Returns
- feature_impactslist or dict
The feature impact data. See
get_feature_impact
for the exact schema.
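For example, retrieving (or computing) Feature Impact and listing the most impactful features; the dict keys follow the get_feature_impact schema above:

import datarobot as dr

model = dr.Model.get('project-id', 'model-id')
impacts = model.get_or_request_feature_impact(max_wait=900)
top = sorted(impacts, key=lambda fi: fi['impactNormalized'], reverse=True)[:10]
for fi in top:
    print(fi['featureName'], round(fi['impactNormalized'], 3), fi.get('redundantWith'))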
- get_parameters()¶
Retrieve model parameters.
- Returns
- ModelParameters
Model parameters for this model.
- get_pareto_front()¶
Retrieve the Pareto Front for a Eureqa model.
This method is only supported for Eureqa models.
- Returns
- ParetoFront
Model ParetoFront data
- get_prime_eligibility()¶
Check if this model can be approximated with DataRobot Prime
- Returns
- prime_eligibilitydict
a dict indicating whether a model can be approximated with DataRobot Prime (key can_make_prime) and why it may be ineligible (key message)
- get_residuals_chart(source, fallback_to_parent_insights=False, data_slice_filter=<datarobot.models.model.Sentinel object>)¶
Retrieve model residuals chart for the specified source.
- Parameters
- sourcestr
Residuals chart data source. Check datarobot.enums.CHART_DATA_SOURCE for possible values.
- fallback_to_parent_insightsbool
Optional, if True, this will return residuals chart data for this model’s parent if the residuals chart is not available for this model and the model has a defined parent model. If omitted or False, or there is no parent model, will not attempt to return residuals data from this model’s parent.
- data_slice_filterDataSlice, optional
A dataslice used to filter the return values based on the dataslice.id. By default this function will use data_slice_filter.id == None which returns an unsliced insight. If data_slice_filter is None then get_residuals_chart will raise a ValueError.
- Returns
- ResidualsChart
Model residuals chart data
- Raises
- ClientError
If the insight is not available for this model
- ValueError
If data_slice_filter passed as None
- get_roc_curve(source, fallback_to_parent_insights=False, data_slice_filter=<datarobot.models.model.Sentinel object>)¶
Retrieve the ROC curve for a binary model for the specified source. This method is valid only for binary projects. For multilabel projects, use Model.get_labelwise_roc_curves.
- Parameters
- sourcestr
ROC curve data source. Check datarobot.enums.CHART_DATA_SOURCE for possible values. (New in version v2.23) For time series and OTV models, also accepts values backtest_2, backtest_3, …, up to the number of backtests in the model.
- fallback_to_parent_insightsbool
(New in version v2.14) Optional, if True, this will return ROC curve data for this model’s parent if the ROC curve is not available for this model and the model has a defined parent model. If omitted or False, or there is no parent model, will not attempt to return data from this model’s parent.
- data_slice_filterDataSlice, optional
A dataslice used to filter the return values based on the dataslice.id. By default this function will use data_slice_filter.id == None which returns an unsliced insight. If data_slice_filter is None then get_roc_curve will raise a ValueError.
- Returns
- RocCurve
Model ROC curve data
- Raises
- ClientError
If the insight is not available for this model
- (New in version v3.0) TypeError
If the underlying project type is multilabel
- ValueError
If data_slice_filter passed as None
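A hedged sketch; ‘validation’ is one of the datarobot.enums.CHART_DATA_SOURCE values, and the roc_points structure (a list of dicts carrying ‘threshold’ and ‘f1_score’, among other statistics) is an assumption about the RocCurve payload:

import datarobot as dr

model = dr.Model.get('project-id', 'model-id')
roc = model.get_roc_curve('validation')
# pick the threshold that maximizes F1 over the returned ROC points
best = max(roc.roc_points, key=lambda pt: pt['f1_score'])
print(best['threshold'], best['f1_score'])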
- get_rulesets()¶
List the rulesets approximating this model generated by DataRobot Prime
If this model hasn’t been approximated yet, will return an empty list. Note that these are rulesets approximating this model, not rulesets used to construct this model.
- Returns
- rulesetslist of Ruleset
- Return type
List
[Ruleset
]
- get_supported_capabilities()¶
Retrieves a summary of the capabilities supported by a model.
New in version v2.14.
- Returns
- supportsBlending: bool
whether the model supports blending
- supportsMonotonicConstraints: bool
whether the model supports monotonic constraints
- hasWordCloud: bool
whether the model has word cloud data available
- eligibleForPrime: bool
whether the model is eligible for Prime
- hasParameters: bool
whether the model has parameters that can be retrieved
- supportsCodeGeneration: bool
(New in version v2.18) whether the model supports code generation
- supportsShap: bool
(New in version v2.18) True if the model supports the Shapley package, i.e. Shapley-based feature importance
- supportsEarlyStopping: bool
(New in version v2.22) True if this is an early stopping tree-based model and number of trained iterations can be retrieved.
- get_uri()¶
- Returns
- urlstr
Permanent static hyperlink to this model at leaderboard.
- Return type
str
- get_word_cloud(exclude_stop_words=False)¶
Retrieve word cloud data for the model.
- Parameters
- exclude_stop_wordsbool, optional
Set to True if you want stopwords filtered out of response.
- Returns
- WordCloud
Word cloud data for the model.
- incremental_train(data_stage_id, training_data_name=None)¶
Submit a job to the queue to perform incremental training on an existing model. See train_incremental documentation.
- Return type
- classmethod list(project_id, sort_by_partition='validation', sort_by_metric=None, with_metric=None, search_term=None, featurelists=None, families=None, blueprints=None, labels=None, characteristics=None, training_filters=None, number_of_clusters=None, limit=100, offset=0)¶
Retrieve paginated model records, sorted by scores, with optional filtering.
- Parameters
- sort_by_partition: str, one of `validation`, `backtesting`, `crossValidation` or `holdout`
Set the partition to use for sorted (by score) list of models. validation is the default.
- sort_by_metric: str
Set the project metric to use for model sorting. DataRobot-selected project optimization metric is the default.
- with_metric: str
For a single-metric list of results, specify that project metric.
- search_term: str
If specified, only models containing the term in their name or processes are returned.
- featurelists: list of str
If specified, only models trained on selected featurelists are returned.
- families: list of str
If specified, only models belonging to selected families are returned.
- blueprints: list of str
If specified, only models trained on specified blueprint IDs are returned.
- labels: list of str, `starred` or `prepared for deployment`
If specified, only models tagged with all listed labels are returned.
- characteristics: list of str
If specified, only models matching all listed characteristics are returned.
- training_filters: list of str
If specified, only models matching at least one of the listed training conditions are returned. The following formats are supported:
For autoML and datetime partitioned projects:
- number of rows in training subset
For datetime partitioned projects:
- <training duration>, example P6Y0M0D
- <training_duration>-<time_window_sample_percent>-<sampling_method>, example P6Y0M0D-78-Random (returns models trained on 6 years of data, sampling rate 78%, random sampling)
- Start/end date
- Project settings
- number_of_clusters: list of int
Filter models by number of clusters. Applicable only in unsupervised clustering projects.
- limit: int
- offset: int
- Returns
- generic_models: list of GenericModel
- Return type
List
[GenericModel
]
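For example (filter values are illustrative, and the attribute names on the returned records are assumptions):

import datarobot as dr

models = dr.models.BlenderModel.list(
    'project-id',
    sort_by_partition='crossValidation',
    sort_by_metric='AUC',
    labels=['starred'],
    limit=20,
)
for m in models:
    print(m.id, m.model_type)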
- open_in_browser()¶
Opens class’ relevant web browser location. If default browser is not available the URL is logged.
Note: If text-mode browsers are used, the calling process will block until the user exits the browser.
- Return type
None
- request_approximation()¶
Request an approximation of this model using DataRobot Prime
This will create several rulesets that could be used to approximate this model. After comparing their scores and rule counts, the code used in the approximation can be downloaded and run locally.
- Returns
- jobJob
the job generating the rulesets
- request_cross_class_accuracy_scores()¶
Request Cross Class Accuracy scores to be computed for the model.
- Returns
- status_idstr
A statusId of computation request.
- request_data_disparity_insights(feature, compared_class_names)¶
Request data disparity insights to be computed for the model.
- Parameters
- featurestr
Bias and Fairness protected feature name.
- compared_class_nameslist(str)
List of two classes to compare
- Returns
- status_idstr
A statusId of computation request.
- request_external_test(dataset_id, actual_value_column=None)¶
Request external test to compute scores and insights on an external test dataset
- Parameters
- dataset_idstring
The dataset to make predictions against (as uploaded from Project.upload_dataset)
- actual_value_columnstring, optional
(New in version v2.21) For time series unsupervised projects only. Actual value column can be used to calculate the classification metrics and insights on the prediction dataset. Can’t be provided with the
forecast_point
parameter.- Returns
- jobJob
a Job representing external dataset insights computation
- request_fairness_insights(fairness_metrics_set=None)¶
Request fairness insights to be computed for the model.
- Parameters
- fairness_metrics_setstr, optional
Can be one of <datarobot.enums.FairnessMetricsSet>. The fairness metric used to calculate the fairness scores.
- Returns
- status_idstr
A statusId of computation request.
- request_feature_effect(row_count=None, data_slice_id=None)¶
Submit request to compute Feature Effects for the model.
See
get_feature_effect
for more information on the result of the job.- Parameters
- row_countint
(New in version v2.21) The sample size to use for Feature Effects computation. Minimum is 10 rows. Maximum is 100000 rows or the training sample size of the model, whichever is less.
- data_slice_idstr, optional
ID for the data slice used in the request. If None, request unsliced insight data.
- Returns
- jobJob
A Job representing the feature effect computation. To get the completed feature effect data, use job.get_result or job.get_result_when_complete.
- Raises
- JobAlreadyRequested (422)
If the feature effect has already been requested.
- request_feature_effects_multiclass(row_count=None, top_n_features=None, features=None)¶
Request Feature Effects computation for the multiclass model.
See
get_feature_effect
for more information on the result of the job.- Parameters
- row_countint
The number of rows from dataset to use for Feature Impact calculation.
- top_n_featuresint or None
Number of top features (ranked by feature impact) used to calculate Feature Effects.
- featureslist or None
The list of features used to calculate Feature Effects.
- Returns
- jobJob
A Job representing Feature Effect computation. To get the completed Feature Effect data, use job.get_result or job.get_result_when_complete.
- request_feature_impact(row_count=None, with_metadata=False, data_slice_id=None)¶
Request feature impacts to be computed for the model.
See
get_feature_impact
for more information on the result of the job.- Parameters
- row_countint, optional
The sample size (specified in rows) to use for Feature Impact computation. This is not supported for unsupervised, multiclass (which has a separate method), and time series projects.
- with_metadatabool, optional
Flag indicating whether the result should include the metadata. If true, metadata is included.
- data_slice_idstr, optional
ID for the data slice used in the request. If None, request unsliced insight data.
- Returns
- jobJob or status_id
Job representing the Feature Impact computation. To retrieve the completed Feature Impact data, use job.get_result or job.get_result_when_complete.
- Raises
- JobAlreadyRequested (422)
If the feature impacts have already been requested.
- request_frozen_datetime_model(training_row_count=None, training_duration=None, training_start_date=None, training_end_date=None, time_window_sample_pct=None, sampling_method=None)¶
Train a new frozen model with parameters from this model.
Requires that this model belongs to a datetime partitioned project. If it does not, an error will occur when submitting the job.
Frozen models use the same tuning parameters as their parent model instead of independently optimizing them to allow efficiently retraining models on larger amounts of the training data.
In addition to training_row_count and training_duration, frozen datetime models may be trained on an exact date range. Only one of training_row_count, training_duration, or training_start_date and training_end_date should be specified.
Models specified using training_start_date and training_end_date are the only ones that can be trained into the holdout data (once the holdout is unlocked).
All durations should be specified with a duration string such as those returned by the
partitioning_methods.construct_duration_string
helper method. Please see datetime partitioned project documentation for more information on duration strings.- Parameters
- training_row_countint, optional
the number of rows of data that should be used to train the model. If specified, training_duration may not be specified.
- training_durationstr, optional
a duration string specifying what time range the data used to train the model should span. If specified, training_row_count may not be specified.
- training_start_datedatetime.datetime, optional
the start date of the data to train to model on. Only rows occurring at or after this datetime will be used. If training_start_date is specified, training_end_date must also be specified.
- training_end_datedatetime.datetime, optional
the end date of the data to train the model on. Only rows occurring strictly before this datetime will be used. If training_end_date is specified, training_start_date must also be specified.
- time_window_sample_pctint, optional
may only be specified when the requested model is a time window (e.g. duration or start and end dates). An integer between 1 and 99 indicating the percentage to sample by within the window. The points kept are determined by a random uniform sample. If specified, training_duration must also be specified; otherwise an error will occur.
- sampling_methodstr, optional
(New in version v2.23) defines the way training data is selected. Can be either
random
orlatest
. In combination withtraining_row_count
defines how rows are selected from backtest (latest
by default). When training data is defined using time range (training_duration
oruse_project_settings
) this setting changes the waytime_window_sample_pct
is applied (random
by default). Applicable to OTV projects only.
- Returns
- model_jobModelJob
the modeling job training a frozen model
- Return type
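A hedged sketch for a datetime partitioned project, using the construct_duration_string helper mentioned above (the sampling settings are illustrative):

from datarobot.helpers.partitioning_methods import construct_duration_string
import datarobot as dr

model = dr.Model.get('project-id', 'model-id')
duration = construct_duration_string(years=0, months=6, days=0)   # 'P0Y6M0D'
job = model.request_frozen_datetime_model(
    training_duration=duration,
    time_window_sample_pct=50,
    sampling_method='random',
)
frozen_model = job.get_result_when_complete()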
- request_frozen_model(sample_pct=None, training_row_count=None)¶
Train a new frozen model with parameters from this model
Note
This method only works if project the model belongs to is not datetime partitioned. If it is, use
request_frozen_datetime_model
instead.Frozen models use the same tuning parameters as their parent model instead of independently optimizing them to allow efficiently retraining models on larger amounts of the training data.
- Parameters
- sample_pctfloat
optional, the percentage of the dataset to use with the model. If not provided, will use the value from this model.
- training_row_countint
(New in version v2.9) optional, the integer number of rows of the dataset to use with the model. Only one of sample_pct and training_row_count should be specified.
- Returns
- model_jobModelJob
the modeling job training a frozen model
- Return type
- request_lift_chart(source, data_slice_id=None)¶
Request the model Lift Chart for the specified source.
- Parameters
- sourcestr
Lift chart data source. Check datarobot.enums.CHART_DATA_SOURCE for possible values.
- data_slice_idstring, optional
ID for the data slice used in the request. If None, request unsliced insight data.
- Returns
- status_check_jobStatusCheckJob
Object contains all needed logic for a periodical status check of an async job.
- Return type
- request_predictions(dataset_id=None, dataset=None, dataframe=None, file_path=None, file=None, include_prediction_intervals=None, prediction_intervals_size=None, forecast_point=None, predictions_start_date=None, predictions_end_date=None, actual_value_column=None, explanation_algorithm=None, max_explanations=None, max_ngram_explanations=None)¶
Requests predictions against a previously uploaded dataset.
- Parameters
- dataset_idstring, optional
The ID of the dataset to make predictions against (as uploaded from Project.upload_dataset)
- dataset
Dataset
, optional The dataset to make predictions against (as uploaded from Project.upload_dataset)
- dataframepd.DataFrame, optional
(New in v3.0) The dataframe to make predictions against
- file_pathstr, optional
(New in v3.0) Path to file to make predictions against
- fileIOBase, optional
(New in v3.0) File to make predictions against
- include_prediction_intervalsbool, optional
(New in v2.16) For time series projects only. Specifies whether prediction intervals should be calculated for this request. Defaults to True if prediction_intervals_size is specified, otherwise defaults to False.
- prediction_intervals_sizeint, optional
(New in v2.16) For time series projects only. Represents the percentile to use for the size of the prediction intervals. Defaults to 80 if include_prediction_intervals is True. Prediction intervals size must be between 1 and 100 (inclusive).
- forecast_pointdatetime.datetime or None, optional
(New in version v2.20) For time series projects only. This is the default point relative to which predictions will be generated, based on the forecast window of the project. See the time series prediction documentation for more information.
- predictions_start_datedatetime.datetime or None, optional
(New in version v2.20) For time series projects only. The start date for bulk predictions. Note that this parameter is for generating historical predictions using the training data. This parameter should be provided in conjunction with
predictions_end_date
. Can’t be provided with theforecast_point
parameter.- predictions_end_datedatetime.datetime or None, optional
(New in version v2.20) For time series projects only. The end date for bulk predictions, exclusive. Note that this parameter is for generating historical predictions using the training data. This parameter should be provided in conjunction with
predictions_start_date
. Can’t be provided with theforecast_point
parameter.- actual_value_columnstring, optional
(New in version v2.21) For time series unsupervised projects only. Actual value column can be used to calculate the classification metrics and insights on the prediction dataset. Can’t be provided with the
forecast_point
parameter.- explanation_algorithm: (New in version v2.21) optional; If set to ‘shap’, the
response will include prediction explanations based on the SHAP explainer (SHapley Additive exPlanations). Defaults to null (no prediction explanations).
- max_explanations: (New in version v2.21) int optional; specifies the maximum number of
explanation values that should be returned for each row, ordered by absolute value, greatest to least. If null, no limit. In the case of ‘shap’: if the number of features is greater than the limit, the sum of remaining values will also be returned as shapRemainingTotal. Defaults to null. Cannot be set if explanation_algorithm is omitted.
- max_ngram_explanations: optional; int or str
(New in version v2.29) Specifies the maximum number of text explanation values that should be returned. If set to all, text explanations will be computed and all the ngram explanations will be returned. If set to a non zero positive integer value, text explanations will be computed and this amount of descendingly sorted ngram explanations will be returned. By default text explanation won’t be triggered to be computed.
- Returns
- jobPredictJob
The job computing the predictions
- Return type
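For example, scoring a previously uploaded dataset (paths and IDs are placeholders):

import datarobot as dr

project = dr.Project.get('project-id')
model = dr.Model.get('project-id', 'model-id')
dataset = project.upload_dataset('./to_score.csv')         # returns a PredictionDataset
predict_job = model.request_predictions(dataset_id=dataset.id)
predictions = predict_job.get_result_when_complete()        # DataFrame of predictions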
- request_residuals_chart(source, data_slice_id=None)¶
Request the model residuals chart for the specified source.
- Parameters
- sourcestr
Residuals chart data source. Check datarobot.enums.CHART_DATA_SOURCE for possible values.
- data_slice_idstring, optional
ID for the data slice used in the request. If None, request unsliced insight data.
- Returns
- status_check_jobStatusCheckJob
Object contains all needed logic for a periodical status check of an async job.
- Return type
- request_roc_curve(source, data_slice_id=None)¶
Request the model Roc Curve for the specified source.
- Parameters
- sourcestr
Roc Curve data source. Check datarobot.enums.CHART_DATA_SOURCE for possible values.
- data_slice_idstring, optional
ID for the data slice used in the request. If None, request unsliced insight data.
- Returns
- status_check_jobStatusCheckJob
Object contains all needed logic for a periodical status check of an async job.
- Return type
- request_training_predictions(data_subset, explanation_algorithm=None, max_explanations=None)¶
Start a job to build training predictions
- Parameters
- data_subsetstr
data set definition to build predictions on. Choices are:
- dr.enums.DATA_SUBSET.ALL or string all for all data available. Not valid for models in datetime partitioned projects
- dr.enums.DATA_SUBSET.VALIDATION_AND_HOLDOUT or string validationAndHoldout for all data except the training set. Not valid for models in datetime partitioned projects
- dr.enums.DATA_SUBSET.HOLDOUT or string holdout for the holdout data set only
- dr.enums.DATA_SUBSET.ALL_BACKTESTS or string allBacktests for downloading the predictions for all backtest validation folds. Requires the model to have successfully scored all backtests. Datetime partitioned projects only.
- explanation_algorithmdr.enums.EXPLANATIONS_ALGORITHM
(New in v2.21) Optional. If set to dr.enums.EXPLANATIONS_ALGORITHM.SHAP, the response will include prediction explanations based on the SHAP explainer (SHapley Additive exPlanations). Defaults to None (no prediction explanations).
- max_explanationsint
(New in v2.21) Optional. Specifies the maximum number of explanation values that should be returned for each row, ordered by absolute value, greatest to least. In the case of dr.enums.EXPLANATIONS_ALGORITHM.SHAP: If not set, explanations are returned for all features. If the number of features is greater than the
max_explanations
, the sum of remaining values will also be returned asshap_remaining_total
. Max 100. Defaults to null for datasets narrower than 100 columns, defaults to 100 for datasets wider than 100 columns. Is ignored ifexplanation_algorithm
is not set.
- Returns
- Job
an instance of created async job
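A hedged sketch requesting holdout training predictions; the downstream get_all_as_dataframe helper on the TrainingPredictions result is an assumption based on typical client usage:

import datarobot as dr

model = dr.Model.get('project-id', 'model-id')
job = model.request_training_predictions(dr.enums.DATA_SUBSET.HOLDOUT)
training_preds = job.get_result_when_complete()
df = training_preds.get_all_as_dataframe()   # assumed TrainingPredictions helper
print(df.head())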
- retrain(sample_pct=None, featurelist_id=None, training_row_count=None, n_clusters=None)¶
Submit a job to the queue to train a blender model.
- Parameters
- sample_pct: float, optional
The sample size in percents (1 to 100) to use in training. If this parameter is used then training_row_count should not be given.
- featurelist_idstr, optional
The featurelist id
- training_row_countint, optional
The number of rows used to train the model. If this parameter is used, then sample_pct should not be given.
- n_clusters: int, optional
(new in version 2.27) number of clusters to use in an unsupervised clustering model. This parameter is used only for unsupervised clustering models that do not determine the number of clusters automatically.
- Returns
- jobModelJob
The created job that is retraining the model
- Return type
- set_prediction_threshold(threshold)¶
Set a custom prediction threshold for the model.
May not be used once
prediction_threshold_read_only
is True for this model.- Parameters
- thresholdfloat
only used for binary classification projects. The threshold to use when deciding between the positive and negative classes when making predictions. Should be between 0.0 and 1.0 (inclusive).
- star_model()¶
Mark the model as starred.
Model stars propagate to the web application and the API, and can be used to filter when listing models.
- Return type
None
- start_advanced_tuning_session()¶
Start an Advanced Tuning session. Returns an object that helps set up arguments for an Advanced Tuning model execution.
As of v2.17, all models other than blenders, open source, prime, baseline and user-created support Advanced Tuning.
- Returns
- AdvancedTuningSession
Session for setting up and running Advanced Tuning on a model
- train(sample_pct=None, featurelist_id=None, scoring_type=None, training_row_count=None, monotonic_increasing_featurelist_id=<object object>, monotonic_decreasing_featurelist_id=<object object>)¶
Train the blueprint used in model on a particular featurelist or amount of data.
This method creates a new training job for worker and appends it to the end of the queue for this project. After the job has finished you can get the newly trained model by retrieving it from the project leaderboard, or by retrieving the result of the job.
Either sample_pct or training_row_count can be used to specify the amount of data to use, but not both. If neither are specified, a default of the maximum amount of data that can safely be used to train any blueprint without going into the validation data will be selected.
In smart-sampled projects, sample_pct and training_row_count are assumed to be in terms of rows of the minority class.
Note
For datetime partitioned projects, see
train_datetime
instead.- Parameters
- sample_pctfloat, optional
The amount of data to use for training, as a percentage of the project dataset from 0 to 100.
- featurelist_idstr, optional
The identifier of the featurelist to use. If not defined, the featurelist of this model is used.
- scoring_typestr, optional
Either validation or crossValidation (also dr.SCORING_TYPE.validation or dr.SCORING_TYPE.cross_validation). validation is available for every partitioning type, and indicates that the default model validation should be used for the project. If the project uses a form of cross-validation partitioning, crossValidation can also be used to indicate that all of the available training/validation combinations should be used to evaluate the model.
- training_row_countint, optional
The number of rows to use to train the requested model.
- monotonic_increasing_featurelist_idstr
(new in version 2.11) optional, the id of the featurelist that defines the set of features with a monotonically increasing relationship to the target. Passing None disables the increasing monotonicity constraint. Default (dr.enums.MONOTONICITY_FEATURELIST_DEFAULT) is the one specified by the blueprint.
- monotonic_decreasing_featurelist_idstr
(new in version 2.11) optional, the id of the featurelist that defines the set of features with a monotonically decreasing relationship to the target. Passing None disables the decreasing monotonicity constraint. Default (dr.enums.MONOTONICITY_FEATURELIST_DEFAULT) is the one specified by the blueprint.
- Returns
- model_job_idstr
id of created job, can be used as parameter to
ModelJob.get
method orwait_for_async_model_creation
function
Examples
project = Project.get('project-id')
model = Model.get('project-id', 'model-id')
model_job_id = model.train(training_row_count=project.max_train_rows)
- Return type
str
- train_datetime(featurelist_id=None, training_row_count=None, training_duration=None, time_window_sample_pct=None, monotonic_increasing_featurelist_id=<object object>, monotonic_decreasing_featurelist_id=<object object>, use_project_settings=False, sampling_method=None, n_clusters=None)¶
Trains this model on a different featurelist or sample size.
Requires that this model is part of a datetime partitioned project; otherwise, an error will occur.
All durations should be specified with a duration string such as those returned by the partitioning_methods.construct_duration_string helper method. Please see the datetime partitioned project documentation for more information on duration strings.
- Parameters
- featurelist_idstr, optional
the featurelist to use to train the model. If not specified, the featurelist of this model is used.
- training_row_countint, optional
the number of rows of data that should be used to train the model. If specified, neither training_duration nor use_project_settings may be specified.
- training_durationstr, optional
a duration string specifying what time range the data used to train the model should span. If specified, neither training_row_count nor use_project_settings may be specified.
- use_project_settingsbool, optional
(New in version v2.20) defaults to False. If True, indicates that the custom backtest partitioning settings specified by the user will be used to train the model and evaluate backtest scores. If specified, neither training_row_count nor training_duration may be specified.
- time_window_sample_pctint, optional
may only be specified when the requested model is a time window (e.g. duration or start and end dates). An integer between 1 and 99 indicating the percentage to sample by within the window. The points kept are determined by a random uniform sample. If specified, training_duration must also be specified; otherwise an error will occur.
- sampling_methodstr, optional
(New in version v2.23) defines the way training data is selected. Can be either random or latest. In combination with training_row_count defines how rows are selected from the backtest (latest by default). When training data is defined using a time range (training_duration or use_project_settings) this setting changes the way time_window_sample_pct is applied (random by default). Applicable to OTV projects only.
- monotonic_increasing_featurelist_idstr, optional
(New in version v2.18) optional, the id of the featurelist that defines the set of features with a monotonically increasing relationship to the target. Passing None disables the increasing monotonicity constraint. Default (dr.enums.MONOTONICITY_FEATURELIST_DEFAULT) is the one specified by the blueprint.
- monotonic_decreasing_featurelist_idstr, optional
(New in version v2.18) optional, the id of the featurelist that defines the set of features with a monotonically decreasing relationship to the target. Passing None disables the decreasing monotonicity constraint. Default (dr.enums.MONOTONICITY_FEATURELIST_DEFAULT) is the one specified by the blueprint.
- n_clusters: int, optional
(New in version 2.27) number of clusters to use in an unsupervised clustering model. This parameter is used only for unsupervised clustering models that don't automatically determine the number of clusters.
- Returns
- jobModelJob
the created job to build the model
- Return type
ModelJob
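Example: an illustrative sketch of training on a six-month window using the construct_duration_string helper referenced above; the ids are placeholders and the import path for the helper is an assumption based on that reference.
import datarobot as dr
from datarobot.helpers.partitioning_methods import construct_duration_string
model = dr.Model.get("project-id", "model-id")  # placeholder ids
job = model.train_datetime(
    featurelist_id="featurelist-id",
    training_duration=construct_duration_string(months=6),
)
new_model = job.get_result_when_complete()  # blocks until the job finishes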
- train_incremental(data_stage_id, training_data_name=None, data_stage_encoding=None, data_stage_delimiter=None, data_stage_compression=None)¶
Submit a job to the queue to perform incremental training on an existing model using additional data. The id of the additional data to use for training is specified with the data_stage_id. Optionally a name for the iteration can be supplied by the user to help identify the contents of data in the iteration.
This functionality requires the INCREMENTAL_LEARNING feature flag to be enabled.
- Parameters
- data_stage_id: str
The id of the data stage to use for training.
- training_data_namestr, optional
The name of the iteration or data stage to indicate what the incremental learning was performed on.
- data_stage_encodingstr, optional
The encoding type of the data in the data stage (default: UTF-8). Supported formats: UTF-8, ASCII, WINDOWS1252
- data_stage_delimiterstr, optional
The delimiter used by the data in the data stage (default: ‘,’).
- data_stage_compressionstr, optional
The compression type of the data stage file, e.g. ‘zip’ (default: None). Supported formats: zip
- Returns
- jobModelJob
The created job that is retraining the model
- unstar_model()¶
Unmark the model as starred.
Model stars propagate to the web application and the API, and can be used to filter when listing models.
- Return type
None
DatetimeModel¶
- class datarobot.models.DatetimeModel(id=None, processes=None, featurelist_name=None, featurelist_id=None, project_id=None, sample_pct=None, training_row_count=None, training_duration=None, training_start_date=None, training_end_date=None, time_window_sample_pct=None, sampling_method=None, model_type=None, model_category=None, is_frozen=None, blueprint_id=None, metrics=None, training_info=None, holdout_score=None, holdout_status=None, data_selection_method=None, backtests=None, monotonic_increasing_featurelist_id=None, monotonic_decreasing_featurelist_id=None, supports_monotonic_constraints=None, is_starred=None, prediction_threshold=None, prediction_threshold_read_only=None, effective_feature_derivation_window_start=None, effective_feature_derivation_window_end=None, forecast_window_start=None, forecast_window_end=None, windows_basis_unit=None, model_number=None, parent_model_id=None, supports_composable_ml=None, n_clusters=None, is_n_clusters_dynamically_determined=None, has_empty_clusters=None, model_family_full_name=None, is_trained_into_validation=None, is_trained_into_holdout=None, **kwargs)¶
Represents a model from a datetime partitioned project
All durations are specified with a duration string such as those returned by the partitioning_methods.construct_duration_string helper method. Please see the datetime partitioned project documentation for more information on duration strings.
Note that only one of training_row_count, training_duration, or training_start_date and training_end_date will be specified, depending on the data_selection_method of the model. Whichever method was selected determines the amount of data used to train on when making predictions and scoring the backtests and the holdout.
- Attributes
- idstr
the id of the model
- project_idstr
the id of the project the model belongs to
- processeslist of str
the processes used by the model
- featurelist_namestr
the name of the featurelist used by the model
- featurelist_idstr
the id of the featurelist used by the model
- sample_pctfloat
the percentage of the project dataset used in training the model
- training_row_countint or None
If specified, an int specifying the number of rows used to train the model and evaluate backtest scores.
- training_durationstr or None
If specified, a duration string specifying the duration spanned by the data used to train the model and evaluate backtest scores.
- training_start_datedatetime or None
only present for frozen models in datetime partitioned projects. If specified, the start date of the data used to train the model.
- training_end_datedatetime or None
only present for frozen models in datetime partitioned projects. If specified, the end date of the data used to train the model.
- time_window_sample_pctint or None
An integer between 1 and 99 indicating the percentage of sampling within the training window. The points kept are determined by a random uniform sample. If not specified, no sampling was done.
- sampling_methodstr or None
(New in v2.23) indicates the way training data has been selected (either how rows have been selected within the backtest or how time_window_sample_pct has been applied).
- model_typestr
what model this is, e.g. ‘Nystroem Kernel SVM Regressor’
- model_categorystr
what kind of model this is - ‘prime’ for DataRobot Prime models, ‘blend’ for blender models, and ‘model’ for other models
- is_frozenbool
whether this model is a frozen model
- blueprint_idstr
the id of the blueprint used in this model
- metricsdict
a mapping from each metric to the model’s scores for that metric. The keys in metrics are the different metrics used to evaluate the model, and the values are the results. The dictionaries inside of metrics will be as described here: ‘validation’, the score for a single backtest; ‘crossValidation’, always None; ‘backtesting’, the average score for all backtests if all are available and computed, or None otherwise; ‘backtestingScores’, a list of scores for all backtests where the score is None if that backtest does not have a score available; and ‘holdout’, the score for the holdout or None if the holdout is locked or the score is unavailable.
- backtestslist of dict
describes what data was used to fit each backtest, the score for the project metric, and why the backtest score is unavailable if it is not provided.
- data_selection_methodstr
which of training_row_count, training_duration, or training_start_date and training_end_date were used to determine the data used to fit the model. One of ‘rowCount’, ‘duration’, or ‘selectedDateRange’.
- training_infodict
describes which data was used to train on when scoring the holdout and making predictions. training_info will have the following keys: holdout_training_start_date, holdout_training_duration, holdout_training_row_count, holdout_training_end_date, prediction_training_start_date, prediction_training_duration, prediction_training_row_count, prediction_training_end_date. Start and end dates will be datetimes, durations will be duration strings, and rows will be integers.
- holdout_scorefloat or None
the score against the holdout, if available and the holdout is unlocked, according to the project metric.
- holdout_statusstring or None
the status of the holdout score, e.g. “COMPLETED”, “HOLDOUT_BOUNDARIES_EXCEEDED”. Unavailable if the holdout fold was disabled in the partitioning configuration.
- monotonic_increasing_featurelist_idstr
optional, the id of the featurelist that defines the set of features with a monotonically increasing relationship to the target. If None, no such constraints are enforced.
- monotonic_decreasing_featurelist_idstr
optional, the id of the featurelist that defines the set of features with a monotonically decreasing relationship to the target. If None, no such constraints are enforced.
- supports_monotonic_constraintsbool
optional, whether this model supports enforcing monotonic constraints
- is_starredbool
whether this model is marked as starred
- prediction_thresholdfloat
for binary classification projects, the threshold used for predictions
- prediction_threshold_read_onlybool
indicates whether modification of the prediction threshold is forbidden. Threshold modification is forbidden once a model has had a deployment created or predictions made via the dedicated prediction API.
- effective_feature_derivation_window_startint or None
(New in v2.16) For time series projects only. How many units of the windows_basis_unit into the past relative to the forecast point the user needs to provide history for at prediction time. This can differ from the feature_derivation_window_start set on the project due to the differencing method and period selected, or if the model is a time series native model such as ARIMA. Will be a negative integer in time series projects and None otherwise.
- effective_feature_derivation_window_endint or None
(New in v2.16) For time series projects only. How many units of the windows_basis_unit into the past relative to the forecast point the feature derivation window should end. Will be a non-positive integer in time series projects and None otherwise.
- forecast_window_startint or None
(New in v2.16) For time series projects only. How many units of the windows_basis_unit into the future relative to the forecast point the forecast window should start. Note that this field will be the same as what is shown in the project settings. Will be a non-negative integer in time series projects and None otherwise.
- forecast_window_endint or None
(New in v2.16) For time series projects only. How many units of the windows_basis_unit into the future relative to the forecast point the forecast window should end. Note that this field will be the same as what is shown in the project settings. Will be a non-negative integer in time series projects and None otherwise.
- windows_basis_unitstr or None
(New in v2.16) For time series projects only. Indicates which unit is the basis for the feature derivation window and the forecast window. Note that this field will be the same as what is shown in the project settings. In time series projects, will be either the detected time unit or “ROW”, and None otherwise.
- model_numberinteger
model number assigned to a model
- parent_model_idstr or None
(New in version v2.20) the id of the model that tuning parameters are derived from
- supports_composable_mlbool or None
(New in version v2.26) whether this model is supported in Composable ML.
- is_n_clusters_dynamically_determinedbool, optional
(New in version 2.27) if True, indicates that the model determines the number of clusters automatically.
- n_clustersint, optional
(New in version 2.27) Number of clusters to use in an unsupervised clustering model. This parameter is used only for unsupervised clustering models that don’t automatically determine the number of clusters.
- classmethod get(project, model_id)¶
Retrieve a specific datetime model.
If the project does not use datetime partitioning, a ClientError will occur.
- Parameters
- projectstr
the id of the project the model belongs to
- model_idstr
the id of the model to retrieve
- Returns
- modelDatetimeModel
the model
- score_backtests()¶
Compute the scores for all available backtests.
Some backtests may be unavailable if the model is trained into their validation data.
- Returns
- jobJob
a job tracking the backtest computation. When it is complete, all available backtests will have scores computed.
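Example: an illustrative sketch that scores all backtests and then re-fetches the model to read the averaged backtesting score. The ids are placeholders; the 'backtesting' key is described under the metrics attribute above, and project.metric is assumed to name the project metric.
import datarobot as dr
project = dr.Project.get("project-id")  # placeholder id
model = dr.DatetimeModel.get("project-id", "model-id")
job = model.score_backtests()
job.wait_for_completion()
model = dr.DatetimeModel.get("project-id", "model-id")  # refresh to pick up new scores
print(model.metrics[project.metric]["backtesting"])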
- cross_validate()¶
Inherited from the model. DatetimeModels cannot request cross validation scores; use backtests instead.
- Return type
NoReturn
- get_cross_validation_scores(partition=None, metric=None)¶
Inherited from Model. DatetimeModels cannot request cross validation scores; use backtests instead.
- Return type
NoReturn
- request_training_predictions(data_subset, *args, **kwargs)¶
Start a job that builds training predictions.
- Parameters
- data_subsetstr
data set definition to build predictions on. Choices are:
- dr.enums.DATA_SUBSET.HOLDOUT for the holdout data set only
- dr.enums.DATA_SUBSET.ALL_BACKTESTS for downloading the predictions for all backtest validation folds. Requires the model to have successfully scored all backtests.
- Returns
- Job
an instance of created async job
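Example: an illustrative sketch that requests training predictions for all backtests. The ids are placeholders, and converting the job result to a DataFrame with get_all_as_dataframe is an assumption about the returned training-predictions object.
import datarobot as dr
model = dr.DatetimeModel.get("project-id", "model-id")  # placeholder ids
job = model.request_training_predictions(dr.enums.DATA_SUBSET.ALL_BACKTESTS)
training_predictions = job.get_result_when_complete()
df = training_predictions.get_all_as_dataframe()  # assumed helper on the result object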
- get_series_accuracy_as_dataframe(offset=0, limit=100, metric=None, multiseries_value=None, order_by=None, reverse=False)¶
Retrieve series accuracy results for the specified model as a pandas.DataFrame.
- Parameters
- offsetint, optional
The number of results to skip. Defaults to 0 if not specified.
- limitint, optional
The maximum number of results to return. Defaults to 100 if not specified.
- metricstr, optional
The name of the metric to retrieve scores for. If omitted, the default project metric will be used.
- multiseries_valuestr, optional
If specified, only the series containing the given value in one of the series ID columns will be returned.
- order_bystr, optional
Used for sorting the series. Attribute must be one of datarobot.enums.SERIES_ACCURACY_ORDER_BY.
- reversebool, optional
Used for sorting the series. If True, the series will be sorted in descending order by the attribute specified by order_by.
- Returns
- data
A pandas.DataFrame with the Series Accuracy for the specified model.
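Example: an illustrative sketch that computes series accuracy (see compute_series_accuracy below) and then pulls the results into a DataFrame; the ids are placeholders.
import datarobot as dr
model = dr.DatetimeModel.get("project-id", "model-id")  # placeholder ids
job = model.compute_series_accuracy(compute_all_series=True)
job.wait_for_completion()
df = model.get_series_accuracy_as_dataframe(limit=20)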
- download_series_accuracy_as_csv(filename, encoding='utf-8', offset=0, limit=100, metric=None, multiseries_value=None, order_by=None, reverse=False)¶
Save series accuracy results for the specified model in a CSV file.
- Parameters
- filenamestr or file object
The path or file object to save the data to.
- encodingstr, optional
A string representing the encoding to use in the output csv file. Defaults to ‘utf-8’.
- offsetint, optional
The number of results to skip. Defaults to 0 if not specified.
- limitint, optional
The maximum number of results to return. Defaults to 100 if not specified.
- metricstr, optional
The name of the metric to retrieve scores for. If omitted, the default project metric will be used.
- multiseries_valuestr, optional
If specified, only the series containing the given value in one of the series ID columns will be returned.
- order_bystr, optional
Used for sorting the series. Attribute must be one of datarobot.enums.SERIES_ACCURACY_ORDER_BY.
- reversebool, optional
Used for sorting the series. If True, the series will be sorted in descending order by the attribute specified by order_by.
- get_series_clusters(offset=0, limit=100, order_by=None, reverse=False)¶
Retrieve a dictionary of series and the clusters assigned to each series. This is only usable for clustering projects.
- Parameters
- offsetint, optional
The number of results to skip. Defaults to 0 if not specified.
- limitint, optional
The maximum number of results to return. Defaults to 100 if not specified.
- order_bystr, optional
Used for sorting the series. Attribute must be one of datarobot.enums.SERIES_ACCURACY_ORDER_BY.
- reversebool, optional
Used for sorting the series. If True, the series will be sorted in descending order by the attribute specified by order_by.
- Returns
- Dict
A dictionary of the series in the dataset with their associated cluster
- Raises
- ValueError
If the model type returns an unsupported insight
- ClientError
If the insight is not available for this model
- Return type
Dict[str, str]
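Example: an illustrative sketch for a clustering project; the ids are placeholders.
import datarobot as dr
model = dr.DatetimeModel.get("project-id", "model-id")  # placeholder ids
clusters = model.get_series_clusters(limit=500)
for series_name, cluster_name in clusters.items():
    print(series_name, cluster_name)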
- compute_series_accuracy(compute_all_series=False)¶
Compute series accuracy for the model.
- Parameters
- compute_all_seriesbool, optional
Calculate accuracy for all series or only the first 1000.
- Returns
- Job
an instance of the created async job
- retrain(time_window_sample_pct=None, featurelist_id=None, training_row_count=None, training_duration=None, training_start_date=None, training_end_date=None, sampling_method=None, n_clusters=None)¶
Retrain an existing datetime model using a new training period for the model’s training set (with optional time window sampling) or a different feature list.
All durations should be specified with a duration string such as those returned by the partitioning_methods.construct_duration_string helper method. Please see the datetime partitioned project documentation for more information on duration strings.
- Parameters
- featurelist_idstr, optional
The ID of the featurelist to use.
- training_row_countint, optional
The number of rows to train the model on. If this parameter is used then sample_pct cannot be specified.
- time_window_sample_pctint, optional
An int between 1 and 99 indicating the percentage of sampling within the time window. The points kept are determined by a random uniform sample. If specified, training_row_count must not be specified and either training_duration or training_start_date and training_end_date must be specified.
- training_durationstr, optional
A duration string representing the training duration for the submitted model. If specified then training_row_count, training_start_date, and training_end_date cannot be specified.
- training_start_datestr, optional
A datetime string representing the start date of the data to use for training this model. If specified, training_end_date must also be specified, and training_duration cannot be specified. The value must be before the training_end_date value.
- training_end_datestr, optional
A datetime string representing the end date of the data to use for training this model. If specified, training_start_date must also be specified, and training_duration cannot be specified. The value must be after the training_start_date value.
- sampling_methodstr, optional
(New in version v2.23) defines the way training data is selected. Can be either random or latest. In combination with training_row_count defines how rows are selected from the backtest (latest by default). When training data is defined using a time range (training_duration or use_project_settings) this setting changes the way time_window_sample_pct is applied (random by default). Applicable to OTV projects only.
- n_clustersint, optional
(New in version 2.27) Number of clusters to use in an unsupervised clustering model. This parameter is used only for unsupervised clustering models that don’t automatically determine the number of clusters.
- Returns
- jobModelJob
The created job that is retraining the model
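Example: an illustrative sketch that retrains on an explicit date range; the ids and datetime strings are placeholders.
import datarobot as dr
model = dr.DatetimeModel.get("project-id", "model-id")  # placeholder ids
job = model.retrain(
    training_start_date="2020-01-01",  # placeholder datetime strings
    training_end_date="2021-01-01",
)
retrained_model = job.get_result_when_complete()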
- get_feature_effect_metadata()¶
Retrieve Feature Effect metadata for each backtest. Response contains status and available sources for each backtest of the model.
Each backtest is available for training and validation.
If holdout is configured for the project, it is included with holdout as its backtestIndex and has training and holdout sources available.
Start/stop models contain a single response item with the value startstop for backtestIndex.
Feature Effect for training is always available (except for older projects that only support Feature Effect for validation).
When a model is trained into validation or holdout without stacked prediction (e.g. no out-of-sample prediction in validation or holdout), Feature Effect is not available for validation or holdout.
Feature Effect for holdout is not available when there is no holdout configured for the project.
source is a required parameter for retrieving Feature Effect; one of the provided sources must be used.
backtestIndex is a required parameter for submitting a compute request and retrieving Feature Effect; one of the provided backtest indexes must be used.
- Returns
- feature_effect_metadata: FeatureEffectMetadataDatetime
- request_feature_effect(backtest_index, data_slice_filter=<datarobot.models.model.Sentinel object>)¶
Request feature effects to be computed for the model.
See get_feature_effect for more information on the result of the job.
See get_feature_effect_metadata for retrieving information on backtest_index.
- Parameters
- backtest_index: string, FeatureEffectMetadataDatetime.backtest_index.
The backtest index to retrieve Feature Effects for.
- Returns
- jobJob
A Job representing the feature effect computation. To get the completed feature effect data, use job.get_result or job.get_result_when_complete.
- Raises
- JobAlreadyRequested (422)
If the feature effects have already been requested.
- get_feature_effect(source, backtest_index, data_slice_filter=<datarobot.models.model.Sentinel object>)¶
Retrieve Feature Effects for the model.
Feature Effects provides partial dependence and predicted vs actual values for top-500 features ordered by feature impact score.
The partial dependence shows marginal effect of a feature on the target variable after accounting for the average effects of all other predictive features. It indicates how, holding all other variables except the feature of interest as they were, the value of this feature affects your prediction.
Requires that Feature Effects has already been computed with
request_feature_effect
See get_feature_effect_metadata for retrieving information on source and backtest_index.
- Parameters
- source: string
The source Feature Effects are retrieved for. One value of FeatureEffectMetadataDatetime.sources. Use get_feature_effect_metadata to retrieve the available sources for Feature Effects.
- backtest_index: string, FeatureEffectMetadataDatetime.backtest_index.
The backtest index to retrieve Feature Effects for.
- Returns
- feature_effects: FeatureEffects
The feature effects data.
- Raises
- ClientError (404)
If the feature effects have not been computed or the source is not a valid value.
- get_or_request_feature_effect(source, backtest_index, max_wait=600, data_slice_filter=<datarobot.models.model.Sentinel object>)¶
Retrieve Feature Effects computations for the model, requesting a new job if it hasn’t been run previously.
See get_feature_effect_metadata for retrieving information on source and backtest_index.
- Parameters
- max_waitint, optional
The maximum time to wait for a requested feature effect job to complete before erroring
- sourcestring
The source Feature Effects are retrieved for. One value of FeatureEffectMetadataDatetime.sources. Use get_feature_effect_metadata to retrieve the available sources for Feature Effects.
- backtest_index: string, FeatureEffectMetadataDatetime.backtest_index.
The backtest index to retrieve Feature Effects for.
- Returns
- feature_effectsFeatureEffects
The feature effects data.
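Example: an illustrative sketch; the ids are placeholders, and the source "training" and backtest index "0" are assumptions. Use get_feature_effect_metadata to discover the values valid for a given model.
import datarobot as dr
model = dr.DatetimeModel.get("project-id", "model-id")  # placeholder ids
metadata = model.get_feature_effect_metadata()  # lists valid sources and backtest indexes
feature_effects = model.get_or_request_feature_effect(
    source="training", backtest_index="0", max_wait=600
)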
- request_feature_effects_multiclass(backtest_index, row_count=None, top_n_features=None, features=None)¶
Request feature effects to be computed for the multiclass datetime model.
See get_feature_effect for more information on the result of the job.
- Parameters
- backtest_indexstr
The backtest index to use for Feature Effects calculation.
- row_countint
The number of rows from dataset to use for Feature Impact calculation.
- top_n_featuresint or None
Number of top features (ranked by Feature Impact) used to calculate Feature Effects.
- featureslist or None
The list of features to use to calculate Feature Effects.
- Returns
- jobJob
A Job representing Feature Effects computation. To get the completed Feature Effect data, use job.get_result or job.get_result_when_complete.
- get_feature_effects_multiclass(backtest_index, source='training', class_=None)¶
Retrieve Feature Effects for the multiclass datetime model.
Feature Effects provides partial dependence and predicted vs actual values for top-500 features ordered by feature impact score.
The partial dependence shows marginal effect of a feature on the target variable after accounting for the average effects of all other predictive features. It indicates how, holding all other variables except the feature of interest as they were, the value of this feature affects your prediction.
Requires that Feature Effects has already been computed with
request_feature_effect
See get_feature_effect_metadata for retrieving information on the available sources.
- Parameters
- backtest_indexstr
The backtest index to retrieve Feature Effects for.
- sourcestr
The source Feature Effects are retrieved for.
- class_str or None
The class name Feature Effects are retrieved for.
- Returns
- list
The list of multiclass Feature Effects.
- Raises
- ClientError (404)
If the Feature Effects have not been computed or the source is not a valid value.
- get_or_request_feature_effects_multiclass(backtest_index, source, top_n_features=None, features=None, row_count=None, class_=None, max_wait=600)¶
Retrieve Feature Effects for a datetime multiclass model, and request a job if it hasn’t been run previously.
- Parameters
- backtest_indexstr
The backtest index to retrieve Feature Effects for.
- sourcestring
The source from which Feature Effects are retrieved.
- class_str or None
The class name to retrieve Feature Effects for.
- row_countint
The number of rows used from the dataset for Feature Impact calculation.
- top_n_featuresint or None
Number of top features (ranked by feature impact) used to calculate Feature Effects.
- featureslist or None
The list of features used to calculate Feature Effects.
- max_waitint, optional
The maximum time to wait for a requested feature effect job to complete before erroring.
- Returns
- feature_effectslist of FeatureEffectsMulticlass
The list of multiclass feature effects data.
- calculate_prediction_intervals(prediction_intervals_size)¶
Calculate prediction intervals for this DatetimeModel for the specified size.
New in version v2.19.
- Parameters
- prediction_intervals_sizeint
The prediction interval’s size to calculate for this model. See the prediction intervals documentation for more information.
- Returns
- jobJob
a
Job
tracking the prediction intervals computation
- Return type
Job
- get_calculated_prediction_intervals(offset=None, limit=None)¶
Retrieve a list of already-calculated prediction intervals for this model
New in version v2.19.
- Parameters
- offsetint, optional
If provided, this many results will be skipped
- limitint, optional
If provided, at most this many results will be returned. If not provided, will return at most 100 results.
- Returns
- list[int]
A descending-ordered list of already-calculated prediction interval sizes
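Example: an illustrative sketch that requests an 80% prediction interval and then lists the sizes already calculated; the ids are placeholders.
import datarobot as dr
model = dr.DatetimeModel.get("project-id", "model-id")  # placeholder ids
job = model.calculate_prediction_intervals(80)
job.wait_for_completion()
print(model.get_calculated_prediction_intervals())  # e.g. [80]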
- compute_datetime_trend_plots(backtest=0, source=SOURCE_TYPE.VALIDATION, forecast_distance_start=None, forecast_distance_end=None)¶
Computes datetime trend plots (Accuracy over Time, Forecast vs Actual, Anomaly over Time) for this model
New in version v2.25.
- Parameters
- backtestint or string, optional
Compute plots for a specific backtest (use the backtest index starting from zero). To compute plots for holdout, use
dr.enums.DATA_SUBSET.HOLDOUT
- sourcestring, optional
The source of the data for the backtest/holdout. Attribute must be one of
dr.enums.SOURCE_TYPE
- forecast_distance_startint, optional:
The start of forecast distance range (forecast window) to compute. If not specified, the first forecast distance for this project will be used. Only for time series supervised models
- forecast_distance_endint, optional:
The end of forecast distance range (forecast window) to compute. If not specified, the last forecast distance for this project will be used. Only for time series supervised models
- Returns
- jobJob
a
Job
tracking the datetime trend plots computation
Notes
Forecast distance specifies the number of time steps between the predicted point and the origin point.
For multiseries models, only the first 1000 series (in alphabetical order) and an average plot for them will be computed.
Maximum 100 forecast distances can be requested for calculation in time series supervised projects.
- get_accuracy_over_time_plots_metadata(forecast_distance=None)¶
Retrieve Accuracy over Time plots metadata for this model.
New in version v2.25.
- Parameters
- forecast_distanceint, optional
Forecast distance to retrieve the metadata for. If not specified, the first forecast distance for this project will be used. Only available for time series projects.
- Returns
- metadataAccuracyOverTimePlotsMetadata
a
AccuracyOverTimePlotsMetadata
representing Accuracy over Time plots metadata
- get_accuracy_over_time_plot(backtest=0, source=SOURCE_TYPE.VALIDATION, forecast_distance=None, series_id=None, resolution=None, max_bin_size=None, start_date=None, end_date=None, max_wait=600)¶
Retrieve Accuracy over Time plots for this model.
New in version v2.25.
- Parameters
- backtestint or string, optional
Retrieve plots for a specific backtest (use the backtest index starting from zero). To retrieve plots for holdout, use
dr.enums.DATA_SUBSET.HOLDOUT
- sourcestring, optional
The source of the data for the backtest/holdout. Attribute must be one of
dr.enums.SOURCE_TYPE
- forecast_distanceint, optional
Forecast distance to retrieve the plots for. If not specified, the first forecast distance for this project will be used. Only available for time series projects.
- series_idstring, optional
The name of the series to retrieve for multiseries projects. If not provided an average plot for the first 1000 series will be retrieved.
- resolutionstring, optional
Specifying at which resolution the data should be binned. If not provided, an optimal resolution will be used to build chart data with number of bins <= max_bin_size. One of dr.enums.DATETIME_TREND_PLOTS_RESOLUTION.
- max_bin_sizeint, optional
An int between 1 and 1000, which specifies the maximum number of bins for the retrieval. Default is 500.
- start_datedatetime.datetime, optional
The start of the date range to return. If not specified, start date for requested plot will be used.
- end_datedatetime.datetime, optional
The end of the date range to return. If not specified, end date for requested plot will be used.
- max_waitint or None, optional
The maximum time to wait for a compute job to complete before retrieving the plots. Default is dr.enums.DEFAULT_MAX_WAIT. If 0 or None, the plots would be retrieved without attempting the computation.
- Returns
- plotAccuracyOverTimePlot
a
AccuracyOverTimePlot
representing Accuracy over Time plot
Examples
import datarobot as dr
import pandas as pd
model = dr.DatetimeModel(project_id=project_id, id=model_id)
plot = model.get_accuracy_over_time_plot()
df = pd.DataFrame.from_dict(plot.bins)
figure = df.plot("start_date", ["actual", "predicted"]).get_figure()
figure.savefig("accuracy_over_time.png")
- get_accuracy_over_time_plot_preview(backtest=0, source=SOURCE_TYPE.VALIDATION, forecast_distance=None, series_id=None, max_wait=600)¶
Retrieve Accuracy over Time preview plots for this model.
New in version v2.25.
- Parameters
- backtestint or string, optional
Retrieve plots for a specific backtest (use the backtest index starting from zero). To retrieve plots for holdout, use
dr.enums.DATA_SUBSET.HOLDOUT
- sourcestring, optional
The source of the data for the backtest/holdout. Attribute must be one of
dr.enums.SOURCE_TYPE
- forecast_distanceint, optional
Forecast distance to retrieve the plots for. If not specified, the first forecast distance for this project will be used. Only available for time series projects.
- series_idstring, optional
The name of the series to retrieve for multiseries projects. If not provided an average plot for the first 1000 series will be retrieved.
- max_waitint or None, optional
The maximum time to wait for a compute job to complete before retrieving the plots. Default is dr.enums.DEFAULT_MAX_WAIT. If 0 or None, the plots would be retrieved without attempting the computation.
- Returns
- plotAccuracyOverTimePlotPreview
a
AccuracyOverTimePlotPreview
representing Accuracy over Time plot preview
Examples
import datarobot as dr
import pandas as pd
model = dr.DatetimeModel(project_id=project_id, id=model_id)
plot = model.get_accuracy_over_time_plot_preview()
df = pd.DataFrame.from_dict(plot.bins)
figure = df.plot("start_date", ["actual", "predicted"]).get_figure()
figure.savefig("accuracy_over_time_preview.png")
- get_forecast_vs_actual_plots_metadata()¶
Retrieve Forecast vs Actual plots metadata for this model.
New in version v2.25.
- Returns
- metadataForecastVsActualPlotsMetadata
a
ForecastVsActualPlotsMetadata
representing Forecast vs Actual plots metadata
- get_forecast_vs_actual_plot(backtest=0, source=SOURCE_TYPE.VALIDATION, forecast_distance_start=None, forecast_distance_end=None, series_id=None, resolution=None, max_bin_size=None, start_date=None, end_date=None, max_wait=600)¶
Retrieve Forecast vs Actual plots for this model.
New in version v2.25.
- Parameters
- backtestint or string, optional
Retrieve plots for a specific backtest (use the backtest index starting from zero). To retrieve plots for holdout, use
dr.enums.DATA_SUBSET.HOLDOUT
- sourcestring, optional
The source of the data for the backtest/holdout. Attribute must be one of
dr.enums.SOURCE_TYPE
- forecast_distance_startint, optional:
The start of forecast distance range (forecast window) to retrieve. If not specified, the first forecast distance for this project will be used.
- forecast_distance_endint, optional:
The end of forecast distance range (forecast window) to retrieve. If not specified, the last forecast distance for this project will be used.
- series_idstring, optional
The name of the series to retrieve for multiseries projects. If not provided an average plot for the first 1000 series will be retrieved.
- resolutionstring, optional
Specifying at which resolution the data should be binned. If not provided, an optimal resolution will be used to build chart data with number of bins <= max_bin_size. One of dr.enums.DATETIME_TREND_PLOTS_RESOLUTION.
- max_bin_sizeint, optional
An int between 1 and 1000, which specifies the maximum number of bins for the retrieval. Default is 500.
- start_datedatetime.datetime, optional
The start of the date range to return. If not specified, start date for requested plot will be used.
- end_datedatetime.datetime, optional
The end of the date range to return. If not specified, end date for requested plot will be used.
- max_waitint or None, optional
The maximum time to wait for a compute job to complete before retrieving the plots. Default is dr.enums.DEFAULT_MAX_WAIT. If 0 or None, the plots would be retrieved without attempting the computation.
- Returns
- plotForecastVsActualPlot
a
ForecastVsActualPlot
representing Forecast vs Actual plot
Examples
import datarobot as dr
import pandas as pd
import matplotlib.pyplot as plt
model = dr.DatetimeModel(project_id=project_id, id=model_id)
plot = model.get_forecast_vs_actual_plot()
df = pd.DataFrame.from_dict(plot.bins)
# As an example, get the forecasts for the 10th point
forecast_point_index = 10
# Pad the forecasts for plotting. The forecasts length must match the df length
forecasts = [None] * forecast_point_index + df.forecasts[forecast_point_index]
forecasts = forecasts + [None] * (len(df) - len(forecasts))
plt.plot(df.start_date, df.actual, label="Actual")
plt.plot(df.start_date, forecasts, label="Forecast")
forecast_point = df.start_date[forecast_point_index]
plt.title("Forecast vs Actual (Forecast Point {})".format(forecast_point))
plt.legend()
plt.savefig("forecast_vs_actual.png")
- get_forecast_vs_actual_plot_preview(backtest=0, source=SOURCE_TYPE.VALIDATION, series_id=None, max_wait=600)¶
Retrieve Forecast vs Actual preview plots for this model.
New in version v2.25.
- Parameters
- backtestint or string, optional
Retrieve plots for a specific backtest (use the backtest index starting from zero). To retrieve plots for holdout, use
dr.enums.DATA_SUBSET.HOLDOUT
- sourcestring, optional
The source of the data for the backtest/holdout. Attribute must be one of
dr.enums.SOURCE_TYPE
- series_idstring, optional
The name of the series to retrieve for multiseries projects. If not provided an average plot for the first 1000 series will be retrieved.
- max_waitint or None, optional
The maximum time to wait for a compute job to complete before retrieving the plots. Default is dr.enums.DEFAULT_MAX_WAIT. If 0 or None, the plots would be retrieved without attempting the computation.
- Returns
- plotForecastVsActualPlotPreview
a
ForecastVsActualPlotPreview
representing Forecast vs Actual plot preview
Examples
import datarobot as dr
import pandas as pd
model = dr.DatetimeModel(project_id=project_id, id=model_id)
plot = model.get_forecast_vs_actual_plot_preview()
df = pd.DataFrame.from_dict(plot.bins)
figure = df.plot("start_date", ["actual", "predicted"]).get_figure()
figure.savefig("forecast_vs_actual_preview.png")
- get_anomaly_over_time_plots_metadata()¶
Retrieve Anomaly over Time plots metadata for this model.
New in version v2.25.
- Returns
- metadataAnomalyOverTimePlotsMetadata
a
AnomalyOverTimePlotsMetadata
representing Anomaly over Time plots metadata
- get_anomaly_over_time_plot(backtest=0, source=SOURCE_TYPE.VALIDATION, series_id=None, resolution=None, max_bin_size=None, start_date=None, end_date=None, max_wait=600)¶
Retrieve Anomaly over Time plots for this model.
New in version v2.25.
- Parameters
- backtestint or string, optional
Retrieve plots for a specific backtest (use the backtest index starting from zero). To retrieve plots for holdout, use
dr.enums.DATA_SUBSET.HOLDOUT
- sourcestring, optional
The source of the data for the backtest/holdout. Attribute must be one of
dr.enums.SOURCE_TYPE
- series_idstring, optional
The name of the series to retrieve for multiseries projects. If not provided an average plot for the first 1000 series will be retrieved.
- resolutionstring, optional
Specifying at which resolution the data should be binned. If not provided, an optimal resolution will be used to build chart data with number of bins <= max_bin_size. One of dr.enums.DATETIME_TREND_PLOTS_RESOLUTION.
- max_bin_sizeint, optional
An int between 1 and 1000, which specifies the maximum number of bins for the retrieval. Default is 500.
- start_datedatetime.datetime, optional
The start of the date range to return. If not specified, start date for requested plot will be used.
- end_datedatetime.datetime, optional
The end of the date range to return. If not specified, end date for requested plot will be used.
- max_waitint or None, optional
The maximum time to wait for a compute job to complete before retrieving the plots. Default is dr.enums.DEFAULT_MAX_WAIT. If 0 or None, the plots would be retrieved without attempting the computation.
- Returns
- plotAnomalyOverTimePlot
a
AnomalyOverTimePlot
representing Anomaly over Time plot
Examples
import datarobot as dr
import pandas as pd
model = dr.DatetimeModel(project_id=project_id, id=model_id)
plot = model.get_anomaly_over_time_plot()
df = pd.DataFrame.from_dict(plot.bins)
figure = df.plot("start_date", "predicted").get_figure()
figure.savefig("anomaly_over_time.png")
- get_anomaly_over_time_plot_preview(prediction_threshold=0.5, backtest=0, source=SOURCE_TYPE.VALIDATION, series_id=None, max_wait=600)¶
Retrieve Anomaly over Time preview plots for this model.
New in version v2.25.
- Parameters
- prediction_threshold: float, optional
Only bins with predictions exceeding this threshold will be returned in the response.
- backtestint or string, optional
Retrieve plots for a specific backtest (use the backtest index starting from zero). To retrieve plots for holdout, use
dr.enums.DATA_SUBSET.HOLDOUT
- sourcestring, optional
The source of the data for the backtest/holdout. Attribute must be one of
dr.enums.SOURCE_TYPE
- series_idstring, optional
The name of the series to retrieve for multiseries projects. If not provided an average plot for the first 1000 series will be retrieved.
- max_waitint or None, optional
The maximum time to wait for a compute job to complete before retrieving the plots. Default is dr.enums.DEFAULT_MAX_WAIT. If 0 or None, the plots would be retrieved without attempting the computation.
- Returns
- plotAnomalyOverTimePlotPreview
a
AnomalyOverTimePlotPreview
representing Anomaly over Time plot preview
Examples
import datarobot as dr
import pandas as pd
import matplotlib.pyplot as plt
model = dr.DatetimeModel(project_id=project_id, id=model_id)
plot = model.get_anomaly_over_time_plot_preview(prediction_threshold=0.01)
df = pd.DataFrame.from_dict(plot.bins)
x = pd.date_range(
    plot.start_date, plot.end_date, freq=df.end_date[0] - df.start_date[0]
)
plt.plot(x, [0] * len(x), label="Date range")
plt.plot(df.start_date, [0] * len(df.start_date), "ro", label="Anomaly")
plt.yticks([])
plt.legend()
plt.savefig("anomaly_over_time_preview.png")
- initialize_anomaly_assessment(backtest, source, series_id=None)¶
Initialize the anomaly assessment insight and calculate Shapley explanations for the most anomalous points in the subset. The insight is available for anomaly detection models in time series unsupervised projects which also support calculation of Shapley values.
- Parameters
- backtest: int starting with 0 or “holdout”
The backtest to compute insight for.
- source: “training” or “validation”
The source to compute insight for.
- series_id: string
Required for multiseries projects. The series id to compute the insight for. For example, if there is a series column containing cities, the series name to pass would be “Boston”.
- Returns
- AnomalyAssessmentRecord
- get_anomaly_assessment_records(backtest=None, source=None, series_id=None, limit=100, offset=0, with_data_only=False)¶
Retrieve computed Anomaly Assessment records for this model. Model must be an anomaly detection model in time series unsupervised project which also supports calculation of Shapley values.
Records can be filtered by the data backtest, source and series_id. The results can be limited.
New in version v2.25.
- Parameters
- backtest: int starting with 0 or “holdout”
The backtest of the data to filter records by.
- source: “training” or “validation”
The source of the data to filter records by.
- series_id: string
The series id to filter records by.
- limit: int, optional
- offset: int, optional
- with_data_only: bool, optional
Whether to return only records with preview and explanations available. False by default.
- Returns
- recordslist of AnomalyAssessmentRecord
a
AnomalyAssessmentRecord
representing Anomaly Assessment Record
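Example: an illustrative sketch for a multiseries anomaly detection model; the ids are placeholders, and the series id "Boston" follows the example given above.
import datarobot as dr
model = dr.DatetimeModel.get("project-id", "model-id")  # placeholder ids
model.initialize_anomaly_assessment(backtest=0, source="validation", series_id="Boston")
records = model.get_anomaly_assessment_records(
    backtest=0, source="validation", with_data_only=True
)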
- get_feature_impact(with_metadata=False, backtest=None, data_slice_filter=<datarobot.models.model.Sentinel object>)¶
Retrieve the computed Feature Impact results, a measure of the relevance of each feature in the model.
Feature Impact is computed for each column by creating new data with that column randomly permuted (but the others left unchanged), and seeing how the error metric score for the predictions is affected. The ‘impactUnnormalized’ is how much worse the error metric score is when making predictions on this modified data. The ‘impactNormalized’ is normalized so that the largest value is 1. In both cases, larger values indicate more important features.
If a feature is a redundant feature, i.e. once other features are considered it doesn’t contribute much in addition, the ‘redundantWith’ value is the name of feature that has the highest correlation with this feature. Note that redundancy detection is only available for jobs run after the addition of this feature. When retrieving data that predates this functionality, a NoRedundancyImpactAvailable warning will be used.
Elsewhere, this technique is sometimes called ‘Permutation Importance’.
Requires that Feature Impact has already been computed with
request_feature_impact
.- Parameters
- with_metadatabool
The flag indicating if the result should include the metadata as well.
- backtestint or string
The index of the backtest, or the string ‘holdout’ for the holdout. This is supported only in DatetimeModels.
- data_slice_filterDataSlice, optional
(New in version v3.4) A data slice used to filter the return values based on the dataslice.id. By default, this function will use data_slice_filter.id == None which returns an unsliced insight. If data_slice_filter is None then get_feature_impact will raise a ValueError.
- Returns
- list or dict
The feature impact data response depends on the with_metadata parameter. The response is either a dict with metadata and a list with actual data or just a list with that data.
Each list item is a dict with the keys featureName, impactNormalized, impactUnnormalized, redundantWith, and count.
For the dict response, the available keys are:
- featureImpacts - Feature Impact data. Each item is a dict with the keys featureName, impactNormalized, impactUnnormalized, and redundantWith.
- shapBased - A boolean that indicates whether Feature Impact was calculated using Shapley values.
- ranRedundancyDetection - A boolean that indicates whether redundant feature identification was run while calculating this Feature Impact.
- rowCount - An integer or None that indicates the number of rows that was used to calculate Feature Impact. For Feature Impact calculated with the default logic, without specifying the rowCount, None is returned here.
- count - An integer with the number of features under featureImpacts.
- Raises
- ClientError (404)
If the feature impacts have not been computed.
- request_feature_impact(row_count=None, with_metadata=False, backtest=None, data_slice_filter=<datarobot.models.model.Sentinel object>)¶
Request feature impacts to be computed for the model.
See
get_feature_impact
for more information on the result of the job.- Parameters
- row_countint
The sample size (specified in rows) to use for Feature Impact computation. This is not supported for unsupervised, multi-class (that has a separate method) and time series projects.
- with_metadatabool
The flag indicating if the result should include the metadata as well.
- backtestint or string
The index of the backtest, or the string ‘holdout’ for the holdout. This is supported only in DatetimeModels.
- data_slice_filterDataSlice, optional
(New in version v3.4) A data slice used to filter the return values based on the dataslice.id. By default, this function will use data_slice_filter.id == None which returns an unsliced insight. If data_slice_filter is None then request_feature_impact will raise a ValueError.
- Returns
- jobJob
A Job representing the feature impact computation. To get the completed feature impact data, use job.get_result or job.get_result_when_complete.
- Raises
- JobAlreadyRequested (422)
If the feature impacts have already been requested.
- get_or_request_feature_impact(max_wait=600, row_count=None, with_metadata=False, backtest=None, data_slice_filter=<datarobot.models.model.Sentinel object>)¶
Retrieve feature impact for the model, requesting a job if it hasn’t been run previously
- Parameters
- max_waitint, optional
The maximum time to wait for a requested feature impact job to complete before erroring
- row_countint
The sample size (specified in rows) to use for Feature Impact computation. This is not supported for unsupervised, multi-class (that has a separate method) and time series projects.
- with_metadatabool
The flag indicating if the result should include the metadata as well.
- backteststr
Feature Impact backtest. Can be ‘holdout’ or numbers from 0 up to max number of backtests in project.
- data_slice_filterDataSlice, optional
(New in version v3.4) A data slice used to filter the return values based on the dataslice.id. By default, this function will use data_slice_filter.id == None which returns an unsliced insight. If data_slice_filter is None then get_or_request_feature_impact will raise a ValueError.
- Returns
- feature_impactslist or dict
The feature impact data. See
get_feature_impact
for the exact schema.
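Example: an illustrative sketch that retrieves (or computes, if needed) Feature Impact on the holdout backtest with metadata; the ids are placeholders and the keys used below are those described under get_feature_impact.
import datarobot as dr
model = dr.DatetimeModel.get("project-id", "model-id")  # placeholder ids
feature_impacts = model.get_or_request_feature_impact(
    backtest="holdout", with_metadata=True
)
for item in feature_impacts["featureImpacts"]:
    print(item["featureName"], item["impactNormalized"])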
- request_lift_chart(source=None, backtest_index=None, data_slice_filter=<datarobot.models.model.Sentinel object>)¶
(New in version v3.4) Request the model Lift Chart for the specified backtest data slice.
- Parameters
- sourcestr
(Deprecated in version v3.4) Lift chart data source. Check datarobot.enums.CHART_DATA_SOURCE for possible values. If backtest_index is present then this will be ignored.
- backtest_indexstr
Lift chart data backtest. Can be ‘holdout’ or numbers from 0 up to max number of backtests in project.
- data_slice_filterDataSlice, optional
A data slice used to filter the return values based on the dataslice.id. By default this function will use data_slice_filter.id == None which returns an unsliced insight. If data_slice_filter is None then request_lift_chart will raise a ValueError.
- Returns
- status_check_jobStatusCheckJob
Object contains all needed logic for a periodical status check of an async job.
- Return type
StatusCheckJob
- get_lift_chart(source=None, backtest_index=None, fallback_to_parent_insights=False, data_slice_filter=<datarobot.models.model.Sentinel object>)¶
(New in version v3.4) Retrieve the model Lift chart for the specified backtest and data slice.
- Parameters
- sourcestr
(Deprecated in version v3.4) Lift chart data source. Check datarobot.enums.CHART_DATA_SOURCE for possible values. For time series and OTV models, also accepts values backtest_2, backtest_3, …, up to the number of backtests in the model. If backtest_index is present then this will be ignored.
- backtest_indexstr
Lift chart data backtest. Can be ‘holdout’ or numbers from 0 up to max number of backtests in project.
- fallback_to_parent_insightsbool
Optional, if True, this will return lift chart data for this model’s parent if the lift chart is not available for this model and the model has a defined parent model. If omitted or False, or there is no parent model, will not attempt to return insight data from this model’s parent.
- data_slice_filterDataSlice, optional
A data slice used to filter the return values based on the dataslice.id. By default this function will use data_slice_filter.id == None which returns an unsliced insight. If data_slice_filter is None then get_lift_chart will raise a ValueError.
- Returns
- LiftChart
Model lift chart data
- Raises
- ClientError
If the insight is not available for this model
- ValueError
If data_slice_filter is passed as None
- request_roc_curve(source=None, backtest_index=None, data_slice_filter=<datarobot.models.model.Sentinel object>)¶
(New in version v3.4) Request the binary model Roc Curve for the specified backtest and data slice.
- Parameters
- sourcestr
(Deprecated in version v3.4) Roc Curve data source. Check datarobot.enums.CHART_DATA_SOURCE for possible values. If backtest_index is present then this will be ignored.
- backtest_indexstr
ROC curve data backtest. Can be ‘holdout’ or numbers from 0 up to max number of backtests in project.
- data_slice_filterDataSlice, optional
A data slice used to filter the return values based on the dataslice.id. By default this function will use data_slice_filter.id == None which returns an unsliced insight. If data_slice_filter is None then request_roc_curve will raise a ValueError.
- Returns
- status_check_jobStatusCheckJob
Object contains all needed logic for a periodical status check of an async job.
- Return type
StatusCheckJob
- get_roc_curve(source=None, backtest_index=None, fallback_to_parent_insights=False, data_slice_filter=<datarobot.models.model.Sentinel object>)¶
(New in version v3.4) Retrieve the ROC curve for a binary model for the specified backtest and data slice.
- Parameters
- sourcestr
(Deprecated in version v3.4) ROC curve data source. Check datarobot.enums.CHART_DATA_SOURCE for possible values. For time series and OTV models, also accepts values backtest_2, backtest_3, …, up to the number of backtests in the model. If backtest_index is present then this will be ignored.
- backtest_indexstr
ROC curve data backtest. Can be ‘holdout’ or numbers from 0 up to max number of backtests in project.
- fallback_to_parent_insightsbool
Optional, if True, this will return ROC curve data for this model’s parent if the ROC curve is not available for this model and the model has a defined parent model. If omitted or False, or there is no parent model, will not attempt to return data from this model’s parent.
- data_slice_filterDataSlice, optional
A data slice used to filter the return values based on the data slice.id. By default, this function will use data_slice_filter.id == None which returns an unsliced insight. If data_slice_filter is None then get_roc_curve will raise a ValueError.
- Returns
- RocCurve
Model ROC curve data
- Raises
- ClientError
If the insight is not available for this model
- TypeError
If the underlying project type is multilabel
- ValueError
If data_slice_filter is passed as None
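Example: an illustrative sketch retrieving the ROC curve and Lift chart for the first backtest of a binary model; the ids are placeholders, and the roc_points and bins attributes are assumptions about the returned objects.
import datarobot as dr
model = dr.DatetimeModel.get("project-id", "model-id")  # placeholder ids
roc = model.get_roc_curve(backtest_index="0")
lift = model.get_lift_chart(backtest_index="0")
print(len(roc.roc_points), len(lift.bins))  # attribute names assumed here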
- advanced_tune(params, description=None)¶
Generate a new model with the specified advanced-tuning parameters
As of v2.17, all models other than blenders, open source, prime, baseline and user-created support Advanced Tuning.
- Parameters
- paramsdict
Mapping of parameter ID to parameter value. The list of valid parameter IDs for a model can be found by calling get_advanced_tuning_parameters(). This endpoint does not need to include values for all parameters. If a parameter is omitted, its current_value will be used.
- descriptionstr
Human-readable string describing the newly advanced-tuned model
- Returns
- ModelJob
The created job to build the model
- Return type
ModelJob
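Example: an illustrative sketch that looks up a parameter id via get_advanced_tuning_parameters (documented below) and submits a tuned model; the ids and the chosen value 0.5 are placeholders.
import datarobot as dr
model = dr.DatetimeModel.get("project-id", "model-id")  # placeholder ids
params = model.get_advanced_tuning_parameters()
param_id = params["tuning_parameters"][0]["parameter_id"]
job = model.advanced_tune({param_id: 0.5}, description="illustrative tuned model")
tuned_model = job.get_result_when_complete()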
- delete()¶
Delete a model from the project’s leaderboard.
- Return type
None
- download_scoring_code(file_name, source_code=False)¶
Download the Scoring Code JAR.
- Parameters
- file_namestr
File path where scoring code will be saved.
- source_codebool, optional
Set to True to download source code archive. It will not be executable.
- Return type
None
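For example, the Scoring Code JAR can be saved to the local filesystem as follows (project, model, and file names are placeholders):
import datarobot

model = datarobot.Model.get('project-id', 'model-id')

# Download the executable Scoring Code JAR
model.download_scoring_code('model_scoring_code.jar')

# Download the (non-executable) source code archive instead
model.download_scoring_code('model_scoring_code_src.jar', source_code=True)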
- download_training_artifact(file_name)¶
Retrieve trained artifact(s) from a model containing one or more custom tasks.
Artifact(s) will be downloaded to the specified local filepath.
- Parameters
- file_namestr
File path where trained model artifact(s) will be saved.
- classmethod from_data(data)¶
Instantiate an object of this class using a dict.
- Parameters
- datadict
Correctly snake_cased keys and their values.
- Return type
TypeVar
(T
, bound=APIObject
)
- get_advanced_tuning_parameters()¶
Get the advanced-tuning parameters available for this model.
As of v2.17, all models other than blenders, open source, prime, baseline and user-created support Advanced Tuning.
- Returns
- dict
A dictionary describing the advanced-tuning parameters for the current model. There are two top-level keys, tuning_description and tuning_parameters.
tuning_description is an optional value. If not None, it indicates the user-specified description of this set of tuning parameters.
tuning_parameters is a list of dicts, each of which has the following keys
parameter_name : (str) name of the parameter (unique per task, see below)
parameter_id : (str) opaque ID string uniquely identifying parameter
default_value : (*) the actual value used to train the model; either the single value of the parameter specified before training, or the best value from the list of grid-searched values (based on current_value)
current_value : (*) the single value or list of values of the parameter that were grid searched. Depending on the grid search specification, could be a single fixed value (no grid search), a list of discrete values, or a range.
task_name : (str) name of the task that this parameter belongs to
constraints: (dict) see the notes below
vertex_id: (str) ID of vertex that this parameter belongs to
Notes
The type of default_value and current_value is defined by the constraints structure. It will be a string or numeric Python type.
constraints is a dict with at least one, possibly more, of the following keys. The presence of a key indicates that the parameter may take on the specified type. (If a key is absent, this means that the parameter may not take on the specified type.) If a key on constraints is present, its value will be a dict containing all of the fields described below for that key.
"constraints": { "select": { "values": [<list(basestring or number) : possible values>] }, "ascii": {}, "unicode": {}, "int": { "min": <int : minimum valid value>, "max": <int : maximum valid value>, "supports_grid_search": <bool : True if Grid Search may be requested for this param> }, "float": { "min": <float : minimum valid value>, "max": <float : maximum valid value>, "supports_grid_search": <bool : True if Grid Search may be requested for this param> }, "intList": { "min_length": <int : minimum valid length>, "max_length": <int : maximum valid length> "min_val": <int : minimum valid value>, "max_val": <int : maximum valid value> "supports_grid_search": <bool : True if Grid Search may be requested for this param> }, "floatList": { "min_length": <int : minimum valid length>, "max_length": <int : maximum valid length> "min_val": <float : minimum valid value>, "max_val": <float : maximum valid value> "supports_grid_search": <bool : True if Grid Search may be requested for this param> } }
The keys have meaning as follows:
select: Rather than specifying a specific data type, if present, it indicates that the parameter is permitted to take on any of the specified values. Listed values may be of any string or real (non-complex) numeric type.
ascii: The parameter may be a unicode object that encodes simple ASCII characters. (A-Z, a-z, 0-9, whitespace, and certain common symbols.) In addition to listed constraints, ASCII keys currently may not contain either newlines or semicolons.
unicode: The parameter may be any Python unicode object.
int: The value may be an object of type int within the specified range (inclusive). Please note that the value will be passed around using the JSON format, and some JSON parsers have undefined behavior with integers outside of the range [-(2**53)+1, (2**53)-1].
float: The value may be an object of type float within the specified range (inclusive).
intList, floatList: The value may be a list of int or float objects, respectively, following constraints as specified respectively by the int and float types (above).
Many parameters only specify one key under constraints. If a parameter specifies multiple keys, the parameter may take on any value permitted by any key.
- Return type
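A minimal sketch of an advanced-tuning round trip, assuming placeholder IDs and a parameter value that satisfies the constraints returned by the server:
import datarobot

model = datarobot.Model.get('project-id', 'model-id')

# Inspect the tunable parameters and their constraints
tuning = model.get_advanced_tuning_parameters()
for param in tuning['tuning_parameters']:
    print(param['task_name'], param['parameter_name'], param['current_value'], param['constraints'])

# Re-train with one parameter overridden; omitted parameters keep their current_value
param_id = tuning['tuning_parameters'][0]['parameter_id']
model_job = model.advanced_tune({param_id: 100}, description='tuned via API')  # the value 100 is illustrative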
- get_all_confusion_charts(fallback_to_parent_insights=False)¶
Retrieve a list of all confusion matrices available for the model.
- Parameters
- fallback_to_parent_insightsbool
(New in version v2.14) Optional, if True, this will return confusion chart data for this model’s parent for any source that is not available for this model and if this has a defined parent model. If omitted or False, or this model has no parent, this will not attempt to retrieve any data from this model’s parent.
- Returns
- list of ConfusionChart
Data for all available confusion charts for model.
- get_all_feature_impacts(data_slice_filter=None)¶
Retrieve a list of all feature impact results available for the model.
- Parameters
- data_slice_filterDataSlice, optional
A data slice used to filter the return values based on the data slice id. By default, this function will use data_slice_filter.id == None which returns an unsliced insight. If data_slice_filter is None then no data slice filtering will be applied when requesting the feature impacts.
- Returns
- list of dicts
Data for all available model feature impacts, or an empty list if no data is found.
Examples
model = datarobot.Model(id='model-id', project_id='project-id')

# Get feature impact insights for sliced data
data_slice = datarobot.DataSlice(id='data-slice-id')
sliced_fi = model.get_all_feature_impacts(data_slice_filter=data_slice)

# Get feature impact insights for unsliced data
data_slice = datarobot.DataSlice()
unsliced_fi = model.get_all_feature_impacts(data_slice_filter=data_slice)

# Get all feature impact insights
all_fi = model.get_all_feature_impacts()
- get_all_lift_charts(fallback_to_parent_insights=False, data_slice_filter=None)¶
Retrieve a list of all Lift charts available for the model.
- Parameters
- fallback_to_parent_insightsbool, optional
(New in version v2.14) Optional, if True, this will return lift chart data for this model’s parent for any source that is not available for this model and if this model has a defined parent model. If omitted or False, or this model has no parent, this will not attempt to retrieve any data from this model’s parent.
- data_slice_filterDataSlice, optional
Filters the returned lift chart by data_slice_filter.id. If None (the default) applies no filter based on data_slice_id.
- Returns
- list of LiftChart
Data for all available model lift charts. Or an empty list if no data found.
Examples
model = datarobot.Model.get('project-id', 'model-id')

# Get lift chart insights for sliced data
sliced_lift_charts = model.get_all_lift_charts(data_slice_filter=datarobot.DataSlice(id='data-slice-id'))

# Get lift chart insights for unsliced data
unsliced_lift_charts = model.get_all_lift_charts(data_slice_filter=datarobot.DataSlice(id=None))

# Get all lift chart insights
all_lift_charts = model.get_all_lift_charts()
- get_all_multiclass_lift_charts(fallback_to_parent_insights=False)¶
Retrieve a list of all Lift charts available for the model.
- Parameters
- fallback_to_parent_insightsbool
(New in version v2.14) Optional, if True, this will return lift chart data for this model’s parent for any source that is not available for this model and if this model has a defined parent model. If omitted or False, or this model has no parent, this will not attempt to retrieve any data from this model’s parent.
- Returns
- list of LiftChart
Data for all available model lift charts.
- get_all_residuals_charts(fallback_to_parent_insights=False, data_slice_filter=None)¶
Retrieve a list of all residuals charts available for the model.
- Parameters
- fallback_to_parent_insightsbool
Optional, if True, this will return residuals chart data for this model’s parent for any source that is not available for this model and if this model has a defined parent model. If omitted or False, or this model has no parent, this will not attempt to retrieve any data from this model’s parent.
- data_slice_filterDataSlice, optional
Filters the returned residuals charts by data_slice_filter.id. If None (the default) applies no filter based on data_slice_id.
- Returns
- list of ResidualsChart
Data for all available model residuals charts.
Examples
model = datarobot.Model.get('project-id', 'model-id')

# Get residuals chart insights for sliced data
sliced_residuals_charts = model.get_all_residuals_charts(data_slice_filter=datarobot.DataSlice(id='data-slice-id'))

# Get residuals chart insights for unsliced data
unsliced_residuals_charts = model.get_all_residuals_charts(data_slice_filter=datarobot.DataSlice(id=None))

# Get all residuals chart insights
all_residuals_charts = model.get_all_residuals_charts()
- get_all_roc_curves(fallback_to_parent_insights=False, data_slice_filter=None)¶
Retrieve a list of all ROC curves available for the model.
- Parameters
- fallback_to_parent_insightsbool
(New in version v2.14) Optional, if True, this will return ROC curve data for this model’s parent for any source that is not available for this model and if this model has a defined parent model. If omitted or False, or this model has no parent, this will not attempt to retrieve any data from this model’s parent.
- data_slice_filterDataSlice, optional
Filters the returned roc_curve by data_slice_filter.id. If None (the default) applies no filter based on data_slice_id.
- Returns
- list of RocCurve
Data for all available model ROC curves. Or an empty list if no RocCurves are found.
Examples
model = datarobot.Model.get('project-id', 'model-id')

# Get roc curve insights for sliced data
ds_filter = DataSlice(id='data-slice-id')
sliced_roc = model.get_all_roc_curves(data_slice_filter=ds_filter)

# Get roc curve insights for unsliced data
unsliced_filter = DataSlice(id=None)
unsliced_roc = model.get_all_roc_curves(data_slice_filter=unsliced_filter)

# Get all roc curve insights
all_roc_curves = model.get_all_roc_curves()
- get_confusion_chart(source, fallback_to_parent_insights=False)¶
Retrieve a multiclass model’s confusion matrix for the specified source.
- Parameters
- sourcestr
Confusion chart source. Check datarobot.enums.CHART_DATA_SOURCE for possible values.
- fallback_to_parent_insightsbool
(New in version v2.14) Optional, if True, this will return confusion chart data for this model’s parent if the confusion chart is not available for this model and the model has a defined parent model. If omitted or False, or there is no parent model, will not attempt to return insight data from this model’s parent.
- Returns
- ConfusionChart
Model ConfusionChart data
- Raises
- ClientError
If the insight is not available for this model
- get_cross_class_accuracy_scores()¶
Retrieves a list of Cross Class Accuracy scores for the model.
- Returns
- json
- get_data_disparity_insights(feature, class_name1, class_name2)¶
Retrieve a list of Cross Class Data Disparity insights for the model.
- Parameters
- featurestr
Bias and Fairness protected feature name.
- class_name1str
One of the compared classes
- class_name2str
Another compared class
- Returns
- json
- get_fairness_insights(fairness_metrics_set=None, offset=0, limit=100)¶
Retrieve a list of Per Class Bias insights for the model.
- Parameters
- fairness_metrics_setstr, optional
Can be one of <datarobot.enums.FairnessMetricsSet>. The fairness metric used to calculate the fairness scores.
- offsetint, optional
Number of items to skip.
- limitint, optional
Number of items to return.
- Returns
- json
- get_features_used()¶
Query the server to determine which features were used.
Note that the data returned by this method is possibly different than the names of the features in the featurelist used by this model. This method will return the raw features that must be supplied in order for predictions to be generated on a new set of data. The featurelist, in contrast, would also include the names of derived features.
- Returns
- featureslist of str
The names of the features used in the model.
- Return type
List
[str
]
- get_frozen_child_models()¶
Retrieve the IDs for all models that are frozen from this model.
- Returns
- A list of Models
- get_labelwise_roc_curves(source, fallback_to_parent_insights=False)¶
Retrieve a list of LabelwiseRocCurve instances for a multilabel model for the given source and all labels. This method is valid only for multilabel projects. For binary projects, use the Model.get_roc_curve API.
New in version v2.24.
- Parameters
- sourcestr
ROC curve data source. Check datarobot.enums.CHART_DATA_SOURCE for possible values.
- fallback_to_parent_insightsbool
Optional, if True, this will return ROC curve data for this model’s parent if the ROC curve is not available for this model and the model has a defined parent model. If omitted or False, or there is no parent model, will not attempt to return data from this model’s parent.
- Returns
- list of LabelwiseRocCurve
Labelwise ROC Curve instances for source and all labels
- Raises
- ClientError
If the insight is not available for this model
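For a multilabel project, the labelwise ROC curves for a given source can be retrieved as sketched below (IDs are placeholders; source values come from datarobot.enums.CHART_DATA_SOURCE, with 'validation' used here for illustration):
import datarobot

model = datarobot.Model.get('project-id', 'model-id')
labelwise_curves = model.get_labelwise_roc_curves(source='validation')
for curve in labelwise_curves:
    print(curve)  # one LabelwiseRocCurve per label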
- get_missing_report_info()¶
Retrieve a report on missing training data that can be used to understand missing values treatment in the model. The report consists of missing values resolutions for numeric and categorical features that were part of building the model.
- Returns
- An iterable of MissingReportPerFeature
The queried model missing report, sorted by missing count (DESCENDING order).
- get_model_blueprint_chart()¶
Retrieve a diagram that can be used to understand data flow in the blueprint.
- Returns
- ModelBlueprintChart
The queried model blueprint chart.
- get_model_blueprint_documents()¶
Get documentation for tasks used in this model.
- Returns
- list of BlueprintTaskDocument
All documents available for the model.
- get_model_blueprint_json()¶
Get the blueprint json representation used by this model.
- Returns
- BlueprintJson
Json representation of the blueprint stages.
- Return type
Dict
[str
,Tuple
[List
[str
],List
[str
],str
]]
- get_multiclass_feature_impact()¶
For multiclass it’s possible to calculate feature impact separately for each target class. The method for calculation is exactly the same, calculated in one-vs-all style for each target class.
Requires that Feature Impact has already been computed with
request_feature_impact
.- Returns
- feature_impactslist of dict
The feature impact data. Each item is a dict with the keys ‘featureImpacts’ (list) and ‘class’ (str). Each item in ‘featureImpacts’ is a dict with the keys ‘featureName’, ‘impactNormalized’, ‘impactUnnormalized’, and ‘redundantWith’.
- Raises
- ClientError (404)
If the multiclass feature impacts have not been computed.
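A minimal sketch, assuming Feature Impact has already been requested for this model (for example via request_feature_impact) and the resulting job has finished; IDs are placeholders:
import datarobot

model = datarobot.Model.get('project-id', 'model-id')
class_impacts = model.get_multiclass_feature_impact()
for item in class_impacts:
    # each item pairs a target class with its per-feature impacts
    print(item['class'], item['featureImpacts'][0]['featureName'])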
- get_multiclass_lift_chart(source, fallback_to_parent_insights=False)¶
Retrieve model Lift chart for the specified source.
- Parameters
- sourcestr
Lift chart data source. Check datarobot.enums.CHART_DATA_SOURCE for possible values.
- fallback_to_parent_insightsbool
Optional, if True, this will return lift chart data for this model’s parent if the lift chart is not available for this model and the model has a defined parent model. If omitted or False, or there is no parent model, will not attempt to return insight data from this model’s parent.
- Returns
- list of LiftChart
Model lift chart data for each saved target class
- Raises
- ClientError
If the insight is not available for this model
- get_multilabel_lift_charts(source, fallback_to_parent_insights=False)¶
Retrieve model Lift charts for the specified source.
New in version v2.24.
- Parameters
- sourcestr
Lift chart data source. Check datarobot.enums.CHART_DATA_SOURCE for possible values.
- fallback_to_parent_insightsbool
Optional, if True, this will return lift chart data for this model’s parent if the lift chart is not available for this model and the model has a defined parent model. If omitted or False, or there is no parent model, will not attempt to return insight data from this model’s parent.
- Returns
- list of LiftChart
Model lift chart data for each saved target class
- Raises
- ClientError
If the insight is not available for this model
- get_num_iterations_trained()¶
Retrieves the number of estimators trained by early-stopping tree-based models.
New in version v2.22.
- Returns
- projectId: str
id of project containing the model
- modelId: str
id of the model
- data: array
list of numEstimatorsItem objects, one for each modeling stage.
- numEstimatorsItem will be of the form:
- stage: str
indicates the modeling stage (for multi-stage models); None for single-stage models
- numIterations: int
the number of estimators or iterations trained by the model
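For an early-stopping tree-based model, the trained iteration counts can be inspected as sketched below (IDs are placeholders); the keys follow the return structure documented above:
import datarobot

model = datarobot.Model.get('project-id', 'model-id')
result = model.get_num_iterations_trained()
for item in result['data']:
    print(item['stage'], item['numIterations'])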
- get_parameters()¶
Retrieve model parameters.
- Returns
- ModelParameters
Model parameters for this model.
- get_pareto_front()¶
Retrieve the Pareto Front for a Eureqa model.
This method is only supported for Eureqa models.
- Returns
- ParetoFront
Model ParetoFront data
- get_prime_eligibility()¶
Check if this model can be approximated with DataRobot Prime
- Returns
- prime_eligibilitydict
a dict indicating whether a model can be approximated with DataRobot Prime (key can_make_prime) and why it may be ineligible (key message)
- get_residuals_chart(source, fallback_to_parent_insights=False, data_slice_filter=<datarobot.models.model.Sentinel object>)¶
Retrieve model residuals chart for the specified source.
- Parameters
- sourcestr
Residuals chart data source. Check datarobot.enums.CHART_DATA_SOURCE for possible values.
- fallback_to_parent_insightsbool
Optional, if True, this will return residuals chart data for this model’s parent if the residuals chart is not available for this model and the model has a defined parent model. If omitted or False, or there is no parent model, will not attempt to return residuals data from this model’s parent.
- data_slice_filterDataSlice, optional
A dataslice used to filter the return values based on the dataslice.id. By default this function will use data_slice_filter.id == None which returns an unsliced insight. If data_slice_filter is None then get_residuals_chart will raise a ValueError.
- Returns
- ResidualsChart
Model residuals chart data
- Raises
- ClientError
If the insight is not available for this model
- ValueError
If data_slice_filter passed as None
- get_rulesets()¶
List the rulesets approximating this model generated by DataRobot Prime
If this model hasn’t been approximated yet, will return an empty list. Note that these are rulesets approximating this model, not rulesets used to construct this model.
- Returns
- rulesetslist of Ruleset
- Return type
List
[Ruleset
]
- get_supported_capabilities()¶
Retrieves a summary of the capabilities supported by a model.
New in version v2.14.
- Returns
- supportsBlending: bool
whether the model supports blending
- supportsMonotonicConstraints: bool
whether the model supports monotonic constraints
- hasWordCloud: bool
whether the model has word cloud data available
- eligibleForPrime: bool
whether the model is eligible for Prime
- hasParameters: bool
whether the model has parameters that can be retrieved
- supportsCodeGeneration: bool
(New in version v2.18) whether the model supports code generation
- supportsShap: bool
(New in version v2.18) True if the model supports the SHAP (Shapley) package, i.e. SHAP-based feature importance
- supportsEarlyStopping: bool
(New in version v2.22) True if this is an early stopping tree-based model and number of trained iterations can be retrieved.
- get_uri()¶
- Returns
- urlstr
Permanent static hyperlink to this model in the leaderboard.
- Return type
str
- get_word_cloud(exclude_stop_words=False)¶
Retrieve word cloud data for the model.
- Parameters
- exclude_stop_wordsbool, optional
Set to True if you want stop words filtered out of the response.
- Returns
- WordCloud
Word cloud data for the model.
- incremental_train(data_stage_id, training_data_name=None)¶
Submit a job to the queue to perform incremental training on an existing model. See train_incremental documentation.
- Return type
- classmethod list(project_id, sort_by_partition='validation', sort_by_metric=None, with_metric=None, search_term=None, featurelists=None, families=None, blueprints=None, labels=None, characteristics=None, training_filters=None, number_of_clusters=None, limit=100, offset=0)¶
Retrieve paginated model records, sorted by scores, with optional filtering.
- Parameters
- sort_by_partition: str, one of `validation`, `backtesting`, `crossValidation` or `holdout`
Set the partition to use for sorted (by score) list of models. validation is the default.
- sort_by_metric: str
Set the project metric to use for model sorting. DataRobot-selected project optimization metric is the default.
- with_metric: str
For a single-metric list of results, specify that project metric.
- search_term: str
If specified, only models containing the term in their name or processes are returned.
- featurelists: list of str
If specified, only models trained on selected featurelists are returned.
- families: list of str
If specified, only models belonging to selected families are returned.
- blueprints: list of str
If specified, only models trained on specified blueprint IDs are returned.
- labels: list of str, `starred` or `prepared for deployment`
If specified, only models tagged with all listed labels are returned.
- characteristics: list of str
If specified, only models matching all listed characteristics are returned.
- training_filters: list of str
If specified, only models matching at least one of the listed training conditions are returned. The following formats are supported for autoML and datetime partitioned projects:
- number of rows in training subset
For datetime partitioned projects:
- <training duration>, for example P6Y0M0D
- <training_duration>-<time_window_sample_percent>-<sampling_method>, for example P6Y0M0D-78-Random (returns models trained on 6 years of data, sampling rate 78%, random sampling)
- Start/end date
- Project settings
- number_of_clusters: list of int
Filter models by number of clusters. Applicable only in unsupervised clustering projects.
- limit: int
- offset: int
- Returns
- generic_models: list of GenericModel
- Return type
List
[GenericModel
]
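A sketch of listing and filtering leaderboard models with this classmethod (the project and featurelist IDs are placeholders):
import datarobot

# Top 20 models sorted by validation score on the project metric
models = datarobot.Model.list('project-id', sort_by_partition='validation', limit=20)

# Only starred models trained on a specific featurelist
starred = datarobot.Model.list(
    'project-id',
    labels=['starred'],
    featurelists=['featurelist-id'],
)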
- open_in_browser()¶
Opens the class’ relevant web browser location. If the default browser is not available, the URL is logged.
Note: If text-mode browsers are used, the calling process will block until the user exits the browser.
- Return type
None
- request_approximation()¶
Request an approximation of this model using DataRobot Prime
This will create several rulesets that could be used to approximate this model. After comparing their scores and rule counts, the code used in the approximation can be downloaded and run locally.
- Returns
- jobJob
the job generating the rulesets
- request_cross_class_accuracy_scores()¶
Request Cross Class Accuracy scores to be computed for the model.
- Returns
- status_idstr
A statusId of computation request.
- request_data_disparity_insights(feature, compared_class_names)¶
Request data disparity insights to be computed for the model.
- Parameters
- featurestr
Bias and Fairness protected feature name.
- compared_class_nameslist(str)
List of two classes to compare
- Returns
- status_idstr
A statusId of computation request.
- request_external_test(dataset_id, actual_value_column=None)¶
Request external test to compute scores and insights on an external test dataset
- Parameters
- dataset_idstring
The dataset to make predictions against (as uploaded from Project.upload_dataset)
- actual_value_columnstring, optional
(New in version v2.21) For time series unsupervised projects only. Actual value column can be used to calculate the classification metrics and insights on the prediction dataset. Can’t be provided with the
forecast_point
parameter.- Returns
- jobJob
a Job representing external dataset insights computation
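A sketch of scoring an external test dataset, assuming a placeholder project and a local CSV; the dataset is first uploaded through Project.upload_dataset:
import datarobot

project = datarobot.Project.get('project-id')
model = datarobot.Model.get('project-id', 'model-id')

# Upload the external test set and request scores and insights against it
prediction_dataset = project.upload_dataset('./external_test.csv')
job = model.request_external_test(prediction_dataset.id)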
- request_fairness_insights(fairness_metrics_set=None)¶
Request fairness insights to be computed for the model.
- Parameters
- fairness_metrics_setstr, optional
Can be one of <datarobot.enums.FairnessMetricsSet>. The fairness metric used to calculate the fairness scores.
- Returns
- status_idstr
A statusId of computation request.
- request_frozen_datetime_model(training_row_count=None, training_duration=None, training_start_date=None, training_end_date=None, time_window_sample_pct=None, sampling_method=None)¶
Train a new frozen model with parameters from this model.
Requires that this model belongs to a datetime partitioned project. If it does not, an error will occur when submitting the job.
Frozen models use the same tuning parameters as their parent model instead of independently optimizing them to allow efficiently retraining models on larger amounts of the training data.
In addition to training_row_count and training_duration, frozen datetime models may be trained on an exact date range. Only one of training_row_count, training_duration, or training_start_date and training_end_date should be specified.
Models specified using training_start_date and training_end_date are the only ones that can be trained into the holdout data (once the holdout is unlocked).
All durations should be specified with a duration string such as those returned by the
partitioning_methods.construct_duration_string
helper method. Please see datetime partitioned project documentation for more information on duration strings.- Parameters
- training_row_countint, optional
the number of rows of data that should be used to train the model. If specified, training_duration may not be specified.
- training_durationstr, optional
a duration string specifying what time range the data used to train the model should span. If specified, training_row_count may not be specified.
- training_start_datedatetime.datetime, optional
the start date of the data to train to model on. Only rows occurring at or after this datetime will be used. If training_start_date is specified, training_end_date must also be specified.
- training_end_datedatetime.datetime, optional
the end date of the data to train the model on. Only rows occurring strictly before this datetime will be used. If training_end_date is specified, training_start_date must also be specified.
- time_window_sample_pctint, optional
may only be specified when the requested model is a time window (e.g. duration or start and end dates). An integer between 1 and 99 indicating the percentage to sample by within the window. The points kept are determined by a random uniform sample. If specified, training_duration must also be specified; otherwise an error will occur.
- sampling_methodstr, optional
(New in version v2.23) defines the way training data is selected. Can be either
random
orlatest
. In combination withtraining_row_count
defines how rows are selected from backtest (latest
by default). When training data is defined using time range (training_duration
oruse_project_settings
) this setting changes the waytime_window_sample_pct
is applied (random
by default). Applicable to OTV projects only.
- Returns
- model_jobModelJob
the modeling job training a frozen model
- Return type
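A sketch of training a frozen model on a fixed duration of data, using a duration string in the same format produced by partitioning_methods.construct_duration_string (the IDs and the two-year duration are illustrative):
import datarobot

model = datarobot.Model.get('project-id', 'model-id')

# Train a frozen model on the most recent two years of data
model_job = model.request_frozen_datetime_model(training_duration='P2Y0M0D')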
- request_predictions(dataset_id=None, dataset=None, dataframe=None, file_path=None, file=None, include_prediction_intervals=None, prediction_intervals_size=None, forecast_point=None, predictions_start_date=None, predictions_end_date=None, actual_value_column=None, explanation_algorithm=None, max_explanations=None, max_ngram_explanations=None)¶
Requests predictions against a previously uploaded dataset.
- Parameters
- dataset_idstring, optional
The ID of the dataset to make predictions against (as uploaded from Project.upload_dataset)
- dataset
Dataset
, optional The dataset to make predictions against (as uploaded from Project.upload_dataset)
- dataframepd.DataFrame, optional
(New in v3.0) The dataframe to make predictions against
- file_pathstr, optional
(New in v3.0) Path to file to make predictions against
- fileIOBase, optional
(New in v3.0) File to make predictions against
- include_prediction_intervalsbool, optional
(New in v2.16) For time series projects only. Specifies whether prediction intervals should be calculated for this request. Defaults to True if prediction_intervals_size is specified, otherwise defaults to False.
- prediction_intervals_sizeint, optional
(New in v2.16) For time series projects only. Represents the percentile to use for the size of the prediction intervals. Defaults to 80 if include_prediction_intervals is True. Prediction intervals size must be between 1 and 100 (inclusive).
- forecast_pointdatetime.datetime or None, optional
(New in version v2.20) For time series projects only. This is the default point relative to which predictions will be generated, based on the forecast window of the project. See the time series prediction documentation for more information.
- predictions_start_datedatetime.datetime or None, optional
(New in version v2.20) For time series projects only. The start date for bulk predictions. Note that this parameter is for generating historical predictions using the training data. This parameter should be provided in conjunction with
predictions_end_date
. Can’t be provided with theforecast_point
parameter.- predictions_end_datedatetime.datetime or None, optional
(New in version v2.20) For time series projects only. The end date for bulk predictions, exclusive. Note that this parameter is for generating historical predictions using the training data. This parameter should be provided in conjunction with
predictions_start_date
. Can’t be provided with theforecast_point
parameter.- actual_value_columnstring, optional
(New in version v2.21) For time series unsupervised projects only. Actual value column can be used to calculate the classification metrics and insights on the prediction dataset. Can’t be provided with the
forecast_point
parameter.- explanation_algorithm: (New in version v2.21) optional; If set to ‘shap’, the
response will include prediction explanations based on the SHAP explainer (SHapley Additive exPlanations). Defaults to null (no prediction explanations).
- max_explanations: (New in version v2.21) int optional; specifies the maximum number of
explanation values that should be returned for each row, ordered by absolute value, greatest to least. If null, no limit. In the case of ‘shap’: if the number of features is greater than the limit, the sum of remaining values will also be returned as shapRemainingTotal. Defaults to null. Cannot be set if explanation_algorithm is omitted.
- max_ngram_explanations: optional; int or str
(New in version v2.29) Specifies the maximum number of text explanation values that should be returned. If set to all, text explanations will be computed and all ngram explanations will be returned. If set to a non-zero positive integer, text explanations will be computed and that number of ngram explanations, sorted in descending order, will be returned. By default, text explanations are not computed.
- Returns
- jobPredictJob
The job computing the predictions
- Return type
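A sketch of requesting predictions against a freshly uploaded dataset, assuming a placeholder project and a local scoring file:
import datarobot

project = datarobot.Project.get('project-id')
model = datarobot.Model.get('project-id', 'model-id')

# Upload the data to score, then request predictions with SHAP explanations
prediction_dataset = project.upload_dataset('./to_score.csv')
predict_job = model.request_predictions(
    dataset_id=prediction_dataset.id,
    explanation_algorithm='shap',
    max_explanations=5,
)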
- request_residuals_chart(source, data_slice_id=None)¶
Request the model residuals chart for the specified source.
- Parameters
- sourcestr
Residuals chart data source. Check datarobot.enums.CHART_DATA_SOURCE for possible values.
- data_slice_idstring, optional
ID for the data slice used in the request. If None, request unsliced insight data.
- Returns
- status_check_jobStatusCheckJob
Object contains all needed logic for a periodical status check of an async job.
- Return type
- set_prediction_threshold(threshold)¶
Set a custom prediction threshold for the model.
May not be used once
prediction_threshold_read_only
is True for this model.- Parameters
- thresholdfloat
Only used for binary classification projects. The threshold to use when deciding between the positive and negative classes when making predictions. Should be between 0.0 and 1.0 (inclusive).
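For a binary classification model whose threshold is still editable, the threshold can be adjusted directly (a sketch with placeholder IDs):
import datarobot

model = datarobot.Model.get('project-id', 'model-id')
if not model.prediction_threshold_read_only:
    model.set_prediction_threshold(0.6)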
- star_model()¶
Mark the model as starred.
Model stars propagate to the web application and the API, and can be used to filter when listing models.
- Return type
None
- start_advanced_tuning_session()¶
Start an Advanced Tuning session. Returns an object that helps set up arguments for an Advanced Tuning model execution.
As of v2.17, all models other than blenders, open source, prime, baseline and user-created support Advanced Tuning.
- Returns
- AdvancedTuningSession
Session for setting up and running Advanced Tuning on a model
- train_datetime(featurelist_id=None, training_row_count=None, training_duration=None, time_window_sample_pct=None, monotonic_increasing_featurelist_id=<object object>, monotonic_decreasing_featurelist_id=<object object>, use_project_settings=False, sampling_method=None, n_clusters=None)¶
Trains this model on a different featurelist or sample size.
Requires that this model is part of a datetime partitioned project; otherwise, an error will occur.
All durations should be specified with a duration string such as those returned by the
partitioning_methods.construct_duration_string
helper method. Please see datetime partitioned project documentation for more information on duration strings.- Parameters
- featurelist_idstr, optional
the featurelist to use to train the model. If not specified, the featurelist of this model is used.
- training_row_countint, optional
the number of rows of data that should be used to train the model. If specified, neither
training_duration
noruse_project_settings
may be specified.- training_durationstr, optional
a duration string specifying what time range the data used to train the model should span. If specified, neither
training_row_count
noruse_project_settings
may be specified.- use_project_settingsbool, optional
(New in version v2.20) defaults to
False
. IfTrue
, indicates that the custom backtest partitioning settings specified by the user will be used to train the model and evaluate backtest scores. If specified, neithertraining_row_count
nortraining_duration
may be specified.- time_window_sample_pctint, optional
may only be specified when the requested model is a time window (e.g. duration or start and end dates). An integer between 1 and 99 indicating the percentage to sample by within the window. The points kept are determined by a random uniform sample. If specified, training_duration must also be specified; otherwise an error will occur.
- sampling_methodstr, optional
(New in version v2.23) defines the way training data is selected. Can be either
random
orlatest
. In combination withtraining_row_count
defines how rows are selected from backtest (latest
by default). When training data is defined using time range (training_duration
oruse_project_settings
) this setting changes the waytime_window_sample_pct
is applied (random
by default). Applicable to OTV projects only.- monotonic_increasing_featurelist_idstr, optional
(New in version v2.18) optional, the id of the featurelist that defines the set of features with a monotonically increasing relationship to the target. Passing
None
disables increasing monotonicity constraint. Default (dr.enums.MONOTONICITY_FEATURELIST_DEFAULT
) is the one specified by the blueprint.- monotonic_decreasing_featurelist_idstr, optional
(New in version v2.18) optional, the id of the featurelist that defines the set of features with a monotonically decreasing relationship to the target. Passing
None
disables decreasing monotonicity constraint. Default (dr.enums.MONOTONICITY_FEATURELIST_DEFAULT
) is the one specified by the blueprint.- n_clusters: int, optional
(New in version 2.27) number of clusters to use in an unsupervised clustering model. This parameter is used only for unsupervised clustering models that don’t automatically determine the number of clusters.
- Returns
- jobModelJob
the created job to build the model
- Return type
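A sketch of retraining this model in a datetime partitioned project on a different featurelist and training duration (the IDs and the six-month duration are placeholders):
import datarobot

model = datarobot.Model.get('project-id', 'model-id')

# Retrain on six months of data using another featurelist
model_job = model.train_datetime(
    featurelist_id='featurelist-id',
    training_duration='P0Y6M0D',
)

# Or retrain using the project's custom backtest partitioning settings
model_job = model.train_datetime(use_project_settings=True)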
- train_incremental(data_stage_id, training_data_name=None, data_stage_encoding=None, data_stage_delimiter=None, data_stage_compression=None)¶
Submit a job to the queue to perform incremental training on an existing model using additional data. The id of the additional data to use for training is specified with the data_stage_id. Optionally a name for the iteration can be supplied by the user to help identify the contents of data in the iteration.
This functionality requires the INCREMENTAL_LEARNING feature flag to be enabled.
- Parameters
- data_stage_id: str
The id of the data stage to use for training.
- training_data_namestr, optional
The name of the iteration or data stage to indicate what the incremental learning was performed on.
- data_stage_encodingstr, optional
The encoding type of the data in the data stage (default: UTF-8). Supported formats: UTF-8, ASCII, WINDOWS1252
- data_stage_delimiterstr, optional
The delimiter used by the data in the data stage (default: ‘,’).
- data_stage_compressionstr, optional
The compression type of the data stage file, e.g. ‘zip’ (default: None). Supported formats: zip
- Returns
- jobModelJob
The created job that is retraining the model
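A sketch of submitting an incremental training iteration, assuming the INCREMENTAL_LEARNING feature flag is enabled and a data stage has already been created (the data stage ID and iteration name are placeholders):
import datarobot

model = datarobot.Model.get('project-id', 'model-id')
job = model.train_incremental(
    data_stage_id='data-stage-id',
    training_data_name='march-increment',
    data_stage_encoding='UTF-8',
)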
- unstar_model()¶
Unmark the model as starred.
Model stars propagate to the web application and the API, and can be used to filter when listing models.
- Return type
None
Frozen Model¶
- class datarobot.models.FrozenModel(id=None, processes=None, featurelist_name=None, featurelist_id=None, project_id=None, sample_pct=None, model_type=None, model_category=None, is_frozen=None, is_n_clusters_dynamically_determined=None, blueprint_id=None, metrics=None, monotonic_increasing_featurelist_id=None, monotonic_decreasing_featurelist_id=None, n_clusters=None, has_empty_clusters=None, supports_monotonic_constraints=None, is_starred=None, prediction_threshold=None, prediction_threshold_read_only=None, model_number=None, parent_model_id=None, supports_composable_ml=None, training_row_count=None, training_duration=None, training_start_date=None, training_end_date=None, data_selection_method=None, time_window_sample_pct=None, sampling_method=None, model_family_full_name=None, is_trained_into_validation=None, is_trained_into_holdout=None)¶
Represents a model tuned with parameters which are derived from another model
All durations are specified with a duration string such as those returned by the
partitioning_methods.construct_duration_string
helper method. Please see datetime partitioned project documentation for more information on duration strings.- Attributes
- idstr
the id of the model
- project_idstr
the id of the project the model belongs to
- processeslist of str
the processes used by the model
- featurelist_namestr
the name of the featurelist used by the model
- featurelist_idstr
the id of the featurelist used by the model
- sample_pctfloat
the percentage of the project dataset used in training the model
- training_row_countint or None
the number of rows of the project dataset used in training the model. In a datetime partitioned project, if specified, defines the number of rows used to train the model and evaluate backtest scores; if unspecified, either training_duration or training_start_date and training_end_date was used to determine that instead.
- training_durationstr or None
only present for models in datetime partitioned projects. If specified, a duration string specifying the duration spanned by the data used to train the model and evaluate backtest scores.
- training_start_datedatetime or None
only present for frozen models in datetime partitioned projects. If specified, the start date of the data used to train the model.
- training_end_datedatetime or None
only present for frozen models in datetime partitioned projects. If specified, the end date of the data used to train the model.
- model_typestr
what model this is, e.g. ‘Nystroem Kernel SVM Regressor’
- model_categorystr
what kind of model this is - ‘prime’ for DataRobot Prime models, ‘blend’ for blender models, and ‘model’ for other models
- is_frozenbool
whether this model is a frozen model
- parent_model_idstr
the id of the model that tuning parameters are derived from
- blueprint_idstr
the id of the blueprint used in this model
- metricsdict
a mapping from each metric to the model’s scores for that metric
- monotonic_increasing_featurelist_idstr
optional, the id of the featurelist that defines the set of features with a monotonically increasing relationship to the target. If None, no such constraints are enforced.
- monotonic_decreasing_featurelist_idstr
optional, the id of the featurelist that defines the set of features with a monotonically decreasing relationship to the target. If None, no such constraints are enforced.
- supports_monotonic_constraintsbool
optional, whether this model supports enforcing monotonic constraints
- is_starredbool
whether this model is marked as starred
- prediction_thresholdfloat
for binary classification projects, the threshold used for predictions
- prediction_threshold_read_onlybool
indicates whether modification of the prediction threshold is forbidden. Threshold modification is forbidden once a model has had a deployment created or predictions made via the dedicated prediction API.
- model_numberinteger
model number assigned to a model
- supports_composable_mlbool or None
(New in version v2.26) whether this model is supported in Composable ML.
- classmethod get(project_id, model_id)¶
Retrieve a specific frozen model.
- Parameters
- project_idstr
The project’s id.
- model_idstr
The
model_id
of the leaderboard item to retrieve.
- Returns
- modelFrozenModel
The queried instance.
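For example, a frozen model can be retrieved and its parent inspected as follows (IDs are placeholders):
import datarobot

frozen = datarobot.models.FrozenModel.get('project-id', 'frozen-model-id')
print(frozen.parent_model_id, frozen.training_duration)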
RatingTableModel¶
- class datarobot.models.RatingTableModel(id=None, processes=None, featurelist_name=None, featurelist_id=None, project_id=None, sample_pct=None, model_type=None, model_category=None, is_frozen=None, blueprint_id=None, metrics=None, rating_table_id=None, monotonic_increasing_featurelist_id=None, monotonic_decreasing_featurelist_id=None, supports_monotonic_constraints=None, is_starred=None, prediction_threshold=None, prediction_threshold_read_only=None, model_number=None, parent_model_id=None, supports_composable_ml=None, training_row_count=None, training_duration=None, training_start_date=None, training_end_date=None, data_selection_method=None, time_window_sample_pct=None, sampling_method=None, model_family_full_name=None, is_trained_into_validation=None, is_trained_into_holdout=None)¶
A model that has a rating table.
All durations are specified with a duration string such as those returned by the
partitioning_methods.construct_duration_string
helper method. Please see datetime partitioned project documentation for more information on duration strings.- Attributes
- idstr
the id of the model
- project_idstr
the id of the project the model belongs to
- processeslist of str
the processes used by the model
- featurelist_namestr
the name of the featurelist used by the model
- featurelist_idstr
the id of the featurelist used by the model
- sample_pctfloat or None
the percentage of the project dataset used in training the model. If the project uses datetime partitioning, the sample_pct will be None. See training_row_count, training_duration, and training_start_date and training_end_date instead.
- training_row_countint or None
the number of rows of the project dataset used in training the model. In a datetime partitioned project, if specified, defines the number of rows used to train the model and evaluate backtest scores; if unspecified, either training_duration or training_start_date and training_end_date was used to determine that instead.
- training_durationstr or None
only present for models in datetime partitioned projects. If specified, a duration string specifying the duration spanned by the data used to train the model and evaluate backtest scores.
- training_start_datedatetime or None
only present for frozen models in datetime partitioned projects. If specified, the start date of the data used to train the model.
- training_end_datedatetime or None
only present for frozen models in datetime partitioned projects. If specified, the end date of the data used to train the model.
- model_typestr
what model this is, e.g. ‘Nystroem Kernel SVM Regressor’
- model_categorystr
what kind of model this is - ‘prime’ for DataRobot Prime models, ‘blend’ for blender models, and ‘model’ for other models
- is_frozenbool
whether this model is a frozen model
- blueprint_idstr
the id of the blueprint used in this model
- metricsdict
a mapping from each metric to the model’s scores for that metric
- rating_table_idstr
the id of the rating table that belongs to this model
- monotonic_increasing_featurelist_idstr
optional, the id of the featurelist that defines the set of features with a monotonically increasing relationship to the target. If None, no such constraints are enforced.
- monotonic_decreasing_featurelist_idstr
optional, the id of the featurelist that defines the set of features with a monotonically decreasing relationship to the target. If None, no such constraints are enforced.
- supports_monotonic_constraintsbool
optional, whether this model supports enforcing monotonic constraints
- is_starredbool
whether this model is marked as starred
- prediction_thresholdfloat
for binary classification projects, the threshold used for predictions
- prediction_threshold_read_onlybool
indicates whether modification of the prediction threshold is forbidden. Threshold modification is forbidden once a model has had a deployment created or predictions made via the dedicated prediction API.
- model_numberinteger
model number assigned to a model
- supports_composable_mlbool or None
(New in version v2.26) whether this model is supported in Composable ML.
- classmethod get(project_id, model_id)¶
Retrieve a specific rating table model
If the project does not have a rating table, a ClientError will occur.
- Parameters
- project_idstr
the id of the project the model belongs to
- model_idstr
the id of the model to retrieve
- Returns
- modelRatingTableModel
the model
- classmethod create_from_rating_table(project_id, rating_table_id)¶
Creates a new model from a validated rating table record. The RatingTable must not be associated with an existing model.
- Parameters
- project_idstr
the id of the project the rating table belongs to
- rating_table_idstr
the id of the rating table to create this model from
- Returns
- job: Job
an instance of created async job
- Raises
- ClientError (422)
Raised if creating model from a RatingTable that failed validation
- JobAlreadyRequested
Raised if creating model from a RatingTable that is already associated with a RatingTableModel
- Return type
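A sketch of creating a model from a validated rating table that is not yet attached to a model (the project and rating table IDs are placeholders):
import datarobot

job = datarobot.models.RatingTableModel.create_from_rating_table(
    'project-id', 'rating-table-id'
)
# job is an async Job; wait for it to finish before using the new model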
- advanced_tune(params, description=None)¶
Generate a new model with the specified advanced-tuning parameters
As of v2.17, all models other than blenders, open source, prime, baseline and user-created support Advanced Tuning.
- Parameters
- paramsdict
Mapping of parameter ID to parameter value. The list of valid parameter IDs for a model can be found by calling get_advanced_tuning_parameters(). This endpoint does not need to include values for all parameters. If a parameter is omitted, its current_value will be used.
- descriptionstr
Human-readable string describing the newly advanced-tuned model
- Returns
- ModelJob
The created job to build the model
- Return type
- cross_validate()¶
Run cross validation on the model.
Note
To perform Cross Validation on a new model with new parameters, use
train
instead.- Returns
- ModelJob
The created job to build the model
- delete()¶
Delete a model from the project’s leaderboard.
- Return type
None
- download_scoring_code(file_name, source_code=False)¶
Download the Scoring Code JAR.
- Parameters
- file_namestr
File path where scoring code will be saved.
- source_codebool, optional
Set to True to download source code archive. It will not be executable.
- Return type
None
- download_training_artifact(file_name)¶
Retrieve trained artifact(s) from a model containing one or more custom tasks.
Artifact(s) will be downloaded to the specified local filepath.
- Parameters
- file_namestr
File path where trained model artifact(s) will be saved.
- classmethod from_data(data)¶
Instantiate an object of this class using a dict.
- Parameters
- datadict
Correctly snake_cased keys and their values.
- Return type
TypeVar
(T
, bound=APIObject
)
- get_advanced_tuning_parameters()¶
Get the advanced-tuning parameters available for this model.
As of v2.17, all models other than blenders, open source, prime, baseline and user-created support Advanced Tuning.
- Returns
- dict
A dictionary describing the advanced-tuning parameters for the current model. There are two top-level keys, tuning_description and tuning_parameters.
tuning_description is an optional value. If not None, it indicates the user-specified description of this set of tuning parameters.
tuning_parameters is a list of dicts, each of which has the following keys
parameter_name : (str) name of the parameter (unique per task, see below)
parameter_id : (str) opaque ID string uniquely identifying parameter
default_value : (*) the actual value used to train the model; either the single value of the parameter specified before training, or the best value from the list of grid-searched values (based on current_value)
current_value : (*) the single value or list of values of the parameter that were grid searched. Depending on the grid search specification, could be a single fixed value (no grid search), a list of discrete values, or a range.
task_name : (str) name of the task that this parameter belongs to
constraints: (dict) see the notes below
vertex_id: (str) ID of vertex that this parameter belongs to
Notes
The type of default_value and current_value is defined by the constraints structure. It will be a string or numeric Python type.
constraints is a dict with at least one, possibly more, of the following keys. The presence of a key indicates that the parameter may take on the specified type. (If a key is absent, this means that the parameter may not take on the specified type.) If a key on constraints is present, its value will be a dict containing all of the fields described below for that key.
"constraints": { "select": { "values": [<list(basestring or number) : possible values>] }, "ascii": {}, "unicode": {}, "int": { "min": <int : minimum valid value>, "max": <int : maximum valid value>, "supports_grid_search": <bool : True if Grid Search may be requested for this param> }, "float": { "min": <float : minimum valid value>, "max": <float : maximum valid value>, "supports_grid_search": <bool : True if Grid Search may be requested for this param> }, "intList": { "min_length": <int : minimum valid length>, "max_length": <int : maximum valid length> "min_val": <int : minimum valid value>, "max_val": <int : maximum valid value> "supports_grid_search": <bool : True if Grid Search may be requested for this param> }, "floatList": { "min_length": <int : minimum valid length>, "max_length": <int : maximum valid length> "min_val": <float : minimum valid value>, "max_val": <float : maximum valid value> "supports_grid_search": <bool : True if Grid Search may be requested for this param> } }
The keys have meaning as follows:
select: Rather than specifying a specific data type, if present, it indicates that the parameter is permitted to take on any of the specified values. Listed values may be of any string or real (non-complex) numeric type.
ascii: The parameter may be a unicode object that encodes simple ASCII characters. (A-Z, a-z, 0-9, whitespace, and certain common symbols.) In addition to listed constraints, ASCII keys currently may not contain either newlines or semicolons.
unicode: The parameter may be any Python unicode object.
int: The value may be an object of type int within the specified range (inclusive). Please note that the value will be passed around using the JSON format, and some JSON parsers have undefined behavior with integers outside of the range [-(2**53)+1, (2**53)-1].
float: The value may be an object of type float within the specified range (inclusive).
intList, floatList: The value may be a list of int or float objects, respectively, following constraints as specified respectively by the int and float types (above).
Many parameters only specify one key under constraints. If a parameter specifies multiple keys, the parameter may take on any value permitted by any key.
- Return type
- get_all_confusion_charts(fallback_to_parent_insights=False)¶
Retrieve a list of all confusion matrices available for the model.
- Parameters
- fallback_to_parent_insightsbool
(New in version v2.14) Optional, if True, this will return confusion chart data for this model’s parent for any source that is not available for this model and if this has a defined parent model. If omitted or False, or this model has no parent, this will not attempt to retrieve any data from this model’s parent.
- Returns
- list of ConfusionChart
Data for all available confusion charts for model.
- get_all_feature_impacts(data_slice_filter=None)¶
Retrieve a list of all feature impact results available for the model.
- Parameters
- data_slice_filterDataSlice, optional
A data slice used to filter the return values based on the data slice id. By default, this function will use data_slice_filter.id == None which returns an unsliced insight. If data_slice_filter is None then no data slice filtering will be applied when requesting the feature impacts.
- Returns
- list of dicts
Data for all available model feature impacts, or an empty list if no data is found.
Examples
model = datarobot.Model(id='model-id', project_id='project-id')

# Get feature impact insights for sliced data
data_slice = datarobot.DataSlice(id='data-slice-id')
sliced_fi = model.get_all_feature_impacts(data_slice_filter=data_slice)

# Get feature impact insights for unsliced data
data_slice = datarobot.DataSlice()
unsliced_fi = model.get_all_feature_impacts(data_slice_filter=data_slice)

# Get all feature impact insights
all_fi = model.get_all_feature_impacts()
- get_all_lift_charts(fallback_to_parent_insights=False, data_slice_filter=None)¶
Retrieve a list of all Lift charts available for the model.
- Parameters
- fallback_to_parent_insightsbool, optional
(New in version v2.14) Optional, if True, this will return lift chart data for this model’s parent for any source that is not available for this model and if this model has a defined parent model. If omitted or False, or this model has no parent, this will not attempt to retrieve any data from this model’s parent.
- data_slice_filterDataSlice, optional
Filters the returned lift chart by data_slice_filter.id. If None (the default) applies no filter based on data_slice_id.
- Returns
- list of LiftChart
Data for all available model lift charts. Or an empty list if no data found.
Examples
model = datarobot.Model.get('project-id', 'model-id')

# Get lift chart insights for sliced data
sliced_lift_charts = model.get_all_lift_charts(data_slice_filter=datarobot.DataSlice(id='data-slice-id'))

# Get lift chart insights for unsliced data
unsliced_lift_charts = model.get_all_lift_charts(data_slice_filter=datarobot.DataSlice(id=None))

# Get all lift chart insights
all_lift_charts = model.get_all_lift_charts()
- get_all_multiclass_lift_charts(fallback_to_parent_insights=False)¶
Retrieve a list of all Lift charts available for the model.
- Parameters
- fallback_to_parent_insightsbool
(New in version v2.14) Optional, if True, this will return lift chart data for this model’s parent for any source that is not available for this model and if this model has a defined parent model. If omitted or False, or this model has no parent, this will not attempt to retrieve any data from this model’s parent.
- Returns
- list of LiftChart
Data for all available model lift charts.
- get_all_residuals_charts(fallback_to_parent_insights=False, data_slice_filter=None)¶
Retrieve a list of all residuals charts available for the model.
- Parameters
- fallback_to_parent_insightsbool
Optional, if True, this will return residuals chart data for this model’s parent for any source that is not available for this model and if this model has a defined parent model. If omitted or False, or this model has no parent, this will not attempt to retrieve any data from this model’s parent.
- data_slice_filterDataSlice, optional
Filters the returned residuals charts by data_slice_filter.id. If None (the default) applies no filter based on data_slice_id.
- Returns
- list of ResidualsChart
Data for all available model residuals charts.
Examples
model = datarobot.Model.get('project-id', 'model-id')

# Get residuals chart insights for sliced data
sliced_residuals_charts = model.get_all_residuals_charts(data_slice_filter=datarobot.DataSlice(id='data-slice-id'))

# Get residuals chart insights for unsliced data
unsliced_residuals_charts = model.get_all_residuals_charts(data_slice_filter=datarobot.DataSlice(id=None))

# Get all residuals chart insights
all_residuals_charts = model.get_all_residuals_charts()
- get_all_roc_curves(fallback_to_parent_insights=False, data_slice_filter=None)¶
Retrieve a list of all ROC curves available for the model.
- Parameters
- fallback_to_parent_insightsbool
(New in version v2.14) Optional, if True, this will return ROC curve data for this model’s parent for any source that is not available for this model and if this model has a defined parent model. If omitted or False, or this model has no parent, this will not attempt to retrieve any data from this model’s parent.
- data_slice_filterDataSlice, optional
filters the returned roc_curve by data_slice_filter.id. If None (the default) applies no filter based on data_slice_id.
- Returns
- list of RocCurve
Data for all available model ROC curves, or an empty list if no RocCurves are found.
Examples
model = datarobot.Model.get('project-id', 'model-id')

# Get ROC curve insights for sliced data
ds_filter = datarobot.DataSlice(id='data-slice-id')
sliced_roc = model.get_all_roc_curves(data_slice_filter=ds_filter)

# Get ROC curve insights for unsliced data
unsliced_filter = datarobot.DataSlice(id=None)
unsliced_roc = model.get_all_roc_curves(data_slice_filter=unsliced_filter)

# Get all ROC curve insights
all_roc_curves = model.get_all_roc_curves()
- get_confusion_chart(source, fallback_to_parent_insights=False)¶
Retrieve a multiclass model’s confusion matrix for the specified source.
- Parameters
- sourcestr
Confusion chart source. Check datarobot.enums.CHART_DATA_SOURCE for possible values.
- fallback_to_parent_insightsbool
(New in version v2.14) Optional, if True, this will return confusion chart data for this model’s parent if the confusion chart is not available for this model and the model has a defined parent model. If omitted or False, or there is no parent model, will not attempt to return insight data from this model’s parent.
- Returns
- ConfusionChart
Model ConfusionChart data
- Raises
- ClientError
If the insight is not available for this model
- get_cross_class_accuracy_scores()¶
Retrieves a list of Cross Class Accuracy scores for the model.
- Returns
- json
- get_cross_validation_scores(partition=None, metric=None)¶
Return a dictionary, keyed by metric, showing cross validation scores per partition.
Cross Validation should already have been performed using cross_validate or train.
Note
Models that computed cross validation before this feature was added will need to be deleted and retrained before this method can be used.
- Parameters
- partitionfloat
Optional. The id of the partition to filter results by (1, 2, 3.0, 4.0, etc.); may be a positive whole number integer or float value. 0 corresponds to the validation partition.
- metric: unicode
Optional. Name of the metric to filter the resulting cross validation scores by.
- Returns
- cross_validation_scores: dict
A dictionary keyed by metric showing cross validation scores per partition.
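Examples
A minimal usage sketch, assuming cross validation has already been run for this model; the project and model IDs are placeholders and 'RMSE' stands in for whichever metric the project uses:
import datarobot as dr

model = dr.Model.get('project-id', 'model-id')

# All metrics across all partitions
all_scores = model.get_cross_validation_scores()

# Only the RMSE scores, restricted to partition 2
rmse_partition_2 = model.get_cross_validation_scores(partition=2, metric='RMSE')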
- get_data_disparity_insights(feature, class_name1, class_name2)¶
Retrieve a list of Cross Class Data Disparity insights for the model.
- Parameters
- featurestr
Bias and Fairness protected feature name.
- class_name1str
One of the compared classes
- class_name2str
Another compared class
- Returns
- json
- get_fairness_insights(fairness_metrics_set=None, offset=0, limit=100)¶
Retrieve a list of Per Class Bias insights for the model.
- Parameters
- fairness_metrics_setstr, optional
Can be one of <datarobot.enums.FairnessMetricsSet>. The fairness metric used to calculate the fairness scores.
- offsetint, optional
Number of items to skip.
- limitint, optional
Number of items to return.
- Returns
- json
- get_feature_effect(source, data_slice_id=None)¶
Retrieve Feature Effects for the model.
Feature Effects provides partial dependence and predicted vs actual values for top-500 features ordered by feature impact score.
The partial dependence shows marginal effect of a feature on the target variable after accounting for the average effects of all other predictive features. It indicates how, holding all other variables except the feature of interest as they were, the value of this feature affects your prediction.
Requires that Feature Effects has already been computed with
request_feature_effect
.See
get_feature_effect_metadata
for retrieving information on the available sources.- Parameters
- sourcestring
The source Feature Effects are retrieved for.
- data_slice_idstring, optional
ID for the data slice used in the request. If None, retrieve unsliced insight data.
- Returns
- feature_effectsFeatureEffects
The feature effects data.
- Raises
- ClientError (404)
If the feature effects have not been computed or source is not a valid value.
- get_feature_effect_metadata()¶
Retrieve Feature Effects metadata. Response contains status and available model sources.
Feature Effect for the training partition is always available, with the exception of older projects that only supported Feature Effect for validation.
When a model is trained into validation or holdout without stacked predictions (i.e., no out-of-sample predictions in those partitions), Feature Effects is not available for validation or holdout.
Feature Effects for holdout is not available when holdout was not unlocked for the project.
Use source to retrieve Feature Effects, selecting one of the provided sources.
- Returns
- feature_effect_metadata: FeatureEffectMetadata
- get_feature_effects_multiclass(source='training', class_=None)¶
Retrieve Feature Effects for the multiclass model.
Feature Effects provide partial dependence and predicted vs actual values for top-500 features ordered by feature impact score.
The partial dependence shows marginal effect of a feature on the target variable after accounting for the average effects of all other predictive features. It indicates how, holding all other variables except the feature of interest as they were, the value of this feature affects your prediction.
Requires that Feature Effects has already been computed with
request_feature_effect
.See
get_feature_effect_metadata
for retrieving information on the available sources.- Parameters
- sourcestr
The source Feature Effects are retrieved for.
- class_str or None
The class name Feature Effects are retrieved for.
- Returns
- list
The list of multiclass feature effects.
- Raises
- ClientError (404)
If Feature Effects have not been computed or source is not a valid value.
- get_feature_impact(with_metadata=False, data_slice_filter=<datarobot.models.model.Sentinel object>)¶
Retrieve the computed Feature Impact results, a measure of the relevance of each feature in the model.
Feature Impact is computed for each column by creating new data with that column randomly permuted (but the others left unchanged), and seeing how the error metric score for the predictions is affected. The ‘impactUnnormalized’ is how much worse the error metric score is when making predictions on this modified data. The ‘impactNormalized’ is normalized so that the largest value is 1. In both cases, larger values indicate more important features.
If a feature is a redundant feature, i.e. it contributes little once other features are considered, the ‘redundantWith’ value is the name of the feature that has the highest correlation with this feature. Note that redundancy detection is only available for jobs run after the addition of this feature. When retrieving data that predates this functionality, a NoRedundancyImpactAvailable warning will be used.
Elsewhere this technique is sometimes called ‘Permutation Importance’.
Requires that Feature Impact has already been computed with
request_feature_impact
.- Parameters
- with_metadatabool
The flag indicating if the result should include the metadata as well.
- data_slice_filterDataSlice, optional
A dataslice used to filter the return values based on the dataslice.id. By default, this function will use data_slice_filter.id == None which returns an unsliced insight. If data_slice_filter is None then get_feature_impact will raise a ValueError.
- Returns
- list or dict
The feature impact data response depends on the with_metadata parameter. The response is either a dict with metadata and a list with actual data or just a list with that data.
Each list item is a dict with the keys featureName, impactNormalized, impactUnnormalized, redundantWith and count.
For the dict response, the available keys are:
- featureImpacts - Feature Impact data as a list. Each item is a dict with the keys featureName, impactNormalized, impactUnnormalized and redundantWith.
- shapBased - A boolean that indicates whether Feature Impact was calculated using Shapley values.
- ranRedundancyDetection - A boolean that indicates whether redundant feature identification was run while calculating this Feature Impact.
- rowCount - An integer or None that indicates the number of rows that was used to calculate Feature Impact. For Feature Impact calculated with the default logic, without specifying the rowCount, None is returned here.
- count - An integer with the number of features under featureImpacts.
- Raises
- ClientError (404)
If the feature impacts have not been computed.
- ValueError
If data_slice_filter passed as None
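Examples
An illustrative call pattern with placeholder IDs; it assumes Feature Impact has already been computed for this model:
import datarobot as dr

model = dr.Model.get('project-id', 'model-id')

# Plain list of per-feature impact records
feature_impacts = model.get_feature_impact()

# Same data plus metadata such as shapBased and ranRedundancyDetection
impact_with_metadata = model.get_feature_impact(with_metadata=True)

# Rank features by normalized impact, largest first
ranked = sorted(feature_impacts, key=lambda fi: fi['impactNormalized'], reverse=True)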
- get_features_used()¶
Query the server to determine which features were used.
Note that the data returned by this method is possibly different than the names of the features in the featurelist used by this model. This method will return the raw features that must be supplied in order for predictions to be generated on a new set of data. The featurelist, in contrast, would also include the names of derived features.
- Returns
- featureslist of str
The names of the features used in the model.
- Return type
List
[str
]
- get_frozen_child_models()¶
Retrieve the IDs for all models that are frozen from this model.
- Returns
- A list of Models
- get_labelwise_roc_curves(source, fallback_to_parent_insights=False)¶
Retrieve a list of LabelwiseRocCurve instances for a multilabel model for the given source and all labels. This method is valid only for multilabel projects. For binary projects, use the Model.get_roc_curve API.
New in version v2.24.
- Parameters
- sourcestr
ROC curve data source. Check datarobot.enums.CHART_DATA_SOURCE for possible values.
- fallback_to_parent_insightsbool
Optional, if True, this will return ROC curve data for this model’s parent if the ROC curve is not available for this model and the model has a defined parent model. If omitted or False, or there is no parent model, will not attempt to return data from this model’s parent.
- Returns
- list of LabelwiseRocCurve (datarobot.models.roc_curve.LabelwiseRocCurve)
Labelwise ROC Curve instances for
source
and all labels
- Raises
- ClientError
If the insight is not available for this model
- get_lift_chart(source, fallback_to_parent_insights=False, data_slice_filter=<datarobot.models.model.Sentinel object>)¶
Retrieve the model Lift chart for the specified source.
- Parameters
- sourcestr
Lift chart data source. Check datarobot.enums.CHART_DATA_SOURCE for possible values. (New in version v2.23) For time series and OTV models, also accepts values backtest_2, backtest_3, …, up to the number of backtests in the model.
- fallback_to_parent_insightsbool
(New in version v2.14) Optional, if True, this will return lift chart data for this model’s parent if the lift chart is not available for this model and the model has a defined parent model. If omitted or False, or there is no parent model, will not attempt to return insight data from this model’s parent.
- data_slice_filterDataSlice, optional
A dataslice used to filter the return values based on the dataslice.id. By default this function will use data_slice_filter.id == None which returns an unsliced insight. If data_slice_filter is None then get_lift_chart will raise a ValueError.
- Returns
- LiftChart
Model lift chart data
- Raises
- ClientError
If the insight is not available for this model
- ValueError
If data_slice_filter passed as None
- get_missing_report_info()¶
Retrieve a report on missing training data that can be used to understand missing values treatment in the model. The report consists of missing values resolutions for numeric or categorical features that were part of building the model.
- Returns
- An iterable of MissingReportPerFeature
The queried model missing report, sorted by missing count (DESCENDING order).
- get_model_blueprint_chart()¶
Retrieve a diagram that can be used to understand data flow in the blueprint.
- Returns
- ModelBlueprintChart
The queried model blueprint chart.
- get_model_blueprint_documents()¶
Get documentation for tasks used in this model.
- Returns
- list of BlueprintTaskDocument
All documents available for the model.
- get_model_blueprint_json()¶
Get the blueprint json representation used by this model.
- Returns
- BlueprintJson
Json representation of the blueprint stages.
- Return type
Dict
[str
,Tuple
[List
[str
],List
[str
],str
]]
- get_multiclass_feature_impact()¶
For multiclass it’s possible to calculate feature impact separately for each target class. The method for calculation is exactly the same, calculated in one-vs-all style for each target class.
Requires that Feature Impact has already been computed with
request_feature_impact
.- Returns
- feature_impactslist of dict
The feature impact data. Each item is a dict with the keys ‘featureImpacts’ (list), ‘class’ (str). Each item in ‘featureImpacts’ is a dict with the keys ‘featureName’, ‘impactNormalized’, and ‘impactUnnormalized’, and ‘redundantWith’.
- Raises
- ClientError (404)
If the multiclass feature impacts have not been computed.
- get_multiclass_lift_chart(source, fallback_to_parent_insights=False)¶
Retrieve model Lift chart for the specified source.
- Parameters
- sourcestr
Lift chart data source. Check datarobot.enums.CHART_DATA_SOURCE for possible values.
- fallback_to_parent_insightsbool
Optional, if True, this will return lift chart data for this model’s parent if the lift chart is not available for this model and the model has a defined parent model. If omitted or False, or there is no parent model, will not attempt to return insight data from this model’s parent.
- Returns
- list of LiftChart
Model lift chart data for each saved target class
- Raises
- ClientError
If the insight is not available for this model
- get_multilabel_lift_charts(source, fallback_to_parent_insights=False)¶
Retrieve model Lift charts for the specified source.
New in version v2.24.
- Parameters
- sourcestr
Lift chart data source. Check datarobot.enums.CHART_DATA_SOURCE for possible values.
- fallback_to_parent_insightsbool
Optional, if True, this will return lift chart data for this model’s parent if the lift chart is not available for this model and the model has a defined parent model. If omitted or False, or there is no parent model, will not attempt to return insight data from this model’s parent.
- Returns
- list of LiftChart
Model lift chart data for each saved target class
- Raises
- ClientError
If the insight is not available for this model
- get_num_iterations_trained()¶
Retrieves the number of estimators trained by early-stopping tree-based models.
New in version v2.22.
- Returns
- projectId: str
id of project containing the model
- modelId: str
id of the model
- data: array
list of numEstimatorsItem objects, one for each modeling stage.
- numEstimatorsItem will be of the form:
- stage: str
indicates the modeling stage (for multi-stage models); None for single-stage models
- numIterations: int
the number of estimators or iterations trained by the model
- get_or_request_feature_effect(source, max_wait=600, row_count=None, data_slice_id=None)¶
Retrieve Feature Effects for the model, requesting a new job if it hasn’t been run previously.
See
get_feature_effect_metadata
for retrieving information on the source.- Parameters
- sourcestring
The source Feature Effects are retrieved for.
- max_waitint, optional
The maximum time to wait for a requested Feature Effect job to complete before erroring.
- row_countint, optional
(New in version v2.21) The sample size to use for Feature Impact computation. Minimum is 10 rows. Maximum is 100000 rows or the training sample size of the model, whichever is less.
- data_slice_idstr, optional
ID for the data slice used in the request. If None, request unsliced insight data.
- Returns
- feature_effectsFeatureEffects
The Feature Effects data.
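Examples
A short sketch with placeholder IDs; valid source values can be discovered via get_feature_effect_metadata:
import datarobot as dr

model = dr.Model.get('project-id', 'model-id')

# Compute Feature Effects for the training partition if needed,
# waiting up to 10 minutes for the job to finish
feature_effects = model.get_or_request_feature_effect(source='training', max_wait=600)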
- get_or_request_feature_effects_multiclass(source, top_n_features=None, features=None, row_count=None, class_=None, max_wait=600)¶
Retrieve Feature Effects for the multiclass model, requesting a job if it hasn’t been run previously.
- Parameters
- sourcestring
The source Feature Effects are retrieved for.
- class_str or None
The class name Feature Effects are retrieved for.
- row_countint
The number of rows from dataset to use for Feature Impact calculation.
- top_n_featuresint or None
Number of top features (ranked by Feature Impact) used to calculate Feature Effects.
- featureslist or None
The list of features used to calculate Feature Effects.
- max_waitint, optional
The maximum time to wait for a requested Feature Effects job to complete before erroring.
- Returns
- feature_effectslist of FeatureEffectsMulticlass
The list of multiclass feature effects data.
- get_or_request_feature_impact(max_wait=600, **kwargs)¶
Retrieve feature impact for the model, requesting a job if it hasn’t been run previously
- Parameters
- max_waitint, optional
The maximum time to wait for a requested feature impact job to complete before erroring
- **kwargs
Arbitrary keyword arguments passed to
request_feature_impact
.
- Returns
- feature_impactslist or dict
The feature impact data. See
get_feature_impact
for the exact schema.
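Examples
A minimal sketch with placeholder IDs; the keyword arguments are forwarded to request_feature_impact:
import datarobot as dr

model = dr.Model.get('project-id', 'model-id')

# Compute Feature Impact if it has not been computed yet, then return it with metadata
feature_impacts = model.get_or_request_feature_impact(max_wait=900, with_metadata=True)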
- get_parameters()¶
Retrieve model parameters.
- Returns
- ModelParameters
Model parameters for this model.
- get_pareto_front()¶
Retrieve the Pareto Front for a Eureqa model.
This method is only supported for Eureqa models.
- Returns
- ParetoFront
Model ParetoFront data
- get_prime_eligibility()¶
Check if this model can be approximated with DataRobot Prime
- Returns
- prime_eligibilitydict
a dict indicating whether a model can be approximated with DataRobot Prime (key can_make_prime) and why it may be ineligible (key message)
- get_residuals_chart(source, fallback_to_parent_insights=False, data_slice_filter=<datarobot.models.model.Sentinel object>)¶
Retrieve model residuals chart for the specified source.
- Parameters
- sourcestr
Residuals chart data source. Check datarobot.enums.CHART_DATA_SOURCE for possible values.
- fallback_to_parent_insightsbool
Optional, if True, this will return residuals chart data for this model’s parent if the residuals chart is not available for this model and the model has a defined parent model. If omitted or False, or there is no parent model, will not attempt to return residuals data from this model’s parent.
- data_slice_filterDataSlice, optional
A dataslice used to filter the return values based on the dataslice.id. By default this function will use data_slice_filter.id == None which returns an unsliced insight. If data_slice_filter is None then get_residuals_chart will raise a ValueError.
- Returns
- ResidualsChart
Model residuals chart data
- Raises
- ClientError
If the insight is not available for this model
- ValueError
If data_slice_filter passed as None
- get_roc_curve(source, fallback_to_parent_insights=False, data_slice_filter=<datarobot.models.model.Sentinel object>)¶
Retrieve the ROC curve for a binary model for the specified source. This method is valid only for binary projects. For multilabel projects, use Model.get_labelwise_roc_curves.
- Parameters
- sourcestr
ROC curve data source. Check datarobot.enums.CHART_DATA_SOURCE for possible values. (New in version v2.23) For time series and OTV models, also accepts values backtest_2, backtest_3, …, up to the number of backtests in the model.
- fallback_to_parent_insightsbool
(New in version v2.14) Optional, if True, this will return ROC curve data for this model’s parent if the ROC curve is not available for this model and the model has a defined parent model. If omitted or False, or there is no parent model, will not attempt to return data from this model’s parent.
- data_slice_filterDataSlice, optional
A dataslice used to filter the return values based on the dataslice.id. By default this function will use data_slice_filter.id == None which returns an unsliced insight. If data_slice_filter is None then get_roc_curve will raise a ValueError.
- Returns
- RocCurve
Model ROC curve data
- Raises
- ClientError
If the insight is not available for this model
- (New in version v3.0) TypeError
If the underlying project type is multilabel
- ValueError
If data_slice_filter passed as None
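Examples
An illustrative retrieval with placeholder IDs, using the validation source:
import datarobot as dr

model = dr.Model.get('project-id', 'model-id')

# Unsliced ROC curve for the validation partition
roc = model.get_roc_curve(source='validation')

# ROC curve restricted to an existing data slice
sliced_roc = model.get_roc_curve(source='validation', data_slice_filter=dr.DataSlice(id='data-slice-id'))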
- get_rulesets()¶
List the rulesets approximating this model generated by DataRobot Prime
If this model hasn’t been approximated yet, will return an empty list. Note that these are rulesets approximating this model, not rulesets used to construct this model.
- Returns
- rulesetslist of Ruleset
- Return type
List
[Ruleset
]
- get_supported_capabilities()¶
Retrieves a summary of the capabilities supported by a model.
New in version v2.14.
- Returns
- supportsBlending: bool
whether the model supports blending
- supportsMonotonicConstraints: bool
whether the model supports monotonic constraints
- hasWordCloud: bool
whether the model has word cloud data available
- eligibleForPrime: bool
whether the model is eligible for Prime
- hasParameters: bool
whether the model has parameters that can be retrieved
- supportsCodeGeneration: bool
(New in version v2.18) whether the model supports code generation
- supportsShap: bool
(New in version v2.18) True if the model supports the Shapley package, i.e., Shapley-based feature importance.
- supportsEarlyStopping: bool
(New in version v2.22) True if this is an early stopping tree-based model and number of trained iterations can be retrieved.
- get_uri()¶
- Returns
- urlstr
Permanent static hyperlink to this model in the leaderboard.
- Return type
str
- get_word_cloud(exclude_stop_words=False)¶
Retrieve word cloud data for the model.
- Parameters
- exclude_stop_wordsbool, optional
Set to True if you want stopwords filtered out of response.
- Returns
- WordCloud
Word cloud data for the model.
- incremental_train(data_stage_id, training_data_name=None)¶
Submit a job to the queue to perform incremental training on an existing model. See train_incremental documentation.
- Return type
- classmethod list(project_id, sort_by_partition='validation', sort_by_metric=None, with_metric=None, search_term=None, featurelists=None, families=None, blueprints=None, labels=None, characteristics=None, training_filters=None, number_of_clusters=None, limit=100, offset=0)¶
Retrieve paginated model records, sorted by scores, with optional filtering.
- Parameters
- sort_by_partition: str, one of `validation`, `backtesting`, `crossValidation` or `holdout`
Set the partition to use for sorted (by score) list of models. validation is the default.
- sort_by_metric: str
Set the project metric to use for model sorting. DataRobot-selected project optimization metric is the default.
- with_metric: str
For a single-metric list of results, specify that project metric.
- search_term: str
If specified, only models containing the term in their name or processes are returned.
- featurelists: list of str
If specified, only models trained on selected featurelists are returned.
- families: list of str
If specified, only models belonging to selected families are returned.
- blueprints: list of str
If specified, only models trained on specified blueprint IDs are returned.
- labels: list of str, `starred` or `prepared for deployment`
If specified, only models tagged with all listed labels are returned.
- characteristics: list of str
If specified, only models matching all listed characteristics are returned.
- training_filters: list of str
If specified, only models matching at least one of the listed training conditions are returned. The following formats are supported for autoML and datetime partitioned projects:
- number of rows in training subset
For datetime partitioned projects:
- <training duration>, example P6Y0M0D
- <training_duration>-<time_window_sample_percent>-<sampling_method>, example P6Y0M0D-78-Random (returns models trained on 6 years of data, sampling rate 78%, random sampling)
- Start/end date
- Project settings
- number_of_clusters: list of int
Filter models by number of clusters. Applicable only in unsupervised clustering projects.
- limit: int
- offset: int
- Returns
- generic_models: list of GenericModel
- Return type
List
[GenericModel
]
- open_in_browser()¶
Opens the class’ relevant web browser location. If the default browser is not available, the URL is logged.
Note: If text-mode browsers are used, the calling process will block until the user exits the browser.
- Return type
None
- request_approximation()¶
Request an approximation of this model using DataRobot Prime
This will create several rulesets that could be used to approximate this model. After comparing their scores and rule counts, the code used in the approximation can be downloaded and run locally.
- Returns
- jobJob
the job generating the rulesets
- request_cross_class_accuracy_scores()¶
Request Cross Class Accuracy scores to be computed for the model.
- Returns
- status_idstr
A statusId of computation request.
- request_data_disparity_insights(feature, compared_class_names)¶
Request data disparity insights to be computed for the model.
- Parameters
- featurestr
Bias and Fairness protected feature name.
- compared_class_nameslist(str)
List of two classes to compare
- Returns
- status_idstr
A statusId of computation request.
- request_external_test(dataset_id, actual_value_column=None)¶
Request external test to compute scores and insights on an external test dataset
- Parameters
- dataset_idstring
The dataset to make predictions against (as uploaded from Project.upload_dataset)
- actual_value_columnstring, optional
(New in version v2.21) For time series unsupervised projects only. Actual value column can be used to calculate the classification metrics and insights on the prediction dataset. Can’t be provided with the
forecast_point
parameter.- Returns
- jobJob
a Job representing external dataset insights computation
- request_fairness_insights(fairness_metrics_set=None)¶
Request fairness insights to be computed for the model.
- Parameters
- fairness_metrics_setstr, optional
Can be one of <datarobot.enums.FairnessMetricsSet>. The fairness metric used to calculate the fairness scores.
- Returns
- status_idstr
A statusId of computation request.
- request_feature_effect(row_count=None, data_slice_id=None)¶
Submit request to compute Feature Effects for the model.
See
get_feature_effect
for more information on the result of the job.- Parameters
- row_countint
(New in version v2.21) The sample size to use for Feature Impact computation. Minimum is 10 rows. Maximum is 100000 rows or the training sample size of the model, whichever is less.
- data_slice_idstr, optional
ID for the data slice used in the request. If None, request unsliced insight data.
- Returns
- jobJob
A Job representing the feature effect computation. To get the completed feature effect data, use job.get_result or job.get_result_when_complete.
- Raises
- JobAlreadyRequested (422)
If the feature effect has already been requested.
- request_feature_effects_multiclass(row_count=None, top_n_features=None, features=None)¶
Request Feature Effects computation for the multiclass model.
See
get_feature_effect
for more information on the result of the job.- Parameters
- row_countint
The number of rows from dataset to use for Feature Impact calculation.
- top_n_featuresint or None
Number of top features (ranked by feature impact) used to calculate Feature Effects.
- featureslist or None
The list of features used to calculate Feature Effects.
- Returns
- jobJob
A Job representing Feature Effect computation. To get the completed Feature Effect data, use job.get_result or job.get_result_when_complete.
- request_feature_impact(row_count=None, with_metadata=False, data_slice_id=None)¶
Request feature impacts to be computed for the model.
See
get_feature_impact
for more information on the result of the job.- Parameters
- row_countint, optional
The sample size (specified in rows) to use for Feature Impact computation. This is not supported for unsupervised, multiclass (which has a separate method), and time series projects.
- with_metadatabool, optional
Flag indicating whether the result should include the metadata. If true, metadata is included.
- data_slice_idstr, optional
ID for the data slice used in the request. If None, request unsliced insight data.
- Returns
- jobJob or status_id
Job representing the Feature Impact computation. To retrieve the completed Feature Impact data, use job.get_result or job.get_result_when_complete.
- Raises
- JobAlreadyRequested (422)
If the feature impacts have already been requested.
- request_frozen_datetime_model(training_row_count=None, training_duration=None, training_start_date=None, training_end_date=None, time_window_sample_pct=None, sampling_method=None)¶
Train a new frozen model with parameters from this model.
Requires that this model belongs to a datetime partitioned project. If it does not, an error will occur when submitting the job.
Frozen models use the same tuning parameters as their parent model instead of independently optimizing them to allow efficiently retraining models on larger amounts of the training data.
In addition to training_row_count and training_duration, frozen datetime models may be trained on an exact date range. Only one of training_row_count, training_duration, or training_start_date and training_end_date should be specified.
Models specified using training_start_date and training_end_date are the only ones that can be trained into the holdout data (once the holdout is unlocked).
All durations should be specified with a duration string such as those returned by the
partitioning_methods.construct_duration_string
helper method. Please see datetime partitioned project documentation for more information on duration strings.- Parameters
- training_row_countint, optional
the number of rows of data that should be used to train the model. If specified, training_duration may not be specified.
- training_durationstr, optional
a duration string specifying what time range the data used to train the model should span. If specified, training_row_count may not be specified.
- training_start_datedatetime.datetime, optional
the start date of the data to train to model on. Only rows occurring at or after this datetime will be used. If training_start_date is specified, training_end_date must also be specified.
- training_end_datedatetime.datetime, optional
the end date of the data to train the model on. Only rows occurring strictly before this datetime will be used. If training_end_date is specified, training_start_date must also be specified.
- time_window_sample_pctint, optional
may only be specified when the requested model is a time window (e.g. duration or start and end dates). An integer between 1 and 99 indicating the percentage to sample by within the window. The points kept are determined by a random uniform sample. If specified, training_duration must also be specified, otherwise an error will occur.
- sampling_methodstr, optional
(New in version v2.23) defines the way training data is selected. Can be either
random
orlatest
. In combination withtraining_row_count
defines how rows are selected from backtest (latest
by default). When training data is defined using time range (training_duration
oruse_project_settings
) this setting changes the waytime_window_sample_pct
is applied (random
by default). Applicable to OTV projects only.
- Returns
- model_jobModelJob
the modeling job training a frozen model
- Return type
- request_frozen_model(sample_pct=None, training_row_count=None)¶
Train a new frozen model with parameters from this model
Note
This method only works if the project the model belongs to is not datetime partitioned. If it is, use
request_frozen_datetime_model
instead.Frozen models use the same tuning parameters as their parent model instead of independently optimizing them to allow efficiently retraining models on larger amounts of the training data.
- Parameters
- sample_pctfloat
optional, the percentage of the dataset to use with the model. If not provided, will use the value from this model.
- training_row_countint
(New in version v2.9) optional, the integer number of rows of the dataset to use with the model. Only one of sample_pct and training_row_count should be specified.
- Returns
- model_jobModelJob
the modeling job training a frozen model
- Return type
- request_lift_chart(source, data_slice_id=None)¶
Request the model Lift Chart for the specified source.
- Parameters
- sourcestr
Lift chart data source. Check datarobot.enums.CHART_DATA_SOURCE for possible values.
- data_slice_idstring, optional
ID for the data slice used in the request. If None, request unsliced insight data.
- Returns
- status_check_jobStatusCheckJob
Object contains all needed logic for a periodical status check of an async job.
- Return type
- request_predictions(dataset_id=None, dataset=None, dataframe=None, file_path=None, file=None, include_prediction_intervals=None, prediction_intervals_size=None, forecast_point=None, predictions_start_date=None, predictions_end_date=None, actual_value_column=None, explanation_algorithm=None, max_explanations=None, max_ngram_explanations=None)¶
Requests predictions against a previously uploaded dataset.
- Parameters
- dataset_idstring, optional
The ID of the dataset to make predictions against (as uploaded from Project.upload_dataset)
- dataset
Dataset
, optional The dataset to make predictions against (as uploaded from Project.upload_dataset)
- dataframepd.DataFrame, optional
(New in v3.0) The dataframe to make predictions against
- file_pathstr, optional
(New in v3.0) Path to file to make predictions against
- fileIOBase, optional
(New in v3.0) File to make predictions against
- include_prediction_intervalsbool, optional
(New in v2.16) For time series projects only. Specifies whether prediction intervals should be calculated for this request. Defaults to True if prediction_intervals_size is specified, otherwise defaults to False.
- prediction_intervals_sizeint, optional
(New in v2.16) For time series projects only. Represents the percentile to use for the size of the prediction intervals. Defaults to 80 if include_prediction_intervals is True. Prediction intervals size must be between 1 and 100 (inclusive).
- forecast_pointdatetime.datetime or None, optional
(New in version v2.20) For time series projects only. This is the default point relative to which predictions will be generated, based on the forecast window of the project. See the time series prediction documentation for more information.
- predictions_start_datedatetime.datetime or None, optional
(New in version v2.20) For time series projects only. The start date for bulk predictions. Note that this parameter is for generating historical predictions using the training data. This parameter should be provided in conjunction with
predictions_end_date
. Can’t be provided with theforecast_point
parameter.- predictions_end_datedatetime.datetime or None, optional
(New in version v2.20) For time series projects only. The end date for bulk predictions, exclusive. Note that this parameter is for generating historical predictions using the training data. This parameter should be provided in conjunction with
predictions_start_date
. Can’t be provided with theforecast_point
parameter.- actual_value_columnstring, optional
(New in version v2.21) For time series unsupervised projects only. Actual value column can be used to calculate the classification metrics and insights on the prediction dataset. Can’t be provided with the
forecast_point
parameter.- explanation_algorithm: (New in version v2.21) optional; If set to ‘shap’, the
response will include prediction explanations based on the SHAP explainer (SHapley Additive exPlanations). Defaults to null (no prediction explanations).
- max_explanations: (New in version v2.21) int optional; specifies the maximum number of
explanation values that should be returned for each row, ordered by absolute value, greatest to least. If null, no limit. In the case of ‘shap’: if the number of features is greater than the limit, the sum of remaining values will also be returned as shapRemainingTotal. Defaults to null. Cannot be set if explanation_algorithm is omitted.
- max_ngram_explanations: optional; int or str
(New in version v2.29) Specifies the maximum number of text explanation values that should be returned. If set to all, text explanations will be computed and all the ngram explanations will be returned. If set to a non-zero positive integer value, text explanations will be computed and that number of ngram explanations, sorted in descending order, will be returned. By default, text explanations are not computed.
- Returns
- jobPredictJob
The job computing the predictions
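Examples
A hedged sketch of the most common flow, scoring a dataset previously uploaded to the project (IDs and the file path are placeholders):
import datarobot as dr

project = dr.Project.get('project-id')
model = dr.Model.get('project-id', 'model-id')

# Upload a scoring dataset, then request predictions from this model
prediction_dataset = project.upload_dataset('./data_to_predict.csv')
predict_job = model.request_predictions(dataset_id=prediction_dataset.id)

# Wait for the job to finish and fetch the predictions as a DataFrame
predictions = predict_job.get_result_when_complete()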
- Return type
- request_residuals_chart(source, data_slice_id=None)¶
Request the model residuals chart for the specified source.
- Parameters
- sourcestr
Residuals chart data source. Check datarobot.enums.CHART_DATA_SOURCE for possible values.
- data_slice_idstring, optional
ID for the data slice used in the request. If None, request unsliced insight data.
- Returns
- status_check_jobStatusCheckJob
Object contains all needed logic for a periodical status check of an async job.
- Return type
- request_roc_curve(source, data_slice_id=None)¶
Request the model Roc Curve for the specified source.
- Parameters
- sourcestr
Roc Curve data source. Check datarobot.enums.CHART_DATA_SOURCE for possible values.
- data_slice_idstring, optional
ID for the data slice used in the request. If None, request unsliced insight data.
- Returns
- status_check_jobStatusCheckJob
Object contains all needed logic for a periodical status check of an async job.
- Return type
- request_training_predictions(data_subset, explanation_algorithm=None, max_explanations=None)¶
Start a job to build training predictions
- Parameters
- data_subsetstr
data set definition to build predictions on. Choices are:
- dr.enums.DATA_SUBSET.ALL or string all for all data available. Not valid for models in datetime partitioned projects
- dr.enums.DATA_SUBSET.VALIDATION_AND_HOLDOUT or string validationAndHoldout for all data except training set. Not valid for models in datetime partitioned projects
- dr.enums.DATA_SUBSET.HOLDOUT or string holdout for holdout data set only
- dr.enums.DATA_SUBSET.ALL_BACKTESTS or string allBacktests for downloading the predictions for all backtest validation folds. Requires the model to have successfully scored all backtests. Datetime partitioned projects only.
- explanation_algorithmdr.enums.EXPLANATIONS_ALGORITHM
(New in v2.21) Optional. If set to dr.enums.EXPLANATIONS_ALGORITHM.SHAP, the response will include prediction explanations based on the SHAP explainer (SHapley Additive exPlanations). Defaults to None (no prediction explanations).
- max_explanationsint
(New in v2.21) Optional. Specifies the maximum number of explanation values that should be returned for each row, ordered by absolute value, greatest to least. In the case of dr.enums.EXPLANATIONS_ALGORITHM.SHAP: If not set, explanations are returned for all features. If the number of features is greater than the
max_explanations
, the sum of remaining values will also be returned asshap_remaining_total
. Max 100. Defaults to null for datasets narrower than 100 columns, defaults to 100 for datasets wider than 100 columns. Is ignored ifexplanation_algorithm
is not set.
- Returns
- Job
an instance of created async job
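Examples
A brief sketch with placeholder IDs, requesting training predictions on the holdout partition:
import datarobot as dr

model = dr.Model.get('project-id', 'model-id')

# Start the training-predictions job for the holdout data only
job = model.request_training_predictions(dr.enums.DATA_SUBSET.HOLDOUT)
training_predictions = job.get_result_when_complete()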
- retrain(sample_pct=None, featurelist_id=None, training_row_count=None, n_clusters=None)¶
Submit a job to the queue to retrain this model on a different sample size, featurelist, or number of clusters.
- Parameters
- sample_pct: float, optional
The sample size as a percentage (1 to 100) to use in training. If this parameter is used, then training_row_count should not be given.
- featurelist_idstr, optional
The featurelist id
- training_row_countint, optional
The number of rows used to train the model. If this parameter is used, then sample_pct should not be given.
- n_clusters: int, optional
(new in version 2.27) number of clusters to use in an unsupervised clustering model. This parameter is used only for unsupervised clustering models that do not determine the number of clusters automatically.
- Returns
- jobModelJob
The created job that is retraining the model
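Examples
For example, with placeholder IDs (only one of sample_pct or training_row_count should be supplied):
import datarobot as dr

model = dr.Model.get('project-id', 'model-id')

# Retrain the same blueprint and featurelist on 80% of the data
model_job = model.retrain(sample_pct=80)
new_model = model_job.get_result_when_complete()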
- Return type
- set_prediction_threshold(threshold)¶
Set a custom prediction threshold for the model.
May not be used once
prediction_threshold_read_only
is True for this model.- Parameters
- thresholdfloat
Only used for binary classification projects. The threshold to use when deciding between the positive and negative classes when making predictions. Should be between 0.0 and 1.0 (inclusive).
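Examples
For instance, for a binary classification project (placeholder IDs):
import datarobot as dr

model = dr.Model.get('project-id', 'model-id')

# Predict the positive class only when its probability exceeds 0.7
model.set_prediction_threshold(0.7)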
- star_model()¶
Mark the model as starred.
Model stars propagate to the web application and the API, and can be used to filter when listing models.
- Return type
None
- start_advanced_tuning_session()¶
Start an Advanced Tuning session. Returns an object that helps set up arguments for an Advanced Tuning model execution.
As of v2.17, all models other than blenders, open source, prime, baseline and user-created support Advanced Tuning.
- Returns
- AdvancedTuningSession
Session for setting up and running Advanced Tuning on a model
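Examples
A sketch of the end-to-end flow; the parameter name 'colsample_bytree' is illustrative and must actually exist on one of the model's tasks:
import datarobot as dr

model = dr.Model.get('project-id', 'model-id')

tune = model.start_advanced_tuning_session()
tune.description = 'Lower column sampling rate'

# Inspect the tunable tasks, set a parameter value, then submit the tuned model
print(tune.get_task_names())
tune.set_parameter(parameter_name='colsample_bytree', value=0.5)
model_job = tune.run()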
- train(sample_pct=None, featurelist_id=None, scoring_type=None, training_row_count=None, monotonic_increasing_featurelist_id=<object object>, monotonic_decreasing_featurelist_id=<object object>)¶
Train the blueprint used in model on a particular featurelist or amount of data.
This method creates a new training job for worker and appends it to the end of the queue for this project. After the job has finished you can get the newly trained model by retrieving it from the project leaderboard, or by retrieving the result of the job.
Either sample_pct or training_row_count can be used to specify the amount of data to use, but not both. If neither are specified, a default of the maximum amount of data that can safely be used to train any blueprint without going into the validation data will be selected.
In smart-sampled projects, sample_pct and training_row_count are assumed to be in terms of rows of the minority class.
Note
For datetime partitioned projects, see
train_datetime
instead.- Parameters
- sample_pctfloat, optional
The amount of data to use for training, as a percentage of the project dataset from 0 to 100.
- featurelist_idstr, optional
The identifier of the featurelist to use. If not defined, the featurelist of this model is used.
- scoring_typestr, optional
Either
validation
orcrossValidation
(alsodr.SCORING_TYPE.validation
ordr.SCORING_TYPE.cross_validation
).validation
is available for every partitioning type, and indicates that the default model validation should be used for the project. If the project uses a form of cross-validation partitioning,crossValidation
can also be used to indicate that all of the available training/validation combinations should be used to evaluate the model.- training_row_countint, optional
The number of rows to use to train the requested model.
- monotonic_increasing_featurelist_idstr
(new in version 2.11) optional, the id of the featurelist that defines the set of features with a monotonically increasing relationship to the target. Passing
None
disables increasing monotonicity constraint. Default (dr.enums.MONOTONICITY_FEATURELIST_DEFAULT
) is the one specified by the blueprint.- monotonic_decreasing_featurelist_idstr
(new in version 2.11) optional, the id of the featurelist that defines the set of features with a monotonically decreasing relationship to the target. Passing
None
disables decreasing monotonicity constraint. Default (dr.enums.MONOTONICITY_FEATURELIST_DEFAULT
) is the one specified by the blueprint.
- Returns
- model_job_idstr
id of created job, can be used as parameter to
ModelJob.get
method orwait_for_async_model_creation
function
Examples
project = Project.get('project-id')
model = Model.get('project-id', 'model-id')
model_job_id = model.train(training_row_count=project.max_train_rows)
- Return type
str
- train_datetime(featurelist_id=None, training_row_count=None, training_duration=None, time_window_sample_pct=None, monotonic_increasing_featurelist_id=<object object>, monotonic_decreasing_featurelist_id=<object object>, use_project_settings=False, sampling_method=None, n_clusters=None)¶
Trains this model on a different featurelist or sample size.
Requires that this model is part of a datetime partitioned project; otherwise, an error will occur.
All durations should be specified with a duration string such as those returned by the
partitioning_methods.construct_duration_string
helper method. Please see datetime partitioned project documentation for more information on duration strings.- Parameters
- featurelist_idstr, optional
the featurelist to use to train the model. If not specified, the featurelist of this model is used.
- training_row_countint, optional
the number of rows of data that should be used to train the model. If specified, neither
training_duration
noruse_project_settings
may be specified.- training_durationstr, optional
a duration string specifying what time range the data used to train the model should span. If specified, neither
training_row_count
noruse_project_settings
may be specified.- use_project_settingsbool, optional
(New in version v2.20) defaults to
False
. IfTrue
, indicates that the custom backtest partitioning settings specified by the user will be used to train the model and evaluate backtest scores. If specified, neithertraining_row_count
nortraining_duration
may be specified.- time_window_sample_pctint, optional
may only be specified when the requested model is a time window (e.g. duration or start and end dates). An integer between 1 and 99 indicating the percentage to sample by within the window. The points kept are determined by a random uniform sample. If specified, training_duration must also be specified, otherwise an error will occur.
- sampling_methodstr, optional
(New in version v2.23) defines the way training data is selected. Can be either
random
orlatest
. In combination withtraining_row_count
defines how rows are selected from backtest (latest
by default). When training data is defined using time range (training_duration
oruse_project_settings
) this setting changes the waytime_window_sample_pct
is applied (random
by default). Applicable to OTV projects only.- monotonic_increasing_featurelist_idstr, optional
(New in version v2.18) optional, the id of the featurelist that defines the set of features with a monotonically increasing relationship to the target. Passing
None
disables increasing monotonicity constraint. Default (dr.enums.MONOTONICITY_FEATURELIST_DEFAULT
) is the one specified by the blueprint.- monotonic_decreasing_featurelist_idstr, optional
(New in version v2.18) optional, the id of the featurelist that defines the set of features with a monotonically decreasing relationship to the target. Passing
None
disables decreasing monotonicity constraint. Default (dr.enums.MONOTONICITY_FEATURELIST_DEFAULT
) is the one specified by the blueprint.- n_clusters: int, optional
(New in version 2.27) number of clusters to use in an unsupervised clustering model. This parameter is used only for unsupervised clustering models that don’t automatically determine the number of clusters.
- Returns
- jobModelJob
the created job to build the model
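Examples
An illustrative retraining for a datetime partitioned project (placeholder IDs; the duration string follows the format described above and can be built with the partitioning_methods.construct_duration_string helper):
import datarobot as dr

model = dr.Model.get('project-id', 'model-id')

# Retrain this blueprint on two years of data, sampling 50% of rows within the window
model_job = model.train_datetime(training_duration='P2Y0M0D', time_window_sample_pct=50)
new_model = model_job.get_result_when_complete()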
- Return type
- train_incremental(data_stage_id, training_data_name=None, data_stage_encoding=None, data_stage_delimiter=None, data_stage_compression=None)¶
Submit a job to the queue to perform incremental training on an existing model using additional data. The id of the additional data to use for training is specified with the data_stage_id. Optionally a name for the iteration can be supplied by the user to help identify the contents of data in the iteration.
This functionality requires the INCREMENTAL_LEARNING feature flag to be enabled.
- Parameters
- data_stage_id: str
The id of the data stage to use for training.
- training_data_namestr, optional
The name of the iteration or data stage to indicate what the incremental learning was performed on.
- data_stage_encodingstr, optional
The encoding type of the data in the data stage (default: UTF-8). Supported formats: UTF-8, ASCII, WINDOWS1252
- data_stage_delimiterstr, optional
The delimiter used by the data in the data stage (default: ‘,’).
- data_stage_compressionstr, optional
The compression type of the data stage file, e.g. ‘zip’ (default: None). Supported formats: zip
- Returns
- jobModelJob
The created job that is retraining the model
- unstar_model()¶
Unmark the model as starred.
Model stars propagate to the web application and the API, and can be used to filter when listing models.
- Return type
None
Combined Model¶
See API reference for Combined Model in Segmented Modeling API Reference
Advanced Tuning¶
- class datarobot.models.advanced_tuning.AdvancedTuningSession(model)¶
A session enabling users to configure and run advanced tuning for a model.
Every model contains a set of one or more tasks. Every task contains a set of zero or more parameters. This class allows tuning the values of each parameter on each task of a model, before running that model.
This session is client-side only and is not persistent. Only the final model, constructed when run is called, is persisted on the DataRobot server.
- Attributes
- descriptionstr
Description for the new advance-tuned model. Defaults to the same description as the base model.
- get_task_names()¶
Get the list of task names that are available for this model
- Returns
- list(str)
List of task names
- Return type
List
[str
]
- get_parameter_names(task_name)¶
Get the list of parameter names available for a specific task
- Returns
- list(str)
List of parameter names
- Return type
List
[str
]
- set_parameter(value, task_name=None, parameter_name=None, parameter_id=None)¶
Set the value of a parameter to be used
The caller must supply enough of the optional arguments to this function to uniquely identify the parameter that is being set. For example, a less-common parameter name such as ‘building_block__complementary_error_function’ might only be used once (if at all) by a single task in a model, in which case it may be sufficient to simply specify ‘parameter_name’. But a more-common name such as ‘random_seed’ might be used by several of the model’s tasks, and it may be necessary to also specify ‘task_name’ to clarify which task’s random seed is to be set. This function only affects client-side state. It will not check that the new parameter value(s) are valid.
- Parameters
- task_namestr
Name of the task whose parameter needs to be set
- parameter_namestr
Name of the parameter to set
- parameter_idstr
ID of the parameter to set
- valueint, float, list, or str
New value for the parameter, with legal values determined by the parameter being set
- Raises
- NoParametersFoundException
if no matching parameters are found.
- NonUniqueParametersException
if multiple parameters matched the specified filtering criteria
- Return type
None
- get_parameters()¶
Returns the set of parameters available to this model
The returned parameters have one additional key, “value”, reflecting any new values that have been set in this AdvancedTuningSession. When the session is run, “value” will be used, or if it is unset, “current_value”.
- Returns
- parametersdict
“Parameters” dictionary, same as specified on Model.get_advanced_tuning_params.
- An additional field is added per parameter to the ‘tuning_parameters’ list in the dictionary:
- valueint, float, list, or str
The current value of the parameter. None if none has been specified.
- Return type
ModelJob¶
- datarobot.models.modeljob.wait_for_async_model_creation(project_id, model_job_id, max_wait=600)¶
Given a Project id and a ModelJob id, poll the status of the process responsible for model creation until the model is created.
- Parameters
- project_idstr
The identifier of the project
- model_job_idstr
The identifier of the ModelJob
- max_waitint, optional
Time in seconds after which model creation is considered unsuccessful
- Returns
- modelModel
Newly created model
- Raises
- AsyncModelCreationError
Raised if status of fetched ModelJob object is
error
- AsyncTimeoutError
Model wasn’t created in time, specified by
max_wait
parameter
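Examples
A minimal sketch, assuming a modeling job has just been queued for an existing project (IDs are placeholders):
import datarobot as dr
from datarobot.models.modeljob import wait_for_async_model_creation

project = dr.Project.get('project-id')
blueprint = project.get_blueprints()[0]
model_job_id = project.train(blueprint)

# Block until the job finishes (up to 10 minutes), then work with the new model
model = wait_for_async_model_creation(project.id, model_job_id, max_wait=600)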
- Return type
- class datarobot.models.ModelJob(data, completed_resource_url=None)¶
Tracks asynchronous work being done within a project
- Attributes
- idint
the id of the job
- project_idstr
the id of the project the job belongs to
- statusstr
the status of the job - will be one of
datarobot.enums.QUEUE_STATUS
- job_typestr
what kind of work the job is doing - will be ‘model’ for modeling jobs
- is_blockedbool
if true, the job is blocked (cannot be executed) until its dependencies are resolved
- sample_pctfloat
the percentage of the project’s dataset used in this modeling job
- model_typestr
the model this job builds (e.g. ‘Nystroem Kernel SVM Regressor’)
- processeslist of str
the processes used by the model
- featurelist_idstr
the id of the featurelist used in this modeling job
- blueprintBlueprint
the blueprint used in this modeling job
- classmethod from_job(job)¶
Transforms a generic Job into a ModelJob
- Parameters
- job: Job
A generic job representing a ModelJob
- Returns
- model_job: ModelJob
A fully populated ModelJob with all the details of the job
- Raises
- ValueError:
If the generic Job was not a model job, e.g. job_type != JOB_TYPE.MODEL
- Return type
- classmethod get(project_id, model_job_id)¶
Fetches one ModelJob. If the job finished, raises PendingJobFinished exception.
- Parameters
- project_idstr
The identifier of the project the model belongs to
- model_job_idstr
The identifier of the model_job
- Returns
- model_jobModelJob
The pending ModelJob
- Raises
- PendingJobFinished
If the job being queried already finished, and the server is re-routing to the finished model.
- AsyncFailureError
Querying this resource gave a status code other than 200 or 303
- Return type
- classmethod get_model(project_id, model_job_id)¶
Fetches a finished model from the job used to create it.
- Parameters
- project_idstr
The identifier of the project the model belongs to
- model_job_idstr
The identifier of the model_job
- Returns
- modelModel
The finished model
- Raises
- JobNotFinished
If the job has not finished yet
- AsyncFailureError
Querying the model_job in question gave a status code other than 200 or 303
- Return type
- cancel()¶
Cancel this job. If this job has not finished running, it will be removed and canceled.
- get_result(params=None)¶
- Parameters
- paramsdict or None
Query parameters to be added to request to get results.
- For featureEffects, the source param is required to define the source; otherwise the default is `training`.
- Returns
- resultobject
- Return type depends on the job type:
for model jobs, a Model is returned
for predict jobs, a pandas.DataFrame (with predictions) is returned
for featureImpact jobs, a list of dicts by default (see
with_metadata
parameter of theFeatureImpactJob
class and itsget()
method).for primeRulesets jobs, a list of Rulesets
for primeModel jobs, a PrimeModel
for primeDownloadValidation jobs, a PrimeFile
for predictionExplanationInitialization jobs, a PredictionExplanationsInitialization
for predictionExplanations jobs, a PredictionExplanations
for featureEffects, a FeatureEffects
- Raises
- JobNotFinished
If the job is not finished, the result is not available.
- AsyncProcessUnsuccessfulError
If the job errored or was aborted
- get_result_when_complete(max_wait=600, params=None)¶
- Parameters
- max_waitint, optional
How long to wait for the job to finish.
- paramsdict, optional
Query parameters to be added to request.
- Returns
- result: object
Return type is the same as would be returned by Job.get_result.
- Raises
- AsyncTimeoutError
If the job does not finish in time
- AsyncProcessUnsuccessfulError
If the job errored or was aborted
- refresh()¶
Update this object with the latest job data from the server.
- wait_for_completion(max_wait=600)¶
Waits for job to complete.
- Parameters
- max_waitint, optional
How long to wait for the job to finish.
- Return type
None
Pareto Front¶
- class datarobot.models.pareto_front.ParetoFront(project_id, error_metric, hyperparameters, target_type, solutions)¶
Pareto front data for a Eureqa model.
The pareto front reflects the tradeoffs between error and complexity for a particular model. The solutions reflect possible Eureqa models at different levels of complexity. By default, only one solution will have a corresponding model, but models can be created for each solution.
- Attributes
- project_idstr
the ID of the project the model belongs to
- error_metricstr
Eureqa error-metric identifier used to compute error metrics for this search. Note that Eureqa error metrics do NOT correspond 1:1 with DataRobot error metrics – the available metrics are not the same, and are computed from a subset of the training data rather than from the validation data.
- hyperparametersdict
Hyperparameters used by this run of the Eureqa blueprint
- target_typestr
Indicates what kind of modeling is being done in this project: ‘Regression’, ‘Binary’ (binary classification), or ‘Multiclass’ (multiclass classification).
- solutionslist(Solution)
Solutions that Eureqa has found to model this data. Some solutions will have greater accuracy. Others will have slightly less accuracy but will use simpler expressions.
- classmethod from_server_data(data, keep_attrs=None)¶
Instantiate an object of this class using the data directly from the server, meaning that the keys may have the wrong camel casing
- Parameters
- datadict
The directly translated dict of JSON from the server. No casing fixes have taken place
- keep_attrslist
List of the dotted namespace notations for attributes to keep within the object structure even if their values are None
- class datarobot.models.pareto_front.Solution(eureqa_solution_id, complexity, error, expression, expression_annotated, best_model, project_id)¶
Eureqa Solution.
A solution represents a possible Eureqa model; however not all solutions have models associated with them. It must have a model created before it can be used to make predictions, etc.
- Attributes
- eureqa_solution_id: str
ID of this Solution
- complexity: int
Complexity score for this solution. Complexity score is a function of the mathematical operators used in the current solution. The Complexity calculation can be tuned via model hyperparameters.
- error: float or None
Error for the current solution, as computed by Eureqa using the ‘error_metric’ error metric. It will be None if the model was refitted from an existing solution.
- expression: str
Eureqa model equation string.
- expression_annotated: str
Eureqa model equation string with variable names tagged for easy identification.
- best_model: bool
True, if the model is determined to be the best
- create_model()¶
Add this solution to the leaderboard, if it is not already present.
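As an illustrative sketch, assuming model is a Eureqa model retrieved from the leaderboard (e.g. via Model.get), the pareto front and its solutions can be inspected like this:
# Retrieve the pareto front from a Eureqa model and inspect its solutions.
pareto_front = model.get_pareto_front()
for solution in pareto_front.solutions:
    print(solution.complexity, solution.error, solution.expression)

# Add a solution to the leaderboard if it does not have a model yet.
pareto_front.solutions[0].create_model()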
Partitioning¶
- class datarobot.RandomCV(holdout_pct, reps, seed=0)¶
A partition in which observations are randomly assigned to cross-validation groups and the holdout set.
- Parameters
- holdout_pctint
the desired percentage of dataset to assign to holdout set
- repsint
number of cross validation folds to use
- seedint
a seed to use for randomization
- class datarobot.StratifiedCV(holdout_pct, reps, seed=0)¶
A partition in which observations are randomly assigned to cross-validation groups and the holdout set, preserving in each group the same ratio of positive to negative cases as in the original data.
- Parameters
- holdout_pctint
the desired percentage of dataset to assign to holdout set
- repsint
number of cross validation folds to use
- seedint
a seed to use for randomization
- class datarobot.GroupCV(holdout_pct, reps, partition_key_cols, seed=0)¶
A partition in which one column is specified, and rows sharing a common value for that column are guaranteed to stay together in the partitioning into cross-validation groups and the holdout set.
- Parameters
- holdout_pctint
the desired percentage of dataset to assign to holdout set
- repsint
number of cross validation folds to use
- partition_key_colslist
a list containing a single string, where the string is the name of the column whose values should remain together in partitioning
- seedint
a seed to use for randomization
- class datarobot.UserCV(user_partition_col, cv_holdout_level, seed=0)¶
A partition where the cross-validation folds and the holdout set are specified by the user.
- Parameters
- user_partition_colstring
the name of the column containing the partition assignments
- cv_holdout_level
the value of the partition column indicating a row is part of the holdout set
- seedint
a seed to use for randomization
- class datarobot.RandomTVH(holdout_pct, validation_pct, seed=0)¶
Specifies a partitioning method in which rows are randomly assigned to training, validation, and holdout.
- Parameters
- holdout_pctint
the desired percentage of dataset to assign to holdout set
- validation_pctint
the desired percentage of dataset to assign to validation set
- seedint
a seed to use for randomization
- class datarobot.UserTVH(user_partition_col, training_level, validation_level, holdout_level, seed=0)¶
Specifies a partitioning method in which rows are assigned by the user to training, validation, and holdout sets.
- Parameters
- user_partition_colstring
the name of the column containing the partition assignments
- training_level
the value of the partition column indicating a row is part of the training set
- validation_level
the value of the partition column indicating a row is part of the validation set
- holdout_level
the value of the partition column indicating a row is part of the holdout set (use None if you want no holdout set)
- seedint
a seed to use for randomization
- class datarobot.StratifiedTVH(holdout_pct, validation_pct, seed=0)¶
A partition in which observations are randomly assigned to train, validation, and holdout sets, preserving in each group the same ratio of positive to negative cases as in the original data.
- Parameters
- holdout_pctint
the desired percentage of dataset to assign to holdout set
- validation_pctint
the desired percentage of dataset to assign to validation set
- seedint
a seed to use for randomization
- class datarobot.GroupTVH(holdout_pct, validation_pct, partition_key_cols, seed=0)¶
A partition in which one column is specified, and rows sharing a common value for that column are guaranteed to stay together in the partitioning into the training, validation, and holdout sets.
- Parameters
- holdout_pctint
the desired percentage of dataset to assign to holdout set
- validation_pctint
the desired percentage of dataset to assign to validation set
- partition_key_colslist
a list containing a single string, where the string is the name of the column whose values should remain together in partitioning
- seedint
a seed to use for randomization
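A minimal sketch of using one of these partitioning classes when setting the target, assuming project is an existing datarobot.Project and "is_churn" is a hypothetical target column:
import datarobot as dr

# 5-fold stratified cross validation with a 20% holdout.
partitioning = dr.StratifiedCV(holdout_pct=20, reps=5, seed=42)
project.analyze_and_model(target="is_churn", partitioning_method=partitioning)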
- class datarobot.DatetimePartitioningSpecification(datetime_partition_column, autopilot_data_selection_method=None, validation_duration=None, holdout_start_date=None, holdout_duration=None, disable_holdout=None, gap_duration=None, number_of_backtests=None, backtests=None, use_time_series=False, default_to_known_in_advance=False, default_to_do_not_derive=False, feature_derivation_window_start=None, feature_derivation_window_end=None, feature_settings=None, forecast_window_start=None, forecast_window_end=None, windows_basis_unit=None, treat_as_exponential=None, differencing_method=None, periodicities=None, multiseries_id_columns=None, use_cross_series_features=None, aggregation_type=None, cross_series_group_by_columns=None, calendar_id=None, holdout_end_date=None, unsupervised_mode=False, model_splits=None, allow_partial_history_time_series_predictions=False, unsupervised_type=None)¶
Uniquely defines a DatetimePartitioning for some project
Includes only the attributes of DatetimePartitioning that are directly controllable by users, not those determined by the DataRobot application based on the project dataset and the user-controlled settings.
This is the specification that should be passed to Project.analyze_and_model via the partitioning_method parameter. To see the full partitioning based on the project dataset, use DatetimePartitioning.generate.
All durations should be specified with a duration string such as those returned by the partitioning_methods.construct_duration_string helper method. Please see the datetime partitioned project documentation for more information on duration strings.
Note that either (holdout_start_date, holdout_duration) or (holdout_start_date, holdout_end_date) can be used to specify holdout partitioning settings.
- Attributes
- datetime_partition_columnstr
the name of the column whose values as dates are used to assign a row to a particular partition
- autopilot_data_selection_methodstr
one of datarobot.enums.DATETIME_AUTOPILOT_DATA_SELECTION_METHOD. Whether models created by the autopilot should use “rowCount” or “duration” as their data_selection_method.
- validation_durationstr or None
the default validation_duration for the backtests
- holdout_start_datedatetime.datetime or None
The start date of holdout scoring data. If holdout_start_date is specified, either holdout_duration or holdout_end_date must also be specified. If disable_holdout is set to True, holdout_start_date, holdout_duration, and holdout_end_date may not be specified.
- holdout_durationstr or None
The duration of the holdout scoring data. If holdout_duration is specified, holdout_start_date must also be specified. If disable_holdout is set to True, holdout_duration, holdout_start_date, and holdout_end_date may not be specified.
- holdout_end_datedatetime.datetime or None
The end date of holdout scoring data. If holdout_end_date is specified, holdout_start_date must also be specified. If disable_holdout is set to True, holdout_end_date, holdout_start_date, and holdout_duration may not be specified.
- disable_holdoutbool or None
(New in version v2.8) Whether to suppress allocating a holdout fold. If set to True, holdout_start_date, holdout_duration, and holdout_end_date may not be specified.
- gap_durationstr or None
The duration of the gap between training and holdout scoring data
- number_of_backtestsint or None
the number of backtests to use
- backtestslist of BacktestSpecification
the exact specification of backtests to use. The indices of the specified backtests should range from 0 to number_of_backtests - 1. If any backtest is left unspecified, a default configuration will be chosen.
- use_time_seriesbool
(New in version v2.8) Whether to create a time series project (if True) or an OTV project which uses datetime partitioning (if False). The default behavior is to create an OTV project.
- default_to_known_in_advancebool
(New in version v2.11) Optional, default False. Used for time series projects only. Sets whether all features default to being treated as known in advance. Known in advance features are expected to be known for dates in the future when making predictions, e.g., “is this a holiday?”. Individual features can be set to a value different from the default using the feature_settings parameter.
- default_to_do_not_derivebool
(New in v2.17) Optional, default False. Used for time series projects only. Sets whether all features default to being treated as do-not-derive features, excluding them from feature derivation. Individual features can be set to a value different from the default by using the feature_settings parameter.
- feature_derivation_window_startint or None
(New in version v2.8) Only used for time series projects. Offset into the past to define how far back relative to the forecast point the feature derivation window should start. Expressed in terms of the windows_basis_unit and should be a negative value or zero.
- feature_derivation_window_endint or None
(New in version v2.8) Only used for time series projects. Offset into the past to define how far back relative to the forecast point the feature derivation window should end. Expressed in terms of the windows_basis_unit and should be a negative value or zero.
- feature_settingslist of FeatureSettings
(New in version v2.9) Optional, a list specifying per feature settings, can be left unspecified.
- forecast_window_startint or None
(New in version v2.8) Only used for time series projects. Offset into the future to define how far forward relative to the forecast point the forecast window should start. Expressed in terms of the windows_basis_unit.
- forecast_window_endint or None
(New in version v2.8) Only used for time series projects. Offset into the future to define how far forward relative to the forecast point the forecast window should end. Expressed in terms of the windows_basis_unit.
- windows_basis_unitstring, optional
(New in version v2.14) Only used for time series projects. Indicates which unit is a basis for the feature derivation window and forecast window. Valid options are the detected time unit (one of the datarobot.enums.TIME_UNITS) or “ROW”. If omitted, the default value is the detected time unit.
- treat_as_exponentialstring, optional
(New in version v2.9) defaults to “auto”. Used to specify whether to treat data as exponential trend and apply transformations like log-transform. Use values from the datarobot.enums.TREAT_AS_EXPONENTIAL enum.
- differencing_methodstring, optional
(New in version v2.9) defaults to “auto”. Used to specify which differencing method to apply in case the data is stationary. Use values from the datarobot.enums.DIFFERENCING_METHOD enum.
- periodicitieslist of Periodicity, optional
(New in version v2.9) a list of datarobot.Periodicity. Periodicities units should be “ROW”, if the windows_basis_unit is “ROW”.
- multiseries_id_columnslist of str or null
(New in version v2.11) a list of the names of multiseries id columns to define series within the training data. Currently only one multiseries id column is supported.
- use_cross_series_featuresbool
(New in version v2.14) Whether to use cross series features.
- aggregation_typestr, optional
(New in version v2.14) The aggregation type to apply when creating cross series features. Optional, must be one of “total” or “average”.
- cross_series_group_by_columnslist of str, optional
(New in version v2.15) List of columns (currently of length 1). Optional setting that indicates how to further split series into related groups. For example, if every series is sales of an individual product, the series group-by could be the product category with values like “men’s clothing”, “sports equipment”, etc. Can only be used in a multiseries project with use_cross_series_features set to True.
- calendar_idstr, optional
(New in version v2.15) The id of the CalendarFile to use with this project.
- unsupervised_mode: bool, optional
(New in version v2.20) defaults to False, indicates whether partitioning should be constructed for the unsupervised project.
- model_splits: int, optional
(New in version v2.21) Sets the cap on the number of jobs per model used when building models to control number of jobs in the queue. Higher number of model splits will allow for less downsampling leading to the use of more post-processed data.
- allow_partial_history_time_series_predictions: bool, optional
(New in version v2.24) Whether to allow time series models to make predictions using partial historical data.
- unsupervised_type: str, optional
(New in version v3.2) The unsupervised project type, only valid if unsupervised_mode is True. Use values from the datarobot.enums.UnsupervisedTypeEnum enum. If not specified then the project defaults to ‘anomaly’ when unsupervised_mode is True.
- collect_payload()¶
Set up the dict that should be sent to the server when setting the target.
- Returns
- partitioning_specdict
- Return type
Dict[str, Any]
- prep_payload(project_id, max_wait=600)¶
Run any necessary validation and prep of the payload, including async operations
Mainly used for the datetime partitioning spec but implemented in general for consistency
- Return type
None
- update(**kwargs)¶
Update this instance, matching attributes to kwargs
Mainly used for the datetime partitioning spec but implemented in general for consistency
- Return type
None
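A minimal sketch of building a specification and using it to set the target, assuming project is an existing datarobot.Project and the column names are placeholders for those in your dataset:
import datarobot as dr

# An OTV-style datetime partitioning with three backtests and a
# three-month validation duration per backtest.
spec = dr.DatetimePartitioningSpecification(
    datetime_partition_column="timestamp",
    number_of_backtests=3,
    validation_duration=dr.partitioning_methods.construct_duration_string(months=3),
)
project.analyze_and_model(target="sales", partitioning_method=spec)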
- class datarobot.BacktestSpecification(index, gap_duration=None, validation_start_date=None, validation_duration=None, validation_end_date=None, primary_training_start_date=None, primary_training_end_date=None)¶
Uniquely defines a Backtest used in a DatetimePartitioning
Includes only the attributes of a backtest directly controllable by users. The other attributes are assigned by the DataRobot application based on the project dataset and the user-controlled settings.
There are two ways to specify an individual backtest:
Option 1: Use index, gap_duration, validation_start_date, and validation_duration. All durations should be specified with a duration string such as those returned by the partitioning_methods.construct_duration_string helper method.
from datetime import datetime

import datarobot as dr

partitioning_spec = dr.DatetimePartitioningSpecification(
    backtests=[
        # modify the first backtest using option 1
        dr.BacktestSpecification(
            index=0,
            gap_duration=dr.partitioning_methods.construct_duration_string(),
            validation_start_date=datetime(year=2010, month=1, day=1),
            validation_duration=dr.partitioning_methods.construct_duration_string(years=1),
        )
    ],
    # other partitioning settings...
)
Option 2 (New in version v2.20): Use index, primary_training_start_date, primary_training_end_date, validation_start_date, and validation_end_date. In this case, note that setting primary_training_end_date and validation_start_date to the same timestamp will result in no gap being created.
from datetime import datetime

import datarobot as dr

partitioning_spec = dr.DatetimePartitioningSpecification(
    backtests=[
        # modify the first backtest using option 2
        dr.BacktestSpecification(
            index=0,
            primary_training_start_date=datetime(year=2005, month=1, day=1),
            primary_training_end_date=datetime(year=2010, month=1, day=1),
            validation_start_date=datetime(year=2010, month=1, day=1),
            validation_end_date=datetime(year=2011, month=1, day=1),
        )
    ],
    # other partitioning settings...
)
All durations should be specified with a duration string such as those returned by the partitioning_methods.construct_duration_string helper method. Please see the datetime partitioned project documentation for more information on duration strings.
- Attributes
- indexint
the index of the backtest to update
- gap_durationstr
a duration string specifying the desired duration of the gap between training and validation scoring data for the backtest
- validation_start_datedatetime.datetime
the desired start date of the validation scoring data for this backtest
- validation_durationstr
a duration string specifying the desired duration of the validation scoring data for this backtest
- validation_end_datedatetime.datetime
the desired end date of the validation scoring data for this backtest
- primary_training_start_datedatetime.datetime
the desired start date of the training partition for this backtest
- primary_training_end_datedatetime.datetime
the desired end date of the training partition for this backtest
- class datarobot.FeatureSettings(feature_name, known_in_advance=None, do_not_derive=None)¶
Per feature settings
- Attributes
- feature_namestring
name of the feature
- known_in_advancebool
(New in version v2.11) Optional, for time series projects only. Sets whether the feature is known in advance, i.e., values for future dates are known at prediction time. If not specified, the feature uses the value from the default_to_known_in_advance flag.
- do_not_derivebool
(New in v2.17) Optional, for time series projects only. Sets whether the feature is excluded from feature derivation. If not specified, the feature uses the value from the default_to_do_not_derive flag.
- collect_payload(use_a_priori=False)¶
- Parameters
- use_a_prioribool
Switch to using the older a_priori key name instead of known_in_advance. Default: False
- Returns
- FeatureSettings dictionary representation
- Return type
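A short sketch of per-feature settings inside a time series specification; the feature names are illustrative only:
import datarobot as dr

# "holiday" is known in advance; "row_id" is excluded from feature derivation.
settings = [
    dr.FeatureSettings("holiday", known_in_advance=True),
    dr.FeatureSettings("row_id", do_not_derive=True),
]
spec = dr.DatetimePartitioningSpecification(
    datetime_partition_column="date",
    use_time_series=True,
    feature_settings=settings,
)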
- class datarobot.Periodicity(time_steps, time_unit)¶
Periodicity configuration
- Parameters
- time_stepsint
Time step value
- time_unitstring
Time step unit, valid options are values from datarobot.enums.TIME_UNITS
Examples
import datarobot as dr

periodicities = [
    dr.Periodicity(time_steps=10, time_unit=dr.enums.TIME_UNITS.HOUR),
    dr.Periodicity(time_steps=600, time_unit=dr.enums.TIME_UNITS.MINUTE),
]
spec = dr.DatetimePartitioningSpecification(
    # ...
    periodicities=periodicities,
)
- class datarobot.DatetimePartitioning(project_id=None, datetime_partitioning_id=None, datetime_partition_column=None, date_format=None, autopilot_data_selection_method=None, validation_duration=None, available_training_start_date=None, available_training_duration=None, available_training_row_count=None, available_training_end_date=None, primary_training_start_date=None, primary_training_duration=None, primary_training_row_count=None, primary_training_end_date=None, gap_start_date=None, gap_duration=None, gap_row_count=None, gap_end_date=None, disable_holdout=None, holdout_start_date=None, holdout_duration=None, holdout_row_count=None, holdout_end_date=None, number_of_backtests=None, backtests=None, total_row_count=None, use_time_series=False, default_to_known_in_advance=False, default_to_do_not_derive=False, feature_derivation_window_start=None, feature_derivation_window_end=None, feature_settings=None, forecast_window_start=None, forecast_window_end=None, windows_basis_unit=None, treat_as_exponential=None, differencing_method=None, periodicities=None, multiseries_id_columns=None, number_of_known_in_advance_features=0, number_of_do_not_derive_features=0, use_cross_series_features=None, aggregation_type=None, cross_series_group_by_columns=None, calendar_id=None, calendar_name=None, model_splits=None, allow_partial_history_time_series_predictions=False, unsupervised_mode=False, unsupervised_type=None)¶
Full partitioning of a project for datetime partitioning.
To instantiate, use DatetimePartitioning.get(project_id).
Includes both the attributes specified by the user, as well as those determined by the DataRobot application based on the project dataset. In order to use a partitioning to set the target, call to_specification and pass the resulting DatetimePartitioningSpecification to Project.analyze_and_model via the partitioning_method parameter.
The available training data corresponds to all the data available for training, while the primary training data corresponds to the data that can be used to train while ensuring that all backtests are available. If a model is trained with more data than is available in the primary training data, then all backtests may not have scores available.
All durations are specified with a duration string such as those returned by the partitioning_methods.construct_duration_string helper method. Please see the datetime partitioned project documentation for more information on duration strings.
- Attributes
- project_idstr
the id of the project this partitioning applies to
- datetime_partitioning_idstr or None
the id of the datetime partitioning, if it is an optimized partitioning
- datetime_partition_columnstr
the name of the column whose values as dates are used to assign a row to a particular partition
- date_formatstr
the format (e.g. “%Y-%m-%d %H:%M:%S”) by which the partition column was interpreted (compatible with strftime)
- autopilot_data_selection_methodstr
one of datarobot.enums.DATETIME_AUTOPILOT_DATA_SELECTION_METHOD. Whether models created by the autopilot use “rowCount” or “duration” as their data_selection_method.
- validation_durationstr or None
the validation duration specified when initializing the partitioning - not directly significant if the backtests have been modified, but used as the default validation_duration for the backtests. Can be absent if this is a time series project with an irregular primary date/time feature.
- available_training_start_datedatetime.datetime
The start date of the available training data for scoring the holdout
- available_training_durationstr
The duration of the available training data for scoring the holdout
- available_training_row_countint or None
The number of rows in the available training data for scoring the holdout. Only available when retrieving the partitioning after setting the target.
- available_training_end_datedatetime.datetime
The end date of the available training data for scoring the holdout
- primary_training_start_datedatetime.datetime or None
The start date of primary training data for scoring the holdout. Unavailable when the holdout fold is disabled.
- primary_training_durationstr
The duration of the primary training data for scoring the holdout
- primary_training_row_countint or None
The number of rows in the primary training data for scoring the holdout. Only available when retrieving the partitioning after setting the target.
- primary_training_end_datedatetime.datetime or None
The end date of the primary training data for scoring the holdout. Unavailable when the holdout fold is disabled.
- gap_start_datedatetime.datetime or None
The start date of the gap between training and holdout scoring data. Unavailable when the holdout fold is disabled.
- gap_durationstr
The duration of the gap between training and holdout scoring data
- gap_row_countint or None
The number of rows in the gap between training and holdout scoring data. Only available when retrieving the partitioning after setting the target.
- gap_end_datedatetime.datetime or None
The end date of the gap between training and holdout scoring data. Unavailable when the holdout fold is disabled.
- disable_holdoutbool or None
Whether to suppress allocating a holdout fold. If set to True, holdout_start_date, holdout_duration, and holdout_end_date may not be specified.
- holdout_start_datedatetime.datetime or None
The start date of holdout scoring data. Unavailable when the holdout fold is disabled.
- holdout_durationstr
The duration of the holdout scoring data
- holdout_row_countint or None
The number of rows in the holdout scoring data. Only available when retrieving the partitioning after setting the target.
- holdout_end_datedatetime.datetime or None
The end date of the holdout scoring data. Unavailable when the holdout fold is disabled.
- number_of_backtestsint
the number of backtests used.
- backtestslist of Backtest
the configured backtests.
- total_row_countint
the number of rows in the project dataset. Only available when retrieving the partitioning after setting the target.
- use_time_seriesbool
(New in version v2.8) Whether to create a time series project (if True) or an OTV project which uses datetime partitioning (if False). The default behavior is to create an OTV project.
- default_to_known_in_advancebool
(New in version v2.11) Optional, default False. Used for time series projects only. Sets whether all features default to being treated as known in advance. Known in advance features are expected to be known for dates in the future when making predictions, e.g., “is this a holiday?”. Individual features can be set to a value different from the default using the feature_settings parameter.
- default_to_do_not_derivebool
(New in v2.17) Optional, default False. Used for time series projects only. Sets whether all features default to being treated as do-not-derive features, excluding them from feature derivation. Individual features can be set to a value different from the default by using the feature_settings parameter.
- feature_derivation_window_startint or None
(New in version v2.8) Only used for time series projects. Offset into the past to define how far back relative to the forecast point the feature derivation window should start. Expressed in terms of the windows_basis_unit.
- feature_derivation_window_endint or None
(New in version v2.8) Only used for time series projects. Offset into the past to define how far back relative to the forecast point the feature derivation window should end. Expressed in terms of the windows_basis_unit.
- feature_settingslist of FeatureSettings
(New in version v2.9) Optional, a list specifying per feature settings, can be left unspecified.
- forecast_window_startint or None
(New in version v2.8) Only used for time series projects. Offset into the future to define how far forward relative to the forecast point the forecast window should start. Expressed in terms of the windows_basis_unit.
- forecast_window_endint or None
(New in version v2.8) Only used for time series projects. Offset into the future to define how far forward relative to the forecast point the forecast window should end. Expressed in terms of the windows_basis_unit.
- windows_basis_unitstring, optional
(New in version v2.14) Only used for time series projects. Indicates which unit is a basis for the feature derivation window and forecast window. Valid options are the detected time unit (one of the datarobot.enums.TIME_UNITS) or “ROW”. If omitted, the default value is the detected time unit.
- treat_as_exponentialstring, optional
(New in version v2.9) defaults to “auto”. Used to specify whether to treat data as exponential trend and apply transformations like log-transform. Use values from the datarobot.enums.TREAT_AS_EXPONENTIAL enum.
- differencing_methodstring, optional
(New in version v2.9) defaults to “auto”. Used to specify which differencing method to apply in case the data is stationary. Use values from the datarobot.enums.DIFFERENCING_METHOD enum.
- periodicitieslist of Periodicity, optional
(New in version v2.9) a list of datarobot.Periodicity. Periodicities units should be “ROW”, if the windows_basis_unit is “ROW”.
- multiseries_id_columnslist of str or null
(New in version v2.11) a list of the names of multiseries id columns to define series within the training data. Currently only one multiseries id column is supported.
- number_of_known_in_advance_featuresint
(New in version v2.14) Number of features that are marked as known in advance.
- number_of_do_not_derive_featuresint
(New in v2.17) Number of features that are excluded from derivation.
- use_cross_series_featuresbool
(New in version v2.14) Whether to use cross series features.
- aggregation_typestr, optional
(New in version v2.14) The aggregation type to apply when creating cross series features. Optional, must be one of “total” or “average”.
- cross_series_group_by_columnslist of str, optional
(New in version v2.15) List of columns (currently of length 1). Optional setting that indicates how to further split series into related groups. For example, if every series is sales of an individual product, the series group-by could be the product category with values like “men’s clothing”, “sports equipment”, etc. Can only be used in a multiseries project with use_cross_series_features set to True.
- calendar_idstr, optional
(New in version v2.15) Only available for time series projects. The id of the CalendarFile to use with this project.
- calendar_namestr, optional
(New in version v2.17) Only available for time series projects. The name of the CalendarFile used with this project.
- model_splits: int, optional
(New in version v2.21) Sets the cap on the number of jobs per model used when building models to control number of jobs in the queue. Higher number of model splits will allow for less downsampling leading to the use of more post-processed data.
- allow_partial_history_time_series_predictions: bool, optional
(New in version v2.24) Whether to allow time series models to make predictions using partial historical data.
- unsupervised_mode: bool, optional
(New in version v3.1) Whether the date/time partitioning is for an unsupervised project
- unsupervised_type: str, optional
(New in version v3.2) The unsupervised project type, only valid if unsupervised_mode is True. Use values from the datarobot.enums.UnsupervisedTypeEnum enum. If not specified then the project defaults to ‘anomaly’ when unsupervised_mode is True.
- classmethod generate(project_id, spec, max_wait=600, target=None)¶
Preview the full partitioning determined by a DatetimePartitioningSpecification
Based on the project dataset and the partitioning specification, inspect the full partitioning that would be used if the same specification were passed into Project.analyze_and_model.
- Parameters
- project_idstr
the id of the project
- specDatetimePartitioningSpec
the desired partitioning
- max_waitint, optional
For some settings (e.g. generating a partitioning preview for a multiseries project for the first time), an asynchronous task must be run to analyze the dataset. max_wait governs the maximum time (in seconds) to wait before giving up. In all non-multiseries projects, this is unused.
- targetstr, optional
the name of the target column. For unsupervised projects target may be None. Providing a target will ensure that partitions are correctly optimized for your dataset.
- Returns
- DatetimePartitioning
the full generated partitioning
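A minimal sketch of previewing a partitioning before setting the target, assuming project_id refers to an existing project and spec is a DatetimePartitioningSpecification like the ones above:
import datarobot as dr

# Inspect the full partitioning DataRobot would derive from the spec.
full_partitioning = dr.DatetimePartitioning.generate(project_id, spec, target="sales")
print(full_partitioning.to_dataframe())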
- classmethod get(project_id)¶
Retrieve the DatetimePartitioning from a project
Only available if the project has already set the target as a datetime project.
- Parameters
- project_idstr
the id of the project to retrieve partitioning for
- Returns
- DatetimePartitioning
the full partitioning for the project
- Return type
- classmethod generate_optimized(project_id, spec, target, max_wait=600)¶
Preview the full partitioning determined by a DatetimePartitioningSpecification
Based on the project dataset and the partitioning specification, inspect the full partitioning that would be used if the same specification were passed into Project.analyze_and_model.
- Parameters
- project_idstr
the id of the project
- specDatetimePartitioningSpecification
the desired partitioning
- targetstr
the name of the target column. For unsupervised projects target may be None.
- max_waitint, optional
Governs the maximum time (in seconds) to wait before giving up.
- Returns
- DatetimePartitioning
the full generated partitioning
- Return type
- classmethod get_optimized(project_id, datetime_partitioning_id)¶
Retrieve an Optimized DatetimePartitioning from a project for the specified datetime_partitioning_id. A datetime_partitioning_id is created by using the generate_optimized function.
- Parameters
- project_idstr
the id of the project to retrieve partitioning for
- datetime_partitioning_idObjectId
the ObjectId associated with the project to retrieve from mongo
- Returns
- DatetimePartitioning
the full partitioning for the project
- Return type
- classmethod feature_log_list(project_id, offset=None, limit=None)¶
Retrieve the feature derivation log content and log length for a time series project.
The Time Series Feature Log provides details about the feature generation process for a time series project. It includes information about which features are generated and their priority, as well as the detected properties of the time series data such as whether the series is stationary, and periodicities detected.
This route is only supported for time series projects that have finished partitioning.
The feature derivation log will include information about:
- Detected stationarity of the series: e.g. ‘Series detected as non-stationary’
- Detected presence of multiplicative trend in the series: e.g. ‘Multiplicative trend detected’
- Detected periodicities in the series: e.g. ‘Detected periodicities: 7 day’
- Maximum number of features to be generated: e.g. ‘Maximum number of feature to be generated is 1440’
- Window sizes used in rolling statistics / lag extractors: e.g. ‘The window sizes chosen to be: 2 months (because the time step is 1 month and Feature Derivation Window is 2 months)’
- Features that are specified as known-in-advance: e.g. ‘Variables treated as apriori: holiday’
- Details about why certain variables are transformed in the input data: e.g. ‘Generating variable “y (log)” from “y” because multiplicative trend is detected’
- Details about features generated as time series features, and their priority: e.g. ‘Generating feature “date (actual)” from “date” (priority: 1)’
- Parameters
- project_idstr
project id to retrieve a feature derivation log for.
- offsetint
optional, defaults to 0; this many results will be skipped.
- limitint
optional, defaults to 100, at most this many results are returned. To specify no limit, use 0. The default may change without notice.
- classmethod feature_log_retrieve(project_id)¶
Retrieve the feature derivation log content and log length for a time series project.
The Time Series Feature Log provides details about the feature generation process for a time series project. It includes information about which features are generated and their priority, as well as the detected properties of the time series data such as whether the series is stationary, and periodicities detected.
This route is only supported for time series projects that have finished partitioning.
The feature derivation log will include information about:
- Detected stationarity of the series: e.g. ‘Series detected as non-stationary’
- Detected presence of multiplicative trend in the series: e.g. ‘Multiplicative trend detected’
- Detected periodicities in the series: e.g. ‘Detected periodicities: 7 day’
- Maximum number of features to be generated: e.g. ‘Maximum number of feature to be generated is 1440’
- Window sizes used in rolling statistics / lag extractors: e.g. ‘The window sizes chosen to be: 2 months (because the time step is 1 month and Feature Derivation Window is 2 months)’
- Features that are specified as known-in-advance: e.g. ‘Variables treated as apriori: holiday’
- Details about why certain variables are transformed in the input data: e.g. ‘Generating variable “y (log)” from “y” because multiplicative trend is detected’
- Details about features generated as time series features, and their priority: e.g. ‘Generating feature “date (actual)” from “date” (priority: 1)’
- Parameters
- project_idstr
project id to retrieve a feature derivation log for.
- Return type
str
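A minimal sketch, assuming project_id refers to a time series project that has finished partitioning:
import datarobot as dr

# Retrieve the complete feature derivation log as a single string.
log_text = dr.DatetimePartitioning.feature_log_retrieve(project_id)
print(log_text)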
- to_specification(use_holdout_start_end_format=False, use_backtest_start_end_format=False)¶
Render the DatetimePartitioning as a DatetimePartitioningSpecification
The resulting specification can be used when setting the target, and contains only the attributes directly controllable by users.
- Parameters
- use_holdout_start_end_formatbool, optional
Defaults to False. If True, will use holdout_end_date when configuring the holdout partition. If False, will use holdout_duration instead.
- use_backtest_start_end_formatbool, optional
Defaults to False. If False, will use a duration-based approach for specifying backtests (gap_duration, validation_start_date, and validation_duration). If True, will use a start/end date approach for specifying backtests (primary_training_start_date, primary_training_end_date, validation_start_date, validation_end_date). In contrast, projects created in the Web UI will use the start/end date approach for specifying backtests. Set this parameter to True to mirror the behavior in the Web UI.
- Returns
- DatetimePartitioningSpecification
the specification for this partitioning
- Return type
- to_dataframe()¶
Render the partitioning settings as a dataframe for convenience of display
Excludes project_id, datetime_partition_column, date_format, autopilot_data_selection_method, validation_duration, and number_of_backtests, as well as the row count information, if present.
Also excludes the time series specific parameters for use_time_series, default_to_known_in_advance, default_to_do_not_derive, and defining the feature derivation and forecast windows.
- Return type
DataFrame
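A short sketch of round-tripping a project's partitioning back into a specification, assuming project_id refers to a datetime partitioned project:
import datarobot as dr

partitioning = dr.DatetimePartitioning.get(project_id)
# Mirror the Web UI by using start/end dates for the holdout and backtests.
spec = partitioning.to_specification(
    use_holdout_start_end_format=True,
    use_backtest_start_end_format=True,
)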
- classmethod datetime_partitioning_log_retrieve(project_id, datetime_partitioning_id)¶
Retrieve the datetime partitioning log content for an optimized datetime partitioning.
The datetime partitioning log provides details about the partitioning process for an OTV or time series project.
- Parameters
- project_idstr
The project ID of the project associated with the datetime partitioning.
- datetime_partitioning_idstr
id of the optimized datetime partitioning
- Return type
Any
- classmethod datetime_partitioning_log_list(project_id, datetime_partitioning_id, offset=None, limit=None)¶
Retrieve the datetime partitioning log content and log length for an optimized datetime partitioning.
The Datetime Partitioning Log provides details about the partitioning process for an OTV or Time Series project.
- Parameters
- project_idstr
project id of the project associated with the datetime partitioning.
- datetime_partitioning_idstr
id of the optimized datetime partitioning
- offsetint or None
optional, defaults to 0; this many results will be skipped.
- limitint or None
optional, defaults to 100, at most this many results are returned. To specify no limit, use 0. The default may change without notice.
- Return type
Any
- classmethod get_input_data(project_id, datetime_partitioning_id)¶
Retrieve the input used to create an optimized DatetimePartitioning from a project for the specified datetime_partitioning_id. A datetime_partitioning_id is created by using the generate_optimized function.
- Parameters
- project_idstr
The ID of the project to retrieve partitioning for.
- datetime_partitioning_idObjectId
The ObjectId associated with the project to retrieve from Mongo.
- Returns
- DatetimePartitioningInput
The input to optimized datetime partitioning.
- Return type
- class datarobot.helpers.partitioning_methods.DatetimePartitioningId(datetime_partitioning_id, project_id)¶
Defines a DatetimePartitioningId used for datetime partitioning.
This class only includes the datetime_partitioning_id that identifies a previously optimized datetime partitioning and the project_id for the associated project.
This is the specification that should be passed to Project.analyze_and_model via the partitioning_method parameter. To see the full partitioning, use DatetimePartitioning.get_optimized.
- Attributes
- datetime_partitioning_idstr
The ID of the datetime partitioning to use.
- project_idstr
The ID of the project that the datetime partitioning is associated with.
- collect_payload()¶
Set up the dict that should be sent to the server when setting the target.
- Returns
- partitioning_specdict
- Return type
Dict[str, Any]
- prep_payload(project_id, max_wait=600)¶
Run any necessary validation and prep of the payload, including async operations
Mainly used for the datetime partitioning spec but implemented in general for consistency
- Return type
None
- update(**kwargs)¶
Update this instance, matching attributes to kwargs
Mainly used for the datetime partitioning spec but implemented in general for consistency
- Return type
NoReturn
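A minimal sketch of reusing an optimized partitioning when setting the target; datetime_partitioning_id is assumed to come from a prior DatetimePartitioning.generate_optimized call, and project is an existing datarobot.Project:
from datarobot.helpers.partitioning_methods import DatetimePartitioningId

partitioning_id = DatetimePartitioningId(datetime_partitioning_id, project.id)
project.analyze_and_model(target="sales", partitioning_method=partitioning_id)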
- class datarobot.helpers.partitioning_methods.Backtest(index=None, available_training_start_date=None, available_training_duration=None, available_training_row_count=None, available_training_end_date=None, primary_training_start_date=None, primary_training_duration=None, primary_training_row_count=None, primary_training_end_date=None, gap_start_date=None, gap_duration=None, gap_row_count=None, gap_end_date=None, validation_start_date=None, validation_duration=None, validation_row_count=None, validation_end_date=None, total_row_count=None)¶
A backtest used to evaluate models trained in a datetime partitioned project
When setting up a datetime partitioning project, backtests are specified by a BacktestSpecification.
The available training data corresponds to all the data available for training, while the primary training data corresponds to the data that can be used to train while ensuring that all backtests are available. If a model is trained with more data than is available in the primary training data, then all backtests may not have scores available.
All durations are specified with a duration string such as those returned by the partitioning_methods.construct_duration_string helper method. Please see the datetime partitioned project documentation for more information on duration strings.
- Attributes
- indexint
the index of the backtest
- available_training_start_datedatetime.datetime
the start date of the available training data for this backtest
- available_training_durationstr
the duration of available training data for this backtest
- available_training_row_countint or None
the number of rows of available training data for this backtest. Only available when retrieving from a project where the target is set.
- available_training_end_datedatetime.datetime
the end date of the available training data for this backtest
- primary_training_start_datedatetime.datetime
the start date of the primary training data for this backtest
- primary_training_durationstr
the duration of the primary training data for this backtest
- primary_training_row_countint or None
the number of rows of primary training data for this backtest. Only available when retrieving from a project where the target is set.
- primary_training_end_datedatetime.datetime
the end date of the primary training data for this backtest
- gap_start_datedatetime.datetime
the start date of the gap between training and validation scoring data for this backtest
- gap_durationstr
the duration of the gap between training and validation scoring data for this backtest
- gap_row_countint or None
the number of rows in the gap between training and validation scoring data for this backtest. Only available when retrieving from a project where the target is set.
- gap_end_datedatetime.datetime
the end date of the gap between training and validation scoring data for this backtest
- validation_start_datedatetime.datetime
the start date of the validation scoring data for this backtest
- validation_durationstr
the duration of the validation scoring data for this backtest
- validation_row_countint or None
the number of rows of validation scoring data for this backtest. Only available when retrieving from a project where the target is set.
- validation_end_datedatetime.datetime
the end date of the validation scoring data for this backtest
- total_row_countint or None
the number of rows in this backtest. Only available when retrieving from a project where the target is set.
- to_specification(use_start_end_format=False)¶
Render this backtest as a BacktestSpecification.
The resulting specification includes only the attributes users can directly control, not those indirectly determined by the project dataset.
- Parameters
- use_start_end_formatbool
Default False. If False, will use a duration-based approach for specifying backtests (gap_duration, validation_start_date, and validation_duration). If True, will use a start/end date approach for specifying backtests (primary_training_start_date, primary_training_end_date, validation_start_date, validation_end_date). In contrast, projects created in the Web UI will use the start/end date approach for specifying backtests. Set this parameter to True to mirror the behavior in the Web UI.
- Returns
- BacktestSpecification
the specification for this backtest
- Return type
- to_dataframe()¶
Render this backtest as a dataframe for convenience of display
- Returns
- backtest_partitioningpandas.Dataframe
the backtest attributes, formatted into a dataframe
- Return type
DataFrame
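A short sketch of inspecting and reusing the configured backtests, assuming project_id refers to a datetime partitioned project:
import datarobot as dr

partitioning = dr.DatetimePartitioning.get(project_id)
for backtest in partitioning.backtests:
    print(backtest.to_dataframe())

# Convert the backtests into editable specifications using start/end dates.
backtest_specs = [bt.to_specification(use_start_end_format=True) for bt in partitioning.backtests]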
- class datarobot.helpers.partitioning_methods.FeatureSettingsPayload¶
- datarobot.helpers.partitioning_methods.construct_duration_string(years=0, months=0, days=0, hours=0, minutes=0, seconds=0)¶
Construct a valid string representing a duration in accordance with ISO8601
A duration of six months, 3 days, and 12 hours could be represented as P6M3DT12H.
- Parameters
- yearsint
the number of years in the duration
- monthsint
the number of months in the duration
- daysint
the number of days in the duration
- hoursint
the number of hours in the duration
- minutesint
the number of minutes in the duration
- secondsint
the number of seconds in the duration
- Returns
- duration_string: str
The duration string, specified compatibly with ISO8601
- Return type
str
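For example, the duration mentioned above can be built like this:
import datarobot as dr

# Six months, three days, and twelve hours -> "P6M3DT12H"
duration = dr.partitioning_methods.construct_duration_string(months=6, days=3, hours=12)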
PayoffMatrix¶
- class datarobot.models.PayoffMatrix(project_id, id, name=None, true_positive_value=None, true_negative_value=None, false_positive_value=None, false_negative_value=None)¶
Represents a Payoff Matrix, a cost/benefit scenario used for creating a profit curve.
Examples
import datarobot as dr

# create a payoff matrix
payoff_matrix = dr.PayoffMatrix.create(
    project_id,
    name,
    true_positive_value=100,
    true_negative_value=10,
    false_positive_value=0,
    false_negative_value=-10,
)

# list available payoff matrices
payoff_matrices = dr.PayoffMatrix.list(project_id)
payoff_matrix = payoff_matrices[0]
- Attributes
- project_idstr
id of the project with which the payoff matrix is associated.
- idstr
id of the payoff matrix.
- namestr
User-supplied label for the payoff matrix.
- true_positive_valuefloat
Cost or benefit of a true positive classification
- true_negative_valuefloat
Cost or benefit of a true negative classification
- false_positive_valuefloat
Cost or benefit of a false positive classification
- false_negative_valuefloat
Cost or benefit of a false negative classification
- classmethod create(project_id, name, true_positive_value=1, true_negative_value=1, false_positive_value=-1, false_negative_value=-1)¶
Create a payoff matrix associated with a specific project.
- Parameters
- project_idstr
id of the project with which the payoff matrix will be associated
- Returns
- payoff_matrixPayoffMatrix
The newly created payoff matrix
- Return type
- classmethod list(project_id)¶
Fetch all the payoff matrices for a project.
- Parameters
- project_idstr
id of the project
- Returns
- List of PayoffMatrix
A list of PayoffMatrix objects
- Raises
- datarobot.errors.ClientError
if the server responded with 4xx status
- datarobot.errors.ServerError
if the server responded with 5xx status
- Return type
List
[PayoffMatrix
]
- classmethod get(project_id, id)¶
Retrieve a specified payoff matrix.
- Parameters
- project_idstr
id of the project the model belongs to
- idstr
id of the payoff matrix
- Returns
- PayoffMatrix
object representing the specified payoff matrix
- Raises
- datarobot.errors.ClientError
if the server responded with 4xx status
- datarobot.errors.ServerError
if the server responded with 5xx status
- Return type
- classmethod update(project_id, id, name, true_positive_value, true_negative_value, false_positive_value, false_negative_value)¶
Update (replace) a payoff matrix. Note that all data fields are required.
- Parameters
- project_idstr
id of the project to which the payoff matrix belongs
- idstr
id of the payoff matrix
- namestr
User-supplied label for the payoff matrix
- true_positive_valuefloat
True positive payoff value to use for the profit curve
- true_negative_valuefloat
True negative payoff value to use for the profit curve
- false_positive_valuefloat
False positive payoff value to use for the profit curve
- false_negative_valuefloat
False negative payoff value to use for the profit curve
- Returns
- payoff_matrix
PayoffMatrix with updated values
- Raises
- datarobot.errors.ClientError
if the server responded with 4xx status
- datarobot.errors.ServerError
if the server responded with 5xx status
- Return type
- classmethod delete(project_id, id)¶
Delete a specified payoff matrix.
- Parameters
- project_idstr
id of the project the model belongs to
- idstr
id of the payoff matrix
- Returns
- responserequests.Response
Empty response (204)
- Raises
- datarobot.errors.ClientError
if the server responded with 4xx status
- datarobot.errors.ServerError
if the server responded with 5xx status
- Return type
Response
- classmethod from_data(data)¶
Instantiate an object of this class using a dict.
- Parameters
- datadict
Correctly snake_cased keys and their values.
- Return type
TypeVar
(T
, bound=APIObject
)
- classmethod from_server_data(data, keep_attrs=None)¶
Instantiate an object of this class using the data directly from the server, meaning that the keys may have the wrong camel casing
- Parameters
- datadict
The directly translated dict of JSON from the server. No casing fixes have taken place
- keep_attrsiterable
List, set or tuple of the dotted namespace notations for attributes to keep within the object structure even if their values are None
- Return type
TypeVar
(T
, bound=APIObject
)
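A short sketch of retrieving, updating, and deleting a payoff matrix; payoff_matrix_id is a placeholder, and note that update requires all payoff values to be re-supplied:
import datarobot as dr

matrix = dr.PayoffMatrix.get(project_id, payoff_matrix_id)
updated = dr.PayoffMatrix.update(
    project_id,
    matrix.id,
    name="Updated payoffs",
    true_positive_value=120,
    true_negative_value=10,
    false_positive_value=-5,
    false_negative_value=-20,
)
dr.PayoffMatrix.delete(project_id, updated.id)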
PredictJob¶
- datarobot.models.predict_job.wait_for_async_predictions(project_id, predict_job_id, max_wait=600)¶
Given a project id and a PredictJob id, poll the status of the process responsible for generating predictions until it is finished.
- Parameters
- project_idstr
The identifier of the project
- predict_job_idstr
The identifier of the PredictJob
- max_waitint, optional
Time in seconds after which predictions creation is considered unsuccessful
- Returns
- predictionspandas.DataFrame
Generated predictions.
- Raises
- AsyncPredictionsGenerationError
Raised if status of fetched PredictJob object is
error
- AsyncTimeoutError
Predictions weren’t generated in time, specified by
max_wait
parameter
- Return type
DataFrame
- class datarobot.models.PredictJob(data, completed_resource_url=None)¶
Tracks asynchronous work being done within a project
- Attributes
- idint
the id of the job
- project_idstr
the id of the project the job belongs to
- statusstr
the status of the job - will be one of
datarobot.enums.QUEUE_STATUS
- job_typestr
what kind of work the job is doing - will be ‘predict’ for predict jobs
- is_blockedbool
if true, the job is blocked (cannot be executed) until its dependencies are resolved
- messagestr
a message about the state of the job, typically explaining why an error occurred
- classmethod from_job(job)¶
Transforms a generic Job into a PredictJob
- Parameters
- job: Job
A generic job representing a PredictJob
- Returns
- predict_job: PredictJob
A fully populated PredictJob with all the details of the job
- Raises
- ValueError:
If the generic Job was not a predict job, e.g. job_type != JOB_TYPE.PREDICT
- Return type
- classmethod get(project_id, predict_job_id)¶
Fetches one PredictJob. If the job finished, raises PendingJobFinished exception.
- Parameters
- project_idstr
The identifier of the project containing the model on which predictions were started
- predict_job_idstr
The identifier of the predict_job
- Returns
- predict_jobPredictJob
The pending PredictJob
- Raises
- PendingJobFinished
If the job being queried already finished, and the server is re-routing to the finished predictions.
- AsyncFailureError
Querying this resource gave a status code other than 200 or 303
- Return type
- classmethod get_predictions(project_id, predict_job_id, class_prefix='class_')¶
Fetches finished predictions from the job used to generate them.
Note
The prediction API for classifications now returns an additional prediction_values dictionary that is converted into a series of class_prefixed columns in the final dataframe. For example, <label> = 1.0 is converted to ‘class_1.0’. If you are on an older version of the client (prior to v2.8), you must update to v2.8 to correctly pivot this data.
- Parameters
- project_idstr
The identifier of the project containing the model used to generate predictions
- predict_job_idstr
The identifier of the predict_job
- class_prefixstr
The prefix to append to labels in the final dataframe (e.g., apple -> class_apple)
- Returns
- predictionspandas.DataFrame
Generated predictions
- Raises
- JobNotFinished
If the job has not finished yet
- AsyncFailureError
Querying the predict_job in question gave a status code other than 200 or 303
- Return type
DataFrame
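A minimal sketch of the typical prediction workflow, assuming model is a trained datarobot.Model and dataset is a PredictionDataset previously uploaded via project.upload_dataset:
# Request predictions against the uploaded dataset and wait for the result.
predict_job = model.request_predictions(dataset.id)
predictions = predict_job.get_result_when_complete(max_wait=600)
print(predictions.head())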
- cancel()¶
Cancel this job. If this job has not finished running, it will be removed and canceled.
- get_result(params=None)¶
- Parameters
- paramsdict or None
Query parameters to be added to request to get results.
For featureEffects, the source param is required to define the source; otherwise the default is `training`.
- Returns
- resultobject
- Return type depends on the job type:
for model jobs, a Model is returned
for predict jobs, a pandas.DataFrame (with predictions) is returned
for featureImpact jobs, a list of dicts by default (see the with_metadata parameter of the FeatureImpactJob class and its get() method)
for primeRulesets jobs, a list of Rulesets
for primeModel jobs, a PrimeModel
for primeDownloadValidation jobs, a PrimeFile
for predictionExplanationInitialization jobs, a PredictionExplanationsInitialization
for predictionExplanations jobs, a PredictionExplanations
for featureEffects jobs, a FeatureEffects
- Raises
- JobNotFinished
If the job is not finished, the result is not available.
- AsyncProcessUnsuccessfulError
If the job errored or was aborted
- get_result_when_complete(max_wait=600, params=None)¶
- Parameters
- max_waitint, optional
How long to wait for the job to finish.
- paramsdict, optional
Query parameters to be added to request.
- Returns
- result: object
Return type is the same as would be returned by Job.get_result.
- Raises
- AsyncTimeoutError
If the job does not finish in time
- AsyncProcessUnsuccessfulError
If the job errored or was aborted
- refresh()¶
Update this object with the latest job data from the server.
- wait_for_completion(max_wait=600)¶
Waits for job to complete.
- Parameters
- max_waitint, optional
How long to wait for the job to finish.
- Return type
None
Prediction Dataset¶
- class datarobot.models.PredictionDataset(project_id, id, name, created, num_rows, num_columns, forecast_point=None, predictions_start_date=None, predictions_end_date=None, relax_known_in_advance_features_check=None, data_quality_warnings=None, forecast_point_range=None, data_start_date=None, data_end_date=None, max_forecast_date=None, actual_value_column=None, detected_actual_value_columns=None, contains_target_values=None, secondary_datasets_config_id=None)¶
A dataset uploaded to make predictions
Typically created via project.upload_dataset
- Attributes
- idstr
the id of the dataset
- project_idstr
the id of the project the dataset belongs to
- createdstr
the time the dataset was created
- namestr
the name of the dataset
- num_rowsint
the number of rows in the dataset
- num_columnsint
the number of columns in the dataset
- forecast_pointdatetime.datetime or None
For time series projects only. This is the default point relative to which predictions will be generated, based on the forecast window of the project. See the time series predictions documentation for more information.
- predictions_start_datedatetime.datetime or None, optional
For time series projects only. The start date for bulk predictions. Note that this parameter is for generating historical predictions using the training data. This parameter should be provided in conjunction with predictions_end_date. Can’t be provided with the forecast_point parameter.
- predictions_end_datedatetime.datetime or None, optional
For time series projects only. The end date for bulk predictions, exclusive. Note that this parameter is for generating historical predictions using the training data. This parameter should be provided in conjunction with predictions_start_date. Can’t be provided with the forecast_point parameter.
- relax_known_in_advance_features_checkbool, optional
(New in version v2.15) For time series projects only. If True, missing values in the known in advance features are allowed in the forecast window at the prediction time. If omitted or False, missing values are not allowed.
- data_quality_warningsdict, optional
(New in version v2.15) A dictionary that contains available warnings about potential problems in this prediction dataset. Available warnings include:
- has_kia_missing_values_in_forecast_windowbool
Applicable for time series projects. If True, known in advance features have missing values in forecast window which may decrease prediction accuracy.
- insufficient_rows_for_evaluating_modelsbool
Applicable for datasets which are used as external test sets. If True, there are not enough rows in the dataset to calculate insights.
- single_class_actual_value_columnbool
Applicable for datasets which are used as external test sets. If True, the actual value column has only one class, and insights such as the ROC curve cannot be calculated. Only applies to binary classification projects or unsupervised projects.
- forecast_point_rangelist[datetime.datetime] or None, optional
(New in version v2.20) For time series projects only. Specifies the range of dates available for use as a forecast point.
- data_start_datedatetime.datetime or None, optional
(New in version v2.20) For time series projects only. The minimum primary date of this prediction dataset.
- data_end_datedatetime.datetime or None, optional
(New in version v2.20) For time series projects only. The maximum primary date of this prediction dataset.
- max_forecast_datedatetime.datetime or None, optional
(New in version v2.20) For time series projects only. The maximum forecast date of this prediction dataset.
- actual_value_columnstring, optional
(New in version v2.21) Optional, only available for unsupervised projects, in case dataset was uploaded with actual value column specified. Name of the column which will be used to calculate the classification metrics and insights.
- detected_actual_value_columnslist of dict, optional
(New in version v2.21) For unsupervised projects only, list of detected actual value columns information containing missing count and name for each column.
- contains_target_valuesbool, optional
(New in version v2.21) Only for supervised projects. If True, dataset contains target values and can be used to calculate the classification metrics and insights.
- secondary_datasets_config_id: string or None, optional
(New in version v2.23) The ID of the alternative secondary dataset config to use during prediction for a Feature Discovery project.
- classmethod get(project_id, dataset_id)¶
Retrieve information about a dataset uploaded for predictions
- Parameters
- project_id:
the id of the project to query
- dataset_id:
the id of the dataset to retrieve
- Returns
- dataset: PredictionDataset
A dataset uploaded to make predictions
- Return type
- delete()¶
Delete a dataset uploaded for predictions
Will also delete predictions made using this dataset and cancel any predict jobs using this dataset.
- Return type
None
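A hedged end-to-end sketch, assuming an existing project and a local scoring file (both placeholders); project.upload_dataset returns the PredictionDataset described above.
import datarobot as dr

project = dr.Project.get("5d1234567890abcdef123456")       # hypothetical project ID
dataset = project.upload_dataset("./data/to_score.csv")    # hypothetical local file

print(dataset.id, dataset.num_rows, dataset.data_quality_warnings)

# The same dataset can be re-fetched later, or removed along with its predictions
same_dataset = dr.models.PredictionDataset.get(project.id, dataset.id)
same_dataset.delete()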
Prediction Explanations¶
- class datarobot.PredictionExplanationsInitialization(project_id, model_id, prediction_explanations_sample=None)¶
Represents a prediction explanations initialization of a model.
- Attributes
- project_idstr
id of the project the model belongs to
- model_idstr
id of the model the prediction explanations initialization is for
- prediction_explanations_samplelist of dict
a small sample of prediction explanations that could be generated for the model
- classmethod get(project_id, model_id)¶
Retrieve the prediction explanations initialization for a model.
Prediction explanations initializations are a prerequisite for computing prediction explanations, and include a sample of what the computed prediction explanations for a prediction dataset would look like.
- Parameters
- project_idstr
id of the project the model belongs to
- model_idstr
id of the model the prediction explanations initialization is for
- Returns
- prediction_explanations_initializationPredictionExplanationsInitialization
The queried instance.
- Raises
- ClientError (404)
If the project or model does not exist or the initialization has not been computed.
- classmethod create(project_id, model_id)¶
Create a prediction explanations initialization for the specified model.
- Parameters
- project_idstr
id of the project the model belongs to
- model_idstr
id of the model for which initialization is requested
- Returns
- jobJob
an instance of created async job
- delete()¶
Delete this prediction explanations initialization.
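A small sketch, using placeholder IDs, that creates an initialization and waits for it before inspecting the sample:
import datarobot as dr

project_id = "5d1234567890abcdef123456"   # hypothetical project ID
model_id = "5d6543210987fedcba654321"     # hypothetical model ID

init_job = dr.PredictionExplanationsInitialization.create(project_id, model_id)
init_job.wait_for_completion(max_wait=600)

initialization = dr.PredictionExplanationsInitialization.get(project_id, model_id)
print(initialization.prediction_explanations_sample[:2])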
- class datarobot.PredictionExplanations(id, project_id, model_id, dataset_id, max_explanations, num_columns, finish_time, prediction_explanations_location, threshold_low=None, threshold_high=None, class_names=None, num_top_classes=None, source=None)¶
Represents prediction explanations metadata and provides access to computation results.
Examples
prediction_explanations = dr.PredictionExplanations.get(project_id, explanations_id)
for row in prediction_explanations.get_rows():
    print(row)  # row is an instance of PredictionExplanationsRow
- Attributes
- idstr
id of the record and prediction explanations computation result
- project_idstr
id of the project the model belongs to
- model_idstr
id of the model the prediction explanations are for
- dataset_idstr
id of the prediction dataset prediction explanations were computed for
- max_explanationsint
maximum number of prediction explanations to supply per row of the dataset
- threshold_lowfloat
the lower threshold, below which a prediction must score in order for prediction explanations to be computed for a row in the dataset
- threshold_highfloat
the high threshold, above which a prediction must score in order for prediction explanations to be computed for a row in the dataset
- num_columnsint
the number of columns prediction explanations were computed for
- finish_timefloat
timestamp referencing when computation for these prediction explanations finished
- prediction_explanations_locationstr
where to retrieve the prediction explanations
- source: str
For OTV/TS in-training predictions. Holds the portion of the training dataset used to generate predictions.
- classmethod get(project_id, prediction_explanations_id)¶
Retrieve a specific prediction explanations metadata.
- Parameters
- project_idstr
id of the project the explanations belong to
- prediction_explanations_idstr
id of the prediction explanations
- Returns
- prediction_explanationsPredictionExplanations
The queried instance.
- classmethod create(project_id, model_id, dataset_id, max_explanations=None, threshold_low=None, threshold_high=None, mode=None)¶
Create prediction explanations for the specified dataset.
In order to create PredictionExplanations for a particular model and dataset, you must first:
Compute feature impact for the model via datarobot.Model.get_feature_impact()
Compute a PredictionExplanationsInitialization for the model via datarobot.PredictionExplanationsInitialization.create(project_id, model_id)
Compute predictions for the model and dataset via datarobot.Model.request_predictions(dataset_id)
threshold_high and threshold_low are optional filters applied to speed up computation. When at least one is specified, only the selected outlier rows will have prediction explanations computed. Rows are considered to be outliers if their predicted value (in case of regression projects) or probability of being the positive class (in case of classification projects) is less than threshold_low or greater than threshold_high. If neither is specified, prediction explanations will be computed for all rows.
- Parameters
- project_idstr
id of the project the model belongs to
- model_idstr
id of the model for which prediction explanations are requested
- dataset_idstr
id of the prediction dataset for which prediction explanations are requested
- threshold_lowfloat, optional
the lower threshold, below which a prediction must score in order for prediction explanations to be computed for a row in the dataset. If neither threshold_high nor threshold_low is specified, prediction explanations will be computed for all rows.
- threshold_highfloat, optional
the high threshold, above which a prediction must score in order for prediction explanations to be computed. If neither threshold_high nor threshold_low is specified, prediction explanations will be computed for all rows.
- max_explanationsint, optional
the maximum number of prediction explanations to supply per row of the dataset, default: 3.
- modePredictionExplanationsMode, optional
mode of calculation for multiclass models, if not specified - server default is to explain only the predicted class, identical to passing TopPredictionsMode(1).
- Returns
- job: Job
an instance of created async job
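Putting the prerequisites together, a hedged workflow sketch with placeholder IDs; it uses Model.request_feature_impact (not documented in this excerpt) to compute feature impact before requesting explanations.
import datarobot as dr

project_id = "5d1234567890abcdef123456"   # hypothetical IDs
model_id = "5d6543210987fedcba654321"
dataset_id = "5daaaaaaaaaaaaaaaaaaaaaa"

model = dr.Model.get(project_id, model_id)
model.request_feature_impact().wait_for_completion()        # prerequisite 1: feature impact
dr.PredictionExplanationsInitialization.create(project_id, model_id).wait_for_completion()  # prerequisite 2
model.request_predictions(dataset_id).wait_for_completion() # prerequisite 3: predictions

pe_job = dr.PredictionExplanations.create(
    project_id, model_id, dataset_id,
    max_explanations=5, threshold_low=0.2, threshold_high=0.8,
)
prediction_explanations = pe_job.get_result_when_complete(max_wait=600)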
- classmethod create_on_training_data(project_id, model_id, dataset_id, max_explanations=None, threshold_low=None, threshold_high=None, mode=None, datetime_prediction_partition=None)¶
Create prediction explanations for the dataset used to train the model. This can be retrieved by calling dr.Model.get().featurelist_id. For OTV and time series projects, datetime_prediction_partition is required and limited to the first backtest (‘0’) or holdout (‘holdout’).
In order to create PredictionExplanations for a particular model and dataset, you must first:
Compute Feature Impact for the model via datarobot.Model.get_feature_impact()
Compute a PredictionExplanationsInitialization for the model via datarobot.PredictionExplanationsInitialization.create(project_id, model_id)
Compute predictions for the model and dataset via datarobot.Model.request_predictions(dataset_id)
threshold_high and threshold_low are optional filters applied to speed up computation. When at least one is specified, only the selected outlier rows will have prediction explanations computed. Rows are considered to be outliers if their predicted value (in case of regression projects) or probability of being the positive class (in case of classification projects) is less than threshold_low or greater than threshold_high. If neither is specified, prediction explanations will be computed for all rows.
- Parameters
- project_idstr
The ID of the project the model belongs to.
- model_idstr
The ID of the model for which prediction explanations are requested.
- dataset_idstr
The ID of the prediction dataset for which prediction explanations are requested.
- threshold_lowfloat, optional
The lower threshold, below which a prediction must score in order for prediction explanations to be computed for a row in the dataset. If neither threshold_high nor threshold_low is specified, prediction explanations will be computed for all rows.
- threshold_highfloat, optional
The high threshold, above which a prediction must score in order for prediction explanations to be computed. If neither threshold_high nor threshold_low is specified, prediction explanations will be computed for all rows.
- max_explanationsint, optional
The maximum number of prediction explanations to supply per row of the dataset (default: 3).
- modePredictionExplanationsMode, optional
The mode of calculation for multiclass models. If not specified, the server default is to explain only the predicted class, identical to passing TopPredictionsMode(1).
- datetime_prediction_partition: str
Options: ‘0’, ‘holdout’ or None. Used only by time series and OTV projects to indicate which part of the dataset will be used to generate predictions for computing prediction explanations. Current options are ‘0’ (first backtest) and ‘holdout’. Note that only the validation partition of the first backtest will be used to generate predictions.
- Returns
- job: Job
An instance of created async job.
- classmethod list(project_id, model_id=None, limit=None, offset=None)¶
List of prediction explanations metadata for a specified project.
- Parameters
- project_idstr
id of the project to list prediction explanations for
- model_idstr, optional
if specified, only prediction explanations computed for this model will be returned
- limitint or None
at most this many results are returned, default: no limit
- offsetint or None
this many results will be skipped, default: 0
- Returns
- prediction_explanationslist[PredictionExplanations]
- get_rows(batch_size=None, exclude_adjusted_predictions=True)¶
Retrieve prediction explanations rows.
- Parameters
- batch_sizeint or None, optional
maximum number of prediction explanations rows to retrieve per request
- exclude_adjusted_predictionsbool
Optional, defaults to True. Set to False to include adjusted predictions, which will differ from the predictions on some projects, e.g. those with an exposure column specified.
- Yields
- prediction_explanations_rowPredictionExplanationsRow
Represents prediction explanations computed for a prediction row.
- is_multiclass()¶
Whether these explanations are for a multiclass project or a non-multiclass project
- is_unsupervised_clustering_or_multiclass()¶
Whether these explanations are for an unsupervised clustering or multiclass project. Clustering and multiclass XEMP explanations always have one of the num_top_classes or class_names parameters set.
- get_number_of_explained_classes()¶
How many classes we attempt to explain for each row
- get_all_as_dataframe(exclude_adjusted_predictions=True)¶
Retrieve all prediction explanations rows and return them as a pandas.DataFrame.
Returned dataframe has the following structure:
row_id : row id from prediction dataset
prediction : the output of the model for this row
adjusted_prediction : adjusted prediction values (only appears for projects that utilize prediction adjustments, e.g. projects with an exposure column)
class_0_label : a class level from the target (only appears for classification projects)
class_0_probability : the probability that the target is this class (only appears for classification projects)
class_1_label : a class level from the target (only appears for classification projects)
class_1_probability : the probability that the target is this class (only appears for classification projects)
explanation_0_feature : the name of the feature contributing to the prediction for this explanation
explanation_0_feature_value : the value the feature took on
explanation_0_label : the output being driven by this explanation. For regression projects, this is the name of the target feature. For classification projects, this is the class label whose probability increasing would correspond to a positive strength.
explanation_0_qualitative_strength : a human-readable description of how strongly the feature affected the prediction (e.g. ‘+++’, ‘--’, ‘+’) for this explanation
explanation_0_per_ngram_text_explanations : Text prediction explanations data in json formatted string.
explanation_0_strength : the amount this feature’s value affected the prediction
…
explanation_N_feature : the name of the feature contributing to the prediction for this explanation
explanation_N_feature_value : the value the feature took on
explanation_N_label : the output being driven by this explanation. For regression projects, this is the name of the target feature. For classification projects, this is the class label whose probability increasing would correspond to a positive strength.
explanation_N_qualitative_strength : a human-readable description of how strongly the feature affected the prediction (e.g. ‘+++’, ‘--’, ‘+’) for this explanation
explanation_N_per_ngram_text_explanations : Text prediction explanations data in json formatted string.
explanation_N_strength : the amount this feature’s value affected the prediction
For classification projects, the server does not guarantee any ordering on the prediction values, however within this function we sort the values so that class_X corresponds to the same class from row to row.
- Parameters
- exclude_adjusted_predictionsbool
Optional, defaults to True. Set this to False to include adjusted prediction values in the returned dataframe.
- Returns
- dataframe: pandas.DataFrame
- download_to_csv(filename, encoding='utf-8', exclude_adjusted_predictions=True)¶
Save prediction explanations rows into CSV file.
- Parameters
- filenamestr or file object
path or file object to save prediction explanations rows
- encodingstring, optional
A string representing the encoding to use in the output file, defaults to ‘utf-8’
- exclude_adjusted_predictionsbool
Optional, defaults to True. Set to False to include adjusted predictions, which will differ from the predictions on some projects, e.g. those with an exposure column specified.
- get_prediction_explanations_page(limit=None, offset=None, exclude_adjusted_predictions=True)¶
Get prediction explanations.
If you don’t want to use a generator interface, you can access paginated prediction explanations directly.
- Parameters
- limitint or None
the number of records to return, the server will use a (possibly finite) default if not specified
- offsetint or None
the number of records to skip, default 0
- exclude_adjusted_predictionsbool
Optional, defaults to True. Set to False to include adjusted predictions, which will differ from the predictions on some projects, e.g. those with an exposure column specified.
- Returns
- prediction_explanationsPredictionExplanationsPage
- delete()¶
Delete these prediction explanations.
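A consumption sketch with placeholder IDs: stream rows through the generator interface, or pull everything into a dataframe or CSV.
import datarobot as dr

project_id = "5d1234567890abcdef123456"                    # hypothetical IDs
prediction_explanations_id = "5dbbbbbbbbbbbbbbbbbbbbbb"

pe = dr.PredictionExplanations.get(project_id, prediction_explanations_id)

for row in pe.get_rows(batch_size=1000):                   # generator interface
    print(row.row_id, row.prediction)

df = pe.get_all_as_dataframe()                             # or materialize everything
pe.download_to_csv("prediction_explanations.csv")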
- class datarobot.models.prediction_explanations.PredictionExplanationsRow(row_id, prediction, prediction_values, prediction_explanations=None, adjusted_prediction=None, adjusted_prediction_values=None)¶
Represents prediction explanations computed for a prediction row.
Notes
PredictionValue contains:
label : describes what this model output corresponds to. For regression projects, it is the name of the target feature. For classification projects, it is a level from the target feature.
value : the output of the prediction. For regression projects, it is the predicted value of the target. For classification projects, it is the predicted probability the row belongs to the class identified by the label.
PredictionExplanation contains:
label : describes what output was driven by this explanation. For regression projects, it is the name of the target feature. For classification projects, it is the class whose probability increasing would correspond to a positive strength of this prediction explanation.
feature : the name of the feature contributing to the prediction
feature_value : the value the feature took on for this row
strength : the amount this feature’s value affected the prediction
qualitative_strength : a human-readable description of how strongly the feature affected the prediction (e.g. ‘+++’, ‘--’, ‘+’)
- Attributes
- row_idint
which row this PredictionExplanationsRow describes
- predictionfloat
the output of the model for this row
- adjusted_predictionfloat or None
adjusted prediction value for projects that provide this information, None otherwise
- prediction_valueslist
an array of dictionaries with a schema described as
PredictionValue
- adjusted_prediction_valueslist
same as prediction_values but for adjusted predictions
- prediction_explanationslist
an array of dictionaries with a schema described as
PredictionExplanation
- class datarobot.models.prediction_explanations.PredictionExplanationsPage(id, count=None, previous=None, next=None, data=None, prediction_explanations_record_location=None, adjustment_method=None)¶
Represents a batch of prediction explanations received by one request.
- Attributes
- idstr
id of the prediction explanations computation result
- datalist[dict]
list of raw prediction explanations; each row corresponds to a row of the prediction dataset
- countint
total number of rows computed
- previous_pagestr
where to retrieve previous page of prediction explanations, None if current page is the first
- next_pagestr
where to retrieve next page of prediction explanations, None if current page is the last
- prediction_explanations_record_locationstr
where to retrieve the prediction explanations metadata
- adjustment_methodstr
Adjustment method that was applied to predictions, or ‘N/A’ if no adjustments were done.
- classmethod get(project_id, prediction_explanations_id, limit=None, offset=0, exclude_adjusted_predictions=True)¶
Retrieve prediction explanations.
- Parameters
- project_idstr
id of the project the model belongs to
- prediction_explanations_idstr
id of the prediction explanations
- limitint or None
the number of records to return; the server will use a (possibly finite) default if not specified
- offsetint or None
the number of records to skip, default 0
- exclude_adjusted_predictionsbool
Optional, defaults to True. Set to False to include adjusted predictions, which will differ from the predictions on some projects, e.g. those with an exposure column specified.
- Returns
- prediction_explanationsPredictionExplanationsPage
The queried instance.
- class datarobot.models.ShapMatrix(project_id, id, model_id=None, dataset_id=None)¶
Represents SHAP based prediction explanations and provides access to score values.
Examples
import datarobot as dr

# request SHAP matrix calculation
shap_matrix_job = dr.ShapMatrix.create(project_id, model_id, dataset_id)
shap_matrix = shap_matrix_job.get_result_when_complete()

# list available SHAP matrices
shap_matrices = dr.ShapMatrix.list(project_id)
shap_matrix = shap_matrices[0]

# get SHAP matrix as dataframe
shap_matrix_values = shap_matrix.get_as_dataframe()
- Attributes
- project_idstr
id of the project the model belongs to
- shap_matrix_idstr
id of the generated SHAP matrix
- model_idstr
id of the model used to compute the SHAP values
- dataset_idstr
id of the prediction dataset SHAP values were computed for
- classmethod create(cls, project_id, model_id, dataset_id)¶
Calculate SHAP based prediction explanations against previously uploaded dataset.
- Parameters
- project_idstr
id of the project the model belongs to
- model_idstr
id of the model for which prediction explanations are requested
- dataset_idstr
id of the prediction dataset for which prediction explanations are requested (as uploaded from Project.upload_dataset)
- Returns
- jobShapMatrixJob
The job computing the SHAP based prediction explanations
- Raises
- ClientError
If the server responded with a 4xx status. Possible reasons: the project, model or dataset does not exist, the user is not allowed access, or the model doesn’t support SHAP based prediction explanations
- ServerError
If the server responded with 5xx status
- Return type
- classmethod list(cls, project_id)¶
Fetch all the computed SHAP prediction explanations for a project.
- Parameters
- project_idstr
id of the project
- Returns
- List of ShapMatrix
A list of
ShapMatrix
objects
- Raises
- datarobot.errors.ClientError
if the server responded with 4xx status
- datarobot.errors.ServerError
if the server responded with 5xx status
- Return type
List
[ShapMatrix
]
- classmethod get(cls, project_id, id)¶
Retrieve the specific SHAP matrix.
- Parameters
- project_idstr
id of the project the model belongs to
- idstr
id of the SHAP matrix
- Returns
ShapMatrix
object representing specified record
- Return type
- get_as_dataframe(read_timeout=60)¶
Retrieve SHAP matrix values as dataframe.
- Parameters
- read_timeoutint (optional, default 60)
New in version 2.29.
Wait this many seconds for the server to respond.
- Returns
- dataframepandas.DataFrame
A dataframe with SHAP scores
- Raises
- datarobot.errors.ClientError
if the server responded with 4xx status.
- datarobot.errors.ServerError
if the server responded with 5xx status.
- Return type
DataFrame
- class datarobot.models.ClassListMode(class_names)¶
Calculate prediction explanations for the specified classes in each row.
- Attributes
- class_nameslist
List of class names that will be explained for each dataset row.
- get_api_parameters(batch_route=False)¶
Get parameters passed in corresponding API call
- Parameters
- batch_routebool
Batch routes describe prediction calls with all possible parameters, so to distinguish explanation parameters from others they have a prefix in the parameters.
- Returns
- dict
- class datarobot.models.TopPredictionsMode(num_top_classes)¶
Calculate prediction explanations for the number of top predicted classes in each row.
- Attributes
- num_top_classesint
Number of top predicted classes [1..10] that will be explained for each dataset row.
- get_api_parameters(batch_route=False)¶
Get parameters passed in corresponding API call
- Parameters
- batch_routebool
Batch routes describe prediction calls with all possible parameters, so to distinguish explanation parameters from others they have a prefix in the parameters.
- Returns
- dict
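A hedged sketch of choosing an explanation mode for a multiclass project; the IDs and class labels below are placeholders.
import datarobot as dr
from datarobot.models import ClassListMode, TopPredictionsMode

project_id = "5d1234567890abcdef123456"    # hypothetical IDs
model_id = "5d6543210987fedcba654321"
dataset_id = "5daaaaaaaaaaaaaaaaaaaaaa"

# Explain the two most probable classes for each row...
job = dr.PredictionExplanations.create(
    project_id, model_id, dataset_id, mode=TopPredictionsMode(2)
)
# ...or explain a fixed set of class labels
job = dr.PredictionExplanations.create(
    project_id, model_id, dataset_id, mode=ClassListMode(["setosa", "virginica"])
)
explanations = job.get_result_when_complete()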
Predictions¶
- class datarobot.models.Predictions(project_id, prediction_id, model_id=None, dataset_id=None, includes_prediction_intervals=None, prediction_intervals_size=None, forecast_point=None, predictions_start_date=None, predictions_end_date=None, actual_value_column=None, explanation_algorithm=None, max_explanations=None, shap_warnings=None)¶
Represents predictions metadata and provides access to prediction results.
Examples
List all predictions for a project
import datarobot as dr

# Fetch all predictions for a project
all_predictions = dr.Predictions.list(project_id)

# Inspect all calculated predictions
for predictions in all_predictions:
    print(predictions)  # repr includes project_id, model_id, and dataset_id
Retrieve predictions by id
import datarobot as dr

# Getting predictions by id
predictions = dr.Predictions.get(project_id, prediction_id)

# Dump actual predictions
df = predictions.get_all_as_dataframe()
print(df)
- Attributes
- project_idstr
id of the project the model belongs to
- model_idstr
id of the model
- prediction_idstr
id of generated predictions
- includes_prediction_intervalsbool, optional
(New in v2.16) For time series projects only. Indicates if prediction intervals will be part of the response. Defaults to False.
- prediction_intervals_sizeint, optional
(New in v2.16) For time series projects only. Indicates the percentile used for prediction intervals calculation. Will be present only if includes_prediction_intervals is True.
- forecast_pointdatetime.datetime, optional
(New in v2.20) For time series projects only. This is the default point relative to which predictions will be generated, based on the forecast window of the project. See the time series prediction documentation for more information.
- predictions_start_datedatetime.datetime or None, optional
(New in v2.20) For time series projects only. The start date for bulk predictions. Note that this parameter is for generating historical predictions using the training data. This parameter should be provided in conjunction with
predictions_end_date
. Can’t be provided with theforecast_point
parameter.- predictions_end_datedatetime.datetime or None, optional
(New in v2.20) For time series projects only. The end date for bulk predictions, exclusive. Note that this parameter is for generating historical predictions using the training data. This parameter should be provided in conjunction with
predictions_start_date
. Can’t be provided with theforecast_point
parameter.- actual_value_columnstring, optional
(New in version v2.21) For time series unsupervised projects only. Actual value column which was used to calculate the classification metrics and insights on the prediction dataset. Can’t be provided with the
forecast_point
parameter.- explanation_algorithmdatarobot.enums.EXPLANATIONS_ALGORITHM, optional
(New in version v2.21) If set to ‘shap’, the response will include prediction explanations based on the SHAP explainer (SHapley Additive exPlanations). Defaults to null (no prediction explanations).
- max_explanationsint, optional
(New in version v2.21) The maximum number of explanation values that should be returned for each row, ordered by absolute value, greatest to least. If null, no limit. In the case of ‘shap’: if the number of features is greater than the limit, the sum of remaining values will also be returned as shapRemainingTotal. Defaults to null. Cannot be set if explanation_algorithm is omitted.
- shap_warningsdict, optional
(New in version v2.21) Will be present if explanation_algorithm was set to datarobot.enums.EXPLANATIONS_ALGORITHM.SHAP and there were additivity failures during SHAP values calculation.
- classmethod list(project_id, model_id=None, dataset_id=None)¶
Fetch all the computed predictions metadata for a project.
- Parameters
- project_idstr
id of the project
- model_idstr, optional
if specified, only predictions metadata for this model will be retrieved
- dataset_idstr, optional
if specified, only predictions metadata for this dataset will be retrieved
- Returns
- A list of Predictions (datarobot.models.Predictions) objects
- Return type
List
[Predictions
]
- classmethod get(project_id, prediction_id)¶
Retrieve the specific predictions metadata
- Parameters
- project_idstr
id of the project the model belongs to
- prediction_idstr
id of the prediction set
- Returns
Predictions
object representing the specified predictions
- Return type
- get_all_as_dataframe(class_prefix='class_', serializer='json')¶
Retrieve all prediction rows and return them as a pandas.DataFrame.
- Parameters
- class_prefixstr, optional
The prefix to append to labels in the final dataframe. Default is class_ (e.g., apple -> class_apple)
- serializerstr, optional
Serializer to use for the download. Options: json (default) or csv.
- Returns
- dataframe: pandas.DataFrame
- Raises
- datarobot.errors.ClientError
if the server responded with 4xx status.
- datarobot.errors.ServerError
if the server responded with 5xx status.
- Return type
DataFrame
- download_to_csv(filename, encoding='utf-8', serializer='json')¶
Save prediction rows into CSV file.
- Parameters
- filenamestr or file object
path or file object to save prediction rows
- encodingstring, optional
A string representing the encoding to use in the output file, defaults to ‘utf-8’
- serializerstr, optional
Serializer to use for the download. Options: json (default) or csv.
- Return type
None
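A short listing/download sketch with a placeholder project ID:
import datarobot as dr

project_id = "5d1234567890abcdef123456"   # hypothetical

all_predictions = dr.Predictions.list(project_id)
for predictions in all_predictions:
    print(predictions.prediction_id, predictions.model_id, predictions.dataset_id)

if all_predictions:
    all_predictions[0].download_to_csv("predictions.csv", serializer="csv")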
PredictionServer¶
- class datarobot.PredictionServer(id=None, url=None, datarobot_key=None)¶
A prediction server can be used to make predictions.
- Attributes
- idstr, optional
The id of the prediction server.
- urlstr
The url of the prediction server.
- datarobot_keystr, optional
The Datarobot-Key HTTP header used in requests to this prediction server. Note that in the datarobot.models.Deployment instance there is the default_prediction_server property, which has this value as a “kebab-cased” key as opposed to “snake_cased”.
- classmethod list()¶
Returns a list of prediction servers a user can use to make predictions.
New in version v2.17.
- Returns
- prediction_serverslist of PredictionServer instances
Contains a list of prediction servers that can be used to make predictions.
Examples
prediction_servers = PredictionServer.list()
prediction_servers
>>> [PredictionServer('https://example.com')]
- Return type
List
[PredictionServer
]
PrimeFile¶
- class datarobot.models.PrimeFile(id=None, project_id=None, parent_model_id=None, model_id=None, ruleset_id=None, language=None, is_valid=None)¶
Represents an executable file available for download of the code for a DataRobot Prime model
- Attributes
- idstr
the id of the PrimeFile
- project_idstr
the id of the project this PrimeFile belongs to
- parent_model_idstr
the model being approximated by this PrimeFile
- model_idstr
the prime model this file represents
- ruleset_idint
the ruleset being used in this PrimeFile
- languagestr
the language of the code in this file - see enums.LANGUAGE for possibilities
- is_validbool
whether the code passed basic validation
- download(filepath)¶
Download the code and save it to a file
- Parameters
- filepath: string
the location to save the file to
- Return type
None
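A minimal sketch, assuming the PrimeFile instance was obtained elsewhere (for example from a project-level listing of Prime files, which is not documented in this excerpt); only the documented is_valid attribute and download call are used.
from datarobot.models import PrimeFile

def save_prime_code(prime_file: PrimeFile, path: str = "./approximated_model.py") -> None:
    # Only download code that passed basic validation
    if prime_file.is_valid:
        prime_file.download(path)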
Project¶
- class datarobot.models.Project(id=None, project_name=None, mode=None, target=None, target_type=None, holdout_unlocked=None, metric=None, stage=None, partition=None, positive_class=None, created=None, advanced_options=None, max_train_pct=None, max_train_rows=None, file_name=None, credentials=None, feature_engineering_prediction_point=None, unsupervised_mode=None, use_feature_discovery=None, relationships_configuration_id=None, project_description=None, query_generator_id=None, segmentation=None, partitioning_method=None, catalog_id=None, catalog_version_id=None, use_gpu=None)¶
A project built from a particular training dataset
- Attributes
- idstr
the id of the project
- project_namestr
the name of the project
- project_descriptionstr
an optional description for the project
- modeint
The current autopilot mode. 0: Full Autopilot. 2: Manual Mode. 4: Comprehensive Autopilot. null: Mode not set.
- targetstr
the name of the selected target feature
- target_typestr
Indicates what kind of modeling is being done in this project. Options are: ‘Regression’, ‘Binary’ (Binary classification), ‘Multiclass’ (Multiclass classification), ‘Multilabel’ (Multilabel classification)
- holdout_unlockedbool
whether the holdout has been unlocked
- metricstr
the selected project metric (e.g. LogLoss)
- stagestr
the stage the project has reached - one of
datarobot.enums.PROJECT_STAGE
- partitiondict
information about the selected partitioning options
- positive_classstr
for binary classification projects, the selected positive class; otherwise, None
- createddatetime
the time the project was created
- advanced_optionsAdvancedOptions
information on the advanced options that were selected for the project settings, e.g. a weights column or a cap of the runtime of models that can advance autopilot stages
- max_train_pctfloat
The maximum percentage of the project dataset that can be used without going into the validation data or being too large to submit any blueprint for training
- max_train_rowsint
the maximum number of rows that can be trained on without going into the validation data or being too large to submit any blueprint for training
- file_namestr
The name of the file uploaded for the project dataset
- credentialslist, optional
A list of credentials for the datasets used in relationship configuration (previously graphs). For Feature Discovery projects, the list must be formatted in dictionary record format. Provide the catalogVersionId and credentialId for each dataset that is to be used in the project that requires authentication.
- feature_engineering_prediction_pointstr, optional
For time-aware Feature Engineering, this parameter specifies the column from the primary dataset to use as the prediction point.
- unsupervised_modebool, optional
(New in version v2.20) defaults to False, indicates whether this is an unsupervised project.
- relationships_configuration_idstr, optional
(New in version v2.21) id of the relationships configuration to use
- query_generator_id: str, optional
(New in version v2.27) id of the query generator applied for time series data prep
- segmentationdict, optional
information on the segmentation options for segmented project
- partitioning_methodPartitioningMethod, optional
(New in version v3.0) The partitioning class for this project. This attribute should only be used with newly-created projects and before calling Project.analyze_and_model(). After the project has been aimed, see Project.partition for actual partitioning options.
- catalog_idstr
(New in version v3.0) ID of the dataset used during creation of the project.
- catalog_version_idstr
(New in version v3.0) The object ID of the catalog_version which the project’s dataset belongs to.
- use_gpu: bool
(New in version v3.2) Whether project allows usage of GPUs
- set_options(options=None, **kwargs)¶
Update the advanced options of this project.
Accepts either an AdvancedOptions object or individual keyword arguments. This is an in-place update.
- Raises
- ValueError
Raised if an object passed to the options parameter is not an AdvancedOptions instance, a valid keyword argument from the AdvancedOptions class, or a combination of an AdvancedOptions instance AND keyword arguments.
- Return type
None
- get_options()¶
Return the stored advanced options for this project.
- Returns
- AdvancedOptions
- Return type
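A short sketch of saving advanced options on a project; the project ID is a placeholder, and the option names come from AdvancedOptions.
import datarobot as dr
from datarobot.helpers import AdvancedOptions

project = dr.Project.get("5d1234567890abcdef123456")   # hypothetical project ID

# Either pass a full AdvancedOptions object...
project.set_options(AdvancedOptions(smart_downsampled=True, seed=42))
# ...or individual keyword arguments
project.set_options(seed=42)

print(project.get_options())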
- classmethod get(project_id)¶
Gets information about a project.
- Parameters
- project_idstr
The identifier of the project you want to load.
- Returns
- projectProject
The queried project
Examples
import datarobot as dr
p = dr.Project.get(project_id='54e639a18bd88f08078ca831')
p.id
>>>'54e639a18bd88f08078ca831'
p.project_name
>>>'Some project name'
- Return type
TypeVar
(TProject
, bound=Project
)
- classmethod create(cls, sourcedata, project_name='Untitled Project', max_wait=600, read_timeout=600, dataset_filename=None, *, use_case=None)¶
Creates a project with provided data.
Project creation is an asynchronous process, which means that after the initial request we will keep polling the status of the async process responsible for project creation until it finishes. For SDK users this only means that this method might raise exceptions related to its async nature.
- Parameters
- sourcedatabasestring, file, pathlib.Path or pandas.DataFrame
Dataset to use for the project. If a string, it can be either a path to a local file, a URL to a publicly available file, or raw file content. If using a file, the filename must consist of ASCII characters only.
- project_namestr, unicode, optional
The name to assign to the empty project.
- max_waitint, optional
Time in seconds after which project creation is considered unsuccessful
- read_timeout: int
The maximum number of seconds to wait for the server to respond indicating that the initial upload is complete
- dataset_filenamestring or None, optional
(New in version v2.14) File name to use for dataset. Ignored for url and file path sources.
- use_case: UseCase | string, optional
A single UseCase object or ID to add this new Project to. Must be a kwarg.
- Returns
- projectProject
Instance with initialized data.
- Raises
- InputNotUnderstoodError
Raised if sourcedata isn’t one of supported types.
- AsyncFailureError
Polling for status of async process resulted in response with unsupported status code. Beginning in version 2.1, this will be ProjectAsyncFailureError, a subclass of AsyncFailureError
- AsyncProcessUnsuccessfulError
Raised if project creation was unsuccessful
- AsyncTimeoutError
Raised if project creation took more time than specified by the max_wait parameter
Examples
p = Project.create('/home/datasets/somedataset.csv', project_name="New API project")
p.id
>>> '5921731dkqshda8yd28h'
p.project_name
>>> 'New API project'
- Return type
TypeVar
(TProject
, bound=Project
)
- classmethod encrypted_string(plaintext)¶
Sends a string to DataRobot to be encrypted
This is used for passwords that DataRobot uses to access external data sources
- Parameters
- plaintextstr
The string to encrypt
- Returns
- ciphertextstr
The encrypted string
- Return type
str
- classmethod create_from_hdfs(cls, url, port=None, project_name=None, max_wait=600)¶
Create a project from a datasource on a WebHDFS server.
- Parameters
- urlstr
The location of the WebHDFS file, both server and full path. Per the DataRobot specification, must begin with hdfs://, e.g. hdfs:///tmp/10kDiabetes.csv
- portint, optional
The port to use. If not specified, will default to the server default (50070)
- project_namestr, optional
A name to give to the project
- max_waitint
The maximum number of seconds to wait before giving up.
- Returns
- Project
Examples
p = Project.create_from_hdfs('hdfs:///tmp/somedataset.csv', project_name="New API project")
p.id
>>> '5921731dkqshda8yd28h'
p.project_name
>>> 'New API project'
- classmethod create_from_data_source(cls, data_source_id, username=None, password=None, credential_id=None, use_kerberos=None, credential_data=None, project_name=None, max_wait=600, *, use_case=None)¶
Create a project from a data source. Either data_source or data_source_id should be specified.
- Parameters
- data_source_idstr
the identifier of the data source.
- usernamestr, optional
The username for database authentication. If supplied, password must also be supplied.
- passwordstr, optional
The password for database authentication. The password is encrypted at server side and never saved / stored. If supplied, username must also be supplied.
- credential_id: str, optional
The ID of the set of credentials to use instead of user and password. Note that with this change, username and password will become optional.
- use_kerberos: bool, optional
Server default is False. If true, use kerberos authentication for database authentication.
- credential_data: dict, optional
The credentials to authenticate with the database, to use instead of user/password or credential ID.
- project_namestr, optional
optional, a name to give to the project.
- max_waitint
optional, the maximum number of seconds to wait before giving up.
- use_case: UseCase | string, optional
A single UseCase object or ID to add this new Project to. Must be a kwarg.
- Returns
- Project
- Raises
- InvalidUsageError
Raised if either username or password is passed without the other.
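A hedged sketch using stored credentials instead of a username/password pair; all IDs are placeholders.
import datarobot as dr

data_source_id = "5dccccccccccccccccccccc1"   # hypothetical data source ID
credential_id = "5dccccccccccccccccccccc2"    # hypothetical stored credential ID

project = dr.Project.create_from_data_source(
    data_source_id,
    credential_id=credential_id,
    project_name="DB snapshot project",
    max_wait=600,
)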
- classmethod create_from_dataset(cls, dataset_id, dataset_version_id=None, project_name=None, user=None, password=None, credential_id=None, use_kerberos=None, credential_data=None, max_wait=600, *, use_case=None)¶
Create a Project from a
datarobot.models.Dataset
- Parameters
- dataset_id: string
The ID of the dataset entry to use for the project’s Dataset
- dataset_version_id: string, optional
The ID of the dataset version to use for the project dataset. If not specified - uses latest version associated with dataset_id
- project_name: string, optional
The name of the project to be created. If not specified, will be “Untitled Project” for database connections, otherwise the project name will be based on the file used.
- user: string, optional
The username for database authentication.
- password: string, optional
The password (in cleartext) for database authentication. The password will be encrypted on the server side in the scope of the HTTP request and never saved or stored
- credential_id: string, optional
The ID of the set of credentials to use instead of user and password.
- use_kerberos: bool, optional
Server default is False. If true, use kerberos authentication for database authentication.
- credential_data: dict, optional
The credentials to authenticate with the database, to use instead of user/password or credential ID.
- max_wait: int
optional, the maximum number of seconds to wait before giving up.
- use_case: UseCase | string, optional
A single UseCase object or ID to add this new Project to. Must be a kwarg.
- Returns
- Project
- Return type
TypeVar
(TProject
, bound=Project
)
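A minimal sketch with a placeholder AI Catalog dataset ID; omitting dataset_version_id uses the latest version, as described above.
import datarobot as dr

project = dr.Project.create_from_dataset(
    dataset_id="5ddddddddddddddddddddddd",   # hypothetical catalog dataset ID
    project_name="Project from catalog dataset",
    max_wait=600,
)
print(project.id, project.project_name)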
- classmethod create_segmented_project_from_clustering_model(cls, clustering_project_id, clustering_model_id, target, max_wait=600, *, use_case=None)¶
Create a new segmented project from a clustering model
- Parameters
- clustering_project_idstr
The identifier of the clustering project you want to use as the base.
- clustering_model_idstr
The identifier of the clustering model you want to use as the segmentation method.
- targetstr
The name of the target column that will be used from the clustering project.
- max_wait: int
optional, the maximum number of seconds to wait before giving up.
- use_case: UseCase | string, optional
A single UseCase object or ID to add this new Project to. Must be a kwarg.
- Returns
- projectProject
The created project
- Return type
TypeVar
(TProject
, bound=Project
)
- classmethod from_async(async_location, max_wait=600)¶
Given a temporary async status location, poll for no more than max_wait seconds until the async process (project creation or setting the target, for example) finishes successfully, then return the ready project
- Parameters
- async_locationstr
The URL for the temporary async status resource. This is returned as a header in the response to a request that initiates an async process
- max_waitint
The maximum number of seconds to wait before giving up.
- Returns
- projectProject
The project, now ready
- Raises
- ProjectAsyncFailureError
If the server returned an unexpected response while polling for the asynchronous operation to resolve
- AsyncProcessUnsuccessfulError
If the final result of the asynchronous operation was a failure
- AsyncTimeoutError
If the asynchronous operation did not resolve within the time specified
- Return type
TypeVar
(TProject
, bound=Project
)
- classmethod start(cls, sourcedata, target=None, project_name='Untitled Project', worker_count=None, metric=None, autopilot_on=True, blueprint_threshold=None, response_cap=None, partitioning_method=None, positive_class=None, target_type=None, unsupervised_mode=False, blend_best_models=None, prepare_model_for_deployment=None, consider_blenders_in_recommendation=None, scoring_code_only=None, min_secondary_validation_model_count=None, shap_only_mode=None, relationships_configuration_id=None, autopilot_with_feature_discovery=None, feature_discovery_supervised_feature_reduction=None, unsupervised_type=None, autopilot_cluster_list=None, bias_mitigation_feature_name=None, bias_mitigation_technique=None, include_bias_mitigation_feature_as_predictor_variable=None, incremental_learning_only_mode=None, incremental_learning_on_best_model=None, *, use_case=None)¶
Chain together project creation, file upload, and target selection.
Note
While this function provides a simple means to get started, it does not expose all possible parameters. For advanced usage, using create, set_advanced_options and analyze_and_model directly is recommended.
- Parameters
- sourcedatastr or pandas.DataFrame
The path to the file to upload. Can be either a path to a local file or a publicly accessible URL (starting with http://, https://, file://, or s3://). If the source is a DataFrame, it will be serialized to a temporary buffer. If using a file, the filename must consist of ASCII characters only.
- targetstr, optional
The name of the target column in the uploaded file. Should not be provided if unsupervised_mode is True.
- project_namestr
The project name.
- Returns
- projectProject
The newly created and initialized project.
- Other Parameters
- worker_countint, optional
The number of workers that you want to allocate to this project.
- metricstr, optional
The name of metric to use.
- autopilot_onboolean, default
True
Whether or not to begin modeling automatically.
- blueprint_thresholdint, optional
Number of hours the model is permitted to run. Minimum 1
- response_capfloat, optional
Quantile of the response distribution to use for response capping. Must be in the range 0.5 .. 1.0
- partitioning_methodPartitioningMethod object, optional
Instance of one of the Partition Classes defined in datarobot.helpers.partitioning_methods. As an alternative, use Project.set_partitioning_method or Project.set_datetime_partitioning to set the partitioning for the project.
- positive_classstr, float, or int; optional
Specifies a level of the target column that should be treated as the positive class for binary classification. May only be specified for binary classification targets.
- target_typestr, optional
Override the automatically selected target_type. An example usage would be setting the target_type=’Multiclass’ when you want to perform a multiclass classification task on a numeric column that has a low cardinality. You can use the TARGET_TYPE enum.
- unsupervised_modeboolean, default
False
Specifies whether to create an unsupervised project.
- blend_best_models: bool, optional
blend best models during Autopilot run
- scoring_code_only: bool, optional
Keep only models that can be converted to scorable java code during Autopilot run.
- shap_only_mode: bool, optional
Keep only models that support SHAP values during Autopilot run. Use SHAP-based insights wherever possible. Defaults to False.
- prepare_model_for_deployment: bool, optional
Prepare model for deployment during Autopilot run. The preparation includes creating reduced feature list models, retraining best model on higher sample size, computing insights and assigning “RECOMMENDED FOR DEPLOYMENT” label.
- consider_blenders_in_recommendation: bool, optional
Include blenders when selecting a model to prepare for deployment in an Autopilot Run. Defaults to False.
- min_secondary_validation_model_count: int, optional
Compute “All backtest” scores (datetime models) or cross validation scores for the specified number of highest ranking models on the Leaderboard, if over the Autopilot default.
- relationships_configuration_idstr, optional
(New in version v2.23) id of the relationships configuration to use
- autopilot_with_feature_discovery: bool, optional.
(New in version v2.23) If true, autopilot will run on a feature list that includes features found via search for interactions.
- feature_discovery_supervised_feature_reduction: bool, optional
(New in version v2.23) Run supervised feature reduction for feature discovery projects.
- unsupervised_typeUnsupervisedTypeEnum, optional
(New in version v2.27) Specifies whether an unsupervised project is anomaly detection or clustering.
- autopilot_cluster_listlist(int), optional
(New in version v2.27) Specifies the list of clusters to build for each model during Autopilot. Specifying multiple values in a list will build models with each number of clusters for the Leaderboard.
- bias_mitigation_feature_namestr, optional
The feature from protected features that will be used in a bias mitigation task to mitigate bias
- bias_mitigation_techniquestr, optional
One of datarobot.enums.BiasMitigationTechnique. Options: ‘preprocessingReweighing’, ‘postProcessingRejectionOptionBasedClassification’. The technique by which we’ll mitigate bias, which will inform which bias mitigation task we insert into blueprints.
- include_bias_mitigation_feature_as_predictor_variablebool, optional
Whether we should also use the mitigation feature as an input to the modeler just like any other categorical used for training, i.e. do we want the model to “train on” this feature in addition to using it for bias mitigation
- use_case: UseCase | string, optional
A single UseCase object or ID to add this new Project to. Must be a kwarg.
- Raises
- AsyncFailureError
Polling for status of async process resulted in response with unsupported status code
- AsyncProcessUnsuccessfulError
Raised if project creation or target setting was unsuccessful
- AsyncTimeoutError
Raised if project creation or target setting timed out
Examples
Project.start("./tests/fixtures/file.csv", "a_target", project_name="test_name", worker_count=4, metric="a_metric")
This is an example of using a URL to specify the datasource:
Project.start("https://example.com/data/file.csv", "a_target", project_name="test_name", worker_count=4, metric="a_metric")
- Return type
TypeVar
(TProject
, bound=Project
)
- classmethod list(search_params=None, use_cases=None, offset=None, limit=None)¶
Returns the projects associated with this account.
- Parameters
- search_paramsdict, optional.
If not None, the returned projects are filtered by lookup. Currently you can query projects by:
project_name
- use_casesUnion[UseCase, List[UseCase], str, List[str]], optional.
If not None, the returned projects are filtered to those associated with a specific Use Case or Use Cases. Accepts either the entity or the ID.
- offsetint, optional
If provided, specifies the number of results to skip.
- limitint, optional
If provided, specifies the maximum number of results to return. If not provided, returns a maximum of 1000 results.
- Returns
- projectslist of Project instances
Contains a list of projects associated with this user account.
- Raises
- TypeError
Raised if
search_params
parameter is provided, but is not of supported type.
Examples
List all projects
p_list = Project.list()
p_list
>>> [Project('Project One'), Project('Two')]
Search for projects by name
Project.list(search_params={'project_name': 'red'})
>>> [Project('Prediction Time'), Project('Fred Project')]
List 2nd and 3rd projects
Project.list(offset=1, limit=2)
>>> [Project('Project 2'), Project('Project 3')]
- Return type
List
[Project
]
- refresh()¶
Fetches the latest state of the project and updates this object with that information. This is an in-place update, not a new object.
- Returns
- selfProject
the now-updated project
- Return type
None
- delete()¶
Removes this project from your account.
- Return type
None
- analyze_and_model(target=None, mode='quick', metric=None, worker_count=None, positive_class=None, partitioning_method=None, featurelist_id=None, advanced_options=None, max_wait=600, target_type=None, credentials=None, feature_engineering_prediction_point=None, unsupervised_mode=False, relationships_configuration_id=None, class_mapping_aggregation_settings=None, segmentation_task_id=None, unsupervised_type=None, autopilot_cluster_list=None, use_gpu=None)¶
Set target variable of an existing project and begin the autopilot process or send data to DataRobot for feature analysis only if manual mode is specified.
Any options saved using
set_options
will be used if nothing is passed toadvanced_options
. However, saved options will be ignored ifadvanced_options
are passed.
Target setting is an asynchronous process, which means that after the initial request we will keep polling the status of the async process responsible for target setting until it finishes. For SDK users this only means that this method might raise exceptions related to its async nature.
When execution returns to the caller, the autopilot process will already have commenced (again, unless manual mode is specified).
- Parameters
- targetstr, optional
The name of the target column in the uploaded file. Should not be provided if
unsupervised_mode
isTrue
.- modestr, optional
You can use
AUTOPILOT_MODE
enum to choose betweenAUTOPILOT_MODE.FULL_AUTO
AUTOPILOT_MODE.MANUAL
AUTOPILOT_MODE.QUICK
AUTOPILOT_MODE.COMPREHENSIVE
: Runs all blueprints in the repository (warning: this may be extremely slow).
If unspecified,
QUICK
is used. If theMANUAL
value is used, the model creation process will need to be started by executing thestart_autopilot
function with the desired featurelist. It will start immediately otherwise.- metricstr, optional
Name of the metric to use for evaluating models. You can query the metrics available for the target by way of
Project.get_metrics
. If none is specified, then the default recommended by DataRobot is used.- worker_countint, optional
The number of concurrent workers to request for this project. If None, then the default is used. (New in version v2.14) Setting this to -1 will request the maximum number available to your account.
- partitioning_methodPartitioningMethod object, optional
Instance of one of the Partition Classes defined in
datarobot.helpers.partitioning_methods
. As an alternative, useProject.set_partitioning_method
orProject.set_datetime_partitioning
to set the partitioning for the project.- positive_classstr, float, or int; optional
Specifies a level of the target column that should be treated as the positive class for binary classification. May only be specified for binary classification targets.
- featurelist_idstr, optional
Specifies which feature list to use.
- advanced_optionsAdvancedOptions, optional
Used to set advanced options of project creation. Will override any options saved using
set_options
.- max_waitint, optional
Time in seconds after which target setting is considered unsuccessful.
- target_typestr, optional
Override the automatically selected target_type. An example usage would be setting target_type='Multiclass' when you want to perform a multiclass classification task on a numeric column that has a low cardinality. You can use the
TARGET_TYPE
enum.- credentials: list, optional,
a list of credentials for the datasets used in relationship configuration (previously graphs).
- feature_engineering_prediction_pointstr, optional
additional aim parameter.
- unsupervised_modeboolean, default
False
(New in version v2.20) Specifies whether to create an unsupervised project. If
True
,target
may not be provided.- relationships_configuration_idstr, optional
(New in version v2.21) ID of the relationships configuration to use.
- segmentation_task_idstr or SegmentationTask, optional
(New in version v2.28) The segmentation task that should be used to split the project for segmented modeling.
- unsupervised_typeUnsupervisedTypeEnum, optional
(New in version v2.27) Specifies whether an unsupervised project is anomaly detection or clustering.
- autopilot_cluster_listlist(int), optional
(New in version v2.27) Specifies the list of clusters to build for each model during Autopilot. Specifying multiple values in a list will build models with each number of clusters for the Leaderboard.
- use_gpubool, optional
(New in version v3.2) Specifies whether the project should use GPUs.
- Returns
- projectProject
The instance with updated attributes.
- Raises
- AsyncFailureError
Polling for status of async process resulted in response with unsupported status code
- AsyncProcessUnsuccessfulError
Raised if target setting was unsuccessful
- AsyncTimeoutError
Raised if target setting took more time than specified by the
max_wait
parameter- TypeError
Raised if
advanced_options
,partitioning_method
ortarget_type
is provided, but is not of supported type
See also
datarobot.models.Project.start
combines project creation, file upload, and target selection. Provides fewer options, but is useful for getting started quickly.
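A minimal sketch of a typical call, assuming an already-created project; the project ID and target column name below are hypothetical:

.. code-block:: python

    import datarobot as dr
    from datarobot.enums import AUTOPILOT_MODE
    from datarobot.helpers import AdvancedOptions

    project = dr.Project.get('5223deadbeefdeadbeef0101')  # hypothetical project ID
    options = AdvancedOptions(prepare_model_for_deployment=True)
    project.analyze_and_model(
        target='readmitted',          # hypothetical target column
        mode=AUTOPILOT_MODE.QUICK,
        worker_count=-1,              # request the maximum workers available to the account
        advanced_options=options,     # overrides any options saved via set_options
    )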
- set_target(target=None, mode='quick', metric=None, worker_count=None, positive_class=None, partitioning_method=None, featurelist_id=None, advanced_options=None, max_wait=600, target_type=None, credentials=None, feature_engineering_prediction_point=None, unsupervised_mode=False, relationships_configuration_id=None, class_mapping_aggregation_settings=None, segmentation_task_id=None, unsupervised_type=None, autopilot_cluster_list=None)¶
Set target variable of an existing project and begin the Autopilot process (unless manual mode is specified).
Target setting is an asynchronous process, which means that after the initial request DataRobot keeps polling the status of an async process responsible for target setting until it finishes. For SDK users, this method might raise exceptions related to its async nature.
When execution returns to the caller, the Autopilot process will already have commenced (again, unless manual mode is specified).
- Parameters
- targetstr, optional
The name of the target column in the uploaded file. Should not be provided if
unsupervised_mode
isTrue
.- modestr, optional
You can use
AUTOPILOT_MODE
enum to choose betweenAUTOPILOT_MODE.FULL_AUTO
AUTOPILOT_MODE.MANUAL
AUTOPILOT_MODE.QUICK
AUTOPILOT_MODE.COMPREHENSIVE
: Runs all blueprints in the repository (warning: this may be extremely slow).
If unspecified,
QUICK
mode is used. If theMANUAL
value is used, the model creation process needs to be started by executing thestart_autopilot
function with the desired feature list. It will start immediately otherwise.- metricstr, optional
Name of the metric to use for evaluating models. You can query the metrics available for the target by way of
Project.get_metrics
. If none is specified, then the default recommended by DataRobot is used.- worker_countint, optional
The number of concurrent workers to request for this project. If None, then the default is used. (New in version v2.14) Setting this to -1 will request the maximum number available to your account.
- positive_classstr, float, or int; optional
Specifies a level of the target column that should be treated as the positive class for binary classification. May only be specified for binary classification targets.
- partitioning_methodPartitioningMethod object, optional
Instance of one of the Partition Classes defined in
datarobot.helpers.partitioning_methods
. As an alternative, useProject.set_partitioning_method
orProject.set_datetime_partitioning
to set the partitioning for the project.- featurelist_idstr, optional
Specifies which feature list to use.
- advanced_optionsAdvancedOptions, optional
Used to set advanced options of project creation.
- max_waitint, optional
Time in seconds after which target setting is considered unsuccessful.
- target_typestr, optional
Override the automatically selected target_type. An example usage would be setting target_type='Multiclass' when you want to perform a multiclass classification task on a numeric column that has a low cardinality. You can use the TARGET_TYPE enum.
- credentials: list, optional,
A list of credentials for the datasets used in relationship configuration (previously graphs).
- feature_engineering_prediction_pointstr, optional
For time-aware Feature Engineering, this parameter specifies the column from the primary dataset to use as the prediction point.
- unsupervised_modeboolean, default
False
(New in version v2.20) Specifies whether to create an unsupervised project. If
True
,target
may not be provided.- relationships_configuration_idstr, optional
(New in version v2.21) ID of the relationships configuration to use.
- class_mapping_aggregation_settingsClassMappingAggregationSettings, optional
Instance of
datarobot.helpers.ClassMappingAggregationSettings
- segmentation_task_idstr or SegmentationTask, optional
(New in version v2.28) The segmentation task that should be used to split the project for segmented modeling.
- unsupervised_typeUnsupervisedTypeEnum, optional
(New in version v2.27) Specifies whether an unsupervised project is anomaly detection or clustering.
- autopilot_cluster_listlist(int), optional
(New in version v2.27) Specifies the list of clusters to build for each model during Autopilot. Specifying multiple values in a list will build models with each number of clusters for the Leaderboard.
- Returns
- projectProject
The instance with updated attributes.
- Raises
- AsyncFailureError
Polling for status of async process resulted in response with unsupported status code.
- AsyncProcessUnsuccessfulError
Raised if target setting was unsuccessful.
- AsyncTimeoutError
Raised if target setting took more time than specified by the
max_wait
parameter.- TypeError
Raised if
advanced_options
,partitioning_method
ortarget_type
is provided, but is not of supported type.
See also
datarobot.models.Project.start
Combines project creation, file upload, and target selection. Provides fewer options, but is useful for getting started quickly.
datarobot.models.Project.analyze_and_model
the method replacing
set_target
after it is removed.
- get_model_records(sort_by_partition='validation', sort_by_metric=None, with_metric=None, search_term=None, featurelists=None, families=None, blueprints=None, labels=None, characteristics=None, training_filters=None, number_of_clusters=None, limit=100, offset=0)¶
Retrieve paginated model records, sorted by scores, with optional filtering.
- Parameters
- sort_by_partition: str, one of `validation`, `backtesting`, `crossValidation` or `holdout`
Set the partition to use for sorted (by score) list of models. validation is the default.
- sort_by_metric: str
Set the project metric to use for model sorting. The DataRobot-selected project optimization metric is the default.
- with_metric: str
For a single-metric list of results, specify that project metric.
- search_term: str
If specified, only models containing the term in their name or processes are returned.
- featurelists: list of str
If specified, only models trained on selected featurelists are returned.
- families: list of str
If specified, only models belonging to selected families are returned.
- blueprints: list of str
If specified, only models trained on specified blueprint IDs are returned.
- labels: list of str, `starred` or `prepared for deployment`
If specified, only models tagged with all listed labels are returned.
- characteristics: list of str
If specified, only models matching all listed characteristics are returned. Possible values: "frozen", "trained on gpu", "with exportable coefficients", "with mono constraints", "with rating table", "with scoring code", "new series optimized".
- training_filters: list of str
If specified, only models matching at least one of the listed training conditions are returned. The following formats are supported for AutoML and datetime partitioned projects:
- number of rows in training subset
For datetime partitioned projects:
- <training duration>, for example P6Y0M0D
- <training_duration>-<time_window_sample_percent>-<sampling_method>, for example P6Y0M0D-78-Random (models trained on 6 years of data, 78% sampling rate, random sampling)
- start/end date
- project settings
- number_of_clusters: list of int
Filter models by number of clusters. Applicable only in unsupervised clustering projects.
- limit: int
- offset: int
- Returns
- generic_models: list of GenericModel
- Return type
List
[GenericModel
]
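A minimal sketch of a filtered leaderboard query, assuming an existing project; the project ID and search term are hypothetical:

.. code-block:: python

    import datarobot as dr

    project = dr.Project.get('5223deadbeefdeadbeef0101')  # hypothetical project ID
    records = project.get_model_records(
        sort_by_partition='holdout',
        search_term='Gradient',          # hypothetical search term
        characteristics=['frozen'],
        limit=20,
    )
    for record in records:
        print(record.id)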
- get_models(order_by=None, search_params=None, with_metric=None, use_new_models_retrieval=False)¶
List all completed, successful models in the leaderboard for the given project.
- Parameters
- order_bystr or list of strings, optional
If not None, the returned models are ordered by this attribute. If None, the default ordering is by the default project metric.
Allowed attributes to sort by are:
metric
sample_pct
If the sort attribute is preceded by a hyphen, models will be sorted in descending order, otherwise in ascending order.
Multiple sort attributes can be included as a comma-delimited string or in a list, e.g. order_by='sample_pct,-metric' or order_by=['sample_pct', '-metric'].
Using metric to sort will order models by their validation score on the project metric.
- search_paramsdict, optional.
If not None, the returned models are filtered by lookup. Currently you can query models by:
name
sample_pct
is_starred
- with_metricstr, optional.
If not None, the returned models will only have scores for this metric. Otherwise all the metrics are returned.
- use_new_models_retrieval: bool, False by default
If True, a new retrieval route is used, which supports filtering and returns fewer attributes per individual model. The following attributes are absent and can be retrieved from the blueprint level: monotonic_increasing_featurelist_id, monotonic_decreasing_featurelist_id, supports_composable_ml and supports_monotonic_constraints. The following attributes are absent and can be retrieved from the individual model level: has_empty_clusters, is_n_clusters_dynamically_determined, prediction_threshold and prediction_threshold_read_only. The attribute n_clusters in Model is renamed to number_of_clusters in GenericModel and is returned for unsupervised clustering models.
- Returns
- modelsa list of Model or a list of GenericModel if use_new_models_retrieval is True.
All models trained in the project.
- Raises
- TypeError
Raised if
order_by
orsearch_params
parameter is provided, but is not of supported type.
Examples
Project.get('pid').get_models(order_by=['-sample_pct', 'metric'])

# Getting models that contain "Ridge" in name
Project.get('pid').get_models(search_params={'name': "Ridge"})

# Filtering models based on 'starred' flag:
Project.get('pid').get_models(search_params={'is_starred': True})

# retrieve additional attributes for the model
model_records = project.get_models(use_new_models_retrieval=True)
model_record = model_records[0]
blueprint_id = model_record.blueprint_id
blueprint = dr.Blueprint.get(project.id, blueprint_id)
model_record.number_of_clusters
blueprint.supports_composable_ml
blueprint.supports_monotonic_constraints
blueprint.monotonic_decreasing_featurelist_id
blueprint.monotonic_increasing_featurelist_id
model = dr.Model.get(project.id, model_record.id)
model.prediction_threshold
model.prediction_threshold_read_only
model.has_empty_clusters
model.is_n_clusters_dynamically_determined
- Return type
Union
[List
[Model
],List
[GenericModel
]]
- recommended_model()¶
Returns the default recommended model, or None if there is no default recommended model.
- Returns
- recommended_modelModel or None
The default recommended model.
- Return type
Optional
[Model
]
- get_top_model(metric=None)¶
Obtain the top ranked model for a given metric. If no metric is passed in, it uses the project's default metric. Models that display a score of N/A in the UI are not included in the ranking (see https://docs.datarobot.com/en/docs/modeling/reference/model-detail/leaderboard-ref.html#na-scores).
- Parameters
- metricstr, optional
Metric to sort models
- Returns
- modelModel
The top model
- Raises
- ValueError
Raised if the project is unsupervised. Raised if the project has no target set. Raised if no metric was passed or the project has no metric. Raised if the metric passed is not used by the models on the leaderboard.
Examples
from datarobot.models.project import Project
project = Project.get("<MY_PROJECT_ID>")
top_model = project.get_top_model()
- Return type
- get_datetime_models()¶
List all models in the project as DatetimeModels
Requires the project to be datetime partitioned. If it is not, a ClientError will occur.
- Returns
- modelslist of DatetimeModel
the datetime models
- Return type
List
[DatetimeModel
]
- get_prime_models()¶
List all DataRobot Prime models for the project. Prime models were created to approximate a parent model, and have downloadable code.
- Returns
- modelslist of PrimeModel
- Return type
List
[PrimeModel
]
- get_prime_files(parent_model_id=None, model_id=None)¶
List all downloadable code files from DataRobot Prime for the project
- Parameters
- parent_model_idstr, optional
Filter for only those prime files approximating this parent model
- model_idstr, optional
Filter for only those prime files with code for this prime model
- Returns
- files: list of PrimeFile
- get_dataset()¶
Retrieve the dataset used to create a project.
- Returns
- Dataset
Dataset used for creation of project or None if no
catalog_id
present.
Examples
from datarobot.models.project import Project
project = Project.get("<MY_PROJECT_ID>")
dataset = project.get_dataset()
- Return type
Optional
[Dataset
]
- get_datasets()¶
List all the datasets that have been uploaded for predictions
- Returns
- datasetslist of PredictionDataset instances
- Return type
List
[PredictionDataset
]
- upload_dataset(sourcedata, max_wait=600, read_timeout=600, forecast_point=None, predictions_start_date=None, predictions_end_date=None, dataset_filename=None, relax_known_in_advance_features_check=None, credentials=None, actual_value_column=None, secondary_datasets_config_id=None)¶
Upload a new dataset to make predictions against
- Parameters
- sourcedatastr, file or pandas.DataFrame
Data to be used for predictions. If string, can be either a path to a local file, a publicly accessible URL (starting with
http://
,https://
,file://
), or raw file content. If using a file on disk, the filename must consist of ASCII characters only.- max_waitint, optional
The maximum number of seconds to wait for the uploaded dataset to be processed before raising an error.
- read_timeoutint, optional
The maximum number of seconds to wait for the server to respond indicating that the initial upload is complete
- forecast_pointdatetime.datetime or None, optional
(New in version v2.8) May only be specified for time series projects, otherwise the upload will be rejected. The time in the dataset relative to which predictions should be generated in a time series project. See the Time Series documentation for more information. If not provided, will default to using the latest forecast point in the dataset.
- predictions_start_datedatetime.datetime or None, optional
(New in version v2.11) May only be specified for time series projects. The start date for bulk predictions. Note that this parameter is for generating historical predictions using the training data. This parameter should be provided in conjunction with
predictions_end_date
. Cannot be provided with theforecast_point
parameter.- predictions_end_datedatetime.datetime or None, optional
(New in version v2.11) May only be specified for time series projects. The end date for bulk predictions, exclusive. Note that this parameter is for generating historical predictions using the training data. This parameter should be provided in conjunction with
predictions_start_date
. Cannot be provided with theforecast_point
parameter.- actual_value_columnstring, optional
(New in version v2.21) Actual value column name, valid for the prediction files if the project is unsupervised and the dataset is considered as bulk predictions dataset. Cannot be provided with the
forecast_point
parameter.- dataset_filenamestring or None, optional
(New in version v2.14) File name to use for the dataset. Ignored for url and file path sources.
- relax_known_in_advance_features_checkbool, optional
(New in version v2.15) For time series projects only. If True, missing values in the known in advance features are allowed in the forecast window at the prediction time. If omitted or False, missing values are not allowed.
- credentials: list, optional
A list of credentials for the datasets used in a Feature discovery project.
- secondary_datasets_config_id: string or None, optional
(New in version v2.23) The Id of the alternative secondary dataset config to use during prediction for Feature discovery project.
- Returns
- datasetPredictionDataset
The newly uploaded dataset.
- Raises
- InputNotUnderstoodError
Raised if
sourcedata
isn’t one of supported types.- AsyncFailureError
Raised if polling for the status of an async process resulted in a response with an unsupported status code.
- AsyncProcessUnsuccessfulError
Raised if project creation was unsuccessful (i.e. the server reported an error in uploading the dataset).
- AsyncTimeoutError
Raised if processing the uploaded dataset took more time than specified by the
max_wait
parameter.- ValueError
Raised if
forecast_point
orpredictions_start_date
andpredictions_end_date
are provided, but are not of the supported type.
- Return type
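A minimal sketch of uploading a scoring dataset, assuming an existing project; the project ID and file path are hypothetical:

.. code-block:: python

    import datarobot as dr

    project = dr.Project.get('5223deadbeefdeadbeef0101')   # hypothetical project ID
    # A local CSV path is used here; a pandas DataFrame or a public URL also works.
    prediction_dataset = project.upload_dataset('scoring_data.csv', max_wait=600)
    print(prediction_dataset.id)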
- upload_dataset_from_data_source(data_source_id, username, password, max_wait=600, forecast_point=None, relax_known_in_advance_features_check=None, credentials=None, predictions_start_date=None, predictions_end_date=None, actual_value_column=None, secondary_datasets_config_id=None)¶
Upload a new dataset from a data source to make predictions against
- Parameters
- data_source_idstr
The identifier of the data source.
- usernamestr
The username for database authentication.
- passwordstr
The password for database authentication. The password is encrypted at server side and never saved / stored.
- max_waitint, optional
Optional, the maximum number of seconds to wait before giving up.
- forecast_pointdatetime.datetime or None, optional
(New in version v2.8) For time series projects only. This is the default point relative to which predictions will be generated, based on the forecast window of the project. See the time series prediction documentation for more information.
- relax_known_in_advance_features_checkbool, optional
(New in version v2.15) For time series projects only. If True, missing values in the known in advance features are allowed in the forecast window at the prediction time. If omitted or False, missing values are not allowed.
- credentials: list, optional
A list of credentials for the datasets used in a Feature discovery project.
- predictions_start_datedatetime.datetime or None, optional
(New in version v2.20) For time series projects only. The start date for bulk predictions. Note that this parameter is for generating historical predictions using the training data. This parameter should be provided in conjunction with
predictions_end_date
. Can’t be provided with theforecast_point
parameter.- predictions_end_datedatetime.datetime or None, optional
(New in version v2.20) For time series projects only. The end date for bulk predictions, exclusive. Note that this parameter is for generating historical predictions using the training data. This parameter should be provided in conjunction with
predictions_start_date
. Can’t be provided with theforecast_point
parameter.- actual_value_columnstring, optional
(New in version v2.21) Actual value column name, valid for the prediction files if the project is unsupervised and the dataset is considered as bulk predictions dataset. Cannot be provided with the
forecast_point
parameter.- secondary_datasets_config_id: string or None, optional
(New in version v2.23) The Id of the alternative secondary dataset config to use during prediction for Feature discovery project.
- Returns
- datasetPredictionDataset
the newly uploaded dataset
- Return type
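A minimal sketch, assuming an existing project and a previously registered data source; the identifiers and credentials shown are hypothetical:

.. code-block:: python

    import datarobot as dr

    project = dr.Project.get('5223deadbeefdeadbeef0101')    # hypothetical project ID
    prediction_dataset = project.upload_dataset_from_data_source(
        data_source_id='5e4bc5b35e6e763beb9db14a',          # hypothetical data source ID
        username='db_user',                                  # hypothetical database credentials
        password='db_password',
        max_wait=600,
    )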
- upload_dataset_from_catalog(dataset_id, credential_id=None, credential_data=None, dataset_version_id=None, max_wait=600, forecast_point=None, relax_known_in_advance_features_check=None, credentials=None, predictions_start_date=None, predictions_end_date=None, actual_value_column=None, secondary_datasets_config_id=None)¶
Upload a new dataset from a catalog dataset to make predictions against
- Parameters
- dataset_idstr
The identifier of the dataset.
- credential_idstr, optional
The credential ID of the AI Catalog dataset to upload.
- credential_dataBasicCredentialsDataDict | S3CredentialsDataDict | OAuthCredentialsDataDict, optional
Credential data of the catalog dataset to upload. credential_data can be in one of the following forms:
- Basic Credentials
- credentialTypestr
The credential type. For basic credentials, this value must be CredentialTypes.BASIC.
- userstr
The username for database authentication.
- passwordstr
The password for database authentication. The password is encrypted at rest and never saved or stored.
- S3 Credentials
- credentialTypestr
The credential type. For S3 credentials, this value must be CredentialTypes.S3.
- awsAccessKeyIdstr, optional
The S3 AWS access key ID.
- awsSecretAccessKeystr, optional
The S3 AWS secret access key.
- awsSessionTokenstr, optional
The S3 AWS session token.
- config_id: str, optional
The ID of the saved shared secure configuration. If specified, cannot include awsAccessKeyId, awsSecretAccessKey or awsSessionToken.
- OAuth Credentials
- credentialTypestr
The credential type. For OAuth credentials, this value must be CredentialTypes.OAUTH.
- oauthRefreshTokenstr
The oauth refresh token.
- oauthClientIdstr
The oauth client ID.
- oauthClientSecretstr
The oauth client secret.
- oauthAccessTokenstr
The oauth access token.
- Snowflake Key Pair Credentials
- credentialTypestr
The credential type. For Snowflake Key Pair, this value must be CredentialTypes.SNOWFLAKE_KEY_PAIR_AUTH.
- userstr, optional
The Snowflake login name.
- privateKeyStrstr, optional
The private key, copied exactly from the user's private key file. Since it contains multiple lines, put the key string inside triple quotes when assigning it to a variable.
- passphrasestr, optional
The string used to encrypt the private key.
- configIdstr, optional
The ID of the saved shared secure configuration. If specified, cannot include user, privateKeyStr or passphrase.
- Databricks Access Token Credentials
- credentialTypestr
The credential type. For a Databricks access token, this value must be CredentialTypes.DATABRICKS_ACCESS_TOKEN.
- databricksAccessTokenstr
The Databricks personal access token.
- Databricks Service Principal Credentials
- credentialTypestr
The credential type. For Databricks service principal, this value must be CredentialTypes.DATABRICKS_SERVICE_PRINCIPAL.
- clientIdstr, optional
The client ID for Databricks service principal.
- clientSecretstr, optional
The client secret for Databricks service principal.
- configIdstr, optional
The ID of the saved shared secure configuration. If specified, cannot include clientId and clientSecret.
- dataset_version_idstr, optional
The version id of the dataset to use.
- max_waitint, optional
Optional, the maximum number of seconds to wait before giving up.
- forecast_pointdatetime.datetime or None, optional
For time series projects only. This is the default point relative to which predictions will be generated, based on the forecast window of the project. See the time series prediction documentation for more information.
- relax_known_in_advance_features_checkbool, optional
For time series projects only. If True, missing values in the known in advance features are allowed in the forecast window at the prediction time. If omitted or False, missing values are not allowed.
- credentials: list[BasicCredentialsDict | CredentialIdCredentialsDict], optional
A list of credentials for the datasets used in Feature discovery project.
Items in credentials can have the following forms:
- Basic Credentials
- userstr
The username for database authentication.
- passwordstr
The password (in cleartext) for database authentication. The password will be encrypted on the server side in scope of HTTP request and never saved or stored.
- Credential ID
- credentialIdstr
The ID of the set of credentials to use instead of user and password. Note that with this change, username and password will become optional.
- predictions_start_datedatetime.datetime or None, optional
For time series projects only. The start date for bulk predictions. Note that this parameter is for generating historical predictions using the training data. This parameter should be provided in conjunction with
predictions_end_date
. Can’t be provided with theforecast_point
parameter.- predictions_end_datedatetime.datetime or None, optional
For time series projects only. The end date for bulk predictions, exclusive. Note that this parameter is for generating historical predictions using the training data. This parameter should be provided in conjunction with
predictions_start_date
. Can’t be provided with theforecast_point
parameter.- actual_value_columnstring, optional
Actual value column name, valid for the prediction files if the project is unsupervised and the dataset is considered as bulk predictions dataset. Cannot be provided with the
forecast_point
parameter.- secondary_datasets_config_id: string or None, optional
The Id of the alternative secondary dataset config to use during prediction for Feature discovery project.
- Returns
- datasetPredictionDataset
the newly uploaded dataset
- Return type
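A minimal sketch using a stored credential ID rather than inline credential data; the project, dataset, and credential IDs are hypothetical:

.. code-block:: python

    import datarobot as dr

    project = dr.Project.get('5223deadbeefdeadbeef0101')    # hypothetical project ID
    prediction_dataset = project.upload_dataset_from_catalog(
        dataset_id='60b2a1c7f86a7e93dbf1d111',              # hypothetical AI Catalog dataset ID
        credential_id='60b2a1c7f86a7e93dbf1d222',           # hypothetical stored credential ID
        max_wait=600,
    )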
- get_blueprints()¶
List all blueprints recommended for a project.
- Returns
- menulist of Blueprint instances
All blueprints in a project’s repository.
- get_features()¶
List all features for this project
- Returns
- list of Feature
all features for this project
- Return type
List
[Feature
]
- get_modeling_features(batch_size=None)¶
List all modeling features for this project
Only available once the target and partitioning settings have been set. For more information on the distinction between input and modeling features, see the time series documentation.
- Parameters
- batch_sizeint, optional
The number of features to retrieve in a single API call. If specified, the client may make multiple calls to retrieve the full list of features. If not specified, an appropriate default will be chosen by the server.
- Returns
- list of ModelingFeature
All modeling features in this project
- Return type
List
[ModelingFeature
]
- get_featurelists()¶
List all featurelists created for this project
- Returns
- list of Featurelist
All featurelists created for this project
- Return type
List
[Featurelist
]
- get_associations(assoc_type, metric, featurelist_id=None)¶
Get the association statistics and metadata for a project’s informative features
New in version v2.17.
- Parameters
- assoc_typestring or None
The type of association, must be either ‘association’ or ‘correlation’
- metricstring or None
The specified association metric, belongs under either association or correlation umbrella
- featurelist_idstring or None
The desired featurelist for which to get association statistics (New in version v2.19)
- Returns
- association_datadict
Pairwise metric strength data, feature clustering data, and ordering data for Feature Association Matrix visualization
- get_association_featurelists()¶
List featurelists and get feature association status for each
New in version v2.19.
- Returns
- feature_listsdict
Dict with ‘featurelists’ as key, with list of featurelists as values
- get_association_matrix_details(feature1, feature2)¶
Get a sample of the actual values used to measure the association between a pair of features
New in version v2.17.
- Parameters
- feature1str
Feature name for the first feature of interest
- feature2str
Feature name for the second feature of interest
- Returns
- dict
This data has 4 keys: chart_type, features, values, and types
- chart_typestr
Type of plotting the pair of features gets in the UI. e.g. ‘HORIZONTAL_BOX’, ‘VERTICAL_BOX’, ‘SCATTER’ or ‘CONTINGENCY’
- valueslist
A list of triplet lists, e.g. {"values": [[460.0, 428.5, 0.001], [1679.3, 259.0, 0.001], …]}. The first entry of each list is a value of feature1, the second entry of each list is a value of feature2, and the third is the relative frequency of the pair of datapoints in the sample.
- featureslist of str
A list of the passed features, [feature1, feature2]
- typeslist of str
A list of the passed features’ types inferred by DataRobot. e.g. [‘NUMERIC’, ‘CATEGORICAL’]
- get_modeling_featurelists(batch_size=None)¶
List all modeling featurelists created for this project
Modeling featurelists can only be created after the target and partitioning options have been set for a project. In time series projects, these are the featurelists that can be used for modeling; in other projects, they behave the same as regular featurelists.
See the time series documentation for more information.
- Parameters
- batch_sizeint, optional
The number of featurelists to retrieve in a single API call. If specified, the client may make multiple calls to retrieve the full list of features. If not specified, an appropriate default will be chosen by the server.
- Returns
- list of ModelingFeaturelist
all modeling featurelists in this project
- Return type
List
[ModelingFeaturelist
]
- get_discarded_features()¶
Retrieve features discarded during feature generation. Applicable to time series projects. Can be called at the modeling stage.
- Returns
- discarded_features_info: DiscardedFeaturesInfo
- Return type
- restore_discarded_features(features, max_wait=600)¶
Restore features discarded during feature generation. Applicable to time series projects. Can be called at the modeling stage.
- Returns
- status: FeatureRestorationStatus
information about features requested to be restored.
- Return type
- create_type_transform_feature(name, parent_name, variable_type, replacement=None, date_extraction=None, max_wait=600)¶
Create a new feature by transforming the type of an existing feature in the project
Note that only the following transformations are supported:
Text to categorical or numeric
Categorical to text or numeric
Numeric to categorical
Date to categorical or numeric
Note
Special considerations when casting numeric to categorical
There are two parameters which can be used for variableType to convert numeric data to categorical levels. These differ in the assumptions they make about the input data, and are very important when considering the data that will be used to make predictions. The assumptions that each makes are:
categorical: The data in the column is all integral, and there are no missing values. If either of these conditions does not hold in the training set, the transformation will be rejected. During predictions, if any of the values in the parent column are missing, the predictions will error.
categoricalInt: (New in v2.6) All of the data in the column should be considered categorical in its string form when cast to an int by truncation. For example, the value 3 will be cast as the string 3, and the value 3.14 will also be cast as the string 3. Further, the value -3.6 will become the string -3. Missing values will still be recognized as missing.
For convenience these are represented in the enum
VARIABLE_TYPE_TRANSFORM
with the namesCATEGORICAL
andCATEGORICAL_INT
.- Parameters
- namestr
The name to give to the new feature
- parent_namestr
The name of the feature to transform
- variable_typestr
The type the new column should have. See the values within
datarobot.enums.VARIABLE_TYPE_TRANSFORM
.- replacementstr or float, optional
The value that missing or unconvertable data should have
- date_extractionstr, optional
Must be specified when parent_name is a date column (and left None otherwise). Specifies which value from a date should be extracted. See the list of values in
datarobot.enums.DATE_EXTRACTION
- max_waitint, optional
The maximum amount of time to wait for DataRobot to finish processing the new column. This process can take more time with more data to process. If this operation times out, an AsyncTimeoutError will occur. DataRobot continues the processing and the new column may successfully be constructed.
- Returns
- Feature
The data of the new Feature
- Raises
- AsyncFailureError
If any of the responses from the server are unexpected
- AsyncProcessUnsuccessfulError
If the job being waited for has failed or has been cancelled
- AsyncTimeoutError
If the resource did not resolve in time
- Return type
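A minimal sketch of the numeric-to-categorical case described in the note above, using the CATEGORICAL_INT name from the VARIABLE_TYPE_TRANSFORM enum; the project ID and column names are hypothetical:

.. code-block:: python

    import datarobot as dr
    from datarobot.enums import VARIABLE_TYPE_TRANSFORM

    project = dr.Project.get('5223deadbeefdeadbeef0101')    # hypothetical project ID
    # 'zip_code' is a hypothetical numeric column whose truncated integer form
    # should be treated as categorical levels.
    new_feature = project.create_type_transform_feature(
        name='zip_code (categorical)',
        parent_name='zip_code',
        variable_type=VARIABLE_TYPE_TRANSFORM.CATEGORICAL_INT,
    )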
- get_featurelist_by_name(name)¶
Retrieves a featurelist by name.
- Parameters
- namestr, optional
The name of the Project’s featurelist to get.
- Returns
- Featurelist
featurelist found by name, optional
Examples
project = Project.get('5223deadbeefdeadbeef0101')
featurelist = project.get_featurelist_by_name("Raw Features")
- Return type
Optional
[Featurelist
]
- create_featurelist(name=None, features=None, starting_featurelist=None, starting_featurelist_id=None, starting_featurelist_name=None, features_to_include=None, features_to_exclude=None)¶
Creates a new featurelist
- Parameters
- namestr, optional
The name to give to this new featurelist. Names must be unique, so an error will be returned from the server if this name has already been used in this project. We dynamically create a name if none is provided.
- featureslist of str, optional
The names of the features. Each feature must exist in the project already.
- starting_featurelistFeaturelist, optional
The featurelist to use as the basis when creating a new featurelist. starting_featurelist.features will be read to get the list of features that we will manipulate.
- starting_featurelist_idstr, optional
The featurelist ID used instead of passing an object instance.
- starting_featurelist_namestr, optional
The name of a featurelist (for example, "Informative Features") to look up via the API and use as the source of features.
- features_to_includelist of str, optional
The list of the feature names to include in new featurelist. Throws an error if an item in this list is not in the featurelist that was passed, or that was retrieved from the API. If nothing is passed, all features are included from the starting featurelist.
- features_to_excludelist of str, optional
The list of the feature names to exclude from the new featurelist. Throws an error if an item in this list is not in the featurelist that was passed, or if a feature appears both here and in features_to_include; the method cannot use both at the same time.
- Returns
- Featurelist
newly created featurelist
- Raises
- DuplicateFeaturesError
Raised if features variable contains duplicate features
- InvalidUsageError
Raised if the method is called with incompatible arguments
Examples
project = Project.get('5223deadbeefdeadbeef0101')
flists = project.get_featurelists()

# Create a new featurelist using a subset of features from an
# existing featurelist
flist = flists[0]
features = flist.features[::2]  # Half of the features
new_flist = project.create_featurelist(
    name='Feature Subset',
    features=features,
)

project = Project.get('5223deadbeefdeadbeef0101')

# Create a new featurelist using a subset of features from an
# existing featurelist by using features_to_exclude param
new_flist = project.create_featurelist(
    name='Feature Subset of Existing Featurelist',
    starting_featurelist_name="Informative Features",
    features_to_exclude=["metformin", "weight", "age"],
)
- Return type
- create_modeling_featurelist(name, features, skip_datetime_partition_column=False)¶
Create a new modeling featurelist
Modeling featurelists can only be created after the target and partitioning options have been set for a project. In time series projects, these are the featurelists that can be used for modeling; in other projects, they behave the same as regular featurelists.
See the time series documentation for more information.
- Parameters
- namestr
the name of the modeling featurelist to create. Names must be unique within the project, or the server will return an error.
- featureslist of str
the names of the features to include in the modeling featurelist. Each feature must be a modeling feature.
- skip_datetime_partition_column: boolean, optional
False by default. If True, the featurelist will not contain the datetime partition column. Use this to create monotonic featurelists in time series projects; the setting makes no difference for non-time series projects. Monotonic featurelists cannot be used for modeling.
- Returns
- featurelistModelingFeaturelist
the newly created featurelist
Examples
project = Project.get('1234deadbeeffeeddead4321')
modeling_features = project.get_modeling_features()
selected_features = [feat.name for feat in modeling_features][:5]  # select first five
new_flist = project.create_modeling_featurelist('Model This', selected_features)
- Return type
- get_metrics(feature_name)¶
Get the metrics recommended for modeling on the given feature.
- Parameters
- feature_namestr
The name of the feature to query regarding which metrics are recommended for modeling.
- Returns
- feature_name: str
The name of the feature that was looked up
- available_metrics: list of str
An array of strings representing the appropriate metrics. If the feature cannot be selected as the target, then this array will be empty.
- metric_details: list of dict
The list of metricDetails objects
- metric_name: str
Name of the metric
- supports_timeseries: boolean
This metric is valid for timeseries
- supports_multiclass: boolean
This metric is valid for multiclass classification
- supports_binary: boolean
This metric is valid for binary classification
- supports_regression: boolean
This metric is valid for regression
- ascending: boolean
Should the metric be sorted in ascending order
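A minimal sketch of looking up the recommended metrics for a candidate target column, assuming the returned mapping uses the snake_cased keys listed above; the project ID and column name are hypothetical:

.. code-block:: python

    import datarobot as dr

    project = dr.Project.get('5223deadbeefdeadbeef0101')    # hypothetical project ID
    metrics_info = project.get_metrics('readmitted')        # hypothetical target column
    print(metrics_info['available_metrics'])
    print([detail['metric_name'] for detail in metrics_info['metric_details']])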
- get_status()¶
Query the server for project status.
- Returns
- statusdict
Contains:
autopilot_done
: a boolean.stage
: a short string indicating which stage the project is in.stage_description
: a description of whatstage
means.
Examples
{"autopilot_done": False, "stage": "modeling", "stage_description": "Ready for modeling"}
- pause_autopilot()¶
Pause autopilot, which stops processing the next jobs in the queue.
- Returns
- pausedboolean
Whether the command was acknowledged
- Return type
bool
- unpause_autopilot()¶
Unpause autopilot, which restarts processing the next jobs in the queue.
- Returns
- unpausedboolean
Whether the command was acknowledged.
- Return type
bool
- start_autopilot(featurelist_id, mode='quick', blend_best_models=False, scoring_code_only=False, prepare_model_for_deployment=True, consider_blenders_in_recommendation=False, run_leakage_removed_feature_list=True, autopilot_cluster_list=None)¶
Start Autopilot on provided featurelist with the specified Autopilot settings, halting the current Autopilot run.
Only one Autopilot can run at a time, so any ongoing Autopilot on a different featurelist will be halted. Modeling jobs already in the queue are not affected, but the halted Autopilot will not add new jobs to the queue.
- Parameters
- featurelist_idstr
Identifier of featurelist that should be used for autopilot
- modestr, optional
The Autopilot mode to run. You can use
AUTOPILOT_MODE
enum to choose betweenAUTOPILOT_MODE.FULL_AUTO
AUTOPILOT_MODE.QUICK
AUTOPILOT_MODE.COMPREHENSIVE
If unspecified,
AUTOPILOT_MODE.QUICK
is used.- blend_best_modelsbool, optional
Blend best models during the Autopilot run. This option is not supported in SHAP-only mode.
- scoring_code_onlybool, optional
Keep only models that can be converted to scorable java code during Autopilot run.
- prepare_model_for_deploymentbool, optional
Prepare model for deployment during Autopilot run. The preparation includes creating reduced feature list models, retraining best model on higher sample size, computing insights and assigning “RECOMMENDED FOR DEPLOYMENT” label.
- consider_blenders_in_recommendationbool, optional
Include blenders when selecting a model to prepare for deployment in an Autopilot Run. This option is not supported in SHAP-only mode or for multilabel projects.
- run_leakage_removed_feature_listbool, optional
Run Autopilot on Leakage Removed feature list (if exists).
- autopilot_cluster_listlist of int, optional
(New in v2.27) A list of integers, where each value will be used as the number of clusters in Autopilot model(s) for unsupervised clustering projects. Cannot be specified unless project unsupervisedMode is true and unsupervisedType is set to ‘clustering’.
- Raises
- AppPlatformError
Raised if the project's target was not selected or the Autopilot settings are invalid for the project.
- Return type
None
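A minimal sketch of starting Autopilot on a different featurelist; the project ID and featurelist name are hypothetical:

.. code-block:: python

    import datarobot as dr
    from datarobot.enums import AUTOPILOT_MODE

    project = dr.Project.get('5223deadbeefdeadbeef0101')    # hypothetical project ID
    featurelist = project.get_featurelist_by_name('Informative Features')  # hypothetical name
    project.start_autopilot(
        featurelist_id=featurelist.id,
        mode=AUTOPILOT_MODE.QUICK,
        blend_best_models=False,
        prepare_model_for_deployment=True,
    )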
- train(trainable, sample_pct=None, featurelist_id=None, source_project_id=None, scoring_type=None, training_row_count=None, monotonic_increasing_featurelist_id=<object object>, monotonic_decreasing_featurelist_id=<object object>, n_clusters=None)¶
Submit a job to the queue to train a model.
Either sample_pct or training_row_count can be used to specify the amount of data to use, but not both. If neither are specified, a default of the maximum amount of data that can safely be used to train any blueprint without going into the validation data will be selected.
In smart-sampled projects, sample_pct and training_row_count are assumed to be in terms of rows of the minority class.
Note
If the project uses datetime partitioning, use
Project.train_datetime
instead.- Parameters
- trainablestr or Blueprint
For str, this is assumed to be a blueprint_id. If no source_project_id is provided, the project_id will be assumed to be the project that this instance represents.
Otherwise, for a Blueprint, it contains the blueprint_id and source_project_id that we want to use. featurelist_id will assume the default for this project if not provided, and sample_pct will default to using the maximum training value allowed for this project's partition setup. source_project_id will be ignored if a Blueprint instance is used for this parameter.
- sample_pctfloat, optional
The amount of data to use for training, as a percentage of the project dataset from 0 to 100.
- featurelist_idstr, optional
The identifier of the featurelist to use. If not defined, the default for this project is used.
- source_project_idstr, optional
Which project created this blueprint_id. If
None
, it defaults to looking in this project. Note that you must have read permissions in this project.- scoring_typestr, optional
Either
validation
orcrossValidation
(alsodr.SCORING_TYPE.validation
ordr.SCORING_TYPE.cross_validation
).validation
is available for every partitioning type, and indicates that the default model validation should be used for the project. If the project uses a form of cross-validation partitioning,crossValidation
can also be used to indicate that all of the available training/validation combinations should be used to evaluate the model.- training_row_countint, optional
The number of rows to use to train the requested model.
- monotonic_increasing_featurelist_idstr, optional
(new in version 2.11) the id of the featurelist that defines the set of features with a monotonically increasing relationship to the target. Passing
None
disables increasing monotonicity constraint. Default (dr.enums.MONOTONICITY_FEATURELIST_DEFAULT
) is the one specified by the blueprint.- monotonic_decreasing_featurelist_idstr, optional
(new in version 2.11) the id of the featurelist that defines the set of features with a monotonically decreasing relationship to the target. Passing
None
disables decreasing monotonicity constraint. Default (dr.enums.MONOTONICITY_FEATURELIST_DEFAULT
) is the one specified by the blueprint.- n_clusters: int, optional
(new in version 2.27) Number of clusters to use in an unsupervised clustering model. This parameter is used only for unsupervised clustering models that don’t automatically determine the number of clusters.
- Returns
- model_job_idstr
id of created job, can be used as parameter to
ModelJob.get
method orwait_for_async_model_creation
function
Examples
Use a Blueprint instance:

blueprint = project.get_blueprints()[0]
model_job_id = project.train(blueprint, training_row_count=project.max_train_rows)

Use a blueprint_id, which is a string. In the first case, it is assumed that the blueprint was created by this project. If you are using a blueprint used by another project, you will need to pass the id of that other project as well:

blueprint_id = 'e1c7fc29ba2e612a72272324b8a842af'
project.train(blueprint_id, training_row_count=project.max_train_rows)
another_project.train(blueprint_id, source_project_id=project.id)

You can also easily use this interface to train a new model using the data from an existing model:

model = project.get_models()[0]
model_job_id = project.train(model.blueprint.id, sample_pct=100)
- train_datetime(blueprint_id, featurelist_id=None, training_row_count=None, training_duration=None, source_project_id=None, monotonic_increasing_featurelist_id=<object object>, monotonic_decreasing_featurelist_id=<object object>, use_project_settings=False, sampling_method=None, n_clusters=None)¶
Create a new model in a datetime partitioned project
If the project is not datetime partitioned, an error will occur.
All durations should be specified with a duration string such as those returned by the
partitioning_methods.construct_duration_string
helper method. Please see datetime partitioned project documentation for more information on duration strings.- Parameters
- blueprint_idstr
the blueprint to use to train the model
- featurelist_idstr, optional
the featurelist to use to train the model. If not specified, the project default will be used.
- training_row_countint, optional
the number of rows of data that should be used to train the model. If specified, neither
training_duration
noruse_project_settings
may be specified.- training_durationstr, optional
a duration string specifying what time range the data used to train the model should span. If specified, neither
training_row_count
noruse_project_settings
may be specified.- sampling_methodstr, optional
(New in version v2.23) defines the way training data is selected. Can be either random or latest. In combination with training_row_count, defines how rows are selected from the backtest (latest by default). When training data is defined using a time range (training_duration or use_project_settings) this setting changes the way time_window_sample_pct is applied (random by default). Applicable to OTV projects only.
- use_project_settingsbool, optional
(New in version v2.20) defaults to
False
. IfTrue
, indicates that the custom backtest partitioning settings specified by the user will be used to train the model and evaluate backtest scores. If specified, neithertraining_row_count
nortraining_duration
may be specified.- source_project_idstr, optional
the id of the project this blueprint comes from, if not this project. If left unspecified, the blueprint must belong to this project.
- monotonic_increasing_featurelist_idstr, optional
(New in version v2.18) optional, the id of the featurelist that defines the set of features with a monotonically increasing relationship to the target. Passing
None
disables increasing monotonicity constraint. Default (dr.enums.MONOTONICITY_FEATURELIST_DEFAULT
) is the one specified by the blueprint.- monotonic_decreasing_featurelist_idstr, optional
(New in version v2.18) optional, the id of the featurelist that defines the set of features with a monotonically decreasing relationship to the target. Passing
None
disables decreasing monotonicity constraint. Default (dr.enums.MONOTONICITY_FEATURELIST_DEFAULT
) is the one specified by the blueprint.- n_clustersint, optional
The number of clusters to use in the specified unsupervised clustering model. ONLY VALID IN UNSUPERVISED CLUSTERING PROJECTS
- Returns
- jobModelJob
the created job to build the model
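A minimal sketch for a datetime partitioned project, using the construct_duration_string helper mentioned above; the project ID is hypothetical and the first blueprint in the repository is used purely for illustration:

.. code-block:: python

    import datarobot as dr
    from datarobot.helpers.partitioning_methods import construct_duration_string

    project = dr.Project.get('5223deadbeefdeadbeef0101')    # hypothetical project ID
    blueprint = project.get_blueprints()[0]
    duration = construct_duration_string(years=2)           # ISO 8601-style duration string
    model_job = project.train_datetime(
        blueprint_id=blueprint.id,
        training_duration=duration,
    )
    model = model_job.get_result_when_complete()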
- blend(model_ids, blender_method)¶
Submit a job for creating blender model. Upon success, the new job will be added to the end of the queue.
- Parameters
- model_idslist of str
List of model ids that will be used to create blender. These models should have completed validation stage without errors, and can’t be blenders or DataRobot Prime
- blender_methodstr
Chosen blend method, one from
datarobot.enums.BLENDER_METHOD
. If this is a time series project, only methods indatarobot.enums.TS_BLENDER_METHOD
are allowed.
- Returns
- model_jobModelJob
New
ModelJob
instance for the blender creation job in queue.
See also
datarobot.models.Project.check_blendable
to confirm if models can be blended
- Return type
- check_blendable(model_ids, blender_method)¶
Check if the specified models can be successfully blended
- Parameters
- model_idslist of str
List of model ids that will be used to create blender. These models should have completed validation stage without errors, and can’t be blenders or DataRobot Prime
- blender_methodstr
Chosen blend method, one from
datarobot.enums.BLENDER_METHOD
. If this is a time series project, only methods indatarobot.enums.TS_BLENDER_METHOD
are allowed.
- Returns
- Return type
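A minimal sketch that checks eligibility before submitting a blender job, using the BLENDER_METHOD enum; the project ID is hypothetical and the top three leaderboard models are used for illustration:

.. code-block:: python

    import datarobot as dr
    from datarobot.enums import BLENDER_METHOD

    project = dr.Project.get('5223deadbeefdeadbeef0101')    # hypothetical project ID
    model_ids = [model.id for model in project.get_models()[:3]]
    eligibility = project.check_blendable(model_ids, BLENDER_METHOD.AVERAGE)
    print(eligibility)  # inspect the eligibility result before blending
    blend_job = project.blend(model_ids, BLENDER_METHOD.AVERAGE)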
- start_prepare_model_for_deployment(model_id)¶
Prepare a specific model for deployment.
The requested model will be trained on the maximum autopilot size then go through the recommendation stages. For datetime partitioned projects, this includes the feature impact stage, retraining on a reduced feature list, and retraining the best of the reduced feature list model and the max autopilot original model on recent data. For non-datetime partitioned projects, this includes the feature impact stage, retraining on a reduced feature list, retraining the best of the reduced feature list model and the max autopilot original model up to the holdout size, then retraining the up-to-the holdout model on the full dataset.
- Parameters
- model_idstr
The model to prepare for deployment.
- Return type
None
- get_all_jobs(status=None)¶
Get a list of jobs
This will give Jobs representing any type of job, including modeling or predict jobs.
- Parameters
- statusQUEUE_STATUS enum, optional
If called with QUEUE_STATUS.INPROGRESS, will return the jobs that are currently running.
If called with QUEUE_STATUS.QUEUE, will return the jobs that are waiting to be run.
If called with QUEUE_STATUS.ERROR, will return the jobs that have errored.
If no value is provided, will return all jobs currently running or waiting to be run.
- Returns
- jobslist
Each is an instance of Job
- Return type
List
[Job
]
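A minimal sketch of inspecting the queue with the QUEUE_STATUS enum; the project ID is hypothetical:

.. code-block:: python

    import datarobot as dr
    from datarobot.enums import QUEUE_STATUS

    project = dr.Project.get('5223deadbeefdeadbeef0101')    # hypothetical project ID
    running_jobs = project.get_all_jobs(status=QUEUE_STATUS.INPROGRESS)
    for job in running_jobs:
        print(job.id, job.status)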
- get_blenders()¶
Get a list of blender models.
- Returns
- list of BlenderModel
list of all blender models in project.
- Return type
List
[BlenderModel
]
- get_frozen_models()¶
Get a list of frozen models
- Returns
- list of FrozenModel
list of all frozen models in project.
- Return type
List
[FrozenModel
]
- get_combined_models()¶
Get a list of models in segmented project.
- Returns
- list of CombinedModel
list of all combined models in segmented project.
- Return type
List
[CombinedModel
]
- get_active_combined_model()¶
Retrieve currently active combined model in segmented project.
- Returns
- CombinedModel
currently active combined model in segmented project.
- Return type
- get_segments_models(combined_model_id=None)¶
Retrieve a list of all models belonging to the segments/child projects of the segmented project.
- Parameters
- combined_model_idstr, optional
Id of the combined model to get segments for. If there is only a single combined model it can be retrieved automatically, but this must be specified when there are > 1 combined models.
- Returns
- segments_modelslist(dict)
A list of dictionaries containing all of the segments/child projects, each with a list of their models ordered by metric from best to worst.
- Return type
List
[Dict
[str
,Any
]]
- get_model_jobs(status=None)¶
Get a list of modeling jobs
- Parameters
- statusQUEUE_STATUS enum, optional
If called with QUEUE_STATUS.INPROGRESS, will return the modeling jobs that are currently running.
If called with QUEUE_STATUS.QUEUE, will return the modeling jobs that are waiting to be run.
If called with QUEUE_STATUS.ERROR, will return the modeling jobs that have errored.
If no value is provided, will return all modeling jobs currently running or waiting to be run.
- Returns
- jobslist
Each is an instance of ModelJob
- Return type
List
[ModelJob
]
- get_predict_jobs(status=None)¶
Get a list of prediction jobs
- Parameters
- statusQUEUE_STATUS enum, optional
If called with QUEUE_STATUS.INPROGRESS, will return the prediction jobs that are currently running.
If called with QUEUE_STATUS.QUEUE, will return the prediction jobs that are waiting to be run.
If called with QUEUE_STATUS.ERROR, will return the prediction jobs that have errored.
If called without a status, will return all prediction jobs currently running or waiting to be run.
- Returns
- jobslist
Each is an instance of PredictJob
- Return type
List
[PredictJob
]
- wait_for_autopilot(check_interval=20.0, timeout=86400, verbosity=1)¶
Blocks until autopilot is finished. This will raise an exception if the autopilot mode is changed from AUTOPILOT_MODE.FULL_AUTO.
It makes API calls to sync the project state with the server and to look at which jobs are enqueued.
- Parameters
- check_intervalfloat or int
The maximum time (in seconds) to wait between checks for whether autopilot is finished
- timeoutfloat or int or None
After this long (in seconds), we give up. If None, never timeout.
- verbosity:
This should be VERBOSITY_LEVEL.SILENT or VERBOSITY_LEVEL.VERBOSE. For VERBOSITY_LEVEL.SILENT, nothing will be displayed about progress. For VERBOSITY_LEVEL.VERBOSE, the number of jobs in progress or queued is shown. Note that new jobs are added to the queue along the way.
- Raises
- AsyncTimeoutError
If autopilot does not finish in the amount of time specified
- RuntimeError
If a condition is detected that indicates that autopilot will not complete on its own
- Return type
None
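A minimal sketch of blocking until Autopilot finishes and then picking up the recommended model; the project ID is hypothetical:

.. code-block:: python

    import datarobot as dr

    project = dr.Project.get('5223deadbeefdeadbeef0101')    # hypothetical project ID
    # Check at most every 60 seconds and give up after 24 hours.
    project.wait_for_autopilot(check_interval=60.0, timeout=86400)
    model = project.recommended_model()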
- rename(project_name)¶
Update the name of the project.
- Parameters
- project_namestr
The new name
- Return type
None
- set_project_description(project_description)¶
Set or update the project description.
- Parameters
- project_descriptionstr
The new description for this project.
- Return type
None
- unlock_holdout()¶
Unlock the holdout for this project.
This will cause subsequent queries of the models of this project to contain the metric values for the holdout set, if it exists.
Take care, as this cannot be undone. Remember that best practice is to select a model before analyzing the model performance on the holdout set.
- Return type
None
- set_worker_count(worker_count)¶
Sets the number of workers allocated to this project.
Note that this value is limited to the number allowed by your account. Lowering the number will not stop currently running jobs, but will cause the queue to wait for the appropriate number of jobs to finish before attempting to run more jobs.
- Parameters
- worker_countint
The number of concurrent workers to request from the pool of workers. (New in version v2.14) Setting this to -1 will update the number of workers to the maximum available to your account.
- Return type
None
- set_advanced_options(advanced_options=None, **kwargs)¶
Update the advanced options of this project.
Note
project options will not be stored at the database level, so the options set via this method will only be attached to a project instance for the lifetime of a client session (if you quit your session and reopen a new one before running autopilot, the advanced options will be lost).
Either accepts an AdvancedOptions object to replace all advanced options or individual keyword arguments. This is an inplace update, not a new object. The options set will only remain for the life of this project instance within a given session.
- Parameters
- advanced_optionsAdvancedOptions, optional
AdvancedOptions instance as an alternative to passing individual parameters.
- weightsstring, optional
The name of a column indicating the weight of each row
- response_capfloat in [0.5, 1), optional
Quantile of the response distribution to use for response capping.
- blueprint_thresholdint, optional
Number of hours models are permitted to run before being excluded from later autopilot stages Minimum 1
- seedint, optional
a seed to use for randomization
- smart_downsampledbool, optional
whether to use smart downsampling to throw away excess rows of the majority class. Only applicable to classification and zero-boosted regression projects.
- majority_downsampling_ratefloat, optional
The percentage between 0 and 100 of the majority rows that should be kept. Specify only if using smart downsampling. May not cause the majority class to become smaller than the minority class.
- offsetlist of str, optional
(New in version v2.6) the list of the names of the columns containing the offset of each row
- exposurestring, optional
(New in version v2.6) the name of a column containing the exposure of each row
- accuracy_optimized_mbbool, optional
(New in version v2.6) Include additional, longer-running models that will be run by the autopilot and available to run manually.
- events_countstring, optional
(New in version v2.8) the name of a column specifying events count.
- monotonic_increasing_featurelist_idstring, optional
(new in version 2.11) the id of the featurelist that defines the set of features with a monotonically increasing relationship to the target. If None, no such constraints are enforced. When specified, this will set a default for the project that can be overridden at model submission time if desired.
- monotonic_decreasing_featurelist_idstring, optional
(new in version 2.11) the id of the featurelist that defines the set of features with a monotonically decreasing relationship to the target. If None, no such constraints are enforced. When specified, this will set a default for the project that can be overridden at model submission time if desired.
- only_include_monotonic_blueprintsbool, optional
(new in version 2.11) when true, only blueprints that support enforcing monotonic constraints will be available in the project or selected for the autopilot.
- allowed_pairwise_interaction_groupslist of tuple, optional
(New in version v2.19) For GA2M models - specify groups of columns for which pairwise interactions will be allowed. E.g. if set to [(A, B, C), (C, D)] then GA2M models will allow interactions between columns A x B, B x C, A x C, C x D. All others (A x D, B x D) will not be considered.
- blend_best_models: bool, optional
(New in version v2.19) blend best models during Autopilot run
- scoring_code_only: bool, optional
(New in version v2.19) Keep only models that can be converted to scorable java code during Autopilot run
- shap_only_mode: bool, optional
(New in version v2.21) Keep only models that support SHAP values during Autopilot run. Use SHAP-based insights wherever possible. Defaults to False.
- prepare_model_for_deployment: bool, optional
(New in version v2.19) Prepare model for deployment during Autopilot run. The preparation includes creating reduced feature list models, retraining best model on higher sample size, computing insights and assigning “RECOMMENDED FOR DEPLOYMENT” label.
- consider_blenders_in_recommendation: bool, optional
(New in version 2.22.0) Include blenders when selecting a model to prepare for deployment in an Autopilot Run. Defaults to False.
- min_secondary_validation_model_count: int, optional
(New in version v2.19) Compute “All backtest” scores (datetime models) or cross validation scores for the specified number of highest ranking models on the Leaderboard, if over the Autopilot default.
- autopilot_data_sampling_method: str, optional
(New in version v2.23) one of datarobot.enums.DATETIME_AUTOPILOT_DATA_SAMPLING_METHOD. Applicable for OTV projects only, defines if autopilot uses “random” or “latest” sampling when iteratively building models on various training samples. Defaults to “random” for duration-based projects and to “latest” for row-based projects.
- run_leakage_removed_feature_list: bool, optional
(New in version v2.23) Run Autopilot on Leakage Removed feature list (if exists).
- autopilot_with_feature_discovery: bool, optional.
(New in version v2.23) If true, autopilot will run on a feature list that includes features found via search for interactions.
- feature_discovery_supervised_feature_reduction: bool, optional
(New in version v2.23) Run supervised feature reduction for feature discovery projects.
- exponentially_weighted_moving_alpha: float, optional
(New in version v2.26) defaults to None, value between 0 and 1 (inclusive), indicates alpha parameter used in exponentially weighted moving average within feature derivation window.
- external_time_series_baseline_dataset_id: str, optional.
(New in version v2.26) If provided, will generate metrics scaled by external model predictions metric for time series projects. The external predictions catalog must be validated before autopilot starts, see Project.validate_external_time_series_baseline and external baseline predictions documentation for further explanation.
- use_supervised_feature_reduction: bool, default True, optional
Time Series only. When true, during feature generation DataRobot runs a supervised algorithm to retain only qualifying features. Setting to false can severely impact autopilot duration, especially for datasets with many features.
- primary_location_column: str, optional.
The name of primary location column.
- protected_features: list of str, optional.
(New in version v2.24) A list of project features to mark as protected for Bias and Fairness testing calculations. Max number of protected features allowed is 10.
- preferable_target_value: str, optional.
(New in version v2.24) A target value that should be treated as a favorable outcome for the prediction. For example, if we want to check gender discrimination for giving a loan and our target is named is_bad, then the positive outcome for the prediction would be No, which means that the loan is good and that’s what we treat as a favorable result for the loaner.
- fairness_metrics_set: str, optional.
(New in version v2.24) Metric to use for calculating fairness. Can be one of proportionalParity, equalParity, predictionBalance, trueFavorableAndUnfavorableRateParity or favorableAndUnfavorablePredictiveValueParity. Used and required only if the Bias & Fairness in AutoML feature is enabled.
- fairness_threshold: str, optional.
(New in version v2.24) Threshold value for the fairness metric. Can be in a range of [0.0, 1.0]. If the relative (i.e. normalized) fairness score is below the threshold, then the user will see a visual indication on the
- bias_mitigation_feature_namestr, optional
The feature from protected features that will be used in a bias mitigation task to mitigate bias
- bias_mitigation_techniquestr, optional
One of datarobot.enums.BiasMitigationTechnique Options: - ‘preprocessingReweighing’ - ‘postProcessingRejectionOptionBasedClassification’ The technique by which we’ll mitigate bias, which will inform which bias mitigation task we insert into blueprints
- include_bias_mitigation_feature_as_predictor_variablebool, optional
Whether we should also use the mitigation feature as an input to the modeler just like any other categorical used for training, i.e. do we want the model to “train on” this feature in addition to using it for bias mitigation
- model_group_idstring, optional
(New in version v3.3) The name of a column containing the model group ID for each row.
- model_regime_idstring, optional
(New in version v3.3) The name of a column containing the model regime ID for each row.
- model_baselineslist of str, optional
(New in version v3.3) The list of the names of the columns containing the model baselines for each row.
- incremental_learning_only_modebool, optional
(New in version v3.4) Keep only models that support incremental learning during Autopilot run.
- incremental_learning_on_best_modelbool, optional
(New in version v3.4) Run incremental learning on the best model during Autopilot run.
- chunk_definition_idstring, optional
(New in version v3.4) Unique definition for chunks needed to run automated incremental learning.
- incremental_learning_early_stopping_rounds: int, optional
(New in version v3.4) Early stopping rounds used in the automated incremental learning service.
- Return type
None
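A minimal sketch of both calling styles (the project ID and column name are placeholders):
import datarobot as dr

project = dr.Project.get('5c939e08962d741e34f609f0')  # placeholder project ID
# Replace all advanced options at once with an AdvancedOptions instance
opts = dr.AdvancedOptions(weights='row_weight', smart_downsampled=True)  # 'row_weight' is a placeholder column
project.set_advanced_options(advanced_options=opts)
# Or update individual options via keyword arguments
project.set_advanced_options(seed=42, blend_best_models=False)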
- list_advanced_options()¶
View the advanced options that have been set on a project instance. Includes those that haven’t been set (with value of None).
- Returns
- dict of advanced options and their values
- Return type
Dict[str, Any]
- set_partitioning_method(cv_method=None, validation_type=None, seed=0, reps=None, user_partition_col=None, training_level=None, validation_level=None, holdout_level=None, cv_holdout_level=None, validation_pct=None, holdout_pct=None, partition_key_cols=None, partitioning_method=None)¶
Configures the partitioning method for this project.
If this project does not already have a partitioning method set, creates a new configuration based on provided args.
If the partitioning_method arg is set, that configuration will instead be used.
Note
This is an inplace update, not a new object. The options set will only remain for the life of this project instance within a given session. You must still call set_target to make this change permanent for the project. Calling refresh without first calling set_target will invalidate this configuration. Similarly, calling get to retrieve a second copy of the project will not include this configuration.
New in version v3.0.
- Parameters
- cv_method: str
The partitioning method used. Supported values can be found in datarobot.enums.CV_METHOD.
- validation_type: str
May be “CV” (K-fold cross-validation) or “TVH” (Training, validation, and holdout).
- seedint
A seed to use for randomization.
- repsint
Number of cross validation folds to use.
- user_partition_colstr
The name of the column containing the partition assignments.
- training_levelUnion[str,int]
The value of the partition column indicating a row is part of the training set.
- validation_levelUnion[str,int]
The value of the partition column indicating a row is part of the validation set.
- holdout_levelUnion[str,int]
The value of the partition column indicating a row is part of the holdout set (use None if you want no holdout set).
- cv_holdout_level: Union[str,int]
The value of the partition column indicating a row is part of the holdout set.
- validation_pctint
The desired percentage of dataset to assign to validation set.
- holdout_pctint
The desired percentage of dataset to assign to holdout set.
- partition_key_colslist
A list containing a single string, where the string is the name of the column whose values should remain together in partitioning.
- partitioning_methodPartitioningMethod, optional
An instance of datarobot.helpers.partitioning_methods.PartitioningMethod that will be used instead of creating a new instance from the other args.
- Returns
- projectProject
The instance with updated attributes.
- Raises
- TypeError
If cv_method or validation_type are not set and partitioning_method is not set.
- InvalidUsageError
If invoked after project.set_target or project.start, or if invoked with the wrong combination of args for a given partitioning method.
- Return type
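A minimal sketch of configuring 5-fold cross validation with a holdout (the project ID is a placeholder, and the CV_METHOD.RANDOM attribute is assumed):
import datarobot as dr
from datarobot.enums import CV_METHOD

project = dr.Project.get('5c939e08962d741e34f609f0')  # placeholder project ID
project.set_partitioning_method(
    cv_method=CV_METHOD.RANDOM,  # assumed enum attribute
    validation_type='CV',
    reps=5,
    holdout_pct=20,
)
# The configuration only becomes permanent once set_target / analyze_and_model is called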
- get_uri()¶
- Returns
- urlstr
Permanent static hyperlink to a project leaderboard.
- Return type
str
- get_rating_table_models()¶
Get a list of models with a rating table
- Returns
- list of RatingTableModel
list of all models with a rating table in project.
- Return type
List[RatingTableModel]
- get_rating_tables()¶
Get a list of rating tables
- Returns
- list of RatingTable
list of rating tables in project.
- Return type
List[RatingTable]
- get_access_list()¶
Retrieve users who have access to this project and their access levels
New in version v2.15.
- Returns
- list of SharingAccess
- Return type
List[SharingAccess]
- share(access_list, send_notification=None, include_feature_discovery_entities=None)¶
Modify the ability of users to access this project
New in version v2.15.
- Parameters
- access_listlist of
SharingAccess
the modifications to make.
- send_notificationboolean, default
None
(New in version v2.21) optional, whether or not an email notification should be sent, default to None
- include_feature_discovery_entitiesboolean, default
None
(New in version v2.21) optional (default: None), whether or not to share all the related entities i.e., datasets for a project with Feature Discovery enabled
- Raises
- datarobot.ClientError
if you do not have permission to share this project, if the user you’re sharing with doesn’t exist, if the same user appears multiple times in the access_list, or if these changes would leave the project without an owner
Examples
Transfer access to the project from old_user@datarobot.com to new_user@datarobot.com
import datarobot as dr

new_access = dr.SharingAccess("new_user@datarobot.com",
                              dr.enums.SHARING_ROLE.OWNER,
                              can_share=True)
access_list = [dr.SharingAccess("old_user@datarobot.com", None), new_access]
dr.Project.get('my-project-id').share(access_list)
- Return type
None
- batch_features_type_transform(parent_names, variable_type, prefix=None, suffix=None, max_wait=600)¶
Create new features by transforming the type of existing ones.
New in version v2.17.
Note
The following transformations are only supported in batch mode:
Text to categorical or numeric
Categorical to text or numeric
Numeric to categorical
See here for special considerations when casting numeric to categorical. Date to categorical or numeric transformations are not currently supported for batch mode but can be performed individually using create_type_transform_feature.
- Parameters
- parent_nameslist[str]
The list of variable names to be transformed.
- variable_typestr
The type new columns should have. Can be one of ‘categorical’, ‘categoricalInt’, ‘numeric’, and ‘text’ - supported values can be found in datarobot.enums.VARIABLE_TYPE_TRANSFORM.
- prefixstr, optional
Note
Either prefix, suffix, or both must be provided.
The string that will preface all feature names. At least one of prefix and suffix must be specified.
- suffixstr, optional
Note
Either prefix, suffix, or both must be provided.
The string that will be appended at the end to all feature names. At least one of prefix and suffix must be specified.
- max_waitint, optional
The maximum amount of time to wait for DataRobot to finish processing the new column. This process can take more time with more data to process. If this operation times out, an AsyncTimeoutError will occur. DataRobot continues the processing and the new column may successfully be constructed.
- Returns
- list of Features
all features for this project after transformation.
- Raises
- TypeError:
If parent_names is not a list.
- ValueError
If value of variable_type is not from datarobot.enums.VARIABLE_TYPE_TRANSFORM.
- AsyncFailureError
If any of the responses from the server are unexpected.
- AsyncProcessUnsuccessfulError
If the job being waited for has failed or has been cancelled.
- AsyncTimeoutError
If the resource did not resolve in time.
- Return type
List[Feature]
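A minimal sketch (the project ID and column names are placeholders; the VARIABLE_TYPE_TRANSFORM.CATEGORICAL attribute is assumed):
import datarobot as dr
from datarobot.enums import VARIABLE_TYPE_TRANSFORM

project = dr.Project.get('5c939e08962d741e34f609f0')  # placeholder project ID
new_features = project.batch_features_type_transform(
    parent_names=['city', 'product_code'],               # placeholder column names
    variable_type=VARIABLE_TYPE_TRANSFORM.CATEGORICAL,   # assumed enum attribute
    suffix='_cat',
)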
- clone_project(new_project_name=None, max_wait=600)¶
Create a fresh (post-EDA1) copy of this project that is ready for setting targets and modeling options.
- Parameters
- new_project_namestr, optional
The desired name of the new project. If omitted, the API will default to ‘Copy of <original project>’
- max_waitint, optional
Time in seconds after which project creation is considered unsuccessful
- Returns
- datarobot.models.Project
- Return type
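For example (the project ID and name are placeholders):
import datarobot as dr

project = dr.Project.get('5c939e08962d741e34f609f0')  # placeholder project ID
project_copy = project.clone_project(new_project_name='Churn model - experiment 2')
# The copy is post-EDA1, so a new target and modeling options can be set on it independently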
- create_interaction_feature(name, features, separator, max_wait=600)¶
Create a new interaction feature by combining two categorical ones.
New in version v2.21.
- Parameters
- namestr
The name of final Interaction Feature
- featureslist(str)
List of two categorical feature names
- separatorstr
The character used to join the two data values, one of these ` + - / | & . _ , `
- max_waitint, optional
Time in seconds after which project creation is considered unsuccessful.
- Returns
- datarobot.models.InteractionFeature
The data of the new Interaction feature
- Raises
- ClientError
If requested Interaction feature can not be created. Possible reasons for example are:
one of features either does not exist or is of unsupported type
feature with requested name already exists
invalid separator character submitted.
- AsyncFailureError
If any of the responses from the server are unexpected
- AsyncProcessUnsuccessfulError
If the job being waited for has failed or has been cancelled
- AsyncTimeoutError
If the resource did not resolve in time
- Return type
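For example (the project ID and feature names are placeholders):
import datarobot as dr

project = dr.Project.get('5c939e08962d741e34f609f0')  # placeholder project ID
interaction = project.create_interaction_feature(
    name='state_channel',            # name of the new feature
    features=['state', 'channel'],   # two placeholder categorical features
    separator='_',
)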
- get_relationships_configuration()¶
Get the relationships configuration for a given project
New in version v2.21.
- Returns
- relationships_configuration: RelationshipsConfiguration
relationships configuration applied to project
- Return type
- download_feature_discovery_dataset(file_name, pred_dataset_id=None)¶
Download Feature discovery training or prediction dataset
- Parameters
- file_namestr
File path where dataset will be saved.
- pred_dataset_idstr, optional
ID of the prediction dataset
- Return type
None
- download_feature_discovery_recipe_sqls(file_name, model_id=None, max_wait=600)¶
Export and download Feature discovery recipe SQL statements.
New in version v2.25.
- Parameters
- file_namestr
File path where dataset will be saved.
- model_idstr, optional
ID of the model to export SQL for. If specified, SQL to generate only the features used by the model will be exported. If not specified, SQL to generate all features will be exported.
- max_waitint, optional
Time in seconds after which export is considered unsuccessful.
- Raises
- ClientError
If requested SQL cannot be exported. Possible reason is the feature is not available to user.
- AsyncFailureError
If any of the responses from the server are unexpected.
- AsyncProcessUnsuccessfulError
If the job being waited for has failed or has been cancelled.
- AsyncTimeoutError
If the resource did not resolve in time.
- Return type
None
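For example (the project ID, model ID, and file path are placeholders):
import datarobot as dr

project = dr.Project.get('5c939e08962d741e34f609f0')  # placeholder project ID
project.download_feature_discovery_recipe_sqls(
    file_name='feature_discovery_recipe.sql',   # placeholder output path
    model_id='5c939e08962d741e34f609f1',        # placeholder model ID; omit to export SQL for all features
)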
- validate_external_time_series_baseline(catalog_version_id, target, datetime_partitioning, max_wait=600)¶
Validate external baseline prediction catalog.
The forecast windows settings, validation and holdout duration specified in the datetime specification must be consistent with project settings as these parameters are used to check whether the specified catalog version id has been validated or not. See external baseline predictions documentation for example usage.
- Parameters
- catalog_version_id: str
Id of the catalog version for validating external baseline predictions.
- target: str
The name of the target column.
- datetime_partitioning: DatetimePartitioning object
Instance of the DatetimePartitioning defined in datarobot.helpers.partitioning_methods.
Attributes of the object used to check the validation are:
datetime_partition_column
forecast_window_start
forecast_window_end
holdout_start_date
holdout_end_date
backtests
multiseries_id_columns
If the above attributes are different from the project settings, the catalog version will not pass the validation check in the autopilot.
- max_wait: int, optional
The maximum number of seconds to wait for the catalog version to be validated before raising an error.
- Returns
- external_baseline_validation_info: ExternalBaselineValidationInfo
Validation result of the specified catalog version.
- Raises
- AsyncTimeoutError
Raised if the catalog version validation took more time than specified by the
max_wait
parameter.
- Return type
- download_multicategorical_data_format_errors(file_name)¶
Download multicategorical data format errors to a CSV file. If any format errors were detected in potentially multicategorical features, the resulting file will contain at most 10 entries. The CSV content contains the feature name, the dataset index in which the error was detected, the row value, and the type of error detected. If there were no errors, or none of the features were potentially multicategorical, the CSV file will be empty, containing only the header.
- Parameters
- file_namestr
File path where CSV file will be saved.
- Return type
None
- get_multiseries_names()¶
For a multiseries timeseries project it returns all distinct entries in the multiseries column. For a non timeseries project it will just return an empty list.
- Returns
- multiseries_names: List[str]
List of all distinct entries in the multiseries column
- Return type
List[Optional[str]]
- restart_segment(segment)¶
Restart single segment in a segmented project.
New in version v2.28.
Segment restart is allowed only for segments that haven’t reached modeling phase. Restart will permanently remove previous project and trigger set up of a new one for particular segment.
- Parameters
- segmentstr
Segment to restart
- get_bias_mitigated_models(parent_model_id=None, offset=0, limit=100)¶
List the child models with bias mitigation applied
New in version v2.29.
- Parameters
- parent_model_idstr, optional
Filter by parent models
- offsetint, optional
Number of items to skip.
- limitint, optional
Number of items to return.
- Returns
- modelslist of dict
- Return type
List[Dict[str, Any]]
- apply_bias_mitigation(bias_mitigation_parent_leaderboard_id, bias_mitigation_feature_name, bias_mitigation_technique, include_bias_mitigation_feature_as_predictor_variable)¶
Apply bias mitigation to an existing model by training a version of that model but with bias mitigation applied. An error will be returned if the model does not support bias mitigation with the technique requested.
New in version v2.29.
- Parameters
- bias_mitigation_parent_leaderboard_idstr
The leaderboard id of the model to apply bias mitigation to
- bias_mitigation_feature_namestr
The feature name of the protected features that will be used in a bias mitigation task to attempt to mitigate bias
- bias_mitigation_techniquestr, optional
One of datarobot.enums.BiasMitigationTechnique Options: - ‘preprocessingReweighing’ - ‘postProcessingRejectionOptionBasedClassification’ The technique by which we’ll mitigate bias, which will inform which bias mitigation task we insert into blueprints
- include_bias_mitigation_feature_as_predictor_variablebool
Whether we should also use the mitigation feature as an input to the modeler just like any other categorical used for training, i.e. do we want the model to “train on” this feature in addition to using it for bias mitigation
- Returns
- ModelJob
the job of the model with bias mitigation applied that was just submitted for training
- Return type
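A minimal sketch of the bias mitigation workflow (IDs and the feature name are placeholders; the technique string is one of the documented options):
import datarobot as dr

project = dr.Project.get('5c939e08962d741e34f609f0')  # placeholder project ID
model_job = project.apply_bias_mitigation(
    bias_mitigation_parent_leaderboard_id='5c939e08962d741e34f609f1',  # placeholder leaderboard ID
    bias_mitigation_feature_name='gender',                             # placeholder protected feature
    bias_mitigation_technique='preprocessingReweighing',
    include_bias_mitigation_feature_as_predictor_variable=False,
)
# Later, list the child models that had bias mitigation applied
mitigated_models = project.get_bias_mitigated_models()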
- request_bias_mitigation_feature_info(bias_mitigation_feature_name)¶
Request a compute job for bias mitigation feature info for a given feature, which will include: whether there are any rare classes, whether there are any combinations of the target values and the feature values that never occur in the same row, and whether the feature has a high number of missing values. Note that this feature check is dependent on the current target selected for the project.
New in version v2.29.
- Parameters
- bias_mitigation_feature_namestr
The feature name of the protected features that will be used in a bias mitigation task to attempt to mitigate bias
- Returns
- BiasMitigationFeatureInfo
Bias mitigation feature info model for the requested feature
- Return type
- get_bias_mitigation_feature_info(bias_mitigation_feature_name)¶
Get the computed bias mitigation feature info for a given feature, which will include: whether there are any rare classes, whether there are any combinations of the target values and the feature values that never occur in the same row, and whether the feature has a high number of missing values. Note that this feature check is dependent on the current target selected for the project. If this info has not already been computed, this will raise a 404 error.
New in version v2.29.
- Parameters
- bias_mitigation_feature_namestr
The feature name of the protected features that will be used in a bias mitigation task to attempt to mitigate bias
- Returns
- BiasMitigationFeatureInfo
Bias mitigation feature info model for the requested feature
- Return type
- classmethod from_data(data)¶
Instantiate an object of this class using a dict.
- Parameters
- datadict
Correctly snake_cased keys and their values.
- Return type
TypeVar(T, bound=APIObject)
- classmethod from_server_data(data, keep_attrs=None)¶
Instantiate an object of this class using the data directly from the server, meaning that the keys may have the wrong camel casing
- Parameters
- datadict
The directly translated dict of JSON from the server. No casing fixes have taken place
- keep_attrsiterable
List, set or tuple of the dotted namespace notations for attributes to keep within the object structure even if their values are None
- Return type
TypeVar(T, bound=APIObject)
- open_in_browser()¶
Opens class’ relevant web browser location. If default browser is not available the URL is logged.
Note: If text-mode browsers are used, the calling process will block until the user exits the browser.
- Return type
None
- set_datetime_partitioning(datetime_partition_spec=None, **kwargs)¶
Set the datetime partitioning method for a time series project by either passing in a DatetimePartitioningSpecification instance or any individual attributes of that class. Updates self.partitioning_method if already set previously (does not replace it).
This is an alternative to passing a specification to Project.analyze_and_model via the partitioning_method parameter. To see the full partitioning based on the project dataset, use DatetimePartitioning.generate.
New in version v3.0.
- Parameters
- datetime_partition_specDatetimePartitioningSpecification, optional
The customizable aspects of datetime partitioning for a time series project. An alternative to passing individual settings (attributes of the DatetimePartitioningSpecification class).
- Returns
- DatetimePartitioning
Full partitioning including user-specified attributes as well as those determined by DR based on the dataset.
- Return type
- list_datetime_partition_spec()¶
List datetime partitioning settings.
This method makes an API call to retrieve settings from the DB if project is in the modeling stage, i.e. if analyze_and_model (autopilot) has already been called.
If analyze_and_model has not yet been called, this method will instead simply print settings from project.partitioning_method.
New in version v3.0.
- Returns
- DatetimePartitioningSpecification or None
- Return type
Optional[DatetimePartitioningSpecification]
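A minimal sketch (the project ID and partition column are placeholders):
import datarobot as dr

project = dr.Project.get('5c939e08962d741e34f609f0')  # placeholder project ID
spec = dr.DatetimePartitioningSpecification(
    datetime_partition_column='timestamp',  # placeholder column name
    use_time_series=True,
)
project.set_datetime_partitioning(datetime_partition_spec=spec)
project.list_datetime_partition_spec()  # echoes the stored specification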
- class datarobot.helpers.eligibility_result.EligibilityResult(supported, reason='', context='')¶
Represents whether a particular operation is supported
For instance, a function to check whether a set of models can be blended can return an EligibilityResult specifying whether or not blending is supported and why it may not be supported.
- Attributes
- supportedbool
whether the operation this result represents is supported
- reasonstr
why the operation is or is not supported
- contextstr
what operation isn’t supported
Rating Table¶
- class datarobot.models.RatingTable(id, rating_table_name, original_filename, project_id, parent_model_id, model_id=None, model_job_id=None, validation_job_id=None, validation_error=None)¶
Interface to modify and download rating tables.
- Attributes
- idstr
The id of the rating table.
- project_idstr
The id of the project this rating table belongs to.
- rating_table_namestr
The name of the rating table.
- original_filenamestr
The name of the file used to create the rating table.
- parent_model_idstr
The model id of the model the rating table was validated against.
- model_idstr
The model id of the model that was created from the rating table. Can be None if a model has not been created from the rating table.
- model_job_idstr
The id of the job to create a model from this rating table. Can be None if a model has not been created from the rating table.
- validation_job_idstr
The id of the created job to validate the rating table. Can be None if the rating table has not been validated.
- validation_errorstr
Contains a description of any errors caused during validation.
- classmethod get(project_id, rating_table_id)¶
Retrieve a single rating table
- Parameters
- project_idstr
The ID of the project the rating table is associated with.
- rating_table_idstr
The ID of the rating table
- Returns
- rating_tableRatingTable
The queried instance
- Return type
- classmethod create(project_id, parent_model_id, filename, rating_table_name='Uploaded Rating Table')¶
Uploads and validates a new rating table CSV
- Parameters
- project_idstr
id of the project the rating table belongs to
- parent_model_idstr
id of the model for which this rating table should be validated against
- filenamestr
The path of the CSV file containing the modified rating table.
- rating_table_namestr, optional
A human friendly name for the new rating table. The string may be truncated and a suffix may be added to maintain unique names of all rating tables.
- Returns
- job: Job
an instance of created async job
- Raises
- InputNotUnderstoodError
Raised if filename isn’t one of supported types.
- ClientError (400)
Raised if parent_model_id is invalid.
- Return type
- download(filepath)¶
Download a csv file containing the contents of this rating table
- Parameters
- filepathstr
The path at which to save the rating table file.
- Return type
None
- rename(rating_table_name)¶
Renames a rating table to a different name.
- Parameters
- rating_table_namestr
The new name to rename the rating table to.
- Return type
None
- create_model()¶
Creates a new model from this rating table record. This rating table must not already be associated with a model and must be valid.
- Returns
- job: Job
an instance of created async job
- Raises
- ClientError (422)
Raised if creating model from a RatingTable that failed validation
- JobAlreadyRequested
Raised if creating model from a RatingTable that is already associated with a RatingTableModel
- Return type
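A minimal sketch of the rating table workflow, assuming the upload job resolves to the new RatingTable (IDs and the file path are placeholders):
from datarobot.models import RatingTable

job = RatingTable.create(
    project_id='5c939e08962d741e34f609f0',       # placeholder project ID
    parent_model_id='5c939e08962d741e34f609f1',  # placeholder model ID
    filename='modified_rating_table.csv',        # placeholder file path
)
rating_table = job.get_result_when_complete()
model_job = rating_table.create_model()  # train a new model from the validated table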
Recommended Models¶
- class datarobot.models.ModelRecommendation(project_id, model_id, recommendation_type)¶
A collection of information about a recommended model for a project.
- Attributes
- project_idstr
the id of the project the model belongs to
- model_idstr
the id of the recommended model
- recommendation_typestr
the type of model recommendation
- classmethod get(project_id, recommendation_type=None)¶
Retrieves the default recommendation, or the recommendation specified by recommendation_type.
- Parameters
- project_idstr
The project’s id.
- recommendation_typeenums.RECOMMENDED_MODEL_TYPE
The type of recommendation to get. If None, returns the default recommendation.
- Returns
- recommended_modelModelRecommendation
- Return type
Optional[ModelRecommendation]
- classmethod get_all(project_id)¶
Retrieves all of the current recommended models for the project.
- Parameters
- project_idstr
The project’s id.
- Returns
- recommended_modelslist of ModelRecommendation
- Return type
List[ModelRecommendation]
- classmethod get_recommendation(recommended_models, recommendation_type)¶
Returns the model in the given list with the requested type.
- Parameters
- recommended_modelslist of ModelRecommendation
- recommendation_typeenums.RECOMMENDED_MODEL_TYPE
the type of model to extract from the recommended_models list
- Returns
- recommended_modelModelRecommendation or None if no model with the requested type exists
- Return type
Optional[ModelRecommendation]
- get_model()¶
Returns the Model associated with this ModelRecommendation.
- Returns
- recommended_modelModel or DatetimeModel if the project is datetime-partitioned
- Return type
Union[DatetimeModel, Model]
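For example (the project ID is a placeholder):
from datarobot.models import ModelRecommendation

recommendation = ModelRecommendation.get('5c939e08962d741e34f609f0')  # placeholder project ID
if recommendation is not None:
    recommended_model = recommendation.get_model()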
Registered Model¶
- class datarobot.models.RegisteredModel(id, name, description, created_at, modified_at, target, created_by, last_version_num, is_archived, modified_by=None)¶
A registered model is a logical grouping of model packages (versions) that are related to each other.
- Attributes
- idstr
The ID of the registered model.
- namestr
The name of the registered model.
- descriptionstr
The description of the registered model.
- created_atstr
The creation time of the registered model.
- modified_atstr
The last modification time for the registered model.
- modified_bydatarobot.models.model_registry.common.UserMetadata
Information on the user who last modified the registered model.
- targetTarget
Information on the target variable.
- created_bydatarobot.models.model_registry.common.UserMetadata
Information on the creator of the registered model.
- last_version_numint
The latest version number associated to this registered model.
- is_archivedbool
Determines whether the registered model is archived.
- classmethod get(registered_model_id)¶
Get a registered model by ID.
- Parameters
- registered_model_idstr
ID of the registered model to retrieve
- Returns
- registered_modelRegisteredModel
Registered Model Object
Examples
from datarobot import RegisteredModel

registered_model = RegisteredModel.get(registered_model_id='5c939e08962d741e34f609f0')
registered_model.id
>>> '5c939e08962d741e34f609f0'
registered_model.name
>>> 'My Registered Model'
- Return type
TypeVar(TRegisteredModel, bound=RegisteredModel)
- classmethod list(limit=100, offset=None, sort_key=None, sort_direction=None, search=None, filters=None)¶
List all registered models a user can view.
- Parameters
- limitint, optional
Maximum number of registered models to return
- offsetint, optional
Number of registered models to skip before returning results
- sort_keyRegisteredModelSortKey, optional
Key to order result by
- sort_directionRegisteredModelSortDirection, optional
Sort direction
- searchstr, optional
A term to search for in registered model name, description, or target name
- filtersRegisteredModelListFilters, optional
An object containing all filters that you’d like to apply to the resulting list of registered models.
- Returns
- registered_modelsList[RegisteredModel]
A list of registered models user can view.
Examples
from datarobot import RegisteredModel

registered_models = RegisteredModel.list()
>>> [RegisteredModel('My Registered Model'), RegisteredModel('My Other Registered Model')]

from datarobot import RegisteredModel
from datarobot.models.model_registry import RegisteredModelListFilters
from datarobot.enums import RegisteredModelSortKey, RegisteredModelSortDirection

filters = RegisteredModelListFilters(target_type='Regression')
registered_models = RegisteredModel.list(
    filters=filters,
    sort_key=RegisteredModelSortKey.NAME.value,
    sort_direction=RegisteredModelSortDirection.DESC.value,
    search='other')
>>> [RegisteredModel('My Other Registered Model')]
- Return type
List[TypeVar(TRegisteredModel, bound=RegisteredModel)]
- classmethod archive(registered_model_id)¶
Permanently archive a registered model and all of its versions.
- Parameters
- registered_model_idstr
ID of the registered model to be archived
- Returns
- Return type
None
- classmethod update(registered_model_id, name)¶
Update the name of a registered model.
- Parameters
- registered_model_idstr
ID of the registered model to be updated
- namestr
New name for the registered model
- Returns
- registered_modelRegisteredModel
Updated registered model object
- Return type
TypeVar(TRegisteredModel, bound=RegisteredModel)
Retrieve access control information for this registered model.
- Parameters
- offsetOptional[int]
The number of records to skip over. Optional. Default is 0.
- limit: Optional[int]
The number of records to return. Optional. Default is 100.
- id: Optional[str]
Return the access control information for a user with this user ID. Optional.
- Return type
List[SharingRole]
- share(roles)¶
Share this registered model or remove access from one or more user(s).
- Parameters
- rolesList[SharingRole]
A list of
SharingRole
instances, each of which references a user and a role to be assigned.
Examples
>>> from datarobot import RegisteredModel, SharingRole
>>> from datarobot.enums import SHARING_ROLE, SHARING_RECIPIENT_TYPE
>>> registered_model = RegisteredModel.get('5c939e08962d741e34f609f0')
>>> sharing_role = SharingRole(
...     role=SHARING_ROLE.CONSUMER,
...     recipient_type=SHARING_RECIPIENT_TYPE.USER,
...     id='5c939e08962d741e34f609f0',
...     can_share=True,
... )
>>> registered_model.share(roles=[sharing_role])
- Return type
None
- get_version(version_id)¶
Retrieve a registered model version.
- Parameters
- version_idstr
The ID of the registered model version to retrieve.
- Returns
- registered_model_versionRegisteredModelVersion
A registered model version object.
Examples
from datarobot import RegisteredModel

registered_model = RegisteredModel.get('5c939e08962d741e34f609f0')
registered_model_version = registered_model.get_version('5c939e08962d741e34f609f0')
>>> RegisteredModelVersion('My Registered Model Version')
- Return type
- list_versions(filters=None, search=None, sort_key=None, sort_direction=None, limit=None, offset=None)¶
Retrieve a list of registered model versions.
- Parameters
- filtersOptional[RegisteredModelVersionsListFilters]
A RegisteredModelVersionsListFilters instance used to filter the list of registered model versions returned.
- searchOptional[str]
A search string used to filter the list of registered model versions returned.
- sort_keyOptional[RegisteredModelVersionSortKey]
The key to use to sort the list of registered model versions returned.
- sort_directionOptional[RegisteredModelSortDirection]
The direction to use to sort the list of registered model versions returned.
- limitOptional[int]
The maximum number of registered model versions to return. Default is 100.
- offsetOptional[int]
The number of registered model versions to skip over. Default is 0.
- Returns
- registered_model_versionsList[RegisteredModelVersion]
A list of registered model version objects.
Examples
from datarobot import RegisteredModel
from datarobot.models.model_registry import RegisteredModelVersionsListFilters
from datarobot.enums import RegisteredModelSortKey, RegisteredModelSortDirection

registered_model = RegisteredModel.get('5c939e08962d741e34f609f0')
filters = RegisteredModelVersionsListFilters(tags=['tag1', 'tag2'])
registered_model_versions = registered_model.list_versions(filters=filters)
>>> [RegisteredModelVersion('My Registered Model Version')]
- Return type
List[RegisteredModelVersion]
- list_associated_deployments(search=None, sort_key=None, sort_direction=None, limit=None, offset=None)¶
Retrieve a list of deployments associated with this registered model.
- Parameters
- searchOptional[str]
- sort_keyOptional[RegisteredModelDeploymentSortKey]
- sort_directionOptional[RegisteredModelSortDirection]
- limitOptional[int]
- offsetOptional[int]
- Returns
- deploymentsList[VersionAssociatedDeployment]
A list of deployments associated with this registered model.
- Return type
- class datarobot.models.RegisteredModelVersion(id, registered_model_id, registered_model_version, name, model_id, model_execution_type, is_archived, import_meta, source_meta, model_kind, target, model_description, datasets, timeseries, is_deprecated, permissions, active_deployment_count, bias_and_fairness=None, build_status=None, user_provided_id=None, updated_at=None, updated_by=None, tags=None, mlpkg_file_contents=None)¶
Represents a version of a registered model.
- Parameters
- idstr
The ID of the registered model version.
- registered_model_idstr
The ID of the parent registered model.
- registered_model_versionint
The version of the registered model.
- namestr
The name of the registered model version.
- model_idstr
The ID of the model.
- model_execution_typestr
Type of model package (version). dedicated (native DataRobot models) and custom_inference_model (user added inference models) both execute on DataRobot prediction servers; external ones do not.
- is_archivedbool
Whether the model package (version) is permanently archived (cannot be used in deployment or replacement).
- import_metaImportMeta
Information from when this Model Package (version) was first saved.
- source_metaSourceMeta
Meta information from where this model was generated
- model_kindModelKind
Model attribute information.
- targetTarget
Target information for the registered model version.
- model_descriptionModelDescription
Model description information.
- datasetsDataset
Dataset information for the registered model version.
- timeseriesTimeseries
Timeseries information for the registered model version.
- bias_and_fairnessBiasAndFairness
Bias and fairness information for the registered model version.
- is_deprecatedbool
Whether the model package (version) is deprecated (cannot be used in deployment or replacement).
- permissionsList[str]
Permissions for the registered model version.
- active_deployment_countint or None
Number of the active deployments associated with the registered model version.
- build_statusstr or None
Model package (version) build status. One of complete, inProgress, failed.
- user_provided_idstr or None
User provided ID for the registered model version.
- updated_atstr or None
The time the registered model version was last updated.
- updated_byUserMetadata or None
The user who last updated the registered model version.
- tagsList[TagWithId] or None
The tags associated with the registered model version.
- mlpkg_file_contentsstr or None
The contents of the model package file.
- classmethod create_for_leaderboard_item(model_id, name=None, prediction_threshold=None, distribution_prediction_model_id=None, description=None, compute_all_ts_intervals=None, registered_model_name=None, registered_model_id=None, tags=None, registered_model_tags=None, registered_model_description=None)¶
- Parameters
- model_idstr
ID of the DataRobot model.
- namestr or None
Name of the version (model package).
- prediction_thresholdfloat or None
Threshold used for binary classification in predictions.
- distribution_prediction_model_idstr or None
ID of the DataRobot distribution prediction model trained on predictions from the DataRobot model.
- descriptionstr or None
Description of the version (model package).
- compute_all_ts_intervalsbool or None
Whether to compute all time series prediction intervals (1-100 percentiles).
- registered_model_nameOptional[str]
Name of the new registered model that will be created from this model package (version). The model package (version) will be created as version 1 of the created registered model. If neither registeredModelName nor registeredModelId is provided, it defaults to the model package (version) name. Mutually exclusive with registeredModelId.
- registered_model_idOptional[str]
Creates a model package (version) as a new version for the provided registered model ID. Mutually exclusive with registeredModelName.
- tagsOptional[List[Tag]]
Tags for the registered model version.
- registered_model_tags: Optional[List[Tag]]
Tags for the registered model.
- registered_model_description: Optional[str]
Description for the registered model.
- Returns
- registered_model_versionRegisteredModelVersion
A new registered model version object.
- Return type
TypeVar(TRegisteredModelVersion, bound=RegisteredModelVersion)
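For example, a sketch of registering a Leaderboard model as version 1 of a new registered model (the model ID and names are placeholders):
from datarobot.models import RegisteredModelVersion

version = RegisteredModelVersion.create_for_leaderboard_item(
    model_id='5c939e08962d741e34f609f1',   # placeholder Leaderboard model ID
    name='Churn model v1',
    registered_model_name='Churn model',
)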
- classmethod create_for_external(name, target, model_id=None, model_description=None, datasets=None, timeseries=None, registered_model_name=None, registered_model_id=None, tags=None, registered_model_tags=None, registered_model_description=None)¶
Create a new registered model version from an external model.
- Parameters
- namestr
Name of the registered model version.
- targetExternalTarget
Target information for the registered model version.
- model_idOptional[str]
Model ID of the registered model version.
- model_descriptionOptional[ModelDescription]
Information about the model.
- datasetsOptional[ExternalDatasets]
Dataset information for the registered model version.
- timeseriesOptional[Timeseries]
Timeseries properties for the registered model version.
- registered_model_nameOptional[str]
Name of the new registered model that will be created from this model package (version). The model package (version) will be created as version 1 of the created registered model. If neither registeredModelName nor registeredModelId is provided, it defaults to the model package (version) name. Mutually exclusive with registeredModelId.
- registered_model_idOptional[str]
Creates a model package (version) as a new version for the provided registered model ID. Mutually exclusive with registeredModelName.
- tagsOptional[List[Tag]]
Tags for the registered model version.
- registered_model_tags: Optional[List[Tag]]
Tags for the registered model.
- registered_model_description: Optional[str]
Description for the registered model.
- Returns
- registered_model_versionRegisteredModelVersion
A new registered model version object.
- Return type
TypeVar(TRegisteredModelVersion, bound=RegisteredModelVersion)
- classmethod create_for_custom_model_version(custom_model_version_id, name=None, description=None, registered_model_name=None, registered_model_id=None, tags=None, registered_model_tags=None, registered_model_description=None)¶
Create a new registered model version from a custom model version.
- Parameters
- custom_model_version_idstr
ID of the custom model version.
- nameOptional[str]
Name of the registered model version.
- descriptionOptional[str]
Description of the registered model version.
- registered_model_nameOptional[str]
Name of the new registered model that will be created from this model package (version). The model package (version) will be created as version 1 of the created registered model. If neither registeredModelName nor registeredModelId is provided, it defaults to the model package (version) name. Mutually exclusive with registeredModelId.
- registered_model_idOptional[str]
Creates a model package (version) as a new version for the provided registered model ID. Mutually exclusive with registeredModelName.
- tagsOptional[List[Tag]]
Tags for the registered model version.
- registered_model_tags: Optional[List[Tag]]
Tags for the registered model.
- registered_model_description: Optional[str]
Description for the registered model.
- Returns
- registered_model_versionRegisteredModelVersion
A new registered model version object.
- Return type
TypeVar(TRegisteredModelVersion, bound=RegisteredModelVersion)
- list_associated_deployments(search=None, sort_key=None, sort_direction=None, limit=None, offset=None)¶
Retrieve a list of deployments associated with this registered model version.
- Parameters
- searchOptional[str]
- sort_keyOptional[RegisteredModelDeploymentSortKey]
- sort_directionOptional[RegisteredModelSortDirection]
- limitOptional[int]
- offsetOptional[int]
- Returns
- deploymentsList[VersionAssociatedDeployment]
A list of deployments associated with this registered model version.
- Return type
- class datarobot.models.model_registry.deployment.VersionAssociatedDeployment(id, currently_deployed, registered_model_version, is_challenger, status, label=None, first_deployed_at=None, first_deployed_by=None, created_by=None, prediction_environment=None)¶
Represents a deployment associated with a registered model version.
- Parameters
- idstr
The ID of the deployment.
- currently_deployedbool
Whether this version is currently deployed.
- registered_model_versionint
The version of the registered model associated with this deployment.
- is_challengerbool
Whether the version associated with this deployment is a challenger.
- statusstr
The status of the deployment.
- labelstr, optional
The label of the deployment.
- first_deployed_atdatetime.datetime, optional
The time the version was first deployed.
- first_deployed_byUserMetadata, optional
The user who first deployed the version.
- created_byUserMetadata, optional
The user who created the deployment.
- prediction_environmentDeploymentPredictionEnvironment, optional
The prediction environment of the deployment.
- class datarobot.models.model_registry.RegisteredModelVersionsListFilters(target_name=None, target_type=None, compatible_with_leaderboard_model_id=None, compatible_with_model_package_id=None, for_challenger=None, prediction_threshold=None, imported=None, prediction_environment_id=None, model_kind=None, build_status=None, use_case_id=None, tags=None)¶
Filters for listing of registered model versions.
- Parameters
- target_name: str or None
Name of the target to filter by.
- target_type: str or None
Type of the target to filter by.
- compatible_with_leaderboard_model_id: str or None.
If specified, limit results to versions (model packages) of the Leaderboard model with the specified ID.
- compatible_with_model_package_id: str or None.
Returns versions compatible with the given model package (version) ID. If used, it will only return versions that match target.name, target.type, target.classNames (for classification models), modelKind.isTimeSeries and modelKind.isMultiseries for the specified model package (version).
- for_challenger: bool or None
Can be used with compatibleWithModelPackageId to request similar versions that can be used as challenger models; for external model packages (versions), instead of returning similar external model packages (versions), similar DataRobot and Custom model packages (versions) will be retrieved.
- prediction_threshold: float or None
Return versions with the specified prediction threshold used for binary classification models.
- imported: bool or None
If specified, return either imported (true) or non-imported (false) versions (model packages).
- prediction_environment_id: str or None
Can be used to filter versions (model packages) by what is supported by the prediction environment
- model_kind: str or None
Can be used to filter versions (model packages) by model kind.
- build_status: str or None
If specified, filter versions by the build status.
- class datarobot.models.model_registry.RegisteredModelListFilters(created_at_start=None, created_at_end=None, modified_at_start=None, modified_at_end=None, target_name=None, target_type=None, created_by=None, compatible_with_leaderboard_model_id=None, compatible_with_model_package_id=None, for_challenger=None, prediction_threshold=None, imported=None, prediction_environment_id=None, model_kind=None, build_status=None)¶
Filters for listing registered models.
- Parameters
- created_at_startdatetime.datetime
Registered models created on or after this timestamp.
- created_at_enddatetime.datetime
Registered models created before this timestamp. Defaults to the current time.
- modified_at_startdatetime.datetime
Registered models modified on or after this timestamp.
- modified_at_enddatetime.datetime
Registered models modified before this timestamp. Defaults to the current time.
- target_namestr
Name of the target to filter by.
- target_typestr
Type of the target to filter by.
- created_bystr
Email of the user that created registered model to filter by.
- compatible_with_leaderboard_model_idstr
If specified, limit results to registered models containing versions (model packages) for the leaderboard model with the specified ID.
- compatible_with_model_package_idstr
Return registered models that have versions (model packages) compatible with given model package (version) ID. If used, will only return registered models which have versions that match target.name, target.type, target.classNames (for classification models), modelKind.isTimeSeries, and modelKind.isMultiseries of the specified model package (version).
- for_challengerbool
Can be used with compatibleWithModelPackageId to request similar registered models that contain versions (model packages) that can be used as challenger models; for external model packages (versions), instead of returning similar external model packages (versions), similar DataRobot and Custom model packages will be retrieved.
- prediction_thresholdfloat
If specified, return any registered models containing one or more versions matching the prediction threshold used for binary classification models.
- importedbool
If specified, return any registered models that contain either imported (true) or non-imported (false) versions (model packages).
- prediction_environment_idstr
Can be used to filter registered models by what is supported by the prediction environment.
- model_kindstr
Return models that contain versions matching a specific format.
- build_statusstr
If specified, only return models that have versions with specified build status.
ROC Curve¶
- class datarobot.models.roc_curve.RocCurve(source, roc_points, negative_class_predictions, positive_class_predictions, source_model_id, data_slice_id=None)¶
ROC curve data for model.
- Attributes
- sourcestr
ROC curve data source. Can be ‘validation’, ‘crossValidation’ or ‘holdout’.
- roc_pointslist of dict
List of precalculated metrics associated with thresholds for ROC curve.
- negative_class_predictionslist of float
List of predictions from example for negative class
- positive_class_predictionslist of float
List of predictions from example for positive class
- source_model_idstr
ID of the model this ROC curve represents; in some cases, insights from the parent of a frozen model may be used
- data_slice_id: str
ID of the data slice this ROC curve represents.
- classmethod from_server_data(data, keep_attrs=None, use_insights_format=False, **kwargs)¶
Overrides APIObject.from_server_data to handle ROC curve data retrieved from either the legacy URL or the new /insights/ URL.
- Parameters
- datadict
The directly translated dict of JSON from the server. No casing fixes have taken place.
- keep_attrsiterable
List, set or tuple of the dotted namespace notations for attributes to keep within the object structure even if their values are None
- use_insights_formatbool, optional
Whether to repack the data from the format used in the GET /insights/RocCur/ URL to the format used in the legacy URL.
- Return type
- class datarobot.models.roc_curve.LabelwiseRocCurve(source, roc_points, negative_class_predictions, positive_class_predictions, source_model_id, label, kolmogorov_smirnov_metric, auc)¶
Labelwise ROC curve data for one label and one source.
- Attributes
- sourcestr
ROC curve data source. Can be ‘validation’, ‘crossValidation’ or ‘holdout’.
- roc_pointslist of dict
List of precalculated metrics associated with thresholds for ROC curve.
- negative_class_predictionslist of float
List of predictions from example for negative class
- positive_class_predictionslist of float
List of predictions from example for positive class
- source_model_idstr
ID of the model this ROC curve represents; in some cases, insights from the parent of a frozen model may be used
- labelstr
Label name for
- kolmogorov_smirnov_metricfloat
Kolmogorov-Smirnov metric value for label
- aucfloat
AUC metric value for label
Ruleset¶
- class datarobot.models.Ruleset(project_id, parent_model_id, ruleset_id, rule_count, score, model_id=None)¶
Represents an approximation of a model with DataRobot Prime
- Attributes
- idstr
the id of the ruleset
- rule_countint
the number of rules used to approximate the model
- scorefloat
the validation score of the approximation
- project_idstr
the project the approximation belongs to
- parent_model_idstr
the model being approximated
- model_idstr or None
the model using this ruleset (if it exists). Will be None if no such model has been trained.
Segmented Modeling¶
API Reference for entities used in Segmented Modeling. See dedicated User Guide for examples.
- class datarobot.CombinedModel(id=None, project_id=None, segmentation_task_id=None, is_active_combined_model=False)¶
A model from a segmented project. Combination of ordinary models in child segments projects.
- Attributes
- idstr
the id of the model
- project_idstr
the id of the project the model belongs to
- segmentation_task_idstr
the id of a segmentation task used in this model
- is_active_combined_modelbool
flag indicating if this is the active combined model in segmented project
- classmethod get(project_id, combined_model_id)¶
Retrieve combined model
- Parameters
- project_idstr
The project’s id.
- combined_model_idstr
Id of the combined model.
- Returns
- CombinedModel
The queried combined model.
- Return type
- classmethod set_segment_champion(project_id, model_id, clone=False)¶
Update a segment champion in a combined model by setting the model_id that belongs to the child project_id as the champion.
- Parameters
- project_idstr
The project id for the child model that contains the model id.
- model_idstr
Id of the model to mark as the champion
- clonebool
(New in version v2.29) optional, defaults to False. Defines if combined model has to be cloned prior to setting champion (champion will be set for new combined model if yes).
- Returns
- combined_model_idstr
Id of the combined model that was updated
- Return type
str
- get_segments_info()¶
Retrieve Combined Model segments info
- Returns
- list[SegmentInfo]
List of segments
- Return type
List
[SegmentInfo
]
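A minimal usage sketch, assuming project_id and combined_model_id are placeholders for an existing segmented project and its combined model:
import datarobot as dr

# Placeholders: an existing segmented project and its combined model
combined_model = dr.CombinedModel.get(project_id, combined_model_id)

# Inspect the per-segment status and current champions
for segment in combined_model.get_segments_info():
    print(segment.segment, segment.autopilot_done, segment.model_id)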
- get_segments_as_dataframe(encoding='utf-8')¶
Retrieve Combined Model segments as a DataFrame.
- Parameters
- encodingstr, optional
A string representing the encoding to use in the output csv file. Defaults to ‘utf-8’.
- Returns
- DataFrame
Combined model segments
- Return type
DataFrame
- get_segments_as_csv(filename, encoding='utf-8')¶
Save the Combined Model segments to a CSV file.
- Parameters
- filenamestr or file object
The path or file object to save the data to.
- encodingstr, optional
A string representing the encoding to use in the output csv file. Defaults to ‘utf-8’.
- Return type
None
- train(sample_pct=None, featurelist_id=None, scoring_type=None, training_row_count=None, monotonic_increasing_featurelist_id=<object object>, monotonic_decreasing_featurelist_id=<object object>)¶
Inherited from Model - CombinedModels cannot be retrained directly
- Return type
NoReturn
- train_datetime(featurelist_id=None, training_row_count=None, training_duration=None, time_window_sample_pct=None, monotonic_increasing_featurelist_id=<object object>, monotonic_decreasing_featurelist_id=<object object>, use_project_settings=False, sampling_method=None, n_clusters=None)¶
Inherited from Model - CombinedModels cannot be retrained directly
- Return type
NoReturn
- retrain(sample_pct=None, featurelist_id=None, training_row_count=None, n_clusters=None)¶
Inherited from Model - CombinedModels cannot be retrained directly
- Return type
NoReturn
- request_frozen_model(sample_pct=None, training_row_count=None)¶
Inherited from Model - CombinedModels cannot be retrained as frozen
- Return type
NoReturn
- request_frozen_datetime_model(training_row_count=None, training_duration=None, training_start_date=None, training_end_date=None, time_window_sample_pct=None, sampling_method=None)¶
Inherited from Model - CombinedModels cannot be retrained as frozen
- Return type
NoReturn
- cross_validate()¶
Inherited from Model - CombinedModels cannot request cross validation
- Return type
NoReturn
- class datarobot.SegmentationTask(id, project_id, name, type, created, segments_count, segments, metadata, data)¶
A Segmentation Task is used for segmenting an existing project into multiple child projects. Each child project (or segment) will be a separate autopilot run. Currently only user defined segmentation is supported.
Example for creating a new SegmentationTask for Time Series segmentation with a user defined id column:
from datarobot import SegmentationTask

# Create the SegmentationTask
segmentation_task_results = SegmentationTask.create(
    project_id=project.id,
    target=target,
    use_time_series=True,
    datetime_partition_column=datetime_partition_column,
    multiseries_id_columns=[multiseries_id_column],
    user_defined_segment_id_columns=[user_defined_segment_id_column]
)

# Retrieve the completed SegmentationTask object from the job results
segmentation_task = segmentation_task_results['completedJobs'][0]
- Attributes
- idObjectId
The id of the segmentation task.
- project_idObjectId
The associated id of the parent project.
- typestr
What type of job the segmentation task is associated with, e.g. auto_ml or auto_ts.
- createddatetime
The date this segmentation task was created.
- segments_countint
The number of segments the segmentation task generated.
- segmentslist of strings
The segment names that the segmentation task generated.
- metadatadict
List of features that help to identify the parameters used by the segmentation task.
- datadict
Optional parameters that are associated with enabled metadata for the segmentation task.
- classmethod from_data(data)¶
Instantiate an object of this class using a dict.
- Parameters
- datadict
Correctly snake_cased keys and their values.
- Return type
- collect_payload()¶
Convert the record to a dictionary
- Return type
Dict
[str
,str
]
- classmethod create(project_id, target, use_time_series=False, datetime_partition_column=None, multiseries_id_columns=None, user_defined_segment_id_columns=None, max_wait=600, model_package_id=None)¶
Creates segmentation tasks for the project based on the defined parameters.
- Parameters
- project_idstr
The associated id of the parent project.
- targetstr
The column that represents the target in the dataset.
- use_time_seriesbool
Whether AutoTS or AutoML segmentations should be generated.
- datetime_partition_columnstr or null
Required for Time Series. The name of the column whose values as dates are used to assign a row to a particular partition.
- multiseries_id_columnslist of str or null
Required for Time Series. A list of the names of multiseries id columns to define series within the training data. Currently only one multiseries id column is supported.
- user_defined_segment_id_columnslist of str or null
Required when using a column for segmentation. A list of the segment id columns to use to define what columns are used to manually segment data. Currently only one user defined segment id column is supported.
- model_package_idstr
Required when using automated segmentation. The associated id of the model in the DataRobot Model Registry that will be used to perform automated segmentation on a dataset.
- max_waitinteger
The maximum number of seconds to wait for the segmentation tasks to complete.
- Returns
- segmentation_tasksdict
Dictionary containing the numberOfJobs, completedJobs, and failedJobs. completedJobs is a list of SegmentationTask objects, while failed jobs is a list of dictionaries indicating problems with submitted tasks.
- Return type
- classmethod list(project_id)¶
List all of the segmentation tasks that have been created for a specific project_id.
- Parameters
- project_idstr
The id of the parent project
- Returns
- segmentation_taskslist of SegmentationTask
List of instances with initialized data.
- Return type
List
[SegmentationTask
]
- classmethod get(project_id, segmentation_task_id)¶
Retrieve information for a single segmentation task associated with a project_id.
- Parameters
- project_idstr
The id of the parent project
- segmentation_task_idstr
The id of the segmentation task
- Returns
- segmentation_taskSegmentationTask
Instance with initialized data.
- Return type
- class datarobot.SegmentInfo(project_id, segment, project_stage, project_status_error, autopilot_done, model_count=None, model_id=None)¶
A SegmentInfo is an object containing information about the combined model segments
- Attributes
- project_idstr
The associated id of the child project.
- segmentstr
the name of the segment
- project_stagestr
A description of the current stage of the project
- project_status_errorstr
Project status error message.
- autopilot_donebool
Is autopilot done for the project.
- model_countint
Count of trained models in project.
- model_idstr
ID of segment champion model.
- classmethod list(project_id, model_id)¶
List all of the segments that have been created for a specific project_id.
- Parameters
- project_idstr
The id of the parent project
- model_idstr
The id of the combined model
- Returns
- segmentslist of datarobot.models.segmentation.SegmentInfo
List of instances with initialized data.
- Return type
List
[SegmentInfo
]
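For illustration, assuming the same placeholder ids as above, the segments of a combined model can also be listed directly through SegmentInfo:
import datarobot as dr

# Placeholders: a segmented project and its combined model
for info in dr.SegmentInfo.list(project_id, combined_model_id):
    print(info.segment, info.project_stage, info.model_count)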
- class datarobot.models.segmentation.SegmentationTask(id, project_id, name, type, created, segments_count, segments, metadata, data)¶
A Segmentation Task is used for segmenting an existing project into multiple child projects. Each child project (or segment) will be a separate autopilot run. Currently only user defined segmentation is supported.
Example for creating a new SegmentationTask for Time Series segmentation with a user defined id column:
from datarobot import SegmentationTask

# Create the SegmentationTask
segmentation_task_results = SegmentationTask.create(
    project_id=project.id,
    target=target,
    use_time_series=True,
    datetime_partition_column=datetime_partition_column,
    multiseries_id_columns=[multiseries_id_column],
    user_defined_segment_id_columns=[user_defined_segment_id_column]
)

# Retrieve the completed SegmentationTask object from the job results
segmentation_task = segmentation_task_results['completedJobs'][0]
- Attributes
- idObjectId
The id of the segmentation task.
- project_idObjectId
The associated id of the parent project.
- typestr
What type of job the segmentation task is associated with, e.g. auto_ml or auto_ts.
- createddatetime
The date this segmentation task was created.
- segments_countint
The number of segments the segmentation task generated.
- segmentslist of strings
The segment names that the segmentation task generated.
- metadatadict
List of features that help to identify the parameters used by the segmentation task.
- datadict
Optional parameters that are associated with enabled metadata for the segmentation task.
- classmethod from_data(data)¶
Instantiate an object of this class using a dict.
- Parameters
- datadict
Correctly snake_cased keys and their values.
- Return type
- collect_payload()¶
Convert the record to a dictionary
- Return type
Dict
[str
,str
]
- classmethod create(project_id, target, use_time_series=False, datetime_partition_column=None, multiseries_id_columns=None, user_defined_segment_id_columns=None, max_wait=600, model_package_id=None)¶
Creates segmentation tasks for the project based on the defined parameters.
- Parameters
- project_idstr
The associated id of the parent project.
- targetstr
The column that represents the target in the dataset.
- use_time_seriesbool
Whether AutoTS or AutoML segmentations should be generated.
- datetime_partition_columnstr or null
Required for Time Series. The name of the column whose values as dates are used to assign a row to a particular partition.
- multiseries_id_columnslist of str or null
Required for Time Series. A list of the names of multiseries id columns to define series within the training data. Currently only one multiseries id column is supported.
- user_defined_segment_id_columnslist of str or null
Required when using a column for segmentation. A list of the segment id columns to use to define what columns are used to manually segment data. Currently only one user defined segment id column is supported.
- model_package_idstr
Required when using automated segmentation. The associated id of the model in the DataRobot Model Registry that will be used to perform automated segmentation on a dataset.
- max_waitinteger
The maximum number of seconds to wait for the segmentation tasks to complete.
- Returns
- segmentation_tasksdict
Dictionary containing the numberOfJobs, completedJobs, and failedJobs. completedJobs is a list of SegmentationTask objects, while failed jobs is a list of dictionaries indicating problems with submitted tasks.
- Return type
- classmethod list(project_id)¶
List all of the segmentation tasks that have been created for a specific project_id.
- Parameters
- project_idstr
The id of the parent project
- Returns
- segmentation_taskslist of SegmentationTask
List of instances with initialized data.
- Return type
List
[SegmentationTask
]
- classmethod get(project_id, segmentation_task_id)¶
Retrieve information for a single segmentation task associated with a project_id.
- Parameters
- project_idstr
The id of the parent project
- segmentation_task_idstr
The id of the segmentation task
- Returns
- segmentation_taskSegmentationTask
Instance with initialized data.
- Return type
- class datarobot.models.segmentation.SegmentationTaskCreatedResponse¶
A dict-based response returned when segmentation tasks are created, containing the numberOfJobs, completedJobs, and failedJobs entries.
SHAP¶
- class datarobot.models.ShapImpact(count, shap_impacts, row_count=None)¶
Represents SHAP impact score for a feature in a model.
New in version v2.21.
Notes
SHAP impact score for a feature has the following structure:
- feature_name: (str) the feature name in dataset
- impact_normalized: (float) normalized impact score value (largest value is 1)
- impact_unnormalized: (float) raw impact score value
- Attributes
- countint
the number of SHAP Impact objects returned
- row_count: int or None
the sample size (specified in rows) to use for Shap Impact computation
- shap_impactslist
a list which contains SHAP impact scores for top 1000 features used by a model
- classmethod create(cls, project_id, model_id, row_count=None)¶
Create SHAP impact for the specified model.
- Parameters
- project_idstr
id of the project the model belongs to
- model_idstr
id of the model to calculate shap impact for
- row_countint
the sample size (specified in rows) to use for Feature Impact computation
- Returns
- jobJob
an instance of created async job
- Return type
- classmethod get(cls, project_id, model_id)¶
Retrieve SHAP impact scores for features in a model.
- Parameters
- project_idstr
id of the project the model belongs to
- model_idstr
id of the model the SHAP impact is for
- Returns
- shap_impactShapImpact
The queried instance.
- Raises
- ClientError (404)
If the project or model does not exist or the SHAP impact has not been computed.
- Return type
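A short sketch of the create-then-get flow, with project_id and model_id as placeholders for a trained model; it assumes the returned async job exposes wait_for_completion as other jobs in this client do, and that each impact entry is a dict with the keys listed in the Notes above:
from datarobot.models import ShapImpact

# Placeholders: an existing project and a trained model in it
job = ShapImpact.create(project_id=project_id, model_id=model_id)
job.wait_for_completion()  # assumed: block until the async computation finishes

shap_impact = ShapImpact.get(project_id=project_id, model_id=model_id)
for impact in shap_impact.shap_impacts[:10]:
    print(impact['feature_name'], impact['impact_normalized'])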
Training Predictions¶
- class datarobot.models.training_predictions.TrainingPredictionsIterator(client, path, limit=None)¶
Lazily fetches training predictions from DataRobot API in chunks of specified size and then iterates rows from responses as named tuples. Each row represents a training prediction computed for a dataset’s row. Each named tuple has the following structure:
Notes
Each PredictionValue dict contains these keys:
- label
describes what this model output corresponds to. For regression projects, it is the name of the target feature. For classification and multiclass projects, it is a label from the target feature.
- value
the output of the prediction. For regression projects, it is the predicted value of the target. For classification and multiclass projects, it is the predicted probability that the row belongs to the class identified by the label.
Each PredictionExplanations dictionary contains these keys:
- labelstring
describes what output was driven by this prediction explanation. For regression projects, it is the name of the target feature. For classification projects, it is the class whose probability increasing would correspond to a positive strength of this prediction explanation.
- featurestring
the name of the feature contributing to the prediction
- feature_valueobject
the value the feature took on for this row. The type corresponds to the feature (boolean, integer, number, string)
- strengthfloat
algorithm-specific explanation value attributed to feature in this row
ShapMetadata dictionary contains these keys:
- shap_remaining_totalfloat
The total of SHAP values for features beyond the
max_explanations
. This can be identically 0 in all rows, if max_explanations is greater than the number of features and thus all features are returned.- shap_base_valuefloat
the model’s average prediction over the training data. SHAP values are deviations from the base value.
- warningsdict or None
SHAP values calculation warnings (e.g. additivity check failures in XGBoost models). Schema described as ShapWarnings.
ShapWarnings dictionary contains these keys:
- mismatch_row_countint
the count of rows for which additivity check failed
- max_normalized_mismatchfloat
the maximal relative normalized mismatch value
Examples
import datarobot as dr

# Fetch existing training predictions by their id
training_predictions = dr.TrainingPredictions.get(project_id, prediction_id)

# Iterate over predictions
for row in training_predictions.iterate_rows():
    print(row.row_id, row.prediction)
- Attributes
- row_idint
id of the record in original dataset for which training prediction is calculated
- partition_idstr or float
The ID of the data partition that the row belongs to. “0.0” corresponds to the validation partition or backtest 1.
- predictionfloat or str or list of str
The model’s prediction for this data row.
- prediction_valueslist of dictionaries
An array of dictionaries with a schema described as PredictionValue.
- timestampstr or None
(New in version v2.11) an ISO string representing the time of the prediction in time series project; may be None for non-time series projects
- forecast_pointstr or None
(New in version v2.11) an ISO string representing the point in time used as a basis to generate the predictions in time series project; may be None for non-time series projects
- forecast_distancestr or None
(New in version v2.11) how many time steps are between the forecast point and the timestamp in time series project; None for non-time series projects
- series_idstr or None
(New in version v2.11) the id of the series in a multiseries project; may be NaN for single series projects; None for non-time series projects
- prediction_explanationslist of dict or None
(New in version v2.21) The prediction explanations for each feature. The total elements in the array are bounded by max_explanations and feature count. Only present if prediction explanations were requested. Schema described as PredictionExplanations.
- shap_metadatadict or None
(New in version v2.21) The additional information necessary to understand SHAP-based prediction explanations. Only present if explanation_algorithm was set to datarobot.enums.EXPLANATIONS_ALGORITHM.SHAP in the compute request. Schema described as ShapMetadata.
- class datarobot.models.training_predictions.TrainingPredictions(project_id, prediction_id, model_id=None, data_subset=None, explanation_algorithm=None, max_explanations=None, shap_warnings=None)¶
Represents training predictions metadata and provides access to prediction results.
Notes
Each element in shap_warnings has the following schema:
- partition_namestr
the partition used for the prediction record.
- valueobject
the warnings related to this partition.
The objects in value are:
- mismatch_row_countint
the count of rows for which additivity check failed.
- max_normalized_mismatchfloat
the maximal relative normalized mismatch value.
Examples
Compute training predictions for a model on the whole dataset
import datarobot as dr

# Request calculation of training predictions
training_predictions_job = model.request_training_predictions(dr.enums.DATA_SUBSET.ALL)
training_predictions = training_predictions_job.get_result_when_complete()
print('Training predictions {} are ready'.format(training_predictions.prediction_id))

# Iterate over actual predictions
for row in training_predictions.iterate_rows():
    print(row.row_id, row.partition_id, row.prediction)
List all training predictions for a project
import datarobot as dr

# Fetch all training predictions for a project
all_training_predictions = dr.TrainingPredictions.list(project_id)

# Inspect all calculated training predictions
for training_predictions in all_training_predictions:
    print(
        'Prediction {} is made for data subset "{}"'.format(
            training_predictions.prediction_id,
            training_predictions.data_subset,
        )
    )
Retrieve training predictions by id
import datarobot as dr

# Getting training predictions by id
training_predictions = dr.TrainingPredictions.get(project_id, prediction_id)

# Iterate over actual predictions
for row in training_predictions.iterate_rows():
    print(row.row_id, row.partition_id, row.prediction)
- Attributes
- project_idstr
id of the project the model belongs to
- model_idstr
id of the model
- prediction_idstr
id of generated predictions
- data_subsetdatarobot.enums.DATA_SUBSET
data set definition used to build predictions. Choices are:
- datarobot.enums.DATA_SUBSET.ALL
for all data available. Not valid for models in datetime partitioned projects.
- datarobot.enums.DATA_SUBSET.VALIDATION_AND_HOLDOUT
for all data except training set. Not valid for models in datetime partitioned projects.
- datarobot.enums.DATA_SUBSET.HOLDOUT
for holdout data set only.
- datarobot.enums.DATA_SUBSET.ALL_BACKTESTS
for downloading the predictions for all backtest validation folds. Requires the model to have successfully scored all backtests. Datetime partitioned projects only.
- explanation_algorithmdatarobot.enums.EXPLANATIONS_ALGORITHM
(New in version v2.21) Optional. If set to shap, the response will include prediction explanations based on the SHAP explainer (SHapley Additive exPlanations). Defaults to null (no prediction explanations).
- max_explanationsint
(New in version v2.21) The number of top contributors that are included in prediction explanations. Max 100. Defaults to null for datasets narrower than 100 columns, defaults to 100 for datasets wider than 100 columns.
- shap_warningslist
(New in version v2.21) Will be present if explanation_algorithm was set to datarobot.enums.EXPLANATIONS_ALGORITHM.SHAP and there were additivity failures during SHAP values calculation.
- classmethod list(project_id)¶
Fetch all the computed training predictions for a project.
- Parameters
- project_idstr
id of the project
- Returns
- A list of TrainingPredictions objects
- classmethod get(project_id, prediction_id)¶
Retrieve training predictions on a specified data set.
- Parameters
- project_idstr
id of the project the model belongs to
- prediction_idstr
id of the prediction set
- Returns
TrainingPredictions object which is ready to operate with the specified predictions
- iterate_rows(batch_size=None)¶
Retrieve training prediction rows as an iterator.
- Parameters
- batch_sizeint, optional
maximum number of training prediction rows to fetch per request
- Returns
- iteratorTrainingPredictionsIterator
an iterator which yields named tuples representing training prediction rows
- get_all_as_dataframe(class_prefix='class_', serializer='json')¶
Retrieve all training prediction rows and return them as a pandas.DataFrame.
- Returned dataframe has the following structure:
row_id : row id from the original dataset
prediction : the model’s prediction for this row
class_<label> : the probability that the target is this class (only appears for classification and multiclass projects)
timestamp : the time of the prediction (only appears for out of time validation or time series projects)
forecast_point : the point in time used as a basis to generate the predictions (only appears for time series projects)
forecast_distance : how many time steps are between timestamp and forecast_point (only appears for time series projects)
series_id : the id of the series in a multiseries project or None for a single series project (only appears for time series projects)
- Parameters
- class_prefixstr, optional
The prefix to append to labels in the final dataframe. Default is class_ (e.g., apple -> class_apple).
- serializerstr, optional
Serializer to use for the download. Options: json (default) or csv.
- Returns
- dataframe: pandas.DataFrame
- download_to_csv(filename, encoding='utf-8', serializer='json')¶
Save training prediction rows into CSV file.
- Parameters
- filenamestr or file object
path or file object to save training prediction rows
- encodingstring, optional
A string representing the encoding to use in the output file, defaults to ‘utf-8’
- serializerstr, optional
Serializer to use for the download. Options: json (default) or csv.
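As a sketch, with project_id and prediction_id as placeholders for training predictions that were already computed (see the examples above):
import datarobot as dr

training_predictions = dr.TrainingPredictions.get(project_id, prediction_id)

# Pull all rows into a pandas DataFrame
df = training_predictions.get_all_as_dataframe()
print(df.head())

# Or stream the same rows straight to a file on disk
training_predictions.download_to_csv('training_predictions.csv', serializer='csv')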
Types¶
- class datarobot.models.RocCurveEstimatedMetric¶
Typed dict for estimated metric
- class datarobot.models.AnomalyAssessmentRecordMetadata¶
Typed dict for record metadata
- class datarobot.models.AnomalyAssessmentPreviewBin¶
Typed dict for preview bin
- class datarobot.models.ShapleyFeatureContribution¶
Typed dict for shapley feature contribution
- class datarobot.models.AnomalyAssessmentDataPoint¶
Typed dict for data points
- class datarobot.models.RegionExplanationsData¶
Typed dict for region explanations
Use Cases¶
- class datarobot.UseCase(id, name, created_at, created, updated_at, updated, models_count, projects_count, datasets_count, notebooks_count, applications_count, playgrounds_count, vector_databases_count, members, description=None, owners=None)¶
Representation of a Use Case.
Examples
import datarobot as dr
from datarobot import UseCase

with UseCase.get("2348ac"):
    print(f"The current use case is {dr.Context.use_case}")
- Attributes
- idstr
The ID of the Use Case.
- namestr
The name of the Use Case.
- descriptionstr
The description of the Use Case. Nullable.
- created_atstr
The timestamp generated at record creation.
- createdUseCaseUser
The user who created the Use Case.
- updated_atstr
The timestamp generated when the record was last updated.
- updatedUseCaseUser
The most recent user to update the Use Case.
- models_countint
The number of models in a Use Case.
- projects_countint
The number of projects in a Use Case.
- datasets_count: int
The number of datasets in a Use Case.
- notebooks_count: int
The number of notebooks in a Use Case.
- applications_count: int
The number of applications in a Use Case.
- playgrounds_count: int
The number of playgrounds in a Use Case.
- vector_databases_count: int
The number of vector databases in a Use Case.
- ownersList[UseCaseUser]
The users who own the Use Case.
- membersList[UseCaseUser]
The users who are members of the Use Case.
- get_uri()¶
- Returns
- urlstr
Permanent static hyperlink to this Use Case.
- Return type
str
- classmethod get(use_case_id)¶
Gets information about a Use Case.
- Parameters
- use_case_idstr
The identifier of the Use Case you want to load.
- Returns
- use_caseUseCase
The queried Use Case.
- Return type
- classmethod list(search_params=None)¶
Returns the Use Cases associated with this account.
- Parameters
- search_paramsdict, optional.
If not None, the returned Use Cases are filtered by lookup. Currently, you can query Use Cases by:
- offset - The number of records to skip over. Default 0.
- limit - The number of records to return in the range from 1 to 100. Default 100.
- search - Only return Use Cases with names that match the given string.
- project_id - Only return Use Cases associated with the given project ID.
- application_id - Only return Use Cases associated with the given app.
- orderBy - The order to sort the Use Cases.
orderBy queries can use the following options: id or -id, name or -name, description or -description, projects_count or -projects_count, datasets_count or -datasets_count, notebooks_count or -notebooks_count, applications_count or -applications_count, created_at or -created_at, created_by or -created_by, updated_at or -updated_at, updated_by or -updated_by.
- Returns
- use_caseslist of UseCase instances
Contains a list of Use Cases associated with this user account.
- Raises
- TypeError
Raised if
search_params
parameter is provided, but is not of supported type.
- Return type
List
[UseCase
]
- classmethod create(name=None, description=None)¶
Create a new Use Case.
- Parameters
- namestr
Optional. The name of the new Use Case.
- description: str
The description of the new Use Case. Optional.
- Returns
- use_caseUseCase
The created Use Case.
- Return type
- classmethod delete(use_case_id)¶
Delete a Use Case.
- Parameters
- use_case_idstr
The ID of the Use Case to be deleted.
- Return type
None
- update(name=None, description=None)¶
Update a Use Case’s name or description.
- Parameters
- namestr
The updated name of the Use Case.
- descriptionstr
The updated description of the Use Case.
- Returns
- use_caseUseCase
The updated Use Case.
- Return type
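A minimal sketch of creating and then renaming a Use Case (the names shown here are illustrative):
import datarobot as dr

# Create a new Use Case
use_case = dr.UseCase.create(name="Churn analysis", description="Models and data for churn")

# Rename it later; update returns the updated Use Case
use_case = use_case.update(name="Customer churn analysis")
print(use_case.id, use_case.name)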
- add(entity=None, entity_type=None, entity_id=None)¶
Add an entity (project, dataset, etc.) to a Use Case. Can only accept either an entity or an entity type and entity ID, but not both.
Projects and Applications can only be linked to a single Use Case. Datasets can be linked to multiple Use Cases.
There are some prerequisites for linking Projects to a Use Case which are explained in the user guide.
- Parameters
- entityUnion[UseCaseReferenceEntity, Project, Dataset, Application]
An existing entity to be linked to this Use Case. Cannot be used if entity_type and entity_id are passed.
- entity_typeUseCaseEntityType
The entity type of the entity to link to this Use Case. Cannot be used if entity is passed.
- entity_idstr
The ID of the entity to link to this Use Case. Cannot be used if entity is passed.
- Returns
- use_case_reference_entityUseCaseReferenceEntity
The newly created reference link between this Use Case and the entity.
- Return type
- remove(entity=None, entity_type=None, entity_id=None)¶
Remove an entity from a Use Case. Can only accept either an entity or an entity type and entity ID, but not both.
- Parameters
- entityUnion[UseCaseReferenceEntity, Project, Dataset, Application]
An existing entity instance to be removed from a Use Case. Cannot be used if entity_type and entity_id are passed.
- entity_typeUseCaseEntityType
The entity type of the entity to link to this Use Case. Cannot be used if entity is passed.
- entity_idstr
The ID of the entity to link to this Use Case. Cannot be used if entity is passed.
- Return type
None
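For illustration, assuming dataset_id and use_case_id are placeholders and that Dataset.get is available as documented elsewhere in this reference, linking and unlinking an entity looks like this:
import datarobot as dr

dataset = dr.Dataset.get(dataset_id)    # placeholder: an AI Catalog dataset
use_case = dr.UseCase.get(use_case_id)  # placeholder: an existing Use Case

# Link the dataset to the Use Case, then remove the link again
reference = use_case.add(entity=dataset)
print(reference.entity_type, reference.entity_id)
use_case.remove(entity=dataset)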
- share(roles)¶
Share this Use Case with or remove access from one or more user(s).
- Parameters
- rolesList[SharingRole]
A list of SharingRole instances, each of which references a user and a role to be assigned.
Currently, the only supported roles for Use Cases are OWNER, EDITOR, and CONSUMER, and the only supported SHARING_RECIPIENT_TYPE is USER.
To remove access, set a user's role to datarobot.enums.SHARING_ROLE.NO_ROLE.
Examples
The SharingRole class is needed in order to share a Use Case with one or more users.
For example, suppose you had a list of user IDs you wanted to share this Use Case with. You could use a loop to generate a list of SharingRole objects for them, and bulk share this Use Case.
>>> from datarobot.models.use_cases.use_case import UseCase
>>> from datarobot.models.sharing import SharingRole
>>> from datarobot.enums import SHARING_ROLE, SHARING_RECIPIENT_TYPE
>>>
>>> user_ids = ["60912e09fd1f04e832a575c1", "639ce542862e9b1b1bfa8f1b", "63e185e7cd3a5f8e190c6393"]
>>> sharing_roles = []
>>> for user_id in user_ids:
...     new_sharing_role = SharingRole(
...         role=SHARING_ROLE.CONSUMER,
...         share_recipient_type=SHARING_RECIPIENT_TYPE.USER,
...         id=user_id,
...     )
...     sharing_roles.append(new_sharing_role)
>>> use_case = UseCase.get(use_case_id="5f33f1fd9071ae13568237b2")
>>> use_case.share(roles=sharing_roles)
Similarly, a SharingRole instance can be used to remove a user's access if the role is set to SHARING_ROLE.NO_ROLE, like in this example:
>>> from datarobot.models.use_cases.use_case import UseCase
>>> from datarobot.models.sharing import SharingRole
>>> from datarobot.enums import SHARING_ROLE, SHARING_RECIPIENT_TYPE
>>>
>>> user_to_remove = "[email protected]"
>>> remove_sharing_role = SharingRole(
...     role=SHARING_ROLE.NO_ROLE,
...     share_recipient_type=SHARING_RECIPIENT_TYPE.USER,
...     username=user_to_remove,
... )
>>> use_case = UseCase.get(use_case_id="5f33f1fd9071ae13568237b2")
>>> use_case.share(roles=[remove_sharing_role])
- Return type
None
Retrieve access control information for this Use Case.
- Parameters
- offsetOptional[int]
The number of records to skip over. Optional. Default is 0.
- limit: Optional[int]
The number of records to return. Optional. Default is 100.
- id: Optional[str]
Return the access control information for a user with this user ID. Optional.
- Return type
List
[SharingRole
]
- list_projects()¶
List all projects associated with this Use Case.
- Returns
- projectsList[Project]
All projects associated with this Use Case.
- Return type
List
[TypeVar
(T
)]
- list_datasets()¶
List all datasets associated with this Use Case.
- Returns
- datasetsList[Dataset]
All datasets associated with this Use Case.
- Return type
List
[TypeVar
(T
)]
- list_applications()¶
List all applications associated with this Use Case.
- Returns
- applicationsList[Application]
All applications associated with this Use Case.
- Return type
List
[TypeVar
(T
)]
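A short sketch that walks the entities currently linked to a Use Case (use_case_id is a placeholder):
import datarobot as dr

use_case = dr.UseCase.get(use_case_id)

for project in use_case.list_projects():
    print("project:", project.id)
for dataset in use_case.list_datasets():
    print("dataset:", dataset.id)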
- classmethod from_data(data)¶
Instantiate an object of this class using a dict.
- Parameters
- datadict
Correctly snake_cased keys and their values.
- Return type
TypeVar
(T
, bound=APIObject
)
- classmethod from_server_data(data, keep_attrs=None)¶
Instantiate an object of this class using the data directly from the server, meaning that the keys may have the wrong camel casing
- Parameters
- datadict
The directly translated dict of JSON from the server. No casing fixes have taken place
- keep_attrsiterable
List, set or tuple of the dotted namespace notations for attributes to keep within the object structure even if their values are None
- Return type
TypeVar
(T
, bound=APIObject
)
- open_in_browser()¶
Opens class’ relevant web browser location. If default browser is not available the URL is logged.
Note: If text-mode browsers are used, the calling process will block until the user exits the browser.
- Return type
None
- class datarobot.models.use_cases.use_case.UseCaseUser(id, full_name=None, email=None, userhash=None, username=None)¶
Representation of a Use Case user.
- Attributes
- idstr
The id of the user.
- full_namestr
The full name of the user. Optional.
- emailstr
The email address of the user. Optional.
- userhashstr
User’s gravatar hash. Optional.
- usernamestr
The username of the user. Optional.
- class datarobot.models.use_cases.use_case.UseCaseReferenceEntity(id, entity_type, entity_id, use_case_id, created_at, created, is_deleted)¶
An entity associated with a Use Case.
- Attributes
- entity_typeUseCaseEntityType
The type of the entity.
- use_case_idstr
The Use Case this entity is associated with.
- idstr
The ID of the entity.
- created_atstr
The date and time this entity was linked with the Use Case.
- is_deletedbool
Whether or not the linked entity has been deleted.
- createdUseCaseUser
The user who created the link between this entity and the Use Case.
User Blueprints¶
- class datarobot.UserBlueprint(blender, blueprint_id, diagram, features, features_text, icons, insights, model_type, supported_target_types, user_blueprint_id, user_id, is_time_series=False, reference_model=False, shap_support=False, supports_gpu=False, blueprint=None, custom_task_version_metadata=None, hex_column_name_lookup=None, project_id=None, vertex_context=None, blueprint_context=None, **kwargs)¶
A representation of a blueprint which may be modified by the user, saved to a user’s AI Catalog, trained on projects, and shared with others.
It is recommended to install the python library called datarobot_bp_workshop, available via pip, for the best experience when building blueprints.
Please refer to http://blueprint-workshop.datarobot.com for tutorials, examples, and other documentation.
- Parameters
- blender: bool
Whether the blueprint is a blender.
- blueprint_id: string
The deterministic id of the blueprint, based on its content.
- custom_task_version_metadata: list[list[string]], Optional
An association of custom entity ids and task ids.
- diagram: string
The diagram used by the UI to display the blueprint.
- features: list[string]
A list of the names of tasks used in the blueprint.
- features_text: string
A description of the blueprint via the names of tasks used.
- hex_column_name_lookup: list[UserBlueprintsHexColumnNameLookupEntry], Optional
A lookup between hex values and data column names used in the blueprint.
- icons: list[int]
The icon(s) associated with the blueprint.
- insights: string
An indication of the insights generated by the blueprint.
- is_time_series: bool (Default=False)
Whether the blueprint contains time-series tasks.
- model_type: string
The generated or provided title of the blueprint.
- project_id: string, Optional
The id of the project the blueprint was originally created with, if applicable.
- reference_model: bool (Default=False)
Whether the blueprint is a reference model.
- shap_support: bool (Default=False)
Whether the blueprint supports shapley additive explanations.
- supported_target_types: list[enum('binary', 'multiclass', 'multilabel', 'nonnegative', 'regression', 'unsupervised', 'unsupervisedclustering')]
The list of supported targets of the current blueprint.
- supports_gpu: bool (Default=False)
Whether the blueprint supports execution on the GPU.
- user_blueprint_id: string
The unique id associated with the user blueprint.
- user_id: string
The id of the user who owns the blueprint.
- blueprint: list[dict] or list[UserBlueprintTask], Optional
The representation of a directed acyclic graph defining a pipeline of data through tasks and a final estimator.
- vertex_context: list[VertexContextItem], Optional
Info about, warnings about, and errors with a specific vertex in the blueprint.
- blueprint_context: VertexContextItemMessages
Warnings and errors which may describe or summarize warnings or errors in the blueprint’s vertices
- classmethod list(limit=100, offset=0, project_id=None)¶
Fetch a list of the user blueprints the current user created
- Parameters
- limit: int (Default=100)
The max number of results to return.
- offset: int (Default=0)
The number of results to skip (for pagination).
- project_id: string, Optional
The id of the project, used to filter for original project_id.
- Returns
- list[UserBlueprint]
- Raises
- datarobot.errors.ClientError
if the server responded with 4xx status
- datarobot.errors.ServerError
if the server responded with 5xx status
- Return type
List
[UserBlueprint
]
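A small sketch of listing the current user's blueprints and fetching one of them in full:
import datarobot as dr

# List up to 10 user blueprints owned by the current user
user_blueprints = dr.UserBlueprint.list(limit=10)
for ub in user_blueprints:
    print(ub.user_blueprint_id, ub.model_type)

# Fetch a single blueprint by its id
if user_blueprints:
    blueprint = dr.UserBlueprint.get(user_blueprints[0].user_blueprint_id)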
- classmethod get(user_blueprint_id, project_id=None)¶
Retrieve a user blueprint
- Parameters
- user_blueprint_id: string
Used to identify a specific user-owned blueprint.
- project_id: string (optional, default is None)
String representation of ObjectId for a given project. Used to validate selected columns in the user blueprint.
- Returns
- UserBlueprint
- Raises
- datarobot.errors.ClientError
if the server responded with 4xx status
- datarobot.errors.ServerError
if the server responded with 5xx status
- Return type
- classmethod create(blueprint, model_type=None, project_id=None, save_to_catalog=True)¶
Create a user blueprint
- Parameters
- blueprint: list[dict] or list[UserBlueprintTask]
A list of tasks in the form of dictionaries which define a blueprint.
- model_type: string, Optional
The title to give to the blueprint.
- project_id: string, Optional
The project associated with the blueprint. Necessary in the event of project specific tasks, such as column selection tasks.
- save_to_catalog: bool, (Default=True)
Whether the blueprint being created should be saved to the catalog.
- Returns
- UserBlueprint
- Raises
- datarobot.errors.ClientError
if the server responded with 4xx status
- datarobot.errors.ServerError
if the server responded with 5xx status
- Return type
- classmethod create_from_custom_task_version_id(custom_task_version_id, save_to_catalog=True, description=None)¶
Create a user blueprint with a single custom task version
- Parameters
- custom_task_version_id: string
Id of custom task version from which the user blueprint is created
- save_to_catalog: bool, (Default=True)
Whether the blueprint being created should be saved to the catalog
- description: string (Default=None)
The description for the user blueprint that will be created from the custom task version.
- Returns
- UserBlueprint
- Raises
- datarobot.errors.ClientError
if the server responded with 4xx status
- datarobot.errors.ServerError
if the server responded with 5xx status
- Return type
- classmethod clone_project_blueprint(blueprint_id, project_id, model_type=None, save_to_catalog=True)¶
Clone a blueprint from a project.
- Parameters
- blueprint_id: string
The id associated with the blueprint to create the user blueprint from.
- model_type: string, Optional
The title to give to the blueprint.
- project_id: string
The id of the project which the blueprint to copy comes from.
- save_to_catalog: bool, (Default=True)
Whether the blueprint being created should be saved to the catalog.
- Returns
- UserBlueprint
- Raises
- datarobot.errors.ClientError
if the server responded with 4xx status
- datarobot.errors.ServerError
if the server responded with 5xx status
- Return type
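As an illustrative sketch (blueprint_id and project_id are placeholders), cloning a blueprint from a project's repository and adding the resulting user blueprint back to that repository could look like this:
import datarobot as dr

# Clone a repository blueprint into a user blueprint
user_blueprint = dr.UserBlueprint.clone_project_blueprint(
    blueprint_id=blueprint_id,
    project_id=project_id,
    model_type="My cloned blueprint",  # illustrative title
)

# Make the cloned blueprint available in the project's repository
result = dr.UserBlueprint.add_to_project(
    project_id=project_id,
    user_blueprint_ids=[user_blueprint.user_blueprint_id],
)
print(result.added_to_menu)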
- classmethod clone_user_blueprint(user_blueprint_id, model_type=None, project_id=None, save_to_catalog=True)¶
Clone a user blueprint.
- Parameters
- model_type: string, Optional
The title to give to the blueprint.
- project_id: string, Optional
String representation of ObjectId for a given project. Used to validate selected columns in the user blueprint.
- user_blueprint_id: string
The id of the existing user blueprint to copy.
- save_to_catalog: bool, (Default=True)
Whether the blueprint being created should be saved to the catalog.
- Returns
- UserBlueprint
- Raises
- datarobot.errors.ClientError
if the server responded with 4xx status
- datarobot.errors.ServerError
if the server responded with 5xx status
- Return type
- classmethod update(blueprint, user_blueprint_id, model_type=None, project_id=None, include_project_id_if_none=False)¶
Update a user blueprint
- Parameters
- blueprint: list(dict) or list(UserBlueprintTask)
A list of tasks in the form of dictionaries which define a blueprint. If None, will not be passed.
- model_type: string, Optional
The title to give to the blueprint. If None, will not be passed.
- project_id: string, Optional
The project associated with the blueprint. Necessary in the event of project specific tasks, such as column selection tasks. If None, will not be passed. To explicitly pass None, pass True to include_project_id_if_none (useful if unlinking a blueprint from a project)
- user_blueprint_id: string
Used to identify a specific user-owned blueprint.
- include_project_id_if_none: bool (Default=False)
Allows project_id to be passed as None, instead of ignored. If set to False, will not pass project_id in the API request if it is set to None. If True, the project id will be passed even if it is set to None.
- Returns
- UserBlueprint
- Raises
- datarobot.errors.ClientError
if the server responded with 4xx status
- datarobot.errors.ServerError
if the server responded with 5xx status
- Return type
- classmethod delete(user_blueprint_id)¶
Delete a user blueprint, specified by the userBlueprintId.
- Parameters
- user_blueprint_id: string
Used to identify a specific user-owned blueprint.
- Returns
requests.models.Response
- Raises
- datarobot.errors.ClientError
if the server responded with 4xx status
- datarobot.errors.ServerError
if the server responded with 5xx status
- Return type
Response
- classmethod get_input_types()¶
Retrieve the input types which can be used with User Blueprints.
- Returns
- UserBlueprintAvailableInput
- Raises
- datarobot.errors.ClientError
if the server responded with 4xx status
- datarobot.errors.ServerError
if the server responded with 5xx status
- Return type
- classmethod add_to_project(project_id, user_blueprint_ids)¶
Add a list of user blueprints, by id, to a specified (by id) project’s repository.
- Parameters
- project_id: string
The projectId of the project for the repository to add the specified user blueprints to.
- user_blueprint_ids: list(string) or string
The ids of the user blueprints to add to the specified project’s repository.
- Returns
- UserBlueprintAddToProjectMenu
- Raises
- datarobot.errors.ClientError
if the server responded with 4xx status
- datarobot.errors.ServerError
if the server responded with 5xx status
- Return type
- classmethod get_available_tasks(project_id=None, user_blueprint_id=None)¶
Retrieve the available tasks, organized into categories, which can be used to create or modify User Blueprints.
- Parameters
- project_id: string, Optional
- user_blueprint_id: string, Optional
- Returns
- UserBlueprintAvailableTasks
- Raises
- datarobot.errors.ClientError
if the server responded with 4xx status
- datarobot.errors.ServerError
if the server responded with 5xx status
- Return type
- classmethod validate_task_parameters(output_method, task_code, task_parameters, project_id=None)¶
Validate that each value assigned to specified task parameters are valid.
- Parameters
- output_method: enum(‘P’, ‘Pm’, ‘S’, ‘Sm’, ‘T’, ‘TS’)
The method representing how the task will output data.
- task_code: string
The task code representing the task to validate parameter values.
- task_parameters: list(UserBlueprintTaskParameterValidationRequestParamItem)
A list of task parameters and proposed values to be validated.
- project_id: string (optional, default is None)
The projectId representing the project where this user blueprint is edited.
- Returns
- UserBlueprintValidateTaskParameters
- Raises
- datarobot.errors.ClientError
if the server responded with 4xx status
- datarobot.errors.ServerError
if the server responded with 5xx status
- Return type
Get a list of users, groups, and organizations that have access to this user blueprint.
- Parameters
- id: str, Optional
Only return the access control information for a organization, group or user with this ID.
- limit: int (Default=100)
At most this many results are returned.
- name: string, Optional
Only return the access control information for a organization, group or user with this name.
- offset: int (Default=0)
This many results will be skipped.
- share_recipient_type: enum(‘user’, ‘group’, ‘organization’), Optional
Describes the recipient type, either user, group, or organization.
- user_blueprint_id: str
Used to identify a specific user-owned blueprint.
- Returns
- list[UserBlueprintSharedRolesResponseValidator]
- Raises
- datarobot.errors.ClientError
if the server responded with 4xx status
- datarobot.errors.ServerError
if the server responded with 5xx status
- Return type
- classmethod validate_blueprint(blueprint, project_id=None)¶
Validate a user blueprint and return information about the inputs expected and outputs provided by each task.
- Parameters
- blueprint: list(dict) or list(UserBlueprintTask)
The representation of a directed acyclic graph defining a pipeline of data through tasks and a final estimator.
- project_id: string (optional, default is None)
The projectId representing the project where this user blueprint is edited.
- Returns
- list[VertexContextItem]
- Raises
- datarobot.errors.ClientError
if the server responded with 4xx status
- datarobot.errors.ServerError
if the server responded with 5xx status
- Return type
List
[VertexContextItem
]
Share a user blueprint with a user, group, or organization
- Parameters
- user_blueprint_id: str
Used to identify a specific user-owned blueprint.
- roles: list(or(GrantAccessControlWithUsernameValidator, GrantAccessControlWithIdValidator))
Array of GrantAccessControl objects, up to a maximum of 100 objects.
- Returns
requests.models.Response
- Raises
- datarobot.errors.ClientError
if the server responded with 4xx status
- datarobot.errors.ServerError
if the server responded with 5xx status
- Return type
Response
- classmethod search_catalog(search=None, tag=None, limit=100, offset=0, owner_user_id=None, owner_username=None, order_by='-created')¶
Fetch a list of the user blueprint catalog entries the current user has access to based on an optional search term, tags, owner user info, or sort order.
- Parameters
- search: string, Optional.
A value to search for in the dataset’s name, description, tags, column names, categories, and latest error. The search is case insensitive. If no value is provided for this parameter, or if the empty string is used, or if the string contains only whitespace, no filtering will be done. Partial matching is performed on dataset name and description fields while all other fields will only match if the search matches the whole value exactly.
- tag: string, Optional.
If provided, the results will be filtered to include only items with the specified tag.
- limit: int, Optional. (default: 100)
At most this many results are returned. To specify no limit, use 0. The default may change and a maximum limit may be imposed without notice.
- offset: int, Optional. (default: 0)
This many results will be skipped.
- owner_user_id: string, Optional.
Filter results to those owned by one or more owner identified by UID.
- owner_username: string, Optional.
Filter results to those owned by one or more owner identified by username.
- order_by: string, Optional. Defaults to ‘-created’.
Sort order which will be applied to catalog list, valid options are “catalogName”, “originalName”, “description”, “created”, and “relevance”. For all options other than relevance, you may prefix the attribute name with a dash to sort in descending order. e.g. orderBy=’-catalogName’.
- Return type
- class datarobot.models.user_blueprints.models.UserBlueprintAvailableInput(input_types, **kwargs)¶
Retrieve the input types which can be used with User Blueprints.
- Parameters
- input_types: list(UserBlueprintsInputType)
A list of associated pairs of an input types and their human-readable names.
- classmethod get_input_types()¶
Retrieve the input types which can be used with User Blueprints.
- Returns
- UserBlueprintAvailableInput
- Raises
- datarobot.errors.ClientError
if the server responded with 4xx status
- datarobot.errors.ServerError
if the server responded with 5xx status
- Return type
- class datarobot.models.user_blueprints.models.UserBlueprintAddToProjectMenu(added_to_menu, not_added_to_menu=None, message=None, **kwargs)¶
Add a list of user blueprints, by id, to a specified (by id) project’s repository.
- Parameters
- added_to_menu: list(UserBlueprintAddedToMenuItem)
The list of userBlueprintId and blueprintId pairs representing blueprints successfully added to the project repository.
- not_added_to_menu: list(UserBlueprintNotAddedToMenuItem)
The list of userBlueprintId and error message representing blueprints which failed to be added to the project repository.
- message: string
A success message or a list of reasons why the list of blueprints could not be added to the project repository.
- classmethod add_to_project(project_id, user_blueprint_ids)¶
Add a list of user blueprints, by id, to a specified (by id) project’s repository.
- Parameters
- project_id: string
The projectId of the project for the repository to add the specified user blueprints to.
- user_blueprint_ids: list(string)
The ids of the user blueprints to add to the specified project’s repository.
- Returns
- UserBlueprintAddToProjectMenu
- Raises
- datarobot.errors.ClientError
if the server responded with 4xx status
- datarobot.errors.ServerError
if the server responded with 5xx status
- Return type
- class datarobot.models.user_blueprints.models.UserBlueprintAvailableTasks(categories, tasks, **kwargs)¶
Retrieve the available tasks, organized into categories, which can be used to create or modify User Blueprints.
- Parameters
- categories: list(UserBlueprintTaskCategoryItem)
A list of the available task categories, sub-categories, and tasks.
- tasks: list(UserBlueprintTaskLookupEntry)
A list of task codes and their task definitions.
- classmethod get_available_tasks(project_id=None, user_blueprint_id=None)¶
Retrieve the available tasks, organized into categories, which can be used to create or modify User Blueprints.
- Parameters
- project_id: string, Optional
- user_blueprint_id: string, Optional
- Returns
- UserBlueprintAvailableTasks
- Raises
- datarobot.errors.ClientError
if the server responded with 4xx status
- datarobot.errors.ServerError
if the server responded with 5xx status
- Return type
- class datarobot.models.user_blueprints.models.UserBlueprintValidateTaskParameters(errors, **kwargs)¶
Validate that each value assigned to specified task parameters are valid.
- Parameters
- errors: list(UserBlueprintsValidateTaskParameter)
A list of the task parameters, their proposed values, and messages describing why each is not valid.
- classmethod validate_task_parameters(output_method, task_code, task_parameters, project_id=None)¶
Validate that each value assigned to specified task parameters are valid.
- Parameters
- output_method: enum(‘P’, ‘Pm’, ‘S’, ‘Sm’, ‘T’, ‘TS’)
The method representing how the task will output data.
- task_code: string
The task code representing the task to validate parameter values.
- task_parameters: list(UserBlueprintTaskParameterValidationRequestParamItem)
A list of task parameters and proposed values to be validated.
- project_id: string (optional, default is None)
The projectId representing the project where this user blueprint is edited.
- Returns
- UserBlueprintValidateTaskParameters
- Raises
- datarobot.errors.ClientError
if the server responded with 4xx status
- datarobot.errors.ServerError
if the server responded with 5xx status
- Return type
A list of SharedRoles objects.
- Parameters
- share_recipient_type: enum(‘user’, ‘group’, ‘organization’)
Describes the recipient type, either user, group, or organization.
- role: str, one of enum(‘CONSUMER’, ‘EDITOR’, ‘OWNER’)
The role of the org/group/user on this dataset or “NO_ROLE” for removing access when used with route to modify access.
- id: str
The ID of the recipient organization, group or user.
- name: string
The name of the recipient organization, group or user.
- class datarobot.models.user_blueprints.models.VertexContextItem(information, messages, task_id, **kwargs)¶
Info about, warnings about, and errors with a specific vertex in the blueprint.
- Parameters
- task_id: string
The id associated with a specific vertex in the blueprint.
- information: VertexContextItemInfo
- messages: VertexContextItemMessages
- class datarobot.models.user_blueprints.models.UserBlueprintCatalogSearch(id, catalog_name, info_creator_full_name, user_blueprint_id, description=None, last_modifier_full_name=None, **kwargs)¶
An APIObject representing a user blueprint catalog entry the current user has access to based on an optional search term and/or tags.
- Parameters
- id: str
The ID of the catalog entry linked to the user blueprint.
- catalog_name: str
The name of the user blueprint.
- creator: str
The name of the user that created the user blueprint.
- user_blueprint_id: str
The ID of the user blueprint.
- description: str, Optional (Default=None)
The description of the user blueprint.
- last_modifier_name: str, Optional (Default=None)
The name of the user that last modified the user blueprint.
- classmethod search_catalog(search=None, tag=None, limit=100, offset=0, owner_user_id=None, owner_username=None, order_by='-created')¶
Fetch a list of the user blueprint catalog entries the current user has access to based on an optional search term, tags, owner user info, or sort order.
- Parameters
- search: string, Optional.
A value to search for in the dataset’s name, description, tags, column names, categories, and latest error. The search is case insensitive. If no value is provided for this parameter, or if the empty string is used, or if the string contains only whitespace, no filtering will be done. Partial matching is performed on dataset name and description fields while all other fields will only match if the search matches the whole value exactly.
- tag: string, Optional.
If provided, the results will be filtered to include only items with the specified tag.
- limit: int, Optional. (default: 100)
At most this many results are returned. To specify no limit, use 0. The default may change and a maximum limit may be imposed without notice.
- offset: int, Optional. (default: 0)
This many results will be skipped.
- owner_user_id: string, Optional.
Filter results to those owned by one or more owner identified by UID.
- owner_username: string, Optional.
Filter results to those owned by one or more owner identified by username.
- order_by: string, Optional. Defaults to ‘-created’.
Sort order which will be applied to catalog list, valid options are “catalogName”, “originalName”, “description”, “created”, and “relevance”. For all options other than relevance, you may prefix the attribute name with a dash to sort in descending order. e.g. orderBy=’-catalogName’.
- Return type
VisualAI¶
- class datarobot.models.visualai.Image(image_id, project_id, height=0, width=0)¶
An image stored in a project’s dataset.
- Attributes
- idstr
Image ID for this image.
- image_typestr
Image media type. Accessing this may require a server request and an associated delay in returning.
- image_bytesbytes
Raw bytes of this image. Accessing this may require a server request and an associated delay in returning.
- heightint
Height of the image in pixels.
- widthint
Width of the image in pixels.
- class datarobot.models.visualai.SampleImage(project_id, image_id, height, width, target_value=None)¶
A sample image in a project’s dataset.
If Project.stage is datarobot.enums.PROJECT_STAGE.EDA2 then the target_* attributes of this class will have values, otherwise the values will all be None.
- Attributes
- imageImage
Image object.
- target_valueTargetValue
Value associated with the feature_name.
- project_idstr
Id of the project that contains the images.
- classmethod list(project_id, feature_name, target_value=None, target_bin_start=None, target_bin_end=None, offset=None, limit=None)¶
Get sample images from a project.
- Parameters
- project_idstr
Project that contains the images.
- feature_namestr
Name of feature column that contains images.
- target_valueTargetValue
For classification projects - target value to filter images. Please note that you can only use this parameter when the project has finished the EDA2 stage.
- target_bin_startOptional[Union[int, float]]
For regression projects - only images corresponding to the target values above (inclusive) this value will be returned. Must be specified together with target_bin_end. Please note that you can only use this parameter when the project has finished the EDA2 stage.
- target_bin_endOptional[Union[int, float]]
For regression projects - only images corresponding to the target values below (exclusive) this value will be returned. Must be specified together with target_bin_start. Please note that you can only use this parameter when the project has finished the EDA2 stage.
- offsetOptional[int]
Number of images to be skipped.
- limitOptional[int]
Number of images to be returned.
- Return type
List
[SampleImage
]
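A minimal sketch of listing sample images; project_id and the "image_column" feature name are placeholders, and the client is assumed to be configured.
from datarobot.models.visualai import SampleImage

# List up to ten sample images for an image feature of the project.
samples = SampleImage.list(project_id, "image_column", limit=10)
for sample in samples:
    print(sample.image.id, sample.target_value)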
- class datarobot.models.visualai.DuplicateImage(image_id, row_count, project_id)¶
An image that was duplicated in the project dataset.
- Attributes
- imageImage
Image object.
- countint
Number of times the image was duplicated.
- classmethod list(project_id, feature_name, offset=None, limit=None)¶
Get all duplicate images in a project.
- Parameters
- project_idstr
Project that contains the images.
- feature_namestr
Name of feature column that contains images.
- offsetOptional[int]
Number of images to be skipped.
- limitOptional[int]
Number of images to be returned.
- Return type
List
[DuplicateImage
]
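A short sketch of listing duplicate images, assuming the same placeholder project_id and feature name as above:
from datarobot.models.visualai import DuplicateImage

duplicates = DuplicateImage.list(project_id, "image_column")
for duplicate in duplicates:
    print(duplicate.image.id, duplicate.count)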
- class datarobot.models.visualai.ImageEmbedding(feature_name, position_x, position_y, image_id, project_id, model_id, actual_target_value=None, target_values=None, target_bins=None)¶
Vector representation of an image in an embedding space.
A vector in an embedding space allows linear computations to be carried out between images, for example computing the Euclidean distance between two images.
- Attributes
- imageImage
Image object used to create this map.
- feature_namestr
Name of the feature column this embedding is associated with.
- position_xint
X coordinate of the image in the embedding space.
- position_yint
Y coordinate of the image in the embedding space.
- actual_target_valueobject
Actual target value of the dataset row.
- target_valuesOptional[List[str]]
For classification projects, a list of target values of this project.
- target_binsOptional[List[Dict[str, float]]]
For regression projects, a list of target bins of this project.
- project_idstr
Id of the project this Image Embedding belongs to.
- model_idstr
Id of the model this Image Embedding belongs to.
- classmethod compute(project_id, model_id)¶
Start the computation of image embeddings for the model.
- Parameters
- project_idstr
Project to start creation in.
- model_idstr
Project’s model to start creation in.
- Returns
- str
URL to check for image embeddings progress.
- Raises
- datarobot.errors.ClientError
Server rejected creation due to client error. The most likely cause is a bad project_id or model_id.
- Return type
str
- classmethod models(project_id)¶
For a given project_id, list all model_id - feature_name pairs with available Image Embeddings.
- Parameters
- project_idstr
Id of the project to list model_id - feature_name pairs with available Image Embeddings for.
- Returns
- list( tuple(model_id, feature_name) )
List of model and feature name pairs.
- Return type
List
[Tuple
[str
,str
]]
- classmethod list(project_id, model_id, feature_name)¶
Return a list of ImageEmbedding objects.
- Parameters
- project_id: str
Id of the project the model belongs to.
- model_id: str
Id of the model to list Image Embeddings for.
- feature_name: str
Name of feature column to list Image Embeddings for.
- Return type
List
[ImageEmbedding
]
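A hedged end-to-end sketch tying compute, models, and list together; project_id and model_id are placeholders, and in practice you may need to wait for the computation started by compute to finish before the listing call returns results.
from datarobot.models.visualai import ImageEmbedding

# Start the embedding computation for one model.
ImageEmbedding.compute(project_id, model_id)

# Later, discover which model/feature pairs have embeddings and list them.
for computed_model_id, feature_name in ImageEmbedding.models(project_id):
    embeddings = ImageEmbedding.list(project_id, computed_model_id, feature_name)
    for embedding in embeddings:
        print(embedding.position_x, embedding.position_y, embedding.actual_target_value)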
- class datarobot.models.visualai.ImageActivationMap(feature_name, activation_values, image_width, image_height, image_id, overlay_image_id, project_id, model_id, actual_target_value=None, predicted_target_value=None, target_values=None, target_bins=None)¶
Mark areas of image with weight of impact on training.
This is a technique to display how various areas of the region were used in training, and their effect on predictions. Larger values in activation_values indicate a larger impact.
- Attributes
- imageImage
Image object used to create this map.
- overlay_imageImage
Image object containing the original image overlaid by the activation heatmap.
- feature_namestr
Name of the feature column that contains the value this map is based on.
- activation_valuesList[List[int]]
A row-column matrix that contains the activation strengths for image regions. Values are integers in the range [0, 255].
- actual_target_valueTargetValue
Actual target value of the dataset row.
- predicted_target_valueTargetValue
Predicted target value of the dataset row that contains this image.
- target_valuesOptional[List[str]]
For classification projects a list of target values of this project.
- target_binsOptional[List[Dict[str, float]]]
For regression projects a list of target bins.
- project_idstr
Id of the project this Activation Map belongs to.
- model_idstr
Id of the model this Activation Map belongs to.
- classmethod compute(project_id, model_id)¶
Start the computation of activation maps for the given model.
- Parameters
- project_idstr
Project to start creation in.
- model_idstr
Project’s model to start creation in.
- Returns
- str
URL to check for activation maps progress.
- Raises
- datarobot.errors.ClientError
Server rejected creation due to client error. The most likely cause is a bad project_id or model_id.
- Return type
str
- classmethod models(project_id)¶
For a given project_id, list all model_id - feature_name pairs with available Image Activation Maps.
- Parameters
- project_idstr
Id of the project to list model_id - feature_name pairs with available Image Activation Maps for.
- Returns
- list( tuple(model_id, feature_name) )
List of model and feature name pairs.
- Return type
List
[Tuple
[str
,str
]]
- classmethod list(project_id, model_id, feature_name, offset=None, limit=None)¶
Return a list of ImageActivationMap objects.
- Parameters
- project_idstr
Project that contains the images.
- model_idstr
Model that contains the images.
- feature_namestr
Name of feature column that contains images.
- offsetOptional[int]
Number of images to be skipped.
- limitOptional[int]
Number of images to be returned.
- Return type
List
[ImageActivationMap
]
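A similar sketch for activation maps, with the same placeholder IDs and the same caveat that computation started by compute may need time to finish:
from datarobot.models.visualai import ImageActivationMap

ImageActivationMap.compute(project_id, model_id)

# Once the computation has finished, retrieve a few maps for an image feature.
maps = ImageActivationMap.list(project_id, model_id, "image_column", limit=5)
for activation_map in maps:
    print(activation_map.predicted_target_value, len(activation_map.activation_values))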
- class datarobot.models.visualai.ImageAugmentationOptions(id, name, project_id, min_transformation_probability, current_transformation_probability, max_transformation_probability, min_number_of_new_images, current_number_of_new_images, max_number_of_new_images, transformations=None)¶
A List of all supported Image Augmentation Transformations for a project. Includes additional information about minimum, maximum, and default values for a transformation.
- Attributes
- name: string
The name of the augmentation list
- project_id: string
The project containing the image data to be augmented
- min_transformation_probability: float
The minimum allowed value for transformation probability.
- current_transformation_probability: float
Default setting for probability that each transformation will be applied to an image.
- max_transformation_probability: float
The maximum allowed value for transformation probability.
- min_number_of_new_images: int
The minimum allowed number of new rows to add for each existing row
- current_number_of_new_images: int
The default number of new rows to add for each existing row
- max_number_of_new_images: int
The maximum allowed number of new rows to add for each existing row
- transformations: list[dict]
List of transformations to possibly apply to each image
- classmethod get(project_id)¶
Returns a list of all supported transformations for the given project
- Parameters
project_id (str) – The ID of the project for which to return the list of supported transformations.
- Return type
- Returns
- ImageAugmentationOptions
A list containing all the supported transformations for the project.
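A minimal sketch of inspecting the augmentation options for a project (project_id is a placeholder):
from datarobot.models.visualai import ImageAugmentationOptions

options = ImageAugmentationOptions.get(project_id)
print(options.min_transformation_probability, options.max_transformation_probability)
print(options.transformations)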
- class datarobot.models.visualai.ImageAugmentationList(id, name, project_id, feature_name=None, in_use=False, initial_list=False, transformation_probability=0.0, number_of_new_images=1, transformations=None, samples_id=None)¶
A List of Image Augmentation Transformations
- Attributes
- name: string
The name of the augmentation list
- project_id: string
The project containing the image data to be augmented
- feature_name: string (optional)
name of the feature that the augmentation list is associated with
- in_use: boolean
Whether this is the list that will be passed in to every blueprint during blueprint generation before Autopilot
- initial_list: boolean
True if this is the list to be used during training to produce augmentations
- transformation_probability: float
Probability that each transformation will be applied to an image. Value should be between 0.01 - 1.0.
- number_of_new_images: int
Number of new rows to add for each existing row
- transformations: array
List of transformations to possibly apply to each image
- samples_id: str
Id of last image augmentation sample generated for image augmentation list.
- classmethod create(name, project_id, feature_name=None, in_use=None, initial_list=False, transformation_probability=0.0, number_of_new_images=1, transformations=None, samples_id=None)¶
Create a new image augmentation list.
- Return type
- classmethod list(project_id, feature_name=None)¶
List Image Augmentation Lists present in a project.
- Parameters
- project_idstr
Project Id to retrieve augmentation lists for.
- feature_nameOptional[str]
If passed, the response will only include Image Augmentation Lists active for the provided feature name.
- Returns
- list[ImageAugmentationList]
- Return type
List
[ImageAugmentationList
]
- update(name=None, feature_name=None, initial_list=None, transformation_probability=None, number_of_new_images=None, transformations=None)¶
Update one or multiple attributes of the Image Augmentation List in the DataRobot backend as well as on this object.
- Parameters
- nameOptional[str]
New name of the augmentation list.
- feature_nameOptional[str]
The new feature name for which the Image Augmentation List is effective.
- initial_listOptional[bool]
New flag that indicates whether this list will be used during Autopilot to perform image augmentation.
- transformation_probabilityOptional[float]
New probability that each enabled transformation will be applied to an image. This does not apply to Horizontal or Vertical Flip, which are always set to 50%.
- number_of_new_imagesOptional[int]
New number of new rows to add for each existing row, updating the existing augmentation list.
- transformationsOptional[list]
New list of Transformations to possibly apply to each image.
- Returns
- ImageAugmentationList
Reference to self. The passed values will be updated in place.
- Return type
- retrieve_samples()¶
Lists already computed image augmentation samples for the image augmentation list. Returns samples only if they have already been computed; it does not initialize computation.
- Returns
- List of class ImageAugmentationSample
- Return type
List
[ImageAugmentationSample
]
- compute_samples(max_wait=600)¶
Initializes computation and retrieves the list of image augmentation samples for the image augmentation list. If samples existed prior to this method call, fresh samples are computed and the latest version of the samples is returned.
- Returns
- List of class ImageAugmentationSample
- Return type
List
[ImageAugmentationSample
]
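A hedged sketch of creating an augmentation list and previewing its samples; the name, feature name, and probability below are illustrative, and the transformations are left at their defaults.
from datarobot.models.visualai import ImageAugmentationList

augmentation_list = ImageAugmentationList.create(
    name="my-augmentations",
    project_id=project_id,
    feature_name="image_column",
    transformation_probability=0.5,
    number_of_new_images=2,
)

# Compute fresh preview samples (blocks for up to max_wait seconds).
samples = augmentation_list.compute_samples(max_wait=600)
for sample in samples:
    print(sample.image_id, sample.original_image_id)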
- class datarobot.models.visualai.ImageAugmentationSample(image_id, project_id, height, width, original_image_id=None, sample_id=None)¶
A preview of the type of images that augmentations will create during training.
- Attributes
- sample_idObjectId
The id of the augmentation sample, used to group related images together
- image_idObjectId
A reference to the Image which can be used to retrieve the image binary
- project_idObjectId
A reference to the project containing the image
- original_image_idObjectId
A reference to the original image that generated this image in the case of an augmented image. If this is None it signifies this is an original image
- heightint
Image height in pixels
- widthint
Image width in pixels
- classmethod list(auglist_id=None)¶
Return a list of ImageAugmentationSample objects.
- Parameters
- auglist_id: str
ID for augmentation list to retrieve samples for
- Returns
- List of class ImageAugmentationSample
- Return type
List
[ImageAugmentationSample
]
Word Cloud¶
- class datarobot.models.word_cloud.WordCloud(ngrams)¶
Word cloud data for the model.
Notes
WordCloudNgram is a dict containing the following:
- ngram (str): Word or ngram value.
- coefficient (float): Value from the [-1.0, 1.0] range; describes the effect of this ngram on the target. A large negative value means a strong effect toward the negative class in classification and a smaller target value in regression models; a large positive value means the opposite: toward the positive class and a bigger value, respectively.
- count (int): Number of rows in the training sample where this ngram appears.
- frequency (float): Value from the (0.0, 1.0] range; the relative frequency of the given ngram compared to the most frequent ngram.
- is_stopword (bool): True for ngrams that DataRobot evaluates as stopwords.
- class (str or None): For classification, the value of the target class for the corresponding word or ngram. For regression, None.
- Attributes
- ngramslist of dicts
List of dicts with schema described as
WordCloudNgram
above.
- most_frequent(top_n=5)¶
Return most frequent ngrams in the word cloud.
- Parameters
- top_nint
Number of ngrams to return
- Returns
- list of dict
Up to top_n of the most frequent ngrams in the word cloud. If top_n is bigger than the total number of ngrams in the word cloud, all are returned, sorted by frequency in descending order.
- Return type
List
[WordCloudNgram
]
- most_important(top_n=5)¶
Return most important ngrams in the word cloud.
- Parameters
- top_nint
Number of ngrams to return
- Returns
- list of dict
Up to top_n of the most important ngrams in the word cloud. If top_n is bigger than the total number of ngrams in the word cloud, all are returned, sorted by absolute coefficient value in descending order.
- Return type
List
[WordCloudNgram
]
- ngrams_per_class()¶
Split ngrams per target class values. Useful for multiclass models.
- Returns
- dict
Dictionary in the format of (class label) -> (list of ngrams for that class)
- Return type
Dict
[Optional
[str
],List
[WordCloudNgram
]]
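For illustration, a small sketch of working with a retrieved word cloud; wc is assumed to be a WordCloud instance already fetched for a model.
# `wc` is assumed to be an existing WordCloud instance.
top_frequent = wc.most_frequent(top_n=10)
top_important = wc.most_important(top_n=10)
per_class = wc.ngrams_per_class()

for ngram in top_important:
    print(ngram["ngram"], ngram["coefficient"], ngram["is_stopword"])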
- class datarobot.models.word_cloud.WordCloudNgram()¶
A dict holding the data for a single ngram, with the keys described in the WordCloudNgram schema in the WordCloud Notes above.
Data Slices¶
- class datarobot.models.data_slice.DataSlice(id=None, name=None, filters=None, project_id=None)¶
Definition of a data slice
- Attributes
- idstr
ID of the data slice.
- namestr
Name of the data slice definition.
- filterslist[DataSliceFiltersType]
- List of filters (dict) with params:
- operandstr
Name of the feature to use in the filter.
- operatorstr
Operator to use in the filter: ‘eq’, ‘in’, ‘<’, or ‘>’.
- valuesUnion[str, int, float]
Values to use from the feature.
- project_idstr
ID of the project that the model is part of.
- classmethod list(project, offset=0, limit=100)¶
List the data slices in the same project
- Parameters
- projectUnion[str, Project]
ID of the project or Project object from which to list data slices.
- offsetint, optional
Number of items to skip.
- limitint, optional
Number of items to return.
- Returns
- data_sliceslist[DataSlice]
Examples
>>> import datarobot as dr
>>> ...  # set up your Client
>>> data_slices = dr.DataSlice.list("646d0ea0cd8eb2355a68b0e5")
>>> data_slices
[DataSlice(...), DataSlice(...), ...]
- Return type
List
[DataSlice
]
- classmethod create(name, filters, project)¶
Creates a data slice in the project with the given name and filters
- Parameters
- namestr
Name of the data slice definition.
- filterslist[DataSliceFiltersType]
- List of filters (dict) with params:
- operandstr
Name of the feature to use in filter.
- operatorstr
Operator to use: ‘eq’, ‘in’, ‘<’, or ‘>’.
- valuesUnion[str, int, float]
Values to use from the feature.
- projectUnion[str, Project]
Project ID or Project object from which to list data slices.
- Returns
- data_sliceDataSlice
The data slice object created
Examples
>>> import datarobot as dr
>>> ...  # set up your Client and retrieve a project
>>> data_slice = dr.DataSlice.create(
>>> ...     name='yes',
>>> ...     filters=[{'operand': 'binary_target', 'operator': 'eq', 'values': ['Yes']}],
>>> ...     project=project,
>>> ... )
>>> data_slice
DataSlice(
    filters=[{'operand': 'binary_target', 'operator': 'eq', 'values': ['Yes']}],
    id=646d1296bd0c543d88923c9d,
    name=yes,
    project_id=646d0ea0cd8eb2355a68b0e5
)
- Return type
- delete()¶
Deletes the data slice from storage
Examples
>>> import datarobot as dr
>>> data_slice = dr.DataSlice.get('5a8ac9ab07a57a0001be501f')
>>> data_slice.delete()
>>> import datarobot as dr
>>> ...  # get project or project_id
>>> data_slices = dr.DataSlice.list(project)  # project object or project_id
>>> data_slice = data_slices[0]  # choose a data slice from the list
>>> data_slice.delete()
- Return type
None
- request_size(source, model=None)¶
Submits a request to validate the data slice’s filters and calculate the data slice’s number of rows on a given source
- Parameters
- sourceINSIGHTS_SOURCES
Subset of data (partition or “source”) on which to apply the data slice for estimating available rows.
- modelOptional[Union[str, Model]]
Model object or ID of the model. It is only required when source is “training”.
- Returns
- status_check_jobStatusCheckJob
Object contains all needed logic for a periodical status check of an async job.
Examples
>>> import datarobot as dr
>>> ...  # get project or project_id
>>> data_slices = dr.DataSlice.list(project)  # project object or project_id
>>> data_slice = data_slices[0]  # choose a data slice from the list
>>> status_check_job = data_slice.request_size("validation")
Model is required when source is ‘training’
>>> import datarobot as dr
>>> ...  # get project or project_id
>>> data_slices = dr.DataSlice.list(project)  # project object or project_id
>>> data_slice = data_slices[0]  # choose a data slice from the list
>>> status_check_job = data_slice.request_size("training", model)
- Return type
- get_size_info(source, model=None)¶
Get information about the data slice applied to a source
- Parameters
- sourceINSIGHTS_SOURCES
Source (partition or subset) to which the data slice was applied
- modelOptional[Union[str, Model]]
ID for the model whose training data was sliced with this data slice. Required when the source is “training”, and not used for other sources.
- Returns
- slice_size_infoDataSliceSizeInfo
Information of the data slice applied to a source
Examples
>>> import datarobot as dr
>>> ...  # set up your Client
>>> data_slices = dr.DataSlice.list("646d0ea0cd8eb2355a68b0e5")
>>> data_slice = data_slices[0]  # can be any slice in the list
>>> data_slice_size_info = data_slice.get_size_info("validation")
>>> data_slice_size_info
DataSliceSizeInfo(
    data_slice_id=6493a1776ea78e6644382535,
    messages=[
        {
            'level': 'WARNING',
            'description': 'Low Observation Count',
            'additional_info': 'Insufficient number of observations to compute some insights.'
        }
    ],
    model_id=None,
    project_id=646d0ea0cd8eb2355a68b0e5,
    slice_size=1,
    source=validation,
)
>>> data_slice_size_info.to_dict()
{
    'data_slice_id': '6493a1776ea78e6644382535',
    'messages': [
        {
            'level': 'WARNING',
            'description': 'Low Observation Count',
            'additional_info': 'Insufficient number of observations to compute some insights.'
        }
    ],
    'model_id': None,
    'project_id': '646d0ea0cd8eb2355a68b0e5',
    'slice_size': 1,
    'source': 'validation',
}
>>> import datarobot as dr
>>> ...  # set up your Client
>>> data_slice = dr.DataSlice.get("6493a1776ea78e6644382535")
>>> data_slice_size_info = data_slice.get_size_info("validation")
When using source=’training’, the model param is required.
>>> import datarobot as dr
>>> ...  # set up your Client
>>> model = dr.Model.get(project_id, model_id)
>>> data_slice = dr.DataSlice.get("6493a1776ea78e6644382535")
>>> data_slice_size_info = data_slice.get_size_info("training", model)
>>> import datarobot as dr
>>> ...  # set up your Client
>>> data_slice = dr.DataSlice.get("6493a1776ea78e6644382535")
>>> data_slice_size_info = data_slice.get_size_info("training", model_id)
- Return type
- classmethod get(data_slice_id)¶
Retrieve a specific data slice.
- Parameters
- data_slice_idstr
The identifier of the data slice to retrieve.
- Returns
- data_slice: DataSlice
The required data slice.
Examples
>>> import datarobot as dr
>>> dr.DataSlice.get('648b232b9da812a6aaa0b7a9')
DataSlice(filters=[{'operand': 'binary_target', 'operator': 'eq', 'values': ['Yes']}],
          id=648b232b9da812a6aaa0b7a9, name=test, project_id=644bc575572480b565ca42cd
)
- Return type
- class datarobot.models.data_slice.DataSliceSizeInfo(data_slice_id=None, project_id=None, source=None, slice_size=None, messages=None, model_id=None)¶
Definition of a data slice applied to a source
- Attributes
- data_slice_idstr
ID of the data slice
- project_idstr
ID of the project
- sourcestr
Data source used to calculate the number of rows (slice size) after applying the data slice’s filters
- model_idstr, optional
ID of the model, required when source (subset) is ‘training’
- slice_sizeint
Number of rows in the data slice for a given source
- messageslist[DataSliceSizeMessageType]
List of user-relevant messages related to a data slice
Batch Job¶
- class datarobot.models.batch_job.IntakeSettings(*args, **kwargs)¶
Intake settings typed dict
- class datarobot.models.batch_job.OutputSettings(*args, **kwargs)¶
Output settings typed dict
Key-Values¶
- class datarobot.models.key_values.KeyValue(id, created_at, entity_id, entity_type, name, value, numeric_value, boolean_value, value_type, description, creator_id, creator_name, category, artifact_size, original_file_name, is_editable, is_dataset_missing, error_message)¶
A DataRobot Key-Value.
New in version v3.4.
- Attributes
- id: str
ID of the Key-Value
- created_at: str
creation time of the Key-Value
- entity_id: str
ID of the related Entity
- entity_type: KeyValueEntityType
type of the related Entity
- name: str
Key-Value name
- value: str
Key-Value value
- numeric_value: float
Key-Value numeric value
- boolean_value: bool
Key-Value boolean value
- value_type: KeyValueType
Key-Value type
- description: str
Key-Value description
- creator_id: str
ID of the user who created the Key-Value
- creator_name: str
name of the user who created the Key-Value
- category: KeyValueCategory
Key-Value category
- artifact_size: int
size in bytes of associated image, if applicable
- original_file_name: str
name of uploaded original image or dataset file
- is_editable: bool
true if a user with permissions can edit or delete
- is_dataset_missing: bool
true if the key-value type is “dataset” and its dataset is not visible to the user
- error_message: str
additional information if “isDataSetMissing” is true. Blank if there are no errors
- classmethod get(key_value_id)¶
Get Key-Value by id.
New in version v3.4.
- Parameters
- key_value_id: str
ID of the Key-Value
- Returns
- KeyValue
retrieved Key-Value
- Raises
- datarobot.errors.ClientError
if the server responded with 4xx status.
- datarobot.errors.ServerError
if the server responded with 5xx status.
- Return type
- classmethod list(entity_id, entity_type)¶
List Key-Values.
New in version v3.4.
- Parameters
- entity_id: str
ID of the related Entity
- entity_type: KeyValueEntityType
type of the related Entity
- Returns
- List[KeyValue]
a list of Key-Values
- Raises
- datarobot.errors.ClientError
if the server responded with 4xx status
- datarobot.errors.ServerError
if the server responded with 5xx status
- Return type
List
[KeyValue
]
- classmethod find(entity_id, entity_type, name)¶
Find Key-Value by name.
New in version v3.4.
- Parameters
- entity_id: str
ID of the related Entity
- entity_type: KeyValueEntityType
type of the related Entity
- name: str
name of the Key-Value
- Returns
- List[KeyValue]
a list of Key-Values
- Raises
- datarobot.errors.ClientError
if the server responded with 4xx status
- datarobot.errors.ServerError
if the server responded with 5xx status
- Return type
Optional
[KeyValue
]
- classmethod create(entity_id, entity_type, name, category, value_type, value=None, description=None)¶
Create a Key-Value.
New in version v3.4.
- Parameters
- entity_id: str
ID of the associated resource
- entity_type: KeyValueEntityType
type of the associated resource
- name: str
name of the Key-Value. Cannot contain: { } ; |
- category: KeyValueCategory
category of the Key-Value
- value_type: KeyValueType
type of the Key-Value value
- value: Optional[Union[str, float, bool]]
value of Key-Value
- description: Optional[str]
description of the Key-Value
- Returns
- KeyValue
created Key-Value
- Raises
- datarobot.errors.ClientError
if the server responded with 4xx status.
- datarobot.errors.ServerError
if the server responded with 5xx status.
- Return type
- update(entity_id=None, entity_type=None, name=None, category=None, value_type=None, value=None, description=None, comment=None)¶
Update Key-Value.
New in version v3.4.
- Parameters
- entity_id: Optional[str]
ID of the associated resource
- entity_type: Optional[KeyValueEntityType]
type of the associated resource
- name: Optional[str]
name of the Key-Value. Cannot contain: { } ; |
- category: Optional[KeyValueCategory]
category of the Key-Value
- value_type: Optional[KeyValueType]
type of the Key-Value value
- value: Optional[Union[str, float, bool]]
value of Key-Value
- description: Optional[str]
description of the Key-Value
- comment: Optional[str]
user comment explaining the change
- Raises
- datarobot.errors.ClientError
if the server responded with 4xx status.
- datarobot.errors.ServerError
if the server responded with 5xx status.
- Return type
None
- refresh()¶
Update Key-Value with the latest data from server.
New in version v3.4.
- Raises
- datarobot.errors.ClientError
if the server responded with 4xx status
- datarobot.errors.ServerError
if the server responded with 5xx status
- Return type
None
- delete()¶
Delete Key-Value.
New in version v3.4.
- Raises
- datarobot.errors.ClientError
If the server responded with 4xx status.
- datarobot.errors.ServerError
If the server responded with 5xx status.
- Return type
None
- get_value()¶
Get a value of Key-Value.
New in version v3.4.
- Returns
- Union[str, float, boolean]
value depending on the value type
- Return type
Union
[str
,float
,bool
]
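A hedged sketch of the Key-Value lifecycle. The entity ID is a placeholder, and the specific enum members used below are illustrative only; check KeyValueEntityType, KeyValueCategory, and KeyValueType for the values available in your installation.
from datarobot.models.key_values import KeyValue
from datarobot.enums import KeyValueCategory, KeyValueEntityType, KeyValueType

# Enum members below are assumed for illustration.
key_value = KeyValue.create(
    entity_id=entity_id,
    entity_type=KeyValueEntityType.MODEL_PACKAGE,
    name="accuracy",
    category=KeyValueCategory.METRIC,
    value_type=KeyValueType.NUMERIC,
    value=0.97,
)
print(key_value.get_value())

key_value.update(value=0.98, comment="refreshed metric")
key_value.refresh()
key_value.delete()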
- class datarobot.enums.KeyValueCategory(value)¶
Key-Value category
- class datarobot.enums.KeyValueEntityType(value)¶
Key-Value entity type
- class datarobot.enums.KeyValueType(value)¶
Key-Value type
Document text extraction¶
- class datarobot.models.documentai.document.FeaturesWithSamples(model_id, feature_name, document_task)¶
- document_task¶
Alias for field number 2
- feature_name¶
Alias for field number 1
- model_id¶
Alias for field number 0
- class datarobot.models.documentai.document.DocumentPageFile(document_page_id, project_id=None, height=0, width=0, download_link=None)¶
Page of a document as an image file.
- Attributes
- project_idstr
The identifier of the project which the document page belongs to.
- document_page_idstr
The unique identifier for the document page.
- heightint
The height of the document thumbnail in pixels.
- widthint
The width of the document thumbnail in pixels.
- thumbnail_bytesbytes
Document thumbnail as bytes.
- mime_typestr
Mime image type of the document thumbnail.
- property thumbnail_bytes: bytes¶
Document thumbnail as bytes.
- Returns
- bytes
Document thumbnail.
- Return type
bytes
- property mime_type: str¶
Mime image type of the document thumbnail. Example: ‘image/png’
- Returns
- str
Mime image type of the document thumbnail.
- Return type
str
- class datarobot.models.documentai.document.DocumentThumbnail(project_id, document_page_id, height=0, width=0, target_value=None)¶
Thumbnail of document from the project’s dataset.
If Project.stage is datarobot.enums.PROJECT_STAGE.EDA2 and it is a supervised project, then the target_* attributes of this class will have values; otherwise, the values will all be None.
- Attributes
- document: Document
The document object.
- project_idstr
The identifier of the project which the document thumbnail belongs to.
- target_value: str
The target value used for filtering thumbnails.
- classmethod list(project_id, feature_name, target_value=None, offset=None, limit=None)¶
Get document thumbnails from a project.
- Parameters
- project_idstr
The identifier of the project which the document thumbnail belongs to.
- feature_namestr
The name of feature that specifies the document type.
- target_valueOptional[str], default None
The target value to filter thumbnails.
- offsetOptional[int], default None
The number of documents to be skipped.
- limitOptional[int], default None
The number of document thumbnails to return.
- Returns
- documentsList[DocumentThumbnail]
A list of DocumentThumbnail objects, each representing a single document.
Notes
Actual document thumbnails are not fetched from the server by this method. Instead, the data gets loaded lazily when DocumentPageFile object attributes are accessed.
Examples
Fetch document thumbnails for the given project_id and feature_name.
from datarobot._experimental.models.documentai.document import DocumentThumbnail

# Fetch five documents from the EDA SAMPLE for the specified project and specific feature
document_thumbs = DocumentThumbnail.list(project_id, feature_name, limit=5)

# Fetch five documents for the specified project with target value filtering
# This option is only available after selecting the project target and starting modeling
target1_thumbs = DocumentThumbnail.list(project_id, feature_name, target_value='target1', limit=5)
Preview the document thumbnail.
from datarobot._experimental.models.documentai.document import DocumentThumbnail
from datarobot.helpers.image_utils import get_image_from_bytes

# Fetch 3 documents
document_thumbs = DocumentThumbnail.list(project_id, feature_name, limit=3)

for doc_thumb in document_thumbs:
    thumbnail = get_image_from_bytes(doc_thumb.document.thumbnail_bytes)
    thumbnail.show()
- Return type
List
[DocumentThumbnail
]
- class datarobot.models.documentai.document.DocumentTextExtractionSample¶
Stateless class for computing and retrieving Document Text Extraction Samples.
Notes
Actual document text extraction samples are not fetched from the server in the moment of a function call. Detailed information on the documents, the pages and the rendered images of them are fetched when accessed on demand (lazy loading).
Examples
1) Compute text extraction samples for a specific model, and fetch all existing document text extraction samples for a specific project.
from datarobot._experimental.models.documentai.document import DocumentTextExtractionSample

SPECIFIC_MODEL_ID1 = "model_id1"
SPECIFIC_MODEL_ID2 = "model_id2"
SPECIFIC_PROJECT_ID = "project_id"

# Order computation of document text extraction sample for specific model.
# By default `compute` method will await for computation to end before returning
DocumentTextExtractionSample.compute(SPECIFIC_MODEL_ID1, await_completion=False)
DocumentTextExtractionSample.compute(SPECIFIC_MODEL_ID2)

samples = DocumentTextExtractionSample.list_features_with_samples(SPECIFIC_PROJECT_ID)
2) Fetch document text extraction samples for a specific model_id and feature_name, and display all document sample pages.
from datarobot._experimental.models.documentai.document import DocumentTextExtractionSample
from datarobot.helpers.image_utils import get_image_from_bytes

SPECIFIC_MODEL_ID = "model_id"
SPECIFIC_FEATURE_NAME = "feature_name"

samples = DocumentTextExtractionSample.list_pages(
    model_id=SPECIFIC_MODEL_ID, feature_name=SPECIFIC_FEATURE_NAME
)

for sample in samples:
    thumbnail = sample.document_page.thumbnail
    image = get_image_from_bytes(thumbnail.thumbnail_bytes)
    image.show()
3) Fetch document text extraction samples for specific model_id and feature_name and display text extraction details for the first page. This example displays the image of the document with bounding boxes of detected text lines. It also returns a list of all text lines extracted from page along with their coordinates.
from datarobot._experimental.models.documentai.document import DocumentTextExtractionSample

SPECIFIC_MODEL_ID = "model_id"
SPECIFIC_FEATURE_NAME = "feature_name"

samples = DocumentTextExtractionSample.list_pages(SPECIFIC_MODEL_ID, SPECIFIC_FEATURE_NAME)

# Draw bounding boxes for first document page sample and display related text data.
image = samples[0].get_document_page_with_text_locations()
image.show()

# For each text block represented as bounding box object drawn on original image
# display its coordinates (top, left, bottom, right) and extracted text value
for text_line in samples[0].text_lines:
    print(text_line)
- classmethod compute(model_id, await_completion=True, max_wait=600)¶
Starts computation of document text extraction samples for the model and, if successful, returns computed text samples for it. This method allows calculation to continue for a specified time and, if not complete, cancels the request.
- Parameters
- model_id: str
The identifier of the project’s model for which to start computing document text extraction samples.
- await_completion: bool
Determines whether the method should wait for completion before exiting or not.
- max_wait: int (default=600)
The maximum number of seconds to wait for the request to finish before raising an AsyncTimeoutError.
- Raises
- ClientError
Server rejected creation due to client error. Most often this is caused by a bad model_id.
- AsyncFailureError
Raised if any of the responses from the server are unexpected.
- AsyncProcessUnsuccessfulError
Raised if the sample computation failed or was cancelled.
- AsyncTimeoutError
Raised if the sample computation did not resolve within the specified time limit (max_wait).
- Return type
None
- classmethod list_features_with_samples(project_id)¶
Returns a list of features, model_id pairs with computed document text extraction samples.
- Parameters
- project_id: str
The project ID to retrieve the list of computed samples for.
- Returns
- List[FeaturesWithSamples]
- Return type
List
[FeaturesWithSamples
]
- classmethod list_pages(model_id, feature_name, document_index=None, document_task=None)¶
Returns a list of document text extraction sample pages.
- Parameters
- model_id: str
The model identifier.
- feature_name: str
The specific feature name to retrieve.
- document_index: Optional[int]
The specific document index to retrieve. Defaults to None.
- document_task: Optional[str]
The document blueprint task.
- Returns
- List[DocumentTextExtractionSamplePage]
- Return type
- classmethod list_documents(model_id, feature_name)¶
Returns a list of documents used for text extraction.
- Parameters
- model_id: str
The model identifier.
- feature_name: str
The feature name.
- Returns
- List[DocumentTextExtractionSampleDocument]
- Return type
- class datarobot.models.documentai.document.DocumentTextExtractionSampleDocument(document_index, feature_name, thumbnail_id, thumbnail_width, thumbnail_height, thumbnail_link, document_task, actual_target_value=None, prediction=None)¶
Document text extraction source.
Holds data that contains feature and model prediction values, as well as the thumbnail of the document.
- Attributes
- document_index: int
The index of the document page sample.
- feature_name: str
The name of the feature that the document text extraction sample is related to.
- thumbnail_id: str
The document page ID.
- thumbnail_width: int
The thumbnail image width.
- thumbnail_height: int
The thumbnail image height.
- thumbnail_link: str
The thumbnail image download link.
- document_task: str
The document blueprint task that the document belongs to.
- actual_target_value: Optional[Union[str, int, List[str]]]
The actual target value.
- prediction: Optional[PredictionType]
Prediction values and labels.
- classmethod list(model_id, feature_name, document_task=None)¶
List available documents with document text extraction samples.
- Parameters
- model_id: str
The identifier for the model.
- feature_name: str
The name of the feature.
- document_task: Optional[str]
The document blueprint task.
- Returns
- List[DocumentTextExtractionSampleDocument]
- Return type
- class datarobot.models.documentai.document.DocumentTextExtractionSamplePage(page_index, document_index, feature_name, document_page_id, document_page_width, document_page_height, document_page_link, text_lines, document_task, actual_target_value=None, prediction=None)¶
Document text extraction sample covering one document page.
Holds data about the document page, the recognized text, and the location of the text in the document page.
- Attributes
- page_index: int
Index of the page inside the document
- document_index: int
Index of the document inside the dataset
- feature_name: str
The name of the feature that the document text extraction sample belongs to.
- document_page_id: str
The document page ID.
- document_page_width: int
Document page width.
- document_page_height: int
Document page height.
- document_page_link: str
Document page link to download the document page image.
- text_lines: List[Dict[str, Union[int, str]]]
A list of text lines and their coordinates.
- document_task: str
The document blueprint task that the page belongs to.
- actual_target_value: Optional[Union[str, int, List[str]]]
Actual target value.
- prediction: Optional[PredictionType]
Prediction values and labels.
- classmethod list(model_id, feature_name, document_index=None, document_task=None)¶
Returns a list of document text extraction sample pages.
- Parameters
- model_id: str
The model identifier, used to retrieve document text extraction page samples.
- feature_name: str
The feature name, used to retrieve document text extraction page samples.
- document_index: Optional[int]
The specific document index to retrieve. Defaults to None.
- document_task: Optional[str]
Document blueprint task.
- Returns
- List[DocumentTextExtractionSamplePage]
- Return type
- get_document_page_with_text_locations(line_color='blue', line_width=3, padding=3)¶
Returns the document page with bounding boxes drawn around the text lines as a PIL.Image.
- Parameters
- line_color: str
The color used to draw a bounding box on the image page. Defaults to blue.
- line_width: int
The line width of the bounding boxes that will be drawn. Defaults to 3.
- padding: int
The additional space left between the text and the bounding box, measured in pixels. Defaults to 3.
- Returns
- Image
Returns a PIL.Image with drawn text-bounding boxes.
- Return type
Image
Binary Data Helpers¶
- datarobot.helpers.binary_data_utils.get_encoded_image_contents_from_urls(urls, custom_headers=None, image_options=None, continue_on_error=False, n_threads=None)¶
Returns base64-encoded strings of the images located at the addresses passed in the input collection. The input collection should hold valid image URL addresses reachable from the location where the code is being executed. The method retrieves each image and applies the specified reformatting before converting the contents to a base64 string. Results are returned in the same order as specified in the input collection.
- Parameters
- urls: Iterable
Iterable with url addresses to download images from
- custom_headers: dict
Dictionary containing custom headers to use when downloading files using a URL. Detailed data related to supported Headers in HTTP can be found in the RFC specification for headers: https://www.w3.org/Protocols/rfc2616/rfc2616-sec14.html When used, specified passed values will overwrite default header values.
- image_options: ImageOptions class
Class holding parameters for use in image transformation and formatting.
- continue_on_error: bool
Specifies whether an error encountered while retrieving the content of one row (for example, a file that does not exist) terminates the process of downloading the remaining files, or whether the process continues, skipping that file.
- n_threads: int or None
Number of threads to use for processing. If “None” is passed, the number of threads is determined automatically based on the number of available CPU cores. If this is not possible, 4 threads are used.
- Returns
- List of base64 encoded strings representing reformatted images.
- Raises
- ContentRetrievalTerminatedError:
The error is raised when the flag continue_on_error is set to False and processing has been terminated due to an exception while loading the contents of the file.
- Return type
List
[Optional
[str
]]
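A minimal sketch, assuming the URLs below point at reachable images; the Authorization header is just an example of a custom header.
from datarobot.helpers.binary_data_utils import get_encoded_image_contents_from_urls

urls = [
    "https://example.com/images/cat.jpg",
    "https://example.com/images/dog.jpg",
]
encoded_images = get_encoded_image_contents_from_urls(
    urls,
    custom_headers={"Authorization": "Bearer <token>"},  # optional, example header
    continue_on_error=True,  # skip files that fail instead of terminating
)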
- datarobot.helpers.binary_data_utils.get_encoded_image_contents_from_paths(paths, image_options=None, continue_on_error=False, n_threads=None)¶
Returns base64-encoded strings of the images located at the paths passed in the input collection. The input collection should hold valid image paths reachable from the location where the code is being executed. The method retrieves each image and applies the specified reformatting before converting the contents to a base64 string. Results are returned in the same order as specified in the input collection.
- Parameters
- paths: Iterable
Iterable with path locations to open images from
- image_options: ImageOptions class
Class holding parameters for image transformation and formatting
- continue_on_error: bool
Specifies whether an error encountered while retrieving the content of one row (for example, a file that does not exist) terminates the process of downloading the remaining files, or whether the process continues, skipping that file.
- n_threads: int or None
Number of threads to use for processing. If “None” is passed, the number of threads is determined automatically based on the number of available CPU cores. If this is not possible, 4 threads are used.
- Returns
- List of base64 encoded strings representing reformatted images.
- Raises
- ContentRetrievalTerminatedError:
The error is raised when the flag continue_on_error is set to False and processing has been terminated due to an exception while loading the contents of the file.
- Return type
List
[Optional
[str
]]
- datarobot.helpers.binary_data_utils.get_encoded_file_contents_from_paths(paths, continue_on_error=False, n_threads=None)¶
Returns base64-encoded strings for the files located under the paths passed in the input collection. The input collection should hold valid file paths reachable from the location where the code is being executed. The method retrieves each file and converts its contents to a base64 string. Results are returned in the same order as specified in the input collection.
- Parameters
- paths: Iterable
Iterable with path locations to open images from
- continue_on_error: bool
Specifies whether an error encountered while retrieving the content of one row (for example, a file that does not exist) terminates the process of downloading the remaining files, or whether the process continues, skipping that file.
- n_threads: int or None
Number of threads to use for processing. If “None” is passed, the number of threads is determined automatically based on the number of available CPU cores. If this is not possible, 4 threads are used.
- Returns
- List of base64 encoded strings representing files.
- Raises
- ContentRetrievalTerminatedError:
The error is raised when the flag continue_on_error is set to False and processing has been terminated due to an exception while loading the contents of the file.
- Return type
List
[Optional
[str
]]
- datarobot.helpers.binary_data_utils.get_encoded_file_contents_from_urls(urls, custom_headers=None, continue_on_error=False, n_threads=None)¶
Returns base64-encoded strings for the files located at the URL addresses passed on input. The input collection should hold valid file URL addresses reachable from the location where the code is being executed. The method retrieves each file and converts its contents to a base64 string. Results are returned in the same order as specified in the input collection.
- Parameters
- urls: Iterable
Iterable containing URL addresses to download images from.
- custom_headers: dict
Dictionary with headers to use when downloading files using a URL. Detailed data related to supported Headers in HTTP can be found in the RFC specification: https://www.w3.org/Protocols/rfc2616/rfc2616-sec14.html. When specified, passed values will overwrite default header values.
- continue_on_error: bool
If a row encounters an error while retrieving content (i.e., file does not exist), specifies whether the error results in terminating the process of downloading consecutive files or the process continues. Skipped files will be marked as missing.
- n_threads: int or None
Number of threads to use for processing. If “None” is passed, the number of threads is determined automatically based on the number of available CPU cores. If this is not possible, 4 threads are used.
- Returns
- List of base64 encoded strings representing files.
- Raises
- ContentRetrievalTerminatedError:
The error is raised when the flag continue_on_error is set to False and processing has been terminated due to an exception while loading the contents of the file.
- Return type
List
[Optional
[str
]]
- class datarobot.helpers.image_utils.ImageOptions(should_resize=True, force_size=True, image_size=(224, 224), image_format=None, image_quality=75, image_subsampling=None, resample_method=1, keep_quality=True)¶
Image options class. Class holds image options related to image resizing and image reformatting.
- should_resize: bool
Whether input image should be resized to new dimensions.
- force_size: bool
Whether the image size should fully match the new requested size. If the original and new image sizes have different aspect ratios, specifying True will force a resize to exactly match the requested size. This may break the aspect ratio of the original image. If False, the resize method modifies the image to contain a thumbnail version of itself, no larger than the given size, that preserves the image’s aspect ratio.
- image_size: Tuple[int, int]
New image size (width, height). Both values (width, height) should be specified and contain a positive value. Depending on the value of force_size, the image will be resized exactly to the given image size or will be resized into a thumbnail version of itself, no larger than the given size.
- image_format: ImageFormat | str
The image format used to save the resulting image after transformations, for example ImageFormat.JPEG or ImageFormat.PNG. Supported values are in line with the values supported by DataRobot. If no format is specified (by passing None), the original image format is preserved.
- image_quality: int or None
The image quality used when saving image. When None is specified, a value will not be passed and Pillow library will use its default.
- resample_method: ImageResampleMethod
What resampling method should be used when resizing image.
- keep_quality: bool
Whether the image quality is kept (when possible). If True, for JPEG images quality will be preserved. For other types, the value specified in image_quality will be used.
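A hedged sketch combining ImageOptions with one of the helpers above; the file paths are placeholders.
from datarobot.helpers.binary_data_utils import get_encoded_image_contents_from_paths
from datarobot.helpers.image_utils import ImageOptions

# Resize every image to exactly 224x224 before encoding.
options = ImageOptions(should_resize=True, force_size=True, image_size=(224, 224))
encoded_images = get_encoded_image_contents_from_paths(
    ["images/img_0001.png", "images/img_0002.png"],
    image_options=options,
    continue_on_error=True,
)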
Generative AI¶
- class datarobot.models.genai.chat.Chat(id, name, llm_blueprint_id, is_frozen, creation_date, creation_user_id)¶
Metadata for a DataRobot GenAI chat.
- Attributes
- idstr
The chat ID.
- namestr
The chat name.
- llm_blueprint_idstr
The ID of the LLM blueprint associated with the chat.
- is_frozenbool
Checks whether the chat is frozen. Prompts cannot be submitted to frozen chats.
- creation_datestr
The date when the chat was created.
- creation_user_idstr
The ID of the creating user.
- classmethod create(name, llm_blueprint)¶
Creates a new chat.
- Parameters
- namestr
The chat name.
- llm_blueprintLLMBlueprint or str
The LLM blueprint associated with the created chat, either LLM blueprint or ID.
- Returns
- chatChat
The created chat.
- Return type
- classmethod get(chat)¶
Retrieve a single chat.
- Parameters
- chatChat or str
The chat you want to retrieve. Accepts chat or chat ID.
- Returns
- chatChat
The requested chat.
- Return type
- classmethod list(llm_blueprint=None, sort=None)¶
List all chats available to the user. If the LLM blueprint is specified, results are restricted to only those chats associated with the LLM blueprint.
- Parameters
- llm_blueprintOptional[Union[LLMBlueprint, str]], optional
Returns only those chats associated with a particular LLM blueprint, specified by either the entity or the ID.
- sortstr, optional
The property to sort chats by. Prefix the attribute name with a dash ( - ) to sort responses in descending order, (for example, ‘-name’). Supported options are listed in ListChatsSortQueryParams, but the values can differ depending on platform version. The default sort parameter is None, which results in chats returning in order of creation time, descending.
- Returns
- chatslist[Chat]
Returns a list of chats.
- Return type
List
[Chat
]
- delete()¶
Delete the single chat.
- Return type
None
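For illustration, a short sketch of creating and listing chats; llm_blueprint_id is a placeholder for an existing LLM blueprint ID.
from datarobot.models.genai.chat import Chat

chat = Chat.create(name="exploration chat", llm_blueprint=llm_blueprint_id)
chats = Chat.list(llm_blueprint=llm_blueprint_id, sort="-name")
for existing_chat in chats:
    print(existing_chat.name, existing_chat.is_frozen)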
- class datarobot.models.genai.chat_prompt.ChatPrompt(id, text, llm_blueprint_id, llm_id, creation_date, creation_user_id, citations, execution_status, llm_settings=None, vector_database_id=None, vector_database_settings=None, result_metadata=None, result_text=None, confidence_scores=None, chat_id=None, chat_context_id=None, chat_prompt_ids_included_in_history=None)¶
Metadata for a DataRobot GenAI chat prompt.
- Attributes
- idstr
Chat prompt ID.
- textstr
The prompt text.
- llm_blueprint_idstr
ID of the LLM blueprint associated with the chat prompt.
- llm_idstr
ID of the LLM type. This must be one of the IDs returned by LLMDefinition.list for this user.
- llm_settingsdict or None
The LLM settings for the LLM blueprint. The specific keys allowed and the constraints on the values are defined in the response from LLMDefinition.list, but this typically has dict fields. Either:
- system_prompt - The system prompt that influences the LLM responses.
- max_completion_length - The maximum number of tokens in the completion.
- temperature - Controls the variability in the LLM response.
- top_p - Sets whether the model considers next tokens with top_p probability mass.
or:
- system_prompt - The system prompt that influences the LLM responses.
- validation_id - The ID of the external model LLM validation.
- external_llm_context_size - The external LLM’s context size, in tokens, for external model-based LLM blueprints.
- creation_datestr
The date the chat prompt was created.
- creation_user_idstr
ID of the creating user.
- vector_database_idstr or None
ID of the vector database associated with the LLM blueprint, if any.
- vector_database_settingsVectorDatabaseSettings or None
The settings for the vector database associated with the LLM blueprint, if any.
- result_metadataResultMetadata or None
Metadata for the result of the chat prompt submission.
- result_text: str or None
The result text from the chat prompt submission.
- confidence_scores: ConfidenceScores or None
The confidence scores if there is a vector database associated with the chat prompt.
- citations: list[Citation]
List of citations from text retrieved from the vector database, if any.
- execution_status: str
The execution status of the chat prompt.
- chat_id: Optional[str]
ID of the chat associated with the chat prompt.
- chat_context_id: Optional[str]
The ID of the chat context for the chat prompt.
- chat_prompt_ids_included_in_history: Optional[list[str]]
The IDs of the chat prompts included in the chat history for this chat prompt.
- classmethod create(text, llm_blueprint=None, chat=None, llm=None, llm_settings=None, vector_database=None, vector_database_settings=None, wait_for_completion=False)¶
Create a new ChatPrompt. This submits the prompt text to the LLM. Either llm_blueprint or chat is required.
- Parameters
- textstr
The prompt text.
- llm_blueprintLLMBlueprint or str or None, optional
The LLM blueprint associated with the created chat prompt, either LLMBlueprint or LLM blueprint ID.
- chatChat or str or None, optional
The chat associated with the created chat prompt, either Chat or chat ID.
- llmLLMDefinition, str, or None, optional
LLM to use for the chat prompt. Accepts LLMDefinition or LLM ID.
- llm_settings: dict or None
LLM settings to use for the chat prompt. The specific keys allowed and the constraints on the values are defined in the response from LLMDefinition.list, but this typically has dict fields. Either:
- system_prompt - The system prompt that tells the LLM how to behave.
- max_completion_length - The maximum number of tokens in the completion.
- temperature - Controls the variability in the LLM response.
- top_p - Sets whether the model considers next tokens with top_p probability mass.
or:
- system_prompt - The system prompt that tells the LLM how to behave.
- validation_id - The ID of the custom model LLM validation for custom model LLM blueprints.
- vector_database: VectorDatabase, str, or None, optional
The vector database to use with this chat prompt submission. Accepts VectorDatabase or vector database ID.
- vector_database_settings: VectorDatabaseSettings or None, optional
Settings for the vector database, if any.
- wait_for_completionbool
If set to True, the code will wait for the chat prompt job to complete before returning the result (up to 10 minutes, after which a timeout error is raised). Otherwise, you can check the current status by using ChatPrompt.get with the returned ID.
- Returns
- chat_promptChatPrompt
The created chat prompt.
- Return type
- classmethod get(chat_prompt)¶
Retrieve a single chat prompt.
- Parameters
- chat_promptChatPrompt or str
The chat prompt you want to retrieve, either ChatPrompt or chat prompt ID.
- Returns
- chat_promptChatPrompt
The requested chat prompt.
- Return type
- classmethod list(llm_blueprint=None, playground=None, chat=None)¶
List all chat prompts available to the user. If the llm_blueprint, playground, or chat is specified then the results are restricted to the chat prompts associated with that entity.
- Parameters
- llm_blueprintOptional[Union[LLMBlueprint, str]], optional
The returned chat prompts are filtered to those associated with a specific LLM blueprint if it is specified. Accepts either LLMBlueprint or LLM blueprint ID.
- playgroundOptional[Union[Playground, str]], optional
The returned chat prompts are filtered to those associated with a specific playground if it is specified. Accepts either Playground or playground ID.
- chatOptional[Union[Chat, str]], optional
The returned chat prompts are filtered to those associated with a specific chat if it is specified. Accepts either Chat or chat ID.
- Returns
- chat_promptslist[ChatPrompt]
A list of chat prompts available to the user.
- Return type
List
[ChatPrompt
]
- delete()¶
Delete the single chat prompt.
- Return type
None
- create_llm_blueprint(name, description='')¶
Create a new LLM blueprint from an existing chat prompt.
- Parameters
- namestr
LLM blueprint name.
- descriptionstr, optional
Description of the LLM blueprint, by default “”.
- Returns
- llm_blueprintLLMBlueprint
The created LLM blueprint.
- Return type
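A hedged sketch of submitting a prompt to an existing chat; chat is assumed to be a Chat object (or chat ID) created as shown earlier.
from datarobot.models.genai.chat_prompt import ChatPrompt

prompt = ChatPrompt.create(
    text="Summarize the key risks mentioned in the uploaded documents.",
    chat=chat,                 # Chat object or chat ID
    wait_for_completion=True,  # block until the LLM response is available
)
print(prompt.execution_status)
print(prompt.result_text)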
- class datarobot.models.genai.comparison_chat.ComparisonChat(id, name, playground_id, creation_date, creation_user_id)¶
Metadata for a DataRobot GenAI comparison chat.
- Attributes
- idstr
The comparison chat ID.
- namestr
The comparison chat name.
- playground_idstr
The ID of the playground associated with the comparison chat.
- creation_datestr
The date when the comparison chat was created.
- creation_user_idstr
The ID of the creating user.
- classmethod create(name, playground)¶
Creates a new comparison chat.
- Parameters
- namestr
The comparison chat name.
- playgroundPlayground or str
The playground associated with the created comparison chat, either Playground or playground ID.
- Returns
- comparison_chatComparisonChat
The created comparison chat.
- Return type
- classmethod get(comparison_chat)¶
Retrieve a single comparison chat.
- Parameters
- comparison_chatComparisonChat or str
The comparison chat you want to retrieve. Accepts ComparisonChat or comparison chat ID.
- Returns
- comparison_chatComparisonChat
The requested comparison chat.
- Return type
- classmethod list(playground=None, sort=None)¶
List all comparison chats available to the user. If the playground is specified, results are restricted to only those comparison chats associated with the playground.
- Parameters
- playgroundOptional[Union[Playground, str]], optional
Returns only those comparison chats associated with a particular playground, specified by either the Playground or the playground ID.
- sortstr, optional
The property to sort comparison chats by. Prefix the attribute name with a dash (-) to sort responses in descending order (for example, ‘-name’). Supported options are listed in ListComparisonChatsSortQueryParams, but the values can differ depending on platform version. The default sort parameter is None, which results in comparison chats returning in order of creation time, descending.
- Returns
- comparison_chatslist[ComparisonChat]
Returns a list of comparison chats.
- Return type
List
[ComparisonChat
]
- delete()¶
Delete the single comparison chat.
- Return type
None
- update(name)¶
Update the comparison chat.
- Parameters
- namestr
The new name for the comparison chat.
- Returns
- comparison_chatComparisonChat
The updated comparison chat.
- Return type
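A usage sketch covering the comparison chat lifecycle, assuming a configured client (IDs are placeholders):

```python
from datarobot.models.genai.comparison_chat import ComparisonChat

# Create a comparison chat inside an existing playground.
comparison_chat = ComparisonChat.create(
    name="GPT vs. custom model",
    playground="<playground_id>",
)

# List the playground's comparison chats, sorted by name descending.
chats = ComparisonChat.list(playground="<playground_id>", sort="-name")

# Rename the chat, then remove it when it is no longer needed.
comparison_chat = comparison_chat.update(name="GPT vs. custom model (archived)")
comparison_chat.delete()
```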
- class datarobot.models.genai.comparison_prompt.ComparisonPrompt(id, text, results, creation_date, creation_user_id, comparison_chat_id=None)¶
Metadata for a DataRobot GenAI comparison prompt.
- Attributes
- idstr
Comparison prompt ID.
- textstr
The prompt text.
- resultslist[ComparisonPromptResult]
The list of results for individual LLM blueprints that are part of the comparison prompt.
- creation_datestr
The date when the comparison prompt was created.
- creation_user_idstr
ID of the creating user.
- comparison_chat_idstr
The ID of the comparison chat this comparison prompt is associated with.
- classmethod create(llm_blueprints, text, comparison_chat=None, wait_for_completion=False)¶
Create a new ComparisonPrompt. This submits the prompt text to the LLM blueprints that are specified.
- Parameters
- llm_blueprintslist[LLMBlueprint or str]
The LLM blueprints associated with the created comparison prompt. Accepts LLM blueprints or IDs.
- textstr
The prompt text.
- comparison_chat: Optional[ComparisonChat or str], optional
The comparison chat to add the comparison prompt to. Accepts ComparisonChat or comparison chat ID.
- wait_for_completionbool
If set to True, the code waits for the comparison prompt job to complete before returning the result (up to 10 minutes, after which a timeout error is raised). Otherwise, you can check the current status by using ComparisonPrompt.get with the returned ID.
- Returns
- comparison_promptComparisonPrompt
The created comparison prompt.
- Return type
- classmethod get(comparison_prompt)¶
Retrieve a single comparison prompt.
- Parameters
- comparison_promptComparisonPrompt or str
The comparison prompt you want to retrieve. Accepts ComparisonPrompt or comparison prompt ID.
- Returns
- comparison_promptComparisonPrompt
The requested comparison prompt.
- Return type
- classmethod list(llm_blueprints=None, comparison_chat=None)¶
List all comparison prompts available to the user that include the specified LLM blueprints or from the specified comparison chat.
- Parameters
- llm_blueprintsOptional[List[Union[LLMBlueprint, str]]], optional
The returned comparison prompts are only those associated with the specified LLM blueprints. Accepts either LLMBlueprint or LLM blueprint ID.
- comparison_chatOptional[Union[ComparisonChat, str]], optional
The returned comparison prompts are only those associated with the specified comparison chat. Accepts either ComparisonChat or comparison chat ID.
- Returns
- comparison_promptslist[ComparisonPrompt]
A list of comparison prompts available to the user that use the specified LLM blueprints.
- Return type
List
[ComparisonPrompt
]
- update(additional_llm_blueprints=None, wait_for_completion=False, **kwargs)¶
Update the comparison prompt.
- Parameters
- additional_llm_blueprintslist[LLMBlueprint or str]
The additional LLM blueprints to which you want to submit the comparison prompt.
- Returns
- comparison_promptComparisonPrompt
The updated comparison prompt.
- Return type
- delete()¶
Delete the single comparison prompt.
- Return type
None
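A sketch of submitting the same prompt to several LLM blueprints and then extending the comparison, assuming a configured client (IDs and the prompt text are placeholders):

```python
from datarobot.models.genai.comparison_prompt import ComparisonPrompt

# Submit one prompt to two LLM blueprints and wait for the results.
comparison_prompt = ComparisonPrompt.create(
    llm_blueprints=["<llm_blueprint_id_1>", "<llm_blueprint_id_2>"],
    text="Summarize our refund policy in two sentences.",
    comparison_chat="<comparison_chat_id>",
    wait_for_completion=True,
)

# Each entry in results corresponds to one LLM blueprint.
for result in comparison_prompt.results:
    print(result)

# Add another blueprint to the same comparison prompt.
comparison_prompt = comparison_prompt.update(
    additional_llm_blueprints=["<llm_blueprint_id_3>"],
    wait_for_completion=True,
)
```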
- class datarobot.models.genai.custom_model_validation.CustomModelValidation(id, prompt_column_name, target_column_name, deployment_id, model_id, validation_status, deployment_access_data, tenant_id, name, creation_date, user_id, error_message, deployment_name, user_name, use_case_id, prediction_timeout)¶
Validation record checking the ability of the deployment to serve as a custom model LLM or vector database.
- Attributes
- prompt_column_namestr
The column name that the deployed model expects as the input.
- target_column_namestr
The target name that the deployed model will output.
- deployment_idstr
ID of the deployment.
- model_idstr
ID of the underlying deployment model. Can be found from the API as Deployment.model[“id”].
- validation_statusstr
Can be TESTING, FAILED, or PASSED. Only PASSED is allowed for use.
- deployment_access_datadict, optional
Data that will be used for accessing the deployment prediction server. Only available for deployments that passed validation. Dict fields:
- prediction_api_url - The URL of the deployment prediction server.
- datarobot_key - The first of two auth headers for the prediction server.
- authorization_header - The second of two auth headers for the prediction server.
- input_type - The input type the model expects, either JSON or CSV.
- model_type - The target type of the deployed custom model.
- tenant_idstr
Creating user’s tenant ID.
- error_messageOptional[str]
Additional information for errored validation.
- deployment_nameOptional[str]
The name of the deployment that is validated.
- user_nameOptional[str]
The name of the user who created the validation.
- use_case_idOptional[str]
The ID of the Use Case associated with the validation.
- prediction_timeout: int
The timeout, in seconds, for the prediction API used in this custom model validation.
- classmethod get(validation_id)¶
Get the validation record by id.
- Parameters
- validation_idUnion[CustomModelValidation, str]
The CustomModelValidation to retrieve, either CustomModelValidation or validation ID.
- Returns
- CustomModelValidation
- Return type
- classmethod get_by_values(prompt_column_name, target_column_name, deployment_id, model_id)¶
Get the validation record by field values.
- Parameters
- prompt_column_namestr
The column name the deployed model expects as the input.
- target_column_namestr
The target name that the deployed model will output.
- deployment_idstr
ID of the deployment.
- model_idstr
ID of the underlying deployment model.
- Returns
- CustomModelValidation
- Return type
- classmethod list(prompt_column_name=None, target_column_name=None, deployment=None, model=None, use_cases=None, playground=None, completed_only=False, search=None, sort=None)¶
List the validation records by field values.
- Parameters
- prompt_column_nameOptional[str], optional
The column name the deployed model expects as the input.
- target_column_nameOptional[str], optional
The target name that the deployed model will output.
- deploymentOptional[Union[Deployment, str]], optional
The returned validations are filtered to those associated with a specific deployment if specified, either Deployment or deployment ID.
- modelOptional[Union[Model, str]], optional
The returned validations are filtered to those associated with a specific model if specified, either Model or model ID.
- use_casesOptional[list[Union[UseCase, str]]], optional
The returned validations are filtered to those associated with specific Use Cases if specified, either UseCase or Use Case IDs.
- playgroundOptional[Union[Playground, str]], optional
The returned validations are filtered to those used in a specific playground if specified, either Playground or playground ID.
- completed_onlybool, optional
Whether to retrieve only completed validations.
- searchOptional[str], optional
String for filtering validations. Validations that contain the string in name will be returned.
- sortOptional[str], optional
Property to sort validations by. Prefix the attribute name with a dash to sort in descending order, e.g. sort=’-name’. Currently supported options are listed in ListCustomModelValidationsSortQueryParams but the values can differ with different platform versions. By default, the sort parameter is None which will result in validations being returned in order of creation time descending.
- Returns
- List[CustomModelValidation]
- Return type
List
[CustomModelValidation
]
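A sketch of looking up validation records, assuming a configured client (IDs are placeholders):

```python
from datarobot.models.genai.custom_model_validation import CustomModelValidation

# Retrieve a single validation record and check whether it can be used.
validation = CustomModelValidation.get("<validation_id>")
print(validation.validation_status)  # TESTING, FAILED, or PASSED

# List only completed validations for a given deployment.
validations = CustomModelValidation.list(
    deployment="<deployment_id>",
    completed_only=True,
)
```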
- classmethod create(prompt_column_name, target_column_name, deployment_id, model=None, use_case=None, name=None, wait_for_completion=False, prediction_timeout=None)¶
Start the validation of deployment to serve as a vector database or LLM.
- Parameters
- prompt_column_namestr
The column name the deployed model expects as the input.
- target_column_namestr
The target name that the deployed model will output.
- deployment_idUnion[Deployment, str]
The deployment to validate, either Deployment or deployment ID.
- modelOptional[Union[Model, str]], optional
The specific model within the deployment, either Model or model ID. If not specified, the underlying model ID will be derived from the deployment info automatically.
- use_caseOptional[Union[UseCase, str]], optional
The Use Case to link the validation to, either UseCase or Use Case ID.
- nameOptional[str], optional
The name of the validation.
- wait_for_completionbool
If set to True, the code waits for the validation job to complete before returning the result (up to 10 minutes, after which a timeout error is raised). Otherwise, you can check the current validation status by using CustomModelValidation.get with the returned ID.
- prediction_timeoutOptional[int], optional
The timeout, in seconds, for the prediction API used in this custom model validation.
- Returns
- CustomModelValidation
- Return type
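A sketch of starting a validation for a deployment, assuming a configured client (the ID and column names are placeholders):

```python
from datarobot.models.genai.custom_model_validation import CustomModelValidation

# Validate that the deployment can serve as a custom model LLM or vector database.
validation = CustomModelValidation.create(
    prompt_column_name="promptText",
    target_column_name="resultText",
    deployment_id="<deployment_id>",
    name="Custom LLM validation",
    wait_for_completion=True,
    prediction_timeout=300,
)
print(validation.validation_status)
```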
- classmethod revalidate(validation_id)¶
Revalidate an unlinked custom model vector database or LLM. This method is useful when a deployment used as a vector database or LLM is accidentally replaced with another model that no longer complies with the vector database or LLM requirements. Replace the model back and call this method instead of creating a new custom model validation from scratch. Another use case is when the API token used to create a validation record has been revoked and can no longer be used by the vector database / LLM to call the custom model deployment. Calling revalidate updates the validation record with the token currently in use.
- Parameters
- validation_idstr
The ID of the CustomModelValidation for revalidation.
- Returns
- CustomModelValidation
- Return type
- update(name=None, prompt_column_name=None, target_column_name=None, deployment=None, model=None, prediction_timeout=None)¶
Update a custom model validation.
- Parameters
- nameOptional[str], optional
The new name of the custom model validation.
- prompt_column_nameOptional[str], optional
The new name of the prompt column.
- target_column_nameOptional[str], optional
The new name of the target column.
- deploymentOptional[Union[Deployment, str]], optional
The new deployment to validate.
- modelOptional[Union[Model, str]], optional
The new model within the deployment to validate.
- prediction_timeoutOptional[int], optional
The new timeout, in seconds, for the prediction API used in this custom model validation.
- Returns
- CustomModelValidation
- Return type
- delete()¶
Delete the custom model validation.
- Return type
None
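A sketch of refreshing and adjusting an existing validation record, assuming a configured client (the ID is a placeholder):

```python
from datarobot.models.genai.custom_model_validation import CustomModelValidation

# Re-run the checks, for example after the original API token was revoked.
validation = CustomModelValidation.revalidate("<validation_id>")

# Tweak the record in place instead of recreating it.
validation = validation.update(
    name="Custom LLM validation (renamed)",
    prediction_timeout=600,
)
```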
- class datarobot.models.genai.custom_model_llm_validation.CustomModelLLMValidation(id, prompt_column_name, target_column_name, deployment_id, model_id, validation_status, deployment_access_data, tenant_id, name, creation_date, user_id, error_message, deployment_name, user_name, use_case_id, prediction_timeout)¶
Validation record checking the ability of the deployment to serve as a custom model LLM.
- Attributes
- prompt_column_namestr
The column name the deployed model expects as the input.
- target_column_namestr
The target name that the deployed model will output.
- deployment_idstr
ID of the deployment.
- model_idstr
ID of the underlying deployment model. Can be found from the API as Deployment.model[“id”].
- validation_statusstr
Can be TESTING, FAILED, or PASSED. Only PASSED is allowed for use.
- deployment_access_datadict, optional
Data that will be used for accessing the deployment prediction server. Only available for deployments that passed validation. Dict fields:
- prediction_api_url - The URL of the deployment prediction server.
- datarobot_key - The first of two auth headers for the prediction server.
- authorization_header - The second of two auth headers for the prediction server.
- input_type - The input type the model expects, either JSON or CSV.
- model_type - The target type of the deployed custom model.
- tenant_idstr
Creating user’s tenant ID.
- error_messageOptional[str]
Additional information for errored validation.
- deployment_nameOptional[str]
The name of the deployment that is validated.
- user_nameOptional[str]
The name of the user who created the validation.
- use_case_idOptional[str]
The ID of the Use Case associated with the validation.
- prediction_timeout: int
The timeout in seconds for the prediction API used in this custom model validation.
- class datarobot.models.genai.vector_database.CustomModelVectorDatabaseValidation(id, prompt_column_name, target_column_name, deployment_id, model_id, validation_status, deployment_access_data, tenant_id, name, creation_date, user_id, error_message, deployment_name, user_name, use_case_id, prediction_timeout)¶
Validation record checking the ability of the deployment to serve as a vector database.
- Attributes
- prompt_column_namestr
The column name the deployed model expects as the input.
- target_column_namestr
The target name that the deployed model will output.
- deployment_idstr
ID of the deployment.
- model_idstr
ID of the underlying deployment model. Can be found from the API as Deployment.model[“id”].
- validation_statusstr
Can be TESTING, FAILED, or PASSED. Only PASSED is allowed for use.
- deployment_access_datadict, optional
Data that will be used for accessing the deployment prediction server. Only available for deployments that passed validation. Dict fields:
- prediction_api_url - The URL of the deployment prediction server.
- datarobot_key - The first of two auth headers for the prediction server.
- authorization_header - The second of two auth headers for the prediction server.
- input_type - The input type the model expects, either JSON or CSV.
- model_type - The target type of the deployed custom model.
- tenant_idstr
Creating user’s tenant ID.
- error_messageOptional[str]
Additional information for errored validation.
- deployment_nameOptional[str]
The name of the deployment that is validated.
- user_nameOptional[str]
The name of the user who created the validation.
- use_case_idOptional[str]
The ID of the use case associated with the validation.
- class datarobot.models.genai.llm_blueprint.LLMBlueprint(id, name, description, is_saved, is_starred, playground_id, creation_date, creation_user_id, creation_user_name, last_update_date, last_update_user_id, prompt_type, llm_id=None, llm_name=None, llm_settings=None, vector_database_id=None, vector_database_settings=None, vector_database_name=None, vector_database_status=None, vector_database_error_message=None, vector_database_error_resolution=None, custom_model_llm_validation_status=None, custom_model_llm_error_message=None, custom_model_llm_error_resolution=None)¶
Metadata for a DataRobot GenAI LLM blueprint.
- Attributes
- idstr
LLM blueprint ID.
- namestr
LLM blueprint name.
- descriptionstr
Description of the LLM blueprint.
- is_savedbool
Whether the LLM blueprint is saved (settings are locked and blueprint is eligible for use with ComparisonPrompts).
- is_starredbool
Whether the LLM blueprint is starred.
- playground_idstr
ID of the Gen AI playground associated with the LLM blueprint.
- llm_idstr or None
ID of the LLM type. If not None this must be one of the IDs returned by LLMDefinition.list for this user.
- llm_namestr or None
Name of the LLM.
- llm_settingsdict or None
The LLM settings for the LLM blueprint. The specific keys allowed and the constraints on the values are defined in the response from LLMDefinition.list, but these typically include the following fields:
- system_prompt - The system prompt that tells the LLM how to behave.
- max_completion_length - The maximum number of tokens in the completion.
- temperature - Controls the variability in the LLM response.
- top_p - The model considers only the next tokens within the top_p probability mass.
- validation_id - The ID of the external model LLM validation for external model LLM blueprints.
- external_llm_context_size - The external LLM’s context size, in tokens, for external model LLM blueprints.
- creation_datestr
The date when the LLM blueprint was created.
- creation_user_idstr
The ID of the user who created the LLM blueprint.
- creation_user_namestr
The name of the user who created the LLM blueprint.
- last_update_datestr
The date when the LLM blueprint was most recently updated.
- last_update_user_idstr
The ID of the user who most recently updated the LLM blueprint.
- prompt_typePromptType
The prompting strategy for the LLM Blueprint. Currently supported options are listed in PromptType.
- vector_database_idstr or None
ID of the vector database associated with the LLM blueprint, if any.
- vector_database_settingsVectorDatabaseSettings or None
The settings for the vector database associated with the LLM blueprint, if any.
- vector_database_namestr or None
The name of the vector database associated with the LLM blueprint, if any.
- vector_database_statusstr or None
The status of the vector database associated with the LLM blueprint, if any.
- vector_database_error_messagestr or None
The error message for the vector database associated with the LLM blueprint, if any.
- vector_database_error_resolutionstr or None
The resolution for the vector database error associated with the LLM blueprint, if any.
- custom_model_llm_validation_statusstr or None
The status of the custom model LLM validation if the llm_id is ‘custom-model’.
- custom_model_llm_error_messagestr or None
The error message for the custom model LLM, if any.
- custom_model_llm_error_resolutionstr or None
The resolution for the custom model LLM error, if any.
- classmethod create(playground, name, prompt_type=PromptType.CHAT_HISTORY_AWARE, description='', llm=None, llm_settings=None, vector_database=None, vector_database_settings=None)¶
Create a new LLM blueprint.
- Parameters
- playgroundPlayground or str
The playground associated with the created LLM blueprint. Accepts playground or playground ID.
- namestr
LLM blueprint name.
- prompt_typePromptType, optional
Prompting type of the LLM Blueprint, by default PromptType.CHAT_HISTORY_AWARE.
- descriptionstr, optional
Description of the LLM blueprint, by default “”.
- llmLLMDefinition, str, or None, optional
LLM to use for the blueprint. Accepts LLMDefinition or LLM ID.
- llm_settingsdict or None
The LLM settings for the LLM blueprint. The specific keys allowed and the constraints on the values are defined in the response from LLMDefinition.list, but these typically include the following fields:
- system_prompt - The system prompt that tells the LLM how to behave.
- max_completion_length - The maximum number of tokens in the completion.
- temperature - Controls the variability in the LLM response.
- top_p - The model considers only the next tokens within the top_p probability mass.
- validation_id - The ID of the custom model LLM validation for custom model LLM blueprints.
- vector_database: VectorDatabase, str, or None, optional
The vector database to use with this LLM blueprint. Accepts VectorDatabase or vector database ID.
- vector_database_settings: VectorDatabaseSettings or None, optional
Settings for the vector database, if any.
- Returns
- llm_blueprintLLMBlueprint
The created LLM blueprint.
- Return type
- classmethod create_from_llm_blueprint(llm_blueprint, name, description='')¶
Create a new LLM blueprint from an existing LLM blueprint.
- Parameters
- llm_blueprintLLMBlueprint or str
The LLM blueprint to use to create the new LLM blueprint. Accepts LLM blueprint or LLM blueprint ID.
- namestr
LLM blueprint name.
- descriptionstr, optional
Description of the LLM blueprint, by default “”.
- Returns
- llm_blueprintLLMBlueprint
The created LLM blueprint.
- Return type
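A sketch of creating an LLM blueprint and cloning it, assuming a configured client (IDs are placeholders, and the llm_settings keys shown are the typical fields described above):

```python
from datarobot.models.genai.llm_blueprint import LLMBlueprint

llm_blueprint = LLMBlueprint.create(
    playground="<playground_id>",
    name="Docs assistant",
    llm="<llm_id>",  # one of the IDs returned by LLMDefinition.list
    llm_settings={
        "system_prompt": "You are a concise documentation assistant.",
        "max_completion_length": 512,
        "temperature": 0.2,
    },
    vector_database="<vector_database_id>",
)

# Clone an existing blueprint as the starting point for an experiment.
copy = LLMBlueprint.create_from_llm_blueprint(
    llm_blueprint=llm_blueprint,
    name="Docs assistant (experiment)",
)
```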
- classmethod get(llm_blueprint_id)¶
Retrieve a single LLM blueprint.
- Parameters
- llm_blueprint_idstr
The ID of the LLM blueprint you want to retrieve.
- Returns
- llm_blueprintLLMBlueprint
The requested LLM blueprint.
- Return type
- classmethod list(playground=None, llms=None, vector_databases=None, is_saved=None, is_starred=None, sort=None)¶
Lists all LLM blueprints available to the user. If the playground is specified, then the results are restricted to the LLM blueprints associated with the playground. If the LLMs are specified, then the results are restricted to the LLM blueprints using those LLM types. If vector_databases are specified, then the results are restricted to the LLM blueprints using those vector databases.
- Parameters
- playgroundOptional[Union[Playground, str]], optional
The returned LLM blueprints are filtered to those associated with a specific playground if it is specified. Accepts either the entity or the ID.
- llmsOptional[list[Union[LLMDefinition, str]]], optional
The returned LLM blueprints are filtered to those associated with the LLM types specified. Accepts either the entity or the ID.
- vector_databasesOptional[list[Union[VectorDatabase, str]]], optional
The returned LLM blueprints are filtered to those associated with the vector databases specified. Accepts either the entity or the ID.
- is_saved: Optional[bool], optional
The returned LLM blueprints are filtered to those matching is_saved.
- is_starred: Optional[bool], optional
The returned LLM blueprints are filtered to those matching is_starred.
- sortstr, optional
Property to sort LLM blueprints by. Prefix the attribute name with a dash to sort in descending order, e.g. sort=’-creationDate’. Currently supported options are listed in ListLLMBlueprintsSortQueryParams but the values can differ with different platform versions. By default, the sort parameter is None which will result in LLM blueprints being returned in order of creation time descending.
- Returns
- llm_blueprintslist[LLMBlueprint]
A list of LLM blueprints available to the user.
- Return type
List
[LLMBlueprint
]
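A sketch of filtering the listing, assuming a configured client (the playground ID is a placeholder):

```python
from datarobot.models.genai.llm_blueprint import LLMBlueprint

# Saved, starred blueprints in one playground, newest first.
blueprints = LLMBlueprint.list(
    playground="<playground_id>",
    is_saved=True,
    is_starred=True,
    sort="-creationDate",
)
for blueprint in blueprints:
    print(blueprint.id, blueprint.name)
```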
- update(name=None, description=None, llm=None, llm_settings=None, vector_database=None, vector_database_settings=None, is_saved=None, is_starred=None, prompt_type=None, remove_vector_database=False)¶
Update the LLM blueprint.
- Parameters
- namestr or None, optional
The new name for the LLM blueprint.
- description: str or None, optional
The new description for the LLM blueprint.
- llm: Optional[Union[LLMDefinition, str]], optional
The new LLM type for the LLM blueprint.
- llm_settings: Optional[dict], optional
The new LLM settings for the LLM blueprint. These must match the LLMSettings returned from the LLMDefinition.list method for the LLM type used for this LLM blueprint, but they typically include the following fields:
- system_prompt - The system prompt that tells the LLM how to behave.
- max_completion_length - The maximum number of tokens in the completion.
- temperature - Controls the variability in the LLM response.
- top_p - The model considers only the next tokens within the top_p probability mass.
- validation_id - The ID of the custom model LLM validation for custom model LLM blueprints.
- vector_database: Optional[Union[VectorDatabase, str]], optional
The new vector database for the LLM blueprint.
- vector_database_settings: Optional[VectorDatabaseSettings], optional
The new vector database settings for the LLM blueprint.
- is_saved: Optional[bool], optional
The new is_saved attribute for the LLM blueprint.
- is_starred: Optional[bool], optional
The new is_starred attribute for the LLM blueprint.
- prompt_typePromptType, optional
The new prompting type of the LLM Blueprint.
- remove_vector_database: Optional[bool], optional
Whether to remove the vector database from the LLM blueprint.
- Returns
- llm_blueprintLLMBlueprint
The updated LLM blueprint.
- Return type
- delete()¶
Delete the single LLM blueprint.
- Return type
None
- register_custom_model(prompt_column_name=None, target_column_name=None)¶
Create a new CustomModelVersion. This registers a custom model from the LLM blueprint.
- Parameters
- prompt_column_namestr, optional
The column name of the prompt text.
- target_column_namestr, optional
The column name of the response text.
- Returns
- custom_modelCustomModelVersion
The registered custom model.
- Return type
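A sketch of saving a blueprint and registering it as a custom model, assuming a configured client (the ID and column names are placeholders):

```python
from datarobot.models.genai.llm_blueprint import LLMBlueprint

llm_blueprint = LLMBlueprint.get("<llm_blueprint_id>")

# Lock the settings so the blueprint can be used with comparison prompts.
llm_blueprint = llm_blueprint.update(is_saved=True)

# Register the blueprint as a custom model version.
custom_model_version = llm_blueprint.register_custom_model(
    prompt_column_name="promptText",
    target_column_name="resultText",
)
```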
- class datarobot.models.genai.llm.LLMDefinition(id, name, description, vendor, license, supported_languages, settings, context_size=None)¶
Metadata for a DataRobot GenAI LLM.
- Attributes
- idstr
Language model type ID.
- namestr
Language model name.
- descriptionstr
Description of the language model.
- vendorstr
Name of the vendor for this model.
- licensestr
License for this model.
- supported_languagesstr
Languages supported by this model.
- settingslist of LLMSettingDefinition
Settings for this model
- context_sizeint
The context size for this model
- classmethod list(use_case=None, as_dict=True)¶
List all large language models (LLMs) available to the user.
- Parameters
- use_caseOptional[UseCase or str], optional
The returned LLMs, including external LLMs, are those available for the specified Use Case. Accepts either the entity or the Use Case ID.
- as_dictbool, optional
Whether to return the LLM definitions as dicts (LLMDefinitionDict) instead of LLMDefinition objects. Defaults to True.
- Returns
- llmslist[LLMDefinition] or list[LLMDefinitionDict]
A list of large language models (LLMs) available to the user.
- Return type
Union
[List
[LLMDefinition
],List
[LLMDefinitionDict
]]
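A sketch of inspecting the available LLMs, assuming a configured client (the Use Case ID is a placeholder):

```python
from datarobot.models.genai.llm import LLMDefinition

# Return LLMDefinition objects rather than dicts to inspect their settings.
llms = LLMDefinition.list(use_case="<use_case_id>", as_dict=False)
for llm in llms:
    print(llm.id, llm.name, llm.context_size)
```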
- class datarobot.models.genai.llm.LLMDefinitionDict¶
The dictionary form of LLMDefinition, as returned by LLMDefinition.list when as_dict=True.
- class datarobot.models.genai.playground.Playground(id, name, description, use_case_id, creation_date, creation_user_id, last_update_date, last_update_user_id, saved_llm_blueprints_count, llm_blueprints_count, user_name)¶
Metadata for a DataRobot GenAI playground.
- Attributes
- idstr
Playground ID.
- namestr
Playground name.
- descriptionstr
Description of the playground.
- use_case_idstr
Linked use case ID.
- creation_datestr
The date when the playground was created.
- creation_user_idstr
ID of the creating user.
- last_update_datestr
Date when the playground was most recently updated.
- last_update_user_idstr
ID of the user who most recently updated the playground.
- saved_llm_blueprints_countint
Number of saved LLM blueprints in the playground.
- llm_blueprints_countint
Number of LLM blueprints in the playground.
- user_namestr
The name of the user who created the playground.
- classmethod create(name, description='', use_case=None)¶
Create a new playground.
- Parameters
- namestr
Playground name.
- descriptionstr, optional
Description of the playground, by default “”.
- use_caseOptional[Union[UseCase, str]], optional
Use case to link to the created playground.
- Returns
- playgroundPlayground
The created playground.
- Return type
- classmethod get(playground_id)¶
Retrieve a single playground.
- Parameters
- playground_idstr
The ID of the playground you want to retrieve.
- Returns
- playgroundPlayground
The requested playground.
- Return type
- classmethod list(use_case=None, search=None, sort=None)¶
List all playgrounds available to the user. If the use_case is specified or can be inferred from the Context then the results are restricted to the playgrounds associated with the UseCase.
- Parameters
- use_caseOptional[UseCaseLike], optional
The returned playgrounds are filtered to those associated with a specific Use Case or Cases if specified or can be inferred from the Context. Accepts either the entity or the ID.
- searchstr, optional
String for filtering playgrounds. Playgrounds that contain the string in name will be returned. If not specified, all playgrounds will be returned.
- sortstr, optional
Property to sort playgrounds by. Prefix the attribute name with a dash to sort in descending order, e.g. sort=’-creationDate’. Currently supported options are listed in ListPlaygroundsSortQueryParams but the values can differ with different platform versions. By default, the sort parameter is None which will result in playgrounds being returned in order of creation time descending.
- Returns
- playgroundslist[Playground]
A list of playgrounds available to the user.
- Return type
List
[Playground
]
- update(name=None, description=None)¶
Update the playground.
- Parameters
- namestr
The new name for the playground.
- description: str
The new description for the playground.
- Returns
- playgroundPlayground
The updated playground.
- Return type
- delete()¶
Delete the playground.
- Return type
None
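A sketch of the playground lifecycle, assuming a configured client (the Use Case ID is a placeholder):

```python
from datarobot.models.genai.playground import Playground

playground = Playground.create(
    name="Support bot experiments",
    description="Prompt engineering for the support assistant",
    use_case="<use_case_id>",
)

# Find playgrounds by name within the Use Case, newest first.
playgrounds = Playground.list(use_case="<use_case_id>", search="Support", sort="-creationDate")

playground = playground.update(description="Prompt engineering, phase 2")
playground.delete()
```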
- class datarobot.enums.PromptType(value)¶
Supported LLM Blueprint prompting types.
- class datarobot.models.genai.vector_database.SupportedEmbeddings(embedding_models, default_embedding_model)¶
All supported embedding models including the recommended default model.
- Attributes
- embedding_modelslist[EmbeddingModel]
All supported embedding models.
- default_embedding_modelstr
Name of the default recommended text embedding model. Currently supported options are listed in VectorDatabaseEmbeddingModel but the values can differ with different platform versions.
- class datarobot.models.genai.vector_database.SupportedTextChunkings(text_chunking_configs)¶
Supported text chunking configurations which includes a set of recommended chunking parameters for each supported embedding model.
- Attributes
- text_chunking_configs
All supported text chunking configurations.
- class datarobot.models.genai.user_limits.UserLimits(counter)¶
Counts for user limits for LLM APIs and vector databases.
- classmethod get_vector_database_count()¶
Get the count of vector databases for the user.
- Return type
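A minimal sketch, assuming a configured client; since the shape of the returned value is not documented here, it is simply printed:

```python
from datarobot.models.genai.user_limits import UserLimits

# Check the vector database count for the current user.
vector_database_count = UserLimits.get_vector_database_count()
print(vector_database_count)
```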
- class datarobot.models.genai.vector_database.VectorDatabase(id, name, size, use_case_id, dataset_id, embedding_model, chunking_method, chunk_size, chunk_overlap_percentage, chunks_count, separators, creation_date, creation_user_id, organization_id, tenant_id, last_update_date, execution_status, playgrounds_count, dataset_name, user_name, source, validation_id, error_message, is_separator_regex)¶
Metadata for a DataRobot vector database accessible to the user.
- Attributes
- idstr
Vector database ID.
- namestr
Vector database name.
- sizeint
Size of the vector database assets in bytes.
- use_case_idstr
Linked use case ID.
- dataset_idstr
ID of the dataset used for creation.
- embedding_modelstr
Name of the text embedding model. Currently supported options are listed in VectorDatabaseEmbeddingModel but the values can differ with different platform versions.
- chunking_methodstr
Name of the method to split dataset documents. Currently supported options are listed in VectorDatabaseChunkingMethod but the values can differ with different platform versions.
- chunk_sizeint
Size of each text chunk in number of tokens.
- chunk_overlap_percentageint
Overlap percentage between chunks.
- chunks_countint
Total number of text chunks.
- separatorslist[string]
Separators for document splitting.
- creation_datestr
Date when the database was created.
- creation_user_idstr
ID of the creating user.
- organization_idstr
Creating user’s organization ID.
- tenant_idstr
Creating user’s tenant ID.
- last_update_datestr
Last update date for the database.
- execution_statusstr
Database execution status. Currently supported options are listed in VectorDatabaseExecutionStatus but the values can differ with different platform versions.
- playgrounds_countint
The number of playgrounds that use this vector database.
- dataset_namestr
Name of the dataset used for creation.
- user_namestr
Name of the creating user.
- sourcestr
Source of the vector database. Currently supported options are listed in VectorDatabaseSource but the values can differ with different platform versions.
- validation_idOptional[str]
ID of custom model vector database validation. Only filled for external vector databases.
- error_messageOptional[str]
Additional information for errored vector database.
- is_separator_regexbool
Whether the separators should be treated as regular expressions.
- classmethod create(dataset_id, chunking_parameters, use_case=None, name=None)¶
Create a new vector database.
- Parameters
- dataset_idstr
ID of the dataset used for creation.
- chunking_parametersChunkingParameters
Parameters defining how documents are split and embedded.
- use_caseOptional[Union[UseCase, str]], optional
Use case to link to the created vector database.
- namestr, optional
Vector database name, by default None which leads to the default name ‘Vector Database for <dataset name>’.
- Returns
- vector databaseVectorDatabase
The created vector database with execution status ‘new’.
- Return type
- classmethod create_from_custom_model(name, use_case=None, validation_id=None, prompt_column_name=None, target_column_name=None, deployment_id=None, model_id=None)¶
Create a new vector database from validated custom model deployment.
- Parameters
- namestr
Vector database name.
- use_caseOptional[Union[UseCase, str]], optional
Use case to link to the created vector database.
- validation_idstr, optional
ID of the CustomModelVectorDatabaseValidation for the deployment. Alternatively, you can specify all of the following fields.
- prompt_column_namestr, optional
The column name the deployed model expects as the input.
- target_column_namestr, optional
The target name that the deployed model will output.
- deployment_idstr, optional
ID of the deployment.
- model_idstr, optional
ID of the underlying deployment model. Can be found from the API as Deployment.model[“id”].
- Returns
- vector databaseVectorDatabase
The created vector database.
- Return type
- classmethod get(vector_database_id)¶
Retrieve a single vector database.
- Parameters
- vector_database_idstr
The ID of the vector database you want to retrieve.
- Returns
- vector databaseVectorDatabase
The requested vector database.
- Return type
- classmethod list(use_case=None, playground=None, search=None, sort=None, completed_only=None)¶
List all vector databases associated with a specific use case available to the user.
- Parameters
- use_caseOptional[UseCaseLike], optional
The returned vector databases are filtered to those associated with a specific Use Case or Cases if specified or can be inferred from the Context. Accepts either the entity or the ID.
- playgroundOptional[Union[Playground, str]], optional
The returned vector databases are filtered to those associated with a specific playground if it is specified. Accepts either the entity or the ID.
- searchstr, optional
String for filtering vector databases. Vector databases that contain the string in name will be returned. If not specified, all vector databases will be returned.
- sortstr, optional
Property to sort vector databases by. Prefix the attribute name with a dash to sort in descending order, e.g. sort=’-creationDate’. Currently supported options are listed in ListVectorDatabasesSortQueryParams but the values can differ with different platform versions. By default, the sort parameter is None which will result in vector databases being returned in order of creation time descending.
- completed_onlybool, optional
A filter to retrieve only vector databases that have been successfully created. By default, all vector databases regardless of execution status are retrieved.
- Returns
- vector_databaseslist[VectorDatabase]
A list of vector databases available to the user.
- Return type
List
[VectorDatabase
]
- update(name)¶
Update the vector database.
- Parameters
- namestr
The new name for the vector database.
- Returns
- vector databaseVectorDatabase
The updated vector database.
- Return type
- delete()¶
Delete the vector database.
- Return type
None
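A sketch of retrieving, listing, and renaming vector databases, assuming a configured client (IDs are placeholders):

```python
from datarobot.models.genai.vector_database import VectorDatabase

vdb = VectorDatabase.get("<vector_database_id>")
print(vdb.execution_status, vdb.chunks_count)

# Only vector databases that were created successfully, within one Use Case.
vdbs = VectorDatabase.list(use_case="<use_case_id>", completed_only=True)

vdb = vdb.update(name="Product docs (v2)")
```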
- classmethod get_supported_embeddings(dataset_id=None)¶
Get all supported and the recommended embedding models.
- Parameters
- dataset_idstr, optional
ID of a dataset for which the recommended model is returned based on the detected language of that dataset.
- Returns
- supported_embeddingsSupportedEmbeddings
The supported embedding models.
- Return type
- classmethod get_supported_text_chunkings()¶
Get all supported text chunking configurations which includes a set of recommended chunking parameters for each supported embedding model.
- Returns
- supported_text_chunkingsSupportedTextChunkings
The supported text chunking configurations.
- Return type
- download_text_and_embeddings_asset(file_path=None)¶
Download a parquet file with text chunks and corresponding embeddings created by a vector database.
- Parameters
- file_pathstr, optional
File path to save the asset to. By default, the asset is saved in the current directory under a file name generated by the server.
- Return type
None
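A sketch of inspecting the supported configurations and exporting the chunk/embedding asset, assuming a configured client (IDs and the file name are placeholders):

```python
from datarobot.models.genai.vector_database import VectorDatabase

# Recommended embedding model for the language detected in a dataset.
supported_embeddings = VectorDatabase.get_supported_embeddings(dataset_id="<dataset_id>")
print(supported_embeddings.default_embedding_model)

# Recommended chunking parameters per supported embedding model.
supported_chunkings = VectorDatabase.get_supported_text_chunkings()
print(supported_chunkings.text_chunking_configs)

# Export the text chunks and their embeddings to a local parquet file.
vdb = VectorDatabase.get("<vector_database_id>")
vdb.download_text_and_embeddings_asset(file_path="vdb_chunks.parquet")
```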