Experimental APIs

These features all require special permissions to be activated on your DataRobot account, and will not work otherwise. If you want to test a feature, please ask your DataRobot CFDS or account manager about enrolling in our preview program.

Classes in this list should be considered “experimental”, not fully released, and likely to change in future releases. Do not use them for production systems or other mission-critical uses.

class datarobot._experimental.models.model.Model

Bases: Model

get_feature_effect(source)

Retrieve Feature Effects for the model.

Feature Effects provides partial dependence and predicted vs. actual values for the top 500 features, ordered by feature impact score.

The partial dependence shows the marginal effect of a feature on the target variable after accounting for the average effects of all other predictive features. It indicates how the value of this feature affects your prediction while all other features are held at their observed values.

Requires that Feature Effects has already been computed with request_feature_effect.

See get_feature_effect_metadata for retrieving information about the available sources.

Parameters:

source (str) – The source Feature Effects are retrieved for.

Returns:

feature_effects – The feature effects data.

Return type:

FeatureEffects

Raises:

ClientError – If the feature effects have not been computed or the source is not a valid value.
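Examples

An illustrative sketch only (the project and model IDs below are placeholders): Feature Effects are computed first and then retrieved for a source.

from datarobot._experimental.models.model import Model

model = Model.get('5f1a2b3c4d5e6f7a8b9c0d1e', '5f1a2b3c4d5e6f7a8b9c0d2f')
# Compute Feature Effects before retrieving them.
feature_effect_job = model.request_feature_effect()
feature_effect_job.wait_for_completion()
feature_effects = model.get_feature_effect(source='validation')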

get_incremental_learning_metadata()

Retrieve incremental learning metadata for this model.

Added in version v3.4.0.

This functionality requires the INCREMENTAL_LEARNING feature flag to be enabled.

Returns:

metadata – An IncrementalLearningMetadata instance representing the incremental learning metadata.

Return type:

IncrementalLearningMetadata

start_incremental_learning(early_stopping_rounds=None)

Start incremental learning for this model.

Added in version v3.4.0.

This functionality requires the INCREMENTAL_LEARNING feature flag to be enabled.

Parameters:

early_stopping_rounds (Optional[int]) – The number of chunks with no observed improvement that triggers the early stopping mechanism.

Return type:

None

Raises:

ClientError – If the server responded with a 4xx status.

train_first_incremental_from_sample()

Submit a job to the queue to perform the first incremental learning iteration training on an existing sample model. This functionality requires the SAMPLE_DATA_TO_START_PROJECT feature flag to be enabled.

Returns:

job – The created job that is retraining the model

Return type:

ModelJob
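Examples

A minimal sketch of the incremental learning flow, assuming the INCREMENTAL_LEARNING feature flag is enabled and using placeholder project and model IDs:

from datarobot._experimental.models.model import Model

model = Model.get('5f1a2b3c4d5e6f7a8b9c0d1e', '5f1a2b3c4d5e6f7a8b9c0d2f')
model.start_incremental_learning(early_stopping_rounds=3)
# Poll the metadata to follow training progress across chunks.
metadata = model.get_incremental_learning_metadata()
print(metadata.status, metadata.total_number_of_chunks)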

class datarobot._experimental.models.model.DatetimeModel

Bases: DatetimeModel

get_feature_effect(source, backtest_index)

Retrieve Feature Effects for the model.

Feature Effects provides partial dependence and predicted vs. actual values for the top 500 features, ordered by feature impact score.

The partial dependence shows the marginal effect of a feature on the target variable after accounting for the average effects of all other predictive features. It indicates how the value of this feature affects your prediction while all other features are held at their observed values.

Requires that Feature Effects has already been computed with request_feature_effect.

See get_feature_effect_metadata for retrieving the available values of source and backtest_index.

Parameters:
  • source (str) – The source Feature Effects are retrieved for. Must be one of the values in FeatureEffectMetadataDatetime.sources.

  • backtest_index (str) – The backtest index to retrieve Feature Effects for. Must be one of the values in FeatureEffectMetadataDatetime.backtest_index.

Returns:

feature_effects – The feature effects data.

Return type:

FeatureEffects

Raises:

ClientError – If the feature effects have not been computed or the source is not a valid value.
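Examples

A minimal sketch, assuming Feature Effects have already been computed for the validation source of backtest index '0' (placeholder project and model IDs):

from datarobot._experimental.models.model import DatetimeModel

model = DatetimeModel.get('5f1a2b3c4d5e6f7a8b9c0d1e', '5f1a2b3c4d5e6f7a8b9c0d2f')
feature_effects = model.get_feature_effect(source='validation', backtest_index='0')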

datarobot._experimental.models.data_store.get_spark_session(self, db_token)

Returns a Spark session

Parameters:

db_token (str) – A personal access token.

Returns:

A spark session initialized with connection parameters taken from DataStore and provided db_token.

Return type:

SparkSession

Examples

>>> from datarobot._experimental.models.data_store import DataStore
>>> from datarobot.enums import DataStoreListTypes
>>> data_stores = DataStore.list(typ=DataStoreListTypes.DR_DATABASE_V1)
>>> data_stores
[DataStore('my_databricks_store_1')]
>>> db_connection = data_stores[0].get_spark_session('<token>')
>>> db_connection
<pyspark.sql.connect.session.SparkSession at 0x7f386068fbb0>
>>> df = db_connection.read.table("samples.nyctaxi.trips")
>>> df.show()
class datarobot._experimental.models.data_store.DataStore

Bases: DataStore

A data store. Represents a database.

Variables:
  • id (str) – The ID of the data store.

  • data_store_type (str) – The type of data store.

  • canonical_name (str) – The user-friendly name of the data store.

  • creator (str) – The ID of the user who created the data store.

  • updated (datetime.datetime) – The time of the last update.

  • params (DataStoreParameters) – A list specifying data store parameters.

  • role (str) – Your access role for this data store.

  • driver_class_type (str) – The type of the driver class used by this data store.

class datarobot._experimental.models.retraining.RetrainingPolicy

Bases: APIObject

Retraining Policy.

Variables:
  • policy_id (str) – ID of the retraining policy

  • name (str) – Name of the retraining policy

  • description (str) – Description of the retraining policy

classmethod list(deployment_id)

Lists all retraining policies associated with a deployment

Parameters:

deployment_id (str) – Id of the deployment

Returns:

policies – List of retraining policies associated with a deployment

Return type:

list

Examples

from datarobot import Deployment
from datarobot._experimental.models.retraining import RetrainingPolicy
deployment = Deployment.get(deployment_id='620ed0e37b6ce03244f19631')
RetrainingPolicy.list(deployment.id)
>>> [RetrainingPolicy('620ed248bb0a1f5889eb6aa7'), RetrainingPolicy('624f68be8828ed81bf487d8d')]
classmethod get(deployment_id, retraining_policy_id)

Retrieves a retraining policy associated with a deployment

Parameters:
  • deployment_id (str) – Id of the deployment

  • retraining_policy_id (str) – Id of the policy

Returns:

retraining_policy – Retraining policy

Return type:

RetrainingPolicy

Examples

from datarobot._experimental.models.retraining import RetrainingPolicy
policy = RetrainingPolicy.get(
    deployment_id='620ed0e37b6ce03244f19631',
    retraining_policy_id='624f68be8828ed81bf487d8d'
)
policy.id
>>>'624f68be8828ed81bf487d8d'
policy.name
>>>'PolicyA'
classmethod delete(deployment_id, retraining_policy_id)

Deletes a retraining policy associated with a deployment

Parameters:
  • deployment_id (str) – Id of the deployment

  • retraining_policy_id (str) – Id of the policy

Return type:

None

Examples

from datarobot._experimental.models.retraining import RetrainingPolicy
RetrainingPolicy.delete(
    deployment_id='620ed0e37b6ce03244f19631',
    retraining_policy_id='624f68be8828ed81bf487d8d'
)
class datarobot._experimental.models.retraining.RetrainingPolicyRun

Bases: APIObject

Retraining policy run.

Variables:
  • policy_run_id (str) – ID of the retraining policy run

  • status (str) – Status of the retraining policy run

  • challenger_id (str) – ID of the challenger model retrieved after running the policy

  • error_message (str) – The error message if an error occurs during the policy run

  • model_package_id (str) – ID of the model package (version) retrieved after the policy is run

  • project_id (str) – ID of the project the deployment is associated with

  • start_time (datetime.datetime) – Timestamp of when the policy run starts

  • finish_time (datetime.datetime) – Timestamp of when the policy run finishes

classmethod list(deployment_id, retraining_policy_id)

Lists all the retraining policy runs of a retraining policy that is associated with a deployment.

Parameters:
  • deployment_id (str) – ID of the deployment

  • retraining_policy_id (str) – ID of the policy

Returns:

policy runs – List of retraining policy runs

Return type:

list

Examples

from datarobot._experimental.models.retraining import RetrainingPolicyRun
RetrainingPolicyRun.list(
    deployment_id='620ed0e37b6ce03244f19631',
    retraining_policy_id='62f4448f0dfd5699feae3e6e'
)
>>> [RetrainingPolicyRun('620ed248bb0a1f5889eb6aa7'), RetrainingPolicyRun('624f68be8828ed81bf487d8d')]
class datarobot._experimental.models.data_matching.DataMatching

Bases: APIObject

Retrieves the closest data points for the input data.

This functionality is more than a simple lookup. To retrieve the closest data points, data matching first applies the DataRobot preprocessing pipeline and then searches for the closest data points. The returned values are the closest data points as they enter the model.

There are three sets of methods supported:
  1. Methods to build the index (for project, model, featurelist). The index needs to be built before searching for the closest data points. Once the index is built, it is reused.

  2. Methods to search for the closest data points (for project, model, featurelist). These methods initialize the query, await its completion, and then save the result as a CSV file in the specified location.

  3. Additional methods to manually list the history of queries and retrieve their results.

get_query_url(url, number_of_data=None)

Returns the formatted data matching query URL.

Return type:

str

get_closest_data(query_file_path, number_of_data=None, max_wait=600, build_index_if_missing=True)

Retrieves the closest data points to the data point in the input file. If the index is missing, by default the method tries to build it.

Parameters:
  • query_file_path (str) – Path to the file with the data point to search closest data points for.

  • number_of_data (int or None) – Number of results to search for. If not specified, the default is 10.

  • max_wait (int) – Number of seconds to wait for the result. Default is 600.

  • build_index_if_missing (Optional[bool]) – Whether to create the index if it is missing. If False and the index is missing, an exception is raised. Default is True.

Returns:

df – Dataframe with query result

Return type:

pd.DataFrame
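Examples

An illustrative sketch only: it assumes DataMatching is constructed with a project ID (the constructor is not documented here) and that 'query_point.csv' contains the data point to match. The ID below is a placeholder.

from datarobot._experimental.models.data_matching import DataMatching

data_matching = DataMatching(project_id='5f1a2b3c4d5e6f7a8b9c0d1e')
# Builds the index if needed, then returns the 5 closest data points.
df = data_matching.get_closest_data('query_point.csv', number_of_data=5)
print(df.head())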

get_closest_data_for_model(model_id, query_file_path, number_of_data=None, max_wait=600, build_index_if_missing=True)

Retrieves the closest data points to the data point in the input file. If the index is missing, by default the method tries to build it.

Parameters:
  • model_id (str) – ID of the model to search the closest data points for.

  • query_file_path (str) – Path to the file with the data point to search closest data points for.

  • number_of_data (int or None) – Number of results to search for. If not specified, the default is 10.

  • max_wait (int) – Number of seconds to wait for the result. Default is 600.

  • build_index_if_missing (Optional[bool]) – Whether to create the index if it is missing. If False and the index is missing, an exception is raised. Default is True.

Returns:

df – Dataframe with query result

Return type:

pd.DataFrame

get_closest_data_for_featurelist(featurelist_id, query_file_path, number_of_data=None, max_wait=600, build_index_if_missing=True)

Retrieves the closest data points to the data point in the input file. If the index is missing, by default the method tries to build it.

Parameters:
  • featurelist_id (str) – ID of the featurelist to search the closest data points for.

  • query_file_path (str) – Path to the file with the data point to search closest data points for.

  • number_of_data (int or None) – Number of results to search for. If not specified, the default is 10.

  • max_wait (int) – Number of seconds to wait for the result. Default is 600.

  • build_index_if_missing (bool) – Whether to create the index if it is missing. If False and the index is missing, an exception is raised. Default is True.

Returns:

df – Dataframe with query result

Return type:

pd.DataFrame

build_index(max_wait=600)

Builds data matching index and waits for its completion.

Parameters:

max_wait (int or None) – Seconds to wait for the build index operation to complete. Default is 600. When 0 or None is passed, the method exits without waiting for the build index operation to complete.

Return type:

None

build_index_for_featurelist(featurelist_id, max_wait=600)

Builds data matching index for featurelist and waits for its completion.

Parameters:
  • featurelist_id (str) – Id of the featurelist to build the index for

  • max_wait (int or None) – Seconds to wait for the build index operation to complete. Default is 600. When 0 or None is passed, the method exits without waiting for the build index operation to complete.

Return type:

None

build_index_for_model(model_id, max_wait=600)

Builds the data matching index for a model and waits for its completion.

Parameters:
  • model_id (str) – Id of the model to build index for

  • max_wait (int or None) – Seconds to wait for the build index operation to complete. Default is 600. When 0 or None is passed, the method exits without waiting for the build index operation to complete.

Return type:

None

list()

Lists all data matching queries for the project. Results are sorted in descending order, from the latest to the oldest.

Return type:

List[DataMatchingQuery]

class datarobot._experimental.models.data_matching.DataMatchingQuery

Bases: APIObject

Data Matching Query object.

Represents a single query for the closest data points. Once the related query job is completed, its result can be retrieved and saved as a CSV file in a specified location.

classmethod list(project_id)

Retrieves the list of queries.

Parameters:

project_id (str) – Project ID to retrieve data matching queries for

Return type:

List[DataMatchingQuery]

save_result(file_path)

Downloads the query result and saves it in file_path location.

Parameters:

file_path (str) – Path location where to save the query result

Return type:

None

get_result()

Returns the query result as dataframe.

Returns:

df – Dataframe with the query result.

Return type:

DataFrame
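Examples

A short sketch of retrieving a previous query result, assuming at least one query has completed for the project (placeholder project ID):

from datarobot._experimental.models.data_matching import DataMatchingQuery

queries = DataMatchingQuery.list(project_id='5f1a2b3c4d5e6f7a8b9c0d1e')
query = queries[0]                        # pick a completed query, e.g. the first one
query.save_result('closest_points.csv')   # save the result to disk
df = query.get_result()                   # or load it into a DataFrame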

class datarobot._experimental.models.model_lineage.FeatureCountByType

Bases: object

Contains information about a feature type and how many features in the dataset are of this type.

Variables:
  • feature_type (str) – The feature type grouped in this count.

  • count (int) – The number of features of this type.

feature_type: str
count: int
class datarobot._experimental.models.model_lineage.User

Bases: object

Contains information about a user.

Variables:
  • id (str) – The ID of the user.

  • full_name (Optional[str]) – Full name of the user.

  • email (Optional[str]) – Email address of the user.

  • user_hash (Optional[str]) – User’s gravatar hash.

  • user_name (Optional[str]) – Username of the user.

id: str
full_name: Optional[str]
email: Optional[str] = None
user_hash: Optional[str] = None
user_name: Optional[str] = None
class datarobot._experimental.models.model_lineage.ReferencedInUseCase

Bases: object

Contains information about the reference of a dataset in a Use Case.

Variables:
  • added_to_use_case_by (User) – User who added the dataset to the Use Case.

  • added_to_use_case_at (datetime.datetime) – Time when the dataset was added to the Use Case.

added_to_use_case_by: User
added_to_use_case_at: datetime
class datarobot._experimental.models.model_lineage.DatasetInfo

Bases: object

Contains information about the dataset.

Variables:
  • dataset_name (str) – Dataset name.

  • dataset_version_id (str) – Dataset version Id.

  • dataset_id (str) – Dataset Id.

  • number_of_rows (int) – Number of rows in the dataset.

  • file_size (int) – Size of the dataset as a CSV file, in bytes.

  • number_of_features (int) – Number of features in the dataset.

  • number_of_feature_by_type (List[FeatureCountByType]) – Number of features in the dataset, grouped by feature type.

  • referenced_in_use_case (Optional[ReferencedInUseCase]) – Information about the reference of this dataset in the Use Case. This information will only be present if the use_case_id was passed to ModelLineage.get.

dataset_name: str
dataset_version_id: str
dataset_id: str
number_of_rows: int
file_size: int
number_of_features: int
number_of_feature_by_type: List[FeatureCountByType]
referenced_in_use_case: Optional[ReferencedInUseCase] = None
class datarobot._experimental.models.model_lineage.FeatureWithMissingValues

Bases: object

Contains information about the number of missing values for one feature.

Variables:
  • feature_name (str) – Name of the feature.

  • number_of_missing_values (int) – Number of missing values for this feature.

feature_name: str
number_of_missing_values: int
class datarobot._experimental.models.model_lineage.FeaturelistInfo

Bases: object

Contains information about the featurelist.

Variables:
  • featurelist_name (str) – Featurelist name.

  • featurelist_id (str) – Featurelist Id.

  • number_of_features (int) – Number of features in the featurelist.

  • number_of_feature_by_type (List[FeatureCountByType]) – Number of features in the featurelist, grouped by feature type.

  • number_of_features_with_missing_values (int) – Number of features in the featurelist with at least one missing value.

  • number_of_missing_values (int) – Number of missing values across all features of the featurelist.

  • features_with_most_missing_values (List[FeatureWithMissingValues]) – List of features with the most missing values.

  • description (str) – Description of the featurelist.

featurelist_name: str
featurelist_id: str
number_of_features: int
number_of_feature_by_type: List[FeatureCountByType]
number_of_features_with_missing_values: int
number_of_missing_values: int
features_with_most_missing_values: List[FeatureWithMissingValues]
description: str
class datarobot._experimental.models.model_lineage.TargetInfo

Bases: object

Contains information about the target.

Variables:
  • name (str) – Name of the target feature.

  • target_type (str) – Project type resulting from selected target.

  • positive_class_label (Optional[Union[str, int, float]]) – Positive class label. For every project type except Binary Classification, this value will be null.

  • mean (Optional[float]) – Mean of the target. This field will only be available for Binary Classification, Regression, and Min Inflated projects.

name: str
target_type: str
positive_class_label: Union[str, int, float, None] = None
mean: Optional[float] = None
class datarobot._experimental.models.model_lineage.PartitionInfo

Bases: object

Contains information about project partitioning.

Variables:
  • validation_type (str) – Either CV for cross-validation or TVH for train-validation-holdout split.

  • cv_method (str) – Partitioning method used.

  • holdout_pct (float) – Percentage of the dataset reserved for the holdout set.

  • datetime_col (Optional[str]) – If a date partition column was used, the name of the column. Note that datetime_col applies to an old partitioning method no longer supported for new projects, as of API version v2.0.

  • datetime_partition_column (Optional[str]) – If a datetime partition column was used, the name of the column.

  • validation_pct (Optional[float]) – If train-validation-holdout split was used, the percentage of the dataset used for the validation set.

  • reps (Optional[float]) – If cross validation was used, the number of folds to use.

  • cv_holdout_level (Optional[Union[str, float, int]]) – If a user partition column was used with cross validation, the value assigned to the holdout set.

  • holdout_level (Optional[Union[str, float, int]]) – If a user partition column was used with train-validation-holdout split, the value assigned to the holdout set.

  • user_partition_col (Optional[str]) – If a user partition column was used, the name of the column.

  • training_level (Optional[Union[str, float, int]]) – If a user partition column was used with train-validation-holdout split, the value assigned to the training set.

  • partition_key_cols (Optional[List[str]]) – A list containing a single string - the name of the group partition column.

  • validation_level (Optional[Union[str, float, int]]) – If a user partition column was used with train-validation-holdout split, the value assigned to the validation set.

  • use_time_series (Optional[bool]) – A boolean value indicating whether a time series project was created by using datetime partitioning. Otherwise, datetime partitioning created an OTV project.

validation_type: str
cv_method: str
holdout_pct: float
datetime_col: Optional[str] = None
datetime_partition_column: Optional[str] = None
validation_pct: Optional[float] = None
reps: Optional[float] = None
cv_holdout_level: Union[str, int, float, None] = None
holdout_level: Union[str, int, float, None] = None
user_partition_col: Optional[str] = None
training_level: Union[str, int, float, None] = None
partition_key_cols: Optional[List[str]] = None
validation_level: Union[str, int, float, None] = None
use_time_series: Optional[bool] = None
class datarobot._experimental.models.model_lineage.ProjectInfo

Bases: object

Contains information about the project.

Variables:
  • project_name (str) – Name of the project.

  • project_id (str) – Project Id.

  • partition (PartitionInfo) – Partitioning settings of the project.

  • metric (str) – Project metric used to select the best-performing models.

  • created_by (User) – User who created the project.

  • created_at (Optional[datetime.datetime]) – Time when the project was created.

  • target (Optional[TargetInfo]) – Information about the target.

project_name: str
project_id: str
partition: PartitionInfo
metric: str
created_by: User
created_at: Optional[datetime] = None
target: Optional[TargetInfo] = None
class datarobot._experimental.models.model_lineage.ModelInfo

Bases: object

Contains information about the model.

Variables:
  • blueprint_tasks (List[str]) – Tasks that make up the blueprint.

  • blueprint_id (str) – Blueprint Id.

  • model_type (str) – Model type.

  • sample_size (Optional[int]) – Number of rows this model was trained on.

  • sample_percentage (Optional[float]) – Percentage of the dataset the model was trained on.

  • milliseconds_to_predict_1000_rows (Optional[float]) – Estimate of how many milliseconds it takes to predict 1000 rows. The estimate is based on the time it took to predict the holdout set.

  • serialized_blueprint_file_size (Optional[int]) – Size of the serialized blueprint, in bytes.

blueprint_tasks: List[str]
blueprint_id: str
model_type: str
sample_size: Optional[int]
sample_percentage: Optional[float] = None
milliseconds_to_predict_1000_rows: Optional[float] = None
serialized_blueprint_file_size: Optional[int] = None
class datarobot._experimental.models.model_lineage.ModelLineage

Bases: APIObject

Contains information about the lineage of a model.

Variables:
  • dataset (DatasetInfo) – Information about the dataset this model was created with.

  • featurelist (FeaturelistInfo) – Information about the featurelist used to train this model.

  • project (ProjectInfo) – Information about the project this model was created in.

  • model (ModelInfo) – Information about the model itself.

classmethod get(model_id, use_case_id=None)

Retrieve lineage information about a trained model. If you pass the optional use_case_id parameter, this class will contain additional information.

Parameters:
  • model_id (str) – Model Id.

  • use_case_id (Optional[str]) – Use Case Id.

Return type:

ModelLineage
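Examples

A minimal usage sketch with a placeholder model ID; passing use_case_id additionally populates the Use Case related fields:

from datarobot._experimental.models.model_lineage import ModelLineage

lineage = ModelLineage.get(model_id='5f1a2b3c4d5e6f7a8b9c0d2f')
print(lineage.project.project_name)
print(lineage.featurelist.number_of_features)
print(lineage.dataset.dataset_name)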

class datarobot._experimental.models.notebooks.ManualRunPayload

Bases: dict

notebook_id: str
title: Optional[str]
notebook_path: Optional[str]
parameters: Optional[List[Dict[str, str]]]
class datarobot._experimental.models.notebooks.NotebookUser

Bases: APIObject

A user associated with a Notebook.

Variables:
  • id (str) – The ID of the user.

  • activated (bool) – Whether or not the user is enabled.

  • username (str) – The username of the user, usually their email address.

  • first_name (str) – The first name of the user.

  • last_name (str) – The last name of the user.

  • gravatar_hash (Optional[str]) – The gravatar hash of the user. Optional.

  • tenant_phase (Optional[str]) – The phase that the user’s tenant is in. Optional.

class datarobot._experimental.models.notebooks.NotebookSession

Bases: APIObject

Information about the current status of a Notebook.

Variables:
  • status (NotebookStatus) – The current status of the Notebook kernel.

  • notebook_id (str) – The ID of the Notebook.

  • started_at (Optional[str]) – The date and time when the notebook was started. Optional.

  • notebook_type (Optional[NotebookType]) – The type of the notebook - either plain or codespace. Optional.

  • session_type (Optional[RunType]) – The type of the run - either manual (triggered via UI or API) or scheduled. Optional.

class datarobot._experimental.models.notebooks.NotebookActivity

Bases: APIObject

A record of activity (i.e. last run, updated, etc.) in a Notebook.

Variables:
  • at (str) – The time of the activity in the notebook.

  • by (NotebookUser) – The user who performed the activity.

class datarobot._experimental.models.notebooks.NotebookSettings

Bases: APIObject

Settings for a DataRobot Notebook.

Variables:
  • show_line_numbers (bool) – Whether line numbers in cells should be displayed.

  • hide_cell_titles (bool) – Whether cell titles should be displayed.

  • hide_cell_outputs (bool) – Whether the cell outputs should be displayed.

  • show_scrollers (bool) – Whether scroll bars should be shown on cells.

  • hide_cell_footers (bool) – Whether footers should be shown on cells.

class datarobot._experimental.models.notebooks.ScheduledRunRevisionMetadata

Bases: APIObject

DataRobot Notebook Revision Metadata specifically for a scheduled run.

Both id and name can be null if, for example, the job is still running or has failed.

Variables:
  • id (Optional[str]) – The ID of the Notebook Revision. Optional.

  • name (Optional[str]) – The name of the Notebook Revision. Optional.

class datarobot._experimental.models.notebooks.NotebookScheduledRun

Bases: APIObject

DataRobot Notebook Scheduled Run. A historical run of a notebook schedule.

Variables:
  • id (str) – The ID of the Notebook Scheduled Job.

  • use_case_id (str) – The Use Case ID of the Notebook Scheduled Job.

  • status (str) – The status of the run.

  • payload (ScheduledJobPayload) – The payload used for the background job.

  • title (Optional[str]) – The title of the job. Optional.

  • start_time (Optional[str]) – The start time of the job. Optional.

  • end_time (Optional[str]) – The end time of the job. Optional.

  • revision (ScheduledRunRevisionMetadata) – Notebook revision data - ID and name.

  • duration (Optional[int]) – The job duration in seconds. May be None for example while the job is running. Optional.

  • run_type (Optional[RunType]) – The type of the run - either manual (triggered via UI or API) or scheduled. Optional.

  • notebook_type (Optional[NotebookType]) – The type of the notebook - either plain or codespace. Optional.

class datarobot._experimental.models.notebooks.ScheduledJobParam

Bases: APIObject

DataRobot Schedule Job Parameter.

Variables:
  • name (str) – The name of the parameter.

  • value (str) – The value of the parameter.

class datarobot._experimental.models.notebooks.ScheduledJobPayload

Bases: APIObject

DataRobot Schedule Job Payload.

Variables:
  • uid (str) – The ID of the user who created the Notebook Schedule.

  • org_id (str) – The ID of the organization of the user who created the Notebook Schedule.

  • use_case_id (str) – The ID of the Use Case that the Notebook belongs to.

  • notebook_id (str) – The ID of Notebook being run on a schedule.

  • notebook_name (str) – The name of Notebook being run on a schedule.

  • run_type (RunType) – The type of the run - either manual (triggered via UI or API) or scheduled.

  • notebook_type (NotebookType) – The type of the notebook - either plain or codespace.

  • parameters (List[ScheduledJobParam]) – The parameters being used in the Notebook Schedule. Can be an empty list.

  • notebook_path (Optional[str]) – The path of the notebook to execute within the Codespace. Optional. Required if notebook is in a Codespace.

class datarobot._experimental.models.notebooks.NotebookScheduledJob

Bases: APIObject

DataRobot Notebook Schedule. A scheduled job that runs a notebook.

Variables:
  • id (str) – The ID of the Notebook Scheduled Job.

  • enabled (bool) – Whether job is enabled or not.

  • next_run_time (str) – The next time the job is scheduled to run (assuming it is enabled).

  • run_type (RunType) – The type of the run - either manual (triggered via UI or API) or scheduled.

  • notebook_type (NotebookType) – The type of the notebook - either plain or codespace.

  • job_payload (ScheduledJobPayload) – The payload used for the background job.

  • title (Optional[str]) – The title of the job. Optional.

  • schedule (Optional[str]) – Cron-like string to define how frequently job should be run. Optional.

  • schedule_localized (Optional[str]) – A human-readable localized version of the schedule. Example in English is ‘At 42 minutes past the hour’. Optional.

  • last_successful_run (Optional[str]) – The last time the job was run successfully. Optional.

  • last_failed_run (Optional[str]) – The last time the job failed. Optional.

  • last_run_time (Optional[str]) – The last time the job was run (failed or successful). Optional.

property use_case_id: str
classmethod get(use_case_id, scheduled_job_id)

Retrieve a single notebook schedule.

Parameters:
  • use_case_id (str) – The ID of the Use Case the notebook schedule is associated with.

  • scheduled_job_id (str) – The ID of the notebook schedule you want to retrieve.

Returns:

notebook_schedule – The requested notebook schedule.

Return type:

NotebookScheduledJob

Examples

from datarobot._experimental.models.notebooks import NotebookScheduledJob

notebook_schedule = NotebookScheduledJob.get(
    use_case_id="654ad653c6c1e889e8eab12e",
    scheduled_job_id="65734fe637157200e28bf688",
)
get_job_history()

Retrieve list of historical runs for the notebook schedule.

Returns:

notebook_scheduled_runs – The list of historical runs for the notebook schedule.

Return type:

List[NotebookScheduledRun]

Examples

from datarobot._experimental.models.notebooks import NotebookScheduledJob

notebook_schedule = NotebookScheduledJob.get(
    use_case_id="654ad653c6c1e889e8eab12e",
    scheduled_job_id="65734fe637157200e28bf688",
)
notebook_scheduled_runs = notebook_schedule.get_job_history()
wait_for_completion(max_wait=600)

Wait for the completion of a scheduled notebook and return the revision ID corresponding to the run’s output.

Parameters:

max_wait (int) – The number of seconds to wait before giving up.

Returns:

revision_id – Returns either revision ID or message describing current state.

Return type:

str

Examples

from datarobot._experimental.models.notebooks import Notebook

notebook = Notebook.get(notebook_id='6556b00dcc4ea0bb7ea48121')
manual_run = notebook.run()
revision_id = manual_run.wait_for_completion()
class datarobot._experimental.models.notebooks.Notebook

Bases: APIObject, BrowserMixin

Metadata for a DataRobot Notebook accessible to the user.

Variables:
  • id (str) – The ID of the Notebook.

  • name (str) – The name of the Notebook.

  • type (NotebookType) – The type of the Notebook. Can be “plain” or “codespace”.

  • permissions (List[NotebookPermissions]) – The permissions the user has for the Notebook.

  • tags (List[str]) – Any tags that have been added to the Notebook. Default is an empty list.

  • created (NotebookActivity) – Information on when the Notebook was created and who created it.

  • updated (NotebookActivity) – Information on when the Notebook was updated and who updated it.

  • last_viewed (NotebookActivity) – Information on when the Notebook was last viewed and who viewed it.

  • settings (NotebookSettings) – Information on global settings applied to the Notebook.

  • org_id (Optional[str]) – The organization ID associated with the Notebook.

  • tenant_id (Optional[str]) – The tenant ID associated with the Notebook.

  • description (Optional[str]) – The description of the Notebook. Optional.

  • session (Optional[NotebookSession]) – Metadata on the current status of the Notebook and its kernel. Optional.

  • use_case_id (Optional[str]) – The ID of the Use Case the Notebook is associated with. Optional.

  • use_case_name (Optional[str]) – The name of the Use Case the Notebook is associated with. Optional.

  • has_schedule (bool) – Whether or not the notebook has a schedule.

  • has_enabled_schedule (bool) – Whether or not the notebook has a currently enabled schedule.

property is_standalone: bool
property is_codespace: bool
get_uri()
Returns:

url – Permanent static hyperlink to this Notebook in its Use Case or standalone.

Return type:

str

classmethod get(notebook_id)

Retrieve a single notebook.

Parameters:

notebook_id (str) – The ID of the notebook you want to retrieve.

Returns:

notebook – The requested notebook.

Return type:

Notebook

Examples

from datarobot._experimental.models.notebooks import Notebook

notebook = Notebook.get(notebook_id='6556b00dcc4ea0bb7ea48121')
download_revision(revision_id, file_path=None, filelike=None)

Downloads the notebook as a JSON (.ipynb) file for the specified revision.

Parameters:
  • revision_id (str) – The ID of the notebook revision to download.

  • file_path (string, optional) – The destination to write the file to.

  • filelike (file, optional) – A file-like object to write to. The object must be able to write bytes. The user is responsible for closing the object.

Return type:

None

Examples

from datarobot._experimental.models.notebooks import Notebook

notebook = Notebook.get(notebook_id='6556b00dcc4ea0bb7ea48121')
manual_run = notebook.run()
revision_id = manual_run.wait_for_completion()
notebook.download_revision(revision_id=revision_id, file_path="./results.ipynb")
delete()

Delete a single notebook.

Return type:

None

Examples

from datarobot._experimental.models.notebooks import Notebook

notebook = Notebook.get(notebook_id='6556b00dcc4ea0bb7ea48121')
notebook.delete()
classmethod list(created_before=None, created_after=None, order_by=None, tags=None, owners=None, query=None, use_cases=None)

List all Notebooks available to the user.

Parameters:
  • created_before (Optional[str]) – List Notebooks created before a certain date. Optional.

  • created_after (Optional[str]) – List Notebooks created after a certain date. Optional.

  • order_by (Optional[str]) – Property to sort returned Notebooks. Optional. Supported properties are “name”, “created”, “updated”, “tags”, and “lastViewed”. Prefix the attribute name with a dash to sort in descending order, e.g. order_by=’-created’. By default, the order_by parameter is None.

  • tags (Optional[List[str]]) – A list of tags that returned Notebooks should be associated with. Optional.

  • owners (Optional[List[str]]) – A list of user IDs used to filter returned Notebooks. The respective users share ownership of the Notebooks. Optional.

  • query (Optional[str]) – A specific regex query to use when filtering Notebooks. Optional.

  • use_cases (Optional[UseCase or List[UseCase] or str or List[str]]) – Filters returned Notebooks by a specific Use Case or Use Cases. Accepts either the entity or the ID. Optional. If set to [None], the method filters the returned Notebooks to those not linked to any Use Case.

Returns:

notebooks – A list of Notebooks available to the user.

Return type:

List[Notebook]

Examples

from datarobot._experimental.models.notebooks import Notebook

notebooks = Notebook.list()
run(title=None, notebook_path=None, parameters=None)

Create a manual scheduled job that runs the notebook.

Notes

The notebook must be part of a Use Case. If the notebook is in a Codespace then notebook_path is required.

Parameters:
  • title (Optional[str]) – The title of the background job. Optional.

  • notebook_path (Optional[str]) – The path of the notebook to execute within the Codespace. Required if notebook is in a Codespace.

  • parameters (Optional[List[Dict[str, str]]]) – A list of dictionaries of key value pairs representing environment variables predefined in the notebook. Optional.

Returns:

notebook_scheduled_job – The created notebook schedule job.

Return type:

NotebookScheduledJob

Raises:

InvalidUsageError – If attempting to create a manual scheduled run for a Codespace without a notebook path.

Examples

from datarobot._experimental.models.notebooks import Notebook

notebook = Notebook.get(notebook_id='6556b00dcc4ea0bb7ea48121')
manual_run = notebook.run()

# Alternatively, with title and parameters:
# manual_run = notebook.run(title="My Run", parameters=[{"FOO": "bar"}])

revision_id = manual_run.wait_for_completion()
class datarobot._experimental.models.incremental_learning.IncrementalLearningItem

Bases: TypedDict

chunk_index: int
data_stage_id: str
status: str
model_id: str
parent_model_id: str
sample_pct: Optional[float]
training_row_count: Optional[int]
score: Optional[float]
class datarobot._experimental.models.incremental_learning.IncrementalLearningMetadata

Bases: APIObject

Incremental learning metadata for an incremental model.

Added in version v3.4.0.

Variables:
  • project_id (str) – The project ID.

  • model_id (str) – The model ID.

  • user_id (str) – The ID of the user who started incremental learning.

  • featurelist_id (str) – The ID of the featurelist the model is using.

  • status (str) – The status of incremental training. One of datarobot._experimental.models.enums.IncrementalLearningStatus.

  • items (List[IncrementalLearningItemDoc]) – An array of incremental learning items associated with the sequential order of chunks. See incremental item info in Notes for more details.

  • sample_pct (float) – The sample size, as a percentage (1 to 100), to use in training.

  • training_row_count (int) – The number of rows used to train a model.

  • score (float) – The validation score of the model.

  • metric (str) – The name of the scoring metric.

  • early_stopping_rounds (int) – The number of chunks with no observed improvement that triggers the early stopping mechanism.

  • total_number_of_chunks (int) – The total number of chunks.

  • model_number (int) – The number of the model in the project.

Notes

Incremental item is a dict containing the following:

  • chunk_index: int

    The incremental learning order in which chunks are trained.

  • status: str

    The status of training the current chunk. One of datarobot._experimental.models.enums.IncrementalLearningItemStatus.

  • model_id: str

    The ID of the model associated with the current item (chunk).

  • parent_model_id: str

    The ID of the model based on which the current item (chunk) is trained.

  • data_stage_id: str

    The ID of the data stage.

  • sample_pct: float

    The cumulative percentage of the base dataset size used for training the model.

  • training_row_count: int

    The number of rows used to train a model.

  • score: float

    The validation score of the current model.

class datarobot._experimental.models.chunking_service.ChunkStorage

Bases: APIObject

The chunk storage location for the data chunks.

Variables:
  • storage_reference_id (str) – The ID of the storage entity.

  • chunk_storage_type (str) – The type of the chunk storage.

  • version_id (str) – The catalog version ID. This will only be used if the storage type is “AI Catalog”.

class datarobot._experimental.models.chunking_service.Chunk

Bases: APIObject

Data chunk object that holds metadata about a chunk.

Variables:
  • id (str) – The ID of the chunk entity.

  • chunk_definition_id (str) – The ID of the dataset chunk definition the chunk belongs to.

  • limit (int) – The number of rows in the chunk.

  • offset (int) – The offset in the dataset to create the chunk.

  • chunk_index (str) – The index of the chunk if chunks are divided uniformly. Otherwise, it is None.

  • data_source_id (str) – The ID of the data request used to create the chunk.

  • chunk_storage (ChunkStorage) – A list of storage locations where the chunk is stored.

get_chunk_storage_id(storage_type)

Get storage location ID for the chunk.

Parameters:

storage_type (ChunkStorageType) – The storage type where the chunk is stored.

Returns:

storage_reference_id – An ID that references the storage location for the chunk.

Return type:

str

get_chunk_storage_version_id(storage_type)

Get storage version ID for the chunk.

Parameters:

storage_type (ChunkStorageType) – The storage type where the chunk is stored.

Returns:

storage_reference_id – A catalog version ID associated with the AI Catalog dataset ID.

Return type:

str

class datarobot._experimental.models.chunking_service.DatasourceDefinition

Bases: APIObject

Data source definition that holds information about the data source for API responses. Do not use this to create DatasourceDefinition objects directly; use DatasourceAICatalogInfo or DatasourceDataWarehouseInfo instead.

Variables:
  • id (str) – The ID of the data source definition.

  • data_store_id (str) – The ID of the data store.

  • credentials_id (str) – The ID of the credentials.

  • table (str) – The data source table name.

  • schema (str) – The schema name of the data source.

  • catalog (str) – The database or catalog name.

  • storage_origin (str) – The origin data source or data warehouse (e.g., Snowflake, BigQuery).

  • data_source_id (str) – The ID of the data request used to generate sampling and metadata.

  • total_rows (str) – The total number of rows in the dataset.

  • source_size (str) – The size of the dataset.

  • estimated_size_per_row (str) – The estimated size per row.

  • columns (str) – The list of column names in the dataset.

  • order_by_columns (List[str]) – A list of columns used to sort the dataset.

  • is_descending_order (bool) – The sort direction of the data. Defaults to False, ordering from smallest to largest.

  • select_columns (List[str]) – A list of columns to select from the dataset.

  • datetime_partition_column (str) – The datetime partition column name used in OTV projects.

  • validation_pct (float) – The percentage threshold between 0.1 and 1.0 for the first chunk validation.

  • validation_limit_pct (float) – The percentage threshold between 0.1 and 1.0 for the validation kept.

  • validation_start_date (datetime.datetime) – The start date for validation.

  • validation_end_date (datetime.datetime) – The end date for validation.

  • training_end_date (datetime.datetime) – The end date for training.

  • latest_timestamp (datetime.datetime) – The latest timestamp.

  • earliest_timestamp (datetime.datetime) – The earliest timestamp.

class datarobot._experimental.models.chunking_service.DatasourceDataWarehouseInfo

Bases: APIObject

Data source information used at creation time with a dataset chunk definition. Supported data warehouses: Snowflake, BigQuery, and Databricks.

Variables:
  • name (str) – The optional custom name of the data source.

  • table (str) – The data source table name or AI Catalog dataset name.

  • storage_origin (str) – The origin data source or data warehouse (e.g., Snowflake, BigQuery).

  • data_store_id (str) – The ID of the data store.

  • credentials_id (str) – The ID of the credentials.

  • schema (str) – The schema name of the data source.

  • catalog (str) – The database or catalog name.

  • data_source_id (str) – The ID of the data request used to generate sampling and metadata.

  • order_by_columns (List[str]) – A list of columns used to sort the dataset.

  • is_descending_order (bool) – The sort direction of the data. Defaults to False, ordering from smallest to largest.

  • select_columns (List[str]) – A list of columns to select from the dataset.

  • datetime_partition_column (str) – The datetime partition column name used in OTV projects.

  • validation_pct (float) – The percentage threshold between 0.1 and 1.0 for the first chunk validation.

  • validation_limit_pct (float) – The percentage threshold between 0.1 and 1.0 for the validation kept.

  • validation_start_date (datetime.datetime) – The start date for validation.

  • validation_end_date (datetime.datetime) – The end date for validation.

  • training_end_date (datetime.datetime) – The end date for training.

  • latest_timestamp (datetime.datetime) – The latest timestamp.

  • earliest_timestamp (datetime.datetime) – The earliest timestamp.

to_dict()
Return type:

Dict[str, Any]

class datarobot._experimental.models.chunking_service.DatasourceAICatalogInfo

Bases: APIObject

AI Catalog data source information used at creation time with a dataset chunk definition.

Variables:
  • name (str) – The optional custom name of the data source.

  • table (str) – The data source table name or AI Catalog dataset name.

  • storage_origin (str) – The origin data source, always AI Catalog type.

  • catalog_id (str) – The ID of the AI Catalog dataset.

  • catalog_version_id (str) – The ID of the AI Catalog dataset version.

  • order_by_columns (List[str]) – A list of columns used to sort the dataset.

  • is_descending_order (bool) – The sort direction of the data. Defaults to False, ordering from smallest to largest.

  • select_columns (List[str]) – A list of columns to select from the dataset.

  • datetime_partition_column (str) – The datetime partition column name used in OTV projects.

  • validation_pct (float) – The percentage threshold between 0.1 and 1.0 for the first chunk validation.

  • validation_limit_pct (float) – The percentage threshold between 0.1 and 1.0 for the validation kept.

  • validation_start_date (datetime.datetime) – The start date for validation.

  • validation_end_date (datetime.datetime) – The end date for validation.

  • training_end_date (datetime.datetime) – The end date for training.

  • latest_timestamp (datetime.datetime) – The latest timestamp.

  • earliest_timestamp (datetime.datetime) – The earliest timestamp.

to_dict()
Return type:

Dict[str, Any]

class datarobot._experimental.models.chunking_service.DatasetChunkDefinition

Bases: APIObject

Dataset chunking definition that holds information about how to chunk the dataset.

Variables:
  • id (str) – The ID of the dataset chunk definition.

  • user_id (str) – The ID of the user who created the definition.

  • name (str) – The name of the dataset chunk definition.

  • project_starter_chunk_size (int) – The size, in bytes, of the project starter chunk.

  • user_chunk_size (int) – Chunk size in bytes.

  • datasource_definition_id (str) – The data source definition ID associated with the dataset chunk definition.

  • chunking_type (ChunkingType) –

    The type of chunk creation from the dataset. All possible chunking types can be found under the ChunkingType enum, which can be imported from datarobot._experimental.models.enums. Types include:

    • INCREMENTAL_LEARNING for non-time aware projects that use a chunk index to create chunks.

    • INCREMENTAL_LEARNING_OTV for OTV projects that use a chunk index to create chunks.

    • SLICED_OFFSET_LIMIT for any dataset in which user provides offset and limit to create chunks.

    SLICED_OFFSET_LIMIT has no index-based chunks, so the create_by_index() method is not supported.

classmethod get(dataset_chunk_definition_id)

Retrieve a specific dataset chunk definition metadata.

Parameters:

dataset_chunk_definition_id (str) – The ID of the dataset chunk definition.

Returns:

dataset_chunk_definition – The queried instance.

Return type:

DatasetChunkDefinition

classmethod list(limit=50, offset=0)

Retrieves a list of dataset chunk definitions

Parameters:
  • limit (int) – The maximum number of objects to return. Default is 50.

  • offset (int) – The starting offset of the results. Default is 0.

Returns:

dataset_chunk_definitions – The list of dataset chunk definitions.

Return type:

List[DatasetChunkDefinition]

classmethod create(name, project_starter_chunk_size, user_chunk_size, datasource_info, chunking_type=ChunkingType.INCREMENTAL_LEARNING)

Create a dataset chunk definition. Required for both index-based and custom chunks.

In order to create a dataset chunk definition, you must first:

  • Create a data connection to the target data source via dr.DataStore.create()

  • Create credentials that must be attached to the data connection via dr.Credential.create()

If you have existing data connections and credentials:

  • Retrieve the data store ID by the canonical name via:

    • [ds for ds in dr.DataStore.list() if ds.canonical_name == <name>][0].id

  • Retrieve the credential ID by the name via:

    • [cr for cr in dr.Credential.list() if cr.name == <name>][0].id

You must create the required ‘datasource_info’ object with the datasource information that corresponds to your use case:

  • DatasourceAICatalogInfo for AI catalog datasets.

  • DatasourceDataWarehouseInfo for Snowflake, BigQuery, or other data warehouse.

Parameters:
  • name (str) – The name of the dataset chunk definition.

  • project_starter_chunk_size (int) – The size, in bytes, of the first chunk. Used to start a DataRobot project.

  • user_chunk_size (int) – The size, in bytes, of the user-defined incremental chunk.

  • datasource_info (Union[DatasourceDataWarehouseInfo, DatasourceAICatalogInfo]) – The object that contains the information of the data source.

  • chunking_type (ChunkingType) –

    The type of chunk creation from the dataset. All possible chunking types can be found under the ChunkingType enum, which can be imported from datarobot._experimental.models.enums. Types include:

    • INCREMENTAL_LEARNING for non-time aware projects that use a chunk index to create chunks.

    • INCREMENTAL_LEARNING_OTV for OTV projects that use a chunk index to create chunks.

    • SLICED_OFFSET_LIMIT for any dataset in which user provides offset and limit to create chunks.

    SLICED_OFFSET_LIMIT has no index-based chunks, so the create_by_index() method is not supported. The default type is ChunkingType.INCREMENTAL_LEARNING.

Returns:

dataset_chunk_definition – An instance of a created dataset chunk definition.

Return type:

DatasetChunkDefinition
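Examples

A hedged sketch of creating a definition backed by an AI Catalog dataset. It assumes DatasourceAICatalogInfo accepts its documented attributes as keyword arguments; the catalog IDs below are placeholders.

from datarobot._experimental.models.chunking_service import (
    DatasetChunkDefinition,
    DatasourceAICatalogInfo,
)

datasource_info = DatasourceAICatalogInfo(
    table='my_dataset',
    catalog_id='65734fe637157200e28bf688',
    catalog_version_id='65734fe637157200e28bf689',
)
chunk_definition = DatasetChunkDefinition.create(
    name='my-chunk-definition',
    project_starter_chunk_size=100 * 1024 * 1024,  # 100 MB starter chunk
    user_chunk_size=50 * 1024 * 1024,              # 50 MB incremental chunks
    datasource_info=datasource_info,
)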

classmethod get_datasource_definition(dataset_chunk_definition_id)

Retrieves the data source definition associated with a dataset chunk definition.

Parameters:

dataset_chunk_definition_id (str) – id of the dataset chunk definition

Returns:

datasource_definition – An instance of the created data source definition.

Return type:

DatasourceDefinition

classmethod get_chunk(dataset_chunk_definition_id, chunk_id)

Retrieves a specific data chunk associated with a dataset chunk definition

Parameters:
  • dataset_chunk_definition_id (str) – id of the dataset chunk definition

  • chunk_id (str) – id of the chunk

Returns:

chunk – An instance of the created chunk.

Return type:

Chunk

classmethod list_chunks(dataset_chunk_definition_id)

Retrieves all data chunks associated with a dataset chunk definition

Parameters:

dataset_chunk_definition_id (str) – id of the dataset chunk definition

Returns:

chunks – A list of chunks.

Return type:

List[Chunk]

analyze_dataset(max_wait_time=600)

Analyzes the data source to retrieve and compute metadata about the dataset.

Depending on the size of the dataset, adding order_by_columns to the dataset chunk definition increases the execution time to create the data chunk. Set max_wait_time to an appropriate wait time.

Parameters:

max_wait_time (int) – The maximum time, in seconds, to wait for completion.

Returns:

datasource_definition – An instance of the created data source definition.

Return type:

DatasourceDefinition

create_chunk(limit, offset=0, storage_type=ChunkStorageType.DATASTAGE, max_wait_time=600)

Creates a data chunk using the limit and offset. By default, the data chunk is stored in data stages.

Depending on the size of the dataset, adding order_by_columns to the dataset chunk definition increases the execution time to retrieve or create the data chunk. Set max_wait_time to an appropriate wait time.

Parameters:
  • limit (int) – The maximum number of rows.

  • offset (int) – The offset into the dataset (where reading begins).

  • storage_type (ChunkStorageType) – The storage location of the chunk.

  • max_wait_time (int) – The maximum time, in seconds, to wait for completion.

Returns:

chunk – An instance of a created or updated chunk.

Return type:

Chunk

create_chunk_by_index(index, storage_type=ChunkStorageType.DATASTAGE, max_wait_time=600)

Creates a data chunk using the chunk index. By default, the data chunk is stored in data stages.

Depending on the size of the dataset, adding order_by_columns to the dataset chunk definition increases the execution time to retrieve or create the data chunk. Set max_wait_time to an appropriate wait time.

Parameters:
  • index (int) – The index of the chunk.

  • storage_type (ChunkStorageType) – The storage location of the chunk.

  • max_wait_time (int) – The maximum time, in seconds, to wait for completion.

Returns:

chunk – An instance of a created or updated chunk.

Return type:

Chunk
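Examples

A sketch of one possible flow for an existing definition (placeholder ID): analyze the data source first, then create the first index-based chunk and list all chunks.

from datarobot._experimental.models.chunking_service import DatasetChunkDefinition

chunk_definition = DatasetChunkDefinition.get('65734fe637157200e28bf690')
chunk_definition.analyze_dataset(max_wait_time=600)
first_chunk = chunk_definition.create_chunk_by_index(index=0)
chunks = DatasetChunkDefinition.list_chunks(chunk_definition.id)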

classmethod patch_validation_dates(dataset_chunk_definition_id, validation_start_date, validation_end_date)

Updates the data source definition validation dates associated with a dataset chunk definition. In order to set the validation dates appropriately, both start and end dates should be specified. This method can only be used for INCREMENTAL_LEARNING_OTV dataset chunk definitions and their associated data source definitions.

Parameters:
  • dataset_chunk_definition_id (str) – The ID of the dataset chunk definition.

  • validation_start_date (datetime.datetime) – The start date of validation scoring data. Internally converted to format ‘%Y-%m-%d %H:%M:%S’, the timezone defaults to UTC.

  • validation_end_date (datetime.datetime) – The end date of validation scoring data. Internally converted to format ‘%Y-%m-%d %H:%M:%S’, the timezone defaults to UTC.

Returns:

datasource_definition – An instance of the created data source definition.

Return type:

DatasourceDefinition
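Examples

A minimal sketch of updating validation dates for an INCREMENTAL_LEARNING_OTV definition; the ID and dates below are placeholders.

from datetime import datetime

from datarobot._experimental.models.chunking_service import DatasetChunkDefinition

datasource_definition = DatasetChunkDefinition.patch_validation_dates(
    dataset_chunk_definition_id='65734fe637157200e28bf690',
    validation_start_date=datetime(2015, 12, 1),
    validation_end_date=datetime(2016, 3, 1),
)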

class datarobot._experimental.models.chunking_service_v2.DatasetProps

Bases: APIObject

The dataset props for a catalog dataset.

Variables:
  • dataset_id (str) – The ID of the AI Catalog dataset.

  • dataset_version_id (str) – The ID of the AI Catalog dataset version.

class datarobot._experimental.models.chunking_service_v2.DatasetInfo

Bases: APIObject

The dataset information.

Variables:
  • total_rows (str) – The total number of rows in the dataset.

  • source_size (str) – The size of the dataset.

  • estimated_size_per_row (str) – The estimated size per row.

  • columns (str) – The list of column names in the dataset.

  • dialect (str) – The SQL dialect associated with the dataset (e.g., Snowflake, BigQuery, Spark).

  • data_store_id (str) – The ID of the data store.

  • data_source_id (str) – The ID of the data request used to generate sampling and metadata.

class datarobot._experimental.models.chunking_service_v2.DynamicDatasetProps

Bases: APIObject

The dataset props for a dynamic dataset.

Variables:

credentials_id (str) – The ID of the credentials.

class datarobot._experimental.models.chunking_service_v2.DatasetDefinition

Bases: APIObject

Dataset definition that holds information about the dataset for API responses.

Variables:
  • id (str) – The ID of the data source definition.

  • creator_user_id (str) – The ID of the user.

  • dataset_props (DatasetProps) – The properties of the dataset in catalog.

  • dynamic_dataset_props (DynamicDatasetProps) – The properties of the dynamic dataset.

  • dataset_info (DatasetInfo) – The information about the dataset.

  • name (str) – The optional custom name of the dataset definition.

classmethod from_data(data)

Properly convert composition classes.

Return type:

DatasetDefinition

classmethod create(dataset_id, dataset_version_id, name=None)

Create a dataset definition.

In order to create a dataset definition, you must first have an existing dataset in the Data Registry. For example, a dataset can be uploaded from a file using dr.Dataset.create_from_file.

If you have an existing dataset in the Data Registry:

  • Retrieve the dataset ID by the canonical name via:

    • [cr for cr in dr.Dataset.list() if cr.name == <name>][0].id

  • Retrieve the dataset version ID by the name via:

    • [cr for cr in dr.Dataset.list() if cr.name == <name>][0].version_id

Parameters:
  • dataset_id (str) – The ID of the AI Catalog dataset.

  • dataset_version_id (str) – The ID of the AI Catalog dataset version.

  • name (str) – The optional custom name of the dataset definition.

Returns:

dataset_definition – An instance of a created dataset definition.

Return type:

DatasetDefinition
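
Example:

A minimal sketch of the flow described above. It assumes the dataset already exists in the Data Registry; the dataset name "My Training Data" and the definition name are placeholders.

import datarobot as dr
from datarobot._experimental.models.chunking_service_v2 import DatasetDefinition

# Look up an existing Data Registry dataset by its canonical name.
dataset = [ds for ds in dr.Dataset.list() if ds.name == "My Training Data"][0]

# Create a dataset definition from the dataset and its version.
dataset_definition = DatasetDefinition.create(
    dataset_id=dataset.id,
    dataset_version_id=dataset.version_id,
    name="my-dataset-definition",  # optional custom name
)
print(dataset_definition.id)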

classmethod get(dataset_definition_id)

Retrieve metadata for a specific dataset definition.

Parameters:

dataset_definition_id (str) – The ID of the dataset definition.

Returns:

dataset_definition – The queried instance.

Return type:

DatasetDefinition

classmethod delete(dataset_definition_id)

Delete a specific dataset definition.

Parameters:

dataset_definition_id (str) – The ID of the dataset definition.

Return type:

None

classmethod list()

List all dataset definitions.

Return type:

A list of DatasetDefinition

classmethod analyze(dataset_definition_id, max_wait=600)

Analyze a specific dataset definition.

Parameters:
  • dataset_definition_id (str) – The ID of the dataset definition.

  • max_wait (Optional[int]) – The maximum time, in seconds, to wait for the analysis to complete before it is considered unsuccessful.

Return type:

None
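
Example:

A brief sketch combining list, analyze, and get. The definition ID is a placeholder, and the note that dataset_info is populated after analysis is an inference from the attribute descriptions above, not a documented guarantee.

from datarobot._experimental.models.chunking_service_v2 import DatasetDefinition

# List every dataset definition visible to the current user.
for definition in DatasetDefinition.list():
    print(definition.id, definition.name)

# Analyze one definition; max_wait bounds the wait (in seconds) before the
# analysis is considered unsuccessful.
dataset_definition_id = "<dataset definition id>"  # placeholder
DatasetDefinition.analyze(dataset_definition_id, max_wait=600)

# Re-fetch the definition to inspect its metadata (dataset_info is assumed
# to be populated once analysis has completed).
definition = DatasetDefinition.get(dataset_definition_id)
print(definition.dataset_info)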

class datarobot._experimental.models.chunking_service_v2.RowsChunkDefinition

Bases: APIObject

The rows chunk information.

Variables:
  • order_by_columns (List[str]) – List of the sorting column names.

  • is_descending_order (bool) – The sorting order. Defaults to False, ordering from smallest to largest.

  • target_column (str) – The target column.

  • target_class (str) – For a binary target, one of the possible target values. For a zero-inflated target, this will be ‘0’.

  • user_group_column (str) – The user group column.

  • datetime_partition_column (str) – The datetime partition column name used in OTV projects.

  • otv_validation_start_date (datetime.datetime) – The start date for the validation set.

  • otv_validation_end_date (datetime.datetime) – The end date for the validation set.

  • otv_training_end_date (datetime.datetime) – The end date for the training set.

  • otv_latest_timestamp (datetime.datetime) – The latest timestamp; this field is auto-generated.

  • otv_earliest_timestamp (datetime.datetime) – The earliest timestamp; this field is auto-generated.

  • otv_validation_downsampling_pct (float) – The percentage of the validation set to downsample; this field is auto-generated.

class datarobot._experimental.models.chunking_service_v2.FeaturesChunkDefinition

Bases: APIObject

The features chunk information.

class datarobot._experimental.models.chunking_service_v2.ChunkDefinitionStats

Bases: APIObject

The chunk stats information.

Variables:
  • expected_chunk_size (int) – The expected chunk size; this field is auto-generated.

  • number_of_rows_per_chunk (int) – The number of rows per chunk; this field is auto-generated.

  • total_number_of_chunks (int) – The total number of chunks; this field is auto-generated.

class datarobot._experimental.models.chunking_service_v2.ChunkDefinition

Bases: APIObject

The chunk information.

Variables:
  • id (str) – The ID of the chunk entity.

  • dataset_definition_id (str) – The ID of the dataset definition.

  • name (str) – The name of the chunk entity.

  • is_readonly (bool) – The read-only flag.

  • partition_method (str) – The partition method used to create chunks: one of ‘random’, ‘stratified’, or ‘date’.

  • chunking_strategy_type (str) – The chunking strategy type, either ‘features’ or ‘rows’.

  • chunk_definition_stats (ChunkDefinitionStats) – The chunk stats information.

  • rows_chunk_definition (RowsChunkDefinition) – The rows chunk information.

  • features_chunk_definition (FeaturesChunkDefinition) – The features chunk information.

classmethod from_data(data)

Properly convert composition classes.

Return type:

ChunkDefinition

classmethod create(dataset_definition_id, name=None, partition_method=ChunkingPartitionMethod.RANDOM, chunking_strategy_type=ChunkingStrategy.ROWS, order_by_columns=None, is_descending_order=False, target_column=None, target_class=None, user_group_column=None, datetime_partition_column=None, otv_validation_start_date=None, otv_validation_end_date=None, otv_training_end_date=None)

Create a chunk definition.

Parameters:
  • dataset_definition_id (str) – The ID of the dataset definition.

  • name (str) – The optional custom name of the chunk definition.

  • partition_method (str) – The partition method used to create chunks: one of ‘random’, ‘stratified’, or ‘date’.

  • chunking_strategy_type (str) – The chunking strategy type, either ‘features’ or ‘rows’.

  • order_by_columns (List[str]) – List of the sorting column names.

  • is_descending_order (bool) – The sorting order. Defaults to False, ordering from smallest to largest.

  • target_column (str) – The target column.

  • target_class (str) – For a binary target, one of the possible target values. For a zero-inflated target, this will be ‘0’.

  • user_group_column (str) – The user group column.

  • datetime_partition_column (str) – The datetime partition column name used in OTV projects.

  • otv_validation_start_date (datetime.datetime) – The start date for the validation set.

  • otv_validation_end_date (datetime.datetime) – The end date for the validation set.

  • otv_training_end_date (datetime.datetime) – The end date for the training set.

Returns:

chunk_definition – An instance of a created chunk definition.

Return type:

ChunkDefinition
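
Example:

A minimal sketch that relies on the documented defaults (random partitioning with a rows strategy), so the ChunkingPartitionMethod and ChunkingStrategy enums do not need to be imported. The ID and column name are placeholders.

from datarobot._experimental.models.chunking_service_v2 import ChunkDefinition

# Create a rows-based chunk definition with random partitioning (the
# documented defaults), ordering rows by a date column.
chunk_definition = ChunkDefinition.create(
    dataset_definition_id="<dataset definition id>",  # placeholder
    name="random-row-chunks",                         # optional custom name
    order_by_columns=["transaction_date"],            # placeholder column name
    is_descending_order=False,
)
print(chunk_definition.id)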

classmethod get(dataset_definition_id, chunk_definition_id)

Retrieve metadata for a specific chunk definition.

Parameters:
  • dataset_definition_id (str) – The ID of the dataset definition.

  • chunk_definition_id (str) – The ID of the chunk definition.

Returns:

chunk_definition – The queried instance.

Return type:

ChunkDefinition

classmethod delete(dataset_definition_id, chunk_definition_id)

Delete a specific chunk definition.

Parameters:
  • dataset_definition_id (str) – The ID of the dataset definition.

  • chunk_definition_id (str) – The ID of the chunk definition.

Return type:

None

classmethod list(dataset_definition_id)

List all chunk definitions.

Parameters:

dataset_definition_id (str) – The ID of the dataset definition.

Return type:

A list of ChunkDefinition

classmethod analyze(dataset_definition_id, chunk_definition_id, max_wait=600)

Analyze a specific chunk definition.

Parameters:
  • dataset_definition_id (str) – The ID of the dataset definition.

  • chunk_definition_id (str) – The ID of the chunk definition.

  • max_wait (Optional[int]) – The maximum time, in seconds, to wait for the analysis to complete before it is considered unsuccessful.

Return type:

None
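
Example:

A short sketch of analyzing and then re-fetching a chunk definition. The IDs are placeholders, and reading chunk_definition_stats after analysis is an assumption based on the auto-generated attribute descriptions above.

from datarobot._experimental.models.chunking_service_v2 import ChunkDefinition

dataset_definition_id = "<dataset definition id>"  # placeholder
chunk_definition_id = "<chunk definition id>"      # placeholder

# Analyze the chunk definition; max_wait bounds the wait time in seconds.
ChunkDefinition.analyze(dataset_definition_id, chunk_definition_id, max_wait=600)

# Retrieve it again and inspect the auto-generated statistics (assumed to be
# populated once analysis has completed).
chunk_definition = ChunkDefinition.get(dataset_definition_id, chunk_definition_id)
stats = chunk_definition.chunk_definition_stats
print(stats.expected_chunk_size, stats.total_number_of_chunks)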

classmethod update(chunk_definition_id, dataset_definition_id, name=None, order_by_columns=None, is_descending_order=None, target_column=None, target_class=None, user_group_column=None, datetime_partition_column=None, otv_validation_start_date=None, otv_validation_end_date=None, otv_training_end_date=None, force_update=False)

Update a chunk definition.

Parameters:
  • chunk_definition_id (str) – The ID of the chunk definition.

  • dataset_definition_id (str) – The ID of the dataset definition.

  • name (str) – The optional custom name of the chunk definition.

  • order_by_columns (List[str]) – List of the sorting column names.

  • is_descending_order (bool) – The sorting order. Defaults to False, ordering from smallest to largest.

  • target_column (str) – The target column.

  • target_class (str) – For a binary target, one of the possible target values. For a zero-inflated target, this will be ‘0’.

  • user_group_column (str) – The user group column.

  • datetime_partition_column (str) – The datetime partition column name used in OTV projects.

  • otv_validation_start_date (datetime.datetime) – The start date for the validation set.

  • otv_validation_end_date (datetime.datetime) – The end date for the validation set.

  • otv_training_end_date (datetime.datetime) – The end date for the training set.

  • force_update (bool) – If True, the update is forced in certain cases; for example, updating after analysis has already completed.

Returns:

chunk_definition – The updated chunk definition instance.

Return type:

ChunkDefinition
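
Example:

A minimal sketch of renaming a chunk definition and reversing its sort order. The IDs and new name are placeholders; force_update=True is only needed in the cases noted above, for example after analysis has completed.

from datarobot._experimental.models.chunking_service_v2 import ChunkDefinition

# Rename the chunk definition and switch to descending order. force_update
# allows the change even after the definition has already been analyzed.
updated = ChunkDefinition.update(
    chunk_definition_id="<chunk definition id>",      # placeholder
    dataset_definition_id="<dataset definition id>",  # placeholder
    name="renamed-chunk-definition",
    is_descending_order=True,
    force_update=True,
)
print(updated.name)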