:::{only} include_experimental_docs

Experimental API

These features all require special permissions to be activated on your DataRobot account, and will not work otherwise. If you want to test a feature, please ask your DataRobot CFDS or account manager about enrolling in our preview program.

Classes in this list should be considered “experimental”, not fully released, and likely to change in future releases. Do not use them for production systems or other mission-critical uses.

class datarobot._experimental.models.model.Model(id=None, processes=None, featurelist_name=None, featurelist_id=None, project_id=None, sample_pct=None, model_type=None, model_category=None, is_frozen=None, is_n_clusters_dynamically_determined=None, blueprint_id=None, metrics=None, monotonic_increasing_featurelist_id=None, monotonic_decreasing_featurelist_id=None, n_clusters=None, has_empty_clusters=None, supports_monotonic_constraints=None, is_starred=None, prediction_threshold=None, prediction_threshold_read_only=None, model_number=None, parent_model_id=None, supports_composable_ml=None, training_row_count=None, training_duration=None, training_start_date=None, training_end_date=None, data_selection_method=None, time_window_sample_pct=None, sampling_method=None, model_family_full_name=None, is_trained_into_validation=None, is_trained_into_holdout=None)
get_feature_effect(source)

Retrieve Feature Effects for the model.

Feature Effects provides partial dependence and predicted vs. actual values for the top 500 features, ordered by feature impact score.

The partial dependence shows the marginal effect of a feature on the target variable after accounting for the average effects of all other predictive features. It indicates how, holding all other variables at their observed values, the value of this feature affects your prediction.

Requires that Feature Effects has already been computed with request_feature_effect.

See get_feature_effect_metadata for retrieving information about the available sources.

Parameters:
source : string

The source Feature Effects are retrieved for.

Returns:
feature_effects : FeatureEffects

The feature effects data.

Raises:
ClientError (404)

If the feature effects have not been computed or the source is not a valid value.
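
A minimal usage sketch, assuming an existing project and model (the IDs below are placeholders) and computing Feature Effects first with request_feature_effect:

from datarobot._experimental.models.model import Model

model = Model.get('5506fcd38bd88f5953219da0', '5506fcd98bd88f1641a720a3')  # project ID, model ID (placeholders)

# Compute Feature Effects, then retrieve them for the validation source.
feature_effect_job = model.request_feature_effect()
feature_effect_job.wait_for_completion()
feature_effects = model.get_feature_effect(source='validation')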

get_incremental_learning_metadata()

Retrieve incremental learning metadata for this model.

Added in version v3.4.0.

This functionality requires the INCREMENTAL_LEARNING feature flag to be enabled.

Returns:
metadata : IncrementalLearningMetadata

An IncrementalLearningMetadata object representing the incremental learning metadata.

start_incremental_learning(early_stopping_rounds=None)

Start incremental learning for this model.

Added in version v3.4.0.

This functionality requires the INCREMENTAL_LEARNING feature flag to be enabled.

Parameters:
early_stopping_rounds: Optional[int]

The number of chunks in which no improvement is observed that triggers the early stopping mechanism.

Returns:
None
Raises:
ClientError

If the server responded with a 4xx status.

train_first_incremental_from_sample()

Submit a job to the queue to perform the first incremental learning iteration by training on an existing sample model. This functionality requires the SAMPLE_DATA_TO_START_PROJECT feature flag to be enabled.

Returns:
job : ModelJob

The created job that is retraining the model.
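
A sketch of the incremental learning flow, assuming the INCREMENTAL_LEARNING feature flag is enabled and using placeholder IDs:

from datarobot._experimental.models.model import Model

model = Model.get('5506fcd38bd88f5953219da0', '5506fcd98bd88f1641a720a3')  # project ID, model ID

# Start incremental learning; stop early after 3 chunks without improvement.
model.start_incremental_learning(early_stopping_rounds=3)

# Follow chunk-by-chunk progress through the metadata.
metadata = model.get_incremental_learning_metadata()
print(metadata.status, metadata.total_number_of_chunks)
for item in metadata.items:
    print(item['chunk_index'], item['status'])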

class datarobot._experimental.models.model.DatetimeModel(id=None, processes=None, featurelist_name=None, featurelist_id=None, project_id=None, sample_pct=None, training_row_count=None, training_duration=None, training_start_date=None, training_end_date=None, time_window_sample_pct=None, sampling_method=None, model_type=None, model_category=None, is_frozen=None, blueprint_id=None, metrics=None, training_info=None, holdout_score=None, holdout_status=None, data_selection_method=None, backtests=None, monotonic_increasing_featurelist_id=None, monotonic_decreasing_featurelist_id=None, supports_monotonic_constraints=None, is_starred=None, prediction_threshold=None, prediction_threshold_read_only=None, effective_feature_derivation_window_start=None, effective_feature_derivation_window_end=None, forecast_window_start=None, forecast_window_end=None, windows_basis_unit=None, model_number=None, parent_model_id=None, supports_composable_ml=None, n_clusters=None, is_n_clusters_dynamically_determined=None, has_empty_clusters=None, model_family_full_name=None, is_trained_into_validation=None, is_trained_into_holdout=None, **kwargs)
get_feature_effect(source, backtest_index)

Retrieve Feature Effects for the model.

Feature Effects provides partial dependence and predicted vs. actual values for the top 500 features, ordered by feature impact score.

The partial dependence shows the marginal effect of a feature on the target variable after accounting for the average effects of all other predictive features. It indicates how, holding all other variables at their observed values, the value of this feature affects your prediction.

Requires that Feature Effects has already been computed with request_feature_effect.

See get_feature_effect_metadata for retrieving information about the available values of source and backtest_index.

Parameters:
source : string

The source Feature Effects are retrieved for; one of the values in FeatureEffectMetadataDatetime.sources.

backtest_index : string

The backtest index to retrieve Feature Effects for; one of the values in FeatureEffectMetadataDatetime.backtest_index.

Returns:
feature_effects : FeatureEffects

The feature effects data.

Raises:
ClientError (404)

If the feature effects have not been computed or the source is not a valid value.
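
A sketch for datetime models, where the valid source and backtest_index values come from get_feature_effect_metadata (IDs and values below are placeholders):

from datarobot._experimental.models.model import DatetimeModel

model = DatetimeModel.get('5506fcd38bd88f5953219da0', '5506fcd98bd88f1641a720a3')  # project ID, model ID

# Inspect the metadata to see which sources and backtest indices are available.
metadata = model.get_feature_effect_metadata()

# 'validation' and '0' are illustrative; use values reported by the metadata above.
feature_effects = model.get_feature_effect(source='validation', backtest_index='0')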

class datarobot._experimental.models.data_store.DataStore(data_store_id=None, data_store_type=None, canonical_name=None, creator=None, updated=None, params=None, role=None, driver_class_type=None)

A data store. Represents a database.

Attributes:
id : str

The ID of the data store.

data_store_type : str

The type of data store.

canonical_name : str

The user-friendly name of the data store.

creator : str

The ID of the user who created the data store.

updated : datetime.datetime

The time of the last update.

params : DataStoreParameters

A list specifying data store parameters.

role : str

Your access role for this data store.

driver_class_type : str

The driver class type of the data store.

class datarobot._experimental.models.retraining.RetrainingPolicy(id, name, description=None)

Retraining Policy.

Attributes:
policy_id : str

ID of the retraining policy

name : str

Name of the retraining policy

description : str

Description of the retraining policy

classmethod list(deployment_id)

Lists all retraining policies associated with a deployment

Parameters:
deployment_id : str

Id of the deployment

Returns:
policies : list

List of retraining policies associated with a deployment

Return type:

List[RetrainingPolicy]

Examples

from datarobot import Deployment
from datarobot._experimental.models.retraining import RetrainingPolicy
deployment = Deployment.get(deployment_id='620ed0e37b6ce03244f19631')
RetrainingPolicy.list(deployment.id)
>>> [RetrainingPolicy('620ed248bb0a1f5889eb6aa7'), RetrainingPolicy('624f68be8828ed81bf487d8d')]
classmethod get(deployment_id, retraining_policy_id)

Retrieves a retraining policy associated with a deployment

Parameters:
deployment_id : str

Id of the deployment

retraining_policy_id : str

Id of the policy

Returns:
retraining_policy : RetrainingPolicy

Retraining policy

Return type:

RetrainingPolicy

Examples

from datarobot._experimental.models.retraining import RetrainingPolicy
policy = RetrainingPolicy.get(
    deployment_id='620ed0e37b6ce03244f19631',
    retraining_policy_id='624f68be8828ed81bf487d8d'
)
policy.id
>>>'624f68be8828ed81bf487d8d'
policy.name
>>>'PolicyA'
classmethod delete(deployment_id, retraining_policy_id)

Deletes a retraining policy associated with a deployment

Parameters:
deployment_id : str

Id of the deployment

retraining_policy_id : str

Id of the policy

Return type:

None

Examples

from datarobot._experimental.models.retraining import RetrainingPolicy
RetrainingPolicy.delete(
    deployment_id='620ed0e37b6ce03244f19631',
    retraining_policy_id='624f68be8828ed81bf487d8d'
)
class datarobot._experimental.models.retraining.RetrainingPolicyRun(id, status, start_time, finish_time, challenger_id=None, error_message=None, model_package_id=None, project_id=None)

Retraining policy run.

Attributes:
policy_run_id : str

ID of the retraining policy run

status : str

Status of the retraining policy run

challenger_id : str

ID of the challenger model retrieved after running the policy

error_message: str

The error message if an error occurs during the policy run

model_package_id: str

ID of the model package (version) retrieved after the policy is run

project_id: str

ID of the project the deployment is associated with

start_time: datetime

Timestamp of when the policy run starts

finish_time: datetime

Timestamp of when the policy run finishes

classmethod list(deployment_id, retraining_policy_id)

Lists all the retraining policy runs of a retraining policy that is associated with a deployment.

Parameters:
deployment_id : str

ID of the deployment

retraining_policy_id : str

ID of the policy

Returns:
policy_runs : list

List of retraining policy runs

Return type:

List[RetrainingPolicyRun]

Examples

from datarobot._experimental.models.retraining import RetrainingPolicyRun
RetrainingPolicyRun.list(
    deployment_id='620ed0e37b6ce03244f19631',
    retraining_policy_id='62f4448f0dfd5699feae3e6e'
)
>>> [RetrainingPolicyRun('620ed248bb0a1f5889eb6aa7'), RetrainingPolicyRun('624f68be8828ed81bf487d8d')]
class datarobot._experimental.models.data_matching.DataMatching(project_id)

Retrieves the closest data points for the input data.

This functionality is more than a simple lookup. To retrieve the closest data points, the data matching functionality first applies the DataRobot preprocessing pipeline and then searches for the closest data points. The returned values are the closest data points at the point of entry to the model.

There are three sets of methods supported:
  1. Methods to build the index (for project, model, featurelist). The index needs to be built first in order to search for the closest data points. Once the index is built, it will be reused.

  2. Methods to search for the closest data points (for project, model, featurelist). These methods initialize the query, await its completion, and then save the result as a CSV file in the specified location. A usage sketch is shown after the get_closest_data method below.

  3. Additional methods to manually list the history of queries and retrieve their results.

get_query_url(url, number_of_data=None)

Returns the formatted data matching query URL.

Return type:

str

get_closest_data(query_file_path, number_of_data=None, max_wait=600, build_index_if_missing=True)

Retrieves the closest data points to the data point in the input file. If the index is missing, by default the method will try to build it.

Parameters:
query_file_path: str

Path to the file with the data point to search the closest data points for

number_of_data: int or None

Number of results to search for. If no value is specified, the default is 10.

max_wait: int

Number of seconds to wait for the result. Default is 600.

build_index_if_missing: Optional[bool]

Whether the index should be created if it is missing. If False is specified and the index is missing, an exception is thrown. Default is True.

Returns:
df: pd.DataFrame

Dataframe with the query result

Return type:

DataFrame
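
A usage sketch with placeholder IDs and paths; the project-level index is built up front, although get_closest_data would also build it on demand because build_index_if_missing defaults to True:

from datarobot._experimental.models.data_matching import DataMatching

data_matching = DataMatching(project_id='5506fcd38bd88f5953219da0')

# Build the project-level index once; it is reused by later queries.
data_matching.build_index(max_wait=600)

# Search for the 10 closest data points to the row(s) in the query file.
df = data_matching.get_closest_data(
    query_file_path='query_point.csv',  # placeholder path
    number_of_data=10,
)
print(df.head())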

get_closest_data_for_model(model_id, query_file_path, number_of_data=None, max_wait=600, build_index_if_missing=True)

Retrieves the closest data points to the data point in the input file. If the index is missing, by default the method will try to build it.

Parameters:
model_id: str

Id of the model to search for the closest data points

query_file_path: str

Path to the file with the data point to search the closest data points for

number_of_data: int or None

Number of results to search for. If no value is specified, the default is 10.

max_wait: int

Number of seconds to wait for the result. Default is 600.

build_index_if_missing: Optional[bool]

Whether the index should be created if it is missing. If False is specified and the index is missing, an exception is thrown. Default is True.

Returns:
df: pd.DataFrame

Dataframe with the query result

Return type:

DataFrame

get_closest_data_for_featurelist(featurelist_id, query_file_path, number_of_data=None, max_wait=600, build_index_if_missing=True)

Retrieves the closest data points to the data point in the input file. If the index is missing, by default the method will try to build it.

Parameters:
featurelist_id: str

Id of the featurelist to search for the closest data points

query_file_path: str

Path to the file with the data point to search the closest data points for

number_of_data: int or None

Number of results to search for. If no value is specified, the default is 10.

max_wait: int

Number of seconds to wait for the result. Default is 600.

build_index_if_missing: bool

Whether the index should be created if it is missing. If False is specified and the index is missing, an exception is thrown. Default is True.

Returns:
df: pd.DataFrame

Dataframe with the query result

Return type:

DataFrame

build_index(max_wait=600)

Builds the data matching index and waits for its completion.

Parameters:
max_wait: int or None

Seconds to wait for the completion of the build index operation. Default is 600. When 0 or None is passed, the method exits without waiting for the build index operation to complete.

Return type:

None

build_index_for_featurelist(featurelist_id, max_wait=600)

Builds the data matching index for a featurelist and waits for its completion.

Parameters:
featurelist_id: str

Id of the featurelist to build the index for

max_wait: int or None

Seconds to wait for the completion of the build index operation. Default is 600. When 0 or None is passed, the method exits without waiting for the build index operation to complete.

Return type:

None

build_index_for_model(model_id, max_wait=600)

Builds the data matching index for a model and waits for its completion.

Parameters:
model_id: str

Id of the model to build the index for

max_wait: int or None

Seconds to wait for the completion of the build index operation. Default is 600. When 0 or None is passed, the method exits without waiting for the build index operation to complete.

Return type:

None

list()

Lists all data matching queries for the project. Results are sorted in descending order, from the latest to the oldest.

Returns:
List[DataMatchingQuery]
Return type:

List[DataMatchingQuery]

class datarobot._experimental.models.data_matching.DataMatchingQuery(data_matching_id, project_id, **kwargs)

Data Matching Query object.

Represents a single query for the closest data points. Once the related query job is completed, its result can be retrieved and saved as a CSV file in a specified location.

classmethod list(project_id)

Retrieves the list of queries.

Parameters:
project_id: str

Project ID to retrieve data matching queries for

Returns:
List[DataMatchingQuery]
Return type:

List[DataMatchingQuery]

save_result(file_path)

Downloads the query result and saves it at the file_path location.

Parameters:
file_path: str

Path where the query result will be saved

Return type:

None

get_result()

Returns the query result as a DataFrame.

Returns:
df: pd.DataFrame

Dataframe with the query result

Return type:

DataFrame
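
A sketch of the manual query-history workflow, with a placeholder project ID and output path:

from datarobot._experimental.models.data_matching import DataMatchingQuery

# Queries are returned newest first.
queries = DataMatchingQuery.list(project_id='5506fcd38bd88f5953219da0')
if queries:
    latest = queries[0]
    latest.save_result(file_path='closest_points.csv')  # save the result as a CSV file
    df = latest.get_result()                            # or load it as a DataFrame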

class datarobot._experimental.models.model_lineage.ModelLineage(featurelist, project, model, dataset=None)

Contains information about the lineage of a model.

Attributes:
dataset : DatasetInfo

Information about the dataset this model was created with.

featurelist : FeaturelistInfo

Information about the featurelist used to train this model.

project : ProjectInfo

Information about the project this model was created in.

model : ModelInfo

Information about the model itself.

classmethod get(model_id, use_case_id=None)

Retrieve lineage information about a trained model. If you pass the optional use_case_id parameter, this class will contain additional information.

Parameters:
model_id : str

Model Id.

use_case_id : Optional[str]

Use Case Id.

Returns:
ModelLineage
Return type:

ModelLineage
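
A brief sketch with placeholder IDs; passing use_case_id is optional but adds dataset information:

from datarobot._experimental.models.model_lineage import ModelLineage

lineage = ModelLineage.get(
    model_id='5506fcd98bd88f1641a720a3',      # placeholder
    use_case_id='654ad653c6c1e889e8eab12e',   # optional, placeholder
)
print(lineage.project, lineage.featurelist, lineage.model, lineage.dataset)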

class datarobot._experimental.models.notebooks.Notebook(id, name, type, permissions, tags, created, last_viewed, settings, has_schedule, has_enabled_schedule, updated=None, org_id=None, tenant_id=None, description=None, session=None, use_case_id=None, use_case_name=None)

Metadata for a DataRobot Notebook accessible to the user.

Attributes:
id : str

The ID of the Notebook.

name : str

The name of the Notebook.

type : NotebookType

The type of the Notebook. Can be “plain” or “codespace”.

permissions : List[NotebookPermission]

The permissions the user has for the Notebook.

tags : List[str]

Any tags that have been added to the Notebook. Default is an empty list.

created : NotebookActivity

Information on when the Notebook was created and who created it.

updated : NotebookActivity

Information on when the Notebook was updated and who updated it.

last_viewed : NotebookActivity

Information on when the Notebook was last viewed and who viewed it.

settings : NotebookSettings

Information on global settings applied to the Notebook.

org_id : Optional[str]

The organization ID associated with the Notebook.

tenant_id : Optional[str]

The tenant ID associated with the Notebook.

description : Optional[str]

The description of the Notebook. Optional.

session : Optional[NotebookSession]

Metadata on the current status of the Notebook and its kernel. Optional.

use_case_id : Optional[str]

The ID of the Use Case the Notebook is associated with. Optional.

use_case_name : Optional[str]

The name of the Use Case the Notebook is associated with. Optional.

has_schedule : bool

Whether or not the notebook has a schedule.

has_enabled_schedule : bool

Whether or not the notebook has a currently enabled schedule.

get_uri()
Returns:
url : str

Permanent static hyperlink to this Notebook in its Use Case or standalone.

Return type:

str

classmethod get(notebook_id)

Retrieve a single notebook.

Parameters:
notebook_id : str

The ID of the notebook you want to retrieve.

Returns:
notebook : Notebook

The requested notebook.

Return type:

Notebook

Examples

from datarobot._experimental.models.notebooks import Notebook

notebook = Notebook.get(notebook_id='6556b00dcc4ea0bb7ea48121')
download_revision(revision_id, file_path=None, filelike=None)

Downloads the notebook as a JSON (.ipynb) file for the specified revision.

Parameters:
revision_id: str

The ID of the revision to download.

file_path: string, optional

The destination to write the file to.

filelike: file, optional

A file-like object to write to. The object must be able to write bytes. The user is responsible for closing the object.

Returns:
None
Return type:

None

Examples

from datarobot._experimental.models.notebooks import Notebook

notebook = Notebook.get(notebook_id='6556b00dcc4ea0bb7ea48121')
manual_run = notebook.run()
revision_id = manual_run.wait_for_completion()
notebook.download_revision(revision_id=revision_id, file_path="./results.ipynb")
delete()

Delete a single notebook

Return type:

None

Examples

from datarobot._experimental.models.notebooks import Notebook

notebook = Notebook.get(notebook_id='6556b00dcc4ea0bb7ea48121')
notebook.delete()
classmethod list(created_before=None, created_after=None, order_by=None, tags=None, owners=None, query=None, use_cases=None)

List all Notebooks available to the user.

Parameters:
created_before : Optional[str]

List Notebooks created before a certain date. Optional.

created_after : Optional[str]

List Notebooks created after a certain date. Optional.

order_by : Optional[str]

Property to sort returned Notebooks. Optional. Supported properties are “name”, “created”, “updated”, “tags”, and “lastViewed”. Prefix the attribute name with a dash to sort in descending order, e.g. order_by=’-created’. By default, the order_by parameter is None.

tags : Optional[List[str]]

A list of tags that returned Notebooks should be associated with. Optional.

owners : Optional[List[str]]

A list of user IDs used to filter returned Notebooks. The respective users share ownership of the Notebooks. Optional.

query : Optional[str]

A specific regex query to use when filtering Notebooks. Optional.

use_cases : Optional[UseCase or List[UseCase] or str or List[str]]

Filters returned Notebooks by a specific Use Case or Cases. Accepts either the entity or the ID. Optional. If set to [None], the method returns only Notebooks that are not linked to a Use Case.

Returns:
notebooks : List[Notebook]

A list of Notebooks available to the user.

Return type:

List[Notebook]

Examples

from datarobot._experimental.models.notebooks import Notebook

notebooks = Notebook.list()
run(title=None, notebook_path=None, parameters=None)

Create a manual scheduled job that runs the notebook.

Parameters:
title : Optional[str]

The title of the background job. Optional.

notebook_path : Optional[str]

The path of the notebook to execute within the Codespace. Required if notebook is in a Codespace.

parameters : Optional[List[Dict[str, str]]]

A list of dictionaries of key value pairs representing environment variables predefined in the notebook. Optional.

Returns:
notebook_scheduled_job : NotebookScheduledJob

The created notebook schedule job.

Raises:
InvalidUsageError

If attempting to create a manual scheduled run for a Codespace without a notebook path.

Return type:

NotebookScheduledJob

Notes

The notebook must be part of a Use Case. If the notebook is in a Codespace then notebook_path is required.

Examples

from datarobot._experimental.models.notebooks import Notebook

notebook = Notebook.get(notebook_id='6556b00dcc4ea0bb7ea48121')
manual_run = notebook.run()

# Alternatively, with title and parameters:
# manual_run = notebook.run(title="My Run", parameters=[{"FOO": "bar"}])

revision_id = manual_run.wait_for_completion()
class datarobot._experimental.models.notebooks.NotebookScheduledRun(id, use_case_id, status, payload, title=None, start_time=None, end_time=None, revision=None, duration=None, run_type=None, notebook_type=None)

DataRobot Notebook Scheduled Run. A historical run of a notebook schedule.

Attributes:
id : str

The ID of the Notebook Scheduled Job.

use_case_id : str

The Use Case ID of the Notebook Scheduled Job.

status : str

The status of the run.

payload : ScheduledJobPayload

The payload used for the background job.

title : Optional[str]

The title of the job. Optional.

start_time : Optional[str]

The start time of the job. Optional.

end_time : Optional[str]

The end time of the job. Optional.

revision : ScheduledRunRevisionMetadata

Notebook revision data - ID and name.

duration : Optional[int]

The job duration in seconds. May be None, for example, while the job is running. Optional.

run_type : Optional[RunType]

The type of the run - either manual (triggered via UI or API) or scheduled. Optional.

notebook_type: Optional[NotebookType]

The type of the notebook - either plain or codespace. Optional.

class datarobot._experimental.models.notebooks.NotebookScheduledJob(id, enabled, next_run_time, run_type, notebook_type, job_payload, title=None, schedule=None, schedule_localized=None, last_successful_run=None, last_failed_run=None, last_run_time=None)

DataRobot Notebook Schedule. A scheduled job that runs a notebook.

Attributes:
id : str

The ID of the Notebook Scheduled Job.

enabled : bool

Whether or not the job is enabled.

next_run_time : str

The next time the job is scheduled to run (assuming it is enabled).

run_type : RunType

The type of the run - either manual (triggered via UI or API) or scheduled.

notebook_type: NotebookType

The type of the notebook - either plain or codespace.

job_payload : ScheduledJobPayload

The payload used for the background job.

title : Optional[str]

The title of the job. Optional.

schedule : Optional[str]

Cron-like string to define how frequently the job should be run. Optional.

schedule_localized : Optional[str]

A human-readable localized version of the schedule. Example in English is ‘At 42 minutes past the hour’. Optional.

last_successful_run : Optional[str]

The last time the job was run successfully. Optional.

last_failed_run : Optional[str]

The last time the job failed. Optional.

last_run_time : Optional[str]

The last time the job was run (failed or successful). Optional.

classmethod get(use_case_id, scheduled_job_id)

Retrieve a single notebook schedule.

Parameters:
use_case_id : str

The ID of the Use Case the notebook schedule is associated with.

scheduled_job_id : str

The ID of the notebook schedule you want to retrieve.

Returns:
notebook_schedule : NotebookScheduledJob

The requested notebook schedule.

Return type:

NotebookScheduledJob

Examples

from datarobot._experimental.models.notebooks import NotebookScheduledJob

notebook_schedule = NotebookScheduledJob.get(
    use_case_id="654ad653c6c1e889e8eab12e",
    scheduled_job_id="65734fe637157200e28bf688",
)
get_job_history()

Retrieve list of historical runs for the notebook schedule.

Returns:
notebook_scheduled_runs : List[NotebookScheduledRun]

The list of historical runs for the notebook schedule.

Return type:

List[NotebookScheduledRun]

Examples

from datarobot._experimental.models.notebooks import NotebookScheduledJob

notebook_schedule = NotebookScheduledJob.get(
    use_case_id="654ad653c6c1e889e8eab12e",
    scheduled_job_id="65734fe637157200e28bf688",
)
notebook_scheduled_runs = notebook_schedule.get_job_history()
wait_for_completion(max_wait=600)

Wait for the completion of a scheduled notebook and return the revision ID corresponding to the run’s output.

Parameters:
max_wait : int

The number of seconds to wait before giving up.

Returns:
revision_id : str

Returns either the revision ID or a message describing the current state.

Return type:

str

Examples

from datarobot._experimental.models.notebooks import Notebook

notebook = Notebook.get(notebook_id='6556b00dcc4ea0bb7ea48121')
manual_run = notebook.run()
revision_id = manual_run.wait_for_completion()
class datarobot._experimental.models.notebooks.ScheduledRunRevisionMetadata(id=None, name=None)

DataRobot Notebook Revision Metadata specifically for a scheduled run.

Both id and name can be null if, for example, the job is still running or has failed.

Attributes:
id : Optional[str]

The ID of the Notebook Revision. Optional.

name : Optional[str]

The name of the Notebook Revision. Optional.

class datarobot._experimental.models.notebooks.ScheduledJobParam(name, value)

DataRobot Schedule Job Parameter.

Attributes:
name : str

The name of the parameter.

value : str

The value of the parameter.

class datarobot._experimental.models.notebooks.ScheduledJobPayload(uid, org_id, use_case_id, notebook_id, notebook_name, run_type, notebook_type, parameters, notebook_path=None)

DataRobot Schedule Job Payload.

Attributes:
uid : str

The ID of the user who created the Notebook Schedule.

org_id : str

The ID of the organization of the user who created the Notebook Schedule.

use_case_id : str

The ID of the Use Case that the Notebook belongs to.

notebook_id : str

The ID of the Notebook being run on a schedule.

notebook_name : str

The name of the Notebook being run on a schedule.

run_type : RunType

The type of the run - either manual (triggered via UI or API) or scheduled.

notebook_type: NotebookType

The type of the notebook - either plain or codespace.

parameters : List[ScheduledJobParam]

The parameters being used in the Notebook Schedule. Can be an empty list.

notebook_path : Optional[str]

The path of the notebook to execute within the Codespace. Optional. Required if notebook is in a Codespace.

class datarobot._experimental.models.incremental_learning.IncrementalLearningMetadata(project_id, model_id, user_id, featurelist_id, status, items, early_stopping_rounds, sample_pct=None, training_row_count=None, score=None, metric=None, total_number_of_chunks=None, model_number=None)

Incremental learning metadata for an incremental model.

Added in version v3.4.0.

Notes

An incremental item is a dict containing the following:

  • chunk_index: int

    The incremental learning order in which chunks are trained.

  • status: str

    The status of training the current chunk. One of datarobot._experimental.models.enums.IncrementalLearningItemStatus

  • model_id: str

    The ID of the model associated with the current item (chunk).

  • parent_model_id: str

    The ID of the model based on which the current item (chunk) is trained.

  • data_stage_id: str

    The ID of the data stage.

  • sample_pct: float

    The cumulative percentage of the base dataset size used for training the model.

  • training_row_count: int

    The number of rows used to train a model.

  • score: float

    The validation score of the current model

Attributes:
project_id: string

The project ID.

model_id: string

The model ID.

user_id: string

The ID of the user who started incremental learning.

featurelist_id: string

The ID of the featurelist the model is using.

status: string

The status of incremental training. One of datarobot._experimental.models.enums.IncrementalLearningStatus.

items: List[IncrementalLearningItemDoc]

An array of incremental learning items associated with the sequential order of chunks. See incremental item info in Notes for more details.

sample_pct: float

The sample size, in percent (1 to 100), to use in training.

training_row_count: int

The number of rows used to train a model.

score: float

The validation score of the model.

metric: str

The name of the scoring metric.

early_stopping_rounds: int

The number of chunks in which no improvement is observed that triggers the early stopping mechanism.

total_number_of_chunks: int

The total number of chunks.

model_number: int

The number of the model in the project.

class datarobot._experimental.models.chunking_service.DatasetChunkDefinition(id, user_id, name, project_starter_chunk_size, user_chunk_size, datasource_definition_id=None, chunking_type=None)

Dataset chunking definition that holds information about how to chunk the dataset.

Attributes:
id : str

The ID of the dataset chunk definition.

user_id : str

The ID of the user who created the definition.

name : str

The name of the dataset chunk definition.

project_starter_chunk_size : int

The size, in bytes, of the project starter chunk.

user_chunk_size : int

Chunk size in bytes.

datasource_definition_id : str

The data source definition ID associated with the dataset chunk definition.

chunking_type : ChunkingType

The type of chunk creation from the dataset. All possible chunking types can be found under the ChunkingType enum, which can be imported from datarobot._experimental.models.enums. Types include:

  • INCREMENTAL_LEARNING for non-time aware projects that use a chunk index to create chunks.

  • INCREMENTAL_LEARNING_OTV for OTV projects that use a chunk index to create chunks.

  • SLICED_OFFSET_LIMIT for any dataset in which the user provides an offset and limit to create chunks.

SLICED_OFFSET_LIMIT has no index-based chunks, so the create_chunk_by_index() method is not supported.

classmethod get(dataset_chunk_definition_id)

Retrieve a specific dataset chunk definition metadata.

Parameters:
dataset_chunk_definition_id: str

The ID of the dataset chunk definition.

Returns:
dataset_chunk_definition : DatasetChunkDefinition

The queried instance.

Return type:

DatasetChunkDefinition

classmethod list(limit=50, offset=0)

Retrieves a list of dataset chunk definitions

Parameters:
limit: int

The maximum number of objects to return. Default is 50.

offset: int

The starting offset of the results. Default is 0.

Returns:
dataset_chunk_definitions : List[DatasetChunkDefinition]

The list of dataset chunk definitions.

Return type:

List[DatasetChunkDefinition]

classmethod create(name, project_starter_chunk_size, user_chunk_size, datasource_info, chunking_type=ChunkingType.INCREMENTAL_LEARNING)

Create a dataset chunk definition. Required for both index-based and custom chunks.

In order to create a dataset chunk definition, you must first:

  • Create a data connection to the target data source via dr.DataStore.create()

  • Create credentials that must be attached to the data connection via dr.Credential.create()

If you have an existing data connection and credentials:

  • Retrieve the data store ID by the canonical name via:

    • [ds for ds in dr.DataStore.list() if ds.canonical_name == <name>][0].id

  • Retrieve the credential ID by the name via:

    • [cr for cr in dr.Credential.list() if cr.name == <name>][0].id

You must create the required ‘datasource_info’ object with the datasource information that corresponds to your use case:

  • DatasourceAICatalogInfo for AI Catalog datasets.

  • DatasourceDataWarehouseInfo for Snowflake, BigQuery, or other data warehouses.

Parameters:
name : str

The name of the dataset chunk definition.

project_starter_chunk_size : int

The size, in bytes, of the first chunk. Used to start a DataRobot project.

user_chunk_size : int

The size, in bytes, of the user-defined incremental chunk.

datasource_info : Union[DatasourceDataWarehouseInfo, DatasourceAICatalogInfo]

The object that contains the information of the data source.

chunking_type : ChunkingType

The type of chunk creation from the dataset. All possible chunking types can be found under the ChunkingType enum, which can be imported from datarobot._experimental.models.enums. Types include:

  • INCREMENTAL_LEARNING for non-time aware projects that use a chunk index to create chunks.

  • INCREMENTAL_LEARNING_OTV for OTV projects that use a chunk index to create chunks.

  • SLICED_OFFSET_LIMIT for any dataset in which the user provides an offset and limit to create chunks.

SLICED_OFFSET_LIMIT has no index-based chunks, so the create_chunk_by_index() method is not supported. The default type is ChunkingType.INCREMENTAL_LEARNING.

Returns:
dataset_chunk_definition: DatasetChunkDefinition

An instance of a created dataset chunk definition.
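
A sketch of the creation flow described above using an AI Catalog dataset; the dataset name and chunk sizes are illustrative assumptions:

import datarobot as dr
from datarobot._experimental.models.chunking_service import (
    DatasetChunkDefinition,
    DatasourceAICatalogInfo,
)
from datarobot._experimental.models.enums import ChunkingType

# Look up an existing AI Catalog dataset by name (assumed to exist).
dataset = [ds for ds in dr.Dataset.list() if ds.name == 'my_dataset'][0]

datasource_info = DatasourceAICatalogInfo(
    catalog_id=dataset.id,
    catalog_version_id=dataset.version_id,
    table='my_dataset',
)

chunk_definition = DatasetChunkDefinition.create(
    name='my-chunk-definition',
    project_starter_chunk_size=100 * 1024 * 1024,  # 100 MB starter chunk (illustrative)
    user_chunk_size=50 * 1024 * 1024,              # 50 MB incremental chunks (illustrative)
    datasource_info=datasource_info,
    chunking_type=ChunkingType.INCREMENTAL_LEARNING,
)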

classmethod get_datasource_definition(dataset_chunk_definition_id)

Retrieves the data source definition associated with a dataset chunk definition.

Parameters:
dataset_chunk_definition_id: str

The ID of the dataset chunk definition.

Returns:
datasource_definition: DatasourceDefinition

An instance of the created datasource definition.

Return type:

DatasourceDefinition

classmethod get_chunk(dataset_chunk_definition_id, chunk_id)

Retrieves a specific data chunk associated with a dataset chunk definition

Parameters:
dataset_chunk_definition_id: str

The ID of the dataset chunk definition.

chunk_id: str

The ID of the chunk.

Returns:
chunk: Chunk

An instance of the created chunk.

Return type:

Chunk

classmethod list_chunks(dataset_chunk_definition_id)

Retrieves all data chunks associated with a dataset chunk definition

Parameters:
dataset_chunk_definition_id: str

The ID of the dataset chunk definition.

Returns:
chunks: List[Chunk]

A list of chunks.

Return type:

List[Chunk]

analyze_dataset(max_wait_time=600)

Analyzes the data source to retrieve and compute metadata about the dataset.

Depending on the size of the dataset, adding order_by_columns to the dataset chunking definition will increase the execution time to create the data chunk. Set max_wait_time to an appropriate wait time.

Parameters:
max_wait_time: int

The maximum time, in seconds, to wait for completion.

Returns:
datasource_definition: DatasourceDefinition

An instance of the created datasource definition.

Return type:

DatasourceDefinition

create_chunk(limit, offset=0, storage_type=ChunkStorageType.DATASTAGE, max_wait_time=600)

Creates a data chunk using the limit and offset. By default, the data chunk is stored in data stages.

Depending on the size of the dataset, adding order_by_columns to the dataset chunking definition will increase the execution time to retrieve or create the data chunk. Set max_wait_time to an appropriate wait time.

Parameters:
limit: int

The maximum number of rows.

offset: int

The offset into the dataset (where reading begins).

storage_type: ChunkStorageType

The storage location of the chunk.

max_wait_time: int

The maximum time, in seconds, to wait for completion.

Returns:
chunk: Chunk

An instance of a created or updated chunk.

Return type:

Chunk

create_chunk_by_index(index, storage_type=ChunkStorageType.DATASTAGE, max_wait_time=600)

Creates a data chunk using the chunk index. By default, the data chunk is stored in data stages.

Depending on the size of the dataset, adding order_by_columns to the dataset chunking definition will increase the execution time to retrieve or create the data chunk. Set max_wait_time to an appropriate wait time.

Parameters:
index: int

The index of the chunk.

storage_type: ChunkStorageType

The storage location of the chunk.

max_wait_time: int

The maximum time, in seconds, to wait for completion.

Returns:
chunk: Chunk

An instance of a created or updated chunk.

Return type:

Chunk
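
A sketch that analyzes the data source and then materializes chunks, assuming the dataset chunk definition above already exists (the ID is a placeholder):

from datarobot._experimental.models.chunking_service import (
    ChunkStorageType,
    DatasetChunkDefinition,
)

chunk_definition = DatasetChunkDefinition.get('65734fe637157200e28bf688')  # placeholder ID

# Compute dataset metadata (row counts, sizes) before creating chunks.
datasource_definition = chunk_definition.analyze_dataset(max_wait_time=600)

# Index-based chunk (not available for SLICED_OFFSET_LIMIT definitions).
first_chunk = chunk_definition.create_chunk_by_index(index=0)

# Explicit offset/limit chunk, stored in data stages by default.
custom_chunk = chunk_definition.create_chunk(
    limit=10000,
    offset=10000,
    storage_type=ChunkStorageType.DATASTAGE,
)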

classmethod patch_validation_dates(dataset_chunk_definition_id, validation_start_date, validation_end_date)

Updates the data source definition validation dates associated with a dataset chunk definition. In order to set the validation dates appropriately, both start and end dates should be specified. This method can only be used for INCREMENTAL_LEARNING_OTV dataset chunk definitions and its associated datasource definition.

Parameters:
dataset_chunk_definition_id: str

The ID of the dataset chunk definition.

validation_start_date: datetime

The start date of validation scoring data. Internally converted to the format ‘%Y-%m-%d %H:%M:%S’; the timezone defaults to UTC.

validation_end_date: datetime

The end date of validation scoring data. Internally converted to the format ‘%Y-%m-%d %H:%M:%S’; the timezone defaults to UTC.

Returns:
datasource_definition: DatasourceDefinition

An instance of the updated datasource definition.

Return type:

DatasourceDefinition

class datarobot._experimental.models.chunking_service.DatasourceAICatalogInfo(catalog_version_id, catalog_id=None, table=None, name=None, order_by_columns=None, is_descending_order=False, select_columns=None, datetime_partition_column=None, validation_pct=None, validation_limit_pct=None, validation_start_date=None, validation_end_date=None, training_end_date=None, latest_timestamp=None, earliest_timestamp=None)

AI Catalog data source information used at creation time with dataset chunk definition.

Attributes:
name: str

The optional custom name of the data source.

table : str

The data source table name or AI Catalog dataset name.

storage_origin : str

The origin data source, always AI Catalog type.

catalog_id : str

The ID of the AI Catalog dataset.

catalog_version_id : str

The ID of the AI Catalog dataset version.

order_by_columns : List[str]

A list of columns used to sort the dataset.

is_descending_order : bool

Orders the direction of the data. Defaults to False, ordering from smallest to largest.

select_columns: List[str]

A list of columns to select from the dataset.

datetime_partition_column : str

The datetime partition column name used in OTV projects.

validation_pct : float

The percentage threshold between 0.1 and 1.0 for the first chunk validation.

validation_limit_pct : float

The percentage threshold between 0.1 and 1.0 for the validation kept.

validation_start_date : datetime

The start date for validation.

validation_end_date : datetime

The end date for validation.

training_end_date : datetime

The end date for training.

latest_timestamp : datetime

The latest timestamp.

earliest_timestamp : datetime

The earliest timestamp.

class datarobot._experimental.models.chunking_service.DatasourceDataWarehouseInfo(data_store_id, credentials_id, table, storage_origin, order_by_columns, is_descending_order=False, schema=None, catalog=None, name=None, data_source_id=None, select_columns=None, datetime_partition_column=None, validation_pct=None, validation_limit_pct=None, validation_start_date=None, validation_end_date=None, training_end_date=None, latest_timestamp=None, earliest_timestamp=None)

Data source information used at creation time with dataset chunk definition. Data warehouses supported: Snowflake, BigQuery, Databricks

Attributes:
name: str

The optional custom name of the data source.

table : str

The data source table name or AI Catalog dataset name.

storage_origin : str

The origin data source or data warehouse (e.g., Snowflake, BigQuery).

data_store_id : str

The ID of the data store.

credentials_id : str

The ID of the credentials.

schema : str

The schema name of the data source.

catalog : str

The database or catalog name.

data_source_id : str

The ID of the data request used to generate sampling and metadata.

order_by_columns : List[str]

A list of columns used to sort the dataset.

is_descending_order : bool

Orders the direction of the data. Defaults to False, ordering from smallest to largest.

select_columns: List[str]

A list of columns to select from the dataset.

datetime_partition_column : str

The datetime partition column name used in OTV projects.

validation_pct : float

The percentage threshold between 0.1 and 1.0 for the first chunk validation.

validation_limit_pct : float

The percentage threshold between 0.1 and 1.0 for the validation kept.

validation_start_date : datetime

The start date for validation.

validation_end_date : datetime

The end date for validation.

training_end_date : datetime

The end date for training.

latest_timestamp : datetime

The latest timestamp.

earliest_timestamp : datetime

The earliest timestamp.

class datarobot._experimental.models.chunking_service.DatasourceDefinition(id, storage_origin, order_by_columns=None, is_descending_order=False, table=None, data_store_id=None, credentials_id=None, schema=None, catalog=None, name=None, data_source_id=None, total_rows=None, source_size=None, estimated_size_per_row=None, columns=None, catalog_id=None, catalog_version_id=None, select_columns=None, datetime_partition_column=None, validation_pct=None, validation_limit_pct=None, validation_start_date=None, validation_end_date=None, training_end_date=None, latest_timestamp=None, earliest_timestamp=None)

Data source definition that holds information about the data source for API responses. Do not use this to create DatasourceDefinition objects directly; use DatasourceAICatalogInfo or DatasourceDataWarehouseInfo instead.

Attributes:
id : str

The ID of the data source definition.

data_store_id : str

The ID of the data store.

credentials_id : str

The ID of the credentials.

table : str

The data source table name.

schema : str

The schema name of the data source.

catalog : str

The database or catalog name.

storage_origin : str

The origin data source or data warehouse (e.g., Snowflake, BigQuery).

data_source_id : str

The ID of the data request used to generate sampling and metadata.

total_rows : str

The total number of rows in the dataset.

source_size : str

The size of the dataset.

estimated_size_per_row : str

The estimated size per row.

columns : str

The list of column names in the dataset.

order_by_columns : List[str]

A list of columns used to sort the dataset.

is_descending_order : bool

Orders the direction of the data. Defaults to False, ordering from smallest to largest.

select_columns : List[str]

A list of columns to select from the dataset.

datetime_partition_column : str

The datetime partition column name used in OTV projects.

validation_pct : float

The percentage threshold between 0.1 and 1.0 for the first chunk validation.

validation_limit_pct : float

The percentage threshold between 0.1 and 1.0 for the validation kept.

validation_start_date : datetime

The start date for validation.

validation_end_date : datetime

The end date for validation.

training_end_date : datetime

The end date for training.

latest_timestamp : datetime

The latest timestamp.

earliest_timestamp : datetime

The earliest timestamp.

class datarobot._experimental.models.chunking_service.Chunk(id, chunk_definition_id, limit, offset, chunk_index=None, data_source_id=None, chunk_storage=None)

Data chunk object that holds metadata about a chunk.

Attributes:
id : str

The ID of the chunk entity.

chunk_definition_id : str

The ID of the dataset chunk definition the chunk belongs to.

limit : int

The number of rows in the chunk.

offset : int

The offset in the dataset to create the chunk.

chunk_index : str

The index of the chunk if chunks are divided uniformly. Otherwise, it is None.

data_source_id : str

The ID of the data request used to create the chunk.

chunk_storage : ChunkStorage

A list of storage locations where the chunk is stored.

get_chunk_storage_id(storage_type)

Get storage location ID for the chunk.

Parameters:
storage_type: ChunkStorageType

The storage type where the chunk is stored.

Returns:
storage_reference_id: str

An ID that references the storage location for the chunk.

Return type:

Optional[str]
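
A short sketch showing how to look up where a chunk is stored (IDs are placeholders):

from datarobot._experimental.models.chunking_service import (
    ChunkStorageType,
    DatasetChunkDefinition,
)

chunk = DatasetChunkDefinition.get_chunk(
    dataset_chunk_definition_id='65734fe637157200e28bf688',  # placeholder
    chunk_id='65734fe637157200e28bf689',                     # placeholder
)
storage_id = chunk.get_chunk_storage_id(ChunkStorageType.DATASTAGE)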

get_chunk_storage_version_id(storage_type)

Get storage version ID for the chunk.

Parameters:
storage_type: ChunkStorageType

The storage type where the chunk is stored.

Returns:
storage_reference_id: str

A catalog version ID associated with the AI Catalog dataset ID.

Return type:

Optional[str]

class datarobot._experimental.models.chunking_service.ChunkStorageType(value, names=None, *, module=None, qualname=None, type=None, start=1, boundary=None)

Supported chunk storage.

class datarobot._experimental.models.chunking_service.OriginStorageType(value, names=None, *, module=None, qualname=None, type=None, start=1, boundary=None)

Supported data sources.

class datarobot._experimental.models.chunking_service.ChunkStorage(storage_reference_id, chunk_storage_type, version_id=None)

The chunk storage location for the data chunks.

Attributes:
storage_reference_id : str

The ID of the storage entity.

chunk_storage_type : str

The type of the chunk storage.

version_id : str

The catalog version ID. This will only be used if the storage type is “AI Catalog”.

class datarobot._experimental.models.chunking_service_v2.DatasetDefinition(id, creator_user_id, dataset_props, dynamic_dataset_props=None, dataset_info=None, name=None)

Dataset definition that holds information of dataset for API responses.

Attributes:
id : str

The ID of the dataset definition.

creator_user_id : str

The ID of the user.

dataset_props : DatasetProps

The properties of the dataset in catalog.

dynamic_dataset_props : DynamicDatasetProps

The properties of the dynamic dataset.

dataset_info : DatasetInfo

The information about the dataset.

name: str

The optional custom name of the dataset definition.

classmethod from_data(data)

Properly convert composition classes.

Return type:

DatasetDefinition

classmethod create(dataset_id, dataset_version_id, name=None)

Create a dataset definition.

In order to create a dataset definition, you must first have an existing dataset in the Data Registry. A dataset can be uploaded using dr.Dataset.create_from_file, for example, if you have a local file.

If you have an existing dataset in the Data Registry:

  • Retrieve the dataset ID by the canonical name via:

    • [cr for cr in dr.Dataset.list() if cr.name == <name>][0].id

  • Retrieve the dataset version ID by the name via:

    • [cr for cr in dr.Dataset.list() if cr.name == <name>][0].version_id

Parameters:
dataset_id : str

The ID of the AI Catalog dataset.

dataset_version_id : str

The optional ID of the AI Catalog dataset version.

name: str

The optional custom name of the dataset definition.

Returns:
dataset_definition: DatasetDefinition

An instance of a created dataset definition.
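
A sketch of the lookup-and-create flow described above, assuming a dataset named 'my_dataset' already exists in the Data Registry:

import datarobot as dr
from datarobot._experimental.models.chunking_service_v2 import DatasetDefinition

# Resolve the dataset and version IDs from the Data Registry by name.
dataset = [ds for ds in dr.Dataset.list() if ds.name == 'my_dataset'][0]

dataset_definition = DatasetDefinition.create(
    dataset_id=dataset.id,
    dataset_version_id=dataset.version_id,
    name='my-dataset-definition',
)

# Compute row counts and size estimates before defining chunks.
DatasetDefinition.analyze(dataset_definition.id, max_wait=600)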

classmethod get(dataset_definition_id)

Retrieve a specific dataset definition metadata.

Parameters:
dataset_definition_id: str

The ID of the dataset definition.

Returns:
dataset_definition : DatasetDefinition

The queried instance.

Return type:

DatasetDefinition

classmethod delete(dataset_definition_id)

Delete a specific dataset definition

Parameters:
dataset_definition_id: str

The ID of the dataset definition.

Return type:

None

classmethod list()

List all dataset definitions

Returns:
A list of DatasetDefinition
Return type:

List[DatasetDefinition]

classmethod analyze(dataset_definition_id, max_wait=600)

Analyze a specific dataset definition

Parameters:
dataset_definition_id: str

The ID of the dataset definition.

max_wait: int, optional

Time in seconds after which analyze is considered unsuccessful

Return type:

None

class datarobot._experimental.models.chunking_service_v2.ChunkDefinitionStats(expected_chunk_size, number_of_rows_per_chunk, total_number_of_chunks)

The chunk stats information.

Attributes:
expected_chunk_size: int

The expected chunk size; this field is auto-generated.

number_of_rows_per_chunk: int

The number of rows per chunk; this field is auto-generated.

total_number_of_chunks: int

The total number of chunks; this field is auto-generated.

class datarobot._experimental.models.chunking_service_v2.FeaturesChunkDefinition

The features chunk information.

class datarobot._experimental.models.chunking_service_v2.RowsChunkDefinition(order_by_columns, is_descending_order=False, target_column=None, target_class=None, user_group_column=None, datetime_partition_column=None, otv_validation_start_date=None, otv_validation_end_date=None, otv_training_end_date=None, otv_latest_timestamp=None, otv_earliest_timestamp=None, otv_validation_downsampling_pct=None)

The rows chunk information.

Attributes:
order_by_columns : List[str]

List of the sorting column names.

is_descending_order : bool

The sorting order. Defaults to False, ordering from smallest to largest.

target_column : str

The target column.

target_class : str

For a binary target, one of the possible values. For zero-inflated targets, this will be ‘0’.

user_group_column : str

The user group column.

datetime_partition_column : str

The datetime partition column name used in OTV projects.

otv_validation_start_date : datetime

The start date for the validation set.

otv_validation_end_date : datetime

The end date for the validation set.

otv_training_end_date : datetime

The end date for the training set.

otv_latest_timestamp : datetime

The latest timestamp; this field is auto-generated.

otv_earliest_timestamp : datetime

The earliest timestamp; this field is auto-generated.

otv_validation_downsampling_pct : float

The percentage of the validation set to downsample; this field is auto-generated.

class datarobot._experimental.models.chunking_service_v2.DynamicDatasetProps(credentials_id)

The dataset props for a dynamic dataset.

Attributes:
credentials_id : str

The ID of the credentials.

class datarobot._experimental.models.chunking_service_v2.DatasetInfo(total_rows, source_size, estimated_size_per_row, columns, dialect, data_store_id=None, data_source_id=None)

The dataset information.

Attributes:
total_rows : str

The total number of rows in the dataset.

source_size : str

The size of the dataset.

estimated_size_per_row : str

The estimated size per row.

columns : str

The list of column names in the dataset.

dialect : str

The SQL dialect associated with the dataset (e.g., Snowflake, BigQuery, Spark).

data_store_id : str

The ID of the data store.

data_source_id : str

The ID of the data request used to generate sampling and metadata.

class datarobot._experimental.models.chunking_service_v2.DatasetProps(dataset_id, dataset_version_id)

The dataset props for a catalog dataset.

Attributes:
dataset_id : str

The ID of the AI Catalog dataset.

dataset_version_id : str

The ID of the AI Catalog dataset version.

class datarobot._experimental.models.chunking_service_v2.ChunkDefinition(id, dataset_definition_id, name, is_readonly, partition_method, chunking_strategy_type, chunk_definition_stats=None, rows_chunk_definition=None, features_chunk_definition=None)

The chunk information.

Attributes:
id : str

The ID of the chunk entity.

dataset_definition_id : str

The ID of the dataset definition.

name : str

The name of the chunk entity.

is_readonly : bool

The read-only flag.

partition_method : str

The partition method used to create chunks, either ‘random’, ‘stratified’, or ‘date’.

chunking_strategy_type : str

The chunking strategy type, either ‘features’ or ‘rows’.

chunk_definition_stats : ChunkDefinitionStats

The chunk stats information.

rows_chunk_definition : RowsChunkDefinition

The rows chunk information.

features_chunk_definition : FeaturesChunkDefinition

The features chunk information.

classmethod from_data(data)

Properly convert composition classes.

Return type:

ChunkDefinition

classmethod create(dataset_definition_id, name=None, partition_method=ChunkingPartitionMethod.RANDOM, chunking_strategy_type=ChunkingStrategy.ROWS, order_by_columns=None, is_descending_order=False, target_column=None, target_class=None, user_group_column=None, datetime_partition_column=None, otv_validation_start_date=None, otv_validation_end_date=None, otv_training_end_date=None)

Create a chunk definition.

Parameters:
dataset_definition_id : str

The ID of the dataset definition.

name: str

The optional custom name of the chunk definition.

partition_method: str

The partition method used to create chunks, either ‘random’, ‘stratified’, or ‘date’.

chunking_strategy_type: str

The chunking strategy type, either ‘features’ or ‘rows’.

order_by_columns: List[str]

List of the sorting column names.

is_descending_order: bool

The sorting order. Defaults to False, ordering from smallest to largest.

target_column: str

The target column.

target_class: str

For a binary target, one of the possible values. For zero-inflated targets, this will be ‘0’.

user_group_column: str

The user group column.

datetime_partition_column: str

The datetime partition column name used in OTV projects.

otv_validation_start_date: datetime

The start date for the validation set.

otv_validation_end_date: datetime

The end date for the validation set.

otv_training_end_date: datetime

The end date for the training set.

Returns:
chunk_definition: ChunkDefinition

An instance of a created chunk definition.

Return type:

ChunkDefinition
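
A sketch creating a rows-based chunk definition for the dataset definition above; the IDs and column name are placeholders:

from datarobot._experimental.models.chunking_service_v2 import ChunkDefinition

chunk_definition = ChunkDefinition.create(
    dataset_definition_id='65734fe637157200e28bf688',  # placeholder
    name='random-rows-chunks',
    partition_method='random',       # 'random', 'stratified', or 'date'
    chunking_strategy_type='rows',   # 'features' or 'rows'
    order_by_columns=['date'],       # placeholder column
)

# Populate the auto-generated stats (e.g. total_number_of_chunks).
ChunkDefinition.analyze(
    dataset_definition_id='65734fe637157200e28bf688',
    chunk_definition_id=chunk_definition.id,
)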

classmethod get(dataset_definition_id, chunk_definition_id)

Retrieve a specific chunk definition metadata.

Parameters:
dataset_definition_id: str

The ID of the dataset definition.

chunk_definition_id: str

The ID of the chunk definition.

Returns:
chunk_definition : ChunkDefinition

The queried instance.

Return type:

ChunkDefinition

classmethod delete(dataset_definition_id, chunk_definition_id)

Delete a specific chunk definition

Parameters:
dataset_definition_id: str

The ID of the dataset definition.

chunk_definition_id: str

The ID of the chunk definition.

Return type:

None

classmethod list(dataset_definition_id)

List all chunk definitions

Parameters:
dataset_definition_id: str

The ID of the dataset definition.

Returns:
A list of ChunkDefinition
Return type:

List[ChunkDefinition]

classmethod analyze(dataset_definition_id, chunk_definition_id, max_wait=600)

Analyze a specific chunk definition

Parameters:
dataset_definition_id: str

The ID of the dataset definition.

chunk_definition_id: str

The ID of the chunk definition

max_wait: int, optional

Time in seconds after which analyze is considered unsuccessful

Return type:

None

classmethod update(chunk_definition_id, dataset_definition_id, name=None, order_by_columns=None, is_descending_order=None, target_column=None, target_class=None, user_group_column=None, datetime_partition_column=None, otv_validation_start_date=None, otv_validation_end_date=None, otv_training_end_date=None, force_update=False)

Update a chunk definition.

Parameters:
chunk_definition_id: str

The ID of the chunk definition.

dataset_definition_id : str

The ID of the dataset definition.

name: str

The optional custom name of the chunk definition.

order_by_columns: List[str]

List of the sorting column names.

is_descending_order: bool

The sorting order. Defaults to False, ordering from smallest to largest.

target_column: str

The target column.

target_class: str

For a binary target, one of the possible values. For zero-inflated targets, this will be ‘0’.

user_group_column: str

The user group column.

datetime_partition_column: str

The datetime partition column name used in OTV projects.

otv_validation_start_date: datetime

The start date for the validation set.

otv_validation_end_date: datetime

The end date for the validation set.

otv_training_end_date: datetime

The end date for the training set.

force_update: bool

If True, the update will be forced in some cases, for example, updating after the analysis is done.

Returns:
chunk_definition: ChunkDefinition

An updated instance of the chunk definition.

Return type:

ChunkDefinition

:::