:::{only} include_experimental_docs
Experimental API
These features all require special permissions to be activated on your DataRobot account, and will not work otherwise. If you want to test a feature, please ask your DataRobot CFDS or account manager about enrolling in our preview program.
Classes in this list should be considered “experimental”, not fully released, and likely to change in future releases. Do not use them for production systems or other mission-critical uses.
- class datarobot._experimental.models.model.Model(id=None, processes=None, featurelist_name=None, featurelist_id=None, project_id=None, sample_pct=None, model_type=None, model_category=None, is_frozen=None, is_n_clusters_dynamically_determined=None, blueprint_id=None, metrics=None, monotonic_increasing_featurelist_id=None, monotonic_decreasing_featurelist_id=None, n_clusters=None, has_empty_clusters=None, supports_monotonic_constraints=None, is_starred=None, prediction_threshold=None, prediction_threshold_read_only=None, model_number=None, parent_model_id=None, supports_composable_ml=None, training_row_count=None, training_duration=None, training_start_date=None, training_end_date=None, data_selection_method=None, time_window_sample_pct=None, sampling_method=None, model_family_full_name=None, is_trained_into_validation=None, is_trained_into_holdout=None)
- get_feature_effect(source)
Retrieve Feature Effects for the model.
Feature Effects provides partial dependence and predicted vs. actual values for the top 500 features, ordered by feature impact score.
The partial dependence shows the marginal effect of a feature on the target variable after accounting for the average effects of all other predictive features. It indicates how the value of this feature affects your prediction while all other features are held at their observed values.
Requires that Feature Effects has already been computed with request_feature_effect. See get_feature_effect_metadata for retrieving information about the available sources.
- Parameters:
- source: string
The source Feature Effects are retrieved for.
- Returns:
- feature_effects: FeatureEffects
The feature effects data.
- Raises:
- ClientError (404)
If the Feature Effects have not been computed or the source is not a valid value.
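A minimal usage sketch (not part of the original reference), assuming an existing project and model; the IDs are placeholders and Feature Effects is computed first via request_feature_effect:
```python
from datarobot._experimental.models.model import Model

# Placeholder IDs; substitute your own project and model.
model = Model.get(project='<project-id>', model_id='<model-id>')

# Compute Feature Effects, wait for the job, then retrieve the results.
# 'validation' is a typical source; see get_feature_effect_metadata for available ones.
job = model.request_feature_effect()
job.wait_for_completion()
feature_effects = model.get_feature_effect(source='validation')
```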
- get_incremental_learning_metadata()
Retrieve incremental learning metadata for this model.
Added in version v3.4.0.
This functionality requires the INCREMENTAL_LEARNING feature flag to be enabled.
- Returns:
- metadata: IncrementalLearningMetadata
An IncrementalLearningMetadata object representing the incremental learning metadata.
- start_incremental_learning(early_stopping_rounds=None)
Start incremental learning for this model.
Added in version v3.4.0.
This functionality requires the INCREMENTAL_LEARNING feature flag to be enabled.
- Parameters:
- early_stopping_rounds: Optional[int]
The number of chunks in which no improvement is observed that triggers the early stopping mechanism.
- Returns:
- None
- Raises:
- ClientError
If the server responded with a 4xx status.
- train_first_incremental_from_sample()
Submit a job to the queue to perform the first incremental learning iteration training on an existing sample model. This functionality requires the SAMPLE_DATA_TO_START_PROJECT feature flag to be enabled.
- Returns:
- job: ModelJob
The created job that is retraining the model.
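A rough, illustrative sketch of the incremental learning flow described above (not from the original reference), assuming the relevant feature flags are enabled and a sample model already exists; the IDs and the early-stopping value are placeholders, and the exact model on which training continues depends on your project setup:
```python
from datarobot._experimental.models.model import Model

model = Model.get(project='<project-id>', model_id='<sample-model-id>')

# Kick off the first incremental iteration on the existing sample model.
job = model.train_first_incremental_from_sample()
job.wait_for_completion()

# Start incremental learning, stopping early after 3 chunks with no improvement.
model.start_incremental_learning(early_stopping_rounds=3)

# Inspect per-chunk progress.
metadata = model.get_incremental_learning_metadata()
print(metadata.status, metadata.total_number_of_chunks)
```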
- class datarobot._experimental.models.model.DatetimeModel(id=None, processes=None, featurelist_name=None, featurelist_id=None, project_id=None, sample_pct=None, training_row_count=None, training_duration=None, training_start_date=None, training_end_date=None, time_window_sample_pct=None, sampling_method=None, model_type=None, model_category=None, is_frozen=None, blueprint_id=None, metrics=None, training_info=None, holdout_score=None, holdout_status=None, data_selection_method=None, backtests=None, monotonic_increasing_featurelist_id=None, monotonic_decreasing_featurelist_id=None, supports_monotonic_constraints=None, is_starred=None, prediction_threshold=None, prediction_threshold_read_only=None, effective_feature_derivation_window_start=None, effective_feature_derivation_window_end=None, forecast_window_start=None, forecast_window_end=None, windows_basis_unit=None, model_number=None, parent_model_id=None, supports_composable_ml=None, n_clusters=None, is_n_clusters_dynamically_determined=None, has_empty_clusters=None, model_family_full_name=None, is_trained_into_validation=None, is_trained_into_holdout=None, **kwargs)
- get_feature_effect(source, backtest_index)
Retrieve Feature Effects for the model.
Feature Effects provides partial dependence and predicted vs. actual values for the top 500 features, ordered by feature impact score.
The partial dependence shows the marginal effect of a feature on the target variable after accounting for the average effects of all other predictive features. It indicates how the value of this feature affects your prediction while all other features are held at their observed values.
Requires that Feature Effects has already been computed with request_feature_effect. See get_feature_effect_metadata for retrieving information about the available values of source and backtest_index.
- Parameters:
- source: string
The source Feature Effects are retrieved for. Must be one of the values in FeatureEffectMetadataDatetime.sources.
- backtest_index: string
The backtest index to retrieve Feature Effects for. Must be one of the values in FeatureEffectMetadataDatetime.backtest_index.
- Returns:
- feature_effects: FeatureEffects
The feature effects data.
- Raises:
- ClientError (404)
If the Feature Effects have not been computed or the source is not a valid value.
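A sketch for the datetime-partitioned case (added for illustration); valid source and backtest_index values come from get_feature_effect_metadata, and the literals below are placeholders:
```python
from datarobot._experimental.models.model import DatetimeModel

model = DatetimeModel.get(project='<project-id>', model_id='<model-id>')

# Retrieve Feature Effects for a given source and backtest index,
# assuming they were already computed with request_feature_effect.
feature_effects = model.get_feature_effect(source='validation', backtest_index='0')
```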
- class datarobot._experimental.models.data_store.DataStore(data_store_id=None, data_store_type=None, canonical_name=None, creator=None, updated=None, params=None, role=None, driver_class_type=None)
A data store. Represents a database connection.
- Attributes:
- id: str
The ID of the data store.
- data_store_type: str
The type of data store.
- canonical_name: str
The user-friendly name of the data store.
- creator: str
The ID of the user who created the data store.
- updated: datetime.datetime
The time of the last update.
- params: DataStoreParameters
A list specifying data store parameters.
- role: str
Your access role for this data store.
- driver_class_type: str
The type of driver class for this data store.
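A brief lookup sketch mirroring the pattern used later on this page (illustrative only); it assumes datarobot is imported as dr and that a data store named 'My Snowflake' already exists:
```python
import datarobot as dr

# Find an existing data store by its user-friendly (canonical) name.
data_store = [ds for ds in dr.DataStore.list() if ds.canonical_name == 'My Snowflake'][0]
print(data_store.id, data_store.canonical_name)
```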
- class datarobot._experimental.models.retraining.RetrainingPolicy(id, name, description=None)
Retraining Policy.
- Attributes:
- policy_id: str
ID of the retraining policy.
- name: str
Name of the retraining policy.
- description: str
Description of the retraining policy.
- classmethod list(deployment_id)
Lists all retraining policies associated with a deployment
- Parameters:
- deployment_id: str
ID of the deployment.
- Returns:
- policies: list
List of retraining policies associated with a deployment.
- Return type:
List[RetrainingPolicy]
Examples
from datarobot import Deployment
from datarobot._experimental.models.retraining import RetrainingPolicy
deployment = Deployment.get(deployment_id='620ed0e37b6ce03244f19631')
RetrainingPolicy.list(deployment.id)
>>> [RetrainingPolicy('620ed248bb0a1f5889eb6aa7'), RetrainingPolicy('624f68be8828ed81bf487d8d')]
- classmethod get(deployment_id, retraining_policy_id)
Retrieves a retraining policy associated with a deployment
- Parameters:
- deployment_id: str
ID of the deployment.
- retraining_policy_id: str
ID of the policy.
- Returns:
- retraining_policy: RetrainingPolicy
The retraining policy.
- Return type:
RetrainingPolicy
Examples
from datarobot._experimental.models.retraining import RetrainingPolicy
policy = RetrainingPolicy.get(
    deployment_id='620ed0e37b6ce03244f19631',
    retraining_policy_id='624f68be8828ed81bf487d8d'
)
policy.id
>>> '624f68be8828ed81bf487d8d'
policy.name
>>> 'PolicyA'
- classmethod delete(deployment_id, retraining_policy_id)
Deletes a retraining policy associated with a deployment
- Parameters:
- deployment_id: str
ID of the deployment.
- retraining_policy_id: str
ID of the policy.
- Return type:
None
Examples
from datarobot._experimental.models.retraining import RetrainingPolicy
RetrainingPolicy.delete(
    deployment_id='620ed0e37b6ce03244f19631',
    retraining_policy_id='624f68be8828ed81bf487d8d'
)
- class datarobot._experimental.models.retraining.RetrainingPolicyRun(id, status, start_time, finish_time, challenger_id=None, error_message=None, model_package_id=None, project_id=None)
Retraining policy run.
- Attributes:
- policy_run_id: str
ID of the retraining policy run.
- status: str
Status of the retraining policy run.
- challenger_id: str
ID of the challenger model retrieved after running the policy.
- error_message: str
The error message if an error occurs during the policy run
- model_package_id: str
ID of the model package (version) retrieved after the policy is run
- project_id: str
ID of the project the deployment is associated with
- start_time: datetime
Timestamp of when the policy run starts
- finish_time: datetime
Timestamp of when the policy run finishes
- classmethod list(deployment_id, retraining_policy_id)
Lists all the retraining policy runs of a retraining policy that is associated with a deployment.
- Parameters:
- deployment_id: str
ID of the deployment.
- retraining_policy_id: str
ID of the policy.
- Returns:
- policy runs: list
List of retraining policy runs.
- Return type:
List[RetrainingPolicyRun]
Examples
from datarobot._experimental.models.retraining import RetrainingPolicyRun
RetrainingPolicyRun.list(
    deployment_id='620ed0e37b6ce03244f19631',
    retraining_policy_id='62f4448f0dfd5699feae3e6e'
)
>>> [RetrainingPolicyRun('620ed248bb0a1f5889eb6aa7'), RetrainingPolicyRun('624f68be8828ed81bf487d8d')]
- class datarobot._experimental.models.data_matching.DataMatching(project_id)
Retrieves the closest data points for the input data.
This functionality is more than a simple lookup. In order to retrieve the closest data points, data matching first leverages the DataRobot preprocessing pipeline and then searches for the closest data points. The returned values are the closest data points at the point of entry to the model.
- There are three sets of methods supported:
Methods to build the index (for project, model, featurelist). The index needs to be built first in order to search for the closest data points. Once the index is built, it is reused.
Methods to search for the closest data points (for project, model, featurelist). These methods initialize the query, await its completion, and then save the result as a CSV file in the specified location.
Additional methods to manually list the history of queries and retrieve their results.
- get_query_url(url, number_of_data=None)
Returns the formatted data matching query URL.
- Return type:
str
- get_closest_data(query_file_path, number_of_data=None, max_wait=600, build_index_if_missing=True)
Retrieves the closest data points to the data point in the input file. If the index is missing, by default the method will try to build it.
- Parameters:
- query_file_path: str
Path to the file with the data point to search the closest data points for.
- number_of_data: int or None
Number of results to search for. If no value is specified, the default is 10.
- max_wait: int
Number of seconds to wait for the result. Default is 600.
- build_index_if_missing: Optional[bool]
Whether the index should be created if it is missing. If False is specified and the index is missing, an exception is thrown. Default is True.
- Returns:
- df: pd.DataFrame
Dataframe with query result
- Return type:
DataFrame
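A usage sketch of the data matching workflow (illustrative, with placeholder paths and IDs); the index can also be built implicitly because build_index_if_missing defaults to True:
```python
from datarobot._experimental.models.data_matching import DataMatching

data_matching = DataMatching(project_id='<project-id>')

# Optionally build the index up front; it is reused by later queries.
data_matching.build_index(max_wait=600)

# Search for the 10 closest data points to the record(s) in the query file.
df = data_matching.get_closest_data('query_point.csv', number_of_data=10)
print(df.head())
```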
- get_closest_data_for_model(model_id, query_file_path, number_of_data=None, max_wait=600, build_index_if_missing=True)
Retrieves the closest data points to the data point in the input file. If the index is missing, by default the method will try to build it.
- Parameters:
- model_id: str
ID of the model to search the closest data points for.
- query_file_path: str
Path to the file with the data point to search the closest data points for.
- number_of_data: int or None
Number of results to search for. If no value is specified, the default is 10.
- max_wait: int
Number of seconds to wait for the result. Default is 600.
- build_index_if_missing: Optional[bool]
Whether the index should be created if it is missing. If False is specified and the index is missing, an exception is thrown. Default is True.
- Returns:
- df: pd.DataFrame
Dataframe with query result
- Return type:
DataFrame
- get_closest_data_for_featurelist(featurelist_id, query_file_path, number_of_data=None, max_wait=600, build_index_if_missing=True)
Retrieves the closest data points to the data point in the input file. If the index is missing, by default the method will try to build it.
- Parameters:
- featurelist_id: str
ID of the featurelist to search the closest data points for.
- query_file_path: str
Path to the file with the data point to search the closest data points for.
- number_of_data: int or None
Number of results to search for. If no value is specified, the default is 10.
- max_wait: int
Number of seconds to wait for the result. Default is 600.
- build_index_if_missing: bool
Whether the index should be created if it is missing. If False is specified and the index is missing, an exception is thrown. Default is True.
- Returns:
- df: pd.DataFrame
Dataframe with query result
- Return type:
DataFrame
- build_index(max_wait=600)
Builds data matching index and waits for its completion.
- Parameters:
- max_wait: int or None
Seconds to wait for the completion of the build index operation. Default is 600. If 0 or None is passed, the method exits without waiting for the build index operation to complete.
- Return type:
None
- build_index_for_featurelist(featurelist_id, max_wait=600)
Builds data matching index for featurelist and waits for its completion.
- Parameters:
- featurelist_id: str
ID of the featurelist to build the index for.
- max_wait: int or None
Seconds to wait for the completion of the build index operation. Default is 600. If 0 or None is passed, the method exits without waiting for the build index operation to complete.
- Return type:
None
- build_index_for_model(model_id, max_wait=600)
Builds the data matching index for a model and waits for its completion.
- Parameters:
- model_id: str
ID of the model to build the index for.
- max_wait: int or None
Seconds to wait for the completion of the build index operation. Default is 600. If 0 or None is passed, the method exits without waiting for the build index operation to complete.
- Return type:
None
- list()
Lists all data matching queries for the project. Results are sorted in descending order starting from the latest to the oldest.
- Returns:
- List[DataMatchingQuery]
- Return type:
List
[DataMatchingQuery
]
- class datarobot._experimental.models.data_matching.DataMatchingQuery(data_matching_id, project_id, **kwargs)
Data Matching Query object.
Represents a single query for the closest data points. Once the related query job is completed, its result can be retrieved and saved as a CSV file in the specified location.
- classmethod list(project_id)
Retrieves the list of queries.
- Parameters:
- project_id: str
Project ID to retrieve data matching queries for
- Returns:
- List[DataMatchingQuery]
- Return type:
List
[DataMatchingQuery
]
- save_result(file_path)
Downloads the query result and saves it in file_path location.
- Parameters:
- file_path: str
Path location where to save the query result
- Return type:
None
- get_result()
Returns the query result as a DataFrame.
- Returns:
- df: pd.DataFrame
Dataframe with the query result.
- Return type:
DataFrame
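A short illustrative sketch of working with past queries, using only the methods documented above; the project ID and file name are placeholders:
```python
from datarobot._experimental.models.data_matching import DataMatchingQuery

# Queries are returned newest first.
queries = DataMatchingQuery.list(project_id='<project-id>')
if queries:
    latest = queries[0]
    df = latest.get_result()           # load the result as a DataFrame
    latest.save_result('result.csv')   # or save it to disk
```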
- class datarobot._experimental.models.model_lineage.ModelLineage(featurelist, project, model, dataset=None)
Contains information about the lineage of a model.
- Attributes:
- dataset: DatasetInfo
Information about the dataset this model was created with.
- featurelist: FeaturelistInfo
Information about the featurelist used to train this model.
- project: ProjectInfo
Information about the project this model was created in.
- model: ModelInfo
Information about the model itself.
- classmethod get(model_id, use_case_id=None)
Retrieve lineage information about a trained model. If you pass the optional use_case_id parameter, this class will contain additional information.
- Parameters:
- model_id: str
Model ID.
- use_case_id: Optional[str]
Use Case ID.
- Returns:
- ModelLineage
- Return type:
ModelLineage
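A minimal sketch (added for illustration); the IDs are placeholders, and use_case_id may be omitted:
```python
from datarobot._experimental.models.model_lineage import ModelLineage

lineage = ModelLineage.get(model_id='<model-id>', use_case_id='<use-case-id>')
print(lineage.project, lineage.featurelist, lineage.model)
print(lineage.dataset)  # populated when dataset information is available
```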
- class datarobot._experimental.models.notebooks.Notebook(id, name, type, permissions, tags, created, last_viewed, settings, has_schedule, has_enabled_schedule, updated=None, org_id=None, tenant_id=None, description=None, session=None, use_case_id=None, use_case_name=None)
Metadata for a DataRobot Notebook accessible to the user.
- Attributes:
- id: str
The ID of the Notebook.
- name: str
The name of the Notebook.
- type: NotebookType
The type of the Notebook. Can be “plain” or “codespace”.
- permissions: List[NotebookPermission]
The permissions the user has for the Notebook.
- tags: List[str]
Any tags that have been added to the Notebook. Default is an empty list.
- created: NotebookActivity
Information on when the Notebook was created and who created it.
- updated: NotebookActivity
Information on when the Notebook was updated and who updated it.
- last_viewed: NotebookActivity
Information on when the Notebook was last viewed and who viewed it.
- settings: NotebookSettings
Information on global settings applied to the Notebook.
- org_id: Optional[str]
The organization ID associated with the Notebook.
- tenant_id: Optional[str]
The tenant ID associated with the Notebook.
- description: Optional[str]
The description of the Notebook. Optional.
- session: Optional[NotebookSession]
Metadata on the current status of the Notebook and its kernel. Optional.
- use_case_id: Optional[str]
The ID of the Use Case the Notebook is associated with. Optional.
- use_case_name: Optional[str]
The name of the Use Case the Notebook is associated with. Optional.
- has_schedule: bool
Whether or not the notebook has a schedule.
- has_enabled_schedule: bool
Whether or not the notebook has a currently enabled schedule.
- get_uri()
- Returns:
- url: str
Permanent static hyperlink to this Notebook in its Use Case or standalone.
- Return type:
str
- classmethod get(notebook_id)
Retrieve a single notebook.
- Parameters:
- notebook_id: str
The ID of the notebook you want to retrieve.
- Returns:
- notebook: Notebook
The requested notebook.
- Return type:
Notebook
Examples
from datarobot._experimental.models.notebooks import Notebook
notebook = Notebook.get(notebook_id='6556b00dcc4ea0bb7ea48121')
- download_revision(revision_id, file_path=None, filelike=None)
Downloads the notebook as a JSON (.ipynb) file for the specified revision.
- Parameters:
- revision_id: str
The ID of the revision to download.
- file_path: string, optional
The destination to write the file to.
- filelike: file, optional
A file-like object to write to. The object must be able to write bytes. The user is responsible for closing the object.
- Returns:
- None
- Return type:
None
Examples
from datarobot._experimental.models.notebooks import Notebook
notebook = Notebook.get(notebook_id='6556b00dcc4ea0bb7ea48121')
manual_run = notebook.run()
revision_id = manual_run.wait_for_completion()
notebook.download_revision(revision_id=revision_id, file_path="./results.ipynb")
- delete()
Delete a single notebook
- Return type:
None
Examples
from datarobot._experimental.models.notebooks import Notebook
notebook = Notebook.get(notebook_id='6556b00dcc4ea0bb7ea48121')
notebook.delete()
- classmethod list(created_before=None, created_after=None, order_by=None, tags=None, owners=None, query=None, use_cases=None)
List all Notebooks available to the user.
- Parameters:
- created_before: Optional[str]
List Notebooks created before a certain date. Optional.
- created_after: Optional[str]
List Notebooks created after a certain date. Optional.
- order_by: Optional[str]
Property to sort returned Notebooks. Optional. Supported properties are “name”, “created”, “updated”, “tags”, and “lastViewed”. Prefix the attribute name with a dash to sort in descending order, e.g. order_by=’-created’. By default, the order_by parameter is None.
- tags: Optional[List[str]]
A list of tags that returned Notebooks should be associated with. Optional.
- owners: Optional[List[str]]
A list of user IDs used to filter returned Notebooks. The respective users share ownership of the Notebooks. Optional.
- query: Optional[str]
A specific regex query to use when filtering Notebooks. Optional.
- use_cases: Optional[UseCase or List[UseCase] or str or List[str]]
Filters returned Notebooks by a specific Use Case or Cases. Accepts either the entity or the ID. Optional. If set to [None], the method filters for Notebooks not linked to any Use Case.
- Returns:
- notebooks: List[Notebook]
A list of Notebooks available to the user.
- Return type:
List[Notebook]
Examples
from datarobot._experimental.models.notebooks import Notebook
notebooks = Notebook.list()
- run(title=None, notebook_path=None, parameters=None)
Create a manual scheduled job that runs the notebook.
- Parameters:
- title: Optional[str]
The title of the background job. Optional.
- notebook_path: Optional[str]
The path of the notebook to execute within the Codespace. Required if the notebook is in a Codespace.
- parameters: Optional[List[Dict[str, str]]]
A list of dictionaries of key value pairs representing environment variables predefined in the notebook. Optional.
- Returns:
- notebook_scheduled_job: NotebookScheduledJob
The created notebook schedule job.
- Raises:
- InvalidUsageError
If attempting to create a manual scheduled run for a Codespace without a notebook path.
- Return type:
NotebookScheduledJob
Notes
The notebook must be part of a Use Case. If the notebook is in a Codespace then notebook_path is required.
Examples
from datarobot._experimental.models.notebooks import Notebook
notebook = Notebook.get(notebook_id='6556b00dcc4ea0bb7ea48121')
manual_run = notebook.run()
# Alternatively, with title and parameters:
# manual_run = notebook.run(title="My Run", parameters=[{"FOO": "bar"}])
revision_id = manual_run.wait_for_completion()
- class datarobot._experimental.models.notebooks.NotebookScheduledRun(id, use_case_id, status, payload, title=None, start_time=None, end_time=None, revision=None, duration=None, run_type=None, notebook_type=None)
DataRobot Notebook Scheduled Run. A historical run of a notebook schedule.
- Attributes:
- id: str
The ID of the Notebook Scheduled Job.
- use_case_id: str
The Use Case ID of the Notebook Scheduled Job.
- status: str
The status of the run.
- payload: ScheduledJobPayload
The payload used for the background job.
- title: Optional[str]
The title of the job. Optional.
- start_time: Optional[str]
The start time of the job. Optional.
- end_time: Optional[str]
The end time of the job. Optional.
- revision: ScheduledRunRevisionMetadata
Notebook revision data - ID and name.
- duration: Optional[int]
The job duration in seconds. May be None, for example, while the job is running. Optional.
- run_type: Optional[RunType]
The type of the run - either manual (triggered via UI or API) or scheduled. Optional.
- notebook_type: Optional[NotebookType]
The type of the notebook - either plain or codespace. Optional.
- class datarobot._experimental.models.notebooks.NotebookScheduledJob(id, enabled, next_run_time, run_type, notebook_type, job_payload, title=None, schedule=None, schedule_localized=None, last_successful_run=None, last_failed_run=None, last_run_time=None)
DataRobot Notebook Schedule. A scheduled job that runs a notebook.
- Attributes:
- id: str
The ID of the Notebook Scheduled Job.
- enabled: bool
Whether the job is enabled or not.
- next_run_time: str
The next time the job is scheduled to run (assuming it is enabled).
- run_type: RunType
The type of the run - either manual (triggered via UI or API) or scheduled.
- notebook_type: NotebookType
The type of the notebook - either plain or codespace.
- job_payload: ScheduledJobPayload
The payload used for the background job.
- title: Optional[str]
The title of the job. Optional.
- schedule: Optional[str]
Cron-like string to define how frequently the job should be run. Optional.
- schedule_localized: Optional[str]
A human-readable localized version of the schedule. An example in English is ‘At 42 minutes past the hour’. Optional.
- last_successful_run: Optional[str]
The last time the job was run successfully. Optional.
- last_failed_run: Optional[str]
The last time the job failed. Optional.
- last_run_time: Optional[str]
The last time the job was run (failed or successful). Optional.
- classmethod get(use_case_id, scheduled_job_id)
Retrieve a single notebook schedule.
- Parameters:
- use_case_id: str
The ID of the Use Case the notebook schedule belongs to.
- scheduled_job_id: str
The ID of the notebook schedule you want to retrieve.
- Returns:
- notebook_schedule: NotebookScheduledJob
The requested notebook schedule.
- Return type:
NotebookScheduledJob
Examples
from datarobot._experimental.models.notebooks import NotebookScheduledJob
notebook_schedule = NotebookScheduledJob.get(
    use_case_id="654ad653c6c1e889e8eab12e",
    scheduled_job_id="65734fe637157200e28bf688",
)
- get_job_history()
Retrieve list of historical runs for the notebook schedule.
- Returns:
- notebook_scheduled_runs: List[NotebookScheduledRun]
The list of historical runs for the notebook schedule.
- Return type:
List[NotebookScheduledRun]
Examples
from datarobot._experimental.models.notebooks import NotebookScheduledJob
notebook_schedule = NotebookScheduledJob.get(
    use_case_id="654ad653c6c1e889e8eab12e",
    scheduled_job_id="65734fe637157200e28bf688",
)
notebook_scheduled_runs = notebook_schedule.get_job_history()
- wait_for_completion(max_wait=600)
Wait for the completion of a scheduled notebook and return the revision ID corresponding to the run’s output.
- Parameters:
- max_wait: int
The number of seconds to wait before giving up.
- Returns:
- revision_id: str
Returns either the revision ID or a message describing the current state.
- Return type:
str
Examples
from datarobot._experimental.models.notebooks import Notebook
notebook = Notebook.get(notebook_id='6556b00dcc4ea0bb7ea48121')
manual_run = notebook.run()
revision_id = manual_run.wait_for_completion()
- class datarobot._experimental.models.notebooks.ScheduledRunRevisionMetadata(id=None, name=None)
DataRobot Notebook Revision Metadata specifically for a scheduled run.
Both id and name can be null if for example the job is still running or has failed.
- Attributes:
- id: Optional[str]
The ID of the Notebook Revision. Optional.
- name: Optional[str]
The name of the Notebook Revision. Optional.
- class datarobot._experimental.models.notebooks.ScheduledJobParam(name, value)
DataRobot Schedule Job Parameter.
- Attributes:
- name: str
The name of the parameter.
- value: str
The value of the parameter.
- class datarobot._experimental.models.notebooks.ScheduledJobPayload(uid, org_id, use_case_id, notebook_id, notebook_name, run_type, notebook_type, parameters, notebook_path=None)
DataRobot Schedule Job Payload.
- Attributes:
- uid: str
The ID of the user who created the Notebook Schedule.
- org_id: str
The ID of the organization of the user who created the Notebook Schedule.
- use_case_id: str
The ID of the Use Case that the Notebook belongs to.
- notebook_id: str
The ID of the Notebook being run on a schedule.
- notebook_name: str
The name of the Notebook being run on a schedule.
- run_type: RunType
The type of the run - either manual (triggered via UI or API) or scheduled.
- notebook_type: NotebookType
The type of the notebook - either plain or codespace.
- parameters: List[ScheduledJobParam]
The parameters being used in the Notebook Schedule. Can be an empty list.
- notebook_path: Optional[str]
The path of the notebook to execute within the Codespace. Optional. Required if the notebook is in a Codespace.
- class datarobot._experimental.models.incremental_learning.IncrementalLearningMetadata(project_id, model_id, user_id, featurelist_id, status, items, early_stopping_rounds, sample_pct=None, training_row_count=None, score=None, metric=None, total_number_of_chunks=None, model_number=None)
Incremental learning metadata for an incremental model.
Added in version v3.4.0.
Notes
Incremental item is a dict containing the following:
- chunk_index: int
The incremental learning order in which chunks are trained.
- status: str
The status of training the current chunk. One of datarobot._experimental.models.enums.IncrementalLearningItemStatus.
- model_id: str
The ID of the model associated with the current item (chunk).
- parent_model_id: str
The ID of the model based on which the current item (chunk) is trained.
- data_stage_id: str
The ID of the data stage.
- sample_pct: float
The cumulative percentage of the base dataset size used for training the model.
- training_row_count: int
The number of rows used to train a model.
- score: float
The validation score of the current model
- Attributes:
- project_id: string
The project ID.
- model_id: string
The model ID.
- user_id: string
The ID of the user who started incremental learning.
- featurelist_id: string
The ID of the featurelist the model is using.
- status: string
The status of incremental training. One of datarobot._experimental.models.enums.IncrementalLearningStatus.
- items: List[IncrementalLearningItemDoc]
An array of incremental learning items associated with the sequential order of chunks. See incremental item info in Notes for more details.
- sample_pct: float
The sample size, in percent (1 to 100), to use in training.
- training_row_count: int
The number of rows used to train a model.
- score: float
The validation score of the model.
- metric: str
The name of the scoring metric.
- early_stopping_rounds: int
The number of chunks in which no improvement is observed that triggers the early stopping mechanism.
- total_number_of_chunks: int
The total number of chunks.
- model_number: int
The number of the model in the project.
- class datarobot._experimental.models.chunking_service.DatasetChunkDefinition(id, user_id, name, project_starter_chunk_size, user_chunk_size, datasource_definition_id=None, chunking_type=None)
Dataset chunking definition that holds information about how to chunk the dataset.
- Attributes:
- id: str
The ID of the dataset chunk definition.
- user_id: str
The ID of the user who created the definition.
- name: str
The name of the dataset chunk definition.
- project_starter_chunk_size: int
The size, in bytes, of the project starter chunk.
- user_chunk_size: int
Chunk size in bytes.
- datasource_definition_id: str
The data source definition ID associated with the dataset chunk definition.
- chunking_type: ChunkingType
The type of chunk creation from the dataset. All possible chunking types can be found under the ChunkingType enum, which can be imported from datarobot._experimental.models.enums. Types include:
INCREMENTAL_LEARNING for non-time-aware projects that use a chunk index to create chunks.
INCREMENTAL_LEARNING_OTV for OTV projects that use a chunk index to create chunks.
SLICED_OFFSET_LIMIT for any dataset in which the user provides an offset and limit to create chunks.
SLICED_OFFSET_LIMIT has no index-based chunks, i.e. the create_by_index() method is not supported.
- classmethod get(dataset_chunk_definition_id)
Retrieve a specific dataset chunk definition metadata.
- Parameters:
- dataset_chunk_definition_id: str
The ID of the dataset chunk definition.
- Returns:
- dataset_chunk_definition: DatasetChunkDefinition
The queried instance.
- Return type:
DatasetChunkDefinition
- classmethod list(limit=50, offset=0)
Retrieves a list of dataset chunk definitions
- Parameters:
- limit: int
The maximum number of objects to return. Default is 50.
- offset: int
The starting offset of the results. Default is 0.
- Returns:
- dataset_chunk_definitions: List[DatasetChunkDefinition]
The list of dataset chunk definitions.
- Return type:
List[DatasetChunkDefinition]
- classmethod create(name, project_starter_chunk_size, user_chunk_size, datasource_info, chunking_type=ChunkingType.INCREMENTAL_LEARNING)
Create a dataset chunk definition. Required for both index-based and custom chunks.
In order to create a dataset chunk definition, you must first:
Create a data connection to the target data source via dr.DataStore.create()
Create credentials that must be attached to the data connection via dr.Credential.create()
If you have an existing data connection and credentials:
Retrieve the data store ID by the canonical name via: [ds for ds in dr.DataStore.list() if ds.canonical_name == <name>][0].id
Retrieve the credential ID by the name via: [cr for cr in dr.Credential.list() if cr.name == <name>][0].id
You must create the required ‘datasource_info’ object with the data source information that corresponds to your use case:
DatasourceAICatalogInfo for AI Catalog datasets.
DatasourceDataWarehouseInfo for Snowflake, BigQuery, or other data warehouses.
- Parameters:
- name: str
The name of the dataset chunk definition.
- project_starter_chunk_size: int
The size, in bytes, of the first chunk. Used to start a DataRobot project.
- user_chunk_size: int
The size, in bytes, of the user-defined incremental chunk.
- datasource_info: Union[DatasourceDataWarehouseInfo, DatasourceAICatalogInfo]
The object that contains the information of the data source.
- chunking_type: ChunkingType
The type of chunk creation from the dataset. All possible chunking types can be found under the ChunkingType enum, which can be imported from datarobot._experimental.models.enums. Types include:
INCREMENTAL_LEARNING for non-time-aware projects that use a chunk index to create chunks.
INCREMENTAL_LEARNING_OTV for OTV projects that use a chunk index to create chunks.
SLICED_OFFSET_LIMIT for any dataset in which the user provides an offset and limit to create chunks.
SLICED_OFFSET_LIMIT has no index-based chunks, i.e. the create_by_index() method is not supported. The default type is ChunkingType.INCREMENTAL_LEARNING.
- Returns:
- dataset_chunk_definition: DatasetChunkDefinition
An instance of a created dataset chunk definition.
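An illustrative sketch of the creation flow outlined above, using an AI Catalog dataset; all IDs, names, and sizes are placeholders:
```python
from datarobot._experimental.models.chunking_service import (
    DatasetChunkDefinition,
    DatasourceAICatalogInfo,
)

# Describe the AI Catalog dataset the chunks will be created from.
datasource_info = DatasourceAICatalogInfo(
    catalog_id='<catalog-id>',
    catalog_version_id='<catalog-version-id>',
    order_by_columns=['date'],
)

dataset_chunk_definition = DatasetChunkDefinition.create(
    name='my-chunk-definition',
    project_starter_chunk_size=100 * 1024 * 1024,  # ~100 MB starter chunk
    user_chunk_size=50 * 1024 * 1024,              # ~50 MB incremental chunks
    datasource_info=datasource_info,
)
```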
- classmethod get_datasource_definition(dataset_chunk_definition_id)
Retrieves the data source definition associated with a dataset chunk definition.
- Parameters:
- dataset_chunk_definition_id: str
The ID of the dataset chunk definition.
- Returns:
- datasource_definition: DatasourceDefinition
An instance of the created data source definition.
- Return type:
DatasourceDefinition
- classmethod get_chunk(dataset_chunk_definition_id, chunk_id)
Retrieves a specific data chunk associated with a dataset chunk definition
- Parameters:
- dataset_chunk_definition_id: str
The ID of the dataset chunk definition.
- chunk_id: str
The ID of the chunk.
- Returns:
- chunk: Chunk
An instance of the created chunk.
- Return type:
Chunk
- classmethod list_chunks(dataset_chunk_definition_id)
Retrieves all data chunks associated with a dataset chunk definition
- Parameters:
- dataset_chunk_definition_id: str
The ID of the dataset chunk definition.
- Returns:
- chunks: List[Chunk]
A list of chunks.
- Return type:
List[Chunk]
- analyze_dataset(max_wait_time=600)
Analyzes the data source to retrieve and compute metadata about the dataset.
Depending on the size of the dataset, adding order_by_columns to the dataset chunking definition will increase the execution time to create the data chunk. Set max_wait_time to an appropriate wait time.
- Parameters:
- max_wait_time: int
Maximum time, in seconds, to wait for completion.
- Returns:
- datasource_definition: DatasourceDefinition
An instance of the created data source definition.
- Return type:
DatasourceDefinition
- create_chunk(limit, offset=0, storage_type=ChunkStorageType.DATASTAGE, max_wait_time=600)
Creates a data chunk using the limit and offset. By default, the data chunk is stored in data stages.
Depending on the size of the dataset, adding order_by_columns to the dataset chunking definition will increase the execution time to retrieve or create the data chunk. Set max_wait_time to an appropriate wait time.
- Parameters:
- limit: int
The maximum number of rows.
- offset: int
The offset into the dataset (where reading begins).
- storage_type: ChunkStorageType
The storage location of the chunk.
- max_wait_time: int
Maximum time, in seconds, to wait for completion.
- Returns:
- chunk: Chunk
An instance of a created or updated chunk.
- Return type:
Chunk
- create_chunk_by_index(index, storage_type=ChunkStorageType.DATASTAGE, max_wait_time=600)
Creates a data chunk using the chunk index. By default, the data chunk is stored in data stages.
Depending on the size of the dataset, adding order_by_columns to the dataset chunking definition will increase the execution time to retrieve or create the data chunk. Set max_wait_time to an appropriate wait time.
- Parameters:
- index: int
The index of the chunk.
- storage_type: ChunkStorageType
The storage location of the chunk.
- max_wait_time: int
Maximum time, in seconds, to wait for completion.
- Returns:
- chunk: Chunk
An instance of a created or updated chunk.
- Return type:
Chunk
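A sketch of producing chunks from an existing definition (illustrative; the ID is a placeholder). analyze_dataset is run first so the dataset metadata needed for chunking is available:
```python
from datarobot._experimental.models.chunking_service import DatasetChunkDefinition

dataset_chunk_definition = DatasetChunkDefinition.get('<dataset-chunk-definition-id>')

# Compute dataset metadata before requesting chunks.
dataset_chunk_definition.analyze_dataset(max_wait_time=600)

# Create a chunk by its index, or explicitly by offset and limit.
chunk = dataset_chunk_definition.create_chunk_by_index(index=1)
chunk = dataset_chunk_definition.create_chunk(limit=100000, offset=100000)
```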
- classmethod patch_validation_dates(dataset_chunk_definition_id, validation_start_date, validation_end_date)
Updates the data source definition validation dates associated with a dataset chunk definition. In order to set the validation dates appropriately, both start and end dates should be specified. This method can only be used for INCREMENTAL_LEARNING_OTV dataset chunk definitions and its associated datasource definition.
- Parameters:
- dataset_chunk_definition_id: str
The ID of the dataset chunk definition.
- validation_start_date: datetime
The start date of the validation scoring data. Internally converted to the format ‘%Y-%m-%d %H:%M:%S’; the timezone defaults to UTC.
- validation_end_date: datetime
The end date of the validation scoring data. Internally converted to the format ‘%Y-%m-%d %H:%M:%S’; the timezone defaults to UTC.
- Returns:
- datasource_definition: DatasourceDefinition
An instance of the created data source definition.
- Return type:
DatasourceDefinition
- class datarobot._experimental.models.chunking_service.DatasourceAICatalogInfo(catalog_version_id, catalog_id=None, table=None, name=None, order_by_columns=None, is_descending_order=False, select_columns=None, datetime_partition_column=None, validation_pct=None, validation_limit_pct=None, validation_start_date=None, validation_end_date=None, training_end_date=None, latest_timestamp=None, earliest_timestamp=None)
AI Catalog data source information used at creation time with dataset chunk definition.
- Attributes:
- name: str
The optional custom name of the data source.
- table: str
The data source table name or AI Catalog dataset name.
- storage_origin: str
The origin data source, always AI Catalog type.
- catalog_id: str
The ID of the AI Catalog dataset.
- catalog_version_id: str
The ID of the AI Catalog dataset version.
- order_by_columns: List[str]
A list of columns used to sort the dataset.
- is_descending_order: bool
Orders the direction of the data. Defaults to False, ordering from smallest to largest.
- select_columns: List[str]
A list of columns to select from the dataset.
- datetime_partition_column: str
The datetime partition column name used in OTV projects.
- validation_pct: float
The percentage threshold between 0.1 and 1.0 for the first chunk validation.
- validation_limit_pct: float
The percentage threshold between 0.1 and 1.0 for the validation kept.
- validation_start_date: datetime
The start date for validation.
- validation_end_date: datetime
The end date for validation.
- training_end_date: datetime
The end date for training.
- latest_timestamp: datetime
The latest timestamp.
- earliest_timestamp: datetime
The earliest timestamp.
- class datarobot._experimental.models.chunking_service.DatasourceDataWarehouseInfo(data_store_id, credentials_id, table, storage_origin, order_by_columns, is_descending_order=False, schema=None, catalog=None, name=None, data_source_id=None, select_columns=None, datetime_partition_column=None, validation_pct=None, validation_limit_pct=None, validation_start_date=None, validation_end_date=None, training_end_date=None, latest_timestamp=None, earliest_timestamp=None)
Data source information used at creation time with dataset chunk definition. Data warehouses supported: Snowflake, BigQuery, Databricks
- Attributes:
- name: str
The optional custom name of the data source.
- table: str
The data source table name or AI Catalog dataset name.
- storage_origin: str
The origin data source or data warehouse (e.g., Snowflake, BigQuery).
- data_store_id: str
The ID of the data store.
- credentials_id: str
The ID of the credentials.
- schema: str
The name of the schema.
- catalog: str
The database or catalog name.
- data_source_id: str
The ID of the data request used to generate sampling and metadata.
- order_by_columns: List[str]
A list of columns used to sort the dataset.
- is_descending_order: bool
Orders the direction of the data. Defaults to False, ordering from smallest to largest.
- select_columns: List[str]
A list of columns to select from the dataset.
- datetime_partition_column: str
The datetime partition column name used in OTV projects.
- validation_pct: float
The percentage threshold between 0.1 and 1.0 for the first chunk validation.
- validation_limit_pct: float
The percentage threshold between 0.1 and 1.0 for the validation kept.
- validation_start_date: datetime
The start date for validation.
- validation_end_date: datetime
The end date for validation.
- training_end_date: datetime
The end date for training.
- latest_timestamp: datetime
The latest timestamp.
- earliest_timestamp: datetime
The earliest timestamp.
- class datarobot._experimental.models.chunking_service.DatasourceDefinition(id, storage_origin, order_by_columns=None, is_descending_order=False, table=None, data_store_id=None, credentials_id=None, schema=None, catalog=None, name=None, data_source_id=None, total_rows=None, source_size=None, estimated_size_per_row=None, columns=None, catalog_id=None, catalog_version_id=None, select_columns=None, datetime_partition_column=None, validation_pct=None, validation_limit_pct=None, validation_start_date=None, validation_end_date=None, training_end_date=None, latest_timestamp=None, earliest_timestamp=None)
Data source definition that holds information about the data source for API responses. Do not use this to create DatasourceDefinition objects directly; use DatasourceAICatalogInfo or DatasourceDataWarehouseInfo instead.
- Attributes:
- id: str
The ID of the data source definition.
- data_store_id: str
The ID of the data store.
- credentials_id: str
The ID of the credentials.
- table: str
The data source table name.
- schema: str
The name of the schema.
- catalog: str
The database or catalog name.
- storage_origin: str
The origin data source or data warehouse (e.g., Snowflake, BigQuery).
- data_source_id: str
The ID of the data request used to generate sampling and metadata.
- total_rows: str
The total number of rows in the dataset.
- source_size: str
The size of the dataset.
- estimated_size_per_row: str
The estimated size per row.
- columns: str
The list of column names in the dataset.
- order_by_columns: List[str]
A list of columns used to sort the dataset.
- is_descending_order: bool
Orders the direction of the data. Defaults to False, ordering from smallest to largest.
- select_columns: List[str]
A list of columns to select from the dataset.
- datetime_partition_column: str
The datetime partition column name used in OTV projects.
- validation_pct: float
The percentage threshold between 0.1 and 1.0 for the first chunk validation.
- validation_limit_pct: float
The percentage threshold between 0.1 and 1.0 for the validation kept.
- validation_start_date: datetime
The start date for validation.
- validation_end_date: datetime
The end date for validation.
- training_end_date: datetime
The end date for training.
- latest_timestamp: datetime
The latest timestamp.
- earliest_timestamp: datetime
The earliest timestamp.
- class datarobot._experimental.models.chunking_service.Chunk(id, chunk_definition_id, limit, offset, chunk_index=None, data_source_id=None, chunk_storage=None)
Data chunk object that holds metadata about a chunk.
- Attributes:
- id: str
The ID of the chunk entity.
- chunk_definition_id: str
The ID of the dataset chunk definition the chunk belongs to.
- limit: int
The number of rows in the chunk.
- offset: int
The offset in the dataset to create the chunk.
- chunk_index: str
The index of the chunk if chunks are divided uniformly. Otherwise, it is None.
- data_source_id: str
The ID of the data request used to create the chunk.
- chunk_storage: ChunkStorage
A list of storage locations where the chunk is stored.
- get_chunk_storage_id(storage_type)
Get storage location ID for the chunk.
- Parameters:
- storage_type: ChunkStorageType
The storage type where the chunk is stored.
- Returns:
- storage_reference_id: str
An ID that references the storage location for the chunk.
- Return type:
Optional[str]
- get_chunk_storage_version_id(storage_type)
Get storage version ID for the chunk.
- Parameters:
- storage_type: ChunkStorageType
The storage type where the chunk is stored.
- Returns:
- storage_reference_id: str
A catalog version ID associated with the AI Catalog dataset ID.
- Return type:
Optional[str]
- class datarobot._experimental.models.chunking_service.ChunkStorageType(value, names=None, *, module=None, qualname=None, type=None, start=1, boundary=None)
Supported chunk storage.
- class datarobot._experimental.models.chunking_service.OriginStorageType(value, names=None, *, module=None, qualname=None, type=None, start=1, boundary=None)
Supported data sources.
- class datarobot._experimental.models.chunking_service.ChunkStorage(storage_reference_id, chunk_storage_type, version_id=None)
The chunk storage location for the data chunks.
- Attributes:
- storage_reference_id: str
The ID of the storage entity.
- chunk_storage_type: str
The type of the chunk storage.
- version_id: str
The catalog version ID. This will only be used if the storage type is “AI Catalog”.
- class datarobot._experimental.models.chunking_service_v2.DatasetDefinition(id, creator_user_id, dataset_props, dynamic_dataset_props=None, dataset_info=None, name=None)
Dataset definition that holds information about the dataset for API responses.
- Attributes:
- id: str
The ID of the dataset definition.
- creator_user_id: str
The ID of the user.
- dataset_props: DatasetProps
The properties of the dataset in the catalog.
- dynamic_dataset_props: DynamicDatasetProps
The properties of the dynamic dataset.
- dataset_info: DatasetInfo
The information about the dataset.
- name: str
The optional custom name of the dataset definition.
- classmethod from_data(data)
Properly convert composition classes.
- Return type:
- classmethod create(dataset_id, dataset_version_id, name=None)
Create a dataset definition.
In order to create a dataset definition, you must first have an existing dataset in the Data Registry. A dataset can be uploaded using dr.Dataset.create_from_file, for example, if you have a local file.
If you have an existing dataset in the Data Registry:
Retrieve the dataset ID by its name via: [cr for cr in dr.Dataset.list() if cr.name == <name>][0].id
Retrieve the dataset version ID by its name via: [cr for cr in dr.Dataset.list() if cr.name == <name>][0].version_id
- Parameters:
- dataset_id: str
The ID of the AI Catalog dataset.
- dataset_version_id: str
The optional ID of the AI Catalog dataset version.
- name: str
The optional custom name of the dataset definition.
- Returns:
- dataset_definition: DatasetDefinition
An instance of a created dataset definition.
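A sketch of the flow above (illustrative), assuming a dataset named 'my-dataset' already exists in the Data Registry:
```python
import datarobot as dr
from datarobot._experimental.models.chunking_service_v2 import DatasetDefinition

# Look up the existing Data Registry dataset by name.
dataset = [ds for ds in dr.Dataset.list() if ds.name == 'my-dataset'][0]

dataset_definition = DatasetDefinition.create(
    dataset_id=dataset.id,
    dataset_version_id=dataset.version_id,
    name='my-dataset-definition',
)
```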
- classmethod get(dataset_definition_id)
Retrieve a specific dataset definition metadata.
- Parameters:
- dataset_definition_id: str
The ID of the dataset definition.
- Returns:
- dataset_definition: DatasetDefinition
The queried instance.
- Return type:
DatasetDefinition
- classmethod delete(dataset_definition_id)
Delete a specific dataset definition
- Parameters:
- dataset_definition_id: str
The ID of the dataset definition.
- Return type:
None
- classmethod list()
List all dataset definitions.
- Returns:
- A list of DatasetDefinition
- Return type:
List[DatasetDefinition]
- classmethod analyze(dataset_definition_id, max_wait=600)
Analyze a specific dataset definition
- Parameters:
- dataset_definition_id: str
The ID of the dataset definition.
- max_wait: int, optional
Time in seconds after which analyze is considered unsuccessful
- Return type:
None
- class datarobot._experimental.models.chunking_service_v2.ChunkDefinitionStats(expected_chunk_size, number_of_rows_per_chunk, total_number_of_chunks)
The chunk stats information.
- Attributes:
- expected_chunk_size: int
The expected chunk size; this field is auto-generated.
- number_of_rows_per_chunk: int
The number of rows per chunk; this field is auto-generated.
- total_number_of_chunks: int
The total number of chunks; this field is auto-generated.
- class datarobot._experimental.models.chunking_service_v2.FeaturesChunkDefinition
The features chunk information.
- class datarobot._experimental.models.chunking_service_v2.RowsChunkDefinition(order_by_columns, is_descending_order=False, target_column=None, target_class=None, user_group_column=None, datetime_partition_column=None, otv_validation_start_date=None, otv_validation_end_date=None, otv_training_end_date=None, otv_latest_timestamp=None, otv_earliest_timestamp=None, otv_validation_downsampling_pct=None)
The rows chunk information.
- Attributes:
- order_by_columns: List[str]
List of the sorting column names.
- is_descending_order: bool
The sorting order. Defaults to False, ordering from smallest to largest.
- target_column: str
The target column.
- target_class: str
For binary target, one of the possible values. For zero inflated, will be ‘0’.
- user_group_column: str
The user group column.
- datetime_partition_column: str
The datetime partition column name used in OTV projects.
- otv_validation_start_date: datetime
The start date for the validation set.
- otv_validation_end_date: datetime
The end date for the validation set.
- otv_training_end_date: datetime
The end date for the training set.
- otv_latest_timestamp: datetime
The latest timestamp; this field is auto-generated.
- otv_earliest_timestamp: datetime
The earliest timestamp; this field is auto-generated.
- otv_validation_downsampling_pct: float
The percentage of the validation set to downsample; this field is auto-generated.
- class datarobot._experimental.models.chunking_service_v2.DynamicDatasetProps(credentials_id)
The dataset props for a dynamic dataset.
- Attributes:
- credentials_id: str
The ID of the credentials.
- class datarobot._experimental.models.chunking_service_v2.DatasetInfo(total_rows, source_size, estimated_size_per_row, columns, dialect, data_store_id=None, data_source_id=None)
The dataset information.
- Attributes:
- total_rows: str
The total number of rows in the dataset.
- source_size: str
The size of the dataset.
- estimated_size_per_row: str
The estimated size per row.
- columns: str
The list of column names in the dataset.
- dialect: str
The SQL dialect associated with the dataset (e.g., Snowflake, BigQuery, Spark).
- data_store_id: str
The ID of the data store.
- data_source_id: str
The ID of the data request used to generate sampling and metadata.
- class datarobot._experimental.models.chunking_service_v2.DatasetProps(dataset_id, dataset_version_id)
The dataset props for a catalog dataset.
- Attributes:
- dataset_id: str
The ID of the AI Catalog dataset.
- dataset_version_id: str
The ID of the AI Catalog dataset version.
- class datarobot._experimental.models.chunking_service_v2.ChunkDefinition(id, dataset_definition_id, name, is_readonly, partition_method, chunking_strategy_type, chunk_definition_stats=None, rows_chunk_definition=None, features_chunk_definition=None)
The chunk information.
- Attributes:
- id: str
The ID of the chunk entity.
- dataset_definition_id: str
The ID of the dataset definition.
- name: str
The name of the chunk entity.
- is_readonly: bool
The read-only flag.
- partition_method: str
The partition method used to create chunks, either ‘random’, ‘stratified’, or ‘date’.
- chunking_strategy_type: str
The chunking strategy type, either ‘features’ or ‘rows’.
- chunk_definition_stats: ChunkDefinitionStats
The chunk stats information.
- rows_chunk_definition: RowsChunkDefinition
The rows chunk information.
- features_chunk_definition: FeaturesChunkDefinition
The features chunk information.
- classmethod from_data(data)
Properly convert composition classes.
- Return type:
- classmethod create(dataset_definition_id, name=None, partition_method=ChunkingPartitionMethod.RANDOM, chunking_strategy_type=ChunkingStrategy.ROWS, order_by_columns=None, is_descending_order=False, target_column=None, target_class=None, user_group_column=None, datetime_partition_column=None, otv_validation_start_date=None, otv_validation_end_date=None, otv_training_end_date=None)
Create a chunk definition.
- Parameters:
- dataset_definition_id: str
The ID of the dataset definition.
- name: str
The optional custom name of the chunk definition.
- partition_method: str
The partition method used to create chunks, either ‘random’, ‘stratified’, or ‘date’.
- chunking_strategy_type: str
The chunking strategy type, either ‘features’ or ‘rows’.
- order_by_columns: List[str]
List of the sorting column names.
- is_descending_order: bool
The sorting order. Defaults to False, ordering from smallest to largest.
- target_column: str
The target column.
- target_class: str
For binary target, one of the possible values. For zero inflated, will be ‘0’.
- user_group_column: str
The user group column.
- datetime_partition_column: str
The datetime partition column name used in OTV projects.
- otv_validation_start_date: datetime
The start date for the validation set.
- otv_validation_end_date: datetime
The end date for the validation set.
- otv_training_end_date: datetime
The end date for the training set.
- Returns:
- chunk_definition: ChunkDefinition
An instance of a created chunk definition.
- Return type:
ChunkDefinition
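An illustrative sketch of creating a rows-based chunk definition for an existing dataset definition and then analyzing it; the ID and column names are placeholders:
```python
from datarobot._experimental.models.chunking_service_v2 import ChunkDefinition

chunk_definition = ChunkDefinition.create(
    dataset_definition_id='<dataset-definition-id>',
    name='my-chunk-definition',
    order_by_columns=['date'],
    target_column='target',
)

# Populate auto-generated fields such as the chunk statistics.
ChunkDefinition.analyze(
    dataset_definition_id='<dataset-definition-id>',
    chunk_definition_id=chunk_definition.id,
)
```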
- classmethod get(dataset_definition_id, chunk_definition_id)
Retrieve a specific chunk definition metadata.
- Parameters:
- dataset_definition_id: str
The ID of the dataset definition.
- chunk_definition_id: str
The ID of the chunk definition.
- Returns:
- chunk_definition: ChunkDefinition
The queried instance.
- Return type:
ChunkDefinition
- classmethod delete(dataset_definition_id, chunk_definition_id)
Delete a specific chunk definition
- Parameters:
- dataset_definition_id: str
The ID of the dataset definition.
- chunk_definition_id: str
The ID of the chunk definition.
- Return type:
None
- classmethod list(dataset_definition_id)
List all chunk definitions.
- Parameters:
- dataset_definition_id: str
The ID of the dataset definition.
- Returns:
- A list of ChunkDefinition
- Return type:
List[ChunkDefinition]
- classmethod analyze(dataset_definition_id, chunk_definition_id, max_wait=600)
Analyze a specific chunk definition
- Parameters:
- dataset_definition_id: str
The ID of the dataset definition.
- chunk_definition_id: str
The ID of the chunk definition
- max_wait: int, optional
Time in seconds after which analyze is considered unsuccessful
- Return type:
None
- classmethod update(chunk_definition_id, dataset_definition_id, name=None, order_by_columns=None, is_descending_order=None, target_column=None, target_class=None, user_group_column=None, datetime_partition_column=None, otv_validation_start_date=None, otv_validation_end_date=None, otv_training_end_date=None, force_update=False)
Update a chunk definition.
- Parameters:
- chunk_definition_id: str
The ID of the chunk definition.
- dataset_definition_id: str
The ID of the dataset definition.
- name: str
The optional custom name of the chunk definition.
- order_by_columns: List[str]
List of the sorting column names.
- is_descending_order: bool
The sorting order. Defaults to False, ordering from smallest to largest.
- target_column: str
The target column.
- target_class: str
For binary target, one of the possible values. For zero inflated, will be ‘0’.
- user_group_column: str
The user group column.
- datetime_partition_column: str
The datetime partition column name used in OTV projects.
- otv_validation_start_date: datetime
The start date for the validation set.
- otv_validation_end_date: datetime
The end date for the validation set.
- otv_training_end_date: datetime
The end date for the training set.
- force_update: bool
If True, the update will be forced in some cases. For example, update after analysis is done.
- Returns:
- chunk_definition: ChunkDefinition
An updated instance of the chunk definition.
- Return type:
ChunkDefinition
:::