Segmented modeling
API reference for the entities used in segmented modeling. See the dedicated user guide for examples.
- class datarobot.CombinedModel
A model from a segmented project: a combination of ordinary models from the child segment projects.
- Variables:
id (str) – The id of the model.
project_id (str) – The id of the project the model belongs to.
segmentation_task_id (str) – The id of the segmentation task used in this model.
is_active_combined_model (bool) – Whether this is the active combined model in the segmented project.
- classmethod get(project_id, combined_model_id)
Retrieve a combined model.
- Parameters:
project_id (str) – The project’s id.
combined_model_id (str) – The id of the combined model.
- Returns:
The queried combined model.
- Return type:
CombinedModel
- classmethod set_segment_champion(project_id, model_id, clone=False)
Update a segment champion in a combined model by setting the model_id that belongs to the child project_id as the champion.
- Parameters:
project_id (str) – The id of the child project that contains the model.
model_id (str) – The id of the model to mark as the champion.
clone (bool) – (New in version v2.29) Optional, defaults to False. Whether to clone the combined model before setting the champion. If True, the champion is set on the new, cloned combined model.
- Returns:
combined_model_id – The id of the combined model that was updated.
- Return type:
str
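Setting champions for several segments means repeating this call once per child project. The sketch below assumes you already know which model id should win in each child project (for example from the segment information); promote_champions is a hypothetical helper, and the setter is passed in as an argument so the loop can be tried without a DataRobot connection.

```python
from typing import Callable, Dict, List


# Hypothetical helper, not part of the datarobot client.
def promote_champions(
    set_champion: Callable[..., str],
    champions: Dict[str, str],
) -> List[str]:
    """Mark one model per child project as that segment's champion.

    `champions` maps a child project id to the id of the model in that
    project that should become the champion. `set_champion` is assumed to
    behave like CombinedModel.set_segment_champion and to return the id of
    the updated combined model.
    """
    updated_ids = []
    for project_id, model_id in champions.items():
        # clone=False sets the champion on the existing combined model;
        # clone=True would clone the combined model before each call.
        combined_model_id = set_champion(
            project_id=project_id, model_id=model_id, clone=False
        )
        updated_ids.append(combined_model_id)
    return updated_ids
```

With the client installed, the call might look like promote_champions(datarobot.CombinedModel.set_segment_champion, {child_project_id: model_id}), where both ids are placeholders for your own values.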
- get_segments_info()
Retrieve information about the combined model’s segments.
- Returns:
A list of SegmentInfo objects, one per segment.
- Return type:
list[SegmentInfo]
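Each SegmentInfo pairs a segment name with the id of its champion model, so the list is easy to turn into a lookup table. A minimal sketch, assuming combined_model is a CombinedModel returned by CombinedModel.get; champions_by_segment is a hypothetical helper name.

```python
from typing import Any, Dict, Iterable, Optional


# Hypothetical helper, not part of the datarobot client.
def champions_by_segment(segments: Iterable[Any]) -> Dict[str, Optional[str]]:
    """Map each segment name to the id of its champion model.

    Each item is expected to expose the SegmentInfo attributes
    `segment` and `model_id`.
    """
    return {info.segment: info.model_id for info in segments}
```

For example: champions_by_segment(combined_model.get_segments_info()).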
- get_segments_as_dataframe(encoding='utf-8')
Retrieve the combined model’s segments as a DataFrame.
- Parameters:
encoding (Optional[str]) – A string representing the encoding to use. Defaults to ‘utf-8’.
- Returns:
Combined model segments
- Return type:
DataFrame
- get_segments_as_csv(filename, encoding='utf-8')
Save the combined model’s segments to a CSV file.
- Parameters:
filename (str or file object) – The path or file object to save the data to.
encoding (Optional[str]) – A string representing the encoding to use in the output CSV file. Defaults to ‘utf-8’.
- Return type:
None
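Because get_segments_as_csv accepts a path, the export can be round-tripped through a temporary file and parsed with the standard csv module when pandas is not wanted. A sketch, assuming combined_model is a CombinedModel; read_segments_csv is a hypothetical helper, and the keys of each returned row are simply the header of the exported file, which this reference does not list.

```python
import csv
import os
import tempfile
from typing import Any, Dict, List


# Hypothetical helper, not part of the datarobot client.
def read_segments_csv(combined_model: Any, encoding: str = "utf-8") -> List[Dict[str, str]]:
    """Export the segments to a temporary CSV file and return its rows.

    Each row is a dict keyed by the CSV header. The temporary directory is
    removed once the rows have been read into memory.
    """
    with tempfile.TemporaryDirectory() as tmp_dir:
        path = os.path.join(tmp_dir, "segments.csv")
        combined_model.get_segments_as_csv(path, encoding=encoding)
        # Read back with the same encoding that was used to write the file
        with open(path, newline="", encoding=encoding) as handle:
            return list(csv.DictReader(handle))
```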
- train(sample_pct=None, featurelist_id=None, scoring_type=None, training_row_count=None, monotonic_increasing_featurelist_id=<object object>, monotonic_decreasing_featurelist_id=<object object>)
Inherited from Model. Combined models cannot be retrained directly.
- Return type:
NoReturn
- train_datetime(featurelist_id=None, training_row_count=None, training_duration=None, time_window_sample_pct=None, monotonic_increasing_featurelist_id=<object object>, monotonic_decreasing_featurelist_id=<object object>, use_project_settings=False, sampling_method=None, n_clusters=None)
Inherited from Model. Combined models cannot be retrained directly.
- Return type:
NoReturn
- retrain(sample_pct=None, featurelist_id=None, training_row_count=None, n_clusters=None)
Inherited from Model. Combined models cannot be retrained directly.
- Return type:
NoReturn
- request_frozen_model(sample_pct=None, training_row_count=None)
Inherited from Model. Combined models cannot be retrained as frozen models.
- Return type:
NoReturn
- request_frozen_datetime_model(training_row_count=None, training_duration=None, training_start_date=None, training_end_date=None, time_window_sample_pct=None, sampling_method=None)
Inherited from Model. Combined models cannot be retrained as frozen models.
- Return type:
NoReturn
- cross_validate()
Inherited from Model. Cross-validation cannot be requested for combined models.
- Return type:
NoReturn
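Since a combined model cannot be retrained as a whole, retraining has to happen model by model in the child projects. One possible approach, sketched under assumptions: take each segment’s child project_id and champion model_id from get_segments_info and call train on that model. get_model is assumed to behave like datarobot.Model.get(project, model_id), and the returned model’s train is assumed to return a model job id. Note that train creates a new model in the child project rather than modifying the combined model, and datetime-partitioned (time series) child projects would need train_datetime instead. The helper name is hypothetical.

```python
from typing import Any, Callable, Dict, Iterable


# Hypothetical helper, not part of the datarobot client.
def retrain_segment_champions(
    segments: Iterable[Any],
    get_model: Callable[[str, str], Any],
    **train_kwargs: Any,
) -> Dict[str, str]:
    """Request retraining of each segment champion inside its child project.

    `segments` holds SegmentInfo-like objects exposing `segment`,
    `project_id` and `model_id`. Returns {segment name: model job id}.
    """
    job_ids: Dict[str, str] = {}
    for info in segments:
        if info.model_id is None:
            continue  # no champion model id reported for this segment
        model = get_model(info.project_id, info.model_id)
        # Assumed to start a new training job in the child project
        job_ids[info.segment] = model.train(**train_kwargs)
    return job_ids
```

For example: retrain_segment_champions(combined_model.get_segments_info(), datarobot.Model.get, sample_pct=80), where the sample percentage is only an illustrative value.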
- class datarobot.SegmentationTask
A segmentation task segments an existing project into multiple child projects. Each child project (or segment) is a separate Autopilot run. Currently, only user-defined segmentation is supported.
Example of creating a new SegmentationTask for time series segmentation with a user-defined segment id column:
```python
from datarobot import SegmentationTask

# Assumes `project` is an existing datarobot.Project and that `target`,
# `datetime_partition_column`, `multiseries_id_column` and
# `user_defined_segment_id_column` hold column names from its dataset.

# Create the SegmentationTask
segmentation_task_results = SegmentationTask.create(
    project_id=project.id,
    target=target,
    use_time_series=True,
    datetime_partition_column=datetime_partition_column,
    multiseries_id_columns=[multiseries_id_column],
    user_defined_segment_id_columns=[user_defined_segment_id_column],
)

# Retrieve the completed SegmentationTask object from the job results
segmentation_task = segmentation_task_results['completedJobs'][0]
```
- Variables:
id (bson.ObjectId) – The id of the segmentation task.
project_id (bson.ObjectId) – The associated id of the parent project.
type (str) – The type of job the segmentation task is associated with, e.g. auto_ml or auto_ts.
created (datetime.datetime) – The date this segmentation task was created.
segments_count (int) – The number of segments the segmentation task generated.
segments (list[str]) – The segment names that the segmentation task generated.
metadata (dict) – Features that help identify the parameters used by the segmentation task.
data (dict) – Optional parameters associated with the enabled metadata for the segmentation task.
- classmethod from_data(data)
Instantiate an object of this class using a dict.
- Parameters:
data (dict) – Correctly snake_cased keys and their values.
- Return type:
SegmentationTask
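from_data expects snake_cased keys, while raw DataRobot API payloads typically use camelCase (compare the numberOfJobs and completedJobs keys returned by create). If you are holding such a payload, a small converter along these lines could prepare it. This is a hypothetical helper, not part of the client; it only rewrites top-level keys and splits consecutive capitals letter by letter.

```python
import re
from typing import Any, Dict

# Zero-width match before every uppercase letter that is not at the start
_CAMEL_BOUNDARY = re.compile(r"(?<!^)(?=[A-Z])")


# Hypothetical helper, not part of the datarobot client.
def snake_case_keys(payload: Dict[str, Any]) -> Dict[str, Any]:
    """Return a copy of `payload` with top-level camelCase keys snake_cased.

    Nested dictionaries and values are left unchanged.
    """
    return {
        _CAMEL_BOUNDARY.sub("_", key).lower(): value
        for key, value in payload.items()
    }
```

A possible use: SegmentationTask.from_data(snake_case_keys(raw_payload)), where raw_payload is a placeholder for your own dictionary.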
- collect_payload()
Convert the record to a dictionary.
- Return type:
Dict[str, str]
- classmethod create(project_id, target, use_time_series=False, datetime_partition_column=None, multiseries_id_columns=None, user_defined_segment_id_columns=None, max_wait=600, model_package_id=None)
Creates segmentation tasks for the project based on the defined parameters.
- Parameters:
project_id (str) – The associated id of the parent project.
target (str) – The column that represents the target in the dataset.
use_time_series (bool) – Whether to generate AutoTS segmentations (True) or AutoML segmentations (False).
datetime_partition_column (str or None) – Required for time series. The name of the column whose values, interpreted as dates, are used to assign a row to a particular partition.
multiseries_id_columns (List[str] or None) – Required for time series. A list of the names of multiseries id columns that define series within the training data. Currently, only one multiseries id column is supported.
user_defined_segment_id_columns (List[str] or None) – Required when using a column for segmentation. A list of the segment id columns used to manually segment the data. Currently, only one user-defined segment id column is supported.
model_package_id (str) – Required when using automated segmentation. The id of the model in the DataRobot Model Registry that will be used to perform automated segmentation on the dataset.
max_wait (int) – The maximum number of seconds to wait. Defaults to 600.
- Returns:
segmentation_tasks – Dictionary containing numberOfJobs, completedJobs, and failedJobs. completedJobs is a list of SegmentationTask objects, while failedJobs is a list of dictionaries describing problems with the submitted tasks.
- Return type:
dict
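Because create returns a plain dictionary, a caller typically checks failedJobs before reading completedJobs. A sketch of that pattern: completed_segmentation_tasks is a hypothetical helper, raising RuntimeError on any failure is a design choice rather than client behavior, and since the shape of each failed-job dictionary is not specified here it is only echoed in the error message.

```python
from typing import Any, Dict, List


# Hypothetical helper, not part of the datarobot client.
def completed_segmentation_tasks(results: Dict[str, Any]) -> List[Any]:
    """Return the SegmentationTask objects from SegmentationTask.create results.

    Raises RuntimeError if any submitted task failed, so that partial
    results are not used silently.
    """
    failed = results.get("failedJobs") or []
    if failed:
        raise RuntimeError(f"{len(failed)} segmentation task(s) failed: {failed!r}")
    return list(results.get("completedJobs") or [])
```

It could replace the direct indexing in the class example: segmentation_task = completed_segmentation_tasks(segmentation_task_results)[0].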
- classmethod list(project_id)
List all of the segmentation tasks that have been created for a specific project_id.
- Parameters:
project_id (str) – The id of the parent project.
- Returns:
segmentation_tasks – List of instances with initialized data.
- Return type:
list[SegmentationTask]
- classmethod get(project_id, segmentation_task_id)
Retrieve information for a single segmentation task associated with a project_id.
- Parameters:
project_id (str) – The id of the parent project.
segmentation_task_id (str) – The id of the segmentation task.
- Returns:
segmentation_task – Instance with initialized data.
- Return type:
SegmentationTask
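A project can accumulate several segmentation tasks, and list returns all of them. The documented type and created attributes are enough to choose one, for example the most recent auto_ts task. A sketch: latest_task_of_type is a hypothetical helper, and the type strings follow the auto_ml/auto_ts examples from the attribute list.

```python
from typing import Any, Iterable, Optional


# Hypothetical helper, not part of the datarobot client.
def latest_task_of_type(tasks: Iterable[Any], task_type: str = "auto_ts") -> Optional[Any]:
    """Return the most recently created task whose `type` matches, or None."""
    matching = [task for task in tasks if task.type == task_type]
    # `created` is compared directly, so the newest task wins
    return max(matching, key=lambda task: task.created, default=None)
```

For example: task = latest_task_of_type(SegmentationTask.list(project_id)), followed by SegmentationTask.get(project_id, task.id) if you want to re-fetch it.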
- class datarobot.SegmentInfo
A SegmentInfo object contains information about one segment of a combined model.
- Variables:
project_id (str) – The associated id of the child project.
segment (str) – The name of the segment.
project_stage (str) – A description of the current stage of the project.
project_status_error (str) – The project status error message.
autopilot_done (bool) – Whether Autopilot has finished for the project.
model_count (int) – The number of trained models in the project.
model_id (str) – The id of the segment champion model.
- classmethod list(project_id, model_id)
List all of the segments that have been created for a specific project_id.
- Parameters:
project_id (str) – The id of the parent project.
model_id (str) – The id of the combined model.
- Returns:
segments – List of instances with initialized data.
- Return type:
list[SegmentInfo]
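The status attributes on SegmentInfo make it easy to see which child projects still need attention. A sketch: segments_needing_attention is a hypothetical helper, and treating a non-empty project_status_error, or autopilot_done set to False, as "not ready" is an assumption about how you want to gate later steps.

```python
from typing import Any, Dict, Iterable


# Hypothetical helper, not part of the datarobot client.
def segments_needing_attention(segments: Iterable[Any]) -> Dict[str, str]:
    """Map segment name to a short reason for each segment that is not ready."""
    issues: Dict[str, str] = {}
    for info in segments:
        if info.project_status_error:
            issues[info.segment] = f"error: {info.project_status_error}"
        elif not info.autopilot_done:
            issues[info.segment] = f"autopilot not done (stage: {info.project_stage})"
    return issues
```

For example: segments_needing_attention(SegmentInfo.list(project_id, combined_model_id)), with both ids as placeholders.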
- class datarobot.models.segmentation.SegmentationTask
Documented identically to datarobot.SegmentationTask above; see that entry for its attributes, example, and methods.
- class datarobot.models.segmentation.SegmentationTaskCreatedResponse