Segmented modeling

API reference for the entities used in segmented modeling. See the dedicated User Guide for examples.

class datarobot.CombinedModel

A model from a segmented project: a combination of the ordinary models trained in the child segment projects.

Variables:

id (str) – the id of the model
project_id (str) – the id of the project the model belongs to
segmentation_task_id (str) – the id of a segmentation task used in this model
is_active_combined_model (bool) – flag indicating whether this is the active combined model in the segmented project

classmethod get(project_id, combined_model_id)

Retrieve combined model

Parameters:

project_id (str) – The project’s id.
combined_model_id (str) – Id of the combined model.

Returns:

The queried combined model.

Return type:

CombinedModel

classmethod set_segment_champion(project_id, model_id, clone=False)

Update a segment champion in a combined model by setting the model_id that belongs to the child project_id as the champion.

Parameters:

project_id (str) – The id of the child project that contains the model.
model_id (str) – Id of the model to mark as the champion
clone (bool) – (New in version v2.29) Optional, defaults to False. Whether to clone the combined model before setting the champion. If True, the champion is set on the new (cloned) combined model.

Returns:

combined_model_id – Id of the combined model that was updated

Return type:

str
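As a minimal sketch of calling set_segment_champion, the helper below forwards the documented arguments and returns the id it gets back. The helper name promote_champion is hypothetical, and the class is passed in as a parameter (rather than imported) only so the sketch can be exercised without a live DataRobot connection; in practice you would pass datarobot.CombinedModel.

```python
def promote_champion(combined_model_cls, child_project_id, model_id, clone=False):
    """Mark `model_id` as the champion of its segment.

    `combined_model_cls` is expected to expose the documented classmethod
    set_segment_champion(project_id, model_id, clone=False).
    Returns the id of the combined model that was updated (a str).
    """
    return combined_model_cls.set_segment_champion(
        project_id=child_project_id,  # id of the child project that owns the model
        model_id=model_id,
        clone=clone,  # True: the champion is set on a cloned combined model
    )
```

With clone=True, the returned id would refer to the cloned combined model, so callers that keep a reference to the combined model should store the returned value.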

get_segments_info()

Retrieve information about the segments of the combined model.

Returns: List of segments
Return type: list[SegmentInfo]
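One possible use of get_segments_info() is checking which segments are still running. The sketch below relies only on the segment and autopilot_done attributes documented for SegmentInfo; the helper name unfinished_segments is hypothetical, and combined_model is assumed to be any object exposing get_segments_info(), such as the result of CombinedModel.get.

```python
def unfinished_segments(combined_model):
    """Return the names of segments whose child project has not finished Autopilot."""
    return [
        info.segment
        for info in combined_model.get_segments_info()
        if not info.autopilot_done
    ]
```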

get_segments_as_dataframe(encoding='utf-8')

Retrieve the combined model's segments as a DataFrame.

Parameters: encoding (Optional[str]) – A string representing the encoding to use in the output csv file. Defaults to ‘utf-8’.
Returns: Combined model segments
Return type: DataFrame

get_segments_as_csv(filename, encoding='utf-8')

Save the combined model's segments to a CSV file.

Parameters:

filename (str or file object) – The path or file object to save the data to.
encoding (Optional[str]) – A string representing the encoding to use in the output csv file. Defaults to ‘utf-8’.

Return type:

None
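A small sketch of exporting the segments to a file on disk, using the path form of the filename argument. The helper name export_segments and the fixed file name segments.csv are illustrative assumptions; combined_model is assumed to be an object exposing get_segments_as_csv as documented above.

```python
from pathlib import Path


def export_segments(combined_model, directory, encoding="utf-8"):
    """Write the combined model's segments to <directory>/segments.csv.

    Returns the path of the written file. The method itself returns None,
    so the path is the only handle the caller gets back.
    """
    target = Path(directory) / "segments.csv"
    combined_model.get_segments_as_csv(str(target), encoding=encoding)
    return target
```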

train(sample_pct=None, featurelist_id=None, scoring_type=None, training_row_count=None, monotonic_increasing_featurelist_id=<object object>, monotonic_decreasing_featurelist_id=<object object>)

Inherited from Model - CombinedModels cannot be retrained directly

Return type: NoReturn

train_datetime(featurelist_id=None, training_row_count=None, training_duration=None, time_window_sample_pct=None, monotonic_increasing_featurelist_id=<object object>, monotonic_decreasing_featurelist_id=<object object>, use_project_settings=False, sampling_method=None, n_clusters=None)

Inherited from Model - CombinedModels cannot be retrained directly

Return type: NoReturn

retrain(sample_pct=None, featurelist_id=None, training_row_count=None, n_clusters=None)

Inherited from Model - CombinedModels cannot be retrained directly

Return type: NoReturn

request_frozen_model(sample_pct=None, training_row_count=None)

Inherited from Model - CombinedModels cannot be retrained as frozen

Return type: NoReturn

request_frozen_datetime_model(training_row_count=None, training_duration=None, training_start_date=None, training_end_date=None, time_window_sample_pct=None, sampling_method=None)

Inherited from Model - CombinedModels cannot be retrained as frozen

Return type: NoReturn

cross_validate()

Inherited from Model - CombinedModels cannot request cross validation

Return type: NoReturn
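The NoReturn annotation on these inherited methods signals that they never return normally, which in Python means they always raise. The sketch below illustrates that pattern with hypothetical stand-in classes (BaseModel, CombinedLike) and an assumed NotImplementedError; the exception type the library actually raises is not stated in this reference.

```python
from typing import NoReturn


class BaseModel:
    """Hypothetical stand-in for a model type that supports cross validation."""

    def cross_validate(self) -> str:
        return "job-queued"


class CombinedLike(BaseModel):
    """Hypothetical stand-in for a subclass that disables an inherited method."""

    def cross_validate(self) -> NoReturn:
        # Assumed exception type, chosen only for illustration.
        raise NotImplementedError("Combined models cannot request cross validation")


def can_cross_validate(model) -> bool:
    """Probe whether calling cross_validate() succeeds on the given object."""
    try:
        model.cross_validate()
    except NotImplementedError:
        return False
    return True
```

Code that handles both ordinary and combined models should therefore branch before calling any of these methods rather than relying on a return value.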

class datarobot.SegmentationTask

A Segmentation Task is used for segmenting an existing project into multiple child projects. Each child project (or segment) will be a separate autopilot run. Currently only user defined segmentation is supported.

Example for creating a new SegmentationTask for Time Series segmentation with a user defined id column:

from datarobot import SegmentationTask

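# Create the SegmentationTask. `project`, `target`, and the column-name
# variables below are placeholders that must already be defined.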
segmentation_task_results = SegmentationTask.create(
    project_id=project.id,
    target=target,
    use_time_series=True,
    datetime_partition_column=datetime_partition_column,
    multiseries_id_columns=[multiseries_id_column],
    user_defined_segment_id_columns=[user_defined_segment_id_column]
)

# Retrieve the completed SegmentationTask object from the job results
segmentation_task = segmentation_task_results['completedJobs'][0]

Variables:

id (bson.ObjectId) – The id of the segmentation task.
project_id (bson.ObjectId) – The associated id of the parent project.
type (str) – What type of job the segmentation task is associated with, e.g. auto_ml or auto_ts.
created (datetime.datetime) – The date this segmentation task was created.
segments_count (int) – The number of segments the segmentation task generated.
segments (list[str]) – The segment names that the segmentation task generated.
metadata (dict) – List of features that help to identify the parameters used by the segmentation task.
data (dict) – Optional parameters that are associated with enabled metadata for the segmentation task.

classmethod from_data(data)

Instantiate an object of this class using a dict.

Parameters: data (dict) – Correctly snake_cased keys and their values.
Return type: SegmentationTask

collect_payload()

Convert the record to a dictionary

Return type: Dict[str, str]

classmethod create(project_id, target, use_time_series=False, datetime_partition_column=None, multiseries_id_columns=None, user_defined_segment_id_columns=None, max_wait=600, model_package_id=None)

Creates segmentation tasks for the project based on the defined parameters.

Parameters:

project_id (str) – The associated id of the parent project.
target (str) – The column that represents the target in the dataset.
use_time_series (bool) – If True, AutoTS segmentations are generated; otherwise AutoML segmentations are generated.
datetime_partition_column (str or null) – Required for Time Series. The name of the column whose values as dates are used to assign a row to a particular partition.
multiseries_id_columns (List[str] or null) – Required for Time Series. A list of the names of multiseries id columns to define series within the training data. Currently only one multiseries id column is supported.
user_defined_segment_id_columns (List[str] or null) – Required when using a column for segmentation. A list of the segment id columns used to manually segment the data. Currently only one user defined segment id column is supported.
model_package_id (str) – Required when using automated segmentation. The associated id of the model in the DataRobot Model Registry that will be used to perform automated segmentation on a dataset.
max_wait (int) – The maximum number of seconds to wait for the segmentation tasks to complete.

Returns:

segmentation_tasks – Dictionary containing the numberOfJobs, completedJobs, and failedJobs. completedJobs is a list of SegmentationTask objects, while failedJobs is a list of dictionaries indicating problems with submitted tasks.

Return type:

dict
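Because create returns a dictionary rather than a single task, callers typically need to check failedJobs before using completedJobs. A minimal sketch of that check follows; the helper name completed_tasks and the choice of RuntimeError are assumptions, while the dictionary keys are the ones documented above.

```python
def completed_tasks(results):
    """Return the completed SegmentationTask objects, or raise if any job failed.

    `results` is the dict returned by SegmentationTask.create, with the keys
    numberOfJobs, completedJobs, and failedJobs.
    """
    failed = results.get("failedJobs") or []
    if failed:
        # Each entry is a dictionary describing a problem with a submitted task.
        raise RuntimeError(f"{len(failed)} segmentation job(s) failed: {failed}")
    return list(results.get("completedJobs") or [])
```

Compared with indexing completedJobs[0] directly, this surfaces failures with their reported details instead of an IndexError on an empty list.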

classmethod list(project_id)

List all of the segmentation tasks that have been created for a specific project_id.

Parameters: project_id (str) – The id of the parent project
Returns: segmentation_tasks – List of instances with initialized data.
Return type: list of SegmentationTask
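When a project has several segmentation tasks, the documented created attribute can be used to pick the newest one. The sketch below assumes tasks is the list returned by SegmentationTask.list(project_id); the helper name latest_task is hypothetical.

```python
def latest_task(tasks):
    """Return the most recently created task, or None if the list is empty."""
    return max(tasks, key=lambda task: task.created, default=None)
```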

classmethod get(project_id, segmentation_task_id)

Retrieve information for a single segmentation task associated with a project_id.

Parameters:

project_id (str) – The id of the parent project
segmentation_task_id (str) – The id of the segmentation task

Returns:

segmentation_task – Instance with initialized data.

Return type:

SegmentationTask

class datarobot.SegmentInfo

A SegmentInfo is an object containing information about a segment of a combined model.

Variables:

project_id (str) – The associated id of the child project.
segment (str) – the name of the segment
project_stage (str) – A description of the current stage of the project
project_status_error (str) – Project status error message.
autopilot_done (bool) – Whether Autopilot has finished for the project.
model_count (int) – Count of trained models in project.
model_id (str) – ID of segment champion model.

classmethod list(project_id, model_id)

List all of the segments of a combined model in a specific project.

Parameters:

project_id (str) – The id of the parent project
model_id (str) – The id of the combined model

Returns:

segments – List of instances with initialized data.

Return type:

list of datarobot.models.segmentation.SegmentInfo
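The returned list can be reduced to a lookup table of champions per segment. This sketch uses only the segment and model_id attributes documented for SegmentInfo; the helper name champions_by_segment is hypothetical, and segments is assumed to be the list returned by SegmentInfo.list.

```python
def champions_by_segment(segments):
    """Map each segment name to the id of its champion model."""
    return {info.segment: info.model_id for info in segments}
```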

class datarobot.models.segmentation.SegmentationTaskCreatedResponse