Data Slices
- class datarobot.models.data_slice.DataSlice(id=None, name=None, filters=None, project_id=None)
Definition of a data slice
- Attributes:
- id: str
ID of the data slice.
- name: str
Name of the data slice definition.
- filters: list[DataSliceFiltersType]
- List of filters (dict) with params:
- operand: str
Name of the feature to use in the filter.
- operator: str
Operator to use in the filter: ‘eq’, ‘in’, ‘<’, or ‘>’.
- values: Union[str, int, float]
Values to use from the feature.
- project_id: str
ID of the project that the data slice belongs to.
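The filter structure described above can be sketched as a small local check before anything is sent to the server. `make_filter` and `ALLOWED_OPERATORS` are hypothetical names introduced for illustration and are not part of the datarobot package; wrapping a scalar in a list is an assumption based on the examples in this reference, which pass `values` as a list.

```python
from typing import Any, Dict, List, Union

# Operators listed in the attribute description above.
ALLOWED_OPERATORS = {"eq", "in", "<", ">"}


def make_filter(
    operand: str,
    operator: str,
    values: Union[str, int, float, List[Any]],
) -> Dict[str, Any]:
    """Build one filter dict (hypothetical helper, not part of datarobot)."""
    if not operand:
        raise ValueError("operand must be a non-empty feature name")
    if operator not in ALLOWED_OPERATORS:
        raise ValueError(f"unsupported operator: {operator!r}")
    # Assumption: the examples in this reference pass values as a list,
    # so a scalar is wrapped in a one-element list.
    if not isinstance(values, list):
        values = [values]
    return {"operand": operand, "operator": operator, "values": values}


filters = [make_filter("binary_target", "eq", "Yes")]
print(filters)
```

Rejecting an unknown operator locally gives an immediate error instead of a failed request, although the server remains the authority on what is accepted.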
- classmethod list(project, offset=0, limit=100)
List the data slices in a project.
- Parameters:
- project: Union[str, Project]
ID of the project or Project object from which to list data slices.
- offset: int, optional
Number of items to skip.
- limit: int, optional
Number of items to return.
- Returns:
- data_slices: list[DataSlice]
- Return type:
List[DataSlice]
Examples
>>> import datarobot as dr
>>> ... # set up your Client
>>> data_slices = dr.DataSlice.list("646d0ea0cd8eb2355a68b0e5")
>>> data_slices
[DataSlice(...), DataSlice(...), ...]
- classmethod create(name, filters, project)
Creates a data slice in the project with the given name and filters
- Parameters:
- name: str
Name of the data slice definition.
- filters: list[DataSliceFiltersType]
- List of filters (dict) with params:
- operand: str
Name of the feature to use in the filter.
- operator: str
Operator to use: ‘eq’, ‘in’, ‘<’, or ‘>’.
- values: Union[str, int, float]
Values to use from the feature.
- project: Union[str, Project]
Project ID or Project object in which to create the data slice.
- Returns:
- data_slice: DataSlice
The data slice object created.
- Return type:
DataSlice
Examples
>>> import datarobot as dr
>>> ... # set up your Client and retrieve a project
>>> data_slice = dr.DataSlice.create(
...     name='yes',
...     filters=[{'operand': 'binary_target', 'operator': 'eq', 'values': ['Yes']}],
...     project=project,
... )
>>> data_slice
DataSlice(
    filters=[{'operand': 'binary_target', 'operator': 'eq', 'values': ['Yes']}],
    id=646d1296bd0c543d88923c9d,
    name=yes,
    project_id=646d0ea0cd8eb2355a68b0e5
)
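To get a feel for what a set of filters might select before creating the slice, the filters can be applied to a few in-memory rows. This is only an illustrative sketch: `preview_slice` is a hypothetical helper, and the semantics used here (filters combined with AND, `eq`, `<`, and `>` compared against the first entry of `values`) are assumptions, not a statement of how the server evaluates a data slice.

```python
from typing import Any, Callable, Dict, List

Row = Dict[str, Any]

# Assumed semantics, for illustration only; server-side behavior may differ.
_COMPARATORS: Dict[str, Callable[[Any, List[Any]], bool]] = {
    "eq": lambda cell, values: cell == values[0],
    "in": lambda cell, values: cell in values,
    "<": lambda cell, values: cell < values[0],
    ">": lambda cell, values: cell > values[0],
}


def preview_slice(rows: List[Row], filters: List[Dict[str, Any]]) -> List[Row]:
    """Return the rows matching every filter (filters are assumed to be ANDed)."""
    def matches(row: Row) -> bool:
        return all(
            _COMPARATORS[f["operator"]](row[f["operand"]], f["values"])
            for f in filters
        )
    return [row for row in rows if matches(row)]


rows = [
    {"binary_target": "Yes", "age": 41},
    {"binary_target": "No", "age": 35},
    {"binary_target": "Yes", "age": 29},
]
filters = [
    {"operand": "binary_target", "operator": "eq", "values": ["Yes"]},
    {"operand": "age", "operator": ">", "values": [30]},
]
print(preview_slice(rows, filters))
```

The row count reported by `request_size` and `get_size_info` remains the reliable figure; a local preview like this is useful only as a sanity check on the filter definitions.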
- delete()
Deletes the data slice from storage
- Return type:
None
Examples
>>> import datarobot as dr
>>> data_slice = dr.DataSlice.get('5a8ac9ab07a57a0001be501f')
>>> data_slice.delete()

>>> import datarobot as dr
>>> ... # get project or project_id
>>> data_slices = dr.DataSlice.list(project)  # project object or project_id
>>> data_slice = data_slices[0]  # choose a data slice from the list
>>> data_slice.delete()
- request_size(source, model=None)
Submits a request to validate the data slice’s filters and calculate the data slice’s number of rows on a given source
- Parameters:
- source: INSIGHTS_SOURCES
Subset of data (partition or “source”) on which to apply the data slice for estimating available rows.
- model: Optional[Union[str, Model]]
Model object or ID of the model. Required only when source is “training”.
- Returns:
- status_check_job: StatusCheckJob
Object that contains the logic needed to periodically check the status of an async job.
- Return type:
StatusCheckJob
Examples
>>> import datarobot as dr
>>> ... # get project or project_id
>>> data_slices = dr.DataSlice.list(project)  # project object or project_id
>>> data_slice = data_slices[0]  # choose a data slice from the list
>>> status_check_job = data_slice.request_size("validation")
Model is required when source is ‘training’:
>>> import datarobot as dr
>>> ... # get project or project_id
>>> data_slices = dr.DataSlice.list(project)  # project object or project_id
>>> data_slice = data_slices[0]  # choose a data slice from the list
>>> status_check_job = data_slice.request_size("training", model)
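The size request is asynchronous, so the result is available only after the job finishes. The general pattern of a periodic status check can be sketched as below; `wait_until_done` and the status names are hypothetical and do not describe the actual `StatusCheckJob` interface, which is documented as carrying this logic itself.

```python
import time
from typing import Callable


def wait_until_done(
    get_status: Callable[[], str],
    max_wait: float = 600.0,
    interval: float = 5.0,
) -> str:
    """Poll get_status() until it reports a terminal state or time runs out."""
    terminal = {"COMPLETED", "ERROR", "ABORTED"}  # hypothetical status names
    deadline = time.monotonic() + max_wait
    while True:
        status = get_status()
        if status in terminal:
            return status
        if time.monotonic() >= deadline:
            raise TimeoutError(f"job still {status!r} after {max_wait} seconds")
        time.sleep(interval)


# Stand-in job that finishes on the third check.
_states = iter(["RUNNING", "RUNNING", "COMPLETED"])
final_status = wait_until_done(lambda: next(_states), max_wait=5.0, interval=0.0)
print(final_status)
```

Once the job has completed, `get_size_info` with the same source (and model, for “training”) returns the computed size.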
- get_size_info(source, model=None)
Get information about the data slice applied to a source
- Parameters:
- source: INSIGHTS_SOURCES
Source (partition or subset) to which the data slice was applied.
- model: Optional[Union[str, Model]]
Model object or ID of the model whose training data was sliced with this data slice. Required when the source is “training”, and not used for other sources.
- Returns:
- slice_size_info: DataSliceSizeInfo
Information about the data slice applied to a source.
- Return type:
DataSliceSizeInfo
Examples
>>> import datarobot as dr
>>> ... # set up your Client
>>> data_slices = dr.DataSlice.list("646d0ea0cd8eb2355a68b0e5")
>>> data_slice = data_slices[0]  # can be any slice in the list
>>> data_slice_size_info = data_slice.get_size_info("validation")
>>> data_slice_size_info
DataSliceSizeInfo(
    data_slice_id=6493a1776ea78e6644382535,
    messages=[
        {
            'level': 'WARNING',
            'description': 'Low Observation Count',
            'additional_info': 'Insufficient number of observations to compute some insights.'
        }
    ],
    model_id=None,
    project_id=646d0ea0cd8eb2355a68b0e5,
    slice_size=1,
    source=validation,
)
>>> data_slice_size_info.to_dict()
{
    'data_slice_id': '6493a1776ea78e6644382535',
    'messages': [
        {
            'level': 'WARNING',
            'description': 'Low Observation Count',
            'additional_info': 'Insufficient number of observations to compute some insights.'
        }
    ],
    'model_id': None,
    'project_id': '646d0ea0cd8eb2355a68b0e5',
    'slice_size': 1,
    'source': 'validation',
}

>>> import datarobot as dr
>>> ... # set up your Client
>>> data_slice = dr.DataSlice.get("6493a1776ea78e6644382535")
>>> data_slice_size_info = data_slice.get_size_info("validation")
When using source=’training’, the model param is required.
>>> import datarobot as dr
>>> ... # set up your Client
>>> model = dr.Model.get(project_id, model_id)
>>> data_slice = dr.DataSlice.get("6493a1776ea78e6644382535")
>>> data_slice_size_info = data_slice.get_size_info("training", model)

>>> import datarobot as dr
>>> ... # set up your Client
>>> data_slice = dr.DataSlice.get("6493a1776ea78e6644382535")
>>> data_slice_size_info = data_slice.get_size_info("training", model_id)
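The rule that `model` is required for the “training” source and unused otherwise can be enforced with a small guard before calling `request_size` or `get_size_info`. `check_size_args` is a hypothetical helper written for this sketch, not a function provided by the datarobot package.

```python
from typing import Optional


def check_size_args(source: str, model: Optional[object] = None) -> Optional[object]:
    """Return the model argument to pass on, enforcing the 'training' rule."""
    if source == "training":
        if model is None:
            raise ValueError("model is required when source is 'training'")
        return model
    # The reference states that model is not used for other sources, so drop it.
    return None


print(check_size_args("validation"))
print(check_size_args("training", "model-id-placeholder"))
```

Failing early with a clear message is usually easier to diagnose than an error returned by the request itself.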
- classmethod get(data_slice_id)
Retrieve a specific data slice.
- Parameters:
- data_slice_id: str
The identifier of the data slice to retrieve.
- Returns:
- data_slice: DataSlice
The requested data slice.
- Return type:
DataSlice
Examples
>>> import datarobot as dr
>>> dr.DataSlice.get('648b232b9da812a6aaa0b7a9')
DataSlice(
    filters=[{'operand': 'binary_target', 'operator': 'eq', 'values': ['Yes']}],
    id=648b232b9da812a6aaa0b7a9,
    name=test,
    project_id=644bc575572480b565ca42cd
)
- class datarobot.models.data_slice.DataSliceSizeInfo(data_slice_id=None, project_id=None, source=None, slice_size=None, messages=None, model_id=None)
Definition of a data slice applied to a source
- Attributes:
- data_slice_id: str
ID of the data slice
- project_id: str
ID of the project
- source: str
Data source used to calculate the number of rows (slice size) after applying the data slice’s filters
- model_id: str, optional
ID of the model, required when source (subset) is ‘training’
- slice_size: int
Number of rows in the data slice for a given source
- messages: list[DataSliceSizeMessageType]
List of user-relevant messages related to a data slice
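As a sketch of how these attributes might be consumed, the snippet below works on a plain dict shaped like the `to_dict()` output shown in the `get_size_info` example and collects warning descriptions. `warnings_for` is a hypothetical helper; the keys and the sample values are taken from that example, and filtering on a `'WARNING'` level is an assumption about which levels matter.

```python
from typing import Any, Dict, List

# Sample copied from the to_dict() output in the get_size_info example.
size_info = {
    "data_slice_id": "6493a1776ea78e6644382535",
    "messages": [
        {
            "level": "WARNING",
            "description": "Low Observation Count",
            "additional_info": "Insufficient number of observations to compute some insights.",
        }
    ],
    "model_id": None,
    "project_id": "646d0ea0cd8eb2355a68b0e5",
    "slice_size": 1,
    "source": "validation",
}


def warnings_for(info: Dict[str, Any]) -> List[str]:
    """Collect descriptions of WARNING-level messages (hypothetical helper)."""
    return [
        message["description"]
        for message in info.get("messages") or []
        if message.get("level") == "WARNING"
    ]


print(size_info["slice_size"], warnings_for(size_info))
```

Checking the messages alongside `slice_size` helps catch slices that are too small to support some insights.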