Data Slices

class datarobot.models.data_slice.DataSlice(id=None, name=None, filters=None, project_id=None)

Definition of a data slice

Attributes:
id: str

ID of the data slice.

name: str

Name of the data slice definition.

filters: list[DataSliceFiltersType]

List of filters (dict) with params:
  • operand: str

    Name of the feature to use in the filter.

  • operator: str

    Operator to use in the filter: ‘eq’, ‘in’, ‘<’, or ‘>’.

  • values: Union[str, int, float]

    Values to use from the feature.

project_id: str

ID of the project that the data slice belongs to.
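The filter dictionaries described above can be assembled by hand. The following is a minimal sketch, assuming a hypothetical helper named `make_filter` (it is not part of the datarobot package). It checks the operator against the four values listed in this reference and wraps a single value in a list, which is the shape used by the `create` example in this reference. The feature name `age` is made up for illustration.

```python
ALLOWED_OPERATORS = {"eq", "in", "<", ">"}  # operators listed in this reference


def make_filter(operand, operator, values):
    """Build one data slice filter dict (hypothetical helper, not a datarobot API)."""
    if operator not in ALLOWED_OPERATORS:
        raise ValueError(f"Unsupported operator: {operator!r}")
    # Assumption: values are sent as a list, as in the create() example.
    if not isinstance(values, (list, tuple)):
        values = [values]
    return {"operand": operand, "operator": operator, "values": list(values)}


filters = [
    make_filter("binary_target", "eq", "Yes"),
    make_filter("age", ">", 30),  # "age" is a made-up feature name
]
```

The resulting `filters` list could then be passed as the `filters` argument when creating a data slice.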

classmethod list(project, offset=0, limit=100)

List the data slices in a project

Parameters:
project: Union[str, Project]

ID of the project or Project object from which to list data slices.

offset: int, optional

Number of items to skip.

limit: int, optional

Number of items to return.

Returns:
data_slices: list[DataSlice]
Return type:

List[DataSlice]

Examples

>>> import datarobot as dr
>>> ...  # set up your Client
>>> data_slices = dr.DataSlice.list("646d0ea0cd8eb2355a68b0e5")
>>> data_slices
[DataSlice(...), DataSlice(...), ...]
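Because `list` takes `offset` and `limit`, a project with more slices than one page holds needs several calls. The sketch below shows one way to walk all pages. `iter_all` is a hypothetical helper, and it assumes that a page shorter than `limit` marks the end of the listing. A stand-in function replaces the real call so the snippet runs without a DataRobot connection.

```python
def iter_all(list_page, limit=100):
    """Yield every item from an offset/limit paginated listing.

    list_page is any callable that accepts offset and limit keyword arguments,
    for example:
        lambda offset, limit: dr.DataSlice.list(project, offset=offset, limit=limit)
    """
    if limit <= 0:
        raise ValueError("limit must be positive")
    offset = 0
    while True:
        page = list_page(offset=offset, limit=limit)
        yield from page
        # Assumption: a page shorter than `limit` means nothing is left.
        if len(page) < limit:
            break
        offset += limit


# Stand-in for the real listing call, used only to demonstrate the loop.
fake_items = list(range(250))


def fake_list(offset, limit):
    return fake_items[offset:offset + limit]


all_items = list(iter_all(fake_list, limit=100))
```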
classmethod create(name, filters, project)

Creates a data slice in the project with the given name and filters

Parameters:
name: str

Name of the data slice definition.

filters: list[DataSliceFiltersType]

List of filters (dict) with params:
  • operand: str

    Name of the feature to use in the filter.

  • operator: str

    Operator to use in the filter: ‘eq’, ‘in’, ‘<’, or ‘>’.

  • values: Union[str, int, float]

    Values to use from the feature.

project: Union[str, Project]

Project ID or Project object in which to create the data slice.

Returns:
data_slice: DataSlice

The created data slice object

Return type:

DataSlice

Examples

>>> import datarobot as dr
>>> ...  # set up your Client and retrieve a project
>>> data_slice = dr.DataSlice.create(
...     name='yes',
...     filters=[{'operand': 'binary_target', 'operator': 'eq', 'values': ['Yes']}],
...     project=project,
... )
>>> data_slice
DataSlice(
    filters=[{'operand': 'binary_target', 'operator': 'eq', 'values': ['Yes']}],
    id=646d1296bd0c543d88923c9d,
    name=yes,
    project_id=646d0ea0cd8eb2355a68b0e5
)

delete()

Deletes the data slice from storage

Return type:

None

Examples

>>> import datarobot as dr
>>> data_slice = dr.DataSlice.get('5a8ac9ab07a57a0001be501f')
>>> data_slice.delete()

>>> import datarobot as dr
>>> ... # get project or project_id
>>> data_slices = dr.DataSlice.list(project)  # project object or project_id
>>> data_slice = data_slices[0]  # choose a data slice from the list
>>> data_slice.delete()

request_size(source, model=None)

Submits a request to validate the data slice’s filters and calculate the data slice’s number of rows on a given source

Parameters:
source: INSIGHTS_SOURCES

Subset of data (partition or “source”) on which to apply the data slice for estimating available rows.

model: Optional[Union[str, Model]]

Model object or ID of the model. Required only when source is “training”.

Returns:
status_check_job: StatusCheckJob

Object that contains the logic needed to periodically check the status of an async job.

Return type:

StatusCheckJob

Examples

>>> import datarobot as dr
>>> ... # get project or project_id
>>> data_slices = dr.DataSlice.list(project)  # project object or project_id
>>> data_slice = data_slices[0]  # choose a data slice from the list
>>> status_check_job = data_slice.request_size("validation")

Model is required when source is ‘training’

>>> import datarobot as dr
>>> ... # get project or project_id
>>> data_slices = dr.DataSlice.list(project)  # project object or project_id
>>> data_slice = data_slices[0]  # choose a data slice from the list
>>> status_check_job = data_slice.request_size("training", model)
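Since `request_size` only submits the calculation, the row count has to be read afterwards with `get_size_info`. The sketch below chains the two calls. `compute_slice_size` is a hypothetical helper, and it assumes that the returned `StatusCheckJob` offers a `wait_for_completion(max_wait)` method that blocks until the job finishes; check the `StatusCheckJob` reference for your client version before relying on it.

```python
def compute_slice_size(data_slice, source, model=None, max_wait=600):
    """Request a size calculation, wait for it, then return the row count.

    Hypothetical helper: data_slice is expected to be a DataSlice object.
    """
    job = data_slice.request_size(source, model)
    # Assumption: StatusCheckJob provides wait_for_completion(max_wait).
    job.wait_for_completion(max_wait)
    # Read the result of the finished calculation.
    return data_slice.get_size_info(source, model).slice_size
```

With a slice and a model in hand, a call might look like `compute_slice_size(data_slice, "training", model)`.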
get_size_info(source, model=None)

Get information about the data slice applied to a source

Parameters:
source: INSIGHTS_SOURCES

Source (partition or subset) to which the data slice was applied.

model: Optional[Union[str, Model]]

Model object or ID of the model whose training data was sliced with this data slice. Required when the source is “training”, and not used for other sources.

Returns:
slice_size_info: DataSliceSizeInfo

Information about the data slice applied to a source

Return type:

DataSliceSizeInfo

Examples

>>> import datarobot as dr
>>> ...  # set up your Client
>>> data_slices = dr.DataSlice.list("646d0ea0cd8eb2355a68b0e5")
>>> data_slice = data_slices[0]  # can be any slice in the list
>>> data_slice_size_info = data_slice.get_size_info("validation")
>>> data_slice_size_info
DataSliceSizeInfo(
    data_slice_id=6493a1776ea78e6644382535,
    messages=[
        {
            'level': 'WARNING',
            'description': 'Low Observation Count',
            'additional_info': 'Insufficient number of observations to compute some insights.'
        }
    ],
    model_id=None,
    project_id=646d0ea0cd8eb2355a68b0e5,
    slice_size=1,
    source=validation,
)
>>> data_slice_size_info.to_dict()
{
    'data_slice_id': '6493a1776ea78e6644382535',
    'messages': [
        {
            'level': 'WARNING',
            'description': 'Low Observation Count',
            'additional_info': 'Insufficient number of observations to compute some insights.'
        }
    ],
    'model_id': None,
    'project_id': '646d0ea0cd8eb2355a68b0e5',
    'slice_size': 1,
    'source': 'validation',
}

>>> import datarobot as dr
>>> ...  # set up your Client
>>> data_slice = dr.DataSlice.get("6493a1776ea78e6644382535")
>>> data_slice_size_info = data_slice.get_size_info("validation")

When source is ‘training’, the model parameter is required.

>>> import datarobot as dr
>>> ...  # set up your Client
>>> model = dr.Model.get(project_id, model_id)
>>> data_slice = dr.DataSlice.get("6493a1776ea78e6644382535")
>>> data_slice_size_info = data_slice.get_size_info("training", model)

>>> import datarobot as dr
>>> ...  # set up your Client
>>> data_slice = dr.DataSlice.get("6493a1776ea78e6644382535")
>>> data_slice_size_info = data_slice.get_size_info("training", model_id)
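The rule that a model is needed only for the training source can be checked before any request is sent. The following is a small sketch of such a guard. `check_size_args` is a hypothetical helper, not a datarobot function, and it encodes only the rule stated in this reference.

```python
def check_size_args(source, model=None):
    """Raise early if source is 'training' but no model was given.

    Hypothetical helper that mirrors the rule documented for
    request_size and get_size_info.
    """
    if source == "training" and model is None:
        raise ValueError("model is required when source is 'training'")
    return source, model


# Passes: no model is needed for the validation source.
source, model = check_size_args("validation")
```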
classmethod get(data_slice_id)

Retrieve a specific data slice.

Parameters:
data_slice_id: str

The identifier of the data slice to retrieve.

Returns:
data_slice: DataSlice

The requested data slice.

Return type:

DataSlice

Examples

>>> import datarobot as dr
>>> dr.DataSlice.get('648b232b9da812a6aaa0b7a9')
DataSlice(filters=[{'operand': 'binary_target', 'operator': 'eq', 'values': ['Yes']}],
          id=648b232b9da812a6aaa0b7a9,
          name=test,
          project_id=644bc575572480b565ca42cd
          )

class datarobot.models.data_slice.DataSliceSizeInfo(data_slice_id=None, project_id=None, source=None, slice_size=None, messages=None, model_id=None)

Definition of a data slice applied to a source

Attributes:
data_slice_id: str

ID of the data slice

project_id: str

ID of the project

source: str

Data source used to calculate the number of rows (slice size) after applying the data slice’s filters

model_id: str, optional

ID of the model, required when source (subset) is ‘training’

slice_size: int

Number of rows in the data slice for a given source

messages: list[DataSliceSizeMessageType]

List of user-relevant messages related to a data slice
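The `messages` list can be inspected to decide whether a slice is large enough to be useful. The sketch below works on the dictionary shape shown in the `to_dict()` example in this reference. `warning_descriptions` is a hypothetical helper, and the sample dictionary is copied from that example.

```python
def warning_descriptions(size_info_dict):
    """Return the descriptions of WARNING-level messages from a size info dict."""
    return [
        message["description"]
        for message in size_info_dict.get("messages") or []
        if message.get("level") == "WARNING"
    ]


# Sample taken from the to_dict() output shown in this reference.
example = {
    "data_slice_id": "6493a1776ea78e6644382535",
    "messages": [
        {
            "level": "WARNING",
            "description": "Low Observation Count",
            "additional_info": "Insufficient number of observations to compute some insights.",
        }
    ],
    "model_id": None,
    "project_id": "646d0ea0cd8eb2355a68b0e5",
    "slice_size": 1,
    "source": "validation",
}

print(warning_descriptions(example))  # prints ['Low Observation Count']
```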