Data slices

class datarobot.models.data_slice.DataSlice

Definition of a data slice

Variables:
  • id (str) – ID of the data slice.

  • name (str) – Name of the data slice definition.

  • filters (list[DataSliceFiltersType]) –

    List of DataSliceFiltersType entries, each with the following params:
    • operand (str) – Name of the feature to use in the filter.

    • operator (str) – Operator to use in the filter: ‘eq’, ‘in’, ‘<’, or ‘>’.

    • values (Union[str, int, float]) – Values to use from the feature.

  • project_id (str) – ID of the project that the data slice belongs to.
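
The filter structure described above can be sketched as a small builder that checks the operator before anything is sent to the server. This is only an illustration: make_filter and ALLOWED_OPERATORS are hypothetical names, not part of the datarobot package, and values is passed as a list to match the create example later in this section.

```python
from typing import Any, Dict, List

# Operators listed in the filter description above.
ALLOWED_OPERATORS = {"eq", "in", "<", ">"}


def make_filter(operand: str, operator: str, values: List[Any]) -> Dict[str, Any]:
    """Build one filter dict (hypothetical helper, not a datarobot API)."""
    if not operand:
        raise ValueError("operand must be a non-empty feature name")
    if operator not in ALLOWED_OPERATORS:
        raise ValueError(f"Unsupported operator: {operator!r}")
    # Copy the values so later changes to the caller's list do not leak in.
    return {"operand": operand, "operator": operator, "values": list(values)}


filters = [make_filter("binary_target", "eq", ["Yes"])]
print(filters)
```

The resulting list has the same shape as the filters argument shown in the create example.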

classmethod list(project, offset=0, limit=100)

List the data slices in a project.

Parameters:
  • project (Union[str, Project]) – ID of the project or Project object from which to list data slices.

  • offset (Optional[int]) – Number of items to skip.

  • limit (Optional[int]) – Number of items to return.

Returns:

data_slices

Return type:

list[DataSlice]

Examples

>>> import datarobot as dr
>>> ...  # set up your Client
>>> data_slices = dr.DataSlice.list("646d0ea0cd8eb2355a68b0e5")
>>> data_slices
[DataSlice(...), DataSlice(...), ...]
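
Because list takes offset and limit, a project with more than one page of data slices may need several calls. The sketch below shows one way to walk the pages; iter_pages and the fake fetch function are hypothetical, and it assumes that a page shorter than the requested limit marks the end of the results.

```python
from typing import Callable, Iterator, List, TypeVar

T = TypeVar("T")


def iter_pages(fetch_page: Callable[[int, int], List[T]], page_size: int = 100) -> Iterator[T]:
    """Yield every item by requesting pages until a short page comes back."""
    offset = 0
    while True:
        page = fetch_page(offset, page_size)
        yield from page
        if len(page) < page_size:
            break  # assumption: a short (or empty) page means no more items
        offset += page_size


# Stand-in for the server: 250 fake items served through offset/limit.
fake_items = [f"slice-{i}" for i in range(250)]


def fake_fetch(offset: int, limit: int) -> List[str]:
    return fake_items[offset : offset + limit]


all_items = list(iter_pages(fake_fetch, page_size=100))
print(len(all_items))
```

With a configured client, fetch_page could be something like `lambda offset, limit: dr.DataSlice.list(project, offset=offset, limit=limit)`.
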
classmethod create(name, filters, project)

Creates a data slice in the project with the given name and filters

Parameters:
  • name (str) – Name of the data slice definition.

  • filters (list[DataSliceFiltersType]) –

    List of filters (dict) with params:
    • operand (str)

      Name of the feature to use in filter.

    • operator (str)

      Operator to use: ‘eq’, ‘in’, ‘<’, or ‘>’.

    • values (Union[str, int, float])

      Values to use from the feature.

  • project (Union[str, Project]) – Project ID or Project object in which to create the data slice.

Returns:

data_slice – The data slice object created

Return type:

DataSlice

Examples

>>> import datarobot as dr
>>> ...  # set up your Client and retrieve a project
>>> data_slice = dr.DataSlice.create(
...     name='yes',
...     filters=[{'operand': 'binary_target', 'operator': 'eq', 'values': ['Yes']}],
...     project=project,
... )
>>> data_slice
DataSlice(
    filters=[{'operand': 'binary_target', 'operator': 'eq', 'values': ['Yes']}],
    id=646d1296bd0c543d88923c9d,
    name=yes,
    project_id=646d0ea0cd8eb2355a68b0e5
)
delete()

Deletes the data slice from storage.

Return type:

None

Examples

>>> import datarobot as dr
>>> data_slice = dr.DataSlice.get('5a8ac9ab07a57a0001be501f')
>>> data_slice.delete()

A data slice selected from a project's list can be deleted the same way:

>>> import datarobot as dr
>>> ... # get project or project_id
>>> data_slices = dr.DataSlice.list(project)  # project object or project_id
>>> data_slice = data_slices[0]  # choose a data slice from the list
>>> data_slice.delete()
request_size(source, model=None)

Submits a request to validate the data slice’s filters and calculate the data slice’s number of rows on a given source

Parameters:
  • source (INSIGHTS_SOURCES) – Subset of data (partition or “source”) on which to apply the data slice for estimating available rows.

  • model (Optional[Union[str, Model]]) – Model object or ID of the model. It is only required when source is “training”.

Returns:

status_check_job – Object that contains the logic needed to periodically check the status of an async job.

Return type:

StatusCheckJob

Examples

>>> import datarobot as dr
>>> ... # get project or project_id
>>> data_slices = dr.DataSlice.list(project)  # project object or project_id
>>> data_slice = data_slices[0]  # choose a data slice from the list
>>> status_check_job = data_slice.request_size("validation")

The model is required when source is ‘training’:

>>> import datarobot as dr
>>> ... # get project or project_id
>>> data_slices = dr.DataSlice.list(project)  # project object or project_id
>>> data_slice = data_slices[0]  # choose a data slice from the list
>>> status_check_job = data_slice.request_size("training", model)
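
Since request_size only submits an async job, the size is not available until that job finishes. The sketch below chains the request, the wait, and the lookup. compute_slice_size is a hypothetical helper, wait_for_completion on the returned job is an assumption about StatusCheckJob, and the Fake classes are stand-ins so the flow can run without a server.

```python
from typing import Any, Optional


def compute_slice_size(data_slice: Any, source: str, model: Optional[Any] = None,
                       max_wait: int = 600) -> Any:
    """Request the size calculation, wait for it, then fetch the result."""
    job = data_slice.request_size(source, model)
    job.wait_for_completion(max_wait=max_wait)  # assumed StatusCheckJob method
    return data_slice.get_size_info(source, model)


class FakeJob:
    """Stand-in for StatusCheckJob; records that it was waited on."""

    def __init__(self) -> None:
        self.waited = False

    def wait_for_completion(self, max_wait: int = 600) -> None:
        self.waited = True


class FakeSlice:
    """Stand-in for DataSlice with the two methods used above."""

    def __init__(self) -> None:
        self.job: Optional[FakeJob] = None

    def request_size(self, source: str, model: Any = None) -> FakeJob:
        self.job = FakeJob()
        return self.job

    def get_size_info(self, source: str, model: Any = None) -> dict:
        if self.job is None or not self.job.waited:
            raise RuntimeError("size was not computed yet")
        return {"source": source, "slice_size": 1}


result = compute_slice_size(FakeSlice(), "validation")
print(result)
```
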
get_size_info(source, model=None)

Get information about the data slice applied to a source

Parameters:
  • source (INSIGHTS_SOURCES) – Source (partition or subset) to which the data slice was applied

  • model (Optional[Union[str, Model]]) – Model object or ID of the model whose training data was sliced with this data slice. Required when the source is “training”, and not used for other sources.

Returns:

slice_size_info – Information about the data slice applied to a source

Return type:

DataSliceSizeInfo

Examples

>>> import datarobot as dr
>>> ...  # set up your Client
>>> data_slices = dr.DataSlice.list("646d0ea0cd8eb2355a68b0e5")
>>> data_slice = data_slices[0]  # can be any slice in the list
>>> data_slice_size_info = data_slice.get_size_info("validation")
>>> data_slice_size_info
DataSliceSizeInfo(
    data_slice_id=6493a1776ea78e6644382535,
    messages=[
        {
            'level': 'WARNING',
            'description': 'Low Observation Count',
            'additional_info': 'Insufficient number of observations to compute some insights.'
        }
    ],
    model_id=None,
    project_id=646d0ea0cd8eb2355a68b0e5,
    slice_size=1,
    source=validation,
)
>>> data_slice_size_info.to_dict()
{
    'data_slice_id': '6493a1776ea78e6644382535',
    'messages': [
        {
            'level': 'WARNING',
            'description': 'Low Observation Count',
            'additional_info': 'Insufficient number of observations to compute some insights.'
        }
    ],
    'model_id': None,
    'project_id': '646d0ea0cd8eb2355a68b0e5',
    'slice_size': 1,
    'source': 'validation',
}

The data slice can also be retrieved by ID first:

>>> import datarobot as dr
>>> ...  # set up your Client
>>> data_slice = dr.DataSlice.get("6493a1776ea78e6644382535")
>>> data_slice_size_info = data_slice.get_size_info("validation")

When source is ‘training’, the model parameter is required:

>>> import datarobot as dr
>>> ...  # set up your Client
>>> model = dr.Model.get(project_id, model_id)
>>> data_slice = dr.DataSlice.get("6493a1776ea78e6644382535")
>>> data_slice_size_info = data_slice.get_size_info("training", model)

The model can also be passed as an ID:

>>> import datarobot as dr
>>> ...  # set up your Client
>>> data_slice = dr.DataSlice.get("6493a1776ea78e6644382535")
>>> data_slice_size_info = data_slice.get_size_info("training", model_id)
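
The rule that model is required for the “training” source and not used otherwise can be checked locally before calling the API. The helper below is a hypothetical sketch (size_info_args is not part of the datarobot package); it returns the positional arguments to pass on.

```python
from typing import Any, Optional, Tuple


def size_info_args(source: str, model: Optional[Any] = None) -> Tuple[Any, ...]:
    """Return the arguments for get_size_info, enforcing the 'training' rule."""
    if source == "training":
        if model is None:
            raise ValueError("model is required when source is 'training'")
        return (source, model)
    # For other sources the model is not used, so it is dropped.
    return (source,)


print(size_info_args("validation"))
print(size_info_args("training", "model-id-placeholder"))
```

A call could then look like `data_slice.get_size_info(*size_info_args("training", model))`.
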
classmethod get(data_slice_id)

Retrieve a specific data slice.

Parameters:

data_slice_id (str) – The identifier of the data slice to retrieve.

Returns:

data_slice – The requested data slice.

Return type:

DataSlice

Examples

>>> import datarobot as dr
>>> dr.DataSlice.get('648b232b9da812a6aaa0b7a9')
DataSlice(filters=[{'operand': 'binary_target', 'operator': 'eq', 'values': ['Yes']}],
          id=648b232b9da812a6aaa0b7a9,
          name=test,
          project_id=644bc575572480b565ca42cd
          )
class datarobot.models.data_slice.DataSliceSizeInfo

Definition of a data slice applied to a source

Variables:
  • data_slice_id (str) – ID of the data slice

  • project_id (str) – ID of the project

  • source (str) – Data source used to calculate the number of rows (slice size) after applying the data slice’s filters

  • model_id (Optional[str]) – ID of the model, required when source (subset) is ‘training’

  • slice_size (int) – Number of rows in the data slice for a given source

  • messages (list[DataSliceSizeMessageType]) – List of user-relevant messages related to a data slice
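
As an illustration of how the messages field might be consumed, the sketch below pulls out the messages at a given level from the dictionary form shown in the get_size_info example. messages_at_level is a hypothetical helper, and the message keys (level, description, additional_info) are taken from that example output rather than from a documented schema.

```python
from typing import Any, Dict, List


def messages_at_level(size_info: Dict[str, Any], level: str = "WARNING") -> List[str]:
    """Collect 'description: additional_info' strings for one message level."""
    return [
        f"{m['description']}: {m['additional_info']}"
        for m in size_info.get("messages", [])
        if m.get("level") == level
    ]


# Dictionary copied from the to_dict() example output in this section.
size_info = {
    "data_slice_id": "6493a1776ea78e6644382535",
    "messages": [
        {
            "level": "WARNING",
            "description": "Low Observation Count",
            "additional_info": "Insufficient number of observations to compute some insights.",
        }
    ],
    "model_id": None,
    "project_id": "646d0ea0cd8eb2355a68b0e5",
    "slice_size": 1,
    "source": "validation",
}

print(messages_at_level(size_info))
```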