Dataset definition

class datarobot.helpers.feature_discovery.DatasetDefinition

Dataset definition for Feature Discovery.

Added in version v2.25.

Variables:
  • identifier (str) – Alias of the dataset (used directly as part of the generated feature names)

  • catalog_id (Optional[str]) – Identifier of the catalog item

  • catalog_version_id (str) – Identifier of the catalog item version

  • primary_temporal_key (Optional[str]) – Name of the column indicating time of record creation

  • feature_list_id (Optional[str]) – Identifier of the feature list. This decides which columns in the dataset are used for feature generation

  • snapshot_policy (Optional[str]) – Policy to use when creating a project or making predictions. If omitted, the endpoint defaults to ‘latest’. Must be one of the following values:

      ‘specified’: use the specific snapshot given by catalog_version_id
      ‘latest’: use the latest snapshot of the same catalog item
      ‘dynamic’: read data directly from the source (applicable only to JDBC datasets)
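The snapshot_policy rules above can be summarized in a small, self-contained sketch. DatasetDefinitionSketch and resolve_version are hypothetical names used only for illustration; they are not part of the datarobot package and do not reproduce its implementation. The sketch applies the documented default (‘latest’ when omitted), rejects values outside the three allowed ones, and shows which snapshot each policy would read. The assumption that catalog versions are listed oldest to newest is a simplification of what the server actually does.

```python
from dataclasses import dataclass
from typing import Optional, Sequence

# Allowed snapshot_policy values, as listed in the Variables section.
SNAPSHOT_POLICIES = ("specified", "latest", "dynamic")


@dataclass
class DatasetDefinitionSketch:
    """Hypothetical stand-in mirroring the documented fields; not datarobot's class."""
    identifier: str
    catalog_id: Optional[str]
    catalog_version_id: str
    primary_temporal_key: Optional[str] = None
    feature_list_id: Optional[str] = None
    snapshot_policy: Optional[str] = None

    def effective_policy(self) -> str:
        # An omitted policy falls back to 'latest', as the documentation states.
        policy = self.snapshot_policy or "latest"
        if policy not in SNAPSHOT_POLICIES:
            raise ValueError(
                f"snapshot_policy must be one of {SNAPSHOT_POLICIES}, got {policy!r}"
            )
        return policy


def resolve_version(
    defn: DatasetDefinitionSketch,
    available_versions: Sequence[str],
    is_jdbc: bool,
) -> Optional[str]:
    """Hypothetical helper: return the catalog version a policy would read.

    available_versions is assumed to be ordered oldest to newest.
    Returns None for 'dynamic', meaning data comes from the source, not a snapshot.
    """
    policy = defn.effective_policy()
    if policy == "specified":
        # Pin exactly the version named by catalog_version_id.
        return defn.catalog_version_id
    if policy == "latest":
        # Use the newest snapshot of the same catalog item.
        return available_versions[-1]
    # Remaining case is 'dynamic', which only JDBC datasets support.
    if not is_jdbc:
        raise ValueError("'dynamic' is only applicable to JDBC datasets")
    return None
```

With the default policy, resolve_version(DatasetDefinitionSketch("profile", "cat1", "v1"), ["v1", "v2"], is_jdbc=False) picks "v2", while setting snapshot_policy="specified" on the same definition pins "v1".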

Examples

import datarobot as dr

# Placeholder IDs: substitute the catalog item and version IDs of your own datasets.
profile_definition = dr.DatasetDefinition(
    identifier='profile',
    catalog_id='5ec4aec1f072bc028e3471ae',
    catalog_version_id='5ec4aec2f072bc028e3471b1',
)

# primary_temporal_key names the column recording when each row was created.
transaction_definition = dr.DatasetDefinition(
    identifier='transaction',
    catalog_id='5ec4aec1f072bc028e3471ae',
    catalog_version_id='5ec4aec2f072bc028e3471b1',
    primary_temporal_key='Date',
)