Dataset Definition

class datarobot.helpers.feature_discovery.DatasetDefinition(identifier, catalog_id, catalog_version_id, snapshot_policy='latest', feature_list_id=None, primary_temporal_key=None)

Dataset definition for Feature Discovery.

Added in version v2.25.

Examples

import datarobot as dr

# Minimal definition: only the required arguments are passed.
profile_definition = dr.DatasetDefinition(
    identifier='profile',
    catalog_id='5ec4aec1f072bc028e3471ae',
    catalog_version_id='5ec4aec2f072bc028e3471b1',
)

# Definition with a primary temporal key (the column holding record creation time).
transaction_definition = dr.DatasetDefinition(
    identifier='transaction',
    catalog_id='5ec4aec1f072bc028e3471ae',
    catalog_version_id='5ec4aec2f072bc028e3471b1',
    primary_temporal_key='Date',
)
Attributes:
identifier: string

Alias of the dataset (used directly as part of the generated feature names)

catalog_id: string

Identifier of the catalog item

catalog_version_id: string

Identifier of the catalog item version

primary_temporal_key: string, optional

Name of the column indicating time of record creation

feature_list_id: string, optional

Identifier of the feature list, which determines which columns in the dataset are used for feature generation

snapshot_policy: string, optional

Policy to use when creating a project or making predictions. Defaults to 'latest' if omitted. Must be one of the following values:

'specified': Use the specific snapshot given by catalog_version_id
'latest': Use the latest snapshot from the same catalog item
'dynamic': Get data from the source (only applicable for JDBC datasets)
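
The allowed snapshot_policy values listed above can be checked before a definition is built. The following is a minimal sketch under stated assumptions, not part of the datarobot package: the helper name build_definition_kwargs and the constant VALID_SNAPSHOT_POLICIES are hypothetical, and whether the client validates the value itself is not covered here. The sketch only mirrors the constructor signature shown at the top of this section and returns plain keyword arguments.

```python
# Hypothetical helper for illustration; not part of the datarobot package.
# The three allowed values come from the snapshot_policy description above.
VALID_SNAPSHOT_POLICIES = ("specified", "latest", "dynamic")


def build_definition_kwargs(
    identifier,
    catalog_id,
    catalog_version_id,
    snapshot_policy="latest",
    feature_list_id=None,
    primary_temporal_key=None,
):
    """Return keyword arguments shaped like the DatasetDefinition signature.

    Raises ValueError if snapshot_policy is not one of the documented values.
    """
    if snapshot_policy not in VALID_SNAPSHOT_POLICIES:
        raise ValueError(
            "snapshot_policy must be one of %r, got %r"
            % (VALID_SNAPSHOT_POLICIES, snapshot_policy)
        )
    kwargs = {
        "identifier": identifier,
        "catalog_id": catalog_id,
        "catalog_version_id": catalog_version_id,
        "snapshot_policy": snapshot_policy,
    }
    # Optional arguments are included only when the caller supplied them.
    if feature_list_id is not None:
        kwargs["feature_list_id"] = feature_list_id
    if primary_temporal_key is not None:
        kwargs["primary_temporal_key"] = primary_temporal_key
    return kwargs


kwargs = build_definition_kwargs(
    identifier="transaction",
    catalog_id="5ec4aec1f072bc028e3471ae",
    catalog_version_id="5ec4aec2f072bc028e3471b1",
    snapshot_policy="specified",
    primary_temporal_key="Date",
)
```

Assuming the constructor signature shown above, the resulting dictionary could then be passed as dr.DatasetDefinition(**kwargs).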