Secondary Dataset

class datarobot.helpers.feature_discovery.SecondaryDataset(identifier, catalog_id, catalog_version_id, snapshot_policy='latest')

A secondary dataset to be used for feature discovery.

Added in version v2.25.

Examples

import datarobot as dr
dataset_definition = dr.SecondaryDataset(
    identifier='profile',
    catalog_id='5ec4aec1f072bc028e3471ae',
    catalog_version_id='5ec4aec2f072bc028e3471b1',
)
Attributes:
identifier: string

Alias of the dataset (used directly as part of the generated feature names)

catalog_id: string

Identifier of the catalog item

catalog_version_id: string

Identifier of the catalog item version

snapshot_policy: string, optional

Policy to use while creating a project or making predictions. If omitted, the endpoint uses ‘latest’ by default. Must be one of the following values:

‘specified’: Use the specific snapshot specified by catalogVersionId

‘latest’: Use the latest snapshot from the same catalog item

‘dynamic’: Get data from the source (only applicable for JDBC datasets)
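
The three allowed snapshot_policy values can be checked locally before a definition is built. The following is a minimal sketch, not part of the DataRobot client: the helper name secondary_dataset_kwargs and the constant ALLOWED_SNAPSHOT_POLICIES are hypothetical, and the check only mirrors the values listed above.

```python
# Hypothetical helper, not part of the datarobot package.
# The allowed values are copied from the snapshot_policy description above.
ALLOWED_SNAPSHOT_POLICIES = ("specified", "latest", "dynamic")


def secondary_dataset_kwargs(identifier, catalog_id, catalog_version_id,
                             snapshot_policy="latest"):
    """Return keyword arguments for SecondaryDataset after a local policy check."""
    if snapshot_policy not in ALLOWED_SNAPSHOT_POLICIES:
        raise ValueError(
            "snapshot_policy must be one of %s, got %r"
            % (", ".join(ALLOWED_SNAPSHOT_POLICIES), snapshot_policy)
        )
    return {
        "identifier": identifier,
        "catalog_id": catalog_id,
        "catalog_version_id": catalog_version_id,
        "snapshot_policy": snapshot_policy,
    }


kwargs = secondary_dataset_kwargs(
    identifier="profile",
    catalog_id="5ec4aec1f072bc028e3471ae",
    catalog_version_id="5ec4aec2f072bc028e3471b1",
)
print(kwargs["snapshot_policy"])  # latest
```

The resulting dictionary could then be passed on as dr.SecondaryDataset(**kwargs), so a typo in the policy name surfaces before any request is made.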

Secondary Dataset Configurations

class datarobot.models.SecondaryDatasetConfigurations(id, project_id, config=None, secondary_datasets=None, name=None, creator_full_name=None, creator_user_id=None, created=None, featurelist_id=None, credential_ids=None, is_default=None, project_version=None)

Create secondary dataset configurations for a given project.

Added in version v2.20.

Attributes:
id: str

Id of this secondary dataset configuration.

project_id: str

Id of the associated project.

config: list of DatasetConfiguration (Deprecated in version v2.23)

List of secondary dataset configurations

secondary_datasets: list of SecondaryDataset (new in v2.23)

List of secondary datasets (secondaryDataset)

name: str

Verbose name of the SecondaryDatasetConfig. null if it wasn’t specified.

created: datetime.datetime

DR-formatted datetime. null for legacy (before DR 6.0) db records.

creator_user_id: str

Id of the user who created this config.

creator_full_name: str

Full name or email of the user who created this config.

featurelist_id: str, optional

Id of the feature list. null if it wasn’t specified.

credential_ids: list of DatasetsCredentials, optional

Credentials used by the secondary datasets, if the datasets in the configuration come from a data source.

is_default: bool, optional

Whether this is the default config created during feature discovery aim.

project_version: str, optional

Version of the project when it was created (release version).

classmethod create(project_id, secondary_datasets, name, featurelist_id=None)

Create secondary dataset configurations.

Return type: SecondaryDatasetConfigurations

Added in version v2.20.

Parameters:
project_id: str

Id of the associated project.

secondary_datasets: list of SecondaryDataset (New in version v2.23)

List of secondary datasets used by the configuration. Each element is a datarobot.helpers.feature_discovery.SecondaryDataset.

name: str (New in version v2.23)

Name of the secondary datasets configuration

featurelist_id: str or None (New in version v2.23)

Id of the featurelist

Returns:
an instance of SecondaryDatasetConfigurations
Raises:
ClientError

Raised if incorrect configuration parameters are provided

Examples

import datarobot as dr

# `project` is assumed to be an existing dr.Project instance
profile_secondary_dataset = dr.SecondaryDataset(
    identifier='profile',
    catalog_id='5ec4aec1f072bc028e3471ae',
    catalog_version_id='5ec4aec2f072bc028e3471b1',
    snapshot_policy='latest'
)

transaction_secondary_dataset = dr.SecondaryDataset(
    identifier='transaction',
    catalog_id='5ec4aec268f0f30289a03901',
    catalog_version_id='5ec4aec268f0f30289a03900',
    snapshot_policy='latest'
)

secondary_datasets = [profile_secondary_dataset, transaction_secondary_dataset]
new_secondary_dataset_config = dr.SecondaryDatasetConfigurations.create(
    project_id=project.id,
    name='My config',
    secondary_datasets=secondary_datasets
)

>>> new_secondary_dataset_config.id
'5fd1e86c589238a4e635e93d'
delete()

Removes the secondary datasets configuration.

Return type: None

Added in version v2.21.

Raises:
ClientError

Raised if an invalid or already deleted secondary dataset config id is provided

Examples

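# Deleting a configuration with a valid secondary_dataset_config id.
# delete() is an instance method and returns None;
# some_config_id and some_project_id are placeholders.
config = dr.SecondaryDatasetConfigurations(id=some_config_id, project_id=some_project_id)
config.delete()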
get()

Retrieve a single secondary dataset configuration for a given id.

Return type: SecondaryDatasetConfigurations

Added in version v2.21.

Returns:
secondary_dataset_configurations: SecondaryDatasetConfigurations

The requested secondary dataset configuration

Examples

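config_id = '5fd1e86c589238a4e635e93d'
project_id = '5fd06afce2456ec1e9d20457'
# project_id is passed because the constructor signature lists it as required
secondary_dataset_config = dr.SecondaryDatasetConfigurations(id=config_id, project_id=project_id).get()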
>>> secondary_dataset_config
{
     'created': datetime.datetime(2020, 12, 9, 6, 16, 22, tzinfo=tzutc()),
     'creator_full_name': u'[email protected]',
     'creator_user_id': u'asdf4af1gf4bdsd2fba1de0a',
     'credential_ids': None,
     'featurelist_id': None,
     'id': u'5fd1e86c589238a4e635e93d',
     'is_default': True,
     'name': u'My config',
     'project_id': u'5fd06afce2456ec1e9d20457',
     'project_version': None,
     'secondary_datasets': [
            {
                'snapshot_policy': u'latest',
                'identifier': u'profile',
                'catalog_version_id': u'5fd06b4af24c641b68e4d88f',
                'catalog_id': u'5fd06b4af24c641b68e4d88e'
            },
            {
                'snapshot_policy': u'dynamic',
                'identifier': u'transaction',
                'catalog_version_id': u'5fd1e86c589238a4e635e98e',
                'catalog_id': u'5fd1e86c589238a4e635e98d'
            }
     ]
}
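
Once a configuration has been retrieved, it can be useful to see which secondary datasets use which snapshot policy, for example to spot datasets read dynamically from the source. The sketch below is illustrative only: identifiers_by_policy is a hypothetical helper, and it assumes the configuration is available as a plain dictionary shaped like the printed output above. If the client returns objects rather than dictionaries, attribute access would be needed instead.

```python
def identifiers_by_policy(config):
    """Group secondary dataset identifiers by their snapshot policy.

    `config` is assumed to be a dict shaped like the printed output above.
    """
    grouped = {}
    for dataset in config.get("secondary_datasets") or []:
        grouped.setdefault(dataset["snapshot_policy"], []).append(dataset["identifier"])
    return grouped


# Trimmed copy of the configuration printed above.
example_config = {
    "id": "5fd1e86c589238a4e635e93d",
    "secondary_datasets": [
        {"snapshot_policy": "latest", "identifier": "profile"},
        {"snapshot_policy": "dynamic", "identifier": "transaction"},
    ],
}

print(identifiers_by_policy(example_config))
# {'latest': ['profile'], 'dynamic': ['transaction']}
```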
classmethod list(project_id, featurelist_id=None, limit=None, offset=None)

Returns a list of secondary dataset configurations.

Return type: List[SecondaryDatasetConfigurations]

Added in version v2.23.

Parameters:
project_id: str

Id of the project

featurelist_id: str, optional

Id of the feature list to filter the secondary datasets configurations

Returns:
secondary_dataset_configurations: list of SecondaryDatasetConfigurations

The requested list of secondary dataset configurations for a given project

Examples

pid = '5fd06afce2456ec1e9d20457'
secondary_dataset_configs = dr.SecondaryDatasetConfigurations.list(pid)
>>> secondary_dataset_configs[0]
    {
         'created': datetime.datetime(2020, 12, 9, 6, 16, 22, tzinfo=tzutc()),
         'creator_full_name': u'[email protected]',
         'creator_user_id': u'asdf4af1gf4bdsd2fba1de0a',
         'credential_ids': None,
         'featurelist_id': None,
         'id': u'5fd1e86c589238a4e635e93d',
         'is_default': True,
         'name': u'My config',
         'project_id': u'5fd06afce2456ec1e9d20457',
         'project_version': None,
         'secondary_datasets': [
                {
                    'snapshot_policy': u'latest',
                    'identifier': u'profile',
                    'catalog_version_id': u'5fd06b4af24c641b68e4d88f',
                    'catalog_id': u'5fd06b4af24c641b68e4d88e'
                },
                {
                    'snapshot_policy': u'dynamic',
                    'identifier': u'transaction',
                    'catalog_version_id': u'5fd1e86c589238a4e635e98e',
                    'catalog_id': u'5fd1e86c589238a4e635e98d'
                }
         ]
    }
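
The list signature accepts limit and offset, but this page does not describe them. Assuming they behave as a conventional page size and starting index, the sketch below shows one way to walk through all configurations page by page. The names iter_all and fetch_page are hypothetical, and a stand-in fetch function is used so the example runs without a DataRobot connection.

```python
def iter_all(fetch_page, page_size=100):
    """Yield every item by requesting consecutive pages of `page_size` items.

    Assumes a page shorter than `page_size` means there is nothing left.
    """
    offset = 0
    while True:
        page = fetch_page(limit=page_size, offset=offset)
        for item in page:
            yield item
        if len(page) < page_size:
            return
        offset += page_size


# Stand-in data and fetch function replacing the real API call.
fake_configs = ["config-%d" % i for i in range(5)]


def fake_fetch(limit, offset):
    return fake_configs[offset:offset + limit]


print(list(iter_all(fake_fetch, page_size=2)))
# ['config-0', 'config-1', 'config-2', 'config-3', 'config-4']
```

With the real client, fetch_page could be a small wrapper such as lambda limit, offset: dr.SecondaryDatasetConfigurations.list(pid, limit=limit, offset=offset), provided limit and offset behave as assumed here.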