Secondary datasets

class datarobot.helpers.feature_discovery.SecondaryDataset

A secondary dataset to be used for feature discovery.

Added in version v2.25.

Variables:
  • identifier (str) – Alias of the dataset (used directly as part of the generated feature names)

  • catalog_id (str) – Identifier of the catalog item

  • catalog_version_id (str) – Identifier of the catalog item version

  • snapshot_policy (Optional[str]) – Policy to use when creating a project or making predictions. If omitted, the endpoint defaults to ‘latest’. Must be one of: ‘specified’ (use the specific snapshot given by catalog_version_id), ‘latest’ (use the latest snapshot of the same catalog item), or ‘dynamic’ (read data directly from the source; applicable only to JDBC datasets).

Examples

import datarobot as dr
dataset_definition = dr.SecondaryDataset(
    identifier='profile',
    catalog_id='5ec4aec1f072bc028e3471ae',
    catalog_version_id='5ec4aec2f072bc028e3471b1',
)
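The snapshot_policy rules above can be summarized as a small client-side check. This is a hypothetical helper, not part of the datarobot package: resolve_snapshot_policy and its source_is_jdbc flag are illustrative names. It encodes only the documented behaviour: an omitted policy falls back to ‘latest’, any other value must be one of the three listed policies, and ‘dynamic’ applies only to JDBC datasets.

```python
from typing import Optional

# The three documented policies; a local assumption, not a datarobot constant.
VALID_SNAPSHOT_POLICIES = ("specified", "latest", "dynamic")


def resolve_snapshot_policy(policy: Optional[str], source_is_jdbc: bool = False) -> str:
    """Hypothetical helper: return the policy the endpoint would apply.

    An omitted policy (None) falls back to 'latest'; 'dynamic' is rejected
    unless the caller states that the dataset comes from a JDBC source.
    """
    if policy is None:
        return "latest"
    if policy not in VALID_SNAPSHOT_POLICIES:
        raise ValueError(
            f"snapshot_policy must be one of {VALID_SNAPSHOT_POLICIES}, got {policy!r}"
        )
    if policy == "dynamic" and not source_is_jdbc:
        raise ValueError("'dynamic' is only applicable to JDBC datasets")
    return policy


print(resolve_snapshot_policy(None))                           # latest
print(resolve_snapshot_policy("dynamic", source_is_jdbc=True))  # dynamic
```

Raising early on the client side is a design choice for illustration only; the server remains the authority on which policies a given dataset accepts.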

Secondary dataset configurations

class datarobot.models.SecondaryDatasetConfigurations

Create secondary dataset configurations for a given project.

Added in version v2.20.

Variables:
  • id (str) – ID of this secondary dataset configuration.

  • project_id (str) – ID of the associated project.

  • config (list of DatasetConfiguration (Deprecated in version v2.23)) – List of secondary dataset configurations

  • secondary_datasets (list of SecondaryDataset (new in v2.23)) – List of secondary datasets (secondaryDataset)

  • name (str) – Verbose name of the configuration; None if it wasn’t specified.

  • created (datetime.datetime) – Creation time of the configuration; None for legacy database records (created before DataRobot 6.0).

  • creator_user_id (str) – ID of the user who created this configuration.

  • creator_full_name (str) – Full name or email of the user who created this configuration.

  • featurelist_id (Optional[str]) – ID of the feature list; None if it wasn’t specified.

  • credential_ids (Optional[list of DatasetsCredentials]) – Credentials used by the secondary datasets when the datasets in the configuration come from a data source.

  • is_default (Optional[bool]) – Whether this is the default configuration created during the feature discovery aim.

  • project_version (Optional[str]) – Version of the project at the time it was created (release version).

classmethod create(project_id, secondary_datasets, name, featurelist_id=None)

Create a secondary dataset configuration for a project.

Added in version v2.20.

Parameters:
  • project_id (str) – ID of the associated project.

  • secondary_datasets (list of SecondaryDataset (New in version v2.23)) – List of secondary datasets used by the configuration; each element is a datarobot.helpers.feature_discovery.SecondaryDataset.

  • name (str (New in version v2.23)) – Name of the secondary dataset configuration.

  • featurelist_id (str, or None (New in version v2.23)) – ID of the feature list.

Return type:

an instance of SecondaryDatasetConfigurations

Raises:

ClientError – raised if incorrect configuration parameters are provided

Examples

   import datarobot as dr

   # `project` is assumed to be an existing dr.Project, for example one
   # retrieved with dr.Project.get(project_id)
   profile_secondary_dataset = dr.SecondaryDataset(
       identifier='profile',
       catalog_id='5ec4aec1f072bc028e3471ae',
       catalog_version_id='5ec4aec2f072bc028e3471b1',
       snapshot_policy='latest'
   )

   transaction_secondary_dataset = dr.SecondaryDataset(
       identifier='transaction',
       catalog_id='5ec4aec268f0f30289a03901',
       catalog_version_id='5ec4aec268f0f30289a03900',
       snapshot_policy='latest'
   )

   secondary_datasets = [profile_secondary_dataset, transaction_secondary_dataset]
   new_secondary_dataset_config = dr.SecondaryDatasetConfigurations.create(
       project_id=project.id,
       name='My config',
       secondary_datasets=secondary_datasets
   )

>>> new_secondary_dataset_config.id
'5fd1e86c589238a4e635e93d'
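Because each identifier is used directly as part of the generated feature names, it can help to check the aliases before calling create(). The sketch below is a hypothetical pre-flight check: DatasetSpec and duplicate_identifiers are illustrative names, the specs are plain tuples standing in for dr.SecondaryDataset arguments so the code runs without the client, and the expectation that aliases be unique is an assumption rather than a documented rule.

```python
from collections import Counter
from typing import Iterable, List, NamedTuple


class DatasetSpec(NamedTuple):
    """Hypothetical stand-in for the arguments passed to dr.SecondaryDataset."""
    identifier: str
    catalog_id: str
    catalog_version_id: str


def duplicate_identifiers(specs: Iterable[DatasetSpec]) -> List[str]:
    """Return identifiers that appear more than once, sorted alphabetically."""
    counts = Counter(spec.identifier for spec in specs)
    return sorted(name for name, n in counts.items() if n > 1)


# The same catalog IDs as the create() example above.
specs = [
    DatasetSpec("profile", "5ec4aec1f072bc028e3471ae", "5ec4aec2f072bc028e3471b1"),
    DatasetSpec("transaction", "5ec4aec268f0f30289a03901", "5ec4aec268f0f30289a03900"),
]
print(duplicate_identifiers(specs))  # []
```

An empty list means every alias is distinct; each spec could then be passed to dr.SecondaryDataset as in the example above.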

delete()

Remove this secondary dataset configuration.

Added in version v2.21.

Return type:

None

Raises:

ClientError – Raised if an invalid or already deleted secondary dataset config id is provided

Examples

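# delete() is an instance method that returns None, so first obtain the
# configuration object, here by filtering list() on a known ID
configs = dr.SecondaryDatasetConfigurations.list(project_id)
config = next(c for c in configs if c.id == some_config_id)
config.delete()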
get()

Retrieve the secondary dataset configuration identified by this instance’s id.

Added in version v2.21.

Returns:

secondary_dataset_configurations – The requested secondary dataset configuration

Return type:

SecondaryDatasetConfigurations

Examples

config_id = '5fd1e86c589238a4e635e93d'
secondary_dataset_config = dr.SecondaryDatasetConfigurations(id=config_id).get()
>>> secondary_dataset_config
{
     'created': datetime.datetime(2020, 12, 9, 6, 16, 22, tzinfo=tzutc()),
     'creator_full_name': u'[email protected]',
     'creator_user_id': u'asdf4af1gf4bdsd2fba1de0a',
     'credential_ids': None,
     'featurelist_id': None,
     'id': u'5fd1e86c589238a4e635e93d',
     'is_default': True,
     'name': u'My config',
     'project_id': u'5fd06afce2456ec1e9d20457',
     'project_version': None,
     'secondary_datasets': [
            {
                'snapshot_policy': u'latest',
                'identifier': u'profile',
                'catalog_version_id': u'5fd06b4af24c641b68e4d88f',
                'catalog_id': u'5fd06b4af24c641b68e4d88e'
            },
            {
                'snapshot_policy': u'dynamic',
                'identifier': u'transaction',
                'catalog_version_id': u'5fd1e86c589238a4e635e98e',
                'catalog_id': u'5fd1e86c589238a4e635e98d'
            }
     ]
}
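The printed configuration nests each secondary dataset with its identifier and snapshot_policy. As a sketch, the helper below maps each alias to its policy. policies_by_alias and _field are hypothetical names, and because the exact type of the secondary_datasets elements is not shown here, the helper accepts either dicts (as in the printed output) or objects with matching attributes. A SimpleNamespace stands in for the retrieved configuration so the code runs offline.

```python
from types import SimpleNamespace
from typing import Any, Dict


def _field(item: Any, key: str) -> Any:
    """Read `key` from a dict or from an object attribute (element type is an assumption)."""
    return item.get(key) if isinstance(item, dict) else getattr(item, key, None)


def policies_by_alias(config: Any) -> Dict[str, Any]:
    """Map each secondary dataset identifier to its snapshot policy."""
    return {
        _field(ds, "identifier"): _field(ds, "snapshot_policy")
        for ds in (config.secondary_datasets or [])
    }


# Offline stand-in mirroring the example output above.
config = SimpleNamespace(
    secondary_datasets=[
        {"identifier": "profile", "snapshot_policy": "latest"},
        {"identifier": "transaction", "snapshot_policy": "dynamic"},
    ]
)
print(policies_by_alias(config))  # {'profile': 'latest', 'transaction': 'dynamic'}
```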
classmethod list(project_id, featurelist_id=None, limit=None, offset=None)

Return a list of secondary dataset configurations for a project.

Added in version v2.23.

Parameters:
  • project_id (str) – ID of the project.

  • featurelist_id (Optional[str]) – ID of the feature list by which to filter the secondary dataset configurations.

Returns:

secondary_dataset_configurations – The requested list of secondary dataset configurations for a given project

Return type:

list of SecondaryDatasetConfigurations

Examples

pid = '5fd06afce2456ec1e9d20457'
secondary_dataset_configs = dr.SecondaryDatasetConfigurations.list(pid)
>>> secondary_dataset_configs[0]
    {
         'created': datetime.datetime(2020, 12, 9, 6, 16, 22, tzinfo=tzutc()),
         'creator_full_name': u'[email protected]',
         'creator_user_id': u'asdf4af1gf4bdsd2fba1de0a',
         'credential_ids': None,
         'featurelist_id': None,
         'id': u'5fd1e86c589238a4e635e93d',
         'is_default': True,
         'name': u'My config',
         'project_id': u'5fd06afce2456ec1e9d20457',
         'project_version': None,
         'secondary_datasets': [
                {
                    'snapshot_policy': u'latest',
                    'identifier': u'profile',
                    'catalog_version_id': u'5fd06b4af24c641b68e4d88f',
                    'catalog_id': u'5fd06b4af24c641b68e4d88e'
                },
                {
                    'snapshot_policy': u'dynamic',
                    'identifier': u'transaction',
                    'catalog_version_id': u'5fd1e86c589238a4e635e98e',
                    'catalog_id': u'5fd1e86c589238a4e635e98d'
                }
         ]
    }
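A project can hold several configurations, and the is_default attribute marks the one created during the feature discovery aim. The sketch below picks it out of a list. find_default_config is a hypothetical helper; it relies only on the documented is_default attribute, so it should work with the objects returned by list(). The SimpleNamespace entries and their IDs are made-up stand-ins so the code runs offline.

```python
from types import SimpleNamespace
from typing import Any, Iterable, Optional


def find_default_config(configs: Iterable[Any]) -> Optional[Any]:
    """Return the first configuration whose is_default flag is truthy, else None.

    is_default is Optional[bool], so a missing or None value counts as not default.
    """
    for config in configs:
        if getattr(config, "is_default", False):
            return config
    return None


# Hypothetical stand-ins for the results of SecondaryDatasetConfigurations.list().
configs = [
    SimpleNamespace(id="hypothetical-config-1", name="Custom config", is_default=None),
    SimpleNamespace(id="hypothetical-config-2", name="My config", is_default=True),
]
default = find_default_config(configs)
print(default.name if default else None)  # My config
```

Returning None instead of raising keeps the helper usable for projects where no default configuration exists.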