Secondary datasets
- class datarobot.helpers.feature_discovery.SecondaryDataset
A secondary dataset to be used for feature discovery
Added in version v2.25.
- Variables:
  - identifier (str) – Alias of the dataset (used directly as part of the generated feature names)
  - catalog_id (str) – Identifier of the catalog item
  - catalog_version_id (str) – Identifier of the catalog item version
  - snapshot_policy (Optional[str]) – Policy to use while creating a project or making predictions. If omitted, the endpoint uses 'latest' by default. Must be one of the following values:
    - 'specified': Use the specific snapshot specified by catalogVersionId
    - 'latest': Use the latest snapshot from the same catalog item
    - 'dynamic': Get data from the source (only applicable for JDBC datasets)
Examples
    import datarobot as dr

    dataset_definition = dr.SecondaryDataset(
        identifier='profile',
        catalog_id='5ec4aec1f072bc028e3471ae',
        catalog_version_id='5ec4aec2f072bc028e3471b1',
    )
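The allowed snapshot_policy values listed above can be checked on the client side before a SecondaryDataset is built. The following is a minimal sketch under stated assumptions: `secondary_dataset_kwargs` and `SNAPSHOT_POLICIES` are hypothetical names, not part of the datarobot package, and the helper only validates and assembles keyword arguments without calling the API.

```python
from typing import Optional

# Allowed values as listed in the snapshot_policy description above.
SNAPSHOT_POLICIES = ("specified", "latest", "dynamic")


def secondary_dataset_kwargs(
    identifier: str,
    catalog_id: str,
    catalog_version_id: str,
    snapshot_policy: Optional[str] = None,
) -> dict:
    """Hypothetical helper: validate inputs and return constructor kwargs."""
    if snapshot_policy is not None and snapshot_policy not in SNAPSHOT_POLICIES:
        raise ValueError(
            f"snapshot_policy must be one of {SNAPSHOT_POLICIES}, "
            f"got {snapshot_policy!r}"
        )
    kwargs = {
        "identifier": identifier,
        "catalog_id": catalog_id,
        "catalog_version_id": catalog_version_id,
    }
    # Leave the key out when no policy is given so the endpoint default applies.
    if snapshot_policy is not None:
        kwargs["snapshot_policy"] = snapshot_policy
    return kwargs
```

The returned dictionary could then be unpacked into the constructor, for example `dr.SecondaryDataset(**secondary_dataset_kwargs('profile', catalog_id, catalog_version_id))`.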
Secondary dataset configurations
- class datarobot.models.SecondaryDatasetConfigurations
Create secondary dataset configurations for a given project
Added in version v2.20.
- Variables:
  - id (str) – Id of this secondary dataset configuration
  - project_id (str) – Id of the associated project.
  - config (list of DatasetConfiguration (Deprecated in version v2.23)) – List of secondary dataset configurations
  - secondary_datasets (list of SecondaryDataset (new in v2.23)) – List of secondary datasets (secondaryDataset)
  - name (str) – Verbose name of the SecondaryDatasetConfig. null if it wasn’t specified.
  - created (datetime.datetime) – DR-formatted datetime. null for legacy (before DR 6.0) db records.
  - creator_user_id (str) – Id of the user who created this config.
  - creator_full_name (str) – Full name or email of the user who created this config.
  - featurelist_id (Optional[str]) – Id of the feature list. null if it wasn’t specified.
  - credential_ids (Optional[list of DatasetsCredentials]) – Credentials used by the secondary datasets, if the datasets used in the configuration come from a data source
  - is_default (Optional[bool]) – Boolean flag indicating whether this is the default config created during feature discovery aim
  - project_version (Optional[str]) – Version of the project when it was created (release version)
- classmethod create(project_id, secondary_datasets, name, featurelist_id=None)
Create secondary dataset configurations.
Added in version v2.20.
- Parameters:
  - project_id (str) – Id of the associated project.
  - secondary_datasets (list of SecondaryDataset (New in version v2.23)) – List of secondary datasets used by the configuration. Each element is a datarobot.helpers.feature_discovery.SecondaryDataset
  - name (str (New in version v2.23)) – Name of the secondary datasets configuration
  - featurelist_id (str, or None (New in version v2.23)) – Id of the featurelist
- Return type:
an instance of SecondaryDatasetConfigurations
- Raises:
ClientError – raised if incorrect configuration parameters are provided
Examples
    profile_secondary_dataset = dr.SecondaryDataset(
        identifier='profile',
        catalog_id='5ec4aec1f072bc028e3471ae',
        catalog_version_id='5ec4aec2f072bc028e3471b1',
        snapshot_policy='latest'
    )

    transaction_secondary_dataset = dr.SecondaryDataset(
        identifier='transaction',
        catalog_id='5ec4aec268f0f30289a03901',
        catalog_version_id='5ec4aec268f0f30289a03900',
        snapshot_policy='latest'
    )

    secondary_datasets = [profile_secondary_dataset, transaction_secondary_dataset]
    new_secondary_dataset_config = dr.SecondaryDatasetConfigurations.create(
        project_id=project.id,
        name='My config',
        secondary_datasets=secondary_datasets
    )

    >>> new_secondary_dataset_config.id
    '5fd1e86c589238a4e635e93d'
- delete()
Removes the secondary dataset configuration.
Added in version v2.21.
- Return type:
None
- Raises:
ClientError – Raised if an invalid or already deleted secondary dataset config id is provided
Examples
    # Deleting with a valid secondary_dataset_config id.
    # delete() takes no arguments and returns None, so it is called on an instance.
    dr.SecondaryDatasetConfigurations(id=some_config_id).delete()
- get()
Retrieve a single secondary dataset configuration for a given id
Added in version v2.21.
- Returns:
secondary_dataset_configurations – The requested secondary dataset configurations
- Return type:
SecondaryDatasetConfigurations
Examples
    config_id = '5fd1e86c589238a4e635e93d'
    secondary_dataset_config = dr.SecondaryDatasetConfigurations(id=config_id).get()

    >>> secondary_dataset_config
    {
        'created': datetime.datetime(2020, 12, 9, 6, 16, 22, tzinfo=tzutc()),
        'creator_full_name': u'[email protected]',
        'creator_user_id': u'asdf4af1gf4bdsd2fba1de0a',
        'credential_ids': None,
        'featurelist_id': None,
        'id': u'5fd1e86c589238a4e635e93d',
        'is_default': True,
        'name': u'My config',
        'project_id': u'5fd06afce2456ec1e9d20457',
        'project_version': None,
        'secondary_datasets': [
            {
                'snapshot_policy': u'latest',
                'identifier': u'profile',
                'catalog_version_id': u'5fd06b4af24c641b68e4d88f',
                'catalog_id': u'5fd06b4af24c641b68e4d88e'
            },
            {
                'snapshot_policy': u'dynamic',
                'identifier': u'transaction',
                'catalog_version_id': u'5fd1e86c589238a4e635e98e',
                'catalog_id': u'5fd1e86c589238a4e635e98d'
            }
        ]
    }
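A retrieved configuration can be inspected to see which secondary datasets use which snapshot policy. The sketch below assumes the configuration is available as a plain dictionary shaped like the printed output above; the real object may expose these fields as attributes instead. `identifiers_by_policy` is a hypothetical helper, not part of the datarobot package.

```python
from collections import defaultdict
from typing import Dict, List


def identifiers_by_policy(config: dict) -> Dict[str, List[str]]:
    """Group secondary dataset identifiers by their snapshot policy."""
    grouped: Dict[str, List[str]] = defaultdict(list)
    # Treat a missing or null 'secondary_datasets' entry as an empty list.
    for dataset in config.get("secondary_datasets") or []:
        grouped[dataset["snapshot_policy"]].append(dataset["identifier"])
    return dict(grouped)


# Trimmed copy of the example output above (assumed dictionary shape).
example_config = {
    "id": "5fd1e86c589238a4e635e93d",
    "secondary_datasets": [
        {"snapshot_policy": "latest", "identifier": "profile"},
        {"snapshot_policy": "dynamic", "identifier": "transaction"},
    ],
}

policies = identifiers_by_policy(example_config)
print(policies)
```

Grouping this way makes it easy to spot datasets that read from the source at prediction time ('dynamic') rather than from a stored snapshot.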
- classmethod list(project_id, featurelist_id=None, limit=None, offset=None)
Returns a list of secondary dataset configurations.
Added in version v2.23.
- Parameters:
  - project_id (str) – The Id of the project
  - featurelist_id (Optional[str]) – Id of the feature list to filter the secondary datasets configurations
- Returns:
secondary_dataset_configurations – The requested list of secondary dataset configurations for a given project
- Return type:
list of SecondaryDatasetConfigurations
Examples
    pid = '5fd06afce2456ec1e9d20457'
    secondary_dataset_configs = dr.SecondaryDatasetConfigurations.list(pid)

    >>> secondary_dataset_configs[0]
    {
        'created': datetime.datetime(2020, 12, 9, 6, 16, 22, tzinfo=tzutc()),
        'creator_full_name': u'[email protected]',
        'creator_user_id': u'asdf4af1gf4bdsd2fba1de0a',
        'credential_ids': None,
        'featurelist_id': None,
        'id': u'5fd1e86c589238a4e635e93d',
        'is_default': True,
        'name': u'My config',
        'project_id': u'5fd06afce2456ec1e9d20457',
        'project_version': None,
        'secondary_datasets': [
            {
                'snapshot_policy': u'latest',
                'identifier': u'profile',
                'catalog_version_id': u'5fd06b4af24c641b68e4d88f',
                'catalog_id': u'5fd06b4af24c641b68e4d88e'
            },
            {
                'snapshot_policy': u'dynamic',
                'identifier': u'transaction',
                'catalog_version_id': u'5fd1e86c589238a4e635e98e',
                'catalog_id': u'5fd1e86c589238a4e635e98d'
            }
        ]
    }