Relationship

class datarobot.helpers.feature_discovery.Relationship

Relationship between datasets defined in DatasetDefinition.

Added in version v2.25.

Variables:

dataset1_identifier (Optional[str]) – Identifier of the first dataset in this relationship. This is specified in the identifier field of dataset_definition structure. If None, then the relationship is with the primary dataset.
dataset2_identifier (str) – Identifier of the second dataset in this relationship. This is specified in the identifier field of dataset_definition schema.
dataset1_keys (List[str]) – (max length: 10 min length: 1) Column(s) from the first dataset which are used to join to the second dataset
dataset2_keys (List[str]) – (max length: 10 min length: 1) Column(s) from the second dataset that are used to join to the first dataset
feature_derivation_window_start (int, or None) – How many time_units of each dataset’s primary temporal key into the past relative to the datetimePartitionColumn the feature derivation window should begin. Will be a negative integer, if present. If present, the feature engineering graph will perform time-aware joins.
feature_derivation_window_end (Optional[int]) – How many timeUnits of each dataset’s record primary temporal key into the past relative to the datetimePartitionColumn the feature derivation window should end. Will be a non-positive integer, if present. If present, the feature engineering Graph will perform time-aware joins.
feature_derivation_window_time_unit (Optional[str]) – Time unit of the feature derivation window. One of datarobot.enums.AllowedTimeUnitsSAFER. If present, time-aware joins will be used. Only applicable when dataset1_identifier is not provided.
feature_derivation_windows (list of dict, or None) – List of feature derivation windows settings. If present, time-aware joins will be used. Only allowed when feature_derivation_window_start, feature_derivation_window_end and feature_derivation_window_time_unit are not provided.
prediction_point_rounding (Optional[int]) – Closest value of prediction_point_rounding_time_unit to round the prediction point into the past when applying the feature derivation window. Will be a positive integer, if present. Only applicable when dataset1_identifier is not provided.
prediction_point_rounding_time_unit (Optional[str]) – Time unit of the prediction point rounding. One of datarobot.enums.AllowedTimeUnitsSAFER. Only applicable when dataset1_identifier is not provided.
The feature_derivation_windows is a list of dictionaries with the following schema:

start: int
How many time_units of each dataset’s primary temporal key into the past relative to the datetimePartitionColumn the feature derivation window should begin.

end: int
How many timeUnits of each dataset’s record primary temporal key into the past relative to the datetimePartitionColumn the feature derivation window should end.

unit: str
Time unit of the feature derivation window. One of datarobot.enums.AllowedTimeUnitsSAFER.
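
The constraints listed above can be checked locally before a Relationship is built. The following is a minimal sketch in plain Python, not part of the datarobot client: the helper name check_relationship_kwargs and its messages are hypothetical, and it only encodes the rules stated in this section. The server may apply further validation.

```python
def check_relationship_kwargs(kwargs):
    """Return a list of problems found in Relationship keyword arguments.

    Hypothetical helper: it mirrors only the constraints documented above
    and never calls the DataRobot API.
    """
    problems = []

    # dataset2_identifier is required; dataset1_identifier may be None,
    # which means the relationship is with the primary dataset.
    if not kwargs.get("dataset2_identifier"):
        problems.append("dataset2_identifier is required")

    # Join keys: between 1 and 10 columns on each side.
    for name in ("dataset1_keys", "dataset2_keys"):
        keys = kwargs.get(name) or []
        if not 1 <= len(keys) <= 10:
            problems.append(f"{name} must contain 1 to 10 columns")

    start = kwargs.get("feature_derivation_window_start")
    end = kwargs.get("feature_derivation_window_end")
    unit = kwargs.get("feature_derivation_window_time_unit")
    windows = kwargs.get("feature_derivation_windows")

    if start is not None and start >= 0:
        problems.append("feature_derivation_window_start must be negative")
    if end is not None and end > 0:
        problems.append("feature_derivation_window_end must be non-positive")

    # The list form and the three scalar fields are mutually exclusive.
    if windows is not None and any(v is not None for v in (start, end, unit)):
        problems.append(
            "use either feature_derivation_windows or the start/end/time_unit fields"
        )

    rounding = kwargs.get("prediction_point_rounding")
    if rounding is not None and rounding <= 0:
        problems.append("prediction_point_rounding must be positive")

    # These fields are documented as applicable only when
    # dataset1_identifier is not provided.
    if kwargs.get("dataset1_identifier") is not None:
        for name in (
            "feature_derivation_window_time_unit",
            "prediction_point_rounding",
            "prediction_point_rounding_time_unit",
        ):
            if kwargs.get(name) is not None:
                problems.append(f"{name} requires dataset1_identifier to be omitted")

    return problems
```

An empty list means the arguments satisfy the documented rules; the same dictionary could then be unpacked into the Relationship constructor.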

Examples

import datarobot as dr
relationship = dr.Relationship(
    dataset1_identifier='profile',
    dataset2_identifier='transaction',
    dataset1_keys=['CustomerID'],
    dataset2_keys=['CustomerID']
)

relationship = dr.Relationship(
    dataset2_identifier='profile',
    dataset1_keys=['CustomerID'],
    dataset2_keys=['CustomerID'],
    feature_derivation_window_start=-14,
    feature_derivation_window_end=-1,
    feature_derivation_window_time_unit='DAY',
    prediction_point_rounding=1,
    prediction_point_rounding_time_unit='DAY'
)
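
To make the time-aware example above more concrete, the sketch below computes the interval that a window of -14 to -1 days would cover for one prediction point. It is a simplified reading of the parameter descriptions, not DataRobot's implementation: the function derivation_window is hypothetical, it handles only the DAY unit, and it assumes that rounding "into the past" with a value of 1 DAY means truncating the prediction point to midnight.

```python
from datetime import datetime, timedelta


def derivation_window(prediction_point, start_days, end_days, round_to_day=True):
    """Return (window_start, window_end) for a DAY-based derivation window.

    Assumption: rounding the prediction point into the past by 1 DAY is
    modelled as truncating it to midnight of the same day.
    """
    point = prediction_point
    if round_to_day:
        point = point.replace(hour=0, minute=0, second=0, microsecond=0)
    # start_days is negative and end_days is non-positive, so both
    # boundaries lie at or before the (rounded) prediction point.
    return point + timedelta(days=start_days), point + timedelta(days=end_days)


window_start, window_end = derivation_window(
    datetime(2021, 3, 15, 9, 30), start_days=-14, end_days=-1
)
print(window_start, window_end)
```

Under these assumptions, secondary records whose primary temporal key falls between the two returned timestamps would be the ones considered for the join.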

Relationships configuration

class datarobot.models.RelationshipsConfiguration

A Relationships configuration specifies a set of secondary datasets as well as the relationships among them. It is used to configure Feature Discovery for a project to generate features automatically from these datasets.

Variables:

id (str) – Id of the created relationships configuration
dataset_definitions (list) – Each element is a dataset definition for a dataset.
relationships (list) – Each element is a relationship between two datasets
feature_discovery_mode (str) – Mode of feature discovery. Supported values are ‘default’ and ‘manual’
feature_discovery_settings (list) – List of feature discovery settings used to customize the feature discovery process
The dataset_definitions structure is:
identifier (str) – Alias of the dataset (used directly as part of the generated feature names)
catalog_id (str, or None) – Identifier of the catalog item
catalog_version_id (str) – Identifier of the catalog item version
primary_temporal_key (Optional[str]) – Name of the column indicating time of record creation
feature_list_id (Optional[str]) – Identifier of the feature list. This decides which columns in the dataset are used for feature generation
snapshot_policy (str) – Policy to use when creating a project or making predictions. Must be one of the following values: ‘specified’ (use the specific snapshot specified by catalogVersionId), ‘latest’ (use the latest snapshot from the same catalog item), or ‘dynamic’ (get data from the source; only applicable for JDBC datasets).
feature_lists (list) – List of feature list info
data_source (dict) – Data source info if the dataset is from data source
data_sources (list) – List of data source details for JDBC datasets
is_deleted (Optional[bool]) – Whether the dataset is deleted or not
The data source info structure is:
data_store_id (str) – Id of the data store.
data_store_name (str) – User-friendly name of the data store.
url (str) – Url used to connect to the data store.
dbtable (str) – Name of table from the data store.
schema – Schema definition of the table from the data store
catalog (str) – Catalog name of the data source.
The feature list info structure is:
id – Id of the featurelist
name (str) – Name of the featurelist
features (List[str]) – Names of all the Features in the featurelist
dataset_id (str) – Project the featurelist belongs to
creation_date (datetime.datetime) – When the featurelist was created
user_created (bool) – Whether the featurelist was created by a user or by DataRobot automation
created_by (str) – Name of user who created it
description (str) – Description of the featurelist. Can be updated by the user and may be supplied by default for DataRobot-created featurelists.
dataset_id – Dataset which is associated with the feature list
dataset_version_id (str or None) – Version of the dataset which is associated with feature list. Only relevant for Informative features
The relationships structure is:
dataset1_identifier (str or None) – Identifier of the first dataset in this relationship. This is specified in the identifier field of dataset_definition structure. If None, then the relationship is with the primary dataset.
dataset2_identifier (str) – Identifier of the second dataset in this relationship. This is specified in the identifier field of dataset_definition schema.
dataset1_keys (List[str]) – (max length: 10 min length: 1) Column(s) from the first dataset which are used to join to the second dataset
dataset2_keys (List[str]) – (max length: 10 min length: 1) Column(s) from the second dataset that are used to join to the first dataset
time_unit (str, or None) – Time unit of the feature derivation window. Supported values are MILLISECOND, SECOND, MINUTE, HOUR, DAY, WEEK, MONTH, QUARTER, YEAR. If present, the feature engineering Graph will perform time-aware joins.
feature_derivation_window_start (int, or None) – How many time_units of each dataset’s primary temporal key into the past relative to the datetimePartitionColumn the feature derivation window should begin. Will be a negative integer, if present. If present, the feature engineering graph will perform time-aware joins.
feature_derivation_window_end (int or None) – How many timeUnits of each dataset’s record primary temporal key into the past relative to the datetimePartitionColumn the feature derivation window should end. Will be a non-positive integer, if present. If present, the feature engineering Graph will perform time-aware joins.
feature_derivation_window_time_unit (str or None) – Time unit of the feature derivation window. Supported values are MILLISECOND, SECOND, MINUTE, HOUR, DAY, WEEK, MONTH, QUARTER, YEAR. If present, time-aware joins will be used. Only applicable when dataset1_identifier is not provided.
feature_derivation_windows (list of dict, or None) – List of feature derivation windows settings. If present, time-aware joins will be used. Only allowed when feature_derivation_window_start, feature_derivation_window_end and feature_derivation_window_time_unit are not provided.
prediction_point_rounding (int, or None) – Closest value of prediction_point_rounding_time_unit to round the prediction point into the past when applying the feature derivation window. Will be a positive integer, if present. Only applicable when dataset1_identifier is not provided.
prediction_point_rounding_time_unit (str, or None) – Time unit of the prediction point rounding. Supported values are MILLISECOND, SECOND, MINUTE, HOUR, DAY, WEEK, MONTH, QUARTER, YEAR. Only applicable when dataset1_identifier is not provided.
The feature_derivation_windows is a list of dictionaries with the following schema:

start: int
How many time_units of each dataset’s primary temporal key into the past relative to the datetimePartitionColumn the feature derivation window should begin.

end: int
How many timeUnits of each dataset’s record primary temporal key into the past relative to the datetimePartitionColumn the feature derivation window should end.

unit: str
Time unit of the feature derivation window. One of datarobot.enums.AllowedTimeUnitsSAFER.
The feature_discovery_settings structure is:
name – Name of the feature discovery setting
value (bool) – Value of the feature discovery setting
To see the list of possible settings, create a RelationshipsConfiguration without specifying settings and check its feature_discovery_settings attribute, which is a list of possible settings with their default values.
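
Because feature_discovery_settings is a list of name/value entries, a common pattern is to start from the defaults and flip only the settings of interest. The sketch below shows one way to do that in plain Python. The helper apply_overrides is hypothetical and not part of the datarobot client, and the default values shown are placeholders for illustration; real defaults should be read from the feature_discovery_settings attribute of an existing configuration.

```python
def apply_overrides(default_settings, overrides):
    """Return a new settings list with selected values replaced.

    default_settings: list of {'name': str, 'value': bool} dictionaries.
    overrides: mapping of setting name to the new value.
    Hypothetical helper; unknown names raise KeyError instead of being
    silently added.
    """
    known = {setting["name"] for setting in default_settings}
    unknown = set(overrides) - known
    if unknown:
        raise KeyError(f"unknown feature discovery settings: {sorted(unknown)}")
    return [
        {"name": s["name"], "value": overrides.get(s["name"], s["value"])}
        for s in default_settings
    ]


# Placeholder defaults for illustration only.
defaults = [
    {"name": "enable_categorical_statistics", "value": False},
    {"name": "enable_numeric_skewness", "value": False},
]
settings = apply_overrides(defaults, {"enable_numeric_skewness": True})
print(settings)
```

The resulting list has the same shape as the feature_discovery_settings argument accepted by create and replace.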

classmethod create(dataset_definitions, relationships, feature_discovery_settings=None)

Create a Relationships Configuration

Parameters:

dataset_definitions (list of DatasetDefinition) – Each element is a datarobot.helpers.feature_discovery.DatasetDefinition
relationships (list of Relationship) – Each element is a datarobot.helpers.feature_discovery.Relationship
feature_discovery_settings (Optional[List[FeatureDiscoverySetting]]) – Each element is a dictionary or a datarobot.helpers.feature_discovery.FeatureDiscoverySetting. If not provided, default settings will be used.

Returns:

relationships_configuration – Created relationships configuration

Return type:

RelationshipsConfiguration

Examples

dataset_definition = dr.DatasetDefinition(
    identifier='profile',
    catalog_id='5fd06b4af24c641b68e4d88f',
    catalog_version_id='5fd06b4af24c641b68e4d88f'
)
relationship = dr.Relationship(
    dataset2_identifier='profile',
    dataset1_keys=['CustomerID'],
    dataset2_keys=['CustomerID'],
    feature_derivation_window_start=-14,
    feature_derivation_window_end=-1,
    feature_derivation_window_time_unit='DAY',
    prediction_point_rounding=1,
    prediction_point_rounding_time_unit='DAY'
)
dataset_definitions = [dataset_definition]
relationships = [relationship]
relationship_config = dr.RelationshipsConfiguration.create(
    dataset_definitions=dataset_definitions,
    relationships=relationships,
    feature_discovery_settings=[
        {'name': 'enable_categorical_statistics', 'value': True},
        {'name': 'enable_numeric_skewness', 'value': True},
    ]
)
>>> relationship_config.id
'5c88a37770fc42a2fcc62759'

get()

Retrieve the Relationships configuration for a given id

Returns:

relationships_configuration – The requested relationships configuration

Return type:

RelationshipsConfiguration

Raises:

ClientError – Raised if an invalid relationships config id is provided.

Examples

relationships_config = dr.RelationshipsConfiguration(valid_config_id)
result = relationships_config.get()
>>> result.id
'5c88a37770fc42a2fcc62759'

replace(dataset_definitions, relationships, feature_discovery_settings=None)

Update a Relationships Configuration that is not used in a Feature Discovery project

Parameters:

dataset_definitions (List[DatasetDefinition]) – Each element is a datarobot.helpers.feature_discovery.DatasetDefinition
relationships (List[Relationship]) – Each element is a datarobot.helpers.feature_discovery.Relationship
feature_discovery_settings (Optional[List[FeatureDiscoverySetting]]) – Each element is a dictionary or a datarobot.helpers.feature_discovery.FeatureDiscoverySetting. If not provided, default settings will be used.

Returns:

relationships_configuration – the updated relationships configuration

Return type:

RelationshipsConfiguration

delete()

Delete the Relationships configuration

Raises:

ClientError – Raised if an invalid relationships config id is provided.

Examples

# Deleting with a valid id
relationships_config = dr.RelationshipsConfiguration(valid_config_id)
status_code = relationships_config.delete()
status_code
>>> 204
relationships_config.get()
>>> ClientError: Relationships Configuration not found