Relationship

class datarobot.helpers.feature_discovery.Relationship(dataset2_identifier, dataset1_keys, dataset2_keys, dataset1_identifier=None, feature_derivation_window_start=None, feature_derivation_window_end=None, feature_derivation_window_time_unit=None, feature_derivation_windows=None, prediction_point_rounding=None, prediction_point_rounding_time_unit=None)

Relationship between datasets defined in DatasetDefinition.

Added in version v2.25.

Examples

import datarobot as dr
relationship = dr.Relationship(
    dataset1_identifier='profile',
    dataset2_identifier='transaction',
    dataset1_keys=['CustomerID'],
    dataset2_keys=['CustomerID']
)

relationship = dr.Relationship(
    dataset2_identifier='profile',
    dataset1_keys=['CustomerID'],
    dataset2_keys=['CustomerID'],
    feature_derivation_window_start=-14,
    feature_derivation_window_end=-1,
    feature_derivation_window_time_unit='DAY',
    prediction_point_rounding=1,
    prediction_point_rounding_time_unit='DAY'
)
Attributes:
dataset1_identifier: string, optional

Identifier of the first dataset in this relationship. This is specified in the identifier field of dataset_definition structure. If None, then the relationship is with the primary dataset.

dataset2_identifier: string

Identifier of the second dataset in this relationship. This is specified in the identifier field of dataset_definition schema.

dataset1_keys: list of string (max length: 10 min length: 1)

Column(s) from the first dataset which are used to join to the second dataset

dataset2_keys: list of string (max length: 10 min length: 1)

Column(s) from the second dataset that are used to join to the first dataset

feature_derivation_window_start: int, or None

How many time_units of each dataset’s primary temporal key into the past relative to the datetimePartitionColumn the feature derivation window should begin. Will be a negative integer, if present. If present, the feature engineering Graph will perform time-aware joins.

feature_derivation_window_end: int, optional

How many timeUnits of each dataset’s record primary temporal key into the past relative to the datetimePartitionColumn the feature derivation window should end. Will be a non-positive integer, if present. If present, the feature engineering Graph will perform time-aware joins.

feature_derivation_window_time_unit: string, optional

Time unit of the feature derivation window. One of datarobot.enums.AllowedTimeUnitsSAFER. If present, time-aware joins will be used. Only applicable when dataset1_identifier is not provided.

feature_derivation_windows: list of dict, or None

List of feature derivation windows settings. If present, time-aware joins will be used. Only allowed when feature_derivation_window_start, feature_derivation_window_end and feature_derivation_window_time_unit are not provided.

prediction_point_rounding: int, optional

Closest value of prediction_point_rounding_time_unit to round the prediction point into the past when applying the feature derivation window. Will be a positive integer, if present. Only applicable when dataset1_identifier is not provided.

prediction_point_rounding_time_unit: string, optional

Time unit of the prediction point rounding. One of datarobot.enums.AllowedTimeUnitsSAFER. Only applicable when dataset1_identifier is not provided.

The `feature_derivation_windows` attribute is a list of dictionaries with the following schema:
start: int

How many time_units of each dataset’s primary temporal key into the past relative to the datetimePartitionColumn the feature derivation window should begin.

end: int

How many timeUnits of each dataset’s record primary temporal key into the past relative to the datetimePartitionColumn the feature derivation window should end.

unit: string

Time unit of the feature derivation window. One of datarobot.enums.AllowedTimeUnitsSAFER.
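As a rough illustration of the schema above, the sketch below checks a list of window dictionaries before it is passed as `feature_derivation_windows`. The helper `check_feature_derivation_windows` is hypothetical and not part of the datarobot client. The unit names are the ones this reference lists as supported values. Applying the sign rules documented for `feature_derivation_window_start` and `feature_derivation_window_end` to each dictionary, and requiring `start` to be earlier than `end`, are assumptions.

```python
# Hypothetical helper for illustration; not part of the datarobot client.
ALLOWED_TIME_UNITS = {
    "MILLISECOND", "SECOND", "MINUTE", "HOUR", "DAY",
    "WEEK", "MONTH", "QUARTER", "YEAR",
}


def check_feature_derivation_windows(windows):
    """Return a list of problems found in a list of window dicts."""
    problems = []
    for i, window in enumerate(windows):
        missing = {"start", "end", "unit"} - set(window)
        if missing:
            problems.append(f"window {i}: missing keys {sorted(missing)}")
            continue
        # Assumption: the sign rules of the single-window parameters apply here too.
        if window["start"] >= 0:
            problems.append(f"window {i}: start must be negative")
        if window["end"] > 0:
            problems.append(f"window {i}: end must be non-positive")
        # Assumption: a window should begin before it ends.
        if window["start"] >= window["end"]:
            problems.append(f"window {i}: start must be earlier than end")
        if window["unit"] not in ALLOWED_TIME_UNITS:
            problems.append(f"window {i}: unknown unit {window['unit']!r}")
    return problems


windows = [
    {"start": -14, "end": -1, "unit": "DAY"},
    {"start": -6, "end": 0, "unit": "HOUR"},
]
print(check_feature_derivation_windows(windows))  # []
```

A list that passes this check could then be supplied as `feature_derivation_windows`, provided the three single-window parameters are left unset.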

Relationships Configuration

class datarobot.models.RelationshipsConfiguration(id, dataset_definitions=None, relationships=None, feature_discovery_mode=None, feature_discovery_settings=None)

A Relationships configuration specifies a set of secondary datasets as well as the relationships among them. It is used to configure Feature Discovery for a project to generate features automatically from these datasets.

Attributes:
id: string

Id of the created relationships configuration

dataset_definitions: list

Each element is a dataset definition for a dataset.

relationships: list

Each element is a relationship between two datasets

feature_discovery_mode: str

Mode of feature discovery. Supported values are ‘default’ and ‘manual’

feature_discovery_settings: list

List of feature discovery settings used to customize the feature discovery process

The `dataset_definitions` structure is
identifier: string

Alias of the dataset (used directly as part of the generated feature names)

catalog_id: str, or None

Identifier of the catalog item

catalog_version_id: str

Identifier of the catalog item version

primary_temporal_key: string, optional

Name of the column indicating time of record creation

feature_list_id: string, optional

Identifier of the feature list. This decides which columns in the dataset are used for feature generation

snapshot_policy: str

Policy to use when creating a project or making predictions. Must be one of the following values:
‘specified’: Use the specific snapshot specified by catalogVersionId.
‘latest’: Use the latest snapshot from the same catalog item.
‘dynamic’: Get data from the source (only applicable for JDBC datasets).

feature_lists: list

List of feature list info

data_source: dict

Data source info if the dataset is from data source

data_sources: list

List of data source details for JDBC datasets

is_deleted: bool, optional

Whether the dataset is deleted or not
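To make the shape of one `dataset_definitions` element concrete, here is a minimal sketch that builds it as a plain dictionary and checks `snapshot_policy` against the three documented values. The function `make_dataset_definition`, its default of 'latest', and the rule that 'specified' needs a `catalog_version_id` are assumptions made for illustration; none of them come from the datarobot client.

```python
# Hypothetical helper for illustration; not part of the datarobot client.
SNAPSHOT_POLICIES = ("specified", "latest", "dynamic")


def make_dataset_definition(identifier, catalog_id, catalog_version_id=None,
                            snapshot_policy="latest", primary_temporal_key=None):
    """Build a plain dict shaped like one `dataset_definitions` element."""
    if snapshot_policy not in SNAPSHOT_POLICIES:
        raise ValueError(f"snapshot_policy must be one of {SNAPSHOT_POLICIES}")
    # Assumption: 'specified' is only meaningful with an explicit version id.
    if snapshot_policy == "specified" and catalog_version_id is None:
        raise ValueError("'specified' requires catalog_version_id")
    definition = {
        "identifier": identifier,
        "catalog_id": catalog_id,
        "catalog_version_id": catalog_version_id,
        "snapshot_policy": snapshot_policy,
    }
    # The temporal key is optional, so include it only when given.
    if primary_temporal_key is not None:
        definition["primary_temporal_key"] = primary_temporal_key
    return definition


profile = make_dataset_definition(
    identifier="profile",
    catalog_id="5fd06b4af24c641b68e4d88f",
    catalog_version_id="5fd06b4af24c641b68e4d88f",
    snapshot_policy="specified",
)
print(profile["identifier"], profile["snapshot_policy"])
```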

The `data source info` structure is
data_store_id: str

Id of the data store.

data_store_name: str

User-friendly name of the data store.

url: str

Url used to connect to the data store.

dbtable: str

Name of table from the data store.

schema: str

Schema definition of the table from the data store

catalog: str

Catalog name of the data source.

The `feature list info` structure is
id: str

Id of the featurelist

name: str

Name of the featurelist

features: list of str

Names of all the Features in the featurelist

dataset_id: str

Project the featurelist belongs to

creation_date: datetime.datetime

When the featurelist was created

user_created: bool

Whether the featurelist was created by a user or by DataRobot automation

created_by: str

Name of user who created it

description: str

Description of the featurelist. Can be updated by the user and may be supplied by default for DataRobot-created featurelists.

dataset_id: str

Dataset which is associated with the feature list

dataset_version_id: str or None

Version of the dataset which is associated with feature list. Only relevant for Informative features

The `relationships` schema is
dataset1_identifier: str or None

Identifier of the first dataset in this relationship. This is specified in the identifier field of dataset_definition structure. If None, then the relationship is with the primary dataset.

dataset2_identifier: str

Identifier of the second dataset in this relationship. This is specified in the identifier field of dataset_definition schema.

dataset1_keys: list of str (max length: 10 min length: 1)

Column(s) from the first dataset which are used to join to the second dataset

dataset2_keys: list of str (max length: 10 min length: 1)

Column(s) from the second dataset that are used to join to the first dataset

time_unit: str, or None

Time unit of the feature derivation window. Supported values are MILLISECOND, SECOND, MINUTE, HOUR, DAY, WEEK, MONTH, QUARTER, YEAR. If present, the feature engineering Graph will perform time-aware joins.

feature_derivation_window_start: int, or None

How many time_units of each dataset’s primary temporal key into the past relative to the datetimePartitionColumn the feature derivation window should begin. Will be a negative integer, if present. If present, the feature engineering Graph will perform time-aware joins.

feature_derivation_window_end: int, or None

How many timeUnits of each dataset’s record primary temporal key into the past relative to the datetimePartitionColumn the feature derivation window should end. Will be a non-positive integer, if present. If present, the feature engineering Graph will perform time-aware joins.

feature_derivation_window_time_unit: str or None

Time unit of the feature derivation window. Supported values are MILLISECOND, SECOND, MINUTE, HOUR, DAY, WEEK, MONTH, QUARTER, YEAR. If present, time-aware joins will be used. Only applicable when dataset1_identifier is not provided.

feature_derivation_windows: list of dict, or None

List of feature derivation windows settings. If present, time-aware joins will be used. Only allowed when feature_derivation_window_start, feature_derivation_window_end and feature_derivation_window_time_unit are not provided.

prediction_point_rounding: int, or None

Closest value of prediction_point_rounding_time_unit to round the prediction point into the past when applying the feature derivation window. Will be a positive integer, if present. Only applicable when dataset1_identifier is not provided.

prediction_point_rounding_time_unit: str, or None

Time unit of the prediction point rounding. Supported values are MILLISECOND, SECOND, MINUTE, HOUR, DAY, WEEK, MONTH, QUARTER, YEAR. Only applicable when dataset1_identifier is not provided.

The `feature_derivation_windows` attribute is a list of dictionaries with the following schema:
start: int

How many time_units of each dataset’s primary temporal key into the past relative to the datetimePartitionColumn the feature derivation window should begin.

end: int

How many timeUnits of each dataset’s record primary temporal key into the past relative to the datetimePartitionColumn the feature derivation window should end.

unit: string

Time unit of the feature derivation window. One of datarobot.enums.AllowedTimeUnitsSAFER.
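The interaction between prediction point rounding and the feature derivation window can be sketched with plain datetimes. This is only one reading of the descriptions above, not DataRobot's implementation: the functions `round_prediction_point` and `derivation_window` are hypothetical, rounding is assumed to align to multiples of the unit counted from 1970-01-01, only fixed-length units are handled, and whether the window endpoints are inclusive is left open.

```python
from datetime import datetime, timedelta

# Only fixed-length units are handled in this sketch.
UNIT_SECONDS = {"SECOND": 1, "MINUTE": 60, "HOUR": 3600, "DAY": 86400}
EPOCH = datetime(1970, 1, 1)


def round_prediction_point(prediction_point, rounding, unit):
    """Round a naive datetime into the past to a multiple of `rounding` units."""
    step = rounding * UNIT_SECONDS[unit]
    elapsed = int((prediction_point - EPOCH).total_seconds())
    # Assumption: multiples are counted from the epoch defined above.
    return EPOCH + timedelta(seconds=elapsed - elapsed % step)


def derivation_window(prediction_point, start, end, unit):
    """Return the (begin, finish) datetimes of the derivation window."""
    seconds = UNIT_SECONDS[unit]
    begin = prediction_point + timedelta(seconds=start * seconds)
    finish = prediction_point + timedelta(seconds=end * seconds)
    return begin, finish


# Same numbers as the Relationship examples: round to 1 DAY, window of -14 to -1 DAY.
point = round_prediction_point(datetime(2021, 3, 15, 17, 42), 1, "DAY")
begin, finish = derivation_window(point, -14, -1, "DAY")
print(point)   # 2021-03-15 00:00:00
print(begin)   # 2021-03-01 00:00:00
print(finish)  # 2021-03-14 00:00:00
```

Under these assumptions, a secondary-dataset record is eligible for aggregation when its primary temporal key falls between `begin` and `finish`.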

The `feature_discovery_settings` structure is:
name: str

Name of the feature discovery setting

value: bool

Value of the feature discovery setting

To see the list of possible settings, create a RelationshipsConfiguration without specifying
settings and check its `feature_discovery_settings` attribute, which is a list of possible
settings with their default values.
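Once the default settings have been read from an existing configuration, selected values can be overridden before the list is passed back as `feature_discovery_settings`. The sketch below works on plain dictionaries with the `name`/`value` structure described above. The helper `override_settings` is hypothetical, and the default values in the sample list are invented placeholders rather than real DataRobot defaults.

```python
# Hypothetical helper for illustration; not part of the datarobot client.
def override_settings(defaults, overrides):
    """Return a new settings list with selected values replaced."""
    known = {setting["name"] for setting in defaults}
    unknown = set(overrides) - known
    if unknown:
        raise KeyError(f"unknown settings: {sorted(unknown)}")
    return [
        {"name": s["name"], "value": overrides.get(s["name"], s["value"])}
        for s in defaults
    ]


# Stand-in for the list read from a configuration; these values are placeholders.
defaults = [
    {"name": "enable_categorical_statistics", "value": False},
    {"name": "enable_numeric_skewness", "value": False},
]
settings = override_settings(defaults, {"enable_numeric_skewness": True})
print(settings)
```

Rejecting unknown names early is a design choice of this sketch; it catches typos before any request is made.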
classmethod create(dataset_definitions, relationships, feature_discovery_settings=None)

Create a Relationships Configuration

Parameters:
dataset_definitions: list of dataset definitions

Each element is a datarobot.helpers.feature_discovery.DatasetDefinition

relationships: list of relationships

Each element is a datarobot.helpers.feature_discovery.Relationship

feature_discovery_settings: list of feature discovery settings, optional

Each element is a dictionary or a datarobot.helpers.feature_discovery.FeatureDiscoverySetting. If not provided, default settings will be used.

Returns:
relationships_configuration: RelationshipsConfiguration

Created relationships configuration

Examples

import datarobot as dr

dataset_definition = dr.DatasetDefinition(
    identifier='profile',
    catalog_id='5fd06b4af24c641b68e4d88f',
    catalog_version_id='5fd06b4af24c641b68e4d88f'
)
relationship = dr.Relationship(
    dataset2_identifier='profile',
    dataset1_keys=['CustomerID'],
    dataset2_keys=['CustomerID'],
    feature_derivation_window_start=-14,
    feature_derivation_window_end=-1,
    feature_derivation_window_time_unit='DAY',
    prediction_point_rounding=1,
    prediction_point_rounding_time_unit='DAY'
)
dataset_definitions = [dataset_definition]
relationships = [relationship]
relationship_config = dr.RelationshipsConfiguration.create(
    dataset_definitions=dataset_definitions,
    relationships=relationships,
    feature_discovery_settings = [
        {'name': 'enable_categorical_statistics', 'value': True},
        {'name': 'enable_numeric_skewness', 'value': True},
    ]
)
>>> relationship_config.id
'5c88a37770fc42a2fcc62759'
get()

Retrieve the Relationships configuration for a given id

Returns:
relationships_configuration: RelationshipsConfiguration

The requested relationships configuration

Raises:
ClientError

Raised if an invalid relationships config id is provided.

Examples

relationships_config = dr.RelationshipsConfiguration(valid_config_id)
result = relationships_config.get()
>>> result.id
'5c88a37770fc42a2fcc62759'
replace(dataset_definitions, relationships, feature_discovery_settings=None)

Update a Relationships Configuration that is not yet used in a Feature Discovery project

Parameters:
dataset_definitions: list of dataset definitions

Each element is a datarobot.helpers.feature_discovery.DatasetDefinition

relationships: list of relationships

Each element is a datarobot.helpers.feature_discovery.Relationship

feature_discovery_settings: list of feature discovery settings, optional

Each element is a dictionary or a datarobot.helpers.feature_discovery.FeatureDiscoverySetting. If not provided, default settings will be used.

Returns:
relationships_configuration: RelationshipsConfiguration

The updated relationships configuration

delete()

Delete the Relationships configuration

Raises:
ClientError

Raised if an invalid relationships config id is provided.

Examples

# Deleting with a valid id
relationships_config = dr.RelationshipsConfiguration(valid_config_id)
status_code = relationships_config.delete()
>>> status_code
204
>>> relationships_config.get()
ClientError: Relationships Configuration not found