Relationship

class datarobot.helpers.feature_discovery.Relationship

Relationship between datasets defined in DatasetDefinition.

Added in version v2.25.

Variables:
  • dataset1_identifier (Optional[str]) – Identifier of the first dataset in this relationship. This is specified in the identifier field of dataset_definition structure. If None, then the relationship is with the primary dataset.

  • dataset2_identifier (str) – Identifier of the second dataset in this relationship. This is specified in the identifier field of dataset_definition schema.

  • dataset1_keys (List[str]) – Column(s) from the first dataset that are used to join to the second dataset (min length: 1, max length: 10)

  • dataset2_keys (List[str]) – Column(s) from the second dataset that are used to join to the first dataset (min length: 1, max length: 10)

  • feature_derivation_window_start (Optional[int]) – How many time units of each dataset’s primary temporal key into the past, relative to the datetimePartitionColumn, the feature derivation window should begin. Must be a negative integer if present. If present, the feature engineering graph performs time-aware joins.

  • feature_derivation_window_end (Optional[int]) – How many time units of each dataset’s primary temporal key into the past, relative to the datetimePartitionColumn, the feature derivation window should end. Must be a non-positive integer if present. If present, the feature engineering graph performs time-aware joins.

  • feature_derivation_window_time_unit (Optional[str]) – Time unit of the feature derivation window. One of datarobot.enums.AllowedTimeUnitsSAFER. If present, time-aware joins are used. Only applicable when dataset1_identifier is not provided.

  • feature_derivation_windows (list of dict, or None) – List of feature derivation window settings. If present, time-aware joins are used. Only allowed when feature_derivation_window_start, feature_derivation_window_end, and feature_derivation_window_time_unit are not provided.

  • prediction_point_rounding (Optional[int]) – Closest value of prediction_point_rounding_time_unit to which the prediction point is rounded into the past when applying the feature derivation window. Must be a positive integer if present. Only applicable when dataset1_identifier is not provided.

  • prediction_point_rounding_time_unit (Optional[str]) – Time unit of the prediction point rounding. One of datarobot.enums.AllowedTimeUnitsSAFER. Only applicable when dataset1_identifier is not provided.

  • Each element of feature_derivation_windows is a dictionary with the following schema:

    start: int

    How many time units of each dataset’s primary temporal key into the past, relative to the datetimePartitionColumn, the feature derivation window should begin.

    end: int

    How many time units of each dataset’s primary temporal key into the past, relative to the datetimePartitionColumn, the feature derivation window should end.

    unit: str

    Time unit of the feature derivation window. One of datarobot.enums.AllowedTimeUnitsSAFER.
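The constraints above can be checked on the client before calling dr.Relationship. The sketch below is a hypothetical helper, check_relationship_kwargs, that is not part of the datarobot package. It encodes only the rules stated on this page (key counts, window signs, the listed time units, the mutual exclusivity of feature_derivation_windows, and the fields that apply only to the primary dataset), so it may be stricter or looser than the server's actual validation.

```python
from typing import Any, Dict, List

# Time units this reference lists for feature derivation windows and rounding.
TIME_UNITS = {"MILLISECOND", "SECOND", "MINUTE", "HOUR", "DAY",
              "WEEK", "MONTH", "QUARTER", "YEAR"}

# Fields that may not be combined with feature_derivation_windows.
SINGLE_WINDOW_FIELDS = ("feature_derivation_window_start",
                        "feature_derivation_window_end",
                        "feature_derivation_window_time_unit")

# Fields documented as applicable only when dataset1_identifier is not provided.
PRIMARY_ONLY_FIELDS = ("feature_derivation_window_time_unit",
                       "prediction_point_rounding",
                       "prediction_point_rounding_time_unit")


def check_relationship_kwargs(kwargs: Dict[str, Any]) -> List[str]:
    """Hypothetical pre-flight check; returns a list of problems (empty if none found)."""
    problems: List[str] = []
    for name in ("dataset1_keys", "dataset2_keys"):
        if not 1 <= len(kwargs.get(name) or []) <= 10:
            problems.append(f"{name} must contain 1 to 10 columns")
    start = kwargs.get("feature_derivation_window_start")
    end = kwargs.get("feature_derivation_window_end")
    if start is not None and start >= 0:
        problems.append("feature_derivation_window_start must be negative")
    if end is not None and end > 0:
        problems.append("feature_derivation_window_end must be non-positive")
    for unit_field in ("feature_derivation_window_time_unit",
                       "prediction_point_rounding_time_unit"):
        unit = kwargs.get(unit_field)
        if unit is not None and unit not in TIME_UNITS:
            problems.append(f"{unit_field} has unknown time unit {unit!r}")
    rounding = kwargs.get("prediction_point_rounding")
    if rounding is not None and rounding <= 0:
        problems.append("prediction_point_rounding must be positive")
    if kwargs.get("feature_derivation_windows") and any(
            kwargs.get(f) is not None for f in SINGLE_WINDOW_FIELDS):
        problems.append("feature_derivation_windows excludes the single-window fields")
    if kwargs.get("dataset1_identifier") is not None and any(
            kwargs.get(f) is not None for f in PRIMARY_ONLY_FIELDS):
        problems.append("time unit and rounding apply only when dataset1_identifier is None")
    return problems


# A time-aware join from the primary dataset, mirroring the example on this page.
print(check_relationship_kwargs(dict(
    dataset2_identifier="profile",
    dataset1_keys=["CustomerID"],
    dataset2_keys=["CustomerID"],
    feature_derivation_window_start=-14,
    feature_derivation_window_end=-1,
    feature_derivation_window_time_unit="DAY",
    prediction_point_rounding=1,
    prediction_point_rounding_time_unit="DAY",
)))  # []
```

Returning a list of messages instead of raising on the first problem lets a caller report every issue in a relationship definition at once.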

Examples

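import datarobot as dr

# Join two secondary datasets: 'profile' (first) to 'transaction' (second)
relationship = dr.Relationship(
    dataset1_identifier='profile',
    dataset2_identifier='transaction',
    dataset1_keys=['CustomerID'],
    dataset2_keys=['CustomerID']
)

# Join the primary dataset (dataset1_identifier omitted) to 'profile' with a
# time-aware window that begins 14 days and ends 1 day before the prediction
# point, which is first rounded into the past to the nearest 1 DAY
relationship = dr.Relationship(
    dataset2_identifier='profile',
    dataset1_keys=['CustomerID'],
    dataset2_keys=['CustomerID'],
    feature_derivation_window_start=-14,
    feature_derivation_window_end=-1,
    feature_derivation_window_time_unit='DAY',
    prediction_point_rounding=1,
    prediction_point_rounding_time_unit='DAY'
)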
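To illustrate what the second example's window covers, the hypothetical helper below computes approximate bounds for a DAY-unit window. It assumes that prediction_point_rounding=1 with unit 'DAY' rounds the prediction point back to midnight, and it does not decide whether the end bound is inclusive; neither detail is specified on this page, so treat it as a reading aid rather than DataRobot's implementation.

```python
from datetime import datetime, timedelta
from typing import Tuple


def day_window_bounds(prediction_point: datetime, start: int, end: int) -> Tuple[datetime, datetime]:
    """Hypothetical helper: approximate bounds of a DAY-unit feature derivation window.

    Assumes a rounding of 1 DAY, interpreted here as truncating the
    prediction point to midnight of the same day.
    """
    if start >= 0:
        raise ValueError("start must be negative")
    if end > 0 or end < start:
        raise ValueError("end must be non-positive and not earlier than start")
    # Round the prediction point into the past (here: back to midnight).
    rounded = prediction_point.replace(hour=0, minute=0, second=0, microsecond=0)
    # Offsets are counted in days relative to the rounded prediction point.
    return rounded + timedelta(days=start), rounded + timedelta(days=end)


begin, finish = day_window_bounds(datetime(2024, 3, 15, 13, 45), start=-14, end=-1)
print(begin, finish)  # 2024-03-01 00:00:00 2024-03-14 00:00:00
```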

Relationships configuration

class datarobot.models.RelationshipsConfiguration

A Relationships configuration specifies a set of secondary datasets as well as the relationships among them. It is used to configure Feature Discovery for a project to generate features automatically from these datasets.

Variables:
  • id (str) – ID of the created relationships configuration

  • dataset_definitions (list) – Each element is the dataset definition of one dataset.

  • relationships (list) – Each element is a relationship between two datasets.

  • feature_discovery_mode (str) – Mode of feature discovery. Supported values are ‘default’ and ‘manual’.

  • feature_discovery_settings (list) – List of feature discovery settings used to customize the feature discovery process

  Each element of dataset_definitions has the following fields:

  • identifier (str) – Alias of the dataset (used directly as part of the generated feature names)

  • catalog_id (str, or None) – Identifier of the catalog item

  • catalog_version_id (str) – Identifier of the catalog item version

  • primary_temporal_key (Optional[str]) – Name of the column indicating time of record creation

  • feature_list_id (Optional[str]) – Identifier of the feature list. This decides which columns in the dataset are used for feature generation

  • snapshot_policy (str) – Policy to use when creating a project or making predictions. Must be one of: ‘specified’ (use the specific snapshot given by catalogVersionId), ‘latest’ (use the latest snapshot of the same catalog item), or ‘dynamic’ (get data from the source; only applicable for JDBC datasets).

  • feature_lists (list) – List of feature list information

  • data_source (dict) – Data source information, if the dataset comes from a data source

  • data_sources (list) – List of data source details for JDBC datasets

  • is_deleted (Optional[bool]) – Whether the dataset is deleted

  The data_source structure has the following fields:

  • data_store_id (str) – ID of the data store.

  • data_store_name (str) – User-friendly name of the data store.

  • url (str) – URL used to connect to the data store.

  • dbtable (str) – Name of the table in the data store.

  • schema – Schema definition of the table from the data store.

  • catalog (str) – Catalog name of the data source.

  Each element of feature_lists has the following fields:

  • id (str) – ID of the featurelist

  • name (str) – Name of the featurelist

  • features (List[str]) – Names of all the features in the featurelist

  • dataset_id (str) – ID of the dataset associated with the featurelist

  • creation_date (datetime.datetime) – When the featurelist was created

  • user_created (bool) – Whether the featurelist was created by a user or by DataRobot automation

  • created_by (str) – Name of the user who created it

  • description (str) – Description of the featurelist. Can be updated by the user and may be supplied by default for DataRobot-created featurelists.

  • dataset_version_id (str or None) – Version of the dataset associated with the featurelist. Only relevant for Informative Features.

  Each element of relationships has the following fields:

  • dataset1_identifier (str or None) – Identifier of the first dataset in this relationship. This is specified in the identifier field of dataset_definition structure. If None, then the relationship is with the primary dataset.

  • dataset2_identifier (str) – Identifier of the second dataset in this relationship. This is specified in the identifier field of dataset_definition schema.

  • dataset1_keys (List[str]) – Column(s) from the first dataset that are used to join to the second dataset (min length: 1, max length: 10)

  • dataset2_keys (List[str]) – Column(s) from the second dataset that are used to join to the first dataset (min length: 1, max length: 10)

  • time_unit (str, or None) – Time unit of the feature derivation window. Supported values are MILLISECOND, SECOND, MINUTE, HOUR, DAY, WEEK, MONTH, QUARTER, YEAR. If present, the feature engineering graph performs time-aware joins.

  • feature_derivation_window_start (int, or None) – How many time units of each dataset’s primary temporal key into the past, relative to the datetimePartitionColumn, the feature derivation window should begin. A negative integer when present. If present, the feature engineering graph performs time-aware joins.

  • feature_derivation_window_end (int, or None) – How many time units of each dataset’s primary temporal key into the past, relative to the datetimePartitionColumn, the feature derivation window should end. A non-positive integer when present. If present, the feature engineering graph performs time-aware joins.

  • feature_derivation_window_time_unit (str, or None) – Time unit of the feature derivation window. Supported values are MILLISECOND, SECOND, MINUTE, HOUR, DAY, WEEK, MONTH, QUARTER, YEAR. If present, time-aware joins are used. Only applicable when dataset1_identifier is not provided.

  • feature_derivation_windows (list of dict, or None) – List of feature derivation window settings. If present, time-aware joins are used. Only allowed when feature_derivation_window_start, feature_derivation_window_end, and feature_derivation_window_time_unit are not provided.

  • prediction_point_rounding (int, or None) – Closest value of prediction_point_rounding_time_unit to which the prediction point is rounded into the past when applying the feature derivation window. A positive integer when present. Only applicable when dataset1_identifier is not provided.

  • prediction_point_rounding_time_unit (str, or None) – Time unit of the prediction point rounding. Supported values are MILLISECOND, SECOND, MINUTE, HOUR, DAY, WEEK, MONTH, QUARTER, YEAR. Only applicable when dataset1_identifier is not provided.

  • Each element of feature_derivation_windows is a dictionary with the following schema:

    start: int

    How many time units of each dataset’s primary temporal key into the past, relative to the datetimePartitionColumn, the feature derivation window should begin.

    end: int

    How many time units of each dataset’s primary temporal key into the past, relative to the datetimePartitionColumn, the feature derivation window should end.

    unit: str

    Time unit of the feature derivation window. One of datarobot.enums.AllowedTimeUnitsSAFER.

  Each element of feature_discovery_settings has the following fields:

  • name (str) – Name of the feature discovery setting

  • value (bool) – Value of the feature discovery setting

  To see the list of possible settings, create a RelationshipsConfiguration without specifying settings and check its feature_discovery_settings attribute, which is a list of the possible settings with their default values.
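Because a dataset1_identifier of None stands for the primary dataset, the relationships list can be read as a set of join edges. The sketch below assumes each dataset definition and relationship is available as a plain dict keyed by the field names documented above (adapt it if your client version returns objects instead). describe_relationships is a hypothetical helper, not part of the datarobot package; it also flags any identifier that has no matching dataset definition.

```python
from typing import Dict, List, Optional


def describe_relationships(dataset_definitions: List[Dict], relationships: List[Dict]) -> List[str]:
    """Hypothetical helper: render each relationship as a readable join edge.

    Raises ValueError if a relationship names an identifier with no dataset definition.
    """
    known = {d["identifier"] for d in dataset_definitions}
    edges = []
    for rel in relationships:
        left: Optional[str] = rel.get("dataset1_identifier")
        right = rel["dataset2_identifier"]
        # Every non-primary identifier must match a dataset definition.
        for name in (left, right):
            if name is not None and name not in known:
                raise ValueError(f"no dataset definition with identifier {name!r}")
        left_label = left if left is not None else "<primary>"
        edges.append(f"{left_label}[{', '.join(rel['dataset1_keys'])}] -> "
                     f"{right}[{', '.join(rel['dataset2_keys'])}]")
    return edges


definitions = [{"identifier": "profile"}, {"identifier": "transaction"}]
rels = [
    {"dataset1_identifier": None, "dataset2_identifier": "profile",
     "dataset1_keys": ["CustomerID"], "dataset2_keys": ["CustomerID"]},
    {"dataset1_identifier": "profile", "dataset2_identifier": "transaction",
     "dataset1_keys": ["CustomerID"], "dataset2_keys": ["CustomerID"]},
]
for edge in describe_relationships(definitions, rels):
    print(edge)
# <primary>[CustomerID] -> profile[CustomerID]
# profile[CustomerID] -> transaction[CustomerID]
```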

classmethod create(dataset_definitions, relationships, feature_discovery_settings=None)

Create a Relationships Configuration

Parameters:
  • dataset_definitions (list of DatasetDefinition) – Each element is a datarobot.helpers.feature_discovery.DatasetDefinition

  • relationships (list of Relationship) – Each element is a datarobot.helpers.feature_discovery.Relationship

  • feature_discovery_settings (Optional[List[FeatureDiscoverySetting]]) – Each element is a dictionary or a datarobot.helpers.feature_discovery.FeatureDiscoverySetting. If not provided, default settings will be used.
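Since feature_discovery_settings elements can be plain {'name': ..., 'value': ...} dictionaries, one way to customize a few settings is to start from the defaults reported by an existing configuration and override selected values. apply_setting_overrides is a hypothetical helper, and the default values in the sample list are placeholders, not DataRobot's real defaults.

```python
from typing import Dict, List


def apply_setting_overrides(defaults: List[Dict], overrides: Dict[str, bool]) -> List[Dict]:
    """Hypothetical helper: copy a settings list, replacing the values named in overrides.

    defaults is assumed to be shaped like the feature_discovery_settings
    attribute (one {'name': ..., 'value': ...} dict per setting).
    """
    known = {s["name"] for s in defaults}
    unknown = set(overrides) - known
    if unknown:
        raise ValueError(f"unknown feature discovery settings: {sorted(unknown)}")
    return [{"name": s["name"], "value": overrides.get(s["name"], s["value"])}
            for s in defaults]


# Placeholder defaults for illustration only; read the real list from an
# existing configuration's feature_discovery_settings attribute.
defaults = [
    {"name": "enable_categorical_statistics", "value": False},
    {"name": "enable_numeric_skewness", "value": False},
]
settings = apply_setting_overrides(defaults, {"enable_numeric_skewness": True})
print(settings)
# [{'name': 'enable_categorical_statistics', 'value': False}, {'name': 'enable_numeric_skewness', 'value': True}]
```

The resulting list could then be passed as feature_discovery_settings to create() or replace(). Rejecting unknown names catches typos that might otherwise go unnoticed until the request is made.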

Returns:

relationships_configuration – Created relationships configuration

Return type:

RelationshipsConfiguration

Examples

import datarobot as dr

dataset_definition = dr.DatasetDefinition(
    identifier='profile',
    catalog_id='5fd06b4af24c641b68e4d88f',
    catalog_version_id='5fd06b4af24c641b68e4d88f'
)
relationship = dr.Relationship(
    dataset2_identifier='profile',
    dataset1_keys=['CustomerID'],
    dataset2_keys=['CustomerID'],
    feature_derivation_window_start=-14,
    feature_derivation_window_end=-1,
    feature_derivation_window_time_unit='DAY',
    prediction_point_rounding=1,
    prediction_point_rounding_time_unit='DAY'
)
dataset_definitions = [dataset_definition]
relationships = [relationship]
relationship_config = dr.RelationshipsConfiguration.create(
    dataset_definitions=dataset_definitions,
    relationships=relationships,
    feature_discovery_settings=[
        {'name': 'enable_categorical_statistics', 'value': True},
        {'name': 'enable_numeric_skewness', 'value': True},
    ]
)
>>> relationship_config.id
'5c88a37770fc42a2fcc62759'

get()

Retrieve the relationships configuration with the id of this instance.

Returns:

relationships_configuration – The requested relationships configuration

Return type:

RelationshipsConfiguration

Raises:

ClientError – Raised if an invalid relationships config id is provided.

Examples

relationships_config = dr.RelationshipsConfiguration(valid_config_id)
result = relationships_config.get()
>>> result.id
'5c88a37770fc42a2fcc62759'

replace(dataset_definitions, relationships, feature_discovery_settings=None)

Update a relationships configuration that is not yet used by a Feature Discovery project.

Parameters:
  • dataset_definitions (List[DatasetDefinition]) – Each element is a datarobot.helpers.feature_discovery.DatasetDefinition

  • relationships (List[Relationship]) – Each element is a datarobot.helpers.feature_discovery.Relationship

  • feature_discovery_settings (Optional[List[FeatureDiscoverySetting]]) – Each element is a dictionary or a datarobot.helpers.feature_discovery.FeatureDiscoverySetting. If not provided, default settings will be used.

Returns:

relationships_configuration – The updated relationships configuration

Return type:

RelationshipsConfiguration

delete()

Delete the Relationships configuration

Raises:

ClientError – Raised if an invalid relationships config id is provided.

Examples

# Deleting with a valid id
relationships_config = dr.RelationshipsConfiguration(valid_config_id)
status_code = relationships_config.delete()
>>> status_code
204
>>> relationships_config.get()
ClientError: Relationships Configuration not found