Relationship
- class datarobot.helpers.feature_discovery.Relationship(dataset2_identifier, dataset1_keys, dataset2_keys, dataset1_identifier=None, feature_derivation_window_start=None, feature_derivation_window_end=None, feature_derivation_window_time_unit=None, feature_derivation_windows=None, prediction_point_rounding=None, prediction_point_rounding_time_unit=None)
Relationship between dataset defined in DatasetDefinition
Added in version v2.25.
Examples
import datarobot as dr relationship = dr.Relationship( dataset1_identifier='profile', dataset2_identifier='transaction', dataset1_keys=['CustomerID'], dataset2_keys=['CustomerID'] ) relationship = dr.Relationship( dataset2_identifier='profile', dataset1_keys=['CustomerID'], dataset2_keys=['CustomerID'], feature_derivation_window_start=-14, feature_derivation_window_end=-1, feature_derivation_window_time_unit='DAY', prediction_point_rounding=1, prediction_point_rounding_time_unit='DAY' )
- Attributes:
- dataset1_identifier: string, optional
Identifier of the first dataset in this relationship. This is specified in the identifier field of dataset_definition structure. If None, then the relationship is with the primary dataset.
- dataset2_identifier: string
Identifier of the second dataset in this relationship. This is specified in the identifier field of dataset_definition schema.
- dataset1_keys: list of string (max length: 10 min length: 1)
Column(s) from the first dataset which are used to join to the second dataset
- dataset2_keys: list of string (max length: 10 min length: 1)
Column(s) from the second dataset that are used to join to the first dataset
- feature_derivation_window_start: int, or None
How many time_units of each dataset’s primary temporal key into the past relative to the datetimePartitionColumn the feature derivation window should begin. Will be a negative integer, If present, the feature engineering Graph will perform time-aware joins.
- feature_derivation_window_end: int, optional
How many timeUnits of each dataset’s record primary temporal key into the past relative to the datetimePartitionColumn the feature derivation window should end. Will be a non-positive integer, if present. If present, the feature engineering Graph will perform time-aware joins.
- feature_derivation_window_time_unit: int, optional
Time unit of the feature derivation window. One of
datarobot.enums.AllowedTimeUnitsSAFER
If present, time-aware joins will be used. Only applicable when dataset1_identifier is not provided.- feature_derivation_windows: list of dict, or None
List of feature derivation windows settings. If present, time-aware joins will be used. Only allowed when feature_derivation_window_start, feature_derivation_window_end and feature_derivation_window_time_unit are not provided.
- prediction_point_rounding: int, optional
Closest value of prediction_point_rounding_time_unit to round the prediction point into the past when applying the feature derivation window. Will be a positive integer, if present.Only applicable when dataset1_identifier is not provided.
- prediction_point_rounding_time_unit: string, optional
Time unit of the prediction point rounding. One of
datarobot.enums.AllowedTimeUnitsSAFER
Only applicable when dataset1_identifier is not provided.- The `feature_derivation_windows` is a list of dictionary with schema:
- start: int
How many time_units of each dataset’s primary temporal key into the past relative to the datetimePartitionColumn the feature derivation window should begin.
- end: int
How many timeUnits of each dataset’s record primary temporal key into the past relative to the datetimePartitionColumn the feature derivation window should end.
- unit: string
Time unit of the feature derivation window. One of
datarobot.enums.AllowedTimeUnitsSAFER
.
Relationships Configuration
- class datarobot.models.RelationshipsConfiguration(id, dataset_definitions=None, relationships=None, feature_discovery_mode=None, feature_discovery_settings=None)
A Relationships configuration specifies a set of secondary datasets as well as the relationships among them. It is used to configure Feature Discovery for a project to generate features automatically from these datasets.
- Attributes:
- idstring
Id of the created relationships configuration
- dataset_definitions: list
Each element is a dataset_definitions for a dataset.
- relationships: list
Each element is a relationship between two datasets
- feature_discovery_mode: str
Mode of feature discovery. Supported values are ‘default’ and ‘manual’
- feature_discovery_settings: list
List of feature discovery settings used to customize the feature discovery process
- The `dataset_definitions` structure is
- identifier: string
Alias of the dataset (used directly as part of the generated feature names)
- catalog_id: str, or None
Identifier of the catalog item
- catalog_version_id: str
Identifier of the catalog item version
- primary_temporal_key: string, optional
Name of the column indicating time of record creation
- feature_list_id: string, optional
Identifier of the feature list. This decides which columns in the dataset are used for feature generation
- snapshot_policy: str
Policy to use when creating a project or making predictions. Must be one of the following values: ‘specified’: Use specific snapshot specified by catalogVersionId ‘latest’: Use latest snapshot from the same catalog item ‘dynamic’: Get data from the source (only applicable for JDBC datasets)
- feature_lists: list
List of feature list info
- data_source: dict
Data source info if the dataset is from data source
- data_sources: list
List of Data source details for a JDBC datasets
- is_deleted: bool, optional
Whether the dataset is deleted or not
- The `data source info` structured is
- data_store_id: str
Id of the data store.
- data_store_namestr
User-friendly name of the data store.
- urlstr
Url used to connect to the data store.
- dbtablestr
Name of table from the data store.
- schema: str
Schema definition of the table from the data store
- catalog: str
Catalog name of the data source.
- The `feature list info` structure is
- idstr
Id of the featurelist
- namestr
Name of the featurelist
- featureslist of str
Names of all the Features in the featurelist
- dataset_idstr
Project the featurelist belongs to
- creation_datedatetime.datetime
When the featurelist was created
- user_createdbool
Whether the featurelist was created by a user or by DataRobot automation
- created_by: str
Name of user who created it
- descriptionstr
Description of the featurelist. Can be updated by the user and may be supplied by default for DataRobot-created featurelists.
- dataset_id: str
Dataset which is associated with the feature list
- dataset_version_id: str or None
Version of the dataset which is associated with feature list. Only relevant for Informative features
- The `relationships` schema is
- dataset1_identifier: str or None
Identifier of the first dataset in this relationship. This is specified in the identifier field of dataset_definition structure. If None, then the relationship is with the primary dataset.
- dataset2_identifier: str
Identifier of the second dataset in this relationship. This is specified in the identifier field of dataset_definition schema.
- dataset1_keys: list of str (max length: 10 min length: 1)
Column(s) from the first dataset which are used to join to the second dataset
- dataset2_keys: list of str (max length: 10 min length: 1)
Column(s) from the second dataset that are used to join to the first dataset
- time_unit: str, or None
Time unit of the feature derivation window. Supported values are MILLISECOND, SECOND, MINUTE, HOUR, DAY, WEEK, MONTH, QUARTER, YEAR. If present, the feature engineering Graph will perform time-aware joins.
- feature_derivation_window_start: int, or None
How many time_units of each dataset’s primary temporal key into the past relative to the datetimePartitionColumn the feature derivation window should begin. Will be a negative integer, If present, the feature engineering Graph will perform time-aware joins.
- feature_derivation_window_end: int, or None
How many timeUnits of each dataset’s record primary temporal key into the past relative to the datetimePartitionColumn the feature derivation window should end. Will be a non-positive integer, if present. If present, the feature engineering Graph will perform time-aware joins.
- feature_derivation_window_time_unit: int or None
Time unit of the feature derivation window. Supported values are MILLISECOND, SECOND, MINUTE, HOUR, DAY, WEEK, MONTH, QUARTER, YEAR If present, time-aware joins will be used. Only applicable when dataset1Identifier is not provided.
- feature_derivation_windows: list of dict, or None
List of feature derivation windows settings. If present, time-aware joins will be used. Only allowed when feature_derivation_window_start, feature_derivation_window_end and feature_derivation_window_time_unit are not provided.
- prediction_point_rounding: int, or None
Closest value of prediction_point_rounding_time_unit to round the prediction point into the past when applying the feature derivation window. Will be a positive integer, if present.Only applicable when dataset1_identifier is not provided.
- prediction_point_rounding_time_unit: str, or None
time unit of the prediction point rounding. Supported values are MILLISECOND, SECOND, MINUTE, HOUR, DAY, WEEK, MONTH, QUARTER, YEAR Only applicable when dataset1_identifier is not provided.
- The `feature_derivation_windows` is a list of dictionary with schema:
- start: int
How many time_units of each dataset’s primary temporal key into the past relative to the datetimePartitionColumn the feature derivation window should begin.
- end: int
How many timeUnits of each dataset’s record primary temporal key into the past relative to the datetimePartitionColumn the feature derivation window should end.
- unit: string
Time unit of the feature derivation window. One of
datarobot.enums.AllowedTimeUnitsSAFER
.
- The `feature_discovery_settings` structure is:
- name: str
Name of the feature discovery setting
- value: bool
Value of the feature discovery setting
- To see the list of possible settings, create a RelationshipConfiguration without specifying
- settings and check its `feature_discovery_settings` attribute, which is a list of possible
- settings with their default values.
- classmethod create(dataset_definitions, relationships, feature_discovery_settings=None)
Create a Relationships Configuration
- Parameters:
- dataset_definitions: list of dataset definitions
Each element is a
datarobot.helpers.feature_discovery.DatasetDefinition
- relationships: list of relationships
Each element is a
datarobot.helpers.feature_discovery.Relationship
- feature_discovery_settingslist of feature discovery settings, optional
Each element is a dictionary or a
datarobot.helpers.feature_discovery.FeatureDiscoverySetting
. If not provided, default settings will be used.
- Returns:
- relationships_configuration: RelationshipsConfiguration
Created relationships configuration
Examples
dataset_definition = dr.DatasetDefinition( identifier='profile', catalog_id='5fd06b4af24c641b68e4d88f', catalog_version_id='5fd06b4af24c641b68e4d88f' ) relationship = dr.Relationship( dataset2_identifier='profile', dataset1_keys=['CustomerID'], dataset2_keys=['CustomerID'], feature_derivation_window_start=-14, feature_derivation_window_end=-1, feature_derivation_window_time_unit='DAY', prediction_point_rounding=1, prediction_point_rounding_time_unit='DAY' ) dataset_definitions = [dataset_definition] relationships = [relationship] relationship_config = dr.RelationshipsConfiguration.create( dataset_definitions=dataset_definitions, relationships=relationships, feature_discovery_settings = [ {'name': 'enable_categorical_statistics', 'value': True}, {'name': 'enable_numeric_skewness', 'value': True}, ] ) >>> relationship_config.id '5c88a37770fc42a2fcc62759'
- get()
Retrieve the Relationships configuration for a given id
- Returns:
- relationships_configuration: RelationshipsConfiguration
The requested relationships configuration
- Raises:
- ClientError
Raised if an invalid relationships config id is provided.
Examples
relationships_config = dr.RelationshipsConfiguration(valid_config_id) result = relationships_config.get() >>> result.id '5c88a37770fc42a2fcc62759'
- replace(dataset_definitions, relationships, feature_discovery_settings=None)
Update the Relationships Configuration which is not used in the feature discovery Project
- Parameters:
- dataset_definitions: list of dataset definition
Each element is a
datarobot.helpers.feature_discovery.DatasetDefinition
- relationships: list of relationships
Each element is a
datarobot.helpers.feature_discovery.Relationship
- feature_discovery_settingslist of feature discovery settings, optional
Each element is a dictionary or a
datarobot.helpers.feature_discovery.FeatureDiscoverySetting
. If not provided, default settings will be used.
- Returns:
- relationships_configuration: RelationshipsConfiguration
the updated relationships configuration
- delete()
Delete the Relationships configuration
- Raises:
- ClientError
Raised if an invalid relationships config id is provided.
Examples
# Deleting with a valid id relationships_config = dr.RelationshipsConfiguration(valid_config_id) status_code = relationships_config.delete() status_code >>> 204 relationships_config.get() >>> ClientError: Relationships Configuration not found