Relationship
- class datarobot.helpers.feature_discovery.Relationship
Relationship between datasets defined in DatasetDefinition.
Added in version v2.25.
- Variables:
dataset1_identifier (Optional[str]) – Identifier of the first dataset in this relationship. This is specified in the identifier field of the dataset_definition structure. If None, the relationship is with the primary dataset.
dataset2_identifier (str) – Identifier of the second dataset in this relationship. This is specified in the identifier field of the dataset_definition schema.
dataset1_keys (List[str]) – (max length: 10, min length: 1) Column(s) from the first dataset that are used to join to the second dataset.
dataset2_keys (List[str]) – (max length: 10, min length: 1) Column(s) from the second dataset that are used to join to the first dataset.
feature_derivation_window_start (int, or None) – How many time units of each dataset’s primary temporal key into the past, relative to the datetimePartitionColumn, the feature derivation window should begin. Will be a negative integer, if present. If present, the feature engineering graph will perform time-aware joins.
feature_derivation_window_end (Optional[int]) – How many time units of each dataset’s primary temporal key into the past, relative to the datetimePartitionColumn, the feature derivation window should end. Will be a non-positive integer, if present. If present, the feature engineering graph will perform time-aware joins.
feature_derivation_window_time_unit (Optional[str]) – Time unit of the feature derivation window. One of datarobot.enums.AllowedTimeUnitsSAFER. If present, time-aware joins will be used. Only applicable when dataset1_identifier is not provided.
feature_derivation_windows (list of dict, or None) – List of feature derivation window settings. If present, time-aware joins will be used. Only allowed when feature_derivation_window_start, feature_derivation_window_end and feature_derivation_window_time_unit are not provided.
prediction_point_rounding (Optional[int]) – Closest value of prediction_point_rounding_time_unit to round the prediction point into the past when applying the feature derivation window. Will be a positive integer, if present. Only applicable when dataset1_identifier is not provided.
prediction_point_rounding_time_unit (Optional[str]) – Time unit of the prediction point rounding. One of datarobot.enums.AllowedTimeUnitsSAFER. Only applicable when dataset1_identifier is not provided.
The feature_derivation_windows is a list of dictionaries with the following schema:
- start: int
How many time units of each dataset’s primary temporal key into the past, relative to the datetimePartitionColumn, the feature derivation window should begin.
- end: int
How many time units of each dataset’s primary temporal key into the past, relative to the datetimePartitionColumn, the feature derivation window should end.
- unit: str
Time unit of the feature derivation window. One of datarobot.enums.AllowedTimeUnitsSAFER.
Examples
import datarobot as dr

relationship = dr.Relationship(
    dataset1_identifier='profile',
    dataset2_identifier='transaction',
    dataset1_keys=['CustomerID'],
    dataset2_keys=['CustomerID']
)

relationship = dr.Relationship(
    dataset2_identifier='profile',
    dataset1_keys=['CustomerID'],
    dataset2_keys=['CustomerID'],
    feature_derivation_window_start=-14,
    feature_derivation_window_end=-1,
    feature_derivation_window_time_unit='DAY',
    prediction_point_rounding=1,
    prediction_point_rounding_time_unit='DAY'
)
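The constraints listed for the Relationship variables can be checked locally before building the object. The following is a minimal sketch, assuming only the rules stated above; `relationship_problems` is a hypothetical helper written for illustration, not part of the datarobot package and not the validation DataRobot itself performs.

```python
def relationship_problems(
    dataset2_identifier,
    dataset1_keys,
    dataset2_keys,
    dataset1_identifier=None,
    feature_derivation_window_start=None,
    feature_derivation_window_end=None,
    feature_derivation_window_time_unit=None,
    feature_derivation_windows=None,
    prediction_point_rounding=None,
    prediction_point_rounding_time_unit=None,
):
    """Return a list of problems found; an empty list means none were found.

    Hypothetical helper: it mirrors the documented constraints only.
    """
    problems = []

    # Join keys: between 1 and 10 columns on each side.
    for name, keys in (("dataset1_keys", dataset1_keys),
                       ("dataset2_keys", dataset2_keys)):
        if not 1 <= len(keys) <= 10:
            problems.append(f"{name} must list between 1 and 10 columns")

    # Sign constraints on the window and rounding fields.
    if feature_derivation_window_start is not None and feature_derivation_window_start >= 0:
        problems.append("feature_derivation_window_start must be negative")
    if feature_derivation_window_end is not None and feature_derivation_window_end > 0:
        problems.append("feature_derivation_window_end must be non-positive")
    if prediction_point_rounding is not None and prediction_point_rounding <= 0:
        problems.append("prediction_point_rounding must be positive")

    # feature_derivation_windows excludes the three single-window fields.
    single_window = (
        feature_derivation_window_start,
        feature_derivation_window_end,
        feature_derivation_window_time_unit,
    )
    if feature_derivation_windows is not None and any(v is not None for v in single_window):
        problems.append(
            "feature_derivation_windows cannot be combined with the "
            "feature_derivation_window_start/end/time_unit fields"
        )

    # Fields documented as applicable only when dataset1_identifier is not provided.
    primary_only = {
        "feature_derivation_window_time_unit": feature_derivation_window_time_unit,
        "prediction_point_rounding": prediction_point_rounding,
        "prediction_point_rounding_time_unit": prediction_point_rounding_time_unit,
    }
    if dataset1_identifier is not None:
        for name, value in primary_only.items():
            if value is not None:
                problems.append(f"{name} only applies when dataset1_identifier is not provided")

    return problems


# The time-aware relationship from the example passes every check.
ok = relationship_problems(
    dataset2_identifier="profile",
    dataset1_keys=["CustomerID"],
    dataset2_keys=["CustomerID"],
    feature_derivation_window_start=-14,
    feature_derivation_window_end=-1,
    feature_derivation_window_time_unit="DAY",
    prediction_point_rounding=1,
    prediction_point_rounding_time_unit="DAY",
)

# An empty key list and a positive window start are both reported.
bad = relationship_problems(
    dataset2_identifier="profile",
    dataset1_keys=[],
    dataset2_keys=["CustomerID"],
    feature_derivation_window_start=7,
)
print(ok)
print(bad)
```

Returning a list of messages rather than raising on the first failure makes it possible to report every violated constraint at once.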
Relationships configuration
- class datarobot.models.RelationshipsConfiguration
A Relationships configuration specifies a set of secondary datasets as well as the relationships among them. It is used to configure Feature Discovery for a project to generate features automatically from these datasets.
- Variables:
id (str) – Id of the created relationships configuration.
dataset_definitions (list) – Each element is a dataset definition for a dataset.
relationships (list) – Each element is a relationship between two datasets.
feature_discovery_mode (str) – Mode of feature discovery. Supported values are ‘default’ and ‘manual’.
feature_discovery_settings (list) – List of feature discovery settings used to customize the feature discovery process.
Each element of dataset_definitions has the following structure:
identifier (str) – Alias of the dataset (used directly as part of the generated feature names).
catalog_id (str, or None) – Identifier of the catalog item.
catalog_version_id (str) – Identifier of the catalog item version.
primary_temporal_key (Optional[str]) – Name of the column indicating time of record creation.
feature_list_id (Optional[str]) – Identifier of the feature list. This decides which columns in the dataset are used for feature generation.
snapshot_policy (str) – Policy to use when creating a project or making predictions. Must be one of the following values:
- ‘specified’: Use the specific snapshot specified by catalogVersionId
- ‘latest’: Use the latest snapshot from the same catalog item
- ‘dynamic’: Get data from the source (only applicable for JDBC datasets)
feature_lists (list) – List of feature list info.
data_source (dict) – Data source info if the dataset is from a data source.
data_sources (list) – List of data source details for JDBC datasets.
is_deleted (Optional[bool]) – Whether the dataset is deleted or not.
The data source info structure is:
data_store_id (str) – Id of the data store.
data_store_name (str) – User-friendly name of the data store.
url (str) – Url used to connect to the data store.
dbtable (str) – Name of table from the data store.
schema – Schema definition of the table from the data store.
catalog (str) – Catalog name of the data source.
The feature list info structure is:
id – Id of the featurelist.
name (str) – Name of the featurelist.
features (List[str]) – Names of all the features in the featurelist.
dataset_id (str) – Project the featurelist belongs to.
creation_date (datetime.datetime) – When the featurelist was created.
user_created (bool) – Whether the featurelist was created by a user or by DataRobot automation.
created_by (str) – Name of the user who created it.
description (str) – Description of the featurelist. Can be updated by the user and may be supplied by default for DataRobot-created featurelists.
dataset_id – Dataset which is associated with the feature list.
dataset_version_id (str or None) – Version of the dataset which is associated with the feature list. Only relevant for Informative features.
Each element of relationships has the following structure:
dataset1_identifier (str or None) – Identifier of the first dataset in this relationship. This is specified in the identifier field of the dataset_definition structure. If None, the relationship is with the primary dataset.
dataset2_identifier (str) – Identifier of the second dataset in this relationship. This is specified in the identifier field of the dataset_definition schema.
dataset1_keys (List[str]) – (max length: 10, min length: 1) Column(s) from the first dataset that are used to join to the second dataset.
dataset2_keys (List[str]) – (max length: 10, min length: 1) Column(s) from the second dataset that are used to join to the first dataset.
time_unit (str, or None) – Time unit of the feature derivation window. Supported values are MILLISECOND, SECOND, MINUTE, HOUR, DAY, WEEK, MONTH, QUARTER, YEAR. If present, the feature engineering graph will perform time-aware joins.
feature_derivation_window_start (int, or None) – How many time units of each dataset’s primary temporal key into the past, relative to the datetimePartitionColumn, the feature derivation window should begin. Will be a negative integer, if present. If present, the feature engineering graph will perform time-aware joins.
feature_derivation_window_end (int or None) – How many time units of each dataset’s primary temporal key into the past, relative to the datetimePartitionColumn, the feature derivation window should end. Will be a non-positive integer, if present. If present, the feature engineering graph will perform time-aware joins.
feature_derivation_window_time_unit (str or None) – Time unit of the feature derivation window. Supported values are MILLISECOND, SECOND, MINUTE, HOUR, DAY, WEEK, MONTH, QUARTER, YEAR. If present, time-aware joins will be used. Only applicable when dataset1_identifier is not provided.
feature_derivation_windows (list of dict, or None) – List of feature derivation window settings. If present, time-aware joins will be used. Only allowed when feature_derivation_window_start, feature_derivation_window_end and feature_derivation_window_time_unit are not provided.
prediction_point_rounding (int, or None) – Closest value of prediction_point_rounding_time_unit to round the prediction point into the past when applying the feature derivation window. Will be a positive integer, if present. Only applicable when dataset1_identifier is not provided.
prediction_point_rounding_time_unit (str, or None) – Time unit of the prediction point rounding. Supported values are MILLISECOND, SECOND, MINUTE, HOUR, DAY, WEEK, MONTH, QUARTER, YEAR. Only applicable when dataset1_identifier is not provided.
Each element of feature_derivation_windows is a dictionary with the following schema:
- start: int
How many time_units of each dataset’s primary temporal key into the past relative to the datetimePartitionColumn the feature derivation window should begin.
- end: int
How many timeUnits of each dataset’s record primary temporal key into the past relative to the datetimePartitionColumn the feature derivation window should end.
- unit: str
Time unit of the feature derivation window. One of datarobot.enums.AllowedTimeUnitsSAFER.
The feature_discovery_settings structure is:
name – Name of the feature discovery setting.
value (bool) – Value of the feature discovery setting.
To see the list of possible settings, create a RelationshipsConfiguration without specifying settings and check its feature_discovery_settings attribute, which is a list of possible settings with their default values.
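Once the default settings have been read back from a configuration, a common next step is to flip a few of them and pass the result to create or replace. The sketch below assumes each setting is a plain dictionary with the name and value keys described above; `override_settings` is a hypothetical helper, and the default values shown are made up for illustration.

```python
def override_settings(default_settings, overrides):
    """Return a new settings list with selected values replaced.

    Hypothetical helper. `default_settings` is a list of
    {"name": ..., "value": ...} dictionaries; `overrides` maps a
    setting name to its new value.
    """
    known = {setting["name"] for setting in default_settings}
    unknown = sorted(set(overrides) - known)
    if unknown:
        # Refuse names that do not appear in the defaults.
        raise KeyError(f"unknown feature discovery settings: {unknown}")
    return [
        {"name": s["name"], "value": overrides.get(s["name"], s["value"])}
        for s in default_settings
    ]


# Made-up defaults; real ones would come from the
# feature_discovery_settings attribute of a created configuration.
defaults = [
    {"name": "enable_categorical_statistics", "value": False},
    {"name": "enable_numeric_skewness", "value": False},
]

settings = override_settings(defaults, {"enable_numeric_skewness": True})
print(settings)
```

Rejecting unknown names locally catches typos before any request is sent; whether the server would also reject them is not stated in this reference.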
- classmethod create(dataset_definitions, relationships, feature_discovery_settings=None)
Create a Relationships Configuration
- Parameters:
dataset_definitions (list of DatasetDefinition) – Each element is a datarobot.helpers.feature_discovery.DatasetDefinition.
relationships (list of Relationship) – Each element is a datarobot.helpers.feature_discovery.Relationship.
feature_discovery_settings (Optional[List[FeatureDiscoverySetting]]) – Each element is a dictionary or a datarobot.helpers.feature_discovery.FeatureDiscoverySetting. If not provided, default settings will be used.
- Returns:
relationships_configuration – Created relationships configuration
- Return type:
Examples
import datarobot as dr

dataset_definition = dr.DatasetDefinition(
    identifier='profile',
    catalog_id='5fd06b4af24c641b68e4d88f',
    catalog_version_id='5fd06b4af24c641b68e4d88f'
)
relationship = dr.Relationship(
    dataset2_identifier='profile',
    dataset1_keys=['CustomerID'],
    dataset2_keys=['CustomerID'],
    feature_derivation_window_start=-14,
    feature_derivation_window_end=-1,
    feature_derivation_window_time_unit='DAY',
    prediction_point_rounding=1,
    prediction_point_rounding_time_unit='DAY'
)
dataset_definitions = [dataset_definition]
relationships = [relationship]
relationship_config = dr.RelationshipsConfiguration.create(
    dataset_definitions=dataset_definitions,
    relationships=relationships,
    feature_discovery_settings=[
        {'name': 'enable_categorical_statistics', 'value': True},
        {'name': 'enable_numeric_skewness', 'value': True},
    ]
)
>>> relationship_config.id
'5c88a37770fc42a2fcc62759'
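The relationship in the example above uses a window of -14 to -1 with a DAY unit and rounds the prediction point to 1 DAY. The arithmetic this implies can be sketched with the standard library. This is an illustration of one reading of the parameter descriptions, not DataRobot's implementation: `derivation_window` is a hypothetical function, only fixed-length units are handled, and whether the boundaries are inclusive is not stated in this reference.

```python
from datetime import datetime, timedelta

# Fixed-length units only; MONTH, QUARTER and YEAR would need calendar arithmetic.
_UNIT = {
    "MILLISECOND": timedelta(milliseconds=1),
    "SECOND": timedelta(seconds=1),
    "MINUTE": timedelta(minutes=1),
    "HOUR": timedelta(hours=1),
    "DAY": timedelta(days=1),
    "WEEK": timedelta(weeks=1),
}


def derivation_window(prediction_point, start, end, unit,
                      rounding=None, rounding_unit=None):
    """Return (window_begin, window_end) as datetimes.

    Hypothetical sketch: `start` and `end` are offsets in `unit` relative
    to the (optionally rounded) prediction point.
    """
    if start >= 0:
        raise ValueError("start must be a negative integer")
    if end > 0:
        raise ValueError("end must be a non-positive integer")

    if rounding is not None:
        if rounding <= 0:
            raise ValueError("rounding must be a positive integer")
        # Round into the past: drop the remainder past the last multiple
        # of the step (datetime.min is midnight on a Monday).
        step = rounding * _UNIT[rounding_unit]
        prediction_point -= (prediction_point - datetime.min) % step

    delta = _UNIT[unit]
    return prediction_point + start * delta, prediction_point + end * delta


window = derivation_window(
    datetime(2024, 3, 15, 10, 30), -14, -1, "DAY",
    rounding=1, rounding_unit="DAY",
)
print(window)
# (datetime.datetime(2024, 3, 1, 0, 0), datetime.datetime(2024, 3, 14, 0, 0))
```

With rounding, the 10:30 prediction point is first moved back to midnight, so both window boundaries fall on day boundaries.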
- get()
Retrieve the Relationships configuration for a given id
- Returns:
relationships_configuration – The requested relationships configuration
- Return type:
- Raises:
ClientError – Raised if an invalid relationships config id is provided.
Examples
relationships_config = dr.RelationshipsConfiguration(valid_config_id)
result = relationships_config.get()
>>> result.id
'5c88a37770fc42a2fcc62759'
- replace(dataset_definitions, relationships, feature_discovery_settings=None)
Update a Relationships Configuration that is not already used in a Feature Discovery project
- Parameters:
dataset_definitions (List[DatasetDefinition]) – Each element is a datarobot.helpers.feature_discovery.DatasetDefinition.
relationships (List[Relationship]) – Each element is a datarobot.helpers.feature_discovery.Relationship.
feature_discovery_settings (Optional[List[FeatureDiscoverySetting]]) – Each element is a dictionary or a datarobot.helpers.feature_discovery.FeatureDiscoverySetting. If not provided, default settings will be used.
- Returns:
relationships_configuration – the updated relationships configuration
- Return type:
- delete()
Delete the Relationships configuration
- Raises:
ClientError – Raised if an invalid relationships config id is provided.
Examples
# Deleting with a valid id
relationships_config = dr.RelationshipsConfiguration(valid_config_id)
status_code = relationships_config.delete()
status_code
>>> 204
relationships_config.get()
>>> ClientError: Relationships Configuration not found