Feature lineage
- class datarobot.models.FeatureLineage
Lineage of an automatically engineered feature.
- Variables:
steps (list) – list of steps which were applied to build the feature.
steps structure is:
- id: (int)
step id starting with 0.
- step_type: (str)
one of data/action/join/generatedData.
- name: (str)
name of the step.
- description: (str)
description of the step.
- parents: (list[int])
ids of the other steps this step references.
- is_time_aware: (bool)
indicates whether the step is time aware. Mandatory only for action and join steps. An action step provides additional information about the feature derivation window in the time_info field.
- catalog_id: (str)
id of the catalog for a data step.
- catalog_version_id: (str)
id of the catalog version for a data step.
- group_by: (list[str])
list of columns which this action step aggregated by.
- columns: (list)
names of the columns involved in the feature generation. Available only for data steps.
- time_info: (dict)
description of the feature derivation window which was applied to this action step.
- join_info: (list[dict])
join step details.
columns structure is:
- data_type: (str)
the type of the feature, e.g. ‘Categorical’, ‘Text’
- is_input: (bool)
indicates features which provided data to transform in this lineage.
- name: (str)
feature name.
- is_cutoff: (bool)
indicates a cutoff column.
time_info structure is:
- latest: (dict)
end of the feature derivation window applied.
- duration: (dict)
size of the feature derivation window applied.
latest and duration structure is:
- time_unit: (str)
time unit name like ‘MINUTE’, ‘DAY’, ‘MONTH’ etc.
- duration: (int)
value/size of this duration object.
join_info structure is:
- join_type - (str)
kind of join, left/right.
- left_table - (dict)
information about a dataset which was considered as left.
- right_table - (dict)
information about a dataset which was considered as right.
left_table and right_table structure is:
- columns - (list[str])
list of columns the datasets were joined by.
- datasteps - (list[int])
list of ids of the data steps that brought the columns into the current step dataset.
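Because each step refers to other steps through parents, the steps list can be read as a small graph. The sketch below shows one way to walk it. It assumes each step is a plain dict with the keys described above; the helper names (ancestor_step_ids, source_catalog_ids) and the sample data are hypothetical, made up for illustration, and are not part of the datarobot package.

```python
from typing import Any, Dict, List

Step = Dict[str, Any]


def ancestor_step_ids(steps: List[Step], step_id: int) -> List[int]:
    """Return the sorted ids of every step reachable from step_id via `parents`."""
    by_id = {step["id"]: step for step in steps}
    seen = set()
    stack = list(by_id[step_id].get("parents", []))
    while stack:
        current = stack.pop()
        if current in seen:
            continue  # a step shared by several children is visited once
        seen.add(current)
        stack.extend(by_id[current].get("parents", []))
    return sorted(seen)


def source_catalog_ids(steps: List[Step], step_id: int) -> List[str]:
    """Return catalog_id of each data step among the ancestors of step_id."""
    by_id = {step["id"]: step for step in steps}
    return [
        by_id[i]["catalog_id"]
        for i in ancestor_step_ids(steps, step_id)
        if by_id[i]["step_type"] == "data"
    ]


# Hypothetical sample data shaped like the structure described above.
example_steps: List[Step] = [
    {"id": 0, "step_type": "generatedData", "name": "feature", "parents": [1]},
    {
        "id": 1,
        "step_type": "action",
        "name": "aggregate",
        "parents": [2],
        "is_time_aware": True,
        "group_by": ["customer_id"],
        "time_info": {
            "latest": {"time_unit": "DAY", "duration": 0},
            "duration": {"time_unit": "DAY", "duration": 30},
        },
    },
    {"id": 2, "step_type": "join", "name": "join", "parents": [3, 4], "is_time_aware": False},
    {"id": 3, "step_type": "data", "name": "primary", "parents": [], "catalog_id": "catalog-a"},
    {"id": 4, "step_type": "data", "name": "secondary", "parents": [], "catalog_id": "catalog-b"},
]

all_ancestors = ancestor_step_ids(example_steps, 0)
join_inputs = ancestor_step_ids(example_steps, 2)
catalogs = source_catalog_ids(example_steps, 0)
```

With a real lineage, the same helpers could presumably be applied to the steps attribute of a retrieved FeatureLineage, provided the steps are exposed as dicts with these keys.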
- classmethod get(project_id, id)
Retrieve a single FeatureLineage.
- Parameters:
project_id (str) – The id of the project the feature belongs to
id (str) – id of a feature lineage to retrieve
- Returns:
lineage – The queried instance
- Return type: