Feature lineage

class datarobot.models.FeatureLineage

Lineage of an automatically engineered feature.

Variables:

steps (list) –

list of steps which were applied to build the feature.

steps structure is:

id: (int)

step id starting with 0.

step_type: (str)

one of data, action, join, or generatedData.

name: (str)

name of the step.

description: (str)

description of the step.

parents: (list[int])

references to other steps, by step id.

is_time_aware: (bool)

indicates whether the step is time aware. Mandatory only for action and join steps. An action step provides additional information about the feature derivation window in the time_info field.

catalog_id: (str)

id of the catalog for a data step.

catalog_version_id: (str)

id of the catalog version for a data step.

group_by: (list[str])

list of columns this action step aggregated by.

columns: (list)

names of columns involved in the feature generation. Available only for data steps.

time_info: (dict)

description of the feature derivation window which was applied to this action step.

join_info: (list[dict])

join step details.

columns structure is:

data_type: (str)

the type of the feature, e.g. ‘Categorical’, ‘Text’

is_input: (bool)

indicates whether the feature provided data to transform in this lineage.

name: (str)

feature name.

is_cutoff: (bool)

indicates a cutoff column.
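
As a rough illustration of the columns structure, the sketch below collects the input columns of every data step. It assumes each step and each column is a plain dict keyed by the field names listed above; the helper name input_columns and the sample data are hypothetical and not part of the client.

```python
def input_columns(steps):
    """Return (step name, column name, data type) for every input column."""
    found = []
    for step in steps:
        # Per the description above, only data steps carry a columns list.
        if step.get("step_type") != "data":
            continue
        for column in step.get("columns") or []:
            if column.get("is_input"):
                found.append((step["name"], column["name"], column["data_type"]))
    return found


# Hypothetical sample data shaped like the documented structure.
sample_steps = [
    {
        "id": 0,
        "step_type": "data",
        "name": "transactions",
        "columns": [
            {"name": "amount", "data_type": "Numeric", "is_input": True, "is_cutoff": False},
            {"name": "date", "data_type": "Date", "is_input": False, "is_cutoff": True},
        ],
    },
    {"id": 1, "step_type": "action", "name": "Sum", "parents": [0]},
]

print(input_columns(sample_steps))
```

The action step is skipped because it has no columns entry, and the cutoff column is skipped because its is_input flag is False.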

time_info structure is:

latest: (dict)

end of the feature derivation window applied.

duration: (dict)

size of the feature derivation window applied.

latest and duration structure is:

time_unit: (str)

time unit name, e.g. ‘MINUTE’, ‘DAY’, ‘MONTH’.

duration: (int)

value/size of this duration object.
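
To make the nesting of time_info concrete, here is a minimal sketch that renders a feature derivation window as a short string. It assumes time_info is a plain dict with the latest and duration entries described above; describe_window and the sample values are hypothetical.

```python
def describe_window(time_info):
    """Format the size and end of a feature derivation window."""

    def fmt(part):
        # Both "latest" and "duration" share the same two fields.
        return f"{part['duration']} {part['time_unit']}"

    return f"size={fmt(time_info['duration'])}, end={fmt(time_info['latest'])}"


# Hypothetical sample data shaped like the documented structure.
sample_time_info = {
    "latest": {"time_unit": "DAY", "duration": 0},
    "duration": {"time_unit": "DAY", "duration": 30},
}

print(describe_window(sample_time_info))
```

Note that the name duration appears at two levels: once as the key for the window size, and once as the integer value inside both latest and duration.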

join_info structure is:

join_type: (str)

kind of join, left or right.

left_table: (dict)

information about the dataset which was treated as the left side of the join.

right_table: (dict)

information about the dataset which was treated as the right side of the join.

left_table and right_table structure is:

columns: (list[str])

list of columns the datasets were joined on.

datasteps: (list[int])

list of ids of the data steps which brought the columns into the current step's dataset.
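
The sketch below shows one way the join_info structure might be read: it resolves each datasteps id back to the name of the step it refers to and prints one line per join. It assumes steps are plain dicts and that left_table and right_table are both dicts with the columns and datasteps fields described above; summarize_joins and the sample data are hypothetical.

```python
def summarize_joins(steps):
    """Return one human-readable line per entry in each join step's join_info."""
    names = {step["id"]: step["name"] for step in steps}
    lines = []
    for step in steps:
        if step.get("step_type") != "join":
            continue
        for join in step.get("join_info") or []:
            sides = []
            for key in ("left_table", "right_table"):
                table = join[key]
                # Map data step ids to step names, falling back to the raw id.
                sources = ", ".join(names.get(i, f"step {i}") for i in table["datasteps"])
                sides.append(f"{sources} on {table['columns']}")
            lines.append(f"{join['join_type']} join: {sides[0]} <-> {sides[1]}")
    return lines


# Hypothetical sample data shaped like the documented structure.
sample_join_steps = [
    {"id": 0, "step_type": "data", "name": "customers"},
    {"id": 1, "step_type": "data", "name": "orders"},
    {
        "id": 2,
        "step_type": "join",
        "name": "Join",
        "parents": [0, 1],
        "join_info": [
            {
                "join_type": "left",
                "left_table": {"columns": ["customer_id"], "datasteps": [0]},
                "right_table": {"columns": ["cust_id"], "datasteps": [1]},
            }
        ],
    },
]

for line in summarize_joins(sample_join_steps):
    print(line)
```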

classmethod get(project_id, id)

Retrieve a single FeatureLineage.

Parameters:
  • project_id (str) – The id of the project the feature belongs to

  • id (str) – id of a feature lineage to retrieve

Returns:

lineage – The queried instance

Return type:

FeatureLineage
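
A minimal usage sketch follows. fetch_steps wraps the get call documented above and needs an installed, configured datarobot client; ancestors then follows the parents references to list every step a given step depends on. Both helper names are hypothetical, and the code assumes each entry in steps is a plain dict keyed by the field names listed under Variables.

```python
def fetch_steps(project_id, lineage_id):
    """Fetch a lineage and return its steps (requires a configured client)."""
    # Imported lazily so ancestors() below can be used without the package.
    from datarobot.models import FeatureLineage

    lineage = FeatureLineage.get(project_id, lineage_id)
    return lineage.steps


def ancestors(steps, step_id):
    """Return the ids of all steps that step_id depends on, nearest first."""
    parents = {step["id"]: step.get("parents") or [] for step in steps}
    seen = []
    queue = list(parents.get(step_id, []))
    while queue:
        current = queue.pop(0)
        if current in seen:
            continue  # a step reachable through several paths is listed once
        seen.append(current)
        queue.extend(parents.get(current, []))
    return seen


# Hypothetical sample data; real steps would come from fetch_steps(...).
sample_lineage_steps = [
    {"id": 0, "step_type": "action", "name": "Sum", "parents": [1]},
    {"id": 1, "step_type": "join", "name": "Join", "parents": [2, 3]},
    {"id": 2, "step_type": "data", "name": "customers", "parents": []},
    {"id": 3, "step_type": "data", "name": "orders", "parents": []},
]

print(ancestors(sample_lineage_steps, 0))
```

The traversal is breadth-first, so direct parents appear before more distant steps.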