Recipes

class datarobot.models.recipe.Recipe

Data wrangling entity containing information required to transform one or more datasets and generate SQL.

A recipe acts like a blueprint for creating a dataset by applying a series of operations (filters, aggregations, etc.) to one or more input datasets or datasources.

Variables:
  • id (str) – The unique identifier of the recipe.

  • name (str) – The name of the recipe. Not unique.

  • status (str) – The status of the recipe.

  • dialect (DataWranglingDialect) – The dialect of the recipe.

  • recipe_type (RecipeType) – The type of the recipe.

  • inputs (List[Union[JDBCTableDataSourceInput, RecipeDatasetInput]]) – The list of inputs for the recipe. Each input can be either a JDBCTableDataSourceInput or a RecipeDatasetInput.

  • operations (Optional[List[WranglingOperation]]) – The list of operations for the recipe.

  • downsampling (Optional[DownsamplingOperation]) – The downsampling operation applied to the recipe. Used when publishing the recipe to a dataset.

  • settings (Optional[RecipeSettings]) – The settings for the recipe.

update(name=None, description=None, sql=None, recipe_type=None, inputs=None, operations=None, **kwargs)

Update the recipe.

Parameters:
  • name (Optional[str]) – The new recipe name.

  • description (Optional[str]) – The new recipe description.

  • sql (Optional[str]) – The new wrangling SQL. Only applicable for the SQL recipe_type.

  • recipe_type (Optional[RecipeType]) – The new type of the recipe. Only switching between SQL and WRANGLING is supported.

  • inputs (Optional[List[JDBCTableDataSourceInput | RecipeDatasetInput]]) – The new list of recipe inputs. You can update sampling and/or aliases using this parameter.

  • operations (Optional[List[WranglingOperation]]) – The new list of operations. Only applicable for the WRANGLING recipe_type.

  • downsampling (Optional[DownsamplingOperation]) – The new downsampling operation, or None if no downsampling should be applied when publishing.

Return type:

None

Examples

Update downsampling to only keep 500 random rows when publishing:

>>> import datarobot as dr
>>> from datarobot.models.recipe_operation import RandomDownsamplingOperation
>>> recipe = dr.Recipe.get('690bbf77aa31530d8287ae5f')
>>> recipe.update(
...     downsampling=RandomDownsamplingOperation(max_rows=500)
... )
get_preview(max_wait=600, number_of_operations_to_use=None)

Retrieve preview of sample data. Compute preview if absent.

Parameters:
  • max_wait (int) – Maximum number of seconds to wait when retrieving the preview.

  • number_of_operations_to_use (Optional[int]) – Number of operations to use when computing the preview. If provided, the first N operations will be used. If not provided, all operations will be used.

Returns:

preview – The preview of the application of the recipe.

Return type:

RecipePreview

Examples

>>> import datarobot as dr
>>> recipe = dr.Recipe.get('690bbf77aa31530d8287ae5f')
>>> preview = recipe.get_preview()
>>> preview
RecipePreview(
    columns=['feature_1', 'feature_2', 'feature_3'],
    count=4,
    data=[['5', 'true', 'James'], ['-7', 'false', 'Bryan'], ['2', 'false', 'Jamie'], ['4', 'true', 'Lyra']],
    total_count=4,
    byte_size=46,
    result_schema=[
        {'data_type': 'INT_TYPE', 'name': 'feature_1'},
        {'data_type': 'BOOLEAN_TYPE', 'name': 'feature_2'},
        {'data_type': 'STRING_TYPE', 'name': 'feature_3'}
    ],
    stored_count=4,
    estimated_size_exceeds_limit=False,
)
>>> preview.df
  feature_1 feature_2 feature_3
0         5      true     James
1        -7     false     Bryan
2         2     false     Jamie
3         4      true      Lyra
classmethod update_downsampling(recipe_id, downsampling)

Set the downsampling operation for the recipe. Downsampling is applied during publishing. Consider using update() instead to update a Recipe instance.

Parameters:
  • recipe_id (str) – Recipe ID.

  • downsampling (Optional[DownsamplingOperation]) – Downsampling operation to be applied during publishing. If None, no downsampling will be applied.

Returns:

recipe – Recipe with updated downsampling.

Return type:

Recipe

Examples

>>> import datarobot as dr
>>> from datarobot.models.recipe_operation import RandomDownsamplingOperation
>>> recipe = dr.Recipe.get('690bbf77aa31530d8287ae5f')
>>> recipe = dr.Recipe.update_downsampling(
...     recipe_id=recipe.id,
...     downsampling=RandomDownsamplingOperation(max_rows=1000)
... )

See also

Recipe.update

retrieve_preview(max_wait=600, number_of_operations_to_use=None)

Retrieve preview of sample data. Compute preview if absent.

Deprecated since version 3.10: This method is deprecated and will be removed in 3.12. Use Recipe.get_preview instead.

Parameters:
  • max_wait (int) – Maximum number of seconds to wait when retrieving the preview.

  • number_of_operations_to_use (Optional[int]) – Number of operations to use when computing the preview. If provided, the first N operations will be used. If not provided, all operations will be used.

Returns:

preview – Preview data computed.

Return type:

Dict[str, Any]

retrieve_insights(max_wait=600, number_of_operations_to_use=None)

Retrieve insights for the recipe sample data. Requires a preview of the sample data to be computed first with .get_preview(). Computing the preview automatically starts the insights job in the background if it is not already running. This call blocks until the insights are ready or max_wait is exceeded.

Parameters:
  • max_wait (int) – Maximum number of seconds to wait when retrieving the insights.

  • number_of_operations_to_use (Optional[int]) – Number of operations to use when computing insights. A preview must be computed first for the same number of operations. If provided, the first N operations will be used. If not provided, all operations will be used.

Returns:

insights – The insights for the recipe sample data.

Return type:

Dict[str, Any]
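
Examples

A minimal sketch of the expected workflow, reusing the illustrative recipe ID from the examples above; the preview is computed first so that the insights job is started:

>>> import datarobot as dr
>>> recipe = dr.Recipe.get('690bbf77aa31530d8287ae5f')
>>> preview = recipe.get_preview()  # computes the preview and starts the insights job if needed
>>> insights = recipe.retrieve_insights(max_wait=600)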

classmethod set_inputs(recipe_id, inputs)

Set the inputs for the recipe. Inputs can be datasets or JDBC table data sources. Consider using update() instead to update a Recipe instance.

Parameters:
  • recipe_id (str) – Recipe ID.

  • inputs (List[Union[JDBCTableDataSourceInput, RecipeDatasetInput]]) – The new list of inputs for the recipe.

Returns:

recipe – Recipe with updated inputs.

Return type:

Recipe
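
Examples

A minimal sketch that re-applies the recipe's current inputs, for example after editing sampling or aliases in place; new JDBCTableDataSourceInput or RecipeDatasetInput objects can be constructed as documented under Recipe Inputs below:

>>> import datarobot as dr
>>> recipe = dr.Recipe.get('690bbf77aa31530d8287ae5f')
>>> recipe = dr.Recipe.set_inputs(recipe.id, inputs=recipe.inputs)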

See also

Recipe.update

classmethod set_operations(recipe_id, operations)

Set the list of operations to use in the recipe. Operations are applied in order on the input(s). Consider using update() instead to update a Recipe instance.

Parameters:
  • recipe_id (str) – Recipe ID.

  • operations (List[WranglingOperation]) – List of operations to set in the recipe.

Returns:

recipe – Recipe with updated list of operations.

Return type:

Recipe

Examples

>>> import datarobot as dr
>>> from datarobot.models.recipe_operation import FilterOperation, FilterCondition
>>> from datarobot.enums import FilterOperationFunctions
>>> recipe = dr.Recipe.get("690bbf77aa31530d8287ae5f")
>>> new_operations = [
...    FilterOperation(
...        conditions=[
...            FilterCondition(
...                column="column_A",
...                function=FilterOperationFunctions.GREATER_THAN,
...                function_arguments=[100]
...            )
...        ]
...    )
... ]
>>> recipe = dr.Recipe.set_operations(recipe.id, operations=new_operations)

See also

Recipe.update

classmethod set_recipe_metadata(recipe_id, metadata)

Update metadata for the recipe.

Parameters:
  • recipe_id (str) – Recipe ID.

  • metadata (Dict[str, str]) – Dictionary of metadata to be updated.

Returns:

recipe – New recipe with updated metadata.

Return type:

Recipe
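
Examples

A minimal sketch, assuming the metadata keys mirror the RecipeMetadata fields documented below (for example name and description):

>>> import datarobot as dr
>>> recipe = dr.Recipe.get('690bbf77aa31530d8287ae5f')
>>> recipe = dr.Recipe.set_recipe_metadata(
...     recipe_id=recipe.id,
...     metadata={"name": "Renamed Recipe", "description": "Filters and cleans sales data."}
... )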

classmethod list(search=None, dialect=None, status=None, recipe_type=None, order_by=None, created_by_user_id=None, created_by_username=None)

List recipes. Apply filters to narrow down results.

Parameters:
  • search (Optional[str]) – Recipe name to filter by.

  • dialect (Optional[DataWranglingDialect]) – Recipe dialect to filter by.

  • status (Optional[str]) – Recipe status to filter by. E.g., draft, published.

  • recipe_type (Optional[RecipeType]) – Recipe type to filter by.

  • order_by (Optional[str]) – Field to order results by. For reverse ordering prefix with ‘-’, e.g. -recipe_id.

  • created_by_user_id (Optional[str]) – User ID to filter recipes by. Return recipes created by the user associated with this user ID.

  • created_by_username (Optional[str]) – Username to filter recipes by. Return recipes created by the user(s) associated with this username.

Returns:

recipes – List of recipes matching the filter criteria.

Return type:

List[Recipe]

Examples

>>> import datarobot as dr
>>> recipes = dr.Recipe.list()
>>> recipes
[Recipe(
    dialect='spark',
    id='690bbf77aa31530d8287ae5f',
    name='Sample Recipe',
    status='draft',
    recipe_type='SQL',
    inputs=[...],
    operations=[...],
    downsampling=...,
    settings=...,
), ...]

See also

Recipe.get

classmethod get(recipe_id)

Retrieve a recipe by ID.

Parameters:

recipe_id (str) – The ID of the recipe to retrieve.

Returns:

recipe – The recipe with the specified ID.

Return type:

Recipe

Examples

>>> import datarobot as dr
>>> recipe = dr.Recipe.get("690bbf77aa31530d8287ae5f")
>>> recipe
Recipe(
    dialect='spark',
    id='690bbf77aa31530d8287ae5f',
    name='Sample Recipe',
    status='draft',
    recipe_type='SQL',
    inputs=[...],
    operations=[...],
    downsampling=...,
    settings=...,
)

See also

Recipe.list

get_sql(operations=None)

Generate SQL for the recipe, taking into account its operations and inputs. This does not modify the recipe.

Parameters:

operations (Optional[List[WranglingOperation]]) – If provided, generate SQL for the given list of operations instead of the recipe’s operations, using the recipe’s inputs as the base.

Deprecated since version 3.10: operations is deprecated and will be removed in 3.12. Use the generate_sql_for_operations class method instead.

Returns:

sql – Generated SQL string.

Return type:

str

Examples

>>> import datarobot as dr
>>> from datarobot.models.recipe_operation import FilterOperation, FilterCondition
>>> from datarobot.enums import FilterOperationFunctions
>>> recipe = dr.Recipe.get("690bbf77aa31530d8287ae5f")
>>> recipe.update(operations=[
...    FilterOperation(
...        conditions=[
...            FilterCondition(
...                column="column_A",
...                function=FilterOperationFunctions.GREATER_THAN,
...                function_arguments=[100]
...            )
...        ]
...    )
... ])
>>> recipe.get_sql()
"SELECT `sample_dataset`.`column_A` FROM `sample_dataset` WHERE `sample_dataset`.`column_A` > 100"
classmethod generate_sql_for_operations(recipe_id, operations)

Generate SQL for an arbitrary list of operations, using an existing recipe as a base. This does not modify the recipe. If you want to generate SQL for a recipe’s operations, use get_sql() instead.

Parameters:
  • recipe_id (str) – The ID of the recipe to use as a base. The SQL generation will use the recipe’s inputs and dialect.

  • operations (List[WranglingOperation]) – The list of operations to generate SQL for.

Returns:

sql – Generated SQL string.

Return type:

str

Examples

>>> import datarobot as dr
>>> from datarobot.models.recipe_operation import FilterOperation, FilterCondition
>>> from datarobot.enums import FilterOperationFunctions
>>> dr.Recipe.generate_sql_for_operations(
...    recipe_id="690bbf77aa31530d8287ae5f",
...    operations=[
...        FilterOperation(
...            conditions=[
...                FilterCondition(
...                    column="column_A",
...                    function=FilterOperationFunctions.LESS_THAN,
...                    function_arguments=[20]
...                )
...            ]
...        )
...    ]
... )
"SELECT `sample_dataset`.`column_A` FROM `sample_dataset` WHERE `sample_dataset`.`column_A` < 20"
classmethod from_data_store(use_case, data_store, data_source_type, dialect, data_source_inputs, recipe_type=RecipeType.WRANGLING)

Create a wrangling recipe from a data store.

Return type:

Recipe

classmethod from_dataset(use_case, dataset, dialect=None, inputs=None, recipe_type=RecipeType.WRANGLING, snapshot_policy=DataWranglingSnapshotPolicy.LATEST)

Create a wrangling recipe from a dataset.

Return type:

Recipe
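
Examples

A minimal sketch, assuming an existing use case and dataset; the placeholder IDs and the SPARK dialect member (suggested by the 'spark' dialect shown in the examples above) are assumptions:

>>> import datarobot as dr
>>> from datarobot.enums import DataWranglingDialect
>>> use_case = dr.UseCase.get('<use_case_id>')
>>> dataset = dr.Dataset.get('<dataset_id>')
>>> recipe = dr.Recipe.from_dataset(use_case, dataset, dialect=DataWranglingDialect.SPARK)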

class datarobot.models.recipe.RecipeSettings

Recipe settings, for example those applied at the downsampling stage.

class datarobot.models.recipe.RecipeMetadata

The recipe metadata.

Variables:
  • name (Optional[str]) – The name of the recipe.

  • description (Optional[str]) – The description of the recipe.

  • recipe_type (Optional[RecipeType]) – The type of the recipe.

  • sql (Optional[str]) – The SQL query of the transformation that the recipe performs.

class datarobot.models.recipe.RecipePreview

A preview of data output from the application of a recipe.

Variables:
  • columns (List[str]) – List of column names in the preview.

  • count (int) – Number of rows in the preview.

  • data (List[List[Any]]) – The preview data as a list of rows, where each row is a list of values.

  • total_count (int) – Total number of rows in the dataset.

  • byte_size (int) – Data memory usage in bytes.

  • result_schema (List[Dict[str, Any]]) – JDBC result schema for the preview data.

  • stored_count (int) – Number of rows available for preview.

  • estimated_size_exceeds_limit (bool) – Whether downsampling should be applied based on the estimated sample size.

  • next (Optional[RecipePreview]) – The next set of preview data, if available, otherwise None.

  • previous (Optional[RecipePreview]) – The previous set of preview data, if available, otherwise None.

  • df (pandas.DataFrame) – The preview data as a pandas DataFrame.

Recipe Inputs

class datarobot.models.recipe.RecipeDatasetInput

Object describing a dataset input for recipe transformations.

class datarobot.models.recipe.DatasetInput
class datarobot.models.recipe.DataSourceInput

Inputs required to create a new recipe from a data store.

class datarobot.models.recipe.JDBCTableDataSourceInput

Object describing a JDBC table data source input for recipe transformations.

Recipe Operations

class datarobot.models.recipe_operation.BaseOperation

Single base transformation unit in a data wrangling recipe.

Sampling Operations

class datarobot.models.recipe_operation.SamplingOperation
class datarobot.models.recipe_operation.RandomSamplingOperation
class datarobot.models.recipe_operation.DatetimeSamplingOperation

Downsampling Operations

Downsampling reduces the size of the published dataset for faster experimentation.

class datarobot.models.recipe_operation.DownsamplingOperation

Base class for downsampling operations.

class datarobot.models.recipe_operation.RandomDownsamplingOperation

A downsampling technique that reduces the size of the majority class using random sampling (i.e., each sample has an equal probability of being chosen).

Parameters:
  • max_rows (int) – The maximum number of rows to downsample to.

  • seed (Optional[int]) – The random seed to use for downsampling.

Examples

>>> from datarobot.models.recipe_operation import RandomDownsamplingOperation
>>> op = RandomDownsamplingOperation(max_rows=600)
class datarobot.models.recipe_operation.SmartDownsamplingOperation

A downsampling technique that relies on the distribution of target values to adjust the sample size, recording in a new column how much each class was sampled.

For this technique to work, ensure that target and weightsFeature are set in the recipe’s settings.

Parameters:
  • max_rows (int) – The maximum number of rows to downsample to.

  • method (SmartDownsamplingMethod) – The downsampling method to use.

  • seed (Optional[int]) – The random seed to use for downsampling.

Examples

>>> from datarobot.models.recipe_operation import SmartDownsamplingOperation, SmartDownsamplingMethod
>>> op = SmartDownsamplingOperation(max_rows=1000, method=SmartDownsamplingMethod.BINARY)

Wrangling Operations

class datarobot.models.recipe_operation.WranglingOperation

Base class for data wrangling operations.

class datarobot.models.recipe_operation.LagsOperation

Data wrangling operation to create one or more lags for a feature based on a datetime ordering feature. This operation will create a new column for each lag order specified.

Parameters:
  • column (str) – Column name to create lags for.

  • orders (List[int]) – List of lag orders to create.

  • datetime_partition_column (str) – Column name used to partition the data by datetime. Used to order the data for lag creation.

  • multiseries_id_column (Optional[str]) – Column name used to identify time series within the data. Required only for multiseries.

Examples

Create lags of orders 1, 5, and 30 on the opening price column “open_price” in stock price data, ordered by the datetime column “date”. The data contains multiple time series identified by “ticker_symbol”:

>>> import datarobot as dr
>>> from datarobot.models.recipe_operation import LagsOperation
>>> recipe = dr.Recipe.get('690bbf77aa31530d8287ae5f')
>>> lags_op = LagsOperation(
...     column="open_price",
...     orders=[1, 5, 30],
...     datetime_partition_column="date",
...     multiseries_id_column="ticker_symbol",
... )
>>> recipe.update(operations=[lags_op])
class datarobot.models.recipe_operation.WindowCategoricalStatsOperation

Data wrangling operation to calculate categorical statistics for a rolling window. This operation will create a new column for each method specified.

Parameters:
  • column (str) – Column name to create rolling statistics for.

  • window_size (int) – Number of rows to include in the rolling window.

  • methods (List[CategoricalStatsMethods]) – List of methods to apply for rolling statistics. Currently only supports datarobot.enums.CategoricalStatsMethods.MOST_FREQUENT.

  • datetime_partition_column (str) – Column name used to partition the data by datetime. Used to order the timeseries data.

  • multiseries_id_column (Optional[str]) – Column name used to identify each time series within the data. Required only for multiseries.

  • rolling_most_frequent_udf (Optional[str]) – Fully qualified path to a rolling most frequent user-defined function. Used to optimize SQL execution with Snowflake.

Examples

Create rolling categorical statistics to track the most frequent product category purchased by customers based on their last 50 purchases:

>>> import datarobot as dr
>>> from datarobot.models.recipe_operation import WindowCategoricalStatsOperation
>>> from datarobot.enums import CategoricalStatsMethods
>>> recipe = dr.Recipe.get('690bbf77aa31530d8287ae5f')
>>> window_cat_stats_op = WindowCategoricalStatsOperation(
...     column="product_category",
...     window_size=50,
...     methods=[CategoricalStatsMethods.MOST_FREQUENT],
...     datetime_partition_column="purchase_date",
...     multiseries_id_column="customer_id",
... )
>>> recipe.update(operations=[window_cat_stats_op])
class datarobot.models.recipe_operation.WindowNumericStatsOperation

Data wrangling operation to calculate numeric statistics for a rolling window. This operation will create one or more new columns.

Parameters:
  • column (str) – Column name to create rolling statistics for.

  • window_size (int) – Number of rows to include in the rolling window.

  • methods (List[NumericStatsMethods]) – List of methods to apply for rolling statistics. A new column will be created for each method.

  • datetime_partition_column (str) – Column name used to partition the data by datetime. Used to order the timeseries data.

  • multiseries_id_column (Optional[str]) – Column name used to identify each time series within the data. Required only for multiseries.

  • rolling_median_udf (Optional[str]) – Fully qualified path to a rolling median user-defined function. Used to optimize SQL execution with Snowflake.

Examples

Create rolling numeric statistics to track the maximum, minimum, and median stock prices over the last 7 trading sessions:

>>> import datarobot as dr
>>> from datarobot.models.recipe_operation import WindowNumericStatsOperation
>>> from datarobot.enums import NumericStatsMethods
>>> recipe = dr.Recipe.get('690bbf77aa31530d8287ae5f')
>>> window_num_stats_op = WindowNumericStatsOperation(
...     column="stock_price",
...     window_size=7,
...     methods=[
...         NumericStatsMethods.MAX,
...         NumericStatsMethods.MIN,
...         NumericStatsMethods.MEDIAN,
...     ],
...     datetime_partition_column="trading_date",
...     multiseries_id_column="ticker_symbol",
... )
>>> recipe.update(operations=[window_num_stats_op])
class datarobot.models.recipe_operation.TimeSeriesOperation

Data wrangling operation to generate a dataset ready for time series modeling: with forecast point, forecast distances, known in advance columns, etc.

Parameters:
  • target_column (str) – Target column to use for generating naive baseline features during feature reduction.

  • datetime_partition_column (str) – Column name used to partition the data by datetime. Used to order the time series data.

  • forecast_distances (List[int]) – List of forecast distances to generate features for. Each distance represents a relative position that determines how many rows ahead to predict.

  • task_plan (List[TaskPlanElement]) – List of task plans for each column.

  • baseline_periods (Optional[List[int]]) – List of integers representing the periodicities used to generate naive baseline features from the target. Baseline period = 1 corresponds to the naive latest baseline.

  • known_in_advance_columns (Optional[List[str]]) – List of columns that are known in advance at prediction time, i.e. features that do not need to be lagged.

  • multiseries_id_column (Optional[str]) – Column name used to identify each time series within the data. Required only for multiseries.

  • rolling_median_udf (Optional[str]) – Fully qualified path to a rolling median user-defined function. Used to optimize SQL execution with Snowflake.

  • rolling_most_frequent_udf (Optional[str]) – Fully qualified path to a rolling most frequent user-defined function.

  • forecast_point (Optional[datetime]) – The forecast point to use at prediction time.

Examples

Create a time series operation for sales forecasting with forecast distances of 7 and 30 days, using the sale amount as the target column, the date of the sale for datetime ordering, and “store_id” as the multiseries identifier. The operation includes a task plan to compute lags of orders 1, 7, and 30 on the sales amount, and specifies known in advance columns “promotion” and “holiday_flag”:

>>> import datarobot as dr
>>> from datarobot.models.recipe_operation import TimeSeriesOperation, TaskPlanElement, Lags
>>> recipe = dr.Recipe.get('690bbf77aa31530d8287ae5f')
>>> task_plan = [
...     TaskPlanElement(
...         column="sales_amount",
...         task_list=[Lags(orders=[1, 7, 30])]
...     )
... ]
>>> time_series_op = TimeSeriesOperation(
...     target_column="sales_amount",
...     datetime_partition_column="sale_date",
...     forecast_distances=[7, 30],
...     task_plan=task_plan,
...     known_in_advance_columns=["promotion", "holiday_flag"],
...     multiseries_id_column="store_id"
... )
>>> recipe.update(operations=[time_series_op])
class datarobot.models.recipe_operation.ComputeNewOperation

Data wrangling operation to create a new feature computed using a SQL expression.

Parameters:
  • expression (str) – SQL expression to compute the new feature.

  • new_feature_name (str) – Name of the new feature.

Examples

Create a new feature “total_sales” by summing “online_sales” and “in_store_sales”, rounded to the nearest dollar:

>>> import datarobot as dr
>>> from datarobot.models.recipe_operation import ComputeNewOperation
>>> recipe = dr.Recipe.get('690bbf77aa31530d8287ae5f')
>>> compute_new_op = ComputeNewOperation(
...     expression="ROUND(online_sales + in_store_sales, 0)",
...     new_feature_name="total_sales"
... )
>>> recipe.update(operations=[compute_new_op])
class datarobot.models.recipe_operation.RenameColumnsOperation

Data wrangling operation to rename one or more columns.

Parameters:

column_mappings (Dict[str, str]) – Mapping of original column names to new column names.

Examples

Rename columns “old_name1” to “new_name1” and “old_name2” to “new_name2”:

>>> import datarobot as dr
>>> from datarobot.models.recipe_operation import RenameColumnsOperation
>>> recipe = dr.Recipe.get('690bbf77aa31530d8287ae5f')
>>> rename_op = RenameColumnsOperation(
...     column_mappings={'old_name1': 'new_name1', 'old_name2': 'new_name2'}
... )
>>> recipe.update(operations=[rename_op])
class datarobot.models.recipe_operation.FilterOperation

Data wrangling operation to filter rows based on one or more conditions.

Parameters:
  • conditions (List[FilterCondition]) – List of conditions to filter on.

  • keep_rows (Optional[bool]) – Whether matching rows should be kept or dropped.

  • operator (Optional[str]) – Operator to use between conditions when using multiple conditions. Allowed values: [and, or].

Examples

Filter input to only keep users older than 18:

>>> import datarobot as dr
>>> from datarobot.models.recipe_operation import FilterOperation, FilterCondition
>>> from datarobot.enums import FilterOperationFunctions
>>> recipe = dr.Recipe.get('690bbf77aa31530d8287ae5f')
>>> condition = FilterCondition(
...     column="age",
...     function=FilterOperationFunctions.GREATER_THAN,
...     function_arguments=[18]
... )
>>> filter_op = FilterOperation(conditions=[condition], keep_rows=True)
>>> recipe.update(operations=[filter_op])

Filter the input to drop rows where “status” is either “inactive” or “banned”:

>>> import datarobot as dr
>>> from datarobot.models.recipe_operation import FilterOperation, FilterCondition
>>> from datarobot.enums import FilterOperationFunctions
>>> recipe = dr.Recipe.get('690bbf77aa31530d8287ae5f')
>>> inactive_cond = FilterCondition(
...     column="status",
...     function=FilterOperationFunctions.EQUALS,
...     function_arguments=["inactive"]
... )
>>> banned_cond = FilterCondition(
...     column="status",
...     function=FilterOperationFunctions.EQUALS,
...     function_arguments=["banned"]
... )
>>> filter_op = FilterOperation(
...     conditions=[inactive_cond, banned_cond],
...     keep_rows=False,
...     operator="or"
... )
>>> recipe.update(operations=[filter_op])
class datarobot.models.recipe_operation.DropColumnsOperation

Data wrangling operation to drop one or more columns.

Parameters:

columns (List[str]) – Columns to drop.

Examples

>>> import datarobot as dr
>>> from datarobot.models.recipe_operation import DropColumnsOperation
>>> recipe = dr.Recipe.get('690bbf77aa31530d8287ae5f')
>>> drop_op = DropColumnsOperation(columns=['col1', 'col2'])
>>> recipe.update(operations=[drop_op])
class datarobot.models.recipe_operation.DedupeRowsOperation

Data wrangling operation to remove duplicate rows. Uses values from all columns.

Examples

>>> import datarobot as dr
>>> from datarobot.models.recipe_operation import DedupeRowsOperation
>>> recipe = dr.Recipe.get('690bbf77aa31530d8287ae5f')
>>> dedupe_op = DedupeRowsOperation()
>>> recipe.update(operations=[dedupe_op])
class datarobot.models.recipe_operation.FindAndReplaceOperation

Data wrangling operation to find and replace strings in a column.

Parameters:
  • column (str) – Column name to perform find and replace on.

  • find (str) – String or expression to find.

  • replace_with (str) – String to replace with.

  • match_mode (FindAndReplaceMatchMode) – Match mode to use when finding strings.

  • is_case_sensitive (bool) – Whether the find operation should be case-sensitive.

Examples

Set Recipe operations to search for an exact match of “old_value” in column “col1” and replace it with “new_value”:

>>> import datarobot as dr
>>> from datarobot.models.recipe_operation import FindAndReplaceOperation
>>> from datarobot.enums import FindAndReplaceMatchMode
>>> recipe = dr.Recipe.get('690bbf77aa31530d8287ae5f')
>>> find_replace_op = FindAndReplaceOperation(
...     column="col1",
...     find="old_value",
...     replace_with="new_value",
...     match_mode=FindAndReplaceMatchMode.EXACT,
...     is_case_sensitive=True
... )
>>> recipe.update(operations=[find_replace_op])

Set Recipe operations to use a regular expression to find names starting with “Brand” in column “name” and replace them with “Lyra”:

>>> import datarobot as dr
>>> from datarobot.models.recipe_operation import FindAndReplaceOperation
>>> from datarobot.enums import FindAndReplaceMatchMode
>>> recipe = dr.Recipe.get('690bbf77aa31530d8287ae5f')
>>> find_replace_op = FindAndReplaceOperation(
...     column="name",
...     find="^Brand.*",
...     replace_with="Lyra",
...     match_mode=FindAndReplaceMatchMode.REGEX
... )
>>> recipe.update(operations=[find_replace_op])

Enums and Helpers

class datarobot.models.recipe_operation.TaskPlanElement

Represents a task plan element for a specific column in a time series operation.

Parameters:
  • column (str) – Column name for which the task plan is defined.

  • task_list (List[BaseTimeAwareTask]) – List of time-aware tasks to be applied to the column.

class datarobot.models.recipe_operation.BaseTimeAwareTask

Base class for time-aware tasks in time series operation task plan.

class datarobot.models.recipe_operation.CategoricalStats

Time-aware task to compute categorical statistics for a rolling window.

Parameters:
  • methods (List[CategoricalStatsMethods]) – List of categorical statistical methods to apply for rolling statistics.

  • window_size (int) – Number of rows to include in the rolling window.

class datarobot.models.recipe_operation.NumericStats

Time-aware task to compute numeric statistics for a rolling window.

Parameters:
  • methods (List[NumericStatsMethods]) – List of numeric statistical methods to apply for rolling statistics.

  • window_size (int) – Number of rows to include in the rolling window.

class datarobot.models.recipe_operation.Lags

Time-aware task to create one or more lags for a feature.

Parameters:

orders (List[int]) – List of lag orders to create.
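
Examples

A minimal sketch combining the task plan helpers above for use in a TimeSeriesOperation; keyword construction mirroring the documented parameters is an assumption, and the column name and window size are illustrative:

>>> from datarobot.models.recipe_operation import TaskPlanElement, NumericStats, Lags
>>> from datarobot.enums import NumericStatsMethods
>>> plan_element = TaskPlanElement(
...     column="sales_amount",
...     task_list=[
...         Lags(orders=[1, 7]),
...         NumericStats(methods=[NumericStatsMethods.MEDIAN], window_size=7),
...     ]
... )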

class datarobot.enums.CategoricalStatsMethods

Supported categorical stats methods for data wrangling.

class datarobot.enums.NumericStatsMethods

Supported numeric stats methods for data wrangling.

class datarobot.models.recipe_operation.FilterCondition

Condition to filter rows in a FilterOperation.

Parameters:
  • column (str) – Column name to apply the condition on.

  • function (FilterOperationFunctions) – The filtering function to use.

  • function_arguments (List[Union[str, int, float]]) – The list of arguments for the filtering function.

Examples

FilterCondition to filter rows where “age” is between 18 and 65:

>>> from datarobot.models.recipe_operation import FilterCondition
>>> from datarobot.enums import FilterOperationFunctions
>>> condition = FilterCondition(
...     column="age",
...     function=FilterOperationFunctions.BETWEEN,
...     function_arguments=[18, 65]
... )
class datarobot.enums.FilterOperationFunctions

Operations supported in a FilterCondition.

class datarobot.enums.RecipeType
class datarobot.enums.DataWranglingDialect
class datarobot.enums.FindAndReplaceMatchMode

Find and replace modes used when searching for strings to replace.