Recipes
- class datarobot.models.recipe.Recipe
Data wrangling entity containing information required to transform one or more datasets and generate SQL.
A recipe acts like a blueprint for creating a dataset by applying a series of operations (filters, aggregations, etc.) to one or more input datasets or datasources.
- Variables:
  - id (str) – The unique identifier of the recipe.
  - name (str) – The name of the recipe. Not unique.
  - status (str) – The status of the recipe.
  - dialect (DataWranglingDialect) – The dialect of the recipe.
  - recipe_type (RecipeType) – The type of the recipe.
  - inputs (List[Union[JDBCTableDataSourceInput, RecipeDatasetInput]]) – The list of inputs for the recipe. Each input can be either a JDBCTableDataSourceInput or a RecipeDatasetInput.
  - operations (Optional[List[WranglingOperation]]) – The list of operations for the recipe.
  - downsampling (Optional[DownsamplingOperation]) – The downsampling operation applied to the recipe. Used when publishing the recipe to a dataset.
  - settings (Optional[RecipeSettings]) – The settings for the recipe.
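Examples
A minimal end-to-end sketch using only the methods documented below: fetch a recipe, set a filter operation, preview the result, and inspect the generated SQL. The recipe ID and column name are illustrative:
>>> import datarobot as dr
>>> from datarobot.models.recipe_operation import FilterOperation, FilterCondition
>>> from datarobot.enums import FilterOperationFunctions
>>> recipe = dr.Recipe.get('690bbf77aa31530d8287ae5f')
>>> recipe.update(operations=[
...     FilterOperation(
...         conditions=[
...             FilterCondition(
...                 column="age",
...                 function=FilterOperationFunctions.GREATER_THAN,
...                 function_arguments=[18]
...             )
...         ]
...     )
... ])
>>> preview = recipe.get_preview()  # compute and retrieve sample output
>>> sql = recipe.get_sql()          # inspect the SQL the recipe would generate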
- update(name=None, description=None, sql=None, recipe_type=None, inputs=None, operations=None, **kwargs)
Update the recipe.
- Parameters:
  - name (Optional[str]) – The new recipe name.
  - description (Optional[str]) – The new recipe description.
  - sql (Optional[str]) – The new wrangling SQL. Only applicable for the SQL recipe_type.
  - recipe_type (Optional[RecipeType]) – The new type of the recipe. Only switching between SQL and WRANGLING is supported.
  - inputs (Optional[List[JDBCTableDataSourceInput | RecipeDatasetInput]]) – The new list of recipe inputs. You can update sampling and/or aliases using this parameter.
  - operations (Optional[List[WranglingOperation]]) – The new list of operations. Only applicable for the WRANGLING recipe_type.
  - downsampling (Optional[DownsamplingOperation]) – The new downsampling, or None to apply no downsampling on publishing.
- Return type:
  None
Examples
Update downsampling to only keep 500 random rows when publishing:
>>> import datarobot as dr
>>> from datarobot.models.recipe_operation import RandomDownsamplingOperation
>>> recipe = dr.Recipe.get('690bbf77aa31530d8287ae5f')
>>> recipe.update(
...     downsampling=RandomDownsamplingOperation(max_rows=500)
... )
- get_preview(max_wait=600, number_of_operations_to_use=None)
Retrieve preview of sample data. Compute preview if absent.
- Parameters:
  - max_wait (int) – Maximum number of seconds to wait when retrieving the preview.
  - number_of_operations_to_use (Optional[int]) – Number of operations to use when computing the preview. If provided, the first N operations will be used. If not provided, all operations will be used.
- Returns:
  preview – The preview of the application of the recipe.
- Return type:
  RecipePreview
Examples
>>> import datarobot as dr
>>> recipe = dr.Recipe.get('690bbf77aa31530d8287ae5f')
>>> preview = recipe.get_preview()
>>> preview
RecipePreview(
    columns=['feature_1', 'feature_2', 'feature_3'],
    count=4,
    data=[['5', 'true', 'James'], ['-7', 'false', 'Bryan'], ['2', 'false', 'Jamie'], ['4', 'true', 'Lyra']],
    total_count=4,
    byte_size=46,
    result_schema=[
        {'data_type': 'INT_TYPE', 'name': 'feature_1'},
        {'data_type': 'BOOLEAN_TYPE', 'name': 'feature_2'},
        {'data_type': 'STRING_TYPE', 'name': 'feature_3'}
    ],
    stored_count=4,
    estimated_size_exceeds_limit=False,
)
>>> preview.df
   feature_1 feature_2 feature_3
0          5      true     James
1         -7     false     Bryan
2          2     false     Jamie
3          4      true      Lyra
- classmethod update_downsampling(recipe_id, downsampling)
Set the downsampling operation for the recipe. Downsampling is applied during publishing. Consider using update() instead to update a Recipe instance.
- Parameters:
  - recipe_id (str) – Recipe ID.
  - downsampling (Optional[DownsamplingOperation]) – Downsampling operation to be applied during publishing. If None, no downsampling will be applied.
- Returns:
  recipe – Recipe with updated downsampling.
- Return type:
  Recipe
Examples
>>> import datarobot as dr
>>> from datarobot.models.recipe_operation import RandomDownsamplingOperation
>>> recipe = dr.Recipe.get('690bbf77aa31530d8287ae5f')
>>> recipe = dr.Recipe.update_downsampling(
...     recipe_id=recipe.id,
...     downsampling=RandomDownsamplingOperation(max_rows=1000)
... )
- retrieve_preview(max_wait=600, number_of_operations_to_use=None)
Retrieve preview of sample data. Compute preview if absent.
Deprecated since version 3.10: This method is deprecated and will be removed in 3.12. Use Recipe.get_preview instead.
- Parameters:
  - max_wait (int) – Maximum number of seconds to wait when retrieving the preview.
  - number_of_operations_to_use (Optional[int]) – Number of operations to use when computing the preview. If provided, the first N operations will be used. If not provided, all operations will be used.
- Returns:
  preview – Preview data computed.
- Return type:
  Dict[str, Any]
- retrieve_insights(max_wait=600, number_of_operations_to_use=None)
Retrieve insights for the recipe sample data. Requires a preview of sample data to be computed first with .get_preview(). Computing the preview automatically starts the insights job in the background if it is not already running. This method blocks until insights are ready or max_wait is exceeded.
- Parameters:
  - max_wait (int) – Maximum number of seconds to wait when retrieving the insights.
  - number_of_operations_to_use (Optional[int]) – Number of operations to use when computing insights. A preview must be computed first for the same number of operations. If provided, the first N operations will be used. If not provided, all operations will be used.
- Returns:
  insights – The insights for the recipe sample data.
- Return type:
  Dict[str, Any]
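Examples
A minimal sketch of retrieving insights for the recipe's sample data: the preview is computed first, which also starts the insights job in the background. The structure of the returned dictionary is not shown here and may vary:
>>> import datarobot as dr
>>> recipe = dr.Recipe.get('690bbf77aa31530d8287ae5f')
>>> _ = recipe.get_preview()  # ensures a preview exists and kicks off the insights job
>>> insights = recipe.retrieve_insights(max_wait=600)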
- classmethod set_inputs(recipe_id, inputs)
Set the inputs for the recipe. Inputs can be datasets or JDBC table data sources. Consider using update() instead to update a Recipe instance.
- Parameters:
  - recipe_id (str) – Recipe ID.
  - inputs (List[JDBCTableDataSourceInput | RecipeDatasetInput]) – List of inputs to use in the recipe.
- Returns:
  recipe – Recipe with updated inputs.
- Return type:
  Recipe
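Examples
A hedged sketch of pointing the recipe at a different dataset input. The RecipeDatasetInput constructor arguments shown here (input_type, dataset_id) and the RecipeInputType enum value are assumptions; consult the RecipeDatasetInput reference for the exact signature. The dataset ID is hypothetical:
>>> import datarobot as dr
>>> from datarobot.models.recipe import RecipeDatasetInput
>>> from datarobot.enums import RecipeInputType
>>> recipe = dr.Recipe.get('690bbf77aa31530d8287ae5f')
>>> new_input = RecipeDatasetInput(
...     input_type=RecipeInputType.DATASET,      # assumed enum value
...     dataset_id='690bbf77aa31530d8287ae60'    # hypothetical dataset ID
... )
>>> recipe = dr.Recipe.set_inputs(recipe.id, inputs=[new_input])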
- classmethod set_operations(recipe_id, operations)
Set the list of operations to use in the recipe. Operations are applied in order on the input(s). Consider using update() instead to update a Recipe instance.
- Parameters:
recipe_id (
str) – Recipe ID.operations (
List[WranglingOperation]) – List of operations to set in the recipe.- Returns:
recipe – Recipe with updated list of operations.
- Return type:
Examples
>>> import datarobot as dr
>>> from datarobot.models.recipe_operation import FilterOperation, FilterCondition
>>> from datarobot.enums import FilterOperationFunctions
>>> recipe = dr.Recipe.get("690bbf77aa31530d8287ae5f")
>>> new_operations = [
...     FilterOperation(
...         conditions=[
...             FilterCondition(
...                 column="column_A",
...                 function=FilterOperationFunctions.GREATER_THAN,
...                 function_arguments=[100]
...             )
...         ]
...     )
... ]
>>> recipe = dr.Recipe.set_operations(recipe.id, operations=new_operations)
- classmethod set_recipe_metadata(recipe_id, metadata)
Update metadata for the recipe.
- Parameters:
  - recipe_id (str) – Recipe ID.
  - metadata (Dict[str, str]) – Dictionary of metadata to be updated.
- Returns:
  recipe – New recipe with updated metadata.
- Return type:
  Recipe
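Examples
A minimal sketch of updating a recipe's name and description. The metadata keys shown here are assumed to mirror the RecipeMetadata fields documented below:
>>> import datarobot as dr
>>> recipe = dr.Recipe.get('690bbf77aa31530d8287ae5f')
>>> recipe = dr.Recipe.set_recipe_metadata(
...     recipe_id=recipe.id,
...     metadata={'name': 'Renamed Recipe', 'description': 'Filters and aggregates sales data.'}
... )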
- classmethod list(search=None, dialect=None, status=None, recipe_type=None, order_by=None, created_by_user_id=None, created_by_username=None)
List recipes. Apply filters to narrow down results.
- Parameters:
  - search (Optional[str]) – Recipe name to filter by.
  - dialect (Optional[DataWranglingDialect]) – Recipe dialect to filter by.
  - status (Optional[str]) – Recipe status to filter by, e.g., draft, published.
  - recipe_type (Optional[RecipeType]) – Recipe type to filter by.
  - order_by (Optional[str]) – Field to order results by. For reverse ordering, prefix with '-', e.g., -recipe_id.
  - created_by_user_id (Optional[str]) – User ID to filter by. Returns recipes created by the user associated with this ID.
  - created_by_username (Optional[str]) – Username to filter by. Returns recipes created by the user associated with this username.
- Returns:
  recipes – List of recipes matching the filter criteria.
- Return type:
  List[Recipe]
Examples
>>> import datarobot as dr
>>> recipes = dr.Recipe.list()
>>> recipes
[Recipe(
    dialect='spark',
    id='690bbf77aa31530d8287ae5f',
    name='Sample Recipe',
    status='draft',
    recipe_type='SQL',
    inputs=[...],
    operations=[...],
    downsampling=...,
    settings=...,
), ...]
- classmethod get(recipe_id)
Retrieve a recipe by ID.
- Parameters:
  - recipe_id (str) – The ID of the recipe to retrieve.
- Returns:
  recipe – The recipe with the specified ID.
- Return type:
  Recipe
Examples
>>> import datarobot as dr
>>> recipe = dr.Recipe.get("690bbf77aa31530d8287ae5f")
>>> recipe
Recipe(
    dialect='spark',
    id='690bbf77aa31530d8287ae5f',
    name='Sample Recipe',
    status='draft',
    recipe_type='SQL',
    inputs=[...],
    operations=[...],
    downsampling=...,
    settings=...,
)
- get_sql(operations=None)
Generate SQL for the recipe, taking into account its operations and inputs. This does not modify the recipe.
- Parameters:
  - operations (Optional[List[WranglingOperation]]) – If provided, generate SQL for the given list of operations instead of the recipe's operations, using the recipe's inputs as the base.
    Deprecated since version 3.10: operations is deprecated and will be removed in 3.12. Use the generate_sql_for_operations class method instead.
- Returns:
  sql – Generated SQL string.
- Return type:
  str
Examples
>>> import datarobot as dr
>>> from datarobot.models.recipe_operation import FilterOperation, FilterCondition
>>> from datarobot.enums import FilterOperationFunctions
>>> recipe = dr.Recipe.get("690bbf77aa31530d8287ae5f")
>>> recipe.update(operations=[
...     FilterOperation(
...         conditions=[
...             FilterCondition(
...                 column="column_A",
...                 function=FilterOperationFunctions.GREATER_THAN,
...                 function_arguments=[100]
...             )
...         ]
...     )
... ])
>>> recipe.get_sql()
"SELECT `sample_dataset`.`column_A` FROM `sample_dataset` WHERE `sample_dataset`.`column_A` > 100"
- classmethod generate_sql_for_operations(recipe_id, operations)
Generate SQL for an arbitrary list of operations, using an existing recipe as a base. This does not modify the recipe. If you want to generate SQL for a recipe’s operations, use get_sql() instead.
- Parameters:
  - recipe_id (str) – The ID of the recipe to use as a base. The SQL generation will use the recipe's inputs and dialect.
  - operations (List[WranglingOperation]) – The list of operations to generate SQL for.
- Returns:
  sql – Generated SQL string.
- Return type:
  str
Examples
>>> import datarobot as dr
>>> from datarobot.models.recipe_operation import FilterOperation, FilterCondition
>>> from datarobot.enums import FilterOperationFunctions
>>> dr.Recipe.generate_sql_for_operations(
...     recipe_id="690bbf77aa31530d8287ae5f",
...     operations=[
...         FilterOperation(
...             conditions=[
...                 FilterCondition(
...                     column="column_A",
...                     function=FilterOperationFunctions.LESS_THAN,
...                     function_arguments=[20]
...                 )
...             ]
...         )
...     ]
... )
"SELECT `sample_dataset`.`column_A` FROM `sample_dataset` WHERE `sample_dataset`.`column_A` < 20"
- classmethod from_data_store(use_case, data_store, data_source_type, dialect, data_source_inputs, recipe_type=RecipeType.WRANGLING)
Create a wrangling recipe from a data store.
- Return type:
  Recipe
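Examples
A hedged sketch of creating a wrangling recipe from a JDBC data store. The DataSourceInput arguments (canonical_name, table, schema), the DataWranglingDataSourceTypes enum, and the specific dialect value are assumptions; consult the Recipe Inputs and enums references for exact signatures. All IDs are hypothetical:
>>> import datarobot as dr
>>> from datarobot.models.recipe import DataSourceInput
>>> from datarobot.enums import DataWranglingDataSourceTypes, DataWranglingDialect
>>> use_case = dr.UseCase.get('690bbf77aa31530d8287ae61')     # hypothetical use case ID
>>> data_store = dr.DataStore.get('690bbf77aa31530d8287ae62')  # hypothetical data store ID
>>> recipe = dr.Recipe.from_data_store(
...     use_case=use_case,
...     data_store=data_store,
...     data_source_type=DataWranglingDataSourceTypes.JDBC,  # assumed enum value
...     dialect=DataWranglingDialect.SNOWFLAKE,              # assumed enum value
...     data_source_inputs=[
...         DataSourceInput(canonical_name='sales_input', table='SALES', schema='PUBLIC')  # assumed arguments
...     ],
... )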
- class datarobot.models.recipe.RecipeSettings
Recipe settings, for example, settings to apply at the downsampling stage.
- class datarobot.models.recipe.RecipeMetadata
The recipe metadata.
- Variables:
  - name (Optional[str]) – The name of the recipe.
  - description (Optional[str]) – The description of the recipe.
  - recipe_type (Optional[RecipeType]) – The type of the recipe.
  - sql (Optional[str]) – The SQL query of the transformation that the recipe performs.
- class datarobot.models.recipe.RecipePreview
A preview of data output from the application of a recipe.
- Variables:
  - columns (List[str]) – List of column names in the preview.
  - count (int) – Number of rows in the preview.
  - data (List[List[Any]]) – The preview data as a list of rows, where each row is a list of values.
  - total_count (int) – Total number of rows in the dataset.
  - byte_size (int) – Data memory usage in bytes.
  - result_schema (List[Dict[str, Any]]) – JDBC result schema for the preview data.
  - stored_count (int) – Number of rows available for preview.
  - estimated_size_exceeds_limit (bool) – Whether downsampling should be done based on sample size.
  - next (Optional[RecipePreview]) – The next set of preview data, if available, otherwise None.
  - previous (Optional[RecipePreview]) – The previous set of preview data, if available, otherwise None.
  - df (pandas.DataFrame) – The preview data as a pandas DataFrame.
Recipe Inputs
- class datarobot.models.recipe.RecipeDatasetInput
Object describing a dataset input for recipe transformations.
- class datarobot.models.recipe.DatasetInput
- class datarobot.models.recipe.DataSourceInput
Inputs required to create a new recipe from a data store.
- class datarobot.models.recipe.JDBCTableDataSourceInput
Object describing a JDBC table data source input for recipe transformations.
Recipe Operations
- class datarobot.models.recipe_operation.BaseOperation
A single base transformation unit in a Data Wrangler recipe.
Sampling Operations
Downsampling Operations
Downsampling reduces the size of the dataset published for faster experimentation.
- class datarobot.models.recipe_operation.DownsamplingOperation
Base class for downsampling operations.
- class datarobot.models.recipe_operation.RandomDownsamplingOperation
A downsampling technique that reduces the size of the majority class using random sampling (i.e., each sample has an equal probability of being chosen).
- Parameters:
  - max_rows (int) – The maximum number of rows to downsample to.
  - seed (int) – The random seed to use for downsampling. Optional.
Examples
>>> from datarobot.models.recipe_operation import RandomDownsamplingOperation
>>> op = RandomDownsamplingOperation(max_rows=600)
- class datarobot.models.recipe_operation.SmartDownsamplingOperation
A downsampling technique that relies on the distribution of target values to adjust the sample size, and that records how much a specific class was sampled in a new column.
For this technique to work, ensure that target and weightsFeature are set in the recipe's settings.
- Parameters:
  - max_rows (int) – The maximum number of rows to downsample to.
  - method (SmartDownsamplingMethod) – The downsampling method to use.
  - seed (int) – The random seed to use for downsampling. Optional.
Examples
>>> from datarobot.models.recipe_operation import SmartDownsamplingOperation, SmartDownsamplingMethod
>>> op = SmartDownsamplingOperation(max_rows=1000, method=SmartDownsamplingMethod.BINARY)
Wrangling Operations
- class datarobot.models.recipe_operation.WranglingOperation
Base class for data wrangling operations.
- class datarobot.models.recipe_operation.LagsOperation
Data wrangling operation to create one or more lags for a feature based on a datetime ordering feature. This operation will create a new column for each lag order specified.
- Parameters:
  - column (str) – Column name to create lags for.
  - orders (List[int]) – List of lag orders to create.
  - datetime_partition_column (str) – Column name used to partition the data by datetime. Used to order the data for lag creation.
  - multiseries_id_column (Optional[str]) – Column name used to identify time series within the data. Required only for multiseries.
Examples
Create lags of orders 1, 5 and 30 in stock price data on opening price column “open_price”, ordered by datetime column “date”. The data contains multiple time series identified by “ticker_symbol”:
>>> import datarobot as dr
>>> from datarobot.models.recipe_operation import LagsOperation
>>> recipe = dr.Recipe.get('690bbf77aa31530d8287ae5f')
>>> lags_op = LagsOperation(
...     column="open_price",
...     orders=[1, 5, 30],
...     datetime_partition_column="date",
...     multiseries_id_column="ticker_symbol",
... )
>>> recipe.update(operations=[lags_op])
- class datarobot.models.recipe_operation.WindowCategoricalStatsOperation
Data wrangling operation to calculate categorical statistics for a rolling window. This operation will create a new column for each method specified.
- Parameters:
  - column (str) – Column name to create rolling statistics for.
  - window_size (int) – Number of rows to include in the rolling window.
  - methods (List[CategoricalStatsMethods]) – List of methods to apply for rolling statistics. Currently only supports datarobot.enums.CategoricalStatsMethods.MOST_FREQUENT.
  - datetime_partition_column (str) – Column name used to partition the data by datetime. Used to order the time series data.
  - multiseries_id_column (Optional[str]) – Column name used to identify each time series within the data. Required only for multiseries.
  - rolling_most_frequent_udf (Optional[str]) – Fully qualified path to a rolling most frequent user-defined function. Used to optimize SQL execution with Snowflake.
Examples
Create rolling categorical statistics to track the most frequent product category purchased by customers based on their last 50 purchases:
>>> import datarobot as dr
>>> from datarobot.models.recipe_operation import WindowCategoricalStatsOperation
>>> from datarobot.enums import CategoricalStatsMethods
>>> recipe = dr.Recipe.get('690bbf77aa31530d8287ae5f')
>>> window_cat_stats_op = WindowCategoricalStatsOperation(
...     column="product_category",
...     window_size=50,
...     methods=[CategoricalStatsMethods.MOST_FREQUENT],
...     datetime_partition_column="purchase_date",
...     multiseries_id_column="customer_id",
... )
>>> recipe.update(operations=[window_cat_stats_op])
- class datarobot.models.recipe_operation.WindowNumericStatsOperation
Data wrangling operation to calculate numeric statistics for a rolling window. This operation will create one or more new columns.
- Parameters:
  - column (str) – Column name to create rolling statistics for.
  - window_size (int) – Number of rows to include in the rolling window.
  - methods (List[NumericStatsMethods]) – List of methods to apply for rolling statistics. A new column will be created for each method.
  - datetime_partition_column (str) – Column name used to partition the data by datetime. Used to order the time series data.
  - multiseries_id_column (Optional[str]) – Column name used to identify each time series within the data. Required only for multiseries.
  - rolling_median_udf (Optional[str]) – Fully qualified path to a rolling median user-defined function. Used to optimize SQL execution with Snowflake.
Examples
Create rolling numeric statistics to track the maximum, minimum, and median stock prices over the last 7 trading sessions:
>>> import datarobot as dr
>>> from datarobot.models.recipe_operation import WindowNumericStatsOperation
>>> from datarobot.enums import NumericStatsMethods
>>> recipe = dr.Recipe.get('690bbf77aa31530d8287ae5f')
>>> window_num_stats_op = WindowNumericStatsOperation(
...     column="stock_price",
...     window_size=7,
...     methods=[
...         NumericStatsMethods.MAX,
...         NumericStatsMethods.MIN,
...         NumericStatsMethods.MEDIAN,
...     ],
...     datetime_partition_column="trading_date",
...     multiseries_id_column="ticker_symbol",
... )
>>> recipe.update(operations=[window_num_stats_op])
- class datarobot.models.recipe_operation.TimeSeriesOperation
Data wrangling operation to generate a dataset ready for time series modeling: with forecast point, forecast distances, known in advance columns, etc.
- Parameters:
  - target_column (str) – Target column to use for generating naive baseline features during feature reduction.
  - datetime_partition_column (str) – Column name used to partition the data by datetime. Used to order the time series data.
  - forecast_distances (List[int]) – List of forecast distances to generate features for. Each distance represents a relative position that determines how many rows ahead to predict.
  - task_plan (List[TaskPlanElement]) – List of task plan elements, one per column.
  - baseline_periods (Optional[List[int]]) – List of integers representing the periodicities used to generate naive baseline features from the target. A baseline period of 1 corresponds to the naive latest baseline.
  - known_in_advance_columns (Optional[List[str]]) – List of columns that are known in advance at prediction time, i.e. features that do not need to be lagged.
  - multiseries_id_column (Optional[str]) – Column name used to identify each time series within the data. Required only for multiseries.
  - rolling_median_udf (Optional[str]) – Fully qualified path to a rolling median user-defined function. Used to optimize SQL execution with Snowflake.
  - rolling_most_frequent_udf (Optional[str]) – Fully qualified path to a rolling most frequent user-defined function.
  - forecast_point (Optional[datetime]) – The forecast point to use at prediction time.
Examples
Create a time series operation for sales forecasting with forecast distances of 7 and 30 days, using the sale amount as the target column, the date of the sale for datetime ordering, and “store_id” as the multiseries identifier. The operation includes a task plan to compute lags of orders 1, 7, and 30 on the sales amount, and specifies known in advance columns “promotion” and “holiday_flag”:
>>> import datarobot as dr
>>> from datarobot.models.recipe_operation import TimeSeriesOperation, TaskPlanElement, Lags
>>> recipe = dr.Recipe.get('690bbf77aa31530d8287ae5f')
>>> task_plan = [
...     TaskPlanElement(
...         column="sales_amount",
...         task_list=[Lags(orders=[1, 7, 30])]
...     )
... ]
>>> time_series_op = TimeSeriesOperation(
...     target_column="sales_amount",
...     datetime_partition_column="sale_date",
...     forecast_distances=[7, 30],
...     task_plan=task_plan,
...     known_in_advance_columns=["promotion", "holiday_flag"],
...     multiseries_id_column="store_id"
... )
>>> recipe.update(operations=[time_series_op])
- class datarobot.models.recipe_operation.ComputeNewOperation
Data wrangling operation to create a new feature computed using a SQL expression.
- Parameters:
  - expression (str) – SQL expression to compute the new feature.
  - new_feature_name (str) – Name of the new feature.
Examples
Create a new feature “total_sales” by summing the total of “online_sales” and “in_store_sales”, rounded to the nearest dollar:
>>> import datarobot as dr
>>> from datarobot.models.recipe_operation import ComputeNewOperation
>>> recipe = dr.Recipe.get('690bbf77aa31530d8287ae5f')
>>> compute_new_op = ComputeNewOperation(
...     expression="ROUND(online_sales + in_store_sales, 0)",
...     new_feature_name="total_sales"
... )
>>> recipe.update(operations=[compute_new_op])
- class datarobot.models.recipe_operation.RenameColumnsOperation
Data wrangling operation to rename one or more columns.
- Parameters:
  - column_mappings (Dict[str, str]) – Mapping of original column names to new column names.
Examples
Rename columns “old_name1” to “new_name1” and “old_name2” to “new_name2”:
>>> import datarobot as dr
>>> from datarobot.models.recipe_operation import RenameColumnsOperation
>>> recipe = dr.Recipe.get('690bbf77aa31530d8287ae5f')
>>> rename_op = RenameColumnsOperation(
...     column_mappings={'old_name1': 'new_name1', 'old_name2': 'new_name2'}
... )
>>> recipe.update(operations=[rename_op])
- class datarobot.models.recipe_operation.FilterOperation
Data wrangling operation to filter rows based on one or more conditions.
- Parameters:
  - conditions (List[FilterCondition]) – List of conditions to filter on.
  - keep_rows (Optional[bool]) – Whether matching rows should be kept or dropped.
  - operator (Optional[str]) – Operator to use between conditions when using multiple conditions. Allowed values: [and, or].
Examples
Filter input to only keep users older than 18:
>>> import datarobot as dr
>>> from datarobot.models.recipe_operation import FilterOperation, FilterCondition
>>> from datarobot.enums import FilterOperationFunctions
>>> recipe = dr.Recipe.get('690bbf77aa31530d8287ae5f')
>>> condition = FilterCondition(
...     column="age",
...     function=FilterOperationFunctions.GREATER_THAN,
...     function_arguments=[18]
... )
>>> filter_op = FilterOperation(conditions=[condition], keep_rows=True)
>>> recipe.update(operations=[filter_op])
Filter the input to drop rows where "status" is either "inactive" or "banned":
>>> import datarobot as dr
>>> from datarobot.models.recipe_operation import FilterOperation, FilterCondition
>>> from datarobot.enums import FilterOperationFunctions
>>> recipe = dr.Recipe.get('690bbf77aa31530d8287ae5f')
>>> inactive_cond = FilterCondition(
...     column="status",
...     function=FilterOperationFunctions.EQUALS,
...     function_arguments=["inactive"]
... )
>>> banned_cond = FilterCondition(
...     column="status",
...     function=FilterOperationFunctions.EQUALS,
...     function_arguments=["banned"]
... )
>>> filter_op = FilterOperation(
...     conditions=[inactive_cond, banned_cond],
...     keep_rows=False,
...     operator="or"
... )
>>> recipe.update(operations=[filter_op])
- class datarobot.models.recipe_operation.DropColumnsOperation
Data wrangling operation to drop one or more columns.
- Parameters:
  - columns (List[str]) – Columns to drop.
Examples
>>> import datarobot as dr
>>> from datarobot.models.recipe_operation import DropColumnsOperation
>>> recipe = dr.Recipe.get('690bbf77aa31530d8287ae5f')
>>> drop_op = DropColumnsOperation(columns=['col1', 'col2'])
>>> recipe.update(operations=[drop_op])
- class datarobot.models.recipe_operation.DedupeRowsOperation
Data wrangling operation to remove duplicate rows. Uses values from all columns.
Examples
>>> import datarobot as dr
>>> from datarobot.models.recipe_operation import DedupeRowsOperation
>>> recipe = dr.Recipe.get('690bbf77aa31530d8287ae5f')
>>> dedupe_op = DedupeRowsOperation()
>>> recipe.update(operations=[dedupe_op])
- class datarobot.models.recipe_operation.FindAndReplaceOperation
Data wrangling operation to find and replace strings in a column.
- Parameters:
  - column (str) – Column name to perform find and replace on.
  - find (str) – String or expression to find.
  - replace_with (str) – String to replace with.
  - match_mode (FindAndReplaceMatchMode) – Match mode to use when finding strings.
  - is_case_sensitive (bool) – Whether the find operation should be case sensitive.
Examples
Set Recipe operations to search for an exact match of "old_value" in column "col1" and replace it with "new_value":
>>> import datarobot as dr
>>> from datarobot.models.recipe_operation import FindAndReplaceOperation
>>> from datarobot.enums import FindAndReplaceMatchMode
>>> recipe = dr.Recipe.get('690bbf77aa31530d8287ae5f')
>>> find_replace_op = FindAndReplaceOperation(
...     column="col1",
...     find="old_value",
...     replace_with="new_value",
...     match_mode=FindAndReplaceMatchMode.EXACT,
...     is_case_sensitive=True
... )
>>> recipe.update(operations=[find_replace_op])
Set Recipe operations to use a regular expression to replace names starting with "Brand" in column "name" with "Lyra":
>>> import datarobot as dr
>>> from datarobot.models.recipe_operation import FindAndReplaceOperation
>>> from datarobot.enums import FindAndReplaceMatchMode
>>> recipe = dr.Recipe.get('690bbf77aa31530d8287ae5f')
>>> find_replace_op = FindAndReplaceOperation(
...     column="name",
...     find="^Brand.*",
...     replace_with="Lyra",
...     match_mode=FindAndReplaceMatchMode.REGEX
... )
>>> recipe.update(operations=[find_replace_op])
Enums and Helpers
- class datarobot.models.recipe_operation.TaskPlanElement
Represents a task plan element for a specific column in a time series operation.
- Parameters:
  - column (str) – Column name for which the task plan is defined.
  - task_list (List[BaseTimeAwareTask]) – List of time-aware tasks to be applied to the column.
- class datarobot.models.recipe_operation.BaseTimeAwareTask
Base class for time-aware tasks in time series operation task plan.
- class datarobot.models.recipe_operation.CategoricalStats
Time-aware task to compute categorical statistics for a rolling window.
- Parameters:
  - methods (List[CategoricalStatsMethods]) – List of categorical statistical methods to apply for rolling statistics.
  - window_size (int) – Number of rows to include in the rolling window.
- class datarobot.models.recipe_operation.NumericStats
Time-aware task to compute numeric statistics for a rolling window.
- Parameters:
  - methods (List[NumericStatsMethods]) – List of numeric statistical methods to apply for rolling statistics.
  - window_size (int) – Number of rows to include in the rolling window.
- class datarobot.models.recipe_operation.Lags
Time-aware task to create one or more lags for a feature.
- Parameters:
  - orders (List[int]) – List of lag orders to create.
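Examples
A minimal sketch, based on the documented parameters above, of combining the time-aware tasks into a task plan suitable for TimeSeriesOperation.task_plan. The column names are illustrative:
>>> from datarobot.models.recipe_operation import TaskPlanElement, Lags, NumericStats, CategoricalStats
>>> from datarobot.enums import NumericStatsMethods, CategoricalStatsMethods
>>> task_plan = [
...     TaskPlanElement(
...         column="sales_amount",
...         task_list=[
...             Lags(orders=[1, 7]),
...             NumericStats(methods=[NumericStatsMethods.MAX, NumericStatsMethods.MEDIAN], window_size=7),
...         ]
...     ),
...     TaskPlanElement(
...         column="product_category",
...         task_list=[CategoricalStats(methods=[CategoricalStatsMethods.MOST_FREQUENT], window_size=7)]
...     ),
... ]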
- class datarobot.enums.CategoricalStatsMethods
Supported categorical stats methods for data wrangling.
- class datarobot.enums.NumericStatsMethods
Supported numeric stats methods for data wrangling.
- class datarobot.models.recipe_operation.FilterCondition
Condition to filter rows in a FilterOperation.
- Parameters:
  - column (str) – Column name to apply the condition on.
  - function (FilterOperationFunctions) – The filtering function to use.
  - function_arguments (List[Union[str, int, float]]) – The list of arguments for the filtering function.
Examples
FilterCondition to filter rows where “age” is between 18 and 65:
>>> from datarobot.models.recipe_operation import FilterCondition
>>> from datarobot.enums import FilterOperationFunctions
>>> condition = FilterCondition(
...     column="age",
...     function=FilterOperationFunctions.BETWEEN,
...     function_arguments=[18, 65]
... )
- class datarobot.enums.FilterOperationFunctions
Operations supported in a FilterCondition.
- class datarobot.enums.RecipeType
- class datarobot.enums.DataWranglingDialect
- class datarobot.enums.FindAndReplaceMatchMode
Find and replace modes used when searching for strings to replace.