Data Engine Query Generator

class datarobot.DataEngineQueryGenerator(**generator_kwargs)

DataEngineQueryGenerator is used to set up time series data prep.

Added in version v2.27.

Attributes:
id: str

id of the query generator

query: str

text of the generated Spark SQL query

datasets: list(QueryGeneratorDataset)

datasets associated with the query generator

generator_settings: QueryGeneratorSettings

the settings used to define the query

generator_type: str

“TimeSeries” is the only supported type

classmethod create(generator_type, datasets, generator_settings)

Creates a query generator entity.

Added in version v2.27.

Parameters:
generator_type: str

Type of data engine query generator

datasets: List[QueryGeneratorDataset]

Source datasets in the Data Engine workspace.

generator_settings: dict

Data engine generator settings of the given generator_type.

Returns:
query_generator: DataEngineQueryGenerator

The created generator

Examples

import datarobot as dr
from datarobot.models.data_engine_query_generator import (
   QueryGeneratorDataset,
   QueryGeneratorSettings,
)
dataset = QueryGeneratorDataset(
   alias='My_Awesome_Dataset_csv',
   dataset_id='61093144cabd630828bca321',
   dataset_version_id=1,
)
settings = QueryGeneratorSettings(
   datetime_partition_column='date',
   time_unit='DAY',
   time_step=1,
   default_numeric_aggregation_method='sum',
   default_categorical_aggregation_method='mostFrequent',
)
g = dr.DataEngineQueryGenerator.create(
   generator_type='TimeSeries',
   datasets=[dataset],
   generator_settings=settings,
)
g.id  # '54e639a18bd88f08078ca831'
g.generator_type  # 'TimeSeries'
classmethod get(generator_id)

Gets information about a query generator.

Parameters:
generator_id: str

The identifier of the query generator you want to load.

Returns:
query_generator: DataEngineQueryGenerator

The queried generator

Examples

import datarobot as dr
g = dr.DataEngineQueryGenerator.get(generator_id='54e639a18bd88f08078ca831')
g.id  # '54e639a18bd88f08078ca831'
g.generator_type  # 'TimeSeries'
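
Once a generator is loaded, its documented attributes can be collected for logging or review. The following is a minimal sketch that reads only the attributes listed for the class (id, generator_type, query, datasets). The helper name summarize_generator is hypothetical and not part of the client.

```python
def summarize_generator(generator):
    """Collect the documented attributes of a query generator into a dict.

    `summarize_generator` is a hypothetical helper, not part of the client.
    It reads only the attributes listed in the class reference.
    """
    return {
        "id": generator.id,
        "generator_type": generator.generator_type,
        "dataset_count": len(generator.datasets),
        "query": generator.query,  # text of the generated Spark SQL query
    }
```

Passing the object returned by DataEngineQueryGenerator.get to summarize_generator yields a plain dict whose "query" entry holds the generated Spark SQL text.
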
create_dataset(dataset_id=None, dataset_version_id=None, max_wait=600)

A blocking call that creates a new Dataset from the query generator and returns once the dataset has been successfully processed. If dataset_id and dataset_version_id are not specified, the query is applied to the dataset_id and dataset_version_id stored in the query generator. If specified, they override the stored values, e.g. to prep a prediction dataset.

Parameters:
dataset_id: str, optional

The id of the unprepped dataset to apply the query to

dataset_version_id: str, optional

The version_id of the unprepped dataset to apply the query to

max_wait: int, optional

The maximum number of seconds to wait for the dataset to be processed before giving up.

Returns:
response: Dataset

The Dataset created from the query generator
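
The choice between the stored dataset and an override can be sketched as a small wrapper. This is an illustration only: run_data_prep is a hypothetical helper, and it assumes generator is a DataEngineQueryGenerator (or any object exposing create_dataset with the signature shown above).

```python
def run_data_prep(generator, dataset_id=None, dataset_version_id=None, max_wait=600):
    """Apply the generator's query, overriding the stored dataset only if asked.

    `run_data_prep` is a hypothetical helper, not part of the client.
    """
    overrides = {}
    if dataset_id is not None:
        overrides["dataset_id"] = dataset_id
    if dataset_version_id is not None:
        overrides["dataset_version_id"] = dataset_version_id
    # With no overrides, create_dataset falls back to the ids stored in the generator.
    return generator.create_dataset(max_wait=max_wait, **overrides)
```

Calling run_data_prep(g) preps the dataset the generator was created with, while run_data_prep(g, dataset_id=...) applies the same query to a different dataset, such as one holding prediction rows.
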

prepare_prediction_dataset_from_catalog(project_id, dataset_id, dataset_version_id=None, max_wait=600, relax_known_in_advance_features_check=None)

Apply time series data prep to a catalog dataset and upload it to the project as a PredictionDataset.

Added in version v3.1.

Parameters:
project_id: str

The id of the project to which you upload the prediction dataset.

dataset_id: str

The identifier of the dataset.

dataset_version_id: str, optional

The version id of the dataset to use.

max_wait: int, optional

The maximum number of seconds to wait before giving up.

relax_known_in_advance_features_check: bool, optional

For time series projects only. If True, missing values in the known in advance features are allowed in the forecast window at the prediction time. If omitted or False, missing values are not allowed.

Returns:
dataset: PredictionDataset

The newly uploaded dataset.
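
When several catalog datasets need the same data prep, the method can be called in a loop. A minimal sketch under stated assumptions: prep_catalog_datasets is a hypothetical helper, generator is a DataEngineQueryGenerator, and the calls run one after another, each waiting up to max_wait seconds.

```python
def prep_catalog_datasets(generator, project_id, dataset_ids, max_wait=600):
    """Prep and upload several catalog datasets, keyed by catalog dataset id.

    `prep_catalog_datasets` is a hypothetical helper, not part of the client.
    """
    uploaded = {}
    for dataset_id in dataset_ids:
        # dataset_version_id is not passed, so the method's default (None) applies.
        uploaded[dataset_id] = generator.prepare_prediction_dataset_from_catalog(
            project_id=project_id,
            dataset_id=dataset_id,
            max_wait=max_wait,
        )
    return uploaded
```

The returned dict maps each catalog dataset id to the value returned by the method, which the reference above describes as the newly uploaded PredictionDataset.
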

prepare_prediction_dataset(sourcedata, project_id, max_wait=600, relax_known_in_advance_features_check=None)

Apply time series data prep and upload the PredictionDataset to the project.

Added in version v3.1.

Parameters:
sourcedata: str, file or pandas.DataFrame

Data to be used for predictions. If it is a string, it can be either a path to a local file, or raw file content. If using a file on disk, the filename must consist of ASCII characters only.

project_id: str

The id of the project to which you upload the prediction dataset.

max_wait: int, optional

The maximum number of seconds to wait for the uploaded dataset to be processed before raising an error.

relax_known_in_advance_features_check: bool, optional

For time series projects only. If True, missing values in the known in advance features are allowed in the forecast window at the prediction time. If omitted or False, missing values are not allowed.

Returns:
dataset: PredictionDataset

The newly uploaded dataset.

Raises:
InputNotUnderstoodError

Raised if sourcedata isn’t one of the supported types.

AsyncFailureError

Raised if polling for the status of an async process resulted in a response with an unsupported status code.

AsyncProcessUnsuccessfulError

Raised if processing of the uploaded dataset was unsuccessful (i.e. the server reported an error in uploading the dataset).

AsyncTimeoutError

Raised if processing the uploaded dataset took more time than specified by the max_wait parameter.
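
One way to handle AsyncTimeoutError is to retry with a longer max_wait. The sketch below is illustrative rather than a recommended pattern: prepare_with_retry is a hypothetical helper, the doubling policy is an arbitrary choice, and the import assumes the exception is exposed from datarobot.errors (a local stand-in is defined if the client is not installed).

```python
try:
    from datarobot.errors import AsyncTimeoutError
except ImportError:
    # Stand-in so the sketch can be exercised without the client installed.
    class AsyncTimeoutError(Exception):
        pass


def prepare_with_retry(generator, sourcedata, project_id, max_wait=600, attempts=2):
    """Call prepare_prediction_dataset, doubling max_wait after each timeout.

    `prepare_with_retry` is a hypothetical helper, not part of the client.
    """
    wait = max_wait
    for attempt in range(attempts):
        try:
            return generator.prepare_prediction_dataset(
                sourcedata, project_id, max_wait=wait
            )
        except AsyncTimeoutError:
            if attempt == attempts - 1:
                raise  # out of attempts: surface the timeout to the caller
            wait *= 2  # allow more processing time on the next attempt
```

Each attempt calls prepare_prediction_dataset again from the start, so a retry presumably repeats the upload. The other documented errors (InputNotUnderstoodError, AsyncFailureError, AsyncProcessUnsuccessfulError) are deliberately left to propagate, since a longer wait would not resolve them.
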