Data Engine Query Generator
- class datarobot.DataEngineQueryGenerator(**generator_kwargs)
DataEngineQueryGenerator is used to set up time series data prep.
Added in version v2.27.
- Attributes:
- id: str
id of the query generator
- query: str
text of the generated Spark SQL query
- datasets: list(QueryGeneratorDataset)
datasets associated with the query generator
- generator_settings: QueryGeneratorSettings
the settings used to define the query
- generator_type: str
“TimeSeries” is the only supported type
- classmethod create(generator_type, datasets, generator_settings)
Creates a query generator entity.
Added in version v2.27.
- Parameters:
- generator_type: str
Type of data engine query generator
- datasets: List[QueryGeneratorDataset]
Source datasets in the Data Engine workspace.
- generator_settings: dict
Data engine generator settings of the given generator_type.
- Returns:
- query_generator: DataEngineQueryGenerator
The created generator
Examples
import datarobot as dr
from datarobot.models.data_engine_query_generator import (
    QueryGeneratorDataset,
    QueryGeneratorSettings,
)
dataset = QueryGeneratorDataset(
    alias='My_Awesome_Dataset_csv',
    dataset_id='61093144cabd630828bca321',
    dataset_version_id=1,
)
settings = QueryGeneratorSettings(
    datetime_partition_column='date',
    time_unit='DAY',
    time_step=1,
    default_numeric_aggregation_method='sum',
    default_categorical_aggregation_method='mostFrequent',
)
g = dr.DataEngineQueryGenerator.create(
    generator_type='TimeSeries',
    datasets=[dataset],
    generator_settings=settings,
)
g.id
>>>'54e639a18bd88f08078ca831'
g.generator_type
>>>'TimeSeries'
- classmethod get(generator_id)
Gets information about a query generator.
- Parameters:
- generator_id: str
The identifier of the query generator you want to load.
- Returns:
- query_generator: DataEngineQueryGenerator
The queried generator
Examples
import datarobot as dr
g = dr.DataEngineQueryGenerator.get(generator_id='54e639a18bd88f08078ca831')
g.id
>>>'54e639a18bd88f08078ca831'
g.generator_type
>>>'TimeSeries'
- create_dataset(dataset_id=None, dataset_version_id=None, max_wait=600)
A blocking call that creates a new Dataset from the query generator and returns once the dataset has been successfully processed. If the optional parameters are omitted, the query is applied to the dataset_id and dataset_version_id stored in the query generator. If specified, they override the stored dataset_id and dataset_version_id, e.g. to prep a prediction dataset.
- Parameters:
- dataset_id: str, optional
The id of the unprepped dataset to apply the query to
- dataset_version_id: str, optional
The version_id of the unprepped dataset to apply the query to
- max_wait: int, optional
The maximum number of seconds to wait for the dataset to be processed.
- Returns:
- response: Dataset
The Dataset created from the query generator
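The override behaviour of create_dataset can be sketched as follows. This is a minimal illustration, not an official recipe: the helper name prep_training_and_prediction and its parameter names are hypothetical, and only create_dataset with its documented arguments comes from this reference. The helper takes an existing DataEngineQueryGenerator so that it can be read without a live connection.

```python
def prep_training_and_prediction(
    generator,
    prediction_dataset_id,
    prediction_dataset_version_id=None,
    max_wait=600,
):
    """Run the generator's query on its stored dataset, then on another one.

    `generator` is assumed to be a DataEngineQueryGenerator. This helper is
    hypothetical and only wraps the documented create_dataset call.
    """
    # No overrides: the query is applied to the dataset_id and
    # dataset_version_id stored in the query generator.
    training = generator.create_dataset(max_wait=max_wait)

    # Overrides: the same query is applied to a different, unprepped
    # dataset, for example one intended for predictions.
    prediction = generator.create_dataset(
        dataset_id=prediction_dataset_id,
        dataset_version_id=prediction_dataset_version_id,
        max_wait=max_wait,
    )
    return training, prediction


# Usage against a live DataRobot instance (requires a configured client):
# import datarobot as dr
# g = dr.DataEngineQueryGenerator.get(generator_id='54e639a18bd88f08078ca831')
# training, prediction = prep_training_and_prediction(g, '61093144cabd630828bca321')
```

Both calls block until processing finishes, so the two datasets are prepped one after the other.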
- prepare_prediction_dataset_from_catalog(project_id, dataset_id, dataset_version_id=None, max_wait=600, relax_known_in_advance_features_check=None)
Apply time series data prep to a catalog dataset and upload it to the project as a PredictionDataset.
Added in version v3.1.
- Parameters:
- project_id: str
The id of the project to which you upload the prediction dataset.
- dataset_id: str
The identifier of the dataset.
- dataset_version_id: str, optional
The version id of the dataset to use.
- max_wait: int, optional
The maximum number of seconds to wait before giving up.
- relax_known_in_advance_features_check: bool, optional
For time series projects only. If True, missing values in the known in advance features are allowed in the forecast window at the prediction time. If omitted or False, missing values are not allowed.
- Returns:
- dataset: PredictionDataset
The newly uploaded dataset.
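A call to prepare_prediction_dataset_from_catalog might look like the sketch below. The wrapper upload_catalog_predictions and the flag name allow_missing_known_in_advance are hypothetical and exist only to show how the documented keyword arguments fit together. The generator is passed in rather than fetched, so the snippet makes no assumptions about client configuration.

```python
def upload_catalog_predictions(
    generator,
    project_id,
    dataset_id,
    dataset_version_id=None,
    allow_missing_known_in_advance=False,
    max_wait=600,
):
    """Prep a catalog dataset with `generator` and upload it to a project.

    `generator` is assumed to be a DataEngineQueryGenerator. The helper and
    the `allow_missing_known_in_advance` name are illustrative only.
    """
    return generator.prepare_prediction_dataset_from_catalog(
        project_id=project_id,
        dataset_id=dataset_id,
        # None leaves the choice of dataset version to the method's default.
        dataset_version_id=dataset_version_id,
        max_wait=max_wait,
        # True allows missing known-in-advance values in the forecast window.
        relax_known_in_advance_features_check=allow_missing_known_in_advance,
    )
```

The return value is whatever the method returns, which this reference documents as the newly uploaded PredictionDataset.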
- prepare_prediction_dataset(sourcedata, project_id, max_wait=600, relax_known_in_advance_features_check=None)
Apply time series data prep and upload the PredictionDataset to the project.
Added in version v3.1.
- Parameters:
- sourcedata: str, file or pandas.DataFrame
Data to be used for predictions. If it is a string, it can be either a path to a local file, or raw file content. If using a file on disk, the filename must consist of ASCII characters only.
- project_id: str
The id of the project to which you upload the prediction dataset.
- max_wait: int, optional
The maximum number of seconds to wait for the uploaded dataset to be processed before raising an error.
- relax_known_in_advance_features_check: bool, optional
For time series projects only. If True, missing values in the known in advance features are allowed in the forecast window at the prediction time. If omitted or False, missing values are not allowed.
- Returns:
- dataset: PredictionDataset
The newly uploaded dataset.
- Raises:
- InputNotUnderstoodError
Raised if sourcedata isn’t one of the supported types.
- AsyncFailureError
Raised if polling for the status of an async process resulted in a response with an unsupported status code.
- AsyncProcessUnsuccessfulError
Raised if project creation was unsuccessful (i.e. the server reported an error in uploading the dataset).
- AsyncTimeoutError
Raised if processing the uploaded dataset took more time than specified by the max_wait parameter.
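Because a file on disk must have an ASCII-only filename, it can be worth checking the name locally before calling prepare_prediction_dataset. The sketch below is one way to do that. The helpers has_ascii_filename and upload_local_file are hypothetical, and the check covers only the final path component, which is an assumption about what "filename" means here.

```python
import os
from pathlib import Path


def has_ascii_filename(path):
    """Return True if the final path component contains only ASCII characters."""
    return Path(path).name.isascii()


def upload_local_file(generator, path, project_id, max_wait=600):
    """Validate the filename, then prep and upload a local file.

    `generator` is assumed to be a DataEngineQueryGenerator; this helper is
    illustrative and not part of the datarobot package.
    """
    if not has_ascii_filename(path):
        raise ValueError(f"Filename must be ASCII only: {path!r}")
    # A string path to a local file is one of the documented sourcedata forms.
    return generator.prepare_prediction_dataset(
        os.fspath(path), project_id, max_wait=max_wait
    )
```

Failing early with a ValueError keeps the naming problem separate from the errors listed above, which can then be handled around the upload call itself.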