Data engine query generator
- class datarobot.DataEngineQueryGenerator
DataEngineQueryGenerator is used to set up time series data prep.
Added in version v2.27.
- Variables:
id (str) – id of the query generator
query (str) – text of the generated Spark SQL query
datasets (list(QueryGeneratorDataset)) – datasets associated with the query generator
generator_settings (QueryGeneratorSettings) – the settings used to define the query
generator_type (str) – “TimeSeries” is the only supported type
- classmethod create(generator_type, datasets, generator_settings)
Creates a query generator entity.
Added in version v2.27.
- Parameters:
generator_type (str) – Type of data engine query generator.
datasets (List[QueryGeneratorDataset]) – Source datasets in the Data Engine workspace.
generator_settings (dict) – Data engine generator settings of the given generator_type.
- Returns:
query_generator – The created generator
- Return type:
DataEngineQueryGenerator
Examples
import datarobot as dr
from datarobot.models.data_engine_query_generator import (
    QueryGeneratorDataset,
    QueryGeneratorSettings,
)
dataset = QueryGeneratorDataset(
    alias='My_Awesome_Dataset_csv',
    dataset_id='61093144cabd630828bca321',
    dataset_version_id=1,
)
settings = QueryGeneratorSettings(
    datetime_partition_column='date',
    time_unit='DAY',
    time_step=1,
    default_numeric_aggregation_method='sum',
    default_categorical_aggregation_method='mostFrequent',
)
g = dr.DataEngineQueryGenerator.create(
    generator_type='TimeSeries',
    datasets=[dataset],
    generator_settings=settings,
)
g.id
>>>'54e639a18bd88f08078ca831'
g.generator_type
>>>'TimeSeries'
- classmethod get(generator_id)
Gets information about a query generator.
- Parameters:
generator_id (str) – The identifier of the query generator you want to load.
- Returns:
query_generator – The queried generator
- Return type:
DataEngineQueryGenerator
Examples
import datarobot as dr
g = dr.DataEngineQueryGenerator.get(generator_id='54e639a18bd88f08078ca831')
g.id
>>>'54e639a18bd88f08078ca831'
g.generator_type
>>>'TimeSeries'
- create_dataset(dataset_id=None, dataset_version_id=None, max_wait=600)
A blocking call that creates a new Dataset from the query generator and returns once the dataset has been successfully processed. If the optional parameters are not specified, the query is applied to the dataset_id and dataset_version_id stored in the query generator. If they are specified, they override the stored dataset_id and dataset_version_id, e.g. to prep a prediction dataset.
- Parameters:
dataset_id (Optional[str]) – The id of the unprepped dataset to apply the query to.
dataset_version_id (Optional[str]) – The version_id of the unprepped dataset to apply the query to.
- Returns:
response – The Dataset created from the query generator
- Return type:
Dataset
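The two modes described above (stored dataset versus an overriding dataset) can be sketched as follows. This is a minimal illustration, not part of the datarobot package: the helper name prep_training_and_scoring is hypothetical, and it assumes generator is a DataEngineQueryGenerator obtained from create or get with a configured client.

```python
def prep_training_and_scoring(generator, scoring_dataset_id,
                              scoring_dataset_version_id=None, max_wait=600):
    """Hypothetical helper: prep the stored dataset, then a second dataset.

    `generator` is assumed to be a DataEngineQueryGenerator.
    """
    # No ids given: the query runs against the dataset_id and
    # dataset_version_id stored in the query generator.
    training = generator.create_dataset(max_wait=max_wait)

    # Ids given: they override the stored ones, e.g. to apply the same
    # time series data prep to a dataset intended for predictions.
    scoring = generator.create_dataset(
        dataset_id=scoring_dataset_id,
        dataset_version_id=scoring_dataset_version_id,
        max_wait=max_wait,
    )
    return training, scoring
```

Both calls block until processing finishes, so the helper may take up to roughly twice max_wait seconds in the worst case.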
- prepare_prediction_dataset_from_catalog(project_id, dataset_id, dataset_version_id=None, max_wait=600, relax_known_in_advance_features_check=None)
Apply time series data prep to a catalog dataset and upload it to the project as a PredictionDataset.
Added in version v3.1.
- Parameters:
project_id (str) – The id of the project to which you upload the prediction dataset.
dataset_id (str) – The identifier of the dataset.
dataset_version_id (Optional[str]) – The version id of the dataset to use.
max_wait (Optional[int]) – Optional, the maximum number of seconds to wait before giving up.
relax_known_in_advance_features_check (Optional[bool]) – For time series projects only. If True, missing values in the known in advance features are allowed in the forecast window at the prediction time. If omitted or False, missing values are not allowed.
- Returns:
dataset – The newly uploaded dataset.
- Return type:
PredictionDataset
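A call to this method might look like the sketch below. The wrapper name upload_catalog_dataset and its allow_missing_known_in_advance flag are hypothetical names chosen for illustration; only the keyword arguments passed to prepare_prediction_dataset_from_catalog come from the signature documented above, and generator is assumed to be an existing DataEngineQueryGenerator.

```python
def upload_catalog_dataset(generator, project_id, dataset_id,
                           dataset_version_id=None,
                           allow_missing_known_in_advance=False,
                           max_wait=600):
    """Hypothetical wrapper: prep a catalog dataset and upload it to a project.

    Returns whatever the generator returns, which the reference above
    documents as a PredictionDataset.
    """
    return generator.prepare_prediction_dataset_from_catalog(
        project_id=project_id,
        dataset_id=dataset_id,
        # None leaves the choice of dataset version to the method's default.
        dataset_version_id=dataset_version_id,
        max_wait=max_wait,
        # False matches the documented behaviour when the flag is omitted:
        # missing known in advance values are not allowed.
        relax_known_in_advance_features_check=allow_missing_known_in_advance,
    )
```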
- prepare_prediction_dataset(sourcedata, project_id, max_wait=600, relax_known_in_advance_features_check=None)
Apply time series data prep and upload the PredictionDataset to the project.
Added in version v3.1.
- Parameters:
sourcedata (str, file or pandas.DataFrame) – Data to be used for predictions. If it is a string, it can be either a path to a local file, or raw file content. If using a file on disk, the filename must consist of ASCII characters only.
project_id (str) – The id of the project to which you upload the prediction dataset.
max_wait (Optional[int]) – The maximum number of seconds to wait for the uploaded dataset to be processed before raising an error.
relax_known_in_advance_features_check (Optional[bool]) – For time series projects only. If True, missing values in the known in advance features are allowed in the forecast window at the prediction time. If omitted or False, missing values are not allowed.
- Returns:
dataset – The newly uploaded dataset.
- Return type:
PredictionDataset
- Raises:
InputNotUnderstoodError – Raised if sourcedata is not one of the supported types.
AsyncFailureError – Raised if polling for the status of an async process resulted in a response with an unsupported status code.
AsyncProcessUnsuccessfulError – Raised if the dataset upload was unsuccessful (i.e. the server reported an error in uploading the dataset).
AsyncTimeoutError – Raised if processing the uploaded dataset took more time than specified by the max_wait parameter.
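One way to handle the exceptions listed above is sketched here. The helper upload_or_explain and its (dataset, reason) return convention are hypothetical; the exception classes are assumed to be importable from datarobot.errors, and the fallback stand-in classes exist only so the sketch can be read and run without the package installed.

```python
try:
    from datarobot.errors import (
        AsyncFailureError,
        AsyncProcessUnsuccessfulError,
        AsyncTimeoutError,
        InputNotUnderstoodError,
    )
except ImportError:
    # Stand-ins for illustration only (assumption: package not installed).
    class AsyncFailureError(Exception): pass
    class AsyncProcessUnsuccessfulError(Exception): pass
    class AsyncTimeoutError(Exception): pass
    class InputNotUnderstoodError(Exception): pass


def upload_or_explain(generator, sourcedata, project_id, max_wait=600):
    """Hypothetical helper: return (dataset, None) or (None, reason)."""
    try:
        dataset = generator.prepare_prediction_dataset(
            sourcedata, project_id, max_wait=max_wait
        )
    except InputNotUnderstoodError:
        return None, "sourcedata is not one of the supported types"
    except AsyncTimeoutError:
        return None, f"processing did not finish within {max_wait} seconds"
    except (AsyncFailureError, AsyncProcessUnsuccessfulError) as exc:
        return None, f"server-side failure: {exc}"
    return dataset, None
```

Returning a reason string instead of re-raising is a design choice for this sketch; in a pipeline it may be preferable to let the exceptions propagate.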