Batch Predictions
- class datarobot.models.BatchPredictionJob(data, completed_resource_url=None)
A Batch Prediction Job is used to score large data sets on prediction servers using the Batch Prediction API.
- Attributes:
- id : str
The ID of the job.
- classmethod score(deployment, intake_settings=None, output_settings=None, csv_settings=None, timeseries_settings=None, num_concurrent=None, chunk_size=None, passthrough_columns=None, passthrough_columns_set=None, max_explanations=None, max_ngram_explanations=None, explanation_algorithm=None, threshold_high=None, threshold_low=None, prediction_threshold=None, prediction_warning_enabled=None, include_prediction_status=False, skip_drift_tracking=False, prediction_instance=None, abort_on_error=True, column_names_remapping=None, include_probabilities=True, include_probabilities_classes=None, download_timeout=120, download_read_timeout=660, upload_read_timeout=600, explanations_mode=None)
Create a new batch prediction job, upload the scoring dataset, and return a batch prediction job.
The default intake and output options are both localFile, which requires the caller to pass the file parameter and then either download the results using the download() method or pass a path to a file where the scored data will be downloaded.
- Returns:
- BatchPredictionJob
Instance of BatchPredictionJob
- Parameters:
- deployment : Deployment or string ID
Deployment which will be used for scoring.
- intake_settings : dict (optional)
A dict configuring where the data comes from. Supported options:
type : string, either localFile, s3, azure, gcp, dataset, jdbc, snowflake, synapse or bigquery
Note that to pass a dataset, you not only need to specify the type parameter as dataset, but you must also set the dataset parameter as a dr.Dataset object.
To score from a local file, add this parameter to the settings:
file : file-like object, string path to file or a pandas.DataFrame of scoring data
To score from S3, add the following parameters to the settings:
url : string, the URL to score (e.g.: s3://bucket/key)
credential_id : string (optional)
endpoint_url : string (optional), any non-default endpoint URL for S3 access (omit to use the default)
To score from JDBC, add the following parameters to the settings:
data_store_id : string, the ID of the external data store connected to the JDBC data source (see Database Connectivity).
query : string (optional if table, schema and/or catalog is specified), a self-supplied SELECT statement of the data set you wish to predict.
table : string (optional if query is specified), the name of the database table.
schema : string (optional if query is specified), the name of the database schema.
catalog : string (optional if query is specified), (new in v2.22) the name of the database catalog.
fetch_size : int (optional), changing the fetch size can be used to balance throughput and memory usage.
credential_id : string (optional) the ID of the credentials holding information about a user with read-access to the JDBC data source (see Credentials).
- output_settings : dict (optional)
A dict configuring how scored data is to be saved. Supported options:
type : string, either localFile, s3, azure, gcp, jdbc, snowflake, synapse or bigquery
To save scored data to a local file, add this parameter to the settings:
path : string (optional), path to save the scored data as CSV. If a path is not specified, you must download the scored data yourself with job.download(). If a path is specified, the call blocks until the job is done. If there are no other jobs currently processing for the targeted prediction instance, uploading, scoring, and downloading happen in parallel without waiting for the full job to complete. Otherwise, the call still blocks, but starts downloading the scored data as soon as the job starts generating it. This is the fastest method to get predictions.
To save scored data to S3, add the following parameters to the settings:
url : string, the URL for storing the results (e.g.: s3://bucket/key)
credential_id : string (optional)
endpoint_url : string (optional), any non-default endpoint URL for S3 access (omit to use the default)
To save scored data to JDBC, add the following parameters to the settings:
data_store_id : string, the ID of the external data store connected to the JDBC data source (see Database Connectivity).
table : string, the name of the database table.
schema : string (optional), the name of the database schema.
catalog : string (optional), (new in v2.22) the name of the database catalog.
statement_type : string, the type of insertion statement to create, one of datarobot.enums.AVAILABLE_STATEMENT_TYPES.
update_columns : list(string) (optional), a list of strings containing those column names to be updated in case statement_type is set to a value related to update or upsert.
where_columns : list(string) (optional), a list of strings containing those column names to be selected in case statement_type is set to a value related to insert or update.
credential_id : string, the ID of the credentials holding information about a user with write-access to the JDBC data source (see Credentials).
create_table_if_not_exists : bool (optional), if no existing table is detected, attempt to create it before writing data with the strategy defined in the statement_type parameter.
- csv_settings : dict (optional)
CSV intake and output settings. Supported options:
delimiter : string (optional, default ","), fields are delimited by this character. Use the string tab to denote TSV (tab-separated values). Must be either a one-character string or the string tab.
quotechar : string (optional, default '"'), fields containing the delimiter must be quoted using this character.
encoding : string (optional, default utf-8), encoding for the CSV files. For example (but not limited to): shift_jis, latin_1 or mskanji.
- timeseries_settings : dict (optional)
Configuration for time-series scoring. Supported options:
type : string, must be forecast or historical (default if not passed is forecast). forecast mode makes predictions using forecast_point or rows in the dataset without a target. historical enables bulk prediction mode, which calculates predictions for all possible forecast points and forecast distances in the dataset within the predictions_start_date/predictions_end_date range.
forecast_point : datetime (optional), forecast point for the dataset, used for forecast predictions. By default, the value is inferred from the dataset. May be passed if timeseries_settings.type=forecast.
predictions_start_date : datetime (optional), used for historical predictions to override the date from which predictions should be calculated. By default, the value is inferred automatically from the dataset. May be passed if timeseries_settings.type=historical.
predictions_end_date : datetime (optional), used for historical predictions to override the date up to which predictions should be calculated. By default, the value is inferred automatically from the dataset. May be passed if timeseries_settings.type=historical.
relax_known_in_advance_features_check : bool (default False). If True, missing values in the known-in-advance features are allowed in the forecast window at prediction time. If omitted or False, missing values are not allowed.
- num_concurrent : int (optional)
Number of concurrent chunks to score simultaneously. Defaults to the available number of cores of the deployment. Lower it to leave resources for real-time scoring.
- chunk_size : string or int (optional)
Which strategy should be used to determine the chunk size. Can be either a named strategy or a fixed size in bytes:
auto : use fixed or dynamic based on flipper
fixed : use 1MB for explanations, 5MB for regular requests
dynamic : use dynamic chunk sizes
int : use this many bytes per chunk
- passthrough_columns : list[string] (optional)
Keep these columns from the scoring dataset in the scored dataset. This is useful for correlating predictions with source data.
- passthrough_columns_set : string (optional)
To pass through every column from the scoring dataset, set this to all. Takes precedence over passthrough_columns if set.
- max_explanations : int (optional)
Compute prediction explanations for this number of features.
- max_ngram_explanations : int or str (optional)
Compute text explanations for this number of ngrams. Set to all to return all ngram explanations, or set to a positive integer value to limit the number of ngram explanations returned. By default, no ngram explanations are computed or returned.
- threshold_high : float (optional)
Only compute prediction explanations for predictions above this threshold. Can be combined with threshold_low.
- threshold_low : float (optional)
Only compute prediction explanations for predictions below this threshold. Can be combined with threshold_high.
- explanations_mode : PredictionExplanationsMode (optional)
Mode of prediction explanations calculation for multiclass and clustering models. If not specified, the server default is to explain only the predicted class, which is identical to passing TopPredictionsMode(1).
- prediction_warning_enabled : boolean (optional)
Add prediction warnings to the scored data. Currently only supported for regression models.
- include_prediction_status : boolean (optional)
Include the prediction_status column in the output. Defaults to False.
- skip_drift_tracking : boolean (optional)
Skips drift tracking on any predictions made from this job. This is useful when running non-production workloads to not affect drift tracking and cause unnecessary alerts. Defaults to False.
- prediction_instance : dict (optional)
Defaults to instance specified by deployment or system configuration. Supported options:
hostName : string
sslEnabled : boolean (optional, default true). Set to false to run prediction requests from the batch prediction job without SSL.
datarobotKey : string (optional), if running a job against a prediction instance in the Managed AI Cloud, you must provide the organization-level DataRobot-Key.
apiKey : string (optional), by default, prediction requests will use the API key of the user that created the job. This allows you to make requests on behalf of other users.
- abort_on_error : boolean (optional)
Default behavior is to abort the job if too many rows fail scoring. This will free up resources for other jobs that may score successfully. Set to false to unconditionally score every row no matter how many errors are encountered. Defaults to True.
- column_names_remapping : dict (optional)
Mapping with column renaming for output table. Defaults to {}.
- include_probabilities : boolean (optional)
Whether to return all probability columns. Defaults to True.
- include_probabilities_classes : list (optional)
The subset of classes to return if you do not want all the classes. Defaults to [].
- download_timeout : int (optional, default 120)
Added in version 2.22.
If using localFile output, wait this many seconds for the download to become available. See download().
- download_read_timeout : int (optional, default 660)
Added in version 2.22.
If using localFile output, wait this many seconds for the server to respond between chunks.
- upload_read_timeout : int (optional, default 600)
Added in version 2.28.
If using localFile intake, wait this many seconds for the server to respond after the whole dataset has been uploaded.
- prediction_threshold : float (optional)
Added in version 3.4.0.
Threshold is the point that sets the class boundary for a predicted value. The model classifies an observation below the threshold as FALSE, and an observation above the threshold as TRUE. In other words, DataRobot automatically assigns the positive class label to any prediction exceeding the threshold. This value can be set between 0.0 and 1.0.
- Return type:
BatchPredictionJob
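Because each intake type expects different keys, it can help to check a settings dict locally before calling score(). The sketch below is illustrative only: check_intake_settings and check_csv_settings are hypothetical helpers, not part of the datarobot package. They encode just the required keys listed above for localFile, s3, jdbc and dataset intake, plus the delimiter rule from csv_settings; the server remains the authority on what is valid.

```python
from typing import Any, Dict, List

# Hypothetical pre-flight check; not part of the datarobot package.
# Required keys mirror the intake options documented for score().
_REQUIRED_INTAKE_KEYS: Dict[str, List[str]] = {
    "localFile": ["file"],
    "s3": ["url"],
    "jdbc": ["data_store_id"],
    "dataset": ["dataset"],
}


def check_intake_settings(settings: Dict[str, Any]) -> List[str]:
    """Return a list of problems; an empty list means none were detected."""
    problems: List[str] = []
    intake_type = settings.get("type", "localFile")  # localFile is the default intake
    required = _REQUIRED_INTAKE_KEYS.get(intake_type)
    if required is None:
        # azure, gcp, snowflake, synapse and bigquery are not covered by this sketch.
        return problems
    for key in required:
        if key not in settings:
            problems.append(f"{intake_type} intake requires '{key}'")
    # For JDBC, this sketch requires either a query or a table name.
    if intake_type == "jdbc" and "query" not in settings and "table" not in settings:
        problems.append("jdbc intake requires either 'query' or 'table'")
    return problems


def check_csv_settings(csv_settings: Dict[str, Any]) -> List[str]:
    """Apply the documented delimiter rule: one character or the string 'tab'."""
    delimiter = csv_settings.get("delimiter", ",")
    if delimiter != "tab" and len(delimiter) != 1:
        return [f"invalid delimiter {delimiter!r}"]
    return []
```

A dict that passes these checks would then be given as intake_settings (and csv_settings) to BatchPredictionJob.score(); intake types outside the four covered here are passed through unchecked.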
- classmethod apply_time_series_data_prep_and_score(deployment, intake_settings, timeseries_settings, **kwargs)
Prepare the dataset with time series data prep, create new batch prediction job, upload the scoring dataset, and return a batch prediction job.
The supported intake_settings are of type localFile or dataset.
For timeseries_settings of type forecast the forecast_point must be specified.
Refer to the datarobot.models.BatchPredictionJob.score() method for details on the other kwargs parameters.
Added in version 3.1.
- Returns:
- BatchPredictionJob
Instance of BatchPredictionJob
- Raises:
- InvalidUsageError
If the deployment does not support time series data prep. If the intake type is not supported for time series data prep.
- Parameters:
- deployment : Deployment
Deployment which will be used for scoring.
- intake_settings : dict
A dict configuring where data is coming from. Supported options:
type : string, either localFile or dataset
Note that to pass a dataset, you not only need to specify the type parameter as dataset, but you must also set the dataset parameter as a Dataset object.
To score from a local file, add this parameter to the settings:
file : file-like object, string path to file or a pandas.DataFrame of scoring data.
- timeseries_settings : dict
Configuration for time-series scoring. Supported options:
type : string, must be forecast or historical (default if not passed is forecast). forecast mode makes predictions using forecast_point. historical enables bulk prediction mode, which calculates predictions for all possible forecast points and forecast distances in the dataset within the predictions_start_date/predictions_end_date range.
forecast_point : datetime (optional), forecast point for the dataset, used for forecast predictions. Must be passed if timeseries_settings.type=forecast.
predictions_start_date : datetime (optional), used for historical predictions to override the date from which predictions should be calculated. By default, the value is inferred automatically from the dataset. May be passed if timeseries_settings.type=historical.
predictions_end_date : datetime (optional), used for historical predictions to override the date up to which predictions should be calculated. By default, the value is inferred automatically from the dataset. May be passed if timeseries_settings.type=historical.
relax_known_in_advance_features_check : bool (default False). If True, missing values in the known-in-advance features are allowed in the forecast window at prediction time. If omitted or False, missing values are not allowed.
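The rules for timeseries_settings (forecast_point for forecast mode, the start and end dates for historical mode) can be captured in a small builder. build_timeseries_settings is a hypothetical helper, not part of the datarobot package, sketched under the assumption that mixing options across modes should be rejected; the reference only says which options may be passed in each mode, so that strictness is a design choice of this sketch.

```python
from datetime import datetime
from typing import Any, Dict, Optional


def build_timeseries_settings(
    mode: str = "forecast",
    forecast_point: Optional[datetime] = None,
    predictions_start_date: Optional[datetime] = None,
    predictions_end_date: Optional[datetime] = None,
    relax_known_in_advance_features_check: bool = False,
) -> Dict[str, Any]:
    """Hypothetical builder for a timeseries_settings dict (not part of datarobot)."""
    if mode not in ("forecast", "historical"):
        raise ValueError("type must be 'forecast' or 'historical'")
    settings: Dict[str, Any] = {"type": mode}
    if mode == "forecast":
        # apply_time_series_data_prep_and_score() requires a forecast point.
        if forecast_point is None:
            raise ValueError("forecast_point must be specified for forecast mode")
        if predictions_start_date is not None or predictions_end_date is not None:
            raise ValueError("prediction start/end dates apply to historical mode only")
        settings["forecast_point"] = forecast_point
    else:
        if forecast_point is not None:
            raise ValueError("forecast_point applies to forecast mode only")
        if (
            predictions_start_date is not None
            and predictions_end_date is not None
            and predictions_start_date > predictions_end_date
        ):
            raise ValueError("predictions_start_date is after predictions_end_date")
        # Omitted dates are left out so the server can infer them from the dataset.
        if predictions_start_date is not None:
            settings["predictions_start_date"] = predictions_start_date
        if predictions_end_date is not None:
            settings["predictions_end_date"] = predictions_end_date
    if relax_known_in_advance_features_check:
        settings["relax_known_in_advance_features_check"] = True
    return settings
```

The resulting dict would be passed as the timeseries_settings argument, for example to apply_time_series_data_prep_and_score(). For score() a forecast_point is optional, so the forecast-mode check here is stricter than score() itself needs.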
- classmethod score_to_file(deployment, intake_path, output_path, **kwargs)
Create a new batch prediction job, upload the scoring dataset, and download the scored CSV file concurrently.
This call blocks until the entire file is scored.
Refer to the datarobot.models.BatchPredictionJob.score() method for details on the other kwargs parameters.
- Returns:
- BatchPredictionJob
Instance of BatchPredictionJob
- Parameters:
- deployment : Deployment or string ID
Deployment which will be used for scoring.
- intake_path : file-like object, string path to file, or pandas.DataFrame
The scoring data.
- output_path : str
Filename under which to save the result.
- classmethod apply_time_series_data_prep_and_score_to_file(deployment, intake_path, output_path, timeseries_settings, **kwargs)
Prepare the input dataset with time series data prep. Then, create a new batch prediction job using the prepared AI Catalog item as input and concurrently download the scored CSV file.
The call returns only once the entire file has been scored.
For timeseries_settings of type forecast, the forecast_point must be specified.
Refer to the datarobot.models.BatchPredictionJob.score() method for details on the other kwargs parameters.
Added in version 3.1.
- Returns:
- BatchPredictionJob
Instance of BatchPredictionJob.
- Raises:
- InvalidUsageError
If the deployment does not support time series data prep.
- Parameters:
- deployment : Deployment
The deployment which will be used for scoring.
- intake_path : file-like object, string path to file, or pandas.DataFrame
The scoring data.
- output_path : str
The filename under which to save the result.
- timeseries_settings : dict
Configuration for time-series scoring. Supported options:
type : string, must be forecast or historical (default if not passed is forecast). forecast mode makes predictions using forecast_point. historical enables bulk prediction mode, which calculates predictions for all possible forecast points and forecast distances in the dataset within the predictions_start_date/predictions_end_date range.
forecast_point : datetime (optional), forecast point for the dataset, used for forecast predictions. Must be passed if timeseries_settings.type=forecast.
predictions_start_date : datetime (optional), used for historical predictions to override the date from which predictions should be calculated. By default, the value is inferred automatically from the dataset. May be passed if timeseries_settings.type=historical.
predictions_end_date : datetime (optional), used for historical predictions to override the date up to which predictions should be calculated. By default, the value is inferred automatically from the dataset. May be passed if timeseries_settings.type=historical.
relax_known_in_advance_features_check : bool (default False). If True, missing values in the known-in-advance features are allowed in the forecast window at prediction time. If omitted or False, missing values are not allowed.
- classmethod score_s3(deployment, source_url, destination_url, credential=None, endpoint_url=None, **kwargs)
Create a new batch prediction job that reads the scoring dataset from S3 and writes the result back to S3.
This returns immediately after the job has been created. You must poll for job completion using get_status() or wait_for_completion() (see datarobot.models.Job).
Refer to the datarobot.models.BatchPredictionJob.score() method for details on the other kwargs parameters.
- Returns:
- BatchPredictionJob
Instance of BatchPredictionJob
- Parameters:
- deployment : Deployment or string ID
Deployment which will be used for scoring.
- source_url : string
The URL for the prediction dataset (e.g.: s3://bucket/key)
- destination_url : string
The URL for the scored dataset (e.g.: s3://bucket/key)
- credential : string or Credential (optional)
The AWS Credential object or credential ID.
- endpoint_url : string (optional)
Any non-default endpoint URL for S3 access (omit to use the default).
- classmethod score_azure(deployment, source_url, destination_url, credential=None, **kwargs)
Create a new batch prediction job that reads the scoring dataset from Azure Blob Storage and writes the result back to Azure Blob Storage.
This returns immediately after the job has been created. You must poll for job completion using get_status() or wait_for_completion() (see datarobot.models.Job).
Refer to the datarobot.models.BatchPredictionJob.score() method for details on the other kwargs parameters.
- Returns:
- BatchPredictionJob
Instance of BatchPredictionJob
- Parameters:
- deployment : Deployment or string ID
Deployment which will be used for scoring.
- source_url : string
The URL for the prediction dataset (e.g.: https://storage_account.blob.endpoint/container/blob_name)
- destination_url : string
The URL for the scored dataset (e.g.: https://storage_account.blob.endpoint/container/blob_name)
- credential : string or Credential (optional)
The Azure Credential object or credential ID.
- classmethod score_gcp(deployment, source_url, destination_url, credential=None, **kwargs)
Create a new batch prediction job that reads the scoring dataset from Google Cloud Storage and writes the result back to Google Cloud Storage.
This returns immediately after the job has been created. You must poll for job completion using get_status() or wait_for_completion() (see datarobot.models.Job).
Refer to the datarobot.models.BatchPredictionJob.score() method for details on the other kwargs parameters.
- Returns:
- BatchPredictionJob
Instance of BatchPredictionJob
- Parameters:
- deployment : Deployment or string ID
Deployment which will be used for scoring.
- source_url : string
The URL for the prediction dataset (e.g.: http(s)://storage.googleapis.com/[bucket]/[object])
- destination_url : string
The URL for the scored dataset (e.g.: http(s)://storage.googleapis.com/[bucket]/[object])
- credential : string or Credential (optional)
The GCP Credential object or credential ID.
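score_s3(), score_azure() and score_gcp() each take source and destination URLs in the shapes shown in their examples. As a cheap local sanity check before creating a job, the hypothetical split_storage_url helper below (not part of the datarobot package) splits such a URL into its bucket or container and its object path using only urllib.parse. It assumes your URLs follow the example shapes in this section; real endpoints may legitimately differ.

```python
from typing import Tuple
from urllib.parse import urlparse


def split_storage_url(provider: str, url: str) -> Tuple[str, str]:
    """Hypothetical check: split a storage URL into (bucket_or_container, object_path)."""
    parsed = urlparse(url)
    if provider == "s3":
        # Shape used above: s3://bucket/key
        if parsed.scheme != "s3" or not parsed.netloc:
            raise ValueError("expected s3://bucket/key")
        key = parsed.path.lstrip("/")
        if not key:
            raise ValueError("missing object key")
        return parsed.netloc, key
    if provider == "azure":
        # Shape used above: https://storage_account.blob.endpoint/container/blob_name
        if parsed.scheme != "https":
            raise ValueError("expected an https URL")
    elif provider == "gcp":
        # Shape used above: http(s)://storage.googleapis.com/[bucket]/[object]
        if parsed.scheme not in ("http", "https") or parsed.netloc != "storage.googleapis.com":
            raise ValueError("expected http(s)://storage.googleapis.com/...")
    else:
        raise ValueError(f"unknown provider {provider!r}")
    # For Azure and GCP the first path segment is the container or bucket.
    container, _, obj = parsed.path.lstrip("/").partition("/")
    if not container or not obj:
        raise ValueError("expected /<container-or-bucket>/<object>")
    return container, obj
```

A URL that passes would then be handed unchanged to the matching score_s3(), score_azure() or score_gcp() call; the helper only inspects it.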
- classmethod score_from_existing(batch_prediction_job_id)
Create a new batch prediction job based on the settings from a previously created one.
- Returns:
- BatchPredictionJob
Instance of BatchPredictionJob
- Parameters:
- batch_prediction_job_id : str
ID of the previous batch prediction job
- Return type:
BatchPredictionJob
- classmethod score_pandas(deployment, df, read_timeout=660, **kwargs)
Run a batch prediction job, with a scoring dataset from a pandas dataframe. The output from the prediction will be joined to the passed DataFrame and returned.
Use column_names_remapping to drop or rename columns in the output.
This method blocks until the job has completed or raises an exception on errors.
Refer to the datarobot.models.BatchPredictionJob.score() method for details on the other kwargs parameters.
- Returns:
- BatchPredictionJob
Instance of BatchPredictionJob
- pandas.DataFrame
The original dataframe merged with the predictions
- Parameters:
- deployment : Deployment or string ID
Deployment which will be used for scoring.
- df : pandas.DataFrame
The dataframe to score
- Return type:
Tuple[BatchPredictionJob, DataFrame]
- classmethod score_with_leaderboard_model(model, intake_settings=None, output_settings=None, csv_settings=None, timeseries_settings=None, passthrough_columns=None, passthrough_columns_set=None, max_explanations=None, max_ngram_explanations=None, explanation_algorithm=None, threshold_high=None, threshold_low=None, prediction_threshold=None, prediction_warning_enabled=None, include_prediction_status=False, abort_on_error=True, column_names_remapping=None, include_probabilities=True, include_probabilities_classes=None, download_timeout=120, download_read_timeout=660, upload_read_timeout=600, explanations_mode=None)
Creates a new batch prediction job for a Leaderboard model by uploading the scoring dataset. Returns a batch prediction job.
The default intake and output options are both localFile, which requires the caller to pass the file parameter and either download the results using the download() method afterwards or pass a path to a file where the scored data will be downloaded to.
- Returns:
- BatchPredictionJob
Instance of BatchPredictionJob
- Parameters:
- model : Model or DatetimeModel or string ID
Model which will be used for scoring.
- intake_settings : dict (optional)
A dict configuring where the data comes from. Supported options:
type : string, either localFile, dataset, or dss.
Note that to pass a dataset, you not only need to specify the type parameter as dataset, but you must also set the dataset parameter as a dr.Dataset object.
To score from a local file, add this parameter to the settings:
file : file-like object, string path to file or a pandas.DataFrame of scoring data.
To score a subset of the training data, use the dss intake type and specify the following parameters:
project_id : project to fetch training data from. Access to the project is required.
partition : subset of training data to score, one of datarobot.enums.TrainingDataSubsets.
- output_settings : dict (optional)
A dict configuring how scored data is to be saved. Supported options:
type : string, localFile
To save scored data to a local file, add this parameter to the settings:
path : string (optional), the path to save the scored data as a CSV file. If a path is not specified, you must download the scored data yourself with job.download(). If a path is specified, the call blocks until the job is done. If there are no other jobs currently processing for the targeted prediction instance, uploading, scoring, and downloading happen in parallel without waiting for the full job to complete. Otherwise, the call still blocks, but starts downloading the scored data as soon as the job starts generating it. This is the fastest method to get predictions.
- csv_settings : dict (optional)
CSV intake and output settings. Supported options:
delimiter : string (optional, default ","), fields are delimited by this character. Use the string tab to denote TSV (tab-separated values). Must be either a one-character string or the string tab.
quotechar : string (optional, default '"'), fields containing the delimiter must be quoted using this character.
encoding : string (optional, default utf-8), encoding for the CSV files. For example (but not limited to): shift_jis, latin_1 or mskanji.
- timeseries_settings : dict (optional)
Configuration for time-series scoring. Supported options:
type : string, must be forecast, historical, or training (default if not passed is forecast). forecast mode makes predictions using forecast_point or rows in the dataset without a target. historical enables bulk prediction mode, which calculates predictions for all possible forecast points and forecast distances in the dataset within the predictions_start_date/predictions_end_date range. training mode is a special case for predictions on subsets of training data. Note that it must be used in conjunction with the dss intake type only.
forecast_point : datetime (optional), forecast point for the dataset, used for forecast predictions. By default, the value is inferred from the dataset. May be passed if timeseries_settings.type=forecast.
predictions_start_date : datetime (optional), used for historical predictions to override the date from which predictions should be calculated. By default, the value is inferred automatically from the dataset. May be passed if timeseries_settings.type=historical.
predictions_end_date : datetime (optional), used for historical predictions to override the date up to which predictions should be calculated. By default, the value is inferred automatically from the dataset. May be passed if timeseries_settings.type=historical.
relax_known_in_advance_features_check : bool (default False). If True, missing values in the known-in-advance features are allowed in the forecast window at prediction time. If omitted or False, missing values are not allowed.
- passthrough_columns : list[string] (optional)
Keep these columns from the scoring dataset in the scored dataset. This is useful for correlating predictions with source data.
- passthrough_columns_set : string (optional)
To pass through every column from the scoring dataset, set this to all. Takes precedence over passthrough_columns if set.
- max_explanations : int (optional)
Compute prediction explanations for this number of features.
- max_ngram_explanations : int or str (optional)
Compute text explanations for this number of ngrams. Set to all to return all ngram explanations, or set to a positive integer value to limit the number of ngram explanations returned. By default, no ngram explanations are computed or returned.
- threshold_high : float (optional)
Only compute prediction explanations for predictions above this threshold. Can be combined with threshold_low.
- threshold_low : float (optional)
Only compute prediction explanations for predictions below this threshold. Can be combined with threshold_high.
- explanations_mode : PredictionExplanationsMode (optional)
Mode of prediction explanations calculation for multiclass and clustering models. If not specified, the server default is to explain only the predicted class, which is identical to passing TopPredictionsMode(1).
- prediction_warning_enabled : boolean (optional)
Add prediction warnings to the scored data. Currently only supported for regression models.
- include_prediction_status : boolean (optional)
Include the prediction_status column in the output. Defaults to False.
- abort_on_error : boolean (optional)
Default behavior is to abort the job if too many rows fail scoring. This will free up resources for other jobs that may score successfully. Set to false to unconditionally score every row no matter how many errors are encountered. Defaults to True.
- column_names_remapping : dict (optional)
Mapping with column renaming for output table. Defaults to {}.
- include_probabilities : boolean (optional)
Whether to return all probability columns. Defaults to True.
- include_probabilities_classes : list (optional)
The subset of classes to return if you do not want all the classes. Defaults to [].
- download_timeout : int (optional, default 120)
Added in version 2.22.
If using localFile output, wait this many seconds for the download to become available. See download().
- download_read_timeout : int (optional, default 660)
Added in version 2.22.
If using localFile output, wait this many seconds for the server to respond between chunks.
- upload_read_timeout : int (optional, default 600)
Added in version 2.28.
If using localFile intake, wait this many seconds for the server to respond after the whole dataset has been uploaded.
- prediction_threshold : float (optional)
Added in version 3.4.0.
Threshold is the point that sets the class boundary for a predicted value. The model classifies an observation below the threshold as FALSE, and an observation above the threshold as TRUE. In other words, DataRobot automatically assigns the positive class label to any prediction exceeding the threshold. This value can be set between 0.0 and 1.0.
- Return type:
BatchPredictionJob
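For score_with_leaderboard_model(), the intake types are narrower than for score(), and the time-series type training is only valid together with dss intake. The hypothetical check_leaderboard_settings helper below (not part of the datarobot package) sketches those two rules and the dss keys project_id and partition. It does not validate partition values against datarobot.enums.TrainingDataSubsets.

```python
from typing import Any, Dict, List, Optional

# Intake types listed for score_with_leaderboard_model().
_LEADERBOARD_INTAKE_TYPES = {"localFile", "dataset", "dss"}


def check_leaderboard_settings(
    intake_settings: Dict[str, Any],
    timeseries_settings: Optional[Dict[str, Any]] = None,
) -> List[str]:
    """Hypothetical pre-check (not part of datarobot); returns a list of problems."""
    problems: List[str] = []
    intake_type = intake_settings.get("type", "localFile")
    if intake_type not in _LEADERBOARD_INTAKE_TYPES:
        problems.append(f"unsupported intake type {intake_type!r}")
    if intake_type == "dss":
        # Scoring a training-data subset needs the project and the partition.
        for key in ("project_id", "partition"):
            if key not in intake_settings:
                problems.append(f"dss intake requires '{key}'")
    # forecast is the documented default when no type is given.
    ts_type = (timeseries_settings or {}).get("type", "forecast")
    if ts_type == "training" and intake_type != "dss":
        problems.append("timeseries type 'training' requires dss intake")
    return problems
```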
- classmethod get(batch_prediction_job_id)
Get a batch prediction job.
- Returns:
- BatchPredictionJob
Instance of BatchPredictionJob
- Parameters:
- batch_prediction_job_id : str
ID of the batch prediction job
- Return type:
BatchPredictionJob
- download(fileobj, timeout=120, read_timeout=660)
Downloads the CSV result of a prediction job.
- Parameters:
- fileobj : file-like object
A file-like object to which the CSV prediction results will be written. Examples include an in-memory buffer (e.g., io.BytesIO) or a file on disk (opened for binary writing).
- timeout : int (optional, default 120)
Added in version 2.22.
Seconds to wait for the download to become available.
The download will not be available before the job has started processing. In case other jobs are occupying the queue, processing may not start immediately.
If the timeout is reached, the job will be aborted and RuntimeError is raised.
Set to -1 to wait indefinitely.
- read_timeout : int (optional, default 660)
Added in version 2.22.
Seconds to wait for the server to respond between chunks.
- Return type:
None
- delete(ignore_404_errors=False)
Cancel this job. If this job has not finished running, it will be removed and canceled.
- Return type:
None
- get_status()
Get the status of the batch prediction job.
- Returns:
- BatchPredictionJob status data
Dict with job status
- classmethod list_by_status(statuses=None)
Get the collection of jobs with the given statuses.
- Returns:
- BatchPredictionJob statuses
List of job statuses dicts with specific statuses
- Parameters:
- statuses
List of statuses to filter jobs by ([ABORTED|COMPLETED…]). If statuses is not provided, all jobs for the user are returned.
- Return type:
List[BatchPredictionJob]
- class datarobot.models.BatchPredictionJobDefinition(id=None, name=None, enabled=None, schedule=None, batch_prediction_job=None, created=None, updated=None, created_by=None, updated_by=None, last_failed_run_time=None, last_successful_run_time=None, last_started_job_status=None, last_scheduled_run_time=None)
- classmethod get(batch_prediction_job_definition_id)
Get a batch prediction job definition.
- Returns:
- BatchPredictionJobDefinition
Instance of BatchPredictionJobDefinition
- Return type:
BatchPredictionJobDefinition
Examples
>>> import datarobot as dr
>>> definition = dr.BatchPredictionJobDefinition.get('5a8ac9ab07a57a0001be501f')
>>> definition
BatchPredictionJobDefinition(60912e09fd1f04e832a575c1)
- Parameters:
- batch_prediction_job_definition_id : str
ID of batch prediction job definition
- classmethod list(search_name=None, deployment_id=None, limit=<datarobot.models.batch_prediction_job.MissingType object>, offset=0)
Get all job definitions.
- Parameters:
- search_name : str, optional
String for filtering job definitions. Job definitions whose name contains the string will be returned. If not specified, all available job definitions will be returned.
- deployment_id : str, optional
The ID of the deployment that the job definitions belong to.
- limit : int, optional
0 by default. At most this many results are returned.
- offset : int, optional
This many results will be skipped.
- Returns:
- List[BatchPredictionJobDefinition]
List of job definitions the user has access to.
- Return type:
List[BatchPredictionJobDefinition]
Examples
>>> import datarobot as dr
>>> definitions = dr.BatchPredictionJobDefinition.list()
>>> definitions
[
    BatchPredictionJobDefinition(60912e09fd1f04e832a575c1),
    BatchPredictionJobDefinition(6086ba053f3ef731e81af3ca)
]
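The limit and offset parameters of list() support paging through a large number of definitions. The sketch below shows the usual limit/offset loop; `iter_all` is a hypothetical helper, and `fetch_page(limit, offset)` stands in for a call such as `dr.BatchPredictionJobDefinition.list(limit=limit, offset=offset)`, assuming the endpoint honors both parameters as documented above.

```python
from typing import Callable, Iterator, List, TypeVar

T = TypeVar("T")


def iter_all(fetch_page: Callable[[int, int], List[T]], page_size: int = 100) -> Iterator[T]:
    """Yield every item by requesting successive pages with limit/offset.

    `fetch_page(limit, offset)` is a hypothetical stand-in for a paged call
    such as BatchPredictionJobDefinition.list(limit=..., offset=...).
    """
    if page_size <= 0:
        raise ValueError("page_size must be positive")
    offset = 0
    while True:
        page = fetch_page(page_size, offset)
        yield from page
        # A short (or empty) page means there are no more results.
        if len(page) < page_size:
            return
        offset += page_size
```

Stopping on a short page avoids one extra request in the common case. When the total is an exact multiple of page_size, one final empty page is fetched.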
- classmethod create(enabled, batch_prediction_job, name=None, schedule=None)
Creates a new batch prediction job definition to be run either at a scheduled interval or as a manual run.
- Returns:
- BatchPredictionJobDefinition
Instance of BatchPredictionJobDefinition
- Return type:
BatchPredictionJobDefinition
Examples
>>> import datarobot as dr
>>> job_spec = {
...     "num_concurrent": 4,
...     "deployment_id": "foobar",
...     "intake_settings": {
...         "url": "s3://foobar/123",
...         "type": "s3",
...         "format": "csv"
...     },
...     "output_settings": {
...         "url": "s3://foobar/123",
...         "type": "s3",
...         "format": "csv"
...     },
... }
>>> schedule = {
...     "day_of_week": [1],
...     "month": ["*"],
...     "hour": [16],
...     "minute": [0],
...     "day_of_month": [1]
... }
>>> definition = dr.BatchPredictionJobDefinition.create(
...     enabled=False,
...     batch_prediction_job=job_spec,
...     name="some_definition_name",
...     schedule=schedule
... )
>>> definition
BatchPredictionJobDefinition(60912e09fd1f04e832a575c1)
- Attributes:
- enabled: bool (default False)
Whether or not the definition should be active on a scheduled basis. If True, schedule is required.
- batch_prediction_job: dict
The job specifications for your batch prediction job. It requires the same job input parameters as used with score(), except that it does not start a scoring job; it only stores the specification as a definition for later use.
- name: string (optional)
The name you want your job to be identified with. Must be unique across the organization’s existing jobs. If you don’t supply a name, a random one will be generated for you.
- schedule: dict (optional)
The schedule payload defines the intervals at which the job should run, and its items can be combined in various ways to construct complex scheduling terms if needed. For every item, you can supply either an asterisk ["*"], denoting “every” unit of that time denomination, or an array of integers (e.g. [1, 2, 3]) to define specific values.
The schedule payload is split into the following items:
Minute: The minute(s) of the hour that the job will run. Allowed values are either ["*"], meaning every minute of the hour, or [0 ... 59].
Hour: The hour(s) of the day that the job will run. Allowed values are either ["*"], meaning every hour of the day, or [0 ... 23].
Day of Month: The date(s) of the month that the job will run. Allowed values are either [1 ... 31] or ["*"] for all days of the month. This field is additive with dayOfWeek, meaning the job will run both on the date(s) defined in this field and on the day(s) specified by dayOfWeek (for example, the 1st, 2nd, and 3rd, plus every Tuesday). If dayOfMonth is set to ["*"] and dayOfWeek is defined, the scheduler will trigger on every day of the month that matches dayOfWeek (for example, Tuesday the 2nd, 9th, 16th, 23rd, and 30th). Invalid dates such as February 31st are ignored.
Month: The month(s) of the year that the job will run. Allowed values are either [1 ... 12] or ["*"] for all months of the year. Strings, either 3-letter abbreviations or the full name of the month, can be used interchangeably (e.g., “jan” or “october”). Months that are not compatible with dayOfMonth are ignored, for example {"dayOfMonth": [31], "month": ["feb"]}.
Day of Week: The day(s) of the week that the job will run. Allowed values are [0 ... 6], where Sunday=0, or ["*"] for all days of the week. Strings, either 3-letter abbreviations or the full name of the day, can be used interchangeably (e.g., “sunday”, “Sunday”, “sun”, and “Sun” all map to [0]). This field is additive with dayOfMonth, meaning the job will run both on the date(s) specified by dayOfMonth and on the day(s) defined in this field.
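The interplay between Day of Month and Day of Week is the subtle part of these rules. The following is a minimal sketch, not DataRobot's scheduler, that encodes the rules above so a schedule can be checked offline. It assumes snake_case keys as in the Python examples (the prose uses dayOfMonth and dayOfWeek), naive datetimes with no time zone handling, and, where the text is silent, that a ["*"] Day of Week combined with specific Day of Month values means only those dates (standard cron behavior). `schedule_matches` and `next_runs` are hypothetical names.

```python
from datetime import datetime, timedelta
from typing import Dict, List, Optional, Sequence, Set, Tuple, Union

# Hypothetical sketch; keys are snake_case as in the Python examples.
Schedule = Dict[str, Sequence[Union[int, str]]]

MONTHS = ["january", "february", "march", "april", "may", "june", "july",
          "august", "september", "october", "november", "december"]
DAYS = ["sunday", "monday", "tuesday", "wednesday", "thursday", "friday", "saturday"]


def _to_int(value: Union[int, str], names: Optional[List[str]], base: int) -> int:
    """Convert an integer or a full/3-letter month or day name to an integer."""
    if isinstance(value, int):
        return value
    text = value.lower()
    for index, name in enumerate(names or []):
        if text in (name, name[:3]):
            return index + base
    raise ValueError(f"unrecognized value: {value!r}")


def _values(field, names=None, base=0) -> Optional[Set[int]]:
    """Return None for ["*"] (every value), else the set of allowed integers."""
    if list(field) == ["*"]:
        return None
    return {_to_int(v, names, base) for v in field}


def _compile(schedule: Schedule) -> Tuple[Optional[Set[int]], ...]:
    return (
        _values(schedule["minute"]),
        _values(schedule["hour"]),
        _values(schedule["month"], MONTHS, base=1),
        _values(schedule["day_of_month"]),
        _values(schedule["day_of_week"], DAYS, base=0),
    )


def _matches(compiled, when: datetime) -> bool:
    minutes, hours, months, days_of_month, days_of_week = compiled
    for allowed, actual in ((minutes, when.minute), (hours, when.hour), (months, when.month)):
        if allowed is not None and actual not in allowed:
            return False
    weekday = when.isoweekday() % 7  # Sunday=0, Monday=1, ..., Saturday=6
    if days_of_month is not None and days_of_week is not None:
        # Both restricted: the two fields are additive (union of days).
        return when.day in days_of_month or weekday in days_of_week
    if days_of_month is not None:
        # Assumption: a ["*"] day_of_week does not add every day (cron behavior).
        return when.day in days_of_month
    if days_of_week is not None:
        return weekday in days_of_week
    return True


def schedule_matches(schedule: Schedule, when: datetime) -> bool:
    """True if the schedule would trigger at the minute containing `when`."""
    return _matches(_compile(schedule), when)


def next_runs(schedule: Schedule, start: datetime, count: int,
              horizon_days: int = 366) -> List[datetime]:
    """Scan minute by minute from `start`; return up to `count` trigger times."""
    compiled = _compile(schedule)
    when = start.replace(second=0, microsecond=0)
    end = when + timedelta(days=horizon_days)
    runs: List[datetime] = []
    while when < end and len(runs) < count:
        if _matches(compiled, when):
            runs.append(when)
        when += timedelta(minutes=1)
    return runs
```

Invalid dates such as February 31st are skipped naturally, because `datetime` cannot represent them. For the schedule in the create() example (Mondays plus the 1st of each month, at 16:00), starting on Monday 2024-01-01, the first three runs are January 1st, 8th, and 15th at 16:00.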
- update(enabled, batch_prediction_job=None, name=None, schedule=None)
Updates a job definition with the changed specs.
Takes the same input as create().
- Returns:
- BatchPredictionJobDefinition
Instance of the updated BatchPredictionJobDefinition
- Return type:
BatchPredictionJobDefinition
Examples
>>> import datarobot as dr
>>> definition = dr.BatchPredictionJobDefinition.get('60912e09fd1f04e832a575c1')
>>> job_spec = {
...     "num_concurrent": 5,
...     "deployment_id": "foobar_new",
...     "intake_settings": {
...         "url": "s3://foobar/123",
...         "type": "s3",
...         "format": "csv"
...     },
...     "output_settings": {
...         "url": "s3://foobar/123",
...         "type": "s3",
...         "format": "csv"
...     },
... }
>>> schedule = {
...     "day_of_week": [1],
...     "month": ["*"],
...     "hour": ["*"],
...     "minute": [30, 59],
...     "day_of_month": [1, 2, 6]
... }
>>> definition = definition.update(
...     enabled=False,
...     batch_prediction_job=job_spec,
...     name="updated_definition_name",
...     schedule=schedule
... )
>>> definition
BatchPredictionJobDefinition(60912e09fd1f04e832a575c1)
- run_on_schedule(schedule)
Sets the run schedule of an already created job definition.
If the job was previously not enabled, this will also set the job to enabled.
- Returns:
- BatchPredictionJobDefinition
Instance of the updated BatchPredictionJobDefinition with the new or updated schedule.
- Return type:
BatchPredictionJobDefinition
Examples
>>> import datarobot as dr
>>> definition = dr.BatchPredictionJobDefinition.create('...')
>>> schedule = {
...     "day_of_week": [1],
...     "month": ["*"],
...     "hour": ["*"],
...     "minute": [30, 59],
...     "day_of_month": [1, 2, 6]
... }
>>> definition.run_on_schedule(schedule)
BatchPredictionJobDefinition(60912e09fd1f04e832a575c1)
- Attributes:
- schedule: dict
Same as schedule in create().
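Because the schedule accepted here follows the same rules as in create(), a client-side pre-check can catch out-of-range values before any request is sent. The following is a minimal sketch built only from the ranges documented for create(); `check_schedule`, `check_definition_args`, `RANGES`, and `NAMED_ITEMS` are hypothetical names, the snake_case keys follow the Python examples, and the server remains the authority on what is valid.

```python
from typing import Dict, Optional, Sequence, Union

# Documented integer ranges for each schedule item (inclusive).
RANGES = {
    "minute": (0, 59),
    "hour": (0, 23),
    "day_of_month": (1, 31),
    "month": (1, 12),
    "day_of_week": (0, 6),
}
# Items that also accept names such as "jan" or "sunday".
NAMED_ITEMS = {"month", "day_of_week"}


def check_schedule(schedule: Dict[str, Sequence[Union[int, str]]]) -> None:
    """Raise ValueError if a schedule item is missing or out of range.

    Hypothetical pre-check: names are accepted without validating spelling.
    """
    missing = sorted(set(RANGES) - set(schedule))
    if missing:
        raise ValueError(f"missing schedule items: {missing}")
    for key, (low, high) in RANGES.items():
        values = list(schedule[key])
        if values == ["*"]:
            continue  # "every" unit of this time denomination
        if not values:
            raise ValueError(f"{key}: empty list")
        for value in values:
            if isinstance(value, str) and key in NAMED_ITEMS:
                continue
            # bool is a subclass of int, so reject it explicitly.
            if isinstance(value, bool) or not isinstance(value, int):
                raise ValueError(f"{key}: {value!r} is not an integer")
            if not low <= value <= high:
                raise ValueError(f"{key}: {value} is outside [{low}, {high}]")


def check_definition_args(enabled: bool, schedule: Optional[dict]) -> None:
    """Mirror the create() rule that an enabled definition needs a schedule."""
    if enabled and schedule is None:
        raise ValueError("schedule is required when enabled=True")
    if schedule is not None:
        check_schedule(schedule)
```

A schedule such as {"day_of_month": [31], "month": ["feb"]} passes this check, because each value is individually in range. The documented behavior is that incompatible month and date combinations are simply ignored rather than rejected.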
- run_once()
Manually submits a batch prediction job to the queue, based on an already created job definition.
- Returns:
- BatchPredictionJob
Instance of BatchPredictionJob
- Return type:
BatchPredictionJob
Examples
>>> import datarobot as dr
>>> definition = dr.BatchPredictionJobDefinition.create('...')
>>> job = definition.run_once()
>>> job.wait_for_completion()
- delete()
Deletes the job definition and disables any future scheduled runs of it. A scheduled job that is currently running will not be cancelled.
- Return type:
None
Examples
>>> import datarobot as dr
>>> definition = dr.BatchPredictionJobDefinition.get('5a8ac9ab07a57a0001be501f')
>>> definition.delete()