Batch predictions

class datarobot.models.BatchPredictionJob

A Batch Prediction Job is used to score large data sets on prediction servers using the Batch Prediction API.

Variables:: id (str) – the id of the job

classmethod score(deployment, intake_settings=None, output_settings=None, csv_settings=None, timeseries_settings=None, num_concurrent=None, chunk_size=None, passthrough_columns=None, passthrough_columns_set=None, max_explanations=None, max_ngram_explanations=None, explanation_algorithm=None, threshold_high=None, threshold_low=None, prediction_threshold=None, prediction_warning_enabled=None, include_prediction_status=False, skip_drift_tracking=False, prediction_instance=None, abort_on_error=True, column_names_remapping=None, include_probabilities=True, include_probabilities_classes=None, download_timeout=120, download_read_timeout=660, upload_read_timeout=600, explanations_mode=None)

Create a new batch prediction job, upload the scoring dataset, and return the job.

The default intake and output options are both localFile, which requires the caller to pass the file parameter and then either download the results with the download() method or pass a path to a file where the scored data will be saved.

Variables:

deployment (Deployment or string ID) – Deployment which will be used for scoring.
intake_settings (Optional[IntakeSettings]) –
A dict configuring where data is coming from. Supported options:
- type : str, either localFile, s3, azure, gcp, dataset, jdbc, snowflake, synapse, bigquery, or datasphere
Note that to pass a dataset, you not only need to specify the type parameter as dataset, but you must also set the dataset parameter as a dr.Dataset object.

To score from a local file, add this parameter to the settings:
- file : file-like object, string path to file or a pandas.DataFrame of scoring data
To score from S3, add the following parameters to the settings:
- url : str, the URL to score (e.g.: s3://bucket/key)
- credential_id : Optional[str]
- endpoint_url : Optional[str], any non-default endpoint URL for S3 access (omit to use the default)
To score from JDBC, add the following parameters to the settings:
- data_store_id : str, the ID of the external data store connected to the JDBC data source (see Database Connectivity).
- query : str (optional if table, schema and/or catalog is specified), a self-supplied SELECT statement of the data set you wish to predict.
- table : str (optional if query is specified), the name of specified database table.
- schema : str (optional if query is specified), the name of specified database schema.
- catalog : str (optional if query is specified), (new in v2.22) the name of specified database catalog.
- fetch_size : Optional[int], changing the fetch size can be used to balance throughput and memory usage.
- credential_id : Optional[str] the ID of the credentials holding information about a user with read-access to the JDBC data source (see Credentials).
To score from Datasphere, add the following parameters to the settings:
- data_store_id : str, the ID of the external data store connected to the Datasphere data source (see Database Connectivity).
- table : str, the name of specified database table.
- schema : str, the name of specified database schema.
- credential_id : str, the ID of the credentials holding information about a user with read-access to the Datasphere data source (see Credentials).
output_settings (Optional[OutputSettings]) –
A dict configuring how scored data is to be saved. Supported options:
- type : str, either localFile, s3, azure, gcp, jdbc, snowflake, synapse, bigquery, or datasphere
To save scored data to a local file, add this parameter to the settings:
- path : Optional[str], path to save the scored data as CSV. If a path is not specified, you must download the scored data yourself with job.download(). If a path is specified, the call will block until the job is done. If there are no other jobs currently processing for the targeted prediction instance, uploading, scoring, and downloading will happen in parallel without waiting for a full job to complete. Otherwise, it will still block, but start downloading the scored data as soon as it starts generating data. This is the fastest method to get predictions.
To save scored data to S3, add the following parameters to the settings:
- url : str, the URL for storing the results (e.g.: s3://bucket/key)
- credential_id : Optional[str]
- endpoint_url : Optional[str], any non-default endpoint URL for S3 access (omit to use the default)
To save scored data to JDBC, add the following parameters to the settings:
- data_store_id : str, the ID of the external data store connected to the JDBC data source (see Database Connectivity).
- table : str, the name of specified database table.
- schema : Optional[str], the name of specified database schema.
- catalog : Optional[str], (new in v2.22) the name of specified database catalog.
- statement_type : str, the type of insertion statement to create, one of datarobot.enums.AVAILABLE_STATEMENT_TYPES.
- update_columns : list(string) (optional), a list of strings containing those column names to be updated in case statement_type is set to a value related to update or upsert.
- where_columns : list(string) (optional), a list of strings containing those column names to be selected in case statement_type is set to a value related to insert or update.
- credential_id : str, the ID of the credentials holding information about a user with write-access to the JDBC data source (see Credentials).
To save scored data to Datasphere, add the following parameters to the settings:
- data_store_id : str, the ID of the external data store connected to the Datasphere data source (see Database Connectivity).
- table : str, the name of specified database table.
- schema : str, the name of specified database schema.
- credential_id : str, the ID of the credentials holding information about a user with write-access to the Datasphere data source (see Credentials).
csv_settings (Optional[CsvSettings]) –
CSV intake and output settings. Supported options:
- delimiter : str (optional, default ,), fields are delimited by this character. Use the string tab to denote TSV (TAB separated values). Must be either a one-character string or the string tab.
- quotechar : str (optional, default "), fields containing the delimiter must be quoted using this character.
- encoding : str (optional, default utf-8), encoding for the CSV files. For example (but not limited to): shift_jis, latin_1 or mskanji.
timeseries_settings (Optional[TimeSeriesSettings]) –
Configuration for time-series scoring. Supported options:
- type : str, must be forecast or historical (default if not passed is forecast). forecast mode makes predictions using forecast_point or rows in the dataset without target. historical enables bulk prediction mode which calculates predictions for all possible forecast points and forecast distances in the dataset within predictions_start_date/predictions_end_date range.
- forecast_point : Optional[datetime.datetime], forecast point for the dataset, used for the forecast predictions, by default value will be inferred from the dataset. May be passed if timeseries_settings.type=forecast.
- predictions_start_date : Optional[datetime.datetime], used for historical predictions in order to override the date from which predictions should be calculated. By default, the value is inferred automatically from the dataset. May be passed if timeseries_settings.type=historical.
- predictions_end_date : Optional[datetime.datetime], used for historical predictions in order to override the date up to which predictions should be calculated. By default, the value is inferred automatically from the dataset. May be passed if timeseries_settings.type=historical.
- relax_known_in_advance_features_check : bool, (default False). If True, missing values in the known in advance features are allowed in the forecast window at the prediction time. If omitted or False, missing values are not allowed.
num_concurrent (Optional[int]) – Number of concurrent chunks to score simultaneously. Defaults to the available number of cores of the deployment. Lower it to leave resources for real-time scoring.
chunk_size (str or Optional[int]) – Which strategy should be used to determine the chunk size. Can be either a named strategy or a fixed size in bytes.
- auto: use fixed or dynamic based on flipper
- fixed: use 1MB for explanations, 5MB for regular requests
- dynamic: use dynamic chunk sizes
- int: use this many bytes per chunk
passthrough_columns (list[string] (optional)) – Keep these columns from the scoring dataset in the scored dataset. This is useful for correlating predictions with source data.
passthrough_columns_set (Optional[str]) – To pass through every column from the scoring dataset, set this to all. Takes precedence over passthrough_columns if set.
max_explanations (Optional[int]) – Compute prediction explanations for this number of features.
max_ngram_explanations (int or str (optional)) – Compute text explanations for this number of ngrams. Set to all to return all ngram explanations, or set to a positive integer value to limit the number of ngram explanations returned. By default, no ngram explanations are computed or returned.
threshold_high (Optional[float]) – Only compute prediction explanations for predictions above this threshold. Can be combined with threshold_low.
threshold_low (Optional[float]) – Only compute prediction explanations for predictions below this threshold. Can be combined with threshold_high.
explanations_mode (PredictionExplanationsMode, optional) – Mode of prediction explanations calculation for multiclass and clustering models. If not specified, the server default is to explain only the predicted class, identical to passing TopPredictionsMode(1).
prediction_warning_enabled (Optional[bool]) – Add prediction warnings to the scored data. Currently only supported for regression models.
include_prediction_status (Optional[bool]) – Include the prediction_status column in the output, defaults to False.
skip_drift_tracking (Optional[bool]) – Skips drift tracking on any predictions made from this job. This is useful when running non-production workloads to not affect drift tracking and cause unnecessary alerts. Defaults to False.
prediction_instance (Optional[PredictionInstance]) –
Defaults to instance specified by deployment or system configuration. Supported options:
- hostName : str
- sslEnabled : boolean (optional, default true). Set to false to run prediction requests from the batch prediction job without SSL.
- datarobotKey : Optional[str], if running a job against a prediction instance in the Managed AI Cloud, you must provide the organization level DataRobot-Key.
- apiKey : Optional[str], by default, prediction requests will use the API key of the user that created the job. This allows you to make requests on behalf of other users.
abort_on_error (Optional[bool]) – Default behavior is to abort the job if too many rows fail scoring. This will free up resources for other jobs that may score successfully. Set to False to unconditionally score every row no matter how many errors are encountered. Defaults to True.
column_names_remapping (Optional[Dict[str, str]]) – Mapping with column renaming for output table. Defaults to {}.
include_probabilities (Optional[bool]) – Flag that enables returning of all probability columns. Defaults to True.
include_probabilities_classes (list (optional)) – List the subset of classes if a user doesn’t want all the classes. Defaults to [].
download_timeout (Optional[int]) –

Added in version 2.22.

If using localFile output, wait this many seconds for the download to become available. See download().
download_read_timeout (Optional[int], default 660) –

Added in version 2.22.

If using localFile output, wait this many seconds for the server to respond between chunks.
upload_read_timeout (Optional[int], default 600) –

Added in version 2.28.

If using localFile intake, wait this many seconds for the server to respond after the whole dataset has been uploaded.
prediction_threshold (Optional[float]) –

Added in version 3.4.0.

Threshold is the point that sets the class boundary for a predicted value. The model classifies an observation below the threshold as FALSE, and an observation above the threshold as TRUE. In other words, DataRobot automatically assigns the positive class label to any prediction exceeding the threshold. This value can be set between 0.0 and 1.0.

Returns:

Instance of BatchPredictionJob

Return type:

BatchPredictionJob
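
As a rough sketch of how the settings dicts described above might be assembled, the helpers below build an S3 intake_settings dict and a JDBC output_settings dict from the documented keys and pass them to score(). The helper names (s3_intake, jdbc_output, submit_s3_to_jdbc) and any IDs are hypothetical. The statement_type value is left to the caller and must come from datarobot.enums.AVAILABLE_STATEMENT_TYPES. submit_s3_to_jdbc assumes the datarobot package is installed and a client is already configured.

```python
from typing import Any, Dict, Optional


def s3_intake(url: str, credential_id: Optional[str] = None,
              endpoint_url: Optional[str] = None) -> Dict[str, Any]:
    """Build an intake_settings dict for S3 from the documented keys."""
    settings: Dict[str, Any] = {"type": "s3", "url": url}
    # Optional keys are added only when given, so defaults apply otherwise.
    if credential_id is not None:
        settings["credential_id"] = credential_id
    if endpoint_url is not None:
        settings["endpoint_url"] = endpoint_url
    return settings


def jdbc_output(data_store_id: str, table: str, credential_id: str,
                statement_type: str, schema: Optional[str] = None,
                catalog: Optional[str] = None) -> Dict[str, Any]:
    """Build an output_settings dict for JDBC from the documented keys."""
    settings: Dict[str, Any] = {
        "type": "jdbc",
        "data_store_id": data_store_id,
        "table": table,
        "statement_type": statement_type,
        "credential_id": credential_id,
    }
    if schema is not None:
        settings["schema"] = schema
    if catalog is not None:
        settings["catalog"] = catalog
    return settings


def submit_s3_to_jdbc(deployment_id: str, intake: Dict[str, Any],
                      output: Dict[str, Any]):
    """Hypothetical wrapper: submit the job and return it without waiting."""
    # Imported lazily so the dict helpers above work without the package.
    from datarobot.models import BatchPredictionJob

    return BatchPredictionJob.score(
        deployment_id,
        intake_settings=intake,
        output_settings=output,
        passthrough_columns_set="all",  # keep every input column in the output
    )
```

Keeping the dict construction separate from the call makes it easy to inspect or log the settings before a job is submitted.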

classmethod apply_time_series_data_prep_and_score(deployment, intake_settings, timeseries_settings, **kwargs)

Prepare the dataset with time series data prep, create a new batch prediction job, upload the scoring dataset, and return a batch prediction job.

The supported intake_settings are of type localFile or dataset.

For timeseries_settings of type forecast, the forecast_point must be specified.

Refer to the datarobot.models.BatchPredictionJob.score() method for details on the other kwargs parameters.

Added in version v3.1.

Variables:

deployment (Deployment) – Deployment which will be used for scoring.
intake_settings (dict) –
A dict configuring where data is coming from. Supported options:
- type : str, either localFile, dataset
Note that to pass a dataset, you not only need to specify the type parameter as dataset, but you must also set the dataset parameter as a Dataset object.

To score from a local file, add this parameter to the settings:
- file : file-like object, string path to file or a pandas.DataFrame of scoring data.
timeseries_settings (dict) –
Configuration for time-series scoring. Supported options:
- type : str, must be forecast or historical (default if not passed is forecast). forecast mode makes predictions using forecast_point. historical enables bulk prediction mode which calculates predictions for all possible forecast points and forecast distances in the dataset within predictions_start_date/predictions_end_date range.
- forecast_point : Optional[datetime.datetime], forecast point for the dataset, used for the forecast predictions. Must be passed if timeseries_settings.type=forecast.
- predictions_start_date : Optional[datetime.datetime], used for historical predictions in order to override the date from which predictions should be calculated. By default, the value is inferred automatically from the dataset. May be passed if timeseries_settings.type=historical.
- predictions_end_date : Optional[datetime.datetime], used for historical predictions in order to override the date up to which predictions should be calculated. By default, the value is inferred automatically from the dataset. May be passed if timeseries_settings.type=historical.
- relax_known_in_advance_features_check : bool, (default False). If True, missing values in the known in advance features are allowed in the forecast window at the prediction time. If omitted or False, missing values are not allowed.

Returns:

Instance of BatchPredictionJob

Return type:

BatchPredictionJob

Raises:

InvalidUsageError – If the deployment does not support time series data prep, or if the intake type is not supported for time series data prep.
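
The timeseries_settings options above could be assembled along the following lines. This is only a sketch: forecast_settings, historical_settings, and prep_and_score are hypothetical helper names, the type check is a local convenience rather than documented library behavior, and prep_and_score assumes an installed datarobot package, a configured client, and a deployment that supports time series data prep.

```python
import datetime
from typing import Any, Dict, Optional


def forecast_settings(forecast_point: datetime.datetime) -> Dict[str, Any]:
    """Settings for forecast mode, where forecast_point must be supplied."""
    if not isinstance(forecast_point, datetime.datetime):
        raise TypeError("forecast_point must be a datetime.datetime")
    return {"type": "forecast", "forecast_point": forecast_point}


def historical_settings(
    start: Optional[datetime.datetime] = None,
    end: Optional[datetime.datetime] = None,
) -> Dict[str, Any]:
    """Settings for historical mode; omitted dates are inferred server-side."""
    settings: Dict[str, Any] = {"type": "historical"}
    if start is not None:
        settings["predictions_start_date"] = start
    if end is not None:
        settings["predictions_end_date"] = end
    return settings


def prep_and_score(deployment, path: str, settings: Dict[str, Any]):
    """Hypothetical wrapper around the classmethod documented above."""
    # Imported lazily so the settings helpers work without the package.
    from datarobot.models import BatchPredictionJob

    return BatchPredictionJob.apply_time_series_data_prep_and_score(
        deployment,
        intake_settings={"type": "localFile", "file": path},
        timeseries_settings=settings,
    )
```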

classmethod score_to_file(deployment, intake_path, output_path, **kwargs)

Create a new batch prediction job, upload the scoring dataset, and download the scored CSV file concurrently.

The call blocks until the entire file is scored.

Refer to the datarobot.models.BatchPredictionJob.score() method for details on the other kwargs parameters.

Variables:

deployment (Deployment or string ID) – Deployment which will be used for scoring.
intake_path (file-like object/string path to file/pandas.DataFrame) – Scoring data
output_path (str) – Filename to save the result under

Returns:

Instance of BatchPredictionJob

Return type:

BatchPredictionJob
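
A minimal sketch of scoring a tab-separated file with score_to_file(), passing csv_settings through the kwargs. The helper names are hypothetical. The delimiter check mirrors the rule stated for csv_settings (a one-character string or the string tab) and is a local convenience, not library behavior. score_tsv_to_file assumes an installed datarobot package and a configured client.

```python
from typing import Dict


def is_valid_delimiter(delimiter: str) -> bool:
    """True for a one-character delimiter or the special string 'tab'."""
    return delimiter == "tab" or len(delimiter) == 1


def make_csv_settings(delimiter: str = ",", quotechar: str = '"',
                      encoding: str = "utf-8") -> Dict[str, str]:
    """Build a csv_settings dict from the documented keys."""
    if not is_valid_delimiter(delimiter):
        raise ValueError("delimiter must be one character or the string 'tab'")
    return {"delimiter": delimiter, "quotechar": quotechar, "encoding": encoding}


def score_tsv_to_file(deployment_id: str, intake_path: str, output_path: str):
    """Hypothetical wrapper: score a TSV file and block until it is written."""
    # Imported lazily so the helpers above work without the package.
    from datarobot.models import BatchPredictionJob

    return BatchPredictionJob.score_to_file(
        deployment_id,
        intake_path,
        output_path,
        csv_settings=make_csv_settings(delimiter="tab"),
    )
```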

classmethod apply_time_series_data_prep_and_score_to_file(deployment, intake_path, output_path, timeseries_settings, **kwargs)

Prepare the input dataset with time series data prep. Then, create a new batch prediction job using the prepared AI catalog item as input and concurrently download the scored CSV file.

The function call will return when the entire file is scored.

For timeseries_settings of type forecast, the forecast_point must be specified.

Refer to the datarobot.models.BatchPredictionJob.score() method for details on the other kwargs parameters.

Added in version v3.1.

Variables:

deployment (Deployment) – The deployment which will be used for scoring.
intake_path (file-like object/string path to file/pandas.DataFrame) – The scoring data.
output_path (str) – The filename under which you save the result.
timeseries_settings (dict) –
Configuration for time-series scoring. Supported options:
- type : str, must be forecast or historical (default if not passed is forecast). forecast mode makes predictions using forecast_point. historical enables bulk prediction mode which calculates predictions for all possible forecast points and forecast distances in the dataset within predictions_start_date/predictions_end_date range.
- forecast_point : Optional[datetime.datetime], forecast point for the dataset, used for the forecast predictions. Must be passed if timeseries_settings.type=forecast.
- predictions_start_date : Optional[datetime.datetime], used for historical predictions in order to override the date from which predictions should be calculated. By default, the value is inferred automatically from the dataset. May be passed if timeseries_settings.type=historical.
- predictions_end_date : Optional[datetime.datetime], used for historical predictions in order to override the date up to which predictions should be calculated. By default, the value is inferred automatically from the dataset. May be passed if timeseries_settings.type=historical.
- relax_known_in_advance_features_check : bool, (default False). If True, missing values in the known in advance features are allowed in the forecast window at the prediction time. If omitted or False, missing values are not allowed.

Returns:

Instance of BatchPredictionJob.

Return type:

BatchPredictionJob

Raises:

InvalidUsageError – If the deployment does not support time series data prep.

classmethod score_s3(deployment, source_url, destination_url, credential=None, endpoint_url=None, **kwargs)

Create a new batch prediction job that reads the scoring dataset from S3 and writes the result back to S3.

This returns immediately after the job has been created. You must poll for job completion using get_status() or wait_for_completion() (see datarobot.models.Job).

Refer to the datarobot.models.BatchPredictionJob.score() method for details on the other kwargs parameters.

Variables:

deployment (Deployment or string ID) – Deployment which will be used for scoring.
source_url (str) – The URL for the prediction dataset (e.g.: s3://bucket/key)
destination_url (str) – The URL for the scored dataset (e.g.: s3://bucket/key)
credential (str or Credential (optional)) – The AWS Credential object or credential id
endpoint_url (Optional[str]) – Any non-default endpoint URL for S3 access (omit to use the default)

Returns:

Instance of BatchPredictionJob

Return type:

BatchPredictionJob

classmethod score_azure(deployment, source_url, destination_url, credential=None, **kwargs)

Create a new batch prediction job that reads the scoring dataset from Azure blob storage and writes the result back to Azure blob storage.

This returns immediately after the job has been created. You must poll for job completion using get_status() or wait_for_completion() (see datarobot.models.Job).

Refer to the datarobot.models.BatchPredictionJob.score() method for details on the other kwargs parameters.

Variables:

deployment (Deployment or string ID) – Deployment which will be used for scoring.
source_url (str) – The URL for the prediction dataset (e.g.: https://storage_account.blob.endpoint/container/blob_name)
destination_url (str) – The URL for the scored dataset (e.g.: https://storage_account.blob.endpoint/container/blob_name)
credential (str or Credential (optional)) – The Azure Credential object or credential id

Returns:

Instance of BatchPredictionJob

Return type:

BatchPredictionJob

classmethod score_gcp(deployment, source_url, destination_url, credential=None, **kwargs)

Create a new batch prediction job that reads the scoring dataset from Google Cloud Storage and writes the result back to Google Cloud Storage.

This returns immediately after the job has been created. You must poll for job completion using get_status() or wait_for_completion() (see datarobot.models.Job).

Refer to the datarobot.models.BatchPredictionJob.score() method for details on the other kwargs parameters.

Variables:

deployment (Deployment or string ID) – Deployment which will be used for scoring.
source_url (str) – The URL for the prediction dataset (e.g.: http(s)://storage.googleapis.com/[bucket]/[object])
destination_url (str) – The URL for the scored dataset (e.g.: http(s)://storage.googleapis.com/[bucket]/[object])
credential (str or Credential (optional)) – The GCP Credential object or credential id

Returns:

Instance of BatchPredictionJob

Return type:

BatchPredictionJob
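
Because score_s3(), score_azure(), and score_gcp() return immediately, the caller has to poll. One possible polling loop built on get_status() is sketched below. It assumes that the returned dict carries the job state under a "status" key and that COMPLETED, ABORTED, and FAILED are terminal states. Neither assumption is spelled out on this page, so adjust both to what your server returns. wait_until_done is a hypothetical helper name.

```python
import time
from typing import Any, Callable, Dict, FrozenSet

# ASSUMPTION: these names are treated as terminal job states.
TERMINAL_STATUSES: FrozenSet[str] = frozenset({"COMPLETED", "ABORTED", "FAILED"})


def wait_until_done(
    job: Any,
    poll_seconds: float = 5.0,
    max_polls: int = 120,
    terminal: FrozenSet[str] = TERMINAL_STATUSES,
    sleep: Callable[[float], None] = time.sleep,
) -> Dict[str, Any]:
    """Poll job.get_status() until a terminal state or until polls run out."""
    for _ in range(max_polls):
        status = job.get_status()
        # ASSUMPTION: the state is stored under the "status" key.
        if status.get("status") in terminal:
            return status
        sleep(poll_seconds)
    raise TimeoutError(f"job did not finish within {max_polls} polls")
```

The sleep function is injectable so the loop can be exercised in tests without real waiting.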

classmethod score_from_existing(batch_prediction_job_id)

Create a new batch prediction job based on the settings of a previously created one.

Variables:: batch_prediction_job_id (str) – ID of the previous batch prediction job
Returns:: Instance of BatchPredictionJob
Return type:: BatchPredictionJob

classmethod score_pandas(deployment, df, read_timeout=660, **kwargs)

Run a batch prediction job with a scoring dataset from a pandas DataFrame. The output from the prediction will be joined to the passed DataFrame and returned.

Use column_names_remapping to drop or rename columns in the output.

This method blocks until the job has completed or raises an exception on errors.

Refer to the datarobot.models.BatchPredictionJob.score() method for details on the other kwargs parameters.

Variables:

deployment (Deployment or string ID) – Deployment which will be used for scoring.
df (pandas.DataFrame) – The dataframe to score

Return type:

Tuple[BatchPredictionJob, DataFrame]

Returns:

BatchPredictionJob – Instance of BatchPredictionJob
pandas.DataFrame – The original dataframe merged with the predictions
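
The following sketch shows one way to prepare a column_names_remapping dict for score_pandas(). Renaming follows the parameter description in score(). Mapping a column to None in order to drop it is an assumption that this page does not confirm, so verify it against your client version. build_remapping and score_frame are hypothetical helper names, and score_frame assumes that datarobot and pandas are installed and a client is configured.

```python
from typing import Dict, Iterable, Optional


def build_remapping(
    rename: Optional[Dict[str, str]] = None,
    drop: Iterable[str] = (),
) -> Dict[str, Optional[str]]:
    """Combine renames and drops into a single remapping dict."""
    mapping: Dict[str, Optional[str]] = dict(rename or {})
    for column in drop:
        # ASSUMPTION: a None value removes the column from the output.
        mapping[column] = None
    return mapping


def score_frame(deployment_id: str, df, remapping: Dict[str, Optional[str]]):
    """Hypothetical wrapper: score a DataFrame and return the merged frame."""
    # Imported lazily so build_remapping works without the package.
    from datarobot.models import BatchPredictionJob

    # score_pandas blocks and returns a (job, DataFrame) tuple.
    _job, scored = BatchPredictionJob.score_pandas(
        deployment_id, df, column_names_remapping=remapping
    )
    return scored
```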

classmethod score_with_leaderboard_model(model, intake_settings=None, output_settings=None, csv_settings=None, timeseries_settings=None, passthrough_columns=None, passthrough_columns_set=None, max_explanations=None, max_ngram_explanations=None, explanation_algorithm=None, threshold_high=None, threshold_low=None, prediction_threshold=None, prediction_warning_enabled=None, include_prediction_status=False, abort_on_error=True, column_names_remapping=None, include_probabilities=True, include_probabilities_classes=None, download_timeout=120, download_read_timeout=660, upload_read_timeout=600, explanations_mode=None)

Creates a new batch prediction job for a Leaderboard model by uploading the scoring dataset. Returns a batch prediction job.

The default intake and output options are both localFile, which requires the caller to pass the file parameter and either download the results using the download() method afterwards or pass a path to a file where the scored data will be downloaded to.

Variables:

model (Model or DatetimeModel or string ID) – Model which will be used for scoring.
intake_settings (Optional[IntakeSettings]) –
A dict configuring where data is coming from. Supported options:
- type : str, either localFile, dataset, or dss.
Note that to pass a dataset, you not only need to specify the type parameter as dataset, but you must also set the dataset parameter as a dr.Dataset object.

To score from a local file, add this parameter to the settings:
- file : file-like object, string path to file or a pandas.DataFrame of scoring data.
To score a subset of training data, use the dss intake type and specify the following parameters:
- project_id : project to fetch training data from. Access to project is required.
- partition : subset of training data to score, one of datarobot.enums.TrainingDataSubsets.
output_settings (Optional[OutputSettings]) –
A dict configuring how scored data is to be saved. Supported options:
- type : str, localFile
To save scored data to a local file, add this parameter to the settings:
- path : Optional[str], the path to save the scored data as a CSV file. If a path is not specified, you must download the scored data yourself with job.download(). If a path is specified, the call blocks until the job is done. If there are no other jobs currently processing for the targeted prediction instance, uploading, scoring, and downloading will happen in parallel without waiting for a full job to complete. Otherwise, it will still block, but start downloading the scored data as soon as it starts generating data. This is the fastest method to get predictions.
csv_settings (Optional[CsvSettings]) –
CSV intake and output settings. Supported options:
- delimiter : str (optional, default ,), fields are delimited by this character. Use the string tab to denote TSV (TAB separated values). Must be either a one-character string or the string tab.
- quotechar : str (optional, default "), fields containing the delimiter must be quoted using this character.
- encoding : str (optional, default utf-8), encoding for the CSV files. For example (but not limited to): shift_jis, latin_1 or mskanji.
timeseries_settings (Optional[TimeSeriesSettings]) –
Configuration for time-series scoring. Supported options:
- type : str, must be forecast, historical, or training (default if not passed is forecast). forecast mode makes predictions using forecast_point or rows in the dataset without target. historical enables bulk prediction mode which calculates predictions for all possible forecast points and forecast distances in the dataset within predictions_start_date/predictions_end_date range. training mode is a special case for predictions on subsets of training data. Note that it can only be used with the dss intake type.
- forecast_point : Optional[datetime.datetime], forecast point for the dataset, used for the forecast predictions, by default value will be inferred from the dataset. May be passed if timeseries_settings.type=forecast.
- predictions_start_date : Optional[datetime.datetime], used for historical predictions in order to override the date from which predictions should be calculated. By default, the value is inferred automatically from the dataset. May be passed if timeseries_settings.type=historical.
- predictions_end_date : Optional[datetime.datetime], used for historical predictions in order to override the date up to which predictions should be calculated. By default, the value is inferred automatically from the dataset. May be passed if timeseries_settings.type=historical.
- relax_known_in_advance_features_check : bool, (default False). If True, missing values in the known in advance features are allowed in the forecast window at the prediction time. If omitted or False, missing values are not allowed.
passthrough_columns (list[string] (optional)) – Keep these columns from the scoring dataset in the scored dataset. This is useful for correlating predictions with source data.
passthrough_columns_set (Optional[str]) – To pass through every column from the scoring dataset, set this to all. Takes precedence over passthrough_columns if set.
max_explanations (Optional[int]) – Compute prediction explanations for this number of features.
max_ngram_explanations (int or str (optional)) – Compute text explanations for this number of ngrams. Set to all to return all ngram explanations, or set to a positive integer value to limit the number of ngram explanations returned. By default, no ngram explanations are computed or returned.
threshold_high (Optional[float]) – Only compute prediction explanations for predictions above this threshold. Can be combined with threshold_low.
threshold_low (Optional[float]) – Only compute prediction explanations for predictions below this threshold. Can be combined with threshold_high.
explanations_mode (PredictionExplanationsMode, optional) – Mode of prediction explanations calculation for multiclass and clustering models. If not specified, the server default is to explain only the predicted class, identical to passing TopPredictionsMode(1).
prediction_warning_enabled (Optional[bool]) – Add prediction warnings to the scored data. Currently only supported for regression models.
include_prediction_status (Optional[bool]) – Include the prediction_status column in the output, defaults to False.
abort_on_error (Optional[bool]) – Default behavior is to abort the job if too many rows fail scoring. This will free up resources for other jobs that may score successfully. Set to False to unconditionally score every row no matter how many errors are encountered. Defaults to True.
column_names_remapping (Optional[Dict]) – Mapping with column renaming for output table. Defaults to {}.
include_probabilities (Optional[bool]) – Flag that enables returning of all probability columns. Defaults to True.
include_probabilities_classes (list (optional)) – List the subset of classes if you do not want all the classes. Defaults to [].
download_timeout (Optional[int]) –

Added in version 2.22.

If using localFile output, wait this many seconds for the download to become available. See download().
download_read_timeout (int (optional, default 660)) –

Added in version 2.22.

If using localFile output, wait this many seconds for the server to respond between chunks.
upload_read_timeout (int (optional, default 600)) –

Added in version 2.28.

If using localFile intake, wait this many seconds for the server to respond after the whole dataset has been uploaded.
prediction_threshold (Optional[float]) –

Added in version 3.4.0.

Threshold is the point that sets the class boundary for a predicted value. The model classifies an observation below the threshold as FALSE, and an observation above the threshold as TRUE. In other words, DataRobot automatically assigns the positive class label to any prediction exceeding the threshold. This value can be set between 0.0 and 1.0.

Returns:

Instance of BatchPredictionJob

Return type:

BatchPredictionJob
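
To illustrate the dss intake type described above, here is a small sketch that builds the intake dict and checks the documented rule that the training time series mode is only valid together with dss intake. The helper names are hypothetical and the check is a local convenience, not library behavior. The partition value must come from datarobot.enums.TrainingDataSubsets, and score_training_subset assumes an installed datarobot package and a configured client.

```python
from typing import Any, Dict, Optional


def dss_intake(project_id: str, partition: str) -> Dict[str, str]:
    """Build an intake_settings dict for scoring a training data subset."""
    return {"type": "dss", "project_id": project_id, "partition": partition}


def training_mode_is_valid(
    intake_settings: Dict[str, Any],
    timeseries_settings: Optional[Dict[str, Any]] = None,
) -> bool:
    """False only when type 'training' is combined with a non-dss intake."""
    ts_type = (timeseries_settings or {}).get("type", "forecast")
    if ts_type != "training":
        return True
    return intake_settings.get("type") == "dss"


def score_training_subset(model_id: str, project_id: str, partition: str):
    """Hypothetical wrapper: score a training subset with a Leaderboard model."""
    # Imported lazily so the helpers above work without the package.
    from datarobot.models import BatchPredictionJob

    intake = dss_intake(project_id, partition)
    return BatchPredictionJob.score_with_leaderboard_model(
        model_id, intake_settings=intake
    )
```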

classmethod get(batch_prediction_job_id)

Get batch prediction job

Variables:: batch_prediction_job_id (str) – ID of batch prediction job
Returns:: Instance of BatchPredictionJob
Return type:: BatchPredictionJob

download(fileobj, timeout=120, read_timeout=660)

Downloads the CSV result of a prediction job.

Variables:

fileobj (file-like object) – A file-like object where the CSV prediction results will be written to. Examples include an in-memory buffer (e.g., io.BytesIO) or a file on disk (opened for binary writing).
timeout (int (optional, default 120)) –

Added in version 2.22.

Seconds to wait for the download to become available.

The download will not be available before the job has started processing. In case other jobs are occupying the queue, processing may not start immediately.

If the timeout is reached, the job will be aborted and RuntimeError is raised.

Set to -1 to wait infinitely.
read_timeout (int (optional, default 660)) –

Added in version 2.22.

Seconds to wait for the server to respond between chunks.

Return type:

None
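
A minimal sketch of download() writing into an in-memory buffer, as suggested in the fileobj description. download_to_bytes is a hypothetical helper name. The job argument is expected to be a BatchPredictionJob, for example one returned by score() without an output path or by get().

```python
import io
from typing import Any


def download_to_bytes(job: Any, timeout: int = 120,
                      read_timeout: int = 660) -> bytes:
    """Download the scored CSV of a job into memory and return the raw bytes."""
    buffer = io.BytesIO()
    # download() writes into the file-like object and returns None.
    job.download(buffer, timeout=timeout, read_timeout=read_timeout)
    return buffer.getvalue()
```

For large result sets, a file opened with open(path, "wb") in place of io.BytesIO avoids holding the whole CSV in memory.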

delete(ignore_404_errors=False)

Cancel this job. If this job has not finished running, it will be removed and canceled.

Return type:: None

get_status()

Get status of batch prediction job

Returns:: Dict with job status
Return type:: BatchPredictionJob status data

classmethod list_by_status(statuses=None)

Get the collection of jobs that match a specific set of statuses.

Variables:: statuses – List of statuses to filter jobs by ([ABORTED|COMPLETED…]). If statuses is not provided, all jobs for the user are returned.
Returns:: List of job statuses dicts with specific statuses
Return type:: BatchPredictionJob statuses

class datarobot.models.BatchPredictionJobDefinition

classmethod get(batch_prediction_job_definition_id)

Get batch prediction job definition

Variables:: batch_prediction_job_definition_id (str) – ID of batch prediction job definition
Returns:: Instance of BatchPredictionJobDefinition
Return type:: BatchPredictionJobDefinition

Examples

>>> import datarobot as dr
>>> definition = dr.BatchPredictionJobDefinition.get('60912e09fd1f04e832a575c1')
>>> definition
BatchPredictionJobDefinition(60912e09fd1f04e832a575c1)

classmethod list(search_name=None, deployment_id=None, limit=<datarobot.models.batch_prediction_job.MissingType object>, offset=0)

Get all job definitions

Parameters:

search_name (Optional[str]) – String for filtering job definitions. Job definitions whose name contains the string will be returned. If not specified, all available job definitions will be returned.
deployment_id (str) – The ID of the deployment the record belongs to.
limit (Optional[int]) – 0 by default. At most this many results are returned.
offset (Optional[int]) – This many results will be skipped.

Returns:

List of job definitions the user has access to see

Return type:

List[BatchPredictionJobDefinition]

Examples

>>> import datarobot as dr
>>> definitions = dr.BatchPredictionJobDefinition.list()
>>> definitions
[
    BatchPredictionJobDefinition(60912e09fd1f04e832a575c1),
    BatchPredictionJobDefinition(6086ba053f3ef731e81af3ca)
]

classmethod create(enabled, batch_prediction_job, name=None, schedule=None)

Creates a new batch prediction job definition to be run either at a scheduled interval or as a manual run.

Variables:

enabled (bool (default False)) – Whether or not the definition should be active on a scheduled basis. If True, schedule is required.
batch_prediction_job (dict) – The job specifications for your batch prediction job. It requires the same job input parameters as used with score(), but it does not start a scoring job; it only stores the specification as a definition for later use.
name (Optional[str]) – The name you want your job to be identified with. Must be unique across the organization’s existing jobs. If you don’t supply a name, a random one will be generated for you.
schedule (Optional[Dict]) –
The schedule payload defines the intervals at which the job should run. Its fields can be combined in various ways to construct complex schedules if needed. For each field, you can supply either ["*"], denoting every unit of that time denomination, or an array of integers (e.g. [1, 2, 3]) to select specific values.

The schedule payload is split into the following items:

Minute: The minute(s) of the day that the job will run. Allowed values are either ["*"] meaning every minute of the day or [0 ... 59].

Hour: The hour(s) of the day that the job will run. Allowed values are either ["*"] meaning every hour of the day or [0 ... 23].

Day of Month: The date(s) of the month that the job will run. Allowed values are either [1 ... 31] or ["*"] for all days of the month. This field is additive with dayOfWeek, meaning the job will run both on the date(s) defined in this field and the day specified by dayOfWeek (for example, dates 1st, 2nd, 3rd, plus every Tuesday). If dayOfMonth is set to ["*"] and dayOfWeek is defined, the scheduler will trigger on every day of the month that matches dayOfWeek (for example, Tuesday the 2nd, 9th, 16th, 23rd, 30th). Invalid dates such as February 31st are ignored.

Month: The month(s) of the year that the job will run. Allowed values are either [1 ... 12] or ["*"] for all months of the year. Strings, either 3-letter abbreviations or the full name of the month, can be used interchangeably (e.g., “jan” or “october”). Months that are not compatible with dayOfMonth are ignored, for example {"dayOfMonth": [31], "month":["feb"]}.

Day of Week: The day(s) of the week that the job will run. Allowed values are [0 ... 6], where Sunday=0, or ["*"] for all days of the week. Strings, either 3-letter abbreviations or the full name of the day, can be used interchangeably (e.g., “sunday”, “Sunday”, “sun”, or “Sun” all map to [0]). This field is additive with dayOfMonth, meaning the job will run both on the date specified by dayOfMonth and the day defined in this field.
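
The allowed values listed above can be checked client-side before a schedule is submitted. The sketch below converts month and day names to integers and validates ranges. It is not part of the DataRobot client: normalize_field and the name tables are hypothetical, and case-insensitive matching of month names is an assumption (the text above only shows it for day names).

```python
MONTH_NAMES = ["january", "february", "march", "april", "may", "june", "july",
               "august", "september", "october", "november", "december"]
DAY_NAMES = ["sunday", "monday", "tuesday", "wednesday", "thursday",
             "friday", "saturday"]  # Sunday=0, as described above


def _to_int(value, names, first):
    """Map an int, or an English full name / 3-letter abbreviation, to an int."""
    if isinstance(value, int):
        return value
    key = str(value).lower()
    for index, name in enumerate(names):
        if key in (name, name[:3]):
            return index + first
    raise ValueError(f"unrecognized value: {value!r}")


def normalize_field(field, names, first, last):
    """Return ["*"] unchanged, else a sorted list of validated integers."""
    if field == ["*"]:
        return field
    numbers = sorted({_to_int(value, names, first) for value in field})
    for number in numbers:
        if not first <= number <= last:
            raise ValueError(f"{number} is outside [{first} ... {last}]")
    return numbers


print(normalize_field(["Sun", "monday"], DAY_NAMES, 0, 6))     # [0, 1]
print(normalize_field(["jan", "october"], MONTH_NAMES, 1, 12))  # [1, 10]
print(normalize_field([30, 59], [], 0, 59))                     # [30, 59]
```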

Returns:

Instance of BatchPredictionJobDefinition

Return type:

BatchPredictionJobDefinition

Examples

>>> import datarobot as dr
>>> job_spec = {
...    "num_concurrent": 4,
...    "deployment_id": "foobar",
...    "intake_settings": {
...        "url": "s3://foobar/123",
...        "type": "s3",
...        "format": "csv"
...    },
...    "output_settings": {
...        "url": "s3://foobar/123",
...        "type": "s3",
...        "format": "csv"
...    },
...}
>>> schedule = {
...    "day_of_week": [
...        1
...    ],
...    "month": [
...        "*"
...    ],
...    "hour": [
...        16
...    ],
...    "minute": [
...        0
...    ],
...    "day_of_month": [
...        1
...    ]
...}
>>> definition = dr.BatchPredictionJobDefinition.create(
...    enabled=False,
...    batch_prediction_job=job_spec,
...    name="some_definition_name",
...    schedule=schedule
... )
>>> definition
BatchPredictionJobDefinition(60912e09fd1f04e832a575c1)
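
The schedule in the example above sets both day_of_week and day_of_month, so under the additive rule it would fire at 16:00 on the 1st of every month and also every Monday. The sketch below models that rule locally to make the behavior concrete. It is an interpretation of the description, not the scheduler's actual code: schedule_matches is a hypothetical helper, it handles only integer values and ["*"], and treating a ["*"] day_of_week with a specific day_of_month as "those dates only" is an assumption made by symmetry.

```python
from datetime import datetime


def _matches(field, value):
    """True when a schedule field is ["*"] or lists the given integer."""
    return field == ["*"] or value in field


def schedule_matches(schedule, when):
    """Return True if `when` falls on a minute selected by `schedule`."""
    day_of_week = when.isoweekday() % 7  # convert to Sunday=0 ... Saturday=6
    dom_any = schedule["day_of_month"] == ["*"]
    dow_any = schedule["day_of_week"] == ["*"]
    if dom_any and dow_any:
        day_ok = True
    elif dom_any:
        day_ok = day_of_week in schedule["day_of_week"]
    elif dow_any:  # assumption: symmetric to the documented case
        day_ok = when.day in schedule["day_of_month"]
    else:  # additive: either field may select the day
        day_ok = (when.day in schedule["day_of_month"]
                  or day_of_week in schedule["day_of_week"])
    return (day_ok
            and _matches(schedule["month"], when.month)
            and _matches(schedule["hour"], when.hour)
            and _matches(schedule["minute"], when.minute))


schedule = {"day_of_week": [1], "month": ["*"], "hour": [16],
            "minute": [0], "day_of_month": [1]}
print(schedule_matches(schedule, datetime(2024, 1, 8, 16, 0)))  # Monday: True
print(schedule_matches(schedule, datetime(2024, 2, 1, 16, 0)))  # the 1st: True
print(schedule_matches(schedule, datetime(2024, 1, 2, 16, 0)))  # neither: False
```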

update(enabled, batch_prediction_job=None, name=None, schedule=None)

Updates a job definition with the changed specs.

Takes the same input as create().

Variables:

enabled (bool (default False)) – Same as enabled in create().
batch_prediction_job (dict) – Same as batch_prediction_job in create().
name (Optional[str]) – Same as name in create().
schedule (dict) – Same as schedule in create().

Returns:

Instance of the updated BatchPredictionJobDefinition

Return type:

BatchPredictionJobDefinition

Examples

>>> import datarobot as dr
>>> job_spec = {
...    "num_concurrent": 5,
...    "deployment_id": "foobar_new",
...    "intake_settings": {
...        "url": "s3://foobar/123",
...        "type": "s3",
...        "format": "csv"
...    },
...    "output_settings": {
...        "url": "s3://foobar/123",
...        "type": "s3",
...        "format": "csv"
...    },
...}
>>> schedule = {
...    "day_of_week": [
...        1
...    ],
...    "month": [
...        "*"
...    ],
...    "hour": [
...        "*"
...    ],
...    "minute": [
...        30, 59
...    ],
...    "day_of_month": [
...        1, 2, 6
...    ]
...}
>>> definition = dr.BatchPredictionJobDefinition.get('60912e09fd1f04e832a575c1')
>>> definition = definition.update(
...    enabled=False,
...    batch_prediction_job=job_spec,
...    name="updated_definition_name",
...    schedule=schedule
... )
>>> definition
BatchPredictionJobDefinition(60912e09fd1f04e832a575c1)

run_on_schedule(schedule)

Sets the run schedule of an already created job definition.

If the job was previously not enabled, this will also set the job to enabled.

Variables:: schedule (dict) – Same as schedule in create().
Returns:: Instance of the updated BatchPredictionJobDefinition with the new / updated schedule.
Return type:: BatchPredictionJobDefinition

Examples

>>> import datarobot as dr
>>> definition = dr.BatchPredictionJobDefinition.create('...')
>>> schedule = {
...    "day_of_week": [
...        1
...    ],
...    "month": [
...        "*"
...    ],
...    "hour": [
...        "*"
...    ],
...    "minute": [
...        30, 59
...    ],
...    "day_of_month": [
...        1, 2, 6
...    ]
...}
>>> definition.run_on_schedule(schedule)
BatchPredictionJobDefinition(60912e09fd1f04e832a575c1)

run_once()

Manually submits a batch prediction job to the queue, based on an already created job definition.

Returns:: Instance of BatchPredictionJob
Return type:: BatchPredictionJob

Examples

>>> import datarobot as dr
>>> definition = dr.BatchPredictionJobDefinition.create('...')
>>> job = definition.run_once()
>>> job.wait_for_completion()

delete()

Deletes the job definition and disables any future schedules of this job, if any. If a scheduled job is currently running, it will not be cancelled.

Return type:: None

Examples

>>> import datarobot as dr
>>> definition = dr.BatchPredictionJobDefinition.get('5a8ac9ab07a57a0001be501f')
>>> definition.delete()