Batch Prediction Job Definitions¶
To submit a working Batch Prediction job, you must supply a variety of elements to the datarobot.models.BatchPredictionJob.score()
request payload depending on what type of prediction is required. Additionally, you must consider the type of intake
and output adapters used for a given job.
Every time a new Batch Prediction is created, the same amount of information must be stored somewhere outside of DataRobot and re-submitted every time.
For example, a request could look like:
import datarobot as dr
deployment_id = "5dc5b1015e6e762a6241f9aa"
job = dr.BatchPredictionJob.score(
deployment_id,
intake_settings={
"type": "s3",
"url": "s3://bucket/container/file.csv",
"credential_id": "5dc5b1015e6e762a6241f9bb"
},
output_settings={
"type": "s3",
"url": "s3://bucket/container/output.csv",
"credential_id": "5dc5b1015e6e762a6241f9bb"
},
)
job.wait_for_completion()
with open("./predicted.csv", "wb") as f:
job.download(f)
Job Definitions¶
If your use case requires the same, or close to the same, type of prediction to be done multiple times, you can choose to create a Job Definition of the Batch Prediction job and store this inside DataRobot for future use.
The method for creating job definitions is identical to the existing datarobot.models.BatchPredictionJob.score()
method,
except for the addition of a enabled
, name
and schedule
parameter: datarobot.models.BatchPredictionJobDefinition.create()
>>> import datarobot as dr
>>> job_spec = {
... "num_concurrent": 4,
... "deployment_id": "5dc5b1015e6e762a6241f9aa",
... "intake_settings": {
... "url": "s3://foobar/123",
... "type": "s3",
... "format": "csv",
... "credential_id": "5dc5b1015e6e762a6241f9bb"
... },
... "output_settings": {
... "url": "s3://foobar/123",
... "type": "s3",
... "format": "csv",
... "credential_id": "5dc5b1015e6e762a6241f9bb"
... },
...}
>>> definition = BatchPredictionJobDefinition.create(
... enabled=False,
... batch_prediction_job=job_spec,
... name="some_definition_name",
... schedule=None
... )
>>> definition
BatchPredictionJobDefinition(foobar)
Note
The name
parameter must be unique across your organization. If you attempt to create multiple definitions
with the same name, the request will fail. If you wish to free up a name, you must first datarobot.models.BatchPredictionJobDefinition.delete()
the existing definition before creating this one. Alternatively you can just datarobot.models.BatchPredictionJobDefinition.update()
the existing definition with a new name.
Executing a job definition¶
Manual job execution¶
If you wish to submit a stored job definition for scoring, you can either choose to do so on a scheduled basis, described
below, or by manually submitting the definition ID using datarobot.models.BatchPredictionJobDefinition.run_once()
,
as such:
>>> import datarobot as dr
>>> definition = dr.BatchPredictionJob.get("5dc5b1015e6e762a6241f9aa")
>>> job = definition.run_once()
>>> job.wait_for_completion()
Scheduled job execution¶
A Scheduled Batch Prediction job works just like a regular Batch Prediction job, except DataRobot handles the execution of the job.
In order to schedule the execution of a Batch Prediction job, a definition must first be created, using
datarobot.models.BatchPredictionJobDefinition.create()
, or updated, using
datarobot.models.BatchPredictionJobDefinition.update()
, where enabled
is set to True
and a schedule
payload is provided.
Alternatively, you can use a short-hand version with datarobot.models.BatchPredictionJobDefinition.run_on_schedule()
as such:
>>> import datarobot as dr
>>> schedule = {
... "day_of_week": [
... 1
... ],
... "month": [
... "*"
... ],
... "hour": [
... 16
... ],
... "minute": [
... 0
... ],
... "day_of_month": [
... 1
... ]
...}
>>> definition = dr.BatchPredictionJob.get("5dc5b1015e6e762a6241f9aa")
>>> job = definition.run_on_schedule(schedule)
If the created job was not enabled previously, this method will also enable it.
The Schedule
payload¶
The schedule
payload defines at what intervals the job should run, which can be combined in various ways to construct
complex scheduling terms if needed. In all of the elements in the objects, you can supply either an asterisk ["*"]
denoting “every” time denomination or an array of integers (e.g. [1, 2, 3]
) to define a specific interval.
Key | Possible values | Example | Description |
---|---|---|---|
minute | ["*"] or [0 ... 59] |
[15, 30, 45] |
The job will run at these minute values for every hour of the day. |
hour | ["*"] or [0 ... 23] |
[12,23] |
The hour(s) of the day that the job will run. |
month | ["*"] or [1 ... 12] |
["jan"] |
Strings, either 3-letter abbreviations or the full name of the month, can be used interchangeably (e.g., “jan” or “october”). Months that are not compatible with |
day_of_week | ["*"] or [0 ... 6] where (Sunday=0) |
["sun"] |
The day(s) of the week that the job will run. Strings, either 3-letter abbreviations or the full name of the day, can be used interchangeably (e.g., “sunday”, “Sunday”, “sun”, or “Sun”, all map to NOTE: This field is additive with |
day_of_month | ["*"] or [1 ... 31] |
[1, 25] |
The date(s) of the month that the job will run. Allowed values are either NOTE: This field is additive with Invalid dates such as February 31st are ignored. |
Disabling a scheduled job¶
Job definitions are only be executed by the scheduler if enabled
is set to True
. If you have a job definition
that was previously running as a scheduled job, but should now be stopped, simply
datarobot.models.BatchPredictionJobDefinition.delete()
to remove it completely, or datarobot.models.BatchPredictionJobDefinition.update()
it with enabled=False
if you want to keep the definition, but stop the scheduled job from executing at intervals.
If a job is currently running, this will finish execution regardless.
>>> import datarobot as dr
>>> definition = dr.BatchPredictionJobDefinition.get("5dc5b1015e6e762a6241f9aa")
>>> definition.delete()