Datetime Partitioned Projects
If your dataset models events that take place over time, datetime partitioning may be appropriate. Datetime partitioning ensures that when the dataset is partitioned for training and validation, rows are ordered according to the value of the date partition feature.
Setting Up a Datetime Partitioned Project
After creating a project and before setting the target, create a
DatetimePartitioningSpecification to define how the project should
be partitioned. By passing the specification into
DatetimePartitioning.generate, the full
partitioning can be previewed before finalizing the partitioning. After verifying that the
partitioning is correct for the project dataset, pass the specification into
Project.set_target via the partitioning_method argument. Once modeling begins, the project can be used as normal.
The following code block shows the basic workflow for creating datetime partitioned projects.
    import datarobot as dr

    project = dr.Project.create('some_data.csv')
    spec = dr.DatetimePartitioningSpecification('my_date_column')
    # can customize the spec as needed

    partitioning_preview = dr.DatetimePartitioning.generate(project.id, spec)
    # the preview generated is based on the project's data
    print(partitioning_preview.to_dataframe())

    # hmm ... I want more backtests
    spec.number_of_backtests = 5
    partitioning_preview = dr.DatetimePartitioning.generate(project.id, spec)
    print(partitioning_preview.to_dataframe())

    # looks good
    project.set_target('target_column', partitioning_method=spec)
Modeling with a Datetime Partitioned Project
While Model objects can still be used to interact with the project,
DatetimeModel objects, which are only retrievable from datetime partitioned
projects, provide more information, including which date ranges and how many rows are used in
training and scoring the model, as well as scores and statuses for individual backtests.
The autopilot workflow is the same as for other projects, but to manually train a model,
Model.train_datetime should be used in place of
Model.train. To create frozen models,
DatetimeModel.request_frozen_datetime_model should be used in place of
Model.request_frozen_model. Unlike other projects, to trigger computation of
scores for all backtests use
DatetimeModel.score_backtests instead of using the scoring_type
argument in the train methods.
Dates, Datetimes, and Durations
When specifying a date or datetime for datetime partitioning, the client expects to receive and
will return a
datetime. Timezones may be specified, and will be assumed to be UTC if left
unspecified. All dates returned from DataRobot are in UTC with a timezone specified.
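The UTC default described above can be mirrored on the caller's side before any values are handed to the client. The following is a minimal sketch using only the standard library; `as_utc` is a hypothetical helper name and is not part of the DataRobot client.

```python
from datetime import datetime, timedelta, timezone


def as_utc(value):
    """Return `value` as a timezone-aware datetime in UTC.

    Naive datetimes are assumed to already be in UTC, mirroring the
    default described above; aware datetimes are converted to UTC.
    """
    if value.tzinfo is None:
        return value.replace(tzinfo=timezone.utc)
    return value.astimezone(timezone.utc)


# A naive value and an aware value that refer to the same instant.
naive = datetime(2015, 3, 24, 12, 0)
aware = datetime(2015, 3, 24, 14, 0, tzinfo=timezone(timedelta(hours=2)))

print(as_utc(naive).isoformat())
print(as_utc(aware).isoformat())
```

Normalizing explicitly like this makes it easier to compare values you send with the UTC values that come back.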
Datetimes may include a time, or specify only a date; however, they may have a non-zero time component only if the partition column included a time component in its date format. If the partition column included only dates like “24/03/2015”, then the time component of any datetimes, if present, must be zero.
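The rule about time components can be checked locally before a datetime is used in a partitioning specification. This is a sketch under stated assumptions: `check_partition_datetime` and the `column_has_time` flag are hypothetical names, and the check looks at the time of day in the datetime's own timezone.

```python
from datetime import datetime, time


def has_zero_time(value):
    """Return True if the time-of-day part of `value` is exactly midnight."""
    return value.time() == time(0, 0)


def check_partition_datetime(value, column_has_time):
    """Raise ValueError if `value` carries a time the column cannot represent.

    `column_has_time` is a caller-supplied flag saying whether the partition
    column's date format includes a time component.
    """
    if not column_has_time and not has_zero_time(value):
        raise ValueError(
            "partition column holds dates only; time component must be zero"
        )
    return value


# Date-only column: midnight is accepted, 09:30 is rejected.
check_partition_datetime(datetime(2015, 3, 24), column_has_time=False)
try:
    check_partition_datetime(datetime(2015, 3, 24, 9, 30), column_has_time=False)
except ValueError as error:
    print("rejected:", error)
```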
When date ranges are specified with a start and an end date, the end date is exclusive, so only dates earlier than the end date are included, but the start date is inclusive, so dates equal to or later than the start date are included. If the start and end date are the same, then no dates are included in the range.
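The inclusive-start, exclusive-end convention is a half-open interval. As an illustration only (`in_range` is a hypothetical helper, not a client function), membership can be written as a single chained comparison:

```python
from datetime import datetime


def in_range(value, start, end):
    """Return True if start <= value < end (start inclusive, end exclusive)."""
    return start <= value < end


start = datetime(2015, 1, 1)
end = datetime(2015, 2, 1)

print(in_range(datetime(2015, 1, 1), start, end))   # start date is included
print(in_range(datetime(2015, 1, 31), start, end))  # inside the range
print(in_range(datetime(2015, 2, 1), start, end))   # end date is excluded
print(in_range(start, start, start))                # start == end: empty range
```

One consequence of this convention is that consecutive ranges can share a boundary date without any row falling into both.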
Durations are specified using a subset of ISO 8601. Durations will be of the form PnYnMnDTnHnMnS where each “n” may be replaced with an integer value. Within the duration string,
- nY represents the number of years
- the nM following the “P” represents the number of months
- nD represents the number of days
- nH represents the number of hours
- the nM following the “T” represents the number of minutes
- nS represents the number of seconds
and “P” is used to indicate that the string represents a period and “T” indicates the beginning of the time component of the string. Any section with a value of 0 may be excluded. As with datetimes, if the partition column did not include a time component in its date format, the time component of any duration must be either unspecified or consist only of zeros.
Example durations:
- “P3Y6M” (three years, six months)
- “P1Y0M0DT0H0M0S” (one year)
- “P1Y5DT10H” (one year, 5 days, 10 hours)
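To make the format concrete, the sketch below parses strings of the form described above with a regular expression. It is an illustration built only from the description in this section, not the parser DataRobot uses; `parse_duration` and `has_time_component` are hypothetical names, and rejecting a bare “P” or a trailing “T” is a choice made for this sketch.

```python
import re

# Each section is optional; the "T" marker is only allowed before time sections.
_DURATION_RE = re.compile(
    r"^P(?:(?P<years>\d+)Y)?(?:(?P<months>\d+)M)?(?:(?P<days>\d+)D)?"
    r"(?:T(?:(?P<hours>\d+)H)?(?:(?P<minutes>\d+)M)?(?:(?P<seconds>\d+)S)?)?$"
)


def parse_duration(text):
    """Split a PnYnMnDTnHnMnS string into a dict of integer parts."""
    match = _DURATION_RE.match(text)
    # A string ending in "P" or "T" has a marker with no section after it.
    if match is None or text.endswith(("P", "T")):
        raise ValueError(f"not a valid duration string: {text!r}")
    return {name: int(value or 0) for name, value in match.groupdict().items()}


def has_time_component(text):
    """Return True if any of hours, minutes, or seconds is non-zero."""
    parts = parse_duration(text)
    return any(parts[name] for name in ("hours", "minutes", "seconds"))


print(parse_duration("P3Y6M"))
print(parse_duration("P1Y5DT10H"))
print(has_time_component("P1Y0M0DT0H0M0S"))
```

The position of “M” relative to “T” is what separates months from minutes, which is why the pattern keeps the date and time sections in two distinct groups.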
datarobot.helpers.partitioning_methods.construct_duration_string is a helper method that can be used to construct appropriate duration strings.
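The helper's signature is not shown in this section, so the following is a standalone sketch of the same idea rather than the library's implementation. `build_duration` is a hypothetical function that assembles a duration string from integer parts and omits sections whose value is 0.

```python
def build_duration(years=0, months=0, days=0, hours=0, minutes=0, seconds=0):
    """Assemble a PnYnMnDTnHnMnS string, leaving out zero-valued sections."""
    parts = (years, months, days, hours, minutes, seconds)
    if any(not isinstance(n, int) or n < 0 for n in parts):
        raise ValueError("duration parts must be non-negative integers")

    date_part = "".join(
        f"{n}{unit}" for n, unit in ((years, "Y"), (months, "M"), (days, "D")) if n
    )
    time_part = "".join(
        f"{n}{unit}" for n, unit in ((hours, "H"), (minutes, "M"), (seconds, "S")) if n
    )

    if not date_part and not time_part:
        # Nothing to omit: spell out a zero-length duration in full.
        return "P0Y0M0DT0H0M0S"
    # The "T" marker appears only when a time section follows it.
    return "P" + date_part + ("T" + time_part if time_part else "")


print(build_duration(years=3, months=6))
print(build_duration(years=1, days=5, hours=10))
print(build_duration(minutes=30))
```

In practice, prefer the library helper named above; this sketch only shows how the sections fit together.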