Create a Dataset from DataSource:
Added support for Custom Model Dependency Management. Please see custom model documentation. New features added:
- Added new argument
- New fields
- New class
datarobot.CustomModelVersionDependencyBuild to prepare custom model versions with dependencies.
- Made argument
CustomModelTest.create optional to enable using custom model versions with dependencies
- New field
image_type added to class
Deployment.create_from_custom_model_version can be used to create a deployment from a custom model version.
- Added new argument
Added new parameters for starting and re-running Autopilot with customizable settings within
Added a new method to trigger Feature Impact calculation for a Custom Inference Image:
Added new method to retrieve number of iterations trained for early stopping models. Currently supports only tree-based models.
- A description can now be added or updated for a project.
- Added new parameters read_timeout and max_wait to method
Dataset.create_from_file. Values larger than the default can be specified for both to avoid timeouts when uploading large files.
- Added new parameter metric to
- Added new parameter timeout to
BatchPredictionJob.download to indicate how many seconds to wait for the download to start (in case the job doesn’t start processing immediately). Set to
-1 to disable. This parameter can also be sent as download_timeout to
BatchPredictionJob.score. If the timeout occurs, the pending job will be aborted.
- Added new parameter read_timeout to
BatchPredictionJob.download to indicate how many seconds to wait between each downloaded chunk. This parameter can also be sent as download_read_timeout to
- Added parameter
BatchPredictionJob to both intake and output adapters for type jdbc.
- Consider blenders in recommendation can now be specified in
AdvancedOptions. Blenders will be included when autopilot chooses a model to prepare and recommend for deployment.
- Added optional parameter
Deployment.replace_model to indicate the maximum time to wait for model replacement job to complete before erroring.
predictionExplanationMetadata["shapRemainingTotal"] while converting a predictions response to a data frame.
- Removed an extra column
BatchPredictionJob as it caused issues with newer versions of Trafaret validation.
predicted_vs_actual optional in Feature Effects data because a feature may have insufficient qualified samples.
jdbc_url optional in Data Store data because some data stores will not have it.
- The method
Project.get_datetime_models now correctly returns all
DatetimeModel objects for the project, instead of just the first 100.
- Fixed a documentation error related to snake_case vs camelCase in the JDBC settings payload.
- Make trafaret validator for datasets use a syntax that works properly with a wider range of trafaret versions.
- Handle extra keys in CustomModelTests and CustomModelVersions
ImageActivationMap now supports regression projects.
- The default value for the
Project.set_target has been changed from
- Added links to classes with duration parameters such as validation_duration and holdout_duration to provide duration string examples to users.
- The models documentation has been revised to include a section on how to train a new model and how to run cross-validation or backtesting for a model.
Added new arguments
Model.request_training_predictions. New fields
shap_warnings have been added to class
TrainingPredictions. New fields
shap_metadata have been added to class
TrainingPredictionsIterator that is returned by method
Added new arguments
Model.request_predictions. New fields
shap_warnings have been added to class
Predictions.get_all_as_dataframe has new argument
serializer that specifies the retrieval and results validation method (
csv) for the predictions.
Added support for accessing Visual AI images and insights. See the DataRobot Python Package documentation, Visual AI Projects, section for details.
Users can request SHAP-based prediction explanations for models that support SHAP scores using
Added two new methods to
Dataset to lazily retrieve paginated responses.
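The idea of lazily retrieving a paginated response can be sketched with a plain generator. This is a minimal illustration, not the client's implementation: `iterate_lazily`, `fake_fetch`, and the offset/limit page shape are hypothetical names and assumptions introduced here.

```python
from typing import Callable, Dict, Iterator, List

# Hypothetical page fetcher signature: (offset, limit) -> list of items.
# It stands in for a paginated API call and is not part of the datarobot package.
PageFetcher = Callable[[int, int], List[Dict]]


def iterate_lazily(fetch_page: PageFetcher, limit: int = 100) -> Iterator[Dict]:
    """Yield items one at a time, requesting the next page only when needed."""
    offset = 0
    while True:
        page = fetch_page(offset, limit)
        if not page:
            return
        for item in page:
            yield item
        if len(page) < limit:
            return  # a short page means there is nothing left to fetch
        offset += limit


# Fake in-memory "server" holding 250 records, used to exercise the generator.
_RECORDS = [{"id": i} for i in range(250)]
calls: List[int] = []  # records the offset of every page request


def fake_fetch(offset: int, limit: int) -> List[Dict]:
    calls.append(offset)
    return _RECORDS[offset:offset + limit]


# Stop after three items: only the first page is ever requested.
first_three = []
for record in iterate_lazily(fake_fetch, limit=100):
    first_three.append(record["id"])
    if len(first_three) == 3:
        break
```

Because pages are fetched on demand, a caller that stops early never pays for the remaining requests.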
It’s possible to create an Interaction feature by combining two categorical features together using
Project.create_interaction_feature. The result of the operation is represented by
models.InteractionFeature. Specific information about an interaction feature may be retrieved by its name using
DatasetFeaturelist class to support featurelists on datasets in the AI Catalog. DatasetFeaturelists can be updated or deleted. Two new methods were also added to
Dataset to interact with DatasetFeaturelists. These are
Dataset.create_featurelist which list existing featurelists and create new featurelists on a dataset, respectively.
DatetimePartitioning. This will allow users to control the jobs per model used when building models. A higher number of
model_splits will result in less downsampling, allowing the use of more post-processed data.
Added support for unsupervised projects.
Added support for external test sets. Please see the testset documentation.
A new workflow is available for assessing models on external test sets in time series unsupervised projects. More information can be found in the documentation.
actual_value_column - name of the actual value column, can be passed only with date range.
PredictionDataset objects now contain the following new fields:
actual_value_column: Actual value column which was selected for this dataset.
detected_actual_value_column: A list of detected actual value column info.
- New warning is added to
- Scores and insights on external test sets can be retrieved using
Users can create payoff matrices for generating profit curves for binary classification projects using
datarobot.models.TargetDrift can be used to retrieve target drift information.
datarobot.models.FeatureDrift can be used to retrieve feature drift information.
Deployment.submit_actuals will submit actuals in batches if the total number of actuals exceeds the limit of one single request.
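The batching behaviour described for Deployment.submit_actuals can be pictured as splitting a long list into request-sized chunks. The sketch below is illustrative only: the limit of 1000 is a made-up value, `split_into_batches` is a hypothetical helper, and the record fields are example data rather than a statement of the client's internals.

```python
from typing import Dict, Iterator, List

# Hypothetical per-request limit, chosen only for illustration.
MAX_ACTUALS_PER_REQUEST = 1000


def split_into_batches(
    actuals: List[Dict], batch_size: int = MAX_ACTUALS_PER_REQUEST
) -> Iterator[List[Dict]]:
    """Yield consecutive slices of at most `batch_size` records."""
    if batch_size <= 0:
        raise ValueError("batch_size must be positive")
    for start in range(0, len(actuals), batch_size):
        yield actuals[start:start + batch_size]


# 2500 example records; each batch would correspond to one request.
actuals = [{"association_id": str(i), "actual_value": i % 2} for i in range(2500)]
batch_sizes = [len(batch) for batch in split_into_batches(actuals)]
```

With 2500 records and a limit of 1000, the records are sent as three batches, the last one partial.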
Deployment.create_from_custom_model_image can be used to create a deployment from a custom model image.
- Deployments now support predictions data collection that enables prediction requests and results to be saved in Predictions Data Storage. See
include_feature_discovery_entities are added to
Now it is possible to specify the number of training rows to use in feature impact computation on supported project types (that is everything except unsupervised, multi-class, time-series). This does not affect SHAP based feature impact. Extended methods:
Added support for custom models. Please see custom model documentation. Classes added:
datarobot.ExecutionEnvironmentVersion to create and manage custom model execution environments
datarobot.CustomModelVersion to create and manage custom inference models
datarobot.CustomModelTest to perform testing of custom models
Batch Prediction jobs now support forecast and historical Time Series predictions using the new argument
- Now it’s possible to create Relationships Configurations to introduce secondary datasets to projects. A configuration specifies additional datasets to be included in a project and how these datasets are related to each other and to the primary dataset. When a relationships configuration is specified for a project, Feature Discovery will create features automatically from these datasets.
RelationshipsConfiguration.create creates a new relationships configuration between datasets
RelationshipsConfiguration.retrieve retrieves the requested relationships configuration
RelationshipsConfiguration.replace replaces the relationships configuration details with new ones
RelationshipsConfiguration.delete deletes the relationships configuration
Made creating projects from a dataset easier through the new
These methods now provide additional metadata fields in Feature Impact results if called with with_metadata=True. Fields added:
Retrieving and deleting secondary dataset configurations is now easier through new
SecondaryDatasetConfigurations.delete soft deletes a secondary dataset configuration.
SecondaryDatasetConfigurations.get retrieves a secondary dataset configuration.
Retrieve relationships configuration which is applied on the given feature discovery project using
- An issue with input validation of the Batch Prediction module
- parent_model_id was not visible for all frozen models
- Batch Prediction jobs that used other output types than local_file failed when using .wait_for_completion()
- A race condition in the Batch Prediction file scoring logic
Three new fields were added to the
Dataset object. This reflects the updated fields in the public API routes at api/v2/datasets/. The added fields are:
- processing_state: Current ingestion process state of the dataset
- row_count: The number of rows in the dataset.
- size: The size of the dataset as a CSV in bytes.
datarobot.enums.VARIABLE_TYPE_TRANSFORM.CATEGORICAL is deprecated for the following and will be removed in v2.22.
There is a new
Dataset object that implements some of the public API routes at api/v2/datasets/. This also adds two new feature classes and a details class.
Create a Dataset by uploading from a file, URL or in-memory datasource.
Get Datasets or elements of Dataset with:
Dataset.list lists available Datasets
Dataset.get gets a specified Dataset
Dataset.update updates the Dataset with the latest server information.
Dataset.get_details gets the DatasetDetails of the Dataset.
Dataset.get_all_features gets a list of the Dataset’s Features.
Dataset.get_file downloads the Dataset as a csv file.
Dataset.get_projects gets a list of Projects that use the Dataset.
Modify, delete or un-delete a Dataset:
You can also create a Project using a Dataset with:
Now it’s possible to connect two or more datasets by specifying the relationships between them using Feature Engineering Graph so that DataRobot can automatically generate features based on connection between datasets. The
FeatureEngineeringGraph class can now create, update, retrieve, list, and delete feature engineering graphs through the following methods:
FeatureEngineeringGraph.create creates a new feature engineering graph
FeatureEngineeringGraph.update updates the name and description of the feature engineering graph
FeatureEngineeringGraph.replace replaces the content of the feature engineering graph
FeatureEngineeringGraph.delete deletes the feature engineering graph
FeatureEngineeringGraph.retrieve retrieves the feature engineering graph
FeatureEngineeringGraph.list lists all the feature engineering graphs
It’s possible to share the feature engineering graph with others and list all the users who have access to a given feature engineering graph.
It is possible to create an alternative configuration for the secondary dataset, which can be used during prediction.
SecondaryDatasetConfigurations.create allows creating a secondary dataset configuration
You can now filter the deployments returned by the
Deployment.list command. You can do this by passing an instance of the
DeploymentListFilters class to the
filters keyword argument. The currently supported filters are:
A new workflow is available for making predictions in time series projects. To that end,
PredictionDataset objects now contain the following new fields:
forecast_point_range: The start and end date of the range of dates available for use as the forecast point, detected based on the uploaded prediction dataset
data_start_date: A datestring representing the minimum primary date of the prediction dataset
data_end_date: A datestring representing the maximum primary date of the prediction dataset
max_forecast_date: A datestring representing the maximum forecast date of this prediction dataset
Additionally, users no longer need to specify a
predictions_end_date when uploading datasets for predictions in time series projects. More information can be found in the time series predictions documentation.
Per-class lift chart data is now available for multiclass models using
Unsupervised projects can now be created using the
Project.set_target methods by providing
unsupervised_mode=True, provided that the user has access to unsupervised machine learning functionality. Contact support for more information.
A new boolean attribute
unsupervised_mode was added to
datarobot.DatetimePartitioningSpecification. When it is set to True, datetime partitioning for unsupervised time series projects will be constructed for nowcasting:
Users can now configure the start and end of the training partition as well as the end of the validation partition for backtests in a datetime-partitioned project. More information and example usage can be found in the backtesting documentation.
- Updated the user agent header to show which Python version is in use.
Model.get_frozen_child_models can be used to retrieve models that are frozen from a given model
datarobot.enums.TS_BLENDER_METHOD to make it clearer which blender methods are allowed for use in time series projects.
- Fixed an issue where uploaded CSVs would lose quotes during serialization, causing problems when columns containing line terminators were loaded into a dataframe.
Project.get_association_featurelists is now using the correct endpoint name, but the old one will continue to work
- Python API
PredictionServer now supports the on-premise format of the API response.
Projects can be cloned using
Calendars used in time series projects now support having series-specific events, for instance if a holiday only affects some stores. This can be controlled by using new argument of the
CalendarFile.create method. If multiseries id columns are not provided, the calendar is considered to be single series and all events are applied to all series.
We have expanded prediction intervals availability to the following use-cases:
- Time series model deployments now support prediction intervals. See
- Prediction intervals are now supported for model exports for time series. To that end, a new optional parameter
prediction_intervals_size has been added to
More details on prediction intervals can be found in the prediction intervals documentation.
Allowed pairwise interaction groups can now be specified in
AdvancedOptions. They will be used in GAM models during training.
New deployments features:
For multiclass models now it’s possible to get feature impact for each individual target class using
Added support for new Batch Prediction API.
It is now possible to create and retrieve basic, oauth and s3 credentials with
It’s now possible to get feature association statuses for featurelists using
You can also pass a specific featurelist_id into
Added documentation to
Project.get_metrics to detail the new
ascending field that indicates how a metric should be sorted.
Retraining of a model is processed asynchronously and returns a
Blender models can be retrained on a different set of data or a different feature list.
Word cloud ngrams now have a
variable field representing the source of the ngram.
WordCloud.ngrams_per_class can be used to split ngrams for better usability in multiclass projects.
Project.set_target supports new optional parameters
Series accuracy retrieval methods (
DatetimeModel.download_series_accuracy_as_csv) for multiseries time series projects now support additional parameters for specifying what data to retrieve, including:
metric: Which metric to retrieve scores for
multiseries_value: Only returns series with a matching multiseries ID
order_by: An attribute by which to sort the results
- The datarobot package is no longer a namespace package.
datarobot.enums.BLENDER_METHOD.FORECAST_DISTANCE is removed (deprecated in 2.18.0).
- Residuals charts can now be retrieved for non-time-aware regression models.
- Deployment monitoring can now be used to retrieve service stats, service health, accuracy info, permissions, and feature lists for deployments.
- Time series projects now support the Average by Forecast Distance blender, configured with more than one Forecast Distance. The blender blends the selected models, selecting the best three models based on the backtesting score for each Forecast Distance and averaging their predictions. The new blender method
FORECAST_DISTANCE_AVG has been added to
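The selection-and-averaging rule described for the Average by Forecast Distance blender can be sketched in plain Python. This is a minimal illustration under stated assumptions, not DataRobot's implementation: `blend_by_forecast_distance` is a hypothetical helper, the data are made up, and a lower backtesting score is assumed to be better.

```python
from statistics import mean
from typing import Dict, List

Scores = Dict[str, Dict[int, float]]       # model -> forecast distance -> backtesting score
Predictions = Dict[str, Dict[int, float]]  # model -> forecast distance -> prediction


def blend_by_forecast_distance(
    scores: Scores, predictions: Predictions, top_n: int = 3
) -> Dict[int, float]:
    """For each forecast distance, average the predictions of the top_n models.

    Assumption: a lower backtesting score means a better model.
    """
    distances = sorted({fd for per_model in scores.values() for fd in per_model})
    blended: Dict[int, float] = {}
    for fd in distances:
        candidates: List[str] = [m for m in scores if fd in scores[m]]
        best = sorted(candidates, key=lambda m: scores[m][fd])[:top_n]
        blended[fd] = mean(predictions[m][fd] for m in best)
    return blended


# Made-up scores and predictions for four models and two forecast distances.
scores = {
    "A": {1: 0.10, 2: 0.40},
    "B": {1: 0.20, 2: 0.30},
    "C": {1: 0.30, 2: 0.20},
    "D": {1: 0.40, 2: 0.10},
}
predictions = {
    "A": {1: 10.0, 2: 20.0},
    "B": {1: 12.0, 2: 22.0},
    "C": {1: 14.0, 2: 24.0},
    "D": {1: 40.0, 2: 26.0},
}
blended = blend_by_forecast_distance(scores, predictions)
```

Note that the set of models being averaged differs per forecast distance: models A, B, and C are used at distance 1, while B, C, and D are used at distance 2.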
Deployment.submit_actuals can now be used to submit data about actual results from a deployed model, which can be used to calculate accuracy metrics.
- Monotonic constraints are now supported for OTV projects. To that end, the parameters
monotonic_decreasing_featurelist_id can be specified in calls to
retrieving information about features, information about summarized categorical variables is now available in a new
Word Clouds in multiclass projects, values of the target class for the corresponding word or ngram can now be passed using the new
- Listing deployments using
Deployment.list now supports sorting and searching the results using the new
- You can now get the model associated with a model job by getting the
model variable on the
model job object.
Blueprint class can now retrieve the
recommended_featurelist_id, which indicates which feature list is recommended for this blueprint. If the field is not present, then there is no recommended feature list for this blueprint.
Model class can now be used to retrieve the
- The method
Model.get_supported_capabilities now has an extra field
supportsCodeGeneration to explain whether the model supports code generation.
- Calls to
Project.upload_dataset now support uploading data via S3 URI and pathlib.Path objects.
- Errors upon connecting to DataRobot are now clearer when an incorrect API Token is used.
- The datarobot package is now a namespace package.
datarobot.enums.BLENDER_METHOD.FORECAST_DISTANCE is deprecated and will be removed in 2.19. Use
- Deployments can now be managed via the API by using the new
- Users can now list available prediction servers using
specifying datetime partitioning settings, time series projects can now mark individual features as excluded from feature derivation using the
FeatureSettings.do_not_derive attribute. Any features not specified will be assigned according to the
- Users can now submit multiple feature type transformations in a single batch request using
- Advanced Tuning for non-Eureqa models (beta feature) is now enabled by default for all users. As of v2.17, all models are now supported other than blenders, open source, prime, scaleout, baseline and user-created.
- Information on feature clustering and the association strength between pairs of numeric or categorical features is now available.
Project.get_associationscan be used to retrieve pairwise feature association statistics and
Project.get_association_matrix_detailscan be used to get a sample of the actual values used to measure association strength.
- number_of_do_not_derive_features has been added to the
datarobot.DatetimePartitioning class to specify the number of features that are marked as excluded from derivation.
- Users with PyYAML>=5.1 will no longer receive a warning when using the datarobot package
- It is now possible to use files with unicode names for creating projects and prediction jobs.
- Users can now embed DataRobot-generated content in a
ComplianceDocTemplate using keyword tags. See here for more details.
- The field
calendar_name has been added to
datarobot.DatetimePartitioning to display the name of the calendar used for a project.
- Prediction intervals are now supported for start-end retrained models in a time series project.
- Previously, all backtests had to be run before prediction intervals for a time series project could be requested with predictions. Now, backtests will be computed automatically if needed when prediction intervals are requested.
Three new methods for Series Accuracy have been added to the
Users can now access prediction intervals data for each prediction with a
DatetimeModel. For each model, prediction intervals estimate the range of values DataRobot expects actual values of the target to fall within. They are similar to a confidence interval of a prediction, but are based on the residual errors measured during the backtesting for the selected model.
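To make the idea of an interval built from residual errors concrete, here is a small sketch that offsets a prediction by empirical percentiles of backtest residuals. It is an illustration of the general technique only and does not claim to reproduce how DataRobot computes prediction intervals: `percentile` and `residual_interval` are hypothetical helpers, and the residuals are made-up numbers.

```python
import math
from typing import List, Tuple


def percentile(values: List[float], q: float) -> float:
    """Linearly interpolated percentile for q in [0, 1]."""
    ordered = sorted(values)
    if not ordered:
        raise ValueError("values must not be empty")
    pos = q * (len(ordered) - 1)
    lower, upper = math.floor(pos), math.ceil(pos)
    weight = pos - lower
    return ordered[lower] + weight * (ordered[upper] - ordered[lower])


def residual_interval(
    prediction: float, residuals: List[float], size: int = 80
) -> Tuple[float, float]:
    """Return (low, high) bounds covering `size` percent of observed residuals.

    Residuals are assumed to be actual minus predicted values from backtesting.
    """
    tail = (100 - size) / 200.0
    low = prediction + percentile(residuals, tail)
    high = prediction + percentile(residuals, 1 - tail)
    return low, high


# Made-up residuals (actual - predicted) collected during backtesting.
backtest_residuals = [-4.0, -2.0, 0.0, 2.0, 4.0]
low, high = residual_interval(100.0, backtest_residuals, size=80)
```

Unlike a confidence interval derived from a distributional assumption, the width here comes entirely from errors the model actually made on held-out data.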
Information on the effective feature derivation window is now available for time series projects to specify the full span of historical data required at prediction time. It may be longer than the feature derivation window of the project depending on the differencing settings used.
Additionally, more of the project partitioning settings are also available on the
DatetimeModel class. The new attributes are:
Prediction metadata is now included in the return of
CalendarFile.get_access_list has been added to the
CalendarFile class to return a list of users with access to a calendar file.
role attribute has been added to the
CalendarFile class to indicate the access level a current user has to a calendar file. For more information on the specific access levels, see the sharing documentation.
- Previously, attempting to retrieve the
calendar_id of a project without a set target would result in an error. This has been fixed to return
- Previously available for only Eureqa models, Advanced Tuning methods and objects, including
AdvancedTuningSession, now support all models other than blender, open source, and user-created models. Use of Advanced Tuning via API for non-Eureqa models is in beta and not available by default, but can be enabled.
- Calendar Files for time series projects can now be created and managed through the
- The dataframe returned from
datarobot.PredictionExplanations.get_all_as_dataframe()will now have each class label class_X be the same from row to row.
- The client is now more robust to networking issues by default. It will retry on more errors and respects Retry-After headers in HTTP 413, 429, and 503 responses.
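How a client might honor Retry-After on those statuses can be sketched as a small delay calculation. This is a hedged illustration, not the client's actual retry logic: `retry_delay` and the backoff factor are hypothetical, only the integer-seconds form of Retry-After is handled (the HTTP-date form is ignored), and the header lookup is case-sensitive because a plain mapping is used.

```python
from typing import Mapping, Optional

# Statuses named in the note above; other retryable errors are out of scope here.
RETRY_AFTER_STATUSES = {413, 429, 503}


def retry_delay(
    status: int,
    headers: Mapping[str, str],
    attempt: int,
    backoff_factor: float = 0.5,
) -> Optional[float]:
    """Return seconds to wait before the next attempt, or None if not retried.

    Prefers the server-provided Retry-After value (integer seconds only);
    otherwise falls back to exponential backoff based on the attempt number.
    """
    if status not in RETRY_AFTER_STATUSES:
        return None
    value = headers.get("Retry-After")
    if value is not None and value.strip().isdigit():
        return float(value.strip())
    return backoff_factor * (2 ** attempt)


delay_from_header = retry_delay(429, {"Retry-After": "7"}, attempt=0)
delay_from_backoff = retry_delay(503, {}, attempt=2)
no_retry = retry_delay(404, {}, attempt=0)
```

Letting the server's hint take precedence over local backoff avoids retrying sooner than the server has asked for.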
- Added Forecast Distance blender for Time-Series projects configured with more than one Forecast Distance. It blends the selected models creating separate linear models for each Forecast Distance.
Project can now be shared with other users.
Project.upload_dataset_from_data_source will return a
data_quality_warnings if potential problems exist around the uploaded dataset.
relax_known_in_advance_features_check has been added to
Project.upload_dataset_from_data_source to allow missing values from the known in advance features in the forecast window at prediction time.
cross_series_group_by_columns has been added to
datarobot.DatetimePartitioning to allow users to indicate how to further split series into related groups.
- Information retrieval for
ROC Curve has been extended to include
- Fixes an issue where the client would not be usable if it could not be sure it was compatible with the configured server
- Methods for creating
datarobot.models.Project: create_from_mysql, create_from_oracle, and create_from_postgresql, deprecated in 2.11, have now been removed. Use
datarobot.FeatureSettings attribute apriori, deprecated in 2.11, has been removed. Use
datarobot.DatetimePartitioning attribute default_to_a_priori, deprecated in 2.11, has been removed. Use
datarobot.DatetimePartitioningSpecification attribute default_to_a_priori, deprecated in 2.11, has been removed. Use
- Advanced model insights notebook extended to contain information on visualisation of cumulative gains and lift charts.
- Fixed an issue where searches of the HTML documentation would sometimes hang indefinitely
- Python3 is now the primary interpreter used to build the docs (this does not affect the ability to use the package with Python2)
- Documentation for the Model Deployment interface has been removed after the corresponding interface was removed in 2.13.0.
- The new method
Model.get_supported_capabilities retrieves a summary of the capabilities supported by a particular model, such as whether it is eligible for Prime and whether it has word cloud data available.
- New class for working with model compliance documentation feature of DataRobot:
- New class for working with compliance documentation templates:
- New class
FeatureHistogram has been added to retrieve feature histograms for a requested maximum bin count
- Time series projects now support binary classification targets.
- Cross series features can now be created within time series multiseries projects using the
aggregation_type attributes of the
datarobot.DatetimePartitioningSpecification. See the Time Series documentation for more info.
- Client instantiation now checks the endpoint configuration and provides more informative error messages. It also automatically corrects HTTP to HTTPS if the server responds with a redirect to HTTPS.
Project.create now accepts an optional parameter of
dataset_filename to specify a file name for the dataset. This is ignored for url and file path sources.
- New optional parameter fallback_to_parent_insights has been added to
Model.get_all_roc_curves. When True, a frozen model with missing insights will attempt to retrieve the missing insight data from its parent model.
number_of_known_in_advance_features attribute has been added to the
datarobot.DatetimePartitioning class. The attribute specifies the number of features that are marked as known in advance.
Project.set_worker_count can now update the worker count on a project to the maximum number available to the user.
- Recommended Models API can now be used to retrieve model recommendations for datetime partitioned projects
- Timeseries projects can now accept feature derivation and forecast windows intervals in terms of
the number of rows rather than a fixed time unit.
Project.set_target supports a new optional parameter windowsBasisUnit, either ‘ROW’ or the detected time unit.
- Timeseries projects can now accept feature derivation intervals, forecast windows, forecast points and prediction start/end dates in milliseconds.
DataStores can now be shared with other users.
- Training predictions for datetime partitioned projects now support the new data subset dr.enums.DATA_SUBSET.ALL_BACKTESTS for requesting the predictions for all backtest validation folds.
- The model recommendation type “Recommended” (deprecated in version 2.13.0) has been removed.
- Example notebooks have been updated:
- Notebooks now work in Python 2 and Python 3
- A notebook illustrating time series capability has been added
- The financial data example has been replaced with an updated introductory example.
- To supplement the embedded Python notebooks in both the PDF and HTML docs bundles, the notebook files and supporting data can now be downloaded from the HTML docs bundle.
- Fixed a minor typo in the code sample for
- The new method
Model.get_or_request_feature_impact will attempt to request feature impact and return either the newly created feature impact object or the existing one, so two calls are no longer required.
- New methods and objects, including
AdvancedTuningSession, were added to support the setting of Advanced Tuning parameters. This is currently supported for Eureqa models only.
is_starred attribute has been added to the
Model class. The attribute specifies whether a model has been marked as starred by the user.
- Models can be marked as starred or unstarred with
- When listing models with
Project.get_models, the model list can now be filtered by the
- A custom prediction threshold may now be configured for each model via
Model.set_prediction_threshold. When making predictions in binary classification projects, this value will be used when deciding between the positive and negative classes.
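The effect of a custom prediction threshold on binary classification can be shown with a few lines of plain Python. This is only a sketch: `classify` is a hypothetical helper, the labels are made up, and treating a probability equal to the threshold as positive is an assumption.

```python
from typing import List


def classify(
    positive_probability: float,
    threshold: float = 0.5,
    positive_label: str = "yes",
    negative_label: str = "no",
) -> str:
    """Pick a class label by comparing the positive-class probability to a threshold.

    Assumption: a probability exactly equal to the threshold counts as positive.
    """
    return positive_label if positive_probability >= threshold else negative_label


probabilities: List[float] = [0.15, 0.45, 0.62, 0.90]
default_labels = [classify(p) for p in probabilities]
strict_labels = [classify(p, threshold=0.7) for p in probabilities]
```

Raising the threshold from 0.5 to 0.7 flips the 0.62 prediction from the positive to the negative class, trading recall for precision.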
Project.check_blendable can be used to confirm if a particular group of models are eligible for blending as some are not, e.g. scaleout models and datetime models with different training lengths.
- Individual cross validation scores can be retrieved for new models using
- Python 3.7 is now supported.
- Feature impact now returns not only the impact score for the features but also whether they were detected to be redundant with other high-impact features.
- A new
is_blocked attribute has been added to the
Job class, specifying whether a job is blocked from execution because one or more dependencies are not yet met.
Featurelist object now has new attributes reporting its creation time, whether it was created by a user or by DataRobot, and the number of models using the featurelist, as well as a new description field.
- Featurelists can now be renamed and have their descriptions updated with
- Featurelists can now be deleted with
ModelRecommendation.get now accepts an optional parameter of type
datarobot.enums.RECOMMENDED_MODEL_TYPE which can be used to get a specific kind of recommendation.
- Previously computed predictions can now be listed and retrieved with the
Predictions class, without requiring a reference to the original
- The Model Deployment interface which was previously visible in the client has been removed to allow the interface to mature, although the raw API is available as a “beta” API without full backwards compatibility support.
- The feature previously referred to as “Reason Codes” has been renamed to “Prediction
Explanations”, to provide increased clarity and accessibility. The old
ReasonCodes interface has been deprecated and replaced with
- The recommendation type “Recommended” is deprecated and will no longer be returned in v2.14 of the API.
- The new
ModelRecommendation class can be used to retrieve the recommended models for a project.
- A new helper method cross_validate was added to class Model. This method can be used to request Model’s Cross Validation score.
- Training a model with monotonic constraints is now supported. Training with monotonic constraints allows users to force models to learn monotonic relationships with respect to some features and the target. This helps users create accurate models that comply with regulations (e.g. insurance, banking). Currently, only certain blueprints (e.g. xgboost) support this feature, and it is only supported for regression and binary classification projects.
- DataRobot now supports “Database Connectivity”, allowing databases to be used as the source of data for projects and prediction datasets. The feature works on top of the JDBC standard, so a variety of databases conforming to that standard are available; a list of databases with tested support for DataRobot is available in the user guide in the web application. See Database Connectivity for details.
- Added a new feature to retrieve feature logs for time series projects. Check
- New attributes supporting monotonic constraints have been added to the
Blueprint classes. See monotonic constraints for more information on how to configure monotonic constraints.
- New parameters predictions_start_date and predictions_end_date added to
Project.upload_dataset to support bulk predictions upload for time series projects.
- Methods for creating
datarobot.models.Project: create_from_mysql, create_from_oracle, and create_from_postgresql, have been deprecated and will be removed in 2.14. Use
datarobot.FeatureSettings attribute apriori has been deprecated and will be removed in 2.14. Use
datarobot.DatetimePartitioning attribute default_to_a_priori has been deprecated and will be removed in 2.14.
datarobot.DatetimePartitioningSpecification attribute default_to_a_priori has been deprecated and will be removed in 2.14. Use
- Retry settings compatible with those offered by urllib3’s Retry interface can now be configured. By default, we will now retry connection errors that prevented requests from arriving at the server.
- “Advanced Model Insights” example has been updated to properly handle bin weights when rebinning.
ModelDeployment class can be used to track status and health of models deployed for predictions.
- DataRobot API now supports creating 3 new blender types - Random Forest, TensorFlow, LightGBM.
- Multiclass projects now support blenders creation for 3 new blender types as well as Average and ENET blenders.
- Models can be trained by requesting a particular row count using the new
training_row_count argument with Project.train, Model.train and Model.request_frozen_model in non-datetime partitioned projects, as an alternative to the previous option of specifying a desired percentage of the project dataset. Specifying model size by row count is recommended when the float precision of
sample_pct could be problematic, e.g. when training on a small percentage of the dataset or when training up to partition boundaries.
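The precision concern can be illustrated with plain arithmetic. This is a minimal sketch, not DataRobot's implementation: the helper `rows_from_pct`, the truncation to whole rows, and the two-decimal rounding are all assumptions chosen for illustration.

```python
def rows_from_pct(total_rows, sample_pct):
    """Convert a percentage of the dataset into a whole number of rows.

    Hypothetical helper: truncating to an integer is an assumption,
    not a documented DataRobot rule.
    """
    return int(total_rows * sample_pct / 100)


total_rows = 1_000_003
wanted_rows = 1234

# The exact percentage for a small training set has many decimal places.
exact_pct = 100 * wanted_rows / total_rows

# If the percentage is rounded anywhere along the way (two decimals here),
# the row count it maps back to no longer matches the request.
rounded_pct = round(exact_pct, 2)
rows_after_rounding = rows_from_pct(total_rows, rounded_pct)

print(wanted_rows, rounded_pct, rows_after_rounding)
```

Requesting the row count directly sidesteps the round trip through a percentage.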
- New attributes
scaleout_max_train_rows have been added to
max_train_rows specifies the equivalent value to the existing
max_train_pct as a row count. The scaleout fields can be used to see how far scaleout models can be trained on projects, which for projects taking advantage of scalable ingest may exceed the limits on the data available to non-scaleout blueprints.
- Individual features can now be marked as a priori or not a priori using the new feature_settings attribute when setting the target or specifying datetime partitioning settings on time series projects. Any features not specified in the feature_settings parameter will be assigned according to the default_to_a_priori value.
- Three new options have been made available in the
datarobot.DatetimePartitioningSpecification class to fine-tune how time-series projects derive modeling features. treat_as_exponential can control whether data is analyzed as an exponential trend and transformations like log-transform are applied. differencing_method can control which differencing method to use for stationary data. periodicities can be used to specify periodicities occurring within the data. All are optional and defaults will be chosen automatically if they are unspecified.
training_row_count is available on non-datetime models as well as “rowCount” based datetime models. It reports the number of rows used to train the model (equivalent to
- Features retrieved from
- The documented default connect_timeout will now be correctly set for all configuration mechanisms,
so that requests that fail to reach the DataRobot server in a reasonable amount of time will now
error instead of hanging indefinitely. If you observe that you have started seeing
ConnectTimeout errors, please configure your connect_timeout to a larger value.
- Version of
trafaret library this package depends on is now pinned to
trafaret>=0.7,<1.1 since versions outside that range are known to be incompatible.
- The DataRobot API supports the creation, training, and predicting of multiclass classification projects. DataRobot, by default, handles a dataset with a numeric target column as regression. If your data has a numeric cardinality of fewer than 11 classes, you can override this behavior to instead create a multiclass classification project from the data. To do so, use the set_target function, setting target_type='Multiclass'. If DataRobot recognizes your data as categorical, and it has fewer than 11 classes, using multiclass will create a project that classifies which label the data belongs to.
- The DataRobot API now includes Rating Tables. A rating table is an exportable csv representation of a model. Users can influence predictions by modifying them and creating a new model with the modified table. See the documentation for more information on how to use rating tables.
- scaleout_modeling_mode has been added to the AdvancedOptions class used when setting a project target. It can be used to control whether scaleout models appear in the autopilot and/or available blueprints. Scaleout models are only supported in the Hadoop environment with the corresponding user permission set.
- A new premium add-on product, Time Series, is now available. New projects can be created as time series projects which automatically derive features from past data and forecast the future. See the time series documentation for more information.
- The Feature object now returns the EDA summary statistics (i.e., mean, median, minimum, maximum, and standard deviation) for features where this is available (e.g., numeric, date, time, currency, and length features). These summary statistics will be formatted in the same format as the data they summarize.
- The DataRobot API now supports the Training Predictions workflow. Training predictions are made by a model for a subset of data from the original dataset. Users can start a job that makes those predictions and then retrieve them. See the documentation for more information on how to use training predictions.
- DataRobot now supports retrieving a model's blueprint chart and blueprint documentation.
- With the introduction of Multiclass Classification projects, DataRobot needed a better way to explain the performance of a multiclass model so we created a new Confusion Chart. The API now supports retrieving and interacting with confusion charts.
- DatetimePartitioningSpecification now includes the optional disable_holdout flag that can be used to disable the holdout fold when creating a project with datetime partitioning.
- When retrieving reason codes on a project using an exposure column, predictions that are adjusted for exposure can be retrieved.
- File URIs can now be used as sourcedata when creating a project or uploading a prediction dataset. The file URI must refer to an allowed location on the server, which is configured as described in the user guide documentation.
- The advanced options available when setting the target have been extended to include the new parameter ‘events_count’ as a part of the AdvancedOptions object to allow specifying the events count column. See the user guide documentation in the webapp for more information on events count.
- PredictJob.get_predictions now returns predicted probability for each class in the dataframe.
- PredictJob.get_predictions now accepts prefix parameter to prefix the classes name returned in the predictions dataframe.
- Add target_type parameter to set_target() and start(), used to override the project default.
- Online documentation hosting has migrated from PythonHosted to Read The Docs. Minor code changes have been made to support this.
- Lift chart data for models can be retrieved using the Model.get_lift_chart and Model.get_all_lift_charts methods.
- ROC curve data for models in classification projects can be retrieved using the Model.get_roc_curve and Model.get_all_roc_curves methods.
- Semi-automatic autopilot mode is removed.
- Word cloud data for text processing models can be retrieved using Model.get_word_cloud method.
- Scoring code JAR file can be downloaded for models supporting code generation.
- A __repr__ method has been added to the PredictionDataset class to improve readability when using the client interactively.
- Model.get_parameters now includes an additional key in the derived features it includes, showing the coefficients for individual stages of multistage models (e.g. Frequency-Severity models).
- When training a DatetimeModel on a window of data, a time_window_sample_pct can be specified to take a uniform random sample of the training data instead of using all data within the window.
- Installing of DataRobot package now has an “Extra Requirements” section that will install all of the dependencies needed to run the example notebooks.
- A new example notebook describing how to visualize some of the newly available model insights including lift charts, ROC curves, and word clouds has been added to the examples section.
- A new section for Common Issues has been added to Getting Started to help debug issues related to client installation and usage.
- Fixed a bug with Model.get_parameters raising an exception on some valid parameter values.
- Fixed sorting order in Feature Impact example code snippet.
- A new partitioning method (datetime partitioning) has been added. The recommended workflow is to preview the partitioning by creating a DatetimePartitioningSpecification and passing it into DatetimePartitioning.generate, inspect the results and adjust as needed for the specific project dataset by adjusting the DatetimePartitioningSpecification and re-generating, and then set the target by passing the final DatetimePartitioningSpecification object to the partitioning_method parameter of Project.set_target.
- When interacting with datetime partitioned projects, DatetimeModel can be used to access more information specific to models in datetime partitioned projects. See the documentation for more information on differences in the modeling workflow for datetime partitioned projects.
- The advanced options available when setting the target have been extended to include the new parameters ‘offset’ and ‘exposure’ (part of the AdvancedOptions object) to allow specifying offset and exposure columns to apply to predictions generated by models within the project. See the user guide documentation in the webapp for more information on offset and exposure columns.
- Blueprints can now be retrieved directly by project_id and blueprint_id via Blueprint.get.
- Blueprint charts can now be retrieved directly by project_id and blueprint_id via BlueprintChart.get. If you already have an instance of Blueprint you can retrieve its chart using Blueprint.get_chart.
- Model parameters can now be retrieved using ModelParameters.get. If you already have an instance of Model you can retrieve its parameters using Model.get_parameters.
- Blueprint documentation can now be retrieved using Blueprint.get_documents. It will contain information about the task, its parameters and (when available) links and references to additional sources.
- The DataRobot API now includes Reason Codes. You can now compute reason codes for prediction datasets. You are able to specify thresholds on which rows to compute reason codes for to speed up computation by skipping rows based on the predictions they generate. See the reason codes documentation for more information.
- A new parameter has been added to the AdvancedOptions used with Project.set_target. By specifying accuracyOptimizedMb=True when creating AdvancedOptions, longer-running models that may have a high accuracy will be included in the autopilot and made available to run manually.
- A new option for Project.create_type_transform_feature has been added which explicitly truncates data when casting numerical data as categorical data.
- Added 2 new blenders for projects that use MAD or Weighted MAD as a metric. The MAE blender uses BFGS optimization to find linear weights for the blender that minimize mean absolute error (compared to the GLM blender, which finds linear weights that minimize RMSE), and the MAEL1 blender uses BFGS optimization to find linear weights that minimize MAE + a L1 penalty on the coefficients (compared to the ENET blender, which minimizes RMSE + a combination of the L1 and L2 penalty on the coefficients).
- Fixed a bug (affecting Python 2 only) with printing any model (including frozen and prime models) whose model_type is not ascii.
- FrozenModels were unable to correctly use methods inherited from Model. This has been fixed.
- When calling get_result for a Job, ModelJob, or PredictJob that has errored, AsyncProcessUnsuccessfulError will now be raised instead of JobNotFinished, consistently with the behaviour of get_result_when_complete.
- Support for the experimental Recommender Problems projects has been removed. Any code relying on RecommenderSettings or the recommender_settings argument of Project.set_target and Project.start will error.
Project.update, deprecated in v2.2.32, has been removed in favor of specific updates:
- The link to Configuration from the Quickstart page has been fixed.
- Fixed a bug (affecting Python 2 only) with printing blueprints whose names are not ascii.
- Fixed an issue where the weights column (for weighted projects) did not appear in the advanced_options of a Project.
- Methods to work with blender models have been added. Use Project.blend method to create new blenders, Project.get_blenders to get the list of existing blenders and BlenderModel.get to retrieve a model with blender-specific information.
- Projects created via the API can now use smart downsampling when setting the target by passing smart_downsampled and majority_downsampling_rate into the AdvancedOptions object used with Project.set_target. The smart sampling options used with an existing project will be available as part of Project.advanced_options.
- Support for frozen models, which use tuning parameters from a parent model for more efficient training, has been added. Use Model.request_frozen_model to create a new frozen model, Project.get_frozen_models to get the list of existing frozen models and FrozenModel.get to retrieve a particular frozen model.
- The inferred date format (e.g. “%Y-%m-%d %H:%M:%S”) is now included in the Feature object. For non-date features, it will be None.
- When specifying the API endpoint in the configuration, the client will now behave correctly for endpoints with and without trailing slashes.
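How the client normalizes endpoints internally is not described here. One common approach, sketched below under that caveat, is to strip slashes at the join point so both spellings produce the same URL. The helper name `build_url` and the example host are hypothetical.

```python
def build_url(endpoint, path):
    """Join an API endpoint and a relative path with exactly one slash between them."""
    return endpoint.rstrip("/") + "/" + path.lstrip("/")


# Both spellings of the endpoint resolve to the same request URL.
with_slash = build_url("https://example.invalid/api/v2/", "projects/")
without_slash = build_url("https://example.invalid/api/v2", "projects/")

print(with_slash == without_slash)
```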
- The premium add-on product DataRobot Prime has been added. You can now approximate a model on the leaderboard and download executable code for it. See documentation for further details, or talk to your account representative if the feature is not available on your account.
- (Only relevant for on-premise users with a Standalone Scoring cluster.) Methods (request_transferable_export and download_export) have been added to the Model class for exporting models (which will only work if model export is turned on). There is a new class ImportedModel for managing imported models on a Standalone Scoring cluster.
- It is now possible to create projects from a WebHDFS, PostgreSQL, Oracle or MySQL data source. For more information see the documentation for the relevant Project classmethods: create_from_hdfs, create_from_postgresql, create_from_oracle and create_from_mysql.
- Job.wait_for_completion, which waits for a job to complete without returning anything, has been added.
- The client will now check the API version offered by the server specified in configuration, and give a warning if the client version is newer than the server version. The DataRobot server is always backwards compatible with old clients, but new clients may have functionality that is not implemented on older server versions. This issue mainly affects users with on-premise deployments of DataRobot.
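The kind of check described above can be sketched as follows. This is an illustration only: the function names, the "major.minor" version format, and the warning text are assumptions, not the client's actual code.

```python
import warnings


def parse_version(version):
    """Turn a 'major.minor' string into a tuple of ints so that 2.10 sorts after 2.9."""
    return tuple(int(part) for part in version.split(".")[:2])


def check_versions(client_version, server_version):
    """Warn if the client is newer than the server; return True when no warning is needed."""
    if parse_version(client_version) > parse_version(server_version):
        warnings.warn(
            "Client version %s is newer than server version %s; "
            "some client features may not be implemented on this server."
            % (client_version, server_version)
        )
        return False
    return True


# An older client against a newer server passes silently.
older_client_ok = check_versions("2.2", "2.3")
```

Comparing integer tuples rather than raw strings matters once minor versions reach two digits.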
- Fixed an issue where Model.request_predictions might raise an error when predictions finished very quickly instead of returning the job.
- To set the target with quickrun autopilot, call Project.set_target with mode=AUTOPILOT_MODE.QUICK instead of specifying quickrun=True.
- Semi-automatic mode for autopilot has been deprecated and will be removed in 3.0. Use manual or fully automatic instead.
- Use of the quickrun argument in Project.set_target has been deprecated and will be removed in 3.0. Use mode=AUTOPILOT_MODE.QUICK instead.
- It is now possible to control the SSL certificate verification by setting the parameter ssl_verify in the config file.
- The “Modeling Airline Delay” example notebook has been updated to work with the new 2.3 enhancements.
- Documentation for the generic Job class has been added.
- Class attributes are now documented in the API Reference section of the documentation.
- The changelog now appears in the documentation.
- There is a new section dedicated to configuration, which lists all of the configuration options and their meanings.
- The DataRobot API now includes Feature Impact, an approach to measuring the relevance of each feature that can be applied to any model. The Model class now includes methods request_feature_impact (which creates and returns a feature impact job) and get_feature_impact (which can retrieve completed feature impact results).
- A new improved workflow for predictions now supports first uploading a dataset via Project.upload_dataset, then requesting predictions via Model.request_predictions. This allows us to better support predictions on larger datasets and non-ascii files.
- Datasets previously uploaded for predictions (represented by the PredictionDataset class) can be listed from Project.get_datasets and retrieved and deleted via PredictionDataset.get and PredictionDataset.delete.
- You can now create a new feature by re-interpreting the type of an existing feature in a project by using the Project.create_type_transform_feature method.
- The Job class now includes a get method for retrieving a job and a cancel method for canceling a job.
- All of the jobs classes (Job, ModelJob, PredictJob) now include the following new methods: refresh (for refreshing the data in the job object), get_result (for getting the completed resource resulting from the job), and get_result_when_complete (which waits until the job is complete and returns the results, or times out).
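The relationship between these three methods can be sketched with an in-memory stand-in. `FakeJob` is hypothetical and does not talk to a server; the status strings, the `max_wait` and `poll_interval` parameter names, and the use of `TimeoutError` are assumptions made for illustration.

```python
import time


class JobNotFinished(Exception):
    """Stand-in: raised when a result is requested before the job completes."""


class FakeJob:
    """Hypothetical job that completes after a fixed number of refreshes."""

    def __init__(self, refreshes_needed, result):
        self._remaining = refreshes_needed
        self._result = result
        self.status = "inprogress"

    def refresh(self):
        # Update local state; a real job would re-read it from the server.
        if self._remaining > 0:
            self._remaining -= 1
        if self._remaining == 0:
            self.status = "COMPLETED"

    def get_result(self):
        # Non-blocking: fail fast if the job has not finished yet.
        if self.status != "COMPLETED":
            raise JobNotFinished("job is still running")
        return self._result

    def get_result_when_complete(self, max_wait=5.0, poll_interval=0.01):
        # Blocking: refresh until complete, or give up after max_wait seconds.
        deadline = time.monotonic() + max_wait
        while time.monotonic() < deadline:
            self.refresh()
            if self.status == "COMPLETED":
                return self.get_result()
            time.sleep(poll_interval)
        raise TimeoutError("job did not complete within max_wait seconds")


job = FakeJob(refreshes_needed=3, result="model-123")
try:
    job.get_result()
    early = "returned"
except JobNotFinished:
    early = "not finished"
final = job.get_result_when_complete()
```

get_result suits callers that poll on their own schedule, while get_result_when_complete suits scripts that simply need the finished resource.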
- A new method Project.refresh can be used to update Project objects with the latest state from the server.
- A new function datarobot.async.wait_for_async_resolution can be used to poll for the resolution of any generic asynchronous operation on the server.
- The JOB_TYPE enum now includes FEATURE_IMPACT.
- The QUEUE_STATUS enum now includes ABORTED and COMPLETED.
- The Project.create method now has a read_timeout parameter which can be used to keep open the connection to DataRobot while an uploaded file is being processed. For very large files this time can be substantial. Appropriately raising this value can help avoid timeouts when uploading large files.
- The method Project.wait_for_autopilot has been enhanced to error if the project enters a state where autopilot may not finish. This avoids a situation that existed previously where users could wait indefinitely on their project that was not going to finish. However, users are still responsible to make sure a project has more than zero workers, and that the queue is not paused.
- Feature.get now supports retrieving features by feature name. (For backwards compatibility, feature IDs are still supported until 3.0.)
- File paths that have unicode directory names can now be used for creating projects and PredictJobs. The filename itself must still be ascii, but containing directory names can have other encodings.
- Now raises more specific JobAlreadyRequested exception when we refuse a model fitting request as a duplicate. Users can explicitly catch this exception if they want it to be ignored.
- A file_name attribute has been added to the Project class, identifying the file name associated with the original project dataset. Note that if the project was created from a data frame, the file name may not be helpful.
- The connect timeout for establishing a connection to the server can now be set directly. This can be done in the yaml configuration of the client, or directly in the code. The default timeout has been lowered from 60 seconds to 6 seconds, which will make detecting a bad connection happen much quicker.
- Fixed a bug (affecting Python 2 only) with printing features and featurelists whose names are not ascii.
- Job class hierarchy is rearranged to better express the relationship between these objects. See documentation for datarobot.models.job for details.
- Featurelist objects now have a project_id attribute to indicate which project they belong to. Directly accessing the project attribute of a Featurelist object is now deprecated.
- Support for INI-style configuration, which was deprecated in v2.1, has been removed. yaml is the only supported configuration format.
- The Project.get_jobs method, which was deprecated in v2.1, has been removed. Users should use the Project.get_model_jobs method instead to get the list of model jobs.
- PredictJob.create has been deprecated in favor of the alternate workflow using Model.request_predictions.
- Feature.converter (used internally for object construction) has been made private.
- Model.fetch_resource_data has been deprecated and will be removed in 3.0. To fetch a model from its ID, use Model.get.
- The ability to use Feature.get with feature IDs (rather than names) is deprecated and will be removed in 3.0.
- Instantiating a Project, Model, Blueprint, Featurelist, or Feature instance from a dict of data is now deprecated. Please use the from_data classmethod of these classes instead. Additionally, instantiating a Model from a tuple or by using the keyword argument data is also deprecated.
- Use of the attribute Featurelist.project is now deprecated. You can use the project_id attribute of a Featurelist to instantiate a Project instance using Project.get.
- Use of the attributes Model.project, Model.blueprint, and Model.featurelist are all deprecated now to avoid use of partially instantiated objects. Please use the ids of these objects instead.
- Using a Project instance as an argument in Featurelist.get is now deprecated. Please use a project_id instead. Similarly, using a Project instance in Model.get is also deprecated, and a project_id should be used in its place.
- Previously it was possible (though unintended) that the client configuration could be mixed through environment variables, configuration files, and arguments to datarobot.Client. This logic is now simpler - please see the Getting Started section of the documentation for more information.
- Fixed a bug with non-ascii project names using the package with Python 2.
- Fixed an error that occurred when printing projects that had been constructed from an ID only or printing models that had been constructed from a tuple (which impacted printing PredictJobs).
- Fixed a bug with project creation from non-ascii file names. Project creation from non-ascii file names is not supported, so this now raises a more informative exception. The project name is no longer used as the file name in cases where we do not have a file name, which prevents non-ascii project names from causing problems in those circumstances.
- Fixed a bug (affecting Python 2 only) with printing projects, features, and featurelists whose names are not ascii.
Feature.get methods have been added for feature retrieval.
- A generic
Job entity has been added for use in retrieving the entire queue at once. Calling
Project.get_all_jobs will retrieve all (appropriately filtered) jobs from the queue. Those can be cancelled directly as generic jobs, or transformed into instances of the specific job class using
PredictJob.from_job, which allow all functionality previously available via the ModelJob and PredictJob interfaces.
scoring_type parameters, similar to
- Deprecation warning filters have been updated. By default, a filter will be added ensuring that usage of deprecated features will display a warning once per new usage location. In order to hide deprecation warnings, a filter like warnings.filterwarnings('ignore', category=DataRobotDeprecationWarning) can be added to a script so no such warnings are shown. Watching for deprecation warnings to avoid reliance on deprecated features is recommended.
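The effect of such a filter can be tried out with the standard library alone. In this sketch `DataRobotDeprecationWarning` is a local stand-in class defined for the demonstration (in a real script it would be imported from the client package, at a location not stated here), and `deprecated_call` is a hypothetical function that emits it.

```python
import warnings


class DataRobotDeprecationWarning(DeprecationWarning):
    """Local stand-in for the client's deprecation warning category."""


def deprecated_call():
    # Hypothetical deprecated feature: warns, then does its work.
    warnings.warn("this feature is deprecated", DataRobotDeprecationWarning, stacklevel=2)
    return "result"


def count_warnings_with_filter(action):
    """Call the deprecated function twice and count the warnings that get through."""
    with warnings.catch_warnings(record=True) as caught:
        warnings.simplefilter("always")  # start from a known filter state
        warnings.filterwarnings(action, category=DataRobotDeprecationWarning)
        deprecated_call()
        deprecated_call()
    return len(caught)


ignored = count_warnings_with_filter("ignore")  # the filter suggested above
shown = count_warnings_with_filter("always")    # every usage reported

print(ignored, shown)
```

Because the filter is keyed on the warning category, other warnings raised by the script are unaffected.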
- If your client is misconfigured and does not specify an endpoint, the cloud production server is no longer used as the default as in many cases this is not the correct default.
- This changelog is now included in the distributable of the client.
- Fixed an issue where updating the global client would not affect existing objects with cached clients. Now the global client is used for every API call.
- An issue caused by mistyping a filepath for use in a file upload has been resolved. Now an error will be raised if it looks like the raw string content for modeling or predictions is just one single line.
- Use of username and password to authenticate is no longer supported - use an API token instead.
- Usage of
Project.get_models is not supported both in filtering and ordering of models
- Default value of
Model.train method is now
100. If the default value is used, models will be trained with all of the available training data based on project configuration, rather than with the entire dataset including holdout for the previous default value of
Project.list which was deprecated in v2.0 has been removed.
Project.start which was deprecated in v0.2 has been removed.
Project.status method which was deprecated in v0.2 has been removed.
Project.wait_for_aim_stage method which was deprecated in v0.2 has been removed.
retry module which were deprecated in v2.1 were removed.
- Package renamed to
Project.update deprecated in favor of specific updates:
- A new use case involving financial data has been added to the
- Added documentation for the partition methods.
- In Python 2, using a unicode token to instantiate the client will now work correctly.
- The minimum required version of
trafaret has been upgraded to 0.7.1 to get around an incompatibility between it and
- Default to reading YAML config file from ~/.config/datarobot/drconfig.yaml
- Allow config_path argument to client
wait_for_autopilot method added to Project. This method can be used to block execution until autopilot has finished running on the project.
- Support for specifying which featurelist to use with initial autopilot in
Project.get_predict_jobs method has been added, which looks up all prediction jobs for a project
Project.start_autopilot method has been added, which starts autopilot on specified featurelist
- The schema for
PredictJob in DataRobot API v2.1 now includes a
message. This attribute has been added to the PredictJob class.
PredictJob.cancel now exists to cancel prediction jobs, mirroring
Project.from_async is a new classmethod that can be used to wait for an async resolution in project creation. Most users will not need to know about it as it is used behind the scenes in
Project.set_target, but power users who may run into periodic connection errors will be able to catch the new ProjectAsyncFailureError and decide if they would like to resume waiting for the async process to resolve.
AUTOPILOT_MODE enum now uses string names for autopilot modes instead of numbers
RetryManager utils are now deprecated
- INI-style config files are now deprecated (in favor of YAML config files)
- Several functions in the utils submodule are now deprecated (they are being moved elsewhere and are not considered part of the public interface)
Project.get_jobs has been renamed
Project.get_model_jobs for clarity and deprecated
- Support for the experimental date partitioning has been removed in DataRobot API, so it is being removed from the client immediately.
- In several places where
AppPlatformError was being raised, now
InputNotUnderstoodError are now used. With this change, one can now safely assume that when catching an
AppPlatformError it is because of an unexpected response from the server.
AppPlatformError has gained two new attributes,
status_code which is the HTTP status code of the unexpected response from the server, and
error_code which is a DataRobot-defined error code.
error_code is not used by any routes in DataRobot API 2.1, but will be in the future. In cases where it is not provided, the instance of
AppPlatformError will have the attribute
- Two new subclasses of
AppPlatformError have been introduced,
ClientError (for 400-level response status codes) and
ServerError (for 500-level response status codes). These will make it easier to build automated tooling that can recover from periodic connection issues while polling.
- If a
ServerError occurs during a call to
Project.from_async, then a
ProjectAsyncFailureError (a subclass of AsyncFailureError) will be raised. That exception will have the status_code of the unexpected response from the server, and the location that was being polled to wait for the asynchronous process to resolve.
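The split between the two subclasses lends itself to a retry policy along these lines. The classes below are local stand-ins that mirror only the names and the status_code attribute mentioned above; their constructor signature, the `poll_with_retry` helper, and the simulated endpoint are assumptions for illustration.

```python
class AppPlatformError(Exception):
    """Stand-in for the error raised on an unexpected server response."""

    def __init__(self, message, status_code):
        super().__init__(message)
        self.status_code = status_code


class ClientError(AppPlatformError):
    """Stand-in: 400-level response status codes."""


class ServerError(AppPlatformError):
    """Stand-in: 500-level response status codes."""


def poll_with_retry(fetch, max_attempts=3):
    """Call fetch(), retrying only on ServerError.

    ClientError propagates immediately, since repeating a bad
    request will not make it succeed.
    """
    for attempt in range(1, max_attempts + 1):
        try:
            return fetch()
        except ServerError:
            if attempt == max_attempts:
                raise


# Simulated endpoint: fails twice with a 503, then succeeds.
responses = [ServerError("unavailable", 503), ServerError("unavailable", 503), "done"]


def flaky_fetch():
    item = responses.pop(0)
    if isinstance(item, Exception):
        raise item
    return item


result = poll_with_retry(flaky_fetch)
```

Catching the shared base class AppPlatformError instead would retry bad requests as well, which is usually not what a polling loop wants.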
PredictJob class was added to work with prediction jobs
wait_for_async_predictions function added to predict_job module
- The order_by parameter of the
Project.list is now deprecated.
Project.set_target will re-fetch the project data after it succeeds, keeping the client side in sync with the state of the project on the server
DuplicateFeaturesError exception if passed list of features contains duplicates
Project.get_models now supports snake_case arguments to its order_by keyword
Project.wait_for_aim_stage is now deprecated, as the REST Async flow is a more reliable method of determining that project creation has completed successfully
Project.status is deprecated in favor of
Project.start is deprecated in favor of
Project.wait_for_aim_stage changed to support Python 3
- Fixed incorrect value of
- Models returned by
Project.get_models will now be correctly ordered when the order_by keyword is used
- Pinned versions of required libraries
Official release of v0.2
- Updated documentation
- Renamed parameter name of Project.create and Project.start to project_name
- Removed Model.predict method
- wait_for_async_model_creation function added to modeljob module
- wait_for_async_status_service of Project class renamed to _wait_for_async_status_service
- Can now use auth_token in config file to configure SDK
- Fixes a method that pointed to a removed route
- Added featurelist_id attribute to ModelJob class
- Removes model attribute from ModelJob class
- Project creation raises AsyncProjectCreationError if it was unsuccessful
- Removed Model.list_prime_rulesets and Model.get_prime_ruleset methods
- Removed Model.predict_batch method
- Removed Project.create_prime_model method
- Removed PrimeRuleSet model
- Adds backwards compatibility bridge for ModelJob async
- Adds ModelJob.get and ModelJob.get_model
- Minor bugfixes in wait_for_async_status_service
- Removes submit_model from Project until serverside implementation is improved
- Switches training URLs for new resource-based route at /projects/<project_id>/models/
- Job renamed to ModelJob, and using modelJobs route
- Fixes an inconsistency in argument order for train methods
- wait_for_async_status_service timeout increased from 60s to 600s
- Project.create will now handle both async/sync project creation
- All routes pluralized to sync with changes in API
- Project.get_jobs will request all jobs when no param specified
- dataframes from predict method will have pythonic names
- Project.get_status created, Project.status now deprecated
- Project.unlock_holdout created.
- Added quickrun parameter to Project.set_target
- Added modelCategory to Model schema
- Add permalinks feature to Project and Model objects.
- Project.create_prime_model created
- Project.set_worker_count fix for compatibility with API change in project update.
- Add positive class to set_target.
- Change attributes names of Project, Model, Job and Blueprint
- features in Model, Job and Blueprint are now processes
- dataset_id and dataset_name migrated to featurelist_id and featurelist_name.
- samplepct -> sample_pct
- Model now has blueprint, project, and featurelist attributes.
- Minor bugfixes.
- Minor fixes regarding rename Job attributes. features attributes now named processes, samplepct now is sample_pct.
(May 27, 2015)
- Minor fixes regarding migrating API from under_score names to camelCase.
(May 20, 2015)
- Remove Project.upload_file, Project.upload_file_from_url and Project.attach_file methods. Moved all file-upload logic to the Project.create method.
(May 15, 2015)
- Fix uploading file causing a lot of memory usage. Minor bugfixes.