Advanced Options
- class datarobot.helpers.AdvancedOptions(weights=None, response_cap=None, blueprint_threshold=None, seed=None, smart_downsampled=None, majority_downsampling_rate=None, offset=None, exposure=None, accuracy_optimized_mb=None, scaleout_modeling_mode=None, events_count=None, monotonic_increasing_featurelist_id=None, monotonic_decreasing_featurelist_id=None, only_include_monotonic_blueprints=None, allowed_pairwise_interaction_groups=None, blend_best_models=None, scoring_code_only=None, prepare_model_for_deployment=None, consider_blenders_in_recommendation=None, min_secondary_validation_model_count=None, shap_only_mode=None, autopilot_data_sampling_method=None, run_leakage_removed_feature_list=None, autopilot_with_feature_discovery=False, feature_discovery_supervised_feature_reduction=None, exponentially_weighted_moving_alpha=None, external_time_series_baseline_dataset_id=None, use_supervised_feature_reduction=True, primary_location_column=None, protected_features=None, preferable_target_value=None, fairness_metrics_set=None, fairness_threshold=None, bias_mitigation_feature_name=None, bias_mitigation_technique=None, include_bias_mitigation_feature_as_predictor_variable=None, default_monotonic_increasing_featurelist_id=None, default_monotonic_decreasing_featurelist_id=None, model_group_id=None, model_regime_id=None, model_baselines=None, series_id=None, forecast_distance=None, forecast_offsets=None, incremental_learning_only_mode=None, incremental_learning_on_best_model=None, number_of_incremental_learning_iterations_before_best_model_selection=None, chunk_definition_id=None, incremental_learning_early_stopping_rounds=None)
Used when setting the target of a project to configure advanced options of the modeling process.
- Parameters:
- weights: string, optional
The name of a column indicating the weight of each row
- response_cap: bool or float in [0.5, 1), optional
Defaults to None here, but the server defaults to False. If specified, it is the quantile of the response distribution to use for response capping.
- blueprint_threshold: int, optional
Number of hours models are permitted to run before being excluded from later autopilot stages. Minimum 1.
- seed: int, optional
a seed to use for randomization
- smart_downsampled: bool, optional
whether to use smart downsampling to throw away excess rows of the majority class. Only applicable to classification and zero-boosted regression projects.
- majority_downsampling_rate: float, optional
the percentage between 0 and 100 of the majority rows that should be kept. Specify only if using smart downsampling. May not cause the majority class to become smaller than the minority class.
- offset: list of str, optional
(New in version v2.6) the list of the names of the columns containing the offset of each row
- exposure: string, optional
(New in version v2.6) the name of a column containing the exposure of each row
- accuracy_optimized_mb: bool, optional
(New in version v2.6) Include additional, longer-running models that will be run by the autopilot and available to run manually.
- scaleout_modeling_mode: string, optional
(Deprecated in 2.28. Will be removed in 2.30) DataRobot no longer supports scaleout models. Please remove any usage of this parameter as it will be removed from the API soon.
- events_count: string, optional
(New in version v2.8) the name of a column specifying events count.
- monotonic_increasing_featurelist_id: string, optional
(new in version 2.11) the id of the featurelist that defines the set of features with a monotonically increasing relationship to the target. If None, no such constraints are enforced. When specified, this will set a default for the project that can be overridden at model submission time if desired.
- monotonic_decreasing_featurelist_id: string, optional
(new in version 2.11) the id of the featurelist that defines the set of features with a monotonically decreasing relationship to the target. If None, no such constraints are enforced. When specified, this will set a default for the project that can be overridden at model submission time if desired.
- only_include_monotonic_blueprints: bool, optional
(new in version 2.11) when true, only blueprints that support enforcing monotonic constraints will be available in the project or selected for the autopilot.
- allowed_pairwise_interaction_groups: list of tuple, optional
(New in version v2.19) For GA2M models - specify groups of columns for which pairwise interactions will be allowed. E.g. if set to [(A, B, C), (C, D)] then GA2M models will allow interactions between columns A x B, B x C, A x C, C x D. All others (A x D, B x D) will not be considered.
- blend_best_models: bool, optional
(New in version v2.19) blend best models during Autopilot run.
- scoring_code_only: bool, optional
(New in version v2.19) Keep only models that can be converted to scorable java code during Autopilot run
- shap_only_mode: bool, optional
(New in version v2.21) Keep only models that support SHAP values during Autopilot run. Use SHAP-based insights wherever possible. Defaults to False.
- prepare_model_for_deployment: bool, optional
(New in version v2.19) Prepare model for deployment during Autopilot run. The preparation includes creating reduced feature list models, retraining best model on higher sample size, computing insights and assigning “RECOMMENDED FOR DEPLOYMENT” label.
- consider_blenders_in_recommendation: bool, optional
(New in version 2.22.0) Include blenders when selecting a model to prepare for deployment in an Autopilot Run. Defaults to False.
- min_secondary_validation_model_count: int, optional
(New in version v2.19) Compute “All backtest” scores (datetime models) or cross validation scores for the specified number of the highest ranking models on the Leaderboard, if over the Autopilot default.
- autopilot_data_sampling_method: str, optional
(New in version v2.23) one of datarobot.enums.DATETIME_AUTOPILOT_DATA_SAMPLING_METHOD. Applicable for OTV projects only, defines if autopilot uses “random” or “latest” sampling when iteratively building models on various training samples. Defaults to “random” for duration-based projects and to “latest” for row-based projects.
- run_leakage_removed_feature_list: bool, optional
(New in version v2.23) Run Autopilot on Leakage Removed feature list (if exists).
- autopilot_with_feature_discovery: bool, default False, optional
(New in version v2.23) If true, autopilot will run on a feature list that includes features found via search for interactions.
- feature_discovery_supervised_feature_reduction: bool, optional
(New in version v2.23) Run supervised feature reduction for feature discovery projects.
- exponentially_weighted_moving_alpha: float, optional
(New in version v2.26) defaults to None, value between 0 and 1 (inclusive), indicates alpha parameter used in exponentially weighted moving average within feature derivation window.
- external_time_series_baseline_dataset_id: str, optional
(New in version v2.26) If provided, will generate metrics scaled by external model predictions metric for time series projects. The external predictions catalog must be validated before autopilot starts, see Project.validate_external_time_series_baseline and external baseline predictions documentation for further explanation.
- use_supervised_feature_reduction: bool, default True, optional
Time Series only. When true, during feature generation DataRobot runs a supervised algorithm to retain only qualifying features. Setting to false can severely impact autopilot duration, especially for datasets with many features.
- primary_location_column: str, optional.
The name of primary location column.
- protected_features: list of str, optional.
(New in version v2.24) A list of project features to mark as protected for Bias and Fairness testing calculations. Max number of protected features allowed is 10.
- preferable_target_value: str, optional.
(New in version v2.24) A target value that should be treated as a favorable outcome for the prediction. For example, if we want to check gender discrimination for giving a loan and our target is named is_bad, then the positive outcome for the prediction would be No, which means that the loan is good and that’s what we treat as a favorable result for the loaner.
- fairness_metrics_set: str, optional.
(New in version v2.24) Metric to use for calculating fairness. Can be one of proportionalParity, equalParity, predictionBalance, trueFavorableAndUnfavorableRateParity or favorableAndUnfavorablePredictiveValueParity. Used and required only if Bias & Fairness in AutoML feature is enabled.
- fairness_threshold: str, optional.
(New in version v2.24) Threshold value for the fairness metric. Can be in a range of [0.0, 1.0]. If the relative (i.e. normalized) fairness score is below the threshold, then the user will see a visual indication on the
- bias_mitigation_feature_name: str, optional
The feature from protected features that will be used in a bias mitigation task to mitigate bias
- bias_mitigation_technique: str, optional
One of datarobot.enums.BiasMitigationTechnique. Options: ‘preprocessingReweighing’, ‘postProcessingRejectionOptionBasedClassification’. The technique by which we’ll mitigate bias, which will inform which bias mitigation task we insert into blueprints.
- include_bias_mitigation_feature_as_predictor_variable: bool, optional
Whether we should also use the mitigation feature as an input to the modeler just like any other categorical used for training, i.e. do we want the model to “train on” this feature in addition to using it for bias mitigation.
- default_monotonic_increasing_featurelist_id: str, optional
Returned from server on Project GET request - not able to be updated by user
- default_monotonic_decreasing_featurelist_id: str, optional
Returned from server on Project GET request - not able to be updated by user
- model_group_id: Optional[str] = None,
(New in version v3.3) The name of a column containing the model group id for each row.
- model_regime_id: Optional[str] = None,
(New in version v3.3) The name of a column containing the model regime id for each row.
- model_baselines: Optional[List[str]] = None,
(New in version v3.3) The list of the names of the columns containing the model baselines
- series_id: Optional[str] = None,
(New in version v3.6) The name of a column containing the series id for each row.
- forecast_distance: Optional[str] = None,
(New in version v3.6) The name of a column containing the forecast distance for each row.
- forecast_offsets: Optional[List[str]] = None,
(New in version v3.6) The list of the names of the columns containing the forecast offsets for each row.
- incremental_learning_only_mode: Optional[bool] = None,
(New in version v3.4) Keep only models that support incremental learning during Autopilot run.
- incremental_learning_on_best_model: Optional[bool] = None,
(New in version v3.4) Run incremental learning on the best model during Autopilot run.
- chunk_definition_id: string, optional
(New in version v3.4) Unique definition for chunks needed to run automated incremental learning.
- incremental_learning_early_stopping_rounds: Optional[int] = None
(New in version v3.4) Early stopping rounds used in the automated incremental learning service.
- number_of_incremental_learning_iterations_before_best_model_selection: Optional[int] = None
(New in version v3.6) Number of iterations top 5 models complete prior to best model selection. The minimum is 1, which means no additional iterations after the first iteration (initial model) will be run. The maximum is 10.
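Several of the parameters above state explicit value ranges. As a minimal sketch, those ranges can be checked locally before any options are sent to the server. The helper `check_advanced_option_ranges` below is hypothetical and is not part of the datarobot package; it only encodes the limits quoted in the descriptions above, and treating the stated bounds for `majority_downsampling_rate` as inclusive is an assumption.

```python
from typing import Any, Dict, List


def check_advanced_option_ranges(options: Dict[str, Any]) -> List[str]:
    """Return a list of problems found in a dict of option values.

    Hypothetical helper (not a datarobot API). It mirrors only the ranges
    written in the parameter descriptions; the server may enforce more.
    """
    problems: List[str] = []

    cap = options.get("response_cap")
    # response_cap is documented as a bool or a float quantile in [0.5, 1).
    if cap is not None and not isinstance(cap, bool) and not 0.5 <= cap < 1:
        problems.append("response_cap must be a bool or a float in [0.5, 1)")

    threshold = options.get("blueprint_threshold")
    if threshold is not None and threshold < 1:
        problems.append("blueprint_threshold must be at least 1 (hours)")

    rate = options.get("majority_downsampling_rate")
    if rate is not None:
        # Assumption: both ends of the 0..100 range are allowed.
        if not 0 <= rate <= 100:
            problems.append("majority_downsampling_rate must be between 0 and 100")
        if not options.get("smart_downsampled"):
            problems.append("majority_downsampling_rate requires smart_downsampled=True")

    alpha = options.get("exponentially_weighted_moving_alpha")
    if alpha is not None and not 0 <= alpha <= 1:
        problems.append("exponentially_weighted_moving_alpha must be in [0, 1]")

    protected = options.get("protected_features")
    if protected is not None and len(protected) > 10:
        problems.append("at most 10 protected_features are allowed")

    iterations = options.get(
        "number_of_incremental_learning_iterations_before_best_model_selection"
    )
    if iterations is not None and not 1 <= iterations <= 10:
        problems.append("incremental learning iterations must be between 1 and 10")

    return problems


example_options = {
    "response_cap": 0.7,
    "blueprint_threshold": 2,
    "smart_downsampled": True,
    "majority_downsampling_rate": 75.0,
}
print(check_advanced_option_ranges(example_options))  # prints []
```

A dict that passes this check could then be unpacked into the constructor as keyword arguments; the check is a convenience only and does not replace server-side validation.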
Examples
import datarobot as dr
advanced_options = dr.AdvancedOptions(
    weights='weights_column',
    offset=['offset_column'],
    exposure='exposure_column',
    response_cap=0.7,
    blueprint_threshold=2,
    smart_downsampled=True,
    majority_downsampling_rate=75.0)
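The rule described for allowed_pairwise_interaction_groups (every pair of columns inside the same group is allowed, nothing else) can be illustrated with a small standalone sketch. The function `allowed_pairs` is a hypothetical name used only for illustration and does not call the datarobot package; it simply expands the groups from the parameter description into the pairs they permit.

```python
from itertools import combinations
from typing import FrozenSet, Iterable, Set, Tuple


def allowed_pairs(groups: Iterable[Tuple[str, ...]]) -> Set[FrozenSet[str]]:
    """Expand interaction groups into the set of unordered column pairs.

    Hypothetical helper for illustration; not part of datarobot.
    """
    pairs: Set[FrozenSet[str]] = set()
    for group in groups:
        # Every two-column combination within one group is an allowed pair.
        for first, second in combinations(group, 2):
            pairs.add(frozenset((first, second)))
    return pairs


groups = [("A", "B", "C"), ("C", "D")]
pairs = allowed_pairs(groups)

print(sorted(" x ".join(sorted(pair)) for pair in pairs))
# ['A x B', 'A x C', 'B x C', 'C x D']

# A x D never shares a group, so it is not among the allowed pairs.
print(frozenset(("A", "D")) in pairs)  # False
```

The same `groups` list is the shape the parameter description gives for allowed_pairwise_interaction_groups: a list of tuples of column names.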
- get(_AdvancedOptions__key, _AdvancedOptions__default=None)
Return the value for key if key is in the dictionary, else default.
- Return type:
Optional[Any]
- pop(_AdvancedOptions__key)
If the key is not found, return the default if given; otherwise, raise a KeyError.
- Return type:
Optional[Any]
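The descriptions of get and pop follow the wording of Python's built-in dictionary methods. The sketch below uses a plain dict as a stand-in to show that behavior; it does not construct an AdvancedOptions instance, and whether the class matches a dict in every edge case is an assumption. Note that the pop signature listed above takes only the key.

```python
# Plain dict used as a stand-in for stored option values (an assumption for
# illustration; this is not an AdvancedOptions instance).
stored_options = {"seed": 42, "blend_best_models": True}

# get returns the stored value, or the default when the key is absent.
seed = stored_options.get("seed")                      # 42
missing = stored_options.get("weights")                # None
fallback = stored_options.get("weights", "no weights column")

# pop removes the key and returns its value.
removed = stored_options.pop("seed")                   # 42

# Popping a key that is no longer present, with no default, raises KeyError.
try:
    stored_options.pop("seed")
except KeyError:
    second_pop_failed = True
else:
    second_pop_failed = False

print(seed, missing, fallback, removed, second_pop_failed)
```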
- update_individual_options(**kwargs)
Update individual attributes of an instance of AdvancedOptions.
- Return type:
None