Advanced Model Tuning

Preparation

This notebook explores the advanced-tuning capabilities for Eureqa models that were added in the 2.13 release of the DataRobot API.

Let’s start by importing the DataRobot API. (If you don’t have it installed already, you will need to install it in order to run this notebook.)

In [1]:
import datarobot as dr
from datarobot.enums import AUTOPILOT_MODE

Set Up

Now configure your DataRobot client (unless you’re using a configuration file)...

In [2]:
dr.Client(token='<API TOKEN>', endpoint='https://<YOUR ENDPOINT>/api/v2/')
Out[2]:
<datarobot.rest.RESTClientObject at 0x112f5d890>
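If you prefer a configuration file, the client also picks up credentials from `~/.config/datarobot/drconfig.yaml` by default, so `dr.Client()` can be called with no arguments. A minimal sketch (placeholders, not real credentials):

```yaml
# ~/.config/datarobot/drconfig.yaml -- read by dr.Client() when no
# token/endpoint are passed explicitly.
token: '<API TOKEN>'
endpoint: 'https://<YOUR ENDPOINT>/api/v2/'
```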

Create a Project

Create a new project using the 10K diabetes dataset. This dataset has a binary classification target, readmitted.

In [3]:
url = 'https://s3.amazonaws.com/datarobot_public_datasets/10k_diabetes.xlsx'
project = dr.Project.create(url, project_name='10K Advanced Modeling')
print('Project ID: {}'.format(project.id))
Project ID: 5b624d97962d7442b5bdc014

Now, let’s set up the project and run Autopilot to get some models.

In [4]:
# Increase the worker count to make the project go faster.
project.set_worker_count(4)
Out[4]:
Project(10K Advanced Modeling)
In [5]:
project.set_target('readmitted', mode=AUTOPILOT_MODE.FULL_AUTO)
Out[5]:
Project(10K Advanced Modeling)
In [6]:
project.wait_for_autopilot()
In progress: 4, queued: 36 (waited: 0s)
In progress: 4, queued: 36 (waited: 1s)
In progress: 4, queued: 36 (waited: 2s)
...
In progress: 2, queued: 0 (waited: 2120s)
In progress: 0, queued: 0 (waited: 2140s)

For the purposes of this example, let’s look at a Eureqa model.

In [7]:
models = project.get_models()
model = [
    m for m in models
    if m.model_type.startswith('Eureqa Generalized Additive Model')
][0]
model
Out[7]:
Model(u'Eureqa Generalized Additive Model Classifier (3000 Generations)')

Now that we have a model, we can start an advanced-tuning session based on that model.

In [8]:
tune = model.start_advanced_tuning_session()

Each model’s blueprint consists of a series of tasks. Each task contains some number of tunable parameters. Let’s take a look at the available (tunable) tasks.

In [9]:
tune.get_task_names()
Out[9]:
[u'Eureqa Generalized Additive Model Classifier (3000 Generations)',
 u'Matrix of word-grams occurrences',
 u'One-Hot Encoding']

Let’s drill down into the main Eureqa task, to see what parameters it has available.

In [10]:
task_name = 'Eureqa Generalized Additive Model Classifier (3000 Generations)'
tune.get_parameter_names(task_name)
Out[10]:
[u'XGB_subsample',
 u'EUREQA_timeout_sec',
 u'EUREQA_split_mode',
 u'EUREQA_building_block__subtraction',
 u'XGB_base_margin_initialize',
 u'EUREQA_building_block__two-argument_arctangent',
 u'EUREQA_building_block__modulo',
 u'EUREQA_building_block__division',
 u'XGB_scale_pos_weight',
 u'EUREQA_building_block__arcsine',
 u'XGB_tree_method',
 u'EUREQA_building_block__round',
 u'EUREQA_building_block__power',
 u'EUREQA_building_block__logical_not',
 u'EUREQA_building_block__minimum',
 u'EUREQA_max_generations',
 u'EUREQA_building_block__ceiling',
 u'XGB_n_estimators',
 u'EUREQA_building_block__gaussian_function',
 u'EUREQA_building_block__equal-to',
 u'XGB_colsample_bytree',
 u'EUREQA_building_block__inverse_hyperbolic_cosine',
 u'EUREQA_building_block__addition',
 u'XGB_min_child_weight',
 u'XGB_smooth_interval',
 u'EUREQA_building_block__arctangent',
 u'EUREQA_building_block__cosine',
 u'EUREQA_building_block__less-than-or-equal',
 u'EUREQA_building_block__logical_and',
 u'EUREQA_experimental__max_expression_ops',
 u'EUREQA_building_block__negation',
 u'EUREQA_building_block__square_root',
 u'EUREQA_weight_expr',
 u'EUREQA_building_block__logical_xor',
 u'XGB_learning_rate',
 u'XGB_interval',
 u'XGB_max_bin',
 u'EUREQA_building_block__greater-than',
 u'XGB_num_parallel_tree',
 u'EUREQA_building_block__input_variable',
 u'XGB_missing_value',
 u'EUREQA_target_expression_string',
 u'XGB_reg_alpha',
 u'EUREQA_building_block__sine',
 u'EUREQA_building_block__sign_function',
 u'XGB_min_split_loss',
 u'EUREQA_building_block__multiplication',
 u'EUREQA_building_block__hyperbolic_cosine',
 u'EUREQA_building_block__integer_constant',
 u'EUREQA_building_block__complementary_error_function',
 u'EUREQA_building_block__exponential',
 u'EUREQA_building_block__floor',
 u'XGB_random_state',
 u'EUREQA_building_block__greater-than-or-equal',
 u'EUREQA_building_block__step_function',
 u'EUREQA_building_block__tangent',
 u'EUREQA_building_block__natural_logarithm',
 u'EUREQA_building_block__logistic_function',
 u'EUREQA_random_seed',
 u'EUREQA_building_block__constant',
 u'EUREQA_building_block__absolute_value',
 u'EUREQA_prior_solutions',
 u'EUREQA_num_threads',
 u'EUREQA_building_block__hyperbolic_tangent',
 u'EUREQA_building_block__hyperbolic_sine',
 u'XGB_reg_lambda',
 u'EUREQA_building_block__inverse_hyperbolic_sine',
 u'EUREQA_training_split_expr',
 u'XGB_max_delta_step',
 u'EUREQA_building_block__inverse_hyperbolic_tangent',
 u'EUREQA_validation_split_expr',
 u'EUREQA_building_block__if-then-else',
 u'EUREQA_training_fraction',
 u'EUREQA_building_block__less-than',
 u'XGB_max_depth',
 u'EUREQA_sync_migrations',
 u'EUREQA_error_metric',
 u'EUREQA_building_block__logical_or',
 u'EUREQA_validation_fraction',
 u'EUREQA_building_block__maximum',
 u'EUREQA_building_block__factorial',
 u'EUREQA_building_block__arccosine',
 u'XGB_colsample_bylevel',
 u'EUREQA_building_block__error_function']

By default, Eureqa does not search for periodic relationships in the data: doing so would take time away from other types of modeling, and so could reduce model quality when no periodic relationships are present. But suppose we want to check whether Eureqa can find any strong periodic relationships in this data. We can allow it to consider models that use the mathematical sine() function.

In [11]:
tune.set_parameter(
    task_name=task_name,
    parameter_name='EUREQA_building_block__sine',
    value=1)

Additional parameters can be set in the same way, if desired.
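For example, several parameter choices can be collected in a dict and applied in one loop. The parameter values below (enabling cosine terms, lowering the generation cap) are illustrative only, and the tiny recording stand-in exists solely so this sketch runs on its own; in the notebook you would call `set_parameter` on the real `tune` session instead.

```python
# Sketch: apply several tuning parameters in one loop.
# RecordingSession is a stand-in so this snippet is self-contained;
# in the notebook, call set_parameter on the real `tune` session.
class RecordingSession:
    def __init__(self):
        self.calls = []

    def set_parameter(self, task_name, parameter_name, value):
        self.calls.append((task_name, parameter_name, value))

session = RecordingSession()
task_name = 'Eureqa Generalized Additive Model Classifier (3000 Generations)'

# Illustrative choices: also allow cosine terms, and lower the generation cap.
extra_parameters = {
    'EUREQA_building_block__cosine': 1,
    'EUREQA_max_generations': 1000,
}
for parameter_name, value in extra_parameters.items():
    session.set_parameter(
        task_name=task_name,
        parameter_name=parameter_name,
        value=value)

print(len(session.calls))  # two parameters recorded
```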

Now that some parameters have been set, the tuned model can be run:

In [12]:
job = tune.run()
new_model = job.get_result_when_complete()
new_model
Out[12]:
Model(u'Eureqa Generalized Additive Model Classifier (3000 Generations)')

You now have a new model trained with your specified advanced-tuning parameters.
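A natural next step is to check whether the tuning helped by comparing the tuned model to its parent on the project metric. The helper below is a hypothetical sketch: the score values are placeholders, not real results. In practice you would read each model's `metrics` attribute for the actual validation scores.

```python
# Sketch: pick the preferable model given validation scores on one metric.
# The scores below are placeholders; with the DataRobot client you would
# pull real values from each model's `metrics` attribute.
def better_model(scores, lower_is_better=True):
    """Return the name whose score is preferable for this metric."""
    pick = min if lower_is_better else max
    return pick(scores, key=scores.get)

# Placeholder LogLoss (lower is better) values for illustration only.
scores = {'parent model': 0.6012, 'tuned model': 0.5987}
print(better_model(scores))  # -> tuned model
```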