Blueprints

A blueprint is a set of computation paths that a dataset passes through before producing predictions. You can train a blueprint on a dataset to generate a model.
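As a purely conceptual illustration, and not how DataRobot actually executes blueprints, the sketch below treats a blueprint as an ordered list of steps. Each step transforms the data and passes the result to the next. The step functions impute, standardize, and threshold are hypothetical stand-ins for real preprocessing and modeling tasks.

```python
from statistics import mean, pstdev


def impute(values):
    """Replace missing values (None) with the mean of the observed values."""
    observed = [v for v in values if v is not None]
    fill = mean(observed)
    return [fill if v is None else v for v in values]


def standardize(values):
    """Center on the mean and scale by the population standard deviation."""
    mu, sigma = mean(values), pstdev(values)
    return [(v - mu) / sigma if sigma else 0.0 for v in values]


def threshold(values, cutoff=0.0):
    """Toy 'model': predict 1 when a value is above the cutoff, else 0."""
    return [1 if v > cutoff else 0 for v in values]


def run_blueprint(steps, data):
    """Pass the data through each step in order, like a linear blueprint."""
    for step in steps:
        data = step(data)
    return data


toy_blueprint = [impute, standardize, threshold]
print(run_blueprint(toy_blueprint, [1.0, None, 3.0]))  # [0, 0, 1]
```

Real blueprints can branch and combine several paths; the linear list here only shows the "data flows through steps" idea.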

To modify blueprints using Python, reference the Blueprint Workshop documentation.

The following code block shows a typical workflow: retrieve the blueprints recommended for a project and train a model from one of them.

# Get the set of blueprints recommended by DataRobot
import datarobot as dr
my_projects = dr.Project.list()
project = my_projects[0]
menu = project.get_blueprints()

# Train a model from the first recommended blueprint
first_blueprint = menu[0]
project.train(first_blueprint)

List blueprints

When you upload a file to a project and set a target, you receive a set of recommended blueprints that are appropriate for the task at hand.

Use get_blueprints to get the list of blueprints recommended for a project:

import datarobot as dr

project = dr.Project.get('5506fcd38bd88f5953219da0')
menu = project.get_blueprints()
blueprint = menu[0]
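Instead of picking a blueprint by position, you may want an overview of the menu by model type. The helper below is a minimal sketch with a hypothetical name, not part of the datarobot package. It relies only on each blueprint's id and model_type attributes (see Blueprint attributes), so it is shown here with stand-in objects that run without a DataRobot connection.

```python
from collections import defaultdict
from types import SimpleNamespace


def group_by_model_type(blueprints):
    """Map each model_type to the IDs of the blueprints that use it."""
    groups = defaultdict(list)
    for bp in blueprints:
        groups[bp.model_type].append(bp.id)
    return dict(groups)


# Offline stand-ins; with a live project you would pass project.get_blueprints().
sample_menu = [
    SimpleNamespace(id="bp-1", model_type="Logistic Regression"),
    SimpleNamespace(id="bp-2", model_type="Random Forest Classifier"),
    SimpleNamespace(id="bp-3", model_type="Logistic Regression"),
]
print(group_by_model_type(sample_menu))
```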

Get a blueprint

If you already have a blueprint_id from a model, you can retrieve the blueprint directly.

import datarobot as dr

project_id = '5506fcd38bd88f5953219da0'
project = dr.Project.get(project_id)
models = project.get_models()
model = models[0]
blueprint = dr.Blueprint.get(project_id, model.blueprint_id)
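Several models in a project can share one blueprint, for example when the same blueprint is trained on different sample sizes. The sketch below (a hypothetical helper, not part of the datarobot package) collects each distinct blueprint_id from a list of models in first-seen order, so each blueprint only needs to be fetched once. It reads only the blueprint_id attribute used above, so stand-in objects are enough to try it offline.

```python
from types import SimpleNamespace


def unique_blueprint_ids(models):
    """Return distinct blueprint IDs in the order they first appear."""
    # dict preserves insertion order, so fromkeys de-duplicates stably
    return list(dict.fromkeys(model.blueprint_id for model in models))


# Offline stand-ins; with a live project you would pass project.get_models().
sample_models = [
    SimpleNamespace(id="m1", blueprint_id="bp-a"),
    SimpleNamespace(id="m2", blueprint_id="bp-b"),
    SimpleNamespace(id="m3", blueprint_id="bp-a"),
]
print(unique_blueprint_ids(sample_models))
```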

Get a blueprint chart

You can retrieve a chart for any blueprint, whether it comes from the blueprint menu or is already used by a model. You can also get the chart's representation in Graphviz DOT format and render it into whatever format you need.

import datarobot as dr

project_id = '5506fcd38bd88f5953219da0'
blueprint_id = '4321fcd38bd88f595321554223'
bp_chart = dr.BlueprintChart.get(project_id, blueprint_id)
print(bp_chart.to_graphviz())
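DOT is a plain-text graph language: a digraph block that declares nodes and the directed edges (->) between them. To show the shape of that format, the sketch below builds DOT text by hand for a simple linear chain of step names. It is an illustration only; the text returned by to_graphviz is produced by DataRobot and can include branches and extra attributes. The helper name and the node names n0, n1, ... are arbitrary choices.

```python
def chain_to_dot(steps, graph_name="blueprint"):
    """Build DOT text for a linear chain of named steps."""
    lines = [f"digraph {graph_name} {{"]
    for i, step in enumerate(steps):
        label = step.replace('"', '\\"')  # escape quotes inside labels
        lines.append(f'  n{i} [label="{label}"];')
    for i in range(len(steps) - 1):
        lines.append(f"  n{i} -> n{i + 1};")  # edge from each step to the next
    lines.append("}")
    return "\n".join(lines)


dot_text = chain_to_dot(["One-Hot Encoding", "Standardize", "Logistic Regression"])
print(dot_text)
```

To render DOT text (from this helper or from to_graphviz), save it to a file such as blueprint.dot and run the Graphviz command dot -Tpng blueprint.dot -o blueprint.png.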

Get blueprint documentation

You can retrieve documentation for tasks used in a blueprint. The documentation contains information about the task, its parameters, and links and references to additional sources. All documents are instances of the BlueprintTaskDocument class.

import datarobot as dr

project_id = '5506fcd38bd88f5953219da0'
blueprint_id = '4321fcd38bd88f595321554223'
bp = dr.Blueprint.get(project_id, blueprint_id)
docs = bp.get_documents()
print(docs[0].task)
>>> Average Blend
print(docs[0].links[0]['url'])
>>> https://en.wikipedia.org/wiki/Ensemble_learning
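Because each document exposes task and links (a list of dicts with a 'url' key, as in the example above), you can gather every reference URL per task in one pass. The helper name is hypothetical, and the stand-in objects below mimic only the two attributes used, so the sketch runs without a DataRobot connection.

```python
from types import SimpleNamespace


def reference_urls(docs):
    """Map each task name to the URLs listed in its documentation."""
    urls = {}
    for doc in docs:
        # setdefault merges the links if two documents share a task name
        urls.setdefault(doc.task, []).extend(link["url"] for link in doc.links)
    return urls


# Offline stand-ins; with a live blueprint you would pass bp.get_documents().
sample_docs = [
    SimpleNamespace(task="Average Blend",
                    links=[{"url": "https://en.wikipedia.org/wiki/Ensemble_learning"}]),
    SimpleNamespace(task="Standardize", links=[]),
]
print(reference_urls(sample_docs))
```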

Blueprint attributes

The Blueprint class holds the data required to use the blueprint for modeling, including the blueprint ID (id) and project_id. Two further attributes help distinguish blueprints from one another: model_type and processes.

print(blueprint.id)
>>> u'8956e1aeecffa0fa6db2b84640fb3848'
print(blueprint.project_id)
>>> u'5506fcd38bd88f5953219da0'
print(blueprint.model_type)
>>> Logistic Regression
print(blueprint.processes)
>>> [u'One-Hot Encoding',
     u'Missing Values Imputed',
     u'Standardize',
     u'Logistic Regression']
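The processes list makes it possible to filter a blueprint menu by preprocessing step. The sketch below uses a hypothetical helper name and stand-in objects whose attribute values are illustrative, not real DataRobot output. It matches step names exactly, so a partial name such as "Missing" does not match "Missing Values Imputed".

```python
from types import SimpleNamespace


def blueprints_with_step(blueprints, step):
    """Return the blueprints whose processes include the given step name."""
    return [bp for bp in blueprints if step in bp.processes]


# Offline stand-ins; with a live project you would pass project.get_blueprints().
sample_blueprints = [
    SimpleNamespace(id="bp-1", model_type="Logistic Regression",
                    processes=["One-Hot Encoding", "Missing Values Imputed",
                               "Standardize", "Logistic Regression"]),
    SimpleNamespace(id="bp-2", model_type="Random Forest Classifier",
                    processes=["Ordinal encoding", "Random Forest Classifier"]),
]
print([bp.id for bp in blueprints_with_step(sample_blueprints, "Standardize")])
```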

Build a model from a blueprint

You can also use a blueprint to train a model. By default, the model is trained on the associated project’s dataset. Use Project.train for projects that are not datetime partitioned, and Project.train_datetime for datetime partitioned projects.

model_job_id = project.train(blueprint)

# For datetime partitioned projects
model_job = project.train_datetime(blueprint.id)
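If your code handles both kinds of projects, a small wrapper can call the right method and always return a model job ID, since Project.train returns an ID while Project.train_datetime returns a ModelJob object. This is a sketch under assumptions: the function name and its datetime_partitioned flag are hypothetical (how you determine the partitioning is up to you), and it assumes the returned ModelJob exposes its ID as .id. The FakeProject class exists only so the example runs offline.

```python
from types import SimpleNamespace


def start_training(project, blueprint, datetime_partitioned):
    """Queue a model for the blueprint and return the model job ID."""
    if datetime_partitioned:
        # train_datetime takes the blueprint ID and returns a ModelJob
        model_job = project.train_datetime(blueprint.id)
        return model_job.id
    # train accepts the blueprint and returns the model job ID directly
    return project.train(blueprint)


class FakeProject:
    """Offline stand-in that mimics the two return conventions."""

    def train(self, blueprint):
        return "job-from-train"

    def train_datetime(self, blueprint_id):
        return SimpleNamespace(id="job-from-train-datetime")


bp = SimpleNamespace(id="bp-1")
print(start_training(FakeProject(), bp, datetime_partitioned=False))
print(start_training(FakeProject(), bp, datetime_partitioned=True))
```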

Both Project.train and Project.train_datetime add a new modeling job to the queue. Note, however, that Project.train returns the ID of the created ModelJob, while Project.train_datetime returns the ModelJob object itself. You can pass the project ID and a ModelJob ID to the wait_for_async_model_creation function (in datarobot.models.modeljob), which polls the async model creation status and returns the newly created model once it finishes.
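wait_for_async_model_creation does the polling for you. To show the general idea, the sketch below is a generic poll-until-done loop with a deadline. It is not DataRobot's implementation, and check is a hypothetical callable that returns None while the job is still running and a result once it has finished.

```python
import time


def poll_until_done(check, interval=5.0, timeout=600.0):
    """Call check() until it returns a non-None result or the timeout passes."""
    deadline = time.monotonic() + timeout
    while True:
        result = check()
        if result is not None:
            return result
        if time.monotonic() >= deadline:
            raise TimeoutError(f"job did not finish within {timeout} seconds")
        time.sleep(interval)  # wait before asking again


# Offline demo: the stand-in "job" reports a result on the third check.
responses = iter([None, None, "model-123"])
print(poll_until_done(lambda: next(responses), interval=0))
```

In real code, call wait_for_async_model_creation rather than reimplementing this loop.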