Blueprints¶
The set of computation paths that a dataset passes through before producing predictions from data is called a blueprint. A blueprint can be trained on a dataset to generate a model.
To modify blueprints using python, please refer to the documentation for the Blueprint Workshop.
Quick Reference¶
The following code block summarizes the interactions available for blueprints.
# Get the set of blueprints recommended by datarobot
import datarobot as dr
my_projects = dr.Project.list()
project = my_projects[0]
menu = project.get_blueprints()
first_blueprint = menu[0]
project.train(first_blueprint)
List Blueprints¶
When a file is uploaded to a project and the target is set, DataRobot
recommends a set of blueprints that are appropriate for the task at hand.
You can use the get_blueprints
method to get the list of blueprints recommended for a project:
project = dr.Project.get('5506fcd38bd88f5953219da0')
menu = project.get_blueprints()
blueprint = menu[0]
Get a blueprint¶
If you already have a blueprint_id
from a model you can retrieve the blueprint directly.
project_id = '5506fcd38bd88f5953219da0'
project = dr.Project.get(project_id)
models = project.get_models()
model = models[0]
blueprint = Blueprint.get(project_id, model.blueprint_id)
Get a blueprint chart¶
For all blueprints - either from blueprint menu or already used in model - you can retrieve its chart. You can also get its representation in graphviz DOT format to render it into the format you need.
project_id = '5506fcd38bd88f5953219da0'
blueprint_id = '4321fcd38bd88f595321554223'
bp_chart = BlueprintChart.get(project_id, blueprint_id)
print(bp_chart.to_graphviz())
Get a blueprint’s documentation¶
You can retrieve documentation on tasks used in the blueprint. It will contain information about
task, its parameters and (when available) links and references to additional sources.
All documents are instances of BlueprintTaskDocument
class.
project_id = '5506fcd38bd88f5953219da0'
blueprint_id = '4321fcd38bd88f595321554223'
bp = Blueprint.get(project_id, blueprint_id)
docs = bp.get_documents()
print(docs[0].task)
>>> Average Blend
print(docs[0].links[0]['url'])
>>> https://en.wikipedia.org/wiki/Ensemble_learning
Blueprint Attributes¶
The Blueprint
class holds the data required to use the blueprint
for modeling. This includes the blueprint_id
and project_id
.
There are also two attributes that help distinguish blueprints: model_type
and processes
.
print(blueprint.id)
>>> u'8956e1aeecffa0fa6db2b84640fb3848'
print(blueprint.project_id)
>>> u5506fcd38bd88f5953219da0'
print(blueprint.model_type)
>>> Logistic Regression
print(blueprint.processes)
>>> [u'One-Hot Encoding',
u'Missing Values Imputed',
u'Standardize',
u'Logistic Regression']
Create a Model from a Blueprint¶
You can use a blueprint instance to train a model. The default dataset for the project is used.
Note that Project.train
is used for non-datetime-partitioned projects.
Project.train_datetime
should be used for datetime partitioned
projects.
model_job_id = project.train(blueprint)
# For datetime partitioned projects
model_job = project.train_datetime(blueprint.id)
Both Project.train
and Project.train_datetime
will put a new modeling job into the queue. However, note that Project.train
returns the id of the created
ModelJob, while Project.train_datetime
returns the ModelJob
object itself.
You can pass a ModelJob id to wait_for_async_model_creation function,
which polls the async model creation status and returns the newly created model when it’s finished.