Blueprints
A blueprint is a set of computation paths that a dataset passes through to produce predictions. A blueprint can be trained on a dataset to generate a model.
To modify blueprints using Python, reference the Blueprint Workshop documentation.
The following code block summarizes the interactions available for blueprints.
# Get the set of blueprints recommended by DataRobot
import datarobot as dr
my_projects = dr.Project.list()
project = my_projects[0]
menu = project.get_blueprints()
# Train a model from the first recommended blueprint
first_blueprint = menu[0]
project.train(first_blueprint)
List blueprints
When you upload a file to a project and set a target, you receive a set of recommended blueprints that are appropriate for the task at hand.
Use get_blueprints to get the list of blueprints recommended for a project:
project = dr.Project.get('5506fcd38bd88f5953219da0')
menu = project.get_blueprints()
blueprint = menu[0]
Get a blueprint
If you already have a blueprint_id from a model, you can retrieve the blueprint directly.
project_id = '5506fcd38bd88f5953219da0'
project = dr.Project.get(project_id)
models = project.get_models()
model = models[0]
blueprint = dr.Blueprint.get(project_id, model.blueprint_id)
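Because several models in a project may have been built from the same blueprint, it can help to collect the distinct blueprint IDs before fetching each blueprint. The following is a minimal sketch, not part of the client library: the helper name unique_blueprint_ids is hypothetical, and the SimpleNamespace objects are stand-ins for the models returned by project.get_models(). It relies only on the blueprint_id attribute shown above.

```python
from types import SimpleNamespace


def unique_blueprint_ids(models):
    """Return the blueprint IDs of `models` in first-seen order, without repeats."""
    seen = set()
    ordered = []
    for model in models:
        bp_id = model.blueprint_id
        if bp_id not in seen:
            seen.add(bp_id)
            ordered.append(bp_id)
    return ordered


# Hypothetical stand-ins for the objects returned by project.get_models()
demo_models = [
    SimpleNamespace(blueprint_id="bp-a"),
    SimpleNamespace(blueprint_id="bp-b"),
    SimpleNamespace(blueprint_id="bp-a"),
]
demo_ids = unique_blueprint_ids(demo_models)
print(demo_ids)
```

With real data, each returned ID could then be passed to the blueprint lookup shown above.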
Get a blueprint chart
You can retrieve charts for all blueprints that are either from a blueprint menu or are already used in a model. You can also get a blueprint’s representation in Graphviz DOT format to render it into the format you need.
project_id = '5506fcd38bd88f5953219da0'
blueprint_id = '4321fcd38bd88f595321554223'
bp_chart = dr.models.BlueprintChart.get(project_id, blueprint_id)
print(bp_chart.to_graphviz())
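To render the chart, the DOT text usually needs to be written to a file first. The sketch below assumes that to_graphviz() returns the DOT source as a string; the helper name save_dot is hypothetical, and sample_dot is a hand-written stand-in, not real output from a blueprint chart.

```python
import tempfile
from pathlib import Path


def save_dot(dot_source, path):
    """Write DOT source text to `path` and return the path as a Path object."""
    path = Path(path)
    path.write_text(dot_source, encoding="utf-8")
    return path


# Hand-written stand-in for the string returned by bp_chart.to_graphviz()
sample_dot = 'digraph "Blueprint Chart" {\n  0 -> 1;\n}\n'

out_dir = tempfile.mkdtemp()
dot_file = save_dot(sample_dot, Path(out_dir) / "blueprint.dot")
print(dot_file.read_text(encoding="utf-8"))
```

If Graphviz is installed, the saved file can be rendered from the command line, for example with dot -Tpng blueprint.dot -o blueprint.png.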
Get blueprint documentation
You can retrieve documentation for tasks used in a blueprint. The documentation contains information about the task, its parameters, and links and references to additional sources. All documents are instances of the BlueprintTaskDocument class.
project_id = '5506fcd38bd88f5953219da0'
blueprint_id = '4321fcd38bd88f595321554223'
bp = dr.Blueprint.get(project_id, blueprint_id)
docs = bp.get_documents()
print(docs[0].task)
>>> Average Blend
print(docs[0].links[0]['url'])
>>> https://en.wikipedia.org/wiki/Ensemble_learning
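The same two attributes can be used to gather every reference URL per task. This is a minimal sketch under stated assumptions: the helper name links_by_task is hypothetical, the SimpleNamespace objects stand in for BlueprintTaskDocument instances, and links is assumed to always be a list of dictionaries with a 'url' key, as in the example above.

```python
from types import SimpleNamespace


def links_by_task(docs):
    """Map each task name to the list of URLs found in its documentation links."""
    result = {}
    for doc in docs:
        urls = [link["url"] for link in doc.links]
        result.setdefault(doc.task, []).extend(urls)
    return result


# Hypothetical stand-ins for the documents returned by bp.get_documents()
demo_docs = [
    SimpleNamespace(
        task="Average Blend",
        links=[{"url": "https://en.wikipedia.org/wiki/Ensemble_learning"}],
    ),
    SimpleNamespace(task="Task without links", links=[]),
]
task_links = links_by_task(demo_docs)
for task, urls in task_links.items():
    print(task, urls)
```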
Blueprint attributes
The Blueprint class holds the data required to use the blueprint for modeling. This includes the blueprint's id and project_id.
There are also two attributes that help distinguish blueprints: model_type and processes.
print(blueprint.id)
>>> u'8956e1aeecffa0fa6db2b84640fb3848'
print(blueprint.project_id)
>>> u'5506fcd38bd88f5953219da0'
print(blueprint.model_type)
>>> Logistic Regression
print(blueprint.processes)
>>> [u'One-Hot Encoding',
u'Missing Values Imputed',
u'Standardize',
u'Logistic Regression']
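Since processes lists the steps of a blueprint, it offers a simple way to narrow down a blueprint menu. The sketch below is illustrative only: the helper name blueprints_with_process is hypothetical, and the SimpleNamespace objects (including the made-up "Hypothetical Tree Model" entry) stand in for Blueprint instances. It uses only the id, model_type, and processes attributes described above.

```python
from types import SimpleNamespace


def blueprints_with_process(menu, process_name):
    """Return the blueprints in `menu` whose processes include `process_name`."""
    return [bp for bp in menu if process_name in bp.processes]


# Hypothetical stand-ins for the blueprints returned by project.get_blueprints()
demo_menu = [
    SimpleNamespace(
        id="bp-1",
        model_type="Logistic Regression",
        processes=[
            "One-Hot Encoding",
            "Missing Values Imputed",
            "Standardize",
            "Logistic Regression",
        ],
    ),
    SimpleNamespace(
        id="bp-2",
        model_type="Hypothetical Tree Model",
        processes=["Missing Values Imputed", "Hypothetical Tree Model"],
    ),
]
matches = blueprints_with_process(demo_menu, "Standardize")
print([bp.id for bp in matches])
```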
Build a model from a blueprint
You can also use a blueprint to train a model. The model is trained on the associated project’s dataset by default.
Note that Project.train is used for non-datetime partitioned projects. Project.train_datetime should be used for datetime partitioned projects.
model_job_id = project.train(blueprint)
# For datetime partitioned projects
model_job = project.train_datetime(blueprint.id)
Both Project.train and Project.train_datetime put a new modeling job into the queue. However, Project.train returns the ID of the created ModelJob, while Project.train_datetime returns the ModelJob object itself.
You can pass a ModelJob ID to the wait_for_async_model_creation function, which polls the status of the asynchronous model creation and returns the newly created model when it finishes.
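Because the two training methods return different types, code that handles both kinds of project may want to normalize the result to a job ID first. The following is a minimal sketch: the helper name model_job_id_of is hypothetical, and it assumes that a ModelJob object exposes its ID through an id attribute. The SimpleNamespace object is a stand-in for a real ModelJob.

```python
from types import SimpleNamespace


def model_job_id_of(train_result):
    """Return the job ID as a string, given either an ID or a job-like object."""
    if isinstance(train_result, (str, int)):
        # Shape returned by Project.train: the ID itself
        return str(train_result)
    # Shape returned by Project.train_datetime: an object assumed to carry `.id`
    return str(train_result.id)


# Hypothetical stand-ins for the two kinds of return value
id_from_train = model_job_id_of("42")
id_from_train_datetime = model_job_id_of(SimpleNamespace(id=43))
print(id_from_train, id_from_train_datetime)
```

The resulting ID is the value that wait_for_async_model_creation expects, as described above.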