The set of computation paths that a dataset passes through before producing predictions from data is called a blueprint. A blueprint can be trained on a dataset to generate a model.
To modify blueprints using Python, refer to the documentation for the Blueprint Workshop.
The following code block summarizes the interactions available for blueprints.
    # Get the set of blueprints recommended by DataRobot
    import datarobot as dr

    my_projects = dr.Project.list()
    project = my_projects[0]
    menu = project.get_blueprints()

    # Train the first recommended blueprint
    first_blueprint = menu[0]
    project.train(first_blueprint)
When a file is uploaded to a project and the target is set, DataRobot recommends a set of blueprints that are appropriate for the task at hand. You can use the get_blueprints method to get the list of blueprints recommended for a project:
    project = dr.Project.get('5506fcd38bd88f5953219da0')
    menu = project.get_blueprints()
    # The menu is a list; pick one blueprint from it
    blueprint = menu[0]
Get a blueprint¶
If you already have a blueprint_id from a model, you can retrieve the blueprint directly.
    project_id = '5506fcd38bd88f5953219da0'
    project = dr.Project.get(project_id)
    models = project.get_models()
    # get_models returns a list; take one model and look up its blueprint
    model = models[0]
    blueprint = dr.Blueprint.get(project_id, model.blueprint_id)
Get a blueprint chart¶
For any blueprint, whether it comes from the blueprint menu or is already used in a model, you can retrieve its chart. You can also get the chart's representation in Graphviz DOT format and render it into the format you need.
    import datarobot as dr

    project_id = '5506fcd38bd88f5953219da0'
    blueprint_id = '4321fcd38bd88f595321554223'
    bp_chart = dr.BlueprintChart.get(project_id, blueprint_id)
    # Print the chart as Graphviz DOT source
    print(bp_chart.to_graphviz())
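As a minimal sketch of getting the DOT text into a file that Graphviz can render, the helper below writes a DOT string to disk. The name save_dot and the sample graph are illustrative assumptions: the sample is hand-written and is not actual DataRobot output. In practice you would pass in the string returned by bp_chart.to_graphviz().

```python
import tempfile
from pathlib import Path


def save_dot(dot_source, path):
    """Write Graphviz DOT source text to `path` and return the path."""
    target = Path(path)
    target.write_text(dot_source, encoding="utf-8")
    return target


# Hand-written stand-in for the text returned by bp_chart.to_graphviz();
# this is NOT real DataRobot output.
sample_dot = 'digraph "Blueprint Chart" { "Data" -> "Task" -> "Prediction" }'

# Save into the system temp directory so the example leaves the
# working directory untouched.
saved = save_dot(sample_dot, Path(tempfile.gettempdir()) / "blueprint_chart.dot")
print(saved)
```

Assuming Graphviz is installed, the saved file can then be rendered from the command line, for example with `dot -Tpng blueprint_chart.dot -o blueprint_chart.png`.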
Get blueprint documentation¶
You can retrieve documentation on the tasks used in a blueprint. It contains information about each task, its parameters, and (when available) links and references to additional sources.
All documents are instances of the BlueprintTaskDocument class.
    import datarobot as dr

    project_id = '5506fcd38bd88f5953219da0'
    blueprint_id = '4321fcd38bd88f595321554223'
    bp = dr.Blueprint.get(project_id, blueprint_id)
    # get_documents returns a list with one document per task
    docs = bp.get_documents()
    print(docs[0].task)
    # >>> Average Blend
    print(docs[0].links[0]['url'])
    # >>> https://en.wikipedia.org/wiki/Ensemble_learning
The Blueprint class holds the data required to use the blueprint for modeling. This includes the blueprint id and the project_id.
There are also two attributes that help distinguish blueprints: model_type and processes.
    print(blueprint.id)
    # >>> u'8956e1aeecffa0fa6db2b84640fb3848'
    print(blueprint.project_id)
    # >>> u'5506fcd38bd88f5953219da0'
    print(blueprint.model_type)
    # >>> Logistic Regression
    print(blueprint.processes)
    # >>> [u'One-Hot Encoding', u'Missing Values Imputed', u'Standardize', u'Logistic Regression']
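The model_type and processes attributes can be used to narrow down a blueprint menu. The following is a rough sketch that assumes only that each blueprint exposes those two attributes as shown above. The helper name blueprints_using is hypothetical, and the menu is built from stand-in objects (the second entry is invented for illustration) so the example runs without a DataRobot connection.

```python
from types import SimpleNamespace


def blueprints_using(menu, process_name):
    """Return the blueprints whose `processes` list contains `process_name`."""
    return [bp for bp in menu if process_name in bp.processes]


# Stand-in objects that mimic only the two attributes used here.
# With a real project, `menu` would come from project.get_blueprints().
menu = [
    SimpleNamespace(
        model_type='Logistic Regression',
        processes=['One-Hot Encoding', 'Missing Values Imputed',
                   'Standardize', 'Logistic Regression'],
    ),
    SimpleNamespace(
        model_type='Hypothetical Tree Model',  # invented for illustration
        processes=['Missing Values Imputed', 'Hypothetical Tree Model'],
    ),
]

# List the model types of every blueprint that standardizes its inputs.
for bp in blueprints_using(menu, 'Standardize'):
    print(bp.model_type)
```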
Create a Model from a Blueprint¶
You can use a blueprint instance to train a model. The default dataset for the project is used.
Project.train is used for non-datetime-partitioned projects.
Project.train_datetime should be used for datetime partitioned projects.
    # For non-datetime-partitioned projects
    model_job_id = project.train(blueprint)

    # For datetime partitioned projects
    model_job = project.train_datetime(blueprint.id)
Both Project.train and Project.train_datetime will put a new modeling job into the queue. However, note that Project.train returns the id of the created ModelJob, while Project.train_datetime returns the ModelJob object itself.
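Because the two methods return different things, code that handles both kinds of project may want to normalize the result to a job id. The sketch below is one way to do that; the helper name model_job_id_of is hypothetical, and it assumes the returned job object exposes its identifier as an id attribute. Stand-in values are used so the example runs offline.

```python
from types import SimpleNamespace


def model_job_id_of(train_result):
    """Return a job id string from either kind of training result.

    Accepts a plain id string (as returned by Project.train) or a job
    object with an `id` attribute (assumed for the object returned by
    Project.train_datetime).
    """
    if isinstance(train_result, str):
        return train_result
    return str(train_result.id)


# Stand-ins for the two return styles; not real DataRobot results.
id_from_train = '17'
job_from_train_datetime = SimpleNamespace(id=42)

print(model_job_id_of(id_from_train))
print(model_job_id_of(job_from_train_datetime))
```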
You can pass a ModelJob id to the wait_for_async_model_creation function,
which polls the async model creation status and returns the newly created model when it’s finished.
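To illustrate what polling means here, the following generic sketch repeatedly calls a status check until it yields a result or a time limit passes. It is not the library's implementation: the names wait_until_done, check, max_wait, and interval are hypothetical, and the check callable stands in for whatever request reports the job status.

```python
import time


def wait_until_done(check, max_wait=600.0, interval=5.0,
                    sleep=time.sleep, clock=time.monotonic):
    """Call `check()` until it returns a non-None value, then return it.

    Raises TimeoutError if `max_wait` seconds pass without a result.
    `sleep` and `clock` are injectable so the loop can be tested quickly.
    """
    deadline = clock() + max_wait
    while True:
        result = check()
        if result is not None:
            return result
        if clock() >= deadline:
            raise TimeoutError('job did not finish within %s seconds' % max_wait)
        sleep(interval)


# Simulated status check: unfinished twice, then a finished result.
_attempts = iter([None, None, 'finished-model'])
result = wait_until_done(lambda: next(_attempts), max_wait=10,
                         interval=0, sleep=lambda seconds: None)
print(result)
```

Injecting sleep and clock keeps the loop deterministic under test; a real caller would leave the defaults in place.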