Clustering

class datarobot.models.ClusteringModel

ClusteringModel extends the Model class. It provides properties and methods specific to clustering projects.

compute_insights(max_wait=600)

Compute and retrieve cluster insights for the model. This method waits for the job that computes the cluster insights to complete and then returns the results. If the computation takes longer than the specified max_wait, an exception is raised.

Parameters:
  • project_id (str) – Project to start creation in.

  • model_id (str) – Project’s model to start creation in.

  • max_wait (int) – Maximum number of seconds to wait before giving up

Return type:

List of ClusterInsight

Raises:
  • ClientError – Server rejected creation due to client error. Most likely cause is bad project_id or model_id.

  • AsyncFailureError – If any of the responses from the server are unexpected

  • AsyncProcessUnsuccessfulError – If the cluster insights computation has failed or was cancelled.

  • AsyncTimeoutError – If the cluster insights computation did not resolve in time
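
As a minimal sketch of how a caller might handle the timeout case described above, the helper below wraps compute_insights and returns None when max_wait elapses. The helper name try_compute_insights is hypothetical, and the fallback exception class exists only so the snippet runs where the datarobot package is not installed. The model argument is assumed to be a ClusteringModel that was retrieved elsewhere.

```python
try:
    from datarobot.errors import AsyncTimeoutError
except ImportError:
    # Fallback so this sketch runs without the datarobot package installed.
    class AsyncTimeoutError(Exception):
        pass


def try_compute_insights(model, max_wait=600):
    """Return the computed insights, or None if max_wait elapsed first.

    Hypothetical helper: `model` is expected to be a ClusteringModel.
    """
    try:
        return model.compute_insights(max_wait=max_wait)
    except AsyncTimeoutError:
        # The computation did not resolve in time; the caller decides
        # whether to wait longer or give up.
        return None
```

The other documented exceptions (ClientError, AsyncFailureError, AsyncProcessUnsuccessfulError) are deliberately left to propagate, since they indicate failures that waiting longer would not resolve.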

property insights: List[ClusterInsight]

Return the current list of cluster insights, if they have already been computed.

Return type:

List of ClusterInsight

property clusters: List[Cluster]

Return the current list of clusters.

Return type:

List of Cluster

update_cluster_names(cluster_name_mappings)

Change many cluster names at once based on a list of name mappings.

Parameters:

cluster_name_mappings (List of tuples) –

Cluster name mappings, each consisting of the current cluster name and the new cluster name. Example:

cluster_name_mappings = [
    ("current cluster name 1", "new cluster name 1"),
    ("current cluster name 2", "new cluster name 2")]

Return type:

List of Cluster

Raises:

datarobot.errors.ClientError – Server rejected update of cluster names. Possible reasons include: incorrect format of mapping, mapping introduces duplicates.
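
Because the server can reject malformed or duplicate-producing mappings, it may help to check a mapping list locally before sending it. The following is a sketch only: check_name_mappings is a hypothetical helper, the cluster names are made up, and the rules it applies (two-string tuples, known current names, no resulting duplicates) are assumptions that may not match the server's validation exactly.

```python
def check_name_mappings(existing_names, cluster_name_mappings):
    """Return the names that would result from applying the mappings.

    Raises ValueError for malformed or duplicate-producing mappings.
    Hypothetical helper; the server's own checks may differ.
    """
    renames = {}
    for mapping in cluster_name_mappings:
        well_formed = (
            isinstance(mapping, tuple)
            and len(mapping) == 2
            and all(isinstance(name, str) for name in mapping)
        )
        if not well_formed:
            raise ValueError(f"expected (current_name, new_name), got {mapping!r}")
        current_name, new_name = mapping
        if current_name not in existing_names:
            raise ValueError(f"unknown cluster name: {current_name!r}")
        if current_name in renames:
            raise ValueError(f"cluster renamed twice: {current_name!r}")
        renames[current_name] = new_name

    # Apply all renames at once and check the outcome for duplicates.
    result = [renames.get(name, name) for name in existing_names]
    if len(set(result)) != len(result):
        raise ValueError("mappings would introduce duplicate cluster names")
    return result


# Made-up names, used only for illustration.
names = ["Cluster 1", "Cluster 2", "Cluster 3"]
mappings = [("Cluster 1", "Loyal customers"), ("Cluster 2", "New customers")]
print(check_name_mappings(names, mappings))
```

If the check passes, the same mappings list can be passed to update_cluster_names unchanged.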

update_cluster_name(current_name, new_name)

Change cluster name from current_name to new_name.

Parameters:
  • current_name (str) – Current cluster name.

  • new_name (str) – New cluster name.

Return type:

List of Cluster

Raises:

datarobot.errors.ClientError – Server rejected update of cluster names.

class datarobot.models.cluster.Cluster

Representation of a single cluster.

Variables:
  • name (str) – Current cluster name

  • percent (float) – Percent of data contained in the cluster. This value is reported after cluster insights are computed for the model.
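
To illustrate how the name and percent variables might be used together, the sketch below formats a list of Cluster-like objects, largest cluster first. describe_clusters is a hypothetical helper, and two things are assumed rather than documented: that percent is None until cluster insights have been computed, and that it is expressed on a 0 to 100 scale.

```python
def describe_clusters(clusters):
    """Return one 'name: percent' line per cluster, largest first.

    Hypothetical helper; works on any objects with `name` and `percent`.
    """
    def size(cluster):
        # Assumption: percent is None before insights are computed,
        # so such clusters are sorted last.
        return -1.0 if cluster.percent is None else cluster.percent

    lines = []
    for cluster in sorted(clusters, key=size, reverse=True):
        # Assumption: percent is on a 0-100 scale.
        share = "n/a" if cluster.percent is None else f"{cluster.percent:.1f}%"
        lines.append(f"{cluster.name}: {share}")
    return lines
```

The function accepts the list returned by the clusters property of a ClusteringModel or by Cluster.list.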

classmethod list(project_id, model_id)

Retrieve a list of clusters in the model.

Parameters:
  • project_id (str) – ID of the project that the model is part of.

  • model_id (str) – ID of the model.

Return type:

List of Cluster

classmethod update_multiple_names(project_id, model_id, cluster_name_mappings)

Update many clusters at once based on list of name mappings.

Parameters:
  • project_id (str) – ID of the project that the model is part of.

  • model_id (str) – ID of the model.

  • cluster_name_mappings (List of tuples) –

    Cluster name mappings, each consisting of the current and new names for a cluster. Example:

    cluster_name_mappings = [
        ("current cluster name 1", "new cluster name 1"),
        ("current cluster name 2", "new cluster name 2")]
    

Return type:

List of Cluster

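Raises:

datarobot.errors.ClientError – Server rejected update of cluster names. Possible reasons include: incorrect format of mapping, mapping introduces duplicates.
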
classmethod update_name(project_id, model_id, current_name, new_name)

Change cluster name from current_name to new_name.

Parameters:
  • project_id (str) – ID of the project that the model is part of.

  • model_id (str) – ID of the model.

  • current_name (str) – Current cluster name

  • new_name (str) – New cluster name

Return type:

List of Cluster

class datarobot.models.cluster_insight.ClusterInsight

Holds data on all insights related to a feature, as well as the breakdown per cluster.

Parameters:
  • feature_name (str) – Name of a feature from the dataset.

  • feature_type (str) – Type of feature.

  • insights (List[ClusterInsight]) – List providing information on the importance of the feature in relation to each cluster. The results help explain how the model groups data and what each cluster represents.

  • feature_impact (float) – Impact of a feature ranging from 0 to 1.
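
A common use of these attributes is ranking features by impact. The sketch below assumes each item exposes feature_name and a non-null numeric feature_impact, as documented above; top_features is a hypothetical helper and not part of the client.

```python
def top_features(insights, n=5):
    """Return (feature_name, feature_impact) pairs for the n highest impacts.

    Hypothetical helper; `insights` is a list of ClusterInsight-like objects.
    """
    ranked = sorted(
        insights,
        key=lambda insight: insight.feature_impact,
        reverse=True,
    )
    return [(insight.feature_name, insight.feature_impact) for insight in ranked[:n]]
```

The input can be the list returned by compute_insights or by ClusterInsight.compute.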

classmethod compute(project_id, model_id, max_wait=600)

Starts creation of cluster insights for the model and, if successful, returns the computed ClusterInsights. This method lets the calculation run for up to max_wait seconds and cancels the request if it has not completed by then.

Parameters:
  • project_id (str) – ID of the project to begin creation of cluster insights for.

  • model_id (str) – ID of the project model to begin creation of cluster insights for.

  • max_wait (int) – Maximum number of seconds to wait before canceling the request.

Return type:

List[ClusterInsight]

Raises:
  • ClientError – Server rejected creation due to client error. Most likely cause is bad project_id or model_id.

  • AsyncFailureError – Raised if any of the responses from the server are unexpected.

  • AsyncProcessUnsuccessfulError – Raised if the cluster insights computation failed or was cancelled.

  • AsyncTimeoutError – Raised if the cluster insights computation did not resolve within the specified time limit (max_wait).