Word Cloud

class datarobot.models.word_cloud.WordCloud(ngrams)

Word cloud data for the model.

Notes

WordCloudNgram is a dict containing the following:

  • ngram (str) Word or ngram value.

  • coefficient (float) Value in the range [-1.0, 1.0] describing the effect of this ngram on the target. A large negative value means a strong effect toward the negative class in classification models and toward a smaller target value in regression models. A large positive value means a strong effect toward the positive class and toward a larger target value, respectively.

  • count (int) Number of rows in the training sample where this ngram appears.

  • frequency (float) Value in the range (0.0, 1.0]: the frequency of this ngram relative to the most frequent ngram.

  • is_stopword (bool) True for ngrams that DataRobot evaluates as stopwords.

  • class (str or None) For classification, the value of the target class for the corresponding word or ngram. For regression, None.
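As an illustration of the schema above, here is a minimal sketch of one entry as a plain Python dict. The values are invented for the example, and `in_documented_ranges` is a hypothetical helper, not part of the datarobot package.

```python
# Hypothetical sample entry; every value here is made up for illustration.
sample_ngram = {
    "ngram": "refund",
    "coefficient": -0.62,   # documented range: [-1.0, 1.0]
    "count": 148,           # rows in the training sample containing the ngram
    "frequency": 0.37,      # documented range: (0.0, 1.0]
    "is_stopword": False,
    "class": None,          # None for regression; a class label for classification
}


def in_documented_ranges(entry):
    """Check the coefficient and frequency ranges listed in the notes above."""
    return (-1.0 <= entry["coefficient"] <= 1.0
            and 0.0 < entry["frequency"] <= 1.0)


print(in_documented_ranges(sample_ngram))
```

Because "class" is a reserved word in Python, the field has to be read with `entry["class"]` rather than attribute-style access.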

Attributes:
ngrams (list of dicts)

List of dicts with schema described as WordCloudNgram above.

most_frequent(top_n=5)

Return most frequent ngrams in the word cloud.

Parameters:
top_n (int)

Number of ngrams to return

Returns:
list of dict

Up to top_n of the most frequent ngrams in the word cloud. If top_n is greater than the total number of ngrams in the word cloud, all ngrams are returned, sorted by frequency in descending order.

Return type:

List[WordCloudNgram]
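The ordering described for most_frequent can be reproduced on plain dicts. The sketch below is an assumption-based illustration of that behavior, not the library's implementation; `most_frequent_sketch` and the sample data are hypothetical, and the entries are trimmed to the keys the example needs.

```python
def most_frequent_sketch(ngrams, top_n=5):
    """Return up to top_n entries, highest frequency first."""
    ordered = sorted(ngrams, key=lambda entry: entry["frequency"], reverse=True)
    # Slicing past the end of a list is safe, so a large top_n returns everything.
    return ordered[:top_n]


# Hypothetical data for illustration only.
frequency_sample = [
    {"ngram": "late", "coefficient": -0.8, "frequency": 0.4},
    {"ngram": "great", "coefficient": 0.9, "frequency": 1.0},
    {"ngram": "ok", "coefficient": 0.1, "frequency": 0.7},
]

for entry in most_frequent_sketch(frequency_sample, top_n=2):
    print(entry["ngram"], entry["frequency"])
```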

most_important(top_n=5)

Return most important ngrams in the word cloud.

Parameters:
top_n (int)

Number of ngrams to return

Returns:
list of dict

Up to top_n of the most important ngrams in the word cloud. If top_n is greater than the total number of ngrams in the word cloud, all ngrams are returned, sorted by absolute coefficient value in descending order.

Return type:

List[WordCloudNgram]
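Ranking by absolute coefficient means a strongly negative ngram outranks a weakly positive one. The following sketch shows that ordering on plain dicts; `most_important_sketch` and the sample data are hypothetical and only mirror the documented behavior, not the library's actual code.

```python
def most_important_sketch(ngrams, top_n=5):
    """Return up to top_n entries, largest absolute coefficient first."""
    ordered = sorted(ngrams, key=lambda entry: abs(entry["coefficient"]), reverse=True)
    return ordered[:top_n]


# Hypothetical data for illustration only.
importance_sample = [
    {"ngram": "ok", "coefficient": 0.1, "frequency": 0.7},
    {"ngram": "late", "coefficient": -0.8, "frequency": 0.4},
    {"ngram": "great", "coefficient": 0.5, "frequency": 1.0},
]

for entry in most_important_sketch(importance_sample, top_n=2):
    print(entry["ngram"], entry["coefficient"])
```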

ngrams_per_class()

Split ngrams per target class values. Useful for multiclass models.

Returns:
dict

Dictionary in the format of (class label) -> (list of ngrams for that class)

Return type:

Dict[Optional[str], List[WordCloudNgram]]
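The returned mapping can be pictured as a simple grouping on the "class" key. This is a minimal sketch under that assumption; `ngrams_per_class_sketch` and the sample entries are hypothetical and are not taken from the library.

```python
def ngrams_per_class_sketch(ngrams):
    """Group entries by their "class" value, keeping the original order."""
    grouped = {}
    for entry in ngrams:
        grouped.setdefault(entry["class"], []).append(entry)
    return grouped


# Hypothetical multiclass data for illustration only.
multiclass_sample = [
    {"ngram": "goal", "coefficient": 0.7, "class": "sports"},
    {"ngram": "vote", "coefficient": 0.6, "class": "politics"},
    {"ngram": "match", "coefficient": 0.4, "class": "sports"},
]

per_class = ngrams_per_class_sketch(multiclass_sample)
for label, entries in per_class.items():
    print(label, [entry["ngram"] for entry in entries])
```

Under the same assumption, a regression model, where every entry has "class" set to None, would produce a single group keyed by None, which is consistent with the Optional[str] key in the return type.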

class datarobot.models.word_cloud.WordCloudNgram(*args, **kwargs)