Word Cloud

class datarobot.models.word_cloud.WordCloud(ngrams)

Word cloud data for the model.

Notes

WordCloudNgram is a dict containing the following:

ngram (str) Word or ngram value.

coefficient (float) Value in the range [-1.0, 1.0] describing the effect of this ngram on the target. A large negative value means a strong effect toward the negative class in classification models and toward a smaller target value in regression models. A large positive value means a strong effect toward the positive class and toward a larger target value, respectively.

count (int) Number of rows in the training sample where this ngram appears.

frequency (float) Value in the range (0.0, 1.0]: the frequency of this ngram relative to the most frequent ngram.

is_stopword (bool) True for ngrams that DataRobot evaluates as stopwords.

class (str or None) For classification, the value of the target class for the corresponding word or ngram. For regression, None.
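To make the schema concrete, here is a minimal sketch of one entry together with a small checker for the documented types and ranges. The sample values and the helper name `check_ngram` are hypothetical and are not part of the DataRobot client.

```python
from typing import Any, Dict

# Hypothetical values for illustration; only the keys follow the schema above.
example_ngram: Dict[str, Any] = {
    "ngram": "refund",
    "coefficient": -0.42,   # within [-1.0, 1.0]
    "count": 120,           # rows in the training sample containing the ngram
    "frequency": 0.35,      # within (0.0, 1.0]
    "is_stopword": False,
    "class": None,          # None for regression, a class label otherwise
}


def check_ngram(entry: Dict[str, Any]) -> bool:
    """Return True if the entry respects the documented types and ranges."""
    return (
        isinstance(entry["ngram"], str)
        and -1.0 <= entry["coefficient"] <= 1.0
        and isinstance(entry["count"], int)
        and 0.0 < entry["frequency"] <= 1.0
        and isinstance(entry["is_stopword"], bool)
        and (entry["class"] is None or isinstance(entry["class"], str))
    )
```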

Attributes:

ngrams (list of dicts) List of dicts with the schema described as WordCloudNgram above.

most_frequent(top_n=5)

Return most frequent ngrams in the word cloud.

Parameters:

top_n (int) Number of ngrams to return.

Returns:

list of dict: Up to top_n most frequent ngrams in the word cloud. If top_n is greater than the total number of ngrams in the word cloud, all ngrams are returned, sorted by frequency in descending order.

Return type:

List[WordCloudNgram]
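The behavior described for most_frequent can be approximated in plain Python. This is a sketch of the documented ordering, not the client's actual implementation; `most_frequent_sketch` and the sample data are hypothetical.

```python
from typing import Any, Dict, List


def most_frequent_sketch(
    ngrams: List[Dict[str, Any]], top_n: int = 5
) -> List[Dict[str, Any]]:
    """Sort by frequency, highest first, and keep at most top_n entries."""
    ranked = sorted(ngrams, key=lambda entry: entry["frequency"], reverse=True)
    # Slicing past the end of a list is safe, so a large top_n returns all.
    return ranked[:top_n]


# Hypothetical word cloud entries (only the keys used here are filled in).
sample_ngrams = [
    {"ngram": "late", "frequency": 0.4},
    {"ngram": "great", "frequency": 1.0},
    {"ngram": "the", "frequency": 0.9},
]

top_two = [entry["ngram"] for entry in most_frequent_sketch(sample_ngrams, top_n=2)]
print(top_two)
```

With a real WordCloud object, the input list would correspond to its ngrams attribute.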

most_important(top_n=5)

Return most important ngrams in the word cloud.

Parameters:

top_n (int) Number of ngrams to return.

Returns:

list of dict: Up to top_n most important ngrams in the word cloud. If top_n is greater than the total number of ngrams in the word cloud, all ngrams are returned, sorted by absolute coefficient value in descending order.

Return type:

List[WordCloudNgram]
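Importance here is ranked by the absolute value of the coefficient, so strongly negative ngrams rank alongside strongly positive ones. The following sketch mirrors that documented ordering under the assumption that nothing else affects the ranking; `most_important_sketch` and the sample data are hypothetical.

```python
from typing import Any, Dict, List


def most_important_sketch(
    ngrams: List[Dict[str, Any]], top_n: int = 5
) -> List[Dict[str, Any]]:
    """Sort by |coefficient|, largest first, and keep at most top_n entries."""
    ranked = sorted(
        ngrams, key=lambda entry: abs(entry["coefficient"]), reverse=True
    )
    return ranked[:top_n]


# Hypothetical entries: "late" has the largest effect despite being negative.
sample_coefficients = [
    {"ngram": "great", "coefficient": 0.6},
    {"ngram": "the", "coefficient": 0.01},
    {"ngram": "late", "coefficient": -0.7},
]

important = [
    entry["ngram"] for entry in most_important_sketch(sample_coefficients, top_n=2)
]
print(important)
```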

ngrams_per_class()

Split ngrams by target class value. Useful for multiclass models.

Returns:

dict: Dictionary in the format of (class label) -> (list of ngrams for that class)

Return type:

Dict[Optional[str], List[WordCloudNgram]]
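The grouping described above can be sketched with a dictionary keyed on each entry's class value. This is an illustration of the documented return shape, not the client's code; `ngrams_per_class_sketch` and the sample labels are hypothetical. Under this sketch, a regression word cloud (where class is None for every entry) would yield a single None key, which is consistent with the Optional[str] key type.

```python
from collections import defaultdict
from typing import Any, Dict, List, Optional


def ngrams_per_class_sketch(
    ngrams: List[Dict[str, Any]],
) -> Dict[Optional[str], List[Dict[str, Any]]]:
    """Group entries by their "class" value, preserving input order."""
    grouped: Dict[Optional[str], List[Dict[str, Any]]] = defaultdict(list)
    for entry in ngrams:
        grouped[entry["class"]].append(entry)
    # Return a plain dict so that missing keys raise KeyError as usual.
    return dict(grouped)


# Hypothetical multiclass entries.
multiclass_ngrams = [
    {"ngram": "purr", "class": "cat"},
    {"ngram": "bark", "class": "dog"},
    {"ngram": "whiskers", "class": "cat"},
]

by_class = ngrams_per_class_sketch(multiclass_ngrams)
print({label: [e["ngram"] for e in entries] for label, entries in by_class.items()})
```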

class datarobot.models.word_cloud.WordCloudNgram(*args, **kwargs)