@n9Mtq4 n9Mtq4 commented Mar 22, 2023

This PR fixes an issue where prediction data does not use cluster_selection_epsilon. The bug surfaces as wrong labels from approximate_predict and incorrect exemplars_.

Code to reproduce the problem:

```python
import hdbscan
from sklearn.datasets import make_blobs

blobs, _ = make_blobs(100, n_features=8, centers=10, random_state=42)

# use a high epsilon to force fewer clusters. real world data this happens more easily
clusterer = hdbscan.HDBSCAN(cluster_selection_epsilon=12.0, prediction_data=True)
clusterer.fit(blobs)

# 7 clusters from labels
clusterer.labels_.max() + 1

# 10 clusters from exemplars
len(clusterer.exemplars_)

# [5, 4, 3, 0, 5, 5, 6, 0, 5, 1]
clusterer.labels_[:10]

# predicting assigns points to completely different clusters (and number of clusters!)
# [6, 5, 4, 0, 6, 6, 9, 0, 6, 2]
hdbscan.approximate_predict(clusterer, blobs[:10])
```

I tracked the issue down to the prediction data selecting clusters from the tree differently from how it is done in _hdbscan_tree.pyx. The fix is to return the selected clusters from get_clusters in _hdbscan_tree.pyx and reuse those same clusters when building the prediction data.

With this PR:

```python
# 7 clusters from labels
clusterer.labels_.max() + 1

# 7 clusters from exemplars
len(clusterer.exemplars_)

# [5, 4, 3, 0, 5, 5, 6, 0, 5, 1]
clusterer.labels_[:10]

# predicting assigns points to correct clusters
# [5, 4, 3, 0, 5, 5, 6, 0, 5, 1]
hdbscan.approximate_predict(clusterer, blobs[:10])
```

This likely fixes #308
