Add documentation for SMOTEN
ThomasKluiters committed May 7, 2019
commit c7e7036d48684cfcef11f91a4c37779acf9dbe82
9 changes: 9 additions & 0 deletions doc/over_sampling.rst
@@ -198,6 +198,15 @@ Therefore, it can be seen that the samples generated in the first and last
columns are belonging to the same categories originally presented without any
other extra interpolation.

Furthermore, if the dataset consists solely of categorical features, one may
use the :class:`SMOTEN` class. It generates samples in the same way as
:class:`SMOTENC`, except that only categorical features are permitted. Because
every feature is treated as categorical, :class:`SMOTEN` is not advised for
datasets that contain both categorical and continuous features::

  >>> from imblearn.over_sampling import SMOTEN
  >>> smote_n = SMOTEN(random_state=0)
  >>> # overwrite the middle column so that every feature is categorical
  >>> X[:, 1] = rng.randint(2, size=n_samples)
  >>> X_resampled, y_resampled = smote_n.fit_resample(X, y)
  >>> print(sorted(Counter(y_resampled).items()))
  [(0, 30), (1, 30)]
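
To illustrate why no interpolation takes place on categorical features, the
following is a minimal sketch of the majority-vote idea described above. It is
not the implementation used by :class:`SMOTEN`: the helper names ``hamming``
and ``synthesize`` are hypothetical, and the use of the Hamming distance to
find neighbours is an assumption made only to keep the example short.

```python
from collections import Counter


def hamming(a, b):
    """Count the positions at which two categorical samples differ."""
    return sum(x != y for x, y in zip(a, b))


def synthesize(sample, minority, k=3):
    """Build one synthetic sample from the k nearest minority neighbours.

    Hypothetical helper: each feature takes the most frequent category
    among the neighbours, so only already observed categories can appear.
    """
    # every minority sample except the one we are generating from
    others = [s for s in minority if s is not sample]
    # k closest samples under the (assumed) Hamming distance
    neighbours = sorted(others, key=lambda s: hamming(sample, s))[:k]
    # majority vote, feature by feature
    return tuple(
        Counter(column).most_common(1)[0][0] for column in zip(*neighbours)
    )


minority = [("A", 0, "x"), ("A", 1, "x"), ("B", 0, "x"), ("A", 0, "y")]
new_sample = synthesize(minority[0], minority)
print(new_sample)
```

Every value in ``new_sample`` is a category that already occurs in the
corresponding column of ``minority``, which is the property the paragraph
above relies on.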

.. topic:: References

.. [HWB2005] H. Han, W. Wen-Yuan, M. Bing-Huan, "Borderline-SMOTE: a new
3 changes: 3 additions & 0 deletions doc/whats_new/v0.5.rst
@@ -27,6 +27,9 @@ Enhancement
and issue template showing how to print system and dependency information
from the command line. :issue:`557` by :user:`Alexander L. Hayes <batflyer>`.

- Add :class:`SMOTEN`, a variant of SMOTE for datasets made up purely of
  categorical features. By :user:`Thomas Kluiters <ThomasKluiters>`.

Maintenance
...........
