
Conversation

@laurallu commented Nov 4, 2019

This is an implementation of the safe-level SMOTE proposed in the following paper:

C. Bunkhumpornpat, K. Sinapiromsaran, C. Lursinsap, "Safe-Level-SMOTE: Safe-Level-Synthetic Minority Over-Sampling Technique for Handling the Class Imbalanced Problem," in: Theeramunkong T., Kijsirikul B., Cercone N., Ho T.B. (eds) Advances in Knowledge Discovery and Data Mining. PAKDD 2009. Lecture Notes in Computer Science, vol. 5476, Springer, Berlin, Heidelberg, pp. 475-482, 2009.

Todo list:

  • add unit tests
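
For reviewers unfamiliar with the method, here is a minimal self-contained sketch of the paper's sampling rule (illustrative only, not the code in this diff; the function name, `k`, and `seed` are placeholders). The safe level of a sample is the number of minority neighbors among its k nearest neighbors, and the interpolation gap is drawn from a range that pulls synthetic points toward the safer of the two parents:

```python
import numpy as np
from sklearn.neighbors import NearestNeighbors


def safe_level_smote(X, y, minority_class, k=5, seed=0):
    """Illustrative sketch of safe-level SMOTE; not the code in this PR."""
    rng = np.random.RandomState(seed)
    X_min = X[y == minority_class]

    # Neighbors within the minority class: used to pick the partner n.
    nn_min = NearestNeighbors(n_neighbors=k + 1).fit(X_min)
    # Neighbors in the full dataset: used to compute safe levels.
    nn_all = NearestNeighbors(n_neighbors=k + 1).fit(X)

    def safe_level(x):
        # Safe level = number of minority samples among the k nearest
        # neighbors of x (column 0 is x itself and is dropped).
        idx = nn_all.kneighbors(x[None, :], return_distance=False)[0][1:]
        return np.sum(y[idx] == minority_class)

    synthetic = []
    for p in X_min:
        # Partner n: a random minority neighbor of p, as in plain SMOTE.
        idx = nn_min.kneighbors(p[None, :], return_distance=False)[0][1:]
        n = X_min[rng.choice(idx)]
        sl_p, sl_n = safe_level(p), safe_level(n)
        if sl_p == 0 and sl_n == 0:
            continue                             # both noisy: generate nothing
        if sl_n == 0:
            gap = 0.0                            # duplicate the safe sample p
        else:
            ratio = sl_p / sl_n
            if ratio == 1:
                gap = rng.uniform(0, 1)          # equally safe: plain SMOTE
            elif ratio > 1:
                gap = rng.uniform(0, 1 / ratio)  # p is safer: stay near p
            else:
                gap = rng.uniform(1 - ratio, 1)  # n is safer: stay near n
        synthetic.append(p + gap * (n - p))
    return np.asarray(synthetic)
```

The gap rules are the only difference from plain SMOTE, which always draws the gap from U(0, 1).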
lgtm-com bot commented Nov 4, 2019

This pull request introduces 1 alert when merging bcc3069 into 321b751 - view on LGTM.com

new alerts:

  • 1 for Redundant comparison
codecov bot commented Nov 4, 2019

Codecov Report

Merging #626 into master will increase coverage by 0.05%.
The diff coverage is 100%.

Impacted file tree graph

```
@@            Coverage Diff             @@
##           master     #626      +/-   ##
==========================================
+ Coverage   97.93%   97.98%   +0.05%
==========================================
  Files          83       84       +1
  Lines        4784     4911     +127
==========================================
+ Hits         4685     4812     +127
  Misses         99       99
```
Impacted Files Coverage Δ
imblearn/over_sampling/_smote.py 97.73% <100%> (+0.52%) ⬆️
imblearn/over_sampling/__init__.py 100% <100%> (ø) ⬆️
...blearn/over_sampling/tests/test_safelevel_smote.py 100% <100%> (ø)

Continue to review full report at Codecov.

Legend
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update afbf781...866a04f.

@glemaitre (Member)

Thanks for the contribution.

You will need to add tests to check that the new function gives the expected results.

@glemaitre (Member)

Oh I see that you mentioned it now :)

@laurallu (Author) commented Nov 6, 2019

I just added some tests. Any suggestions?

lgtm-com bot commented Nov 6, 2019

This pull request introduces 1 alert when merging 394d686 into 321b751 - view on LGTM.com

new alerts:

  • 1 for Redundant comparison
(Review comment on imblearn/over_sampling/_smote.py)

```python
    sampling_strategy=BaseOverSampler._sampling_strategy_docstring,
    random_state=_random_state_docstring,
)
class SLSMOTE(BaseSMOTE):
```
@glemaitre SafeLevelSMOTE vs SLSMOTE


(Second review comment, same file)

```python
        self.m_neighbors = m_neighbors

    def _assign_sl(self, nn_estimator, samples, target_class, y):
```
I would use the name _assign_safe_levels unless it hurts the readability in the calling code.
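
For illustration, a vectorized helper with that name and the signature shown above might look like the sketch below (my assumption about the internals, not the submitted code):

```python
import numpy as np


def _assign_safe_levels(nn_estimator, samples, target_class, y):
    # Assumes `nn_estimator` is an already-fitted sklearn NearestNeighbors.
    # Column 0 is each sample itself when `samples` come from the fit data,
    # so it is dropped before counting.
    nn_idx = nn_estimator.kneighbors(samples, return_distance=False)[:, 1:]
    # Safe level of each sample = number of target_class (minority)
    # neighbors among its k nearest neighbors.
    return np.sum(y[nn_idx] == target_class, axis=1)
```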

@chkoar (Member) commented Nov 6, 2019

Thanks!

> I just added some tests. Any suggestions?

  1. Please use full names for variables if you can, e.g. `sl` should be `safe_levels`, unless it hurts readability.
  2. Could you add a section in the documentation?
lgtm-com bot commented Nov 11, 2019

This pull request introduces 1 alert when merging fd11e32 into afbf781 - view on LGTM.com

new alerts:

  • 1 for Redundant comparison
@laurallu changed the title from "[WIP] ENH: safe-level SMOTE" to "[MRG] ENH: safe-level SMOTE" on Nov 12, 2019
@laurallu (Author)

> 1. Please use full names for variables if you can, e.g. [`sl`](https://github.com/scikit-learn-contrib/imbalanced-learn/blob/394d686364725763de8ea2cc3f504d8c08fe111a/imblearn/over_sampling/_smote.py#L1469) should be `safe_levels`, unless it hurts readability.
> 2. Could you add a section in the [documentation](https://github.com/scikit-learn-contrib/imbalanced-learn/blob/master/doc/over_sampling.rst)?

I've made changes accordingly. I think it's probably ready to go through a detailed review.
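
In case it helps the review: the branch should be usable through the standard imblearn API, along these lines (assuming the class is exported as SLSMOTE and accepts random_state like the other SMOTE variants):

```python
from collections import Counter

from sklearn.datasets import make_classification
from imblearn.over_sampling import SLSMOTE  # rename to SafeLevelSMOTE under discussion

X, y = make_classification(n_samples=500, weights=[0.9, 0.1], random_state=0)
X_res, y_res = SLSMOTE(random_state=0).fit_resample(X, y)
print(Counter(y))      # imbalanced, roughly 9:1
print(Counter(y_res))  # roughly balanced after oversampling
```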

@glemaitre (Member)

I would suggest moving this implementation into smote_variants. The idea behind this move is to benchmark the SMOTE variants against one another on a large number of datasets and include in imbalanced-learn only the versions that show an advantage. You can see the discussion and contribute to it: https://github.com/gykovacs/smote_variants/issues/14

@laurallu, would this strategy be fine with you?

@chkoar (Member) commented Nov 17, 2019

Since Safe-Level SMOTE already exists there, IMHO we should review @laurallu's PR and merge it into imblearn.

@glemaitre (Member)

OK, let's do that. Let's open an issue to discuss the inclusion criteria and explain what we expect going forward. I will review this PR in the near future.

@laurallu (Author)

Thanks for pointing out smote_variants to me. I will check it out. I would love to see the inclusion criteria too, since I might code up more methods.
