@glemaitre
Member

Address issue raised in https://github.com/scikit-learn-contrib/imbalanced-learn/pull/1023/files#r1259422379

We additionally detect whether the two samples used to generate a new sample belong to different classes. In that case, we multiply their difference by a random number between 0 and 0.5.
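A minimal sketch of the behaviour described above, not the merged code itself. The function name `interpolate_pairs` and the parameters `neighbor_labels` and `minority_label` are hypothetical, chosen only for illustration. It assumes each row of `X` is paired with exactly one neighbour.

```python
import numpy as np


def interpolate_pairs(X, neighbors, neighbor_labels, minority_label, rng):
    """Hypothetical helper: one synthetic sample per (sample, neighbour) pair."""
    X = np.asarray(X, dtype=float)
    diffs = np.asarray(neighbors, dtype=float) - X
    # Pairs whose neighbour belongs to a different class than the minority sample.
    cross_class = np.asarray(neighbor_labels) != minority_label
    # Shrink those differences by a random factor drawn from [0, 0.5).
    diffs[cross_class] *= rng.uniform(0.0, 0.5, size=(cross_class.sum(), 1))
    # Usual SMOTE step: one scalar in [0, 1) per pair, shared by all features.
    steps = rng.uniform(0.0, 1.0, size=(X.shape[0], 1))
    return X + steps * diffs


rng = np.random.default_rng(0)
X = np.zeros((4, 2))
neighbors = np.ones((4, 2))
neighbor_labels = np.array([1, 0, 1, 0])  # 1 is taken as the minority class here
samples = interpolate_pairs(X, neighbors, neighbor_labels, 1, rng)
```

With this setup, samples built from a cross-class pair stay in the half of the segment closest to the minority sample, while same-class pairs can land anywhere on the segment.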

@glemaitre
Member Author

@solegalli Does the fix look okay to you?

I still find the paper ambiguous. Indeed, you could draw a separate random number for each feature of the selected X. In that case, however, you no longer generate a sample on the segment defined by the two samples but inside the "rectangle" (or hyperrectangle) they span. So it comes back to the same ambiguity as for SMOTE generation: on the segment or in the hyperrectangle.

Currently, we generate samples on the segments (the SMOTE paper is rather puzzling about this).

Any thoughts?
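To make the two readings concrete, here is a small sketch, assuming NumPy. The helper names `on_segment` and `in_hyperrectangle` are hypothetical and do not exist in imbalanced-learn.

```python
import numpy as np


def on_segment(x, neighbor, rng):
    """One scalar step shared by all features: the sample lies on the segment."""
    step = rng.uniform(0.0, 1.0)
    return x + step * (neighbor - x)


def in_hyperrectangle(x, neighbor, rng):
    """One independent step per feature: the sample lies in the hyperrectangle."""
    steps = rng.uniform(0.0, 1.0, size=x.shape)
    return x + steps * (neighbor - x)


rng = np.random.default_rng(0)
x = np.array([0.0, 0.0])
neighbor = np.array([1.0, 2.0])

s = on_segment(x, neighbor, rng)         # always satisfies s[1] == 2 * s[0]
h = in_hyperrectangle(x, neighbor, rng)  # only bounded by 0 <= h <= neighbor
```

The first helper corresponds to the behaviour described as current in this thread; the second is the alternative reading of the paper.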

Co-authored-by: Soledad Galli <solegalli@protonmail.com>
Synthetically generated samples.
"""
diffs = nn_data[nn_num[rows, cols]] - X[rows]
if y is not None: # only entering for BorderlineSMOTE-2
Contributor

LGTM, clever implementation.

Contributor

On second thought, would it not be enough to just halve the diffs, given that we multiply them by steps at lines 186/188?

Member Author

The paper says to use a random number. If we take half, the factor is always 0.5.

@solegalli
Contributor

Hey @glemaitre, I agree the paper is vague for Borderline-2. The current code reflects what I also understand from the article. Thank you!

@glemaitre glemaitre merged commit ec27259 into scikit-learn-contrib:master Jul 11, 2023