[WIP] Instance hardness #56
| I saw some discrepancies with the PEP8 standard. Could you check for that? |
| threshold : float, optional (default=0.3) | ||
| Threshold to be used when excluding samples (0.01 to 0.99). | ||
| | ||
| mode: str, optional (default='maj') |
I would rename mode to kind_sel. We use this keyword elsewhere in the API for something almost identical; I think it is in the ENN.
done.
| yep you can push your changes |
| # Fit and transform x to visualise inside a 2D feature space | ||
| X_vis = pca.fit_transform(X) | ||
| | ||
| # Apply the random under-sampling |
You have some blank lines. You should probably remove them.
| @dvro Do you think that we could replace the parameter In this case I would also remove the possibility to remove some instances from the minority class. |
I came here to say that, can we make the API consistent? |
| @fmfn I agree but I would even enforce the same behaviour for this parameter: number of minority samples over number of majority samples. |
| @glemaitre Yes! Much better indeed. |
| I can rename it to ratio, but imo ratio = #maj / #min (or the other way around), whereas threshold, as I used it, means the probability threshold at which samples are removed. A user might think that setting ratio=0.5 would output 2*X samples of class A and X samples of class B ... when that is not the case. (I don't know if I am being clear about this.) What do you guys think? |
| I agree that simply renaming it would cause confusion. I suppose what I meant was: is that a transformation that can be done to |
| For this case I would pick the probability level such that the ratio is nearest to what we want. Roughly, without trying it: |
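The snippet that accompanied this comment is not preserved here. As one possible reading of the suggestion, the following is a hypothetical pure-Python sketch, not the commenter's code. It assumes `ratio` means number of minority samples over number of majority samples (as proposed earlier in the thread), that `true_class_proba` holds each sample's estimated probability of belonging to its own class (for example from cross-validated classifier predictions), and that only majority samples are removed. The function name `threshold_for_ratio` and the toy data are invented for illustration.

```python
def threshold_for_ratio(true_class_proba, y, ratio, majority_label=0):
    """Return the probability level that keeps roughly n_min / ratio majority samples.

    Hypothetical helper: `ratio` is assumed to be n_minority / n_majority
    after resampling, and must be strictly positive.
    """
    if ratio <= 0:
        raise ValueError("ratio must be > 0")

    # Probabilities of the majority samples, easiest (highest) first.
    maj = sorted(
        (p for p, label in zip(true_class_proba, y) if label == majority_label),
        reverse=True,
    )
    n_min = sum(1 for label in y if label != majority_label)

    # Number of majority samples to keep, clipped to what is available.
    n_keep = min(len(maj), max(1, round(n_min / ratio)))

    # Keeping every majority sample with p >= this value keeps n_keep samples
    # (more if several samples tie at exactly this probability).
    return maj[n_keep - 1]


# Toy data: four majority samples (label 0) and two minority samples (label 1).
proba = [0.9, 0.2, 0.6, 0.1, 0.8, 0.4]
y = [0, 0, 0, 0, 1, 1]

balanced = threshold_for_ratio(proba, y, ratio=1.0)   # keep 2 majority samples
unchanged = threshold_for_ratio(proba, y, ratio=0.5)  # keep all 4 majority samples
```

With this reading, the user-facing parameter stays a ratio, and the probability threshold becomes an internal quantity derived from the data.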
Added Instance Hardness Threshold under-sampling method.
The higher the threshold, the more samples of the majority class are removed.
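To illustrate the relationship between the threshold and the number of removed samples, here is a minimal pure-Python sketch. It is not the code from this pull request: the function name `hardness_filter` and the toy data are hypothetical, the per-sample probabilities are assumed to have been estimated beforehand by a classifier, and only majority-class samples are assumed to be eligible for removal.

```python
def hardness_filter(true_class_proba, y, threshold=0.3, majority_label=0):
    """Return the indices of the samples that are kept.

    Assumption: a majority sample is removed when the estimated probability
    of its own class falls below `threshold`; minority samples are always kept.
    """
    kept = []
    for i, (p, label) in enumerate(zip(true_class_proba, y)):
        if label != majority_label or p >= threshold:
            kept.append(i)
    return kept


# Toy data: four majority samples (label 0) and two minority samples (label 1).
proba = [0.9, 0.2, 0.6, 0.1, 0.8, 0.4]
y = [0, 0, 0, 0, 1, 1]

low = hardness_filter(proba, y, threshold=0.3)   # removes the two hardest majority samples
high = hardness_filter(proba, y, threshold=0.7)  # removes three majority samples
```

Raising the threshold from 0.3 to 0.7 on this toy data removes one more majority sample, while the minority samples are untouched in both cases.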