[WIP] Instance hardness #56

dvro · 2016-05-29T16:55:35Z

Added the Instance Hardness Threshold under-sampling method.

unbalanced_dataset/under_sampling/instance_hardness_threshold.py
example/under-sampling/plot_instance_hardness_threshold.py

The higher the threshold, the more samples of the majority class are removed.
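As a rough illustration of that statement, here is a minimal sketch of the filtering step only, not the code in this PR. It assumes each sample already has an estimated probability of belonging to its true class (for example from a cross-validated classifier) and that a majority sample is dropped when this probability falls below the threshold. The function name `filter_majority` and the toy data are hypothetical.

```python
def filter_majority(probabilities, y, majority_label, threshold):
    """Return the indices of the samples to keep.

    Assumption: probabilities[i] is the estimated probability that
    sample i belongs to its true class y[i]. Majority samples whose
    probability is below the threshold are dropped; every other
    sample is kept.
    """
    return [
        i
        for i, (p, label) in enumerate(zip(probabilities, y))
        if label != majority_label or p >= threshold
    ]


# Toy data: label 0 is the majority class, label 1 the minority class.
y = [0, 0, 0, 0, 0, 0, 1, 1]
probabilities = [0.95, 0.80, 0.60, 0.40, 0.25, 0.10, 0.30, 0.70]

kept_low = filter_majority(probabilities, y, majority_label=0, threshold=0.3)
kept_high = filter_majority(probabilities, y, majority_label=0, threshold=0.7)
```

On this toy data, raising the threshold from 0.3 to 0.7 reduces the kept majority samples from four to two, while both minority samples are always kept.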

glemaitre · 2016-05-30T18:17:35Z

I saw some discrepancies with the PEP8 standard. Could you check for that?

glemaitre · 2016-05-30T18:20:23Z

unbalanced_dataset/under_sampling/instance_hardness_threshold.py

+ threshold : float, optional (default=0.3)
+ Threshold to be used when excluding samples (0.01 to 0.99).
+
+ mode: str, optional (default='maj')


I would rename mode to kind_sel. We use this keyword in other parts of the API for almost the same purpose; I think it is in the ENN.

glemaitre · 2016-05-30T20:49:59Z

yep you can push your changes

glemaitre · 2016-05-30T20:51:11Z

example/under-sampling/plot_instance_hardness_threshold.py

+# Fit and transform x to visualise inside a 2D feature space
+X_vis = pca.fit_transform(X)
+
+# Apply the random under-sampling


You have some blank lines. You should probably remove them.

glemaitre · 2016-05-30T21:45:46Z

@dvro Do you think that we could replace the parameter threshold with the parameter ratio? ratio could then determine the probability threshold to apply in order to obtain the desired ratio between the minority and majority classes.

In that case, I would also drop the option of removing instances from the minority class.

fmfn · 2016-05-30T21:55:30Z

Do you think that we could replace the parameter threshold by the parameter ratio.

I came here to say that. Can we make the API consistent? ratio is the choice so far, so would it make sense to rename threshold?

glemaitre · 2016-05-30T22:08:00Z

@fmfn I agree, but I would go further and enforce the same behaviour for this parameter: number of minority samples over number of majority samples.

fmfn · 2016-05-30T22:26:11Z

@glemaitre Yes! Much better indeed.

dvro · 2016-05-31T00:24:35Z

I can rename it to ratio, but IMO ratio = #maj / #min (or the other way around), whereas threshold means the probability threshold at which samples are removed. A user might think that setting ratio=0.5 would output 2*X samples of class A and X samples of class B, when that is not the case. (I don't know if I am being clear about this.)

What do you guys think?

fmfn · 2016-05-31T00:51:28Z

I agree that simply renaming it would cause confusion. I suppose what I meant was: is there a transformation that can be applied to threshold that would allow us to translate it to a ratio?

glemaitre · 2016-05-31T08:28:35Z

For this case, I would pick the probability level such that the ratio is nearest to what we want. Roughly, without trying it:

ratios = np.zeros(100, )
probs = np.linspace(0., 1., 100)
for i, p in enumerate(probs):
    ratios[i] = self.stats_c_[self.min_c_] / np.count_nonzero(np.logical_or(probabilities >= p, y == self.min_c_))
ratios = np.abs(ratios - self.ratio)
threshold = probs[ratios.argmin()]
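A self-contained version of the same idea might look like the sketch below. It is an interpretation, not the code that was merged: the function name `threshold_for_ratio` and its arguments are hypothetical, and it assumes `probabilities` holds each sample's estimated probability of belonging to its true class. Unlike the snippet above, which divides by the count of all kept samples, this sketch divides by the kept majority samples only, following the "minority over majority" definition mentioned earlier in the thread.

```python
import numpy as np


def threshold_for_ratio(probabilities, y, minority_label, ratio, n_steps=100):
    """Pick the candidate threshold whose resulting ratio is closest to `ratio`.

    The ratio is (# minority samples) / (# majority samples kept), where a
    majority sample is kept when its probability is >= the threshold.
    """
    probabilities = np.asarray(probabilities, dtype=float)
    y = np.asarray(y)
    is_minority = y == minority_label
    n_minority = np.count_nonzero(is_minority)

    candidates = np.linspace(0.0, 1.0, n_steps)
    errors = np.empty(n_steps)
    for i, p in enumerate(candidates):
        n_majority_kept = np.count_nonzero((probabilities >= p) & ~is_minority)
        if n_majority_kept == 0:
            # No majority sample survives: the ratio is undefined, skip it.
            errors[i] = np.inf
        else:
            errors[i] = abs(n_minority / n_majority_kept - ratio)
    # argmin returns the first (lowest) threshold among equally good ones.
    return candidates[errors.argmin()]


# Toy data: six majority samples (label 0) and two minority samples (label 1).
y = [0, 0, 0, 0, 0, 0, 1, 1]
probabilities = [0.95, 0.80, 0.60, 0.40, 0.25, 0.10, 0.30, 0.70]

t_balanced = threshold_for_ratio(probabilities, y, minority_label=1, ratio=1.0)
t_half = threshold_for_ratio(probabilities, y, minority_label=1, ratio=0.5)
```

Because the search is over a fixed grid, the achieved ratio is only the nearest attainable one; with few samples or tied probabilities the requested ratio may not be reachable exactly.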


dvro added 5 commits May 29, 2016 12:23

instance hardness threshold undersampling method

6ccc7b9

instance_hardness_threshold.py updated

2ac3b20

Instance Hardness Threshold example added

a727d93

instance hardness threshold reference updated

c8257d9

instance hardness docs updated

d39d640

glemaitre changed the title ~~Instance hardness~~ [WIP] Instance hardness May 30, 2016

glemaitre reviewed May 30, 2016

dvro added 3 commits May 31, 2016 21:39

formating under_sampling/instance_hardness_threshold.py

6aa1f30

removing blank lines from example/under-sampling/plot_instance_hardne…

2814d38

…ss_threshold.py

Instace hardness using ratio

d7b5c21

dvro added 3 commits June 17, 2016 09:02

Merge remote-tracking branch 'main/master' into instance_hardness

1dc5fef

instance hardness updated

ee51225

under-sampling instance hardness threshold pep8

542c76e

dvro closed this Jun 17, 2016
