
Conversation

zorroblue

Reference Issue

Fixes #344

What does this implement/fix? Explain your changes.

Adds an additional parameter rng to hash_X_y, defaulted as a zero seeded random state

Any other comments?

@glemaitre
Member

The random seed should not have a default. Each sampler will pass the random seed, which is required.
Then fix the tests if needed.

@zorroblue
Author

There are a few places where hash_X_y is called. Wouldn't those also need to be changed if we don't provide a default value?

@glemaitre
Member

The idea is to pass self.random_state in all those occurrences, since each estimator should have a random_state.

@zorroblue
Author

I almost fixed it, but it fails when random_state is passed as None: in that case, calling check_random_state(random_state) yields a differently initialized RandomState each time. I suspect it's because fit and sample both call hash_X_y in self.fit(X, y).sample(X, y) with random_state as None, so check_random_state gives a different state to each. How do we solve this? I can think of storing the actual RandomState object in random_state in __init__ itself. What do you think? @glemaitre
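The mismatch described above can be reproduced in a few lines. This is a minimal sketch, not code from the PR: with an int seed, check_random_state is reproducible across calls, but with None the draws come from an ever-advancing stream, so two independent calls (one in fit, one in sample) generally disagree and any hash derived from them differs too.

```python
from sklearn.utils import check_random_state

# With an int seed, each call returns a freshly seeded RandomState,
# so two independent calls draw identical values; fit() and sample()
# would therefore derive the same hash.
seeded_a = check_random_state(0).randint(1000)
seeded_b = check_random_state(0).randint(1000)
assert seeded_a == seeded_b

# With None, the draws come from numpy's global stream, which keeps
# advancing between calls, so independent calls in fit() and sample()
# will generally see different values and compute different hashes.
unseeded_a = check_random_state(None).randint(1000)
unseeded_b = check_random_state(None).randint(1000)
# unseeded_a == unseeded_b only by a 1-in-1000 coincidence
```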

@chkoar
Member

chkoar commented Oct 16, 2017

@zorroblue Nope. You should use the check_random_state function (from sklearn.utils) to get a RandomState object and pass it around. IMHO, I would probably leave the random state static in the hash_X_y function.

@glemaitre
Member

@chkoar We could take a deterministic approach instead: take N equally spaced samples, which would not require a random state. WDYT?

@chkoar
Member

chkoar commented Oct 16, 2017

@glemaitre nice. To recap, we use that in order to be sure that the sample method is fed with the same data as the fit method, right?

@glemaitre
Member

> @glemaitre nice. To recap, we use that in order to be sure that the sample method is fed with the same data as the fit method, right?

Exactly. Apart from computing a hash on the whole data, there is no good solution.
I think that making it deterministic on a couple of points would be enough.

@chkoar
Member

chkoar commented Oct 16, 2017

@glemaitre agree

@glemaitre
Member

@zorroblue are you still willing to make the changes?

@zorroblue
Author

zorroblue commented Oct 24, 2017

@glemaitre Sure! I still haven't understood what needs to be done, though. Could you tell me?

@glemaitre
Member

Oops, I forgot to answer, apparently. The idea is to select a number of samples which are equally spaced, similarly to https://github.com/scikit-learn/scikit-learn/pull/9041/files#diff-4cbaa7df0d8c0f765f3b07c72b70b60eR135

Once those samples are selected we can compute the hash and we don't need the random_state anymore.
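A minimal sketch of what such a deterministic hash_X_y could look like. Everything here is illustrative, not the PR's final code: joblib.hash provides the digest, and the n_samples parameter and the strided-slice selection scheme are assumptions modeled on the linked scikit-learn PR.

```python
import numpy as np
import joblib  # joblib.hash gives a deterministic digest of arrays


def hash_X_y(X, y, n_samples=10):
    """Hypothetical sketch: hash ~n_samples equally spaced rows of X and y.

    Fully deterministic, so fit() and sample() can both call it and
    compare digests without sharing a random_state.
    """
    row_step = max(1, X.shape[0] // n_samples)
    col_step = max(1, X.shape[1] // n_samples)
    row_idx = slice(None, None, row_step)  # every row_step-th row
    col_idx = slice(None, None, col_step)  # every col_step-th column
    return joblib.hash(X[row_idx, col_idx]), joblib.hash(y[row_idx])


X = np.arange(200, dtype=float).reshape(20, 10)
y = np.arange(20)
# Same data -> same digest on every call; no RNG involved.
assert hash_X_y(X, y) == hash_X_y(X, y)
```

Because the selected indices depend only on the array shape, the check stays stable across fit and sample while still catching a caller that passes different data to the two methods.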

@chkoar
Member

chkoar commented Nov 1, 2017

Could we change the title of the PR?

@glemaitre
Member

@chkoar Feel free to rename the title.

@zorroblue
Author

Sure @glemaitre I'll have a look at this in the weekend :)

@glemaitre glemaitre changed the title Add random state as parameter [WIP] Change hash computation to check data between fit and sample Nov 1, 2017
@glemaitre
Member

@zorroblue do you plan to make the changes any time soon?

@zorroblue
Author

Sorry @glemaitre for the delay, I am a little occupied until Dec 4 and can only take up this issue after that :(
If someone else is interested in taking this up, feel free to :)

@glemaitre
Member

Thanks for your effort. Sorry that we changed our mind regarding what to implement. Feel free to contribute any time.

@glemaitre glemaitre closed this Nov 30, 2017
@zorroblue
Author

No problem at all @glemaitre
I am looking forward to contributing more to the scikit-learn community :)

