- Notifications
You must be signed in to change notification settings - Fork 1.3k
[WIP] Change hash computatin to check data between fit and sample #357
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
The random seed should not be defaulted. Each sampler will pass the random seed which is required. |
There are a few instances where |
The idea is to pass |
I almost fixed it. It fails for the case where the random_state is passed as None. In this case, calling check_random_state(random_state) gives a randomly initialized RandomState each time. I suspect it's because |
@zorroblue Nope. You will use the |
@chkoar We could make an deterministic approach instead. We could take |
@glemaitre nice. To recap, we use that in order to be sure that the |
Exactly. Apart of computing a hash on the whole data, there is not a good solution. |
@glemaitre agree |
@zorroblue are you still willing to make the changes? |
@glemaitre Sure! I still didn't get what needs to be done. Could you tell me? |
Ups, I forgot to answer apparently. The idea is to select a number of samples which are equally spaced similarly to https://github.com/scikit-learn/scikit-learn/pull/9041/files#diff-4cbaa7df0d8c0f765f3b07c72b70b60eR135 Once those samples are selected we can compute the hash and we don't need the random_state anymore. |
Could we change the title of the PR? |
@chkoar Fell free to rename the title |
Sure @glemaitre I'll have a look at this in the weekend :) |
@zorroblue do you plan to make the changes any time soon? |
Sorry @glemaitre for the delay, I am a little occupied till Dec 4. I can take up this issue after that only :( |
Thanks for your effort. Sorry that we changed our mind regarding what to implement. Feel free to contribute any time. |
No problem at all @glemaitre |
Reference Issue
Fixes #344
What does this implement/fix? Explain your changes.
Adds an additional parameter
rng
tohash_X_y
, defaulted as a zero seeded random stateAny other comments?