[MRG] Crash when using SGDClassifier with early stopping in a parallel grid search #12122

ogrisel · 2018-09-20T14:35:28Z

This is a bug report with a reproduction case for a bug I found recently. I have not investigated on the cause or a fix yet.

ogrisel · 2018-09-20T14:38:34Z

Here is the test failure message for reference:

sklearn.externals.joblib.externals.loky.process_executor._RemoteTraceback: """ Traceback (most recent call last): File "/volatile/ogrisel/code/scikit-learn/sklearn/externals/joblib/externals/loky/process_executor.py", line 420, in _process_worker r = call_item.fn(*call_item.args, **call_item.kwargs) File "/volatile/ogrisel/code/scikit-learn/sklearn/externals/joblib/_parallel_backends.py", line 563, in __call__ return self.func(*args, **kwargs) File "/volatile/ogrisel/code/scikit-learn/sklearn/externals/joblib/parallel.py", line 261, in __call__ for func, args, kwargs in self.items] File "/volatile/ogrisel/code/scikit-learn/sklearn/externals/joblib/parallel.py", line 261, in <listcomp> for func, args, kwargs in self.items] File "/volatile/ogrisel/code/scikit-learn/sklearn/model_selection/_validation.py", line 528, in _fit_and_score estimator.fit(X_train, y_train, **fit_params) File "/volatile/ogrisel/code/scikit-learn/sklearn/linear_model/stochastic_gradient.py", line 714, in fit sample_weight=sample_weight) File "/volatile/ogrisel/code/scikit-learn/sklearn/linear_model/stochastic_gradient.py", line 572, in _fit classes, sample_weight, coef_init, intercept_init) File "/volatile/ogrisel/code/scikit-learn/sklearn/linear_model/stochastic_gradient.py", line 526, in _partial_fit max_iter=max_iter) File "/volatile/ogrisel/code/scikit-learn/sklearn/linear_model/stochastic_gradient.py", line 621, in _fit_multiclass for i in range(len(self.classes_))) File "/volatile/ogrisel/code/scikit-learn/sklearn/externals/joblib/parallel.py", line 996, in __call__ self.retrieve() File "/volatile/ogrisel/code/scikit-learn/sklearn/externals/joblib/parallel.py", line 899, in retrieve self._output.extend(job.get(timeout=self.timeout)) File "/usr/local/lib/python3.6/multiprocessing/pool.py", line 644, in get raise self._value File "/usr/local/lib/python3.6/multiprocessing/pool.py", line 119, in worker result = (True, func(*args, **kwds)) File "/volatile/ogrisel/code/scikit-learn/sklearn/externals/joblib/_parallel_backends.py", line 563, in __call__ return self.func(*args, **kwargs) File "/volatile/ogrisel/code/scikit-learn/sklearn/externals/joblib/parallel.py", line 261, in __call__ for func, args, kwargs in self.items] File "/volatile/ogrisel/code/scikit-learn/sklearn/externals/joblib/parallel.py", line 261, in <listcomp> for func, args, kwargs in self.items] File "/volatile/ogrisel/code/scikit-learn/sklearn/linear_model/stochastic_gradient.py", line 421, in fit_binary est.power_t, est.t_, intercept_decay) File "sklearn/linear_model/sgd_fast.pyx", line 427, in sklearn.linear_model.sgd_fast.plain_sgd _, _, n_iter_ = _plain_sgd(weights, File "sklearn/linear_model/sgd_fast.pyx", line 762, in sklearn.linear_model.sgd_fast._plain_sgd score = estimator._validation_score(weights, intercept) File "/volatile/ogrisel/code/scikit-learn/sklearn/linear_model/stochastic_gradient.py", line 310, in _validation_score score = self.score(self._X_val, self._y_val, self._sample_weight_val) AttributeError: 'SGDClassifier' object has no attribute '_X_val' """ The above exception was the direct cause of the following exception: def test_multi_core_gridsearch_and_early_stopping(): # This is a non-regression test for a bad interaction between # early stopping internal attribute and process-based multi-core parallism. param_grid = { 'alpha': np.logspace(-4, 4, 9), 'n_iter_no_change': [2, 5, 10, 20], } clf = SGDClassifier(tol=1e-3, max_iter=1000, early_stopping=True) search = RandomizedSearchCV(clf, param_grid, n_iter=10, cv=5, n_jobs=2, random_state=0) > search.fit(iris.data, iris.target) clf = SGDClassifier(alpha=0.0001, average=False, class_weight=None, early_stopping=True, epsilon=0.1, eta0=0.0, fit_i... power_t=0.5, random_state=None, shuffle=True, tol=0.001, validation_fraction=0.1, verbose=0, warm_start=False) param_grid = {'alpha': array([1.e-04, 1.e-03, 1.e-02, 1.e-01, 1.e+00, 1.e+01, 1.e+02, 1.e+03, 1.e+04]), 'n_iter_no_change': [2, 5, 10, 20]} search = RandomizedSearchCV(cv=5, error_score='raise-deprecating', estimator=SGDClassifier(alpha=0.0001, average=Fals... pre_dispatch='2*n_jobs', random_state=0, refit=True, return_train_score='warn', scoring=None, verbose=0) sklearn/linear_model/tests/test_sgd.py:1487: _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ sklearn/model_selection/_search.py:722: in fit self._run_search(evaluate_candidates) sklearn/model_selection/_search.py:1515: in _run_search random_state=self.random_state)) sklearn/model_selection/_search.py:711: in evaluate_candidates cv.split(X, y, groups))) sklearn/externals/joblib/parallel.py:996: in __call__ self.retrieve() sklearn/externals/joblib/parallel.py:899: in retrieve self._output.extend(job.get(timeout=self.timeout)) sklearn/externals/joblib/_parallel_backends.py:517: in wrap_future_result return future.result(timeout=timeout) /usr/local/lib/python3.6/concurrent/futures/_base.py:432: in result return self.__get_result() _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ self = <Future at 0x7f0885a93828 state=finished raised AttributeError> def __get_result(self): if self._exception: > raise self._exception E AttributeError: 'SGDClassifier' object has no attribute '_X_val' self = <Future at 0x7f0885a93828 state=finished raised AttributeError> /usr/local/lib/python3.6/concurrent/futures/_base.py:384: AttributeError

Storing _X_val, _y_val and such on the main estimator is problematic: - the attributes are different for different binary subproblems - it's not thread safe when thread based multiclass parallelism is enabled - the classes_ attribute for the multiclass problem does not match the [-1, 1] classes of the binary subproblems. Instead let's create transient validation_score_cb callback on the fly without storing anything on the main estimator instance.

ogrisel · 2018-09-21T17:32:49Z

@TomDLT I pushed a potential fix for the early stopping bug I reported above.

ogrisel · 2018-09-21T17:41:49Z

I checked the plots of the early stopping example and they haven't changed much.

ogrisel · 2018-09-21T17:56:07Z

I think we should merge this fix in the 0.20 branch as well.

jnothman

Add a what's new to 0.20? Or is this changing a feature that is new in 0.20?

ogrisel · 2018-09-22T17:45:21Z

The early stopping feature was introduced in 0.20 and it's already documented in whats_new.rst.

TomDLT

LGTM

Passing a callback instead of the estimator is better indeed, nice !

#TODO: use stratified shufflesplit by default?

No strong feeling about this, but doing it now avoid breaking code in the future.

ogrisel · 2018-09-24T13:30:27Z

No strong feeling about this, but doing it now avoid breaking code in the future.

Let's me do it now then.

ogrisel · 2018-09-24T14:35:35Z

@TomDLT I did the change in my last commit, but that makes test_validation_set_not_used_for_training fail. However, I do not understand why. And actually, I don't understand how this test could pass with the train_test_split + np.sort(train_idx) combo before.

amueller · 2018-09-24T17:53:18Z

I think we should put this into 0.20.1 with the ColumnTransformer fixes

amueller · 2018-09-24T17:53:37Z

(aka I want to start building the wheels/binaries/etc)

TomDLT · 2018-09-25T13:07:12Z

In test_validation_set_not_used_for_training, we use shuffle=False, which means samples are seen in order by the SGD algorithm. Therefore, if we give the same training set (with idx_train), in the same order (with np.sort(idx_train)), with the same number of iterations, it should give the same coefficients.

The test was failing as the training set were different in the multiclass case, since StratifiedShuffleSplit was given the binarized y_i instead of y. I fixed it in e0ef556.

ogrisel · 2018-09-26T07:35:34Z

StratifiedShuffleSplit was given the binarized y_i instead of y. I fixed it in e0ef556.

Nice catch, thanks for your help @TomDLT . However if estimator.random_state is a RandomState instance this equivalence will break. I think we should precompute the validation_mask in fit_multiclass and pass it pre-initialized to fit_binary in this case.

I will give it a try.

…l grid search (#12122)

TST non regression test for early stopping bug

a9059b7

ogrisel added the Bug label Sep 20, 2018

ogrisel added this to the 0.20 milestone Sep 21, 2018

ogrisel changed the title ~~[WIP] Crash when using SGDClassifier with early stopping in a parallel grid search~~ [MRG] Crash when using SGDClassifier with early stopping in a parallel grid search Sep 21, 2018

jnothman approved these changes Sep 22, 2018

View reviewed changes

typo [ci skip]

e9678e3

TomDLT approved these changes Sep 24, 2018

View reviewed changes

eamanu approved these changes Sep 24, 2018

View reviewed changes

Use stratified shuffle split for the validation split

2d09c34

FIX change y_i into y for stratified shuffle split

e0ef556

ogrisel added 2 commits September 26, 2018 10:32

Precompute validation_mask in the OvR multiclass case

c02ee3d

typo in docstring format

5ad3dd2

ogrisel merged commit 7d72720 into scikit-learn:master Sep 26, 2018

ogrisel deleted the parallel-earlstopping-bug branch September 26, 2018 10:04

ogrisel added a commit that referenced this pull request Sep 26, 2018

[MRG] Crash when using SGDClassifier with early stopping in a paralle…

dbf3ed3

…l grid search (#12122)

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Uh oh!

[MRG] Crash when using SGDClassifier with early stopping in a parallel grid search #12122

[MRG] Crash when using SGDClassifier with early stopping in a parallel grid search #12122

Uh oh!

ogrisel commented Sep 20, 2018

ogrisel commented Sep 20, 2018 •

edited

Loading

ogrisel commented Sep 21, 2018

ogrisel commented Sep 21, 2018

ogrisel commented Sep 21, 2018

jnothman left a comment

ogrisel commented Sep 22, 2018

TomDLT left a comment •

edited

Loading

ogrisel commented Sep 24, 2018

ogrisel commented Sep 24, 2018 •

edited

Loading

amueller commented Sep 24, 2018

amueller commented Sep 24, 2018

TomDLT commented Sep 25, 2018 •

edited

Loading

ogrisel commented Sep 26, 2018

Labels

5 participants

Uh oh!

[MRG] Crash when using SGDClassifier with early stopping in a parallel grid search #12122

[MRG] Crash when using SGDClassifier with early stopping in a parallel grid search #12122

Uh oh!

Conversation

ogrisel commented Sep 20, 2018

ogrisel commented Sep 20, 2018 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

ogrisel commented Sep 21, 2018

ogrisel commented Sep 21, 2018

ogrisel commented Sep 21, 2018

jnothman left a comment

Choose a reason for hiding this comment

ogrisel commented Sep 22, 2018

TomDLT left a comment • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

ogrisel commented Sep 24, 2018

ogrisel commented Sep 24, 2018 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

amueller commented Sep 24, 2018

amueller commented Sep 24, 2018

TomDLT commented Sep 25, 2018 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

ogrisel commented Sep 26, 2018

Labels

5 participants

ogrisel commented Sep 20, 2018 •

edited

Loading

TomDLT left a comment •

edited

Loading

ogrisel commented Sep 24, 2018 •

edited

Loading

TomDLT commented Sep 25, 2018 •

edited

Loading