Skip to content

Conversation

@ogrisel
Copy link
Member

@ogrisel ogrisel commented Sep 20, 2018

This is a bug report with a reproduction case for a bug I found recently. I have not investigated on the cause or a fix yet.

@ogrisel ogrisel added the Bug label Sep 20, 2018
@ogrisel
Copy link
Member Author

ogrisel commented Sep 20, 2018

Here is the test failure message for reference:

sklearn.externals.joblib.externals.loky.process_executor._RemoteTraceback: """ Traceback (most recent call last): File "/volatile/ogrisel/code/scikit-learn/sklearn/externals/joblib/externals/loky/process_executor.py", line 420, in _process_worker r = call_item.fn(*call_item.args, **call_item.kwargs) File "/volatile/ogrisel/code/scikit-learn/sklearn/externals/joblib/_parallel_backends.py", line 563, in __call__ return self.func(*args, **kwargs) File "/volatile/ogrisel/code/scikit-learn/sklearn/externals/joblib/parallel.py", line 261, in __call__ for func, args, kwargs in self.items] File "/volatile/ogrisel/code/scikit-learn/sklearn/externals/joblib/parallel.py", line 261, in <listcomp> for func, args, kwargs in self.items] File "/volatile/ogrisel/code/scikit-learn/sklearn/model_selection/_validation.py", line 528, in _fit_and_score estimator.fit(X_train, y_train, **fit_params) File "/volatile/ogrisel/code/scikit-learn/sklearn/linear_model/stochastic_gradient.py", line 714, in fit sample_weight=sample_weight) File "/volatile/ogrisel/code/scikit-learn/sklearn/linear_model/stochastic_gradient.py", line 572, in _fit classes, sample_weight, coef_init, intercept_init) File "/volatile/ogrisel/code/scikit-learn/sklearn/linear_model/stochastic_gradient.py", line 526, in _partial_fit max_iter=max_iter) File "/volatile/ogrisel/code/scikit-learn/sklearn/linear_model/stochastic_gradient.py", line 621, in _fit_multiclass for i in range(len(self.classes_))) File "/volatile/ogrisel/code/scikit-learn/sklearn/externals/joblib/parallel.py", line 996, in __call__ self.retrieve() File "/volatile/ogrisel/code/scikit-learn/sklearn/externals/joblib/parallel.py", line 899, in retrieve self._output.extend(job.get(timeout=self.timeout)) File "/usr/local/lib/python3.6/multiprocessing/pool.py", line 644, in get raise self._value File "/usr/local/lib/python3.6/multiprocessing/pool.py", line 119, in worker result = (True, func(*args, **kwds)) File "/volatile/ogrisel/code/scikit-learn/sklearn/externals/joblib/_parallel_backends.py", line 563, in __call__ return self.func(*args, **kwargs) File "/volatile/ogrisel/code/scikit-learn/sklearn/externals/joblib/parallel.py", line 261, in __call__ for func, args, kwargs in self.items] File "/volatile/ogrisel/code/scikit-learn/sklearn/externals/joblib/parallel.py", line 261, in <listcomp> for func, args, kwargs in self.items] File "/volatile/ogrisel/code/scikit-learn/sklearn/linear_model/stochastic_gradient.py", line 421, in fit_binary est.power_t, est.t_, intercept_decay) File "sklearn/linear_model/sgd_fast.pyx", line 427, in sklearn.linear_model.sgd_fast.plain_sgd _, _, n_iter_ = _plain_sgd(weights, File "sklearn/linear_model/sgd_fast.pyx", line 762, in sklearn.linear_model.sgd_fast._plain_sgd score = estimator._validation_score(weights, intercept) File "/volatile/ogrisel/code/scikit-learn/sklearn/linear_model/stochastic_gradient.py", line 310, in _validation_score score = self.score(self._X_val, self._y_val, self._sample_weight_val) AttributeError: 'SGDClassifier' object has no attribute '_X_val' """ The above exception was the direct cause of the following exception: def test_multi_core_gridsearch_and_early_stopping(): # This is a non-regression test for a bad interaction between # early stopping internal attribute and process-based multi-core parallism. param_grid = { 'alpha': np.logspace(-4, 4, 9), 'n_iter_no_change': [2, 5, 10, 20], } clf = SGDClassifier(tol=1e-3, max_iter=1000, early_stopping=True) search = RandomizedSearchCV(clf, param_grid, n_iter=10, cv=5, n_jobs=2, random_state=0) > search.fit(iris.data, iris.target) clf = SGDClassifier(alpha=0.0001, average=False, class_weight=None, early_stopping=True, epsilon=0.1, eta0=0.0, fit_i... power_t=0.5, random_state=None, shuffle=True, tol=0.001, validation_fraction=0.1, verbose=0, warm_start=False) param_grid = {'alpha': array([1.e-04, 1.e-03, 1.e-02, 1.e-01, 1.e+00, 1.e+01, 1.e+02, 1.e+03, 1.e+04]), 'n_iter_no_change': [2, 5, 10, 20]} search = RandomizedSearchCV(cv=5, error_score='raise-deprecating', estimator=SGDClassifier(alpha=0.0001, average=Fals... pre_dispatch='2*n_jobs', random_state=0, refit=True, return_train_score='warn', scoring=None, verbose=0) sklearn/linear_model/tests/test_sgd.py:1487: _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ sklearn/model_selection/_search.py:722: in fit self._run_search(evaluate_candidates) sklearn/model_selection/_search.py:1515: in _run_search random_state=self.random_state)) sklearn/model_selection/_search.py:711: in evaluate_candidates cv.split(X, y, groups))) sklearn/externals/joblib/parallel.py:996: in __call__ self.retrieve() sklearn/externals/joblib/parallel.py:899: in retrieve self._output.extend(job.get(timeout=self.timeout)) sklearn/externals/joblib/_parallel_backends.py:517: in wrap_future_result return future.result(timeout=timeout) /usr/local/lib/python3.6/concurrent/futures/_base.py:432: in result return self.__get_result() _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ self = <Future at 0x7f0885a93828 state=finished raised AttributeError> def __get_result(self): if self._exception: > raise self._exception E AttributeError: 'SGDClassifier' object has no attribute '_X_val' self = <Future at 0x7f0885a93828 state=finished raised AttributeError> /usr/local/lib/python3.6/concurrent/futures/_base.py:384: AttributeError
Storing _X_val, _y_val and such on the main estimator is problematic: - the attributes are different for different binary subproblems - it's not thread safe when thread based multiclass parallelism is enabled - the classes_ attribute for the multiclass problem does not match the [-1, 1] classes of the binary subproblems. Instead let's create transient validation_score_cb callback on the fly without storing anything on the main estimator instance.
@ogrisel
Copy link
Member Author

ogrisel commented Sep 21, 2018

@TomDLT I pushed a potential fix for the early stopping bug I reported above.

@ogrisel ogrisel added this to the 0.20 milestone Sep 21, 2018
@ogrisel
Copy link
Member Author

ogrisel commented Sep 21, 2018

I checked the plots of the early stopping example and they haven't changed much.

@ogrisel ogrisel changed the title [WIP] Crash when using SGDClassifier with early stopping in a parallel grid search [MRG] Crash when using SGDClassifier with early stopping in a parallel grid search Sep 21, 2018
@ogrisel
Copy link
Member Author

ogrisel commented Sep 21, 2018

I think we should merge this fix in the 0.20 branch as well.

Copy link
Member

@jnothman jnothman left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Add a what's new to 0.20? Or is this changing a feature that is new in 0.20?

@ogrisel
Copy link
Member Author

ogrisel commented Sep 22, 2018

The early stopping feature was introduced in 0.20 and it's already documented in whats_new.rst.

Copy link
Member

@TomDLT TomDLT left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

Passing a callback instead of the estimator is better indeed, nice !

#TODO: use stratified shufflesplit by default?

No strong feeling about this, but doing it now avoid breaking code in the future.

@ogrisel
Copy link
Member Author

ogrisel commented Sep 24, 2018

No strong feeling about this, but doing it now avoid breaking code in the future.

Let's me do it now then.

@ogrisel
Copy link
Member Author

ogrisel commented Sep 24, 2018

@TomDLT I did the change in my last commit, but that makes test_validation_set_not_used_for_training fail. However, I do not understand why. And actually, I don't understand how this test could pass with the train_test_split + np.sort(train_idx) combo before.

@amueller
Copy link
Member

I think we should put this into 0.20.1 with the ColumnTransformer fixes

@amueller
Copy link
Member

(aka I want to start building the wheels/binaries/etc)

@TomDLT
Copy link
Member

TomDLT commented Sep 25, 2018

In test_validation_set_not_used_for_training, we use shuffle=False, which means samples are seen in order by the SGD algorithm. Therefore, if we give the same training set (with idx_train), in the same order (with np.sort(idx_train)), with the same number of iterations, it should give the same coefficients.

The test was failing as the training set were different in the multiclass case, since StratifiedShuffleSplit was given the binarized y_i instead of y. I fixed it in e0ef556.

@ogrisel
Copy link
Member Author

ogrisel commented Sep 26, 2018

StratifiedShuffleSplit was given the binarized y_i instead of y. I fixed it in e0ef556.

Nice catch, thanks for your help @TomDLT . However if estimator.random_state is a RandomState instance this equivalence will break. I think we should precompute the validation_mask in fit_multiclass and pass it pre-initialized to fit_binary in this case.

I will give it a try.

@ogrisel ogrisel merged commit 7d72720 into scikit-learn:master Sep 26, 2018
@ogrisel ogrisel deleted the parallel-earlstopping-bug branch September 26, 2018 10:04
ogrisel added a commit that referenced this pull request Sep 26, 2018
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

5 participants