
Commit c1f5874

NicolasHug authored and amueller committed
[MRG] Add elastic net penalty to LogisticRegression (scikit-learn#11646)
* First draft on elasticnet penaly for LogisticRegression
* Some basic tests
* Doc update
* First draft for LogisticRegressionCV. It seems to be working for binary classification and for multiclass when multi_class='ovr'. I'm having a hard time figuring out the intricacies of multi_class='multinomial'.
* Changed default to None for l1_ratio. added warning message is user sets l1_ratio while penalty is not elastic-net
* Some more doc
* Updated example to plot elastic net sparsity
* Fixed flake8
* Fixed test by not modifying attribute in fit
* Fixed doc issues
* WIP
* Partially fixed logistic_reg_CV for multinomial. Also added some comments that are hopefully clear. Still need to fix refit=False
* Fixed doc issue
* WIP
* Fixed test for refit=False in LogisticRegressionCV
* Fixed Python 2 numpy version issue
* minor doc updates
* Weird doc error...
* Added test to ensure that elastic net is at least as good as L1 or L2 once l1_ratio has been optimized with grid search. Also addressed minor reviews
* Fixed test
* addressed comments
* Added back ignore warning on tests
* Added a functional test
* Scale data in test... Now failing
* elastic-net --> elasticnet
* Updated doc for some attributes and checked their shape in tests
* Added l1_ratio dimension to coefs_paths and scores attr
* improve example + fix test
* FIX incorrect lagged_update in SAGA
* Add non-regression test for SAGA's bug
* FIX flake8 and warning
* Re fixed warning
* Updated some tests
* Addressed comments
* more comments and added dimension to LogisticRegressionCV.n_iter_ attribute
* Updated whatsnew for 0.21
* better doc shape looks
* Fixed whatnew entry after merges
* Added dot
* Addressed comments + standardized optional default param docstrings
* Addessed comments
* use swapaxes instead of unsupported moveaxis (hopefully fixes tests)
1 parent f6f7e3c · commit c1f5874


8 files changed: +617 -171 lines changed


doc/modules/linear_model.rst

Lines changed: 29 additions & 18 deletions
@@ -338,7 +338,7 @@ the algorithm to fit the coefficients.
 
 .. _elastic_net:
 
-Elastic Net
+Elastic-Net
 ===========
 :class:`ElasticNet` is a linear regression model trained with L1 and L2 prior
 as regularizer. This combination allows for learning a sparse model where
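For illustration only (not part of the diff), a minimal sketch of the :class:`ElasticNet` regressor described in the context above; the synthetic data and the alpha / l1_ratio values are arbitrary assumptions:

    import numpy as np
    from sklearn.linear_model import ElasticNet

    rng = np.random.RandomState(0)
    X = rng.randn(50, 10)
    # only the first two features carry signal, so a sparse model is appropriate
    y = X[:, 0] - 2 * X[:, 1] + 0.1 * rng.randn(50)

    # l1_ratio blends the L1 and L2 priors; alpha scales the overall penalty
    reg = ElasticNet(alpha=0.1, l1_ratio=0.7)
    reg.fit(X, y)
    print("non-zero coefficients:", np.sum(reg.coef_ != 0))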
@@ -390,7 +390,7 @@ the duality gap computation used for convergence control.
 
 .. _multi_task_elastic_net:
 
-Multi-task Elastic Net
+Multi-task Elastic-Net
 ======================
 
 The :class:`MultiTaskElasticNet` is an elastic-net model that estimates sparse
@@ -730,7 +730,7 @@ or the log-linear classifier. In this model, the probabilities describing the po
 
 The implementation of logistic regression in scikit-learn can be accessed from
 class :class:`LogisticRegression`. This implementation can fit binary, One-vs-
-Rest, or multinomial logistic regression with optional L2 or L1
+Rest, or multinomial logistic regression with optional L2, L1 or Elastic-Net
 regularization.
 
 As an optimization problem, binary class L2 penalized logistic regression
@@ -739,12 +739,22 @@ minimizes the following cost function:
 .. math:: \min_{w, c} \frac{1}{2}w^T w + C \sum_{i=1}^n \log(\exp(- y_i (X_i^T w + c)) + 1) .
 
 Similarly, L1 regularized logistic regression solves the following
-optimization problem
+optimization problem:
 
 .. math:: \min_{w, c} \|w\|_1 + C \sum_{i=1}^n \log(\exp(- y_i (X_i^T w + c)) + 1).
 
+Elastic-Net regularization is a combination of L1 and L2, and minimizes the
+following cost function:
+
+.. math:: \min_{w, c} \frac{1 - \rho}{2}w^T w + \rho \|w\|_1 + C \sum_{i=1}^n \log(\exp(- y_i (X_i^T w + c)) + 1),
+
+where :math:`\rho` controls the strength of L1 regularization vs L2
+regularization (it corresponds to the `l1_ratio` parameter).
+
 Note that, in this notation, it's assumed that the observation :math:`y_i` takes values in the set
-:math:`{-1, 1}` at trial :math:`i`.
+:math:`{-1, 1}` at trial :math:`i`. We can also see that Elastic-Net is
+equivalent to L1 when :math:`\rho = 1` and equivalent to L2 when
+:math:`\rho=0`.
 
 The solvers implemented in the class :class:`LogisticRegression`
 are "liblinear", "newton-cg", "lbfgs", "sag" and "saga":
@@ -772,10 +782,12 @@ than other solvers for large datasets, when both the number of samples and the
 number of features are large.
 
 The "saga" solver [7]_ is a variant of "sag" that also supports the
-non-smooth `penalty="l1"` option. This is therefore the solver of choice
-for sparse multinomial logistic regression.
+non-smooth `penalty="l1"`. This is therefore the solver of choice for sparse
+multinomial logistic regression. It is also the only solver that supports
+`penalty="elasticnet"`.
 
-In a nutshell, the following table summarizes the penalties supported by each solver:
+In a nutshell, the following table summarizes the penalties supported by
+each solver:
 
 +------------------------------+-----------------+-------------+-----------------+-----------+------------+
 |                              |                       **Solvers**                                        |
@@ -790,6 +802,8 @@ In a nutshell, the following table summarizes the penalties supported by each so
 +------------------------------+-----------------+-------------+-----------------+-----------+------------+
 | OVR + L1 penalty             | yes             | no          | no              | no        | yes        |
 +------------------------------+-----------------+-------------+-----------------+-----------+------------+
+| Elastic-Net                  | no              | no          | no              | no        | yes        |
++------------------------------+-----------------+-------------+-----------------+-----------+------------+
 | **Behaviors**                |                                                                          |
 +------------------------------+-----------------+-------------+-----------------+-----------+------------+
 | Penalize the intercept (bad) | yes             | no          | no              | no        | no         |
@@ -799,8 +813,8 @@ In a nutshell, the following table summarizes the penalties supported by each so
 | Robust to unscaled datasets  | yes             | yes         | yes             | no        | no         |
 +------------------------------+-----------------+-------------+-----------------+-----------+------------+
 
-The "saga" solver is often the best choice but requires scaling. The "liblinear" solver is
-used by default for historical reasons.
+The "saga" solver is often the best choice but requires scaling. The
+"liblinear" solver is used by default for historical reasons.
 
 For large dataset, you may also consider using :class:`SGDClassifier`
 with 'log' loss.
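As a sanity check of the table above (an illustrative sketch, not part of the commit): only the "saga" solver is expected to accept `penalty="elasticnet"`, while the other solvers should reject it at fit time with a ValueError; the tiny dataset below is arbitrary:

    import numpy as np
    from sklearn.linear_model import LogisticRegression

    X = np.array([[0., 0.], [0., 1.], [1., 0.], [1., 1.]])
    y = np.array([0, 0, 1, 1])

    for solver in ('liblinear', 'lbfgs', 'newton-cg', 'sag', 'saga'):
        clf = LogisticRegression(penalty='elasticnet', solver=solver,
                                 l1_ratio=0.5)
        try:
            clf.fit(X, y)
            print(solver, '-> ok')
        except ValueError as exc:
            print(solver, '->', exc)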
@@ -838,14 +852,11 @@ with 'log' loss.
 thus be used to perform feature selection, as detailed in
 :ref:`l1_feature_selection`.
 
-:class:`LogisticRegressionCV` implements Logistic Regression with
-builtin cross-validation to find out the optimal C parameter.
-"newton-cg", "sag", "saga" and "lbfgs" solvers are found to be faster
-for high-dimensional dense data, due to warm-starting. For the
-multiclass case, if `multi_class` option is set to "ovr", an optimal C
-is obtained for each class and if the `multi_class` option is set to
-"multinomial", an optimal C is obtained by minimizing the cross-entropy
-loss.
+:class:`LogisticRegressionCV` implements Logistic Regression with built-in
+cross-validation support, to find the optimal `C` and `l1_ratio` parameters
+according to the ``scoring`` attribute. The "newton-cg", "sag", "saga" and
+"lbfgs" solvers are found to be faster for high-dimensional dense data, due
+to warm-starting (see :term:`Glossary <warm_start>`).
 
 .. topic:: References:
 
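A short sketch of the extended :class:`LogisticRegressionCV` usage described above, cross-validating both `C` and `l1_ratio` with the new `l1_ratios` grid (the grid values, `Cs` and `cv` settings here are illustrative assumptions, not taken from the diff):

    from sklearn.datasets import load_digits
    from sklearn.linear_model import LogisticRegressionCV
    from sklearn.preprocessing import StandardScaler

    X, y = load_digits(return_X_y=True)
    X = StandardScaler().fit_transform(X)  # 'saga' works best on scaled data
    y = (y > 4).astype(int)

    clf = LogisticRegressionCV(Cs=5, penalty='elasticnet', solver='saga',
                               l1_ratios=[0.1, 0.5, 0.9], cv=3, max_iter=1000)
    clf.fit(X, y)
    print("best C per class:", clf.C_)
    print("best l1_ratio per class:", clf.l1_ratio_)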

doc/tutorial/statistical_inference/supervised_learning.rst

Lines changed: 2 additions & 1 deletion
@@ -183,6 +183,7 @@ Linear models: :math:`y = X\beta + \epsilon`
 [ 0.30349955 -237.63931533  510.53060544  327.73698041 -814.13170937
   492.81458798  102.84845219  184.60648906  743.51961675   76.09517222]
 
+
 >>> # The mean square error
 >>> np.mean((regr.predict(diabetes_X_test) - diabetes_y_test)**2)
 ... # doctest: +ELLIPSIS
@@ -378,7 +379,7 @@ function or **logistic** function:
 ...     multi_class='multinomial')
 >>> log.fit(iris_X_train, iris_y_train)  # doctest: +NORMALIZE_WHITESPACE
 LogisticRegression(C=100000.0, class_weight=None, dual=False,
-          fit_intercept=True, intercept_scaling=1, max_iter=100,
+          fit_intercept=True, intercept_scaling=1, l1_ratio=None, max_iter=100,
           multi_class='multinomial', n_jobs=None, penalty='l2', random_state=None,
           solver='lbfgs', tol=0.0001, verbose=0, warm_start=False)
 

doc/whats_new/v0.21.rst

Lines changed: 12 additions & 0 deletions
@@ -22,6 +22,9 @@ random sampling procedures.
 
 - Decision trees and derived ensembles when both `max_depth` and
   `max_leaf_nodes` are set. |Fix|
+- :class:`linear_model.LogisticRegression` and
+  :class:`linear_model.LogisticRegressionCV` with 'saga' solver. |Fix|
+
 
 Details are listed in the changelog below.
 
@@ -146,6 +149,15 @@ Support for Python 3.4 and below has been officially dropped.
   affects all ensemble methods using decision trees.
   :pr:`12344` by :user:`Adrin Jalali <adrinjalali>`.
 
+:mod:`sklearn.linear_model`
+...........................
+
+- |Feature| :class:`linear_model.LogisticRegression` and
+  :class:`linear_model.LogisticRegressionCV` now support Elastic-Net penalty,
+  with the 'saga' solver. :issue:`11646` by :user:`Nicolas Hug <NicolasHug>`.
+
+- |Fix| Fixed a bug in the 'saga' solver where the weights would not be
+  correctly updated in some cases. :issue:`11646` by `Tom Dupre la Tour`_.
 
 Multiple modules
 ................

examples/linear_model/plot_logistic_l1_l2_sparsity.py

Lines changed: 35 additions & 24 deletions
@@ -4,10 +4,11 @@
 ==============================================
 
 Comparison of the sparsity (percentage of zero coefficients) of solutions when
-L1 and L2 penalty are used for different values of C. We can see that large
-values of C give more freedom to the model. Conversely, smaller values of C
-constrain the model more. In the L1 penalty case, this leads to sparser
-solutions.
+L1, L2 and Elastic-Net penalty are used for different values of C. We can see
+that large values of C give more freedom to the model. Conversely, smaller
+values of C constrain the model more. In the L1 penalty case, this leads to
+sparser solutions. As expected, the Elastic-Net penalty sparsity is between
+that of L1 and L2.
 
 We classify 8x8 images of digits into two classes: 0-4 against 5-9.
 The visualization shows coefficients of the models for varying C.
@@ -35,45 +36,55 @@
 # classify small against large digits
 y = (y > 4).astype(np.int)
 
+l1_ratio = 0.5  # L1 weight in the Elastic-Net regularization
+
+fig, axes = plt.subplots(3, 3)
 
 # Set regularization parameter
-for i, C in enumerate((1, 0.1, 0.01)):
+for i, (C, axes_row) in enumerate(zip((1, 0.1, 0.01), axes)):
     # turn down tolerance for short training time
     clf_l1_LR = LogisticRegression(C=C, penalty='l1', tol=0.01, solver='saga')
     clf_l2_LR = LogisticRegression(C=C, penalty='l2', tol=0.01, solver='saga')
+    clf_en_LR = LogisticRegression(C=C, penalty='elasticnet', solver='saga',
+                                   l1_ratio=l1_ratio, tol=0.01)
     clf_l1_LR.fit(X, y)
     clf_l2_LR.fit(X, y)
+    clf_en_LR.fit(X, y)
 
     coef_l1_LR = clf_l1_LR.coef_.ravel()
     coef_l2_LR = clf_l2_LR.coef_.ravel()
+    coef_en_LR = clf_en_LR.coef_.ravel()
 
     # coef_l1_LR contains zeros due to the
     # L1 sparsity inducing norm
 
     sparsity_l1_LR = np.mean(coef_l1_LR == 0) * 100
     sparsity_l2_LR = np.mean(coef_l2_LR == 0) * 100
+    sparsity_en_LR = np.mean(coef_en_LR == 0) * 100
 
     print("C=%.2f" % C)
-    print("Sparsity with L1 penalty: %.2f%%" % sparsity_l1_LR)
-    print("score with L1 penalty: %.4f" % clf_l1_LR.score(X, y))
-    print("Sparsity with L2 penalty: %.2f%%" % sparsity_l2_LR)
-    print("score with L2 penalty: %.4f" % clf_l2_LR.score(X, y))
+    print("{:<40} {:.2f}%".format("Sparsity with L1 penalty:", sparsity_l1_LR))
+    print("{:<40} {:.2f}%".format("Sparsity with Elastic-Net penalty:",
+                                  sparsity_en_LR))
+    print("{:<40} {:.2f}%".format("Sparsity with L2 penalty:", sparsity_l2_LR))
+    print("{:<40} {:.2f}".format("Score with L1 penalty:",
+                                 clf_l1_LR.score(X, y)))
+    print("{:<40} {:.2f}".format("Score with Elastic-Net penalty:",
+                                 clf_en_LR.score(X, y)))
+    print("{:<40} {:.2f}".format("Score with L2 penalty:",
+                                 clf_l2_LR.score(X, y)))
 
-    l1_plot = plt.subplot(3, 2, 2 * i + 1)
-    l2_plot = plt.subplot(3, 2, 2 * (i + 1))
     if i == 0:
-        l1_plot.set_title("L1 penalty")
-        l2_plot.set_title("L2 penalty")
-
-    l1_plot.imshow(np.abs(coef_l1_LR.reshape(8, 8)), interpolation='nearest',
-                   cmap='binary', vmax=1, vmin=0)
-    l2_plot.imshow(np.abs(coef_l2_LR.reshape(8, 8)), interpolation='nearest',
-                   cmap='binary', vmax=1, vmin=0)
-    plt.text(-8, 3, "C = %.2f" % C)
-
-    l1_plot.set_xticks(())
-    l1_plot.set_yticks(())
-    l2_plot.set_xticks(())
-    l2_plot.set_yticks(())
+        axes_row[0].set_title("L1 penalty")
+        axes_row[1].set_title("Elastic-Net\nl1_ratio = %s" % l1_ratio)
+        axes_row[2].set_title("L2 penalty")
+
+    for ax, coefs in zip(axes_row, [coef_l1_LR, coef_en_LR, coef_l2_LR]):
+        ax.imshow(np.abs(coefs.reshape(8, 8)), interpolation='nearest',
+                  cmap='binary', vmax=1, vmin=0)
+        ax.set_xticks(())
+        ax.set_yticks(())
+
+    axes_row[0].set_ylabel('C = %s' % C)
 
 plt.show()
