
Commit 652b950

Merge pull request scikit-learn#4242 from MechCoder/select_from_model
[MRG+1] Implemented SelectFromModel meta-transformer
2 parents 84e9e10 + c805fbc commit 652b950

File tree

16 files changed: +516 -102 lines changed

doc/modules/classes.rst

Lines changed: 1 addition & 0 deletions
@@ -463,6 +463,7 @@ From text
    feature_selection.SelectKBest
    feature_selection.SelectFpr
    feature_selection.SelectFdr
+   feature_selection.SelectFromModel
    feature_selection.SelectFwe
    feature_selection.RFE
    feature_selection.RFECV

doc/modules/feature_selection.rst

Lines changed: 37 additions & 13 deletions
@@ -131,33 +131,52 @@ number of features.
   elimination example with automatic tuning of the number of features
   selected with cross-validation.
 
+.. _select_from_model:
 
-.. _l1_feature_selection:
+Feature selection using SelectFromModel
+=======================================
+
+:class:`SelectFromModel` is a meta-transformer that can be used along with any
+estimator that has a ``coef_`` or ``feature_importances_`` attribute after fitting.
+Features are considered unimportant and removed if the corresponding
+``coef_`` or ``feature_importances_`` values are below the provided
+``threshold`` parameter. Apart from specifying the threshold numerically,
+there are built-in heuristics for finding a threshold using a string argument.
+Available heuristics are "mean", "median" and float multiples of these like
+"0.1*mean".
+
+For examples of how it is used, refer to the sections below.
+
+.. topic:: Examples
+
+   * :ref:`example_feature_selection_plot_select_from_model_boston.py`: Selecting the two
+     most important features from the Boston dataset without knowing the
+     threshold beforehand.
 
 L1-based feature selection
-==========================
+--------------------------
 
 .. currentmodule:: sklearn
 
-Selecting non-zero coefficients
----------------------------------
-
 :ref:`Linear models <linear_model>` penalized with the L1 norm have
 sparse solutions: many of their estimated coefficients are zero. When the goal
 is to reduce the dimensionality of the data to use with another classifier,
-they expose a ``transform`` method to select the non-zero coefficient. In
-particular, sparse estimators useful for this purpose are the
-:class:`linear_model.Lasso` for regression, and
+they can be used along with :class:`feature_selection.SelectFromModel`
+to select the non-zero coefficients. In particular, sparse estimators useful for
+this purpose are the :class:`linear_model.Lasso` for regression, and
 of :class:`linear_model.LogisticRegression` and :class:`svm.LinearSVC`
 for classification::
 
     >>> from sklearn.svm import LinearSVC
    >>> from sklearn.datasets import load_iris
+    >>> from sklearn.feature_selection import SelectFromModel
    >>> iris = load_iris()
    >>> X, y = iris.data, iris.target
    >>> X.shape
    (150, 4)
-    >>> X_new = LinearSVC(C=0.01, penalty="l1", dual=False).fit_transform(X, y)
+    >>> lsvc = LinearSVC(C=0.01, penalty="l1", dual=False).fit(X, y)
+    >>> model = SelectFromModel(lsvc, prefit=True)
+    >>> X_new = model.transform(X)
    >>> X_new.shape
    (150, 3)
@@ -241,23 +260,27 @@ of features non zero.
    http://hal.inria.fr/hal-00354771/
 
 Tree-based feature selection
-============================
+----------------------------
 
 Tree-based estimators (see the :mod:`sklearn.tree` module and forest
 of trees in the :mod:`sklearn.ensemble` module) can be used to compute
 feature importances, which in turn can be used to discard irrelevant
-features::
+features (when coupled with the :class:`sklearn.feature_selection.SelectFromModel`
+meta-transformer)::
 
    >>> from sklearn.ensemble import ExtraTreesClassifier
    >>> from sklearn.datasets import load_iris
+    >>> from sklearn.feature_selection import SelectFromModel
    >>> iris = load_iris()
    >>> X, y = iris.data, iris.target
    >>> X.shape
    (150, 4)
    >>> clf = ExtraTreesClassifier()
-    >>> X_new = clf.fit(X, y).transform(X)
+    >>> clf = clf.fit(X, y)
    >>> clf.feature_importances_  # doctest: +SKIP
    array([ 0.04...,  0.05...,  0.4...,  0.4...])
+    >>> model = SelectFromModel(clf, prefit=True)
+    >>> X_new = model.transform(X)
    >>> X_new.shape               # doctest: +SKIP
    (150, 2)
@@ -278,12 +301,13 @@ the actual learning. The recommended way to do this in scikit-learn is
 to use a :class:`sklearn.pipeline.Pipeline`::
 
   clf = Pipeline([
-    ('feature_selection', LinearSVC(penalty="l1")),
+    ('feature_selection', SelectFromModel(LinearSVC(penalty="l1"))),
    ('classification', RandomForestClassifier())
  ])
  clf.fit(X, y)
 
 In this snippet we make use of a :class:`sklearn.svm.LinearSVC`
+coupled with :class:`sklearn.feature_selection.SelectFromModel`
 to evaluate feature importances and select the most relevant features.
 Then, a :class:`sklearn.ensemble.RandomForestClassifier` is trained on the
 transformed output, i.e. using only relevant features. You can perform
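
A minimal sketch of the string threshold heuristics ("mean", "median", scaled
variants such as "0.5*mean") described in the documentation added above, written
against the SelectFromModel API introduced in this PR; the exact number of
selected features depends on the fitted importances, so the printed shapes are
illustrative only:

    # Keep features whose importance exceeds half the mean importance.
    from sklearn.datasets import load_iris
    from sklearn.ensemble import ExtraTreesClassifier
    from sklearn.feature_selection import SelectFromModel

    iris = load_iris()
    X, y = iris.data, iris.target

    clf = ExtraTreesClassifier(n_estimators=50, random_state=0)
    model = SelectFromModel(clf, threshold="0.5*mean")
    X_reduced = model.fit_transform(X, y)

    print(X_reduced.shape)      # fewer columns than X.shape
    print(model.get_support())  # boolean mask of the kept features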

doc/whats_new.rst

Lines changed: 12 additions & 0 deletions
@@ -210,6 +210,11 @@ Enhancements
    - Altered :func:`metrics.roc_curve` to drop unnecessary thresholds by
      default. By `Graham Clenaghan`_.
 
+   - Added :class:`feature_selection.SelectFromModel` meta-transformer which can
+     be used along with estimators that have a `coef_` or `feature_importances_`
+     attribute to select important features of the input data. By
+     `Maheshakya Wijewardena`_, `Joel Nothman`_ and `Manoj Kumar`_.
+
 Bug fixes
 .........
 

@@ -283,6 +288,13 @@ API changes summary
      fit method to the constructor in
      :class:`discriminant_analysis.QuadraticDiscriminantAnalysis`.
 
+   - Models inheriting from ``_LearntSelectorMixin`` will no longer support the
+     transform methods (i.e., RandomForests, GradientBoosting, LogisticRegression,
+     DecisionTrees, SVMs and SGD-related models). Wrap these models with the
+     :class:`feature_selection.SelectFromModel` meta-transformer instead to remove
+     features (according to `coef_` or `feature_importances_`)
+     that are below a certain threshold value.
+
 .. _changes_0_1_16:
 
 Version 0.16.1
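
A rough before/after sketch of the migration implied by this API change
(a hypothetical snippet, not part of the diff; the deprecated call keeps
working with a DeprecationWarning until 0.19):

    from sklearn.datasets import load_iris
    from sklearn.ensemble import RandomForestClassifier
    from sklearn.feature_selection import SelectFromModel

    iris = load_iris()
    X, y = iris.data, iris.target
    clf = RandomForestClassifier(n_estimators=50, random_state=0).fit(X, y)

    # Before (deprecated, removed in 0.19):
    #   X_new = clf.transform(X, threshold="mean")

    # After: wrap the already-fitted estimator with prefit=True.
    X_new = SelectFromModel(clf, prefit=True, threshold="mean").transform(X)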

examples/ensemble/plot_feature_transformation.py

Lines changed: 4 additions & 3 deletions
@@ -34,6 +34,7 @@
 from sklearn.linear_model import LogisticRegression
 from sklearn.ensemble import (RandomTreesEmbedding, RandomForestClassifier,
                               GradientBoostingClassifier)
+from sklearn.feature_selection import SelectFromModel
 from sklearn.preprocessing import OneHotEncoder
 from sklearn.cross_validation import train_test_split
 from sklearn.metrics import roc_curve

@@ -53,12 +54,12 @@
 rt = RandomTreesEmbedding(max_depth=3, n_estimators=n_estimator)
 rt_lm = LogisticRegression()
 rt.fit(X_train, y_train)
-rt_lm.fit(rt.transform(X_train_lr), y_train_lr)
+rt_lm.fit(SelectFromModel(rt, prefit=True).transform(X_train_lr), y_train_lr)
 
-y_pred_rt = rt_lm.predict_proba(rt.transform(X_test))[:, 1]
+y_pred_rt = rt_lm.predict_proba(
+    SelectFromModel(rt, prefit=True).transform(X_test))[:, 1]
 fpr_rt_lm, tpr_rt_lm, _ = roc_curve(y_test, y_pred_rt)
 
-
 # Supervised transformation based on random forests
 rf = RandomForestClassifier(max_depth=3, n_estimators=n_estimator)
 rf_enc = OneHotEncoder()

examples/ensemble/plot_random_forest_embedding.py

Lines changed: 4 additions & 1 deletion
@@ -30,14 +30,17 @@
 from sklearn.datasets import make_circles
 from sklearn.ensemble import RandomTreesEmbedding, ExtraTreesClassifier
 from sklearn.decomposition import TruncatedSVD
+from sklearn.feature_selection import SelectFromModel
 from sklearn.naive_bayes import BernoulliNB
 
 # make a synthetic dataset
 X, y = make_circles(factor=0.5, random_state=0, noise=0.05)
 
 # use RandomTreesEmbedding to transform data
 hasher = RandomTreesEmbedding(n_estimators=10, random_state=0, max_depth=3)
-X_transformed = hasher.fit_transform(X)
+hasher.fit(X)
+model = SelectFromModel(hasher, prefit=True)
+X_transformed = model.transform(X)
 
 # Visualize result using PCA
 pca = TruncatedSVD(n_components=2)

examples/feature_selection/plot_select_from_model_boston.py

Lines changed: 51 additions & 0 deletions
@@ -0,0 +1,51 @@
+"""
+===================================================
+Feature selection using SelectFromModel and LassoCV
+===================================================
+
+Use the SelectFromModel meta-transformer along with Lasso to select the best
+couple of features from the Boston dataset.
+"""
+# Author: Manoj Kumar <mks542@nyu.edu>
+# License: BSD 3 clause
+
+print(__doc__)
+
+import matplotlib.pyplot as plt
+import numpy as np
+
+from sklearn.datasets import load_boston
+from sklearn.feature_selection import SelectFromModel
+from sklearn.linear_model import LassoCV
+
+# Load the boston dataset.
+boston = load_boston()
+X, y = boston['data'], boston['target']
+
+# We use the base estimator LassoCV since the L1 norm promotes sparsity of features.
+clf = LassoCV()
+
+# Set a minimum threshold of 0.25
+sfm = SelectFromModel(clf, threshold=0.25)
+sfm.fit(X, y)
+n_features = sfm.transform(X).shape[1]
+
+# Reset the threshold till the number of features equals two.
+# Note that the attribute can be set directly instead of repeatedly
+# fitting the meta-transformer.
+while n_features > 2:
+    sfm.threshold += 0.1
+    X_transform = sfm.transform(X)
+    n_features = X_transform.shape[1]
+
+# Plot the selected two features from X.
+plt.title(
+    "Features selected from Boston using SelectFromModel with "
+    "threshold %0.3f." % sfm.threshold)
+feature1 = X_transform[:, 0]
+feature2 = X_transform[:, 1]
+plt.plot(feature1, feature2, 'r.')
+plt.xlabel("Feature number 1")
+plt.ylabel("Feature number 2")
+plt.ylim([np.min(feature2), np.max(feature2)])
+plt.show()

sklearn/ensemble/tests/test_forest.py

Lines changed: 8 additions & 3 deletions
@@ -19,6 +19,7 @@
 from scipy.sparse import csc_matrix
 from scipy.sparse import coo_matrix
 
+from sklearn.utils import warnings
 from sklearn.utils.testing import assert_almost_equal
 from sklearn.utils.testing import assert_array_almost_equal
 from sklearn.utils.testing import assert_array_equal

@@ -194,15 +195,19 @@ def test_probability():
 def check_importances(X, y, name, criterion):
     ForestEstimator = FOREST_ESTIMATORS[name]
 
-    est = ForestEstimator(n_estimators=20, criterion=criterion,random_state=0)
+    est = ForestEstimator(n_estimators=20, criterion=criterion,
+                          random_state=0)
     est.fit(X, y)
     importances = est.feature_importances_
     n_important = np.sum(importances > 0.1)
     assert_equal(importances.shape[0], 10)
     assert_equal(n_important, 3)
 
-    X_new = est.transform(X, threshold="mean")
-    assert_less(X_new.shape[1], X.shape[1])
+    # XXX: Remove this test in 0.19 after transform support to estimators
+    # is removed.
+    X_new = assert_warns(
+        DeprecationWarning, est.transform, X, threshold="mean")
+    assert_less(0 < X_new.shape[1], X.shape[1])
 
     # Check with parallel
     importances = est.feature_importances_

sklearn/ensemble/tests/test_gradient_boosting.py

Lines changed: 7 additions & 3 deletions
@@ -26,6 +26,7 @@
 from sklearn.utils.testing import assert_raises
 from sklearn.utils.testing import assert_true
 from sklearn.utils.testing import assert_warns
+from sklearn.utils.testing import ignore_warnings
 from sklearn.utils.validation import DataConversionWarning
 from sklearn.utils.validation import NotFittedError
 

@@ -296,10 +297,13 @@ def test_feature_importances():
     clf.fit(X, y)
     assert_true(hasattr(clf, 'feature_importances_'))
 
-    X_new = clf.transform(X, threshold="mean")
+    # XXX: Remove this test in 0.19 after transform support to estimators
+    # is removed.
+    X_new = assert_warns(
+        DeprecationWarning, clf.transform, X, threshold="mean")
     assert_less(X_new.shape[1], X.shape[1])
-
-    feature_mask = clf.feature_importances_ > clf.feature_importances_.mean()
+    feature_mask = (
+        clf.feature_importances_ > clf.feature_importances_.mean())
     assert_array_almost_equal(X_new, X[:, feature_mask])
 

sklearn/feature_selection/__init__.py

Lines changed: 4 additions & 1 deletion
@@ -20,6 +20,8 @@
 from .rfe import RFE
 from .rfe import RFECV
 
+from .from_model import SelectFromModel
+
 __all__ = ['GenericUnivariateSelect',
            'RFE',
            'RFECV',

@@ -32,4 +34,5 @@
            'chi2',
            'f_classif',
            'f_oneway',
-           'f_regression']
+           'f_regression',
+           'SelectFromModel']

sklearn/feature_selection/base.py

Lines changed: 1 addition & 1 deletion
@@ -81,7 +81,7 @@ def transform(self, X):
             return np.empty(0).reshape((X.shape[0], 0))
         if len(mask) != X.shape[1]:
             raise ValueError("X has a different shape than during fitting.")
-        return check_array(X, accept_sparse='csr')[:, safe_mask(X, mask)]
+        return X[:, safe_mask(X, mask)]
 
     def inverse_transform(self, X):
         """
