except for Local Outlier Factor (LOF), as it has no ``predict`` method to be
applied on new data when it is used for outlier detection.
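
The practical consequence shows up in the estimator's API. Below is a minimal
sketch (with made-up data; ``X_train`` and ``X_new`` are placeholder names) of
the two modes: ``fit_predict`` for outlier detection on the training data, and
``novelty=True`` to make ``predict`` available for new data::

    import numpy as np
    from sklearn.neighbors import LocalOutlierFactor

    rng = np.random.RandomState(42)
    X_train = rng.randn(100, 2)  # made-up training data
    X_new = rng.randn(10, 2)     # made-up new observations

    # Outlier detection: labels exist only for the training data.
    lof = LocalOutlierFactor(n_neighbors=20)
    labels = lof.fit_predict(X_train)  # 1 for inliers, -1 for outliers

    # Novelty detection: predict becomes available, but the model should
    # be fit on clean (uncontaminated) data only.
    lof_novelty = LocalOutlierFactor(n_neighbors=20, novelty=True)
    lof_novelty.fit(X_train)
    new_labels = lof_novelty.predict(X_new)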

The :class:`sklearn.svm.OneClassSVM` is known to be sensitive to outliers and
thus does not perform very well for outlier detection. This estimator is best
suited for novelty detection, when the training set is not contaminated by
outliers. That said, outlier detection in high dimensions, or without any
assumptions on the distribution of the inlying data, is very challenging, and
a One-class SVM might give useful results in these situations depending on
the values of its hyperparameters.
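
As a sketch of the intended novelty-detection use (synthetic data; the ``nu``
and ``gamma`` values below are illustrative assumptions, not recommendations)::

    import numpy as np
    from sklearn.svm import OneClassSVM

    rng = np.random.RandomState(0)
    X_train = 0.3 * rng.randn(200, 2)            # uncontaminated training set
    X_new = np.array([[0.1, 0.0], [4.0, 4.0]])   # inlier-like and novel points

    clf = OneClassSVM(nu=0.05, gamma=0.5).fit(X_train)
    print(clf.predict(X_new))            # 1 for inliers, -1 for novelties
    print(clf.decision_function(X_new))  # signed distance to the boundary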

:class:`sklearn.covariance.EllipticEnvelope` assumes the data is Gaussian and
learns an ellipse. It thus degrades when the data is not unimodal. Note,
however, that this estimator is robust to outliers.
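
A minimal sketch on roughly Gaussian data with a few injected outliers (the
``contamination`` value is an assumption about the expected outlier fraction)::

    import numpy as np
    from sklearn.covariance import EllipticEnvelope

    rng = np.random.RandomState(0)
    X_inliers = rng.multivariate_normal([0, 0], [[1, 0.5], [0.5, 1]], size=200)
    X_outliers = rng.uniform(low=-6, high=6, size=(10, 2))
    X = np.vstack([X_inliers, X_outliers])

    ee = EllipticEnvelope(contamination=0.05)
    labels = ee.fit_predict(X)  # -1 flags points outside the fitted ellipse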

:class:`sklearn.ensemble.IsolationForest` and
:class:`sklearn.neighbors.LocalOutlierFactor` seem to perform reasonably well
for multi-modal data sets. The advantage of
:class:`sklearn.neighbors.LocalOutlierFactor` over the other estimators is
shown for the third data set, where the two modes have different densities.
This advantage is explained by the local aspect of LOF, meaning that it only
compares the score of abnormality of one sample with the scores of its
neighbors.
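
A sketch of this comparison on synthetic bimodal data whose two clusters have
different densities (the cluster parameters and ``n_neighbors`` value are
illustrative assumptions)::

    import numpy as np
    from sklearn.ensemble import IsolationForest
    from sklearn.neighbors import LocalOutlierFactor

    rng = np.random.RandomState(0)
    dense = 0.3 * rng.randn(100, 2) + [2, 2]   # tight, high-density cluster
    sparse = 1.5 * rng.randn(100, 2) - [2, 2]  # spread-out, low-density cluster
    X = np.vstack([dense, sparse])

    iso_labels = IsolationForest(random_state=0).fit_predict(X)
    lof_labels = LocalOutlierFactor(n_neighbors=35).fit_predict(X)
    # LOF judges each point against its neighbors, so a point slightly off
    # the dense cluster can be flagged even though it would look normal by
    # the global standards of the sparse cluster.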

Finally, for the last data set, it is hard to say that one sample is more
abnormal than another sample as they are uniformly distributed in a
hypercube. Except for the :class:`sklearn.svm.OneClassSVM`, which overfits a
little, all estimators present decent solutions for this situation. In such a
case, it would be wise to look more closely at the scores of abnormality of
the samples, as a good estimator should assign similar scores to all the
samples.
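
One way to do this, sketched below with an arbitrary choice of estimator, is
to inspect the ``score_samples`` output on the uniform data: a small spread of
scores is what one would hope to see::

    import numpy as np
    from sklearn.ensemble import IsolationForest

    rng = np.random.RandomState(0)
    X = rng.uniform(low=-1, high=1, size=(500, 2))  # samples in a hypercube

    iso = IsolationForest(random_state=0).fit(X)
    scores = iso.score_samples(X)  # higher means more normal
    print(scores.std(), scores.max() - scores.min())  # small spread expected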

While these examples give some intuition about the algorithms, this
intuition might not apply to very high dimensional data.