Probability Calibration of Classifiers in Scikit Learn

Probability calibration refers to the process of adjusting the predicted probabilities of a classifier such that they better represent the true underlying probabilities. This is especially useful when the predicted probabilities are used for decision-making or when they are used as an input for subsequent models.

Some classifiers, such as logistic regression, naturally output well-calibrated probabilities because they directly optimize a probabilistic loss. Others, such as SVMs or random forests, often do not.
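One way to see this for yourself is with scikit-learn's calibration_curve helper, which bins predicted probabilities and compares them to the observed fraction of positives in each bin. The sketch below uses a synthetic dataset (the dataset and models here are illustrative, not from the example later in this article); for a well-calibrated model, the two returned arrays track each other closely.

```python
# Illustrative sketch: inspect calibration of two classifiers on synthetic data.
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.ensemble import RandomForestClassifier
from sklearn.calibration import calibration_curve

X, y = make_classification(n_samples=2000, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

for model in (LogisticRegression(max_iter=1000),
              RandomForestClassifier(random_state=0)):
    model.fit(X_train, y_train)
    prob = model.predict_proba(X_test)[:, 1]
    # For each probability bin: observed fraction of positives vs. mean
    # predicted probability. Close agreement means good calibration.
    frac_pos, mean_pred = calibration_curve(y_test, prob, n_bins=10)
    print(model.__class__.__name__)
    print("  fraction of positives:", frac_pos.round(2))
    print("  mean predicted prob:  ", mean_pred.round(2))
```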

Scikit-learn provides the CalibratedClassifierCV class, which can be used to calibrate the predicted probabilities. Two methods are available for calibration:

  1. sigmoid: based on Platt scaling, which fits a logistic (sigmoid) function to the classifier's scores. It works well when the miscalibration is roughly sigmoid-shaped and when data is limited.
  2. isotonic: based on isotonic regression, a non-parametric method that fits a monotonically increasing step function. It is often better when the miscalibration is not well modeled by a sigmoid, but it needs more data to avoid overfitting.
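To make the isotonic option concrete, the sketch below uses scikit-learn's IsotonicRegression directly (the variable names and synthetic data are illustrative): given noisy binary outcomes, it fits a non-decreasing step function mapping raw scores to calibrated probabilities.

```python
# Minimal sketch of isotonic regression on synthetic scores.
import numpy as np
from sklearn.isotonic import IsotonicRegression

rng = np.random.RandomState(0)
scores = np.sort(rng.rand(50))  # uncalibrated "probabilities"
# Noisy binary outcomes that tend to be 1 for higher scores
outcomes = (scores + rng.normal(scale=0.2, size=50) > 0.5).astype(float)

iso = IsotonicRegression(out_of_bounds='clip')
calibrated = iso.fit_transform(scores, outcomes)
print(calibrated.round(2))
```

Because the fitted mapping is constrained to be monotonically non-decreasing, higher raw scores can never map to lower calibrated probabilities.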

Basic Usage:

Here's a basic example using a Support Vector Machine, which often outputs probabilities that are not well-calibrated:

from sklearn import datasets
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC
from sklearn.calibration import CalibratedClassifierCV
from sklearn.metrics import log_loss

# Load dataset
data = datasets.load_iris()
X, y = data.data, data.target
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Train an SVM
svm = SVC(probability=True)
svm.fit(X_train, y_train)
prob_svm = svm.predict_proba(X_test)

# Calibrate the fitted SVM using isotonic regression.
# Note: with cv='prefit', the calibration data should ideally be
# disjoint from the data the model was trained on.
clf_isotonic = CalibratedClassifierCV(svm, method='isotonic', cv='prefit')
clf_isotonic.fit(X_train, y_train)
prob_isotonic = clf_isotonic.predict_proba(X_test)

# Calibrate the fitted SVM using the sigmoid (Platt) method
clf_sigmoid = CalibratedClassifierCV(svm, method='sigmoid', cv='prefit')
clf_sigmoid.fit(X_train, y_train)
prob_sigmoid = clf_sigmoid.predict_proba(X_test)

# Evaluate log loss (lower is better)
print("Log-loss of SVM:", log_loss(y_test, prob_svm))
print("Log-loss of SVM + Isotonic calibration:", log_loss(y_test, prob_isotonic))
print("Log-loss of SVM + Sigmoid calibration:", log_loss(y_test, prob_sigmoid))

In the example above:

  • First, an SVM is trained.
  • The SVM probabilities are calibrated using both isotonic regression and the sigmoid method.
  • The log loss (a common metric to evaluate the quality of predicted probabilities) is then computed for the original SVM probabilities, the isotonic-calibrated probabilities, and the sigmoid-calibrated probabilities.

In practice, you'll often find that calibrated probabilities yield better performance (e.g., lower log loss) when evaluated on true outcomes.
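Rather than pre-fitting the model yourself, the more common pattern is to pass an integer cv value so CalibratedClassifierCV handles fitting and calibration internally on cross-validation folds. The sketch below (synthetic data, illustrative names) also shows two related points: calibration works even for estimators like LinearSVC that lack predict_proba, and the Brier score is another standard metric for probability quality.

```python
# Sketch of the cross-validation variant of CalibratedClassifierCV.
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.svm import LinearSVC
from sklearn.calibration import CalibratedClassifierCV
from sklearn.metrics import brier_score_loss

X, y = make_classification(n_samples=1500, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# LinearSVC has no predict_proba; CalibratedClassifierCV supplies one
# by calibrating its decision_function scores on 5 CV folds.
clf = CalibratedClassifierCV(LinearSVC(dual=False), method='sigmoid', cv=5)
clf.fit(X_train, y_train)
prob = clf.predict_proba(X_test)[:, 1]

# Brier score: mean squared error between predicted probabilities
# and binary outcomes (lower is better).
print("Brier score:", brier_score_loss(y_test, prob))
```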

