Binary classification with a single feature

Classification using "raw" Python or libraries (scikit-learn, TensorFlow).

The classification is first performed on a single boundary defined by a continuous univariate function, with added white noise.

In [1]:
import math
import numpy as np
import matplotlib.pyplot as plt
import matplotlib.colors as pltcolors
from sklearn import metrics as metrics
from sklearn.linear_model import LogisticRegression as SkLinReg
import scipy as sy
import seaborn as sns
import pandas
import tensorflow as tf

Model

We want to measure or predict whether a value y is above a threshold; e.g. y is a temperature.

We know a feature x; y is related to x through a polynomial function of x that we do not know a priori, plus some unknown perturbation.

This unknown perturbation is modeled as Gaussian noise.
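
In equation form, the data-generating process implemented in the cell below is (constants taken from that code):
\begin{align}
y &= 2\left(x^4 + (x - 0.3)^3 + 0.2\right) + \epsilon, \quad \epsilon \sim \mathcal{N}(0, 0.2^2) \\
\text{label} &= \mathbb{1}\left\{ y > 1 \right\}
\end{align}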

In [2]:
# Single feature, Gaussian noise
nFeatures = 1

def generateBatch(N):
    #
    xMin = 0
    xMax = 1
    b = 0.2
    std = 0.2
    # Threshold from 0 to 1
    threshold = 1
    #
    x = np.random.uniform(xMin, xMax, N)
    # 4th degree relation between y and x
    yClean = 2*(x**4 + (x-0.3)**3 + b)
    labels = yClean + np.random.normal(0, std, N) > threshold
    return (x, yClean, labels)

The values of x are uniformly distributed and independent.

In [3]:
N = 2000
# x and y have 1 dim in R, label has 1 dim in B
xTrain, yCleanTrain, labelTrain = generateBatch(N)

colors = ['blue', 'red']
fig = plt.figure(figsize=(8,4))
plt.subplot(1,2,1)
plt.scatter(xTrain, yCleanTrain, c=labelTrain, cmap=pltcolors.ListedColormap(colors), marker=',', alpha=0.01)
plt.xlabel('x')
plt.ylabel('y')
plt.grid()
plt.subplot(1,2,2)
plt.scatter(xTrain, labelTrain, marker=',', alpha=0.01)
plt.xlabel('x')
plt.ylabel('label')
plt.grid()
In [4]:
count, bins, ignored = plt.hist(labelTrain*1.0, 10, density=True, alpha=0.5)
p = np.mean(labelTrain)
print('Bernoulli parameter of the distribution:', p)
Bernoulli parameter of the distribution: 0.3005

Note: the two values are not a priori equiprobable. In theory, resampling of the training values would be required to balance the a priori distribution, for example as sketched below.
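
A minimal sketch of such a balancing step, assuming simple undersampling of the majority class (illustrative only, not applied in the rest of this notebook):

# Undersample the majority class so both labels appear with equal frequency (illustrative only)
idxPos = np.where(labelTrain)[0]
idxNeg = np.where(~labelTrain)[0]
nKeep = min(len(idxPos), len(idxNeg))
idxBalanced = np.concatenate([np.random.choice(idxPos, nKeep, replace=False),
                              np.random.choice(idxNeg, nKeep, replace=False)])
xBalanced, labelBalanced = xTrain[idxBalanced], labelTrain[idxBalanced]
print('Balanced Bernoulli parameter =', np.mean(labelBalanced))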

In [5]:
xTest, yTest, labelTest = generateBatch(N) 

Helpers

In [6]:
def plotHeatMap(X, classes, title=None, fmt='.2g', ax=None, xlabel=None, ylabel=None):
    """ Fix heatmap plot from Seaborn with matplotlib 3.1.0, 3.1.1
    https://stackoverflow.com/questions/56942670/matplotlib-seaborn-first-and-last-row-cut-in-half-of-heatmap-plot
    """
    ax = sns.heatmap(X, xticklabels=classes, yticklabels=classes, annot=True, fmt=fmt, cmap=plt.cm.Blues, ax=ax)  # notation: "annot", not "annote"
    bottom, top = ax.get_ylim()
    ax.set_ylim(bottom + 0.5, top - 0.5)
    if title:
        ax.set_title(title)
    if xlabel:
        ax.set_xlabel(xlabel)
    if ylabel:
        ax.set_ylabel(ylabel)

def plotConfusionMatrix(yTrue, yEst, classes, title=None, fmt='.2g', ax=None):
    plotHeatMap(metrics.confusion_matrix(yTrue, yEst), classes, title, fmt, ax,
                xlabel='Estimations', ylabel='True values')

Logistic and log of Logistic functions

In [7]:
def logistic(X):
    return (1 + np.exp(-X))**-1
In [8]:
xx = np.linspace(-10, 10)
xlogistic = logistic(xx)
plt.figure(figsize=(10,5))
plt.subplot(1, 2, 1)
plt.plot(xx, xlogistic)
plt.grid()
plt.subplot(1, 2, 2)
plt.plot(xx, np.log(xlogistic))
plt.grid()

Logistic regression

\begin{align}
y & \in \left\{ 0, 1 \right\} \\
p(Y=1 \mid x) & = \frac{1}{1+e^{-f_\theta(x)}} \\
f_\theta(x) & = b + w x \\
\theta & = \{b, w\}
\end{align}

We are looking for the values of $\theta = \{b, w\}$ that maximize the likelihood:
\begin{align}
\hat{\theta} & = \arg\max_{\theta}{\prod_{i=0}^N{p(y_i \mid x_i, \theta)}} \\
& = \arg\max_{\theta}{\sum_{i=0}^N{\log \left(p(y_i \mid x_i, \theta)\right)}} \\
& = \arg\max_{\theta}{\sum_{i=0}^N{\log \left(\left(\frac{1}{1+e^{-f_\theta(x_i)}}\right)^{y_i}\left(1-\frac{1}{1+e^{-f_\theta(x_i)}}\right)^{1-y_i}\right)}} \\
& = \arg\max_{\theta}{\sum_{i=0}^N{\log \left(y_i \left(\frac{1}{1+e^{-f_\theta(x_i)}}\right) + \left(1-y_i\right) \left(1-\frac{1}{1+e^{-f_\theta(x_i)}}\right) \right)}} \\
\end{align}

The last step uses the fact that $y_i$ is either 0 or 1: moving $y_i$ and $1-y_i$ from exponents to factors leaves the value unchanged, and this formulation avoids taking the logarithm of zero since one of the two terms within the sum is null.

Since the number of classes is 2, the negative of this log-likelihood is also known as the binary cross-entropy, as the small check below illustrates.
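
A small cross-check, not part of the original notebook: for an arbitrary, purely illustrative choice of parameters (bSome, wSome below are hypothetical), the average negative log-likelihood computed by hand matches scikit-learn's binary cross-entropy (log_loss):

# Hypothetical parameter values, for illustration only
bSome, wSome = -1.0, 2.0
pHat = logistic(bSome + wSome * xTrain)
nll = -np.mean(labelTrain * np.log(pHat) + (1 - labelTrain) * np.log(1 - pHat))
print('Negative log-likelihood (mean) =', nll)
print('sklearn log_loss               =', metrics.log_loss(labelTrain, pHat))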


Fitting of $b$ and then $w$

Suboptimal fitting:

  • Making an assumption on $w$ in order to fit $b$ as $\hat{b}$
  • Then fitting $w$ using the $\hat{b}$ estimate
In [9]:
b = np.linspace(-5, 5)
w = 1
px = np.zeros(len(b))
for i in range(len(b)):
    fx = logistic(b[i] + w*xTrain)
    px[i] = 1/N * np.sum(np.log(labelTrain*fx + (1-labelTrain)*(1-fx)))
plt.plot(b, px)
plt.xlabel('$b$')
plt.ylabel('l(b, X)')
plt.grid()
bHat = b[np.argmax(px)]
print('Estimate b =', bHat)
Estimate b = -1.3265306122448979 
In [10]:
w = np.linspace(-20, 20)
px = np.zeros(len(w))
for i in range(len(w)):
    fx = logistic(bHat + w[i]*xTrain)
    px[i] = 1/N * np.sum(np.log(labelTrain*fx + (1-labelTrain)*(1-fx)))
plt.plot(w, px)
plt.xlabel('w')
plt.ylabel('l(w, X)')
plt.grid()
wHat = w[np.argmax(px)]
print('Estimate w =', wHat)
Estimate w = 2.0408163265306136 
In [11]:
pXTest0 = logistic(bHat + wHat * xTest)
labelEst0 = pXTest0 > 0.5
In [12]:
plt.scatter(xTest, pXTest0, c=labelEst0, cmap=pltcolors.ListedColormap(colors), marker=',', alpha=0.01)
plt.scatter(xTest, yTest/np.max(yTest), c=labelTest, cmap=pltcolors.ListedColormap(colors), marker='x', alpha=0.01)
plt.xlabel('x')
plt.legend(('Estimated probability', 'Normalized model'))
In [13]:
plt.hist(labelEst0*1.0, 10, density=True)
print('Bernoulli parameter =', np.mean(labelEst0))
Bernoulli parameter = 0.368
In [14]:
accuracy0 = np.sum(labelTest == labelEst0)/N
print('Accuracy =', accuracy0)
Accuracy = 0.9315 

Precision

$p(y = 1 \mid \hat{y} = 1)$

In [15]:
print('Precision =', np.sum(labelTest[labelEst0 == 1])/np.sum(labelEst0)) 
Precision = 0.8383152173913043 

Recall

$p(\hat{y} = 1 \mid y = 1)$

In [16]:
print('Recall =', np.sum(labelTest[labelEst0 == 1])/np.sum(labelTest)) 
Recall = 0.9716535433070866 
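
As a cross-check (not part of the original notebook), the same precision and recall can be obtained directly from scikit-learn's metrics module:

# Equivalent to the manual precision and recall computations above
print('Precision =', metrics.precision_score(labelTest, labelEst0))
print('Recall    =', metrics.recall_score(labelTest, labelEst0))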

Confusion matrix

In [17]:
plotConfusionMatrix(labelTest, labelEst0, np.array(['Blue', 'Red'])); 
In [18]:
print(metrics.classification_report(labelTest, labelEst0)) 
              precision    recall  f1-score   support

       False       0.99      0.91      0.95      1365
        True       0.84      0.97      0.90       635

    accuracy                           0.93      2000
   macro avg       0.91      0.94      0.92      2000
weighted avg       0.94      0.93      0.93      2000

Scikit-learn


In [19]:
model1 = SkLinReg(solver='lbfgs')
model1.fit(xTrain.reshape(-1,1), labelTrain)
model1.coef_
Out[19]:
array([[13.08470501]])
In [20]:
labelEst1 = model1.predict(xTest.reshape(-1,1))
print('Accuracy =', model1.score(xTest.reshape(-1,1), labelTest))
Accuracy = 0.948 
In [21]:
plt.hist(labelEst1*1.0, 10, density=True)
print('Bernoulli parameter =', np.mean(labelEst1))
Bernoulli parameter = 0.3225

Confusion matrix (plot)

In [22]:
plotConfusionMatrix(labelTest, labelEst1, np.array(['Blue', 'Red'])) 

Classification report

In [23]:
print(metrics.classification_report(labelTest, labelEst1)) 
              precision    recall  f1-score   support

       False       0.97      0.96      0.96      1365
        True       0.91      0.93      0.92       635

    accuracy                           0.95      2000
   macro avg       0.94      0.94      0.94      2000
weighted avg       0.95      0.95      0.95      2000

ROC curve

In [24]:
logit_roc_auc = metrics.roc_auc_score(labelTest, labelEst1)
fpr, tpr, thresholds = metrics.roc_curve(labelTest, model1.predict_proba(xTest.reshape(-1,1))[:,1])
plt.plot(fpr, tpr, label='Logistic Regression (area = %0.2f)' % logit_roc_auc)
plt.plot([0, 1], [0, 1], 'r--')
plt.xlim([0.0, 1.0])
plt.ylim([0.0, 1.05])
plt.xlabel('False Positive Rate')
plt.ylabel('True Positive Rate')
plt.title('Receiver operating characteristic')
plt.legend(loc="lower right")

Using TensorFlow 2.0

TensorFlow 2.0 offers many ways to design a sequential model, ranging from the high-level Keras API down to functional code close to the TensorFlow 1.0 syntax.

The following design shows how to implement a custom layer within a Keras Sequential pipeline, and how to implement a custom metric. This is the favoured way to implement custom code in TensorFlow 2.0. For comparison, a purely high-level Keras variant is sketched just below.
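
A minimal sketch of the same logistic regression using only the high-level Keras API, assuming the built-in binary cross-entropy loss (the name modelHL is illustrative; this variant is not used in the rest of the notebook):

# High-level Keras equivalent (illustrative only, not used below)
modelHL = tf.keras.Sequential([
    tf.keras.layers.Dense(1, activation='sigmoid', input_shape=[nFeatures])
])
modelHL.compile(optimizer=tf.keras.optimizers.SGD(0.01),
                loss='binary_crossentropy',
                metrics=['accuracy'])
modelHL.fit(xTrain.reshape(-1, 1), labelTrain * 1.0, epochs=10, batch_size=100, verbose=0)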

In [25]:
# Labels as floats {0., 1.}
labelTrainF = np.multiply(labelTrain, 1.0)
labelTrainF.dtype, labelTrainF.shape
Out[25]:
(dtype('float64'), (2000,))
In [26]:
# (Mini) batch size
nBatch = 100
# Number of batches per epoch
nBatchPerEpoch = 20
# Number of epochs
nEpochMax = 1000
In [27]:
# Simple custom layer exposing the logistic regression model
class MyLogisticRegressionLayer(tf.keras.layers.Layer):
    def __init__(self, *args, **kwargs):
        super(MyLogisticRegressionLayer, self).__init__(*args, **kwargs)

    def build(self, input_shape):
        self.w = self.add_weight(
            shape=input_shape[0],
            dtype=self.dtype,
            initializer=tf.keras.initializers.ones(),
            #regularizer=tf.keras.regularizers.l2(0.02),
            trainable=True)
        self.b = self.add_weight(
            shape=1,
            dtype=self.dtype,
            initializer=tf.keras.initializers.ones(),
            #regularizer=tf.keras.regularizers.l2(0.02),
            trainable=True)

    @tf.function
    def call(self, x, training=None):
        return tf.math.sigmoid(tf.math.add(tf.math.multiply(x, self.w), self.b))
In [28]:
# Using the TensorFlow 2.0 style of metrics to implement accuracy
class MyBinaryAccuracy(tf.keras.metrics.Metric):

    def __init__(self, name='my_accuracy', **kwargs):
        super(MyBinaryAccuracy, self).__init__(name=name, **kwargs)
        self.accuracySum = self.add_weight(name='accuracySum', initializer='zeros')
        self.accuracyCount = self.add_weight(name='accuracyCount', initializer='zeros')

    def update_state(self, labels, yEst):
        labels = tf.cast(labels, tf.bool)
        labelEst = tf.greater(yEst, 0.5)
        values = tf.cast(tf.equal(labels, labelEst), self.dtype)
        self.accuracySum.assign_add(tf.reduce_sum(values))
        self.accuracyCount.assign_add(values.get_shape()[0])

    def result(self):
        return self.accuracySum / self.accuracyCount
In [29]:
# Model 1, instantiate the custom layer
model1 = tf.keras.Sequential([MyLogisticRegressionLayer(input_shape=[nFeatures], dtype="float64")])

# Stochastic Gradient Descent optimizer
optim1 = tf.keras.optimizers.SGD(0.01)

# Perform a train step on a mini-batch
# This function is traced and compiled into a graph by TensorFlow 2.0 (tf.function)
@tf.function
def trainStep1(x, labels):
    with tf.GradientTape() as tape:
        predictions = model1(x, training=True)
        loss = -tf.reduce_sum(tf.math.log((labels * predictions) + ((1 - labels) * (1 - predictions))))
        #loss = tf.keras.losses.categorical_crossentropy(labels, predictions)
    gradients = tape.gradient(loss, model1.trainable_variables)
    optim1.apply_gradients(zip(gradients, model1.trainable_variables))
    return loss, predictions

# Initialize values and loop on epochs and mini-batches
epoch = 0
cost_epoch = 1
histo = []
accuracy = MyBinaryAccuracy()
for epoch in range(nEpochMax):
    cost_cumul = 0
    accuracy.reset_states()
    for b in range(0, nBatchPerEpoch*nBatch, nBatch):
        cost, predictions = trainStep1(xTrain[b : b + nBatch], labelTrainF[b : b + nBatch])
        cost_cumul += cost
        accuracy.update_state(labelTrainF[b : b + nBatch], predictions)
    cost_epoch = cost_cumul / nBatchPerEpoch
    W = model1.get_weights()
    histo.append((cost_epoch.numpy(), accuracy.result().numpy(), W[1][0], W[0]))

print("Predicted model: {b:.3f} + {w:.3f} x, num epochs={c}".format(w=W[0], b=W[1][0], c=len(histo)))

# Save history as a Pandas DataFrame
df = pandas.DataFrame(histo, columns=('cost', 'accuracy', 'b', 'w0'))
Predicted model: -17.421 + 25.190 x, num epochs=1000 

SGD shows that there is not a single optimal pair (b, w) (intercept and weight) but a whole line of near-optimal pairs, as shown on the graph below. This is explained by the single feature: the decision boundary is not a line but a single point on the x axis, where $b + w x = 0$, so many (b, w) pairs lead to essentially the same classifier (see also the sketch after the next cell).

In [30]:
plt.scatter(df['b'], df['w0'], marker='.', alpha=0.2)
plt.xlabel('intercept')
plt.ylabel('weight')
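
The drift of (b, w) along a line can also be checked with a small additional sketch (not part of the original notebook): the implied decision point $-b/w$ tends to stabilize over the epochs stored in df even while (b, w) keeps moving.

# Plot the implied decision boundary point -b/w over epochs (illustrative sketch)
wFlat = df['w0'].apply(lambda w: float(np.ravel(w)[0]))
plt.plot(-df['b'] / wFlat)
plt.xlabel('epoch')
plt.ylabel('decision point $-b/w$')
plt.grid()
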
In [31]:
fig, ax = plt.subplots(1, 2, figsize=(16, 4))
ax[0].plot(df['cost'])
ax[0].grid()
ax[1].plot(df['accuracy'])
ax[1].grid()

Where to go from here?

More complex models with the two-feature binary classification (Notebook) or the K Nearest Neighbors classifier (Notebook)

Compare with the single-feature linear regression using simple algorithms (Notebook), or using TensorFlow (Notebook)