Neural Network Backward Propagation and Parameter Update

In this tutorial part, we'll write the backward propagation and parameter update functions, using the cache computed during forward propagation.

In the previous tutorial, we wrote the forward propagation and cost functions. Next, we need to write a backpropagation function, and for this we'll use the cache computed during forward propagation.

Backpropagation is usually the hardest (most mathematical) part of deep learning. Here again is the picture with the six mathematical equations we'll use. Since we are building a vectorized implementation, we'll use the six equations on the right of the image; they are also written out below.
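For reference, here is a transcription of those six equations in LaTeX, using the same variable names as the code below. Here dZ2^{(i)} denotes the i-th column of dZ2 (the i-th example), * is element-wise multiplication, and the (1 - A1^2) factor is the derivative of the tanh activation in the hidden layer:

\begin{aligned}
dZ2 &= A2 - Y \\
dW2 &= \frac{1}{m}\, dZ2\, A1^{T} \\
db2 &= \frac{1}{m} \sum_{i=1}^{m} dZ2^{(i)} \\
dZ1 &= W2^{T} dZ2 \ast \left(1 - A1^{2}\right) \\
dW1 &= \frac{1}{m}\, dZ1\, X^{T} \\
db1 &= \frac{1}{m} \sum_{i=1}^{m} dZ1^{(i)}
\end{aligned}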

Code for our backward propagation function:

Arguments:

parameters - python dictionary containing our parameters;
cache - a dictionary containing "Z1", "A1", "Z2" and "A2";
X - input data of shape (input size, number of examples);
Y - "true" labels vector of shape (1, number of examples).

Return:

grads - python dictionary containing our gradients with respect to different parameters.

def backward_propagation(parameters, cache, X, Y):
    # number of examples
    m = X.shape[1]

    # Retrieve W1 and W2 from the "parameters" dictionary
    W1 = parameters["W1"]
    W2 = parameters["W2"]

    # Retrieve A1 and A2 from the "cache" dictionary
    A1 = cache["A1"]
    A2 = cache["A2"]

    # Backward propagation to compute dW1, db1, dW2, db2
    dZ2 = A2 - Y
    dW2 = 1./m*np.dot(dZ2, A1.T)
    db2 = 1./m*np.sum(dZ2, axis=1, keepdims=True)
    dZ1 = np.dot(W2.T, dZ2) * (1 - np.power(A1, 2))
    dW1 = 1./m*np.dot(dZ1, X.T)
    db1 = 1./m*np.sum(dZ1, axis=1, keepdims=True)

    grads = {"dW1": dW1, "db1": db1, "dW2": dW2, "db2": db2}

    return grads
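As a quick sanity check (not part of the original tutorial), you can call backward_propagation on small hand-built arrays and confirm that each gradient has the same shape as the parameter it belongs to. The layer sizes below are made up for illustration, and the cache is built by hand exactly the way forward_propagation builds it:

import numpy as np

np.random.seed(1)
n_x, n_h, n_y, m = 4, 3, 1, 5            # made-up layer sizes and number of examples

X = np.random.randn(n_x, m)
Y = (np.random.rand(1, m) > 0.5).astype(float)

# Hand-built "parameters" and "cache" dictionaries, shaped as in the tutorial
parameters = {"W1": np.random.randn(n_h, n_x) * 0.01, "b1": np.zeros((n_h, 1)),
              "W2": np.random.randn(n_y, n_h) * 0.01, "b2": np.zeros((n_y, 1))}
Z1 = np.dot(parameters["W1"], X) + parameters["b1"]
A1 = np.tanh(Z1)
Z2 = np.dot(parameters["W2"], A1) + parameters["b2"]
A2 = 1 / (1 + np.exp(-Z2))
cache = {"Z1": Z1, "A1": A1, "Z2": Z2, "A2": A2}

grads = backward_propagation(parameters, cache, X, Y)
print(grads["dW1"].shape, grads["db1"].shape)   # (3, 4) (3, 1) - same shapes as W1, b1
print(grads["dW2"].shape, grads["db2"].shape)   # (1, 3) (1, 1) - same shapes as W2, b2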

Code for our update parameters function:

We'll implement the update rule using gradient descent, so we'll use the gradients (dW1, db1, dW2, db2) to update the parameters (W1, b1, W2, b2).

The general gradient descent rule is: θ = θ − α(∂J/∂θ), where α is the learning rate and θ represents a parameter.
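Applied to our two-layer network, this is just the general rule written once per parameter, matching the assignments in the code below:

\begin{aligned}
W1 &:= W1 - \alpha\, dW1, & b1 &:= b1 - \alpha\, db1, \\
W2 &:= W2 - \alpha\, dW2, & b2 &:= b2 - \alpha\, db2
\end{aligned}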

Arguments:

parameters - python dictionary containing our parameters;
grads - python dictionary containing our gradients;
learning_rate - learning rate of the gradient descent update rule;

Return:

parameters - python dictionary containing our updated parameters.

def update_parameters(parameters, grads, learning_rate=0.1):
    # Retrieve each parameter from the "parameters" dictionary
    W1 = parameters["W1"]
    W2 = parameters["W2"]
    b1 = parameters["b1"]
    b2 = parameters["b2"]

    # Retrieve each gradient from the "grads" dictionary
    dW1 = grads["dW1"]
    db1 = grads["db1"]
    dW2 = grads["dW2"]
    db2 = grads["db2"]

    # Update rule for each parameter
    W1 = W1 - dW1 * learning_rate
    b1 = b1 - db1 * learning_rate
    W2 = W2 - dW2 * learning_rate
    b2 = b2 - db2 * learning_rate

    parameters = {"W1": W1, "b1": b1, "W2": W2, "b2": b2}

    return parameters
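To see how these functions fit together, here is a minimal sketch of a single gradient descent step on random data, assuming initialize_parameters, forward_propagation, compute_cost, backward_propagation, and update_parameters are defined as in this series. The layer sizes and number of examples are made up for illustration; the real training loop comes in the next tutorial:

import numpy as np

np.random.seed(2)
n_x, n_h, n_y, m = 12288, 7, 1, 32    # made-up sizes: 64*64*3 inputs, 7 hidden units, 32 examples

X = np.random.randn(n_x, m)                       # stand-in for flattened, normalized images
Y = (np.random.rand(1, m) > 0.5).astype(float)    # stand-in for cat/dog labels

parameters = initialize_parameters(n_x, n_h, n_y)
A2, cache = forward_propagation(X, parameters)             # forward pass
cost = compute_cost(A2, Y, parameters)                     # cross-entropy cost
grads = backward_propagation(parameters, cache, X, Y)      # gradients from the cache
parameters = update_parameters(parameters, grads, learning_rate=0.1)  # one update step

print("cost after the forward pass:", cost)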

Full tutorial code:

import os
import cv2
import numpy as np
import matplotlib.pyplot as plt
import scipy

ROWS = 64
COLS = 64
CHANNELS = 3

#TRAIN_DIR = 'Train_data/'
#TEST_DIR = 'Test_data/'

#train_images = [TRAIN_DIR+i for i in os.listdir(TRAIN_DIR)]
#test_images = [TEST_DIR+i for i in os.listdir(TEST_DIR)]

def read_image(file_path):
    img = cv2.imread(file_path, cv2.IMREAD_COLOR)
    return cv2.resize(img, (ROWS, COLS), interpolation=cv2.INTER_CUBIC)

def prepare_data(images):
    m = len(images)
    X = np.zeros((m, ROWS, COLS, CHANNELS), dtype=np.uint8)
    y = np.zeros((1, m))
    for i, image_file in enumerate(images):
        X[i,:] = read_image(image_file)
        if 'dog' in image_file.lower():
            y[0, i] = 1
        elif 'cat' in image_file.lower():
            y[0, i] = 0
    return X, y

def sigmoid(z):
    s = 1/(1+np.exp(-z))
    return s

'''
train_set_x, train_set_y = prepare_data(train_images)
test_set_x, test_set_y = prepare_data(test_images)

train_set_x_flatten = train_set_x.reshape(train_set_x.shape[0], ROWS*COLS*CHANNELS).T
test_set_x_flatten = test_set_x.reshape(test_set_x.shape[0], -1).T

train_set_x = train_set_x_flatten/255
test_set_x = test_set_x_flatten/255
'''
#train_set_x_flatten shape: (12288, 6002)
#train_set_y shape: (1, 6002)

def initialize_parameters(input_layer, hidden_layer, output_layer):
    # initialize 1st layer weights with random values
    W1 = np.random.randn(hidden_layer, input_layer) * 0.01
    # initialize 1st layer bias
    b1 = np.zeros((hidden_layer, 1))
    # initialize 2nd layer weights with random values
    W2 = np.random.randn(output_layer, hidden_layer) * 0.01
    # initialize 2nd layer bias
    b2 = np.zeros((output_layer, 1))

    parameters = {"W1": W1, "b1": b1, "W2": W2, "b2": b2}

    return parameters

def forward_propagation(X, parameters):
    # Retrieve each parameter from the dictionary "parameters"
    W1 = parameters["W1"]
    b1 = parameters["b1"]
    W2 = parameters["W2"]
    b2 = parameters["b2"]

    # Implementing forward propagation to calculate A2 probabilities
    Z1 = np.dot(W1, X) + b1
    A1 = np.tanh(Z1)
    Z2 = np.dot(W2, A1) + b2
    A2 = sigmoid(Z2)

    # Values needed in the backpropagation are stored in "cache"
    cache = {"Z1": Z1, "A1": A1, "Z2": Z2, "A2": A2}

    return A2, cache

def compute_cost(A2, Y, parameters):
    # number of examples
    m = Y.shape[1]

    # Compute the cross-entropy cost
    logprobs = np.multiply(np.log(A2), Y) + np.multiply(np.log(1-A2), (1-Y))
    cost = -1/m*np.sum(logprobs)

    # makes sure cost is in the dimension we expect, e.g., turns [[51]] into 51
    cost = np.squeeze(cost)

    return cost

def backward_propagation(parameters, cache, X, Y):
    # number of examples
    m = X.shape[1]

    # Retrieve W1 and W2 from the "parameters" dictionary
    W1 = parameters["W1"]
    W2 = parameters["W2"]

    # Retrieve A1 and A2 from the "cache" dictionary
    A1 = cache["A1"]
    A2 = cache["A2"]

    # Backward propagation to compute dW1, db1, dW2, db2
    dZ2 = A2 - Y
    dW2 = 1./m*np.dot(dZ2, A1.T)
    db2 = 1./m*np.sum(dZ2, axis=1, keepdims=True)
    dZ1 = np.dot(W2.T, dZ2) * (1 - np.power(A1, 2))
    dW1 = 1./m*np.dot(dZ1, X.T)
    db1 = 1./m*np.sum(dZ1, axis=1, keepdims=True)

    grads = {"dW1": dW1, "db1": db1, "dW2": dW2, "db2": db2}

    return grads

def update_parameters(parameters, grads, learning_rate=0.1):
    # Retrieve each parameter from the "parameters" dictionary
    W1 = parameters["W1"]
    W2 = parameters["W2"]
    b1 = parameters["b1"]
    b2 = parameters["b2"]

    # Retrieve each gradient from the "grads" dictionary
    dW1 = grads["dW1"]
    db1 = grads["db1"]
    dW2 = grads["dW2"]
    db2 = grads["db2"]

    # Update rule for each parameter
    W1 = W1 - dW1 * learning_rate
    b1 = b1 - db1 * learning_rate
    W2 = W2 - dW2 * learning_rate
    b2 = b2 - db2 * learning_rate

    parameters = {"W1": W1, "b1": b1, "W2": W2, "b2": b2}

    return parameters

Conclusion:

Up to this point, we have written the parameter initialization, forward propagation, backward propagation, cost, and parameter update functions. In the next tutorial, we'll connect all of them into a model and start training a neural network with one hidden layer.