|
1 | 1 | { |
2 | 2 | "cells": [ |
| 3 | + { |
| 4 | + "cell_type": "markdown", |
| 5 | + "metadata": {}, |
| 6 | + "source": [ |
| 7 | + "# Runinng a backward pass through LeNet using MNIST and Joey" |
| 8 | + ] |
| 9 | + }, |
| 10 | + { |
| 11 | + "cell_type": "markdown", |
| 12 | + "metadata": {}, |
| 13 | + "source": [ |
| 14 | + "In this notebook, we will construct LeNet using Joey and run a backward pass through it with some training data from MNIST.\n", |
| 15 | + "\n", |
| 16 | + "The aim of a backward pass is calculating gradients of all network parameters necessary for later weight updates done by a PyTorch optimizer. A backward pass follows a forward pass." |
| 17 | + ] |
| 18 | + }, |
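| | + { |
| | + "cell_type": "markdown", |
| | + "metadata": {}, |
| | + "source": [ |
| | + "For instance, a plain SGD optimizer would use these gradients to update every parameter $w$ as $w \\leftarrow w - \\eta \\frac{\\partial L}{\\partial w}$, where $\\eta$ is the learning rate and $L$ is the loss. (This formula is only an illustration; any PyTorch optimizer can be plugged in.)" |
| | + ] |
| | + }, |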
| 19 | + { |
| 20 | + "cell_type": "markdown", |
| 21 | + "metadata": {}, |
| 22 | + "source": [ |
| 23 | + "Firstly, let's import the required prerequisites:" |
| 24 | + ] |
| 25 | + }, |
3 | 26 | { |
4 | 27 | "cell_type": "code", |
5 | 28 | "execution_count": 1, |
|
11 | 34 | "import torchvision.transforms as transforms\n", |
12 | 35 | "import joey as ml\n", |
13 | 36 | "import matplotlib.pyplot as plt\n", |
14 | | - "import numpy as np" |
| 37 | + "import numpy as np\n", |
| 38 | + "import torch.nn as nn\n", |
| 39 | + "import torch.nn.functional as F\n", |
| 40 | + "import torch.optim as optim" |
| 41 | + ] |
| 42 | + }, |
| 43 | + { |
| 44 | + "cell_type": "markdown", |
| 45 | + "metadata": {}, |
| 46 | + "source": [ |
| 47 | + "Then, let's define `imshow()` allowing us to look at the training data we'll use for the backward pass." |
15 | 48 | ] |
16 | 49 | }, |
17 | 50 | { |
|
27 | 60 | " plt.show()" |
28 | 61 | ] |
29 | 62 | }, |
| 63 | + { |
| 64 | + "cell_type": "markdown", |
| 65 | + "metadata": {}, |
| 66 | + "source": [ |
| 67 | + "In this particular example, every training batch will have 4 images." |
| 68 | + ] |
| 69 | + }, |
30 | 70 | { |
31 | 71 | "cell_type": "code", |
32 | 72 | "execution_count": 3, |
|
36 | 76 | "batch_size = 4" |
37 | 77 | ] |
38 | 78 | }, |
| 79 | + { |
| 80 | + "cell_type": "markdown", |
| 81 | + "metadata": {}, |
| 82 | + "source": [ |
| 83 | + "Once we have `imshow()` and `batch_size` defined, we'll download the MNIST images using PyTorch." |
| 84 | + ] |
| 85 | + }, |
39 | 86 | { |
40 | 87 | "cell_type": "code", |
41 | 88 | "execution_count": 4, |
|
53 | 100 | "dataiter = iter(trainloader)" |
54 | 101 | ] |
55 | 102 | }, |
| 103 | + { |
| 104 | + "cell_type": "markdown", |
| 105 | + "metadata": {}, |
| 106 | + "source": [ |
| 107 | + "In our case, only one batch will be used for the backward pass. Joey accepts only NumPy arrays, so we have to convert PyTorch tensors to their NumPy equivalents first." |
| 108 | + ] |
| 109 | + }, |
56 | 110 | { |
57 | 111 | "cell_type": "code", |
58 | 112 | "execution_count": 5, |
|
63 | 117 | "input_data = images.numpy()" |
64 | 118 | ] |
65 | 119 | }, |
| 120 | + { |
| 121 | + "cell_type": "markdown", |
| 122 | + "metadata": {}, |
| 123 | + "source": [ |
| 124 | + "For reference, let's have a look at our training data. There are 4 images corresponding to the following digits: 5, 0, 4, 1." |
| 125 | + ] |
| 126 | + }, |
66 | 127 | { |
67 | 128 | "cell_type": "code", |
68 | 129 | "execution_count": 6, |
|
85 | 146 | "imshow(torchvision.utils.make_grid(images))" |
86 | 147 | ] |
87 | 148 | }, |
| 149 | + { |
| 150 | + "cell_type": "markdown", |
| 151 | + "metadata": {}, |
| 152 | + "source": [ |
| 153 | + "At this point, we're ready to define `backward_pass()` running the backward pass through Joey-constructed LeNet. We'll do so using the `Conv`, `MaxPooling`, `Flat`, `FullyConnected` and `FullyConnectedSoftmax` layer classes along with the `Net` class packing everything into one network we can interact with." |
| 154 | + ] |
| 155 | + }, |
| 156 | + { |
| 157 | + "cell_type": "markdown", |
| 158 | + "metadata": {}, |
| 159 | + "source": [ |
| 160 | + "Note that a loss function has to be defined manually. Joey doesn't provide any built-in options here at the moment." |
| 161 | + ] |
| 162 | + }, |
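| | + { |
| | + "cell_type": "markdown", |
| | + "metadata": {}, |
| | + "source": [ |
| | + "The `loss_grad()` function defined inside `backward_pass()` below returns $p_i - y_i$ for every output, where $p_i$ is the $i$-th softmax probability and $y_i$ is the one-hot encoding of the expected digit. This is the well-known gradient of the categorical cross-entropy loss $L = -\\log p_{\\text{expected}}$ with respect to the pre-softmax activations." |
| | + ] |
| | + }, |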
88 | 163 | { |
89 | 164 | "cell_type": "code", |
90 | 165 | "execution_count": 7, |
|
95 | 170 | " # Six 3x3 filters, activation RELU\n", |
96 | 171 | " layer1 = ml.Conv(kernel_size=(6, 3, 3),\n", |
97 | 172 | " input_size=(batch_size, 1, 32, 32),\n", |
98 | | - " activation=ml.activation.ReLU(),\n", |
99 | | - " generate_code=False)\n", |
| 173 | + " activation=ml.activation.ReLU())\n", |
100 | 174 | " # Max 2x2 subsampling\n", |
101 | 175 | " layer2 = ml.MaxPooling(kernel_size=(2, 2),\n", |
102 | 176 | " input_size=(batch_size, 6, 30, 30),\n", |
103 | | - " stride=(2, 2),\n", |
104 | | - " generate_code=False)\n", |
| 177 | + " stride=(2, 2))\n", |
105 | 178 | " # Sixteen 3x3 filters, activation RELU\n", |
106 | 179 | " layer3 = ml.Conv(kernel_size=(16, 3, 3),\n", |
107 | 180 | " input_size=(batch_size, 6, 15, 15),\n", |
108 | | - " activation=ml.activation.ReLU(),\n", |
109 | | - " generate_code=False)\n", |
| 181 | + " activation=ml.activation.ReLU())\n", |
110 | 182 | " # Max 2x2 subsampling\n", |
111 | 183 | " layer4 = ml.MaxPooling(kernel_size=(2, 2),\n", |
112 | 184 | " input_size=(batch_size, 16, 13, 13),\n", |
113 | 185 | " stride=(2, 2),\n", |
114 | | - " strict_stride_check=False,\n", |
115 | | - " generate_code=False)\n", |
| 186 | + " strict_stride_check=False)\n", |
116 | 187 | " # Full connection (16 * 6 * 6 -> 120), activation RELU\n", |
117 | 188 | " layer5 = ml.FullyConnected(weight_size=(120, 576),\n", |
118 | 189 | " input_size=(576, batch_size),\n", |
119 | | - " activation=ml.activation.ReLU(),\n", |
120 | | - " generate_code=False)\n", |
| 190 | + " activation=ml.activation.ReLU())\n", |
121 | 191 | " # Full connection (120 -> 84), activation RELU\n", |
122 | 192 | " layer6 = ml.FullyConnected(weight_size=(84, 120),\n", |
123 | 193 | " input_size=(120, batch_size),\n", |
124 | | - " activation=ml.activation.ReLU(),\n", |
125 | | - " generate_code=False)\n", |
| 194 | + " activation=ml.activation.ReLU())\n", |
126 | 195 | " # Full connection (84 -> 10), output layer\n", |
127 | 196 | " layer7 = ml.FullyConnectedSoftmax(weight_size=(10, 84),\n", |
128 | | - " input_size=(84, batch_size),\n", |
129 | | - " generate_code=False)\n", |
| 197 | + " input_size=(84, batch_size))\n", |
130 | 198 | " # Flattening layer necessary between layer 4 and 5\n", |
131 | | - " layer_flat = ml.Flat(input_size=(batch_size, 16, 6, 6),\n", |
132 | | - " generate_code=False)\n", |
| 199 | + " layer_flat = ml.Flat(input_size=(batch_size, 16, 6, 6))\n", |
133 | 200 | " \n", |
134 | 201 | " layers = [layer1, layer2, layer3, layer4,\n", |
135 | 202 | " layer_flat, layer5, layer6, layer7]\n", |
136 | 203 | " \n", |
137 | 204 | " net = ml.Net(layers)\n", |
138 | 205 | " outputs = net.forward(input_data)\n", |
139 | 206 | " \n", |
140 | | - " def loss_grad(layer, b):\n", |
| 207 | + " def loss_grad(layer, expected):\n", |
141 | 208 | " gradients = []\n", |
142 | 209 | " \n", |
143 | | - " for i in range(10):\n", |
144 | | - " result = layer.result.data[i, b]\n", |
145 | | - " if i == expected_results[b]:\n", |
146 | | - " result -= 1\n", |
147 | | - " gradients.append(result)\n", |
| 210 | + " for b in range(batch_size):\n", |
| 211 | + " row = []\n", |
| 212 | + " for i in range(10):\n", |
| 213 | + " result = layer.result.data[i, b]\n", |
| 214 | + " if i == expected[b]:\n", |
| 215 | + " result -= 1\n", |
| 216 | + " row.append(result)\n", |
| 217 | + " gradients.append(row)\n", |
148 | 218 | " \n", |
149 | 219 | " return gradients\n", |
150 | 220 | " \n", |
151 | | - " net.backward(loss_grad)\n", |
| 221 | + " net.backward(expected_results, loss_grad)\n", |
152 | 222 | " \n", |
153 | 223 | " return (layer1, layer2, layer3, layer4, layer_flat, layer5, layer6, layer7)" |
154 | 224 | ] |
155 | 225 | }, |
| 226 | + { |
| 227 | + "cell_type": "markdown", |
| 228 | + "metadata": {}, |
| 229 | + "source": [ |
| 230 | + "Afterwards, we're ready to run the backward pass." |
| 231 | + ] |
| 232 | + }, |
156 | 233 | { |
157 | 234 | "cell_type": "code", |
158 | 235 | "execution_count": 8, |
|
167 | 244 | "/home/maksymilian/Desktop/UROP/devito/devito/types/grid.py:206: RuntimeWarning: divide by zero encountered in true_divide\n", |
168 | 245 | " spacing = (np.array(self.extent) / (np.array(self.shape) - 1)).astype(self.dtype)\n", |
169 | 246 | "Operator `Kernel` run in 0.01 s\n", |
170 | | - "Operator `Kernel` run in 0.01 s\n", |
171 | | - "Operator `Kernel` run in 0.01 s\n", |
172 | | - "Operator `Kernel` run in 0.01 s\n", |
173 | 247 | "Operator `Kernel` run in 0.01 s\n" |
174 | 248 | ] |
175 | 249 | } |
|
182 | 256 | "cell_type": "markdown", |
183 | 257 | "metadata": {}, |
184 | 258 | "source": [ |
185 | | - "PyTorch:" |
| 259 | + "Results are stored in the `kernel_gradients` and `bias_gradients` properties of each layer (where applicable)." |
186 | 260 | ] |
187 | 261 | }, |
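| | + { |
| | + "cell_type": "markdown", |
| | + "metadata": {}, |
| | + "source": [ |
| | + "For instance, here is a minimal sketch of inspecting them (assuming `layers` holds the tuple returned by `backward_pass()` and that, like `layer.result` above, each gradient is a Devito function exposing a `.data` view):\n", |
| | + "\n", |
| | + "```python\n", |
| | + "layer1 = layers[0]  # the first Conv layer\n", |
| | + "print(layer1.kernel_gradients.data)  # gradients w.r.t. the 3x3 kernels\n", |
| | + "print(layer1.bias_gradients.data)    # gradients w.r.t. the biases\n", |
| | + "```" |
| | + ] |
| | + }, |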
188 | 262 | { |
189 | | - "cell_type": "code", |
190 | | - "execution_count": 9, |
| 263 | + "cell_type": "markdown", |
191 | 264 | "metadata": {}, |
192 | | - "outputs": [], |
193 | 265 | "source": [ |
194 | | - "import torch.nn as nn\n", |
195 | | - "import torch.nn.functional as F\n", |
196 | | - "import torch.optim as optim" |
| 266 | + "In order to check the numerical correctness, we'll create the same network with PyTorch, run a backward pass through it using the same initial weights and data and compare the results with Joey's." |
| 267 | + ] |
| 268 | + }, |
| 269 | + { |
| 270 | + "cell_type": "markdown", |
| 271 | + "metadata": {}, |
| 272 | + "source": [ |
| 273 | + "Here's the PyTorch code:" |
197 | 274 | ] |
198 | 275 | }, |
199 | 276 | { |
200 | 277 | "cell_type": "code", |
201 | | - "execution_count": 10, |
| 278 | + "execution_count": 9, |
202 | 279 | "metadata": {}, |
203 | 280 | "outputs": [], |
204 | 281 | "source": [ |
|
230 | 307 | }, |
231 | 308 | { |
232 | 309 | "cell_type": "code", |
233 | | - "execution_count": 11, |
| 310 | + "execution_count": 10, |
234 | 311 | "metadata": {}, |
235 | 312 | "outputs": [], |
236 | 313 | "source": [ |
|
252 | 329 | }, |
253 | 330 | { |
254 | 331 | "cell_type": "code", |
255 | | - "execution_count": 12, |
| 332 | + "execution_count": 11, |
256 | 333 | "metadata": {}, |
257 | 334 | "outputs": [], |
258 | 335 | "source": [ |
|
263 | 340 | "loss.backward()" |
264 | 341 | ] |
265 | 342 | }, |
| 343 | + { |
| 344 | + "cell_type": "markdown", |
| 345 | + "metadata": {}, |
| 346 | + "source": [ |
| 347 | + "After running the backward pass in PyTorch, we're ready to make comparisons. Let's calculate relative errors between Joey and PyTorch in terms of weight/bias gradients." |
| 348 | + ] |
| 349 | + }, |
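| | + { |
| | + "cell_type": "markdown", |
| | + "metadata": {}, |
| | + "source": [ |
| | + "For each parameter, the relative error is computed element-wise as $\\frac{|g_{\\text{Joey}} - g_{\\text{PyTorch}}|}{|g_{\\text{PyTorch}}|}$, with PyTorch's gradients treated as the reference; only the per-layer maximum is printed." |
| | + ] |
| | + }, |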
266 | 350 | { |
267 | 351 | "cell_type": "code", |
268 | | - "execution_count": 13, |
| 352 | + "execution_count": 12, |
269 | 353 | "metadata": {}, |
270 | 354 | "outputs": [ |
271 | 355 | { |
272 | 356 | "name": "stdout", |
273 | 357 | "output_type": "stream", |
274 | 358 | "text": [ |
275 | | - "layers[0] maximum relative error: 1.599673499123359e-14\n", |
276 | | - "layers[1] maximum relative error: 5.710234136667345e-12\n", |
277 | | - "layers[2] maximum relative error: 1.9638017195468526e-11\n", |
278 | | - "layers[3] maximum relative error: 1.8676488586249282e-11\n", |
279 | | - "layers[4] maximum relative error: 3.4692340371450744e-13\n", |
| 359 | + "layers[0] maximum relative error: 1.4935025269750558e-14\n", |
| 360 | + "layers[1] maximum relative error: 1.0457210947850931e-13\n", |
| 361 | + "layers[2] maximum relative error: 3.0920027811804816e-12\n", |
| 362 | + "layers[3] maximum relative error: 2.615895862310905e-13\n", |
| 363 | + "layers[4] maximum relative error: 1.4951643318957554e-12\n", |
280 | 364 | "\n", |
281 | | - "Maximum relative error is in layers[2]: 1.9638017195468526e-11\n" |
| 365 | + "Maximum relative error is in layers[2]: 3.0920027811804816e-12\n" |
282 | 366 | ] |
283 | 367 | }, |
284 | 368 | { |
285 | 369 | "name": "stderr", |
286 | 370 | "output_type": "stream", |
287 | 371 | "text": [ |
288 | | - "<ipython-input-13-c5fd7a032cbe>:11: RuntimeWarning: invalid value encountered in true_divide\n", |
| 372 | + "<ipython-input-12-c5fd7a032cbe>:11: RuntimeWarning: invalid value encountered in true_divide\n", |
289 | 373 | " kernel_error = abs(kernel_grad - pytorch_kernel_grad) / abs(pytorch_kernel_grad)\n", |
290 | | - "<ipython-input-13-c5fd7a032cbe>:16: RuntimeWarning: invalid value encountered in true_divide\n", |
| 374 | + "<ipython-input-12-c5fd7a032cbe>:16: RuntimeWarning: invalid value encountered in true_divide\n", |
291 | 375 | " bias_error = abs(bias_grad - pytorch_bias_grad) / abs(pytorch_bias_grad)\n" |
292 | 376 | ] |
293 | 377 | } |
|
320 | 404 | "print()\n", |
321 | 405 | "print('Maximum relative error is in layers[' + str(index) + ']: ' + str(max_error))" |
322 | 406 | ] |
| 407 | + }, |
| 408 | + { |
| 409 | + "cell_type": "markdown", |
| 410 | + "metadata": {}, |
| 411 | + "source": [ |
| 412 | + "As we can see, the maximum error is low enough (given floating-point calculation accuracy and the complexity of our network) for Joey's results to be considered correct." |
| 413 | + ] |
323 | 414 | } |
324 | 415 | ], |
325 | 416 | "metadata": { |
|