This is my extension for the C group project in Imperial College First year Computing.
    ├── README.md                      // Readme
    ├── code
    │   ├── Makefile                   // For easier compilation, type "make mnist" in cmd
    │   ├── adam.c                     // code for Adam optimizer
    │   ├── adam.h                     // header file for Adam optimizer
    │   ├── ann.c                      // code for neural network (create, predict, train)
    │   ├── ann.h                      // header file for ann.c
    │   ├── cnnlayer.c                 // code for convolutional layers
    │   ├── cnnlayer.h                 // header file for cnnlayer.c
    │   ├── conv.c                     // operations for convolutional layers
    │   ├── data.temp                  // temp file to store training data for graph plotting
    │   ├── flattenlayer.c             // flatten layer for CNN
    │   ├── flattenlayer.h             // header file for flatten layer
    │   ├── helper.c                   // code for graph plotting
    │   ├── helper.h                   // header file for helper.c
    │   ├── layer.c                    // code for fully connected layers
    │   ├── layer.h                    // header file for layer.c
    │   ├── math_funcs.c               // math functions, e.g. activation and loss functions
    │   ├── math_funcs.h               // header file for math_funcs.c
    │   ├── math_r4t.c                 // operations on the r4t struct
    │   ├── math_r4t.h                 // header file for math_r4t.c
    │   ├── math_structs.c             // matrix operations
    │   ├── math_structs.h             // header file for math_structs.c
    │   ├── mnist.c                    // main program -- tests the DNN on the MNIST dataset
    │   ├── tensor.c                   // implementation of PyTorch-style tensors
    │   ├── tensor.h                   // header file for tensor.c
    │   ├── tensor_math.c              // math operations on tensors
    │   └── xor.c                      // test program for a sigmoidal MLP learning the XOR function. Use "make xor" to build.
    ├── data                           // Image data downloaded from the official MNIST website
    │   ├── test-images.idx3-ubyte
    │   ├── test-labels.idx1-ubyte
    │   ├── train-images.idx3-ubyte
    │   └── train-labels.idx1-ubyte
    └── results
        ├── mnist_accuracy_100x100.png      // Accuracy graph: 2 hidden layers (100+100), both dropout 0.4, mini-batch GD
        ├── mnist_accuracy_100x100_adam.png // Accuracy graph: 2 hidden layers (100+100), only final hidden layer dropout 0.4, Adam optimizer (BEST)
        ├── mnist_accuracy_30x30.png        // Accuracy graph: 2 hidden layers (30+30), no dropout, mini-batch GD (INITIAL)
        ├── mnist_accuracy_60x60.png        // Accuracy graph: 2 hidden layers (60+60), both dropout 0.4, mini-batch GD
        ├── mnist_loss_100x100.png          // Same descriptions as above, but for the loss graph (mean cross-entropy loss over the whole training set per epoch)
        ├── mnist_loss_100x100_adam.png
        ├── mnist_loss_30x30.png
        ├── mnist_loss_60x60.png
        └── xor.png                         // Loss graph (MSE) for the XOR network (1 hidden layer, 2 neurons)
Only the Deep Neural Network (with Fully Connected Layers) is tested on MNIST; the convolutional network is only partially complete.
4 Layers, batch size 16:
- Input layer: 784 outputs (28 x 28 px input image)
- Hidden layer: 100 neurons, ReLU activation
- Hidden layer: 100 neurons, ReLU activation, dropout probability 0.4
- Output layer: 10 neurons (to represent 10 classes), softmax activation
Number of training examples: 60000 (3750 batches)
Number of validation examples: 10000 (625 batches)
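For reference, the configuration above can be written down schematically as below. The type, enum, and constant names here are illustrative placeholders only, not the actual ann.c API.

```c
#include <stddef.h>

/* Hypothetical description of the MNIST network used in this project.
 * These names are illustrative; the real ann.c interface may differ. */
typedef enum { ACT_RELU, ACT_SOFTMAX } activation_t;

typedef struct {
    size_t       num_neurons;
    activation_t activation;
    double       dropout_p;      /* 0.0 means no dropout */
} layer_spec_t;

static const layer_spec_t mnist_layers[] = {
    { 784, ACT_RELU,    0.0 },   /* input: 28 x 28 = 784 pixels        */
    { 100, ACT_RELU,    0.0 },   /* hidden layer 1                     */
    { 100, ACT_RELU,    0.4 },   /* hidden layer 2, dropout p = 0.4    */
    {  10, ACT_SOFTMAX, 0.0 },   /* output: one neuron per digit class */
};

enum { BATCH_SIZE = 16, NUM_EPOCHS = 20 };
```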
I started by implementing stochastic gradient descent, i.e. updating the weights after every single sample.
The idea behind mini-batch gradient descent is that instead of feeding a single input vector through the network, a whole batch of input vectors is processed at once and the weights are updated once per batch.
Setup: change the dimensions of the delta matrix from (num_neurons x 1) to (num_neurons x batch_size) [denoted n x b below] and calculate the deltas for each sample in the batch. The learning rate also has to be changed accordingly.
This does not actually provide a significant speedup because our matrix library is not optimised like NumPy. Fun to implement nonetheless.
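As a rough illustration of the batched update (using a throwaway matrix type rather than the project's actual math_structs types), the per-layer gradients can be accumulated over all b columns of the n x b delta matrix like this:

```c
/* Minimal sketch of mini-batch gradient accumulation. Each column of
 * `delta` and `a_prev` corresponds to one sample in the batch.
 *   dW = delta * a_prev^T / b      (n x m)
 *   db = row-wise mean of delta    (n x 1)                           */
typedef struct {
    int     rows, cols;
    double *data;                  /* row-major: data[r * cols + c] */
} mat_t;

static void batch_gradients(const mat_t *delta,    /* n x b        */
                            const mat_t *a_prev,   /* m x b        */
                            mat_t *dW,             /* n x m (out)  */
                            mat_t *db)             /* n x 1 (out)  */
{
    int n = delta->rows, b = delta->cols, m = a_prev->rows;
    for (int i = 0; i < n; i++) {
        double bias_sum = 0.0;
        for (int k = 0; k < b; k++)
            bias_sum += delta->data[i * b + k];
        db->data[i] = bias_sum / b;                /* average over the batch */

        for (int j = 0; j < m; j++) {
            double s = 0.0;
            for (int k = 0; k < b; k++)
                s += delta->data[i * b + k] * a_prev->data[j * b + k];
            dW->data[i * m + j] = s / b;
        }
    }
}
```

The weight update (W -= learning_rate * dW) is then applied once per batch instead of once per sample.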
Dropout is a regularisation technique to reduce overfitting. It simply means that during training, we randomly omit some of the neurons in a layer with some probability p (0.4 in this project).
I used a technique called inverted dropout to avoid modifying the outputs during the testing phase. Here, the outputs of the neurons that survive are scaled up by 1 / (1 - p) during training, so no rescaling is needed at test time.
NOTE: When computing the delta matrices in backpropagation, the deltas are multiplied by the dropout mask to filter out the neurons that were dropped.
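A minimal sketch of inverted dropout (not the exact layer.c implementation) looks like this:

```c
#include <stdlib.h>

/* Inverted dropout sketch: each neuron is dropped with probability p,
 * and surviving activations are scaled by 1 / (1 - p) so the expected
 * output magnitude is unchanged and no rescaling is needed at test time.
 * The same mask is applied element-wise to the deltas in backprop.     */
static void inverted_dropout(double *activations, double *mask,
                             int n, double p)
{
    for (int i = 0; i < n; i++) {
        int keep = ((double)rand() / RAND_MAX) >= p;   /* keep with prob 1 - p */
        mask[i]  = keep ? 1.0 / (1.0 - p) : 0.0;       /* scaled keep/drop mask */
        activations[i] *= mask[i];                     /* drop or scale output  */
    }
}
```

At test time the mask is simply not applied, which is the whole point of the "inverted" variant.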
This was the missing piece that boosted the accuracy by a lot. The Adam optimizer converges much faster than plain SGD, hence better weights and biases are found within the same number of epochs.
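For context, the standard Adam update for a single parameter array looks roughly like the sketch below; the actual adam.c interface may differ. Typical defaults are alpha = 0.001, beta1 = 0.9, beta2 = 0.999, eps = 1e-8.

```c
#include <math.h>

/* One Adam step over n parameters. `m` and `v` are the running first and
 * second moment estimates (initialised to zero); `t` is the update count
 * starting from 1, used for bias correction.                            */
static void adam_step(double *theta, const double *grad,
                      double *m, double *v, int n, int t,
                      double alpha, double beta1, double beta2, double eps)
{
    for (int i = 0; i < n; i++) {
        m[i] = beta1 * m[i] + (1.0 - beta1) * grad[i];            /* 1st moment */
        v[i] = beta2 * v[i] + (1.0 - beta2) * grad[i] * grad[i];  /* 2nd moment */
        double m_hat = m[i] / (1.0 - pow(beta1, t));              /* bias-corrected */
        double v_hat = v[i] / (1.0 - pow(beta2, t));
        theta[i]    -= alpha * m_hat / (sqrt(v_hat) + eps);       /* parameter update */
    }
}
```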
The program first loads the raw byte files of the MNIST images, then splits them into batches of 16 and trains the network for 20 epochs.
Finally, it plots the graphs using gnuplot.
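The graph-plotting step can be driven by piping commands into gnuplot. The sketch below is a rough outline of that flow, not the exact mnist.c/helper.c code; the data file layout ("epoch train_acc val_acc" rows) and function name are assumptions for illustration.

```c
#include <stdio.h>

/* Render an accuracy graph by piping plot commands to gnuplot through
 * POSIX popen(). Assumes `datafile` holds one "epoch train_acc val_acc"
 * row per epoch, as data.temp might.                                    */
static void plot_accuracy(const char *datafile, const char *outfile)
{
    FILE *gp = popen("gnuplot", "w");            /* requires gnuplot on PATH */
    if (!gp) { perror("popen"); return; }
    fprintf(gp, "set terminal png\n");
    fprintf(gp, "set output '%s'\n", outfile);
    fprintf(gp, "set xlabel 'epoch'\n");
    fprintf(gp, "set ylabel 'accuracy'\n");
    fprintf(gp, "plot '%s' using 1:2 with lines title 'training', "
                "'%s' using 1:3 with lines title 'validation'\n",
            datafile, datafile);
    pclose(gp);                                  /* flush and close the pipe */
}
```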
Prerequisites:
- Linux / WSL (does not work on Windows 🤷)
- gnuplot (install via `sudo apt install gnuplot` in the command line)

If you encounter 404 Not Found errors while installing gnuplot, run `sudo apt-get update` and then `sudo apt install gnuplot --fix-missing`.
Once you have the prerequisites, type `cd code` followed by `make mnist` to generate the object files and the `mnist` executable. Then run the program by typing `./mnist`.
The output images will be produced in the `code` directory.
The validation set is not included in training and is a true reflection of how well the model is doing.
Here are the results of some different network architectures that I tried:

- 2 hidden layers (100+100), only final hidden layer dropout 0.4, Adam optimizer (BEST)
  The best network, with the highest validation accuracy of 97.40%.
- 2 hidden layers (30+30), no dropout, mini-batch GD (INITIAL)
  Surprisingly the 2nd best network and the best amongst the mini-batch GD networks, reaching a highest validation accuracy of 95.59%.
- 2 hidden layers (60+60), both dropout 0.4, mini-batch GD
  Highest validation accuracy of 95.10%.
- 2 hidden layers (100+100), both dropout 0.4, mini-batch GD
  Surprisingly worse than 60+60; the validation accuracy was only about 94%.
From the graphs of networks 3 and 4 (the 60+60 and 100+100 mini-batch GD networks with dropout), the validation accuracy/loss is consistently better than the training accuracy/loss, which indicates underfitting.
Credits to my group members Jeffrey Chang and Sam Shariatmadari for writing the code for the convolutional layers.
- Put more neurons in the hidden layers. I did not do that because my potato machine would probably explode.
- Try to use GPU for matrix operations as it provides a significant speedup.
- (Specific for image classification) Perform some data augmentation to generate more training examples, such as rotating and resizing the images.