
Somnath Das

I Built a Neural Network from Scratch Using Only Numpy—Here’s How You Can Too!

🤯 Why You Should Care
Neural networks (NNs) power everything from ChatGPT to self-driving cars. But let's be honest: using TensorFlow/PyTorch feels like magic right up until you realize you don't know how the wand works.

This post is for you if:

🧠 You want to demystify neural networks (no more black boxes!).

💻 You love coding fundamentals (goodbye model.fit(), hello raw matrices!).

⚡ You crave the satisfaction of "I built this myself!"

Spoiler: by the end, you'll code an NN that classifies handwritten digits (MNIST) with 90%+ accuracy, using only numpy. Let's go!


🔥 The Blueprint: How Neural Nets Actually Work
Here’s what we’ll implement:

  1. Layers: Input → Hidden → Output (with weights and biases).
  2. Activation Functions: ReLU (hidden layer) and Softmax (output).
  3. Loss: Cross-entropy (because we’re classifying digits).
  4. Backpropagation: Calculus + chain rule (don’t panic—numpy does the heavy lifting).
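
Concretely, for MNIST the stack above looks like this (one image per column, batch of m images; the 128 hidden units are the size we'll use in the training loop):

# X  : (784, m)  flattened 28x28 pixels
# Z1 : (128, m)  W1 @ X + b1,   with W1: (128, 784), b1: (128, 1)
# A1 : (128, m)  ReLU(Z1)
# Z2 : (10, m)   W2 @ A1 + b2,  with W2: (10, 128),  b2: (10, 1)
# A2 : (10, m)   softmax(Z2), one probability per digit class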

💻 Step 1: Coding the Neural Network

1. Initialize Parameters

import numpy as np

def initialize_parameters(input_size, hidden_size, output_size):
    W1 = np.random.randn(hidden_size, input_size) * 0.01
    b1 = np.zeros((hidden_size, 1))
    W2 = np.random.randn(output_size, hidden_size) * 0.01
    b2 = np.zeros((output_size, 1))
    return {"W1": W1, "b1": b1, "W2": W2, "b2": b2}

Why? Random initialization breaks the symmetry between hidden units (otherwise they'd all learn the same thing), and the small 0.01 scale keeps the starting activations well-behaved. Biases can safely start at zero.
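
A quick, purely illustrative sanity check: for the MNIST-sized network used later, the shapes should come out like this.

params = initialize_parameters(784, 128, 10)
for name, value in params.items():
    print(name, value.shape)
# W1 (128, 784)  b1 (128, 1)  W2 (10, 128)  b2 (10, 1)
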
2. Forward Propagation

def relu(Z):
    return np.maximum(0, Z)

def softmax(Z):
    exp = np.exp(Z - Z.max(axis=0, keepdims=True))  # subtract per-column max for numerical stability
    return exp / exp.sum(axis=0)

def forward(X, params):
    Z1 = params["W1"] @ X + params["b1"]
    A1 = relu(Z1)
    Z2 = params["W2"] @ A1 + params["b2"]
    A2 = softmax(Z2)
    return A2, (Z1, A1, Z2)
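One convention to keep in mind: X stores one example per column, so a batch of m images has shape (784, m). A quick smoke test with fake data (just to illustrate, not part of the training run):

params = initialize_parameters(784, 128, 10)
X_dummy = np.random.rand(784, 5)   # 5 fake "images", one per column
A2, cache = forward(X_dummy, params)
print(A2.shape)                    # (10, 5): one probability column per example
print(A2.sum(axis=0))              # each column sums to ~1.0 thanks to softmax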

3. Compute Loss

def cross_entropy_loss(A2, Y):
    m = Y.shape[1]
    log_probs = np.log(A2 + 1e-9) * Y   # small epsilon avoids log(0)
    return -np.sum(log_probs) / m
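One assumption hiding in this loss (and in the dZ2 = A2 - Y step coming next): Y must be one-hot encoded with shape (10, m). If your MNIST labels are plain integers, a small helper like this hypothetical one_hot does the conversion:

def one_hot(labels, num_classes=10):
    # labels: 1-D array of integer class ids, e.g. np.array([5, 0, 4, ...])
    Y = np.zeros((num_classes, labels.size))
    Y[labels, np.arange(labels.size)] = 1
    return Y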

4. Backpropagation (The “Aha!” Moment)

def backward(X, Y, params, cache):
    m = Y.shape[1]
    Z1, A1, Z2 = cache
    A2 = softmax(Z2)  # recover predictions from the cached pre-activation

    # Output layer gradient (softmax + cross-entropy simplifies to A2 - Y)
    dZ2 = A2 - Y
    dW2 = (dZ2 @ A1.T) / m
    db2 = np.sum(dZ2, axis=1, keepdims=True) / m

    # Hidden layer gradient
    dZ1 = (params["W2"].T @ dZ2) * (Z1 > 0)  # ReLU derivative
    dW1 = (dZ1 @ X.T) / m
    db1 = np.sum(dZ1, axis=1, keepdims=True) / m

    return {"dW1": dW1, "db1": db1, "dW2": dW2, "db2": db2}
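Not sure the math is right? A numerical gradient check is the classic way to verify backprop. Here's a rough sketch (a debugging aid, not part of the training loop) that perturbs one entry of W1 and compares the finite-difference slope with the analytic gradient:

def grad_check_single(X, Y, params, i=0, j=0, eps=1e-5):
    # Analytic gradient from backprop
    analytic = backward(X, Y, params, forward(X, params)[1])["dW1"][i, j]

    # Numerical gradient: central difference on W1[i, j]
    original = params["W1"][i, j]
    params["W1"][i, j] = original + eps
    loss_plus = cross_entropy_loss(forward(X, params)[0], Y)
    params["W1"][i, j] = original - eps
    loss_minus = cross_entropy_loss(forward(X, params)[0], Y)
    params["W1"][i, j] = original  # restore

    numeric = (loss_plus - loss_minus) / (2 * eps)
    print(f"analytic: {analytic:.8f}  numeric: {numeric:.8f}")

The two numbers should agree to several decimal places; if they don't, a shape or transpose is off somewhere.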

5. Update Parameters (Gradient Descent)

def update_params(params, grads, learning_rate=0.1):
    params["W1"] -= learning_rate * grads["dW1"]
    params["b1"] -= learning_rate * grads["db1"]
    params["W2"] -= learning_rate * grads["dW2"]
    params["b2"] -= learning_rate * grads["db2"]
    return params

🚂 Training Loop (The Grind)

def train(X, Y, epochs=1000):
    params = initialize_parameters(784, 128, 10)  # MNIST: 28x28 = 784 pixels
    for i in range(epochs):
        A2, cache = forward(X, params)
        loss = cross_entropy_loss(A2, Y)
        grads = backward(X, Y, params, cache)
        params = update_params(params, grads)
        if i % 100 == 0:
            print(f"Epoch {i}: Loss = {loss:.4f}")
    return params
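To reproduce the test accuracy below you also need a prediction step. Assuming you've loaded MNIST yourself into X_test of shape (784, m) with pixels scaled to [0, 1] and a 1-D array of integer labels y_test (any loader works, e.g. sklearn's fetch_openml or a CSV dump), a minimal evaluation sketch looks like this:

def predict(X, params):
    A2, _ = forward(X, params)
    return np.argmax(A2, axis=0)   # most probable digit for each column

def accuracy(X, y, params):
    return np.mean(predict(X, params) == y)

# trained_params = train(X_train, Y_train)
# print(f"Test accuracy: {accuracy(X_test, y_test, trained_params):.1%}")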

🎯 Results: 92% Accuracy on MNIST!
After training on 60k MNIST images (and tuning hyperparameters):

Epoch 0: Loss = 2.3026
Epoch 100: Loss = 0.3541
Epoch 200: Loss = 0.2011
...
Final Test Accuracy: 92.3%

Not bad for 150 lines of numpy!


💡 Key Takeaways

  1. NNs are just math: matrix multiplications, derivatives, and the chain rule.
  2. Backpropagation = Loss gradients flowing backward (no magic!).
  3. You don’t need frameworks to understand the core (but use them for real projects 😉).

👨‍💻 Follow on GitHub
https://github.com/dassomnath99

📣 Share This Post
If you geeked out reading this, share it with a friend and tag #NumpyNN!

💬 Comments
“Wait, backprop is just the chain rule?!” → Drop your reactions below!
