🤯 Why You Should Care
Neural networks (NNs) power everything from ChatGPT to self-driving cars. But let’s be honest: Using TensorFlow/PyTorch feels like magic—until you realize you don’t know how the wand works.
This post is for you if:
🧠 You want to demystify neural networks (no more black boxes!).
💻 You love coding fundamentals (goodbye `model.fit()`, hello raw matrices!).
⚡ You crave the satisfaction of "I built this myself!"
Spoiler: By the end, you’ll code a NN that classifies handwritten digits (MNIST) with 90%+ accuracy—using only numpy. Let’s go!
🔥 The Blueprint: How Neural Nets Actually Work
Here’s what we’ll implement:
- Layers: Input → Hidden → Output (with weights and biases).
- Activation Functions: ReLU (hidden layer) and Softmax (output).
- Loss: Cross-entropy (because we’re classifying digits).
- Backpropagation: Calculus + chain rule (don’t panic—numpy does the heavy lifting).
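Before the code, here is the same blueprint as math, using the exact variable names you'll see below ($X$ is a $784 \times m$ batch of flattened images, $Y$ the $10 \times m$ one-hot label matrix):

$$
\begin{aligned}
Z_1 &= W_1 X + b_1, & A_1 &= \mathrm{ReLU}(Z_1),\\
Z_2 &= W_2 A_1 + b_2, & A_2 &= \mathrm{softmax}(Z_2),\\
L &= -\frac{1}{m}\sum_{i=1}^{m}\sum_{k=0}^{9} Y_{k,i}\,\log\,(A_2)_{k,i}.
\end{aligned}
$$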
💻 Step 1: Coding the Neural Network
1. Initialize Parameters
```python
import numpy as np

def initialize_parameters(input_size, hidden_size, output_size):
    W1 = np.random.randn(hidden_size, input_size) * 0.01
    b1 = np.zeros((hidden_size, 1))
    W2 = np.random.randn(output_size, hidden_size) * 0.01
    b2 = np.zeros((output_size, 1))
    return {"W1": W1, "b1": b1, "W2": W2, "b2": b2}
```
Why? Random values break the symmetry between hidden neurons (if they all started identical, they'd get identical gradients and learn the same thing), and the 0.01 scale keeps the initial outputs small. Biases can safely start at zero.
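A quick shape check never hurts. This snippet isn't part of the original code, it just confirms the layout (128 hidden units matches the training setup used later):

```python
# Weights map layer-to-layer; biases are column vectors
params = initialize_parameters(input_size=784, hidden_size=128, output_size=10)
print(params["W1"].shape, params["b1"].shape)  # (128, 784) (128, 1)
print(params["W2"].shape, params["b2"].shape)  # (10, 128) (10, 1)
```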
2. Forward Propagation
```python
def relu(Z):
    return np.maximum(0, Z)

def softmax(Z):
    exp = np.exp(Z - np.max(Z, axis=0, keepdims=True))  # Stability hack: per-column max keeps exp() from overflowing
    return exp / exp.sum(axis=0)

def forward(X, params):
    Z1 = params["W1"] @ X + params["b1"]
    A1 = relu(Z1)
    Z2 = params["W2"] @ A1 + params["b2"]
    A2 = softmax(Z2)
    return A2, (Z1, A1, Z2)
```
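This isn't in the original post, but a tiny smoke test on fake data is a nice way to convince yourself the shapes line up and that softmax really produces probabilities (the batch of 5 fake "images" is made up purely for illustration):

```python
params = initialize_parameters(784, 128, 10)
X_fake = np.random.randn(784, 5)   # 5 fake flattened 28x28 images
A2, cache = forward(X_fake, params)
print(A2.shape)        # (10, 5): one column of class probabilities per example
print(A2.sum(axis=0))  # each column sums to ~1.0
```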
3. Compute Loss
```python
def cross_entropy_loss(A2, Y):
    m = Y.shape[1]
    log_probs = np.log(A2 + 1e-8) * Y  # small epsilon avoids log(0) blowing up
    return -np.sum(log_probs) / m
```
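To make the formula concrete, here's a tiny made-up example with two classes and two examples (columns are examples, `Y` is one-hot; the numbers are arbitrary):

```python
A2_toy = np.array([[0.9, 0.2],
                   [0.1, 0.8]])
Y_toy  = np.array([[1, 0],
                   [0, 1]])
# Loss = -(log 0.9 + log 0.8) / 2 ≈ 0.164
print(cross_entropy_loss(A2_toy, Y_toy))
```

Only the predicted probability of the correct class contributes, because the one-hot `Y` zeroes out everything else.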
4. Backpropagation (The “Aha!” Moment)
```python
def backward(X, Y, params, cache):
    m = Y.shape[1]
    Z1, A1, Z2 = cache
    A2, _ = forward(X, params)  # Recompute A2 (the training loop already has it; recomputed here to keep the function self-contained)

    # Output layer gradient
    dZ2 = A2 - Y
    dW2 = (dZ2 @ A1.T) / m
    db2 = np.sum(dZ2, axis=1, keepdims=True) / m

    # Hidden layer gradient
    dZ1 = (params["W2"].T @ dZ2) * (Z1 > 0)  # ReLU derivative
    dW1 = (dZ1 @ X.T) / m
    db1 = np.sum(dZ1, axis=1, keepdims=True) / m

    return {"dW1": dW1, "db1": db1, "dW2": dW2, "db2": db2}
```
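Two notes before trusting this: the clean `dZ2 = A2 - Y` line is what falls out when you differentiate softmax and cross-entropy together, and if you ever doubt the calculus, a numerical gradient check is the classic sanity test. The sketch below is not part of the training code, just an optional verification you can run once:

```python
def grad_check_one_weight(X, Y, params, eps=1e-5):
    # Analytic gradient for a single weight, W2[0, 0]
    A2, cache = forward(X, params)
    analytic = backward(X, Y, params, cache)["dW2"][0, 0]

    # Numerical gradient: nudge W2[0, 0] up and down by eps
    params["W2"][0, 0] += eps
    loss_plus = cross_entropy_loss(forward(X, params)[0], Y)
    params["W2"][0, 0] -= 2 * eps
    loss_minus = cross_entropy_loss(forward(X, params)[0], Y)
    params["W2"][0, 0] += eps  # restore the original weight

    numerical = (loss_plus - loss_minus) / (2 * eps)
    print(f"analytic: {analytic:.6f}, numerical: {numerical:.6f}")

# Usage on a tiny fake batch:
X_check = np.random.randn(784, 3)
Y_check = np.eye(10)[:, [3, 1, 4]]  # one-hot labels for digits 3, 1, 4
grad_check_one_weight(X_check, Y_check, initialize_parameters(784, 128, 10))
```

The two printed numbers should agree to several decimal places; if they don't, the backward pass has a bug.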
5. Update Parameters (Gradient Descent)
```python
def update_params(params, grads, learning_rate=0.1):
    params["W1"] -= learning_rate * grads["dW1"]
    params["b1"] -= learning_rate * grads["db1"]
    params["W2"] -= learning_rate * grads["dW2"]
    params["b2"] -= learning_rate * grads["db2"]
    return params
```
🚂 Training Loop (The Grind)
```python
def train(X, Y, epochs=1000):
    params = initialize_parameters(784, 128, 10)  # MNIST: 28x28 = 784 pixels
    for i in range(epochs):
        A2, cache = forward(X, params)
        loss = cross_entropy_loss(A2, Y)
        grads = backward(X, Y, params, cache)
        params = update_params(params, grads)
        if i % 100 == 0:
            print(f"Epoch {i}: Loss = {loss:.4f}")
    return params
```
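One gap worth flagging: `train` expects `X` with shape `(784, m)` and `Y` as a `(10, m)` one-hot matrix, but the label encoding and the evaluation step aren't shown above. Here's one possible sketch of both; the `one_hot` and `accuracy` helpers (and the `X_train`/`y_train` names) are my own, not from the original code:

```python
def one_hot(labels, num_classes=10):
    # labels: integer digits 0-9, shape (m,) -> one-hot matrix of shape (10, m)
    Y = np.zeros((num_classes, labels.size))
    Y[labels, np.arange(labels.size)] = 1
    return Y

def accuracy(X, labels, params):
    # Predicted digit = row index of the largest softmax output in each column
    A2, _ = forward(X, params)
    return np.mean(np.argmax(A2, axis=0) == labels)

# Hypothetical usage, assuming X_train/X_test are (784, m) arrays scaled to [0, 1]
# and y_train/y_test are integer label vectors:
# params = train(X_train, one_hot(y_train))
# print(f"Test accuracy: {accuracy(X_test, y_test, params):.1%}")
```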
🎯 Results: 92% Accuracy on MNIST!
After training on 60k MNIST images (and tuning hyperparameters):
```
Epoch 0: Loss = 2.3026
Epoch 100: Loss = 0.3541
Epoch 200: Loss = 0.2011
...
Final Test Accuracy: 92.3%
```
Not bad for 150 lines of numpy!
💡 Key Takeaways
- NNs are just math: Matrix multiplications, derivatives, and chain rules.
- Backpropagation = Loss gradients flowing backward (no magic!).
- You don’t need frameworks to understand the core (but use them for real projects 😉).
👨‍💻 Follow on GitHub
https://github.com/dassomnath99
📣 Share This Post
If you geeked out reading this, share it with a friend and tag #NumpyNN!
💬 Comments
“Wait, backprop is just the chain rule?!” → Drop your reactions below!