(Neural Networks From Scratch · Article 3)
The missing piece: how does the network improve?
In Article 2, we introduced loss.
The network can now answer:
“How wrong am I?”
But it still cannot answer:
“What should I change?”
To improve predictions, the network must know:
- which weights caused the error
- how much each weight contributed
- in which direction to adjust them
This information is carried by gradients.
What is a gradient (plain English)
A gradient tells us:
“If I change this weight a little, how does the loss change?”
- Positive gradient → decrease the weight
- Negative gradient → increase the weight
- Large gradient → weight matters a lot
- Small gradient → weight matters less
Gradients are signals, not updates.
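Before wiring this into the library we are building, here is a standalone sketch of that idea: nudge a single weight by a tiny amount and watch how the loss responds. The toy loss function below is hypothetical, chosen only to illustrate the "nudge and measure" intuition.

# Nudge a weight, watch the loss: a finite-difference estimate of a gradient.
def toy_loss(w, x=2.0, y_true=4.0):
    y_pred = w * x                  # a one-weight "network"
    return (y_pred - y_true) ** 2   # squared error

w = 0.5
eps = 1e-6
gradient = (toy_loss(w + eps) - toy_loss(w - eps)) / (2 * eps)
print(gradient)  # ≈ -12.0: negative gradient, so increasing w would lower the loss

The backward pass computes exactly this kind of number for every weight, but analytically and in a single sweep instead of one nudge per parameter.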
The backward pass (big idea)
The backward pass is the process of:
- starting from the loss
- moving backward through the network
- computing gradients for each parameter
Conceptually:
Loss
  ↓
Output Layer
  ↓
Hidden Layer
  ↓
Input
This is why the algorithm is called backpropagation.
We start small: one Dense layer
To avoid confusion, we will work with:
- one Dense layer
- one input
- one output
- Mean Squared Error (MSE)
No activations yet.
Step 1: Add gradient storage to Dense
Update nn/layers.py:
import random

class Dense(Layer):
    def __init__(self, input_size, output_size):
        self.weights = [
            [random.uniform(-0.1, 0.1) for _ in range(input_size)]
            for _ in range(output_size)
        ]
        self.bias = [0.0 for _ in range(output_size)]

        # Gradients (same shapes as the parameters, filled in by backward())
        self.grad_weights = [
            [0.0 for _ in range(input_size)]
            for _ in range(output_size)
        ]
        self.grad_bias = [0.0 for _ in range(output_size)]

    def forward(self, x):
        self.input = x  # cache the input for the backward pass
        output = []
        for neuron_weights, neuron_bias in zip(self.weights, self.bias):
            value = 0.0
            for w, xi in zip(neuron_weights, x):
                value += w * xi
            value += neuron_bias
            output.append(value)
        return output
The key addition:
- we store the input
- we prepare space for gradients
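If you want to see the new storage in action, here is a quick sanity check (a sketch, assuming the nn package layout from Article 1):

from nn.layers import Dense

layer = Dense(3, 2)
out = layer.forward([1.0, 2.0, 3.0])

print(len(out))            # 2: one value per output neuron
print(layer.input)         # [1.0, 2.0, 3.0], cached for the backward pass
print(layer.grad_weights)  # [[0.0, 0.0, 0.0], [0.0, 0.0, 0.0]] until backward() runs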
Step 2: Understanding the math (minimal, safe)
For a single output neuron:
ŷ = w₁x₁ + w₂x₂ + b
loss = (y − ŷ)²
The gradients become:
∂loss/∂wᵢ = 2(ŷ − y) * xᵢ   (each weight is paired with its own input)
∂loss/∂b = 2(ŷ − y)
That’s all we need.
We will use these two results directly, without walking through the derivation.
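To see the formulas with concrete numbers, take a single-input neuron (so the w₂x₂ term drops out) with w = 0.5, b = 0, x = 2 and y = 4. This is just a hand calculation, not library code:

# Plugging numbers into the gradient formulas above:
w, b = 0.5, 0.0
x, y = 2.0, 4.0

y_hat  = w * x + b             # 1.0
loss   = (y - y_hat) ** 2      # 9.0
grad_w = 2 * (y_hat - y) * x   # -12.0
grad_b = 2 * (y_hat - y)       # -6.0

The weight gradient matches the finite-difference estimate from earlier (≈ −12), which is a good sign the formula is right.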
Step 3: Implement backward() for Dense
Add this to the Dense class:
def backward(self, grad_output):
    """
    grad_output: gradient coming from the next layer (or the loss)
    """
    grad_input = [0.0 for _ in self.input]

    for i in range(len(self.weights)):
        # bias gradient
        self.grad_bias[i] = grad_output[i]

        for j in range(len(self.weights[i])):
            # weight gradient
            self.grad_weights[i][j] = grad_output[i] * self.input[j]

            # gradient to pass backward to the previous layer
            grad_input[j] += self.weights[i][j] * grad_output[i]

    return grad_input
What this does:
- computes gradients for weights and bias
- returns gradient for the previous layer
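Here is a tiny sketch of backward() in action, using arbitrarily chosen numbers so the results are easy to verify by hand:

from nn.layers import Dense

layer = Dense(2, 1)
layer.forward([1.0, 2.0])            # caches the input
grad_input = layer.backward([0.5])

print(layer.grad_weights)  # [[0.5 * 1.0, 0.5 * 2.0]] = [[0.5, 1.0]]
print(layer.grad_bias)     # [0.5]
print(len(grad_input))     # 2: one gradient per input, ready for a previous layer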
Step 4: Loss gradient (MSE derivative)
Update nn/losses.py:
class MSE:
    def forward(self, y_pred, y_true):
        self.y_pred = y_pred
        self.y_true = y_true

        losses = []
        for yp, yt in zip(y_pred, y_true):
            losses.append((yp - yt) ** 2)
        return sum(losses) / len(losses)

    def backward(self):
        # Derivative of (yp - yt)**2 with respect to yp.
        # We skip the 1/len factor here; with a single output it makes no difference.
        grads = []
        for yp, yt in zip(self.y_pred, self.y_true):
            grads.append(2 * (yp - yt))
        return grads
Now the loss can:
- compute error
- send gradients backward
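A quick check of the loss on its own, with made-up numbers:

from nn.losses import MSE

loss_fn = MSE()
print(loss_fn.forward([3.0], [4.0]))  # (3 - 4)**2 = 1.0
print(loss_fn.backward())             # [2 * (3 - 4)] = [-2.0]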
Step 5: Wiring backward pass in the network
Update nn/core.py:
class NeuralNet:
    def __init__(self, layers):
        self.layers = layers

    def forward(self, x):
        for layer in self.layers:
            x = layer.forward(x)
        return x

    def backward(self, grad):
        for layer in reversed(self.layers):
            grad = layer.backward(grad)
This is the entire backpropagation pipeline.
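Although this article only uses a single Dense layer, the loop above already handles deeper stacks: the gradient returned by each backward() simply feeds the layer before it. A minimal sketch with arbitrarily chosen shapes:

from nn.core import NeuralNet
from nn.layers import Dense
from nn.losses import MSE

net = NeuralNet([Dense(2, 3), Dense(3, 1)])
loss_fn = MSE()

y_pred = net.forward([1.0, -1.0])
loss_fn.forward(y_pred, [0.5])
net.backward(loss_fn.backward())

print(net.layers[0].grad_weights)  # 3 x 2 gradients: the error reached the first layer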
Step 6: A backward-pass demo
Create examples/03_backward_demo.py:
import random

from nn.layers import Dense
from nn.core import NeuralNet
from nn.losses import MSE

random.seed(42)

net = NeuralNet([
    Dense(1, 1)
])
loss_fn = MSE()

x = [2.0]
y_true = [4.0]

y_pred = net.forward(x)
loss = loss_fn.forward(y_pred, y_true)

grad_loss = loss_fn.backward()
net.backward(grad_loss)

layer = net.layers[0]
print("Prediction:", y_pred)
print("Loss:", loss)
print("Weight gradients:", layer.grad_weights)
print("Bias gradients:", layer.grad_bias)
Run:
python -m examples.03_backward_demo
You should now see non-zero gradients.
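With this seed, the printed values should look roughly like the following (trailing digits trimmed for readability):

Prediction: [0.0557707...]
Loss: 15.5569...
Weight gradients: [[-15.7769...]]
Bias gradients: [-7.8884...]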
That means:
- the network knows what went wrong
- it knows where
- but it still hasn’t changed anything
Important checkpoint
At this point:
- ❌ no learning yet
- ❌ no training loop
- ❌ no weight updates
But:
- ✅ error flows backward
- ✅ gradients are correct
- ✅ foundation is solid
This is the hardest conceptual part of neural networks.
You just passed it.
Common beginner confusion
“Why aren’t weights changing?”
Because we haven’t applied updates yet.
“Is this backpropagation?”
Yes — this is backpropagation in its simplest form.
“Why is backward separate from update?”
Because computing gradients and applying updates are different responsibilities.
What comes next (Article 4)
Now that we have gradients, we can finally answer:
“How do we update the weights?”
Next article:
- Stochastic Gradient Descent (SGD)
- training loop
- learning rate
- watching loss go down
This is where learning truly begins.
Series progress
- Article 1: Project setup & core abstractions ✅
- Article 2: Loss functions & error intuition ✅
- Article 3: Gradients & backward pass ✅
- Article 4: Training loop & SGD ⏭️