(Neural Networks From Scratch · Article 3)
The missing piece: how does the network improve?
In Article 2, we introduced loss.
The network can now answer:
“How wrong am I?”
But it still cannot answer:
“What should I change?”
To improve predictions, the network must know:
- which weights caused the error
- how much each weight contributed
- in which direction to adjust them
This information is carried by gradients.
What is a gradient (plain English)
A gradient tells us:
“If I change this weight a little, how does the loss change?”
- Positive gradient → decrease the weight
- Negative gradient → increase the weight
- Large gradient → weight matters a lot
- Small gradient → weight matters less
Gradients are signals, not updates.
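Before wiring this into the library we are building, here is a standalone sketch of that idea: nudge a single weight by a tiny amount and watch how the loss responds. The toy loss function below is hypothetical, chosen only to illustrate the "nudge and measure" intuition.

# Nudge a weight, watch the loss: a finite-difference estimate of a gradient.
def toy_loss(w, x=2.0, y_true=4.0):
    y_pred = w * x                  # a one-weight "network"
    return (y_pred - y_true) ** 2   # squared error

w = 0.5
eps = 1e-6
gradient = (toy_loss(w + eps) - toy_loss(w - eps)) / (2 * eps)
print(gradient)  # ≈ -12.0: negative gradient, so increasing w would lower the loss

The backward pass computes exactly this kind of number for every weight, but analytically and in a single sweep instead of one nudge per parameter.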
The backward pass (big idea)
The backward pass is the process of:
- starting from the loss
- moving backward through the network
- computing gradients for each parameter
Conceptually:
Loss
  ↓
Output Layer
  ↓
Hidden Layer
  ↓
Input
This is why the algorithm is called backpropagation.
We start small: one Dense layer
To avoid confusion, we will work with:
- one Dense layer
- one input
- one output
- Mean Squared Error (MSE)
No activations yet.
Step 1: Add gradient storage to Dense
Update nn/layers.py:
import random

class Dense(Layer):
    def __init__(self, input_size, output_size):
        self.weights = [
            [random.uniform(-0.1, 0.1) for _ in range(input_size)]
            for _ in range(output_size)
        ]
        self.bias = [0.0 for _ in range(output_size)]

        # Gradients (same shapes as the parameters, filled in by backward())
        self.grad_weights = [
            [0.0 for _ in range(input_size)]
            for _ in range(output_size)
        ]
        self.grad_bias = [0.0 for _ in range(output_size)]

    def forward(self, x):
        self.input = x  # cache the input for the backward pass
        output = []
        for neuron_weights, neuron_bias in zip(self.weights, self.bias):
            value = 0.0
            for w, xi in zip(neuron_weights, x):
                value += w * xi
            value += neuron_bias
            output.append(value)
        return output
The key addition:
- we store the input
- we prepare space for gradients
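If you want to see the new storage in action, here is a quick sanity check (a sketch, assuming the nn package layout from Article 1):

from nn.layers import Dense

layer = Dense(3, 2)
out = layer.forward([1.0, 2.0, 3.0])

print(len(out))            # 2: one value per output neuron
print(layer.input)         # [1.0, 2.0, 3.0], cached for the backward pass
print(layer.grad_weights)  # [[0.0, 0.0, 0.0], [0.0, 0.0, 0.0]] until backward() runs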
Step 2: Understanding the math (minimal, safe)
For a single output neuron:
ŷ = w₁x₁ + w₂x₂ + b
loss = (y − ŷ)²
The gradients become:
∂loss/∂wᵢ = 2(ŷ − y) * xᵢ   (each weight is paired with its own input)
∂loss/∂b = 2(ŷ − y)
That’s all we need.
We will use these two results directly, without walking through the derivation.
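To see the formulas with concrete numbers, take a single-input neuron (so the w₂x₂ term drops out) with w = 0.5, b = 0, x = 2 and y = 4. This is just a hand calculation, not library code:

# Plugging numbers into the gradient formulas above:
w, b = 0.5, 0.0
x, y = 2.0, 4.0

y_hat  = w * x + b             # 1.0
loss   = (y - y_hat) ** 2      # 9.0
grad_w = 2 * (y_hat - y) * x   # -12.0
grad_b = 2 * (y_hat - y)       # -6.0

The weight gradient matches the finite-difference estimate from earlier (≈ −12), which is a good sign the formula is right.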
Step 3: Implement backward() for Dense
Add this to the Dense class:
def backward(self, grad_output):
    """
    grad_output: gradient coming from the next layer (or the loss)
    """
    grad_input = [0.0 for _ in self.input]

    for i in range(len(self.weights)):
        # bias gradient
        self.grad_bias[i] = grad_output[i]

        for j in range(len(self.weights[i])):
            # weight gradient
            self.grad_weights[i][j] = grad_output[i] * self.input[j]

            # gradient to pass backward to the previous layer
            grad_input[j] += self.weights[i][j] * grad_output[i]

    return grad_input
What this does:
- computes gradients for weights and bias
- returns gradient for the previous layer
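Here is a tiny sketch of backward() in action, using arbitrarily chosen numbers so the results are easy to verify by hand:

from nn.layers import Dense

layer = Dense(2, 1)
layer.forward([1.0, 2.0])            # caches the input
grad_input = layer.backward([0.5])

print(layer.grad_weights)  # [[0.5 * 1.0, 0.5 * 2.0]] = [[0.5, 1.0]]
print(layer.grad_bias)     # [0.5]
print(len(grad_input))     # 2: one gradient per input, ready for a previous layer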
Step 4: Loss gradient (MSE derivative)
Update nn/losses.py:
class MSE:
    def forward(self, y_pred, y_true):
        self.y_pred = y_pred
        self.y_true = y_true

        losses = []
        for yp, yt in zip(y_pred, y_true):
            losses.append((yp - yt) ** 2)
        return sum(losses) / len(losses)

    def backward(self):
        # Derivative of (yp - yt)**2 with respect to yp.
        # We skip the 1/len factor here; with a single output it makes no difference.
        grads = []
        for yp, yt in zip(self.y_pred, self.y_true):
            grads.append(2 * (yp - yt))
        return grads
Now the loss can:
- compute error
- send gradients backward
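A quick check of the loss on its own, with made-up numbers:

from nn.losses import MSE

loss_fn = MSE()
print(loss_fn.forward([3.0], [4.0]))  # (3 - 4)**2 = 1.0
print(loss_fn.backward())             # [2 * (3 - 4)] = [-2.0]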
Step 5: Wiring backward pass in the network
Update nn/core.py:
class NeuralNet:
    def __init__(self, layers):
        self.layers = layers

    def forward(self, x):
        for layer in self.layers:
            x = layer.forward(x)
        return x

    def backward(self, grad):
        for layer in reversed(self.layers):
            grad = layer.backward(grad)
This is the entire backpropagation pipeline.
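Although this article only uses a single Dense layer, the loop above already handles deeper stacks: the gradient returned by each backward() simply feeds the layer before it. A minimal sketch with arbitrarily chosen shapes:

from nn.core import NeuralNet
from nn.layers import Dense
from nn.losses import MSE

net = NeuralNet([Dense(2, 3), Dense(3, 1)])
loss_fn = MSE()

y_pred = net.forward([1.0, -1.0])
loss_fn.forward(y_pred, [0.5])
net.backward(loss_fn.backward())

print(net.layers[0].grad_weights)  # 3 x 2 gradients: the error reached the first layer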
Step 6: A backward-pass demo
Create examples/03_backward_demo.py:
import random

from nn.layers import Dense
from nn.core import NeuralNet
from nn.losses import MSE

random.seed(42)

net = NeuralNet([
    Dense(1, 1)
])
loss_fn = MSE()

x = [2.0]
y_true = [4.0]

y_pred = net.forward(x)
loss = loss_fn.forward(y_pred, y_true)

grad_loss = loss_fn.backward()
net.backward(grad_loss)

layer = net.layers[0]
print("Prediction:", y_pred)
print("Loss:", loss)
print("Weight gradients:", layer.grad_weights)
print("Bias gradients:", layer.grad_bias)
Run:
python -m examples.03_backward_demo
You should now see non-zero gradients.
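With this seed, the printed values should look roughly like the following (trailing digits trimmed for readability):

Prediction: [0.0557707...]
Loss: 15.5569...
Weight gradients: [[-15.7769...]]
Bias gradients: [-7.8884...]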
That means:
- the network knows what went wrong
- it knows where
- but it still hasn’t changed anything
Important checkpoint
At this point:
- ❌ no learning yet
- ❌ no training loop
- ❌ no weight updates
But:
- ✅ error flows backward
- ✅ gradients are correct
- ✅ foundation is solid
This is the hardest conceptual part of neural networks.
You just passed it.
Common beginner confusion
“Why aren’t weights changing?”
Because we haven’t applied updates yet.
“Is this backpropagation?”
Yes — this is backpropagation in its simplest form.
“Why is backward separate from update?”
Because computing gradients and applying updates are different responsibilities.
What comes next (Article 4)
Now that we have gradients, we can finally answer:
“How do we update the weights?”
Next article:
- Stochastic Gradient Descent (SGD)
- training loop
- learning rate
- watching loss go down
This is where learning truly begins.
Series progress
- Article 1: Project setup & core abstractions ✅
- Article 2: Loss functions & error intuition ✅
- Article 3: Gradients & backward pass ✅
- Article 4: Training loop & SGD ⏭️