SolveWithPython

Building a Neural Network Template in Python — Gradients and the Backward Pass

(Neural Networks From Scratch · Article 3)

The missing piece: how does the network improve?

In Article 2, we introduced loss.

The network can now answer:

“How wrong am I?”

But it still cannot answer:

“What should I change?”

To improve predictions, the network must know:

  • which weights caused the error
  • how much each weight contributed
  • in which direction to adjust them

This information is carried by gradients.

What is a gradient (plain English)

A gradient tells us:

“If I change this weight a little, how does the loss change?”

  • Positive gradient → decrease the weight
  • Negative gradient → increase the weight
  • Large gradient → weight matters a lot
  • Small gradient → weight matters less

Gradients are signals, not updates.
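
To make the intuition concrete, here is a tiny standalone sketch (not part of the library we are building) that nudges one weight of a made-up one-input model and watches the loss respond:

Python
# Illustrative only: "if I change this weight a little, how does the loss change?"
def loss(w, x=2.0, y_true=4.0):
    y_pred = w * x
    return (y_true - y_pred) ** 2

w, eps = 1.0, 0.001
slope = (loss(w + eps) - loss(w)) / eps
print(slope)  # roughly -8: negative, so increasing w would reduce the loss

The sign tells us which direction to move; the size tells us how much this particular weight matters.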

The backward pass (big idea)

The backward pass is the process of:

  • starting from the loss
  • moving backward through the network
  • computing gradients for each parameter

Conceptually:

Loss
  ↓
Output Layer
  ↓
Hidden Layer
  ↓
Input

This is why the algorithm is called backpropagation.

We start small: one Dense layer

To avoid confusion, we will work with:

  • one Dense layer
  • one input
  • one output
  • Mean Squared Error (MSE)

No activations yet.

Step 1: Add gradient storage to Dense

Update nn/layers.py:

Python
class Dense(Layer):
    def __init__(self, input_size, output_size):
        import random
        self.weights = [
            [random.uniform(-0.1, 0.1) for _ in range(input_size)]
            for _ in range(output_size)
        ]
        self.bias = [0.0 for _ in range(output_size)]

        # Gradients
        self.grad_weights = [
            [0.0 for _ in range(input_size)]
            for _ in range(output_size)
        ]
        self.grad_bias = [0.0 for _ in range(output_size)]

    def forward(self, x):
        self.input = x  # cache input for backward pass
        output = []
        for neuron_weights, neuron_bias in zip(self.weights, self.bias):
            value = 0.0
            for w, xi in zip(neuron_weights, x):
                value += w * xi
            value += neuron_bias
            output.append(value)
        return output

The key additions:

  • we store the input
  • we prepare space for gradients
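
A quick throwaway check of both additions (assuming the package layout from the earlier articles):

Python
from nn.layers import Dense

layer = Dense(2, 1)
layer.forward([1.0, 2.0])

print(layer.input)         # [1.0, 2.0], cached for the backward pass
print(layer.grad_weights)  # [[0.0, 0.0]] until backward() fills them in
print(layer.grad_bias)     # [0.0]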

Step 2: Understanding the math (minimal, safe)

For a single output neuron:

ŷ = w₁x₁ + w₂x₂ + b
loss = (y − ŷ)²

The gradients become:

∂loss/∂wᵢ = 2(ŷ − y) * xᵢ
∂loss/∂b = 2(ŷ − y)

That’s all we need.

No calculus steps required.
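
Plugging in small made-up numbers (x₁ = 1, x₂ = 2, w₁ = 0.5, w₂ = −1, b = 0, y = 3) to see the formulas at work:

Python
x1, x2 = 1.0, 2.0
w1, w2, b = 0.5, -1.0, 0.0
y = 3.0

y_hat = w1 * x1 + w2 * x2 + b   # -1.5
err = 2 * (y_hat - y)           # 2(ŷ − y) = -9.0

grad_w1 = err * x1              # -9.0
grad_w2 = err * x2              # -18.0
grad_b = err                    # -9.0

Both weight gradients are negative, so both weights should be increased; w₂ matters more here simply because its input x₂ is larger.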

Step 3: Implement backward() for Dense

Add this to the Dense class:

Python
def backward(self, grad_output):
    """
    grad_output: gradient coming from the next layer (or loss)
    """
    grad_input = [0.0 for _ in self.input]
    for i in range(len(self.weights)):
        # bias gradient
        self.grad_bias[i] = grad_output[i]
        for j in range(len(self.weights[i])):
            # weight gradient
            self.grad_weights[i][j] = grad_output[i] * self.input[j]
            # gradient to pass backward
            grad_input[j] += self.weights[i][j] * grad_output[i]
    return grad_input

What this does:

  • computes gradients for weights and bias
  • returns gradient for the previous layer
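
If you want to convince yourself the arithmetic is right, a common sanity test is to compare backward()'s result against a finite-difference estimate. This sketch assumes the Dense class from Step 1 with the backward() method above and a single target value; nothing here goes into the library files.

Python
from nn.layers import Dense

layer = Dense(2, 1)
x, y_true = [1.0, 2.0], 3.0

def loss_value():
    y_pred = layer.forward(x)[0]
    return (y_pred - y_true) ** 2

# Analytic gradient via backward()
y_pred = layer.forward(x)[0]
layer.backward([2 * (y_pred - y_true)])
analytic = layer.grad_weights[0][0]

# Finite-difference estimate for the same weight
eps = 1e-6
before = loss_value()
layer.weights[0][0] += eps
after = loss_value()
layer.weights[0][0] -= eps  # restore the weight

print(analytic, (after - before) / eps)  # the two numbers should be very close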

Step 4: Loss gradient (MSE derivative)

Update nn/losses.py:

Python
class MSE:
    def forward(self, y_pred, y_true):
        self.y_pred = y_pred
        self.y_true = y_true
        losses = []
        for yp, yt in zip(y_pred, y_true):
            losses.append((yp - yt) ** 2)
        return sum(losses) / len(losses)

    def backward(self):
        # forward() averages over the outputs,
        # so each output's gradient is 2(ŷ − y) / n
        n = len(self.y_pred)
        grads = []
        for yp, yt in zip(self.y_pred, self.y_true):
            grads.append(2 * (yp - yt) / n)
        return grads

Now the loss can:

  • compute error
  • send gradients backward
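
A quick check on made-up numbers:

Python
from nn.losses import MSE

loss_fn = MSE()
print(loss_fn.forward([1.0, 3.0], [2.0, 2.0]))  # ((1 - 2)**2 + (3 - 2)**2) / 2 = 1.0
print(loss_fn.backward())                       # [-1.0, 1.0], includes the 1/n from the mean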

Step 5: Wiring backward pass in the network

Update nn/core.py:

Python
class NeuralNet:
    def __init__(self, layers):
        self.layers = layers

    def forward(self, x):
        for layer in self.layers:
            x = layer.forward(x)
        return x

    def backward(self, grad):
        for layer in reversed(self.layers):
            grad = layer.backward(grad)

This is the entire backpropagation pipeline.
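
For completeness, the same three calls chain through any number of layers; here is a minimal sketch with two Dense layers (the demo below stays with a single layer):

Python
from nn.layers import Dense
from nn.core import NeuralNet
from nn.losses import MSE

net = NeuralNet([Dense(2, 3), Dense(3, 1)])
loss_fn = MSE()

y_pred = net.forward([1.0, 2.0])
loss_fn.forward(y_pred, [4.0])
net.backward(loss_fn.backward())   # every layer now holds its gradients

print(net.layers[0].grad_weights)  # 3 x 2 matrix of gradients in the first layer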

Step 6: A backward-pass demo

Create examples/03_backward_demo.py:

Python
import random

from nn.layers import Dense
from nn.core import NeuralNet
from nn.losses import MSE

random.seed(42)

net = NeuralNet([
    Dense(1, 1)
])
loss_fn = MSE()

x = [2.0]
y_true = [4.0]

y_pred = net.forward(x)
loss = loss_fn.forward(y_pred, y_true)

grad_loss = loss_fn.backward()
net.backward(grad_loss)

layer = net.layers[0]
print("Prediction:", y_pred)
print("Loss:", loss)
print("Weight gradients:", layer.grad_weights)
print("Bias gradients:", layer.grad_bias)

Run:

python -m examples.03_backward_demo

You should now see non-zero gradients.

That means:

  • the network knows what went wrong
  • it knows where
  • but it still hasn’t changed anything

Important checkpoint

At this point:

  • ❌ no learning yet
  • ❌ no training loop
  • ❌ no weight updates

But:

  • ✅ error flows backward
  • ✅ gradients are correct
  • ✅ foundation is solid

This is the hardest conceptual part of neural networks.

You just passed it.

Common beginner confusion

“Why aren’t weights changing?”
Because we haven’t applied updates yet.

“Is this backpropagation?”
Yes — this is backpropagation in its simplest form.

“Why is backward separate from update?”
Because computing gradients and applying updates are different responsibilities.

What comes next (Article 4)

Now that we have gradients, we can finally answer:

“How do we update the weights?”

Next article:

  • Stochastic Gradient Descent (SGD)
  • training loop
  • learning rate
  • watching loss go down

This is where learning truly begins.
