So far, we have done something crucial:
- We trained a single neuron
- We computed gradients explicitly
- We included activation functions in backpropagation
This already covers the entire mathematical foundation of neural networks.
Now we take the next step:
How does backpropagation work when a layer has many neurons?
The answer is reassuringly simple.
A layer does not introduce new math.
It just repeats the same math many times.
What Changes When We Move From a Neuron to a Layer?
Recall what a dense layer is:
- Multiple neurons
- Same input vector
- Different weights and biases
- Independent activations
Each neuron:
- Produces its own output
- Contributes to the next layer
- Has its own gradients
Backpropagation through a layer means:
Compute gradients per neuron, then aggregate them.
The Forward Computation (Layer Recap)
For a dense layer with k neurons, each neuron i computes:

z_i = w_i1·x_1 + w_i2·x_2 + ... + w_in·x_n + b_i
a_i = f(z_i)

The layer output is the vector:

[a_1, a_2, ..., a_k]

Nothing new here.
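As a quick check with purely illustrative numbers: for inputs [1.0, 2.0], a neuron with weights [0.5, -0.25] and bias 0.1 computes z = 0.5·1.0 + (-0.25)·2.0 + 0.1 = 0.1, and with ReLU its output is a = 0.1. A second neuron with weights [0.2, 0.3] and bias -0.4 computes z = 0.4, so the layer output is [0.1, 0.4]. We will reuse these numbers below.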
What Gradients Do We Need for a Layer?
For each neuron i, we need:
- ∂L/∂w_ij for every weight
- ∂L/∂b_i for its bias
- ∂L/∂x_j for every input, to pass backward to the previous layer
That last one is important.
Why Inputs Need Gradients Too
In a multi-layer network, the “input” to one layer is the “output” of the previous layer.
So during backpropagation:
- Each layer must return gradients with respect to its inputs
- Those gradients become the upstream signal for the layer before it
This is how gradients flow through the network.
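To make this concrete, here is a sketch of two stacked layers handing gradients to each other. It relies on the dense_forward and dense_backward functions defined later in this article, and names like x, hidden_weights, output_weights, and dL_dy are placeholders for whatever your network and loss provide:

# Sketch: two stacked dense layers passing gradients backward.
# dense_forward / dense_backward are defined later in this article.

# Forward pass: the hidden layer's output becomes the output layer's input
h_list, z_hidden = dense_forward(x, hidden_weights, hidden_biases, relu)
y_list, z_output = dense_forward(h_list, output_weights, output_biases, relu)

# Backward pass: the output layer returns dL_dh, its gradients
# with respect to its inputs...
dW_out, db_out, dL_dh = dense_backward(h_list, z_output, dL_dy,
                                       output_weights, relu_derivative)

# ...and dL_dh is exactly the incoming dL_da_list for the hidden layer
dW_hid, db_hid, dL_dx = dense_backward(x, z_hidden, dL_dh,
                                       hidden_weights, relu_derivative)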
Step-by-Step: Backpropagation for One Neuron in a Layer
For neuron i:

∂L/∂z_i = ∂L/∂a_i · f'(z_i)
∂L/∂w_ij = ∂L/∂z_i · x_j
∂L/∂b_i = ∂L/∂z_i
contribution to ∂L/∂x_j = ∂L/∂z_i · w_ij

Where:
- ∂L/∂a_i is the incoming gradient from the next layer
- f'(z_i) is the derivative of the activation at the cached pre-activation z_i
- x_j is the j-th input to the layer

This is exactly what we already did, just indexed.
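Continuing the illustrative numbers from above: suppose the incoming gradient for the first neuron is ∂L/∂a_1 = 0.5. Since z_1 = 0.1 > 0, ReLU'(z_1) = 1, so ∂L/∂z_1 = 0.5. Then ∂L/∂w_11 = 0.5 · 1.0 = 0.5, ∂L/∂w_12 = 0.5 · 2.0 = 1.0, ∂L/∂b_1 = 0.5, and this neuron contributes 0.5 · 0.5 = 0.25 to ∂L/∂x_1 and 0.5 · (-0.25) = -0.125 to ∂L/∂x_2.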
Implementing Layer Backpropagation in Python
Let’s assume:
- One dense layer
- ReLU activation
- Incoming gradient from the next layer: dL_da_list
Forward Cache (Needed for Backprop)
def dense_forward(inputs, weights_list, bias_list, activation):
    z_list = []
    a_list = []
    for weights, bias in zip(weights_list, bias_list):
        z = sum(x * w for x, w in zip(inputs, weights)) + bias
        a = activation(z)
        z_list.append(z)
        a_list.append(a)
    return a_list, z_list
We store z_list because activation derivatives need it.
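If you do not have the ReLU helpers from the earlier activation article at hand, here is a minimal sketch of them, plus a sample call with purely illustrative weights and inputs (the same numbers as before):

def relu(z):
    # ReLU activation: max(0, z)
    return max(0.0, z)

def relu_derivative(z):
    # Derivative of ReLU: 1 for z > 0, 0 otherwise
    return 1.0 if z > 0 else 0.0

inputs = [1.0, 2.0]
weights_list = [[0.5, -0.25], [0.2, 0.3]]  # two neurons, two inputs each
bias_list = [0.1, -0.4]

a_list, z_list = dense_forward(inputs, weights_list, bias_list, relu)
print(a_list)  # approximately [0.1, 0.4]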
Backward Pass for the Layer
def dense_backward(inputs, z_list, dL_da_list, weights_list, activation_derivative):
    dL_dw = []
    dL_db = []
    dL_dx = [0.0 for _ in inputs]

    for i in range(len(weights_list)):
        # Local gradient of this neuron: chain the incoming gradient
        # with the activation derivative at the cached pre-activation
        da_dz = activation_derivative(z_list[i])
        dL_dz = dL_da_list[i] * da_dz

        # Gradients for weights and bias
        neuron_dw = []
        for j in range(len(inputs)):
            neuron_dw.append(dL_dz * inputs[j])
            # Accumulate this neuron's contribution to the input gradient
            dL_dx[j] += dL_dz * weights_list[i][j]

        dL_dw.append(neuron_dw)
        dL_db.append(dL_dz)

    return dL_dw, dL_db, dL_dx
What This Code Is Doing
For each neuron:
- Compute its local gradient
- Compute gradients for its weights
- Compute gradient for its bias
For the layer:
- Sum contributions to dL_dx
- Return gradients upstream
This aggregation is the key idea.
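Continuing the same illustrative setup, and assuming the next layer sent back gradients of [0.5, -1.0], a call might look like this:

dL_da_list = [0.5, -1.0]  # assumed gradients arriving from the next layer

dL_dw, dL_db, dL_dx = dense_backward(inputs, z_list, dL_da_list,
                                     weights_list, relu_derivative)

print(dL_dw)  # [[0.5, 1.0], [-1.0, -2.0]]
print(dL_db)  # [0.5, -1.0]
print(dL_dx)  # approximately [0.05, -0.425]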
Why dL_dx Is a Sum
Each input affects every neuron in the layer.
So the total gradient with respect to an input is the sum of all paths through which it influences the loss.
This is the chain rule applied at scale.
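Using the numbers from the example above: ∂L/∂x_1 = 0.5 · 0.5 + (-1.0) · 0.2 = 0.25 - 0.2 = 0.05, where the first term is the path through the first neuron and the second is the path through the second neuron. This is exactly the accumulation performed by the dL_dx[j] += ... line.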
Updating the Layer Parameters
def update_layer(weights_list, bias_list, dL_dw, dL_db, learning_rate):
    for i in range(len(weights_list)):
        for j in range(len(weights_list[i])):
            weights_list[i][j] -= learning_rate * dL_dw[i][j]
        bias_list[i] -= learning_rate * dL_db[i]
This completes one learning step for the layer.
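As a sketch, applying the gradients from the example above with an arbitrarily chosen learning rate of 0.1:

update_layer(weights_list, bias_list, dL_dw, dL_db, learning_rate=0.1)

print(weights_list)  # approximately [[0.45, -0.35], [0.3, 0.5]]
print(bias_list)     # approximately [0.05, -0.3]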
What We Have Built Now
At this point, you understand:
- Backpropagation for a neuron
- Backpropagation with activation functions
- Backpropagation through a full dense layer
- How gradients flow backward between layers
This is the full engine of learning in neural networks.
Everything else is:
- Vectorization
- Performance optimization
- Engineering convenience
Common Beginner Misconceptions
Mistake 1: Thinking layers require new math
→ They don’t. Just repetition and aggregation.
Mistake 2: Forgetting input gradients
→ Without them, networks cannot stack layers.
Mistake 3: Thinking frameworks do something different
→ They do exactly this, just faster.
What’s Next in the Series
In Article #10, we will:
- Combine everything into a full training loop
- Train a multi-layer neural network end to end
- Watch loss decrease over epochs
- See learning happen in real time
This is where all pieces finally come together.
GitHub Code
Layer-level backpropagation code will be added to the repository:
👉 [link to your GitHub repository]
Series Progress
Neural Networks From Scratch (Pure Python)
✔ Article #1 — What a Neuron Really Computes
✔ Article #2 — Activation Functions
✔ Article #3 — Building a Layer
✔ Article #4 — Forward Propagation
✔ Article #5 — Loss Functions
✔ Article #6 — Gradients Explained
✔ Article #7 — Backpropagation (Single Neuron)
✔ Article #8 — Backpropagation With Activations
✔ Article #9 — Backpropagation Through a Layer
➡ Article #10 — Training a Neural Network End to End