SolveWithPython

Backpropagation Step by Step — Computing Gradients for a Single Neuron

In the previous article, we introduced gradients and answered a fundamental question:

How does a change in a weight affect the loss?

We saw that the answer is a derivative, and that learning requires computing these derivatives systematically.

In this article, we will do exactly that.

We will:

  • Take a single neuron
  • Walk through forward computation
  • Compute the loss
  • Derive gradients step by step
  • Implement the math directly in Python

This is the moment where neural networks stop being conceptual—and start learning.

The Simplest Learnable Setup

We will work with the smallest meaningful system:

  • One input
  • One weight
  • One bias
  • One activation
  • One loss function

No layers. No loops. No shortcuts.

Step 1: Define the Forward Computation

Our neuron:

z = w · x + b

Activation (identity, for now):

a = z

Loss (Mean Squared Error):

L = (a - y)^2

This setup removes distractions so we can focus on gradients.

Step 2: Concrete Values

Let’s assign real numbers:

Python
x = 2.0 # input
w = 1.5 # weight
b = 0.5 # bias
y = 4.0 # target

Forward Pass

Python
z = w * x + b # 1.5 * 2.0 + 0.5 = 3.5
a = z
loss = (a - y) ** 2 # (3.5 - 4.0)^2 = 0.25

The network is wrong — but not by much.

Now comes the important part.

Step 3: What Gradients Do We Need?

To update the parameters, we need:

∂L/∂w and ∂L/∂b

We compute these using the chain rule.

Step 4: Apply the Chain Rule

Recall the dependency chain:

  • Loss depends on a
  • a depends on z
  • z depends on w and b

So:

∂L/∂w = ∂L/∂a · ∂a/∂z · ∂z/∂w

And similarly for the bias:

∂L/∂b = ∂L/∂a · ∂a/∂z · ∂z/∂b

Step 5: Compute Each Derivative

1. Loss with respect to activation

L = (a - y)^2

∂L/∂a = 2(a - y)

Python
dL_da = 2 * (a - y) # 2 * (3.5 - 4.0) = -1.0

2. Activation with respect to z

Since a = z:

∂a/∂z = 1

Python
da_dz = 1.0

3. z with respect to weight

z = w · x + b

∂z/∂w = x

Python
dz_dw = x # 2.0

4. z with respect to bias

∂z/∂b = 1

Python
dz_db = 1.0

Step 6: Combine Gradients

Gradient with respect to weight

Python
dL_dw = dL_da * da_dz * dz_dw
# -1.0 * 1.0 * 2.0 = -2.0

Gradient with respect to bias

Python
dL_db = dL_da * da_dz * dz_db
# -1.0 * 1.0 * 1.0 = -1.0

What These Numbers Mean

  • dL_dw = -2.0 → increasing w will reduce loss
  • dL_db = -1.0 → increasing b will reduce loss

Gradients do not update anything yet.

They only point in the right direction.
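Before using the gradients, we can sanity-check them numerically with a central finite difference (a quick sketch; the helper `loss_fn` and the step size `eps` are illustrative, not part of the article's code):

```python
# Numerically approximate dL/dw and dL/db with central differences
x, w, b, y = 2.0, 1.5, 0.5, 4.0

def loss_fn(w, b):
    a = w * x + b          # identity activation: a = z
    return (a - y) ** 2    # mean squared error

eps = 1e-6
dL_dw_numeric = (loss_fn(w + eps, b) - loss_fn(w - eps, b)) / (2 * eps)
dL_db_numeric = (loss_fn(w, b + eps) - loss_fn(w, b - eps)) / (2 * eps)

print(dL_dw_numeric)  # ≈ -2.0, matching the analytic gradient
print(dL_db_numeric)  # ≈ -1.0
```

If the numeric and analytic values disagree, the chain-rule bookkeeping has a bug somewhere.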

Step 7: Update the Parameters (Gradient Descent)

Now we apply a learning rate.

Python
learning_rate = 0.1
w = w - learning_rate * dL_dw
b = b - learning_rate * dL_db

New values:

Python
w = 1.5 - 0.1 * (-2.0)  # = 1.7
b = 0.5 - 0.1 * (-1.0)  # = 0.6

Step 8: Forward Pass Again

Python
z = w * x + b # 1.7 * 2.0 + 0.6 = 4.0
a = z
loss = (a - y) ** 2 # (4.0 - 4.0)^2 = 0.0

The loss is now zero.

The neuron has learned.

This Is Backpropagation

All that happened was:

  1. Forward computation
  2. Loss calculation
  3. Derivative calculation
  4. Parameter update

Backpropagation is not magic.

It is organized calculus applied repeatedly.
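The four steps above can be packed into one short loop (a sketch reusing the article's numbers; the step count of 5 is an arbitrary choice):

```python
x, y = 2.0, 4.0            # input and target
w, b = 1.5, 0.5            # initial parameters
learning_rate = 0.1

for step in range(5):
    # 1. forward computation
    z = w * x + b
    a = z                  # identity activation
    # 2. loss calculation
    loss = (a - y) ** 2
    # 3. derivative calculation (chain rule)
    dL_dw = 2 * (a - y) * 1.0 * x
    dL_db = 2 * (a - y) * 1.0 * 1.0
    # 4. parameter update
    w -= learning_rate * dL_dw
    b -= learning_rate * dL_db
    print(step, round(loss, 4), round(w, 4), round(b, 4))
```

With these particular numbers the loss reaches zero after a single update, so later iterations leave the parameters unchanged.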

Where Activation Functions Fit In

In real networks:

  • a ≠ z
  • Activations introduce an extra derivative term

For example:

  • ReLU → derivative is 0 or 1
  • Sigmoid → derivative depends on output value

We will add these next.
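As a preview of that extra term, the two derivatives might be sketched like this (illustrative helper names such as `relu_grad` and `sigmoid_grad`, not code from this series):

```python
import math

def relu(z):
    return max(0.0, z)

def relu_grad(z):
    # derivative of ReLU: 0 for negative inputs, 1 for positive
    return 1.0 if z > 0 else 0.0

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def sigmoid_grad(z):
    # derivative depends on the output value: s * (1 - s)
    s = sigmoid(z)
    return s * (1.0 - s)

print(relu_grad(3.5))     # 1.0
print(sigmoid_grad(0.0))  # 0.25
```

Each of these would multiply into the chain as the ∂a/∂z factor that was simply 1 in this article.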

Common Beginner Misconceptions

Mistake 1: Thinking backpropagation is one formula
→ It is a process, not a single equation.

Mistake 2: Confusing gradients with updates
→ Gradients describe change; learning rate applies it.

Mistake 3: Thinking frameworks do something different
→ They do exactly this—just faster and in bulk.

What We Have Achieved So Far

At this point, you understand:

  • Forward propagation
  • Loss functions
  • Gradients
  • Backpropagation for a neuron
  • Gradient descent updates

This is the core of all neural networks.

Everything else is scale.

What’s Next in the Series

In Article #8, we will:

  • Add activation functions to backpropagation
  • Compute ReLU and Sigmoid derivatives
  • Backpropagate through a full neuron with activation
  • Prepare for multi-neuron layers

This is where the math becomes reusable.

GitHub Code

This article’s code will be added as a standalone, readable example:

👉 [link to your GitHub repository]

Series Progress

Neural Networks From Scratch (Pure Python)
✔ Article #1 — What a Neuron Really Computes
✔ Article #2 — Activation Functions
✔ Article #3 — Building a Layer
✔ Article #4 — Forward Propagation
✔ Article #5 — Loss Functions
✔ Article #6 — Gradients Explained
✔ Article #7 — Backpropagation Step by Step
➡ Article #8 — Backpropagation with Activation Functions