In the previous article, we introduced gradients and answered a fundamental question:
How does a change in a weight affect the loss?
We saw that the answer is a derivative, and that learning requires computing these derivatives systematically.
In this article, we will do exactly that.
We will:
- Take a single neuron
- Walk through forward computation
- Compute the loss
- Derive gradients step by step
- Implement the math directly in Python
This is the moment where neural networks stop being conceptual—and start learning.
The Simplest Learnable Setup
We will work with the smallest meaningful system:
- One input
- One weight
- One bias
- One activation
- One loss function
No layers. No loops. No shortcuts.
Step 1: Define the Forward Computation
Our neuron:

z = w * x + b

Activation (identity, for now):

a = z

Loss (Mean Squared Error):

loss = (a - y) ** 2
This setup removes distractions so we can focus on gradients.
Step 2: Concrete Values
Let’s assign real numbers:
x = 2.0  # input
w = 1.5  # weight
b = 0.5  # bias
y = 4.0  # target
Forward Pass
z = w * x + b        # 1.5 * 2.0 + 0.5 = 3.5
a = z
loss = (a - y) ** 2  # (3.5 - 4.0)^2 = 0.25
The network is wrong — but not by much.
Now comes the important part.
Step 3: What Gradients Do We Need?
To update the parameters, we need two gradients:
- dL/dw — how the loss changes with the weight
- dL/db — how the loss changes with the bias
We compute these using the chain rule.
Step 4: Apply the Chain Rule
Recall the dependency chain:
- Loss depends on a
- a depends on z
- z depends on w and b

So:

dL/dw = dL/da * da/dz * dz/dw

And similarly for b:

dL/db = dL/da * da/dz * dz/db
Step 5: Compute Each Derivative
1. Loss with respect to activation
dL_da = 2 * (a - y) # 2 * (3.5 - 4.0) = -1.0
2. Activation with respect to z
Since a = z:
da_dz = 1.0
3. z with respect to weight
dz_dw = x # 2.0
4. z with respect to bias
dz_db = 1.0
Step 6: Combine Gradients
Gradient with respect to weight
dL_dw = dL_da * da_dz * dz_dw  # -1.0 * 1.0 * 2.0 = -2.0
Gradient with respect to bias
dL_db = dL_da * da_dz * dz_db  # -1.0 * 1.0 * 1.0 = -1.0
What These Numbers Mean
- dL_dw = -2.0 → increasing w will reduce loss
- dL_db = -1.0 → increasing b will reduce loss
Gradients do not update anything yet.
They only point in the right direction.
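Before trusting hand-derived gradients, it is worth checking them numerically. The sketch below (the `loss_fn` and `finite_difference` names are my own, not from the article) nudges each parameter by a tiny epsilon and measures the slope of the loss directly; the results should land very close to -2.0 and -1.0:

```python
def loss_fn(w, b, x=2.0, y=4.0):
    """Forward pass for the single identity-activation neuron."""
    z = w * x + b
    a = z
    return (a - y) ** 2

def finite_difference(f, w, b, eps=1e-6):
    """Approximate dL/dw and dL/db with central differences."""
    dL_dw = (f(w + eps, b) - f(w - eps, b)) / (2 * eps)
    dL_db = (f(w, b + eps) - f(w, b - eps)) / (2 * eps)
    return dL_dw, dL_db

dL_dw, dL_db = finite_difference(loss_fn, 1.5, 0.5)
print(dL_dw, dL_db)  # approximately -2.0 and -1.0
```

This trick works for any differentiable loss and is a standard way to debug backpropagation code.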
Step 7: Update the Parameters (Gradient Descent)
Now we apply a learning rate.
learning_rate = 0.1
w = w - learning_rate * dL_dw
b = b - learning_rate * dL_db
New values:
w = 1.5 - 0.1 * (-2.0) = 1.7
b = 0.5 - 0.1 * (-1.0) = 0.6
Step 8: Forward Pass Again
z = w * x + b        # 1.7 * 2.0 + 0.6 = 4.0
a = z
loss = (a - y) ** 2  # (4.0 - 4.0)^2 = 0.0
The loss is now zero.
The neuron has learned.
This Is Backpropagation
Nothing more happened than:
- Forward computation
- Loss calculation
- Derivative calculation
- Parameter update
Backpropagation is not magic.
It is organized calculus applied repeatedly.
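Those four steps fit in a few lines of plain Python. Here is the whole worked example as one runnable script, repeated for a few iterations (variable names follow the article; the loop count of 3 is an arbitrary choice for illustration):

```python
x, y = 2.0, 4.0      # input and target
w, b = 1.5, 0.5      # initial parameters
learning_rate = 0.1

for step in range(3):
    # Forward computation
    z = w * x + b
    a = z                 # identity activation
    loss = (a - y) ** 2   # mean squared error

    # Derivative calculation (chain rule)
    dL_da = 2 * (a - y)
    da_dz = 1.0
    dL_dw = dL_da * da_dz * x
    dL_db = dL_da * da_dz * 1.0

    # Parameter update (gradient descent)
    w -= learning_rate * dL_dw
    b -= learning_rate * dL_db

    print(f"step {step}: loss={loss:.4f}, w={w:.2f}, b={b:.2f}")
```

After the first update the loss is already zero, so the remaining iterations leave w and b unchanged.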
Where Activation Functions Fit In
In real networks:
- a ≠ z
- Activations introduce an extra derivative term
For example:
- ReLU → derivative is 0 or 1
- Sigmoid → derivative depends on output value
We will add these next.
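As a preview, both derivatives can be written in a few lines of plain Python (the function names here are my own choice for this sketch):

```python
import math

def relu_derivative(z):
    # Slope of max(0, z): 0 for negative inputs, 1 for positive
    return 1.0 if z > 0 else 0.0

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def sigmoid_derivative(z):
    # Conveniently expressed via the output itself: s * (1 - s)
    s = sigmoid(z)
    return s * (1.0 - s)

print(relu_derivative(-2.0), relu_derivative(3.5))  # 0.0 1.0
print(sigmoid_derivative(0.0))                      # 0.25
```

In backpropagation these values simply become one more factor in the chain-rule product.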
Common Beginner Misconceptions
Mistake 1: Thinking backpropagation is one formula
→ It is a process, not a single equation.
Mistake 2: Confusing gradients with updates
→ Gradients describe change; learning rate applies it.
Mistake 3: Thinking frameworks do something different
→ They do exactly this—just faster and in bulk.
What We Have Achieved So Far
At this point, you understand:
- Forward propagation
- Loss functions
- Gradients
- Backpropagation for a neuron
- Gradient descent updates
This is the core of all neural networks.
Everything else is scale.
What’s Next in the Series
In Article #8, we will:
- Add activation functions to backpropagation
- Compute ReLU and Sigmoid derivatives
- Backpropagate through a full neuron with activation
- Prepare for multi-neuron layers
This is where the math becomes reusable.
GitHub Code
This article’s code will be added as a standalone, readable example:
👉 [link to your GitHub repository]
Series Progress
Neural Networks From Scratch (Pure Python)
✔ Article #1 — What a Neuron Really Computes
✔ Article #2 — Activation Functions
✔ Article #3 — Building a Layer
✔ Article #4 — Forward Propagation
✔ Article #5 — Loss Functions
✔ Article #6 — Gradients Explained
✔ Article #7 — Backpropagation Step by Step
➡ Article #8 — Backpropagation with Activation Functions