Up to this point, we have built a complete neural network pipeline:
- Inputs flow forward through layers
- The network produces a prediction
- A loss function measures how wrong that prediction is
At this stage, the network can evaluate itself.
But it still cannot improve.
To improve, the network must answer a deeper question:
Which weights caused the error, and by how much?
The tool that answers this question is the gradient.
This article introduces gradients from first principles and explains why derivatives are the engine of learning in neural networks.
The Core Problem of Learning
Suppose your network produces a loss of 0.25.
That number tells you:
- The prediction is not perfect
But it does not tell you:
- Which weight caused the error
- Whether to increase or decrease a weight
- How much to change it
Loss alone is just a measurement.
To learn, we need direction.
What Is a Gradient?
A gradient tells us:
How much the loss changes when a parameter changes.
In simpler terms:
- If I nudge this weight slightly…
- Will the loss go up or down?
- And how fast?
Mathematically, this is a derivative.
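To make the idea of "nudging" concrete, here is a tiny Python sketch. The loss function inside it is a made-up stand-in (a squared distance to an arbitrary target), not the one from our repository; the only point is the nudge itself.

```python
# A minimal sketch of "nudging" a single weight and watching the loss respond.
# The loss below is a hypothetical stand-in, not the loss from the series.

def loss(w):
    # Hypothetical loss: how far the prediction w * 2.0 is from a target of 10.0
    prediction = w * 2.0
    target = 10.0
    return (prediction - target) ** 2

w = 3.0
epsilon = 1e-4  # a tiny nudge

# Finite-difference approximation of the derivative dL/dw
gradient = (loss(w + epsilon) - loss(w)) / epsilon

print(loss(w))   # 16.0    -> current loss
print(gradient)  # ~ -16.0 -> the loss goes DOWN if we increase w
```

The ratio we just computed answers both questions at once: the sign says which way the loss moves, and the magnitude says how fast.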
One Weight, One Question
Let’s simplify the problem to its smallest form.
Imagine a neuron with:
- One input
- One weight
- One bias
- One loss value
The learning question becomes:
How does the loss change if I change the weight?
This is written as:

∂L/∂w

This is the gradient of the loss with respect to the weight.
Why Derivatives Matter
Derivatives give us two critical pieces of information:
- Direction
  - Positive gradient → increasing the weight increases the loss
  - Negative gradient → increasing the weight decreases the loss
- Sensitivity
  - Large gradient → small changes matter a lot
  - Small gradient → changes barely matter
Learning is simply moving weights in the direction that reduces loss.
A Concrete Example (No Neural Network Yet)
Let’s step away from networks for a moment.
Consider this simple function:

L(w) = (w − 3)²

This function has:
- A minimum at w = 3
- Higher values as you move away from 3

Its derivative is:

dL/dw = 2(w − 3)
What the Derivative Tells Us
- If w = 5, the derivative is +4 → decrease w
- If w = 1, the derivative is −4 → increase w
- If w = 3, the derivative is 0 → stop
This is exactly how neural networks learn — just with more variables.
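Here is the same example as a few lines of Python, using the function and derivative written above. Notice that it only reports a direction; it does not change w yet.

```python
# The toy function from above and its derivative.

def loss(w):
    return (w - 3) ** 2

def gradient(w):
    return 2 * (w - 3)

for w in (5.0, 1.0, 3.0):
    g = gradient(w)
    if g > 0:
        action = "decrease w"
    elif g < 0:
        action = "increase w"
    else:
        action = "stop"
    print(f"w = {w}: loss = {loss(w)}, gradient = {g:+} -> {action}")
```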
Gradients in a Neuron
Now let’s return to our neural network.
Recall a simple neuron:

z = w · x + b

With an activation function:

a = f(z)

And a loss function:

L = loss(a, y), where y is the target
The loss depends on the weight indirectly.
To compute the gradient, we apply the chain rule.
The Chain Rule (Conceptual, Not Formal)
The chain rule tells us:
If A affects B, and B affects C, then A affects C.
In neural networks:
- The weight w affects z
- z affects the activation a
- The activation a affects the loss L

So:

∂L/∂w = (∂L/∂a) · (∂a/∂z) · (∂z/∂w)
This is the backbone of backpropagation.
We will compute each part explicitly in the next article.
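As a small preview of what that article will derive carefully, the sketch below writes out the three chain-rule factors for one made-up neuron and checks their product against a brute-force nudge of the weight. The sigmoid activation, squared-error loss, and the specific numbers are assumptions for illustration only.

```python
import math

# One made-up neuron: fixed input x, target y, sigmoid activation, squared error.
# These choices are illustrative; Article #7 derives each factor properly.
x, y = 2.0, 1.0
w, b = 0.5, 0.1

def forward(w):
    z = w * x + b               # pre-activation
    a = 1 / (1 + math.exp(-z))  # sigmoid activation
    L = (a - y) ** 2            # squared-error loss
    return z, a, L

z, a, L = forward(w)

# The three chain-rule factors, written out by hand:
dL_da = 2 * (a - y)   # derivative of the loss w.r.t. the activation
da_dz = a * (1 - a)   # derivative of the sigmoid w.r.t. z
dz_dw = x             # derivative of z = w * x + b w.r.t. w

chain_rule = dL_da * da_dz * dz_dw

# Brute-force check: nudge w and measure the change in loss directly.
eps = 1e-6
numeric = (forward(w + eps)[2] - L) / eps

print(chain_rule, numeric)  # the two numbers should nearly match
```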
A First Gradient Calculation (Manual)
Let’s compute one part right now.
From:

z = w · x + b

The derivative with respect to w is:

∂z/∂w = x
This means:
- The input directly scales how much the weight matters
- Larger inputs → larger gradients
This is not an accident — it is fundamental.
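A quick check makes this concrete: nudge the weight, and the change in z scales with the input x. The numbers below are arbitrary.

```python
# Nudging w changes z by x * (nudge), so dz/dw = x.
def z(w, x, b=0.0):
    return w * x + b

eps = 1e-6
for x_val in (0.5, 2.0, 10.0):
    dz_dw = (z(1.0 + eps, x_val) - z(1.0, x_val)) / eps
    print(x_val, round(dz_dw, 4))  # prints x itself: larger input -> larger gradient
```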
Why Gradients Are Computed Backward
Notice something important:
- The loss is computed last
- But gradients are needed for the earliest weights
This means:
- We must start at the loss
- And move backward through the network
This is why the algorithm is called backpropagation.
Common Beginner Misconceptions
Mistake 1: Thinking gradients are magic
→ They are just derivatives.
Mistake 2: Thinking gradients update weights
→ Gradients only describe change. Updates come later.
Mistake 3: Fearing calculus
→ You only need simple derivatives, applied systematically.
What We Have Achieved So Far
At this point, we understand:
- Why loss alone is insufficient
- Why derivatives are necessary
- What a gradient represents
- How gradients relate to learning
We are now ready to compute gradients end to end.
What’s Next in the Series
In Article #7, we will:
- Compute gradients for a full neuron
- Differentiate the loss function
- Differentiate activation functions (ReLU, Sigmoid)
- Combine everything using the chain rule
This will be our first true step into backpropagation.
GitHub Code
In the next article, we will begin adding explicit gradient calculations to the repository.
👉 [link to your GitHub repository]
Series Progress
You are reading:
Neural Networks From Scratch (Pure Python)
✔ Article #1 — What a Neuron Really Computes
✔ Article #2 — Activation Functions
✔ Article #3 — Building a Layer
✔ Article #4 — Forward Propagation
✔ Article #5 — Loss Functions
✔ Article #6 — Gradients Explained
➡ Article #7 — Backpropagation Step by Step