Up to now, we have focused on how to train a neural network:
- forward propagation
- backpropagation
- vectorization
- batch and mini-batch training
At this point, your network runs fast and learns.
But there is a new problem you must learn to recognize:
Sometimes a model keeps improving on its training data while getting worse on data it has never seen.
This article teaches you how to see that happening.
Why Training Loss Alone Is Not Enough
Most beginner tutorials celebrate this moment:
```
Epoch 0   → Loss = 4.82
Epoch 200 → Loss = 0.12
Epoch 500 → Loss = 0.01
```
Lower loss looks good.
But here is the catch:
A model can achieve very low training loss and still perform terribly on new data.
This is called overfitting.
The Core Idea: Train vs Validation
To detect overfitting, we must split our data.
Training Set
- Used to update weights
- The model learns from this data
Validation Set
- Never used for updates
- Only used to evaluate performance
If these two behave differently, something is wrong.
Step 1: Creating a Train / Validation Split
```python
X_train = X[:80]
y_train = y[:80]
X_val = X[80:]
y_val = y[80:]
```
The exact ratio doesn’t matter at first.
What matters is separation.
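One caveat: if the rows of X are ordered (for example, by class), a straight slice can leave the training and validation sets with different distributions. A minimal sketch of a shuffled split, assuming X and y are NumPy arrays:

```python
import numpy as np

# Shuffle indices first so the split is not biased by row order
rng = np.random.default_rng(seed=0)
idx = rng.permutation(len(X))

split = int(0.8 * len(X))  # 80/20 split
X_train, y_train = X[idx[:split]], y[idx[:split]]
X_val, y_val = X[idx[split:]], y[idx[split:]]
```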
Step 2: Tracking Loss Over Time
Instead of printing one loss value, we store them.
```python
train_losses = []
val_losses = []
```
At each epoch:
```python
train_losses.append(train_loss)
val_losses.append(val_loss)
```
This allows us to see learning behavior.
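Put together, the tracking loop looks roughly like this. Here train_step and compute_loss stand in for the training and evaluation routines built earlier in this series; the names are placeholders, not a fixed API:

```python
train_losses = []
val_losses = []

for epoch in range(num_epochs):
    # One pass of forward prop, backprop, and weight updates (placeholder)
    train_loss = train_step(X_train, y_train)

    # Evaluation only: the validation set never updates the weights
    val_loss = compute_loss(X_val, y_val)

    train_losses.append(train_loss)
    val_losses.append(val_loss)
```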
Step 3: Plotting Loss Curves
```python
import matplotlib.pyplot as plt

plt.plot(train_losses, label="Training Loss")
plt.plot(val_losses, label="Validation Loss")
plt.xlabel("Epoch")
plt.ylabel("Loss")
plt.legend()
plt.show()
```
This plot is one of the most important debugging tools in machine learning.
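One optional tweak, not required but often helpful: late in training, the two curves can sit very close together on a linear axis, and a log-scaled y-axis makes small differences visible:

```python
plt.yscale("log")  # optional: call before plt.show() to reveal small differences
```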
How to Read Loss Curves (Very Carefully)
Case 1: Healthy Learning
- Training Loss ↓
- Validation Loss ↓
This is ideal.
The model is learning and generalizing.
Case 2: Overfitting
- Training Loss ↓
- Validation Loss ↑
The model is memorizing the training data.
This is the most common failure mode.
Case 3: Underfitting
- Training Loss → flat
- Validation Loss → flat
The model is too simple, or it has not been trained long enough.
Why Overfitting Happens
Overfitting occurs when:
- The model has too many parameters
- The dataset is small
- Training runs too long
- Noise is learned as signal
In other words:
The model becomes too specialized.
A Simple Overfitting Example
Imagine a model that learns:
“If input = exactly this pattern, output = correct”
Instead of:
“If input is similar to this pattern, output = correct”
The first fails in the real world.
Why Neural Networks Are Especially Prone to Overfitting
Neural networks:
- are highly expressive
- can memorize arbitrary patterns
- do not know what “generalization” means
They only minimize loss.
If minimizing loss means memorizing — they will.
Early Warning Signs of Overfitting
Watch for:
- Validation loss increasing while training loss decreases
- Validation accuracy stagnating
- Large gap between training and validation metrics
If you see these, stop training.
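These signs can also be checked in code at the end of each epoch. A rough sketch, assuming the train_losses and val_losses lists from Step 2; the 0.5 gap threshold is an arbitrary illustration, not a standard value:

```python
# Run after appending this epoch's losses (see Step 2)
if len(val_losses) >= 2 and val_losses[-1] > val_losses[-2]:
    print("Warning: validation loss increased this epoch")

gap = val_losses[-1] - train_losses[-1]
if gap > 0.5:  # threshold chosen for illustration only
    print(f"Warning: large train/validation gap ({gap:.3f})")
```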
The Simplest Defense: Early Stopping
Early stopping means:
Stop training when validation loss stops improving.
Example:
```python
if val_loss > previous_val_loss:
    stop_training = True
```
This is often enough to prevent severe overfitting.
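Because validation loss normally fluctuates from epoch to epoch, stopping on the first uptick is usually too aggressive. A common refinement is to wait a fixed number of epochs ("patience") for an improvement. A minimal sketch, again using the placeholder routines from Step 2:

```python
best_val_loss = float("inf")
patience = 10  # epochs to wait for an improvement before stopping
epochs_without_improvement = 0

for epoch in range(num_epochs):
    train_step(X_train, y_train)           # placeholder training pass
    val_loss = compute_loss(X_val, y_val)  # placeholder evaluation

    if val_loss < best_val_loss:
        best_val_loss = val_loss
        epochs_without_improvement = 0
    else:
        epochs_without_improvement += 1

    if epochs_without_improvement >= patience:
        break  # validation loss has stopped improving
```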
What We Are Not Doing Yet (On Purpose)
We are not yet using:
- regularization
- dropout
- weight decay
Those come next.
First, you must be able to recognize the problem visually.
Common Beginner Mistakes Here
Mistake 1: Trusting training loss only
→ Always track validation loss.
Mistake 2: Training “just a bit longer”
→ Often makes things worse.
Mistake 3: Assuming more data isn’t needed
→ Often, it is.
What You Have Learned in This Article
You can now:
- split data properly
- track training vs validation loss
- interpret loss curves
- detect overfitting and underfitting
- know when training should stop
This is the beginning of model diagnosis.
What’s Next in the Series
In Article #15, we will introduce:
- L2 regularization (weight decay)
- Why it reduces overfitting
- How it modifies the loss function
- How to implement it from scratch
This will be your first active defense against overfitting.
Series Status
- Part I — Foundations ✔
- Part II — Scaling & Diagnostics ▶ In Progress
You now understand not just how to train a neural network, but how to tell whether training is actually working.