SolveWithPython

Reading Loss Curves and Detecting Overfitting — When Learning Goes Wrong

Up to now, we have focused on how to train a neural network:

  • forward propagation
  • backpropagation
  • vectorization
  • batch and mini-batch training

At this point, your network runs fast and learns.

But there is a new problem you must learn to recognize:

Sometimes a model is learning — and still getting worse.

This article teaches you how to see that happening.

Why Training Loss Alone Is Not Enough

Most beginner tutorials celebrate this moment:

Epoch 0 Loss = 4.82
Epoch 200 Loss = 0.12
Epoch 500 Loss = 0.01

Lower loss looks good.

But here is the catch:

A model can achieve very low training loss and still perform terribly on new data.

This is called overfitting.

The Core Idea: Train vs Validation

To detect overfitting, we must split our data.

Training Set

  • Used to update weights
  • The model learns from this data

Validation Set

  • Never used for updates
  • Only used to evaluate performance

If these two behave differently, something is wrong.

Step 1: Creating a Train / Validation Split

Python
# First 80 samples for training, the rest for validation
X_train = X[:80]
y_train = y[:80]
X_val = X[80:]
y_val = y[80:]

The exact ratio doesn’t matter at first.
What matters is separation.
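
One caveat worth flagging: slicing like this assumes the rows are already in random order. If the data is sorted by class or by time, a plain slice hands the validation set a different distribution than the training set. A minimal sketch of a shuffled split, assuming X and y are NumPy arrays with the same number of rows:

Python
import numpy as np

rng = np.random.default_rng(seed=0)   # fixed seed keeps the split reproducible
indices = rng.permutation(len(X))     # shuffled row indices

split = int(0.8 * len(X))             # same 80/20 ratio as the slice above
train_idx, val_idx = indices[:split], indices[split:]

X_train, y_train = X[train_idx], y[train_idx]
X_val, y_val = X[val_idx], y[val_idx]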

Step 2: Tracking Loss Over Time

Instead of printing one loss value, we store them.

Python
train_losses = []
val_losses = []

At each epoch:

Python
train_losses.append(train_loss)
val_losses.append(val_loss)

This allows us to see learning behavior.
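
To make the bookkeeping concrete, a typical epoch loop looks roughly like the sketch below. The helpers train_one_epoch and compute_loss are placeholders for whatever your network from the earlier articles provides; only the tracking pattern matters here.

Python
train_losses = []
val_losses = []

for epoch in range(num_epochs):
    # One pass over the training data: forward, backward, weight update
    train_loss = train_one_epoch(X_train, y_train)   # placeholder for your training step

    # Forward pass only on the validation data: no weight updates here
    val_loss = compute_loss(X_val, y_val)             # placeholder for a forward pass + loss

    train_losses.append(train_loss)
    val_losses.append(val_loss)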

Step 3: Plotting Loss Curves

Python
import matplotlib.pyplot as plt
plt.plot(train_losses, label="Training Loss")
plt.plot(val_losses, label="Validation Loss")
plt.xlabel("Epoch")
plt.ylabel("Loss")
plt.legend()
plt.show()

This plot is one of the most important debugging tools in machine learning.

How to Read Loss Curves (Very Carefully)

Case 1: Healthy Learning

Training Loss ↓
Validation Loss ↓

This is ideal.
The model is learning and generalizing.

Case 2: Overfitting

Training Loss ↓
Validation Loss ↑

The model is memorizing the training data.

This is the most common failure mode.

Case 3: Underfitting

Training Loss flat (stays high)
Validation Loss flat (stays high)

The model is too simple or not trained enough.

Why Overfitting Happens

Overfitting occurs when:

  • The model has too many parameters
  • The dataset is small
  • Training runs too long
  • Noise is learned as signal

In other words:

The model becomes too specialized.

A Simple Overfitting Example

Imagine a model that learns:

“If input = exactly this pattern, output = correct”

Instead of:

“If input is similar to this pattern, output = correct”

The first fails in the real world.

Why Neural Networks Are Especially Prone to Overfitting

Neural networks:

  • are highly expressive
  • can memorize arbitrary patterns
  • do not know what “generalization” means

They only minimize loss.

If minimizing loss means memorizing — they will.

Early Warning Signs of Overfitting

Watch for:

  • Validation loss increasing while training loss decreases
  • Validation accuracy stagnating
  • Large gap between training and validation metrics

If you see these, stop training.
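
If you want the script to raise these flags for itself, one rough heuristic is to compare the recent trend of the two curves. The sketch below averages the last few epochs so a single noisy value does not trigger a false alarm; the window size of 5 is an arbitrary choice.

Python
def overfitting_warning(train_losses, val_losses, window=5):
    """Rough heuristic: validation loss trending up while training loss trends down."""
    if len(val_losses) < 2 * window:
        return False  # not enough history to compare two windows yet

    recent_val    = sum(val_losses[-window:]) / window
    earlier_val   = sum(val_losses[-2 * window:-window]) / window
    recent_train  = sum(train_losses[-window:]) / window
    earlier_train = sum(train_losses[-2 * window:-window]) / window

    return recent_val > earlier_val and recent_train < earlier_train

Calling this at the end of each epoch gives you an automated version of the visual check above.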

The Simplest Defense: Early Stopping

Early stopping means:

Stop training when validation loss stops improving.

Example:

Python
if val_loss > previous_val_loss:
    stop_training = True

This is often enough to prevent severe overfitting.
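
Validation loss rarely decreases perfectly smoothly, so stopping at the very first uptick can be premature. A slightly more forgiving variant, sketched below with illustrative names (best_val_loss, patience, epochs_without_improvement) and the same placeholder helpers as before, waits a few epochs before giving up.

Python
best_val_loss = float("inf")
patience = 10                       # epochs of no improvement we are willing to tolerate
epochs_without_improvement = 0

for epoch in range(num_epochs):
    train_loss = train_one_epoch(X_train, y_train)   # placeholder training step
    val_loss = compute_loss(X_val, y_val)             # placeholder validation loss

    if val_loss < best_val_loss:
        best_val_loss = val_loss
        epochs_without_improvement = 0    # improvement: reset the counter
    else:
        epochs_without_improvement += 1   # no improvement this epoch

    if epochs_without_improvement >= patience:
        print(f"Stopping early at epoch {epoch}")
        break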

What We Are Not Doing Yet (On Purpose)

We are not yet using:

  • regularization
  • dropout
  • weight decay

Those come next.

First, you must be able to recognize the problem visually.

Common Beginner Mistakes Here

Mistake 1: Trusting training loss only
→ Always track validation loss.

Mistake 2: Training “just a bit longer”
→ Often makes things worse.

Mistake 3: Assuming more data isn’t needed
→ Often, it is.

What You Have Learned in This Article

You can now:

  • split data properly
  • track training vs validation loss
  • interpret loss curves
  • detect overfitting and underfitting
  • know when training should stop

This is the beginning of model diagnosis.

What’s Next in the Series

In Article #15, we will introduce:

  • L2 regularization (weight decay)
  • Why it reduces overfitting
  • How it modifies the loss function
  • How to implement it from scratch

This will be your first active defense against overfitting.

Series Status

  • Part I — Foundations ✔
  • Part II — Scaling & Diagnostics ▶ In Progress

You now understand not just how to train a neural network —
but how to tell whether training is actually working.