So far, we have built a complete neural network that can:
- Accept inputs
- Pass them through multiple layers
- Apply non-linear activation functions
- Produce an output
At this point, the network works.
But it does not learn.
To learn, the network must answer one critical question:
How wrong is this prediction?
The mechanism that answers this question is called a loss function.
This article introduces loss functions from first principles and implements them in pure Python.
What Is a Loss Function?
A loss function is a mathematical function that measures the difference between:
- The network’s prediction
- The true (expected) value
It produces a single number:
- Low loss → good prediction
- High loss → bad prediction
Learning is simply the process of reducing this number over time.
Why Loss Functions Matter
Without a loss function:
- The network has no feedback
- There is no notion of “better” or “worse”
- We cannot adjust weights meaningfully
Loss functions convert prediction quality into a numeric signal that optimization algorithms can act on.
Prediction vs Target
Let’s establish terminology:
- Prediction (y_pred): the output of the network
- Target (y_true): the correct value
A loss function compares these two.
Two Common Loss Functions We Will Use
We will implement the two most important loss functions:
- Mean Squared Error (MSE) — for regression
- Binary Cross-Entropy — for binary classification
1. Mean Squared Error (MSE)
Definition
For a single prediction:
loss = (y_true - y_pred)²
For multiple predictions, we average the squared errors (a batched version is sketched after the single-value example below):
MSE = (1/n) · Σ (y_true_i - y_pred_i)²
Intuition
- Penalizes large errors more than small ones (see the quick check below)
- Smooth and easy to optimize
- Common in regression problems
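A quick numeric check of the first point: because the error is squared, the penalty grows quadratically, so doubling the error quadruples its contribution to the loss.

```python
# Squared error grows quadratically with the raw error.
for error in [0.5, 1.0, 2.0, 4.0]:
    print(error, "->", error ** 2)
# 0.5 -> 0.25
# 1.0 -> 1.0
# 2.0 -> 4.0
# 4.0 -> 16.0
```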
Implementing MSE in Python
```python
def mean_squared_error(y_true, y_pred):
    return (y_true - y_pred) ** 2
```
Example
```python
y_true = 3.0
y_pred = 2.5

loss = mean_squared_error(y_true, y_pred)
print(loss)
```
Output:
0.25
The prediction is close, so the loss is small.
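As noted in the definition, with multiple predictions we average the squared errors. Here is a minimal sketch of that batched version (the name mean_squared_error_batch is illustrative, not part of the repository code):

```python
def mean_squared_error_batch(y_true_list, y_pred_list):
    # Average the squared error over all (target, prediction) pairs.
    squared_errors = [
        (y_t - y_p) ** 2 for y_t, y_p in zip(y_true_list, y_pred_list)
    ]
    return sum(squared_errors) / len(squared_errors)

print(mean_squared_error_batch([3.0, 1.0], [2.5, 2.0]))  # (0.25 + 1.0) / 2 = 0.625
```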
2. Binary Cross-Entropy Loss
Used when:
- Output represents a probability
- Target is 0 or 1
- Final activation is typically sigmoid
Definition
loss = -( y · log(p) + (1 - y) · log(1 - p) )
Where:
- y is the true label (0 or 1)
- p is the predicted probability
Why This Formula Works
- Confident wrong predictions are punished heavily
- Confident correct predictions are rewarded
- Encourages calibrated probabilities
Implementing Binary Cross-Entropy in Python
```python
import math

def binary_cross_entropy(y_true, y_pred, epsilon=1e-9):
    y_pred = min(max(y_pred, epsilon), 1 - epsilon)
    return -(
        y_true * math.log(y_pred)
        + (1 - y_true) * math.log(1 - y_pred)
    )
```
The epsilon prevents numerical issues with log(0).
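To see what the clamp buys us: math.log(0.0) raises a ValueError in Python, so an unclamped prediction of exactly 0.0 or 1.0 would crash the computation. With the clamp, extreme predictions instead produce large but finite losses:

```python
# Without clamping, y_pred = 0.0 would hit math.log(0.0) and raise ValueError.
print(binary_cross_entropy(1, 0.0))  # ~20.72, i.e. -log(1e-9): huge but finite
print(binary_cross_entropy(1, 1.0))  # ~1e-9, essentially zero loss
```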
Example
```python
y_true = 1
y_pred = 0.9

loss = binary_cross_entropy(y_true, y_pred)
print(loss)
```
Output:
~0.105
A confident and correct prediction yields low loss.
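For contrast, the same label with a confidently wrong prediction; the loss is more than twenty times larger, which is the "punished heavily" behavior described earlier:

```python
y_true = 1
y_pred = 0.1  # confident, but wrong

loss = binary_cross_entropy(y_true, y_pred)
print(loss)  # ~2.303
```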
Loss Functions Are Just Functions
This is a key insight:
A loss function is just a mathematical function applied after forward propagation.
Nothing magical happens here.
The network:
- Produces an output
- Compares it to the target
- Produces a scalar loss
Where Loss Fits in the Pipeline
At this point, the full pipeline looks like this (see the sketch below):
- Forward propagation → prediction
- Loss function → error measurement
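In code, one evaluation step is just those two calls in sequence. A rough sketch, where forward stands in for the network built in the earlier articles (the one-weight "network" here is only a placeholder):

```python
# Placeholder for the real network from the previous articles:
# a single weight and bias, just enough to make the pipeline runnable.
def forward(x):
    w, b = 0.8, 0.1
    return w * x + b

x, y_true = 2.0, 3.0

y_pred = forward(x)                        # forward propagation -> prediction
loss = mean_squared_error(y_true, y_pred)  # loss function -> error measurement
print(y_pred, loss)                        # 1.7 and ~1.69
```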
What we still cannot do is improve the network.
For that, we need gradients.
Common Beginner Mistakes
Mistake 1: Using the wrong loss function
→ e.g., MSE for a classification task or cross-entropy for a regression task. Match the loss to the problem type.
Mistake 2: Forgetting numerical stability
→ log(0) causes crashes.
Mistake 3: Thinking loss updates weights
→ Loss only measures error — it does not fix it.
What We Have Built So Far
At this point, we have:
- A neural network that produces predictions
- A loss function that measures error
This is enough to evaluate the network.
But not enough to train it.
Training requires one final mechanism:
Understanding how each weight affects the loss.
That mechanism is backpropagation.
What’s Next in the Series
In Article #6, we will:
- Introduce gradients intuitively
- Explain why derivatives matter
- Compute gradients by hand for a simple neuron
- Prepare for full backpropagation
This is where learning truly begins.
GitHub Code
All loss functions introduced here are included in the repository:
👉 [link to your GitHub repository]
Each article adds one clean, isolated component.
Series Progress
You are reading:
Neural Networks From Scratch (Pure Python)
✔ Article #1 — What a Neuron Really Computes
✔ Article #2 — Activation Functions
✔ Article #3 — Building a Layer
✔ Article #4 — Forward Propagation
✔ Article #5 — Loss Functions
➡ Article #6 — Gradients and Backpropagation (Next)