SolveWithPython

Loss Functions in Neural Networks — Measuring How Wrong the Network Is

So far, we have built a complete neural network that can:

  • Accept inputs
  • Pass them through multiple layers
  • Apply non-linear activation functions
  • Produce an output

At this point, the network works.

But it does not learn.

To learn, the network must answer one critical question:

How wrong is this prediction?

The mechanism that answers this question is called a loss function.

This article introduces loss functions from first principles and implements them in pure Python.

What Is a Loss Function?

A loss function is a mathematical function that measures the difference between:

  • The network’s prediction
  • The true (expected) value

It produces a single number:

  • Low loss → good prediction
  • High loss → bad prediction

Learning is simply the process of reducing this number over time.

Why Loss Functions Matter

Without a loss function:

  • The network has no feedback
  • There is no notion of “better” or “worse”
  • We cannot adjust weights meaningfully

Loss functions convert prediction quality into a numeric signal that optimization algorithms can act on.

Prediction vs Target

Let’s establish terminology:

  • Prediction (y_pred): output of the network
  • Target (y_true): correct value

A loss function compares these two.

Two Common Loss Functions We Will Use

We will implement the two most important loss functions:

  1. Mean Squared Error (MSE) — for regression
  2. Binary Cross-Entropy — for binary classification

1. Mean Squared Error (MSE)

Definition

For a single prediction:

\text{MSE} = (y_{\text{true}} - y_{\text{pred}})^2

For multiple predictions, we average the squared errors.

Intuition

  • Penalizes large errors more than small ones
  • Smooth and easy to optimize
  • Common in regression problems

Implementing MSE in Python

Python
def mean_squared_error(y_true, y_pred):
    # Squared difference between the target and the prediction
    return (y_true - y_pred) ** 2

Example

Python
y_true = 3.0
y_pred = 2.5
loss = mean_squared_error(y_true, y_pred)
print(loss)

Output:

0.25

The prediction is close, so the loss is small.
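
The function above handles a single prediction. For multiple predictions, we take the mean of the squared errors. Here is a minimal sketch of that averaged form (the name mean_squared_error_batch is mine for illustration, not from the series code):

Python
def mean_squared_error_batch(y_true_list, y_pred_list):
    # Average the squared error over all (target, prediction) pairs
    total = 0.0
    for y_true, y_pred in zip(y_true_list, y_pred_list):
        total += (y_true - y_pred) ** 2
    return total / len(y_true_list)

print(mean_squared_error_batch([3.0, 1.0], [2.5, 3.0]))  # (0.25 + 4.0) / 2 = 2.125

Note how the second prediction, which is off by 2.0 instead of 0.5, contributes 4.0 rather than 0.25: squaring is exactly what makes large errors dominate the loss.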

2. Binary Cross-Entropy Loss

Used when:

  • Output represents a probability
  • Target is 0 or 1
  • Final activation is typically sigmoid

Definition

\text{Loss} = -\left( y \log(p) + (1 - y)\log(1 - p) \right)

Where:

  • y is the true label (0 or 1)
  • p is the predicted probability

Why This Formula Works

  • Confident wrong predictions are punished heavily
  • Confident correct predictions are rewarded
  • Encourages calibrated probabilities

Implementing Binary Cross-Entropy in Python

Python
import math

def binary_cross_entropy(y_true, y_pred, epsilon=1e-9):
    # Clamp the prediction away from exactly 0 and 1 to avoid log(0)
    y_pred = min(max(y_pred, epsilon), 1 - epsilon)
    return -(
        y_true * math.log(y_pred) +
        (1 - y_true) * math.log(1 - y_pred)
    )

The epsilon prevents numerical issues with log(0).
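
As a quick, hypothetical check (not part of the repository code) of why the clamp matters: without it, a prediction of exactly 0.0 or 1.0 on the wrong side would call log(0), which raises an error in Python.

Python
import math

# Without clamping, log(0) raises ValueError: math domain error
try:
    math.log(0.0)
except ValueError as error:
    print("Unclamped:", error)

# With the epsilon clamp, the same case yields a large but finite loss
print("Clamped:", binary_cross_entropy(y_true=1, y_pred=0.0))  # ~20.7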

Example

Python
y_true = 1
y_pred = 0.9
loss = binary_cross_entropy(y_true, y_pred)
print(loss)

Output:

~0.105

A confident and correct prediction yields low loss.
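
To see the asymmetry described earlier, compare a few predictions against the same target. This is a quick check using the function above, not repository code:

Python
y_true = 1
for y_pred in (0.9, 0.6, 0.1):
    print(y_pred, round(binary_cross_entropy(y_true, y_pred), 3))

# Output (approximately):
# 0.9 0.105
# 0.6 0.511
# 0.1 2.303

Predicting 0.1 for a true label of 1 costs roughly twenty times as much as predicting 0.9. That pressure is what pushes the network toward calibrated probabilities.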

Loss Functions Are Just Functions

This is a key insight:

A loss function is just a mathematical function applied after forward propagation.

Nothing magical happens here.

The network:

  1. Produces an output
  2. Compares it to the target
  3. Produces a scalar loss

Where Loss Fits in the Pipeline

At this point, the full pipeline looks like this:

  1. Forward propagation → prediction
  2. Loss function → error measurement
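
To make the two stages concrete, here is a minimal sketch that wires them together for a single sigmoid neuron, reusing the binary_cross_entropy defined above (the inputs, weights, and bias are made up for illustration; the series repository keeps each component in its own module):

Python
import math

def sigmoid(x):
    return 1 / (1 + math.exp(-x))

# Stage 1: forward propagation -> prediction
inputs = [0.5, -1.0]
weights = [0.8, 0.2]
bias = 0.1
z = sum(w * x for w, x in zip(weights, inputs)) + bias
y_pred = sigmoid(z)                            # ~0.574

# Stage 2: loss function -> error measurement
y_true = 1
loss = binary_cross_entropy(y_true, y_pred)    # ~0.554
print(y_pred, loss)

Nothing in this snippet changes the weights. The loss only tells us how far off the prediction is.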

What we still cannot do is improve the network.

For that, we need gradients.

Common Beginner Mistakes

Mistake 1: Using the wrong loss function
→ For example, using MSE for a classification problem, or cross-entropy for regression.

Mistake 2: Forgetting numerical stability
→ log(0) causes a crash; clamp predictions with a small epsilon.

Mistake 3: Thinking loss updates weights
→ Loss only measures error — it does not fix it.

What We Have Built So Far

At this point, we have:

  • A neural network that produces predictions
  • A loss function that measures error

This is enough to evaluate the network.

But not enough to train it.

Training requires one final mechanism:

Understanding how each weight affects the loss.

That mechanism is backpropagation.

What’s Next in the Series

In Article #6, we will:

  • Introduce gradients intuitively
  • Explain why derivatives matter
  • Compute gradients by hand for a simple neuron
  • Prepare for full backpropagation

This is where learning truly begins.

GitHub Code

All loss functions introduced here are included in the repository:

👉 [link to your GitHub repository]

Each article adds one clean, isolated component.

Series Progress

You are reading:

Neural Networks From Scratch (Pure Python)
✔ Article #1 — What a Neuron Really Computes
✔ Article #2 — Activation Functions
✔ Article #3 — Building a Layer
✔ Article #4 — Forward Propagation
✔ Article #5 — Loss Functions
➡ Article #6 — Gradients and Backpropagation (Next)