SolveWithPython

Loss Functions in Neural Networks — Measuring How Wrong the Network Is

So far, we have built a complete neural network that can:

  • Accept inputs
  • Pass them through multiple layers
  • Apply non-linear activation functions
  • Produce an output

At this point, the network works.

But it does not learn.

To learn, the network must answer one critical question:

How wrong is this prediction?

The mechanism that answers this question is called a loss function.

This article introduces loss functions from first principles and implements them in pure Python.

What Is a Loss Function?

A loss function is a mathematical function that measures the difference between:

  • The network’s prediction
  • The true (expected) value

It produces a single number:

  • Low loss → good prediction
  • High loss → bad prediction

Learning is simply the process of reducing this number over time.

Why Loss Functions Matter

Without a loss function:

  • The network has no feedback
  • There is no notion of “better” or “worse”
  • We cannot adjust weights meaningfully

Loss functions convert prediction quality into a numeric signal that optimization algorithms can act on.

Prediction vs Target

Let’s establish terminology:

  • Prediction (y_pred): output of the network
  • Target (y_true): correct value

A loss function compares these two.

Two Common Loss Functions We Will Use

We will implement the two most important loss functions:

  1. Mean Squared Error (MSE) — for regression
  2. Binary Cross-Entropy — for binary classification

1. Mean Squared Error (MSE)

Definition

For a single prediction:

\text{MSE} = (y_{\text{true}} - y_{\text{pred}})^2

For multiple predictions, we average the squared errors.

Intuition

  • Penalizes large errors more than small ones
  • Smooth and easy to optimize
  • Common in regression problems

Implementing MSE in Python

Python
def mean_squared_error(y_true, y_pred):
    # Squared difference between the target and the prediction
    return (y_true - y_pred) ** 2

Example

Python
y_true = 3.0
y_pred = 2.5
loss = mean_squared_error(y_true, y_pred)
print(loss)

Output:

0.25

The prediction is close, so the loss is small.
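
The function above handles a single prediction. For multiple predictions, we take the mean of the squared errors. Here is a minimal sketch of that averaged form (the name mean_squared_error_batch is mine for illustration, not from the series code):

Python
def mean_squared_error_batch(y_true_list, y_pred_list):
    # Average the squared error over all (target, prediction) pairs
    total = 0.0
    for y_true, y_pred in zip(y_true_list, y_pred_list):
        total += (y_true - y_pred) ** 2
    return total / len(y_true_list)

print(mean_squared_error_batch([3.0, 1.0], [2.5, 3.0]))  # (0.25 + 4.0) / 2 = 2.125

Note how the second prediction, which is off by 2.0 instead of 0.5, contributes 4.0 rather than 0.25: squaring is exactly what makes large errors dominate the loss.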

2. Binary Cross-Entropy Loss

Used when:

  • Output represents a probability
  • Target is 0 or 1
  • Final activation is typically sigmoid

Definition

\text{Loss} = -\left( y \log(p) + (1 - y)\log(1 - p) \right)

Where:

  • y is the true label (0 or 1)
  • p is the predicted probability

Why This Formula Works

  • Confident wrong predictions are punished heavily
  • Confident correct predictions are rewarded
  • Encourages calibrated probabilities

Implementing Binary Cross-Entropy in Python

Python
import math

def binary_cross_entropy(y_true, y_pred, epsilon=1e-9):
    # Clamp the prediction away from exactly 0 and 1 to avoid log(0)
    y_pred = min(max(y_pred, epsilon), 1 - epsilon)
    return -(
        y_true * math.log(y_pred) +
        (1 - y_true) * math.log(1 - y_pred)
    )

The epsilon prevents numerical issues with log(0).
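
As a quick, hypothetical check (not part of the repository code) of why the clamp matters: without it, a prediction of exactly 0.0 or 1.0 on the wrong side would call log(0), which raises an error in Python.

Python
import math

# Without clamping, log(0) raises ValueError: math domain error
try:
    math.log(0.0)
except ValueError as error:
    print("Unclamped:", error)

# With the epsilon clamp, the same case yields a large but finite loss
print("Clamped:", binary_cross_entropy(y_true=1, y_pred=0.0))  # ~20.7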

Example

Python
y_true = 1
y_pred = 0.9
loss = binary_cross_entropy(y_true, y_pred)
print(loss)

Output:

~0.105

A confident and correct prediction yields low loss.
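
To see the asymmetry described earlier, compare a few predictions against the same target. This is a quick check using the function above, not repository code:

Python
y_true = 1
for y_pred in (0.9, 0.6, 0.1):
    print(y_pred, round(binary_cross_entropy(y_true, y_pred), 3))

# Output (approximately):
# 0.9 0.105
# 0.6 0.511
# 0.1 2.303

Predicting 0.1 for a true label of 1 costs roughly twenty times as much as predicting 0.9. That pressure is what pushes the network toward calibrated probabilities.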

Loss Functions Are Just Functions

This is a key insight:

A loss function is just a mathematical function applied after forward propagation.

Nothing magical happens here.

The network:

  1. Produces an output
  2. Compares it to the target
  3. Produces a scalar loss

Where Loss Fits in the Pipeline

At this point, the full pipeline looks like this:

  1. Forward propagation → prediction
  2. Loss function → error measurement
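
To make the two stages concrete, here is a minimal sketch that wires them together for a single sigmoid neuron, reusing the binary_cross_entropy defined above (the inputs, weights, and bias are made up for illustration; the series repository keeps each component in its own module):

Python
import math

def sigmoid(x):
    return 1 / (1 + math.exp(-x))

# Stage 1: forward propagation -> prediction
inputs = [0.5, -1.0]
weights = [0.8, 0.2]
bias = 0.1
z = sum(w * x for w, x in zip(weights, inputs)) + bias
y_pred = sigmoid(z)                            # ~0.574

# Stage 2: loss function -> error measurement
y_true = 1
loss = binary_cross_entropy(y_true, y_pred)    # ~0.554
print(y_pred, loss)

Nothing in this snippet changes the weights. The loss only tells us how far off the prediction is.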

What we still cannot do is improve the network.

For that, we need gradients.

Common Beginner Mistakes

Mistake 1: Using the wrong loss function
→ For example, using MSE for a classification problem, or cross-entropy for regression.

Mistake 2: Forgetting numerical stability
→ log(0) causes a crash; clamp predictions with a small epsilon.

Mistake 3: Thinking loss updates weights
→ Loss only measures error — it does not fix it.

What We Have Built So Far

At this point, we have:

  • A neural network that produces predictions
  • A loss function that measures error

This is enough to evaluate the network.

But not enough to train it.

Training requires one final mechanism:

Understanding how each weight affects the loss.

That mechanism is backpropagation.

What’s Next in the Series

In Article #6, we will:

  • Introduce gradients intuitively
  • Explain why derivatives matter
  • Compute gradients by hand for a simple neuron
  • Prepare for full backpropagation

This is where learning truly begins.

GitHub Code

All loss functions introduced here are included in the repository:

👉 [link to your GitHub repository]

Each article adds one clean, isolated component.

Series Progress

You are reading:

Neural Networks From Scratch (Pure Python)
✔ Article #1 — What a Neuron Really Computes
✔ Article #2 — Activation Functions
✔ Article #3 — Building a Layer
✔ Article #4 — Forward Propagation
✔ Article #5 — Loss Functions
➡ Article #6 — Gradients and Backpropagation (Next)