SolveWithPython

Activation Functions in Neural Networks — Why a Network Without Them Cannot Learn

In the previous article, we built a real artificial neuron in pure Python.
It took inputs, applied weights, added a bias, and produced an output.

At that point, we had something important—but also something fundamentally limited.

A network made only of those neurons cannot learn complex patterns, no matter how many layers you stack.

This article explains why, and introduces the single idea that turns linear math into learning:
activation functions.

The Hidden Problem With Linear Neurons

Recall the neuron we built:

z = (x1 · w1) + (x2 · w2) + ... + b

This is a linear function.

Now here is the key insight:

A stack of linear functions is still just a linear function.

That means:

  • 1 layer → linear
  • 10 layers → still linear
  • 1,000 layers → still linear

No matter how deep the network is, it cannot model non-linear relationships.

This is why a neural network without activation functions is mathematically pointless.

A Simple Proof (Intuition, Not Formal Math)

Suppose we have two layers:

Layer 1: z1 = W1 · x + b1

Layer 2: z2 = W2 · z1 + b2

Substitute the first into the second:

z2 = W2 · (W1 · x + b1) + b2

Which simplifies to:

z2 = (W2 · W1) · x + (W2 · b1 + b2)

That is still a single linear transformation.

Depth alone does nothing.
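
A quick numeric check makes this concrete. The sketch below uses made-up weights and a single input per layer for simplicity; it runs an input through two linear "layers" and then through one collapsed linear layer, and the outputs always match.

Python
w1, b1 = 2.0, 1.0    # layer 1: z1 = w1 * x + b1  (hypothetical weights)
w2, b2 = -0.5, 3.0   # layer 2: z2 = w2 * z1 + b2 (hypothetical weights)
def two_layers(x):
    z1 = w1 * x + b1
    return w2 * z1 + b2
def one_layer(x):
    # both layers collapsed into a single weight and a single bias
    return (w2 * w1) * x + (w2 * b1 + b2)
for x in [-2.0, 0.0, 1.0, 4.0]:
    print(two_layers(x), one_layer(x))  # identical outputs for every x

However the two weights are chosen, the composition is always just another straight line.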

What Activation Functions Do

Activation functions introduce non-linearity.

Instead of outputting z directly, a neuron outputs:

a = f(z)

where f is a non-linear function.

This one change allows neural networks to:

  • Bend decision boundaries
  • Learn curves, shapes, and patterns
  • Approximate complex functions

Without activation functions, neural networks collapse into linear regression.

The Two Most Important Activation Functions

We will start with the two that matter most conceptually.

1. ReLU (Rectified Linear Unit)

Definition:

ReLU(z) = max(0, z)

Interpretation:

  • Negative values → 0
  • Positive values → unchanged

Why it works well:

  • Simple
  • Efficient
  • Avoids saturation for positive values
  • Dominates modern deep learning

Implementing ReLU in Python

Python
def relu(z):
    return max(0.0, z)

Example:

Python
print(relu(-3.0)) # 0.0
print(relu(2.5)) # 2.5
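
ReLU is applied element-wise, one value at a time. As a small sketch (the list of pre-activation values here is made up), this is what it looks like when several values pass through at once, which is exactly what will happen inside a layer:

Python
z_values = [-3.0, -0.5, 0.0, 1.2, 4.0]     # hypothetical pre-activations
activations = [relu(z) for z in z_values]
print(activations)                         # [0.0, 0.0, 0.0, 1.2, 4.0]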

2. Sigmoid

Definition:

σ(z) = 1 / (1 + e^(-z))

Interpretation:

  • Maps values to the range (0, 1)
  • Can be interpreted as probability

When it’s used:

  • Binary classification
  • Output layers

Implementing Sigmoid in Python

Python
import math
def sigmoid(z):
    return 1 / (1 + math.exp(-z))

Example:

Python
print(sigmoid(-2.0)) # ~0.12
print(sigmoid(0.0)) # 0.5
print(sigmoid(2.0)) # ~0.88

Adding Activation to Our Neuron

Let’s extend the neuron from Article #1.

Python
def neuron(inputs, weights, bias, activation):
    # weighted sum of inputs plus bias, exactly as in Article #1
    total = 0.0
    for x, w in zip(inputs, weights):
        total += x * w
    total += bias
    # the only new step: pass the result through the activation function
    return activation(total)

Now the neuron is no longer purely linear.

Example: Neuron With ReLU

Python
inputs = [2.0, 3.0]
weights = [0.5, -1.0]
bias = 1.0
output = neuron(inputs, weights, bias, relu)
print(output)

Previously, the raw output was -1.0.

After ReLU:

  • relu(-1.0) → 0.0

This single decision changes how information flows through the network.
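
For comparison, here is the same neuron with sigmoid instead of ReLU, a minimal sketch that reuses the functions defined above. The negative pre-activation is dampened rather than silenced:

Python
output = neuron(inputs, weights, bias, sigmoid)
print(output)  # ~0.27, because sigmoid(-1.0) ≈ 0.27

The choice of activation decides whether a negative pre-activation is cut to zero (ReLU) or squashed into a small positive value (sigmoid).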

Why Non-Linearity Enables Learning

With activation functions:

  • Different neurons activate for different regions of input space
  • Layers can progressively reshape the data
  • Decision boundaries become curved instead of straight

This is what allows neural networks to solve problems like:

  • XOR
  • Image recognition
  • Speech
  • Language

Without activation functions, none of that is possible. The sketch below shows XOR as a concrete case.
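
Here is a minimal sketch with hand-picked (not learned) weights, reusing relu from above. Two hidden ReLU neurons feed one linear output neuron, and together they compute XOR, which no single linear neuron can do:

Python
def xor_net(x1, x2):
    # hidden layer: two ReLU neurons with hand-picked weights
    h1 = relu(x1 + x2)        # fires when at least one input is 1
    h2 = relu(x1 + x2 - 1.0)  # fires only when both inputs are 1
    # linear output neuron combines the two hidden activations
    return h1 - 2.0 * h2
for a, b in [(0, 0), (0, 1), (1, 0), (1, 1)]:
    print(a, b, "->", xor_net(a, b))  # prints 0, 1, 1, 0

Remove relu from the hidden layer and the same weights collapse into the linear function 2 - x1 - x2, which cannot separate XOR.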

Common Beginner Mistakes

Mistake 1: Using no activation at all
→ The network becomes linear regression.

Mistake 2: Using sigmoid everywhere
→ Gradients vanish in deep networks (illustrated in the sketch after this list).

Mistake 3: Thinking activation is optional
→ It is the core of neural networks.
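
Mistake 2 becomes clearer if you look at the slope of the sigmoid. Its derivative is σ(z) · (1 - σ(z)), which never exceeds 0.25 and shrinks towards zero for large positive or negative z. A small sketch, reusing the sigmoid defined above:

Python
def sigmoid_slope(z):
    s = sigmoid(z)
    return s * (1 - s)       # derivative of sigmoid at z
print(sigmoid_slope(0.0))    # 0.25 -- the largest slope sigmoid ever has
print(sigmoid_slope(5.0))    # ~0.0066
print(sigmoid_slope(10.0))   # ~0.000045

When many of these tiny slopes are multiplied together across deep layers, the learning signal all but vanishes, which is why sigmoid is usually kept for the output layer.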

What We Have Built So Far

We now have:

  • A neuron
  • Weights and bias
  • A non-linear activation function
  • A neuron capable of expressing complex behavior

But we still have a limitation:

Our neuron works alone.

Neural networks gain power when neurons work in groups.

What’s Next in the Series

In Article #3, we will:

  • Combine neurons into a layer
  • Implement a dense (fully connected) layer in pure Python
  • Understand how data flows through multiple neurons
  • Prepare for full forward propagation

This is where a neural network truly begins.

GitHub Code

All code for this article is available here:

👉 https://github.com/Benard-Kemp/Activation-Functions-in-Neural-Networks

Each article adds exactly one new concept and one new file.

Series Progress

You are reading:

Neural Networks From Scratch (Pure Python)
✔ Article #1 — What a Neuron Really Computes
✔ Article #2 — Activation Functions
➡ Article #3 — Building a Layer From Neurons