In the previous article, we built a real artificial neuron in pure Python.
It took inputs, applied weights, added a bias, and produced an output.
At that point, we had something important—but also something fundamentally limited.
A network made only of those neurons cannot learn complex patterns, no matter how many layers you stack.
This article explains why, and introduces the single idea that turns linear math into learning:
activation functions.
The Hidden Problem With Linear Neurons
Recall the neuron we built in Article #1. It multiplies each input by a weight, sums the products, and adds a bias:

z = w1*x1 + w2*x2 + ... + wn*xn + b

This is a linear function of its inputs.
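As a refresher, here is a minimal sketch of that neuron in pure Python. It is reconstructed from the version extended later in this article, minus the activation step, so the exact name and structure may differ slightly from the Article #1 file:

def linear_neuron(inputs, weights, bias):
    # weighted sum of inputs plus bias: a purely linear computation
    total = 0.0
    for x, w in zip(inputs, weights):
        total += x * w
    return total + bias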
Now here is the key insight:
A stack of linear functions is still just a linear function.
That means:
- 1 layer → linear
- 10 layers → still linear
- 1,000 layers → still linear
No matter how deep the network is, it cannot model non-linear relationships.
This is why a neural network without activation functions is, mathematically, no more powerful than a single linear model.
A Simple Proof (Intuition, Not Formal Math)
Suppose we have two layers, each computing a weighted sum plus a bias:

Layer 1: h = W1*x + b1

Layer 2: y = W2*h + b2

Substitute the first into the second:

y = W2*(W1*x + b1) + b2

Which simplifies to:

y = (W2*W1)*x + (W2*b1 + b2)

Here (W2*W1) is just another weight and (W2*b1 + b2) is just another bias, so the result is still a single linear transformation.

Depth alone does nothing.
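To see this numerically, here is a small sketch (an added example, not from the article's repo) that stacks two one-input linear layers and shows that a single combined linear layer produces identical outputs:

# two stacked linear layers, no activation
w1, b1 = 0.8, 0.5    # layer 1
w2, b2 = -1.5, 2.0   # layer 2

def two_linear_layers(x):
    h = w1 * x + b1        # layer 1
    return w2 * h + b2     # layer 2

# collapse both layers into one equivalent linear layer
w_combined = w2 * w1
b_combined = w2 * b1 + b2

def one_linear_layer(x):
    return w_combined * x + b_combined

for x in [-2.0, 0.0, 3.0]:
    print(two_linear_layers(x), one_linear_layer(x))  # identical values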
What Activation Functions Do
Activation functions introduce non-linearity.
Instead of outputting z directly, a neuron outputs:

a = f(z)

where f is a non-linear function.
This one change allows neural networks to:
- Bend decision boundaries
- Learn curves, shapes, and patterns
- Approximate complex functions
Without activation functions, neural networks collapse into linear regression.
The Two Most Important Activation Functions
We will start with the two that matter most conceptually.
1. ReLU (Rectified Linear Unit)
Definition: ReLU(z) = max(0, z)
Interpretation:
- Negative values → 0
- Positive values → unchanged
Why it works well:
- Simple
- Efficient
- Avoids saturation for positive values
- Dominates modern deep learning
Implementing ReLU in Python
def relu(z):
    return max(0.0, z)
Example:
print(relu(-3.0))  # 0.0
print(relu(2.5))   # 2.5
2. Sigmoid
Definition: sigmoid(z) = 1 / (1 + e^(-z))
Interpretation:
- Maps values to the range (0, 1)
- Can be interpreted as probability
When it’s used:
- Binary classification
- Output layers
Implementing Sigmoid in Python
import math

def sigmoid(z):
    return 1 / (1 + math.exp(-z))
Example:
print(sigmoid(-2.0))  # ~0.12
print(sigmoid(0.0))   # 0.5
print(sigmoid(2.0))   # ~0.88
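One practical aside (an addition, not part of the article's repo): math.exp(-z) overflows for very negative z, so a numerically safer variant is sometimes used. A minimal sketch:

import math

def sigmoid_stable(z):
    # avoid overflow in math.exp when z is a large negative number
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)

For the small values used in this article, the simple version above is perfectly fine.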
Adding Activation to Our Neuron
Let’s extend the neuron from Article #1.
def neuron(inputs, weights, bias, activation):
    total = 0.0
    for x, w in zip(inputs, weights):
        total += x * w
    total += bias
    return activation(total)
Now the neuron is no longer purely linear.
Example: Neuron With ReLU
inputs = [2.0, 3.0]
weights = [0.5, -1.0]
bias = 1.0

output = neuron(inputs, weights, bias, relu)
print(output)
Previously, the raw output was -1.0.
After ReLU:
relu(-1.0) → 0.0
This single decision changes how information flows through the network.
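For comparison (an extra example, not in the original article), swapping sigmoid into the same neuron gives a small but non-zero output, because sigmoid squashes -1.0 into the range (0, 1) instead of clipping it:

output = neuron(inputs, weights, bias, sigmoid)
print(output)  # ~0.27, since sigmoid(-1.0) ≈ 0.27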
Why Non-Linearity Enables Learning
With activation functions:
- Different neurons activate for different regions of input space
- Layers can progressively reshape the data
- Decision boundaries become curved instead of straight
This is what allows neural networks to solve problems like:
- XOR (a small hand-wired sketch follows below)
- Image recognition
- Speech
- Language
Without activation functions, none of that is possible.
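To make the XOR point concrete, here is a hand-wired sketch (the weights are picked by hand for illustration, not learned, and this example is not part of the article's repo). Two ReLU neurons plus a plain weighted sum of their outputs compute XOR, something no single linear neuron can do:

def xor_network(x1, x2):
    # hidden layer: two ReLU neurons with hand-picked weights
    h1 = neuron([x1, x2], [1.0, 1.0], 0.0, relu)
    h2 = neuron([x1, x2], [1.0, 1.0], -1.0, relu)
    # output: a plain weighted sum of the hidden activations
    return h1 - 2.0 * h2

for a in (0.0, 1.0):
    for b in (0.0, 1.0):
        print(a, b, xor_network(a, b))
# prints 0.0 for (0,0) and (1,1), and 1.0 for (0,1) and (1,0)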
Common Beginner Mistakes
Mistake 1: Using no activation at all
→ The network becomes linear regression.
Mistake 2: Using sigmoid everywhere
→ Gradients vanish in deep networks.
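A quick illustration of that saturation (an added sketch, not from the article): far from zero, sigmoid flattens out, and its slope, sigmoid(z) * (1 - sigmoid(z)), shrinks toward zero. When many sigmoid layers are stacked, those tiny slopes get multiplied together during training, which is what "vanishing gradients" refers to:

for z in [0.0, 2.0, 5.0, 10.0]:
    s = sigmoid(z)
    print(z, round(s, 5), round(s * (1 - s), 5))  # slope shrinks toward 0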
Mistake 3: Thinking activation is optional
→ It is the core of neural networks.
What We Have Built So Far
We now have:
- A neuron
- Weights and bias
- A non-linear activation function
- A neuron capable of expressing complex behavior
But we still have a limitation:
Our neuron works alone.
Neural networks gain power when neurons work in groups.
What’s Next in the Series
In Article #3, we will:
- Combine neurons into a layer
- Implement a dense (fully connected) layer in pure Python
- Understand how data flows through multiple neurons
- Prepare for full forward propagation
This is where a neural network truly begins.
GitHub Code
All code for this article is available here:
👉 https://github.com/Benard-Kemp/Activation-Functions-in-Neural-Networks
Each article adds exactly one new concept and one new file.
Series Progress
You are reading:
Neural Networks From Scratch (Pure Python)
✔ Article #1 — What a Neuron Really Computes
✔ Article #2 — Activation Functions
➡ Article #3 — Building a Layer From Neurons