In the previous articles, we built every component of a neural network:
- Neurons and layers
- Activation functions
- Forward propagation
- Loss functions
- Gradients
- Backpropagation through neurons and layers
Up to now, these pieces existed mostly in isolation.
In this article, we assemble them into the final structure:
A full training loop that makes a neural network learn.
This is the moment where everything finally clicks.
What “Training” Really Means
Training a neural network is not mysterious.
It is simply this cycle, repeated many times:
- Forward propagation
- Loss calculation
- Backpropagation
- Parameter update
One full pass of this cycle over the training data is called an epoch.
Learning happens because small improvements accumulate.
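Here is the same cycle in miniature, as a self-contained sketch: a one-parameter model y = w · x fitted to made-up targets from y = 2x. It is purely illustrative; the real version, built from the pieces below, arrives in Step 7.

```python
# Purely illustrative: the four-step cycle on a one-parameter model y = w * x.
w = 0.0
learning_rate = 0.1
toy_data = [(1.0, 2.0), (2.0, 4.0), (3.0, 6.0)]  # made-up samples of y = 2x

for epoch in range(20):
    for x, y_true in toy_data:
        y_pred = w * x                      # forward propagation
        loss = (y_true - y_pred) ** 2       # loss calculation
        grad_w = 2 * (y_pred - y_true) * x  # backpropagation (chain rule)
        w -= learning_rate * grad_w         # parameter update

print(w)  # approaches 2.0
```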
The Minimal Network We Will Train
To keep things clear, we will train:
- A small fully connected network
- One hidden layer (ReLU)
- One output neuron (linear)
- Mean Squared Error loss
This is enough to demonstrate real learning.
Step 1: Define the Building Blocks
Activation Functions
```python
def relu(z):
    return max(0.0, z)

def relu_derivative(z):
    return 1.0 if z > 0 else 0.0
```
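As a quick sanity check, here is how these behave on a couple of illustrative inputs (the expected output is shown in the comments):

```python
print(relu(2.5), relu(-1.0))                        # 2.5 0.0
print(relu_derivative(2.5), relu_derivative(-1.0))  # 1.0 0.0
```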
Loss Function (MSE)
```python
def mean_squared_error(y_true, y_pred):
    return (y_true - y_pred) ** 2

def mse_derivative(y_true, y_pred):
    return 2 * (y_pred - y_true)
```
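With an illustrative target of 4.0 and a prediction of 3.5:

```python
print(mean_squared_error(4.0, 3.5))  # 0.25
print(mse_derivative(4.0, 3.5))      # -1.0 (negative: nudging the prediction up reduces the loss)
```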
Step 2: Forward Pass for a Dense Layer
```python
def dense_forward(inputs, weights_list, bias_list, activation):
    z_list = []
    a_list = []
    for weights, bias in zip(weights_list, bias_list):
        z = sum(x * w for x, w in zip(inputs, weights)) + bias
        a = activation(z)
        z_list.append(z)
        a_list.append(a)
    return a_list, z_list
```
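To make the shapes concrete, here is a small check with one neuron and two inputs (the numbers are made up, not part of the network we train below):

```python
# z = 1.0 * 0.5 + 2.0 * (-0.25) + 0.1 = 0.1
a_list, z_list = dense_forward([1.0, 2.0], [[0.5, -0.25]], [0.1], relu)
print(z_list)  # [0.1]
print(a_list)  # [0.1]  (ReLU leaves positive values unchanged)
```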
Step 3: Backward Pass for a Dense Layer
```python
def dense_backward(inputs, z_list, dL_da_list, weights_list, activation_derivative):
    dL_dw = []                     # gradients w.r.t. each neuron's weights
    dL_db = []                     # gradients w.r.t. each neuron's bias
    dL_dx = [0.0 for _ in inputs]  # gradient w.r.t. the layer's inputs
    for i in range(len(weights_list)):
        da_dz = activation_derivative(z_list[i])
        dL_dz = dL_da_list[i] * da_dz
        neuron_dw = []
        for j in range(len(inputs)):
            neuron_dw.append(dL_dz * inputs[j])
            dL_dx[j] += dL_dz * weights_list[i][j]
        dL_dw.append(neuron_dw)
        dL_db.append(dL_dz)
    return dL_dw, dL_db, dL_dx
```
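A habit worth building when writing backward passes by hand: check them numerically. The sketch below (with made-up values, separate from our network) compares the analytic gradient from dense_backward against a central-difference estimate; the two should agree to several decimal places.

```python
# Made-up values for a single-neuron layer
x_check = [1.0, 2.0]
w_check = [[0.5, -0.25]]
b_check = [0.1]
y_check = 1.0

def loss_for(w):
    a, _ = dense_forward(x_check, w, b_check, relu)
    return mean_squared_error(y_check, a[0])

# Analytic gradient from the backward pass
a_check, z_check = dense_forward(x_check, w_check, b_check, relu)
grads_w, _, _ = dense_backward(
    x_check, z_check, [mse_derivative(y_check, a_check[0])], w_check, relu_derivative
)

# Central-difference estimate for the first weight
eps = 1e-6
numerical = (loss_for([[0.5 + eps, -0.25]]) - loss_for([[0.5 - eps, -0.25]])) / (2 * eps)

print(grads_w[0][0], numerical)  # both approximately -1.8
```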
Step 4: Parameter Update
```python
def update_layer(weights_list, bias_list, dL_dw, dL_db, learning_rate):
    for i in range(len(weights_list)):
        for j in range(len(weights_list[i])):
            weights_list[i][j] -= learning_rate * dL_dw[i][j]
        bias_list[i] -= learning_rate * dL_db[i]
```
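A single call with made-up gradients shows the rule in action: every parameter takes a small step against its gradient.

```python
w = [[0.5, -0.25]]
b = [0.1]
update_layer(w, b, [[1.0, 2.0]], [0.5], learning_rate=0.1)
print(w)  # roughly [[0.4, -0.45]]
print(b)  # roughly [0.05]
```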
Step 5: Initialize a Small Network
```python
import random

random.seed(0)

# Network structure: 2 → 3 → 1
weights_hidden = [[random.uniform(-1, 1) for _ in range(2)] for _ in range(3)]
bias_hidden = [0.0, 0.0, 0.0]

weights_output = [[random.uniform(-1, 1) for _ in range(3)]]
bias_output = [0.0]
```
Step 6: Training Data
We’ll use a simple regression problem:
```python
data = [
    ([1.0, 2.0], 4.0),
    ([2.0, 1.0], 3.0),
    ([3.0, 1.0], 5.0),
    ([1.0, 3.0], 5.0),
]
```
Step 7: The Training Loop
This is the heart of learning.
```python
learning_rate = 0.01
epochs = 1000

for epoch in range(epochs):
    total_loss = 0.0
    for inputs, target in data:
        # Forward pass
        hidden_output, hidden_z = dense_forward(
            inputs, weights_hidden, bias_hidden, relu
        )
        output, output_z = dense_forward(
            hidden_output, weights_output, bias_output, lambda z: z
        )
        prediction = output[0]
        loss = mean_squared_error(target, prediction)
        total_loss += loss

        # Backward pass (output layer)
        dL_dy = mse_derivative(target, prediction)
        dL_da_output = [dL_dy]
        dW_out, dB_out, dL_dhidden = dense_backward(
            hidden_output, output_z, dL_da_output, weights_output, lambda z: 1.0
        )

        # Backward pass (hidden layer)
        dW_hidden, dB_hidden, _ = dense_backward(
            inputs, hidden_z, dL_dhidden, weights_hidden, relu_derivative
        )

        # Update parameters
        update_layer(weights_output, bias_output, dW_out, dB_out, learning_rate)
        update_layer(weights_hidden, bias_hidden, dW_hidden, dB_hidden, learning_rate)

    if epoch % 100 == 0:
        print(f"Epoch {epoch}, Loss: {total_loss:.4f}")
```
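Once the loop finishes, you can reuse the same forward functions to make a prediction and confirm the network has actually fit the data (the exact number depends on the random seed):

```python
hidden, _ = dense_forward([1.0, 2.0], weights_hidden, bias_hidden, relu)
output, _ = dense_forward(hidden, weights_output, bias_output, lambda z: z)
print(output[0])  # should land close to the target 4.0
```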
Watching Learning Happen
As training progresses, you should see output like:
```
Epoch 0, Loss: 6.23
Epoch 100, Loss: 0.91
Epoch 500, Loss: 0.12
Epoch 900, Loss: 0.03
```
This is real learning.
No frameworks.
No shortcuts.
Just math and Python.
What This Loop Actually Did
Each epoch:
- Ran predictions
- Measured error
- Computed gradients
- Updated weights
- Repeated
Neural networks learn gradually, not instantly.
Why This Matters
At this point, you understand:
- Exactly what .fit() does internally
- Why learning rates matter
- Why gradients can explode or vanish
- How architectures actually learn
- Why debugging neural networks is possible
Nothing in deep learning is hidden anymore.
Common Beginner Mistakes at This Stage
Mistake 1: Expecting instant convergence
→ Learning is incremental.
Mistake 2: Using a large learning rate
→ Leads to instability; a small demonstration follows after Mistake 3.
Mistake 3: Blaming the math
→ Most issues are implementation details.
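To see Mistake 2 concretely, here is a tiny self-contained sketch, separate from the network above: gradient descent on f(w) = w², where a step size that is too large makes the iterates grow instead of shrink.

```python
def descend(learning_rate, steps=5, w=1.0):
    for _ in range(steps):
        w -= learning_rate * 2 * w  # the gradient of w**2 is 2 * w
    return w

print(descend(0.1))  # about 0.33: steadily approaching the minimum at 0
print(descend(1.1))  # about -2.49: every step overshoots and the value grows
```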
What We Have Completed
You have now built:
- A neural network from scratch
- A working training loop
- A real learning system in pure Python
This completes Part I of the series.
What Comes Next
In Part II, we will explore:
- Improving performance
- Vectorization with NumPy
- Debugging training
- Overfitting and regularization
- Visualizing learning
- Mapping everything to PyTorch
But the foundation is now solid.
GitHub Repository
All code from Articles #1–#10 lives here:
👉 [link to your GitHub repository]
Each article builds incrementally on the last.
Series Recap
Neural Networks From Scratch (Pure Python)
✔ Articles #1–#10 — Core Foundations Complete
If you have followed this series, you already understand neural networks at a deeper level than most users of deep learning frameworks.