SolveWithPython

Training a Neural Network End to End — The Complete Learning Loop in Python

In the previous articles, we built every component of a neural network:

  • Neurons and layers
  • Activation functions
  • Forward propagation
  • Loss functions
  • Gradients
  • Backpropagation through neurons and layers

Up to now, these pieces existed mostly in isolation.

In this article, we assemble them into the final structure:

A full training loop that makes a neural network learn.

This is the moment where everything finally clicks.

What “Training” Really Means

Training a neural network is not mysterious.

It is simply this cycle, repeated many times:

  1. Forward propagation
  2. Loss calculation
  3. Backpropagation
  4. Parameter update

Each full pass through the training data is called an epoch.

Learning happens because small improvements accumulate.
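
Before wiring the full network together, here is the same four-step cycle on the smallest possible model: a single weight fit to one example. This is a toy sketch (the values x, y_true, and the learning rate are arbitrary), but it is the exact loop we will build at full scale below:

Python
# the four-step cycle on the simplest possible model: y = w * x
w = 0.0                       # one trainable parameter
lr = 0.1
x, y_true = 2.0, 6.0          # a single training example (the ideal w is 3)
for step in range(20):
    y_pred = w * x                    # 1. forward propagation
    loss = (y_true - y_pred) ** 2     # 2. loss calculation
    grad = 2 * (y_pred - y_true) * x  # 3. backpropagation (chain rule)
    w -= lr * grad                    # 4. parameter update
print(w)  # close to 3.0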

The Minimal Network We Will Train

To keep things clear, we will train:

  • A small fully connected network
  • One hidden layer (ReLU)
  • One output neuron (linear)
  • Mean Squared Error loss

This is enough to demonstrate real learning.

Step 1: Define the Building Blocks

Activation Functions

Python
def relu(z):
    return max(0.0, z)

def relu_derivative(z):
    return 1.0 if z > 0 else 0.0
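
A quick sanity check on a few values confirms the behavior:

Python
print(relu(-2.0), relu(0.0), relu(3.0))             # 0.0 0.0 3.0
print(relu_derivative(-2.0), relu_derivative(3.0))  # 0.0 1.0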

Loss Function (MSE)

Python
def mean_squared_error(y_true, y_pred):
    return (y_true - y_pred) ** 2

def mse_derivative(y_true, y_pred):
    return 2 * (y_pred - y_true)
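
To convince yourself that mse_derivative really is the slope of the loss, a small finite-difference check works (a sketch; eps is just an arbitrary small step):

Python
y_true, y_pred, eps = 3.0, 2.5, 1e-6
numeric = (mean_squared_error(y_true, y_pred + eps)
           - mean_squared_error(y_true, y_pred - eps)) / (2 * eps)
print(numeric, mse_derivative(y_true, y_pred))  # both close to -1.0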

Step 2: Forward Pass for a Dense Layer

Python
def dense_forward(inputs, weights_list, bias_list, activation):
    z_list = []  # pre-activations, needed later for backprop
    a_list = []  # activations (the layer's outputs)
    for weights, bias in zip(weights_list, bias_list):
        z = sum(x * w for x, w in zip(inputs, weights)) + bias
        a = activation(z)
        z_list.append(z)
        a_list.append(a)
    return a_list, z_list
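
For example, a single call with two inputs and two neurons (the weights here are chosen by hand purely for illustration):

Python
a, z = dense_forward(
    [1.0, 2.0],                 # inputs
    [[0.5, -0.5], [1.0, 1.0]],  # one weight row per neuron
    [0.0, 0.5],                 # one bias per neuron
    relu,
)
print(z)  # [-0.5, 3.5]  pre-activations
print(a)  # [0.0, 3.5]   activations after ReLU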

Step 3: Backward Pass for a Dense Layer

Python
def dense_backward(inputs, z_list, dL_da_list, weights_list, activation_derivative):
    dL_dw = []                     # gradients for each neuron's weights
    dL_db = []                     # gradients for each neuron's bias
    dL_dx = [0.0 for _ in inputs]  # gradient flowing back to the inputs
    for i in range(len(weights_list)):
        da_dz = activation_derivative(z_list[i])
        dL_dz = dL_da_list[i] * da_dz  # chain rule through the activation
        neuron_dw = []
        for j in range(len(inputs)):
            neuron_dw.append(dL_dz * inputs[j])
            dL_dx[j] += dL_dz * weights_list[i][j]
        dL_dw.append(neuron_dw)
        dL_db.append(dL_dz)
    return dL_dw, dL_db, dL_dx
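
A useful habit is to verify the analytic gradients against finite differences. The sketch below builds a tiny ReLU layer whose loss depends on the first neuron's activation, perturbs one weight, and compares the numerical slope with the value from dense_backward (the layer sizes, weights, and eps are arbitrary):

Python
def layer_loss(inputs, weights, biases, target):
    # a loss that depends only on the first neuron's activation
    a, _ = dense_forward(inputs, weights, biases, relu)
    return mean_squared_error(target, a[0])

inputs  = [1.0, 2.0]
weights = [[0.3, 0.4], [0.1, -0.2]]
biases  = [0.1, 0.0]
target  = 2.0

# analytic gradient from dense_backward
a, z = dense_forward(inputs, weights, biases, relu)
dL_da = [mse_derivative(target, a[0]), 0.0]
dW, dB, _ = dense_backward(inputs, z, dL_da, weights, relu_derivative)

# numerical gradient for weight (0, 0)
eps = 1e-6
weights[0][0] += eps
loss_plus = layer_loss(inputs, weights, biases, target)
weights[0][0] -= 2 * eps
loss_minus = layer_loss(inputs, weights, biases, target)
weights[0][0] += eps  # restore the original weight
numeric = (loss_plus - loss_minus) / (2 * eps)

print(numeric, dW[0][0])  # both close to -1.6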

Step 4: Parameter Update

Python
def update_layer(weights_list, bias_list, dL_dw, dL_db, learning_rate):
    # gradient descent: move each parameter a small step against its gradient
    for i in range(len(weights_list)):
        for j in range(len(weights_list[i])):
            weights_list[i][j] -= learning_rate * dL_dw[i][j]
        bias_list[i] -= learning_rate * dL_db[i]
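
The update happens in place. A single call with made-up values shows the effect:

Python
weights = [[1.0, 2.0]]
biases  = [0.5]
dW      = [[0.2, -0.4]]
dB      = [0.1]

update_layer(weights, biases, dW, dB, learning_rate=0.1)
print(weights)  # [[0.98, 2.04]]
print(biases)   # [0.49]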

Step 5: Initialize a Small Network

Python
import random
random.seed(0)
# Network structure: 2 → 3 → 1
weights_hidden = [[random.uniform(-1, 1) for _ in range(2)] for _ in range(3)]  # 3 neurons, 2 inputs each
bias_hidden = [0.0, 0.0, 0.0]
weights_output = [[random.uniform(-1, 1) for _ in range(3)]]  # 1 neuron, 3 inputs
bias_output = [0.0]

Step 6: Training Data

We’ll use a simple regression problem:

Python
data = [
    ([1.0, 2.0], 4.0),
    ([2.0, 1.0], 3.0),
    ([3.0, 1.0], 5.0),
    ([1.0, 3.0], 5.0),
]
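
Before any training, the randomly initialized network will predict poorly. A quick check on the first example (the exact number depends on the random weights):

Python
x, y = data[0]
h, _ = dense_forward(x, weights_hidden, bias_hidden, relu)
out, _ = dense_forward(h, weights_output, bias_output, lambda z: z)
print(out[0], "vs target", y)  # typically far from 4.0 before training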

Step 7: The Training Loop

This is the heart of learning.

Python
learning_rate = 0.01
epochs = 1000

for epoch in range(epochs):
    total_loss = 0.0
    for inputs, target in data:
        # Forward pass
        hidden_output, hidden_z = dense_forward(
            inputs, weights_hidden, bias_hidden, relu
        )
        output, output_z = dense_forward(
            hidden_output, weights_output, bias_output, lambda z: z
        )
        prediction = output[0]
        loss = mean_squared_error(target, prediction)
        total_loss += loss

        # Backward pass (output layer)
        dL_dy = mse_derivative(target, prediction)
        dL_da_output = [dL_dy]
        dW_out, dB_out, dL_dhidden = dense_backward(
            hidden_output, output_z, dL_da_output, weights_output, lambda z: 1.0
        )

        # Backward pass (hidden layer)
        dW_hidden, dB_hidden, _ = dense_backward(
            inputs, hidden_z, dL_dhidden, weights_hidden, relu_derivative
        )

        # Update parameters
        update_layer(weights_output, bias_output, dW_out, dB_out, learning_rate)
        update_layer(weights_hidden, bias_hidden, dW_hidden, dB_hidden, learning_rate)

    if epoch % 100 == 0:
        print(f"Epoch {epoch}, Loss: {total_loss:.4f}")

Watching Learning Happen

As training progresses, you should see output along these lines (the exact numbers depend on the initial weights):

Output
Epoch 0, Loss: 6.23
Epoch 100, Loss: 0.91
Epoch 500, Loss: 0.12
Epoch 900, Loss: 0.03

This is real learning.

No frameworks.
No shortcuts.
Just math and Python.

What This Loop Actually Did

Each epoch:

  • Ran predictions
  • Measured error
  • Computed gradients
  • Updated weights
  • Repeated

Neural networks learn gradually, not instantly.

Why This Matters

At this point, you understand:

  • Exactly what .fit() does internally
  • Why learning rates matter
  • Why gradients can explode or vanish
  • How architectures actually learn
  • Why debugging neural networks is possible

Nothing in deep learning is hidden anymore.

Common Beginner Mistakes at This Stage

Mistake 1: Expecting instant convergence
→ Learning is incremental.

Mistake 2: Using a large learning rate
→ Leads to instability (see the short sketch after this list).

Mistake 3: Blaming the math
→ Most issues are implementation details.
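
To see Mistake 2 concretely, here is a minimal sketch of gradient descent on a one-variable loss with too large a step. Instead of settling at the minimum, the parameter overshoots further on every update:

Python
# gradient descent on f(w) = w**2 with too large a step
w, lr = 1.0, 1.5                # the minimum is at w = 0
for step in range(5):
    grad = 2 * w                # derivative of w**2
    w -= lr * grad
    print(step, w)              # |w| doubles each step: -2.0, 4.0, -8.0, ...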

What We Have Completed

You have now built:

  • A neural network from scratch
  • A working training loop
  • A real learning system in pure Python

This completes Part I of the series.

What Comes Next

In Part II, we will explore:

  • Improving performance
  • Vectorization with NumPy
  • Debugging training
  • Overfitting and regularization
  • Visualizing learning
  • Mapping everything to PyTorch

But the foundation is now solid.

GitHub Repository

All code from Articles #1–#10 lives here:

👉 [link to your GitHub repository]

Each article builds incrementally on the last.

Series Recap

Neural Networks From Scratch (Pure Python)
✔ Articles #1–#10 — Core Foundations Complete

If you have followed this series, you already understand neural networks at a deeper level than most users of deep learning frameworks.