In the previous articles, we built every component of a neural network:
- Neurons and layers
- Activation functions
- Forward propagation
- Loss functions
- Gradients
- Backpropagation through neurons and layers
Up to now, these pieces existed mostly in isolation.
In this article, we assemble them into the final structure:
A full training loop that makes a neural network learn.
This is the moment where everything finally clicks.
What “Training” Really Means
Training a neural network is not mysterious.
It is simply this cycle, repeated many times:
- Forward propagation
- Loss calculation
- Backpropagation
- Parameter update
One full pass of this cycle over the training data is called an epoch.
Learning happens because small improvements accumulate.
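Here is the same cycle in miniature, as a self-contained sketch: a one-parameter model y = w · x fitted to made-up targets from y = 2x. It is purely illustrative; the real version, built from the pieces below, arrives in Step 7.

```python
# Purely illustrative: the four-step cycle on a one-parameter model y = w * x.
w = 0.0
learning_rate = 0.1
toy_data = [(1.0, 2.0), (2.0, 4.0), (3.0, 6.0)]  # made-up samples of y = 2x

for epoch in range(20):
    for x, y_true in toy_data:
        y_pred = w * x                      # forward propagation
        loss = (y_true - y_pred) ** 2       # loss calculation
        grad_w = 2 * (y_pred - y_true) * x  # backpropagation (chain rule)
        w -= learning_rate * grad_w         # parameter update

print(w)  # approaches 2.0
```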
The Minimal Network We Will Train
To keep things clear, we will train:
- A small fully connected network
- One hidden layer (ReLU)
- One output neuron (linear)
- Mean Squared Error loss
This is enough to demonstrate real learning.
Step 1: Define the Building Blocks
Activation Functions
```python
def relu(z):
    return max(0.0, z)

def relu_derivative(z):
    return 1.0 if z > 0 else 0.0
```
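As a quick sanity check, here is how these behave on a couple of illustrative inputs (the expected output is shown in the comments):

```python
print(relu(2.5), relu(-1.0))                        # 2.5 0.0
print(relu_derivative(2.5), relu_derivative(-1.0))  # 1.0 0.0
```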
Loss Function (MSE)
```python
def mean_squared_error(y_true, y_pred):
    return (y_true - y_pred) ** 2

def mse_derivative(y_true, y_pred):
    return 2 * (y_pred - y_true)
```
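With an illustrative target of 4.0 and a prediction of 3.5:

```python
print(mean_squared_error(4.0, 3.5))  # 0.25
print(mse_derivative(4.0, 3.5))      # -1.0 (negative: nudging the prediction up reduces the loss)
```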
Step 2: Forward Pass for a Dense Layer
```python
def dense_forward(inputs, weights_list, bias_list, activation):
    z_list = []
    a_list = []
    for weights, bias in zip(weights_list, bias_list):
        z = sum(x * w for x, w in zip(inputs, weights)) + bias
        a = activation(z)
        z_list.append(z)
        a_list.append(a)
    return a_list, z_list
```
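To make the shapes concrete, here is a small check with one neuron and two inputs (the numbers are made up, not part of the network we train below):

```python
# z = 1.0 * 0.5 + 2.0 * (-0.25) + 0.1 = 0.1
a_list, z_list = dense_forward([1.0, 2.0], [[0.5, -0.25]], [0.1], relu)
print(z_list)  # [0.1]
print(a_list)  # [0.1]  (ReLU leaves positive values unchanged)
```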
Step 3: Backward Pass for a Dense Layer
```python
def dense_backward(inputs, z_list, dL_da_list, weights_list, activation_derivative):
    dL_dw = []                     # gradients w.r.t. each neuron's weights
    dL_db = []                     # gradients w.r.t. each neuron's bias
    dL_dx = [0.0 for _ in inputs]  # gradient w.r.t. the layer's inputs
    for i in range(len(weights_list)):
        da_dz = activation_derivative(z_list[i])
        dL_dz = dL_da_list[i] * da_dz
        neuron_dw = []
        for j in range(len(inputs)):
            neuron_dw.append(dL_dz * inputs[j])
            dL_dx[j] += dL_dz * weights_list[i][j]
        dL_dw.append(neuron_dw)
        dL_db.append(dL_dz)
    return dL_dw, dL_db, dL_dx
```
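A habit worth building when writing backward passes by hand: check them numerically. The sketch below (with made-up values, separate from our network) compares the analytic gradient from dense_backward against a central-difference estimate; the two should agree to several decimal places.

```python
# Made-up values for a single-neuron layer
x_check = [1.0, 2.0]
w_check = [[0.5, -0.25]]
b_check = [0.1]
y_check = 1.0

def loss_for(w):
    a, _ = dense_forward(x_check, w, b_check, relu)
    return mean_squared_error(y_check, a[0])

# Analytic gradient from the backward pass
a_check, z_check = dense_forward(x_check, w_check, b_check, relu)
grads_w, _, _ = dense_backward(
    x_check, z_check, [mse_derivative(y_check, a_check[0])], w_check, relu_derivative
)

# Central-difference estimate for the first weight
eps = 1e-6
numerical = (loss_for([[0.5 + eps, -0.25]]) - loss_for([[0.5 - eps, -0.25]])) / (2 * eps)

print(grads_w[0][0], numerical)  # both approximately -1.8
```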
Step 4: Parameter Update
```python
def update_layer(weights_list, bias_list, dL_dw, dL_db, learning_rate):
    for i in range(len(weights_list)):
        for j in range(len(weights_list[i])):
            weights_list[i][j] -= learning_rate * dL_dw[i][j]
        bias_list[i] -= learning_rate * dL_db[i]
```
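A single call with made-up gradients shows the rule in action: every parameter takes a small step against its gradient.

```python
w = [[0.5, -0.25]]
b = [0.1]
update_layer(w, b, [[1.0, 2.0]], [0.5], learning_rate=0.1)
print(w)  # roughly [[0.4, -0.45]]
print(b)  # roughly [0.05]
```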
Step 5: Initialize a Small Network
```python
import random

random.seed(0)

# Network structure: 2 → 3 → 1
weights_hidden = [[random.uniform(-1, 1) for _ in range(2)] for _ in range(3)]
bias_hidden = [0.0, 0.0, 0.0]

weights_output = [[random.uniform(-1, 1) for _ in range(3)]]
bias_output = [0.0]
```
Step 6: Training Data
We’ll use a simple regression problem:
```python
data = [
    ([1.0, 2.0], 4.0),
    ([2.0, 1.0], 3.0),
    ([3.0, 1.0], 5.0),
    ([1.0, 3.0], 5.0),
]
```
Step 7: The Training Loop
This is the heart of learning.
```python
learning_rate = 0.01
epochs = 1000

for epoch in range(epochs):
    total_loss = 0.0
    for inputs, target in data:
        # Forward pass
        hidden_output, hidden_z = dense_forward(
            inputs, weights_hidden, bias_hidden, relu
        )
        output, output_z = dense_forward(
            hidden_output, weights_output, bias_output, lambda z: z
        )
        prediction = output[0]
        loss = mean_squared_error(target, prediction)
        total_loss += loss

        # Backward pass (output layer)
        dL_dy = mse_derivative(target, prediction)
        dL_da_output = [dL_dy]
        dW_out, dB_out, dL_dhidden = dense_backward(
            hidden_output, output_z, dL_da_output, weights_output, lambda z: 1.0
        )

        # Backward pass (hidden layer)
        dW_hidden, dB_hidden, _ = dense_backward(
            inputs, hidden_z, dL_dhidden, weights_hidden, relu_derivative
        )

        # Update parameters
        update_layer(weights_output, bias_output, dW_out, dB_out, learning_rate)
        update_layer(weights_hidden, bias_hidden, dW_hidden, dB_hidden, learning_rate)

    if epoch % 100 == 0:
        print(f"Epoch {epoch}, Loss: {total_loss:.4f}")
```
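Once the loop finishes, you can reuse the same forward functions to make a prediction and confirm the network has actually fit the data (the exact number depends on the random seed):

```python
hidden, _ = dense_forward([1.0, 2.0], weights_hidden, bias_hidden, relu)
output, _ = dense_forward(hidden, weights_output, bias_output, lambda z: z)
print(output[0])  # should land close to the target 4.0
```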
Watching Learning Happen
As training progresses, you should see output like:
```
Epoch 0, Loss: 6.23
Epoch 100, Loss: 0.91
Epoch 500, Loss: 0.12
Epoch 900, Loss: 0.03
```
This is real learning.
No frameworks.
No shortcuts.
Just math and Python.
What This Loop Actually Did
Each epoch:
- Ran predictions
- Measured error
- Computed gradients
- Updated weights
- Repeated
Neural networks learn gradually, not instantly.
Why This Matters
At this point, you understand:
- Exactly what .fit() does internally
- Why learning rates matter
- Why gradients can explode or vanish
- How architectures actually learn
- Why debugging neural networks is possible
Nothing in deep learning is hidden anymore.
Common Beginner Mistakes at This Stage
Mistake 1: Expecting instant convergence
→ Learning is incremental.
Mistake 2: Using a large learning rate
→ Leads to instability; a small demonstration follows after Mistake 3.
Mistake 3: Blaming the math
→ Most issues are implementation details.
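To see Mistake 2 concretely, here is a tiny self-contained sketch, separate from the network above: gradient descent on f(w) = w², where a step size that is too large makes the iterates grow instead of shrink.

```python
def descend(learning_rate, steps=5, w=1.0):
    for _ in range(steps):
        w -= learning_rate * 2 * w  # the gradient of w**2 is 2 * w
    return w

print(descend(0.1))  # about 0.33: steadily approaching the minimum at 0
print(descend(1.1))  # about -2.49: every step overshoots and the value grows
```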
What We Have Completed
You have now built:
- A neural network from scratch
- A working training loop
- A real learning system in pure Python
This completes Part I of the series.
What Comes Next
In Part II, we will explore:
- Improving performance
- Vectorization with NumPy
- Debugging training
- Overfitting and regularization
- Visualizing learning
- Mapping everything to PyTorch
But the foundation is now solid.
GitHub Repository
All code from Articles #1–#10 lives here:
👉 [link to your GitHub repository]
Each article builds incrementally on the last.
Series Recap
Neural Networks From Scratch (Pure Python)
✔ Articles #1–#10 — Core Foundations Complete
If you have followed this series, you already understand neural networks at a deeper level than most users of deep learning frameworks.