(Neural Networks From Scratch · Article 4)
This is where learning finally happens
In Article 3, something important changed.
For the first time, the network could tell us:
- what went wrong (loss)
- where it went wrong (gradients)
But the network still did nothing with that information.
Gradients alone do not change a model.
To learn, we need updates.
That update mechanism is called Stochastic Gradient Descent (SGD).
The core idea of SGD (plain English)
SGD follows a very simple rule:
Move each weight a little bit in the direction that reduces the loss.
That’s it.
No magic.
No intelligence.
Just repeated small corrections.
The update rule (minimal math)
For a single weight:
new_weight = old_weight − learning_rate × gradient
- gradient tells us which direction to move
- learning rate tells us how far to move
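A quick worked example, with numbers chosen purely for illustration:

old_weight = 0.80
gradient = 2.50              # the loss goes up if this weight goes up
learning_rate = 0.1

new_weight = old_weight - learning_rate * gradient
print(new_weight)            # 0.55 -> the weight moved against the gradient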
What is the learning rate?
The learning rate controls step size.
- Too large → unstable learning
- Too small → very slow learning
- Just right → steady improvement
Typical beginner values:
- 0.1
- 0.01
- 0.001
We will start simple.
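If you want to see the effect before we build the real optimizer, here is a small standalone sketch (not part of the template) that applies the update rule to a single weight whose correct value is 2.0:

def run(lr, steps=20):
    # Fit w so that w * 3.0 ≈ 6.0 (the true answer is w = 2.0)
    w = 0.0
    for _ in range(steps):
        pred = w * 3.0
        grad = 2 * (pred - 6.0) * 3.0   # gradient of the squared error w.r.t. w
        w -= lr * grad
    return w

print(run(0.001))  # too small: still well below 2.0 after 20 steps
print(run(0.01))   # steady: ends up close to 2.0
print(run(0.2))    # too large: the updates overshoot and blow up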
Step 1: Implement SGD
Create nn/optimizers.py:
class SGD:
    def __init__(self, lr=0.01):
        self.lr = lr

    def step(self, layers):
        for layer in layers:
            if hasattr(layer, "weights"):
                # Update every weight: w -= lr * dL/dw
                for i in range(len(layer.weights)):
                    for j in range(len(layer.weights[i])):
                        layer.weights[i][j] -= self.lr * layer.grad_weights[i][j]
                # Update every bias: b -= lr * dL/db
                for i in range(len(layer.bias)):
                    layer.bias[i] -= self.lr * layer.grad_bias[i]
This optimizer:
- looks at each layer
- checks if it has trainable parameters
- applies gradient updates
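A quick sanity check you can run in a Python shell, assuming the Dense layer from the earlier articles stores its parameters and gradients as plain lists named weights, bias, grad_weights and grad_bias (adjust if your attribute names differ):

from nn.layers import Dense
from nn.optimizers import SGD

layer = Dense(1, 1)
layer.grad_weights = [[2.0]]   # pretend a backward pass produced these gradients
layer.grad_bias = [1.0]

before = layer.weights[0][0]
SGD(lr=0.1).step([layer])
after = layer.weights[0][0]

print(before - after)          # ≈ 0.2 -> the weight moved by lr * gradient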
Step 2: Add training logic to NeuralNet
Update nn/core.py:
class NeuralNet:
    def __init__(self, layers, loss, optimizer):
        self.layers = layers
        self.loss = loss
        self.optimizer = optimizer

    def forward(self, x):
        for layer in self.layers:
            x = layer.forward(x)
        return x

    def backward(self, grad):
        for layer in reversed(self.layers):
            grad = layer.backward(grad)

    def train_step(self, x, y):
        # 1. Forward pass: compute the prediction
        y_pred = self.forward(x)
        # 2. Loss: measure how wrong the prediction is
        loss_value = self.loss.forward(y_pred, y)
        # 3. Backward pass: compute gradients for every parameter
        grad_loss = self.loss.backward()
        self.backward(grad_loss)
        # 4. Update: let the optimizer adjust the weights
        self.optimizer.step(self.layers)
        return loss_value
This is the heart of learning.
Step 3: The training loop
Now we repeat the same steps many times.
Create examples/04_training_demo.py:
import random

from nn.layers import Dense
from nn.core import NeuralNet
from nn.losses import MSE
from nn.optimizers import SGD

random.seed(42)

net = NeuralNet(
    layers=[Dense(1, 1)],
    loss=MSE(),
    optimizer=SGD(lr=0.1),
)

X = [[1.0], [2.0], [3.0], [4.0]]
y = [2.0, 4.0, 6.0, 8.0]

for epoch in range(20):
    total_loss = 0.0
    for x, target in zip(X, y):
        loss = net.train_step(x, [target])
        total_loss += loss
    print(f"Epoch {epoch:02d} | Loss: {total_loss:.4f}")
Run:
python -m examples.04_training_demo
What you should observe
You should see something like:
Epoch 00 | Loss: 12.83
Epoch 01 | Loss: 4.91
Epoch 02 | Loss: 1.87
...
Epoch 19 | Loss: 0.02
The exact numbers may differ slightly.
What matters:
- loss goes down
- predictions improve
- learning is happening
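You can also check the predictions themselves. The training data follows y = 2x, so after the loop finishes the network should map a new input to roughly twice its value; for example, appended to the demo script:

print(net.forward([5.0]))   # expect something close to [10.0]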
Why this works
Each training step:
- Forward pass → prediction
- Loss → error size
- Backward pass → gradients
- SGD → weight update
Repeated many times:
- the network starts out random
- its weights gradually become structured
- its outputs approximate the data
Important beginner insight
Training is just repetition with correction.
There is no intelligence inside SGD.
The power comes from:
- gradients
- repetition
- small steps
Common beginner mistakes
“My loss explodes”
Learning rate is too high.
“My loss doesn’t change”
Learning rate is too low, or the gradients are zero.
“Why do we update after each sample?”
This is stochastic gradient descent.
Batch training comes later.
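To reproduce the first failure mode on purpose, change only the learning rate when constructing the network in the demo (a throwaway experiment, not something to keep):

net = NeuralNet(
    layers=[Dense(1, 1)],
    loss=MSE(),
    optimizer=SGD(lr=5.0),   # deliberately too large: the loss should climb instead of shrink
)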
What we achieved in Article 4
You now have:
- a complete training loop
- gradient-based learning
- a real optimizer
- a network that improves itself
This is a real neural network, not a demo.
What comes next (Article 5)
Next, we will:
- clean up the API
- add fit() and predict()
- train a slightly bigger model
- run a full end-to-end example
This is where the template starts to feel usable.
Series progress
- Article 1: Project setup & core abstractions ✅
- Article 2: Loss functions & error intuition ✅
- Article 3: Gradients & backward pass ✅
- Article 4: Training loop & SGD ✅
- Article 5: Your first complete model ⏭️
Python source code is available on GitHub: https://github.com/Benard-Kemp/Building-a-Neural-Network-Template-in-Python