
Building a Neural Network Template in Python — The Training Loop and Stochastic Gradient Descent

(Neural Networks From Scratch · Article 4)

This is where learning finally happens

In Article 3, something important changed.

For the first time, the network could tell us:

  • what went wrong (loss)
  • where it went wrong (gradients)

But the network still did nothing with that information.

Gradients alone do not change a model.

To learn, we need updates.

That update mechanism is called Stochastic Gradient Descent (SGD).

The core idea of SGD (plain English)

SGD follows a very simple rule:

Move each weight a little bit in the direction that reduces the loss.

That’s it.

No magic.
No intelligence.
Just repeated small corrections.

The update rule (minimal math)

For a single weight:

new_weight = old_weight − learning_rate × gradient
  • the gradient tells us which direction increases the loss, so we move the other way (hence the minus sign)
  • the learning rate tells us how far to move
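
To make the rule concrete, here is a one-line worked example with made-up numbers (the weight, gradient, and learning rate below are purely illustrative):

Python
old_weight = 0.8
gradient = 2.5            # the loss goes up if this weight goes up
learning_rate = 0.1

new_weight = old_weight - learning_rate * gradient
print(new_weight)         # roughly 0.55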

What is the learning rate?

The learning rate controls step size.

  • Too large → unstable learning
  • Too small → very slow learning
  • Just right → steady improvement

Typical beginner values:

  • 0.1
  • 0.01
  • 0.001

We will start simple.
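
To get a feel for step size, here is the same made-up gradient pushed through the update rule with each of those values (again, the numbers are only illustrative):

Python
old_weight = 0.8
gradient = 2.5

for lr in (0.1, 0.01, 0.001):
    print(lr, old_weight - lr * gradient)
# 0.1   -> roughly 0.55    (big step)
# 0.01  -> roughly 0.775   (small step)
# 0.001 -> roughly 0.7975  (tiny step)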

Step 1: Implement SGD

Create nn/optimizers.py:

Python
class SGD:
    def __init__(self, lr=0.01):
        self.lr = lr

    def step(self, layers):
        for layer in layers:
            # Only layers with trainable parameters get updated
            if hasattr(layer, "weights"):
                # Nudge every weight against its gradient
                for i in range(len(layer.weights)):
                    for j in range(len(layer.weights[i])):
                        layer.weights[i][j] -= self.lr * layer.grad_weights[i][j]
                # Same update for the biases
                for i in range(len(layer.bias)):
                    layer.bias[i] -= self.lr * layer.grad_bias[i]

This optimizer:

  • looks at each layer
  • checks if it has trainable parameters
  • applies gradient updates
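
If you want to see the update in isolation before wiring it into the network, a minimal sketch like this works; the FakeLayer class below is hypothetical and only mimics the weights / grad_weights / bias / grad_bias attributes our Dense layer exposes:

Python
from nn.optimizers import SGD

class FakeLayer:
    # Stand-in object with the attributes SGD expects
    def __init__(self):
        self.weights = [[0.5]]
        self.grad_weights = [[2.0]]
        self.bias = [0.1]
        self.grad_bias = [1.0]

layer = FakeLayer()
SGD(lr=0.1).step([layer])

print(layer.weights)   # roughly [[0.3]]  (0.5 - 0.1 * 2.0)
print(layer.bias)      # [0.0]            (0.1 - 0.1 * 1.0)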

Step 2: Add training logic to NeuralNet

Update nn/core.py:

Python
class NeuralNet:
    def __init__(self, layers, loss, optimizer):
        self.layers = layers
        self.loss = loss
        self.optimizer = optimizer

    def forward(self, x):
        for layer in self.layers:
            x = layer.forward(x)
        return x

    def backward(self, grad):
        for layer in reversed(self.layers):
            grad = layer.backward(grad)

    def train_step(self, x, y):
        y_pred = self.forward(x)                    # 1. forward pass
        loss_value = self.loss.forward(y_pred, y)   # 2. measure the error
        grad_loss = self.loss.backward()            # 3. gradient of the loss
        self.backward(grad_loss)                    # 4. backpropagate through layers
        self.optimizer.step(self.layers)            # 5. apply the SGD update
        return loss_value

This is the heart of learning.

Step 3: The training loop

Now we repeat the same steps many times.

Create examples/04_training_demo.py:

Python
import random

from nn.layers import Dense
from nn.core import NeuralNet
from nn.losses import MSE
from nn.optimizers import SGD

random.seed(42)

# A single Dense layer learning the mapping y = 2x
net = NeuralNet(
    layers=[Dense(1, 1)],
    loss=MSE(),
    optimizer=SGD(lr=0.1)
)

X = [[1.0], [2.0], [3.0], [4.0]]
y = [2.0, 4.0, 6.0, 8.0]

for epoch in range(20):
    total_loss = 0.0
    for x, target in zip(X, y):
        loss = net.train_step(x, [target])
        total_loss += loss
    print(f"Epoch {epoch:02d} | Loss: {total_loss:.4f}")

Run:

python -m examples.04_training_demo

What you should observe

You should see something like:

Epoch 00 | Loss: 12.83
Epoch 01 | Loss: 4.91
Epoch 02 | Loss: 1.87
...
Epoch 19 | Loss: 0.02

The exact numbers may differ slightly.

What matters:

  • loss goes down
  • predictions improve
  • learning is happening
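
A quick way to confirm the last two points is to ask the trained network for a prediction on an input it has never seen. Appending something like this to the demo script (the input 5.0 is just an arbitrary test value) should print a number close to 10:

Python
# The single weight should be close to 2.0 after training,
# so an input of 5.0 should map to roughly 10.0.
prediction = net.forward([5.0])
print(prediction)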

Why this works

Each epoch:

  1. Forward pass → prediction
  2. Loss → error size
  3. Backward pass → gradients
  4. SGD → weight update

Repeated many times:

  • random network
  • becomes structured
  • approximates the data

Important beginner insight

Training is just repetition with correction.

There is no intelligence inside SGD.
The power comes from:

  • gradients
  • repetition
  • small steps

Common beginner mistakes

“My loss explodes”

Learning rate is too high.

“My loss doesn’t change”

Learning rate is too low
or gradients are zero.
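
A quick way to check the second case is to print the gradients right after a training step. A small sketch, reusing the net and data from the demo script and assuming the Dense layer stores its gradients in grad_weights:

Python
loss = net.train_step(X[0], [y[0]])
for layer in net.layers:
    if hasattr(layer, "grad_weights"):
        print(layer.grad_weights)   # all zeros here means there is no learning signal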

“Why do we update after each sample?”

This is stochastic gradient descent.
Batch training comes later.

What we achieved in Article 4

You now have:

  • a complete training loop
  • gradient-based learning
  • a real optimizer
  • a network that improves itself

This is a real neural network, not a demo.

What comes next (Article 5)

Next, we will:

  • clean up the API
  • add fit() and predict()
  • train a slightly bigger model
  • run a full end-to-end example

This is where the template starts to feel usable.

Series progress

The Python source code is available on GitHub: https://github.com/Benard-Kemp/Building-a-Neural-Network-Template-in-Python