(Neural Networks From Scratch · Article 4)
This is where learning finally happens
In Article 3, something important changed.
For the first time, the network could tell us:
- what went wrong (loss)
- where it went wrong (gradients)
But the network still did nothing with that information.
Gradients alone do not change a model.
To learn, we need updates.
That update mechanism is called Stochastic Gradient Descent (SGD).
The core idea of SGD (plain English)
SGD follows a very simple rule:
Move each weight a little bit in the direction that reduces the loss.
That’s it.
No magic.
No intelligence.
Just repeated small corrections.
The update rule (minimal math)
For a single weight:
new_weight = old_weight − learning_rate × gradient
- gradient tells us which direction to move
- learning rate tells us how far to move
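A quick worked example, with numbers chosen purely for illustration:

old_weight = 0.80
gradient = 2.50              # the loss goes up if this weight goes up
learning_rate = 0.1

new_weight = old_weight - learning_rate * gradient
print(new_weight)            # 0.55 -> the weight moved against the gradient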
What is the learning rate?
The learning rate controls step size.
- Too large → unstable learning
- Too small → very slow learning
- Just right → steady improvement
Typical beginner values:
- 0.1
- 0.01
- 0.001
We will start simple.
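If you want to see the effect before we build the real optimizer, here is a small standalone sketch (not part of the template) that applies the update rule to a single weight whose correct value is 2.0:

def run(lr, steps=20):
    # Fit w so that w * 3.0 ≈ 6.0 (the true answer is w = 2.0)
    w = 0.0
    for _ in range(steps):
        pred = w * 3.0
        grad = 2 * (pred - 6.0) * 3.0   # gradient of the squared error w.r.t. w
        w -= lr * grad
    return w

print(run(0.001))  # too small: still well below 2.0 after 20 steps
print(run(0.01))   # steady: ends up close to 2.0
print(run(0.2))    # too large: the updates overshoot and blow up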
Step 1: Implement SGD
Create nn/optimizers.py:
class SGD:
    def __init__(self, lr=0.01):
        self.lr = lr

    def step(self, layers):
        for layer in layers:
            if hasattr(layer, "weights"):
                # Update every weight: w -= lr * dL/dw
                for i in range(len(layer.weights)):
                    for j in range(len(layer.weights[i])):
                        layer.weights[i][j] -= self.lr * layer.grad_weights[i][j]
                # Update every bias: b -= lr * dL/db
                for i in range(len(layer.bias)):
                    layer.bias[i] -= self.lr * layer.grad_bias[i]
This optimizer:
- looks at each layer
- checks if it has trainable parameters
- applies gradient updates
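A quick sanity check you can run in a Python shell, assuming the Dense layer from the earlier articles stores its parameters and gradients as plain lists named weights, bias, grad_weights and grad_bias (adjust if your attribute names differ):

from nn.layers import Dense
from nn.optimizers import SGD

layer = Dense(1, 1)
layer.grad_weights = [[2.0]]   # pretend a backward pass produced these gradients
layer.grad_bias = [1.0]

before = layer.weights[0][0]
SGD(lr=0.1).step([layer])
after = layer.weights[0][0]

print(before - after)          # ≈ 0.2 -> the weight moved by lr * gradient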
Step 2: Add training logic to NeuralNet
Update nn/core.py:
class NeuralNet:
    def __init__(self, layers, loss, optimizer):
        self.layers = layers
        self.loss = loss
        self.optimizer = optimizer

    def forward(self, x):
        for layer in self.layers:
            x = layer.forward(x)
        return x

    def backward(self, grad):
        for layer in reversed(self.layers):
            grad = layer.backward(grad)

    def train_step(self, x, y):
        # 1. Forward pass: compute the prediction
        y_pred = self.forward(x)
        # 2. Loss: measure how wrong the prediction is
        loss_value = self.loss.forward(y_pred, y)
        # 3. Backward pass: compute gradients for every parameter
        grad_loss = self.loss.backward()
        self.backward(grad_loss)
        # 4. Update: let the optimizer adjust the weights
        self.optimizer.step(self.layers)
        return loss_value
This is the heart of learning.
Step 3: The training loop
Now we repeat the same steps many times.
Create examples/04_training_demo.py:
import random

from nn.layers import Dense
from nn.core import NeuralNet
from nn.losses import MSE
from nn.optimizers import SGD

random.seed(42)

net = NeuralNet(
    layers=[Dense(1, 1)],
    loss=MSE(),
    optimizer=SGD(lr=0.1),
)

X = [[1.0], [2.0], [3.0], [4.0]]
y = [2.0, 4.0, 6.0, 8.0]

for epoch in range(20):
    total_loss = 0.0
    for x, target in zip(X, y):
        loss = net.train_step(x, [target])
        total_loss += loss
    print(f"Epoch {epoch:02d} | Loss: {total_loss:.4f}")
Run:
python -m examples.04_training_demo
What you should observe
You should see something like:
Epoch 00 | Loss: 12.83
Epoch 01 | Loss: 4.91
Epoch 02 | Loss: 1.87
...
Epoch 19 | Loss: 0.02
The exact numbers may differ slightly.
What matters:
- loss goes down
- predictions improve
- learning is happening
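You can also check the predictions themselves. The training data follows y = 2x, so after the loop finishes the network should map a new input to roughly twice its value; for example, appended to the demo script:

print(net.forward([5.0]))   # expect something close to [10.0]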
Why this works
Each training step:
- Forward pass → prediction
- Loss → error size
- Backward pass → gradients
- SGD → weight update
Repeated many times:
- the network starts out random
- its weights gradually become structured
- its outputs approximate the data
Important beginner insight
Training is just repetition with correction.
There is no intelligence inside SGD.
The power comes from:
- gradients
- repetition
- small steps
Common beginner mistakes
“My loss explodes”
Learning rate is too high.
“My loss doesn’t change”
Learning rate is too low, or the gradients are zero.
“Why do we update after each sample?”
This is stochastic gradient descent.
Batch training comes later.
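To reproduce the first failure mode on purpose, change only the learning rate when constructing the network in the demo (a throwaway experiment, not something to keep):

net = NeuralNet(
    layers=[Dense(1, 1)],
    loss=MSE(),
    optimizer=SGD(lr=5.0),   # deliberately too large: the loss should climb instead of shrink
)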
What we achieved in Article 4
You now have:
- a complete training loop
- gradient-based learning
- a real optimizer
- a network that improves itself
This is a real neural network, not a demo.
What comes next (Article 5)
Next, we will:
- clean up the API
- add fit() and predict()
- train a slightly bigger model
- run a full end-to-end example
This is where the template starts to feel usable.
Series progress
- Article 1: Project setup & core abstractions ✅
- Article 2: Loss functions & error intuition ✅
- Article 3: Gradients & backward pass ✅
- Article 4: Training loop & SGD ✅
- Article 5: Your first complete model ⏭️
Python source code is available on GitHub: https://github.com/Benard-Kemp/Building-a-Neural-Network-Template-in-Python