
Neural Network Basics for Beginners: Build and Train Your First MLP
Key Terms in Deep Learning – Explained Simply
Term | Meaning |
---|---|
Neuron | A single computing unit that processes input by applying weights, adds a bias, then passes the result through an activation function. |
Activation Function | Makes the neuron output non-linear (like squashing, switching, or shaping the signal). Examples: ReLU, Sigmoid, Tanh. |
Weights | Numbers the network learns to decide how important each input feature is. |
Bias | An extra value added to help shift the output of a neuron. Helps the model fit data better. |
Loss Function | A formula that tells us how far the prediction is from the true answer (like a penalty score). |
Learning Rate (η) | A small number that controls how big the weight updates are during learning. Too high = unstable, too low = slow. |
Epoch | One full cycle of training where the model sees all the training data once. |
Batch | A smaller group of training examples used in one update step (e.g., 32 or 64 samples). |
Mini-Batch Gradient Descent | Updating weights using a small batch of data instead of the whole dataset or just one sample. |
Forward Pass | Data moves through the network, layer by layer, producing an output. |
Backpropagation | The process of calculating how much each weight contributed to the error, and adjusting them. |
Gradient | The direction and amount each weight needs to change to reduce the loss. |
Optimizer | An algorithm that adjusts weights to minimize the loss. Examples: SGD, Adam, RMSprop. |
Overfitting | When the model learns the training data too well (including noise), but performs poorly on new/unseen data. |
Underfitting | When the model is too simple to capture the underlying pattern in the data. |
Regularization | Techniques like dropout or L2 to prevent overfitting by discouraging the model from relying too heavily on any one feature. |
Dropout | Randomly turning off neurons during training to make the model more robust. |
Epoch vs Iteration | Epoch: 1 pass over the full dataset. Iteration: 1 update step (per batch). |
Loss vs Accuracy | Loss: How wrong the model is. Accuracy: How many predictions were correct. |
Training Set | The data used to teach the model. |
Validation Set | Data used to tune hyperparameters and check model performance during training. |
Test Set | Data never seen during training, used to evaluate final model performance. |
Softmax | Turns raw output scores into probabilities for multi-class classification. |
One-Hot Encoding | Represents class labels as vectors (e.g., class 2 of 4 → [0, 0, 1, 0] ). |
Confusion Matrix | A table showing how many predictions were correct or incorrect for each class. |
Precision & Recall | Metrics used to evaluate classification. Precision: how many predicted positives were actually correct. Recall: how many actual positives the model found. |
F1 Score | The harmonic mean of precision and recall, balancing the two. |
Hyperparameters | Settings chosen before training (like learning rate, batch size, number of layers). |
Model Parameters | Weights and biases learned by the model. |
Gradient Clipping | Prevents gradients from getting too large and destabilizing training. |
Vanishing Gradient | When gradients become so small that early layers barely learn; common in deep networks. |
Exploding Gradient | When gradients grow too large and cause unstable updates. |
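Two of the terms above, Softmax and One-Hot Encoding, are easiest to see numerically; here is a minimal PyTorch sketch:

import torch
import torch.nn.functional as F

scores = torch.tensor([2.0, 1.0, 0.1])     # raw model outputs (logits)
print(torch.softmax(scores, dim=0))        # probabilities that sum to 1

labels = torch.tensor([2])                 # class 2 of 4, as in the table above
print(F.one_hot(labels, num_classes=4))    # tensor([[0, 0, 1, 0]])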
Neural Network Basics: A Hands-On Guide to MLPs, Activation Functions, and Learning Dynamics
Introduction
Neural networks have become foundational to modern machine learning, powering everything from image recognition to language models.
In this notebook, we’ll break down the basics of neural networks, focusing on perceptrons, multi-layer perceptrons (MLPs), activation functions,
and core training principles. We'll also build a simple MLP using PyTorch, explore the role of weight initialization, and visualize decision
boundaries to understand what our networks learn.
1. Perceptron: The Building Block
The perceptron is the simplest type of neural network—a single neuron. It performs a weighted sum of inputs and passes the result through an activation function.
Equation:
y = ϕ(wᵀx + b)
- x: input vector
- w: weight vector
- b: bias
- ϕ: activation function (e.g., step function)
Limitations: A single-layer perceptron can only solve linearly separable problems like AND and OR—not XOR.
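To see this concretely, here is a minimal sketch of a single perceptron with a step activation. The weights and bias implementing an AND gate are hand-picked for illustration, not learned:

import torch

def perceptron(x, w, b):
    # Weighted sum plus bias, followed by a step activation
    return (torch.dot(w, x) + b > 0).float()

# Hand-picked parameters for an AND gate (illustrative, not learned)
w = torch.tensor([1.0, 1.0])
b = torch.tensor(-1.5)
for inputs in [(0., 0.), (0., 1.), (1., 0.), (1., 1.)]:
    print(inputs, perceptron(torch.tensor(inputs), w, b).item())
# Outputs 0, 0, 0, 1: AND is linearly separable.
# No single choice of (w, b) reproduces XOR, which is why hidden layers are needed.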
2. Multi-Layer Perceptron (MLP)
An MLP is a neural network with one or more hidden layers. It allows for the modeling of complex, non-linear functions.
Structure:
- Input layer: Receives raw input.
- Hidden layers: Transform input using learned weights and non-linear activation functions.
- Output layer: Produces the final prediction.
Each layer applies:
h = ϕ(Wx + b)
MLPs are universal function approximators: with enough hidden neurons, they can approximate any continuous function to arbitrary accuracy.
3. Activation Functions
Activation functions introduce non-linearity, allowing networks to learn complex mappings.
Function | Formula | Use Case |
---|---|---|
Sigmoid | 𝜎(𝑥) = 1 / (1 + 𝑒^−𝑥) | Binary classification; suffers from vanishing gradients |
Tanh | tanh(𝑥) = (𝑒^𝑥 − 𝑒^−𝑥) / (𝑒^𝑥 + 𝑒^−𝑥) | Centered at 0, better than sigmoid, but still vanishes |
ReLU | ReLU(𝑥) = max(0,𝑥) | Fast and sparse; default for hidden layers |
Softmax | Softmax(𝑥𝑖) = 𝑒^𝑥𝑖 / ∑𝑗 𝑒^𝑥𝑗 | Multi-class classification |
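PyTorch ships each of these as a built-in, so a quick sketch can show how they reshape the same raw scores:

import torch

x = torch.linspace(-3, 3, 7)          # a few sample inputs
print(torch.sigmoid(x))               # squashed into (0, 1)
print(torch.tanh(x))                  # squashed into (-1, 1), zero-centered
print(torch.relu(x))                  # negatives zeroed out, positives unchanged

scores = torch.tensor([2.0, 1.0, 0.1])
print(torch.softmax(scores, dim=0))   # probabilities over 3 classes, summing to 1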
4. Forward Pass and Backpropagation
Forward Pass: How Neural Networks Make Predictions
Imagine you’re feeding an image of a cat into a neural network. What happens next?
In an MLP:
- Input data is multiplied by weights and passed through an activation function (like ReLU or sigmoid).
- This process happens layer by layer, from input to output — this is called the forward pass.
- The final layer gives us a prediction (e.g., “80% chance this is a cat”).
Mathematically:
z = W*x + b
a = activation(z)
Where:
- x = input
- W = weight matrix
- b = bias
- a = activation output
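Written out by hand for a tiny two-layer network with random weights (the layer sizes below are arbitrary, chosen only for illustration), the forward pass is just these few lines:

import torch

torch.manual_seed(0)
x = torch.randn(2)                              # a single input with 2 features
W1, b1 = torch.randn(16, 2), torch.zeros(16)    # hidden layer parameters
W2, b2 = torch.randn(2, 16), torch.zeros(2)     # output layer parameters

z1 = W1 @ x + b1                     # linear step: weights times input plus bias
a1 = torch.relu(z1)                  # non-linear activation
z2 = W2 @ a1 + b2                    # output scores (logits)
probs = torch.softmax(z2, dim=0)     # two class probabilities that sum to 1
print(probs)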
Backpropagation: Learning From Mistakes
Now that the network made a prediction, we compare it to the true label and calculate the loss.
Backpropagation is how the network learns:
- Compute the gradient of the loss w.r.t. each weight using the chain rule.
- Adjust the weights slightly in the direction that reduces the loss.
W = W - learning_rate * ∇(loss)
Error is calculated (loss), and gradients are computed using the chain rule to update weights via gradient descent.
High-level Steps:
- Compute loss (e.g., cross-entropy)
- Compute gradients of loss w.r.t. weights
- Update weights: W = W - learning_rate * ∇(loss), as sketched below
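Using PyTorch's autograd, one explicit gradient-descent step can be sketched on a tiny linear model with a mean-squared-error loss (the values are purely illustrative):

import torch

torch.manual_seed(0)
W = torch.randn(1, 2, requires_grad=True)
b = torch.zeros(1, requires_grad=True)
x = torch.tensor([1.0, 2.0])
y_true = torch.tensor([3.0])

y_pred = W @ x + b                      # forward pass
loss = ((y_pred - y_true) ** 2).mean()  # mean squared error
loss.backward()                         # backpropagation fills W.grad and b.grad

lr = 0.1
with torch.no_grad():                   # step each parameter against its gradient
    W -= lr * W.grad
    b -= lr * b.grad
    W.grad.zero_()
    b.grad.zero_()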
Vanishing and Exploding Gradients
Backpropagation works well — until it doesn’t.
In deep networks, gradients can become:
- Very small (vanishing) ➜ early layers learn very slowly
- Very large (exploding) ➜ weights blow up, training becomes unstable
This is especially problematic with activation functions like sigmoid, which squash outputs into a small range.
Solution: Weight Initialization
Smart weight initialization helps keep gradients in a good range. Two popular strategies:
- Xavier (Glorot) Initialization – suited to sigmoid and tanh activations
- He Initialization – ideal for ReLU
These methods set initial weights with a variance that depends on the number of inputs/outputs per layer.
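Both schemes are available in torch.nn.init; here is a minimal sketch of applying them to linear layers (the layer sizes are arbitrary):

import torch.nn as nn

layer_tanh = nn.Linear(128, 64)
nn.init.xavier_uniform_(layer_tanh.weight)                        # Xavier/Glorot for tanh or sigmoid
nn.init.zeros_(layer_tanh.bias)

layer_relu = nn.Linear(128, 64)
nn.init.kaiming_normal_(layer_relu.weight, nonlinearity='relu')   # He initialization for ReLU
nn.init.zeros_(layer_relu.bias)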
5. Build an MLP in PyTorch
import torch
import torch.nn as nn
import torch.optim as optim
from sklearn.datasets import make_moons
from sklearn.model_selection import train_test_split
import matplotlib.pyplot as plt

# Generate toy data: two interleaving half-moons (not linearly separable)
X, y = make_moons(n_samples=1000, noise=0.2, random_state=42)
X = torch.tensor(X, dtype=torch.float32)
y = torch.tensor(y, dtype=torch.long)

# Split data into train and test sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2)

# Define model: 2 inputs -> 16 hidden ReLU units -> 2 class scores
class SimpleMLP(nn.Module):
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(2, 16),
            nn.ReLU(),
            nn.Linear(16, 2)
        )

    def forward(self, x):
        return self.net(x)

model = SimpleMLP()
criterion = nn.CrossEntropyLoss()  # expects raw logits plus integer class labels
optimizer = optim.Adam(model.parameters(), lr=0.01)

# Training loop: full-batch updates for 100 epochs
for epoch in range(100):
    optimizer.zero_grad()
    outputs = model(X_train)
    loss = criterion(outputs, y_train)
    loss.backward()
    optimizer.step()
    if epoch % 10 == 0:
        print(f"Epoch {epoch}, Loss: {loss.item():.4f}")
---
Thanks for Reading
---