From Tensors to Production AI
A deep dive into PyTorch — building neural networks, optimizing for production, and implementing state-of-the-art architectures
The Building Blocks of Deep Learning
Each concept builds on the last — from tensors to production-ready architectures
Learning Progression
Tensor Operations
PyTorch tensors are the foundation - n-dimensional arrays with GPU acceleration and automatic differentiation support.
The key breakthrough: tensors track their computational history, enabling automatic gradient calculation through the entire network (the grad_fn sketch after the code below makes this concrete).
# Tensors with autograd
x = torch.tensor([1.0, 2.0], requires_grad=True)
y = x.pow(2).sum() # y = x₁² + x₂²
y.backward() # Compute gradients
print(x.grad)       # tensor([2., 4.]) = [2x₁, 2x₂]
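To make "computational history" concrete: every operation on a requires_grad tensor attaches a grad_fn node recording how the result was produced. A small illustrative sketch (the exact printed class names depend on the PyTorch version):
# Sketch: inspecting the autograd graph that tensors record
import torch
x = torch.tensor([1.0, 2.0], requires_grad=True)
y = x.pow(2).sum()
print(y.grad_fn)                  # e.g. <SumBackward0 object at 0x...>
print(y.grad_fn.next_functions)   # upstream node(s), here the pow operation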
Neural Network Architecture
Layer composition using nn.Module - the elegant abstraction that makes complex networks simple to build and train.
nn.Sequential chains layers linearly, but the real power comes from custom forward() methods, where you control the data flow yourself (see the sketch after the CNN below).
class CNN(nn.Module):
    def __init__(self):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(3, 64, 3, padding=1),
            nn.BatchNorm2d(64),
            nn.ReLU(inplace=True),
            nn.MaxPool2d(2)
        )
    def forward(self, x):
        return self.features(x)
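As a hedged illustration of that custom-forward flexibility (the module name and layer sizes below are my own assumptions, not taken from the notebooks), a forward() method can branch, merge, or reuse sub-modules however the computation requires:
# Sketch: forward() controlling data flow beyond a plain Sequential chain
import torch
import torch.nn as nn
class TwoBranchNet(nn.Module):               # hypothetical example module
    def __init__(self):
        super().__init__()
        self.branch_a = nn.Linear(32, 16)
        self.branch_b = nn.Linear(32, 16)
        self.head = nn.Linear(32, 4)
    def forward(self, x):
        a = torch.relu(self.branch_a(x))     # two parallel branches
        b = torch.relu(self.branch_b(x))
        merged = torch.cat([a, b], dim=1)    # merge before the head
        return self.head(merged)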
Training Loop Mastery
The training loop is where theory meets practice - forward pass, loss computation, backward pass, and optimization step.
Calling zero_grad() before backward() is crucial - PyTorch accumulates gradients by default, which is useful for deliberate gradient accumulation (see the sketch after the loop below) but dangerous if forgotten.
for epoch in range(epochs):
    for batch_x, batch_y in dataloader:
        optimizer.zero_grad()                  # Clear gradients
        predictions = model(batch_x)           # Forward pass
        loss = criterion(predictions, batch_y)
        loss.backward()                        # Compute gradients
        optimizer.step()                       # Update weights
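When gradient accumulation is deliberate (for example, to simulate a larger batch than fits in GPU memory), the same accumulation behavior becomes a feature. A minimal sketch reusing the model, criterion, optimizer, and dataloader from the loop above; the accumulation factor and loss scaling are my assumptions, not values from the notebooks:
# Sketch: gradient accumulation over several mini-batches
accum_steps = 4                                  # assumed accumulation factor
optimizer.zero_grad()
for step, (batch_x, batch_y) in enumerate(dataloader):
    loss = criterion(model(batch_x), batch_y)
    (loss / accum_steps).backward()              # scale so the summed gradient matches one big batch
    if (step + 1) % accum_steps == 0:
        optimizer.step()                         # update once every accum_steps batches
        optimizer.zero_grad()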
Transfer Learning
Leverage pre-trained models to achieve state-of-the-art results with minimal data. Fine-tune the final layers while freezing earlier features.
Early layers learn universal features (edges, textures). Only fine-tune the later layers for your specific task to avoid catastrophic forgetting (a partial-unfreezing sketch follows the code below).
# Load pre-trained ResNet
model = models.resnet50(pretrained=True)
# Freeze early layers
for param in model.parameters():
    param.requires_grad = False
# Replace final layer for your task
model.fc = nn.Linear(2048, num_classes)
# Only train the new layer
optimizer = optim.Adam(model.fc.parameters())
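If the target task is further from ImageNet, a common variant is to also unfreeze the last residual stage and give it a smaller learning rate than the new head. A hedged sketch against the ResNet-50 loaded above; the layer choice and learning rates are assumptions, not values from the notebooks:
# Sketch: fine-tune the last ResNet stage plus the new head
for param in model.layer4.parameters():      # unfreeze the last residual stage
    param.requires_grad = True
optimizer = optim.Adam([
    {'params': model.layer4.parameters(), 'lr': 1e-4},   # small LR for pre-trained weights
    {'params': model.fc.parameters(),     'lr': 1e-3},   # larger LR for the new head
])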
Optuna Hyperparameter Tuning
Automated hyperparameter optimization using Bayesian search - finds optimal configurations that human intuition would miss.
Optuna's pruning feature stops unpromising trials early, making hyperparameter search roughly 10x faster than grid search (see the pruning sketch after the code below).
def objective(trial):
    lr = trial.suggest_float('lr', 1e-5, 1e-1, log=True)
    batch = trial.suggest_int('batch_size', 16, 128)
    layers = trial.suggest_int('n_layers', 1, 5)
    model = build_model(layers)
    accuracy = train_and_eval(model, lr, batch)
    return accuracy
study = optuna.create_study(direction='maximize')
study.optimize(objective, n_trials=100)
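The objective above never actually prunes; to benefit from pruning, the objective reports intermediate scores and asks the trial whether to stop. A minimal sketch using Optuna's built-in MedianPruner, assuming build_model from above and a hypothetical per-epoch train/eval helper eval_epoch:
# Sketch: per-epoch reporting so unpromising trials get pruned early
import optuna
def objective(trial):
    lr = trial.suggest_float('lr', 1e-5, 1e-1, log=True)
    model = build_model(trial.suggest_int('n_layers', 1, 5))
    for epoch in range(10):
        accuracy = eval_epoch(model, lr)          # hypothetical one-epoch train/eval helper
        trial.report(accuracy, step=epoch)        # report intermediate value
        if trial.should_prune():                  # pruner compares against other trials
            raise optuna.TrialPruned()
    return accuracy
study = optuna.create_study(direction='maximize',
                            pruner=optuna.pruners.MedianPruner())
study.optimize(objective, n_trials=100)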
Skip Connections (ResNet)
Residual connections allow gradients to flow directly through the network, enabling training of 100+ layer networks.
The identity mapping (x + F(x)) means the network only needs to learn the residual - much easier than learning the full transformation.
class ResidualBlock(nn.Module):
    def __init__(self, channels):
        super().__init__()
        self.conv = nn.Sequential(
            nn.Conv2d(channels, channels, 3, padding=1),
            nn.BatchNorm2d(channels),
            nn.ReLU(),
            nn.Conv2d(channels, channels, 3, padding=1),
            nn.BatchNorm2d(channels)
        )
    def forward(self, x):
        return F.relu(x + self.conv(x))  # Skip!
Measurable Impact
Actual metrics from my notebook experiments — not theoretical, but proven
DenseNet Transfer Learning
21-class land use classification with only 100 images per class
MNIST Classifier
Digit recognition with fully connected neural network
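For reference, a fully connected MNIST classifier of this kind can be as small as the sketch below (the layer sizes are illustrative assumptions, not the exact architecture from the notebook):
# Sketch: small fully connected classifier for 28x28 MNIST digits
import torch.nn as nn
mnist_mlp = nn.Sequential(
    nn.Flatten(),               # 28x28 image -> 784-dim vector
    nn.Linear(784, 128),
    nn.ReLU(),
    nn.Linear(128, 10)          # 10 digit classes (logits for CrossEntropyLoss)
)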
Optuna HPO
Automated hyperparameter optimization with pruning
Biggest Challenge
Understanding why my gradients kept exploding in deep networks — until I realized BatchNorm wasn't just optional. The moment skip connections clicked, everything changed.
What I'd Do Differently
Start with transfer learning from day one. I spent weeks training from scratch before discovering that a pre-trained backbone + small head gets you 90% of the way in 5 minutes.
The Complete Journey
Click each module to explore what I learned, built, and discovered
Neural Network Fundamentals
Built my first neural networks from scratch, understanding how tensors, gradients, and optimization work together.
Production Optimization
Moved beyond accuracy to production concerns: hyperparameter tuning, transfer learning, NLP, and performance profiling.
Advanced Architectures
State-of-the-art architectures, transformers, generative AI, and production deployment with MLOps.
Explore the Code
All 44 notebooks, implementations, and experiments are open source