Homebrew GPT-2

A ground-up implementation of the GPT-2 decoder-only transformer architecture, trained on the Fineweb dataset. No pretrained weights, no shortcuts: just the architecture and a lot of compute.

Why Build This

Using language model APIs is easy. Understanding what’s actually happening inside them is harder. This project was about building that understanding by implementing every component from scratch.

Architecture

The standard GPT-2 decoder-only transformer:

Input Tokens
            ↓
Token Embeddings + Positional Embeddings
            ↓
┌────────────────────────────────────┐
│  Transformer Block (×N layers)     │
│  ┌─────────────────────────────┐   │
│  │ Layer Norm                  │   │
│  │ Causal Self-Attention       │   │
│  │ Residual Connection         │   │
│  │ Layer Norm                  │   │
│  │ Feed-Forward Network        │   │
│  │ Residual Connection         │   │
│  └─────────────────────────────┘   │
└────────────────────────────────────┘
            ↓
Layer Norm (final)
            ↓
Linear → Vocabulary Logits
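
The block ordering above is the pre-norm layout: each sublayer sees a layer-normalized input, and its output is added back through a residual connection. A minimal NumPy sketch of one block's forward pass, with attention shown single-headed and the learned Q/K/V and output projections omitted for brevity (all names here are illustrative, not the repo's actual code):

```python
import numpy as np

def layer_norm(x, eps=1e-5):
    # Normalize each position's features to zero mean, unit variance.
    mu = x.mean(axis=-1, keepdims=True)
    var = x.var(axis=-1, keepdims=True)
    return (x - mu) / np.sqrt(var + eps)

def causal_attention(x):
    # Single-head scaled dot-product attention with a causal mask.
    # Q/K/V projections omitted; scores are computed on x directly.
    T, d = x.shape
    scores = x @ x.T / np.sqrt(d)
    mask = np.triu(np.ones((T, T), dtype=bool), k=1)  # True above the diagonal
    scores[mask] = -np.inf                            # block attention to future positions
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)    # row-wise softmax
    return weights @ x

def gelu(x):
    # tanh approximation of GELU, as used in GPT-2.
    return 0.5 * x * (1 + np.tanh(np.sqrt(2 / np.pi) * (x + 0.044715 * x**3)))

def feed_forward(x, W1, b1, W2, b2):
    # Position-wise MLP; W1 expands the hidden dimension 4x, W2 projects back.
    return gelu(x @ W1 + b1) @ W2 + b2

def block(x, W1, b1, W2, b2):
    # Pre-norm ordering: norm -> sublayer -> residual add, twice.
    x = x + causal_attention(layer_norm(x))
    x = x + feed_forward(layer_norm(x), W1, b1, W2, b2)
    return x
```

Because the mask zeroes out attention to future positions and the norm and MLP act per position, perturbing a later token leaves all earlier positions' outputs unchanged, which is what makes autoregressive training possible.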

Components Implemented

  1. Multi-Head Causal Self-Attention

    • Query, Key, Value projections
    • Scaled dot-product attention with causal masking
    • Multiple attention heads with concatenation
  2. Position-wise Feed-Forward Network

    • Two linear transformations with a GELU activation between them
    • 4× expansion of the hidden dimension
  3. Training Infrastructure

    • AdamW optimizer with weight decay
    • Learning rate warmup and cosine decay
    • Gradient clipping
    • Mixed precision training
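
The warmup-plus-cosine-decay schedule from item 3 can be sketched in a few lines of pure Python. The hyperparameter values below are hypothetical placeholders, not the repo's actual settings:

```python
import math

def lr_at(step, max_lr=6e-4, min_lr=6e-5, warmup_steps=100, max_steps=1000):
    # Illustrative schedule: linear warmup, then cosine decay to a floor.
    if step < warmup_steps:
        # Linear warmup from ~0 up to max_lr over warmup_steps.
        return max_lr * (step + 1) / warmup_steps
    if step >= max_steps:
        # After the decay horizon, hold at the floor.
        return min_lr
    # Cosine decay: coeff goes smoothly from 1 down to 0.
    progress = (step - warmup_steps) / (max_steps - warmup_steps)
    coeff = 0.5 * (1 + math.cos(math.pi * progress))
    return min_lr + coeff * (max_lr - min_lr)
```

In a custom training loop this is typically applied per step by writing the value into each `param_group["lr"]` of the AdamW optimizer before `optimizer.step()`.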

Training Data

Trained on Fineweb, a large-scale web-text dataset that provides diverse, high-quality text for language modeling.

Key Learnings

  • Attention is expensive: compute and memory for the attention scores scale as O(n²) with sequence length, and that quadratic cost is very real in practice
  • Initialization matters: Bad initialization can make training unstable or prevent convergence entirely
  • The devil is in the details: Small implementation bugs (wrong dimension, missing normalization) cause silent failures
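
The first learning above can be made concrete with a little arithmetic: each head builds an n×n score matrix, so doubling the context length quadruples that cost. A quick check (12 heads matches GPT-2 small; purely illustrative):

```python
def attn_score_entries(n_ctx, n_heads=12):
    # Entries in the attention weight matrices per layer:
    # one (n_ctx x n_ctx) matrix per head.
    return n_heads * n_ctx * n_ctx

ratio = attn_score_entries(2048) / attn_score_entries(1024)
# Doubling the context length quadruples the score-matrix cost.
```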

Tech Stack

  • Python, PyTorch
  • Fineweb dataset
  • Custom training loop (no Trainer abstractions)

https://github.com/Mkrolick/Homebrew-GPT-2