Hi @karpathy,
Following your lead, I've built a fully transparent implementation of a GPT-style transformer in C++ using only scalar operations and explicit nested loops—no matrix libraries, no SIMD, no BLAS. Every multiply and addition is visible in the source code.
I have been trying to accomplish this for about a year by various means other than writing the code myself. As a long-retired software engineer, I did not feel able to do it on my own; for this attempt, implementation credit goes to Claude Code.
The motivation is the same as nanoGPT: to make the data flow through a transformer completely traceable. If someone wants to understand exactly what happens inside a transformer at the lowest level—as if stepping through it on 1980s hardware—this code is designed to be read line-by-line.
Key details:
- Both forward pass (inference) and backward pass (training with SGD) are implemented
- Verified against PyTorch to ~1e-6 precision
- Uses the same architecture as real transformers (4 layers, multi-head attention, residual connections, layer norm, MLP)
- Includes verification scripts that train in both PyTorch and C++ and compare loss curves
- Minimal dependencies: just a C++ compiler and Python/PyTorch for verification
- Public domain license
I think this could be useful as a companion to nanoGPT: where nanoGPT shows the algorithm in clean Python, this shows what the algorithm looks like as individual scalar operations.
Repo: https://github.com/dratman/train_a_tiny_GPT_in_cpp