Hi @karpathy,
Following your lead, I've built a fully transparent implementation of a GPT-style transformer in C++ using only scalar operations and explicit nested loops—no matrix libraries, no SIMD, no BLAS. Every multiply and addition is visible in the source code.
I have been trying to accomplish this for about a year by various means other than writing the code myself. As a long-retired software engineer, I did not feel able to do it on my own; for this attempt, implementation credit goes to Claude Code.
The motivation is the same as nanoGPT: to make the data flow through a transformer completely traceable. If someone wants to understand exactly what happens inside a transformer at the lowest level—as if stepping through it on 1980s hardware—this code is designed to be read line-by-line.
Key details:
- Both forward pass (inference) and backward pass (training with SGD) are implemented
- Verified against PyTorch to ~1e-6 precision
- Uses the same architecture as real transformers (4 layers, multi-head attention, residual connections, layer norm, MLP)
- Includes verification scripts that train in both PyTorch and C++ and compare loss curves
- Minimal dependencies: just a C++ compiler and Python/PyTorch for verification
- Public domain license
I think this could be useful as a companion to nanoGPT: where nanoGPT shows the algorithm in clean Python, this shows what the algorithm looks like as individual scalar operations.
Repo: https://github.com/dratman/train_a_tiny_GPT_in_cpp