Scalar C++ Transformer Implementation — Verified Against PyTorch #50

@dratman

Description

Hi @karpathy,

Following your lead, I've built a fully transparent implementation of a GPT-style transformer in C++ using only scalar operations and explicit nested loops—no matrix libraries, no SIMD, no BLAS. Every multiplication and addition is visible in the source code.
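To give a sense of the style (this is an illustrative sketch, not code copied from the repo—the function name and layout are hypothetical), a matrix multiply written this way reduces to three nested loops of scalar multiply-adds:

```cpp
#include <vector>
#include <cstddef>

// Scalar matrix multiply: C = A (m x k) * B (k x n), row-major flat vectors.
// Every multiply and add is an explicit scalar operation -- no BLAS, no SIMD.
// Hypothetical sketch in the spirit of the repo, not its actual source.
std::vector<float> matmul(const std::vector<float>& A,
                          const std::vector<float>& B,
                          std::size_t m, std::size_t k, std::size_t n) {
    std::vector<float> C(m * n, 0.0f);
    for (std::size_t i = 0; i < m; ++i)          // each output row
        for (std::size_t j = 0; j < n; ++j)      // each output column
            for (std::size_t p = 0; p < k; ++p)  // accumulate dot product
                C[i * n + j] += A[i * k + p] * B[p * n + j];
    return C;
}
```

Everything else in a transformer (attention scores, projections, the MLP) can be expressed in the same explicit-loop idiom, which is what makes the data flow traceable step by step.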

I have been trying to get this built for about a year by various means other than coding it myself; as a long-retired software engineer, I did not feel able to do it on my own. For this attempt, implementation credit goes to Claude Code.

The motivation is the same as nanoGPT: to make the data flow through a transformer completely traceable. If someone wants to understand exactly what happens inside a transformer at the lowest level—as if stepping through it on 1980s hardware—this code is designed to be read line-by-line.

Key details:

  • Both forward pass (inference) and backward pass (training with SGD) are implemented
  • Verified against PyTorch to ~1e-6 precision
  • Uses the same architecture as real transformers (4 layers, multi-head attention, residual connections, layer norm, MLP)
  • Includes verification scripts that train in both PyTorch and C++ and compare loss curves
  • Minimal dependencies: just a C++ compiler and Python/PyTorch for verification
  • Public domain license
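The verification described above boils down to running the same training loop in both implementations and checking that the per-step losses agree within tolerance. A minimal sketch of that comparison (hypothetical function and names, not the repo's actual scripts; the ~1e-6 figure is from the post) might look like:

```cpp
#include <cmath>
#include <vector>
#include <cstddef>

// Hypothetical sketch: compare loss curves from the C++ and PyTorch runs,
// element by element, within an absolute tolerance (~1e-6 per the post).
bool losses_match(const std::vector<double>& cpp_loss,
                  const std::vector<double>& torch_loss,
                  double tol = 1e-6) {
    if (cpp_loss.size() != torch_loss.size()) return false;
    for (std::size_t i = 0; i < cpp_loss.size(); ++i)
        if (std::fabs(cpp_loss[i] - torch_loss[i]) > tol) return false;
    return true;
}
```

An absolute tolerance is the simplest choice here; a relative tolerance would also be reasonable when losses span several orders of magnitude.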

I think this could be useful as a companion to nanoGPT—where nanoGPT shows the algorithm in clean Python, this shows what the algorithm looks like at the level of individual scalar operations.

Repo: https://github.com/dratman/train_a_tiny_GPT_in_cpp
