MicroGPT C++

A minimal implementation of a scalar-based autograd engine and the foundational structures for a transformer-based language model, written entirely in modern C++. This project is inspired by Andrej Karpathy's micrograd and nanoGPT/minGPT work.

Features

  • Autograd Engine: A Value class that builds a computational graph for scalar values and supports reverse-mode automatic differentiation (backpropagation); a minimal sketch follows this list.
  • Supported Operations: Addition (add), Multiplication (mul), Power (pow), Logarithm (log), Exponential (exp), and ReLU activation (relu).
  • Data Loading: Parses a text file (names.txt), creates a character-level vocabulary, and handles token-to-ID mapping.
  • Transformer Initialization: Initializes parameter matrices for a basic character-level transformer architecture, including:
    • Token and Position Embeddings (wte, wpe)
    • Language Model Head (lm_head)
    • Multi-Head Attention weights (attn_wq, attn_wk, attn_wv, attn_wo)
    • MLP layers (mlp_fc1, mlp_fc2)
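
As a rough illustration of the autograd idea behind the Value class, here is a minimal, self-contained sketch. The struct layout and the free functions add/mul are assumptions modeled on the description above, not the repository's actual interface:

    // Minimal scalar autograd sketch (assumed names, not the repo's code).
    #include <functional>
    #include <iostream>
    #include <memory>
    #include <unordered_set>
    #include <vector>

    struct Value {
        double data = 0.0;
        double grad = 0.0;
        std::vector<std::shared_ptr<Value>> children;  // operands in the DAG
        std::vector<double> local_grads;               // d(this)/d(child_i)
        explicit Value(double d) : data(d) {}

        // Topological sort, then chain rule from the output back to the leaves.
        void backward() {
            std::vector<Value*> topo;
            std::unordered_set<Value*> visited;
            std::function<void(Value*)> build = [&](Value* v) {
                if (!visited.insert(v).second) return;
                for (auto& c : v->children) build(c.get());
                topo.push_back(v);
            };
            build(this);
            grad = 1.0;  // seed: d(output)/d(output) = 1
            for (auto it = topo.rbegin(); it != topo.rend(); ++it)
                for (std::size_t i = 0; i < (*it)->children.size(); ++i)
                    (*it)->children[i]->grad += (*it)->local_grads[i] * (*it)->grad;
        }
    };

    using V = std::shared_ptr<Value>;

    V add(const V& a, const V& b) {
        auto out = std::make_shared<Value>(a->data + b->data);
        out->children = {a, b};
        out->local_grads = {1.0, 1.0};       // d(a+b)/da = d(a+b)/db = 1
        return out;
    }

    V mul(const V& a, const V& b) {
        auto out = std::make_shared<Value>(a->data * b->data);
        out->children = {a, b};
        out->local_grads = {b->data, a->data};  // product rule
        return out;
    }

    int main() {
        auto x = std::make_shared<Value>(3.0);
        auto c = add(mul(x, x), mul(x, x));  // c = 2x^2 = 18
        c->backward();
        std::cout << "dc/dx at x=3: " << x->grad << "\n";  // 4x = 12
    }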

Prerequisites

  • A modern C++ compiler with C++17 support (the code uses structured bindings and standard-library facilities such as <unordered_map>).
  • A dataset file named names.txt (a text file with one name/word per line) placed in the directory from which the executable is run.

Quick Start

  1. Provide the dataset: Create a names.txt file in the same directory as microgpt.cpp. For example:

    emma
    olivia
    ava
    isabella
  2. Compile the code: use g++ or clang++:

    g++ -std=c++17 microgpt.cpp -o microgpt
  3. Run the executable:

    ./microgpt

Example Output

When you run the program, you should see the vocabulary construction and parameter-matrix initialization details:

num docs: 4 ...
char_set: abe...
length: ...
BOS token id: ...
vocab size: ...
Mapping:
a -> 0
b -> 1
...
54  // Output of the autograd gradient test case
num params: ...
wte
wpe
lm_head
layer0.attn_wq
...
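
The character-to-ID mapping shown above can be reproduced with a short sketch like the following; the container choices and the decision to append the BOS token after the characters are assumptions, not necessarily what microgpt.cpp does:

    #include <fstream>
    #include <iostream>
    #include <set>
    #include <string>
    #include <unordered_map>

    int main() {
        std::ifstream in("names.txt");
        std::set<char> char_set;             // sorted, de-duplicated characters
        std::string line;
        int num_docs = 0;
        while (std::getline(in, line)) {
            ++num_docs;
            for (char ch : line) char_set.insert(ch);
        }
        std::unordered_map<char, int> stoi;  // token-to-ID mapping
        int id = 0;
        for (char ch : char_set) stoi[ch] = id++;
        const int bos = id;                  // reserve one extra ID for BOS
        std::cout << "num docs: " << num_docs << "\n"
                  << "BOS token id: " << bos << "\n"
                  << "vocab size: " << id + 1 << "\n";
        for (char ch : char_set) std::cout << ch << " -> " << stoi.at(ch) << "\n";
    }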

Structure

  • Value: The core component for automatic differentiation. It stores the data, the accumulated grad, the local gradients with respect to its operands (local_grads), and pointers to the operands (children) that together form the Directed Acyclic Graph (DAG). Calling backward() on the output node performs a topological sort and applies the chain rule to populate the gradients of every node in the graph.
  • Matrix: A 2D std::vector of std::shared_ptr<Value>.
  • matrix(): Helper function that allocates a weight matrix and fills it with values drawn from a normal distribution (std::normal_distribution, in the spirit of a randn initializer); a sketch follows this list.
  • main(): Serves as a playground that demonstrates how to read the text data, construct the vocabulary mapping, test the reverse-mode autograd, and finally allocate the parameter matrices required by the GPT forward pass.
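
For reference, here is a sketch of what the Matrix alias and the matrix() initializer could look like; the stddev default, the fixed seed, and the example shapes are assumptions:

    #include <memory>
    #include <random>
    #include <vector>

    struct Value {  // trimmed-down stand-in for the autograd Value
        double data = 0.0;
        double grad = 0.0;
        explicit Value(double d) : data(d) {}
    };

    using Matrix = std::vector<std::vector<std::shared_ptr<Value>>>;

    // Allocates rows x cols Values drawn from N(0, stddev^2), randn-style.
    Matrix matrix(std::size_t rows, std::size_t cols, double stddev = 0.02) {
        static std::mt19937 rng{42};
        std::normal_distribution<double> randn(0.0, stddev);
        Matrix m(rows, std::vector<std::shared_ptr<Value>>(cols));
        for (auto& row : m)
            for (auto& cell : row)
                cell = std::make_shared<Value>(randn(rng));
        return m;
    }

    int main() {
        // e.g. token and position embeddings (placeholder sizes)
        Matrix wte = matrix(27, 16);
        Matrix wpe = matrix(8, 16);
        return 0;
    }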
