A minimal implementation of a scalar-based autograd engine and the foundational structures for a transformer-based language model, written entirely in modern C++. This project is inspired by Andrej Karpathy's micrograd and nanoGPT/minGPT work.
- Autograd Engine: A `Value` class that builds a computational graph for scalar values and supports reverse-mode automatic differentiation (backpropagation); see the sketch after this list.
- Supported Operations: Addition (`add`), multiplication (`mul`), power (`pow`), logarithm (`log`), exponential (`exp`), and ReLU activation (`relu`).
- Data Loading: Parses a text file (`names.txt`), creates a character-level vocabulary, and handles token-to-ID mapping.
- Transformer Initialization: Initializes parameter matrices for a basic character-level transformer architecture, including:
  - Token and position embeddings (`wte`, `wpe`)
  - Language model head (`lm_head`)
  - Multi-head attention weights (`attn_wq`, `attn_wk`, `attn_wv`, `attn_wo`)
  - MLP layers (`mlp_fc1`, `mlp_fc2`)
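Below is a minimal, self-contained sketch of the kind of reverse-mode engine described above. The free functions `val`, `add`, `mul`, `relu`, and `backward` mirror the description but are assumptions; the actual `Value` class in `microgpt.cpp` may be organized differently:

```cpp
#include <functional>
#include <iostream>
#include <memory>
#include <unordered_set>
#include <vector>

// A micrograd-style scalar node: value, accumulated gradient, and the
// operands plus local derivatives needed to walk the DAG backwards.
struct Value {
    double data;
    double grad = 0.0;
    std::vector<std::shared_ptr<Value>> children;  // operands
    std::vector<double> local_grads;               // d(this)/d(child)
    explicit Value(double d) : data(d) {}
};

using ValuePtr = std::shared_ptr<Value>;

ValuePtr val(double d) { return std::make_shared<Value>(d); }

ValuePtr add(const ValuePtr& a, const ValuePtr& b) {
    auto out = val(a->data + b->data);
    out->children = {a, b};
    out->local_grads = {1.0, 1.0};
    return out;
}

ValuePtr mul(const ValuePtr& a, const ValuePtr& b) {
    auto out = val(a->data * b->data);
    out->children = {a, b};
    out->local_grads = {b->data, a->data};
    return out;
}

ValuePtr relu(const ValuePtr& a) {
    auto out = val(a->data > 0.0 ? a->data : 0.0);
    out->children = {a};
    out->local_grads = {a->data > 0.0 ? 1.0 : 0.0};
    return out;
}

// Reverse-mode backprop: topologically sort the DAG from the output node,
// then apply the chain rule in reverse order.
void backward(const ValuePtr& root) {
    std::vector<ValuePtr> topo;
    std::unordered_set<Value*> visited;
    std::function<void(const ValuePtr&)> build = [&](const ValuePtr& v) {
        if (!visited.insert(v.get()).second) return;
        for (auto& c : v->children) build(c);
        topo.push_back(v);
    };
    build(root);
    root->grad = 1.0;
    for (auto it = topo.rbegin(); it != topo.rend(); ++it)
        for (std::size_t i = 0; i < (*it)->children.size(); ++i)
            (*it)->children[i]->grad += (*it)->grad * (*it)->local_grads[i];
}

int main() {
    auto x = val(3.0);
    auto c = add(mul(x, x), mul(x, x));  // c = 2x^2
    backward(c);
    std::cout << x->grad << "\n";        // prints 12, i.e. 4x at x = 3
}
```

Compiled with `g++ -std=c++17`, this prints `12`: the chain rule sums the contributions of all four paths from `c` back to `x` (each `mul` contributes `3.0` twice).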
- A modern C++ compiler with C++17 support (the code uses structured bindings, `<unordered_map>`, etc.).
- A dataset file named `names.txt` (a text file where each line contains a name/word to load) placed in the build directory.
- Provide the dataset: Create a `names.txt` file in the same directory as `microgpt.cpp`, with one name per line. For example:

  ```
  emma
  olivia
  ava
  isabella
  ```

- Compile the code: Using `g++` or `clang++`:

  ```sh
  g++ -std=c++17 microgpt.cpp -o microgpt
  ```

- Run the executable:

  ```sh
  ./microgpt
  ```
When you run the project, you should see the vocabulary generation and matrix initialization details:

```
num docs: 4 ...
char_set: abe...
length: ...
BOS token id: ...
vocab size: ...
Mapping:
a -> 0
b -> 1
...
54
num params: ...
wte
wpe
lm_head
layer0.attn_wq
...
```

The lone `54` is the output of the gradient test case in `main()`, a sanity check that the reverse-mode autograd applies the chain rule correctly through the DAG.
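For reference, here is a minimal sketch of how such a character-level vocabulary and token-to-ID mapping can be produced. This is an assumed reconstruction, not the exact code from `microgpt.cpp`; the `stoi` map and the BOS-token-gets-the-last-id convention are illustrative choices:

```cpp
#include <fstream>
#include <iostream>
#include <set>
#include <string>
#include <unordered_map>
#include <vector>

int main() {
    // Read one document (name) per line from names.txt.
    std::ifstream in("names.txt");
    std::vector<std::string> docs;
    std::string line;
    while (std::getline(in, line))
        if (!line.empty()) docs.push_back(line);

    // Collect the unique characters; std::set keeps them sorted.
    std::set<char> char_set;
    for (const auto& d : docs)
        for (char ch : d) char_set.insert(ch);

    // Assign consecutive ids, reserving one extra id for the BOS token.
    std::unordered_map<char, int> stoi;
    int id = 0;
    for (char ch : char_set) stoi[ch] = id++;
    const int bos = id;

    std::cout << "num docs: " << docs.size() << "\n";
    std::cout << "char_set: ";
    for (char ch : char_set) std::cout << ch;
    std::cout << "\nBOS token id: " << bos << "\n";
    std::cout << "vocab size: " << bos + 1 << "\n";
    std::cout << "Mapping:\n";
    for (char ch : char_set) std::cout << ch << " -> " << stoi[ch] << "\n";
}
```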
- `Value`: The core component for automatic differentiation. It stores the `data`, the accumulated `grad`, the local gradients relative to its operands (`local_grads`), and pointers to the operands (`children`) used to build the directed acyclic graph (DAG). Calling `backward()` on the final output node performs a topological sort and applies the chain rule in reverse order to populate the gradients.
- `Matrix`: A 2D `std::vector` of `std::shared_ptr<Value>`.
- `matrix()`: Helper function that initializes weight matrices from a normal distribution (`std::normal_distribution`, via `randn`); see the sketch below.
- `main()`: Serves as a playground that demonstrates how to read the text data, construct the vocabulary mapping, test the reverse-mode autograd, and finally allocate the parameter matrices required by the GPT forward pass.
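A rough sketch of the `Matrix` type and a `matrix()`-style initializer follows. The exact signature, seed, and default standard deviation are assumptions; only the use of `std::normal_distribution` is taken from the description above:

```cpp
#include <memory>
#include <random>
#include <vector>

struct Value {  // trimmed-down node; see the autograd sketch above
    double data;
    double grad = 0.0;
    explicit Value(double d) : data(d) {}
};

// Matrix: a 2D grid of shared_ptr<Value>, as described above.
using Matrix = std::vector<std::vector<std::shared_ptr<Value>>>;

// matrix()-style helper: fill rows x cols with samples from N(0, stddev^2)
// so that every weight is a node in the autograd graph. Seed and default
// stddev are illustrative assumptions.
Matrix matrix(std::size_t rows, std::size_t cols, double stddev = 0.02) {
    static std::mt19937 rng(42);  // fixed seed for reproducible init
    std::normal_distribution<double> randn(0.0, stddev);
    Matrix m(rows, std::vector<std::shared_ptr<Value>>(cols));
    for (auto& row : m)
        for (auto& cell : row)
            cell = std::make_shared<Value>(randn(rng));
    return m;
}
```

A call such as `matrix(vocab_size, n_embd)` could then allocate the `wte` table (`n_embd` being a hypothetical name for the embedding width), with every entry participating in backpropagation.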