Exploratory code implementing LLM components from scratch:
- GPT-2 Model Implemented
- Online Softmax
- Speculative decoding
- More Efficient GPT-2
- Batch Online Softmax
- Flash Attention
- Rotary Embeddings
- Perceptron, Linear, LayerNorm
- Automatic Differentiation for scalar variables
- Character level tokenizer
- MultiHead Latent Attention
- End-to-End Character-Level Training Pipeline
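
Since the items above are implementation sketches rather than documented modules, a minimal example of the online-softmax idea (not the repo's actual code) is a single pass that keeps a running maximum and rescales the running denominator whenever the maximum changes:

```python
import math

def online_softmax(xs):
    """Numerically stable softmax computed in one pass over the input.

    Maintains a running maximum m and a running denominator d; d is
    rescaled by exp(m_old - m_new) whenever a new maximum appears.
    """
    m = float("-inf")  # running maximum
    d = 0.0            # running sum of exp(x - m)
    for x in xs:
        m_new = max(m, x)
        d = d * math.exp(m - m_new) + math.exp(x - m_new)
        m = m_new
    return [math.exp(x - m) / d for x in xs]
```

The same rescaling trick is what lets Flash Attention and the batched variant fuse the max/sum reductions into a single sweep over the scores.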
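
A character-level tokenizer in this style is typically just a bijection between the corpus's unique characters and integer ids; a minimal sketch (assumed interface, not the repo's actual class) might look like:

```python
class CharTokenizer:
    """Character-level tokenizer: one integer id per unique character."""

    def __init__(self, text):
        chars = sorted(set(text))  # fixed vocabulary from the corpus
        self.stoi = {ch: i for i, ch in enumerate(chars)}
        self.itos = {i: ch for ch, i in self.stoi.items()}

    def encode(self, s):
        return [self.stoi[ch] for ch in s]

    def decode(self, ids):
        return "".join(self.itos[i] for i in ids)
```

Encoding then decoding is the identity on any string drawn from the training corpus's character set, which is the property the end-to-end training pipeline relies on.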
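
Scalar reverse-mode automatic differentiation is usually built from a `Value` node that records its inputs and a local backward rule; a compact micrograd-style sketch (illustrative only, with just `+` and `*`) is:

```python
class Value:
    """Scalar that tracks its computation graph for reverse-mode autodiff."""

    def __init__(self, data, _children=()):
        self.data = data
        self.grad = 0.0
        self._backward = lambda: None
        self._prev = set(_children)

    def __add__(self, other):
        other = other if isinstance(other, Value) else Value(other)
        out = Value(self.data + other.data, (self, other))
        def _backward():  # d(out)/d(self) = d(out)/d(other) = 1
            self.grad += out.grad
            other.grad += out.grad
        out._backward = _backward
        return out

    def __mul__(self, other):
        other = other if isinstance(other, Value) else Value(other)
        out = Value(self.data * other.data, (self, other))
        def _backward():  # product rule
            self.grad += other.data * out.grad
            other.grad += self.data * out.grad
        out._backward = _backward
        return out

    def backward(self):
        # Topologically sort the graph, then apply chain rule in reverse.
        topo, visited = [], set()
        def build(v):
            if v not in visited:
                visited.add(v)
                for child in v._prev:
                    build(child)
                topo.append(v)
        build(self)
        self.grad = 1.0
        for v in reversed(topo):
            v._backward()
```

For `c = a * b + a` with `a = 2, b = 3`, calling `c.backward()` gives `a.grad = b + 1 = 4` and `b.grad = a = 2`, matching the hand-derived partials.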