An LLM inference engine written in C.
- .gguf Parser
- Dequantizer
- Inference Logic for LLM
- Positional Encoder
- Feed-Forward Layer
- Multi-Head Attention
- Normalization Layer
- Activation function
- SIMD
- KV cache management
- Sampling algorithms
- Tokenizer
- Look-up algorithm