AMD ROCm (gfx1030) inference fork with RotorQuant/TurboQuant KV compression, PHANTOM-X zero-copy draft speculation, EAGLE3 speculative decoding, 12 RDNA2 crash fixes, and PrismML Bonsai Q1_0_G128 1-bit GGUF support.
triton hip bonsai rocm amd-gpu gguf speculative-decoding sglang rdna2 eagle3 turboquant prismml gfx1030 p-eagle radix-cache
Updated Apr 13, 2026 - Python