IST-DASLab/Quartet-II

Quartet II: Accurate LLM Pre-Training in NVFP4 by Improved Unbiased Gradient Estimation

This is the official code for the Quartet II NVFP4 training paper (arXiv:2601.22813).

Quickstart

Create a conda environment and install dependencies (we recommend Python 3.11):

conda create -n env python=3.11
conda activate env
pip install -r requirements.txt

Reproduce the Quartet II sweeps on a SLURM cluster:

cd scripts
sbatch quartetv2_sweep.sh

Inspect the scheme implementation at:

[quartet_2.py](./src/models/quantization/schemes/quartet_2.py)

NVFP4 Kernels

We provide kernels tuned for the RTX 5090 (sm120a) in ./kernels. They require CUDA 12.8 or newer and a recent PyTorch release (around 2.9.0). Install them with:

cd kernels
pip install --no-build-isolation .
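
Before building, it can help to confirm that the local setup matches the requirements above. The following is a minimal sketch, not part of this repository: the helper names (`parse_version`, `cuda_is_new_enough`, `check_environment`) are hypothetical, and it assumes that sm120a corresponds to compute capability 12.0. Note that `torch.version.cuda` reports the CUDA version PyTorch was built against, which may differ from the toolkit used to compile the kernels.

```python
import re

MIN_CUDA = (12, 8)           # from the README: CUDA 12.8 or newer
TARGET_CAPABILITY = (12, 0)  # assumption: sm120a maps to compute capability 12.0


def parse_version(text):
    """Turn a string such as '12.8' or '2.9.0+cu128' into a tuple of ints."""
    parts = []
    for piece in text.split("+")[0].split("."):
        match = re.match(r"\d+", piece)
        if match is None:
            break
        parts.append(int(match.group()))
    return tuple(parts)


def cuda_is_new_enough(cuda_version, minimum=MIN_CUDA):
    """Return True if the given CUDA version string is at least `minimum`."""
    return parse_version(cuda_version) >= minimum


def check_environment():
    """Return a list of human-readable problems (empty if none were found)."""
    try:
        import torch
    except ImportError:
        return ["PyTorch is not installed"]

    problems = []
    if torch.version.cuda is None:
        problems.append("this PyTorch build has no CUDA support")
    elif not cuda_is_new_enough(torch.version.cuda):
        problems.append(f"CUDA {torch.version.cuda} is older than 12.8")

    if not torch.cuda.is_available():
        problems.append("no CUDA device is visible")
    elif torch.cuda.get_device_capability() != TARGET_CAPABILITY:
        major, minor = torch.cuda.get_device_capability()
        problems.append(f"kernels are tuned for sm120a, found sm{major}{minor}")
    return problems
```

Calling `check_environment()` returns an empty list when nothing looks wrong. A reported problem is only a hint: the kernels may still build on other setups, but they were tuned for the hardware named above.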

You can then use the provided drop-in NVFP4 nn.Linear replacement as follows:

import torch
from quartet2.linear import Quartet_II_linear

# in_dim and out_dim are the layer's input and output feature sizes
linear = Quartet_II_linear(
    in_dim,
    out_dim,
    device="cuda",
    dtype=torch.bfloat16,
)
...
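
To apply the replacement across an existing model rather than one layer at a time, the nn.Linear children can be swapped recursively. This is a hedged sketch, not code from this repository: `swap_linear_layers` and the `make_layer` factory are hypothetical names, and the new layers start from a fresh initialization (no weights are copied), which fits training from scratch. The demo uses a plain nn.Linear stand-in so that it runs on CPU.

```python
import torch
import torch.nn as nn


def swap_linear_layers(module, make_layer):
    """Recursively replace every nn.Linear child with make_layer(old_layer).

    Returns the number of layers that were replaced.
    """
    replaced = 0
    for name, child in module.named_children():
        if isinstance(child, nn.Linear):
            # Assigning a Module via setattr re-registers it under the same name.
            setattr(module, name, make_layer(child))
            replaced += 1
        else:
            replaced += swap_linear_layers(child, make_layer)
    return replaced


# Stand-in factory so the sketch runs without the NVFP4 kernels installed.
def make_standin(old):
    return nn.Linear(old.in_features, old.out_features, bias=old.bias is not None)


model = nn.Sequential(
    nn.Linear(16, 32),
    nn.ReLU(),
    nn.Sequential(nn.Linear(32, 8)),
)
count = swap_linear_layers(model, make_standin)
print(f"replaced {count} layers")
```

With the kernels installed, the factory would instead construct the layer as in the snippet above, for example `lambda old: Quartet_II_linear(old.in_features, old.out_features, device="cuda", dtype=torch.bfloat16)`. Whether the layer accepts further arguments such as a bias flag is not stated here, so check quartet2.linear before relying on them.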

You can further benchmark the kernels against BF16, FP8 and Quartet with:

cd test
python bench_linear.py

Cite This Work

@misc{panferov2026quartetiiaccuratellm,
      title={Quartet II: Accurate LLM Pre-Training in NVFP4 by Improved Unbiased Gradient Estimation}, 
      author={Andrei Panferov and Erik Schultheis and Soroush Tabesh and Dan Alistarh},
      year={2026},
      eprint={2601.22813},
      archivePrefix={arXiv},
      primaryClass={cs.LG},
      url={https://arxiv.org/abs/2601.22813}, 
}
