BlitzRank

Principled Zero-shot Ranking Agents with Tournament Graphs

arXiv | Website | License: MIT

BlitzRank is a principled zero-shot ranking framework that uses tournament graphs to extract maximal information from each LLM call, achieving Pareto optimality across 14 benchmarks and 5 LLMs with 25–40% fewer queries.

BlitzRank vs Sliding Window animation
Algorithm visualization on the classic 25-horses puzzle: find the 3 fastest of 25 horses, racing 5 at a time.
BlitzRank converges in 7 rounds vs. Sliding Window's 11.
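
For readers new to the puzzle behind the animation, the sketch below is a self-contained illustration of the classic 7-race solution in plain Python, independent of the blitzrank package. It shows why 7 races of 5 are enough to identify the top 3; it is not the BlitzRank algorithm or its API.

import random

# Standalone sketch of the classic 25-horses puzzle (illustrative only; this
# is not the BlitzRank algorithm or its API): find the 3 fastest of 25 horses
# when you can race only 5 at a time and observe finishing order, never times.
# Seven races suffice.

def race(horses, speed):
    """Order the given horses fastest-first (the analogue of one comparison call)."""
    return sorted(horses, key=lambda h: speed[h], reverse=True)

def top3_in_7_races(speed):
    horses = list(speed)
    groups = [horses[i:i + 5] for i in range(0, 25, 5)]

    # Races 1-5: order each group of five.
    ordered = [race(g, speed) for g in groups]

    # Race 6: race the five group winners; its winner is the overall fastest.
    winners = race([g[0] for g in ordered], speed)
    by_winner = {g[0]: g for g in ordered}
    a, b, c = by_winner[winners[0]], by_winner[winners[1]], by_winner[winners[2]]

    # Race 7: only these five horses can still be 2nd or 3rd overall.
    final = race([a[1], a[2], b[0], b[1], c[0]], speed)
    return [a[0], final[0], final[1]]

speed = {f"horse{i}": random.random() for i in range(25)}
assert top3_in_7_races(speed) == sorted(speed, key=speed.get, reverse=True)[:3]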

Installation

uv pip install blitzrank

From source:

git clone https://github.com/ContextualAI/BlitzRank.git
cd BlitzRank
uv pip install -e .

Install with additional dependencies for baselines (AcuRank, TourRank):

uv pip install "blitzrank[all]"
# or from source:
uv pip install -e ".[all]"
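
A quick sanity check after installing. The second import is hedged: the baseline classes may need the extra dependencies from blitzrank[all].

# Core entry points should import after a plain `uv pip install blitzrank`.
from blitzrank import BlitzRank, rank, evaluate

# Baseline rankers may require the extra dependencies from `blitzrank[all]`.
from blitzrank import TourRank, AcuRank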

Quick Start

from blitzrank import BlitzRank, rank

ranker = BlitzRank()

query = "capital of France"
docs = [
    "Berlin is the capital of Germany.",
    "Paris is the capital of France.",
    "Tokyo is the capital of Japan.",
]

# Any LiteLLM-compatible model works — just set the appropriate API keys as env variables
indices = rank(ranker, model="openai/gpt-4.1", query=query, docs=docs, topk=2)  # [1, 0]
top_docs = [docs[i] for i in indices]
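
Keys follow LiteLLM's provider conventions; for the openai/ model above that means OPENAI_API_KEY, exported in the shell or set in-process before calling rank:

import os

# LiteLLM reads provider credentials from the environment; openai/ models use
# OPENAI_API_KEY, openrouter/ models use OPENROUTER_API_KEY, and so on.
os.environ.setdefault("OPENAI_API_KEY", "sk-...")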

Evaluate on a Benchmark

from blitzrank import BlitzRank, evaluate

ranker = BlitzRank()
rankings, metrics = evaluate(ranker, dataset="msmarco/dl19/bm25", model="openai/gpt-4.1")

print(metrics)   # {"ndcg@10": 0.72, "map@10": 0.51}
print(rankings)  # [{"query": "...", "ranking": [3, 0, 7, ...]}, ...]

Dataset names follow the format collection/split/retriever.

MSMARCO: msmarco/dl19/bm25, msmarco/dl20/bm25, msmarco/dl21/bm25, msmarco/dl22/bm25, msmarco/dl23/bm25, msmarco/dlhard/bm25
BEIR: beir/nfcorpus/bm25, beir/fiqa/bm25, beir/trec-covid/bm25, beir/nq/bm25, beir/hotpotqa/bm25, beir/scifact/bm25, beir/arguana/bm25, beir/quora/bm25, beir/scidocs/bm25, beir/fever/bm25, beir/climate-fever/bm25, beir/dbpedia-entity/bm25, beir/robust04/bm25, beir/signal1m/bm25, beir/trec-news/bm25, beir/webis-touche2020/bm25
BRIGHT: bright/aops/infx, bright/biology/infx, bright/earth_science/infx, bright/economics/infx, bright/leetcode/infx, bright/pony/infx, bright/psychology/infx, bright/robotics/infx, bright/stackoverflow/infx, bright/sustainable_living/infx, bright/theoremqa_questions/infx, bright/theoremqa_theorems/infx
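
Any string from the list can be passed directly to evaluate. For example, the BRIGHT splits use infx retrieval results rather than BM25 (a minimal sketch reusing the API shown above):

from blitzrank import BlitzRank, evaluate

# Same call as above, pointed at a BRIGHT split (retriever "infx" instead of "bm25").
rankings, metrics = evaluate(BlitzRank(), dataset="bright/biology/infx",
                             model="openai/gpt-4.1")
print(metrics["ndcg@10"])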

Baselines

All methods share the same interface: create a ranker (with optional parameters), then pass it together with a model to rank or evaluate.

from blitzrank import BlitzRank, SlidingWindow, SetWise, PairWise, TourRank, AcuRank, rank

query = "capital of France"
docs = ["Berlin is in Germany", "Paris is in France", "Tokyo is in Japan"]

for Method in [BlitzRank, SlidingWindow, SetWise, PairWise, TourRank, AcuRank]:
    indices = rank(Method(), model="openai/gpt-4.1", query=query, docs=docs, topk=2)

Available methods: BlitzRank, SlidingWindow, SetWise, PairWise, TourRank, AcuRank
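
Constructor parameters are optional; the ones below (window_size, num_rounds, tol) are the same configurations used in the reproduction script further down.

from blitzrank import BlitzRank, SlidingWindow, AcuRank, rank

query = "capital of France"
docs = ["Berlin is in Germany", "Paris is in France", "Tokyo is in Japan"]

# The constructor arguments mirror the configurations in "Reproducing Paper Results".
for ranker in [BlitzRank(window_size=10), SlidingWindow(num_rounds=2), AcuRank(tol=1e-4)]:
    indices = rank(ranker, model="openai/gpt-4.1", query=query, docs=docs, topk=2)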

📖 Full parameter reference →

Reproducing Paper Results

Run all methods across all 14 datasets and 5 LLMs from the paper (Table 3):

from blitzrank import BlitzRank, SlidingWindow, SetWise, PairWise, TourRank, AcuRank, evaluate

# 6 TREC-DL + 8 BEIR = 14 benchmarks
DATASETS = [
    # TREC-DL
    "msmarco/dl19/bm25", "msmarco/dl20/bm25", "msmarco/dl21/bm25",
    "msmarco/dl22/bm25", "msmarco/dl23/bm25", "msmarco/dlhard/bm25",
    # BEIR
    "beir/trec-covid/bm25", "beir/nfcorpus/bm25", "beir/signal1m/bm25",
    "beir/trec-news/bm25", "beir/robust04/bm25", "beir/webis-touche2020/bm25",
    "beir/dbpedia-entity/bm25", "beir/scifact/bm25",
]
MODELS = [
    "openai/gpt-4.1",
    "vertex_ai/gemini-3-flash-preview",
    "openrouter/deepseek/deepseek-v3.2",
    "openrouter/qwen/qwen3-235b-a22b-2507",
    "openrouter/z-ai/glm-4.7",
]
RANKERS = {
    "Blitz-k20": BlitzRank(window_size=20),
    "Blitz-k10": BlitzRank(window_size=10),
    "SW": SlidingWindow(),
    "SW-R2": SlidingWindow(num_rounds=2),
    "Setwise": SetWise(),
    "Pairwise": PairWise(),
    "TourRank": TourRank(),
    "TourRank-R2": TourRank(num_rounds=2),
    "AcuRank": AcuRank(),
    "AcuRank-H": AcuRank(tol=1e-4),
}

for dataset in DATASETS:
    for model in MODELS:
        for name, ranker in RANKERS.items():
            rankings, metrics = evaluate(ranker, dataset=dataset, model=model)
            print(f"{name:>12} | {dataset:<28} | {model:<40} | nDCG@10={metrics['ndcg@10']:.3f}")

📖 Custom datasets and methods →

Acknowledgements

This project builds upon the following open-source repositories: RankGPT, LLM-Rankers, AcuRank, Pyserini, and LiteLLM.

Citation

@article{blitzrank2026,
  title={BlitzRank: Principled Zero-shot Ranking Agents with Tournament Graphs},
  author={Agrawal, Sheshansh and Nguyen, Thien Hang and Kiela, Douwe},
  journal={arXiv preprint arXiv:2602.05448},
  year={2026}
}
