Principled Zero-shot Ranking Agents with Tournament Graphs
BlitzRank is a principled zero-shot ranking framework that uses tournament graphs to extract maximal information from each LLM call, achieving Pareto optimality across 14 benchmarks and 5 LLMs with 25–40% fewer queries.
Algorithm visualization on the 25-horses puzzle (find the 3 fastest of 25 horses, racing 5 at a time): BlitzRank converges in 7 rounds vs. Sliding Window's 11 rounds.
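For intuition, here is a minimal sketch of the classic 7-race solution to the puzzle. The `race` and `top3_of_25` helpers and the random speeds are purely illustrative and are not BlitzRank's implementation:

```python
import random

def race(five, speed):
    """One race: return the five horses ordered fastest-first."""
    return sorted(five, key=speed, reverse=True)

def top3_of_25(horses, speed):
    """Classic 7-race solution to the 25-horses puzzle."""
    assert len(horses) == 25
    # Races 1-5: five disjoint heats of five horses each.
    heats = [race(horses[i:i + 5], speed) for i in range(0, 25, 5)]
    # Race 6: the five heat winners; the winner is the fastest overall.
    winners = race([h[0] for h in heats], speed)
    heat_of = {h[0]: h for h in heats}
    a, b, c = heat_of[winners[0]], heat_of[winners[1]], heat_of[winners[2]]
    # Race 7: only five horses can still be 2nd or 3rd overall.
    contenders = [a[1], a[2], b[0], b[1], c[0]]
    second, third = race(contenders, speed)[:2]
    return [winners[0], second, third]

speeds = {name: random.random() for name in range(25)}
assert top3_of_25(list(speeds), speeds.get) == sorted(speeds, key=speeds.get, reverse=True)[:3]
```

Each race yields only a partial order, and combining the heat results into a small tournament graph is what lets the top 3 be identified in just 7 races.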
Install from PyPI:

```bash
uv pip install blitzrank
```

From source:

```bash
git clone https://github.com/ContextualAI/BlitzRank.git
cd BlitzRank
uv pip install -e .
```

Install with additional dependencies for baselines (AcuRank, TourRank):

```bash
uv pip install "blitzrank[all]"
# or from source:
uv pip install -e ".[all]"from blitzrank import BlitzRank, rank
ranker = BlitzRank()
query = "capital of France"
docs = [
    "Berlin is the capital of Germany.",
    "Paris is the capital of France.",
    "Tokyo is the capital of Japan.",
]
# Any LiteLLM-compatible model works — just set the appropriate API keys as env variables
indices = rank(ranker, model="openai/gpt-4.1", query=query, docs=docs, topk=2) # [1, 0]
top_docs = [docs[i] for i in indices]
```

Evaluate on a benchmark dataset:

```python
from blitzrank import BlitzRank, evaluate
ranker = BlitzRank()
rankings, metrics = evaluate(ranker, dataset="msmarco/dl19/bm25", model="openai/gpt-4.1")
print(metrics) # {"ndcg@10": 0.72, "map@10": 0.51}
print(rankings)  # [{"query": "...", "ranking": [3, 0, 7, ...]}, ...]
```

Dataset names follow the format `collection/split/retriever`.
| Category | Datasets |
|---|---|
| MSMARCO | msmarco/dl19/bm25, msmarco/dl20/bm25, msmarco/dl21/bm25, msmarco/dl22/bm25, msmarco/dl23/bm25, msmarco/dlhard/bm25 |
| BEIR | beir/nfcorpus/bm25, beir/fiqa/bm25, beir/trec-covid/bm25, beir/nq/bm25, beir/hotpotqa/bm25, beir/scifact/bm25, beir/arguana/bm25, beir/quora/bm25, beir/scidocs/bm25, beir/fever/bm25, beir/climate-fever/bm25, beir/dbpedia-entity/bm25, beir/robust04/bm25, beir/signal1m/bm25, beir/trec-news/bm25, beir/webis-touche2020/bm25 |
| BRIGHT | bright/aops/infx, bright/biology/infx, bright/earth_science/infx, bright/economics/infx, bright/leetcode/infx, bright/pony/infx, bright/psychology/infx, bright/robotics/infx, bright/stackoverflow/infx, bright/sustainable_living/infx, bright/theoremqa_questions/infx, bright/theoremqa_theorems/infx |
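Since a dataset name is just a `collection/split/retriever` string, names can also be composed programmatically. A minimal sketch using the `evaluate` call from above (the BRIGHT splits are taken from the table; the `ndcg@10` metric key is assumed to be reported for BRIGHT as it is in the example output earlier):

```python
from blitzrank import BlitzRank, evaluate

ranker = BlitzRank()
for split in ["biology", "economics"]:   # splits listed in the BRIGHT row above
    dataset = f"bright/{split}/infx"     # collection/split/retriever
    _, metrics = evaluate(ranker, dataset=dataset, model="openai/gpt-4.1")
    print(dataset, metrics.get("ndcg@10"))
```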
All methods share the same interface: create a ranker (with optional parameters) and pass the model to `rank`/`evaluate`.

```python
from blitzrank import BlitzRank, SlidingWindow, SetWise, PairWise, TourRank, AcuRank, rank
query = "capital of France"
docs = ["Berlin is in Germany", "Paris is in France", "Tokyo is in Japan"]
for Method in [BlitzRank, SlidingWindow, SetWise, PairWise, TourRank, AcuRank]:
    indices = rank(Method(), model="openai/gpt-4.1", query=query, docs=docs, topk=2)
```

Available methods: `BlitzRank`, `SlidingWindow`, `SetWise`, `PairWise`, `TourRank`, `AcuRank`.
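Optional constructor parameters are passed when the ranker is created; a minimal sketch (parameter names and values are taken from the benchmark configuration below):

```python
from blitzrank import BlitzRank, SlidingWindow, AcuRank, rank

query = "capital of France"
docs = ["Berlin is in Germany", "Paris is in France", "Tokyo is in Japan"]

# Each ranker is configured once, then used through the same rank() call.
for ranker in [BlitzRank(window_size=10), SlidingWindow(num_rounds=2), AcuRank(tol=1e-4)]:
    print(rank(ranker, model="openai/gpt-4.1", query=query, docs=docs, topk=2))
```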
Run all methods across all 14 datasets and 5 LLMs from the paper (Table 3):

```python
from blitzrank import BlitzRank, SlidingWindow, SetWise, PairWise, TourRank, AcuRank, evaluate
# 6 TREC-DL + 8 BEIR = 14 benchmarks
DATASETS = [
    # TREC-DL
    "msmarco/dl19/bm25", "msmarco/dl20/bm25", "msmarco/dl21/bm25",
    "msmarco/dl22/bm25", "msmarco/dl23/bm25", "msmarco/dlhard/bm25",
    # BEIR
    "beir/trec-covid/bm25", "beir/nfcorpus/bm25", "beir/signal1m/bm25",
    "beir/trec-news/bm25", "beir/robust04/bm25", "beir/webis-touche2020/bm25",
    "beir/dbpedia-entity/bm25", "beir/scifact/bm25",
]
MODELS = [
    "openai/gpt-4.1",
    "vertex_ai/gemini-3-flash-preview",
    "openrouter/deepseek/deepseek-v3.2",
    "openrouter/qwen/qwen3-235b-a22b-2507",
    "openrouter/z-ai/glm-4.7",
]
RANKERS = {
    "Blitz-k20": BlitzRank(window_size=20),
    "Blitz-k10": BlitzRank(window_size=10),
    "SW": SlidingWindow(),
    "SW-R2": SlidingWindow(num_rounds=2),
    "Setwise": SetWise(),
    "Pairwise": PairWise(),
    "TourRank": TourRank(),
    "TourRank-R2": TourRank(num_rounds=2),
    "AcuRank": AcuRank(),
    "AcuRank-H": AcuRank(tol=1e-4),
}
for dataset in DATASETS:
    for model in MODELS:
        for name, ranker in RANKERS.items():
            rankings, metrics = evaluate(ranker, dataset=dataset, model=model)
            print(f"{name:>12} | {dataset:<28} | {model:<40} | nDCG@10={metrics['ndcg@10']:.3f}")
```

📖 Custom datasets and methods →
This project builds upon the following open-source repositories: RankGPT, LLM-Rankers, AcuRank, Pyserini, and LiteLLM.
```bibtex
@article{blitzrank2026,
  title={BlitzRank: Principled Zero-shot Ranking Agents with Tournament Graphs},
  author={Agrawal, Sheshansh and Nguyen, Thien Hang and Kiela, Douwe},
  journal={arXiv preprint arXiv:2602.05448},
  year={2026}
}
```