Add retrieval_bench: pipeline evaluation for ViDoRe V3 and BRIGHT leaderboards #1498

Open
oliverholworthy wants to merge 24 commits into NVIDIA:main from oliverholworthy:oholworthy/agentic-pipeline-evaluation
Conversation

@oliverholworthy
Add retrieval_bench, a self-contained benchmarking package for evaluating
dense and agentic retrieval pipelines against the ViDoRe V3 and BRIGHT
leaderboards.

What's included:

  • Pluggable dense retrieval pipeline supporting four backends: llama-nv-embed-reasoning-3b, llama-nemoretriever-colembed-3b-v1, llama-nemotron-embed-vl-1b-v2, nemotron-colembed-vl-8b-v2
  • Agentic retrieval pipeline that augments dense retrieval with an LLM agent for iterative query refinement
  • Per-query tracing and result caching
  • CLI via retrieval-bench evaluate dense-retrieval and retrieval-bench evaluate agentic-retrieval
  • Utility commands for listing datasets/backends and comparing results
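For readers unfamiliar with the agentic pattern described above, here is a minimal sketch of an iterative query refinement loop layered on a dense retriever, with a per-query trace. It is not the retrieval_bench implementation: `agentic_retrieve`, `toy_retrieve`, and `toy_refine` are hypothetical names, the retriever is a toy term-overlap scorer standing in for an embedding backend, and the refiner is a fixed rule standing in for an LLM agent.

```python
from typing import Callable, Dict, List, Tuple

Ranking = List[Tuple[str, float]]  # (doc_id, score), best first


def agentic_retrieve(
    query: str,
    retrieve: Callable[[str], Ranking],
    refine: Callable[[str, Ranking], str],
    max_rounds: int = 3,
    top_k: int = 10,
) -> Tuple[Ranking, List[dict]]:
    """Retrieve, let the agent rewrite the query, and merge hits across rounds."""
    best: Dict[str, float] = {}   # highest score seen per document
    trace: List[dict] = []        # one record per round, for later inspection
    current = query
    for round_idx in range(max_rounds):
        hits = retrieve(current)
        for doc_id, score in hits:
            best[doc_id] = max(score, best.get(doc_id, float("-inf")))
        trace.append({"round": round_idx, "query": current, "hits": len(hits)})
        new_query = refine(current, hits)
        if new_query == current:  # the agent has nothing further to change
            break
        current = new_query
    ranked = sorted(best.items(), key=lambda kv: kv[1], reverse=True)[:top_k]
    return ranked, trace


# Hypothetical stand-ins: a term-overlap "retriever" and a rule-based "agent".
CORPUS = {
    "d1": "solar panel efficiency",
    "d2": "wind turbine noise",
    "d3": "solar cell degradation",
}


def toy_retrieve(query: str) -> Ranking:
    terms = set(query.lower().split())
    scored = [(d, float(len(terms & set(text.split())))) for d, text in CORPUS.items()]
    return sorted([s for s in scored if s[1] > 0], key=lambda s: s[1], reverse=True)


def toy_refine(query: str, hits: Ranking) -> str:
    return query if "degradation" in query else query + " degradation"


ranked, trace = agentic_retrieve("solar", toy_retrieve, toy_refine)
```

In this toy run the refined query lifts `d3` above `d1`, and `trace` holds one record per round. Result caching could be layered on by memoizing `retrieve` per query string, though how the package actually caches is not shown here.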

Checklist

  • I am familiar with the Contributing Guidelines.
  • New or existing tests cover these changes.
  • The documentation is up to date with these changes.
  • If adjusting docker-compose.yaml environment variables, have you ensured those are mirrored in the Helm values.yaml file? N/A

@copy-pr-bot
copy-pr-bot bot commented Mar 6, 2026

This pull request requires additional validation before any workflows can run on NVIDIA's runners.

Pull request vetters can view their responsibilities here.

Contributors can view more details about this message here.

oliverholworthy and others added 23 commits March 6, 2026 13:38
…uation

Incorporates agentic and dense retrieval pipeline benchmarking toolkit
as a standalone package. Includes LLM agent loop with tool-calling,
pluggable retriever backends, and evaluation framework with pytrec_eval
metrics.

Signed-off-by: Oliver Holworthy <1216955+oliverholworthy@users.noreply.github.com>
Co-authored-by: Radek Osmulski <2444926+radekosmulski@users.noreply.github.com>
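
The commit message above says the evaluation framework computes its metrics with pytrec_eval. As a rough illustration of the kind of rank metric such a framework reports, here is a hand-rolled nDCG@k for a single query. It does not call pytrec_eval and is not taken from the package. `ndcg_at_k` is a hypothetical helper, and the linear-gain, log2-discount formulation is an assumption about which nDCG variant is meant.

```python
import math
from typing import Dict, List


def ndcg_at_k(ranked_ids: List[str], qrels: Dict[str, int], k: int = 10) -> float:
    """nDCG@k for one query, using linear gain and a log2 rank discount."""

    def dcg(gains: List[int]) -> float:
        # Rank 1 gets discount log2(2) = 1, rank 2 gets log2(3), and so on.
        return sum(g / math.log2(i + 2) for i, g in enumerate(gains))

    gains = [qrels.get(doc_id, 0) for doc_id in ranked_ids[:k]]
    ideal = sorted(qrels.values(), reverse=True)[:k]
    ideal_dcg = dcg(ideal)
    return dcg(gains) / ideal_dcg if ideal_dcg > 0 else 0.0


# Two relevant documents; the run places one first and the other third.
qrels = {"d1": 1, "d3": 1}
score = ndcg_at_k(["d3", "d2", "d1"], qrels, k=10)
```

A benchmark score would then be the mean of this value over all queries in a dataset.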
@oliverholworthy oliverholworthy force-pushed the oholworthy/agentic-pipeline-evaluation branch from 5562153 to c0efb90 Compare March 6, 2026 13:39
@oliverholworthy oliverholworthy self-assigned this Mar 9, 2026
@oliverholworthy oliverholworthy marked this pull request as ready for review March 9, 2026 16:30
@oliverholworthy oliverholworthy requested a review from a team as a code owner March 9, 2026 16:30
@oliverholworthy oliverholworthy requested a review from nkmcalli March 9, 2026 16:30
@oliverholworthy oliverholworthy requested a review from a team March 9, 2026 16:30

3 participants