
Retriever harness session tooling #1495

Merged
jioffe502 merged 3 commits into NVIDIA:main from jioffe502:retriever-harness-session-tooling on Mar 6, 2026

@jioffe502
Collaborator

TLDR

This PR updates the nemo_retriever harness to make dataset and query path resolution more reliable, and adds lightweight commands for reviewing harness run sessions.
It validates the new artifact flow with focused unit tests and a real end-to-end (e2e) recall run on the jp20 dataset.

Description

  • stabilize harness config resolution for relative query_csv values and /raid/$USER dataset fallbacks, and enable the default FinanceBench query fixture
  • add a repeatable --tag option, plus retriever harness summary and retriever harness compare commands for inspecting sessions
  • clean up harness artifact metric naming, and remove the runtime warning emitted by python -m nemo_retriever.harness via a small CLI refactor
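The repeatable --tag option can be sketched with Python's argparse, where action="append" collects every occurrence of a flag into a list. This is a minimal sketch assuming an argparse-based CLI; build_parser is a hypothetical name, and only the subcommands and flags that appear in the test plan (run, nightly, --dataset, --preset, --run-name, --dry-run, --tag) are modeled, not the harness's actual parser.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical parser mirroring the subcommands used in the test plan."""
    parser = argparse.ArgumentParser(prog="retriever harness")
    sub = parser.add_subparsers(dest="command", required=True)

    run = sub.add_parser("run")
    run.add_argument("--dataset", required=True)
    run.add_argument("--preset")
    run.add_argument("--run-name")
    # action="append" makes --tag repeatable: each occurrence adds one entry.
    run.add_argument("--tag", action="append", default=None, dest="tags")

    nightly = sub.add_parser("nightly")
    nightly.add_argument("--dry-run", action="store_true")
    nightly.add_argument("--tag", action="append", default=None, dest="tags")
    return parser


if __name__ == "__main__":
    args = build_parser().parse_args(
        ["run", "--dataset", "jp20", "--tag", "nightly", "--tag", "gpu0"]
    )
    # Normalize "no --tag given" (None) to an empty list for downstream code.
    print(args.tags or [])
```

Using default=None rather than default=[] keeps "no tags given" distinguishable from an explicit tag list; callers can normalize with args.tags or []. The gpu0 tag value is illustrative only.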

Test plan

  • pytest -q nemo_retriever/tests/test_harness_parsers.py nemo_retriever/tests/test_harness_config.py nemo_retriever/tests/test_harness_run.py nemo_retriever/tests/test_harness_reporting.py nemo_retriever/tests/test_harness_recall_adapters.py nemo_retriever/tests/test_recall_core.py
  • python -m nemo_retriever.harness nightly --dry-run --tag nightly
  • python -m nemo_retriever.harness run --dataset jp20 --preset single_gpu --run-name jp20_integration_check_cleaned

Checklist

  • I am familiar with the Contributing Guidelines.
  • New or existing tests cover these changes.
  • The documentation is up to date with these changes.
  • If adjusting docker-compose.yaml environment variables, have you ensured those are mirrored in the Helm values.yaml file?

jioffe502 requested a review from a team as a code owner on March 5, 2026 22:28
jioffe502 requested a review from drobison00 on March 5, 2026 22:28
jdye64 removed the request for review from drobison00 on March 6, 2026 18:13
- stabilize query_csv resolution and financebench defaults
- add tags plus summary and compare session commands
- clean recall metric keys and validate jp20 recall e2e

Signed-off-by: Jacob Ioffe <jioffe@nvidia.com>
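A compare command over two sessions reduces to aligning metric keys and reporting deltas. The sketch below assumes each session's metrics are already loaded into a flat dict of floats; compare_sessions, format_rows, and the recall@k key names are hypothetical and are not the harness's actual artifact schema.

```python
from typing import Dict, List, Optional, Tuple

# (metric name, baseline value, candidate value, candidate - baseline)
Row = Tuple[str, Optional[float], Optional[float], Optional[float]]


def compare_sessions(baseline: Dict[str, float], candidate: Dict[str, float]) -> List[Row]:
    """Return one row per metric key found in either session, sorted by key.

    A metric missing from either side gets None for that side and for the delta.
    """
    rows: List[Row] = []
    for key in sorted(set(baseline) | set(candidate)):
        base = baseline.get(key)
        cand = candidate.get(key)
        delta = cand - base if base is not None and cand is not None else None
        rows.append((key, base, cand, delta))
    return rows


def format_rows(rows: List[Row]) -> str:
    """Render rows as a fixed-width text table; missing values print as '-'."""

    def fmt(value: Optional[float], signed: bool = False) -> str:
        if value is None:
            return "-"
        return f"{value:+.4f}" if signed else f"{value:.4f}"

    lines = [f"{'metric':<12}{'baseline':>10}{'candidate':>11}{'delta':>10}"]
    for key, base, cand, delta in rows:
        lines.append(f"{key:<12}{fmt(base):>10}{fmt(cand):>11}{fmt(delta, signed=True):>10}")
    return "\n".join(lines)


if __name__ == "__main__":
    # Hypothetical metric values for two runs.
    baseline = {"recall@1": 0.50, "recall@5": 0.80}
    candidate = {"recall@1": 0.55, "recall@5": 0.80, "recall@10": 0.90}
    print(format_rows(compare_sessions(baseline, candidate)))
```

Taking the union of keys means a metric that was renamed or newly added shows up with a missing side instead of being silently dropped, which matters when metric key naming changes between runs.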
- remove cwd fallback from relative query_csv resolution
- simplify /raid dataset fallback to use current user only

Signed-off-by: Jacob Ioffe <jioffe@nvidia.com>
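The two resolution rules in this commit can be sketched as follows, assuming relative query_csv values are anchored to the directory of the harness config file and that the dataset fallback layout is /raid/<user>/<dataset name>. Both the anchor directory and the layout are assumptions, and resolve_query_csv and resolve_dataset_dir are hypothetical helpers, not the harness's actual functions.

```python
import getpass
from pathlib import Path
from typing import Optional


def resolve_query_csv(query_csv: str, config_dir: Path) -> Path:
    """Resolve a query_csv value; relative paths are anchored to config_dir only.

    There is deliberately no fallback to the current working directory, so the
    result does not depend on where the harness was launched from.
    """
    path = Path(query_csv).expanduser()
    return path if path.is_absolute() else config_dir / path


def resolve_dataset_dir(
    dataset_dir: str,
    dataset_name: str,
    raid_root: Path = Path("/raid"),
    user: Optional[str] = None,
) -> Path:
    """Return dataset_dir if it exists, else <raid_root>/<user>/<dataset_name>.

    Only the current user's directory is tried (assumed layout); a
    FileNotFoundError is raised when neither candidate exists.
    """
    primary = Path(dataset_dir).expanduser()
    if primary.exists():
        return primary
    fallback = raid_root / (user or getpass.getuser()) / dataset_name
    if fallback.exists():
        return fallback
    raise FileNotFoundError(f"dataset not found at {primary} or {fallback}")
```

Dropping the cwd fallback trades a little convenience for reproducibility: the same config resolves to the same files regardless of the directory the command runs from.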
- Mock financebench query_csv fixture path in Path.exists stub
- Prevent CI-only validation failures from missing local fixture file
- Keep test focused on dataset_dir fallback behavior

Signed-off-by: Jacob Ioffe <jioffe@nvidia.com>
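The Path.exists stub described in this commit can be sketched with unittest.mock: the stub reports only an explicit allowlist of paths as present, so config validation that also checks the query_csv fixture does not fail on CI machines that lack the file. The helper names (make_exists_stub, check_stubbed_paths) and both paths are hypothetical.

```python
from pathlib import Path
from typing import Callable, Dict, Iterable
from unittest import mock


def make_exists_stub(existing: Iterable[str]) -> Callable[..., bool]:
    """Build a Path.exists replacement that reports only the given paths as present."""
    allowed = {Path(p) for p in existing}

    def fake_exists(self: Path, *args, **kwargs) -> bool:
        # *args/**kwargs absorb newer keyword arguments such as follow_symlinks.
        return Path(self) in allowed

    return fake_exists


# Hypothetical paths: a dataset fallback dir plus the query fixture that config
# validation also checks, so the test does not depend on local files.
DATASET_FALLBACK = "/raid/ci-user/financebench"
QUERY_FIXTURE = "/repo/data/financebench_queries.csv"


def check_stubbed_paths() -> Dict[str, bool]:
    stub = make_exists_stub([DATASET_FALLBACK, QUERY_FIXTURE])
    # patch.object restores the original Path.exists when the block exits.
    with mock.patch.object(Path, "exists", new=stub):
        return {
            "dataset": Path(DATASET_FALLBACK).exists(),
            "query_csv": Path(QUERY_FIXTURE).exists(),
            "other": Path("/raid/ci-user/jp20").exists(),
        }
```

In a pytest test the same stub can be installed with monkeypatch.setattr(Path, "exists", stub), which pytest undoes automatically at teardown. Listing the fixture path explicitly keeps the test focused on the dataset_dir fallback rather than on which files happen to exist locally.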
jioffe502 force-pushed the retriever-harness-session-tooling branch from ec3ef3c to c301bbe on March 6, 2026 19:10
jioffe502 merged commit 6d5d983 into NVIDIA:main on Mar 6, 2026
8 checks passed