Add support and documentation for AnyModel checkpoints with Nemo evaluator #894

Merged
j-rausch merged 2 commits into dkorzekwa/any_model from jrausch/any_model_nveval_squash on Feb 17, 2026

Conversation

@j-rausch

This PR adds Nemo Evaluator support to the AnyModel branch. It includes documentation and a deployment script that allow for evaluation of AnyModel Puzzletron checkpoints with Nemo Evaluator.

We assume development on a GPU node, following the current tutorial style, so we don't rely on Slurm-based deployment/evaluation, but instead use direct evaluation via eval-factory run_eval.
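Concretely, the flow looks roughly like the sketch below. The deployment script path is a placeholder (the actual script ships with this PR), and the `run_eval` flags are the ones that appear in the review snippets further down this thread.

```bash
# Illustrative end-to-end flow on a single GPU node; the deployment
# script path below is a placeholder, not the script added in this PR.
export MODELOPT_WORKDIR=/path/to/Model-Optimizer   # repo root, not puzzle_dir

# 1. Serve the AnyModel Puzzletron checkpoint as a local OpenAI-style
#    completions endpoint (hypothetical script name).
python scripts/deploy_anymodel.py --checkpoint /path/to/checkpoint --port 8083 &

# 2. Evaluate directly against the endpoint; no Slurm involved.
eval-factory run_eval \
  --model_id anymodel-hf \
  --model_type completions \
  --model_url http://0.0.0.0:8083/v1/completions/ \
  --output_dir ./evals/mmlu_anymodel
```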

…uator

Signed-off-by: jrausch <jrausch@nvidia.com>
j-rausch requested review from a team as code owners on February 16, 2026 at 09:29
j-rausch requested review from kevalmorabia97 and removed the request for a team on February 16, 2026 at 09:29
@coderabbitai
Contributor

coderabbitai bot commented Feb 16, 2026

Important

Review skipped

Auto reviews are disabled on base/target branches other than the default branch.

🗂️ Base branches to auto review (3)
  • main
  • release/.*
  • feature/.*

Please check the settings in the CodeRabbit UI or the .coderabbit.yaml file in this repository. To trigger a single review, invoke the @coderabbitai review command.

You can disable this status message by setting reviews.review_status to false in the CodeRabbit configuration file.


### Local Evaluation with NeMo Evaluator (AnyModel)

#### Example Results
AnyModel checkpoints are currently supported via the patched NeMo Evaluator deployable


"Via the patched NeMo Evaluator deployable" is too technical; keep the tutorial simple for end users. E.g., just write that it deploys a local OpenAI-style completions endpoint that evaluation can be run against.

The technical details can go under a subpage dedicated to evaluation.
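To illustrate what "OpenAI-style completions endpoint" means for users, a minimal smoke test could look like this. The URL and model id are taken from the snippets in this PR; the payload fields follow the standard /v1/completions request shape.

```bash
# Minimal smoke test against the local completions endpoint started by
# the deployment script (URL and model id as used elsewhere in this PR).
curl -s http://0.0.0.0:8083/v1/completions/ \
  -H "Content-Type: application/json" \
  -d '{"model": "anymodel-hf", "prompt": "The capital of France is", "max_tokens": 8}'
```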

@j-rausch (Author)

Simplified in latest commit

> **Note:** This flow requires Ray. If it is missing, install it in the container/venv:
>
> ```bash
> pip install ray
> ```


There is a pip install section at the top of this tutorial; can this be integrated there?

@j-rausch (Author)

I'll move it into requirements.txt
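With that change, the existing install step would cover Ray implicitly, e.g.:

```bash
# With ray listed in requirements.txt, the tutorial's existing
# install step pulls it in; no separate pip install ray needed.
pip install -r requirements.txt
```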

The plot shows how token accuracy changes with different compression rates. Higher compression (0.5 = 50% of original memory) reduces accuracy, while lower compression maintains accuracy closer to the teacher model.
```bash
# Repo root (not puzzle_dir)
export MODELOPT_WORKDIR=/path/to/Model-Optimizer
```


Keep it a one-liner in the doc, just a script to run? E.g. something like the sketch below (the wrapper script name is purely hypothetical).
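```bash
# Hypothetical one-liner: inline the export instead of a separate setup step.
MODELOPT_WORKDIR=/path/to/Model-Optimizer bash scripts/run_anymodel_eval.sh
```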

```bash
--model_id anymodel-hf \
--model_type completions \
--model_url http://0.0.0.0:8083/v1/completions/ \
--output_dir $PUZZLE_DIR/evals/mmlu_anymodel \
```


Why is PUZZLE_DIR needed? A user already specifies it; see above:
"...Specify the puzzle_dir, input_hf_model_path, dataset_path, intermediate_size_list, and target_memory arguments in the llama-3_1-8B_pruneffn_memory.yaml configuration file...."

Feel free to propose the best approach based on your Puzzletron repo experience.

@j-rausch (Author)

I've simplified the README to use explicit paths instead of those placeholders in the tutorial.
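For illustration, the simplified form swaps the $PUZZLE_DIR placeholder for a literal path; the paths below are examples, not the tutorial's actual values.

```bash
# Sketch of the placeholder-free call; paths are examples only.
eval-factory run_eval \
  --model_id anymodel-hf \
  --model_type completions \
  --model_url http://0.0.0.0:8083/v1/completions/ \
  --output_dir /path/to/puzzle_dir/evals/mmlu_anymodel
```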

```bash
--model_type completions \
--model_url http://0.0.0.0:8083/v1/completions/ \
--output_dir $PUZZLE_DIR/evals/mmlu_anymodel \
--overrides "config.params.parallelism=2,config.params.task=mmlu,config.params.extra.tokenizer=$CHECKPOINT_PATH,config.params.extra.tokenizer_backend=huggingface,config.params.request_timeout=6000"
```


Can we make those overrides optional or use some defaults, to keep it simple?

@j-rausch (Author)

I've removed most of the overrides.
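A plausible minimal form after that cleanup keeps only the tokenizer settings and lets everything else fall back to eval-factory defaults; this is an assumption, not the exact final command.

```bash
# Sketch: only the tokenizer overrides kept; parallelism, task selection
# and timeout are assumed to fall back to eval-factory defaults.
eval-factory run_eval \
  --model_id anymodel-hf \
  --model_type completions \
  --model_url http://0.0.0.0:8083/v1/completions/ \
  --output_dir /path/to/puzzle_dir/evals/mmlu_anymodel \
  --overrides "config.params.extra.tokenizer=/path/to/checkpoint,config.params.extra.tokenizer_backend=huggingface"
```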

Signed-off-by: jrausch <jrausch@nvidia.com>
j-rausch merged commit f69fc7a into dkorzekwa/any_model on Feb 17, 2026
7 checks passed
j-rausch deleted the jrausch/any_model_nveval_squash branch on February 17, 2026 at 11:30