Add support and documentation for AnyModel checkpoints with Nemo evaluator #894

Merged
j-rausch merged 2 commits into dkorzekwa/any_model from jrausch/any_model_nveval_squash on Feb 17, 2026

Conversation

@j-rausch

This PR adds Nemo Evaluator support to the AnyModel branch. It includes documentation and a deployment script that allow for evaluation of AnyModel Puzzletron checkpoints with Nemo Evaluator.

We assume development on a GPU node, following the current tutorial style, so we don't rely on Slurm-based deployment/evaluation, but instead use direct evaluation via eval-factory run_eval.
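Concretely, the flow looks roughly like the sketch below. The deployment script path is a placeholder (the actual script ships with this PR), and the `run_eval` flags are the ones that appear in the review snippets further down this thread.

```bash
# Illustrative end-to-end flow on a single GPU node; the deployment
# script path below is a placeholder, not the script added in this PR.
export MODELOPT_WORKDIR=/path/to/Model-Optimizer   # repo root, not puzzle_dir

# 1. Serve the AnyModel Puzzletron checkpoint as a local OpenAI-style
#    completions endpoint (hypothetical script name).
python scripts/deploy_anymodel.py --checkpoint /path/to/checkpoint --port 8083 &

# 2. Evaluate directly against the endpoint; no Slurm involved.
eval-factory run_eval \
  --model_id anymodel-hf \
  --model_type completions \
  --model_url http://0.0.0.0:8083/v1/completions/ \
  --output_dir ./evals/mmlu_anymodel
```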

…uator

Signed-off-by: jrausch <jrausch@nvidia.com>
j-rausch requested review from a team as code owners on February 16, 2026 at 09:29
j-rausch requested review from kevalmorabia97 and removed the request for a team on February 16, 2026 at 09:29
@coderabbitai
Contributor

coderabbitai bot commented Feb 16, 2026

Important

Review skipped

Auto reviews are disabled on base/target branches other than the default branch.

🗂️ Base branches to auto review (3)
  • main
  • release/.*
  • feature/.*

Please check the settings in the CodeRabbit UI or the .coderabbit.yaml file in this repository. To trigger a single review, invoke the @coderabbitai review command.

You can disable this status message by setting reviews.review_status to false in the CodeRabbit configuration file.


### Local Evaluation with NeMo Evaluator (AnyModel)

#### Example Results
AnyModel checkpoints are currently supported via the patched NeMo Evaluator deployable


"Via the patched NeMo Evaluator deployable" is too technical; keep the tutorial simple for end users. E.g., just write that it deploys a local OpenAI-style completions endpoint that evaluation can be run against.

The technical details can go under a subpage dedicated to evaluation.
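To illustrate what "OpenAI-style completions endpoint" means for users, a minimal smoke test could look like this. The URL and model id are taken from the snippets in this PR; the payload fields follow the standard /v1/completions request shape.

```bash
# Minimal smoke test against the local completions endpoint started by
# the deployment script (URL and model id as used elsewhere in this PR).
curl -s http://0.0.0.0:8083/v1/completions/ \
  -H "Content-Type: application/json" \
  -d '{"model": "anymodel-hf", "prompt": "The capital of France is", "max_tokens": 8}'
```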

@j-rausch (Author)

Simplified in latest commit

> **Note:** This flow requires Ray. If it is missing, install it in the container/venv:
>
> ```bash
> pip install ray
> ```


There is a pip install section at the top of this tutorial; can this be integrated there?

@j-rausch (Author)

I'll move it into requirements.txt
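With that change, the existing install step would cover Ray implicitly, e.g.:

```bash
# With ray listed in requirements.txt, the tutorial's existing
# install step pulls it in; no separate pip install ray needed.
pip install -r requirements.txt
```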

The plot shows how token accuracy changes with different compression rates. Higher compression (0.5 = 50% of original memory) reduces accuracy, while lower compression maintains accuracy closer to the teacher model.
```bash
# Repo root (not puzzle_dir)
export MODELOPT_WORKDIR=/path/to/Model-Optimizer
```


Keep it a one-liner in the doc, just a script to run? E.g. something like the sketch below (the wrapper script name is purely hypothetical).
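```bash
# Hypothetical one-liner: inline the export instead of a separate setup step.
MODELOPT_WORKDIR=/path/to/Model-Optimizer bash scripts/run_anymodel_eval.sh
```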

```bash
--model_id anymodel-hf \
--model_type completions \
--model_url http://0.0.0.0:8083/v1/completions/ \
--output_dir $PUZZLE_DIR/evals/mmlu_anymodel \
```


Why is PUZZLE_DIR needed? A user already specifies it; see above:
"...Specify the puzzle_dir, input_hf_model_path, dataset_path, intermediate_size_list, and target_memory arguments in the llama-3_1-8B_pruneffn_memory.yaml configuration file...."

Feel free to propose the best approach based on your Puzzletron repo experience.

@j-rausch (Author)

I've simplified the README to use explicit paths instead of those placeholders in the tutorial.
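For illustration, the simplified form swaps the $PUZZLE_DIR placeholder for a literal path; the paths below are examples, not the tutorial's actual values.

```bash
# Sketch of the placeholder-free call; paths are examples only.
eval-factory run_eval \
  --model_id anymodel-hf \
  --model_type completions \
  --model_url http://0.0.0.0:8083/v1/completions/ \
  --output_dir /path/to/puzzle_dir/evals/mmlu_anymodel
```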

```bash
--model_type completions \
--model_url http://0.0.0.0:8083/v1/completions/ \
--output_dir $PUZZLE_DIR/evals/mmlu_anymodel \
--overrides "config.params.parallelism=2,config.params.task=mmlu,config.params.extra.tokenizer=$CHECKPOINT_PATH,config.params.extra.tokenizer_backend=huggingface,config.params.request_timeout=6000"
```


Can we make those overrides optional or use some defaults, to keep it simple?

@j-rausch (Author)

I've removed most of the overrides.
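A plausible minimal form after that cleanup keeps only the tokenizer settings and lets everything else fall back to eval-factory defaults; this is an assumption, not the exact final command.

```bash
# Sketch: only the tokenizer overrides kept; parallelism, task selection
# and timeout are assumed to fall back to eval-factory defaults.
eval-factory run_eval \
  --model_id anymodel-hf \
  --model_type completions \
  --model_url http://0.0.0.0:8083/v1/completions/ \
  --output_dir /path/to/puzzle_dir/evals/mmlu_anymodel \
  --overrides "config.params.extra.tokenizer=/path/to/checkpoint,config.params.extra.tokenizer_backend=huggingface"
```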

Signed-off-by: jrausch <jrausch@nvidia.com>
j-rausch merged commit f69fc7a into dkorzekwa/any_model on Feb 17, 2026
7 checks passed
j-rausch deleted the jrausch/any_model_nveval_squash branch on February 17, 2026 at 11:30