Add support and documentation for AnyModel checkpoints with NeMo Evaluator #894
Conversation
examples/puzzletron/README.md (outdated)

```markdown
### Local Evaluation with NeMo Evaluator (AnyModel)

#### Example Results

AnyModel checkpoints are currently supported via the patched NeMo Evaluator deployable
```
"Via the patched NeMo Evaluator deployable" is too technical; keep the tutorial simple for end users. E.g., just write that it deploys a local OpenAI-style completions endpoint that evaluation can be run against, and put the technical details on a subpage dedicated to evaluation.
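For context, a minimal sketch of what "a local OpenAI-style completions endpoint" means in practice. The port and model id are taken from the README excerpts later in this PR; the payload follows the standard OpenAI completions schema, and the prompt is an arbitrary example.

```bash
# Smoke-test the locally deployed completions endpoint.
# Port 8083 and model id "anymodel-hf" come from the README excerpts
# in this PR; the prompt is just an example.
curl -s http://0.0.0.0:8083/v1/completions/ \
  -H "Content-Type: application/json" \
  -d '{"model": "anymodel-hf", "prompt": "The capital of France is", "max_tokens": 8}'
```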
examples/puzzletron/README.md (outdated)

````markdown
> **Note:** This flow requires Ray. If it is missing, install it in the container/venv:
>
> ```bash
> pip install ray
> ```
````
There is a pip install section at the top of this tutorial; integrate it there?
I'll move it into requirements.txt
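Assuming the dependency lands in the tutorial's requirements file (the exact path is an assumption, not confirmed in this PR), the separate install note then collapses into the existing setup step:

```bash
# Assumed consolidation: ray is listed in the tutorial's requirements
# file, so no separate `pip install ray` step is needed. The exact
# requirements.txt location is an assumption.
pip install -r examples/puzzletron/requirements.txt
```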
examples/puzzletron/README.md (outdated)

````markdown
The plot shows how token accuracy changes with different compression rates. Higher compression (0.5 = 50% of original memory) reduces accuracy, while lower compression maintains accuracy closer to the teacher model.

```bash
# Repo root (not puzzle_dir)
export MODELOPT_WORKDIR=/path/to/Model-Optimizer
```
````
Keep it a one-liner in the doc, just a script to run? See the sketch below.
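One way to read this suggestion, as a sketch: ship a small wrapper that resolves the repo root from its own location, so the doc shows a single command instead of a manual export. The script name, its location in the repo, and the assumption that it sits two levels below the repo root are all hypothetical.

```bash
#!/usr/bin/env bash
# run_local_eval.sh (hypothetical) — resolves MODELOPT_WORKDIR from the
# script's own location (assumed to be examples/puzzletron/), so the
# tutorial needs no manual export, then runs the command it is given.
set -euo pipefail
export MODELOPT_WORKDIR="$(cd "$(dirname "$0")/../.." && pwd)"
exec "$@"
```

The doc line would then become, e.g., `bash examples/puzzletron/run_local_eval.sh <evaluation command>`.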
examples/puzzletron/README.md (outdated)

```bash
--model_id anymodel-hf \
--model_type completions \
--model_url http://0.0.0.0:8083/v1/completions/ \
--output_dir $PUZZLE_DIR/evals/mmlu_anymodel \
```
Why is PUZZLE_DIR needed? A user already specifies it; see above:
"...Specify the puzzle_dir, input_hf_model_path, dataset_path, intermediate_size_list, and target_memory arguments in the llama-3_1-8B_pruneffn_memory.yaml configuration file...."
Feel free to propose the best way based on your Puzzletron repo experience.
I've simplified the README to use paths instead of those placeholders for the tutorial.
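An alternative sketch, if the placeholder were ever kept: derive PUZZLE_DIR from the value the user already set in the config, rather than specifying it twice. The `yq` invocation and the assumption that `puzzle_dir` is a top-level key in the YAML are mine, not taken from this PR.

```bash
# Sketch: reuse the puzzle_dir the user already wrote into the config
# file. Assumes `yq` is installed and puzzle_dir is a top-level key.
PUZZLE_DIR="$(yq '.puzzle_dir' llama-3_1-8B_pruneffn_memory.yaml)"
```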
examples/puzzletron/README.md (outdated)

```bash
--model_type completions \
--model_url http://0.0.0.0:8083/v1/completions/ \
--output_dir $PUZZLE_DIR/evals/mmlu_anymodel \
--overrides "config.params.parallelism=2,config.params.task=mmlu,config.params.extra.tokenizer=$CHECKPOINT_PATH,config.params.extra.tokenizer_backend=huggingface,config.params.request_timeout=6000"
```
Can we make those overrides optional or use some defaults, to keep it simple?
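For illustration, a sketch of the simplified invocation this asks for, assuming defaults were baked in for parallelism, tokenizer backend, and request timeout. The flag names come from the README excerpt above; whether those defaults actually exist is an assumption.

```bash
# Hypothetical simplified call if defaults were baked in: only the
# task and tokenizer path remain user-specific. Flag names come from
# the README excerpt above; the defaults themselves are an assumption.
eval-factory run_eval \
  --model_id anymodel-hf \
  --model_type completions \
  --model_url http://0.0.0.0:8083/v1/completions/ \
  --output_dir ./evals/mmlu_anymodel \
  --overrides "config.params.task=mmlu,config.params.extra.tokenizer=path/to/checkpoint"
```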
This PR adds NeMo Evaluator support to the AnyModel branch. It includes documentation and a deployment script that allow AnyModel Puzzletron checkpoints to be evaluated with NeMo Evaluator.
We assume development on a GPU node, in keeping with the current tutorial style, so we don't rely on Slurm-based deployment/evaluation but instead evaluate directly via `eval-factory run_eval`.
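Put together, the flow described here looks roughly like this on a single GPU node. The deployment script name and its flags are hypothetical; port 8083 and the `run_eval` flags come from the review threads above.

```bash
# Sketch of the direct (non-Slurm) flow on a GPU node.
# The script name and flags below are hypothetical; port 8083 matches
# the README excerpts above.
python examples/puzzletron/deploy_anymodel.py \
  --checkpoint path/to/anymodel_checkpoint --port 8083 &
# Once the endpoint is up, evaluate against it with eval-factory, as
# shown in the threads above:
#   eval-factory run_eval --model_type completions \
#     --model_url http://0.0.0.0:8083/v1/completions/ ...
```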