Update: latest nemo-rl#1273
Signed-off-by: Wei Du <wedu@nvidia.com>
Note: Reviews paused. It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior in the review settings. Use the following commands to manage reviews:
📝 Walkthrough

Removed the local multi-stage build from `Dockerfile.nemo-rl` in favor of a pre-built nemo-rl base image.
Estimated code review effort: 🎯 3 (Moderate) | ⏱️ ~25 minutes
Suggested labels: run GPU tests
🚥 Pre-merge checks | ✅ 1 | ❌ 2

❌ Failed checks (1 warning, 1 inconclusive)
✅ Passed checks (1 passed)
Actionable comments posted: 2
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (2)
nemo_skills/pipeline/nemo_rl/grpo.py (1)
193-193: ⚠️ Potential issue | 🔴 Critical

Same stale `/opt/NeMo-RL` path issue as in `sft.py` — checkpoint conversion and averaging will fail. Both `get_checkpoint_convert_cmd` (line 193) and `get_checkpoint_average_cmd` (line 218) still set `UV_PROJECT=/opt/NeMo-RL` while the training command was updated to `/opt/nemo-rl`.

Proposed fix:

```diff
 def get_checkpoint_convert_cmd(output_dir, final_hf_path, step, backend, max_position_embeddings=None):
-    cmd = "export PYTHONPATH=$PYTHONPATH:/nemo_run/code && export UV_PROJECT=/opt/NeMo-RL && cd /nemo_run/code && "
+    cmd = "export PYTHONPATH=$PYTHONPATH:/nemo_run/code && export UV_PROJECT=/opt/nemo-rl && cd /nemo_run/code && "

 def get_checkpoint_average_cmd(output_dir, average_steps, backend, remove_checkpoints_after_average):
-    cmd = "export PYTHONPATH=$PYTHONPATH:/nemo_run/code && export UV_PROJECT=/opt/NeMo-RL && cd /nemo_run/code && "
+    cmd = "export PYTHONPATH=$PYTHONPATH:/nemo_run/code && export UV_PROJECT=/opt/nemo-rl && cd /nemo_run/code && "
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@nemo_skills/pipeline/nemo_rl/grpo.py` at line 193: The shell command string assigned to `cmd` in `get_checkpoint_convert_cmd` (and likewise in `get_checkpoint_average_cmd`) uses the outdated `UV_PROJECT=/opt/NeMo-RL`, which breaks checkpoint conversion/averaging; update the command to set `UV_PROJECT=/opt/nemo-rl` instead (preserving the rest of the export and cd sequence and the existing PYTHONPATH addition) so both functions construct the same project path used by the training command.

nemo_skills/pipeline/nemo_rl/sft.py (1)
176-176: ⚠️ Potential issue | 🔴 Critical

`UV_PROJECT` still references the old `/opt/NeMo-RL` path — will break checkpoint conversion.

`get_cmd()` (line 120) was updated to `/opt/nemo-rl`, but `get_checkpoint_convert_cmd` on line 176 and `get_checkpoint_average_cmd` on line 201 still set `UV_PROJECT=/opt/NeMo-RL`. Since the Dockerfile now uses a pre-built image where the project lives at `/opt/nemo-rl`, these commands will fail at runtime.

Proposed fix:

```diff
 def get_checkpoint_convert_cmd(output_dir, final_hf_path, step, backend, max_position_embeddings=None):
-    cmd = "export PYTHONPATH=$PYTHONPATH:/nemo_run/code && export UV_PROJECT=/opt/NeMo-RL && cd /nemo_run/code && "
+    cmd = "export PYTHONPATH=$PYTHONPATH:/nemo_run/code && export UV_PROJECT=/opt/nemo-rl && cd /nemo_run/code && "

 def get_checkpoint_average_cmd(output_dir, average_steps, backend, remove_checkpoints_after_average):
-    cmd = "export PYTHONPATH=$PYTHONPATH:/nemo_run/code && export UV_PROJECT=/opt/NeMo-RL && cd /nemo_run/code && "
+    cmd = "export PYTHONPATH=$PYTHONPATH:/nemo_run/code && export UV_PROJECT=/opt/nemo-rl && cd /nemo_run/code && "
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@nemo_skills/pipeline/nemo_rl/sft.py` at line 176, The UV_PROJECT environment variable is still set to the old /opt/NeMo-RL in get_checkpoint_convert_cmd and get_checkpoint_average_cmd causing runtime failures; update both commands to use the new path /opt/nemo-rl to match get_cmd's change. Locate the string assembly in the functions get_checkpoint_convert_cmd and get_checkpoint_average_cmd and replace UV_PROJECT=/opt/NeMo-RL with UV_PROJECT=/opt/nemo-rl so the export line and subsequent cd use the correct project directory.
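Both of these findings are the same copy-paste drift: several hand-assembled command strings that must agree on one path. As an illustrative sketch only (the constant and helper below are hypothetical, not the repository's actual code), deriving every command prefix from a single constant would make this class of bug impossible:

```python
# Hypothetical refactor sketch: keep the UV project path in one place so the
# training, checkpoint-convert, and checkpoint-average commands cannot drift.
NEMO_RL_PROJECT = "/opt/nemo-rl"  # single source of truth for the container path


def command_prefix(project: str = NEMO_RL_PROJECT) -> str:
    """Build the shell prefix shared by every nemo-rl pipeline command."""
    return (
        "export PYTHONPATH=$PYTHONPATH:/nemo_run/code && "
        f"export UV_PROJECT={project} && cd /nemo_run/code && "
    )


def get_checkpoint_convert_cmd(convert_args: str) -> str:
    # Reuses the shared prefix instead of repeating the string literal inline.
    return command_prefix() + convert_args
```

Under a sketch like this, updating `NEMO_RL_PROJECT` once would have propagated the `/opt/NeMo-RL` to `/opt/nemo-rl` rename to every call site.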
🧹 Nitpick comments (3)
dockerfiles/Dockerfile.nemo-rl (1)
6-8: Nightly tag is mutable — builds are not reproducible.
`nvcr.io/nvidian/nemo-rl:nightly` will resolve to different images over time. Consider pinning to a specific digest or versioned tag for reproducible builds, or at minimum document the expected nightly version in the PR/commit message.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@dockerfiles/Dockerfile.nemo-rl` around lines 6-8: The Dockerfile uses a mutable nightly image via `ARG NEMO_RL_IMAGE` and `FROM ${NEMO_RL_IMAGE}`; to make builds reproducible, change the ARG default to a fixed versioned tag or an immutable digest (e.g., set `ARG NEMO_RL_IMAGE=<repo>:<version>` or `<repo>@sha256:<digest>`) and update any documentation/PR/commit message to record the chosen version/digest; ensure the FROM line continues to reference `${NEMO_RL_IMAGE}` so overrides remain possible and note the pinned value in the PR for traceability.

nemo_skills/training/nemo_rl/configs/grpo.yaml (1)
34-39: Misleading indentation on the comment at line 39.

The comment `# Reinforce++-baseline specific...` is indented deeper than `minus_baseline`, making it look like a child property. While YAML ignores comment indentation for parsing, this is visually confusing. Align it with the other `adv_estimator` fields.

Suggested fix:

```diff
 adv_estimator:
   name: "grpo"  # Use "reinforce_plus_plus" for Reinforce++ estimator
   normalize_rewards: true
   use_leave_one_out_baseline: false
   minus_baseline: true
-      # Reinforce++-baseline specific: subtract per-prompt mean baseline
+  # Reinforce++-baseline specific: subtract per-prompt mean baseline
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@nemo_skills/training/nemo_rl/configs/grpo.yaml` around lines 34-39: The inline comment under `adv_estimator` is mis-indented and visually appears as a child of `minus_baseline`; move the comment so it aligns with the other `adv_estimator` fields (same indentation level as `name`, `normalize_rewards`, `use_leave_one_out_baseline`, `minus_baseline`) and reference the Reinforce++ baseline context next to `minus_baseline` to make it clear it's describing that flag (check `adv_estimator` and `minus_baseline` in the diff).

nemo_skills/training/nemo_rl/configs/sft.yaml (1)
86-86: Minor inconsistency: `none` is unquoted here but quoted in `grpo.yaml`.

On line 86, `moe_router_load_balancing_type: none` is unquoted, whereas the same key in `grpo.yaml` (line 128) uses `"none"`. While YAML treats unquoted `none` as a string (unlike `null`), quoting it consistently avoids ambiguity and aligns with the GRPO config.

Suggested fix:

```diff
-moe_router_load_balancing_type: none
+moe_router_load_balancing_type: "none"
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@nemo_skills/training/nemo_rl/configs/sft.yaml` at line 86, The YAML key moe_router_load_balancing_type is set to an unquoted none; make it a quoted string to match the GRPO config and avoid ambiguity by changing the value to "none" for consistency with the other config (search for moe_router_load_balancing_type to locate the setting).
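For context on why consistency matters more than the parse result: YAML plain scalars only resolve `null`, `~`, or an empty value to null, so an unquoted `none` is already the string `"none"`; quoting just makes that explicit at a glance. A sketch of the aligned setting (the enclosing section name here is an assumption for illustration, not taken from the actual config):

```yaml
# Hypothetical fragment: quote the value so sft.yaml matches grpo.yaml and the
# string "none" cannot be misread as a YAML null when skimming the file.
megatron_cfg:  # enclosing key assumed for illustration
  moe_router_load_balancing_type: "none"
```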
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@tests/gpu-tests/test-local.yaml`:
- Line 25: The container image for the nemo-rl entry references a personal
registry path
("gitlab-master.nvidia.com/.../igitman/nemo-skills-nemo-rl:latest"); update the
nemo-rl image reference in tests/gpu-tests/test-local.yaml to a shared/team or
official registry path (or a public image) that your project controls (e.g.,
replace the "igitman" namespace with the team/project namespace or a canonical
image name) so the test config points to a stable, accessible image.
- Line 32: Remove the hard-coded personal mount '/home/wedu:/home/wedu' from
tests/gpu-tests/test-local.yaml; edit the mounts list to delete that entry and,
if needed for local testing, replace it with a documented placeholder (e.g. a
commented example or an env-var-based value) so the shared test config contains
no user-specific paths.
ℹ️ Review info
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (7)
- dockerfiles/Dockerfile.nemo-rl
- nemo_skills/pipeline/nemo_rl/grpo.py
- nemo_skills/pipeline/nemo_rl/sft.py
- nemo_skills/training/nemo_rl/configs/grpo.yaml
- nemo_skills/training/nemo_rl/configs/sft.yaml
- nemo_skills/training/nemo_rl/start_sft.py
- tests/gpu-tests/test-local.yaml
💤 Files with no reviewable changes (1)
- nemo_skills/training/nemo_rl/start_sft.py
@Kipok @gwarmstrong
@wedu-nvidia can we just use their commit id directly? Looks like nightly images are tagged with something like nvcr.io/nvidian/nemo-rl:9148186-44694499. This way we can just reference this tag in the cluster config and we don't need to have a nemo-rl Dockerfile on our side at all if we aren't making changes.
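Spelled out, this suggestion amounts to roughly the following cluster-config shape (the `containers` key layout is an assumption about the config schema; the tag is the example quoted above):

```yaml
# Hypothetical cluster-config fragment: reference the upstream commit-tagged
# image directly instead of maintaining a local Dockerfile.nemo-rl.
containers:
  nemo-rl: nvcr.io/nvidian/nemo-rl:9148186-44694499  # immutable commit-id tag
```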
Co-authored-by: Cursor <cursoragent@cursor.com>
Signed-off-by: root <wedu@nvidia.com>
Signed-off-by: Igor Gitman <igitman@nvidia.com>
Force-pushed from 6742a9e to 63f043d.
I removed our Dockerfile and use their Docker image tag directly now.
🧹 Nitpick comments (1)
dockerfiles/README.md (1)
37-40: Clarify where `vllm[audio]` is introduced.

Line 39 is slightly ambiguous about how the extra dependency is added. A tiny wording tweak can make this unambiguous for new contributors.

✏️ Suggested wording tweak:

```diff
-## Building vllm image
-
-We use official `vllm/vllm-openai:v0.10.2` image with the additional `vllm[audio]` dependencies.
+## Building vllm image
+
+`dockerfiles/Dockerfile.vllm` is based on official `vllm/vllm-openai:v0.10.2` and adds `vllm[audio]`.
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@dockerfiles/README.md` around lines 37 - 40, The README line is ambiguous about how the extra dependency is added; update the sentence referencing "vllm/vllm-openai:v0.10.2" and "vllm[audio]" to explicitly state that the base image is vllm/vllm-openai:v0.10.2 and that the vllm[audio] extras are installed into the image (e.g., via pip install or in the Dockerfile). Locate the string "vllm/vllm-openai:v0.10.2" and the mention of "vllm[audio]" in the README and replace the line with a clear wording such as: the image is based on vllm/vllm-openai:v0.10.2 and the vllm[audio] extras are installed into the image (describe method used in the Dockerfile). Ensure the new sentence explicitly references the installation mechanism (Dockerfile/pip) so contributors know where the extra is introduced.
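For what the clarified README sentence would describe, here is a minimal sketch of such a Dockerfile, assuming the extras are added with a plain pip install (the actual `Dockerfile.vllm` may do this differently):

```dockerfile
# Hypothetical sketch of dockerfiles/Dockerfile.vllm:
# start from the official image and layer the optional audio dependencies on top.
FROM vllm/vllm-openai:v0.10.2
RUN pip install "vllm[audio]"
```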
ℹ️ Review info
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (5)
- cluster_configs/example-local.yaml
- dockerfiles/Dockerfile.nemo-rl
- dockerfiles/README.md
- nemo_skills/__init__.py
- tests/gpu-tests/test-local.yaml
💤 Files with no reviewable changes (1)
- dockerfiles/Dockerfile.nemo-rl
🚧 Files skipped from review as they are similar to previous changes (1)
- nemo_skills/__init__.py
Co-authored-by: George Armstrong <georgea@nvidia.com>
Signed-off-by: Wei Du <wedu@nvidia.com>
@wedu-nvidia let's merge this Monday so we don't make a large change to the containers right over the weekend.
Sure
Signed-off-by: Igor Gitman <igitman@nvidia.com>
Looks like that nightly container was removed? Maybe we switch to the latest release instead @wedu-nvidia
Summary by CodeRabbit

- Chores
- Configuration
- Documentation