Skip to content

Update: latest nemo-rl #1273

Open
wedu-nvidia wants to merge 16 commits into main from wedu/nemo-rl-latest


@wedu-nvidia
Collaborator

@wedu-nvidia wedu-nvidia commented Feb 24, 2026

Summary by CodeRabbit

  • Chores

    • Switched Nemo RL to a prebuilt NGC container image and removed the local multi-stage Dockerfile; updated runtime references and test/cluster configs.
    • Standardized runtime/project paths for NeMo RL and simplified SFT training invocation and task-spec handling.
  • Configuration

    • Updated training configs: validation-at-end toggle, smaller micro-batch/logprob sizes, new reward/loss tuning fields, and additional MoE/dtensor options.
  • Documentation

    • Added guidance for using the recommended prebuilt images.

Signed-off-by: Wei Du <wedu@nvidia.com>
@coderabbitai
Contributor

coderabbitai Bot commented Feb 24, 2026

Note

Reviews paused

It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the reviews.auto_review.auto_pause_after_reviewed_commits setting.

Use the following commands to manage reviews:

  • @coderabbitai resume to resume automatic reviews.
  • @coderabbitai review to trigger a single review.

Walkthrough

Removed the local multi-stage nemo-rl Dockerfile, switched runtime references to NVIDIA's nvcr.io/nvidian/nemo-rl:9148186-44694499, updated runtime paths to /opt/nemo-rl and PYTHONPATH entries, adjusted SFT/GRPO configs and commands, and removed an sft_task_spec argument at one callsite.

Changes

| Cohort | File(s) | Summary |
|---|---|---|
| Docker image & docs | `dockerfiles/Dockerfile.nemo-rl`, `dockerfiles/README.md`, `nemo_skills/__init__.py`, `tests/gpu-tests/test-local.yaml`, `cluster_configs/example-local.yaml` | Deleted local Dockerfile.nemo-rl; replaced references with nvcr.io/nvidian/nemo-rl:9148186-44694499; added README guidance about using the NGC image and vllm build notes. |
| Pipeline environment / commands | `nemo_skills/pipeline/nemo_rl/grpo.py`, `nemo_skills/pipeline/nemo_rl/sft.py` | Adjusted constructed shell commands: changed /opt/NeMo-RL → /opt/nemo-rl, added /nemo_run/code to PYTHONPATH, and simplified SFT uv run invocation (removed conditional rebuild flag). |
| Training configurations | `nemo_skills/training/nemo_rl/configs/grpo.yaml`, `nemo_skills/training/nemo_rl/configs/sft.yaml` | Added val_at_end; introduced MoE-related keys (moe_enable_deepep, moe_token_dispatcher_type, moe_shared_expert_overlap); GRPO batch sizes reduced (4→1) and new reward/loss/truncation fields; SFT max_grad_norm changed 0.0→1.0. |
| Training script callsite | `nemo_skills/training/nemo_rl/start_sft.py` | Removed the sft_task_spec argument from the sft_train callsite. |

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

Possibly related PRs

  • #891
  • #969
  • #1198

Suggested labels

run GPU tests

Suggested reviewers

  • Kipok
  • activatedgeek
🚥 Pre-merge checks | ✅ 1 | ❌ 2

❌ Failed checks (1 warning, 1 inconclusive)

| Check name | Status | Explanation | Resolution |
|---|---|---|---|
| Docstring Coverage | ⚠️ Warning | Docstring coverage is 0.00%, which is insufficient. The required threshold is 80.00%. | Write docstrings for the functions missing them to satisfy the coverage threshold. |
| Title check | ❓ Inconclusive | The title 'Update: latest nemo-rl' is vague and generic, using non-descriptive language that lacks specific detail about the primary changes in the changeset. | Consider a more specific title that captures the main change, such as 'Use upstream NGC nemo-rl image instead of local Dockerfile' or 'Replace local nemo-rl Dockerfile with upstream NGC image'. |

✅ Passed checks (1 passed)

| Check name | Status | Explanation |
|---|---|---|
| Description Check | ✅ Passed | Check skipped - CodeRabbit’s high-level summary is enabled. |
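For reference, a docstring-coverage figure like the 0.00% above can be approximated locally before pushing. This is a sketch under an assumption: it counts coverage as the share of functions, async functions, and classes (methods included) whose body starts with a docstring. CodeRabbit's exact counting rule is not stated in this thread, so `docstring_coverage` is a hypothetical helper, not the tool's implementation.

```python
import ast


def docstring_coverage(source: str) -> float:
    """Return the percentage of functions and classes in `source` that have a docstring."""
    tree = ast.parse(source)
    # Collect every def/async def/class node, including nested ones and methods.
    nodes = [
        node
        for node in ast.walk(tree)
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef, ast.ClassDef))
    ]
    if not nodes:
        return 100.0  # nothing to document
    documented = sum(1 for node in nodes if ast.get_docstring(node) is not None)
    return 100.0 * documented / len(nodes)


example = '''
def get_cmd():
    return "uv run ..."

def get_checkpoint_convert_cmd():
    """Build the checkpoint conversion command."""
    return "..."
'''

print(f"{docstring_coverage(example):.2f}%")  # 50.00%
```

Running it over each changed file and comparing against the 80% threshold gives a rough local signal; the bot's own number may still differ.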


Contributor

@coderabbitai coderabbitai Bot left a comment


Actionable comments posted: 2

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (2)
nemo_skills/pipeline/nemo_rl/grpo.py (1)

193-193: ⚠️ Potential issue | 🔴 Critical

Same stale /opt/NeMo-RL path issue as in sft.py — checkpoint conversion and averaging will fail.

Both get_checkpoint_convert_cmd (line 193) and get_checkpoint_average_cmd (line 218) still set UV_PROJECT=/opt/NeMo-RL while the training command was updated to /opt/nemo-rl.

Proposed fix
```diff
 def get_checkpoint_convert_cmd(output_dir, final_hf_path, step, backend, max_position_embeddings=None):
-    cmd = "export PYTHONPATH=$PYTHONPATH:/nemo_run/code && export UV_PROJECT=/opt/NeMo-RL && cd /nemo_run/code && "
+    cmd = "export PYTHONPATH=$PYTHONPATH:/nemo_run/code && export UV_PROJECT=/opt/nemo-rl && cd /nemo_run/code && "
 def get_checkpoint_average_cmd(output_dir, average_steps, backend, remove_checkpoints_after_average):
-    cmd = "export PYTHONPATH=$PYTHONPATH:/nemo_run/code && export UV_PROJECT=/opt/NeMo-RL && cd /nemo_run/code && "
+    cmd = "export PYTHONPATH=$PYTHONPATH:/nemo_run/code && export UV_PROJECT=/opt/nemo-rl && cd /nemo_run/code && "
```
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@nemo_skills/pipeline/nemo_rl/grpo.py` at line 193, The shell command string
assigned to cmd in get_checkpoint_convert_cmd (and likewise in
get_checkpoint_average_cmd) uses the outdated UV_PROJECT=/opt/NeMo-RL which
breaks checkpoint conversion/averaging; update the command to set
UV_PROJECT=/opt/nemo-rl instead (preserving the rest of the export and cd
sequence and the existing PYTHONPATH addition) so both functions construct the
same project path used by the training command.
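One way to keep the training, conversion, and averaging commands from drifting apart again is to build the shared shell prefix in a single place. A minimal sketch: `NEMO_RL_PROJECT_DIR`, `CODE_DIR`, and `uv_env_prefix` are hypothetical names, not existing nemo_skills identifiers. The default output reproduces the prefix string from the proposed fix above.

```python
# Hypothetical constants: one source of truth for the in-container paths.
NEMO_RL_PROJECT_DIR = "/opt/nemo-rl"
CODE_DIR = "/nemo_run/code"


def uv_env_prefix(project_dir: str = NEMO_RL_PROJECT_DIR, code_dir: str = CODE_DIR) -> str:
    """Return the shell prefix shared by the training, convert, and average commands."""
    return (
        f"export PYTHONPATH=$PYTHONPATH:{code_dir} && "  # make nemo_skills importable
        f"export UV_PROJECT={project_dir} && "  # point uv at the NeMo RL project
        f"cd {code_dir} && "
    )


print(uv_env_prefix())
```

With this in place, a future path change (as in this PR) touches one constant instead of three string literals per pipeline file.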
nemo_skills/pipeline/nemo_rl/sft.py (1)

176-176: ⚠️ Potential issue | 🔴 Critical

UV_PROJECT still references the old /opt/NeMo-RL path — will break checkpoint conversion.

get_cmd() (line 120) was updated to /opt/nemo-rl, but get_checkpoint_convert_cmd on line 176 and get_checkpoint_average_cmd on line 201 still set UV_PROJECT=/opt/NeMo-RL. Since the Dockerfile now uses a pre-built image where the project lives at /opt/nemo-rl, these commands will fail at runtime.

Proposed fix
```diff
 def get_checkpoint_convert_cmd(output_dir, final_hf_path, step, backend, max_position_embeddings=None):
-    cmd = "export PYTHONPATH=$PYTHONPATH:/nemo_run/code && export UV_PROJECT=/opt/NeMo-RL && cd /nemo_run/code && "
+    cmd = "export PYTHONPATH=$PYTHONPATH:/nemo_run/code && export UV_PROJECT=/opt/nemo-rl && cd /nemo_run/code && "
 def get_checkpoint_average_cmd(output_dir, average_steps, backend, remove_checkpoints_after_average):
-    cmd = "export PYTHONPATH=$PYTHONPATH:/nemo_run/code && export UV_PROJECT=/opt/NeMo-RL && cd /nemo_run/code && "
+    cmd = "export PYTHONPATH=$PYTHONPATH:/nemo_run/code && export UV_PROJECT=/opt/nemo-rl && cd /nemo_run/code && "
```
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@nemo_skills/pipeline/nemo_rl/sft.py` at line 176, The UV_PROJECT environment
variable is still set to the old /opt/NeMo-RL in get_checkpoint_convert_cmd and
get_checkpoint_average_cmd causing runtime failures; update both commands to use
the new path /opt/nemo-rl to match get_cmd's change. Locate the string assembly
in the functions get_checkpoint_convert_cmd and get_checkpoint_average_cmd and
replace UV_PROJECT=/opt/NeMo-RL with UV_PROJECT=/opt/nemo-rl so the export line
and subsequent cd use the correct project directory.
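A cheap regression guard for this class of bug is to scan the generated commands for UV_PROJECT values that differ from the expected path. Sketch only: `uv_project_dirs` and `check_uv_project` are hypothetical helpers, and the sample strings below are hand-written stand-ins. A real test would feed in the strings returned by get_cmd, get_checkpoint_convert_cmd, and get_checkpoint_average_cmd.

```python
import re
from typing import Dict, List

EXPECTED_PROJECT_DIR = "/opt/nemo-rl"

# Capture whatever follows UV_PROJECT= up to the next whitespace.
_UV_PROJECT_RE = re.compile(r"UV_PROJECT=(\S+)")


def uv_project_dirs(cmd: str) -> List[str]:
    """Return every UV_PROJECT value set in a shell command string."""
    return _UV_PROJECT_RE.findall(cmd)


def check_uv_project(cmds: Dict[str, str], expected: str = EXPECTED_PROJECT_DIR) -> Dict[str, List[str]]:
    """Map each command name to the UV_PROJECT values that differ from `expected`."""
    problems = {}
    for name, cmd in cmds.items():
        wrong = [d for d in uv_project_dirs(cmd) if d != expected]
        if wrong:
            problems[name] = wrong
    return problems


cmds = {
    "get_cmd": "export UV_PROJECT=/opt/nemo-rl && cd /nemo_run/code && uv run ...",
    "get_checkpoint_convert_cmd": "export UV_PROJECT=/opt/NeMo-RL && cd /nemo_run/code && ",
}
print(check_uv_project(cmds))  # {'get_checkpoint_convert_cmd': ['/opt/NeMo-RL']}
```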
🧹 Nitpick comments (3)
dockerfiles/Dockerfile.nemo-rl (1)

6-8: Nightly tag is mutable — builds are not reproducible.

nvcr.io/nvidian/nemo-rl:nightly will resolve to different images over time. Consider pinning to a specific digest or versioned tag for reproducible builds, or at minimum document the expected nightly version in the PR/commit message.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@dockerfiles/Dockerfile.nemo-rl` around lines 6 - 8, The Dockerfile uses a
mutable nightly image via ARG NEMO_RL_IMAGE and FROM ${NEMO_RL_IMAGE}; to make
builds reproducible, change the ARG default to a fixed versioned tag or an
immutable digest (e.g., set ARG NEMO_RL_IMAGE=<repo>:<version> or
<repo>@sha256:<digest>) and update any documentation/PR/commit message to record
the chosen version/digest; ensure the FROM line continues to reference
${NEMO_RL_IMAGE} so overrides remain possible and note the pinned value in the
PR for traceability.
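A small check can flag image references that are not pinned before they land in a config. The sketch assumes the usual `repo[:tag][@sha256:digest]` reference form, treats a missing tag as Docker's implicit `latest`, and considers `nightly` and `latest` mutable. `split_image_ref`, `is_pinned`, and `MUTABLE_TAGS` are hypothetical. Any tag can in principle be re-pushed, so only a digest is truly immutable; `require_digest=True` enforces that stricter rule.

```python
from typing import Optional, Tuple

MUTABLE_TAGS = {"latest", "nightly"}


def split_image_ref(ref: str) -> Tuple[str, Optional[str], Optional[str]]:
    """Split 'repo[:tag][@digest]' into (repo, tag, digest)."""
    digest = None
    if "@" in ref:
        ref, digest = ref.split("@", 1)
    # A tag can only follow the last '/', so a registry port (host:5000/...) is not mistaken for one.
    slash = ref.rfind("/")
    colon = ref.rfind(":")
    if colon > slash:
        return ref[:colon], ref[colon + 1:], digest
    return ref, None, digest


def is_pinned(ref: str, require_digest: bool = False) -> bool:
    """Return True if `ref` names a fixed image rather than a moving tag."""
    _, tag, digest = split_image_ref(ref)
    if digest is not None:
        return True
    if require_digest:
        return False
    # No tag means Docker pulls ":latest", which is mutable.
    return tag is not None and tag not in MUTABLE_TAGS


print(is_pinned("nvcr.io/nvidian/nemo-rl:nightly"))  # False
print(is_pinned("nvcr.io/nvidian/nemo-rl:9148186-44694499"))  # True
```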
nemo_skills/training/nemo_rl/configs/grpo.yaml (1)

34-39: Misleading indentation on the comment at line 39.

The comment # Reinforce++-baseline specific... is indented deeper than minus_baseline, making it look like a child property. While YAML ignores comment indentation for parsing, this is visually confusing. Align it with the other adv_estimator fields.

Suggested fix
```diff
   adv_estimator:
     name: "grpo"  # Use "reinforce_plus_plus" for Reinforce++ estimator
     normalize_rewards: true
     use_leave_one_out_baseline: false
     minus_baseline: true
-      # Reinforce++-baseline specific: subtract per-prompt mean baseline
+    # Reinforce++-baseline specific: subtract per-prompt mean baseline
```
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@nemo_skills/training/nemo_rl/configs/grpo.yaml` around lines 34 - 39, The
inline comment under adv_estimator is mis-indented and visually appears as a
child of minus_baseline; move the comment so it aligns with the other
adv_estimator fields (same indentation level as name, normalize_rewards,
use_leave_one_out_baseline, minus_baseline) and reference the Reinforce++
baseline context next to minus_baseline to make it clear it's describing that
flag (check adv_estimator and minus_baseline in the diff).
nemo_skills/training/nemo_rl/configs/sft.yaml (1)

86-86: Minor inconsistency: none is unquoted here but quoted in grpo.yaml.

On Line 86 moe_router_load_balancing_type: none is unquoted, whereas the same key in grpo.yaml (line 128) uses "none". While YAML treats unquoted none as a string (unlike null), quoting it consistently avoids ambiguity and aligns with the GRPO config.

Suggested fix
```diff
-    moe_router_load_balancing_type: none
+    moe_router_load_balancing_type: "none"
```
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@nemo_skills/training/nemo_rl/configs/sft.yaml` at line 86, The YAML key
moe_router_load_balancing_type is set to an unquoted none; make it a quoted
string to match the GRPO config and avoid ambiguity by changing the value to
"none" for consistency with the other config (search for
moe_router_load_balancing_type to locate the setting).
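For context on why this is a consistency nit rather than a bug: in both PyYAML's YAML 1.1 resolver and the YAML 1.2 core schema, only `~`, `null`, `Null`, `NULL`, and the empty plain scalar resolve to null, so an unquoted `none` (or even `None`) already loads as a string. The sketch below mirrors that rule for plain (unquoted) scalars only; quoted scalars are always strings. `plain_scalar_is_null` is a hypothetical helper, not part of any YAML library.

```python
import re

# Plain scalars that resolve to null under the YAML 1.2 core schema
# (PyYAML's YAML 1.1 resolver uses the same set).
_YAML_NULL = re.compile(r"~|null|Null|NULL|")


def plain_scalar_is_null(value: str) -> bool:
    """Return True if an unquoted YAML scalar with this text would load as null."""
    return _YAML_NULL.fullmatch(value) is not None


for text in ["none", "None", "null", "~"]:
    print(repr(text), plain_scalar_is_null(text))
```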
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@tests/gpu-tests/test-local.yaml`:
- Line 25: The container image for the nemo-rl entry references a personal
registry path
("gitlab-master.nvidia.com/.../igitman/nemo-skills-nemo-rl:latest"); update the
nemo-rl image reference in tests/gpu-tests/test-local.yaml to a shared/team or
official registry path (or a public image) that your project controls (e.g.,
replace the "igitman" namespace with the team/project namespace or a canonical
image name) so the test config points to a stable, accessible image.
- Line 32: Remove the hard-coded personal mount '/home/wedu:/home/wedu' from
tests/gpu-tests/test-local.yaml; edit the mounts list to delete that entry and,
if needed for local testing, replace it with a documented placeholder (e.g. a
commented example or an env-var-based value) so the shared test config contains
no user-specific paths.
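If a per-user mount is still useful for local runs, one option is to keep a placeholder in the shared file and expand it per machine. Minimal sketch under assumptions: the `${LOCAL_WORKSPACE}` variable and the `resolve_mounts` helper are hypothetical, mounts are assumed to be `src:dst` strings, and this thread does not say whether the cluster-config loader already expands environment variables on its own.

```python
import os
from string import Template
from typing import List, Mapping, Optional


def resolve_mounts(mounts: List[str], env: Optional[Mapping[str, str]] = None) -> List[str]:
    """Expand ${VAR} placeholders in 'src:dst' mounts; skip entries whose variables are unset."""
    env = os.environ if env is None else env
    resolved = []
    for mount in mounts:
        try:
            resolved.append(Template(mount).substitute(env))
        except KeyError:
            continue  # variable not set on this machine: treat the mount as optional
    return resolved


mounts = ["/data:/data", "${LOCAL_WORKSPACE}:/workspace"]
print(resolve_mounts(mounts, {"LOCAL_WORKSPACE": "/home/alice"}))
print(resolve_mounts(mounts, {}))
```

Skipping unresolved entries keeps the shared config usable on CI machines where the variable is not defined.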


ℹ️ Review info

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 6da2219 and 9a3da6b.

📒 Files selected for processing (7)
  • dockerfiles/Dockerfile.nemo-rl
  • nemo_skills/pipeline/nemo_rl/grpo.py
  • nemo_skills/pipeline/nemo_rl/sft.py
  • nemo_skills/training/nemo_rl/configs/grpo.yaml
  • nemo_skills/training/nemo_rl/configs/sft.yaml
  • nemo_skills/training/nemo_rl/start_sft.py
  • tests/gpu-tests/test-local.yaml
💤 Files with no reviewable changes (1)
  • nemo_skills/training/nemo_rl/start_sft.py

@wedu-nvidia
Collaborator Author

@Kipok @gwarmstrong
Do you think we can use their nightly container directly, or should we keep our previous logic so that we can track the commit?

@Kipok
Collaborator

Kipok commented Feb 25, 2026

@wedu-nvidia can we just use their commit ID directly? It looks like nightly images are tagged with something like nvcr.io/nvidian/nemo-rl:9148186-44694499. This way we can just reference this tag in the cluster config, and we don't need a nemo-rl Dockerfile on our side at all if we aren't making changes.

wedu-nvidia and others added 2 commits February 24, 2026 20:09
Co-authored-by: Cursor <cursoragent@cursor.com>
Signed-off-by: root <wedu@nvidia.com>
Co-authored-by: Cursor <cursoragent@cursor.com>
Signed-off-by: Igor Gitman <igitman@nvidia.com>
@wedu-nvidia
Collaborator Author

wedu-nvidia commented Feb 25, 2026

> @wedu-nvidia can we just use their commit ID directly? It looks like nightly images are tagged with something like nvcr.io/nvidian/nemo-rl:9148186-44694499. This way we can just reference this tag in the cluster config, and we don't need a nemo-rl Dockerfile on our side at all if we aren't making changes.

I removed our Dockerfile and now use their image tag directly.

Contributor

@coderabbitai coderabbitai Bot left a comment


🧹 Nitpick comments (1)
dockerfiles/README.md (1)

37-40: Clarify where vllm[audio] is introduced.

Line 39 is slightly ambiguous about how the extra dependency is added. A tiny wording tweak can make this unambiguous for new contributors.

✏️ Suggested wording tweak
```diff
-## Building vllm image
-
-We use official `vllm/vllm-openai:v0.10.2` image with the additional `vllm[audio]` dependencies.
+## Building vllm image
+
+`dockerfiles/Dockerfile.vllm` is based on official `vllm/vllm-openai:v0.10.2` and adds `vllm[audio]`.
```
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@dockerfiles/README.md` around lines 37 - 40, The README line is ambiguous
about how the extra dependency is added; update the sentence referencing
"vllm/vllm-openai:v0.10.2" and "vllm[audio]" to explicitly state that the base
image is vllm/vllm-openai:v0.10.2 and that the vllm[audio] extras are installed
into the image (e.g., via pip install or in the Dockerfile). Locate the string
"vllm/vllm-openai:v0.10.2" and the mention of "vllm[audio]" in the README and
replace the line with a clear wording such as: the image is based on
vllm/vllm-openai:v0.10.2 and the vllm[audio] extras are installed into the image
(describe method used in the Dockerfile). Ensure the new sentence explicitly
references the installation mechanism (Dockerfile/pip) so contributors know
where the extra is introduced.

ℹ️ Review info

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 6742a9e and fb0b06d.

📒 Files selected for processing (5)
  • cluster_configs/example-local.yaml
  • dockerfiles/Dockerfile.nemo-rl
  • dockerfiles/README.md
  • nemo_skills/__init__.py
  • tests/gpu-tests/test-local.yaml
💤 Files with no reviewable changes (1)
  • dockerfiles/Dockerfile.nemo-rl
🚧 Files skipped from review as they are similar to previous changes (1)
  • nemo_skills/__init__.py

Collaborator

@gwarmstrong gwarmstrong left a comment


one minor comment

wedu-nvidia and others added 2 commits February 26, 2026 08:22
Co-authored-by: George Armstrong <georgea@nvidia.com>
Signed-off-by: Wei Du <wedu@nvidia.com>
@gwarmstrong
Collaborator

@wedu-nvidia let's merge this Monday so we don't make a large change to the containers over the weekend.

@wedu-nvidia
Collaborator Author

> @wedu-nvidia let's merge this Monday so we don't make a large change to the containers over the weekend.

Sure

Signed-off-by: Igor Gitman <igitman@nvidia.com>
@Kipok
Collaborator

Kipok commented Mar 23, 2026

Looks like that nightly container was removed? Maybe we should switch to the latest release instead, @wedu-nvidia.
