Fix alert triage agent: switch to nemotron-3-nano model and improve prompts by hsin-c · Pull Request #1750 · NVIDIA/NeMo-Agent-Toolkit

hsin-c · 2026-03-05T06:44:27Z

Summary

Switch LLM model from nvidia/llama-3.3-nemotron-super-49b-v1.5 to nvidia/nemotron-3-nano-30b-a3b across all LLM configs in config_offline_mode.yml, with temperature: 0 and max_tokens: 16384 for deterministic and more complete outputs.
Improve triage agent prompts to produce more accurate diagnoses, especially for broad alerts like InstanceDown where the root cause could span hardware, network, or software — the agent is now instructed to use all available tools before drawing conclusions.
Clarify categorizer prompt definitions for false_positive and need_investigation categories to reduce misclassification — false_positive now requires all diagnostic checks to indicate a healthy system, and need_investigation is reserved for genuinely incomplete or conflicting data.
Fix warning in categorizer.py: replace result.text() (method call) with result.text (property access).
Fix typo: "CPR" → "CPU" in host_performance_check tool description.

Test plan

Ran pytest -v --run_slow --run_integration examples/advanced_agents/alert_triage_agent/tests/test_alert_triage_agent_workflow.py::test_full_workflow 10 times — all 10 runs passed with no intermittent failures.
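A repeated run like the one above could be scripted roughly as follows. This is only a sketch: the PR does not say how the 10 runs were launched, and `build_command`, `run_repeatedly`, and `run_pytest` are hypothetical helper names. The runner is passed in as a parameter so the loop can be exercised without actually invoking pytest.

```python
import subprocess
from typing import Callable, Sequence

TEST_ID = (
    "examples/advanced_agents/alert_triage_agent/tests/"
    "test_alert_triage_agent_workflow.py::test_full_workflow"
)


def build_command() -> list[str]:
    """Return the pytest invocation quoted in the test plan."""
    return ["pytest", "-v", "--run_slow", "--run_integration", TEST_ID]


def run_repeatedly(runs: int, runner: Callable[[Sequence[str]], int]) -> list[int]:
    """Run the command `runs` times; return the 1-based indices of failed runs."""
    command = build_command()
    failures = []
    for attempt in range(1, runs + 1):
        if runner(command) != 0:
            failures.append(attempt)
    return failures


def run_pytest(command: Sequence[str]) -> int:
    """Real runner: execute the command and return its exit code."""
    return subprocess.run(command, check=False).returncode
```

Calling `run_repeatedly(10, run_pytest)` from the repository root would return an empty list when every run passes, matching the result reported above.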

Summary by CodeRabbit

Bug Fixes
- Enhanced alert categorization guidance with clearer instructions on diagnostic tool usage and deeper root cause analysis
- Improved categorization definitions to reduce false positives and better identify alerts requiring further investigation
Improvements
- Updated LLM model configurations with optimized parameters for enhanced performance

copy-pr-bot · 2026-03-05T06:44:30Z

This pull request requires additional validation before any workflows can run on NVIDIA's runners.

Pull request vetters can view their responsibilities here.

Contributors can view more details about this message here.

coderabbitai · 2026-03-05T06:44:46Z

Walkthrough

The pull request updates an alert triage agent system by correcting a method call to property access in the categorizer logic, migrating LLM models to a different variant with adjusted hyperparameters in the offline configuration, and refining prompt guidance for improved diagnostic tool usage and classification accuracy.
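One way to read the tightened false_positive and need_investigation definitions is as a simple rule over diagnostic check outcomes. The sketch below is an interpretation, not code from the PR: in the PR the rule lives in prompt text for the LLM. The status labels, the function name `precheck_category`, and the folding of "conflicting" evidence into an `inconclusive` status are all assumptions made for illustration.

```python
from typing import Optional

# Hypothetical status labels for each diagnostic check.
HEALTHY = "healthy"
UNHEALTHY = "unhealthy"
INCONCLUSIVE = "inconclusive"  # stands in for missing or conflicting data


def precheck_category(check_results: dict[str, str]) -> Optional[str]:
    """Return 'false_positive', 'need_investigation', or None.

    None means the evidence points at a concrete root cause, which this
    sketch leaves to the categorizer's other categories.
    """
    statuses = list(check_results.values())
    # No data, or any incomplete/conflicting check: more investigation needed.
    if not statuses or INCONCLUSIVE in statuses:
        return "need_investigation"
    # false_positive requires every check to report a healthy system.
    if all(status == HEALTHY for status in statuses):
        return "false_positive"
    return None
```

For example, a single unhealthy check among otherwise healthy ones yields `None` here rather than false_positive, which mirrors the stricter definition described in the summary.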

Changes

Cohort / File(s)	Summary
Code Logic Fix `examples/advanced_agents/alert_triage_agent/src/nat_alert_triage_agent/categorizer.py`	Changed `result.text()` method call to `result.text` property access when computing report_content.
LLM Configuration Updates `examples/advanced_agents/alert_triage_agent/src/nat_alert_triage_agent/configs/config_offline_mode.yml`	Updated 6 LLM configurations: migrated model from nvidia/llama-3.3-nemotron-super-49b-v1.5 to nvidia/nemotron-3-nano-30b-a3b; set temperature to 0; increased max_tokens to 16384; removed top_p setting where applicable. Affects ata_agent_llm, tool_reasoning_llm, telemetry_metrics_analysis_agent_llm, maintenance_check_llm, categorizer_llm, and nim_rag_eval_llm.
Prompt Refinements `examples/advanced_agents/alert_triage_agent/src/nat_alert_triage_agent/prompts.py`	Expanded diagnostic tool guidance emphasizing comprehensive tool usage for broad alerts; clarified that network unreachability is a symptom not root cause; refined CategorizerPrompts definitions with more explicit, evidence-based explanations for false_positive and need_investigation categories.
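The configuration row above can be sketched as a transformation over the six LLM entries. The real file is YAML; this Python dictionary form is only an illustration. The key name `model_name` and the "before" values for `temperature`, `top_p`, and `max_tokens` are assumptions, since the page does not show the previous settings.

```python
from copy import deepcopy

LLM_NAMES = [
    "ata_agent_llm",
    "tool_reasoning_llm",
    "telemetry_metrics_analysis_agent_llm",
    "maintenance_check_llm",
    "categorizer_llm",
    "nim_rag_eval_llm",
]
OLD_MODEL = "nvidia/llama-3.3-nemotron-super-49b-v1.5"
NEW_MODEL = "nvidia/nemotron-3-nano-30b-a3b"


def migrate_llms(llms: dict) -> dict:
    """Return a copy of `llms` with the changes described in the table applied."""
    updated = deepcopy(llms)
    for name in LLM_NAMES:
        entry = updated[name]
        entry["model_name"] = NEW_MODEL  # key name is an assumption
        entry["temperature"] = 0
        entry["max_tokens"] = 16384
        entry.pop("top_p", None)  # removed where it was present
    return updated


# Hypothetical "before" state; only the old model name comes from the PR.
before = {
    name: {"model_name": OLD_MODEL, "temperature": 0.2, "top_p": 0.7, "max_tokens": 2048}
    for name in LLM_NAMES
}
after = migrate_llms(before)
```

Working on a deep copy keeps the original mapping intact, which makes it easy to compare the old and new settings side by side.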

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

🚥 Pre-merge checks | ✅ 2 | ❌ 1

❌ Failed checks (1 warning)

Check name	Status	Explanation	Resolution
Docstring Coverage	⚠️ Warning	Docstring coverage is 0.00% which is insufficient. The required threshold is 80.00%.	Write docstrings for the functions missing them to satisfy the coverage threshold.

✅ Passed checks (2 passed)

Check name	Status	Explanation
Title check	✅ Passed	The title accurately describes the main changes: switching to nemotron-3-nano model and improving prompts, with all three file modifications reflected.
Description Check	✅ Passed	Check skipped - CodeRabbit’s high-level summary is enabled.

mnajafian-nv

LGTM!

mnajafian-nv · 2026-03-05T16:07:30Z

/merge

willkill07 · 2026-03-05T21:36:21Z

/ok to test 27489a2

dagardner-nv · 2026-03-05T21:56:55Z

/ok to test 27489a2

dagardner-nv · 2026-03-05T21:57:37Z

/ok to test 27489a2

Update alert triage agent to use nemotron3-nano with a low temperature and a high max_token count; Prompt updates for the agent to work better with nemotron2-nano

Signed-off-by: Hsin Chen <hsinc@nvidia.com>

27489a2

hsin-c requested a review from a team as a code owner March 5, 2026 06:44

hsin-c added the bug (Something isn't working) and non-breaking (Non-breaking change) labels Mar 5, 2026

mnajafian-nv approved these changes Mar 5, 2026

dagardner-nv closed this Mar 5, 2026

dagardner-nv reopened this Mar 5, 2026

rapids-bot bot merged commit 4d5a991 into NVIDIA:release/1.5 Mar 5, 2026
15 checks passed

rapids-bot[bot] merged 1 commit into NVIDIA:release/1.5 from hsin-c:fix/alert-triage-nemotron-config-and-prompt-update

4 participants