NeMoCode

Agentic coding CLI for NVIDIA NIM. Reads your code, makes edits, runs commands — powered by any model on the NIM API or your own GPU via vLLM, SGLang, or TensorRT-LLM.

Community project — not affiliated with or endorsed by NVIDIA.

Install

From PyPI:

pip install nemocode

Or from source (editable):

pip install -e .

Quick Start

Run the guided setup wizard:

nemo setup

The wizard defaults to hosted NVIDIA NIM, prompts for NVIDIA_API_KEY, and can also configure a local vllm, sglang, or trt-llm backend for you.

Hosted NVIDIA NIM (default)

Get a free API key from build.nvidia.com:

export NVIDIA_API_KEY="nvapi-..."
nemo code

Hosted Nemotron endpoints use NVIDIA_API_KEY by default. The setup wizard can store it in your system keyring.

Local vLLM, SGLang, or TensorRT-LLM

Serve a model locally on any NVIDIA GPU:

# vLLM
vllm serve nvidia/NVIDIA-Nemotron-Nano-9B-v2 \
  --host 0.0.0.0 --port 8000
nemo code -e local-vllm-nano9b

# SGLang (best for Nemotron 3 Super long context on DGX Spark)
python -m sglang.launch_server \
  --model nvidia/nemotron-3-super-120b-a12b \
  --host 0.0.0.0 --port 8000
nemo code -e local-sglang-super

# TensorRT-LLM
docker run --rm -it --gpus all --ipc host --network host \
  -e HF_TOKEN=$HF_TOKEN \
  -v $HOME/.cache/huggingface/:/root/.cache/huggingface/ \
  nvcr.io/nvidia/tensorrt-llm/release:1.3.0rc6 \
  trtllm-serve nvidia/NVIDIA-Nemotron-3-Nano-4B-FP8 \
  --trust_remote_code --port 8000
nemo code -e local-trt-llm-nano4b

TensorRT-LLM launch flags can vary by image release and whether a model needs a prebuilt engine. The bundled NeMoCode presets assume the OpenAI-compatible trtllm-serve path with nvidia/nemotron-3-super-120b-a12b and nvidia/NVIDIA-Nemotron-3-Nano-4B-FP8.

No GPU? Rent one via Brev:

nemo setup brev

Usage

nemo code                              # interactive REPL
nemo code "fix the bug in auth.py" -y  # one-shot, auto-approve tools
nemo chat "explain this error"         # chat, no tools
cat log.txt | nemo code "diagnose"     # pipe input
nemo code -f super-nano "refactor"     # multi-model formation
nemo code --tui                        # full-screen TUI

Plan Mode

Plan mode is a read-only planning phase with an approval gate before execution.

Read-only: Plan mode only reads files, searches code, and explores — no writes, shell commands, or commits.
Approval gate: The planner proposes a concrete plan. You review and approve, revise with feedback, or cancel.
Execution: Once approved, a build agent executes the plan with full tool access.

Switch modes in the REPL with Tab or /mode:

Mode	Behavior
`code`	Ask before tool calls (default)
`plan`	Read-only planning + approval gate
`auto`	Auto-approve everything

Launch directly in plan mode:

nemo code --agent plan "implement user auth"

The plan agent can also spawn read-only research subagents to help with exploration.

Endpoints

Works with any OpenAI-compatible API. Pre-configured:

Endpoint	Model	Access
`nim-super`	Nemotron 3 Super (12B/120B MoE, 1M ctx)	NIM API key
`nim-nano`	Nemotron 3 Nano (3B/30B MoE, 1M ctx)	NIM API key
`nim-nano-9b`	Nemotron Nano 9B v2	NIM API key
`nim-nano-4b`	Nemotron Nano 4B v1.1	NIM API key
`nim-vlm`	Nemotron Nano 12B VL (vision)	NIM API key
`nim-embed`	Nemotron Embed 1B v2	NIM API key
`nim-rerank`	Nemotron Rerank 1B v2	NIM API key
`openrouter-super`	Super via OpenRouter	OpenRouter key
`together-super`	Super via Together AI	Together key
`local-trt-llm-super`	Nemotron 3 Super 120B via TensorRT-LLM	GPU + Docker + TensorRT-LLM
`local-trt-llm-nano4b`	Nemotron 3 Nano 4B FP8 via TensorRT-LLM	GPU + Docker + TensorRT-LLM
`local-vllm-*`	Any model on local vLLM	GPU + vLLM
`local-sglang-*`	Any model on local SGLang	GPU + SGLang
`local-nim-*`	Local NIM container	GPU + Docker

Formations

Multi-model pipelines — Super plans, Nano executes, Super reviews:

nemo code -f super-nano "implement caching"

Formation	Pipeline
`solo`	Super does everything (default)
`super-nano`	Super plans + reviews, Nano executes
`spark`	All-local on DGX Spark (Super + Nano 9B)
`spark-sglang`	Super via SGLang on Spark (best long context)
`spark-trt-llm`	Super 120B + Nano 4B via TensorRT-LLM on Spark
`vision`	VLM reads screenshots, Super writes code
`local`	Nano on local GPU, no internet needed

Agents & Sub-agent Orchestration

NeMoCode supports named agent profiles for top-level sessions and delegated sub-agents.

Primary agents: build (default full-access), plan (read-only planning)
Sub-agents: general, explore, review, debug, test, doc, code-search, fast
Inspect them with nemo agent ls and nemo agent show <name>
Switch primary agents with nemo code --agent <name> or /agent <name> in the REPL/TUI

Sub-agent tools

In coding sessions, these orchestration tools are available:

Tool	Purpose
`delegate`	Spawn a sub-agent and wait for the result
`spawn_agent`	Spawn a background sub-agent for parallel work
`wait_agent`	Wait for a spawned sub-agent to finish
`close_agent`	Close or cancel a sub-agent handle
`resume_agent`	Reopen a previously closed sub-agent handle

Sub-agents inherit read-only mode when delegated from plan mode.

Custom agents

Define custom agents in .nemocode.yaml under agents: or as markdown files under .nemocode/agents/*.md:

---
description: Review code for bugs and regressions
mode: subagent
role: reviewer
prefer_tiers:
  - super
tools:
  - fs_read
  - git_read
  - rg
---

Review the requested changes. Focus on correctness, regressions, and missing tests.

Setup Commands

nemo setup          # guided wizard
nemo setup --list   # show all setup topics
nemo setup wizard   # force the interactive wizard
nemo setup trt-llm  # TensorRT-LLM serving guide
nemo setup vllm     # vLLM serving guide
nemo setup sglang   # SGLang serving guide
nemo setup nim      # NIM container guide
nemo setup brev     # rent a cloud GPU

More Commands

nemo endpoint ls / test     # manage endpoints
nemo model ls / show        # inspect model manifests
nemo formation ls / show    # inspect formations
nemo agent ls / show        # inspect agent profiles
nemo hardware recommend     # GPU-based recommendations
nemo doctor                 # run diagnostics to check setup
nemo session ls             # past conversations
nemo obs pricing            # token pricing
nemo init                   # create .nemocode.yaml without overriding user defaults

Contributing

pip install -e ".[dev]"
ruff check src/ tests/ && ruff format --check src/ tests/
pytest tests/ -v

License

MIT. NVIDIA, Nemotron, and NIM are trademarks of NVIDIA Corporation.

Name		Name	Last commit message	Last commit date
Latest commit History 50 Commits
.github/workflows		.github/workflows
docs		docs
scripts		scripts
src/nemocode		src/nemocode
tests		tests
.gitignore		.gitignore
AGENTS.md		AGENTS.md
LICENSE		LICENSE
README.md		README.md
defaults.yaml		defaults.yaml
pyproject.toml		pyproject.toml

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

NeMoCode

Install

Quick Start

Hosted NVIDIA NIM (default)

Local vLLM, SGLang, or TensorRT-LLM

Usage

Plan Mode

Endpoints

Formations

Agents & Sub-agent Orchestration

Sub-agent tools

Custom agents

Setup Commands

More Commands

Contributing

License

About

Uh oh!

Releases 5

Packages

Uh oh!

Contributors

Uh oh!

Languages

Folders and files

Latest commit

History

Repository files navigation

NeMoCode

Install

Quick Start

Hosted NVIDIA NIM (default)

Local vLLM, SGLang, or TensorRT-LLM

Usage

Plan Mode

Endpoints

Formations

Agents & Sub-agent Orchestration

Sub-agent tools

Custom agents

Setup Commands

More Commands

Contributing

License

About

Resources

License

Uh oh!

Stars

Watchers

Forks

Releases 5

Packages 0

Uh oh!

Contributors

Uh oh!

Languages

Packages