Reasoning Coder is a demonstration of a coding agent for reasoning-aware code generation, powered by the open-source NVIDIA Nemotron Nano 9B v2 model. The agent combines large language model coding capabilities with a reasoning budget control mechanism, enabling more transparent and efficient problem-solving.
It is designed to showcase how developers can integrate self-hosted vLLM deployments to run advanced code assistants locally or on their own infrastructure. The demo highlights how NVIDIA Nemotron Nano 9B v2 reasoning features can be applied to software development workflows, making it easier to experiment with streaming, non-streaming, and reasoning-driven code generation in a reproducible environment.
- Reasoning Budget Control: Toggle reasoning on/off and control token budget
- Streaming Support: Real-time streaming of responses
- Code Generation: AI-powered code generation for various programming languages
- File Upload Context: Upload files to provide context for better code generation
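The reasoning toggle and streaming features map onto the OpenAI-compatible chat API that vLLM exposes. Below is a minimal sketch of how such a request could be built; the `/think` and `/no_think` system-prompt switches are an assumption about the Nemotron chat template (check the model card for the exact control mechanism), and the port matches the serve commands later in this README.

```python
# Sketch: build an OpenAI-style chat request for the local vLLM server.
# NOTE: the "/think" / "/no_think" system-prompt switches are an assumption
# about the Nemotron chat template, not a confirmed API.

def build_chat_request(prompt, reasoning=True, max_tokens=1024, stream=False):
    """Return a JSON-serializable payload for POST /v1/chat/completions."""
    system = "/think" if reasoning else "/no_think"
    return {
        "model": "nvidia/NVIDIA-Nemotron-Nano-9B-v2",
        "messages": [
            {"role": "system", "content": system},
            {"role": "user", "content": prompt},
        ],
        "max_tokens": max_tokens,
        "stream": stream,
    }

if __name__ == "__main__":
    import json
    import urllib.request

    payload = build_chat_request("Write a function to reverse a string.")
    req = urllib.request.Request(
        "http://localhost:8888/v1/chat/completions",  # port from the serve command below
        data=json.dumps(payload).encode(),
        headers={"Content-Type": "application/json"},
    )
    # Uncomment once the server is running:
    # body = urllib.request.urlopen(req).read()
    # print(json.loads(body)["choices"][0]["message"]["content"])
```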
- vLLM server running with Nemotron Nano 9B v2 model
- Hugging Face token to download the model (you can create one in your Hugging Face account settings)
- Python 3.8+ environment
- Docker (optional)
pip install -U "vllm>=0.10.1"

Alternatively, you can use Docker to launch a vLLM server. See the instructions below.
vllm serve nvidia/NVIDIA-Nemotron-Nano-9B-v2 \
--trust-remote-code \
--mamba_ssm_cache_dtype float32 \
--max-num-seqs 64 \
--max-model-len 131072 \
--host 0.0.0.0 \
--port 8888

Or, if you are using Docker:
export HF_CACHE_DIR=<your_local_HF_directory>
export HF_TOKEN=<your_HF_token>
export TP_SIZE=1
docker run --runtime nvidia --gpus all --ipc=host \
-v "$HF_CACHE_DIR":/hf_cache \
-e HF_HOME=/hf_cache \
-e "HUGGING_FACE_HUB_TOKEN=$HF_TOKEN" \
-e PYTORCH_CUDA_ALLOC_CONF=expandable_segments:True \
-p 8888:8888 \
vllm/vllm-openai:v0.10.1 \
--model nvidia/NVIDIA-Nemotron-Nano-9B-v2 \
--tensor-parallel-size ${TP_SIZE} \
--trust-remote-code \
--mamba_ssm_cache_dtype float32 \
--max-num-seqs 64 \
--max-model-len 131072 \
--host 0.0.0.0 \
--port 8888

If you're running vLLM on a different port or host, update the DEFAULT_LOCAL_API constant in reasoning_coder.py.
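Before launching the app, you can confirm the server is reachable via vLLM's OpenAI-compatible `/v1/models` endpoint. A small sketch (the host and port match the serve commands above):

```python
# Sketch: check that the vLLM server is up by listing the models it serves.
import json
import urllib.request

def served_model_ids(models_json):
    """Extract model IDs from a /v1/models response body (OpenAI list format)."""
    return [m["id"] for m in json.loads(models_json)["data"]]

if __name__ == "__main__":
    try:
        body = urllib.request.urlopen("http://localhost:8888/v1/models", timeout=5).read()
        print(served_model_ids(body))  # expect the Nemotron model ID in this list
    except OSError as err:
        print(f"server not reachable: {err}")
```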
Clone this repository
git clone https://github.com/NVIDIA/GenerativeAIExamples.git
cd GenerativeAIExamples/community/reasoning-coder

Activate the virtual environment and install the dependencies.
python -m venv venv
source venv/bin/activate # On Windows: venv\Scripts\activate
pip install -r requirements.txt

streamlit run reasoning_coder.py

The UI should open in the browser at http://localhost:8501/.
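When streaming is enabled, the server returns Server-Sent Events whose `data:` lines carry JSON chunks in the OpenAI delta format, which vLLM mirrors. A sketch of how such a stream can be consumed (field names follow the OpenAI streaming schema):

```python
# Sketch: pull content fragments out of OpenAI-style streaming SSE lines.
import json

def extract_deltas(sse_lines):
    """Yield text fragments from 'data:' lines of a chat-completions stream."""
    for line in sse_lines:
        line = line.strip()
        if not line.startswith("data:"):
            continue  # skip blank lines and comments between events
        payload = line[len("data:"):].strip()
        if payload == "[DONE]":  # end-of-stream sentinel
            break
        delta = json.loads(payload)["choices"][0]["delta"]
        if "content" in delta:  # first chunk may carry only the role
            yield delta["content"]
```

In the app, joining the yielded fragments as they arrive is what produces the real-time typing effect.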
Try these built-in examples:
- "Write a Python function to find the longest palindromic substring in a string"
- "Create a recursive function to solve the Tower of Hanoi puzzle"
- "Implement a binary search tree with insertion and search operations"
- "Write a function to validate email addresses using regex"
- "Create a simple web scraper using Python requests and BeautifulSoup"