Running vLLM locally: what is the best model? #16390

@wes-kay

Question

On my 5090, I've tested two models so far with locally hosted vLLM:

  1. Qwen3-32B-AWQ
  • Will actually call bash commands and CRUD files
  • Spends ~20k tokens trying to install Flutter dependencies; it's 50/50 whether it will even build a test project.
  2. Qwen2.5-Coder-32B-Instruct-AWQ
  • Does not run any bash or CRUD
  • Will only output .json

1. Tool-Use Capability

Which open-weight models reliably support tool calling (bash execution, filesystem CRUD, API calls) when served through vLLM?
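For clarity on what "tool calling" means here: the agent sends an OpenAI-style tools schema with each chat request, and a model counts as supporting tool calls under vLLM if it responds with a structured tool_calls field rather than plain text. A minimal sketch of such a request payload (the `bash` tool definition is a hypothetical example, not opencode's actual schema):

```python
import json

# OpenAI-style function tool definition; an agent framework sends a list
# of these with every chat completion request. The "bash" tool here is
# an illustrative assumption, not a real opencode schema.
bash_tool = {
    "type": "function",
    "function": {
        "name": "bash",
        "description": "Execute a shell command and return stdout/stderr.",
        "parameters": {
            "type": "object",
            "properties": {
                "command": {"type": "string", "description": "Command to run"},
            },
            "required": ["command"],
        },
    },
}

# The request body a client would POST to vLLM's OpenAI-compatible
# /v1/chat/completions endpoint.
payload = {
    "model": "Qwen/Qwen3-32B-AWQ",
    "messages": [{"role": "user", "content": "List the files in the repo"}],
    "tools": [bash_tool],
}

# Payload must be JSON-serializable to go over the wire.
body = json.dumps(payload)
```

A model that "reliably supports tool calling" returns `tool_calls` entries matching this schema; one that doesn't will tend to dump JSON into the plain text content instead, which matches the Qwen2.5-Coder behavior described above.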

2. Model Behavior Differences

Why would Qwen3-32B-AWQ execute shell commands while Qwen2.5-Coder-32B-Instruct-AWQ outputs only structured JSON responses? Is there a config change that would let Qwen Coder work with opencode properly?
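One thing worth checking before blaming the model: vLLM does not parse tool calls out of model output unless tool parsing is enabled at serve time, so a model can "know" the tool format yet still surface it as raw JSON text. A sketch of the serve command, assuming the `hermes` parser (the one the Qwen docs recommend for Qwen2.5-style tool calls) and an illustrative context length:

```shell
# Serve Qwen2.5-Coder with tool-call parsing enabled so clients like
# opencode receive structured tool_calls instead of raw JSON in the
# message content. --max-model-len is an assumed value; tune for VRAM.
vllm serve Qwen/Qwen2.5-Coder-32B-Instruct-AWQ \
  --enable-auto-tool-choice \
  --tool-call-parser hermes \
  --max-model-len 32768
```

If these flags were already set and Coder still only emits JSON, the difference is more likely the model's tool-use training than the server config.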

3. Best Models for Local Coding Agents

Which locally runnable models under roughly 40B parameters (5090, 32 GB) currently perform best with opencode for coding agents that must plan tasks, edit files, and execute shell commands?

4. Quantization Impact

Does AWQ quantization significantly degrade planning, reasoning, or tool-calling ability in coding models compared with FP16 or GPTQ?
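For context on why quantization is unavoidable in this setup at all, a rough weights-only memory estimate (ignoring KV cache, activations, and runtime overhead, which add several more GB):

```python
def weight_memory_gb(params_billion: float, bits_per_weight: float) -> float:
    """Rough weights-only footprint in GB; ignores KV cache and overhead."""
    return params_billion * 1e9 * bits_per_weight / 8 / 1e9

# 32B parameters at FP16 (16 bits/weight) vs 4-bit AWQ.
fp16_gb = weight_memory_gb(32, 16)  # far exceeds a 32 GB 5090
awq_gb = weight_memory_gb(32, 4)    # fits, leaving room for KV cache
```

So on a 32 GB card the real comparison is between quantization schemes (AWQ vs GPTQ vs FP8), not quantized vs FP16, since FP16 weights alone would need roughly 64 GB.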

Metadata

Labels

core (anything pertaining to core functionality of the application; opencode server stuff), docs
