
Add BYOM guide: deploy open-source models on Vast.ai with Ollama#80

Open
wbrennan899 wants to merge 1 commit into main from examples/byom-guide

Conversation

@wbrennan899
Collaborator

Summary

  • New guide: BYOM: Bring Your Own Vast Hosted Model to Claude (examples/ai-agents/claude-code-byom.mdx)
  • Added page entry to docs.json

What the guide covers

Step-by-step walkthrough for deploying an open-source model on a Vast.ai GPU instance via Ollama and connecting Claude Code to it using the Anthropic Messages API (/v1/messages).
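As a rough sketch of the CLI setup and instance-creation steps the guide walks through, assuming the `vastai` CLI: the search filters, image name, and disk size below are illustrative placeholders, not the guide's exact values.

```shell
# Illustrative sketch only; substitute your own API key and chosen offer.
pip install vastai
vastai set api-key YOUR_API_KEY

# Find single-GPU offers with enough VRAM for the 4-bit 20B model (~14 GB).
vastai search offers 'gpu_ram>=24 num_gpus=1 rentable=true'

# Rent an offer by ID, running the official Ollama image (disk in GB).
vastai create instance OFFER_ID --image ollama/ollama --disk 60
```

For the 80B MoE model, the same flow applies with a stricter filter (e.g. `gpu_ram>=80`) to match the A100 80GB / H100 requirement in the table above.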

Two models documented:

| Model | Size | GPU requirement |
| --- | --- | --- |
| Qwen3-Coder-Next | 80B MoE (~57 GB VRAM) | A100 80GB / H100 |
| GPT-OSS-20B | 20B 4-bit (~14 GB VRAM) | RTX 3090 / RTX 4090 |

Includes: CLI setup, GPU search filters, instance creation, endpoint verification (model listing, chat, tool calling), Claude Code connection, and cleanup.
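The chat step of the endpoint verification can be sketched as follows, assuming Ollama's Anthropic-compatible Messages API on the instance's forwarded port; the host, port, and model tag are placeholders rather than values from the guide.

```python
import json

# Placeholders: substitute the instance's public IP and forwarded Ollama port.
HOST, PORT = "1.2.3.4", 11434

def messages_request(model: str, prompt: str, max_tokens: int = 256) -> dict:
    """Build a minimal Anthropic Messages API request body for one chat turn."""
    return {
        "model": model,
        "max_tokens": max_tokens,
        "messages": [{"role": "user", "content": prompt}],
    }

body = messages_request("qwen3-coder", "Reply with the single word: pong")
# Against a live instance, this body would be POSTed to
# http://{HOST}:{PORT}/v1/messages with a JSON content-type header.
print(json.dumps(body))
```

A successful response is a Messages API object whose `content` blocks carry the model's reply; the tool-calling check adds a `tools` array to the same request shape.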

Verified

Both models were deployed on Vast.ai and tested end-to-end:

  • Ollama serves the Anthropic Messages API correctly
  • Basic chat and tool calling work for both models
  • Claude Code connects and operates interactively
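The Claude Code connection can be sketched via the environment variables Claude Code reads for a custom endpoint; the IP, port, token, and model tag below are placeholders, not the guide's exact values.

```shell
# Placeholders: substitute the instance's public IP and forwarded port.
export ANTHROPIC_BASE_URL="http://1.2.3.4:11434"
# A local Ollama server typically ignores the token's value,
# but a token must still be set for the client.
export ANTHROPIC_AUTH_TOKEN="ollama"
export ANTHROPIC_MODEL="qwen3-coder"   # hypothetical model tag
# Then launch Claude Code in the project directory:
#   claude
```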

Covers deploying Qwen3-Coder-Next (80B MoE) and GPT-OSS-20B via Ollama on Vast.ai GPU instances, with verified endpoint testing and connection instructions.
