2 changes: 1 addition & 1 deletion inference/models.mdx
@@ -19,7 +19,7 @@ W&B Inference provides access to several open-source foundation models. Each mod
| Meta Llama 3.1 8B | `meta-llama/Llama-3.1-8B-Instruct` | Text | 128K | 8B (Total) | Efficient conversational model optimized for responsive multilingual chatbot interactions. |
| Microsoft Phi 4 Mini 3.8B | `microsoft/Phi-4-mini-instruct` | Text | 128K | 3.8B (Total) | Compact, efficient model ideal for fast responses in resource-constrained environments. |
| MiniMax M2.5 | `MiniMaxAI/MiniMax-M2.5` | Text | 197K | 10B-230B (Active-Total) | MoE model with a highly sparse architecture designed for high-throughput and low latency with strong coding capabilities. |
| Moonshot AI Kimi K2.5 | `moonshotai/Kimi-K2.5` | Text, Vision | 262K | 32B-1T (Active-Total) | Kimi K2.5 is a multimodal Mixture-of-Experts language model featuring 32 billion activated parameters and a total of 1 trillion parameters. |
| Moonshot AI Kimi K2.5 | `moonshotai/Kimi-K2.5` | Text, Vision | 262K | 32B-1T (Active-Total) | Kimi K2.5 is a multimodal Mixture-of-Experts language model featuring 32 billion activated parameters and a total of 1 trillion parameters. We serve a build of this model optimized for NVIDIA Blackwell hardware. |
| NVIDIA Nemotron 3 Super 120B | `nvidia/NVIDIA-Nemotron-3-Super-120B-A12B-FP8` | Text | 262K | 12B-120B (Active-Total) | Nemotron 3 is a LatentMoE model designed to deliver strong agentic, reasoning, and conversational capabilities. |
| OpenAI GPT OSS 120B | `openai/gpt-oss-120b` | Text | 131K | 5.1B-117B (Active-Total) | Efficient Mixture-of-Experts model designed for high-reasoning, agentic and general-purpose use cases. |
| OpenAI GPT OSS 20B | `openai/gpt-oss-20b` | Text | 131K | 3.6B-20B (Active-Total) | Lower latency Mixture-of-Experts model trained on OpenAI's Harmony response format with reasoning capabilities. |
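The model IDs in the table above are what you pass in an API request. As a minimal sketch, here is how a chat-completions request payload for one of these IDs could be assembled, assuming the common OpenAI-compatible endpoint shape; the base URL and endpoint path are assumptions, so confirm them against the W&B Inference documentation:

```python
import json

# Assumed base URL; verify against the W&B Inference docs.
API_BASE = "https://api.inference.wandb.ai/v1"

def build_chat_request(model_id: str, prompt: str) -> tuple[str, bytes]:
    """Return the URL and JSON body for a single-turn chat request.

    model_id is an ID from the models table, e.g. "openai/gpt-oss-20b".
    """
    # Endpoint path follows the OpenAI-compatible convention (assumption).
    url = f"{API_BASE}/chat/completions"
    body = json.dumps({
        "model": model_id,
        "messages": [{"role": "user", "content": prompt}],
    }).encode("utf-8")
    return url, body

url, body = build_chat_request("openai/gpt-oss-20b", "Hello")
```

Sending the request additionally requires an authorization header with your API key; any OpenAI-compatible client library can consume these IDs directly.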