2 changes: 1 addition & 1 deletion inference/models.mdx
@@ -19,7 +19,7 @@ W&B Inference provides access to several open-source foundation models. Each mod
| Meta Llama 3.1 8B | `meta-llama/Llama-3.1-8B-Instruct` | Text | 128K | 8B (Total) | Efficient conversational model optimized for responsive multilingual chatbot interactions. |
| Microsoft Phi 4 Mini 3.8B | `microsoft/Phi-4-mini-instruct` | Text | 128K | 3.8B (Total) | Compact, efficient model ideal for fast responses in resource-constrained environments. |
| MiniMax M2.5 | `MiniMaxAI/MiniMax-M2.5` | Text | 197K | 10B-230B (Active-Total) | MoE model with a highly sparse architecture designed for high-throughput and low latency with strong coding capabilities. |
| Moonshot AI Kimi K2.5 | `moonshotai/Kimi-K2.5` | Text, Vision | 262K | 32B-1T (Active-Total) | Kimi K2.5 is a multimodal Mixture-of-Experts language model featuring 32 billion activated parameters and a total of 1 trillion parameters. |
| Moonshot AI Kimi K2.5 | `moonshotai/Kimi-K2.5` | Text, Vision | 262K | 32B-1T (Active-Total) | Kimi K2.5 is a multimodal Mixture-of-Experts language model featuring 32 billion activated parameters and a total of 1 trillion parameters. We serve a build of this model optimized for NVIDIA Blackwell hardware. |
| NVIDIA Nemotron 3 Super 120B | `nvidia/NVIDIA-Nemotron-3-Super-120B-A12B-FP8` | Text | 262K | 12B-120B (Active-Total) | Nemotron 3 is a LatentMoE model designed to deliver strong agentic, reasoning, and conversational capabilities. |
| OpenAI GPT OSS 120B | `openai/gpt-oss-120b` | Text | 131K | 5.1B-117B (Active-Total) | Efficient Mixture-of-Experts model designed for high-reasoning, agentic and general-purpose use cases. |
| OpenAI GPT OSS 20B | `openai/gpt-oss-20b` | Text | 131K | 3.6B-20B (Active-Total) | Lower latency Mixture-of-Experts model trained on OpenAI's Harmony response format with reasoning capabilities. |
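The model IDs in the table above are what you pass in an API request. As a minimal sketch, here is how a chat-completions request payload for one of these IDs could be assembled, assuming the common OpenAI-compatible endpoint shape; the base URL and endpoint path are assumptions, so confirm them against the W&B Inference documentation:

```python
import json

# Assumed base URL; verify against the W&B Inference docs.
API_BASE = "https://api.inference.wandb.ai/v1"

def build_chat_request(model_id: str, prompt: str) -> tuple[str, bytes]:
    """Return the URL and JSON body for a single-turn chat request.

    model_id is an ID from the models table, e.g. "openai/gpt-oss-20b".
    """
    # Endpoint path follows the OpenAI-compatible convention (assumption).
    url = f"{API_BASE}/chat/completions"
    body = json.dumps({
        "model": model_id,
        "messages": [{"role": "user", "content": prompt}],
    }).encode("utf-8")
    return url, body

url, body = build_chat_request("openai/gpt-oss-20b", "Hello")
```

Sending the request additionally requires an authorization header with your API key; any OpenAI-compatible client library can consume these IDs directly.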