
@leslieeilsel

Summary

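Add support for Alibaba Cloud's Qwen models via the DashScope native API (/api/v1), covering text generation, multimodal input, streaming, structured output, embeddings, and image generation and editing. Audio (TTS/STT) is the one capability not supported.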

Capabilities

| Feature | Status | Details |
| --- | --- | --- |
| Text Generation | Supported | Multi-step tool calling, generation parameters |
| Multi-Modal (VL) | Supported | Automatic endpoint routing for vision-language models |
| Streaming | Supported | DashScope SSE protocol, reasoning/thinking tokens |
| Structured Output | Supported | JSON Object mode + JSON Schema mode (strict) |
| Embeddings | Supported | Configurable dimensions |
| Image Generation | Supported | qwen-image-max, qwen-image-plus |
| Image Editing | Supported | Single-image editing & multi-image fusion |
| Audio (TTS/STT) | Not supported | DashScope uses WebSocket/async protocols |

Architecture

  • Uses DashScope's native API (not the OpenAI-compatible mode) for the most complete and up-to-date feature coverage
  • Dynamic endpoint routing: Handlers automatically switch between text-generation and multimodal-generation endpoints based on message content (images present → multimodal endpoint)
  • DashScope SSE streaming: Custom SSE parsing for DashScope's distinct event format (X-DashScope-SSE header, incremental_output mode)
  • Structured output: Supports both response_format: {"type": "json_object"} (broad model support) and response_format: {"type": "json_schema", ...} (strict schema enforcement for newer models)
  • Region-aware configuration: Supports International (Singapore), China Mainland (Beijing), and Global/US (Virginia) deployment modes, with documentation of the API key and URL each mode requires
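
To illustrate the SSE streaming point above, here is a minimal Python sketch of the parsing idea. It is not the PR's PHP code. It assumes each event's `data:` line carries a JSON object whose text delta sits at `output.choices[].message.content`, which is how incremental output is commonly shaped but is not confirmed by this description. The function names `iter_sse_data` and `collect_text` are hypothetical.

```python
import json
from typing import Iterable, Iterator


def iter_sse_data(lines: Iterable[str]) -> Iterator[dict]:
    """Yield the decoded JSON payload of each SSE event in a line stream."""
    data_parts: list[str] = []
    for raw in lines:
        line = raw.rstrip("\r\n")
        if line == "":
            # A blank line terminates the current event.
            if data_parts:
                yield json.loads("\n".join(data_parts))
                data_parts = []
            continue
        if line.startswith(":"):
            continue  # SSE comment line, ignored
        field, _, value = line.partition(":")
        if field == "data":
            # The SSE format allows one optional space after the colon.
            data_parts.append(value[1:] if value.startswith(" ") else value)
    if data_parts:
        # Flush a final event that was not followed by a blank line.
        yield json.loads("\n".join(data_parts))


def collect_text(lines: Iterable[str]) -> str:
    """Concatenate text deltas, assuming incremental (delta-only) output."""
    chunks: list[str] = []
    for event in iter_sse_data(lines):
        # Assumed payload shape: output.choices[].message.content
        for choice in event.get("output", {}).get("choices", []):
            chunks.append(choice.get("message", {}).get("content") or "")
    return "".join(chunks)
```

With incremental output each event holds only the new text, so the consumer appends deltas rather than replacing the accumulated message.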

Files

  • Provider source: src/Providers/Qwen/ — 14 files (handlers, maps, concerns)
  • Configuration: config/prism.php, src/PrismManager.php, src/Enums/Provider.php
  • Tests: tests/Providers/Qwen/ — 9 test files, 52 tests with 176 assertions
  • Fixtures: tests/Fixtures/qwen/ — 14 fixture files recorded from real Qwen API responses
  • Documentation: docs/providers/qwen.md + updates to structured-output.md, text-generation.md, image-generation.md, introduction.md, ProviderSupport.vue

Design Decisions

  1. Native API over OpenAI-compatible mode: The DashScope native API is more actively maintained by Alibaba Cloud and provides better feature coverage (multimodal, image editing, structured output modes). The OpenAI-compatible mode lacks some features and has different error response formats.

  2. Dynamic multimodal endpoint routing: Rather than requiring users to manually select endpoints, the handlers inspect message content and automatically route to the correct DashScope endpoint. This keeps the user-facing API clean and consistent with other Prism providers.

  3. No audio support: DashScope's TTS (CosyVoice) uses WebSocket, and its STT (Paraformer) uses asynchronous REST with task polling; neither fits Prism's synchronous HTTP interface. The provider documentation states this limitation and lists alternatives.
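
The routing rule in decision 2 can be sketched as follows. This is an illustrative Python sketch, not the PR's PHP handlers. The endpoint paths and the message shape (a content list whose parts may carry an `image` key) are assumptions, and `has_image` and `select_endpoint` are hypothetical names.

```python
from typing import Any

# Assumed endpoint paths relative to the /api/v1 base; not taken from this PR.
TEXT_ENDPOINT = "services/aigc/text-generation/generation"
MULTIMODAL_ENDPOINT = "services/aigc/multimodal-generation/generation"


def has_image(message: dict[str, Any]) -> bool:
    """Return True if the message content includes at least one image part."""
    content = message.get("content")
    if isinstance(content, str):
        return False  # plain-text content never contains an image
    return any(isinstance(part, dict) and "image" in part for part in content or [])


def select_endpoint(messages: list[dict[str, Any]]) -> str:
    """Route to the multimodal endpoint as soon as any message has an image."""
    if any(has_image(message) for message in messages):
        return MULTIMODAL_ENDPOINT
    return TEXT_ENDPOINT
```

Keeping the decision in one function means callers never pick an endpoint themselves, which is the property the design decision is after.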

Test Plan

  • All 52 Qwen tests pass (176 assertions)
  • Text generation: basic prompt, system prompt, multi-step tool calling, generation parameters
  • Multi-modal: VL model with images, automatic endpoint routing
  • Streaming: basic text, tools, system prompt, max tokens, reasoning/thinking tokens, step events
  • Structured output: JSON Object mode, JSON Schema mode, auto mode fallback
  • Embeddings: basic input, request format, dimensions option
  • Images: generation, provider options, editing (single/multi/base64), error handling
  • Error handling: Arrearage, DataInspectionFailed, rate limits (429), overloaded (503)
  • Message mapping: user, assistant, tool results, images (URL/path/base64), system prompts
  • Tool mapping: standard tools, strict mode, tool choice validation
  • Code formatted with Pint
  • Fixtures recorded from real Qwen API responses
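
The error-handling cases in the test plan suggest a classification step along these lines. This is a hypothetical Python sketch: the category strings and the `classify_error` function are invented for illustration, and how the PR actually maps these cases to Prism exceptions is not stated here.

```python
from typing import Optional


def classify_error(status: int, code: Optional[str]) -> str:
    """Map an HTTP status and a DashScope error code to a coarse category."""
    if status == 429:
        return "rate_limited"      # too many requests
    if status == 503:
        return "overloaded"        # service temporarily unavailable
    if code == "Arrearage":
        return "billing"           # account has an overdue balance
    if code == "DataInspectionFailed":
        return "content_filtered"  # input or output rejected by content inspection
    return "request_failed"        # anything not recognized above
```

Checking the HTTP status before the error code is a choice made for this sketch, so that transport-level conditions take precedence over provider-specific codes.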

Made with Cursor

Add comprehensive support for Alibaba Cloud's Qwen models via the
DashScope native API (/api/v1), covering text generation, streaming,
structured output, embeddings, image generation, and image editing.

Key features:
- Text generation with multi-step tool calling
- Multi-modal (VL) support with automatic endpoint routing
- Streaming with DashScope SSE protocol and reasoning/thinking tokens
- Structured output with both JSON Object and JSON Schema modes
- Embeddings with configurable dimensions
- Image generation (qwen-image-max/plus) and editing (qwen-image-edit)
- Region-aware configuration (International, China, US deployments)
- 52 tests with real API fixtures (176 assertions)

Co-authored-by: Cursor <cursoragent@cursor.com>