fix(webgpu): stabilize qwen streaming and multimodal fallback #4

Open
leehack wants to merge 1 commit into main from fix/webgpu-utf8-streaming
Conversation


leehack commented Mar 8, 2026

This pull request improves llama_webgpu_bridge.js and llama_webgpu_core.cpp in three areas: more robust handling of streamed text output, safer CPU fallback logic for multimodal models, and better normalization of media markers. Together, the changes improve stability, correctness, and compatibility, especially when streaming text from multimodal models.

Text output and streaming stability

  • Added the trimUnstableUtf8Tail function to ensure streamed text does not end with incomplete or unstable UTF-8 sequences, improving the reliability of token emission during generation. (js/llama_webgpu_bridge.js)
  • Refactored the streaming logic to track the stable emitted text and emit only new, stable segments, preventing duplicate or partial token emissions. (js/llama_webgpu_bridge.js)
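The idea behind the UTF-8 tail trimming can be sketched as follows. This is a minimal illustration, not the PR's actual implementation: the function name comes from the description above, but the byte-level logic here is an assumption about how such a helper typically works (hold back bytes that begin a multi-byte sequence that is not yet complete).

```javascript
// Sketch: return the prefix of `bytes` that is guaranteed to decode to
// stable UTF-8, dropping a trailing incomplete multi-byte sequence.
function trimUnstableUtf8Tail(bytes) {
  const end = bytes.length;
  // Walk back over up to three continuation bytes (0b10xxxxxx).
  let i = end - 1;
  while (i >= 0 && i >= end - 4 && (bytes[i] & 0xc0) === 0x80) i--;
  if (i < 0) return bytes.subarray(0, 0); // only continuation bytes so far
  // Determine how many bytes the final sequence's lead byte promises.
  const lead = bytes[i];
  let needed = 1; // ASCII by default
  if ((lead & 0xe0) === 0xc0) needed = 2;
  else if ((lead & 0xf0) === 0xe0) needed = 3;
  else if ((lead & 0xf8) === 0xf0) needed = 4;
  // Keep everything if the final sequence is complete; otherwise drop it.
  return end - i >= needed ? bytes.subarray(0, end) : bytes.subarray(0, i);
}
```

A streaming loop would decode only the trimmed prefix each step and carry the held-back bytes into the next chunk, so the UI never sees a replacement character from a half-received code point.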

Multimodal model CPU fallback and option sanitization

  • Introduced _createCpuSafeMultimodalLoadOptions to sanitize model loading options for CPU fallback, limiting context size, thread count, and batch size for safer operation. (js/llama_webgpu_bridge.js)
  • Updated model loading and fallback logic to use the new CPU-safe options and to better detect when a CPU fallback is necessary, warning the user when it happens. (js/llama_webgpu_bridge.js)
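A sanitizer of this kind might look like the sketch below. The option names (`nCtx`, `nThreads`, `nBatch`, `useGpu`) and the caps are illustrative assumptions, not the values used by the PR:

```javascript
// Sketch: clamp load options to conservative values before falling
// back to the CPU, so a large multimodal config cannot exhaust memory.
function createCpuSafeMultimodalLoadOptions(options) {
  const safe = { ...options };
  safe.nCtx = Math.min(safe.nCtx ?? 4096, 4096);    // cap context size
  safe.nThreads = Math.min(safe.nThreads ?? 4, 4);  // cap thread count
  safe.nBatch = Math.min(safe.nBatch ?? 256, 256);  // cap batch size
  safe.useGpu = false;                              // force CPU backend
  return safe;
}
```

Copying the options object first keeps the original (GPU-path) options intact in case the caller retries on WebGPU later.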

Media marker normalization

  • Enhanced media marker normalization to handle additional vision-related markers, improving prompt preprocessing for multimodal models. (src/llama_webgpu_core.cpp)
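The actual normalization lives in C++ (src/llama_webgpu_core.cpp); the JavaScript sketch below only illustrates the shape of such a pass. The specific marker strings (Qwen-style `<|vision_start|>` / `<|image_pad|>` / `<|vision_end|>` spans, plain `<image>` tags, and a canonical `<__media__>` marker) are assumptions for the example:

```javascript
// Sketch: collapse assorted vision markers onto one canonical media
// marker so downstream prompt processing sees a single token shape.
const MEDIA_MARKER = "<__media__>";

function normalizeMediaMarkers(prompt) {
  // A Qwen-style vision span becomes a single canonical marker.
  let out = prompt.replace(
    /<\|vision_start\|>(?:<\|image_pad\|>)*<\|vision_end\|>/g,
    MEDIA_MARKER
  );
  // Plain image tags map to the same marker.
  out = out.replace(/<image>|<img>/g, MEDIA_MARKER);
  return out;
}
```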

Token handling and generation correctness

  • Changed token handling to end generation when a control token is encountered and to avoid emitting control tokens in the output, improving correctness and preventing unwanted artifacts in generated text. (src/llama_webgpu_core.cpp)
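The rule can be summarized in a small sketch. The real change is in C++; here the logic is shown in JavaScript with a hypothetical `vocab` helper (its `isControl` / `toText` methods are assumptions for the illustration):

```javascript
// Sketch: a control token ends generation and is never emitted;
// any other token is detokenized and streamed to the caller.
function handleToken(token, vocab, emit) {
  if (vocab.isControl(token)) {
    return false; // stop generating, emit nothing
  }
  emit(vocab.toText(token));
  return true; // keep generating
}
```

Treating control tokens as hard stops (rather than emitting their text form) is what prevents artifacts such as end-of-generation markers leaking into the visible output.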

