Fixed the LFM 1.2B Thinking (Liquid AI) local model, which was completely non-functional due to multiple issues in the worker and the main-thread message handler.
## Changes
### `public/ai-worker-lfm.js`
- **Upgraded ONNX Runtime Web** from v1.22.0 → v1.24.3 — the old version crashed with WASM errors on LFM's hybrid SSM+Transformer operators
- **Fixed HEAD_DIM constant** from 256 → 64 (= hidden_size 2048 ÷ 32 attention heads) — the wrong value caused a KV cache shape mismatch during inference
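The corrected value follows directly from the model's attention geometry (a sketch; the constant names are illustrative, and the config values are those stated above):

```javascript
// Values from the model config (hidden_size 2048, 32 attention heads).
const HIDDEN_SIZE = 2048;
const NUM_ATTENTION_HEADS = 32;

// Per-head dimension used to shape the KV-cache tensors handed to ONNX Runtime.
// The old hardcoded 256 produced cache tensors the runtime could not reconcile
// with the model's actual attention layout.
const HEAD_DIM = HIDDEN_SIZE / NUM_ATTENTION_HEADS; // 64
```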
- **Replaced greedy decoding with temperature sampling** — top-k=40, temp=0.7 produces more detailed responses instead of ultra-terse one-liners
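The sampling change amounts to the following (a minimal top-k + temperature sampler; the function name and structure are illustrative, not the worker's actual code):

```javascript
// Sample one token id from raw logits: keep the topK highest, apply a
// temperature-scaled softmax, then draw from the resulting distribution.
function sampleTopK(logits, { topK = 40, temperature = 0.7 } = {}) {
  // Pair each logit with its token id and keep the topK highest.
  const top = logits
    .map((logit, id) => ({ id, logit }))
    .sort((a, b) => b.logit - a.logit)
    .slice(0, topK);

  // Softmax with temperature (subtract the max for numerical stability).
  const max = top[0].logit;
  const exps = top.map((t) => Math.exp((t.logit - max) / temperature));
  const sum = exps.reduce((a, b) => a + b, 0);

  // Draw one token id proportionally to its probability mass.
  let r = Math.random() * sum;
  for (let i = 0; i < top.length; i++) {
    r -= exps[i];
    if (r <= 0) return top[i].id;
  }
  return top[top.length - 1].id;
}
```

With `topK = 1` this degenerates to greedy decoding, which is what the worker did before.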
- **Increased minimum token budget** to 2048 (4096 for thinking mode) — LFM always generates `<think>` reasoning which consumes tokens before the answer
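The budget floor can be expressed as (a sketch; `tokenBudget` is an illustrative helper, not the worker's actual identifier):

```javascript
// LFM emits <think>…</think> reasoning before the visible answer, so the
// generation budget must leave room for both the reasoning and the reply.
function tokenBudget(requestedMax, thinkingMode) {
  const floor = thinkingMode ? 4096 : 2048; // minimums introduced by this fix
  return Math.max(requestedMax || 0, floor);
}
```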
- **Added detail prompt hint** for chat/generate/qa/explain tasks to encourage comprehensive responses
- **Fixed error handling** — ONNX Runtime can throw raw WASM memory pointers (numbers) instead of Error objects; the worker now safely extracts error messages with a `String()` fallback
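The coercion can be sketched like this (the helper name is illustrative):

```javascript
// ONNX Runtime's WASM backend sometimes throws a raw number (a pointer into
// WASM memory) rather than an Error object, so reading `err.message` yields
// undefined and the UI shows an empty error. Coerce anything to a string.
function errorMessage(err) {
  if (err instanceof Error) return err.message;
  if (typeof err === "number") return `ONNX Runtime WASM error (code ${err})`;
  return String(err); // covers strings, plain objects, null, undefined
}
```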
- **Improved download status message** — shows "Downloading LFM 1.2B Thinking weights — this may take a few minutes..." instead of the misleading "Downloading model_q4.onnx..."
### `js/ai-assistant.js`
- **Added missing `case 'token'` handler** in the local worker message listener — streaming tokens from LFM (and Qwen) were silently dropped
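A self-contained sketch of the listener shape with the `token` case restored (handler names are illustrative, not the file's actual identifiers):

```javascript
// Builds a worker `onmessage` listener from a set of handlers. Without the
// 'token' case, streamed tokens arrive from the worker but never reach the UI.
function makeWorkerListener(handlers) {
  return (event) => {
    const { type, ...payload } = event.data;
    switch (type) {
      case "token":
        handlers.onToken(payload.text); // previously missing: tokens were silently dropped
        break;
      case "complete":
        handlers.onComplete(payload);
        break;
      case "error":
        handlers.onError(payload.message);
        break;
    }
  };
}
```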
- **Changed `complete` handler** from `handleAiResponse` → `handleGroqComplete` — prevents duplicate response bubbles when streaming tokens are followed by a complete message
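The duplicate-bubble problem can be pictured with a minimal transcript model (purely illustrative; the real fix routes `complete` to the handler that finalizes the already-streamed bubble instead of appending the full text again):

```javascript
// Streaming tokens build one bubble; 'complete' must finalize that same
// bubble, not append a second copy of the full response.
function makeTranscript() {
  const bubbles = [];
  let streaming = null;
  return {
    onToken(text) {
      if (!streaming) {
        streaming = { text: "", done: false };
        bubbles.push(streaming);
      }
      streaming.text += text;
    },
    onComplete(fullText) {
      if (streaming) {
        streaming.done = true; // finalize the streamed bubble in place
        streaming = null;
      } else {
        bubbles.push({ text: fullText, done: true }); // nothing was streamed
      }
    },
    bubbles,
  };
}
```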