Textagent
diff --git a/README.md b/README.md
Lines changed: 3 additions & 3 deletions
diff --git a/changelogs/CHANGELOG-tts-ux.md b/changelogs/CHANGELOG-tts-ux.md
Lines changed: 45 additions & 0 deletions
diff --git a/css/tts.css b/css/tts.css
Lines changed: 48 additions & 11 deletions
@@ -28,7 +28,7 @@
 | **🎬 Media Embedding** | Video playback via `![alt](video.mp4)` image syntax (`.mp4`, `.webm`, `.ogg`, `.mov`, `.m4v`); YouTube/Vimeo embeds auto-detected; `embed` code block for responsive media grids (`cols=1-4`, `height=N`); Video.js v10 lazy-loaded with native `<video>` fallback; website URLs render as rich link preview cards with favicon + "Open ↗" button |
 | **🤖 AI Assistant** | 3 local Qwen 3.5 sizes (0.8B / 2B / 4B via WebGPU/WASM), Gemini 3.1 Flash Lite, Groq Llama 3.3 70B, OpenRouter — summarize, expand, rephrase, grammar-fix, explain, simplify, auto-complete; AI writing tags (Polish, Formalize, Elaborate, Shorten, Image); enhanced context menu; per-card model selection; concurrent block generation; inline review with accept/reject/regenerate; AI-powered image generation; **smart model loading UX** — cache vs download detection (📦/⬇️), HuggingFace source location display, delete cached models from browser storage; all models hosted on [`textagent` HuggingFace org](https://huggingface.co/textagent) with automatic fallback |
 | **🎤 Voice Dictation** | Dual-engine speech-to-text: **Voxtral Mini 3B** (WebGPU, primary, 13 languages, ~2.7 GB) or **Whisper Large V3 Turbo** (WASM fallback, ~800 MB) with consensus scoring; download consent popup with model info before first use; 50+ Markdown-aware voice commands — natural phrases ("heading one", "bold…end bold", "add table", "undo"); auto-punctuation via AI refinement or built-in fallback; streaming partial results |
-| **🔊 Text-to-Speech** | Hybrid Kokoro TTS engine — English/Chinese via [Kokoro 82M v1.1-zh ONNX](https://huggingface.co/textagent/Kokoro-82M-v1.1-zh-ONNX) (~80 MB, off-thread WebWorker), Japanese & 10+ languages via Web Speech API fallback; TTS card with separate ▶ Run (generate audio) / ▷ Play (replay) / 💾 Save (WAV download) buttons; hover any preview text and click 🔊 to hear pronunciation; voice auto-selection by language |
+| **🔊 Text-to-Speech** | Hybrid Kokoro TTS engine — 9 languages (English, Japanese, Chinese, Spanish, French, Hindi, Italian, Portuguese) via [Kokoro 82M v1.0 ONNX](https://huggingface.co/textagent/Kokoro-82M-v1.0-ONNX) (~80 MB, off-thread WebWorker), Korean, German & others via Web Speech API fallback; TTS card with separate ▶ Run (generate audio) / ▷ Play (replay) / 💾 Save (WAV download) buttons; hover any preview text and click 🔊 to hear pronunciation; voice auto-selection by language |
 | **Import** | MD, DOCX, XLSX/XLS, CSV, HTML, JSON, XML, PDF — drag & drop or click to import |
 | **Export** | Markdown, self-contained styled HTML, PDF (smart page-breaks, shared rendering pipeline), LLM Memory (5 formats: XML, JSON, Compact JSON, Markdown, Plain Text + shareable link) |
 | **Sharing** | AES-256-GCM encrypted sharing via Firebase; read-only shared links, optional passphrase protection — decryption key stays in URL fragment (never sent to server) |
@@ -62,7 +62,7 @@ TextAgent includes a built-in AI assistant panel with **three local model sizes*
 | **Gemini 3.1 Flash Lite** | Google (free tier) | ☁️ Cloud — 1M tokens/min | 🚀 Very Fast |
 | **Llama 3.3 70B** | Groq (free tier) | ☁️ Cloud — ultra-low latency | ⚡ Ultra Fast |
 | **Auto · Best Free** | OpenRouter (free tier) | ☁️ Cloud — multi-model routing | 🧠 Powerful |
-| **Kokoro TTS (82M)** | Local (WebWorker) | 🔒 Private — English + Chinese · ~80 MB | 🔊 Speech |
+| **Kokoro TTS (82M)** | Local (WebWorker) | 🔒 Private — 9 Languages · ~80 MB | 🔊 Speech |
 | **Voxtral STT (3B)** | Local (WebGPU) | 🔒 Private — 13 languages · ~2.7 GB | 🎤 Dictation |
 | **Granite Docling (258M)** | Local (WebGPU/WASM) | 🔒 Private — document OCR · ~500 MB | 📄 Document |
 | **Florence-2 (230M)** | Local (WebGPU/WASM) | 🔒 Private — OCR + captioning · ~230 MB | 📷 Vision |
@@ -479,7 +479,7 @@ TextAgent has undergone significant evolution since its inception. What started
 | **2026-03-12** | `f7ca256` | 🎤 **Voxtral STT** — [Voxtral Mini 3B](https://huggingface.co/textagent/Voxtral-Mini-3B-2507-ONNX) as primary speech-to-text engine on WebGPU (~2.7 GB, q4, 13 languages, streaming partial output via `TextStreamer`); Whisper Large V3 Turbo as WASM fallback (~800 MB, q8); `voxtral-worker.js` new WebWorker with `VoxtralForConditionalGeneration` + `VoxtralProcessor`; `speechToText.js` WebGPU detection + dual-worker routing; download consent popup (`showSttConsentPopup`) with model name/size/privacy info before first download; `STT_CONSENTED` localStorage key; model duplicated to `textagent/` HuggingFace org with `onnx-community/` fallback |
 | **2026-03-12** | `0f58296` | 🛡️ **Code Audit Fixes** — sandboxed `jsAdapter` in `exec-sandbox.js` (was raw `eval()` on main thread, now iframe-sandboxed); `mirror-models.sh` model IDs updated to `textagent`, Kokoro v1.0→v1.1-zh, GitLab refs removed; Whisper speech worker forwarded user's language selection instead of hardcoded English; shared `ai-worker-common.js` module extracts `TOKEN_LIMITS` + `buildMessages()` from 3 workers; cloud workers load as ES modules |
 | **2026-03-12** | `591467b` | 🏠 **Model Hosting Migration** — all 7 ONNX models (Qwen 3.5 0.8B/2B/4B, Qwen 3 4B Thinking, Whisper Large V3 Turbo, Kokoro 82M v1.0/v1.1-zh) duplicated to self-owned [`textagent` HuggingFace org](https://huggingface.co/textagent); model IDs updated from `onnx-community/` to `textagent/` across all workers; automatic fallback to `onnx-community/` namespace if textagent models unavailable; GitLab mirror removed from runtime code |
-| **2026-03-12** | `7b9f846` | 🔊 **Kokoro TTS** — hybrid text-to-speech engine: English/Chinese via [Kokoro 82M v1.1-zh ONNX](https://huggingface.co/textagent/Kokoro-82M-v1.1-zh-ONNX) (~80 MB, off-thread WebWorker via `kokoro-js`), Japanese & 10+ languages via Web Speech API fallback; hover preview text → click 🔊 for pronunciation; voice auto-selection by language; `textToSpeech.js` main module + `tts-worker.js` WebWorker + `tts.css` styling; model-hosts.js for configurable hosting with auto-fallback |
+| **2026-03-12** | `7b9f846` | 🔊 **Kokoro TTS** — hybrid text-to-speech engine: 9 languages (English, Japanese, Chinese, Spanish, French, Hindi, Italian, Portuguese) via [Kokoro 82M v1.0 ONNX](https://huggingface.co/textagent/Kokoro-82M-v1.0-ONNX) (~80 MB, off-thread WebWorker via `kokoro-js`), Korean, German & others via Web Speech API fallback; hover preview text → click 🔊 for pronunciation; voice auto-selection by language; `textToSpeech.js` main module + `tts-worker.js` WebWorker + `tts.css` styling; model-hosts.js for configurable hosting with auto-fallback |
 | **2026-03-12** | `7b9f846` | 📷 **OCR Tag** — new `{{@OCR:}}` document tag for image-to-text extraction; amber-accented card with mode pills (Text/Math/Table); 📎 image upload with `@upload:` editor sync; Qwen model default; vision-capable model flags (`supportsVision`) on Qwen 3.5 Flash, 35B-A3B, and DeepSeek V3.2 |
 | **2026-03-12** | `7b9f846`, `1ec8b90` | 🏗️ **Model Architecture** — ai-worker.js refactored for architecture-aware loading (`qwen3` text-only vs `qwen3_5` vision); `setModelId` accepts `architecture` + `dtype` params; automatic fallback to HuggingFace when primary host fails; `moonshine-medium-worker.js` deleted (replaced by unified `speech-worker.js`); Language Learning template with TTS pronunciation tips; SQLite-compatible SQL in Technical template |
 | **2026-03-11** | `7b9f846` | ▶ **Run All Notebook Engine** — one-click `▶ Run All` button executes every code/tag block in document order; 11 runtime adapters (bash, math, python, html, js, sql, docgen-ai, docgen-image, docgen-agent, api, linux-script); Block Registry with FNV-1a stable IDs; Execution Controller with fixed-bottom progress bar, per-block status badges (pending/running/done/error), and abort support; SQLite `_exec_results` context store for cross-block data sharing; DocGen/API adapters use auto-accept mode (skip review panel); Linux adapter submits to Judge0 CE; deferred adapter queue for module loading order; `exec-engine.css` styling; 12 new Playwright tests (191 total) |
 
@@ -0,0 +1,45 @@
+# Changelog: TTS Card UX & Multilingual Routing
+
+**Date:** 2026-03-15
+
+## Summary
+
+Major overhaul of the TTS card user experience: merged Play/Stop into a single toggle button, added a generating state that disables all buttons during synthesis, added detailed timestamped console logs, and fixed a critical bug where non-Latin languages (Japanese, Chinese, Hindi) couldn't be phonemized by Kokoro's espeak-ng WASM. These languages now route to Web Speech API for proper pronunciation.
+
+## Changes
+
+### `js/textToSpeech.js` (+146 lines)
+- **`_isGenerating` state** — tracks whether audio synthesis is in progress
+- **`onGenerateComplete` callback** — one-shot callback for UI to know when generation finishes
+- **`_ttsT()` timestamped logs** — every TTS log now shows elapsed time since page load (`🔊 [TTS +12.3s]`)
+- **Fixed synthesizing status bug** — `loadingPhase: 'synthesizing'` from worker no longer resets `modelReady=false` (was breaking all subsequent TTS)
+- **Moved Japanese, Chinese, Hindi out of `KOKORO_LANGS`** — espeak-ng WASM can't phonemize CJK/Devanagari scripts; routes to Web Speech API instead
+- **`generate()` handles Web Speech API** — polls `speechSynthesis.speaking` for completion, fires callback to re-enable UI buttons
+- **Added `hi-IN`, `ja-JP` to `WEB_SPEECH_LANG_MAP`** — proper BCP-47 codes for Hindi and Japanese
+
+### `js/tts-worker.js` (+108 lines)
+- **Synthesis timing logs** — logs when speak request is received, voice selected, and synthesis duration
+- **`loadingPhase: 'synthesizing'` status** — progress message during audio generation
+
+### `js/ai-docgen.js` (+117 lines)
+- **Play/Stop → single toggle button** (`ai-tts-play-toggle`) — ▷ Play ↔ ■ Stop with auto-reset on playback finish
+- **Run button generating state** — text changes to "⏳ Generating…", all other buttons disabled during synthesis
+- **`onGenerateComplete` integration** — restores UI state when generation completes or errors
+- **Web Speech API toast** — shows "Spoken via Web Speech API" for non-Kokoro languages
+
+### `css/tts.css` (+59 lines)
+- **Play/Stop toggle styles** — purple (Play) ↔ red (Stop) with smooth transitions
+- **Generating state animation** — pulsing amber border + disabled button styles
+- **Dark mode support** — updated selectors for toggle states
+
+### `js/ai-models.js` (minor)
+- Updated Kokoro model description to "9 Languages" and changed model ID from `v1.1-zh-ONNX` to `v1.0-ONNX`
+
+### `js/model-hosts.js` (minor)
+- Updated comment reference from `v1.1-zh-ONNX` to `v1.0-ONNX`
+
+### `scripts/mirror-models.sh` (minor)
+- Updated mirror script for Kokoro v1.0 model ID
+
+### `README.md` (minor)
+- Updated TTS feature description and model table for 9-language Kokoro + Web Speech API hybrid
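
Reviewer note (not part of the commit): the routing change described in the changelog above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the code in `js/textToSpeech.js`. The names `KOKORO_LANGS` and `WEB_SPEECH_LANG_MAP` come from the changelog, but their data shapes and contents here are guesses. Only `hi-IN` and `ja-JP` are confirmed map entries, the other entries are placeholders, and `routeTts` and `ttsPrefix` are hypothetical helper names.

```javascript
// Hypothetical sketch: identifiers mirror the changelog, contents are assumptions.

// Assumed: languages that stay on Kokoro after Japanese, Chinese, and Hindi
// were moved out (espeak-ng WASM handles these Latin-script languages).
const KOKORO_LANGS = new Set(["en", "es", "fr", "it", "pt"]);

// Assumed shape: short language code -> BCP-47 tag for the Web Speech API.
// Only "hi-IN" and "ja-JP" are named in the changelog; the rest are placeholders.
const WEB_SPEECH_LANG_MAP = {
  ja: "ja-JP",
  hi: "hi-IN",
  zh: "zh-CN",
  ko: "ko-KR",
  de: "de-DE",
};

// Decide which engine speaks a given language code (hypothetical helper).
function routeTts(lang) {
  // Normalize "en-US" -> "en" so regional tags route like their base language.
  const code = String(lang).toLowerCase().split("-")[0];
  if (KOKORO_LANGS.has(code)) {
    return { engine: "kokoro", lang: code };
  }
  // Anything else falls back to Web Speech; unknown codes pass through unchanged.
  return { engine: "web-speech", lang: WEB_SPEECH_LANG_MAP[code] ?? lang };
}

// Format a log prefix in the style the changelog shows: "🔊 [TTS +12.3s]".
// The real `_ttsT()` measures time since page load; this variant takes the
// elapsed milliseconds as an argument so it stays deterministic.
function ttsPrefix(elapsedMs) {
  return `🔊 [TTS +${(elapsedMs / 1000).toFixed(1)}s]`;
}

console.log(ttsPrefix(12300), routeTts("ja"));
console.log(ttsPrefix(12300), routeTts("en-US"));
```

Keeping the routing decision in one pure function would make the fallback behaviour easy to check without loading the model or touching `speechSynthesis`.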
@@ -112,32 +112,38 @@
     box-shadow: 0 0 0 2px rgba(139, 92, 246, 0.15);
 }
 
-/* Play / Stop buttons */
-.ai-tts-play,
-.ai-tts-stop {
+/* Play/Stop toggle button */
+.ai-tts-play-toggle {
     font-weight: 600;
     font-size: 0.72rem;
     letter-spacing: 0.02em;
-}
-
-.ai-tts-play {
     color: #8b5cf6 !important;
     border-color: rgba(139, 92, 246, 0.3) !important;
+    transition: color 0.2s ease, border-color 0.2s ease, background 0.2s ease;
 }
 
-.ai-tts-play:hover {
+.ai-tts-play-toggle:hover:not(:disabled) {
     background: rgba(139, 92, 246, 0.12) !important;
 }
 
-.ai-tts-stop {
+/* Playing state — red Stop */
+.ai-tts-play-toggle.ai-tts-playing {
     color: #ef4444 !important;
     border-color: rgba(239, 68, 68, 0.3) !important;
 }
 
-.ai-tts-stop:hover {
+.ai-tts-play-toggle.ai-tts-playing:hover:not(:disabled) {
     background: rgba(239, 68, 68, 0.12) !important;
 }
 
+/* Run button */
+.ai-tts-run {
+    font-weight: 600;
+    font-size: 0.72rem;
+    letter-spacing: 0.02em;
+    transition: opacity 0.2s ease;
+}
+
 /* Speaking state — pulse animation */
 .ai-tts-speaking {
     box-shadow: 0 0 0 2px rgba(139, 92, 246, 0.2);
@@ -149,6 +155,33 @@
     50% { box-shadow: 0 0 0 4px rgba(139, 92, 246, 0.25); }
 }
 
+/* Generating state — pulsing border while synthesizing */
+.ai-tts-generating {
+    border-left-color: #f59e0b !important;
+    animation: tts-generating-pulse 1s ease-in-out infinite;
+}
+
+@keyframes tts-generating-pulse {
+    0%, 100% { box-shadow: 0 0 0 2px rgba(245, 158, 11, 0.15); }
+    50% { box-shadow: 0 0 0 4px rgba(245, 158, 11, 0.3); }
+}
+
+/* Disabled buttons during generation */
+.ai-tts-card .ai-placeholder-btn:disabled,
+.ai-tts-card select:disabled {
+    opacity: 0.4;
+    cursor: not-allowed;
+    pointer-events: none;
+}
+
+.ai-tts-card .ai-tts-run:disabled {
+    opacity: 0.7;
+    cursor: wait;
+    pointer-events: none;
+    color: #f59e0b !important;
+    border-color: rgba(245, 158, 11, 0.3) !important;
+}
+
 /* Toolbar TTS button accent */
 .fmt-tts-btn {
     color: #8b5cf6 !important;
@@ -169,14 +202,18 @@
     border-color: rgba(167, 139, 250, 0.3);
 }
 
-[data-theme="dark"] .ai-tts-play {
+[data-theme="dark"] .ai-tts-play-toggle {
     color: #a78bfa !important;
 }
 
-[data-theme="dark"] .ai-tts-stop {
+[data-theme="dark"] .ai-tts-play-toggle.ai-tts-playing {
     color: #f87171 !important;
 }
 
+[data-theme="dark"] .ai-tts-run:disabled {
+    color: #fbbf24 !important;
+}
+
 /* ============================================
    TTS Download Button
    ============================================ */