Commit b26e6df

feat: Run All engine overhaul, TTS card split (Run/Play/Save), preflight dialog, model pre-loading
1 parent 6adceb0 commit b26e6df

19 files changed

Lines changed: 2304 additions & 225 deletions

README.md

Lines changed: 3 additions & 2 deletions
@@ -28,13 +28,13 @@
2828
| **🎬 Media Embedding** | Video playback via `![alt](video.mp4)` image syntax (`.mp4`, `.webm`, `.ogg`, `.mov`, `.m4v`); YouTube/Vimeo embeds auto-detected; `embed` code block for responsive media grids (`cols=1-4`, `height=N`); Video.js v10 lazy-loaded with native `<video>` fallback; website URLs render as rich link preview cards with favicon + "Open ↗" button |
2929
| **🤖 AI Assistant** | 3 local Qwen 3.5 sizes (0.8B / 2B / 4B via WebGPU/WASM), Gemini 3.1 Flash Lite, Groq Llama 3.3 70B, OpenRouter — summarize, expand, rephrase, grammar-fix, explain, simplify, auto-complete; AI writing tags (Polish, Formalize, Elaborate, Shorten, Image); enhanced context menu; per-card model selection; concurrent block generation; inline review with accept/reject/regenerate; AI-powered image generation; **smart model loading UX** — cache vs download detection (📦/⬇️), HuggingFace source location display, delete cached models from browser storage; all models hosted on [`textagent` HuggingFace org](https://huggingface.co/textagent) with automatic fallback |
3030
| **🎤 Voice Dictation** | Dual-engine speech-to-text: **Voxtral Mini 3B** (WebGPU, primary, 13 languages, ~2.7 GB) or **Whisper Large V3 Turbo** (WASM fallback, ~800 MB) with consensus scoring; download consent popup with model info before first use; 50+ Markdown-aware voice commands — natural phrases ("heading one", "bold…end bold", "add table", "undo"); auto-punctuation via AI refinement or built-in fallback; streaming partial results |
31-
| **🔊 Text-to-Speech** | Hybrid Kokoro TTS engine — English/Chinese via [Kokoro 82M v1.1-zh ONNX](https://huggingface.co/textagent/Kokoro-82M-v1.1-zh-ONNX) (~80 MB, off-thread WebWorker), Japanese & 10+ languages via Web Speech API fallback; hover any preview text and click 🔊 to hear pronunciation; voice auto-selection by language; ⬇ Save button to download generated audio as WAV file |
31+
| **🔊 Text-to-Speech** | Hybrid Kokoro TTS engine — English/Chinese via [Kokoro 82M v1.1-zh ONNX](https://huggingface.co/textagent/Kokoro-82M-v1.1-zh-ONNX) (~80 MB, off-thread WebWorker), Japanese & 10+ languages via Web Speech API fallback; TTS card with separate ▶ Run (generate audio) / ▷ Play (replay) / 💾 Save (WAV download) buttons; hover any preview text and click 🔊 to hear pronunciation; voice auto-selection by language |
3232
| **Import** | MD, DOCX, XLSX/XLS, CSV, HTML, JSON, XML, PDF — drag & drop or click to import |
3333
| **Export** | Markdown, self-contained styled HTML, PDF (smart page-breaks, shared rendering pipeline), LLM Memory (5 formats: XML, JSON, Compact JSON, Markdown, Plain Text + shareable link) |
3434
| **Sharing** | AES-256-GCM encrypted sharing via Firebase; read-only shared links, optional passphrase protection — decryption key stays in URL fragment (never sent to server) |
3535
| **Presentation** | Slide mode using `---` separators, keyboard navigation, multiple layouts & transitions, speaker notes, overview grid, 20+ PPT templates with image backgrounds |
3636
| **Desktop** | Native app via Neutralino.js with system tray and offline support |
37-
| **Code Execution** | 7 languages in-browser: Bash ([just-bash](https://justbash.dev/)), Math (Nerdamer), LaTeX (MathJax + Nerdamer evaluation), Python ([Pyodide](https://pyodide.org/)), HTML (sandboxed iframe, `html-autorun` for widgets/quizzes), JavaScript (sandboxed iframe), SQL ([sql.js](https://sql.js.org/) SQLite) · 25+ compiled languages via [Judge0 CE](https://ce.judge0.com): C, C++, Rust, Go, Java, TypeScript, Kotlin, Scala, Ruby, Swift, Haskell, Dart, C#, and more · **▶ Run All** notebook engine — one-click sequential execution of all blocks with progress bar, abort, per-block status badges, and SQLite shared context store |
37+
| **Code Execution** | 7 languages in-browser: Bash ([just-bash](https://justbash.dev/)), Math (Nerdamer), LaTeX (MathJax + Nerdamer evaluation), Python ([Pyodide](https://pyodide.org/)), HTML (sandboxed iframe, `html-autorun` for widgets/quizzes), JavaScript (sandboxed iframe), SQL ([sql.js](https://sql.js.org/) SQLite) · 25+ compiled languages via [Judge0 CE](https://ce.judge0.com): C, C++, Rust, Go, Java, TypeScript, Kotlin, Scala, Ruby, Swift, Haskell, Dart, C#, and more · **▶ Run All** notebook engine — one-click sequential execution with preflight dialog (block table with model/status), pre-execution model loading (AI + TTS auto-loaded before blocks run), progress bar, abort, per-block status badges, detailed console logging, and SQLite shared context store |
3838
| **Security** | Content Security Policy (CSP), SRI integrity hashes, XSS sanitization (DOMPurify), ReDoS protection, Firestore write-token ownership, API keys via HTTP headers, postMessage origin validation, 8-char passphrase minimum, sandboxed code execution |
3939
| **AI Document Tags** | `{{@AI:}}` text generation, `{{@Think:}}` deep reasoning, `{{@Image:}}` image generation (Gemini Imagen), `{{@OCR:}}` image-to-text extraction (Text/Math/Table modes via Granite Docling 258M or Florence-2 230M, PDF page rendering via pdf.js), `{{@TTS:}}` text-to-speech playback (Kokoro TTS per card, language selector, ▶ Play / ⬇ Save WAV), `{{@STT:}}` speech-to-text dictation (engine selector: Whisper/Voxtral/Web Speech API, 11 languages, Record/Stop/Insert/Clear), `{{@Translate:}}` translation (target language selector, integrated TTS pronunciation, cloud model routing), `{{@Game:}}` game builder (AI-generated or pre-built, Canvas 2D/Three.js/P5.js, import/export HTML) — `@` prefix syntax on all tag types + metadata fields (`@name`, `@use`, `@think`, `@search`, `@prompt`, `@step`, `@upload`, `@model`, `@engine`, `@lang`, `@prebuilt`); `@model:` field persists selected model per card with intelligent defaults (OCR→`granite-docling`, TTS→`kokoro-tts`, STT→`voxtral-stt`, Image→`imagen-ultra`); editable `@prompt:` textarea and `@step:` inputs in preview cards; description/prompt separation (bare text = label, `@prompt:` = AI instruction); 📎 image/PDF upload for multimodal vision analysis; per-card model selector with document-portable model persistence, concurrent block operations |
4040
| **🔌 API Calls** | `{{API:}}` REST API integration — GET/POST/PUT/DELETE methods, custom headers, JSON body, response stored in `$(api_varName)` variables; inline review panel; toolbar GET/POST buttons |
@@ -459,6 +459,7 @@ TextAgent has undergone significant evolution since its inception. What started
459459

460460
| Date | Commits | Feature / Update |
461461
|------|---------|-----------------|
462+
| **2026-03-15** | | 🚀 **Run All Engine & TTS UX** — pre-execution model readiness check auto-loads all required models (AI + Kokoro TTS) before block execution starts; detailed `[RunAll]` console logging with `console.table` block summary, per-block timing, variable resolution status (✅/⚠), and completion summary; Stop button now works during model loading via `M._execAborted` cross-module flag; `ensureModelReadyAsync()` rewritten with fail-fast on missing consent/API key; compact preflight dialog (960px, smaller fonts, all 8 columns visible); `waitForModelReady()` handles Kokoro TTS via `M.tts.isKokoroReady()`; TTS card split into 3 buttons: ▶ Run (generate audio only), ▷ Play (replay stored audio), 💾 Save (download WAV); new `M.tts.generate()`, `playLastAudio()`, `isKokoroReady()`, `initKokoro()` APIs; AI model fallback in `run-requirements.js` correctly defaults to text models |
462463
| **2026-03-14** | | 🔗 **AI Variable Controls** — new unified 🔗 Vars button on AI and Agent cards opens combined dropdown with 📤 Output Variable (text input to name the block's result) and 📥 Input Variables (checkbox picker listing declared `@var:` names from other blocks + runtime vars); variable chaining enables multi-block AI pipelines (`@var: research` → `@input: research`); declared variables appear before execution with "declared" badge; Doc Variables Panel (`{•} Vars` toolbar button) now shows ⏳ Pending Vars section for declared-but-unexecuted variables; `@var:` and `@input:` directives stripped from displayed prompt text |
463464
| **2026-03-14** | | 🧠 **Think Mode Refinement & Multi-Select Search** — Think mode (`@think: Yes` / 🧠 toggle) now uses two-pass generation: first generates with thinking enabled, then passes the draft back to the model to add important details, examples, and missing information; removed complex ReAct pattern in favor of simple refinement; multi-select search provider dropdown on AI Generate and Agent Flow cards (checkbox pills, activate multiple engines simultaneously); search results fetched in parallel and merged |
464465
| **2026-03-14** | | 🔑 **API Key Re-entry & Git UX** — fixed bug where incorrect cloud API keys couldn't be re-entered (dropdown re-click now re-shows key modal); "Change API Key" link in error status bar for auth failures; 🔑 key icon button on cloud model cards in DocGen setup panel with "Key Set"/"Key Required" badges; 🐙 Git toolbar button now shows centered confirmation dialog warning that local models have small context windows and cloud models (Groq, Gemini, OpenRouter) are recommended for repo analysis; Git analysis auto-opens API key modal on key/model-not-ready errors |

changelogs/CHANGELOG-runall-tts.md

Lines changed: 73 additions & 0 deletions
@@ -0,0 +1,73 @@
1+
# Run All Engine, Preflight Dialog, TTS & Model Loading — 2026-03-15
2+
3+
## Summary
4+
5+
Major overhaul of the Run All execution engine, preflight dialog, TTS card UX, and AI model loading for automated execution. 17 files modified, 2228 insertions, 223 deletions.
6+
7+
## Changes
8+
9+
### Run All — Pre-Execution Model Loading
10+
- **Pre-flight model readiness check**: Before executing any blocks, `executeBlocks()` now collects all required models (AI + TTS), checks their readiness, and auto-triggers loading. Waits for each model to become ready before starting the block loop.
11+
- **Kokoro TTS pre-loading**: Detects `kokoro-tts` as a separate model type. Triggers `M.tts.initKokoro()`, polls `M.tts.isKokoroReady()` with 180s timeout (Kokoro is 121 MB).
12+
- **AI model auto-loading**: For local AI models (qwen-local, etc.), checks consent, auto-triggers `switchToModel()` or `initAiWorker()`, and polls `getLocalState()` until loaded.
13+
- **Cloud model handling**: Checks API key, shows key modal if needed, auto-initializes cloud worker.
14+
- **Abort-aware polling**: All model loading loops check `_abortRequested` each second so the Stop button works during model loading.
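
The abort-aware polling described above can be sketched roughly as follows. This is a minimal illustration, not the code in `exec-controller.js`: `pollUntilReady` and its callback parameters are hypothetical names, and the readiness and abort checks are passed in as plain functions so the loop stays independent of `M.tts.isKokoroReady()` or `_abortRequested`.

```javascript
// Hypothetical helper (not from the commit): poll `isReady()` until it
// returns true, the timeout elapses, or `isAborted()` reports a Stop request.
async function pollUntilReady(isReady, isAborted, options = {}) {
  const { timeoutMs = 180000, intervalMs = 1000 } = options;
  const deadline = Date.now() + timeoutMs;

  while (Date.now() < deadline) {
    // Check the abort flag first so Stop wins even if the model just loaded.
    if (isAborted()) return "aborted";
    if (isReady()) return "ready";
    await new Promise((resolve) => setTimeout(resolve, intervalMs));
  }

  // One final check in case the model became ready during the last sleep.
  return isReady() ? "ready" : "timeout";
}
```

A caller would pass something like `() => M.tts.isKokoroReady()` as `isReady` and a function that reads the abort flag as `isAborted`, then stop the run when the result is anything other than `"ready"`.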
15+
16+
### Run All — Detailed Console Logging
17+
- **Session start log**: `console.group` with `console.table` showing all blocks (#, Type, Lang, Var, Model, Input, Think, Label).
18+
- **Per-block log**: Collapsible groups with label, runtime, model, output var, input vars with resolution status (✅ resolved / ⚠ empty), think mode, search, memory, language.
19+
- **Timing**: Per-block elapsed time, overall execution time.
20+
- **Variable tracking**: Shows stored variable values after each block and final variable summary at end.
21+
- **Error logging**: Error message, stack trace, and elapsed time on failure.
22+
- **Completion summary**: `console.table` of all block timings with status and errors.
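
As a rough sketch of the logging shape listed above, the following uses only the standard `console.group`, `console.groupCollapsed`, and `console.table` calls. The block fields (`type`, `lang`, `outputVar`, `inputVars`, `think`, `label`) and all function names are assumptions made for illustration; the real block records may be structured differently.

```javascript
// Sketch only: block field names and function names are hypothetical.
function summarizeBlocks(blocks) {
  return blocks.map((b, i) => ({
    "#": i + 1,
    Type: b.type,
    Lang: b.lang ?? "",
    Var: b.outputVar ?? "",
    Model: b.model ?? "",
    Input: (b.inputVars ?? []).join(", "),
    Think: b.think ? "Yes" : "No",
    Label: b.label ?? "",
  }));
}

function runBlockWithLog(block, vars, execute) {
  console.groupCollapsed(`[RunAll] ${block.label}`);
  // Report whether each input variable already has a value.
  for (const name of block.inputVars ?? []) {
    const resolved = vars[name] !== undefined && vars[name] !== "";
    console.log(`${resolved ? "✅ resolved" : "⚠ empty"}: ${name}`);
  }
  const startedAt = Date.now();
  const timing = { label: block.label, status: "done", ms: 0, error: "" };
  try {
    const output = execute(block, vars);
    if (block.outputVar) vars[block.outputVar] = output;
  } catch (err) {
    timing.status = "error";
    timing.error = err.message;
    console.error(err.message, err.stack);
  } finally {
    timing.ms = Date.now() - startedAt;
    console.log(`elapsed: ${timing.ms} ms`);
    console.groupEnd();
  }
  return timing;
}

function runAllWithLog(blocks, execute) {
  const vars = {};
  console.group("[RunAll] Session start");
  console.table(summarizeBlocks(blocks));
  console.groupEnd();
  const timings = blocks.map((b) => runBlockWithLog(b, vars, execute));
  console.table(timings); // completion summary: one row per block
  return { vars, timings };
}
```

Keeping the table rows as plain objects means the same data can feed both `console.table` and any later UI summary.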
23+
24+
### Run All — Stop Button Fix
25+
- **`M._execAborted` flag**: Exposed on `M` for cross-module abort checking. Set on abort, cleared on start/completion.
26+
- **`ensureModelReadyAsync()` abort check**: Now checks `M._execAborted` each poll iteration instead of blocking for up to 120s.
27+
- **Fail-fast on missing consent/API key**: Instead of polling forever, immediately throws with a clear error message.
28+
- **Fail-fast on no worker**: If no AI worker starts within 5s, throws immediately.
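
The fail-fast behaviour above might look something like the checks below. This is an illustrative sketch rather than the rewritten `ensureModelReadyAsync()`: the function names, the `model` and `state` object shapes, and the error messages are all assumptions.

```javascript
// Hypothetical pre-checks: throw immediately when polling could never succeed.
function assertModelCanLoad(model, state) {
  if (state.aborted) {
    throw new Error("Execution aborted");
  }
  if (model.isLocal && !state.hasConsent) {
    throw new Error(`Download consent missing for ${model.id}`);
  }
  if (!model.isLocal && !state.hasApiKey) {
    throw new Error(`API key missing for ${model.id}`);
  }
}

// Hypothetical worker check: give the AI worker a short grace period to
// start (5s in the changelog), then give up instead of waiting further.
function assertWorkerStarted(workerStarted, startedAtMs, nowMs, graceMs = 5000) {
  if (!workerStarted && nowMs - startedAtMs >= graceMs) {
    throw new Error(`No AI worker started within ${graceMs / 1000}s`);
  }
}
```

Running these checks before and during the polling loop turns a silent multi-minute wait into an immediate, readable error.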
29+
30+
### Run All — AI Model Fallback
31+
- **`run-requirements.js`**: Fixed effective model resolution for AI/Agent/Translate blocks. Now correctly defaults to `qwen-local` instead of potentially using `kokoro-tts` or other specialized models.
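
One plausible reading of this fix is sketched below. The model IDs `qwen-local`, `kokoro-tts`, `granite-docling`, `voxtral-stt`, and `imagen-ultra` appear in the README, but the resolution order, the `block` shape, and the name `resolveEffectiveModel` are assumptions, not the logic in `run-requirements.js`.

```javascript
// Model IDs are taken from the README; the resolution rules are assumed.
const TEXT_BLOCK_TYPES = new Set(["AI", "Agent", "Translate"]);
const SPECIALIZED_MODELS = new Set([
  "kokoro-tts",
  "granite-docling",
  "voxtral-stt",
  "imagen-ultra",
]);
const DEFAULT_TEXT_MODEL = "qwen-local";

function resolveEffectiveModel(block, lastUsedModel) {
  // Prefer the block's own @model: value, then the last-used model.
  const candidates = [block.model, lastUsedModel];

  if (!TEXT_BLOCK_TYPES.has(block.type)) {
    return candidates.find((id) => Boolean(id)) ?? null;
  }

  // Text blocks must never inherit a TTS/OCR/STT/image model.
  const textModel = candidates.find((id) => id && !SPECIALIZED_MODELS.has(id));
  return textModel ?? DEFAULT_TEXT_MODEL;
}
```

With this rule, an AI block that runs right after a TTS card resolves to `qwen-local` rather than to `kokoro-tts`.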
32+
33+
### Preflight Dialog
34+
- **Compact layout**: Widened dialog to 960px, reduced font sizes (0.76rem), tightened padding (4-8px), added text-overflow ellipsis on block names.
35+
- **`run-preflight.css`**: New stylesheet for the preflight dialog with optimized column widths for all 8 columns (#, Block Name, Type, Output Var, Model, Features, Reads, Status).
36+
- **Model accuracy**: AI/Agent/Translate blocks now correctly show their text model (e.g., Qwen 3.5) instead of the last-used model (e.g., Kokoro TTS).
37+
38+
### TTS Card UX — Separate Run/Play/Save Buttons
39+
- **▶ Run button**: Generates audio via Kokoro TTS (synthesize + store), does NOT auto-play.
40+
- **▷ Play button**: Replays the last generated audio without re-synthesizing.
41+
- **💾 Save button**: Downloads the last generated audio as WAV (unchanged).
42+
- **`generate()` function**: New `M.tts.generate()` method — sets `_generateOnly` flag, sends text to TTS worker, stores audio without playing.
43+
- **`playLastAudio()` function**: New `M.tts.playLastAudio()` method — replays stored audio from `lastAudioData`.
44+
- **TTS module API expansion**: Added `isKokoroReady()`, `isKokoroLoading()`, `initKokoro()` to `M.tts` for external status checking and loading.
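
The Run/Play split can be illustrated with a small stand-in for the TTS module. This sketch assumes the worker delivers audio through a callback; `createTts`, `postToWorker`, `playAudio`, `speak`, and `onWorkerAudio` are hypothetical names, and only `generate()`, `playLastAudio()`, `lastAudioData`, and the `_generateOnly` flag come from the changelog.

```javascript
// Stand-in for the TTS module; the worker and audio player are injected.
function createTts(postToWorker, playAudio) {
  let generateOnly = false; // mirrors the `_generateOnly` flag
  let lastAudioData = null; // mirrors `lastAudioData`

  return {
    // Synthesize and play as soon as the worker responds.
    speak(text) {
      generateOnly = false;
      postToWorker(text);
    },
    // Run button: synthesize and store, but do not play.
    generate(text) {
      generateOnly = true;
      postToWorker(text);
    },
    // Called when the worker returns synthesized audio.
    onWorkerAudio(audio) {
      lastAudioData = audio;
      if (!generateOnly) playAudio(audio);
      generateOnly = false;
    },
    // Play button: replay stored audio without re-synthesizing.
    playLastAudio() {
      if (lastAudioData === null) return false;
      playAudio(lastAudioData);
      return true;
    },
  };
}
```

Resetting the flag after each worker response keeps a later `speak()` call from being silenced by an earlier `generate()`.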
45+
46+
### `waitForModelReady()` Enhancement
47+
- Now checks `M.tts.isKokoroReady()` for `kokoro-tts` model ID, in addition to AI local state and `isCurrentModelReady()`.
48+
49+
### `flushPendingRender()` (User-added)
50+
- New helper function that flushes any debounced `renderMarkdown()` before applying status badges, ensuring badges appear on the final DOM.
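
A flush of this kind is commonly built on top of the debounce timer itself. The sketch below shows that general pattern under assumed names (`createDebouncedRender`, `schedule`, `flush`); it is not the `flushPendingRender()` implementation from `exec-controller.js`.

```javascript
// Generic debounce-with-flush pattern; names are hypothetical.
function createDebouncedRender(render, delayMs) {
  let timer = null;

  // Restart the countdown on every call so only the last one renders.
  function schedule() {
    if (timer !== null) clearTimeout(timer);
    timer = setTimeout(() => {
      timer = null;
      render();
    }, delayMs);
  }

  // If a render is pending, cancel the timer and run it right now.
  function flush() {
    if (timer === null) return false;
    clearTimeout(timer);
    timer = null;
    render();
    return true;
  }

  return { schedule, flush };
}
```

Calling `flush()` before applying status badges ensures the DOM has been re-rendered first, so a later debounced render cannot wipe the badges out.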
51+
52+
### Templates
53+
- Fixed `agents.js` template to use valid JS syntax (escaped template literals inside string concatenation).
54+
- Fixed Vite import analysis error in `agents.js` by escaping `{{@AI:` syntax inside JS strings.
55+
56+
## Files Modified
57+
- `css/run-preflight.css` — NEW: Compact preflight dialog styles
58+
- `js/run-requirements.js` — NEW: Block scanning, model resolution, requirements analysis
59+
- `js/exec-controller.js` — Pre-execution model check, logging, abort flag, TTS model switching, flushPendingRender
60+
- `js/ai-docgen-generate.js` — ensureModelReadyAsync rewrite with abort check and fail-fast
61+
- `js/ai-docgen.js` — TTS card buttons split (Run/Play/Save), click handlers
62+
- `js/textToSpeech.js` — generate(), playLastAudio(), isKokoroReady/Loading(), initKokoro, _generateOnly flag
63+
- `js/tts-worker.js` — TTS worker improvements
64+
- `js/doc-vars-panel.js` — Variable panel enhancements
65+
- `js/exec-registry.js` — Block registry updates
66+
- `js/ai-assistant.js` — AI assistant refinements
67+
- `js/templates/agents.js` — Template syntax fix
68+
- `js/templates/ai.js` — Template updates
69+
- `js/templates/creative.js` — Template updates
70+
- `js/templates/documentation.js` — Template updates
71+
- `js/templates/project.js` — Template updates
72+
- `js/templates/technical.js` — Template updates
73+
- `src/main.js` — Module loading updates
