|
28 | 28 | | **🎬 Media Embedding** | Video playback via `` image syntax (`.mp4`, `.webm`, `.ogg`, `.mov`, `.m4v`); YouTube/Vimeo embeds auto-detected; `embed` code block for responsive media grids (`cols=1-4`, `height=N`); Video.js v10 lazy-loaded with native `<video>` fallback; website URLs render as rich link preview cards with favicon + "Open ↗" button | |
29 | 29 | | **🤖 AI Assistant** | 3 local Qwen 3.5 sizes (0.8B / 2B / 4B via WebGPU/WASM), Gemini 3.1 Flash Lite, Groq Llama 3.3 70B, OpenRouter — summarize, expand, rephrase, grammar-fix, explain, simplify, auto-complete; AI writing tags (Polish, Formalize, Elaborate, Shorten, Image); enhanced context menu; per-card model selection; concurrent block generation; inline review with accept/reject/regenerate; AI-powered image generation; **smart model loading UX** — cache vs download detection (📦/⬇️), HuggingFace source location display, delete cached models from browser storage; all models hosted on [`textagent` HuggingFace org](https://huggingface.co/textagent) with automatic fallback | |
30 | 30 | | **🎤 Voice Dictation** | Dual-engine speech-to-text: **Voxtral Mini 3B** (WebGPU, primary, 13 languages, ~2.7 GB) or **Whisper Large V3 Turbo** (WASM fallback, ~800 MB) with consensus scoring; download consent popup with model info before first use; 50+ Markdown-aware voice commands — natural phrases ("heading one", "bold…end bold", "add table", "undo"); auto-punctuation via AI refinement or built-in fallback; streaming partial results | |
31 | | -| **🔊 Text-to-Speech** | Hybrid Kokoro TTS engine — English/Chinese via [Kokoro 82M v1.1-zh ONNX](https://huggingface.co/textagent/Kokoro-82M-v1.1-zh-ONNX) (~80 MB, off-thread WebWorker), Japanese & 10+ languages via Web Speech API fallback; TTS card with separate ▶ Run (generate audio) / ▷ Play (replay) / 💾 Save (WAV download) buttons; hover any preview text and click 🔊 to hear pronunciation; voice auto-selection by language | |
| 31 | +| **🔊 Text-to-Speech** | Hybrid Kokoro TTS engine — 9 languages (English, Japanese, Chinese, Spanish, French, Hindi, Italian, Portuguese) via [Kokoro 82M v1.0 ONNX](https://huggingface.co/textagent/Kokoro-82M-v1.0-ONNX) (~80 MB, off-thread WebWorker), Korean, German & others via Web Speech API fallback; TTS card with separate ▶ Run (generate audio) / ▷ Play (replay) / 💾 Save (WAV download) buttons; hover any preview text and click 🔊 to hear pronunciation; voice auto-selection by language | |
32 | 32 | | **Import** | MD, DOCX, XLSX/XLS, CSV, HTML, JSON, XML, PDF — drag & drop or click to import | |
33 | 33 | | **Export** | Markdown, self-contained styled HTML, PDF (smart page-breaks, shared rendering pipeline), LLM Memory (5 formats: XML, JSON, Compact JSON, Markdown, Plain Text + shareable link) | |
34 | 34 | | **Sharing** | AES-256-GCM encrypted sharing via Firebase; read-only shared links, optional passphrase protection — decryption key stays in URL fragment (never sent to server) | |
@@ -62,7 +62,7 @@ TextAgent includes a built-in AI assistant panel with **three local model sizes* |
62 | 62 | | **Gemini 3.1 Flash Lite** | Google (free tier) | ☁️ Cloud — 1M tokens/min | 🚀 Very Fast | |
63 | 63 | | **Llama 3.3 70B** | Groq (free tier) | ☁️ Cloud — ultra-low latency | ⚡ Ultra Fast | |
64 | 64 | | **Auto · Best Free** | OpenRouter (free tier) | ☁️ Cloud — multi-model routing | 🧠 Powerful | |
65 | | -| **Kokoro TTS (82M)** | Local (WebWorker) | 🔒 Private — English + Chinese · ~80 MB | 🔊 Speech | |
| 65 | +| **Kokoro TTS (82M)** | Local (WebWorker) | 🔒 Private — 9 Languages · ~80 MB | 🔊 Speech | |
66 | 66 | | **Voxtral STT (3B)** | Local (WebGPU) | 🔒 Private — 13 languages · ~2.7 GB | 🎤 Dictation | |
67 | 67 | | **Granite Docling (258M)** | Local (WebGPU/WASM) | 🔒 Private — document OCR · ~500 MB | 📄 Document | |
68 | 68 | | **Florence-2 (230M)** | Local (WebGPU/WASM) | 🔒 Private — OCR + captioning · ~230 MB | 📷 Vision | |
@@ -479,7 +479,7 @@ TextAgent has undergone significant evolution since its inception. What started |
479 | 479 | | **2026-03-12** | `f7ca256` | 🎤 **Voxtral STT** — [Voxtral Mini 3B](https://huggingface.co/textagent/Voxtral-Mini-3B-2507-ONNX) as primary speech-to-text engine on WebGPU (~2.7 GB, q4, 13 languages, streaming partial output via `TextStreamer`); Whisper Large V3 Turbo as WASM fallback (~800 MB, q8); `voxtral-worker.js` new WebWorker with `VoxtralForConditionalGeneration` + `VoxtralProcessor`; `speechToText.js` WebGPU detection + dual-worker routing; download consent popup (`showSttConsentPopup`) with model name/size/privacy info before first download; `STT_CONSENTED` localStorage key; model duplicated to `textagent/` HuggingFace org with `onnx-community/` fallback | |
480 | 480 | | **2026-03-12** | `0f58296` | 🛡️ **Code Audit Fixes** — sandboxed `jsAdapter` in `exec-sandbox.js` (was raw `eval()` on main thread, now iframe-sandboxed); `mirror-models.sh` model IDs updated to `textagent`, Kokoro v1.0→v1.1-zh, GitLab refs removed; Whisper speech worker forwarded user's language selection instead of hardcoded English; shared `ai-worker-common.js` module extracts `TOKEN_LIMITS` + `buildMessages()` from 3 workers; cloud workers load as ES modules | |
481 | 481 | | **2026-03-12** | `591467b` | 🏠 **Model Hosting Migration** — all 7 ONNX models (Qwen 3.5 0.8B/2B/4B, Qwen 3 4B Thinking, Whisper Large V3 Turbo, Kokoro 82M v1.0/v1.1-zh) duplicated to self-owned [`textagent` HuggingFace org](https://huggingface.co/textagent); model IDs updated from `onnx-community/` to `textagent/` across all workers; automatic fallback to `onnx-community/` namespace if textagent models unavailable; GitLab mirror removed from runtime code | |
482 | | -| **2026-03-12** | `7b9f846` | 🔊 **Kokoro TTS** — hybrid text-to-speech engine: English/Chinese via [Kokoro 82M v1.1-zh ONNX](https://huggingface.co/textagent/Kokoro-82M-v1.1-zh-ONNX) (~80 MB, off-thread WebWorker via `kokoro-js`), Japanese & 10+ languages via Web Speech API fallback; hover preview text → click 🔊 for pronunciation; voice auto-selection by language; `textToSpeech.js` main module + `tts-worker.js` WebWorker + `tts.css` styling; model-hosts.js for configurable hosting with auto-fallback | |
| 482 | +| **2026-03-12** | `7b9f846` | 🔊 **Kokoro TTS** — hybrid text-to-speech engine: 9 languages (English, Japanese, Chinese, Spanish, French, Hindi, Italian, Portuguese) via [Kokoro 82M v1.0 ONNX](https://huggingface.co/textagent/Kokoro-82M-v1.0-ONNX) (~80 MB, off-thread WebWorker via `kokoro-js`), Korean, German & others via Web Speech API fallback; hover preview text → click 🔊 for pronunciation; voice auto-selection by language; `textToSpeech.js` main module + `tts-worker.js` WebWorker + `tts.css` styling; model-hosts.js for configurable hosting with auto-fallback | |
483 | 483 | | **2026-03-12** | `7b9f846` | 📷 **OCR Tag** — new `{{@OCR:}}` document tag for image-to-text extraction; amber-accented card with mode pills (Text/Math/Table); 📎 image upload with `@upload:` editor sync; Qwen model default; vision-capable model flags (`supportsVision`) on Qwen 3.5 Flash, 35B-A3B, and DeepSeek V3.2 | |
484 | 484 | | **2026-03-12** | `7b9f846`, `1ec8b90` | 🏗️ **Model Architecture** — ai-worker.js refactored for architecture-aware loading (`qwen3` text-only vs `qwen3_5` vision); `setModelId` accepts `architecture` + `dtype` params; automatic fallback to HuggingFace when primary host fails; `moonshine-medium-worker.js` deleted (replaced by unified `speech-worker.js`); Language Learning template with TTS pronunciation tips; SQLite-compatible SQL in Technical template | |
485 | 485 | | **2026-03-11** | `7b9f846` | ▶ **Run All Notebook Engine** — one-click `▶ Run All` button executes every code/tag block in document order; 11 runtime adapters (bash, math, python, html, js, sql, docgen-ai, docgen-image, docgen-agent, api, linux-script); Block Registry with FNV-1a stable IDs; Execution Controller with fixed-bottom progress bar, per-block status badges (pending/running/done/error), and abort support; SQLite `_exec_results` context store for cross-block data sharing; DocGen/API adapters use auto-accept mode (skip review panel); Linux adapter submits to Judge0 CE; deferred adapter queue for module loading order; `exec-engine.css` styling; 12 new Playwright tests (191 total) | |
|
0 commit comments